
I am trying to create a conditional regex match, where a portion of the regex match is reused to match further against the string. Take the following example:
"sometext http://www.somedomain.com/test.html somemoretext http://www.someotherdomain/www/hello/more.html someothertext"
I would like to match the domain portion of the URL (http://www.somedomain.com) and determine whether the following URLs originate from the same domain. If not, I would like a match to occur. So:
"sometext http://www.somedomain.com/test.html somemoretext http://www.someotherdomain/www/hello/more.html someothertext" <- this is a match
"sometext http://www.somedomain.com/test.html somemoretext http://www.somedomain.com/www/hello/more.html someothertext" <- this is not a match

Any regex gurus that know the answer to this one out there?

2006-08-27 23:51:39 · 2 answers · asked by christian 1 in Computers & Internet Programming & Design
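Reusing part of an earlier match later in the same pattern is what a regex backreference (\1) does, so the check described in the question can be written as a single Perl pattern. This is a sketch under stated assumptions: URLs begin with http:// or https://, "domain" means the scheme plus host compared literally (case-sensitively), and has_foreign_url is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if some URL in $text is followed later by a
# URL with a different scheme + host, else 0. Hosts are compared literally.
sub has_foreign_url {
    my ($text) = @_;
    return $text =~ m{
        (https?://[^/\s]+)(?![^/\s])  # capture scheme + complete host of a URL
        .*?                           # skip ahead (lazily) to a later URL
        (?!\1(?![^/\s]))              # that URL must NOT start with the same scheme + host
        https?://                     # ... but it must be a URL
    }xs ? 1 : 0;
}

for my $text (
    "sometext http://www.somedomain.com/test.html somemoretext http://www.someotherdomain/www/hello/more.html someothertext",
    "sometext http://www.somedomain.com/test.html somemoretext http://www.somedomain.com/www/hello/more.html someothertext",
) {
    print has_foreign_url($text) ? "match\n" : "no match\n";
}
```

The negative lookahead around \1 is what turns "same domain" into "no match", and the (?![^/\s]) guards stop a host such as www.somedomain.co from being treated as a prefix of www.somedomain.com. Because the engine backtracks, the pattern matches whenever any later URL differs from an earlier one, which is equivalent to "not every URL in the text shares one host".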

2 answers

I don't entirely understand your question, but I'll give it a shot and then refine and post again later if necessary.

Let's start simple. I'm guessing you want to match the 'http://www.domain.com' part when other text can appear before or after it.

So in Perl, this would be:

use strict;
use warnings;

# firstly we create the domain variable to store the url
my $domain = "http://www.domain.com";

# then for fun we create the url variable which contains a string with our domain
my $url = "sometexthttp://www.domain.comsometext";

# now we create an if statement which attempts to verify whether the url variable contains our domain
# (\Q...\E escapes the dots in $domain so they match only a literal ".")
if ($url =~ m/(.*)?\Q$domain\E(.*)?/) {
    print "found $domain\n";   # do this
}

I haven't tested this, but as far as I know it should work :)

I'll explain what the if conditional above does step by step; in this case we're trying to match the domain against this other string.

1. if (condition is true) {do this}: the block runs only when the condition holds
2. $url is the variable that contains the string of text, which may or may not contain the domain we're looking for
3. the =~ sign is the binding operator: it applies the pattern on its right to the string on its left
4. m/specificcontent/: the m stands for match, and we're trying to match specific content against the $url variable
5. the dot (.) matches any single character (letters, numbers, symbols, and so forth) except a newline
6. the asterisk (*) applies to the item before it and means "zero or more" of it, so .* allows any number of characters (including none) before the $domain. The * is greedy: it first consumes as much of the string as it can and then backtracks until the rest of the pattern matches, which is why it can seem not to know when to stop.
7. the question mark (?) makes the item before it optional; after a group such as (.*), it makes the whole group optional
8. If we put these all together you get (.*)?, which means that what is in brackets is optional, so in our matching string there can be text before and/or after the domain we're looking for. (Strictly speaking the ? is redundant here, since .* can already match zero characters.)
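To see what items 5 to 8 mean in practice, here is a small Perl sketch (the variable names are illustrative). It also shows the greedy behaviour from item 6, and the lazy *? form, which is a different use of ? than the "optional" meaning in item 7.

```perl
use strict;
use warnings;

my $domain = "http://www.domain.com";

# One occurrence: (.*)? captures the text on each side of $domain.
my $url = "sometexthttp://www.domain.comsometext";
my ($before, $after) = $url =~ m/(.*)?\Q$domain\E(.*)?/;
print "before='$before' after='$after'\n";   # before='sometext' after='sometext'

# Two occurrences: the greedy .* first runs to the end of the string, then
# backtracks only as far as needed, so $domain matches the LAST occurrence.
my $twice = "a ${domain} b ${domain} c";
my ($greedy) = $twice =~ m/(.*)\Q$domain\E/;
print "greedy='$greedy'\n";   # greedy='a http://www.domain.com b '

# Adding ? after * makes it lazy (fewest characters first), so the FIRST occurrence matches.
my ($lazy) = $twice =~ m/(.*?)\Q$domain\E/;
print "lazy='$lazy'\n";   # lazy='a '
```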

Short of chatting with you directly about any questions you might have, it's quite difficult to explain this further.

Hope this helps :) Good Luck

2006-08-28 03:08:25 · answer #1 · answered by Omnis 1

First, find the entry point in the text that starts with http://www. Then, from that starting point, extract the domain and assign it to a string variable. Use this string variable in a regex to check whether the next domain is the same or different. That's all.


Helmut

2006-08-28 09:49:16 · answer #2 · answered by hswes 2
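Helmut's two-step approach (capture the first domain into a variable, then reuse that variable in a second regex) can be sketched in Perl as follows. foreign_urls is a hypothetical name, and URLs are assumed to start with http:// or https:// and to end at the next whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the URLs (after the first one) whose
# scheme + host differ from the first URL's scheme + host.
sub foreign_urls {
    my ($text) = @_;

    # Find every URL: "http://" or "https://", a host, then any non-space characters.
    my @urls = $text =~ m{(https?://[^/\s]+\S*)}g;
    return unless @urls;

    # Step 1: split off the scheme + host of the first URL into a string variable.
    my ($domain) = $urls[0] =~ m{^(https?://[^/\s]+)};

    # Step 2: reuse that string inside a regex. \Q...\E escapes the dots, and
    # (?![^/]) stops "http://www.a.com" from also matching "http://www.a.com.evil".
    return grep { !m{^\Q$domain\E(?![^/])} } @urls[1 .. $#urls];
}

my $text = "sometext http://www.somedomain.com/test.html somemoretext "
         . "http://www.someotherdomain/www/hello/more.html someothertext";
my @foreign = foreign_urls($text);
print @foreign ? "match: @foreign\n" : "no match\n";
```

Returning the list rather than a yes/no flag makes it easy to report which URLs come from another domain; in a boolean test, a false result means no match.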

fedest.com, questions and answers