Hi,

I'm trying to use regular expressions to extract search terms from a log file. Each line in the file is of the form:

mystring = "00:00:11 192.168.21.44 GET /images/help.gif - 200 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) ASPSESSIONIDGGGGGQEG=CLDIHIJBJAPFAGBFIOMCFGGA;+PRO%5FOnline=SEARCHQUERY=%09%3Cinput+type%3Dhidden+name%3D%27HrowColumns%27+ID%3D%27HrowColumns%27++value%3D%271%3B6%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData10%27+ID%3D%27txtScopeData10%27++value%3D%27catherine%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData30%27+ID%3D%27txtScopeData30%27++value%3D%27porter%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData50%27+ID%3D%27txtScopeData50%27++value%3D%27%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData11%27+ID%3D%27txtScopeData11%27++value%3D%27%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData31%27+ID%3D%27txtScopeData31%27++value%3D%27%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData51%27+ID%3D%27txtScopeData51%27++value%3D%27%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27HCollection%5Fid%27+ID%3D%27HCollection%5Fid%27++value%3D%271%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27Submit1%2Ex%27+ID%3D%27Submit1%2Ex%27++value%3D%2736%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27Submit1%2Ey%27+ID%3D%27Submit1%2Ey%27++value%3D%2718%27%3E%0D%0A http://www.foo-bar.com/SearchResult.asp"

The search terms I'm trying to extract from the line above (highlighted in red in my original post) are 'catherine' and 'porter'. I have constructed the following regex to extract them:

pattern=re.compile(r"\+\+value\%3D\%27([A-Z0-9._%+-]+)\%27\%3E",re.IGNORECASE)

Running the above regex on the given string with

pattern.findall(mystring)

just coughs out the entire string at me. I have escaped '+' and '%' in the regex as they are special characters but I am obviously doing something wrong.

I would be very grateful for any help.
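A note on why the original pattern hands back nearly the whole line: the character class [A-Z0-9._%+-] itself contains % and +, which is exactly what the encoded text between fields is made of (%27%3E%0D%0A%3Cinput+type...). The greedy + therefore runs straight past the first closing %27%3E and backtracks only as far as the last one before the space that precedes the URL, so findall returns one giant match. Also, % has no special meaning in a regex, so \% is harmless but unnecessary; + does need escaping outside a character class. The sketch below assumes Python 3 and uses a shortened, hypothetical two-field fragment of the log line to compare the original class with one that leaves out % and +.

```python
import re

# Shortened, hypothetical fragment of the log line from the question
# (only two hidden-input fields), used to keep the example readable.
sample = (
    "%3Cinput+type%3Dhidden+name%3D%27txtScopeData10%27+ID%3D%27txtScopeData10%27"
    "++value%3D%27catherine%27%3E%0D%0A"
    "%3Cinput+type%3Dhidden+name%3D%27txtScopeData30%27+ID%3D%27txtScopeData30%27"
    "++value%3D%27porter%27%3E%0D%0A http://www.foo-bar.com/SearchResult.asp"
)

# Original class: '%' and '+' are allowed inside the value, so the greedy '+'
# can consume '%27%3E...' and only stops at the LAST '%27%3E' it can reach.
greedy = re.compile(r"\+\+value%3D%27([A-Z0-9._%+-]+)%27%3E", re.IGNORECASE)

# Without '%' and '+' in the class, a match cannot cross the closing '%27',
# so each value is captured on its own.
narrow = re.compile(r"\+\+value%3D%27([A-Z0-9._-]+)%27%3E", re.IGNORECASE)

print(greedy.findall(sample))  # one long match spanning both fields
print(narrow.findall(sample))  # the two values separately
```

On the full log line, a class that still allows digits would also capture numeric values such as 1, 36 and 18, which is one reason to restrict it further to letters.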

Without a regex, it could be done like this:

# mystring is the log line from the question above
substrs = mystring.split('++value%3D%27')
for s in substrs:
    # Search terms start with a letter; this skips numeric values, empty
    # values and the part of the line before the first '++value%3D%27'.
    if s[:1].isalpha():
        # Everything before the closing %27 (an encoded ') is the value.
        print(s.split('%27')[0])
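Another option, different from both replies in this thread, is to undo the URL encoding first with the standard library's urllib.parse.unquote_plus. After decoding, each hidden input reads as ordinary HTML (name='...' ID='...'  value='...'), so a simpler pattern can pull out every field together with its name. This is a minimal Python 3 sketch; the fragment is a shortened, hypothetical piece of the question's log line, and the field pattern assumes the name/ID/value layout seen in that line.

```python
import re
from urllib.parse import unquote_plus

# Shortened, hypothetical fragment of the log line from the question.
fragment = (
    "SEARCHQUERY=%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData10%27"
    "+ID%3D%27txtScopeData10%27++value%3D%27catherine%27%3E%0D%0A"
    "%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData30%27"
    "+ID%3D%27txtScopeData30%27++value%3D%27porter%27%3E%0D%0A"
    "%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData50%27"
    "+ID%3D%27txtScopeData50%27++value%3D%27%27%3E%0D%0A"
)

# unquote_plus turns '+' into spaces and %XX escapes into characters,
# e.g. "++value%3D%27catherine%27%3E" becomes "  value='catherine'>".
decoded = unquote_plus(fragment)

# Capture each hidden input's name and value (assumed name/ID/value order).
field_re = re.compile(r"name='([^']*)'\s+ID='[^']*'\s+value='([^']*)'")
fields = dict(field_re.findall(decoded))

# Keep only non-empty values, i.e. the search boxes that were filled in.
terms = [value for value in fields.values() if value]
print(fields)
print(terms)
```

Keying the values by field name (txtScopeData10, txtScopeData30, ...) also tells you which search box each term came from.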

voolvif,

You were not far off.

>>> pattern=re.compile(r"\%3D\%27([A-Z]+?)\%27\%3E",re.IGNORECASE)
>>> pattern.findall(mystring)
['catherine', 'porter']
>>>
commented: Nice use of the lazy quantifier. +1
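Why this works: [A-Z] does not include %, so a match cannot extend past the closing %27 of one value into the next field. Because of that, the lazy +? and a plain greedy + give the same result on this input; it is restricting the class to letters that keeps numeric fields such as the Submit1.x coordinate out of the results. A small Python 3 check on a shortened, hypothetical two-field fragment:

```python
import re

# Shortened, hypothetical fragment: one text field and one numeric field
# (the Submit1.x coordinate) taken from the log line in the question.
sample = (
    "name%3D%27txtScopeData10%27+ID%3D%27txtScopeData10%27"
    "++value%3D%27catherine%27%3E%0D%0A"
    "name%3D%27Submit1%2Ex%27+ID%3D%27Submit1%2Ex%27"
    "++value%3D%2736%27%3E%0D%0A"
)

lazy = re.compile(r"%3D%27([A-Z]+?)%27%3E", re.IGNORECASE)
greedy = re.compile(r"%3D%27([A-Z]+)%27%3E", re.IGNORECASE)
with_digits = re.compile(r"%3D%27([A-Z0-9]+)%27%3E", re.IGNORECASE)

# '%' is not in [A-Z], so neither quantifier can run past the closing %27;
# the lazy and greedy versions therefore find the same matches.
print(lazy.findall(sample))
print(greedy.findall(sample))

# Allowing digits in the class would also pick up the numeric value '36'.
print(with_digits.findall(sample))
```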