Hi,
I'm trying to use regular expressions to extract search terms from a log file. Each line in the file has the form:
mystring = "00:00:11 192.168.21.44 GET /images/help.gif - 200 Mozilla/4.0+(compatible;+MSIE+5.0;+Windows+98;+DigExt) ASPSESSIONIDGGGGGQEG=CLDIHIJBJAPFAGBFIOMCFGGA;+PRO%5FOnline=SEARCHQUERY=%09%3Cinput+type%3Dhidden+name%3D%27HrowColumns%27+ID%3D%27HrowColumns%27++value%3D%271%3B6%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData10%27+ID%3D%27txtScopeData10%27++value%3D%27catherine%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData30%27+ID%3D%27txtScopeData30%27++value%3D%27porter%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData50%27+ID%3D%27txtScopeData50%27++value%3D%27%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData11%27+ID%3D%27txtScopeData11%27++value%3D%27%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData31%27+ID%3D%27txtScopeData31%27++value%3D%27%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData51%27+ID%3D%27txtScopeData51%27++value%3D%27%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27HCollection%5Fid%27+ID%3D%27HCollection%5Fid%27++value%3D%271%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27Submit1%2Ex%27+ID%3D%27Submit1%2Ex%27++value%3D%2736%27%3E%0D%0A%09%3Cinput+type%3Dhidden+name%3D%27Submit1%2Ey%27+ID%3D%27Submit1%2Ey%27++value%3D%2718%27%3E%0D%0A http://www.foo-bar.com/SearchResult.asp"
The search terms I'm trying to extract are the txtScopeData values in the line above (catherine and porter in this example). I have constructed the following regex to extract those terms:
pattern=re.compile(r"\+\+value\%3D\%27([A-Z0-9._%+-]+)\%27\%3E",re.IGNORECASE)
Running the above regex on the given string with
pattern.findall(mystring)
returns most of the string as one match instead of the individual terms. I have escaped '+' and '%' in the regex because I took them to be special characters, but I am obviously doing something wrong.
I would be very grateful for any help.
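For anyone who wants to reproduce this, here is a minimal sketch on a shortened sample (the `sample` string is an assumption, trimmed from the log line above, and the names `original`, `fixed`, `greedy` and `terms` are just for illustration). A likely cause is that the character class itself contains '%', so the greedy '+' can run straight across the encoded markup and stop only at the last "%27%3E". Dropping '%' from the class is one possible fix, not necessarily the only one.

```python
import re

# Shortened, hypothetical sample trimmed from the log line above.
sample = (
    "PRO%5FOnline=SEARCHQUERY="
    "%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData10%27"
    "+ID%3D%27txtScopeData10%27++value%3D%27catherine%27%3E%0D%0A"
    "%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData30%27"
    "+ID%3D%27txtScopeData30%27++value%3D%27porter%27%3E%0D%0A"
    "%09%3Cinput+type%3Dhidden+name%3D%27txtScopeData50%27"
    "+ID%3D%27txtScopeData50%27++value%3D%27%27%3E%0D%0A"
    " http://www.foo-bar.com/SearchResult.asp"
)

# Original pattern: '%' is inside the class, so the greedy '+' swallows
# the encoded markup and backtracks only as far as the last "%27%3E".
original = re.compile(
    r"\+\+value\%3D\%27([A-Z0-9._%+-]+)\%27\%3E", re.IGNORECASE
)
greedy = original.findall(sample)

# Possible fix: remove '%' from the class so the capture stops at the
# closing %27. ('%' is not a regex metacharacter, so it needs no escape.)
fixed = re.compile(r"\+\+value%3D%27([A-Z0-9._+-]+)%27%3E", re.IGNORECASE)
terms = fixed.findall(sample)

print(len(greedy))  # one oversized match
print(terms)        # ['catherine', 'porter']
```

On the full log line, the adjusted pattern would presumably also pick up other non-empty value fields (such as the one for HCollection_id), so matching on the txtScopeData name as well may be needed if only the search terms are wanted.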