Hey,
Does anyone know how to get only the language IDs from the Google Chrome page ("view-source:https://www.google.com/chrome?hl=en-GB") using this regex: "<option value=""([a-zA-Z])"">[&#0-9; a-zA-Z()]</option>"?
I have tried the following, but it did not work:
def getsourcecode():
    url = "https://www.google.com/chrome?hl=da"
    req = urllib2.Request(url, None)
    source_code = urllib2.urlopen(req).read()
    #return (source_code)

for line in getsourcecode:
    matchObj = re.match(r"<option value=""([a-zA-Z]*)"">[&#0-9; a-zA-Z()]*</option>", line, re.M|re.I)
    if matchObj:
        print "matchObj.group(1) : ", matchObj.group(1)
    else:
        print "No match!!"
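Apart from the regex, the loop above iterates over the function object `getsourcecode` itself rather than the page it downloads, and the function returns nothing because its `return` is commented out. A minimal Python 3 sketch of that failure, using a hypothetical stand-in for the download that returns fixed markup:

```python
def getsourcecode():
    # Hypothetical stand-in for the urllib download; returns fixed markup.
    return '<option value="da">Dansk</option>\n'

error_message = None
try:
    # Same mistake as above: looping over the function, not its result.
    for line in getsourcecode:
        pass
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'function' object is not iterable

# Calling the function and splitting its result gives real lines to loop over.
lines = getsourcecode().splitlines()
print(lines)
```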
You can't double the double quotes like this:
>>> r"<option value=""([a-zA-Z])"">[&#0-9; a-zA-Z()]</option>" # bad
'<option value=([a-zA-Z])>[&#0-9; a-zA-Z()]</option>'
>>> r'<option value="([a-zA-Z])">[&#0-9; a-zA-Z()]</option>' # good
'<option value="([a-zA-Z])">[&#0-9; a-zA-Z()]</option>'
Use Kodos to debug regexes.
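A quick check in the interpreter also helps. One thing worth testing is that `re.match` only matches at the very start of the string, so an `<option>` tag preceded by indentation or other markup on the same line is missed, while `re.search` and `re.findall` scan the whole line. The sample line below is hypothetical, not copied from the Chrome page:

```python
import re

# Hypothetical line of HTML: leading indentation and two options on one line.
line = '    <option value="da">Dansk</option><option value="de">Deutsch</option>'
pattern = r'<option value="([a-zA-Z]*)">[&#0-9; a-zA-Z()]*</option>'

print(re.match(pattern, line))            # None: match() is anchored at position 0
print(re.search(pattern, line).group(1))  # da
print(re.findall(pattern, line))          # ['da', 'de']
```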
Edit: in Python, r"foo""bar""baz" is the same as r"foo" + "bar" + "baz".
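A short Python 3 check of that concatenation, plus three spellings that do keep the double quotes inside the pattern (single-quoted raw string, triple-quoted raw string, and a non-raw string with `\"` escapes):

```python
# The doubled-quote version: Python joins the adjacent literals,
# so the quotes vanish from the resulting pattern.
doubled = r"<option value=""([a-zA-Z]*)"">"

# Three spellings that keep the quotes in the pattern.
single_quoted = r'<option value="([a-zA-Z]*)">'
triple_quoted = r"""<option value="([a-zA-Z]*)">"""
escaped = "<option value=\"([a-zA-Z]*)\">"  # not raw: \" becomes a plain "

print(doubled)                                    # <option value=([a-zA-Z]*)>
print(single_quoted == triple_quoted == escaped)  # True
```

Note that `r"...\"..."` would not work the same way: in a raw string the backslash stays in the text.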
My mistake. I just tried it and it still didn't work, although I have tested the regex and it works.
def getsourcecode():
    url = "https://www.google.com/chrome?hl=da"
    req = urllib2.Request(url, None)
    source_code = urllib2.urlopen(req).read()
    #return (source_code)

for line in getsourcecode:
    matchObj = re.match(r"<option value="([a-zA-Z])">[&#0-9; a-zA-Z()]</option>", line)
    if matchObj:
        print "matchObj.group(1) : ", matchObj.group(1)
    else:
        print "No match!!"
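A minimal Python 3 sketch of the whole task, assuming the page still serves `<option value="...">` elements (the Chrome page may have changed since this thread, so the live download is left commented out). Python 3 moved `urllib2` into `urllib.request`. Compared with the code above: the function returns the source, the pattern is wrapped in single quotes so the inner double quotes survive, `re.findall` scans the whole page instead of `re.match` per line, the ID class allows hyphens so IDs like `en-GB` (the form used in the `hl=` parameter) are captured, and the option text is matched with `[^<]*` so accented names don't break the match. The helper names `fetch_source` and `extract_lang_ids` and the sample markup are hypothetical:

```python
import re
import urllib.request

# Single-quoted raw string keeps the double quotes in the pattern.
# [a-zA-Z-]+ also accepts hyphenated IDs such as en-GB; [^<]* accepts
# any option text up to the closing tag.
OPTION_RE = re.compile(r'<option value="([a-zA-Z-]+)">[^<]*</option>', re.IGNORECASE)


def fetch_source(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_lang_ids(html):
    """Return every language ID found in <option value="..."> tags."""
    return OPTION_RE.findall(html)


# Hypothetical sample markup, not copied from the real page.
sample = ('<select>\n  <option value="da">Dansk</option>\n'
          '  <option value="en-GB">English (UK)</option>\n</select>')
print(extract_lang_ids(sample))  # ['da', 'en-GB']

# Against the live page (needs network access):
# print(extract_lang_ids(fetch_source("https://www.google.com/chrome?hl=da")))
```

For anything beyond a quick scrape, an HTML parser is usually more robust than a regex, since attribute order and quoting can vary.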