This is a follow-up to my solved thread from a few days ago about extracting parts of a RegEx search in a web scraping app. I now have a script with 4 RegEx searches that each work individually, and I want to combine all 4 into a single search that returns the 4 pieces of information in a single list. I've seen examples on the web that put a plus sign "+" between the RegExes, but when I do that I get an empty list back (the same happens if I put nothing between the searches). If I use "and" in place of "+", only the last search returns its value.
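To make the two symptoms concrete, here is a minimal sketch of what "+" and "and" do to two pattern strings before `re` ever sees them. The patterns `first` and `second` and the test strings are made up for illustration and are not from my script.

```python
import re

# Two toy patterns (hypothetical, for illustration only).
first = r"a(\d)"
second = r"b(\d)"

# "+" is plain string concatenation: the result is ONE pattern in which
# the second part must match immediately after the first.
joined = first + second
print(joined)
print(re.findall(joined, "a1b2"))      # the parts are adjacent, so this matches
print(re.findall(joined, "a1 xx b2"))  # other text sits in between, so no match

# "and" is Python's boolean operator, not a regex operator: for two
# non-empty strings it simply evaluates to the second string.
chosen = first and second
print(chosen)
print(re.findall(chosen, "a1 xx b2"))  # only the second pattern is searched
```

If that reading is right, the empty list would come from the markup that sits between the four pieces on the real page, which the concatenated pattern makes no allowance for.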
import re
import urllib
f = urllib.urlopen("http://www.atptennis.com/3/en/rankings/entrysystem/")
tennis_rankings = f.read()
#++++++++++++++++++ The RegEx's below all work individually
#They extract, in order: 1) player's rank, 2) player's name, 3) player's total points, 4) number of tourneys played
#tennis_players = re.compile("<div class=\"entrylisttext\">([\d+]*)</div>", re.I | re.S | re.M)
#tennis_players = re.compile("playernumber=[A-Z][0-9]+\" id=\"blacklink\">([a-zA-Z]+, [a-zA-Z]+)", re.I | re.S | re.M)
#tennis_players = re.compile("pointsbreakdown.asp\?player=[A-Z][0-9]+&ss=y\" id=\"blacklink\">([0-9]+)", re.I | re.S | re.M)
#tennis_players = re.compile("playeractivity.asp\?player=[A-Z][0-9]+\" id=\"blacklink\">([0-9]+)", re.I | re.S | re.M)
#++++++++++++++++++ Now, together as a single search
tennis_players = re.compile("<div class=\"entrylisttext\">([\d+]*)</div>" + "playernumber=[A-Z][0-9]+\" id=\"blacklink\">([a-zA-Z]+, [a-zA-Z]+)" + "pointsbreakdown.asp\?player=[A-Z][0-9]+&ss=y\" id=\"blacklink\">([0-9]+)" + "playeractivity.asp\?player=[A-Z][0-9]+\" id=\"blacklink\">([0-9]+)", re.I | re.S | re.M)
find_result = tennis_players.findall(tennis_rankings)
print find_result
print 'done'
My preferred return value is a list of tuples, one per player:
[('1', 'Federer, Roger', '6600', '18'), ('2', 'Nadal, Rafael', '5800', '19'), ('3', 'Djokovic, Novak', '4900', '20'), ...]
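One approach that might produce this shape is to join the four patterns with a non-greedy `.*?` so that each gap can absorb the markup in between (with `re.S`, the dot also crosses newlines). The sketch below is only a guess at a fix: `sample_html` is a made-up fragment that imitates the page, the player IDs in it are invented, the name `GAP` is my own, and I have swapped `([\d+]*)` for `(\d+)`. The real page may be laid out differently.

```python
import re

# Hypothetical fragment imitating the page's markup; the real page may differ.
sample_html = '''
<div class="entrylisttext">1</div>
<a href="playernumber=F324" id="blacklink">Federer, Roger</a>
<a href="pointsbreakdown.asp?player=F324&ss=y" id="blacklink">6600</a>
<a href="playeractivity.asp?player=F324" id="blacklink">18</a>
<div class="entrylisttext">2</div>
<a href="playernumber=N409" id="blacklink">Nadal, Rafael</a>
<a href="pointsbreakdown.asp?player=N409&ss=y" id="blacklink">5800</a>
<a href="playeractivity.asp?player=N409" id="blacklink">19</a>
'''

# Non-greedy filler that skips whatever sits between two pieces.
GAP = r".*?"

pattern = re.compile(
    r'<div class="entrylisttext">(\d+)</div>' + GAP
    + r'playernumber=[A-Z][0-9]+" id="blacklink">([a-zA-Z]+, [a-zA-Z]+)' + GAP
    + r'pointsbreakdown\.asp\?player=[A-Z][0-9]+&ss=y" id="blacklink">([0-9]+)' + GAP
    + r'playeractivity\.asp\?player=[A-Z][0-9]+" id="blacklink">([0-9]+)',
    re.I | re.S,  # re.S lets "." cross line breaks
)

# With four groups in the pattern, findall returns one 4-tuple per match.
rows = pattern.findall(sample_html)
print(rows)
```

One caveat with this sketch: if a row on the real page is missing one of the four pieces, the `.*?` gaps can stretch into the next row and pair values from different players, so the output would need a spot check.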
Any help would be appreciated!