I am trying to extract three values from the td tags in each row of a downloaded HTML file.
<tr align="right"><td>236</td><td>Roy</td><td>Allyson</td>
<tr align="right"><td>237</td><td>Marvin</td><td>Pamela</td>
<tr align="right"><td>238</td><td>Micah</td><td>Kristine</td>
<tr align="right"><td>239</td><td>Collin</td><td>Raquel</td>
I am using the pattern match = re.findall(r'<td.?>([\d+])([.?])*<\/td>', file)
The variable file holds the file's contents as a string, obtained with read().
The output should look like
(236, "Roy", "Allyson")
(237, "Marvin", "Pamela")
(238, "Micah", "Kristine")
(239, "Collin", "Raquel")
What I get is
(236, "")
(237, "")
(238, "")
(239, "")
I've tried different variations of the same pattern and get
('236', '23', '6')
('Roy', '', 'Roy')
('Allyson', '', 'Allyson')
('237', '23', '7')
('Marvin', '', 'Marvin')
('Pamela', '', 'Pamela')
('238', '23', '8')
('Micah', '', 'Micah')
('Kristine', '', 'Kristine')
('239', '23', '9')
('Collin', '', 'Collin')
('Raquel', '', 'Raquel')
I'm relatively new to regular expressions, so be gentle, but any help would
be appreciated.
PS: I'm using Python.
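
For reference, here is a minimal sketch of one pattern that yields three values per row on the sample above. It assumes every row contains exactly three adjacent plain <td> cells (a number followed by two names) with no attributes or nested tags; the names html, ROW_PATTERN and rows are made up for illustration. It also notes why the character classes in the original pattern behave unexpectedly.

```python
import re

# Sample rows copied from the question; in practice this string would be
# the result of read() on the downloaded file.
html = """<tr align="right"><td>236</td><td>Roy</td><td>Allyson</td>
<tr align="right"><td>237</td><td>Marvin</td><td>Pamela</td>
<tr align="right"><td>238</td><td>Micah</td><td>Kristine</td>
<tr align="right"><td>239</td><td>Collin</td><td>Raquel</td>"""

# In the original pattern, [\d+] is a character class that matches a single
# character (a digit or a literal '+'), and [.?] matches a literal '.' or '?'.
# Here each cell gets its own group instead: \d+ for the number, and a
# non-greedy .+? for each name so the match stops at the next </td>.
ROW_PATTERN = re.compile(r"<td>(\d+)</td><td>(.+?)</td><td>(.+?)</td>")

# With three groups, findall returns one 3-tuple of strings per row.
rows = ROW_PATTERN.findall(html)

for number, first_name, second_name in rows:
    # Convert the first field to int so it prints as a number, not a string.
    print((int(number), first_name, second_name))
```

For HTML that is less regular than this sample, a real parser such as the standard library's html.parser module is usually more robust than a regular expression.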