I am working on an exercise from Google's Python class that deals with popular baby names. The program runs properly when I give it a single filename, but when I use a wildcard to pick up all of the baby####.html files, I get different errors every time I run it.
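On Unix-like shells, a pattern such as `baby*.html` is normally expanded by the shell before Python sees it. Windows `cmd.exe` does not do this, so there the literal pattern arrives in `sys.argv`. Below is a minimal sketch of expanding wildcards inside the script with the standard `glob` module. It assumes filenames arrive as command-line arguments, and `expand_args` is a hypothetical helper name:

```python
import glob
import sys


def expand_args(args):
    """Expand wildcard patterns in args (hypothetical helper); keep plain names as-is."""
    filenames = []
    for arg in args:
        # glob.glob returns [] when nothing matches, so fall back to the
        # literal argument and let a later open() report the missing file.
        matches = sorted(glob.glob(arg))
        filenames.extend(matches if matches else [arg])
    return filenames


if __name__ == '__main__':
    for filename in expand_args(sys.argv[1:]):
        print(filename)
```

Sorting the matches keeps the years in a predictable order regardless of how the operating system lists the directory.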
Here is the problem code:
```
matches = re.findall('<td>[0-9]+<\/td><td>(\w+)<\/td><td>(\w+)', file)
for match in matches:
    #print(match)
    rank = match[0]
    name = match[1]
    item = (year, name, rank)
    names.append(item)
    #print(item)
    name = match[2]
    item = (year, name, rank)
    names.append(item)
    #print(item)
```
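With `re.findall`, each match comes back as a tuple containing one element per capturing group (the parenthesized parts of the pattern). The tuple length is therefore fixed by the pattern, not by the row being matched. A quick check of the same pattern, written as a raw string, against an invented sample row (the real file's markup may differ) shows it has only two groups, because `[0-9]+` is not in parentheses:

```python
import re

# The question's pattern, as a raw string so Python leaves the backslashes alone.
question_pattern = re.compile(r'<td>[0-9]+<\/td><td>(\w+)<\/td><td>(\w+)')
print(question_pattern.groups)  # 2: only the two (\w+) parts capture

# Invented sample row, for illustration only.
sample = '<td>1</td><td>Michael</td><td>Jessica</td>'
print(question_pattern.findall(sample))  # [('Michael', 'Jessica')]
```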
On the re.findall line, I got an "invalid escape sequence" error for \d, so I changed the pattern to use a numeric range ([0-9]+) instead.
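The "invalid escape sequence" message comes from Python's parsing of the string literal, not from the regex engine: in an ordinary string, `\d` is not a recognized escape, and newer Python versions warn about it. Prefixing the pattern with `r` makes it a raw string, so the backslashes reach `re` unchanged and `\d` can stay. A small sketch (the sample row is invented for illustration):

```python
import re

# r'...' keeps backslashes as-is, so \d reaches the regex engine intact.
raw_pattern = r'<td>(\d+)</td>'
escaped_pattern = '<td>(\\d+)</td>'  # the same pattern without a raw string

sample = '<td>1</td><td>Michael</td><td>Jessica</td>'
print(raw_pattern == escaped_pattern)   # True
print(re.findall(raw_pattern, sample))  # ['1']
```

The `/` in `</td>` has no special meaning in a Python regex, so it does not need escaping.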
When I ran the program again, I got a subscript out of range error on name = match[2], which should be the female name in a (popularity, male name, female name) tuple. When I ran the program on a single file last night, I got the results I expected, but now not even the single-file run works. Keep in mind that the same code is being executed in both cases.
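The out-of-range error on `match[2]` is consistent with the pattern having only two capturing groups: each `match` is a 2-tuple, so `match[0]` is really the male name, `match[1]` the female name, and `match[2]` does not exist. Below is a minimal sketch of the loop with the rank captured as well, unpacking each tuple into named variables. `extract_names` and the sample rows are hypothetical, and `year` is assumed to be extracted elsewhere, as in the original program:

```python
import re


def extract_names(year, html):
    """Hypothetical helper: return (year, name, rank) for both names in each row."""
    names = []
    # Three capturing groups: rank, male name, female name.
    pattern = r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>'
    for rank, male_name, female_name in re.findall(pattern, html):
        names.append((year, male_name, rank))
        names.append((year, female_name, rank))
    return names


# Invented sample rows, for illustration only.
sample = ('<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>'
          '<tr align="right"><td>2</td><td>Christopher</td><td>Ashley</td>')
print(extract_names('1990', sample))
```

Unpacking into three names also means a pattern with the wrong number of groups fails immediately with a `ValueError` stating the expected and actual counts, instead of an index error partway through the loop body.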
Obviously, I'm new to Python, but I have a solid understanding of object-oriented design, having taught myself VB.NET, C# and Java.
I don't understand why running the same code with the same parameters causes these errors. It's very frustrating when the language itself has these kinds of issues.
As always, any help is appreciated!
Tom