Hi,
I'm trying to extract certain things from a web page. The website is TVRage.com, and the example I'm using at the moment is the Warehouse 13 episode list. So far I've managed to get the title of the show using this code:
#!/usr/bin/env python
import urllib

def save_page(site="http://www.tvrage.com/Warehouse_13/episode_list"):
    mypath = site
    mylines = urllib.urlopen(mypath).readlines()
    f = open('temp2.txt', 'w')
    for item in mylines:
        f.write(item)
    f.close()

def find_title(temp="temp2.txt"):
    f = open(temp, "r")
    site = f.read()
    f.close()
    search1 = "<title>"
    search2 = " (Episode"
    starter = site.find(search1)
    ender = site.find(search2)
    #print "Starts at %s and ends at %s" % (starter, ender) Just gives the indexes
    print site[(starter+19):ender]
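
The same slicing idea can be written without a hard-coded offset. This is only a rough sketch: text_between is a made-up helper name and the sample title string is invented for illustration, so the real page's title may be laid out differently.

```python
def text_between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker."""
    start = text.find(start_marker)
    if start == -1:
        return None  # start marker not present
    start += len(start_marker)  # skip past the marker itself
    end = text.find(end_marker, start)
    if end == -1:
        return None  # end marker not present after the start marker
    return text[start:end]


# Invented title tag for illustration only, not copied from the real page.
sample = "<html><head><title>Example Site - Warehouse 13 (Episode List)</title></head></html>"
print(text_between(sample, "<title>Example Site - ", " (Episode"))
```

Using len(start_marker) instead of a fixed number means the slice still lines up if the text in front of the show name changes.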
Now I'm trying to get the episode numbers, dates, and titles. The only problem is that I can't figure out how to extract them from the HTML. So far I've tried this code, to no effect:
def find_episodes(temp="temp2.txt"):
    f = open(temp, "r")
    site = f.read()
    f.close()
    for line in site:
        if '/Warehouse_13/episodes/1064905360' in line:
            print line
        else:
            print "We got nothing."
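
One thing that may explain the result: site is a single string, so "for line in site" walks it one character at a time, and a single character can never contain the whole link. Below is a minimal sketch of a line-by-line search instead. The find_matching_lines name and the sample markup are made up for illustration, and the real TVRage HTML may be structured differently.

```python
def find_matching_lines(html, needle):
    """Return every line of html that contains needle, stripped of whitespace."""
    matches = []
    # splitlines() yields whole lines; iterating over the string directly
    # would yield single characters instead.
    for line in html.splitlines():
        if needle in line:
            matches.append(line.strip())
    return matches


# Invented markup for illustration only; the real page may look different.
sample = """<table>
<tr><td>1x01</td><td><a href='/Warehouse_13/episodes/1064905360'>Example Title</a></td></tr>
<tr><td>1x02</td><td><a href='/Warehouse_13/episodes/0000000000'>Another Title</a></td></tr>
</table>"""

for hit in find_matching_lines(sample, "/Warehouse_13/episodes/1064905360"):
    print(hit)
```

This only finds the lines that hold a given link. Pulling the number, date, and title out of each matching line would still need further parsing.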
Any suggestions would help tremendously.