Hey guys,
I'm trying to extract the top 10 links from a Yahoo search results page. I can get all the links using the code below, but that can be as many as 70 links.
Any idea how I could get just the top 10 ranked results, and not the adverts etc.?
For example, for this page:
http://uk.search.yahoo.com/search?p=python&fr=yfp-t-501&ei=UTF-8&meta=vc%3D
I would only want:
1. www.python.org
2. www.pythonline.com
3. www.python.org/download
...
10. etc.
Is there anything in the page that distinguishes the top ten results, so that I could try to extract just those?
Here's the main lump of my code, which returns ALL the links on that page:
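if __name__ == "__main__":
    import urllib

    # Fetch the results page and collect every link with URLLister
    # (the class itself is not shown in this snippet).
    usock = urllib.urlopen("http://uk.search.yahoo.com/search?p=python&fr=yfp-t-501&ei=UTF-8&meta=vc%3D")
    parser = URLLister()
    parser.feed(usock.read())
    parser.close()
    usock.close()

    path = u"c:\\Users\\admin\\Desktop\\"
    # enumerate() supplies the counter; previously i was never incremented,
    # so every pass fetched parser.urls[0] and overwrote test0.txt.
    for i, url in enumerate(parser.urls):
        print i
        print url
        page = urllib.urlopen(url).read()
        # Save each linked page to its own numbered text file.
        f = file(path + u"test" + str(i) + u".txt", "w+")
        print >> f, page
        f.close()
        print "Html file successfully printed to file!"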
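One idea I've been toying with, in case it helps frame the question: if the ranked results turn out to carry some marker in the HTML that the adverts don't have (say, a particular class on the title link), I could collect only those links and stop after ten. Below is a rough sketch of that, not a working Yahoo scraper. It is written for Python 3's html.parser (in Python 2 the module is called HTMLParser). The class name "result-title", the ResultLinkLister class and the sample markup are all made up for illustration; I don't know what the real page uses, so the marker would have to be found by viewing the page source.

```python
from html.parser import HTMLParser


class ResultLinkLister(HTMLParser):
    """Collect hrefs only from <a> tags that carry a given CSS class."""

    def __init__(self, result_class, limit=10):
        super().__init__()
        # Hypothetical marker class; the real value must come from the page source.
        self.result_class = result_class
        self.limit = limit
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Ignore non-links, and stop collecting once the limit is reached.
        if tag != "a" or len(self.urls) >= self.limit:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        href = attrs.get("href")
        if href and self.result_class in classes:
            self.urls.append(href)


# Made-up markup standing in for a results page: one advert, two
# "organic" results, and one unrelated navigation link.
SAMPLE = """
<div id="ads"><a class="ad" href="http://ads.example.com/">Sponsored</a></div>
<ol>
  <li><a class="result-title" href="http://www.python.org/">Python</a></li>
  <li><a class="result-title extra" href="http://www.python.org/download/">Download</a></li>
</ol>
<a href="http://example.com/help">Help</a>
"""

parser = ResultLinkLister("result-title", limit=10)
parser.feed(SAMPLE)
parser.close()
for rank, url in enumerate(parser.urls, 1):
    print(rank, url)
```

On the sample markup this keeps only the two links marked "result-title" and skips the advert and the help link. Whether the real page has any such marker is exactly what I'm unsure about.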
Any help appreciated,
thanks guys :)