Does anyone know how to download the index page from the website using a Python script?
For a start, I don't understand the concept, and Google doesn't seem to turn up any relevant articles, so I'm a little lost!
from urllib2 import urlopen
print(urlopen('http://www.daniweb.com/forums/').read())
Maybe my "between" code snippet would be handy for picking out the info you want.
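The one-liner above uses `urllib2`, which only exists in Python 2. For anyone on Python 3, here is a rough sketch of the same idea using `urllib.request`. The names `fetch_page` and `text_between` are made up for illustration, and `text_between` is only a guess at the kind of helper meant by "between", not the actual snippet referred to.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Download a page and return its HTML as text."""
    with urlopen(url, timeout=timeout) as resp:
        # Use the charset the server declares; fall back to UTF-8 if none is given.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def text_between(text, start, end):
    """Return the text between the first `start` marker and the next `end` marker.

    Hypothetical helper for illustration only.
    Returns None when either marker is missing.
    """
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]
```

Something like `text_between(fetch_page('http://www.daniweb.com/forums/'), '<title>', '</title>')` should then pull out the page title, assuming the page has one and the site lets the request through.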
If it's a site that doesn't allow programs or scripts to access it, you'll need to change your user-agent, and possibly handle cookies as well.
import urllib2, cookielib
# keep cookies between requests so the site sees one continuous session
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# replace the default headers; a second User-agent appended to the list is ignored by urllib2
opener.addheaders = [('User-agent', 'Mozilla/4.0'),
                     ('Referer', 'http://www.daniweb.com')]
resp = opener.open('http://www.daniweb.com')
source_of_index = resp.read()
resp.close()
# write the contents to a file to check that it worked
f = open('fi.html', 'w')
f.write(source_of_index)
f.close()
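In Python 3 the same modules live under different names (`cookielib` became `http.cookiejar`, `urllib2` became `urllib.request`). Below is a minimal sketch of the cookie and user-agent approach for Python 3. The function names `build_browser_like_opener` and `save_page` are my own, and the header values are just the ones from the post above. Note that the header list is assigned rather than appended to, because the opener already carries a default `User-agent` entry and only the first one with a given name gets sent.

```python
import http.cookiejar
import urllib.request


def build_browser_like_opener(user_agent="Mozilla/4.0",
                              referer="http://www.daniweb.com"):
    """Return (opener, cookie_jar) with cookie handling and custom headers."""
    cookie_jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar)
    )
    # Assign instead of append so the default Python-urllib User-agent is dropped.
    opener.addheaders = [("User-agent", user_agent), ("Referer", referer)]
    return opener, cookie_jar


def save_page(opener, url, path, timeout=10):
    """Fetch `url` with the given opener and write the raw bytes to `path`."""
    with opener.open(url, timeout=timeout) as resp:
        data = resp.read()
    # read() returns bytes in Python 3, so the file is opened in binary mode.
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

To try it, build the opener once with `opener, jar = build_browser_like_opener()` and then call `save_page(opener, 'http://www.daniweb.com', 'fi.html')`. Any cookies the site sets end up in `jar` and are sent back on later requests made through the same opener.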