Hi all.
I would like to parse some data from a password-protected site.
The parsing part is already developed and tested (I logged in to the site manually and saved the page source for testing).
I am stuck at the login part. I have been reading a lot about it, but still haven't managed to do it by myself. I must say my knowledge of Python is pretty basic. I have already read urllib2 The Missing Manual, the urllib2 documentation and this article, and still haven't succeeded. I know the answer is in these pages, but I need a little guidance. A month ago I didn't know how to write 'hello world' in Python, and now I am dealing with HTTP authentication, openers and handlers! So you can imagine how confused I am.
Correct me if I am wrong (which I probably am): do I first have to submit the username and password from the form as a POST request and then do the HTTP authentication part? Or is submitting the POST variables enough?
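The two mechanisms in that question are different things, which a minimal sketch can make concrete. It uses Python 3, where urllib2 became urllib.request, and only builds the two kinds of request without sending either. With an HTML login form, the credentials travel in the urlencoded POST body, the same way a browser submits them. With HTTP Basic Authentication, they travel in an Authorization header instead, and urllib's HTTPBasicAuthHandler only adds that header after the server answers 401 with a WWW-Authenticate challenge. The helper names are mine, and the field names frmUsername/frmPassword assume the inputs' name attributes match the ids given for the form fields.

```python
import base64
import urllib.parse
import urllib.request

LOGIN_URL = "http://www.bricklink.com/login.asp?logPage=/my.asp&logFolder=p&logSub=w"


def form_login_request(url, username, password):
    """HTML-form login: credentials go in the POST body, like a browser submit.

    Assumption: the inputs' name attributes are frmUsername / frmPassword.
    """
    body = urllib.parse.urlencode(
        {"frmUsername": username, "frmPassword": password}).encode("ascii")
    return urllib.request.Request(url, data=body)  # data present -> POST


def basic_auth_request(url, username, password):
    """HTTP Basic Auth: credentials go in an Authorization header instead."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})


form_req = form_login_request(LOGIN_URL, "nunos123", "qwerty")
auth_req = basic_auth_request(LOGIN_URL, "nunos123", "qwerty")
print(form_req.get_method(), form_req.data)
print(auth_req.get_method(), auth_req.get_header("Authorization"))
```

A site that logs you in through an HTML form usually expects only the first kind. Sending that POST is normally not the whole story, though, because the site still has to recognize you on the next request. Form-based logins typically do that with a session cookie, which is an assumption about this particular site.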
I have created an account on the site so that you can work with real data if you like.
Login page
username = nunos123
password = qwerty
The 'id' of the username field is: 'frmUsername'
The 'id' of the password field is: 'frmPassword'
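One detail about those ids: when a browser submits a form, each value is sent under the input's name attribute, not its id, and any hidden inputs in the same form are sent along too. The sketch below uses Python 3 and only the standard-library html.parser. It lists a form's action, method and inputs from saved page source, then builds the POST fields keyed by name. SAMPLE_HTML is invented markup, not the real login page. The rule for which input receives the username (any text or email input) is my own heuristic.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Record every <form> and the attributes of the <input> tags inside it."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "form":
            self.forms.append({"action": attrs.get("action", ""),
                               "method": (attrs.get("method") or "get").lower(),
                               "inputs": []})
        elif tag == "input" and self.forms:
            self.forms[-1]["inputs"].append(attrs)


def login_payload(form, username, password):
    """Build POST fields keyed by each input's name (the id is not submitted).

    Heuristic: text/email inputs get the username, password inputs get the
    password, hidden inputs keep their value; unnamed inputs are skipped.
    """
    fields = {}
    for inp in form["inputs"]:
        name = inp.get("name")
        if not name:
            continue
        kind = (inp.get("type") or "text").lower()
        if kind == "password":
            fields[name] = password
        elif kind in ("text", "email"):
            fields[name] = username
        elif kind == "hidden":
            fields[name] = inp.get("value") or ""
    return fields


# Made-up markup for illustration only; the real login page may differ.
SAMPLE_HTML = """
<form action="login.asp" method="POST">
  <input type="text" id="frmUsername" name="frmUsername">
  <input type="password" id="frmPassword" name="frmPassword">
  <input type="hidden" name="exampleHidden" value="1">
  <input type="submit" value="Log in">
</form>
"""

parser = FormFieldParser()
parser.feed(SAMPLE_HTML)
form = parser.forms[0]
print(form["action"], form["method"])
print(login_payload(form, "nunos123", "qwerty"))
```

Running the parser over the manually saved login page would show the real field names, any hidden fields, and the URL the form actually posts to (its action), which may not be the same as the login page's own URL.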
Here are the bits of code I collected from the links above and adapted; I think they will be needed at some point.
Please note that this code does not work!
import urllib
import urllib2
url = 'http://www.bricklink.com/login.asp?logPage=/my.asp&logFolder=p&logSub=w'
username = 'nunos123'
password = 'qwerty'
user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
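#send username and password as POST and add user_agent header
#keys must be the form fields' name attributes; assuming here they match the ids frmUsername/frmPassword
values = {'frmUsername' : username, 'frmPassword' : password}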
headers = { 'User-Agent' : user_agent }
data = urllib.urlencode(values)
req = urllib2.Request(url, data, headers)
response = urllib2.urlopen(req)
the_page = response.read()
#HTTP Authentication
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, url, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
pagehandle = urllib2.urlopen(url)
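If the site keeps you logged in with a session cookie after the form POST (usual for HTML-form logins, but not verified for this site), the missing piece is an opener that stores cookies and is reused for the later request. The following is a minimal Python 3 sketch of that idea. make_session_opener, login_and_fetch, _FakeSite and demo are hypothetical names. _FakeSite is a local stand-in server, so the example runs without touching bricklink.com, and its "session" cookie and /login and /my paths are invented.

```python
import http.cookiejar
import http.server
import threading
import urllib.parse
import urllib.request

UA = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def make_session_opener(*extra_handlers):
    """Return an opener that keeps cookies between requests, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar), *extra_handlers)
    opener.addheaders = [("User-Agent", UA)]
    return opener, jar


def login_and_fetch(opener, login_url, fields, page_url):
    """POST the login form, then GET a protected page with the same opener."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    with opener.open(login_url, body) as resp:  # data given -> POST
        resp.read()
    with opener.open(page_url) as resp:  # stored cookie is sent automatically
        return resp.read().decode("utf-8")


class _FakeSite(http.server.BaseHTTPRequestHandler):
    """Hypothetical cookie-based login site, for the local demo only."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("ascii"))
        ok = (form.get("frmUsername") == ["nunos123"]
              and form.get("frmPassword") == ["qwerty"])
        self.send_response(200)
        if ok:
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        logged_in = "session=abc123" in self.headers.get("Cookie", "")
        body = b"members area" if logged_in else b"please log in"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    """Fetch the page before and after logging in against the fake site."""
    server = http.server.HTTPServer(("127.0.0.1", 0), _FakeSite)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = "http://127.0.0.1:%d" % server.server_address[1]
    try:
        # ProxyHandler({}) only keeps any environment proxy out of the local demo.
        opener, jar = make_session_opener(urllib.request.ProxyHandler({}))
        with opener.open(base + "/my") as resp:
            before = resp.read().decode("utf-8")
        after = login_and_fetch(
            opener, base + "/login",
            {"frmUsername": "nunos123", "frmPassword": "qwerty"}, base + "/my")
        return before, after, len(jar)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Against the real site, the same two functions would be called with the form's action URL and the real field names. The page to fetch afterwards might be http://www.bricklink.com/my.asp, which is only a guess based on the logPage parameter in the login URL. No HTTPBasicAuthHandler is involved here, because the credentials go only in the form POST.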
Any help on this is greatly appreciated. Thanks for your time.