What is the best way to parse private info from this webpage:
https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login&err=1
I found Beautiful Soup and PyKhtml. Which is better?
To log in, use mechanize.
For parsing, BeautifulSoup or lxml are good choices.
import mechanize
browser = mechanize.Browser()
browser.open(your_url)
browser.select_form(nr=0)  # select the first form on the page; nr=0 works for many sites
browser['username'] = "xxxxx"
browser['password'] = "xxxxx"
response = browser.submit()
html = response.read()
print html
The urllib2 module can help you.
File "Ekool.py", line 5, in <module>
browser.select_form(nr=0) #Check form name nr=0 work for many
File "/usr/lib/python2.6/site-packages/mechanize/_mechanize.py", line 527, in select_form
raise FormNotFoundError("no form matching "+description)
mechanize._mechanize.FormNotFoundError: no form matching nr 0
I tried nr=0, 1, 2 and 3; same problem every time.
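One possible explanation for FormNotFoundError, offered as an assumption rather than something confirmed in this thread: the `#/?screenId=...` part of the URL suggests the login form is built by JavaScript in the browser, so the HTML that mechanize downloads may contain no `<form>` tag at all. A minimal sketch to check how many forms a piece of HTML really contains (the two sample pages and the names `FormCounter` and `count_forms` are made up for illustration):

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class FormCounter(HTMLParser):
    """Counts the <form> start tags seen while parsing."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.forms = 0

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.forms += 1


def count_forms(html):
    """Return the number of <form> tags present in the raw HTML text."""
    parser = FormCounter()
    parser.feed(html)
    parser.close()
    return parser.forms


# Hypothetical page with a plain HTML login form
static_page = ('<html><body><form action="/login">'
               '<input name="username"></form></body></html>')
# Hypothetical page whose form is created later by a script
script_page = ('<html><body><div id="app"></div>'
               '<script src="app.js"></script></body></html>')

print(count_forms(static_page))
print(count_forms(script_page))
```

If the count is 0 for the HTML returned by `browser.open(your_url)`, no value of `nr` can work, because mechanize does not run JavaScript.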
OK, I'm trying to log in to this webpage: https://ee.ekool.eu/index_et.html?r=2#?/
but unfortunately it doesn't work. Any ideas?
Where's your code?
Which methods have you tried?
What are the errors?
Cheers and happy coding
import urllib2
import urllib
# build opener with HTTPCookieProcessor
o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )
urllib2.install_opener( o )
# assuming the site expects 'username' and 'password' as POST form fields
p = urllib.urlencode( { 'username': 'name', 'password': 'password' } )
# perform login with params
f = o.open( 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login', p )
data = f.read()
f.close()
Traceback (most recent call last):
File "Ekool.py", line 61, in <module>
f = o.open( 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login', p )
File "/usr/lib/python2.6/urllib2.py", line 397, in open
response = meth(req, response)
File "/usr/lib/python2.6/urllib2.py", line 510, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.6/urllib2.py", line 435, in error
return self._call_chain(*args)
File "/usr/lib/python2.6/urllib2.py", line 369, in _call_chain
result = func(*args)
File "/usr/lib/python2.6/urllib2.py", line 518, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Not Allowed
[timo@localhost ~]$
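A likely reason for the 405 (my reading of the URL, not something verified against the site): everything after `#` is a fragment that is meant for the browser, so the POST above is effectively aimed at the static page `index_et.html`, which probably does not accept POST. A small sketch that splits the URL to show which part the server acts on (`server_side` and `client_side` are just illustrative names):

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2

login_url = 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login'
parts = urlsplit(login_url)

# The path and query are what the server is asked for
server_side = parts.path + '?' + parts.query
# The fragment is interpreted by JavaScript in the browser
client_side = parts.fragment

print(server_side)
print(client_side)
```

If this is right, the real login request goes to some other address that the page's JavaScript calls. Watching the network traffic in the browser while logging in by hand should show which URL and field names to use.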
Second version:
import urllib2
theurl = 'https://ee.ekool.eu/index_et.html?r=2#/?screenId=g.user.login'
username = 'name'
password = 'password'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
# this creates a password manager
passman.add_password(None, theurl, username, password)
# because we have put None at the start it will always
# use this username/password combination for urls
# for which `theurl` is a super-url
authhandler = urllib2.HTTPBasicAuthHandler(passman)
# create the AuthHandler
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
# All calls to urllib2.urlopen will now use our handler
# Make sure not to include the protocol in with the URL, or
# HTTPPasswordMgrWithDefaultRealm will be very confused.
# You must (of course) use it when fetching the page though.
pagehandle = urllib2.urlopen(theurl)
# authentication is now handled automatically for us
print pagehandle
And the output:
<addinfourl at 3068350220L whose fp = <socket._fileobject object at 0xb737616c>>
Maybe this works? How can I check whether the connection succeeded?
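On checking success: `print pagehandle` only prints the response object, not the page. The object returned by `urllib2.urlopen` has `getcode()` for the HTTP status and `read()` for the body. Keep in mind that `HTTPBasicAuthHandler` sends the credentials only when the server answers with a 401 Basic challenge, so a plain 200 may just be the public login page. A rough sketch of a check, where `login_succeeded` and the marker text are hypothetical and the marker has to be some text you know appears only after logging in:

```python
def login_succeeded(status_code, html, marker):
    """Return True if the status is 200 and the page contains
    a piece of text that is only shown to logged-in users."""
    return status_code == 200 and marker in html


# With the code above this would be called roughly as:
#   status = pagehandle.getcode()
#   html = pagehandle.read()
#   login_succeeded(status, html, 'Log out')   # 'Log out' is an assumed marker

# Stand-in values so the sketch runs without a network connection
print(login_succeeded(200, '<a href="/logout">Log out</a>', 'Log out'))
print(login_succeeded(200, '<input name="password">', 'Log out'))
```

The status code alone is not enough, because many sites return 200 for the login page itself.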
Edit:
Second solution works! Thanks!