Member Avatar for leegeorg07

Hi, I have this code:

import urllib2 as url
import webbrowser

def extract(text, sub1, sub2):
    """
    extract a substring from text between first
    occurrences of substrings sub1 and sub2
    """
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]
start="http://xkcd.com/"
permlist=[]
textlist=[]
for i in range(1, 638):
    temp=start+str(i)
    permlist.append(str(url.urlopen(temp).readlines()[88]))
    textlist.append(str(url.urlopen(temp).readlines()[77]))

for i in permlist:
    i = extract(i, '<h3>Permanent link to this comic: ', '</h3>')

for i in textlist:
    i = extract(i, '<img src="http://imgs.xkcd.com/comics/scribblenauts.png" title="', '"')


print zip(permlist, textlist)

and whenever I run it, it raises this error:

Traceback (most recent call last):
  File "C:/Python26/test.py", line 15, in <module>
    permlist.append(str(url.urlopen(temp).readlines()[88]))
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found

What is the problem and, more importantly, what can I do to fix it?

Thanks in advance.

Looks like one of the 638 web pages is not available. You should use a try/except trap for this case.

Member Avatar for leegeorg07

So what could I use?

Sorry, at the moment I just want a quick fix and will figure out the best way when I have time.

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error: # catch any exception and continue the for loop
        print "Error at index %d."%i

Yeah, you'll need to use exceptions, but if you want the script to continue after the error you're going to have to "pass" it. Try this:

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error, err:
        print "Index Error: %d at %d" % (err, i)
        pass

This will not only print the error and where it occurred, but will also pass so the loop keeps going.

Member Avatar for leegeorg07

Hey again, those are good ideas, but whenever I try to run it now it says:

Traceback (most recent call last):
  File "C:\Python26\test.py", line 18, in <module>
    except Error, err:
NameError: name 'Error' is not defined

Since you don't know the specific error class, simply use ...

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except: # catch any exception and continue the for loop
        print "Error at index %d."%i
        pass

Member Avatar for leegeorg07

OK thanks, trying it now. So that I can do better handling later, how can I find the exception class?

Well you found it in your first post ...
HTTPError
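
For the better handling asked about above, here is a minimal sketch that catches only HTTPError rather than every exception, and downloads each page once instead of twice. The names START, fetch_lines, collect and the opener parameter are made up for this illustration and are not from the original script. The line indexes 88 and 77 are copied from the first post and may not match every page.

```python
try:
    # Python 2, as used in this thread
    from urllib2 import urlopen, HTTPError
except ImportError:
    # Python 3 moved these names
    from urllib.request import urlopen
    from urllib.error import HTTPError

START = "http://xkcd.com/"

def fetch_lines(number, opener=urlopen):
    """Return the page's lines, or None if the server answers with an HTTP error.

    opener is a hypothetical hook so the function can be tried without a network.
    """
    try:
        # Download once and reuse the lines for both lookups.
        return opener(START + str(number)).readlines()
    except HTTPError as err:
        # Only HTTP errors (such as 404) are handled; other errors still surface.
        print("Comic %d skipped: HTTP %d" % (number, err.code))
        return None

def collect(numbers, opener=urlopen):
    """Gather the two raw lines of interest for every page that loads."""
    permlist, textlist = [], []
    for i in numbers:
        lines = fetch_lines(i, opener)
        if lines is None:
            continue  # page missing, move on to the next comic
        permlist.append(str(lines[88]))  # index taken from the first post
        textlist.append(str(lines[77]))  # index taken from the first post
    return permlist, textlist
```

Calling collect(range(1, 638)) would stand in for the original loop. A page with fewer than 89 lines would still raise IndexError, which this sketch deliberately does not hide.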

Member Avatar for leegeorg07

Oh OK, thanks. Whenever I run the zip part it uses the original text, not what I changed it to. I tried:

for i, j in permlist, textlist:
  print i, ':', j

but it says that it is out of range. What can I do? I have googled it to no avail :(
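
A likely cause, sketched here with made-up sample lines in place of the downloaded HTML: assigning to the loop variable (i = extract(...)) only rebinds the name i and never changes the list, so zip still sees the original text. Building new lists fixes that. The second loop fails because "for i, j in permlist, textlist" iterates over a tuple of the two lists and tries to unpack each whole list into i and j. Pairing the items needs zip. The names links, titles and pairs are hypothetical, and splitting on 'title="' instead of one comic's full img tag is an assumption about the page layout.

```python
def extract(text, sub1, sub2):
    """Return the part of text between the first sub1 and the next sub2."""
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]

# Made-up sample lines standing in for the downloaded HTML.
permlist = [
    '<h3>Permanent link to this comic: http://xkcd.com/1/</h3>',
    '<h3>Permanent link to this comic: http://xkcd.com/2/</h3>',
]
textlist = [
    '<img src="a.png" title="first title" alt="a" />',
    '<img src="b.png" title="second title" alt="b" />',
]

# Build new lists; assigning to the loop variable would leave the originals unchanged.
links = [extract(line, '<h3>Permanent link to this comic: ', '</h3>')
         for line in permlist]
titles = [extract(line, 'title="', '"') for line in textlist]

# zip pairs the items up one by one, which is what the two-name loop needs.
pairs = list(zip(links, titles))
for link, title in pairs:
    print("%s : %s" % (link, title))
```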

My bad, I used the wrong exception; vegaseat is right.
