urllib2 problem

Question

leegeorg07

15 Years Ago

Hi, I have this code:

import urllib2 as url
import webbrowser

def extract(text, sub1, sub2):
    """
    extract a substring from text between first
    occurrences of substrings sub1 and sub2
    """
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]
start="http://xkcd.com/"
permlist=[]
textlist=[]
for i in range(1, 638):
    temp=start+str(i)
    permlist.append(str(url.urlopen(temp).readlines()[88]))
    textlist.append(str(url.urlopen(temp).readlines()[77]))

for i in permlist:
    i = extract(i, '<h3>Permanent link to this comic: ', '</h3>')

for i in textlist:
    i = extract(i, '<img src="http://imgs.xkcd.com/comics/scribblenauts.png" title="', '"')


print zip(permlist, textlist)

and whenever I run it, it raises this error:

Traceback (most recent call last):
  File "C:/Python26/test.py", line 15, in <module>
    permlist.append(str(url.urlopen(temp).readlines()[88]))
  File "C:\Python26\lib\urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python26\lib\urllib2.py", line 389, in open
    response = meth(req, response)
  File "C:\Python26\lib\urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python26\lib\urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "C:\Python26\lib\urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "C:\Python26\lib\urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 404: Not Found

What is the problem and, more importantly, how can I fix it?

thanks in advance

python


All 10 Replies

sneekula 969 Nearly a Posting Maven

15 Years Ago

Looks like at least one of the 637 pages requested by range(1, 638) is not available. Wrap the urlopen call in a try/except so that a missing page does not stop the loop.

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

Since you don't know the specific error class, simply use ...

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except: # catch any exception and continue the for loop
        print "Error at index %d."%i
        pass

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

Well you found it in your first post ...
HTTPError
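
Once the class is known, the bare except can be narrowed to it. The following is only a minimal sketch, not code from any of the posters: fetch_lines, collect and the opener parameter are hypothetical names introduced here, and the line-count guard is an added assumption. It reads each page once instead of calling urlopen twice, and the import fallback is there so the same sketch also runs on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
try:
    from urllib2 import urlopen, HTTPError   # Python 2, as used in this thread
except ImportError:
    from urllib.request import urlopen       # Python 3 equivalents
    from urllib.error import HTTPError


def fetch_lines(address, opener=urlopen):
    """Return the page as a list of lines, or None on an HTTP error such as 404."""
    try:
        response = opener(address)
    except HTTPError as err:
        # Only HTTP errors are handled here; anything else still propagates.
        print("Skipping %s: HTTP Error %d" % (address, err.code))
        return None
    try:
        return response.readlines()
    finally:
        response.close()


def collect(first, last, opener=urlopen):
    """Gather lines 88 and 77 (indices from the original post) of each comic page."""
    permlist, textlist = [], []
    for i in range(first, last + 1):
        lines = fetch_lines("http://xkcd.com/" + str(i), opener)
        # Assumption: a page that failed or is shorter than expected is skipped.
        if lines is None or len(lines) <= 88:
            continue
        permlist.append(lines[88])
        textlist.append(lines[77])
    return permlist, textlist
```

Calling collect(1, 637) would cover the same comics as range(1, 638) in the question. The indices 88 and 77 are taken from the original post and depend on the page layout at the time.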


leegeorg07 · Answer 1 · 2009-09-22T00:34:14+00:00

So what could I use?

Sorry, at the moment I just want a quick fix and will figure out the best way when I have time.

djidjadji 28 Light Poster · Answer 2 · 2009-09-22T03:17:28+00:00

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except url.HTTPError: # catch HTTP errors such as 404 and continue the for loop
        print "Error at index %d."%i

ov3rcl0ck 25 Junior Poster · Answer 3 · 2009-09-22T20:23:24+00:00

Yeah, you'll need to use exceptions. If you want the script to continue after the error, catch the exception inside the loop (the "pass" is optional once the handler has a body). Try this:

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except Error, err:
        print "Index Error: %d at %d" % (err, i)
        pass

This will print the error and where it occurred; because the exception is handled inside the loop, the loop keeps going.

leegeorg07 · Answer 4 · 2009-09-22T23:20:09+00:00

Hey again, those are good ideas, but whenever I run it now it says:

Traceback (most recent call last):
  File "C:\Python26\test.py", line 18, in <module>
    except Error, err:
NameError: name 'Error' is not defined

leegeorg07 · Answer 5 · 2009-09-22T23:53:52+00:00

OK thanks, trying it now. So that I can do better error handling later, how can I find the exception class?
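
One way to find the class is to catch the built-in Exception base class and ask the caught object for its type. Note that Exception is a built-in name, whereas a bare Error is not defined anywhere, which is what the NameError above reports. This is a small sketch under those assumptions; describe_exception is a hypothetical helper name, not part of urllib2.

```python
def describe_exception(action):
    """Call action() and return 'module.ClassName' of the exception it raises.

    Returns None when action() finishes without raising.
    """
    try:
        action()
    except Exception as err:   # Exception is built in; a bare name like Error is not
        cls = type(err)
        return "%s.%s" % (cls.__module__, cls.__name__)
    return None
```

For example, describe_exception(lambda: int("x")) returns a string ending in ValueError. Applied to the failing urlopen call from the question, the reported class should be HTTPError, the same name shown on the last line of the traceback.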

leegeorg07 · Answer 6 · 2009-09-23T16:15:48+00:00

Oh OK, thanks. Whenever I run the zip part it prints the original text, not what I changed it to. I tried:

for i, j in permlist, textlist:
  print i, ':', j

but it says that it is out of range. What can I do? I have googled it to no avail :(
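
Two separate things are likely going on here. Assigning to the loop variable (i = extract(...)) only rebinds the name i and leaves the list untouched, so zip still sees the original strings. And "for i, j in permlist, textlist" loops over a tuple of the two whole lists and tries to unpack each list into i and j, rather than pairing their items. A sketch of one possible fix follows; the sample strings are made-up stand-ins for the scraped lines, and using 'title="' as the start marker is an assumption so that the pattern is not tied to a single comic's image URL.

```python
def extract(text, sub1, sub2):
    """Return the part of text between the first sub1 and the next sub2."""
    return text.split(sub1, 1)[-1].split(sub2, 1)[0]


# Stand-in data; the real lists would hold the lines scraped from each page.
permlist = ['<h3>Permanent link to this comic: http://xkcd.com/1/</h3>',
            '<h3>Permanent link to this comic: http://xkcd.com/2/</h3>']
textlist = ['<img src="a.png" title="first" alt="x" />',
            '<img src="b.png" title="second" alt="y" />']

# Build new lists from the extracted values instead of rebinding the loop variable.
permlist = [extract(line, '<h3>Permanent link to this comic: ', '</h3>')
            for line in permlist]
textlist = [extract(line, 'title="', '"') for line in textlist]

# zip pairs the lists item by item, so each pass yields one link and one title.
for link, title in zip(permlist, textlist):
    print("%s : %s" % (link, title))
```

The list comprehensions replace the two "for i in ..." loops from the first post; the rest of the script can stay as it was.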

ov3rcl0ck 25 Junior Poster · Answer 7 · 2009-09-25T23:51:32+00:00

Since you don't know the specific error class, simply use ...

for i in range(1, 638):
    try:
        temp=start+str(i)
        permlist.append(str(url.urlopen(temp).readlines()[88]))
        textlist.append(str(url.urlopen(temp).readlines()[77]))
    except: # catch any exception and continue the for loop
        print "Error at index %d."%i
        pass

My bad, I used the wrong exception; vegaseat is right.
