Banging head against SAX

Question

fake-name 0 Newbie Poster

16 Years Ago

Ok...

I'm working on a program that takes ISBN numbers, grabs data on them from ISBNdb.com and sticks said data into a database.

At this point, I've successfully managed to get the ISBN XML file from isbndb.com, but whenever I try to feed it into SAX, it seems to treat the XML text as a URL, tries to retrieve it from the web, and then crashes.

In any case, here's the main.py

import urllib
import sys
from string import split

from xml.sax  import make_parser 
from Handler import BookHandler 

ISBNdb_Act = "J6JNO9UH"

while 1:
	bc = raw_input("Scan Barcode\n")
	print bc
	strQuery = "http://isbndb.com/api/books.xml?access_key="+ISBNdb_Act+"&index1=isbn&value1="+bc
	print strQuery
	print ("\n")
	ISBNdb_fh = urllib.urlopen(strQuery)
	ISBNdb_XML = ISBNdb_fh.read()
	print ISBNdb_XML
	
	
	
	ch = BookHandler( ) 
	saxparser = make_parser( ) 
 
	saxparser.setContentHandler(ch) 
	saxparser.parse(ISBNdb_XML) 

		
	"""
	print "Done!\n\n"
	"""

The Handler

from xml.sax.handler import ContentHandler 

class BookHandler(ContentHandler):

	def startElement(self, name, attributes):
		print "Start element:", name

And what I get on the command line

C:\BookDb>main.py
Scan Barcode
9781565847033
9781565847033
http://isbndb.com/api/books.xml?access_key=J6JNO9UH&index1=isbn&value1=9781565847033


<?xml version="1.0" encoding="UTF-8"?>

<ISBNdb server_time="2008-09-10T06:21:22Z">
<BookList total_results="1" page_size="10" page_number="1" shown_results="1">
<BookData book_id="understanding_power" isbn="1565847032">
<Title>Understanding power</Title>
<TitleLong>Understanding power: the indispensable Chomsky</TitleLong>
<AuthorsText>edited by Peter R. Mitchell and John Schoeffel</AuthorsText>
<PublisherText publisher_id="new_press">New York : New Press, c2002.</PublisherText>
</BookData>
</BookList>
</ISBNdb>

Traceback (most recent call last):
  File "C:\BookDb\Main.py", line 26, in <module>
    saxparser.parse(ISBNdb_XML)
  File "C:\Python25\lib\xml\sax\expatreader.py", line 102, in parse
    source = saxutils.prepare_input_source(source)
  File "C:\Python25\lib\xml\sax\saxutils.py", line 298, in prepare_input_source
    f = urllib.urlopen(source.getSystemId())
  File "C:\Python25\lib\urllib.py", line 82, in urlopen
    return opener.open(url)
  File "C:\Python25\lib\urllib.py", line 187, in open
    return self.open_unknown(fullurl, data)
  File "C:\Python25\lib\urllib.py", line 199, in open_unknown
    raise IOError, ('url error', 'unknown url type', type)
IOError: [Errno url error] unknown url type: '?xml version="1.0" encoding="utf-8"?>\n\n<isbndb server_time="2008-09-10t06'

C:\BookDb>

System is Windows XP SP3 with
Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Intel)] on win32

I'm not the best coder around (by a long shot), but this is so simple I cannot see where it's going wrong.

On the other hand, I've got the barcode scanner front-end working perfectly; I'm better with hardware.

jlm699 320 Veteran Poster · Answer 1 · 2008-09-10T22:14:01+00:00

From the documentation:

parse(source)
Process an input source, producing SAX events. The source object can be a system identifier (a string identifying the input source - typically a file name or an URL), a file-like object, or an InputSource object. When parse() returns, the input is completely processed, and the parser object can be discarded or reset. As a limitation, the current implementation only accepts byte streams; processing of character streams is for further study.

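So it looks like you can't simply pass the XML text: parse() treats a plain string as a system identifier (a file name or URL), which is why it tried to open your XML as a URL. A quick workaround would be to save ISBNdb_XML to a temporary file and then pass the file path to parse().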
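
For anyone landing here later, a rough sketch of that workaround next to two alternatives that skip the temporary file: handing parse() a file-like object (which the documentation quoted above allows) and calling xml.sax.parseString(). This is written for a current Python 3 interpreter rather than the 2.5 in the question, and it is only an illustration, not the original poster's final code. SAMPLE_XML is a trimmed copy of the response shown in the question, and NameCollector and the three parse_via_* helpers are made-up names standing in for BookHandler and the main loop.

```python
import io
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler

# Trimmed copy of the response from the question, held as bytes.
SAMPLE_XML = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<ISBNdb server_time="2008-09-10T06:21:22Z">\n'
    b'<BookList total_results="1" page_size="10" page_number="1" shown_results="1">\n'
    b'<BookData book_id="understanding_power" isbn="1565847032">\n'
    b'<Title>Understanding power</Title>\n'
    b'</BookData>\n'
    b'</BookList>\n'
    b'</ISBNdb>\n'
)


class NameCollector(ContentHandler):
    """Hypothetical stand-in for BookHandler: records element names."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attributes):
        self.names.append(name)


def parse_via_temp_file(xml_bytes):
    """The workaround from the answer: write to disk, pass the path."""
    handler = NameCollector()
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(xml_bytes)
        parser = xml.sax.make_parser()
        parser.setContentHandler(handler)
        parser.parse(path)  # a string argument is treated as a file name or URL
    finally:
        os.remove(path)
    return handler.names


def parse_via_file_object(xml_bytes):
    """Wrap the bytes in a file-like object; parse() accepts those too."""
    handler = NameCollector()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.names


def parse_via_parse_string(xml_bytes):
    """Let xml.sax.parseString() build the parser and feed it the data."""
    handler = NameCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.names


if __name__ == "__main__":
    for fn in (parse_via_temp_file, parse_via_file_object, parse_via_parse_string):
        print(fn.__name__, fn(SAMPLE_XML))
```

In the question's loop, the shortest change is probably xml.sax.parseString(ISBNdb_XML, ch) in place of the make_parser()/parse() lines. Since the object returned by urllib.urlopen() is itself file-like, passing ISBNdb_fh straight to saxparser.parse() without calling read() first should also work. On Python 2.5 the io module is not available, so StringIO.StringIO would have to take the place of io.BytesIO, and the with statement needs from __future__ import with_statement.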