Dear web gods:
After much, much, much struggle with Unicode, many an hour reading all the examples online, coding them, testing them, ripping them apart and putting them back together, I am humbled. Therefore, I humble myself before you to seek guidance on a simple Python Unicode CGI scripting problem.
My problem is more complex than this, but let me boil it down to one sticking point for starters. I have a file containing a Spanish word, "año", which I want to read and print with this script:
#!C:/Program Files/Python23/python.exe
STARTHTML = u'''Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
</head>
<body>
'''
ENDHTML = u'''
</body>
</html>
'''
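# Send the header and the opening HTML.
print STARTHTML
# Read the file as raw bytes and print them unchanged (no decoding).
print open('c:/test/spanish.txt','r').read()
# Close the HTML document.
print ENDHTML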
Instead of seeing "año", I see "a�o". BAD BAD BAD
Yet if I open the file directly in the browser (IE/Mozilla), I see "año". THIS IS WHAT I WANT
WHAT GIVES?
Next, I'll get into codecs and stuff, but how about starting with this?
The general question is: does anybody have a complete working example of a CGI script that does the above properly and that they'd be willing to share? I've tried various examples online but haven't been able to get any of them to work. I end up seeing escape codes for the non-ASCII character instead, first u'a\xf1o' and later 'a\xc3\xb1o', which are also BAD BAD BAD.
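For anyone comparing the outputs above, here is a minimal sketch of how the three forms seem to relate. It rests on several assumptions: it is written for Python 3 rather than the Python 2.3 the script uses, it assumes the file is saved as Latin-1 (one byte for "ñ") while the browser reads the page as UTF-8, and build_response is a hypothetical helper name, not part of any library.

```python
# Sketch only: Python 3, not the Python 2.3 used in the script above.
word = "a\u00f1o"  # "año" as a Unicode string; \u00f1 is the code point for "ñ"

# The same word as bytes in two encodings.
latin1_bytes = word.encode("latin-1")  # one byte for "ñ"
utf8_bytes = word.encode("utf-8")      # two bytes for "ñ"

# Decoding Latin-1 bytes as if they were UTF-8 fails on the lone 0xF1 byte;
# errors="replace" substitutes U+FFFD, the "�" replacement character.
mangled = latin1_bytes.decode("utf-8", errors="replace")


def build_response(raw_bytes, source_encoding="latin-1"):
    """Hypothetical helper: decode the file bytes, then return a complete
    CGI response encoded as UTF-8, with the charset declared in the header.
    source_encoding is an assumption about how the file was saved."""
    text = raw_bytes.decode(source_encoding)
    header = "Content-Type: text/html; charset=utf-8\r\n\r\n"
    body = "<html><body>" + text + "</body></html>"
    return (header + body).encode("utf-8")


print(repr(latin1_bytes))
print(repr(utf8_bytes))
print(repr(mangled))
```

If this reading is right, the point is that the bytes sent and the charset declared in the Content-Type header have to agree. In a Python 3 CGI script the result of build_response would be written with sys.stdout.buffer.write(...) so that it is not re-encoded on the way out.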
Thanks -- your humble supplicant.