Hi

I'm using Python 3.1 on Windows.


My program writes XML to a text file. The problem is that Windows needs to know that this file's encoding is UTF-8. Right now the program writes the correct content, but other programs (Firefox, for instance) fail to read the file because it somehow got saved in the ANSI encoding, which causes a lot of problems when you use special characters.


Is there a way to get Python to write text to a file AND specify the encoding?


The program runs perfectly under Linux, which is less problematic with encodings, but I need to fix this problem under Windows.


Thanks.
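
For reference, here is a minimal sketch of one way to do what the question asks, assuming Python 3's built-in open() and its encoding argument. The helper name write_utf8 and the file name are made up for illustration; they are not from the original program.

```python
import os
import tempfile


def write_utf8(path, text):
    """Write text to path as UTF-8, whatever the platform default is."""
    # Without encoding=..., open() falls back to the platform's preferred
    # encoding, which on Windows is usually an "ANSI" code page, not UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<root>J'habite à Bordeaux.</root>\n"
)

# Hypothetical output location, used only for this example.
path = os.path.join(tempfile.gettempdir(), "example_utf8.xml")
write_utf8(path, xml_text)

# Read the raw bytes back to check what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()
```

In UTF-8, "à" is stored as the two bytes C3 A0, so finding those bytes in raw is a quick check that the file matches what the XML declaration claims.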

Perhaps you should write this at the top of your file:

<?xml version="1.0" encoding="UTF-8"?>

I did.

It seems the problem is Windows: the files I use have a lot of French and Spanish characters, so the ANSI encoding can't deal with them.

Texts are written in Notepad, and the standard encoding is ANSI. The problems are these:

1. If I save the file with ANSI encoding, the program displays the special characters correctly, but if I open the XML file with Firefox, it shows XML parsing errors due to the special characters.

2. Now, if I save the file with UTF-8 encoding, the Python program fails to display the special characters correctly.

This means that, somehow, the Python program is reading ANSI, but not UTF-8.
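
The symptom described above is what you would expect if the input file is opened without an explicit encoding. The sketch below reproduces it under two assumptions: that the "ANSI" code page is cp1252, and that Notepad puts a byte order mark in front of UTF-8 text (which the "utf-8-sig" codec strips). The file name is hypothetical.

```python
import os
import tempfile

text = "J'habite à Bordeaux."

# Hypothetical input file, standing in for a text saved from Notepad.
path = os.path.join(tempfile.gettempdir(), "example_notepad.txt")

# Simulate a "UTF-8" save that starts with a byte order mark (assumption).
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf" + text.encode("utf-8"))

# Reading UTF-8 bytes as cp1252 does not fail, it silently garbles the text.
with open(path, "r", encoding="cp1252") as f:
    garbled = f.read()

# Naming the encoding explicitly gives the text back; "utf-8-sig" also
# drops the leading byte order mark if one is present.
with open(path, "r", encoding="utf-8-sig") as f:
    clean = f.read()
```

If the inputs may be either ANSI or UTF-8, the program has to be told which one each file uses; the bytes alone do not say.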

In XML, there are special codes (numeric character references) for special characters. I write French all the time, and accented characters are encoded in XML files. For example, "à" is encoded as "&#224;". My solution for generating XML is to use the lxml module. Here is an example in IDLE:

>>> import lxml.etree
>>> x = lxml.etree.fromstring("<root>J'habite à Bordeaux.</root>")
>>> print(lxml.etree.tostring(x).decode("ascii"))
<root>J'habite &#224; Bordeaux.</root>

By default, the 'tostring' method in lxml serializes to ASCII, so it escapes accented characters.
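
The same idea can be sketched without installing anything, using the standard library's xml.etree.ElementTree in place of lxml (a swap made for this example, not the module used above). It shows the default ASCII output with character references, then a file written as real UTF-8 with an XML declaration. The file name is hypothetical. lxml's tostring accepts similar encoding and xml_declaration arguments.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.fromstring("<root>J'habite à Bordeaux.</root>")

# Default serialization is ASCII bytes, so "à" becomes a character reference.
ascii_bytes = ET.tostring(root)

# Hypothetical output location, used only for this example.
path = os.path.join(tempfile.gettempdir(), "example_tree.xml")

# Ask for UTF-8 explicitly and have the writer emit the XML declaration,
# so the bytes on disk and the declared encoding agree.
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

with open(path, "rb") as f:
    utf8_bytes = f.read()

# Parsing the file back should give the original text.
round_trip = ET.parse(path).getroot().text
```

Either form is valid XML for a browser: the ASCII version avoids the encoding question entirely, while the UTF-8 version keeps the accented characters readable in a text editor.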
