I need to convert .docx files to .doc (on Linux and Windows).
I'm planning to use the zip mod to access all of the internal XML documents.
Then I'll the /word/document.xml and I need to parse it so that it will read all of the text in the tags, place all of the text strings in a list, and then print the basic list.
Very simple stuff, xcept how do you actually parse an XML file?
Using;
from os import name, getcwd
cwd = getcwd()
if name != 'nt': dirType = '/'
else: dirType = '\\'
xml = open('%s%sword%sdocument.xml' % (cwd, dirType, dirType))
text = xml.read()
line = 0
repr(xml)
size = len(xml)
while line != size:
.. text = xml[line]
.. line = (line+1)
.. repr(text[1])
is a pain. Does it even work??
So how do you parse an XML file?