What I am trying to achieve is to parse an XML file, break it up into useful components, and push it to a multi-table SQL database. But I cannot get the basics off the ground.
Take an XML file like this one, which starts with:
<meeting id="35504" barriertrial="0" venue="Hawkesbury" date="2014-05-13T00:00:00" gearchanges="-1" stewardsreport="-1" gearlist="-1" racebook="0" postracestewards="0" meetingtype="TAB" rail="+4m 1300m to Winning Post, True Remainder" weather="Fine " trackcondition="Dead " nomsdeadline="2014-05-07T11:00:00" weightsdeadline="2014-05-08T16:00:00" acceptdeadline="2014-05-09T09:00:00" jockeydeadline="2014-05-09T12:00:00">
<club abbrevname="Hawkesbury Race Club Limited" code="20" associationclass="2" website="http://" />
<race id="185360" number="1" nomnumber="1" division="0" name="XXXX GOLD BENCHMARK 70 HANDICAP" mediumname="BM70" shortname="BM70" stage="Acceptances" distance="2000" minweight="55" raisedweight="1" class="BM70 " age="~ " grade="0" weightcondition="HCP " trophy="0" owner="0" trainer="0" jockey="0" strapper="0" totalprize="22000" first="12250" second="4250" third="2100" fourth="1000" fifth="525" time="2014-05-13T13:03:00" bonustype="BX02 " nomsfee="0" acceptfee="0" trackcondition=" " timingmethod=" " fastesttime=" " sectionaltime=" " formavailable="0" racebookprize="Of $22000. First $12250, second $4250, third $2100, fourth $1000, fifth $525, sixth $375, seventh $375, eighth $375, ninth $375, tenth $375">
<condition line="1">Of $22000. First $12250, second $4250, third $2100, fourth $1000, fifth $525, sixth $375, seventh $375, eighth $375, ninth $375, tenth $375</condition>
<condition line="2">Starter Subsidy: $200 for non-prize earning runners.</condition>
<condition line="3">BenchMark 70, Handicap, For No age restriction, No sex restriction (Weights Raised 1.0kg.)</condition>
<condition line="4">BOBS&BOBS Extra Bonus available: $5,000</condition>
<condition line="5">Apprentices can claim. Field Limit: 12 + 4 EM</condition>
<nomination number="1" saddlecloth="1" horse="Our Uncle Archie" id="170617" idnumber="" regnumber="" blinkers="1" trainernumber="324" trainersurname="Englebrecht" trainerfirstname="Steve" trainertrack="Warwick Farm" rsbtrainername="Steve Englebrecht" jockeynumber="86428" jockeysurname="Pracey-Holmes" jockeyfirstname="Jake" barrier="2" weight="58" rating="68" description="BR G 3 Duke of Marmalade(IRE) x Nena Candida (Canny Lad)" colours="Red And Green Hoops, Black Sleeves, Red Armbands, Black And Red Seams Cap" owners="A J Watson, Mrs S C Watson, J R Watson, P K Watson, R F Watson, J M Cockburn & Mrs J A Cockburn " dob="2010-09-19T00:00:00" age="4" sex="G" career="8-3-0-0 $53605.00" thistrack="1-1-0-0 $17250.00" thisdistance="0-0-0-0" goodtrack="4-1-0-0 $18655.00" heavytrack="1-1-0-0 $17250.00" slowtrack="0-0-0-0" deadtrack="3-1-0-0 $17700.00" fasttrack="0-0-0-0" firstup="3-0-0-0 $955.00" secondup="2-0-0-0 $900.00" mindistancewin="0" maxdistancewin="0" finished="0" weightvariation="0" variedweight="58" decimalmargin="0.00" penalty="0" pricestarting="" sectional200="0" sectional400="0" sectional600="0" sectional800="0" sectional1200="0" bonusindicator="E" />
<nomination number="2" saddlecloth="2" horse="Montiro" id="158475" idnumber="" regnumber="" blinkers="0" trainernumber="279" trainersurname="Conners" trainerfirstname="Clarry" trainertrack="Warwick Farm" rsbtrainername="Clarry Conners" jockeynumber="965" jockeysurname="Hammersley" jockeyfirstname="Paul" barrier="1" weight="57.5" rating="65" description="CH G 4 Royal Academy(USA) x Stormy Petrel (Flying Spur)" colours="Yellow, Royal Blue Armbands And Cap" owners="Victory Lodge Syndicate (Mgrs: C & M Conners), P J Collier, B E Collier, A W Rohde, D Thom & Mrs M Gelardi" dob="2009-10-09T00:00:00" age="5" sex="G" career="9-2-1-1 $38975.00" thistrack="2-0-1-0 $4625.00" thisdistance="0-0-0-0" goodtrack="3-0-1-0 $5625.00" heavytrack="0-0-0-0" slowtrack="3-1-0-1 $20350.00" deadtrack="3-1-0-0 $13000.00" fasttrack="0-0-0-0" firstup="2-1-0-0 $17625.00" secondup="2-0-1-1 $6350.00" mindistancewin="0" maxdistancewin="0" finished="0" weightvariation="0" variedweight="57.5" decimalmargin="0.00" penalty="0" pricestarting="" sectional200="0" sectional400="0" sectional600="0" sectional800="0" sectional1200="0" bonusindicator="K" />
I can read it in just fine, and I can grab single attributes just fine:
In [10]: %paste
import xmltodict
document = open("/home/sayth/Scripts/va_benefits/20140513HAWK0.xml", "r")
read_doc = document.read()
xml_doc = xmltodict.parse(read_doc)
## -- End pasted text --
In [11]: xml_doc['meeting']['@id']
Out[11]: u'35504'
But I cannot get multiple items out into a list so that I can push them into the database table. Well, I can get every item out: it's xml_doc['meeting'].
If I try to specify just the ones I want:
In [14]: a = []
In [15]: a = xml_doc(['meeting']['@id'],['meeting']['@venue'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-4c706827a308> in <module>()
----> 1 a = xml_doc(['meeting']['@id'],['meeting']['@venue'])
TypeError: list indices must be integers, not str
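As far as I can tell, the traceback comes from the arguments rather than from xml_doc: ['meeting'] on its own is a plain one-element list, so ['meeting']['@id'] indexes a list with a string, and that fails before xml_doc is called at all. A minimal sketch that reproduces the same kind of error without the XML file (the name keys is only for illustration):

```python
# A plain list, standing in for the literal ['meeting'] in the failing call.
keys = ['meeting']

try:
    # Indexing a list with a string instead of an integer position.
    keys['@id']
except TypeError as exc:
    message = str(exc)

print(message)
```

The exact wording of the message differs slightly between Python versions, but it is the same "list indices must be integers" TypeError as above.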
I can do it manually, but how can I 'automate' it, so that every import filters the same way and can easily update the database?
This is how I can do it manually:
In [17]: a.append(xml_doc['meeting']['@id'])
In [18]: a.append(xml_doc['meeting']['@venue'])
In [19]: print a
[u'35504', u'Hawkesbury']
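What I am picturing is roughly the sketch below: keep the wanted attribute names in one list and look each of them up in turn, instead of appending by hand. This is only a sketch under assumptions. The dict literal is a stand-in for what xmltodict.parse() gives me (with just a few of the meeting attributes copied in), and MEETING_FIELDS and extract_fields are names I made up for illustration.

```python
# Stand-in for the dict returned by xmltodict.parse() for the file above;
# only a few of the <meeting> attributes are reproduced here.
xml_doc = {
    'meeting': {
        '@id': '35504',
        '@venue': 'Hawkesbury',
        '@date': '2014-05-13T00:00:00',
        '@meetingtype': 'TAB',
    }
}

# Hypothetical list of the attributes wanted for one database row.
MEETING_FIELDS = ['@id', '@venue', '@date']


def extract_fields(node, fields):
    """Return the values of `fields` from `node`, in order (None if missing)."""
    return [node.get(name) for name in fields]


row = extract_fields(xml_doc['meeting'], MEETING_FIELDS)
print(row)
```

The same function could presumably be reused with a different field list for each race or nomination, so that every import filters the same way.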