Hi, I have a file with duplicate records. They look something like this:
<record>
<dateadd>012012</dateadd>
<nid>R04607295</nid>
<reflink></reflink>
<FPI>YES</FPI><TPG>NO</TPG><FT>YES</FT>
<num>631</num>
<author>Anon</author>
<title>ON THE WED</title>
</record>
<record>
<dateadd>012012</dateadd>
<idref>R04607297</idref>
<reflink></reflink>
<type>Article</type>
<FPI>YES</FPI><TPG>NO</TPG><FT>YES</FT>
<num>651</num>
<author>Bent, E</author>
<title>ENTRANCES AND EXITS</title>
</record>
<record>
<dateadd>012012</dateadd>
<nid>R04607295</nid>
<reflink></reflink>
<FPI>YES</FPI><TPG>NO</TPG><FT>YES</FT>
<num>631</num>
<author>Anon</author>
<title>ON THE WED</title>
</record>
<record>
<dateadd>012012</dateadd>
<idref>R04607297</idref>
<reflink></reflink>
<type>Article</type>
<FPI>YES</FPI><TPG>NO</TPG><FT>YES</FT>
<num>651</num>
<author>Bent, E</author>
<title>ENTRANCES & EXITS</title>
</record>
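Not all of the records are exact duplicates (compare "&" with "AND" in the titles of the second and fourth records), but duplicated records share the same value in the num field, so that field can be used to identify them.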
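To make the goal concrete, here is a minimal Python sketch of the result I am after. It assumes the whole file fits in memory, that records are never nested, and that each num value sits on a single line. The name dedupe_records is only a placeholder, and keeping the first copy of each record is an arbitrary choice.

```python
import re

# A record is everything from <record> to the next </record>, across lines.
RECORD_RE = re.compile(r"<record>.*?</record>", re.DOTALL)
# The num field is used as the duplicate key.
NUM_RE = re.compile(r"<num>(.*?)</num>")


def dedupe_records(text):
    """Return the records in text, keeping the first one seen for each num value."""
    seen = set()
    kept = []
    for record in RECORD_RE.findall(text):
        match = NUM_RE.search(record)
        # Assumption: a record without a num field is compared by its full text.
        key = match.group(1) if match else record
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept
```

Run on the four sample records above, this would keep the first "ON THE WED" record and the "ENTRANCES AND EXITS" record, and drop the other two. Joining the returned list with newlines and writing it to a file would give the cleaned output.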
At some point I must have tried to do this with sed, because the following line is commented out at the bottom of the Python script I usually use to remove these duplicates:
sed -n "/<record>/,/<\/record>/p" 2010rec.got | sort | uniq | tee -a new.txt
I have fiddled with it some more, but I can't get it working. So my question is simply: is this possible at all? Thanks.