I'm working with some really ugly files at the moment. When I get them, they can look like any of these:
All data on one line, delimited by ┌:
┌data1|data2|data3|┌data1|data2|data3|┌data1|data2|data3|┌data1|data2|data3|┌data1|data2|data3|┌
Nice data. All the bits I'm interested in are already on one line per bit of information:
data1|data2|data3|
data1|data2|data3|
Mixed:
data1|data2|data3|┌data1|data2|data3|┌data1|data2|data3|┌
data1|data2|data3|
data1|data2|data3|
Or even:
"data1|data2|data3|"
"data1|data2|data3|"
"data1|data2|data3|"
"data1|data2|data3|"
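All four layouts can be reduced to the same shape: strip whitespace and any surrounding quotes, split on the delimiter, drop empty pieces, and drop each record's trailing `|`. This is a minimal sketch, assuming `┌` stands in for the real delimiter (so it is passed as a parameter) and that every record ends with `|`; `split_records` is a hypothetical helper name.

```python
DELIMITER = '┌'  # placeholder; the real files use a different character


def split_records(line, delimiter=DELIMITER):
    """Return the records on one raw line, each without its trailing '|'.

    Covers all four layouts: one record per line, several records joined
    by the delimiter, a mix of both, and records wrapped in double quotes.
    """
    # Remove surrounding whitespace/newline, then any surrounding quotes.
    line = line.strip().strip('"')
    records = []
    for piece in line.split(delimiter):
        # Leading/trailing delimiters produce empty pieces; skip them.
        if piece:
            records.append(piece[:-1] if piece.endswith('|') else piece)
    return records


print(split_records('┌data1|data2|data3|┌data1|data2|data3|┌'))
# ['data1|data2|data3', 'data1|data2|data3']
print(split_records('"data1|data2|data3|"'))
# ['data1|data2|data3']
```

A blank line comes back as an empty list, so no separate blank-line check is needed. Note that `strip('"')` also removes a quote that appears on only one side, which is looser than checking both `startswith` and `endswith`.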
So at the moment I have this:
import os

def process_data(data):
    print('%s' % data)

directory = '.'
absdir = os.path.abspath(directory)
for files in os.listdir(absdir):
    if files.startswith(('Upd', 'UPD')):
        nfile = os.path.join(absdir, files)
        print(nfile)
        with open(nfile, 'r') as infile:
            for line in infile:
                # discard blank lines
                if not line.strip():
                    continue
                else:
                    line = line.strip()
                    if '┌' in line:
                        # several records on one line, separated by the delimiter
                        lines = line.split('┌')
                        for sline in lines:
                            # skip the empty pieces left by leading/trailing delimiters
                            if sline:
                                process_data(sline[:-1])
                    elif line.startswith('"') and line.endswith('"'):
                        # quoted record: drop both quotes and the trailing '|'
                        process_data(line[1:-2])
                    else:
                        # plain record: drop the trailing '|'
                        process_data(line[:-1])
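One common way to tidy a loop like this is to split it into small generators: one that finds the matching files, one that yields records from a file, and a consumer that calls `process_data`. This is a minimal sketch, assuming Python 3, UTF-8 input, the `┌` placeholder delimiter, and hypothetical helper names `update_files` and `iter_records`; instead of branching per layout, it normalizes every line the same way.

```python
import os

DELIMITER = '┌'  # placeholder for the real delimiter


def update_files(directory='.'):
    """Yield absolute paths of files whose names start with 'Upd' or 'UPD'."""
    absdir = os.path.abspath(directory)
    for name in sorted(os.listdir(absdir)):
        if name.startswith(('Upd', 'UPD')):
            yield os.path.join(absdir, name)


def iter_records(path, delimiter=DELIMITER):
    """Yield each record in the file, without quotes or the trailing '|'."""
    # encoding='utf-8' is an assumption; change it to match the real files.
    with open(path, encoding='utf-8') as infile:
        for line in infile:
            # Blank lines become '' here and yield nothing below.
            line = line.strip().strip('"')
            for piece in line.split(delimiter):
                if piece:  # skip empties from leading/trailing delimiters
                    yield piece[:-1] if piece.endswith('|') else piece


def process_data(data):
    print(data)


if __name__ == '__main__':
    for path in update_files('.'):
        print(path)
        for record in iter_records(path):
            process_data(record)
```

Keeping the parsing in `iter_records` separate from the directory scan means each piece can be tested on its own, and `process_data` never needs to know which layout a record came from.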
This seems to work OK, but I'm not convinced this is the best way to go about it. Does anyone have any suggestions on how I can tidy this up?
Also, the delimiter character is not really the one I have, but it is the closest I could find that would display here.