Parsing a text file

bsh6wc 0 Newbie Poster

13 Years Ago

Hi,

I'm new to Python and am having trouble reading data into my code from a text file. The file looks like this:

>INFO> CELLID, #729,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -96.485,34.67,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, 40.5807, 0.0332085, 27.41,
20:03:32:06, 63.5, 60.5, 48.3901, 17427.8, 270.588, 0, 0, 0, 76.9209, 0.0723621, 38.0266,

There are several "cells", and I need to pull out the FlashCount column from each cell.

Thanks.

TrustyTony 888 ex-Moderator

13 Years Ago

Drop the extra lines from the beginning and use my code snippet: http://www.daniweb.com/software-development/python/code/293490

# text based data input with data accessible
# with named fields or indexing
from __future__ import print_function ## Python 3 style printing (Python 2.6+)
from collections import namedtuple ## namedtuple needs Python 2.6+
import string

filein = open("sample.dat")

## skip the >INFO> lines and blank lines before the header
for line in filein:
    if line.startswith(('>INFO','\n')):
        continue
    headerline = line.lower().replace('-','').replace('(','').replace(')', '') ## lowercase field names Python style
    break
## first non-letter and non-number is taken to be the separator
separator = headerline.strip(string.ascii_lowercase + string.digits)[0]
print("Separator is '%s'" % separator)

headerline = [field.strip() for field in headerline.split(separator)]
Dataline = namedtuple('Dataline', headerline)
print('Fields are:', Dataline._fields, '\n')

for data in filein:
    if not data.strip(): ## skip blank lines
        continue
    ## drop the trailing separator and newline, then split into fields
    data = [f.strip() for f in data.rstrip('\n '+separator).split(separator)]
    d = Dataline(*data)
    print(d.flashcount)

filein.close()

TrustyTony 888 ex-Moderator

13 Years Ago

The named tuple is for convenience and lets the column position vary. If the data is always in the same column, you can hard-code that column number, or you can find the correct column from the header of each cell. An additional complication is the unconventional line ending: each data line ends with the separator instead of just a newline.

filein = open("sample.dat")

for line in filein:
    if line.startswith('>INFO') or not line.strip():  # info or blank line
        print(line.rstrip())
        continue
    headerline = [field.strip() for field in line.split(',')]
    fieldno = headerline.index('FlashCount')  # column number of FlashCount
    break

for data in filein:
    d = data.split(',')[fieldno].strip()
    print(d)

filein.close()
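To see the trailing-separator complication mentioned above, compare a plain `split(',')` with a split that strips the trailing separator first. This is a small sketch, not TrustyTony's code: `split_fields` is a hypothetical helper name, and the header and row strings are copied from the sample data.

```python
# Header and one data row copied from the sample data (note the trailing comma)
header = ("datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, "
          "CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH\n")
row = ("20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, "
       "40.5807, 0.0332085, 27.41,\n")


def split_fields(line, separator=','):
    """Split one line into stripped fields, ignoring the trailing separator."""
    # rstrip removes any trailing newline, space and separator characters
    return [f.strip() for f in line.rstrip('\r\n ' + separator).split(separator)]


naive = row.split(',')              # 13 pieces; the last one is just '\n'
fields = split_fields(row)          # 12 fields, matching the 12 header names
fieldno = split_fields(header).index('FlashCount')  # column found from the header
print(fields[fieldno])              # 40.5807
```

Looking the column up from the header, rather than hard-coding 9, keeps working if a cell's columns are ever ordered differently.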

bsh6wc 0 Newbie Poster · Answer 1 · 2011-07-21T03:08:51+00:00

Thank you,

Each text file contains multiple cells, and I am interested in the FlashCount values grouped by cell. Would dropping the first few lines allow me to do that? Sorry I wasn't clearer about that before.

Also, I'm on Python 2.4.3, so I can't use namedtuple. Is there something else I could use instead?
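Without namedtuple (added in Python 2.6), a plain dictionary built with `dict(zip(header, fields))` gives the same access by column name. A minimal sketch for a single cell, assuming the layout of the sample data; `parse_cell` is a hypothetical name. It sticks to `dict`, `zip` and basic string methods and avoids newer syntax such as `with`, so as far as I know it should also run on 2.4.

```python
def parse_cell(lines, column='FlashCount'):
    """Return the values of `column` from the lines of a single cell.

    Each data row becomes a dict mapping header names to values, a plain
    substitute for namedtuple (which Python 2.4 does not have).
    """
    header = None
    values = []
    for line in lines:
        # skip the >INFO> lines and blank lines
        if line.startswith('>INFO') or not line.strip():
            continue
        # drop the trailing separator, then split on commas
        fields = [f.strip() for f in line.rstrip('\r\n ,').split(',')]
        if header is None:
            header = fields           # first remaining line is the header
            continue
        row = dict(zip(header, fields))
        values.append(row[column])
    return values


# One cell copied from the sample data
sample = """>INFO> CELLID, #729,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -96.485,34.67,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, 40.5807, 0.0332085, 27.41,
20:03:32:06, 63.5, 60.5, 48.3901, 17427.8, 270.588, 0, 0, 0, 76.9209, 0.0723621, 38.0266,
"""
print(parse_cell(sample.splitlines(True)))  # ['40.5807', '76.9209']
```

With a real file, `parse_cell(open("sample.dat"))` should behave the same, since iterating over a file yields its lines. The values stay strings; wrap them in `float()` if you need numbers.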

bsh6wc 0 Newbie Poster · Answer 2 · 2011-07-23T01:18:47+00:00

Thank you,

This has helped tremendously! I have managed to get it to work for a text file containing only one cell. Next, I want it to work with a text file containing multiple cells, where my data looks like:

>INFO> CELLID, #763,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -93.7,37.78,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:42:29, 47, 40.5, 2.99706, 522.765, 383.863, -99900, -99900, -99900, -99900, -99900, 0.985357,
20:03:44:33, 49.5, 44, 3.88048, 807.916, 465.574, -99900, -99900, -99900, -99900, -99900, 2.5169,

>INFO> CELLID, #729,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -96.485,34.67,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, 40.5807, 0.0332085, 27.41,
20:03:32:06, 63.5, 60.5, 48.3901, 17427.8, 270.588, 0, 0, 0, 76.9209, 0.0723621, 38.0266,

I was thinking I could split the file up by searching for the '#' cell IDs and then apply the code I have for reading a single cell. Is that a sound approach? If so, how would I go about it?
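The split-by-cell idea can work. A minimal sketch, assuming each cell begins with a line starting `>INFO> CELLID`, as in the sample; keying on that prefix is probably safer than searching for any `#` character. `split_cells` is a hypothetical helper name. Each returned block is a list of lines that can then be passed to single-cell code.

```python
def split_cells(lines):
    """Group lines into one list per cell.

    Assumes every cell starts with a line beginning '>INFO> CELLID',
    as in the sample data; lines before the first cell are ignored.
    """
    cells = []
    for line in lines:
        if line.startswith('>INFO> CELLID'):
            cells.append([])          # start a new cell block
        if cells:
            cells[-1].append(line)
    return cells


# Two cells copied from the sample data
SAMPLE = """>INFO> CELLID, #763,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -93.7,37.78,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:42:29, 47, 40.5, 2.99706, 522.765, 383.863, -99900, -99900, -99900, -99900, -99900, 0.985357,
20:03:44:33, 49.5, 44, 3.88048, 807.916, 465.574, -99900, -99900, -99900, -99900, -99900, 2.5169,

>INFO> CELLID, #729,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -96.485,34.67,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, 40.5807, 0.0332085, 27.41,
20:03:32:06, 63.5, 60.5, 48.3901, 17427.8, 270.588, 0, 0, 0, 76.9209, 0.0723621, 38.0266,
"""

cells = split_cells(SAMPLE.splitlines(True))
for cell in cells:
    # the cell ID is the second comma-separated item of the first line
    print(cell[0].split(',')[1].strip(), len(cell))
```

This reads the whole file into per-cell lists first, which is simple but keeps everything in memory; for very large files a single pass that handles each cell as it goes may be preferable.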

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 3 · 2011-07-23T13:50:01+00:00

Line 11 already checks for the info line that begins each block, which marks the start of each record, provided you put all the lines (3-13) inside a proper loop and correct the break out of the for loop in lines 11-13. That part is your job. Of course, you should save the data inside the loop instead of printing it, probably in a dictionary keyed by the cell ID given in the first info line.
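The advice above (save the data in the loop, keyed by the cell ID from the first info line) can be sketched as a single pass over the file. This is a hedged illustration, not TrustyTony's code: `flashcounts_by_cell` is a hypothetical name, and it assumes every cell starts with a `>INFO> CELLID, #...,` line followed by one header line, as in the sample data. Values are kept as strings, including the -99900 entries in cell #763.

```python
def flashcounts_by_cell(lines, column='FlashCount'):
    """Return {cell_id: [values of column, ...]} for a multi-cell file.

    Assumes each cell starts with a line like '>INFO> CELLID, #729,'
    followed by a header line and comma-separated data rows.
    """
    result = {}
    cellid = None
    fieldno = None
    for line in lines:
        if line.startswith('>INFO> CELLID'):
            # e.g. '>INFO> CELLID, #729,' -> '#729'
            cellid = line.split(',')[1].strip()
            result[cellid] = []
            fieldno = None            # the next data-like line is a header
            continue
        # skip other info lines, blank lines and anything before the first cell
        if line.startswith('>INFO') or not line.strip() or cellid is None:
            continue
        fields = [f.strip() for f in line.rstrip('\r\n ,').split(',')]
        if fieldno is None:
            fieldno = fields.index(column)   # header line of this cell
        else:
            result[cellid].append(fields[fieldno])
    return result


# Two cells copied from the sample data
SAMPLE = """>INFO> CELLID, #763,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -93.7,37.78,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:42:29, 47, 40.5, 2.99706, 522.765, 383.863, -99900, -99900, -99900, -99900, -99900, 0.985357,
20:03:44:33, 49.5, 44, 3.88048, 807.916, 465.574, -99900, -99900, -99900, -99900, -99900, 2.5169,

>INFO> CELLID, #729,
>INFO> 20100520-035248 LightningTable (scale_1)
>INFO> LON,LAT -96.485,34.67,0

datatime, maxref, ref_-10, MaxVIL, TotalVIL, Size(km2), CGDenAVG, CGmaxden, CGCount, FlashCount, FlashDenAVG, MESH
20:03:30:05, 63.5, 59.5, 44.0613, 18091.8, 311.062, 0, 0, 0, 40.5807, 0.0332085, 27.41,
20:03:32:06, 63.5, 60.5, 48.3901, 17427.8, 270.588, 0, 0, 0, 76.9209, 0.0723621, 38.0266,
"""

counts = flashcounts_by_cell(SAMPLE.splitlines(True))
# With a real file: counts = flashcounts_by_cell(open("sample.dat"))
for cellid in counts:
    print(cellid, counts[cellid])
```

Because the header is re-read at the start of every cell, each cell can have its own column order.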