I am working on some pre-processing. I have a group of text files whose names are dates (e.g. 20120201). The data in each file is organized like this:
20120201_line number
field1: 123124235
field2:ergjenrgjnergoiefoiefoenr
.
.
field10:dfgkjndgknrkj
20120201_line number
.
.
.
The line number is the only thing that separates the different data elements, and there are about 10 attributes for each element. These text files are huge (> 1 GB). For a particular date, I have a list of line numbers representing the data that is relevant to me, and I am only interested in 4 of the 10 fields. I'm trying to write a script that will iterate through the lines of text, look for the line numbers of the elements I need, and then get the data for only those 4 fields (fields 2, 5, 7, and 8).
I am able to search the file line by line to find the line number I want using the readline() function, but I'm having issues when I try to use it again to find the appropriate fields. There's probably something obvious that I'm missing. Please help!
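
For reference, here is a rough sketch of the single-pass approach described above. It is only an illustration under some assumptions: that each element starts with a header line of the form date_line number, that every field line looks like name:value, and that the fields are literally named field2, field5, field7 and field8 (placeholders taken from the example layout). The function name extract_records and the sample data are made up for the example. It loops over the lines once and keeps a flag for "inside a wanted element" rather than calling readline() a second time.

```python
import io

# Placeholder field names, taken from the example layout above.
WANTED_FIELDS = {"field2", "field5", "field7", "field8"}


def extract_records(lines, date, wanted_ids, wanted_fields=WANTED_FIELDS):
    """Yield (line_number, {field: value}) for each wanted element.

    `lines` is any iterable of text lines (for example an open file),
    so the whole file is never loaded into memory at once.
    `wanted_ids` should be a set of line numbers given as strings.
    """
    prefix = date + "_"
    current_id = None  # line number of the element being read
    keep = False       # True while inside an element we want
    fields = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(prefix):
            # New element header: emit the previous element if it was wanted.
            if keep:
                yield current_id, fields
            current_id = line[len(prefix):].strip()
            keep = current_id in wanted_ids
            fields = {}
        elif keep and ":" in line:
            # Split on the first colon only, so values may contain colons.
            name, _, value = line.partition(":")
            name = name.strip()
            if name in wanted_fields:
                fields[name] = value.strip()
    # Emit the last element if it was wanted.
    if keep:
        yield current_id, fields


# Small in-memory sample standing in for a real file.
sample = io.StringIO(
    "20120201_1\nfield1: 1\nfield2:aaa\n"
    "20120201_2\nfield2:bbb\nfield5:ccc\n"
)
print(list(extract_records(sample, "20120201", {"2"})))
```

With a real file, the StringIO object would be replaced by an open file handle, e.g. `with open("20120201") as f:` and then iterating over `extract_records(f, "20120201", wanted_ids)`.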