Hello,
I need to store data in large lists (~1e7 elements), and I often get a MemoryError in code that looks like this:
list1, list2 = [], []
with open('data.txt', 'r') as f:
    for line in f:
        fields = line.split(',')  # split each line once, not once per column
        list1.append(fields[1])
        list2.append(fields[2])
        # etc.
I get the error while reading in the data, but I don't actually need all the elements in RAM at the same time; I only ever work with chunks of the data.
So, more specifically: I have to read in ~10,000,000 rows of entries (strings and numerics) from 15 columns of a text file, store them in list-like objects, do some element-wise calculations, and compute summary statistics (means, standard deviations, etc.) over blocks of, say, 500,000 elements. Fast access to these blocks is essential.
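To make the block idea concrete, I imagine something along these lines (numpy.memmap is just my guess at a suitable tool; the filename, the dtype, and the fact that the column is already in binary form on disk are all assumptions):

import numpy as np

# Assumption: one numeric column already sits on disk as raw float64 values.
col = np.memmap('column1.dat', dtype='float64', mode='r')

block = 500000
for start in range(0, len(col), block):
    chunk = col[start:start + block]  # only this slice gets paged into RAM
    print((start, chunk.mean(), chunk.std()))  # per-block summary statistics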
I need to read everything in at once, in a single sequential pass (so no f.seek() etc. to read the data from the text file one block at a time). So I'm looking for an alternative list implementation (or another list-like data structure) with which I could read in all the data, store it on disk, and load one chunk/"page" of it into RAM at a time.
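For what it's worth, the closest thing I can picture is a one-time sequential conversion of the text file into a flat binary file that can then be memory-mapped. A rough sketch of that idea, where numpy.memmap and all filenames/dtypes are my own assumptions (only one numeric column shown; real code would cover all 15 plus the string columns):

import numpy as np

# Hypothetical one-time pass: stream the text file once and dump one numeric
# column to disk as raw float64 (no seeking, purely sequential).
with open('column1.dat', 'wb') as out:
    buf = []
    for line in open('data.txt', 'r'):
        buf.append(float(line.split(',')[1]))
        if len(buf) >= 100000:  # flush in modest batches to cap RAM use
            np.asarray(buf, dtype='float64').tofile(out)
            buf = []
    if buf:
        np.asarray(buf, dtype='float64').tofile(out)

# Afterwards the column can be paged in chunk by chunk, without rereading text:
col = np.memmap('column1.dat', dtype='float64', mode='r')
page = col[1000000:1500000]  # only this 500,000-element page touches RAM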
Any advice on how to achieve this? Platform: Windows XP.
Cheers!