Greetings all,
I'm actually proud to have a question I've never seen asked here. I've got a 1.5GB text file (YES, it's really 1.5GB) where each line needs a set of modifications. I've got the application running in a single-thread, whereby each line is read in, the modification is made, and the line is written back out.
This is fine, but it's slow. It took about an hour for the application to traverse this file. The IO itself is very fast, it's just the processing takes a second to do. So here's what I'd like to try:
While the reader has text in it, read a line.
Start a new thread to process the line, write the modified line to a file.
Sounds simple enough. The only thing is that each "line" could be anywhere from 1KB to a couple hundred MB a piece. This isn't a problem with the single-threaded application, but it is an issue when you're dealing with multiple threads, even on a quad-proc machine with 4GB of RAM.
Basically, I'd like to dynamically create a new thread to deal with each line that is in the TextReader. Once the thread finishes, it should write its data to file and release its resources. Past that, I'd like to consider a way to do it without using too much memory, or deal with OutOfMemory exceptions "cleanly", ie, without loss of data from the TextReader.
This might sound kind of complicated, I don't know. Can anyone suggest a direction to roll, or even a better strategy?