So I have to build an app that deals with email address management, among other things.
Let's say the user has an Excel file with 2 million rows (email addresses). I read it in via OleDb, and the first mistake I made was putting ~500k rows into a DataGridView. Bad, bad mistake: the tiny app ended up occupying ~700 MB of RAM.
I've ditched the DataGridView for now (I'll bring it back later with virtualization and on-demand pages). Now, with only the DataSet, memory climbs to about 170 MB and then settles at around 100 MB.
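For the reading side, I'm leaning toward a forward-only OleDbDataReader, which hands over one row at a time instead of buffering the whole sheet the way filling a DataSet does. A minimal sketch, assuming the ACE provider; the file name, sheet name, and column layout below are placeholders:

    using System;
    using System.Data.OleDb;

    class ExcelStream
    {
        static void Main()
        {
            // ACE provider; "book.xlsx" and "Sheet1" are placeholders.
            const string connStr =
                @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=book.xlsx;" +
                @"Extended Properties=""Excel 12.0 Xml;HDR=NO""";

            using (var conn = new OleDbConnection(connStr))
            using (var cmd = new OleDbCommand("SELECT * FROM [Sheet1$]", conn))
            {
                conn.Open();
                using (var reader = cmd.ExecuteReader())   // forward-only, one row at a time
                {
                    while (reader.Read())
                    {
                        // Convert covers cells the driver types as something other than string.
                        var address = Convert.ToString(reader.GetValue(0));
                        // ...validate/collect the single row here...
                    }
                }
            }
        }
    }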
I would really appreciate some advice on the best way to deal with this kind of file (Excel, text, CSV, all around 2 million rows), keeping in mind that I need to validate each row against a regex, delete duplicates, and export to Excel, CSV, or text files.
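For the validate/dedup/export part, a plain text or CSV source can be handled in one streaming pass. A minimal sketch, assuming one address per line; the file names and the (deliberately simplistic) regex are placeholders:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Text.RegularExpressions;

    class CleanAndExport
    {
        static void Main()
        {
            const string inputPath  = "addresses.txt";         // placeholder names
            const string outputPath = "addresses_clean.txt";

            // Simplistic email pattern, just for the sketch.
            var emailRegex = new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$", RegexOptions.Compiled);

            // A HashSet of ~2M strings costs on the order of a few hundred MB
            // and gives O(1) duplicate checks.
            var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

            using (var writer = new StreamWriter(outputPath))
                foreach (var line in File.ReadLines(inputPath))   // lazy: one row in memory at a time
                {
                    var address = line.Trim();
                    if (emailRegex.IsMatch(address) && seen.Add(address))
                        writer.WriteLine(address);
                }
        }
    }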
Recommended Answers
Are you stuck with this technology choice? Because from your description, this project has outgrown Excel as a useful backing data store. I'd consider a database-oriented solution like MySQL (which does incorporate regular …
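To make the database route concrete, here is a rough sketch using Connector/NET (the MySql.Data package). The table name, file name, connection details, and regex are all assumptions, not anything from the thread; the server drops duplicates during the load thanks to the UNIQUE key plus IGNORE, and does the regex filtering itself:

    using System;
    using MySql.Data.MySqlClient;   // Connector/NET (MySql.Data NuGet package)

    class MySqlImportSketch
    {
        static void Main()
        {
            // Placeholder credentials; LOCAL INFILE must be enabled on both
            // the server and the client connection.
            const string connStr =
                "Server=localhost;Database=mail;Uid=user;Pwd=pass;AllowLoadLocalInfile=true";

            using (var conn = new MySqlConnection(connStr))
            {
                conn.Open();

                // UNIQUE key + IGNORE: duplicates are discarded during the bulk load.
                Exec(conn, "CREATE TABLE IF NOT EXISTS emails (address VARCHAR(320), UNIQUE KEY (address))");
                Exec(conn, "LOAD DATA LOCAL INFILE 'addresses.csv' IGNORE INTO TABLE emails (address)");

                // Server-side regex filter; the pattern is deliberately simplistic.
                Exec(conn, @"DELETE FROM emails WHERE address NOT REGEXP '^[^@]+@[^@]+\\.[^@]+$'");
            }
        }

        static void Exec(MySqlConnection conn, string sql)
        {
            using (var cmd = new MySqlCommand(sql, conn))
                cmd.ExecuteNonQuery();
        }
    }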
I think that, if possible, the data could be read in chunks of, say, 2K rows. Each time a chunk is read, first look for duplicates inside the chunk and, then, compare to the other 2M – 2K. If a row is present in the chunk of 2K and in …
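A sketch of that chunked reading, assuming a text/CSV source. For the cross-chunk comparison it keeps a running HashSet of everything seen so far, which avoids re-scanning the other 2M – 2K rows for every chunk:

    using System;
    using System.Collections.Generic;
    using System.IO;

    class ChunkedDedup
    {
        const int ChunkSize = 2000;

        static void Main()
        {
            var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
            foreach (var chunk in ReadChunks("addresses.csv"))   // placeholder file name
            {
                foreach (var row in chunk)
                    if (!seen.Add(row.Trim()))                   // false => already seen
                        Console.WriteLine("duplicate: " + row);
                // flush the chunk's unique rows here (DB insert, export, UI page, ...)
            }
        }

        static IEnumerable<List<string>> ReadChunks(string path)
        {
            var chunk = new List<string>(ChunkSize);
            foreach (var line in File.ReadLines(path))           // lazy, line by line
            {
                chunk.Add(line);
                if (chunk.Count == ChunkSize)
                {
                    yield return chunk;
                    chunk = new List<string>(ChunkSize);
                }
            }
            if (chunk.Count > 0) yield return chunk;             // trailing partial chunk
        }
    }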