Can anyone offer performance tips to improve the running time?
This function opens a file (7000 rows by 30 columns) and stores each element in the matrix data. The current running time is 4 seconds, and I desperately need to minimize it, as I need to iterate through thousands of such files. Please help, thanks.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost;

// Reads a CSV file of doubles into data, one inner vector per row.
void iocsv(vector<vector<double> >& data, const string& path)
{
    string s;
    ifstream inFile;

    inFile.open(path.c_str());
    if (inFile) {
        while (getline(inFile, s)) {
            vector<double> col;
            // split the comma-separated line into fields
            tokenizer<escaped_list_separator<char> > tok(s);
            for (tokenizer<escaped_list_separator<char> >::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
                // convert each field to a double via a string stream
                istringstream price;
                price.str(*beg);
                double x;
                price >> x;
                col.push_back(x);
            }
            data.push_back(col);
        }
    } else {
        cerr << "Warning: cannot open file " << path << endl;
        cerr << "Program terminating ......" << endl;
        return;
    }

    inFile.close();
}
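Before touching the tokenizer itself, two small changes that are sometimes worth trying in a loop like this are reserving the vectors' capacity up front (using the 7000 x 30 shape quoted above) and reusing one istringstream instead of constructing a new one per field. A minimal sketch of the reading loop with those tweaks; whether they help noticeably here would need to be measured:

// Sketch: same reading loop as above, with reserve() and a reused stream.
// The 7000 and 30 are the row/column counts quoted in the question.
data.reserve(7000);
istringstream price;                 // one stream, reused for every field
while (getline(inFile, s)) {
    vector<double> col;
    col.reserve(30);                 // avoid repeated reallocations per row
    tokenizer<escaped_list_separator<char> > tok(s);
    for (tokenizer<escaped_list_separator<char> >::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        price.clear();               // reset any eof/fail state before reuse
        price.str(*beg);
        double x;
        price >> x;
        col.push_back(x);
    }
    data.push_back(col);
}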

Are the files sorted? The reason I ask is that although the initial sorting costs a lot in terms of time, once sorted it would be a lot quicker to retrieve the data.

Hi, the file is not sorted ... although I could build a macro to sort all the files (they are all in CSV format), I wish to see what other alternatives there are.

How long does an empty loop take?

while (getline(inFile, s)) {
}

Separate the "time to read the file" from the time to "tokenise the file".

If it takes <1 second, then there might be something you can do.

If it takes >3 seconds, then all your tokenising/vector stuff is not the problem.

> and i desperately need to minimize the running time as i need to iterate thru
> thousands of such files
Or just don't worry about it: let the program run overnight, and it will all be done by morning anyway. If that's an option, it certainly isn't worth spending more than a day trying to make it vastly more efficient.
By your measure, it's about 900 files per hour.
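
One way to make that read-versus-tokenise split concrete is to time the bare getline() loop on its own with clock() from <ctime>, then time a full call to iocsv() the same way and compare. A rough sketch, with the file name as a placeholder:

#include <ctime>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
    ifstream inFile("test.csv");   // placeholder file name
    string s;

    clock_t start = clock();
    while (getline(inFile, s)) {   // read-only pass, no tokenising
    }
    clock_t stop = clock();

    cout << "read time: " << double(stop - start) / CLOCKS_PER_SEC << " s" << endl;
    return 0;
}

Wrapping the same pair of clock() calls around iocsv() gives the total, and the difference is roughly the tokenising/conversion cost.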

That was a good suggestion. Well, I've tested the empty loop and it took < 1 sec, so it must be all the tokenizer and vector stuff ... am I going overboard by using the tokenizer, since I basically just want to store all the elements of a CSV file into a matrix?
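
If the tokenizer plus a fresh istringstream per field turns out to be the costly part, one alternative is to walk each line with strtod(), which converts the next number and reports where it stopped, so skipping the comma is just a pointer step. This is only a sketch and assumes plain numeric CSV with no quoted or escaped fields (which escaped_list_separator does handle); the helper name parse_row is made up for illustration:

#include <cstdlib>   // strtod
#include <string>
#include <vector>
using namespace std;

// Sketch: parse one line of plain numeric CSV without Boost or stringstreams.
void parse_row(const string& line, vector<double>& col)
{
    const char* p = line.c_str();
    char* end;
    while (*p) {
        double x = strtod(p, &end);  // convert the next number; end points past it
        if (end == p) break;         // no number found, stop
        col.push_back(x);
        p = end;
        if (*p == ',') ++p;          // step over the separator
    }
}

The outer getline() loop and the data.push_back(col) stay the same as before; only the per-line parsing changes.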
