Hi everyone,

I'm a moderately experienced C++ programmer working on code which must do the following:
(a) Import data from a lot of little CSV files
(b) Load that data into various objects
(c) Do stuff with that data

The code I've written does (a), (b), and (c) pretty well, but I've noticed a problem with (a), which I want to ask you guys about.

Suppose I have 1000 source files. My program successfully processes Files #1 through #500. But when it reaches File #501, my program chokes and seg faults, and I automatically lose ALL the data I've collected. This is a big problem, because there is a LOT of data to process. It may take me three or four hours just to reach File #500.
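A crash at File #501 throws away the work on Files #1 to #500. One option is to record each finished file on disk so that a rerun can skip it. The sketch below is minimal and assumes each file's results can be saved or rebuilt independently. The progress-log file and the LoadFinished/MarkFinished helpers are hypothetical names, not part of the original code.

```cpp
#include <fstream>
#include <ostream>
#include <set>
#include <string>

// Hypothetical helper: reads the names of files already processed in an
// earlier run. If the log doesn't exist yet, the stream fails to open,
// getline() returns false at once, and an empty set is returned.
std::set<std::string> LoadFinished(const std::string& logPath)
{
    std::set<std::string> done;
    std::ifstream in(logPath.c_str());
    std::string name;
    while (std::getline(in, name))
        if (!name.empty())
            done.insert(name);
    return done;
}

// Hypothetical helper: appends one finished file name and flushes at once,
// so the record survives even if the program crashes on the next file.
void MarkFinished(const std::string& logPath, const std::string& fileName)
{
    std::ofstream out(logPath.c_str(), std::ios::app);
    out << fileName << '\n' << std::flush;
}
```

Inside the file loop you would skip any name where done.count(ListOfFiles[i]) is non-zero. Call MarkFinished only after that file's results have been saved.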

When the program reads a file, each line is loaded into a string called Line, which is then split into individual values. If I'm reading gdb right (output below), the parsing step is causing the trouble. Specifically, the crash happens in the inner loop that splits Line on commas, while(getline(linestream, Value, ',')), which gdb reports at ReadTheFile.h:119. I don't see any format problems in the line of the file that triggers the crash. When I run the program multiple times, the exact same line causes the seg fault every time.

What would be awesome is a way to tell the program, "if you see a line which confuses you, skip that line, don't just crash!" Skipping the entire file would be okay too.
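For malformed data, you can get "skip that line" behaviour by validating each line before loading it. The sketch below is minimal and assumes every valid line has exactly 10 comma-separated fields, which is how many the line in the gdb output has. kExpectedFields and SplitAndValidate are hypothetical names. This only guards against bad data. It cannot prevent a crash caused by memory being corrupted somewhere else.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical assumption: a well-formed line has exactly this many fields.
const std::size_t kExpectedFields = 10;

// Splits `line` on commas into `fields`, using the same getline approach as
// the original code. Returns false if the field count is wrong, so the
// caller can skip the line instead of loading it.
bool SplitAndValidate(const std::string& line, std::vector<std::string>& fields)
{
    fields.clear();
    std::istringstream linestream(line);
    std::string value;
    while (std::getline(linestream, value, ','))
        fields.push_back(value);
    return fields.size() == kExpectedFields;
}
```

In the reading loop you would write: if (!SplitAndValidate(Line, ValRow)) continue; and optionally log the skipped line first.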

Below is the code I'm using. Below that is the gdb analysis of why my program is choking. Any help or advice would be appreciated! :)

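#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

void ReadAllFiles()   // hypothetical wrapper; the real code lives in ReadTheFile()
{
  vector<string> ListOfFiles;
  string Line;
  string Value;                 // used below but missing from the original snippet
  vector<string> ValRow;

  // Load all the file names into ListOfFiles

  for (size_t i = 0; i < ListOfFiles.size(); i++)
    {
      ifstream In_Flows(ListOfFiles[i].c_str());
      if (!In_Flows)            // file can't be opened: skip it
        {
          cerr << "Cannot open " << ListOfFiles[i] << ", skipping\n";
          continue;
        }
      while (getline(In_Flows, Line))
        {
          istringstream linestream(Line);
          ValRow.clear();
          while (getline(linestream, Value, ','))
            { ValRow.push_back(Value); }
          // Load contents of ValRow into objects here, once per line:
          // ValRow is cleared for every line, so after this loop it
          // would only hold the last line of the file.
        }
    }
}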

GDB Output:
Program received signal SIGSEGV, Segmentation fault.
0xff056b20 in realfree () from /lib/libc.so.1
(gdb) bt
#0 0xff056b20 in realfree () from /lib/libc.so.1
#1 0xff0573d4 in cleanfree () from /lib/libc.so.1
#2 0xff05652c in _malloc_unlocked () from /lib/libc.so.1
#3 0xff05641c in malloc () from /lib/libc.so.1
#4 0xff337734 in operator new () from /usr/local/lib/libstdc++.so.6
#5 0xff318fe4 in std::string::_Rep::_S_create ()
from /usr/local/lib/libstdc++.so.6
#6 0xff3196e0 in std::string::_M_mutate () from /usr/local/lib/libstdc++.so.6
#7 0xff30bd28 in std::getline<char, std::char_traits<char>, std::allocator<char> > () from /usr/local/lib/libstdc++.so.6
#8 0x000199ac in ReadTheFile (PtrFlowInfoFile=0xffbffc48,
PtrPrefixInfoFile=0xffbffc38, PtrRouterObjLibrary=0x41808, ATAFlag=true)
at ReadTheFile.h:119
#9 0x0001a2f0 in main (argc=2, argv=0xffbffccc) at Main.cpp:60
(gdb) up
#1 0xff0573d4 in cleanfree () from /lib/libc.so.1
(gdb) up
#2 0xff05652c in _malloc_unlocked () from /lib/libc.so.1
(gdb) up
#3 0xff05641c in malloc () from /lib/libc.so.1
(gdb) up
#4 0xff337734 in operator new () from /usr/local/lib/libstdc++.so.6
(gdb) up
#5 0xff318fe4 in std::string::_Rep::_S_create ()
from /usr/local/lib/libstdc++.so.6
(gdb) up
#6 0xff3196e0 in std::string::_M_mutate () from /usr/local/lib/libstdc++.so.6
(gdb) up
#7 0xff30bd28 in std::getline<char, std::char_traits<char>, std::allocator<char> > () from /usr/local/lib/libstdc++.so.6
(gdb) up
#8 0x000199ac in ReadTheFile (PtrFlowInfoFile=0xffbffc48,
PtrPrefixInfoFile=0xffbffc38, PtrRouterObjLibrary=0x41808, Flag=true)
at ReadTheFile.h:119
119 while(getline(linestream, Value, ','))
(gdb) print Line
$1 = {static npos = 4294967295,
_M_dataplus = {<std::allocator<char>> = {<__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>},
_M_p = 0xd756c "DataPoint0,DataPoint1,DataPoint2,DataPoint3,DataPoint4,DataPoint5,DataPoint6,DataPoint7,DataPoint8,DataPoint9"}}
(gdb)

iamthwee:

There MUST be something wrong with that line, some corruption that may be invisible to the human eye. Try running that file through a hex editor?
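If a hex editor isn't handy, the program can dump the suspicious line itself. The sketch below is minimal, and HexDump is a hypothetical helper. It prints every byte as two hex digits, which makes invisible characters stand out. Examples include a stray carriage return (0d) from a Windows-style line ending, a tab (09), or a NUL (00).

```cpp
#include <cstdio>
#include <string>

// Returns the bytes of `s` as space-separated two-digit hex values,
// so characters that don't show up in a normal print become visible.
std::string HexDump(const std::string& s)
{
    std::string out;
    char buf[4];
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        unsigned char c = static_cast<unsigned char>(s[i]);
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(c));
        if (!out.empty())
            out += ' ';
        out += buf;
    }
    return out;
}
```

For example, HexDump("A,\r") gives "41 2c 0d". You could print HexDump(Line) to std::cerr just before the line is parsed.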

Don't quote me on this, but I've seen a try/catch statement in C++; it may be an option.
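C++ does have try/catch, but it only catches C++ exceptions raised with throw. A seg fault is a signal from the operating system (SIGSEGV), not a C++ exception, so no catch block will intercept the crash in the backtrace above. A crash inside malloc()/realfree() is also commonly a sign that the heap was damaged earlier by an out-of-bounds write or a double delete.

What try/catch can do is let you skip a line when a conversion or a bounds-checked access fails. The sketch below is minimal and assumes C++11 (for std::stod) and that the fields should be numeric. ParseNumericLine is a hypothetical name.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Converts the first `expected` fields to doubles. std::stod throws
// std::invalid_argument for non-numeric text and std::out_of_range for
// values that don't fit in a double. vector::at() throws std::out_of_range
// instead of reading past the end the way operator[] would.
// Returns false if the line should be skipped.
bool ParseNumericLine(const std::vector<std::string>& fields,
                      std::size_t expected, std::vector<double>& out)
{
    out.clear();
    try
    {
        for (std::size_t i = 0; i < expected; ++i)
            out.push_back(std::stod(fields.at(i)));
    }
    catch (const std::exception&)   // invalid_argument or out_of_range
    {
        out.clear();
        return false;
    }
    return true;
}
```

std::stod accepts a numeric prefix, so "1.5abc" converts to 1.5 without throwing. Stricter checking would need the pos argument of std::stod.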

HmmMMMmmmm... Okay, that's good to know. I don't have any experience with hex editors, but I suppose today's a good day to learn. :)

I'll look into that catch{} statement...

iamthwee:

Another thing: if you're pushing the contents of ~500 files into one vector, it may be a memory issue... but eliminate that problematic line first.

Just read that ONE file and see if the same error flags up when your program gets to that line. Cool cool.
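The sketch below is a small harness for that test, and ScanFile is a hypothetical name. It reads one file with the same getline-on-comma splitting as the original code and logs the line number and field count for each line. Every log line is flushed, so the last line completed is visible even if the program dies.

```cpp
#include <fstream>
#include <ostream>
#include <sstream>
#include <string>

// Reads `path` line by line, splits each line on commas, and logs
// "line N: K fields". Returns the number of lines read, or -1 if the
// file can't be opened.
long ScanFile(const std::string& path, std::ostream& log)
{
    std::ifstream in(path.c_str());
    if (!in)
        return -1;
    std::string line, value;
    long lineNo = 0;
    while (std::getline(in, line))
    {
        ++lineNo;
        std::istringstream linestream(line);
        int fields = 0;
        while (std::getline(linestream, value, ','))
            ++fields;
        // Flush after every line so the last line fully processed is
        // visible even if the program crashes right after it.
        log << "line " << lineNo << ": " << fields << " fields\n" << std::flush;
    }
    return lineNo;
}
```

Call it as ScanFile("file501.csv", std::cerr), substituting the real path. If it runs cleanly on the file that crashed the full program, that is a strong hint the line itself is fine and memory was damaged earlier in the full run.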

Awesome, thanks a million! I think you're right, I think this must be a memory problem. I'll retool the code to make sure I'm not littering the stack...
Many thanks!

Be a part of the DaniWeb community
