I am working on a project in which I need to process a text file.
I have already read the file in, and now I want to break it up. The text file looks like this:
>Name 1
ABCDEF
GHIJKLM
>Name2
GHIJKLM
What I want is to store each name and the sequence that follows it separately. For instance, Name[0] = Name1 and Name[1] = Name2; Letters[0] = ABCDEF and Letters[1] = GHIJKLM.
I have done this in Java using StringTokenizer, but from what I have read, there is no equivalent tokenizer in C++.
So far I have read the entire contents of the text file into a buffer and split that buffer into two parts, one per name. Now I need to separate each name from its set of letters. Here is what I have so far:
void processFile() {
    string contents;
    string fileName;
    cout << "Enter the file name: ";
    getline(cin, fileName);
    // Open the file
    ifstream file(fileName.c_str()); // might want to add binary mode here
    // Read the contents of the file into a string
    stringstream buffer;
    buffer << file.rdbuf();
    contents = buffer.str(); // entire file
    // Close the file
    file.close();
    // Use the Tokenize function to get the name+sequence sets
    // Will store the tokens, one per name+sequence set
    vector<string> sets;
    // Get the sets (name+sequence)
    Tokenize(contents, sets, ">");
    // Stores the names and sequences after splitting each set
    vector<string> dna;
    // Split each set into lines
    for (size_t x = 0; x < sets.size(); x++) {
        Tokenize(sets[x], dna, "\n");
    }
    // Store the names (assumed to be at the even indices of dna)
    for (size_t i = 0; i < dna.size(); i += 2) {
        names.push_back(dna[i]);
    }
    // Store each sequence (assumed to be at the odd indices of dna)
    for (size_t j = 1; j < dna.size(); j += 2) {
        sequences.push_back(dna[j]);
    }
} // End processFile
void Tokenize(const string& str, vector<string>& tokens, string del)
{
    // Skip delimiters at the beginning: lastPos is the start of the first token.
    string::size_type lastPos = str.find_first_not_of(del, 0);
    // Find the first delimiter after that token: pos is the end of the token.
    string::size_type pos = str.find_first_of(del, lastPos);
    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters to reach the start of the next token. Note the "not_of".
        lastPos = str.find_first_not_of(del, pos);
        // Find the delimiter that ends the next token.
        pos = str.find_first_of(del, lastPos);
    }
}
The problem is with separating the names from the sets of letters. I split the string at each occurrence of ">". After that step, the names and sequences do not come out correctly.