Hi everyone,
First post here, so feel free to school me if my etiquette needs it.
I'm reading a very long string of data (genome stuff: AT TT GT AA AG AA ... etc.) and comparing it to a similar string using the Hamming distance (naively checking for similarity entry by entry).
I have quite a few questions to put to you, but let's begin with one that will probably demonstrate my level of C++.
My code reads as follows:
#include &lt;cstdlib&gt;
#include &lt;fstream&gt;
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;vector&gt;
using namespace std;

int main () {
    const int N = 899701;                 // pairs per string
    // vectors live on the heap; string str[2][899701] would overflow the stack
    vector&lt;vector&lt;string&gt;&gt; str(2, vector&lt;string&gt;(N));
    ifstream indata("file");
    if (!indata) {
        cerr << "Error: file could not be opened" << endl;
        exit(1);
    }
    for (int j = 0; j < N; j++)
        indata >> str[0][j];              // read in base line
    while (indata) {
        int count = 0;
        for (int j = 0; j < N && indata >> str[1][j]; j++)
            count += (str[0][j] != str[1][j]);  // Hamming distance counts mismatches
        if (indata)
            cout << count << endl;        // report the distance for this string
    }
    indata.close();
    return 0;
}
My first question: can I replace
indata >> str[1][j]; //read in line to be compared
count = count + (str[0][j] != str[1][j]); //Hamming distance
with something like
count = count + (str[0][j] != indata.get());
which would combine the two steps? And if this is possible, would it significantly speed things up? I have >5000 strings to compare, and each has ~800,000 pairs of letters.
I am also open to any suggestions about different approaches to take here. Is C++ a good choice for this kind of problem?
Any help/suggestions/scathing criticism is much appreciated.
Peace.