Hi everybody. I am here again to ask for your advice. Thank you in advance for your attention.
I have a big text file containing, say, more than 10000 characters. I want to write a program that opens it, parses out each word, compares each word with a target word, and counts how often the target word appears. After trying for a whole morning I have written something that crashes immediately after running. Pretty annoying and frustrating.
The key problem is that I don't know exactly how to read a text file into a buffer of unknown size and handle it. Below is just one of the versions I have already tried. It doesn't work.
//open the file and find the word "Bacteria"
#include <iostream>
#include <string.h>
#include <ctype.h>
#include <fstream>
#include "getline.h"
using namespace std;

char *tempFile="60_10.out"; //define the temp file, global variable
bool GetWord();
char * word; //hold the word

int main()
{
    string Target="Bacteria";
    while (GetWord())
    {
        if (!Target.compare(word))
        {
            cout<<"The word "<<Target<<" is found!\n";
        }
        else
        {
            cout<<"Nop, the word "<<Target<<" is not there.\n";
        }
    }
    return 0;
}

bool GetWord()
{
    char Buffer[256];   //reading the file in memory
    int wordOffset=0;   //start at the beginning
    getline(tempFile,Buffer);
    if (Buffer[wordOffset]==0)   //end of the string?
        return false;
    char *p1, *p2;
    p1=p2=Buffer+wordOffset;   //point to the next word
    //eat leading spaces
    for (int i=0; i<(int)strlen(p1) && !isalnum(p1[0]);i++)
        p1++;   //isalnum() is nonzero for letters and digits, 0 otherwise
    //see if you have a word
    if (!isalnum(p1[0]) && p1[0]!='.')
        return false;
    //p1 now points to start of the next word
    //point p2 there as well
    p2=p1;
    //march p2 to the end of word
    while (isalnum(p2[0]))
    {
        p2++;
    }
    //p2 is now at end of the word
    //p1 is at beginning of word
    //length of word is the difference
    int len=int(p2-p1);
    //copy the word into the buffer
    strncpy(word,p1,len);
    //null terminate it
    word[len]='\0';
    //now find the beginning of the next word
    for (int j=int(p2-Buffer);j<(int)strlen(Buffer) && !isalnum(p2[0]);j++)
    {
        p2++;
    }
    wordOffset=int(p2-Buffer);
    return true;
}
The input file 60_10.out looks something like this:
../fc1f02091-es_om1328.r9t_1_3 Bacteria_Cyanobacteria_Prochlorales_Prochlorococcus:marinus
fcb203-1k12.ff40_b1.SCF_-1_-2 Bacteria_Proteobacteria_Deltaproteobacteria_Desulfuromonadales_Geobacter_Geobacter:uraniumreducens
fcb205-1l2.rf40_b1.SCF_1_3 Bacteria_Planctomycetes_Planctomycetacia_PlanctomycetalesCandidatus_Kuenenia_Candidatus:Kuenenia
anke5gh01-es_ot7.s9t_1_2 Bacteria_Firmicutes_Clostridia
anke5gi01-es_ot7.s9t_1_2 Bacteria_Firmicutes_Clostridia
anke5ca06-es_ot7.s9t_1_3 Bacteria_Firmicutes_Clostridia
fcb205-1a17.ff40_b1.SCF_1_2 Bacteria_Proteobacteria_Deltaproteobacteria_Desulfobacterales_Desulfococcus_Candidatus:Desulfococcus
I have no idea why my program keeps crashing.
Thank you once more for your help.