Hello,

I am trying to teach myself C++ and am currently studying file I/O. What I am trying to do is this (and I am sure you have heard this before): open a text file and output each word with the number of occurrences. For example, if the file contains the phrase "the word the is used twice in this sentence", the output would look like this:

WORD QTY
the 2
word 1
is 1

etc, etc...

Now, finding the number of words is simply a matter of counting the whitespace characters and adding 1 (as long as the words are separated by single spaces).

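#include <cctype>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

int main()
{
  string fileName;
  cout << "Please enter filename: ";
  cin >> fileName;
  cout << endl;

  ifstream inClientFile(fileName.c_str());

  if (!inClientFile)
  {
    cerr << "File could not be opened!" << endl;
    exit(1);
  }

  char ch;
  int words = 0;
  // Test the read itself; looping on !eof() processes the last character twice.
  while (inClientFile.get(ch))
  {
    // Cast to unsigned char so isspace() is well defined for every byte value.
    if (isspace(static_cast<unsigned char>(ch)))
      words++;

    cout << ch;
  }

  cout << endl;
  words++; // assumes single spaces between words and no trailing whitespace
  cout << "There are " << words << " words in this sentence.";
  inClientFile.close();
  return 0;
}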

Obviously, a whitespace signifies that the characters before it, and after the last whitespace, are a word. But how do you pull out what is between the whitespaces? What is the logic here? Thanks...

l0g0rrhea

One way might be to use whitespace-delimited input functions.

#include <iostream>
#include <fstream>
#include <string>
using namespace std;

int main()
{
   ifstream file("file.txt");
   string word;
   while ( file >> word )
   {
      cout << word << '\n';
   }
   return 0;
}

/* file.txt
All work and no play makes Jack a dull boy
*/

/* my output
All
work
and
no
play
makes
Jack
a
dull
boy
*/

Thanks for the response. This helps a lot. I added to what you suggested and removed the periods as well (since they are not words).

while ( inClientFile >> word )
   {
     string::size_type position = word.rfind('.'); // find the last period
     while ( position != string::npos )
     {
       word.erase(position, 1); // remove only the period, not the rest of the word
       position = word.rfind('.'); // look for any remaining period
     }
     cout << word << "\n";
   }
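
Another way to drop the periods is the erase-remove idiom from <algorithm>, which removes every matching character in a single pass. This is only a minimal sketch; the helper name stripPeriods is made up for illustration and does not come from the thread.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from the thread): returns a copy of `word`
// with every '.' character removed.
std::string stripPeriods(std::string word)
{
    // std::remove shifts the kept characters to the front and returns an
    // iterator to the new logical end; erase() then trims the leftover tail.
    word.erase(std::remove(word.begin(), word.end(), '.'), word.end());
    return word;
}
```

Inside the reading loop this could be used as word = stripPeriods(word); before printing or counting the word.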

I am still stuck on how to count the number of occurrences of each word. I thought about taking each word and placing it in a character array, but I am not sure that would work out too well. I also thought about placing the current string into a temp variable and then comparing that variable to the next string. However, that would only tell me whether two consecutive words are the same, which is worthless for what I am trying to do.

Member Avatar for iamthwee

The std::map would appear perfect for your particular problem. Have you thought about using that?

[note to dani] where did the backward slashes go?[/end note] \

#include <iostream> 
#include <map> 
#include <string> 
#include <fstream>


int main() 
{ 
        std::ifstream in ("file.txt");
        
        std::map < std::string, int > count; 
        std::string s; 
        
        while( in >> s )
        {
            ++count[s]; 
        }  
        in.close();
        
        std::map <std::string, int>::iterator it; 
        for (it = count.begin(); it != count.end(); ++it) 
        std::cout << it->first << "  "<< it->second << std::endl; 
        
        std::cin.get();
}

Since I have never even heard of this, no! What does this do exactly? As I mentioned initially, I am just learning C++.

Since you have never heard of it, it may seem somewhat peculiar. However, the STL (Standard Template Library) is there to make things easier for you.

The std::map, when used with the example program I have given, should count the frequency of each word in your file.

If you are unsure of how functions in the STL work you can always google for it. On the other hand, you may want to try and do this by yourself.

Thanks for your help. I will have to read up on this. I do have a
question about the way this sorts, though. It sorts in
alphabetical order, but what if I wanted to sort by number of
occurrences instead? That is to say, if I have the phrase "this is a
a stupid sentence" in my text file, how do I change this code...

while( inClientFile >> word )
        {
            ++count[word]; 
        }  
       inClientFile.close();
        
        map <string, int>::iterator it; 
        for (it = count.begin(); it != count.end(); ++it) 
          cout << it->first << " " << it->second << endl;

so that it outputs with the word that has the highest number of
occurrences on top, with lesser occurrences following?
a 2
is 1
sentence 1
stupid 1
this 1
Does that make sense?
l0g0rrhea

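Yes, you can sort by occurrences instead of alphabetical order.

std::map always keeps its entries ordered by key and has no function to reorder them by value, but you could easily write your own: copy the entries into a std::vector and sort that by count.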
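
One possible way to write it yourself is sketched below: copy the map's (word, count) pairs into a std::vector and sort that with a custom comparison. The names WordCount, moreFrequent and sortByCount are invented for this example, and breaking ties alphabetically is an assumption chosen to match the output asked for above.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias for one (word, count) entry.
typedef std::pair<std::string, int> WordCount;

// Orders by count, highest first; ties fall back to alphabetical order.
bool moreFrequent(const WordCount& a, const WordCount& b)
{
    if (a.second != b.second)
        return a.second > b.second;
    return a.first < b.first;
}

// Copies the map's entries into a vector and sorts them by frequency.
std::vector<WordCount> sortByCount(const std::map<std::string, int>& count)
{
    std::vector<WordCount> sorted(count.begin(), count.end());
    std::sort(sorted.begin(), sorted.end(), moreFrequent);
    return sorted;
}
```

The returned vector can then be printed with an ordinary loop over its elements, writing first and second for each entry just as in the map version.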
