Hello everyone!

I am in need of some help with a homework assignment. It's basically creating a file indexer using C++.

I have 2 separate files that I need to read in: one is a text document and the other is a Skip words key. I am incredibly confused about how to approach this.

I know how to read both of them in and place them into separate containers (I plan on using vectors), but I just don't know how to combine them so that words in the document that match the skip words won't be displayed (do I combine them into a map?).

I then need to display words that weren't in the skipWords doc in alphabetical order (I plan on using a vector and just sorting them) and output each location they appear.

Here is the doc file:

The quick brown fox
jumped over the lazy blue
fox. I can not believe I wrote such
a common phrase.
<newpage>
Where or where are you tonight?
Why did you leave me here all

alone?
<newpage>
I searched the world over
and thought I found true love.

Here is the skipWords file:

why
are
did
here
i
not
me
a
or
you
such
where
and
the

This is the code I have so far:

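#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>   // istream_iterator, back_inserter
#include <cstdlib>    // system, exit, EXIT_FAILURE
#include "Indexer.h"

using namespace std;

// The typedefs must come after the using-directive so vector and string resolve.
typedef vector<string> docText;
typedef vector<string> skipWords;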

void insertDocVector()
{
    string fileName;
    docText docText;

    cout << "Please enter a document file name: ";
    cin >> fileName;

    ifstream inFile(fileName.c_str(), ios::in);

    if (!inFile)
    {
        cerr << "Error opening file '" << fileName << '\'' << endl;
        system("pause");
        exit(EXIT_FAILURE);
    }

    copy(istream_iterator<string>(inFile),
        istream_iterator<string>(),
        back_inserter(docText));

}

void insertSkipVector()
{
    skipWords toSkip;
    string fileName;

    cout << "Please enter a skip-words file  name: ";
    cin >> fileName;

    ifstream inFile(fileName.c_str(), ios::in);

    if (!inFile)
    {
        cerr << "Error opening file '" << fileName << '\'' << endl;
        system("pause");
        exit(EXIT_FAILURE);
    }

    copy(istream_iterator<string>(inFile),
        istream_iterator<string>(),
        back_inserter(toSkip));
}

int main()
{
    cout << "\t\t\t\t***The Indexer***\n\n";

    insertDocVector();
    cout << endl;
    insertSkipVector();
    cout << endl;



    system("pause");
    return 0;
}

The Indexer header is just going to announce my prototypes and that's it. I don't intend on using any constructors. All I have is just reading the files in and placing them into separate vectors.

Can someone offer a stepping stone? I'm good at figuring it out if people just ask me the right questions :) I'm not asking for code, just guidance. Thank you so much in advance!

Once you have the two vectors, one option is to iterate through the doc vector and use std::find to check whether each word is in the skip vector, then remove it or replace it with placeholder characters if it is.
You will probably need to convert each word to lower case before calling std::find.
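
In case it helps to see the std::find check spelled out, here is a minimal sketch. The helper names toLower and isSkipWord are made up for illustration, and it assumes the skip-words vector is already stored in lower case, as it is in the sample file.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns a lower-cased copy of a word.
std::string toLower(std::string word)
{
    std::transform(word.begin(), word.end(), word.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return word;
}

// Hypothetical helper: true if the word, ignoring case, appears in skipWords.
// Assumes skipWords already holds lower-case entries.
bool isSkipWord(const std::vector<std::string>& skipWords, const std::string& word)
{
    return std::find(skipWords.begin(), skipWords.end(), toLower(word)) != skipWords.end();
}
```

You would call isSkipWord(toSkip, word) for each word in the doc vector and keep only the words for which it returns false. std::find is a linear search, which should be fine for a skip list this small.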

Thank you so much for that advice! So would I use something like this:

vector<string> docText;
typedef vector<string> skipWords;

docText docText;
skipWords toSkip;

string toBeRemoved = toSkip;
vec.erase(remove(vec.begin(), vec.end(), toSkip), vec.end());

?

Something like this would work:

vector<string> doc;
vector<string> skipWords;

// Fill doc and skipWords from their files. When doing so, convert each word to lower case and remove any punctuation.

// C++11
for (const auto& word : skipWords)
    doc.erase(remove(doc.begin(), doc.end(), word), doc.end());
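
The clean-up step mentioned in the comment could look roughly like the sketch below. The names normalize and removeSkipWords are hypothetical, and swapping the skip vector for a std::set is my own assumption, made so the removal can be done in a single pass with std::remove_if.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: lower-cases a word and drops punctuation,
// so "fox." becomes "fox" and "Why" becomes "why".
std::string normalize(const std::string& raw)
{
    std::string out;
    for (unsigned char c : raw)
        if (!std::ispunct(c))
            out += static_cast<char>(std::tolower(c));
    return out;
}

// Hypothetical helper: removes every word found in skipWords from doc in one pass.
// Assumes both containers already hold normalized words.
void removeSkipWords(std::vector<std::string>& doc, const std::set<std::string>& skipWords)
{
    doc.erase(std::remove_if(doc.begin(), doc.end(),
                             [&skipWords](const std::string& w) { return skipWords.count(w) > 0; }),
              doc.end());
}
```

Note that normalize would also turn the <newpage> marker into "newpage", so check for the marker before normalizing if you need it later.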

Thank you for the response!

Do you think you could help me figure out one for the line, page, and word count?

I figured something sort of like this:

    int lineCount = 1;
    int pageCount = 1;
    int wordCount;

     while (!fin.eof())
      {
        string buf[MAX_CHARS_PER_LINE];
        fin.getline(buf, MAX_CHARS_PER_LINE);
        lineCount++;

        if(
        const char* token[MAX_TOKENS_PER_LINE] = {}; 

        token[0] = strtok(buf, DELIMITER); 
        if (token[0])
        {
          for (int wordCount = 1; n < MAX_TOKENS_PER_LINE; wordCount++)
          {
            token[wordCount] = strtok(0, DELIMITER); 
            if (!token[wordCount]) break;
          }
        }

I'm not sure if this is anywhere near the right way to do it... I have to separate and output what's left of the doc vector in alphabetical order (which I would use sort() for), then display where the word is located each time it appears.

You can do this much more easily with the STL. Here is how I normally read a file by line and separate it by word:

// Needs <fstream>, <sstream>, <string> and <vector>
std::ifstream fin("file.txt");
std::vector<std::string> docWords;

std::string line;
// Get each line from the file. Stops when there are no more lines.
while (std::getline(fin, line))
{
    std::stringstream ss;
    // Load the line into the string stream
    ss << line;
    std::string temp;
    // Get each word out of ss using >>, since the words are separated by whitespace
    while (ss >> temp)
    {
        docWords.push_back(temp);
        temp.clear();
    }
}

Each pass through the outer while loop handles one line of the file, so that is where you would increment a line counter. To get the number of words, you only need to call the vector's size() function, since each element in the vector is a word.
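
To tie the line-by-line loop to the page and line locations the assignment asks for, here is one possible sketch. Location and buildIndex are hypothetical names, and it assumes that <newpage> sits alone on its own line, that it resets the line number to 1, and that blank lines still count as lines. Check those assumptions against your assignment.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of where a word occurs.
struct Location
{
    int page;
    int line;
};

// Hypothetical helper: builds a word -> locations index from any input stream.
// Assumes "<newpage>" is alone on its line and restarts line numbering.
std::map<std::string, std::vector<Location>>
buildIndex(std::istream& in, const std::set<std::string>& skipWords)
{
    std::map<std::string, std::vector<Location>> index;
    int page = 1;
    int lineNo = 0;
    std::string line;
    while (std::getline(in, line))
    {
        if (line == "<newpage>")
        {
            ++page;
            lineNo = 0;
            continue;
        }
        ++lineNo; // blank lines are counted too (assumption)
        std::istringstream ss(line);
        std::string raw;
        while (ss >> raw)
        {
            // Lower-case the word and drop punctuation before looking it up.
            std::string word;
            for (unsigned char c : raw)
                if (!std::ispunct(c))
                    word += static_cast<char>(std::tolower(c));
            if (!word.empty() && skipWords.count(word) == 0)
                index[word].push_back({page, lineNo});
        }
    }
    return index;
}
```

Because std::map keeps its keys sorted, looping over the returned index visits the words in alphabetical order, so no separate sort() call is needed. You can pass an std::ifstream opened on the document file as the first argument.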
