I am trying to write a lexical analyzer program that will read input from a file and produce formatted output. I am stuck and need help. I am fairly sure I am not passing the file stream and the function parameters correctly. Note that some of the functions are only stubs so I can check that they accept the right parameters; I will come back later and write the guts of the program.

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

//function that scans the file, accepts the file stream and creates the lexeme
void gToken(fstream &fin, char lexeme);

//function that writes the output, accepts the lexeme as input
char wToken(char &lexeme);


int main()
{
    char filename[20];
    ifstream fin(filename); //input data file
        
    char q;
    
    
    string lexeme[20];
        
    cout << "Enter the name of the file you wish to run a lexical analysis on. "
            "If you type 'q', the program will terminate. "<<endl;
    cin >> filename;
    
    if(filename == "q")
    {
      cout <<"You have chosen to quit the program."<<endl;
      system("PAUSE");
      return EXIT_SUCCESS;
    }
    else
    {
      fin.open(filename);
    
        if(!fin.bad()) //checks to see if the file will open
        {
          std::cerr <<"Error. Could not open the datafile "<< filename  <<endl;
          exit(8);
        }
    }
    gToken(fin,lexeme); 
    fin.close();
    system("PAUSE");
    return EXIT_SUCCESS;
}

//reads characters from the file beginning at the first character and stops
//at spaces.  Writes everything into the lexeme
void gToken(fstream &fin, char lexeme)
{
  char ch;
  
  while (fin.open)
  {
       while(cin.get(ch) == ' ' || == '/t')//still working on the logic of this one
        {
          ch >> lexeme;  
          cout<<"the lexeme read is " <<lexeme<<endl;
          cout<<"this is part of the getToken function" <<endl;
        }
        
        
  }
  lexeme = wToken(lexeme); //is this even correct?

}

//takes the lexeme and matches it according to rules of the language
//rules not implemented yet
char writeToken(char &lexeme) 
{
  cout<<"the lexeme is "<< lexeme <<;
  cout<<"this is the output from writetoken"<<endl;
}

One of the things you are doing correctly is passing the file handle to the function. However, in main( ) you're making some mistakes with the filename variable.

Line 18 - you're declaring the filestream and initializing it with the filename variable, but that variable has no value yet; you don't get an actual name from the user until later. So just declare fin and leave it at that.

Line 29 - you compare the input filename (a C-style string) with the literal string "q" using the equality operator. That doesn't work: on char arrays, == compares pointers, not contents. Either use a strcmp( ) function (several variants to choose from) or create filename as a string type instead of a char array. (Then you'll need the c_str( ) method when you open the file.)

Line 9 - your parameter for lexeme is a char type, but in main( ) lexeme is an array of strings. Pick one.

That's enough for now. You should have gotten many compiler errors from this; pay attention to them. Fix what you can and post your correction. Please also explain what it is you're trying to do in gToken( ).
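To make the Line 29 point concrete, here is a small untested sketch of both fixes (the function names are mine, purely for illustration):

```cpp
#include <cstring>
#include <string>

// Comparing a char array with == compares pointer values, not the text,
// so filename == "q" never does what you want. Two working alternatives:

// 1. Keep the char array and use strcmp(), which returns 0 on a match.
bool isQuitCString(const char* filename) {
    return std::strcmp(filename, "q") == 0;
}

// 2. Make filename a std::string, where == really compares the characters.
//    (Pre-C++11 you then pass filename.c_str() to fin.open().)
bool isQuitString(const std::string& filename) {
    return filename == "q";
}
```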

I have fixed most of the errors but still have two left in the gToken function. gToken is supposed to read the input file character by character and put everything into the string lexeme until it comes to a space or the end of the line. It then starts reading from where it stopped until it comes to the next space. I have tried using fin.get(ch) to read by character but I do not know how to make it handle the spaces.

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

//function that scans the file, accepts the file stream and creates the lexeme
void gToken(ifstream &fin, string lexeme[21]);

//function that writes the output, accepts the lexeme as input
void wToken(string &lexeme);


int main()
{
    char filename[20];
    ifstream fin; //file stream object
        
    char q;
    
    
    string lexeme[21];
        
    cout << "Enter the name of the file you wish to run a lexical analysis on. "
            "If you type 'q', the program will terminate. "<<endl;
    cin >> filename;
    
    if(strcmp(filename, "q"))
    {
      cout <<"You have chosen to quit the program."<<endl;
      system("PAUSE");
      return EXIT_SUCCESS;
    }
    else
    {
      fin.open(filename);
    
        if(!fin.bad()) //checks to see if the file will open
        {
          std::cerr <<"Error. Could not open the datafile "<< filename  <<endl;
          exit(8);
        }
    }
    gToken(fin, lexeme); 
    //writeToken(lexeme);
    fin.close();
    system("PAUSE");
    return EXIT_SUCCESS;
}

//reads characters from the file beginning at the first character and stops
//at spaces.  Writes everything into the lexeme
void gToken(ifstream &fin, string lexeme)
{
  char ch;
  
  while (!fin.eof())
  {
       while(fin.get(ch) != '\n' || fin.get(ch) != '\s') //still working on the logic of this one
        {
          ch >> lexeme;  
          cout<<"the lexeme read is " <<lexeme<<endl;
          cout<<"this is part of the getToken function" <<endl;
          wToken(lexeme);
        }
        
  }
  

}

//takes the lexeme and matches it according to rules of the language
//rules not implemented yet
void wToken(string &lexeme) 
{
  cout<<"the lexeme is "<< lexeme <<endl;
  cout<<"this is the output from writetoken"<<endl;
}

Please note the function wToken is just a place holder at this time. It will be done at a later date.

I think this will work. WARNING: code snippet not tested.

char ch;  
string lexeme = ""; 

//while a char is found
while (fin >> ch)
{  
   //concatenate the char onto lexeme   
   lexeme += ch;

   //get all the char until the next whitespace char or eof is found
   while(fin.get(ch))
   {
      if(ch != ' ' && ch != '\n') 
     {          
        lexeme += ch;
     }
     else  //whitespace char found
     {
        //display lexeme
        cout << lexeme << endl;

        //reset lexeme to empty
        lexeme = "";

        //break out of this loop 
        break;
      }
   }
}

Instead of checking for every whitespace character yourself, you could pass ch to the isspace() function. BTW, this logic doesn't handle leading whitespace; if you need that, it takes a little something extra.

I am trying to test the code snippet Lerner created but when I compile I get the following error....

[Linker error] undefined reference to `gToken(std::basic_ifstream<char, std::char_traits<char> >&, std::string*)'
ld returned 1 exit status
C:\c-- lexical analyzer\Makefile.win [Build Error] ["c--] Error 1

Can anyone tell me just what this error means so I will know how to fix it.

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

//function that scans the file, accepts the file stream and creates the lexeme
void gToken(ifstream &fin, string lexeme[21]);

//function that writes the output, accepts the lexeme as input
void wToken(string &lexeme);


int main()
{
    char filename[20];
    ifstream fin; //file stream object
        
    char q;
    string lexeme[21];
        
    cout << "Enter the name of the file you wish to run a lexical analysis on. "
            "If you type 'q', the program will terminate. "<<endl;
    cin >> filename;
    
    if(strcmp(filename, "q"))
    {
      cout <<"You have chosen to quit the program."<<endl;
      system("PAUSE");
      return EXIT_SUCCESS;
    }
    else
    {
      fin.open(filename);
    
        if(!fin.bad()) //checks to see if the file will open
        {
          std::cerr <<"Error. Could not open the datafile "<< filename  <<endl;
          exit(8);
        }
    }
    
    gToken(fin, lexeme); 
    fin.close();
    system("PAUSE");
    return EXIT_SUCCESS;
}

//reads characters from the file beginning at the first character and stops
//at spaces.  Writes everything into the lexeme
void gToken(ifstream &fin, string lexeme)
{
   char ch;  
   lexeme = " "; 

   //while a char is found
   while (fin >> ch)
   {  
     //concatenate the char onto lexeme   
     lexeme += ch;

     //get all the char until the next whitespace char or eof is found
     while(fin.get(ch))
     {
       if(ch != ' ' && ch != '\n') 
       {          
         lexeme += ch;
         wToken(lexeme);
       }
       else  //whitespace char found
        {
          //display lexeme
          cout << lexeme << endl;
          //reset lexeme to empty
          lexeme = " ";
          //break out of this loop 
          break;
        }
      }
    }             
}        
//takes the lexeme and matches it according to rules of the language
//rules not implemented yet
void wToken(string &lexeme) 
{
  cout<<"the lexeme is "<< lexeme <<endl;
  cout<<"this is the output from writetoken"<<endl;
}


Your function prototype on line 10 and your function call on line 45 refer to an array of strings. The function itself on line 53 wants a single string, not an array of strings, so there is a mismatch.

I have made some changes to the program but am having problems manipulating the string. For the test, I used code from O'Reilly's C++ in a Nutshell to try to determine whether the string is a floating-point number. I am certain that is where the problem is. What am I doing wrong in the code?

For string literals and character literals (a single character), I only need to match the opening " or ' with a closing " or '. I have seen many string functions (find, rfind, find_first_of, find_last_of) but am not sure which to use or how.

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

//function that scans the file, accepts the file stream and creates the lexeme
void gToken(ifstream &fin, string lexeme);

//function that writes the output, accepts the lexeme as input
void wToken();


int main()
{
    char filename[30];
    ifstream fin; //file stream object
        
    char q;
    string lexeme;
        
    cout << "Enter the name of the file you wish to run a lexical analysis on. "
            "If you type 'q', the program will terminate. "<<endl;
    cin >> filename;
    
    if(strcmp(filename, "q")== 0)
    {
      cout <<"You have chosen to quit the program."<<endl;
      system("PAUSE");
      return EXIT_SUCCESS;
    }
      //else if(fin.open(filename));
      //{
       //  if(!fin.bad()) //checks to see if the file will open
       //  {
       //    std::cerr <<"Error. Could not open the datafile "<< filename  <<endl;
       //    exit(8);
       //  }
      //}
         else(fin.open(filename));
         {
           wToken();
           gToken(fin, lexeme); 
           fin.close();
           system("PAUSE");
           return EXIT_SUCCESS;
         }
}

//reads characters from the file beginning at the first character and stops
//at spaces.  Writes everything into the string lexeme and begins tokenizing.
void gToken(ifstream &fin, string lexeme)
{
   // the following strings contain all of the characters that will make up the 
   // language
   const string lower("abcdefghijklmnopqrstuvwxyz");
   const string upper("ABCDEFGHIJKLMNOPQRSTUVWXYZ");
   const string letters = lower + upper + '_';
   const string digits("0123456789");
   const string identifiers = letters + digits;
   
   char ch;  
   lexeme = " "; 

   //while a char is found
   while (fin >> ch)
   {  
     //concatenate the char onto lexeme   
     lexeme += ch;

     //get all the char until the next whitespace char or eof is found
     while(fin.get(ch))
     {
       if(ch != ' ' && ch != '\n') 
       {          
         lexeme += ch;
         
       }
       else  //whitespace char found
        {
          string::size_type pos;
          
          //checks for floating point number
          if(lexeme[pos] == '.')
          {
            pos = lexeme.find_first_not_of(digits, pos+1);
            if (pos == string::npos)
                cout<<lexeme<<"       FLOLIT     "<<"    40    "<< lexeme<<endl;
          }
          //put tokenizing code here
          //display lexeme
          cout <<"the  lexeme is"<< lexeme << endl;
          cout<<"this is the else statement in the gtoken function"<<endl;
          //reset lexeme to empty
          lexeme = " ";
          //break out of this loop 
          break;
        }
      }
        //wToken(lexeme);
    }             
}        

//creates the header for the program output
void wToken() 
{
 cout<<"Lexeme     "<<"Token     "<<"Token #     "<<"Value/Name    "<<endl;
}
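One possible approach to the quote-matching question, sketched with find_first_of and find (quotedText is a made-up helper name, and this only handles one quoted literal per string):

```cpp
#include <string>

// Returns the text between the first pair of matching quote characters
// (either " or '), or an empty string if no complete pair is found.
std::string quotedText(const std::string& s) {
    // find_first_of locates the first occurrence of either quote character
    std::string::size_type open = s.find_first_of("\"'");
    if (open == std::string::npos)
        return "";
    // find looks for the *same* quote character after the opening one,
    // so "..." and '...' pairs are matched with their own kind
    std::string::size_type close = s.find(s[open], open + 1);
    if (close == std::string::npos)
        return "";                       // unterminated literal
    return s.substr(open + 1, close - open - 1);
}
```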

I'm not completely clear on what you are trying to do. Is it this: given a string, you are trying to extract a substring from it that is surrounded by quotes? So for the following string:

this is outside quotes "this is inside" outside again

You want to isolate the substring:

this is inside

But you are not sure how to do it? I'm also a little confused about what gToken is supposed to accomplish. It's a void function, there are no global variables, and the string lexeme is not passed by reference. It's passed by value, but it's uninitialized in main, so you're passing a string by value to a function, immediately initializing it to something else inside the function, and then manipulating it there. None of those manipulations make it back to main, so why pass a string to gToken at all? It doesn't appear to do anything with the parameter passed to it. Or am I missing something?

Ok, let me try to clarify. gToken takes the input from the file (fin) and reads it into the string lexeme, which is created only inside gToken. It reads character by character (not by line) until it comes to a space; the space marks the end of the word. gToken then compares lexeme against the different tokens (reserved words, punctuation marks, etc.) and writes the output to the screen. The token comparison isn't implemented in the code yet because I am struggling with the code for recognizing strings. Does this help?

I have tried to simplify the code a bit and remove any of the confusion. I have removed code I will not be using and am now going to try a (hopefully) simpler method to match my strings.

#include <cstdlib>
#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

using namespace std;

//function that scans the file, accepts the file stream and creates the lexeme
void gToken(ifstream &fin, string lexeme);
//function that writes the output, accepts the lexeme as input
void wToken();


int main()
{
    char filename[30];
    ifstream fin; //file stream object
        
    char q;
    string lexeme;
        
    cout << "Enter the name of the file you wish to run a lexical analysis on." 
            "If you type 'q', the program will terminate. "<<endl;
    cin >> filename;
    
    if(stricmp(filename, "q")== 0)
    {
      cout <<"You have chosen to quit the program."<<endl;
      system("PAUSE");
      return EXIT_SUCCESS;
    }
    else(fin.open(filename));
      {
        wToken();
        gToken(fin, lexeme); 
        fin.close();
        system("PAUSE");
        return EXIT_SUCCESS;
      }
}

//reads characters from the file beginning at the first character and stops
//at spaces.  Writes everything into the string lexeme and begins tokenizing.
void gToken(ifstream &fin, string lexeme)
{
   
   char ch;  
   lexeme = " "; 

   //while a char is found
   while (fin >> ch)
   {  
     //concatenate the char onto lexeme   
     lexeme += ch;

     //get all the char until the next whitespace char or eof is found
     while(fin.get(ch))
     {
       if(ch != ' ' && ch != '\n') 
       {          
         lexeme += ch;
         
       }
       else  //whitespace char found
        {
          //compares the lexeme to a given token and prints out the results
          if(stricmp(lexeme.c_str(), "word"))
          cout<<lexeme<<"        reserved   "<<"     26       "<< lexeme<<endl;
           else if(stricmp(lexeme.c_str(), "char"))
             cout<<lexeme<<"     character    "<<"    27   "<<lexeme<<endl;
              else
               cout<< lexeme<<"  invalid "<<endl; 
         
          //reset lexeme to empty
          lexeme = " ";
          //break out of this loop 
          break;
        }
      }
     }             
}        

//creates the header for the program output
void wToken() 
{
 cout<<endl<<"Lexeme        "<<"Token     "<<"  Token #    "<<"Value/Name    "<<endl;
}

This code compiles but the output is not correct. Here is the output...

Enter the name of the file you wish to run a lexical analysis on.If you type 'q'
, the program will terminate.
test.txt

Lexeme        Token        Token #    Value/Name
char          reserved     26         char
Word          reserved     26         Word
po            reserved     26         po
Press any key to continue . . .

So far, everything is correct until you get to the actual output. Lines 66-79 produce the format I want and give me the correct lexeme and value, but they are not outputting the correct Token and Token #. The output is a simple cout statement, so why is the loop skipping them?


lexeme is a string, so I would use the compare function from the string library:
http://www.cplusplus.com/reference/string/string/compare.html

I would use the compare function in lines 68 and 70 instead of stricmp (as far as I can tell, that function doesn't exist in standard C++; do you mean strcmp?):

if(lexeme.compare ("word") == 0)
     cout << "The lexeme is word." << endl;
else if (lexeme.compare ("char") == 0)
     cout << "The lexeme is char." << endl;

In line 49, you are initializing lexeme to a blank space, and that blank space stays in the string. You later compare lexeme against words that don't contain blank spaces: "char" and " char" (with a leading space) are considered different strings. I imagine you want them to be the same, so initialize lexeme to the empty string in line 49, not a blank space:

lexeme = "";

Thanks for the link to the information on strings; that has been really helpful. As for the stricmp function, it ignores case when matching strings: for example, comparing "word" and "Word" would report that they are the same. I might still use the compare function, but then I will need another way to remove case sensitivity from the comparisons.

Hmm, I didn't see stricmp on cplusplus.com. Maybe I need to broaden my horizons; what library is it from? As far as the case of the lexeme goes, I don't know whether you can assume lexeme is all letters, but if you can and you want case insensitivity, you could do this:

for (int i = 0; i < lexeme.length (); i++)
     lexeme[i] = tolower(lexeme[i]);

lexeme would now be all lower-case, so you could compare it to "char" and "word" as in my previous post, and that would remove the case-sensitivity problem. tolower is in the cctype library, so you would have to include that.

Sorry for the late reply. stricmp comes with the string library on some compilers, but it is not part of ANSI C++. I got the function from reading other posts in the C++ Forum.
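Since stricmp isn't portable, a case-insensitive comparison can be written in a few lines against std::string (equalsIgnoreCase is a made-up name; like the tolower loop above, this assumes plain ASCII letters):

```cpp
#include <cctype>
#include <string>

// Compares two strings ignoring case by lowering each character pair.
// Returns true on a match, unlike stricmp which returns 0.
bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.length() != b.length())
        return false;
    for (std::string::size_type i = 0; i < a.length(); ++i)
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    return true;
}
```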

Do you have your final code that you ended up with? It would help me out greatly to see it.

Unfortunately I do not have the code with me. It is currently located in another state. Sorry. But if you have any questions about it I might be able to answer them.
