I'm using Turbo C++. For my SIC assembler I need to read the user's input assembly language program from a text file. Each line of the program contains at most 3 words: LABEL, OPCODE and OPERAND. LABEL or OPERAND or both can be absent, but OPCODE is mandatory.
The following code of mine reads the input file line by line using getline().
Now I want to split each line into those 3 strings. Any number of spaces or tabs can separate the words.
How do I use sscanf for this purpose?

int readinput(){
    c=0;   // c and sc[] are globals defined elsewhere in the program
    char temp[40],tlabel[10],topcode[6],toperand[12];
    ifstream fin("inputfile2.txt",ios::in);
    int a=0,b=0,i=0;
    while(fin.getline(temp,sizeof(temp)))
    {
        i=0;
        a=0;
        b=0;
        tlabel[0]  ='\0';
        topcode[0] ='\0';
        toperand[0]='\0';
       // here is where temp should be split into label [if present], opcode and operand [if present]
        cout<<tlabel<<"\t"<<topcode<<"\t"<<toperand<<"\n";
        strcpy(sc[c].label,tlabel);
        strcpy(sc[c].opcode,topcode);
        strcpy(sc[c].operand,toperand);
        c++;
     }
     fin.close();
     return 1;
}
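Since the question is specifically about sscanf: the %s conversion skips any amount of leading whitespace (spaces and tabs included) and sscanf returns how many fields it filled, so it can do the splitting. The catch is telling a label from an opcode. The sketch below assumes the common SIC source convention that a label, when present, starts in column 1, so a line beginning with a space or tab has no label. split_line is a hypothetical helper name; the field widths (%9s, %5s, %11s) are chosen to fit tlabel[10], topcode[6] and toperand[12] from the code above. Only plain C headers are used, though this has not been tried on Turbo C++.

```cpp
#include <stdio.h>
#include <ctype.h>

// Hypothetical helper: splits one source line into label/opcode/operand.
// ASSUMPTION: a label, when present, starts in column 1; a line that
// begins with a space or tab has no label.
// Returns the number of fields sscanf actually filled.
int split_line(const char* line, char* label, char* opcode, char* operand)
{
    // Start with empty strings so absent fields read as "".
    label[0] = opcode[0] = operand[0] = '\0';

    int n;
    if (line[0] != '\0' && !isspace((unsigned char)line[0]))
        n = sscanf(line, "%9s %5s %11s", label, opcode, operand);  // LABEL OPCODE [OPERAND]
    else
        n = sscanf(line, "%5s %11s", opcode, operand);             // OPCODE [OPERAND]
    return n;
}
```

Inside readinput() the call would go where the placeholder comment is: split_line(temp, tlabel, topcode, toperand). A word longer than its field width gets cut at that width, so the widths should match your real array sizes.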

How can you tell the difference between LABEL and OPCODE? If there are two words on a line, how do you know the first word is LABEL or OPCODE?

Here is a c++ solution:

#include<fstream>
#include<iostream>
#include<sstream>
#include<string>
#include<vector>
using namespace std;

// You still have to write is_label() yourself, e.g. return true when
// the word is NOT found in your table of opcodes.
bool is_label(const string& word);

struct line_obj
{
     string label;
     string opcode;
     string operand;
};

int main()
{
     vector<line_obj> text_file;
     ifstream infile("inputfile2.txt");

     if(!infile.is_open())
     {
          cout << "\aFile failed to open.";
          cout << "\nFile may have been moved, renamed, or deleted.";
          cout << "\nPress [ENTER] to continue...";
          cin.get();
          return 1;
     }

     string line;
     while(getline(infile, line))          // one source line at a time
     {
          istringstream words(line);       // >> skips any spaces or tabs
          line_obj temp;
          string word;

          if(!(words >> word))
               continue;                   // skip blank lines

          if(is_label(word))
          {
               temp.label = word;
               if(!(words >> word))        // the opcode follows the label
                    word.clear();          // label with no opcode: report as an error later
          }
          temp.opcode = word;
          words >> temp.operand;           // stays empty if there is no operand

          text_file.push_back(temp);
     }

     infile.close();

     //Display
     for(size_t i=0; i<text_file.size(); i++)
     {
          cout << text_file[i].label << ' ' << text_file[i].opcode << ' ' << text_file[i].operand << endl;
     }
     return 0;
}

How can you tell the difference between LABEL and OPCODE? If there are two words on a line, how do you know the first word is LABEL or OPCODE?

The first word will be LABEL

The following is the format

LABEL OPCODE OPERAND

What you are trying to do is called tokenization: breaking an input stream into tokens. Given the type of data you're working with, it shouldn't be very difficult. The usual solution is to go through the input character by character to find where the current token ends:

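#include <ctype.h>

char* getToken(char* src, char* dest)
{
    while (isspace((unsigned char)*src))    // skip leading spaces and tabs
        src++;

    while (*src != '\0' && !isspace((unsigned char)*src))
    {
        *dest++ = *src++;   // copy one character of the token
    }
    *dest = '\0';           // terminate the token

    return src;   // return a pointer to the current position in the source string
}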

This is a pretty generic approach to tokenizing. In fact, it isn't very different from what the standard function strtok() does, though strtok() lets you specify which characters count as delimiters.
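For comparison, a minimal sketch of the strtok() route, treating spaces, tabs and line-end characters as delimiters. split_with_strtok is a hypothetical name. Keep in mind that strtok() writes into the buffer it is given and keeps hidden state between calls, and that because it discards leading whitespace it cannot, on its own, tell you whether a line started with a label.

```cpp
#include <string.h>

// Hypothetical helper: splits 'line' IN PLACE into at most 'max_tokens'
// tokens separated by any run of spaces, tabs or line-end characters.
// tokens[i] points into 'line'. Returns the number of tokens found.
int split_with_strtok(char* line, char* tokens[], int max_tokens)
{
    int count = 0;
    char* tok = strtok(line, " \t\r\n");      // first call: pass the buffer
    while (tok != NULL && count < max_tokens)
    {
        tokens[count++] = tok;
        tok = strtok(NULL, " \t\r\n");        // later calls: pass NULL
    }
    return count;
}
```

With max_tokens set to 3, a return value of 1, 2 or 3 tells you how many fields the line had; deciding which field is which still needs the opcode lookup described below.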

The tricky part comes when you go to interpret the meaning of the tokens. Since you need to be able to tell a label from a mnemonic from an argument, you will want a first pass that does nothing but collect the labels (and their positions, which is really the main purpose of the first pass anyway) and put them in a table. Read the first token of a line; if it isn't a mnemonic, put it into the label table (the next token is then the mnemonic). For the mnemonic, advance the current position (the location counter) by the size of the instruction, which is 3 bytes for every SIC instruction. You also need to check whether that mnemonic takes an argument, and if it does, check that there is another token on the line (the argument). If the argument is missing, it's an error.
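A minimal sketch of that first pass, assuming each line has already been split into a vector of words (for example with one of the splitters above). The opcode table is hypothetical and lists only a handful of real SIC mnemonics with whether each takes an operand; pass_one and PassOneResult are illustrative names, and assembler directives (START, WORD, RESW, ...) are deliberately left out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical opcode table: mnemonic -> does it take an operand?
// Only a few real SIC mnemonics are listed, for illustration.
static const std::map<std::string, bool> kMnemonics = {
    {"LDA", true}, {"STA", true}, {"ADD", true}, {"J", true}, {"RSUB", false}};

struct PassOneResult {
    std::map<std::string, int> labels;  // label -> address
    std::vector<int> error_lines;       // 0-based indexes of bad lines
};

// First pass: record each label with the current location counter and
// advance the counter by 3 bytes per instruction (SIC instruction size).
PassOneResult pass_one(const std::vector<std::vector<std::string>>& lines, int start)
{
    PassOneResult r;
    int loc = start;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const std::vector<std::string>& toks = lines[i];
        if (toks.empty()) continue;              // blank line

        std::size_t pos = 0;
        if (kMnemonics.count(toks[0]) == 0) {    // not a mnemonic -> a label
            r.labels[toks[0]] = loc;
            pos = 1;                             // mnemonic should come next
        }
        if (pos >= toks.size()) { r.error_lines.push_back((int)i); continue; }

        auto m = kMnemonics.find(toks[pos]);
        if (m == kMnemonics.end()) { r.error_lines.push_back((int)i); continue; }

        if (m->second && pos + 1 >= toks.size())  // operand required but missing
            r.error_lines.push_back((int)i);

        loc += 3;                                 // every SIC instruction is 3 bytes
    }
    return r;
}
```

For the lines FIRST LDA ALPHA / STA BETA / LOOP ADD / RSUB starting at 0x1000, FIRST gets 0x1000, LOOP gets 0x1006, and the third line is flagged because ADD has no operand.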

During the second pass, you would check each token to see whether it is in the table of mnemonics or in the table of labels (collectively, the symbol tables). If it is in neither table, it is an error. If it is in the label table, ignore it (you already have its position) and move on to the next token. If it is a mnemonic that doesn't take an argument, emit the appropriate opcode to the output file and skip the rest of the line. If the mnemonic does take an argument, read that argument and use its value to generate the object code.
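A minimal sketch of the second pass for a single pre-tokenized line. It assumes the standard SIC instruction layout (8-bit opcode, 1-bit index flag, 15-bit address) with the index flag always 0, and a hypothetical table holding a few real SIC opcodes; assemble_line and OpInfo are illustrative names, and `labels` is the table built during the first pass.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct OpInfo { int code; bool takes_operand; };

// Hypothetical table with a few real SIC opcodes.
static const std::map<std::string, OpInfo> kOpcodes = {
    {"LDA", {0x00, true}}, {"STA", {0x0C, true}}, {"ADD", {0x18, true}},
    {"J", {0x3C, true}}, {"RSUB", {0x4C, false}}};

// Builds the 24-bit instruction word for one line: opcode in the top
// 8 bits, operand address in the low 15 bits (index bit left at 0).
// Returns false for an unknown mnemonic, undefined symbol or missing operand.
bool assemble_line(const std::vector<std::string>& toks,
                   const std::map<std::string, int>& labels, long& word)
{
    std::size_t pos = 0;
    if (!toks.empty() && labels.count(toks[0]))
        pos = 1;                                  // skip the label; its address is known
    if (pos >= toks.size()) return false;

    auto op = kOpcodes.find(toks[pos]);
    if (op == kOpcodes.end()) return false;       // not in either table

    long address = 0;
    if (op->second.takes_operand) {
        if (pos + 1 >= toks.size()) return false; // operand missing
        auto sym = labels.find(toks[pos + 1]);
        if (sym == labels.end()) return false;    // undefined symbol
        address = sym->second & 0x7FFF;           // 15-bit address field
    }
    word = ((long)op->second.code << 16) | address;
    return true;
}
```

With ALPHA at 0x1033, STA ALPHA assembles to 0x0C1033 and RSUB to 0x4C0000; writing each word as six hex digits gives the usual SIC object code.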

Here is a c++ solution:

#include<fstream>
#include<iostream>
#include<string>
#include<vector>
using namespace std;

struct line_obj
{
     string label;
     string opcode;
     string operand;
};

int main()
{
     vector<line_obj> text_file;
     line_obj temp;

     ifstream infile("inputfile2.txt");

     if(!infile.is_open())
     {
          cout << "\aFile failed to open.";
          cout << "\nFile may have been moved, renamed, or deleted.";
          cout << "\nPress [ENTER] to continue...";
          cin.get();
          return 1;
     }

     // Assumes EVERY line has all three fields; a missing label or operand
     // shifts the following words into the wrong fields.
     while(infile >> temp.label >> temp.opcode >> temp.operand)
     {
          text_file.push_back(temp);
     }

     infile.close();

     //Display
     for(size_t i=0; i<text_file.size(); i++)
     {
          cout << text_file[i].label << ' ' << text_file[i].opcode << ' ' << text_file[i].operand << endl;
     }
     return 0;
}

This code won't work on Turbo C++ v3.

It's not meant to be a full working program, just a sketch. But the basic algorithm should be enough to jog some of them marbles rollin' around in your noggin', and at some point you'll say, "ah, I see.. lemme try this approach.."

As opposed to where you are right now, which is, "GIMME TEH WORKING CODEZ!!!@!@###$!!!!"

You may find my Passim assembler useful as a guide for tokenizing and symbol table manipulation. While it is in C rather than C++, it should be familiar enough to make sense of.
