Help with parsing strings of ints

Question

jmoran19 0 Newbie Poster

14 Years Ago

Hi, I'm looking for help writing a program to check an input file for errors in its pattern.

Basically, I've got a file of thousands of lines like this:
"70 20000731 210202 19 36005 354 55.369 -37.207 -54 0.847 491 0.981 0.985 278977"

I have no trouble opening the file and reading the first line into a string called line. What I need is access to the individual integers/floats for a few simple calculations, and also to check that all 14 entries are present.

I'm trying to make a function to which I'll pass the string of the line, which will break it up and perform the checks I need. The big thing I'm missing here is the correct way to parse a string and turn the values into... an array of variables? Or something more efficient I'm not even thinking of?

Thanks for any tips in the right direction. I am running in circles.

c++


Clinton Portis 211 Practically a Posting Shark

14 Years Ago

The big thing I'm missing here is the correct way to parse a string and turn the values into... an array of variables? Or something more efficient I'm not even thinking of?

Very good observation, parsing your file-string into an array of substrings. Yes, there is also a way to turn those strings into numbers you can use to perform math calculations.

First, let's talk about parsing a <string> class object.

I have recently posted an in-depth method to parsing strings using the strtok() function from <cstring>, but since we are using <string> class objects, let's benefit from the cool stuff that comes with string objects.

Here is a list of all the stuff packed inside of a string.

Here is my favorite way to get individual stuff out of a string. I like to go right to the find_first_of() string member.

As you can see, find_first_of() is well overloaded to fit your needs.

size_t find_first_of ( const string& str, size_t pos = 0 ) const;
size_t find_first_of ( const char* s, size_t pos, size_t n ) const;
size_t find_first_of ( const char* s, size_t pos = 0 ) const;
size_t find_first_of ( char c, size_t pos = 0 ) const;

The version I like to use is the last one, which takes a single char. Simply supply a char and tell it where to start (or resume) in your traversal through the source string. I like using find_first_of() because the overloads that take a string let you use multiple delimiters (however, in this example we will only be using one, which would make find() a better choice).
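To illustrate the multiple-delimiter point, here is a minimal sketch. The helper name first_delim and the delimiter set (space, comma, tab) are assumptions made for illustration, not part of the original post. Passing a set of characters makes find_first_of() stop at whichever one appears first.

```cpp
#include <string>

// Hypothetical helper: position of the first space, comma, or tab in 'text'
// at or after 'start', or std::string::npos if there is none.
std::string::size_type first_delim(const std::string& text,
                                   std::string::size_type start = 0)
{
    return text.find_first_of(" ,\t", start);
}
```

Calling it again with start set to one past the previous result resumes the search where the last one stopped.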

In this code, we will use the ' ' white space as a delimiter and return all words located in between white spaces:

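#include <string>
using namespace std;

string substrings[100];
int prev_pos = -1, curr_pos = -1;   // positions of the previous and current delimiter
int len = (int)document.size();     // document is the line read from the file

int i = 0;                          // after the loop, i is the number of tokens found
while (i < 100 && curr_pos < len)
{
     prev_pos = curr_pos;
     size_t found = document.find_first_of(' ', prev_pos + 1);
     // no more delimiters: the last token runs to the end of the string
     curr_pos = (found == string::npos) ? len : (int)found;
     // substr() takes a start position and a length, not an end position
     substrings[i] = document.substr(prev_pos + 1, curr_pos - prev_pos - 1);
     i++;
}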

The code above traverses through a string and looks for white spaces, keeping track of where it detected its last delimiter (prev_pos) and where it encounters its next delimiter (curr_pos). One assumption with the code above is that only one white space exists between words in the document string. One could correct for this algorithmically by deleting any extra white spaces.
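Another way to cope with repeated white spaces, rather than deleting them first, is to skip them while tokenizing. This is only a sketch: the name split_fields and the use of std::vector instead of a fixed array are assumptions, and it swaps in find_first_not_of() alongside find_first_of().

```cpp
#include <string>
#include <vector>

// Sketch: split 'document' on spaces and tabs, ignoring runs of delimiters.
std::vector<std::string> split_fields(const std::string& document)
{
    const std::string delims = " \t";
    std::vector<std::string> fields;

    // Jump to the first character that is NOT a delimiter.
    std::string::size_type start = document.find_first_not_of(delims);
    while (start != std::string::npos)
    {
        // The token ends at the next delimiter, or at the end of the string.
        std::string::size_type end = document.find_first_of(delims, start);
        if (end == std::string::npos)
            end = document.size();

        fields.push_back(document.substr(start, end - start));

        // Skip every delimiter that follows before starting the next token.
        start = document.find_first_not_of(delims, end);
    }
    return fields;
}
```

With a vector, fields.size() gives the number of entries found, which could be compared against the 14 expected per line.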

When a delimiter is detected (in this case, a white space), the prev_pos and curr_pos counters are adjusted by one element to come off the white spaces and provide the starting and ending element positions of the substring. (The -1 initialization of prev_pos is designed to account for the first loop iteration, where the first element of the first word will be document[0].)

The above code uses another <string> member called substr().
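One detail of substr() that is easy to get wrong: its second argument is a length, not an ending position. A small sketch, using a made-up helper name (between), shows how to turn two positions into a substr() call.

```cpp
#include <string>

// Hypothetical helper: the text from position 'first' (inclusive) up to
// position 'last' (exclusive), expressed with substr(position, length).
std::string between(const std::string& text,
                    std::string::size_type first,
                    std::string::size_type last)
{
    return text.substr(first, last - first);
}
```

For the string "70 20000731 210202", between(text, 3, 11) gives "20000731", whereas text.substr(3, 11) would take 11 characters and run on into the next field.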

Another option not discussed here is to perform array operations directly on your string. You could use functions from <cctype> like isalpha() or isspace() as you loop through each character of the string.
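For what it's worth, the character-by-character option might look something like the sketch below. The function name split_by_char and the choice of std::vector are assumptions, and this is just one of several ways to write such a loop.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Sketch: build tokens one character at a time, using isspace() to decide
// where one token ends and the next begins.
std::vector<std::string> split_by_char(const std::string& document)
{
    std::vector<std::string> tokens;
    std::string current;

    for (std::string::size_type i = 0; i < document.size(); i++)
    {
        // Cast to unsigned char: passing a negative char to isspace() is undefined.
        if (std::isspace(static_cast<unsigned char>(document[i])))
        {
            if (!current.empty())
            {
                tokens.push_back(current);   // a white space closes the token
                current.clear();
            }
        }
        else
        {
            current += document[i];
        }
    }
    if (!current.empty())
        tokens.push_back(current);           // the last token has no trailing space
    return tokens;
}
```

Because isspace() also matches tabs and newlines, this version tolerates mixed or repeated white space without any extra work.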

You have now successfully parsed the document string into an array of substrings[] (tokens). Now all you have to do is turn the strings into numbers.

Here is a tutorial on using the <sstream> library to make your string-to-int conversions.

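#include <sstream>
#include <string>
using namespace std;

double numbers[100];
int num_tokens = i;   // i is the token count left behind by the parsing loop

// a plain array has no size() member, so loop up to the token count instead
for (int j = 0; j < num_tokens; j++)
{
     istringstream str_to_num(substrings[j]);
     str_to_num >> numbers[j];
}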

There you have it, parsing a document string into an array of substrings... and converting those strings into an array of numbers.
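Since the goal is to check the file for errors, it may help to know whether each conversion actually succeeded. Here is a hedged sketch of one way to do that with <sstream>; to_number is a hypothetical name, and rejecting trailing characters is a design choice rather than anything the post above requires.

```cpp
#include <sstream>
#include <string>

// Sketch: convert one token to a double and report whether it worked.
bool to_number(const std::string& token, double& value)
{
    std::istringstream in(token);
    if (!(in >> value))
        return false;          // the token does not start with a number

    char leftover;
    if (in >> leftover)
        return false;          // something follows the number, e.g. "12abc"

    return true;
}
```

A caller could run this on every substring and flag the line as soon as one call returns false.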

Oh yeah, all this code is untested, I just wrote it off the top of me' head (don't have a compiler on this laptop) so if there are any errors let me know.

Also, if anyone knows a better way to parse a <string> object, let me know (I mostly use the find_first_of() with substr() method)



mrnutty 761 Senior Poster · Answer 1 · 2009-10-27T11:10:24+00:00

Just to add, another way to parse a string using sstream:

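#include <sstream>
#include <string>
using namespace std;

// Reads up to 'size' whitespace-separated numbers from src into dst.
void parseString(const string& src, float dst[], int size){
	stringstream convert(src);
	for(int i = 0; i < size; i++)
		convert >> dst[i];
}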
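To also check that all 14 entries are present, a variant of the function above could report how many values it managed to read. This is a sketch under assumptions: parse_line is a made-up name, and double replaces float because a float holds only about 7 significant digits, so a value like 20000731 would not be stored exactly.

```cpp
#include <sstream>
#include <string>

// Sketch: read up to 'size' numbers from src into dst and return how many
// were actually read.
int parse_line(const std::string& src, double dst[], int size)
{
    std::istringstream convert(src);
    int count = 0;
    // Stop at the first failed extraction or when dst is full.
    while (count < size && convert >> dst[count])
        count++;
    return count;
}
```

A return value below 14 means the line is short or contains something that is not a number. To catch lines with extra entries as well, one could pass a larger array and check that the count is exactly 14.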

dkalita 110 Posting Pro in Training · Answer 2 · 2009-10-27T11:21:45+00:00

Below is an idea that you can apply. It is not a complete implementation, but you can use this approach, maybe with some minor modifications.

typedef enum
{_INT, _FLOAT, _OTHER} DataType;

typedef struct
{
       DataType dataType;
       union
       {
               int intData;
               float flData;
       } data;
}Data;

DataType getType(char *)
{
     /*implement yourself*/
     /*should return the datatype*/
}
float getFloat(char *)
{
     /*implement yourself*/
     /*should return the float equivalent of the string*/
}

vector<Data> dataList;

/*processing single line*/
DataType _type;
Data tmp;
char *tok = strtok(line, " ");
while(tok)
{
          _type = getType(tok);
          if(_type == _INT)
          {
                tmp.dataType = _INT;
                tmp.data.intData = atoi(tok); 
          }
          else if(_type == _FLOAT)
          {
                tmp.dataType = _FLOAT;
                tmp.data.flData = getFloat(tok); 
          }
          else
          {
                /*ERROR TYPE....HANDLE ERROR*/
           }
          dataList.push_back(tmp);
          tok = strtok(NULL, " ");
}

This was just one way. There is more than one way to do this job.
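As for the part left to implement, one possible way to decide whether a token is an integer, a float, or neither is to lean on strtol() and strtod() from <cstdlib>. The sketch below is only an assumption about how that could be done; classify and the TOKEN_* names are hypothetical and chosen to avoid identifiers that begin with an underscore and a capital letter, which are reserved.

```cpp
#include <cstdlib>

enum TokenType { TOKEN_INT, TOKEN_FLOAT, TOKEN_OTHER };

// Sketch: classify a token by checking how much of it each parser accepts.
TokenType classify(const char* tok)
{
    if (tok == 0 || *tok == '\0')
        return TOKEN_OTHER;

    char* end = 0;

    // strtol() sets 'end' to the first character it could not use.
    std::strtol(tok, &end, 10);
    if (end != tok && *end == '\0')
        return TOKEN_INT;      // the whole token is an integer

    std::strtod(tok, &end);
    if (end != tok && *end == '\0')
        return TOKEN_FLOAT;    // the whole token is a floating-point value

    return TOKEN_OTHER;
}
```

Once a token is classified as a float, strtod() (or atof()) can supply the value itself, in the same way atoi() is used for the integer case.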

jmoran19 0 Newbie Poster · Answer 3 · 2009-10-28T04:51:27+00:00

Thanks guys, I wound up going with FirstPerson's code, it worked the best given the rest of the work I'd already done on the problem. Many thanks, again.