Stripping End-Of-Line characters from string

Question

TheWolverine 0 Junior Poster in Training

13 Years Ago

Hi all,

I'm busy writing a generic textfile reader class and I'm struggling to write the code to deal correctly with end-of-line (EOL) characters for Mac, Linux and Windows.

I've done a fair bit of reading on the issue and I came up with the following function within my TextFileReader class to strip EOL characters once I've read the contents of a textfile using getline( ) and stored the strings in a map.

//! Strip End-Of-Line characters.
void TextFileReader::stripEndOfLineCharacters( )
{
    // Search through container of data and remove newline characters.
    string::size_type stringPosition_ = 0;
    string searchString_ = "\r";
    string replaceString_ = "";

    for ( unsigned int i = 0; i < 1; i++ )
    {
        for ( iteratorContainerOfDataFromFile_
              = containerOfDataFromFile_.begin( );
              iteratorContainerOfDataFromFile_
              != containerOfDataFromFile_.end( );
              iteratorContainerOfDataFromFile_++ )
        {
            while ( ( stringPosition_ = iteratorContainerOfDataFromFile_
                      ->second.find( searchString_,
                                     stringPosition_ ) ) != string::npos )
            {
                // Replace search string with replace string.
                iteratorContainerOfDataFromFile_->second
                        .replace( stringPosition_, searchString_.size( ),
                                  replaceString_ );

                // Advance string position.
                stringPosition_++;
            }
        }

        // Switch search string.
        searchString_ = "\n";
    }
}

I thought that this would eliminate all EOL characters cross-platform, but that doesn't seem to be the case. It works fine on my Mac, running Mac OS 10.5.8. It doesn't seem to work on Windows systems though. Strangely, on Windows systems running this function only strips the EOL character for the first string in the map, and the rest of them are still one character too long.

This leads me to thinking that maybe I can't just replace the "\r" and "\n" characters, but everything I read suggests that it's the combination of the two that Windows uses to represent EOL characters.

Anyway, maybe there's a better way of doing this.

I'd really appreciate any input as this is a really small bug that's preventing me from moving on.

Thanks a lot in advance!

Kartik

c++


WaltP 2,905 Posting Sage w/ dash of thyme

13 Years Ago

This leads me to thinking that maybe I can't just replace the "\r" and "\n" characters, but everything I read suggests that it's the combination of the two that Windows uses to represent EOL characters.

Correct...
Windows uses <CR><LF> combination
Classic Mac OS (version 9 and earlier) uses <CR> alone; Mac OS X uses <LF> alone
*nix uses <LF> alone

WaltP 2,905 Posting Sage w/ dash of thyme

13 Years Ago

That completely depends on what you really want the string to look like when you're done.
If your string is ABCDE\n98765\nalpha-omega\n , what do you want it to contain when done?

WaltP 2,905 Posting Sage w/ dash of thyme

13 Years Ago

If \n is the EOL character, how can it not be the EOL character? "if '\n' is a legitimate part of the string data" is meaningless if it's defined as an EOL character. You need to decide how you define what. And since you didn't really answer my question, all I can say is that whatever you are doing must be wrong, and I don't know what you ultimately want.

Your description makes cursory sense but when we get down to the little details, you try to describe (rather than show) in a confusing and detailless way what you want.

Ever hear the adage "a picture is worth a thousand words?"


TheWolverine 0 Junior Poster in Training · Answer 1 · 2011-05-11T02:37:15+00:00

Correct...
Windows uses <CR><LF> combination
Classic Mac OS (version 9 and earlier) uses <CR> alone; Mac OS X uses <LF> alone
*nix uses <LF> alone

Thanks for confirming that. So am I right in then thinking that searching and replacing the "\r" and "\n" characters in each string should be a cross-platform way of dealing with stripping the EOL characters? If so, any ideas why my code might be acting strangely?

Thanks,

Kartik

TheWolverine 0 Junior Poster in Training · Answer 2 · 2011-05-11T03:24:49+00:00

That completely depends on what you really want the string to look like when you're done.
If your string is ABCDE\n98765\nalpha-omega\n , what do you want it to contain when done?

In short, all I want is to read in text files generated on Windows machines, Macs, and Unix systems as strings, so that I can then use the string data. Given files with the same content generated on the three different systems, the map container holding the strings should be identical in all three cases.

At the moment that's not the case because of the EOL characters. So the function I wrote was to strip those EOL characters so that the string data is all the same regardless of the platform.

In the example that you quote, if "\n" is a legitimate part of the string data, then I would not want that to be stripped. I only want the "\n" to be stripped if it's an EOL character. The same goes for "\r".

Right now I have a text file that contains the following:

This is line 1.

This contains 15 characters, which is what the input string data should contain.

When I read the file in on my Mac, the string has 16 characters, and I used the function I posted to get rid of the EOL character so that the input string data matches the above.

On a Windows system that isn't working however, and even after I run the function, I'm left with a string with 16 characters, which means that the string data does not match the above.

So my question is just how I should go about ensuring that I consistently read in the same string data irrespective of the platform on which the text file was created.

Thanks in advance,

Kartik

TheWolverine 0 Junior Poster in Training · Answer 3 · 2011-05-11T04:47:12+00:00

If \n is the EOL character, how can it not be the EOL character? "if '\n' is a legitimate part of the string data" is meaningless if it's defined as an EOL character. You need to decide how you define what. And since you didn't really answer my question, all I can say is that whatever you are doing must be wrong, and I don't know what you ultimately want.

Your description makes cursory sense but when we get down to the little details, you try to describe (rather than show) in a confusing and detailless way what you want.

Ever hear the adage "a picture is worth a thousand words?"

I hope this will make it clear.

I have two textfiles called testFileMadeWithWindows.txt and testFileMadeWithMac.txt.

Open the first file with Notepad on a Windows machine and it contains the following:

This is line 1.
This is line 2.
This is line 3.

Open the second file with TextEdit on a Mac and it contains the following:

This is line 1.
This is line 2.
This is line 3.

In other words, the file content of both files is intended to be identical.

I want to read both these files using my FileReader class and store the strings in maps.

To achieve this I use the getline() function.

When I read in testFileMadeWithWindows.txt using getline( ), it turns out that the string sizes are as follows:

16
16
15

Similarly, when I read in testFileMadeWithMac.txt using getline( ), it turns out that the string sizes are as follows:

16
16
15

I now execute the stripEndOfLineCharacters( ) function that I posted in my first post on maps containing this data.

For testFileMadeWithWindows.txt this results in the following string sizes:

15
16
15

For testFileMadeWithMac.txt this results in the following string sizes:

15
15
15

I use string::compare to compare the strings I have read in from the textfiles with the expected string data, which should be:

This is line 1.
This is line 2.
This is line 3.

The Windows comparison fails, specifically the comparison with the second line fails.

The Mac comparison is successful for all three strings.

I would like to know how to solve this such that the Windows comparison is successful too.

I hope this is sufficiently clear.

Thanks,

Kartik
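
The sizes reported above for the Windows file (16, 16, 15) are what std::getline produces when it splits on "\n" only and the stream does not translate <CR><LF> pairs, for example when the file is read on a Unix-like system or opened in binary mode. A minimal sketch that reproduces this with an in-memory stream; the helper name lineLengths is hypothetical and not part of the poster's TextFileReader class:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the length of each line that std::getline
// extracts from `content`, with '\n' as the only delimiter.
std::vector<std::size_t> lineLengths(const std::string& content)
{
    std::istringstream stream(content);
    std::vector<std::size_t> lengths;
    std::string line;
    while (std::getline(stream, line))
    {
        // getline removes the '\n' but leaves any preceding '\r' in `line`.
        lengths.push_back(line.size());
    }
    return lengths;
}
```

Called with "This is line 1.\r\nThis is line 2.\r\nThis is line 3.", the first two lengths include the leftover "\r", while the last line has no terminator at all.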

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 4 · 2011-05-11T04:56:43+00:00

Maybe you would benefit from reading how universal newline support is implemented in Python (or you can use it yourself)?
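
In C++ the same idea can be approximated by reading character by character and treating "\n", "\r\n" and a lone "\r" as line terminators. The following is only a sketch of that approach, not Python's actual implementation; the names getlineUniversal and readAllLines are hypothetical:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one line into `line`, accepting "\n", "\r\n"
// or a lone "\r" as the terminator. The terminator is not stored.
std::istream& getlineUniversal(std::istream& input, std::string& line)
{
    line.clear();
    char c;
    while (input.get(c))
    {
        if (c == '\n')
        {
            return input;
        }
        if (c == '\r')
        {
            // Swallow the '\n' of a "\r\n" pair if one follows.
            if (input.peek() == '\n')
            {
                input.get();
            }
            return input;
        }
        line += c;
    }
    // End of input reached. If the last line had no terminator, clear
    // failbit so the caller still gets to process that line.
    if (!line.empty())
    {
        input.clear(input.rdstate() & ~std::ios::failbit);
    }
    return input;
}

// Hypothetical convenience wrapper used to show the behaviour.
std::vector<std::string> readAllLines(const std::string& content)
{
    std::istringstream stream(content);
    std::vector<std::string> lines;
    std::string line;
    while (getlineUniversal(stream, line))
    {
        lines.push_back(line);
    }
    return lines;
}
```

With this approach the strings never contain a terminator in the first place, so no separate stripping pass is needed. The trade-off is that a "\r" can no longer appear inside a line as ordinary data.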

WaltP 2,905 Posting Sage w/ dash of thyme Team Colleague · Answer 5 · 2011-05-11T06:52:12+00:00

Just test the end of the string for 0x0A and replace it with 0x00. Then test again for 0x0D and replace with 0x00.
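
One caveat if the data is held in std::string rather than a C-style char array: overwriting the last character with 0x00 does not shorten a std::string, because size() still counts the embedded null character. Erasing the trailing characters avoids that. A minimal sketch, with trimTrailingEol as a hypothetical helper name:

```cpp
#include <string>

// Hypothetical helper: removes every trailing '\r' (0x0D) and '\n' (0x0A)
// from `line` in place, so that size() reflects the visible text only.
void trimTrailingEol(std::string& line)
{
    const std::string::size_type lastKept = line.find_last_not_of("\r\n");
    if (lastKept == std::string::npos)
    {
        // The string is empty or consists only of EOL characters.
        line.clear();
    }
    else
    {
        line.erase(lastKept + 1);
    }
}
```

Applied to each string after getline( ), this leaves "This is line 1." at 15 characters whether the original terminator was "\n", "\r" or "\r\n".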

TheWolverine 0 Junior Poster in Training · Answer 6 · 2011-05-21T04:34:38+00:00

Just test the end of the string for 0x0A and replace it with 0x00. Then test again for 0x0D and replace with 0x00.

Unfortunately, this doesn't seem to work. I've tried this and various other combinations and I'm still stumped.

If anyone has any other tips on how to strip end-of-line characters from strings read in from a text file using getline(), I'd appreciate your advice.

Thanks,

Kartik

Duoas 1,025 Postaholic Featured Poster · Answer 7 · 2011-05-21T09:04:55+00:00

If I'm using a Mac, how am I supposed to put a line in the file that contains a '\r' (without it being treated as an EOL)?

WaltP 2,905 Posting Sage w/ dash of thyme Team Colleague · Answer 8 · 2011-05-21T09:14:26+00:00

Just test the end of the string for 0x0A and replace it with 0x00. Then test again for 0x0D and replace with 0x00.
Unfortunately, this doesn't seem to work. I've tried this and various other combinations and I'm still stumped.

Then you probably did it wrong. Can't help if we can't see what you tried.

TheWolverine 0 Junior Poster in Training · Answer 9 · 2011-05-21T10:18:27+00:00

Then you probably did it wrong. Can't help if we can't see what you tried.

I did show you precisely what I was doing in the first post. I replaced the searchString_ variable with the hexadecimal equivalents.

In any case, I managed to solve it now by checking instead for the integer equivalents (10 and 13), which for some reason works perfectly well.

So for anyone stuck with a similar problem, I'd suggest trying the integer equivalents.

Kartik
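
For readers comparing against the function in the first post: 10 and 13 are the ASCII codes of '\n' and '\r', so the integer version tests for the same characters. Two details in the original function would explain the reported Windows result of 15, 16, 15. First, the outer loop runs only once ( i < 1 ), so the "\n" pass never executes. Second, stringPosition_ is never reset to 0 for the next string: once find( ) returns string::npos for the first string, every later search starts at npos and finds nothing. Below is a sketch of a version without those two problems. It assumes the container is a std::map keyed by line number, which the thread does not state, and it is written as a free function rather than a TextFileReader member:

```cpp
#include <algorithm>
#include <map>
#include <string>

// Assumption: lines are stored by line number; the thread only says "a map".
typedef std::map<unsigned int, std::string> LineContainer;

// True for carriage return (13) and line feed (10).
bool isEolCharacter(char c)
{
    return c == '\r' || c == '\n';
}

// Removes every '\r' and '\n' from each stored string.
void stripEndOfLineCharacters(LineContainer& lines)
{
    for (LineContainer::iterator it = lines.begin(); it != lines.end(); ++it)
    {
        std::string& text = it->second;
        // Erase-remove idiom: each string is processed from its own start,
        // so no search position carries over between strings.
        text.erase(std::remove_if(text.begin(), text.end(), isEolCharacter),
                   text.end());
    }
}
```

Note that this removes the characters anywhere in a string, not only at the end. For strings that come from getline( ) that is normally equivalent, since only a stray terminator character is left behind.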