How to read a text file that doesn't have constant formatting?

Question

weasel7711 0 Junior Poster in Training

14 Years Ago

I wrote a program to analyze a log file for a machine that my company repairs.
The program that runs the machine spits output into a text file (.log) and my program will analyze it and return the results of different calculations to the user.

The log file ideally looks like this most of the time:

This is the ideal formatting and is USUALLY the case. Out of each of those 4-line "paragraphs", only the first and last lines (e.g. >22 and 9000) are analyzed; the middle two lines are ignored. So the program I wrote reads in 5-line "chunks" until the end of the file. It reads the first line, removes the ">", and stores the value in an int array. It then reads the next 3 lines and stores the last line read in another int array. After reading is done, the arrays are analyzed and the output is displayed to the user.

However, occasionally the program that interacts with the machine spits out random newlines and the formatting is different, like this:

So I am wondering if anyone could give input to me on how to write an algorithm that will read this data correctly regardless of the newlines. Is there a way to read a line and ignore blank lines?
I originally wrote this program in C++ but I made a C# GUI version so that it would be easier for the users to use.

Any ideas?
Thanks
-Weasel

algorithm file-system gui


All 5 Replies

Mitja Bonca 557 Nearly a Posting Maven

14 Years Ago

Tell me, which values do you want to get from the example above? Are they these two numbers in each part:
1. the number on the right side of the ">" mark, and
2. the number just above the next ">" mark?

So in this example:
>25
233966

300156
89980

you would like to get 25 and 89980. Am I right?


Mitja Bonca 557 Nearly a Posting Maven

14 Years Ago

I guess this is something like what you want:

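            // Requires: using System; using System.Collections.Generic; using System.IO;
            // Generic list for storing the wanted values
            List<int> list = new List<int>();

            using (StreamReader sr = new StreamReader(@"C:\1\test25.txt"))
            {
                string line;
                int value;
                int counter = 0; // non-empty lines seen in the current part
                while ((line = sr.ReadLine()) != null)
                {
                    line = line.Trim();
                    if (line.Length == 0)
                        continue; // blank lines are skipped and never counted

                    if (line.StartsWith(">"))
                    {
                        // first line of a part: store the number after the ">" mark
                        value = Convert.ToInt32(line.Remove(0, 1));
                        list.Add(value);
                    }
                    counter++;
                    if (counter == 4)
                    {
                        // fourth non-empty line of the part: store the last value
                        list.Add(Convert.ToInt32(line));
                        counter = 0;
                    }
                }
            }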

The code checks whether the line contains the ">" character. If it does, it adds the number next to that character to the list (I used a generic list, which is more appropriate here than an array). Then it goes forward line by line.
Each part consists of 4 NON-EMPTY lines, so there is a counter that counts every non-empty line (an empty line is not counted). When the counter reaches 4 (the fourth non-empty line of the part), the code adds that number to the list as well and resets the counter. Then the process starts over from the beginning.

I hope it's understandable enough.
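
Since the original program was written in C++, the same counting idea might look roughly like the sketch below. The names `isBlank` and `readPartsByCount` are made up for illustration. The sketch assumes every part has exactly four non-blank lines and that those lines hold plain integers; `std::stoi` throws if a line is not numeric.

```cpp
#include <istream>
#include <sstream>   // only needed for std::istringstream in the usage note
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: true if the line holds only whitespace (or nothing).
static bool isBlank(const std::string& line) {
    return line.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Hypothetical helper: returns one (first, last) pair per part, assuming
// every part has exactly linesPerPart non-blank lines.
std::vector<std::pair<int, int>> readPartsByCount(std::istream& in,
                                                  int linesPerPart = 4) {
    std::vector<std::pair<int, int>> parts;
    std::string line;
    int counter = 0;  // non-blank lines seen in the current part
    int first = 0;    // value taken from the ">" line of the current part
    while (std::getline(in, line)) {
        if (isBlank(line)) continue;  // stray newlines are never counted
        ++counter;
        if (counter == 1) {
            // First line of a part, e.g. ">25": drop the '>' and convert.
            first = std::stoi(line.substr(line.find('>') + 1));
        }
        if (counter == linesPerPart) {
            // Last line of the part: store the pair and start over.
            parts.emplace_back(first, std::stoi(line));
            counter = 0;
        }
    }
    return parts;
}
```

Fed a `std::istringstream` holding the example part `>25`, `233966`, a blank line, `300156`, `89980`, it returns a single pair (25, 89980). Reading from the real log would use a `std::ifstream` instead.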


Momerath 1,327 Nearly a Senior Poster Featured Poster · Answer 1 · 2011-04-12T20:20:38+00:00

using System;
using System.IO;

namespace TestBed {
    class TestBed {
        static void Main() {
            // "using" closes the file even if an exception is thrown
            using (StreamReader sr = new StreamReader("Test.txt")) {
                String currentLine = null;
                String startLine = null;
                String lastGoodLine = null;   // last non-blank line seen in the current part
                Boolean lookingForStart = true;

                while (sr.EndOfStream == false) {
                    if (lookingForStart) {
                        // Skip everything until a line that starts with ">"
                        currentLine = sr.ReadLine();
                        if (currentLine.StartsWith(">")) {
                            startLine = currentLine;
                            lastGoodLine = null;   // forget the previous part's value
                            lookingForStart = false;
                        }
                    } else {
                        if (sr.Peek() == '>') {
                            // The next line starts a new part, so report the finished one
                            Console.WriteLine("Start Line -> {0}{1}End Line -> {2}", startLine, Environment.NewLine, lastGoodLine);
                            lookingForStart = true;
                        } else {
                            currentLine = sr.ReadLine();
                            if (String.IsNullOrEmpty(currentLine.Trim()) == false) {
                                lastGoodLine = currentLine;
                            }
                        }
                    }
                }

                // Report the last part, which has no following ">" line
                if (lookingForStart == false) {
                    Console.WriteLine("Start Line -> {0}{1}End Line -> {2}", startLine, Environment.NewLine, lastGoodLine);
                }
            }

            Console.ReadLine();

        }
    }
}

weasel7711 0 Junior Poster in Training · Answer 2 · 2011-04-12T20:22:08+00:00

Tell me, which values do you want to get from the example above? Are they these two numbers in each part:
1. the number on the right side of the ">" mark, and
2. the number just above the next ">" mark?
So in this example:
>25
233966
300156
89980
you would like to get 25 and 89980. Am I right?

Yes that is correct.

weasel7711 0 Junior Poster in Training · Answer 3 · 2011-04-12T20:41:56+00:00

Thank you both, they both look like great algorithms. I will play around with my code and use your suggestions and let you know when I have gotten it to work. Thank you.
