I have 10 large .txt files with over 13,000 rows of data. Each "column" is separated with a comma.
I want to find the largest value in one of the columns across all the txt files. (By largest I mean the greatest number of characters for a particular column.)
I wanted to write something in Java to find the largest value for the column. Any ideas on how I would go about doing this?

Thanks
sj.


1. File i/o
2. Parse the data by virtue of their columns
3. Use the String length function to ascertain the longest word.
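
The three steps above can be sketched roughly as follows. This is only a minimal illustration: the class name `LongestFieldSketch`, the method `longestInColumn`, and the sample data are made up for the example, and the plain comma split assumes no field contains a quoted comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class LongestFieldSketch {

    // Returns the longest value found in the given zero-based column,
    // or an empty string if no row has that column.
    static String longestInColumn(Reader source, int columnIndex) {
        String longest = "";
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {      // step 1: read line by line
                String[] fields = line.split(",", -1);          // step 2: split into columns
                if (columnIndex < fields.length
                        && fields[columnIndex].length() > longest.length()) {
                    longest = fields[columnIndex];              // step 3: compare lengths
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return longest;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for one of the .txt files.
        String sample = "a,bb,ccc\nd,eeeee,f\ng,hh,i\n";
        System.out.println(longestInColumn(new StringReader(sample), 1)); // prints "eeeee"
    }
}
```

For a real file, a `java.io.FileReader` could be passed in place of the `StringReader`.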

I'm having trouble with the parsing part. What class should I use if I want to pull out the content between the 4th and 5th comma of each row in my file?

Thanks


Look up the String.split method.

Effectively you're using the comma as a delimiter to separate the line into tokens.

Then you can just count the commas to find the 4th or fifth token or just count the tokens.
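
As a rough sketch of that idea: the content between the 4th and 5th comma is the token at zero-based index 4 once the line is split. The class and method names here are made up for illustration, and the row is sample data.

```java
class SplitSketch {

    // Returns the field at the given zero-based index,
    // or null if the row has fewer fields than that.
    static String fieldAt(String row, int index) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] tokens = row.split(",", -1);
        return index < tokens.length ? tokens[index] : null;
    }

    public static void main(String[] args) {
        String row = "test1,test2,test3,test4,test5,test6";
        // Index 4 is the content between the 4th and 5th comma.
        System.out.println(fieldAt(row, 4)); // prints "test5"
    }
}
```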

Once you have read in the data, go line by line and use StringTokenizer to get each element of the line, then, depending on which token it is, compare it against the data for the matching column. So: StringTokenizer toks = new StringTokenizer(line, ",");


I think the general consensus is that String.split should be preferred over StringTokenizer. It is more flexible and is generally considered the better choice.

Perhaps its only downside is a slight difference in speed, although in practice that difference is negligible.
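
One practical difference that matters when counting columns, shown as a small sketch (not a benchmark; the class and method names are made up for illustration): StringTokenizer silently skips empty fields, so the token positions shift, while split with a limit of -1 keeps them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class SplitVsTokenizerSketch {

    // Splits on commas and keeps empty fields.
    static List<String> withSplit(String row) {
        return Arrays.asList(row.split(",", -1));
    }

    // Tokenizes on commas; empty fields never show up as tokens.
    static List<String> withTokenizer(String row) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer toks = new StringTokenizer(row, ",");
        while (toks.hasMoreTokens()) {
            tokens.add(toks.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String row = "a,,c"; // the second column is empty
        System.out.println(withSplit(row).size());     // prints 3
        System.out.println(withTokenizer(row).size()); // prints 2
    }
}
```

With the tokenizer, "c" ends up as the second token rather than the third, which would throw off any logic that relies on the column position.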

Thanks for the help. This is what I came up with.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CSVFileParser
{

    public static void main(String[] args)
    {
        CSVFileParser csvFile = new CSVFileParser();
        csvFile.readCSVFile();
    }

    void readCSVFile()
    {
        String record = null;
        String currentWord = "";
        int recCount = 0;

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader("myfile.txt")))
        {
            while ((record = bufferedReader.readLine()) != null)
            {
                recCount++;
                String csvRow = record;
                String[] column = csvRow.split(",");
                boolean isValidData = false;
                for (int i = 0; i < column.length; i++)
                {
                    boolean isValidColumn = true;
                    // needed this for the following case: test1,test2,"test3,test,test,test",test4,test5
                    if (column[i].startsWith("\""))
                    {
                        isValidColumn = false;
                    }
                    
                    if (i == 4 && isValidColumn)
                    {
                        // If true don't need to bother with the else below.  Already have the data needed for the row.
                        isValidData = true;
                        String cellData = column[i];
                        if (cellData.length() > currentWord.length())
                        {
                            currentWord = cellData;
                        }
                    }
                    else
                    {
                        // needed this for the following case: test1,test2,"test3,test,test,test",test4,test5
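                        // the bounds check avoids an ArrayIndexOutOfBoundsException when the quoted field is last
                        if (column[i].endsWith("\"") && !isValidData && i + 1 < column.length)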
                        {
                            String cellData = column[i + 1];
                            if (cellData.length() > currentWord.length())
                            {
                                currentWord = cellData;
                            }
                        }
                    }
                }
            }
            System.out.println(currentWord);
        }
        catch (IOException e)
        {
            System.out.println("error");
            e.printStackTrace();
        }
    }
}

So does it do what you want?

Yep, it did the job.

I would have liked the program to loop through all the files, so that I didn't have to update the name of the file I was reading each time.

Also, I could have used the split method to find the quotes first, then the commas.
I think the condition checks could be simpler if I did it this way.
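
For what it's worth, here is a minimal sketch of one way to simplify the quote handling. Note that it is not the two-pass split described above: it scans the row once, character by character, and only treats a comma as a separator when it is outside double quotes. The names are made up for illustration, and it assumes quotes are balanced and never escaped inside a field.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedCsvSketch {

    // Splits one row on commas that are outside double quotes.
    // Assumption: quotes are balanced and never escaped inside a field.
    static List<String> splitRow(String row) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < row.length(); i++) {
            char c = row.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle state and drop the quote itself
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());  // a separator ends the current field
                current.setLength(0);
            } else {
                current.append(c);               // commas inside quotes land here
            }
        }
        fields.add(current.toString());          // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String row = "test1,test2,\"test3,test,test,test\",test4,test5";
        List<String> fields = splitRow(row);
        System.out.println(fields.size()); // prints 5
        System.out.println(fields.get(4)); // prints "test5"
    }
}
```

With the row parsed this way, the fifth column is always at index 4, so the startsWith/endsWith checks are no longer needed.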



I would have liked the program to loop through all the files, so that I didn't have to update the name of the file I was reading each time.

Have you had a look at this
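
One possible way to loop over the files, sketched with java.nio.file. This assumes all the .txt files sit in a single directory and are UTF-8 encoded; the class name, method name, and directory argument are made up for illustration, and the plain comma split ignores quoted commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class AllFilesSketch {

    // Returns the longest value in the given zero-based column
    // across every .txt file directly inside the directory.
    static String longestAcrossFiles(Path directory, int columnIndex) {
        String longest = "";
        // The glob "*.txt" limits the listing to the text files.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directory, "*.txt")) {
            for (Path file : files) {
                try (BufferedReader reader =
                         Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String[] fields = line.split(",", -1);
                        if (columnIndex < fields.length
                                && fields[columnIndex].length() > longest.length()) {
                            longest = fields[columnIndex];
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return longest;
    }

    public static void main(String[] args) {
        // Hypothetical usage: pass the directory as the first argument,
        // or fall back to the current directory.
        Path directory = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(longestAcrossFiles(directory, 4));
    }
}
```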
