Hello everyone,

I am having some difficulty with file handling and hope someone can help.
Here is the background:
1) I have to read a .dat file. The file size varies from 692kb to 109.742kb, and the data has the following format:
e.g.:

ID|flag
1|y
2|y
3|y
4|y
4.37777451|y
5.52625317|y

2) I have to extract the data from the file and put it into strings for persisting.

The problem
I can copy the file and read the file.
The trouble I have is with speed.
I have used

1) BufferedReader in = new BufferedReader(new FileReader(copiedFile));
(which gave me an avg time of 25mins to process a 692k file)

2) FileInputStream fis = new FileInputStream(fileName);
(which gave avg time of 12 mins to process)

3) File file = new File(fileName); Scanner scanner = new Scanner(file);
(which gave avg time of 118mins to process)

What is the BEST and FASTEST way to do this?
I am totally running out of options here. :(
Please help.
Sample code:

 public void meh() throws Exception {
        String DATE_FORMAT_NOW = "yyyy/MM/dd hh:mm:ss";
        SimpleDateFormat sdf = new SimpleDateFormat(DATE_FORMAT_NOW);
        long starttime = System.currentTimeMillis();
        Date timeNow = new Date();
        String timeS = sdf.format(timeNow);
        System.out.println(">> time of exe: " + timeS + " milliseconds: " + starttime);
        String fileLoc2 = props.getProperty(PROP_FILENAME);
        System.out.println(">>>> File Location: " + fileLoc2);
        String output = ">> ";
        String delim = "|";
        String idno;
        String flag;
        String line;
        int count = 0;
        try {
            FileInputStream fis = new FileInputStream(fileName);
            int cnt3 = 0;
            final int BUFSIZE = 1024;
            byte buf[] = new byte[BUFSIZE];
            int len;
            while ((len = fis.read(buf)) != -1) {
                for (int i = 0; i < len; i++) {
                    if (buf[i] == '\n') {
                        line = new String(buf);
                        for(StringTokenizer tk = new StringTokenizer(line,delim); tk.hasMoreTokens();){
                        idno = tk.nextToken();
                        if(tk.hasMoreTokens()){
                        flag = tk.nextToken();
                        }
                        System.out.println(">>id " + idno + ">>>> flg");
                        }
                        cnt3++;

                    }
                }
            }
            fis.close();

            System.out.println("row count " + cnt3);
            long elapsedTime = System.currentTimeMillis() - starttime;
            System.out.println(">>> elapsed time: " + elapsedTime);
        } catch (IOException e) {
            System.err.println(e);
        }

    } 

code 2:

public class read2 {
  String DATE_FORMAT_NOW = "yyyy/MM/dd hh:mm:ss";
        SimpleDateFormat sdf = new SimpleDateFormat(DATE_FORMAT_NOW);
        long starttime = System.currentTimeMillis();
        Date timeNow = new Date();
        String timeS = sdf.format(timeNow);
         int count = 0;
     void readFile(String fileName) {
        System.out.println(">> time of exe: " + timeS + " milliseconds: " + starttime);
        try {
            Scanner scanner = new Scanner(new File(fileName));
            scanner.useDelimiter(System.getProperty("line.separator"));
            while (scanner.hasNext()) {
                parseLine(scanner.next());
                count++;
            }
            scanner.close();
             System.out.println(">> rows: " + count);
              long elapsedTime = System.currentTimeMillis() - starttime;
        System.out.println(">>> elapsed time: " + elapsedTime);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    private static void parseLine(String line) {
        Scanner lineScanner = new Scanner(line);
        lineScanner.useDelimiter("|");
        String idno = lineScanner.next();
        String flag = lineScanner.next();
        System.out.println(">> id: " + idno + " >> flag: " + flag);
    }

    public static void main(String[] args) {
          String fileName = "Id.dat";
     read2 r = new read2();
     r.readFile(fileName);
    }
}
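package whatever;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
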
public class Read {
    String DATE_FORMAT_NOW = "yyyy/MM/dd hh:mm:ss";
    SimpleDateFormat sdf = new SimpleDateFormat(DATE_FORMAT_NOW);
    long starttime = System.currentTimeMillis();
    String timeS = sdf.format(new Date(starttime));
    void readFile(String fileName) {
        System.out.println(">> time of exe: " + timeS + " milliseconds: " + starttime);
        BufferedReader br = null;
        try {
            br = new BufferedReader (new FileReader(fileName));
            int count = 0;
            String line = "";
            while ((line = br.readLine()) != null) {
                String[] lineA = line.split("\\|");
                System.out.println(">> id: " + lineA[0]+ " >> flag: " + lineA[1]);
                count++;
            }
            System.out.println(">> rows: " + count);
            System.out.println(">>> elapsed time: " + (System.currentTimeMillis() - starttime));
        } catch (FileNotFoundException fnfe) {
            fnfe.printStackTrace();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        } finally {
            if (br != null) try { br.close(); } catch (IOException ioe) {}
        }
    }

    public static void main(String[] args) {
        String fileName = "\\Id.dat";
        Read r = new Read();
        r.readFile(fileName);
    }
}

OMG!
THANK YOU masijade!!!
Time is down to 26 secs!

THANK YOU SO MUCH!
If it's not too much to ask, could you please tell me what I did wrong?
I see you are using split(); is that what increased the speed?

I feel like a right idiot that I didn't even think of it ... :(

Again, thank you!

Partly. Both StringTokenizer and Scanner (especially in this case, when reading a file) are not the fastest tools around. StringTokenizer, for one, is all but deprecated and shouldn't really be used, and there are more problems with it than just speed. Scanner is great for verifying data types before reading and for automatically converting the string to the data type you need as you read it, but with that comes a fairly large performance hit.
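
To illustrate the tokenizing point, here is a minimal sketch, assuming every line has the simple ID|flag shape from the question. The names LineSplit and splitRecord are made up for this example. It cuts the line at the first '|' with indexOf and substring, so no tokenizer or regex is involved; line.split("\\|") would do the same job.

```java
final class LineSplit {
    // Splits "id|flag" at the first '|'.
    // Returns {id, flag}; flag is "" when the line has no delimiter.
    static String[] splitRecord(String line) {
        int bar = line.indexOf('|');
        if (bar < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, bar), line.substring(bar + 1) };
    }

    public static void main(String[] args) {
        // Sample row taken from the data shown in the question.
        String[] rec = splitRecord("4.37777451|y");
        System.out.println(">> id: " + rec[0] + " >> flag: " + rec[1]);
    }
}
```

Whether this is measurably faster than split() on a file of this size is something you would have to time yourself; the main gain is avoiding Scanner and StringTokenizer.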

The other part, though, was that both the Scanner and the FileInputStream (with its buffer of only 1024 bytes) were making far too many disk reads. A BufferedReader or BufferedInputStream fills its buffer with as few disk calls as possible (usually one, if there are no problems, and the buffer is usually at least 8k), and your read calls are then made against this buffer (with readLine automatically removing line endings).
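
As a rough sketch of the buffering point, the reader below is wrapped in a BufferedReader with an explicit buffer size. The class name BufferedRead, the method names, and the 64k figure are assumptions chosen for illustration, not tuned values; BufferedReader's default is 8192 chars. It uses try-with-resources, so it needs Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class BufferedRead {
    // Hypothetical buffer size, picked only for the example.
    static final int BUF_CHARS = 64 * 1024;

    // Reads every non-empty line and splits it into at most two fields at '|'.
    // The header line (ID|flag) is returned like any other row.
    static List<String[]> readRecords(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        // try-with-resources closes the reader even if readLine throws.
        try (BufferedReader br = new BufferedReader(source, BUF_CHARS)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (line.isEmpty()) {
                    continue;
                }
                rows.add(line.split("\\|", 2));
            }
        }
        return rows;
    }

    // Convenience wrapper for a file on disk.
    static List<String[]> readFile(String fileName) throws IOException {
        return readRecords(new FileReader(fileName));
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for Id.dat so the sketch runs without a file.
        List<String[]> rows = readRecords(new StringReader("ID|flag\n1|y\n2|y\n"));
        System.out.println(">> rows: " + rows.size());
    }
}
```

The parsing method takes a Reader rather than a file name so it can be tried against a StringReader first; readFile shows the same call against a real file.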


Thank you so much for your help!

I knew that the stringbuffer wasn't the best way to go, but since I have a tight deadline I just went with what I knew to be the quickest to implement.
I usually only work with XML files, and I use JDOM there. Working with legacy output has been a bit of a strain for the past 2 days.

Thank you so much for your help, I really appreciate it.
regards,
Miyuki
