Hi all, I'm writing a simple encryption algorithm and have stumbled upon a problem relating to the file handling itself.

My program reads text from a file, encrypts it, and writes it to another file. That works fine until a large file is used for input. I tested it with a 30MB file and a 300MB file, and got an 'out of memory error - no space on heap' or something like that.

I imagine this is caused by the way I'm handling the file.

To solve it, I decided to use a BufferedReader to read a character at a time using the 'read()' method. After the character is read, it is encrypted and then written to a file using the BufferedWriter's 'write()' method.

This method works for large files (I tested it on a 300MB file; it took around 2 minutes to finish, but it worked).

My main concern is that, done this way, the hard disk is constantly being accessed for every single character - I'm sure it's highly inefficient.

Can anyone suggest something I can do to improve efficiency? Please note that when I used 'readLine()' instead of 'read()', I got the Java heap error, so I'm guessing I have to read a character at a time.

Also, I cannot post code as this is for an assignment - just some guidelines/suggestions would suffice and would be greatly appreciated :)

Query also posted here: http://www.java-forums.org/new-java/36531-best-way-solve-encryption-file-handling-problem.html#post165491

How about using BufferedOutputStream? You could build a byte array of whatever size you want and then write it out with BufferedOutputStream each time. For example, if your byte array is 4 KB, you would write out 4 KB each time instead of 1 character at a time.
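A minimal sketch of that byte-block idea, for illustration only. The class name `ByteBlockCopy` and the XOR-based `encryptByte` are hypothetical placeholders (the thread never shows the real cipher), and in-memory streams stand in for the files so the example is self-contained.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ByteBlockCopy {

    // Hypothetical placeholder cipher: XOR each byte with a fixed key.
    static byte encryptByte(byte b) {
        return (byte) (b ^ 0x5A);
    }

    // Reads up to 4 KB at a time, encrypts the block in place, then writes it out.
    static void process(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4 * 1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            for (int i = 0; i < len; i++) {
                buf[i] = encryptByte(buf[i]);
            }
            out.write(buf, 0, len); // write only the bytes actually read
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes("UTF-8");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // In a real program these would wrap FileInputStream / FileOutputStream.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
             OutputStream out = new BufferedOutputStream(sink)) {
            process(in, out);
        }
        System.out.println(sink.size() + " bytes written");
    }
}
```

Because XOR with a fixed key is its own inverse, running `process` over the output would recover the input; a real cipher would need a separate decrypt step.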

Thanks for the reply. Actually it has to use the BufferedReader/Writer strictly.

This is what I thought of doing:

while (a character is read from file and buffered via the BufferedReader) {

encrypt the character

write the character to file

}

What do you think?

I just looked at the API: you can read and write using a char array. Adapting this to your approach, you could read into a char array of a certain size, encrypt all the chars in the array, and then write the whole block out to the file. Then reuse the same array and repeat. Would that work better for you to reduce the hard disk access?

My main concern is that, done this way, the hard disk is constantly being accessed for every single character - I'm sure it's highly inefficient.

Yes, it sure is.

Please note that when I used 'readLine()' instead of 'read()', I got the Java heap error, so I'm guessing I have to read a character at a time.

Something's fishy here. The default buffer size of Buffered streams is AFAIK 8KB. Plus if you are looping over the input stream reading the bytes, encrypting them and writing them to a file, garbage collection should ensure that the bytes previously read are collected before throwing an OOME. Are you sure you are not keeping references to previously read data?

Actually it has to use the BufferedReader/Writer strictly.

Again fishy. The encryption algorithm doesn't know/shouldn't know the kind of content it is encrypting, and hence it should have been BufferedInputStream and BufferedOutputStream instead of their Reader/Writer counterparts, unless you are using some sort of special encryption algorithm which operates strictly on textual data. :-)

Yes, it's purely textual data - just for an assignment ;) With the current code it takes around 8 minutes to encrypt a 600 MB text file using a simple substitution cipher. What do you think, is that decent?

Nope; because for a simple substitution cipher like Caesar cipher, I can encrypt a 600 MB file in 40 seconds. :-)

Are you still reading the file character by character since that would explain your timings?

I think you are getting an OutOfMemoryError when using readLine() because your sample 600 MB file does not contain any newlines, and hence the Reader tries to read the entire file content into a single String. The most effective solution here would be to use FileReader/FileWriter and implement your own buffering (a 32 KB buffer would be a good one).

This is what I'm doing:

while ((inp.read()) != -1) { 

encrypt;
writeToFile;

}

Any thoughts?

Is that a single character being read? If yes, that method is painfully slow. As already mentioned, if readLine() throws an OOME, it is possible that your entire file contains a single line. In that case, just use the read(char[]) method to read a fixed number of characters at a time rather than an entire line. An 8K char buffer would be a good start.

Thanks for the reply. How exactly do I set it to read 8K though?

A sample snippet:

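import java.io.Reader;
import java.io.Writer;

private void process(final Reader reader, final Writer writer) {
    try {
        // reusable 8K-char buffer; each read() overwrites it from index 0
        final char[] cbuf = new char[8 * 1024];
        int len = -1;
        // read() returns the number of chars read, or -1 at end of stream
        while((len = reader.read(cbuf)) != -1) {
            // translate is your method which takes a string and translates/encrypts it
            writer.write(translate(new String(cbuf, 0, len)));
        }
    } catch(final Exception e) {
        throw new RuntimeException(e);
    }
}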

Given that the buffering is done by the method, you need not even use a Buffered reader/writer; a File reader/writer should suffice.
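The snippet above calls a `translate` method that the thread never shows. As a rough sketch only, assuming the cipher is a Caesar shift of 3 over ASCII letters (the class name `CaesarTranslate` and the `SHIFT` constant are made up for illustration; the actual assignment cipher may differ), it could look like this:

```java
public class CaesarTranslate {

    // Assumed shift amount; not taken from the thread.
    static final int SHIFT = 3;

    // Shifts ASCII letters by SHIFT positions, wrapping around the alphabet.
    // All other characters (digits, punctuation, whitespace) pass through unchanged.
    static String translate(String text) {
        char[] out = text.toCharArray();
        for (int i = 0; i < out.length; i++) {
            char c = out[i];
            if (c >= 'a' && c <= 'z') {
                out[i] = (char) ('a' + (c - 'a' + SHIFT) % 26);
            } else if (c >= 'A' && c <= 'Z') {
                out[i] = (char) ('A' + (c - 'A' + SHIFT) % 26);
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(translate("Hello, World!")); // prints "Khoor, Zruog!"
    }
}
```

Since each character is translated independently, it does not matter where the 8K block boundaries fall in the input.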

I tried implementing something like that, using the read(char[], int, int) method, but something bizarre happened: the text was being duplicated, i.e. the reader seemed to read just the first 'x' characters and paste them into the output file over and over :/ My code was pretty similar to yours, except that I used read(cbuf, 0, 1000) instead of just read(cbuf). Shouldn't this read the next 1000 characters, place them in the array, and then, once they're written, overwrite the array starting from 0 again?

Oh wait, I think I know what's going on. I foolishly assumed that the char[] cbuf automatically flushes its contents when it's written to the file. I should have known better hehe.

The only solution I can see is to initialise the array after 'while((len = reader.read(cbuf)) != -1) {'. However, this causes an error, as the reader is unable to read its contents and place them in cbuf unless I initialise it before this block.

Any suggestions? :)

I'm not sure what your issue here is because the snippet I posted would work *out of the box* without any modifications as far as the reading and writing part is concerned. I'd recommend reading the Javadocs for the read() method and writing small snippets to understand how it actually works.

I've read the Javadocs, but they didn't mention whether the array is flushed or not. I'll just have to experiment, I suppose.

I'm not sure why you use the word "flush".

It goes like this: you create a char array (which initially contains all '\0' characters) and pass the array to the read method. This method fills the char array with the characters read and returns the number of characters (n) read. You then use the first n characters of that same array, which are the ones that have just been read. Rinse and repeat with the same array; the next read() call simply overwrites the old data; there is no flushing. Simple, no? :-)

Yes, I see what you mean :) The only thing is that at some point the array will not fill completely because the text runs out, i.e. the number of characters left to read is smaller than the size of the array. Hence, there would be 'old' values still contained in the array. I'm wondering how we can make sure those old characters are not there ;)

how we can make sure those old characters are not there

You don't need to; read my previous post again. read() returns an int which represents the number of characters read. So even when doing your last read if your buffer isn't full, it really doesn't matter since you know "which portion" of the array contains the newly read values. If you'll look at the original snippet which I posted, I use a String constructor which creates a String object based on the "valid slice" of the array using this same return value of read() method.
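To make the point about the return value concrete, here is a small sketch (not the poster's code; the class and method names are invented for the example). It copies a 10-character string through a 4-char buffer twice: once honouring the count returned by read(), and once ignoring it and writing the whole array every time. A StringReader stands in for the file so the example is self-contained.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class ReadReturnValue {

    // Correct: writes only the first len chars of each block.
    static String copyUsingLength(String input, int bufSize) throws IOException {
        Reader reader = new StringReader(input);
        Writer writer = new StringWriter();
        char[] cbuf = new char[bufSize];
        int len;
        while ((len = reader.read(cbuf)) != -1) {
            writer.write(cbuf, 0, len);
        }
        return writer.toString();
    }

    // Buggy: ignores len and writes the whole array, so leftovers from the
    // previous block leak into the output after a short read.
    static String copyIgnoringLength(String input, int bufSize) throws IOException {
        Reader reader = new StringReader(input);
        Writer writer = new StringWriter();
        char[] cbuf = new char[bufSize];
        while (reader.read(cbuf) != -1) {
            writer.write(cbuf);
        }
        return writer.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(copyUsingLength("abcdefghij", 4));    // prints "abcdefghij"
        System.out.println(copyIgnoringLength("abcdefghij", 4)); // prints "abcdefghijgh"
    }
}
```

The trailing "gh" in the second result is stale data from the previous block; nothing needs to be cleared as long as only the first len characters are used.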

The reason I'm asking all this is because I would like to fill the old characters with something else. So to do so, would I need to manually loop through the array from the last character read to the cbuf.length, and fill them myself?

Thanks so much for your assistance by the way :)

Are you talking about re-using the character array for something else? If not, you've got me all lost there; post some sample code/pseudocode as to what that *something else* is and what you are doing right now.

Something else is basically filling the old spaces with '#' characters.

Without the relevant code you are using plus a small sample of the input file, there is no way for me to assist you. Your best bet would be to debug your code in an IDE like Eclipse or NetBeans and find out exactly what is happening.

Unfortunately I can't post code as this is an assignment :/

Basically, what I'm trying to accomplish is that the reader reads a fixed number of characters at a time and places them in the array. However, when the last block of characters is read, they will not necessarily overwrite all elements of the array, as there may be fewer characters left than the size of the array.

Thus, the 'old characters' in the places of the array which were not overwritten by the last read are still there. I would like to overwrite them with '#'.

Do you see what I'm getting at? I know it's a bit hard to visualise without code, but we're not allowed to post any code :/

Thanks for the help :)

Look into the Arrays.fill method for filling a specific array range with a given character.

But are you sure you have to do this? I mean, the "remaining" places in the array would change based on the change in buffer size. Just make sure you have got the requirements right...
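A small sketch of how Arrays.fill could be used for this, assuming the goal really is to pad the unused tail of the buffer with '#' (the class and method names are hypothetical, and a StringReader stands in for the file). Note that it pads after any short read, which for a StringReader only happens on the last block; other readers may return fewer characters than requested before the end of the stream.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

public class PadLastBlock {

    // Reads the input in bufSize-char blocks and pads a short block with '#'
    // so that every block handed on is exactly bufSize chars long.
    static String readPadded(String input, int bufSize) throws IOException {
        Reader reader = new StringReader(input);
        StringBuilder out = new StringBuilder();
        char[] cbuf = new char[bufSize];
        int len;
        while ((len = reader.read(cbuf)) != -1) {
            if (len < cbuf.length) {
                // Overwrite the stale tail: indices len (inclusive) to cbuf.length (exclusive).
                Arrays.fill(cbuf, len, cbuf.length, '#');
            }
            out.append(cbuf); // the whole array is now valid data plus padding
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readPadded("abcdefghij", 4)); // prints "abcdefghij##"
    }
}
```

As the reply above points out, the amount of padding depends on the buffer size, so the output changes if the buffer size changes.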

I'll look into it - thanks :)
