Hi,
I have a problem where I need to generate a hash for ~10^9 (1 billion) keys. I tried creating a hash function of my own, but after some research I realized that I can (or even should) use one of the existing functions.
I narrowed it down to SHA-1, which generates a 20-byte (160-bit) hash, so the range is 0 to 2^160 - 1 (about 1.5 x 10^48).
My storage allows me to store only a Java long, so I'm doing a simple mod of the SHA-1 hash to truncate it to the range of a long (64 bits; the signed maximum is 2^63 - 1, about 9.2 x 10^18).
Now what I'm worried about is that the mod would skew the distribution of the hash and increase the probability of a collision.
In my use case a collision is basically fatal. There is no way for me to recover from or handle a collision.
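For context, here is a rough estimate of the collision odds. It is only a sketch that assumes the truncated hash is perfectly uniform and uses the standard birthday approximation 1 - exp(-n(n-1)/(2m)); the class and method names are made up for illustration.

```java
/** Birthday-bound estimate of the collision probability for a uniform hash. */
final class CollisionEstimate {

    /**
     * Approximates P(at least one collision) as 1 - exp(-n(n-1) / (2m))
     * for n keys drawn uniformly from m possible hash values.
     */
    static double collisionProbability(double keys, double hashSpace) {
        double exponent = keys * (keys - 1) / (2.0 * hashSpace);
        // -expm1(-x) computes 1 - exp(-x) without losing precision for small x.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        double keys = 1e9;                    // about one billion keys
        double fullRange = Math.pow(2, 64);   // every 64-bit pattern is usable
        double modRange = 1e18;               // approximately the modulus 999999999999999999
        System.out.printf("2^64 values : %.4f%n", collisionProbability(keys, fullRange));
        System.out.printf("~10^18 values: %.4f%n", collisionProbability(keys, modRange));
    }
}
```

Under that uniformity assumption the estimate comes out at roughly 2.7% for a full 64-bit range and roughly 39% for a range of about 10^18, so the size of the output range matters at least as much as any skew introduced by the mod.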
Question:
How do I measure the distribution of a hash function? I have no idea how hash functions are evaluated. Are there any standard techniques or code I can reuse?
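One commonly described way to check uniformity is a chi-squared test over bucket counts: hash every key, count how many land in each of k buckets, and compare the counts with the expected n/k. The following is a minimal sketch of that idea, not a validated test suite; `DistributionCheck` and `chiSquared` are hypothetical names, and `String.hashCode()` is only a stand-in for the hash under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

final class DistributionCheck {

    /**
     * Pearson chi-squared statistic of hash values spread over `buckets` bins.
     * For a uniform hash the value should be close to buckets - 1
     * (the degrees of freedom); much larger values suggest skew.
     */
    static double chiSquared(Iterable<String> keys, ToLongFunction<String> hash, int buckets) {
        long[] counts = new long[buckets];
        long total = 0;
        for (String key : keys) {
            // Treat the hash as unsigned so negative longs still map to a valid bucket.
            int bucket = (int) Long.remainderUnsigned(hash.applyAsLong(key), buckets);
            counts[bucket]++;
            total++;
        }
        double expected = (double) total / buckets;
        double chi2 = 0.0;
        for (long count : counts) {
            double diff = count - expected;
            chi2 += diff * diff / expected;
        }
        return chi2;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            keys.add("key-" + i); // placeholder keys; real input would come from input.txt
        }
        // String.hashCode() is only a stand-in for the hash function being evaluated.
        System.out.println(chiSquared(keys, k -> (long) k.hashCode(), 256));
    }
}
```

A test like this only says whether the values look uniform across buckets. It cannot show that a particular key set is collision-free; that would still require hashing the real keys and checking for duplicates.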
Thanks
PS: I could use a hash function that generates an 8-byte hash, but I'm using SHA-1 as it's supposed to be good.
Here is the implementation:
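package com.kash;

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Hash_SHA1 implements HashFunction {

    // Exclusive upper bound for the truncated hash; fits in a Java long.
    private static final BigInteger MODULUS = new BigInteger("999999999999999999");

    // Note: MessageDigest instances are not thread-safe.
    private final MessageDigest algo;

    public Hash_SHA1() {
        try {
            algo = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Fail fast instead of leaving algo null and hitting a NullPointerException in hash().
            throw new IllegalStateException("Hash_SHA1(): SHA-1 is not available", e);
        }
    }

    @Override
    public BigInteger hash(String s) {
        // SHA-1 generates a 20-byte hash; we can only deal with 8 bytes, so mod it.
        // An explicit charset keeps the result independent of the platform default.
        return new BigInteger(1, algo.digest(s.getBytes(StandardCharsets.UTF_8)))
                .remainder(MODULUS);
    }
}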
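An alternative to the modulus above is to keep the first 8 bytes of the digest directly, which uses the full 64-bit range of a long. This is only a sketch, assuming that any 8 bytes of a SHA-1 digest are as well distributed as the whole digest; `Sha1Truncate` and `hash64` are hypothetical names and not part of the attached project.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Sha1Truncate {

    /** Returns the first 8 bytes of the SHA-1 digest of s, read as a big-endian long. */
    static long hash64(String s) {
        try {
            // A fresh instance per call keeps this method safe to use from several threads.
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(s.getBytes(StandardCharsets.UTF_8));
            // getLong() consumes the first 8 of the 20 digest bytes.
            return ByteBuffer.wrap(digest).getLong();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(hash64("abc")));
    }
}
```

The result can be negative because a Java long is signed; that does not affect uniqueness, only how the value prints or sorts.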
Attached: exported Eclipse project with JUnit tests, input.txt, the interface, etc., for testing.