I'm soliciting advice on performance improvements for computing the weak checksum over file segments (used in the rsync algorithm).
Here's what I have so far:
from hashlib import md5

def blockchecksums(instream, blocksize=4096):
    """Return rsync-style (weak, strong) checksums for each block."""
    weakhashes = []
    stronghashes = []
    # instream must be opened in binary mode: read() then returns bytes,
    # and the loop stops when it hits the empty-bytes sentinel b"".
    for chunk in iter(lambda: instream.read(blocksize), b""):
        a = b = 0
        length = len(chunk)
        # Iterating over bytes in Python 3 yields ints, so no ord() is needed.
        for n, byte in enumerate(chunk):
            a += byte
            b += (length - n) * byte
        # rsync keeps a and b modulo 2**16; masking stops a from
        # spilling into b's half of the packed 32-bit checksum.
        weakhashes.append(((b & 0xffff) << 16) | (a & 0xffff))
        stronghashes.append(md5(chunk).hexdigest())
    return weakhashes, stronghashes
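Note the b"" sentinel: the stream has to be opened in binary mode, or read() returns str and the loop never terminates. A quick usage sketch on an in-memory stream:

import io

weak, strong = blockchecksums(io.BytesIO(b"some sample data" * 1000))
print(weak[0], strong[0])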
I haven't had any luck speeding things up using itertools or built-in functions implemented in C (like any()).
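For concreteness, here is the shape of the itertools variant I've been experimenting with. It relies on the identity that b, the weighted sum of (length - n) * byte over the chunk, equals the sum of all prefix sums of the chunk, so both per-byte loops can be pushed down into C via sum() and itertools.accumulate (a sketch, not benchmarked):

from hashlib import md5
from itertools import accumulate

def blockchecksums_accumulate(instream, blocksize=4096):
    """Sketch: same output as blockchecksums(), but the per-byte loop
    is replaced by C-level builtins."""
    weakhashes = []
    stronghashes = []
    for chunk in iter(lambda: instream.read(blocksize), b""):
        a = sum(chunk)              # sum of all bytes
        b = sum(accumulate(chunk))  # sum of prefix sums == sum((length - n) * byte)
        weakhashes.append(((b & 0xffff) << 16) | (a & 0xffff))
        stronghashes.append(md5(chunk).hexdigest())
    return weakhashes, stronghashes

Separately, if exact compatibility with rsync's weak checksum isn't required, zlib.adler32 is a C-implemented checksum with similar rolling properties that may be worth comparing against.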