I'm thinking about writing a small command-line directory management program in Python. The most basic functionality I want is to compare the files in a directory and its subdirectories and, if there are any duplicates, delete the lowermost duplicate.
I already have a basic idea of how to accomplish that: hash both files and compare the hash values. If the hash values are equal, the files can be treated as identical for practical purposes. However, I'm not sure about the specifics. I know the data has to be read from each file and then hashed to do the comparison, but what is the best method for reading the data? I'll be dealing with large files, possibly upwards of 4 GB, so the program needs to be able to handle this.
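To make the question concrete, here is a rough sketch of the kind of thing I have in mind: read each file in fixed-size chunks and feed them to a hash object, so a 4 GB file never has to sit in memory all at once. The function name `file_digest`, the choice of SHA-256, and the 1 MiB chunk size are just my assumptions for illustration, not something I've settled on.

```python
import hashlib


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`.

    The file is read in chunks of `chunk_size` bytes (1 MiB here, an
    arbitrary assumption), so memory use stays small for large files.
    """
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it
        # returns b"", which signals end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Two files would then count as duplicates when `file_digest(a) == file_digest(b)`. Comparing file sizes first with `os.path.getsize` would presumably let me skip hashing files that cannot possibly match.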
What are some other good tools to include? I'm thinking about writing some file renaming tools as well.