I'm writing an application to catalog a very large number of files (2 million+) on a shared drive. For each file in the directory tree, I add a record to a MySQL database. The drive is active, in that files are regularly added, so the app will also need to update the database periodically.
The table is initially set up with an auto-incremented integer field as the primary key.
My question is: when the database needs to be updated, what is the most efficient way to determine whether a file record already exists before inserting it? I'd rather not run a separate lookup query for every file encountered just to see if it has already been loaded. Is there a better option for the primary key (such as the full path name of the file)?
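To make the idea concrete, here is a rough sketch of the kind of thing I have in mind: keep the integer key, put a unique constraint on the full path, and let the database skip rows that already exist. It uses Python's built-in sqlite3 module only so that it runs standalone. The table name `files`, the column `path`, and the function `catalog_tree` are made up for illustration, and I'm assuming the MySQL version would use `INSERT IGNORE` in place of SQLite's `INSERT OR IGNORE`.

```python
import os
import sqlite3


def catalog_tree(conn, root):
    """Walk root, insert one row per file, and return how many rows were new.

    Hypothetical sketch: SQLite stands in for MySQL here, and the table and
    column names are assumptions, not an existing schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " path TEXT NOT NULL UNIQUE)"
    )
    # Lazily yield one-element tuples so 2 million paths are never held in memory.
    paths = (
        (os.path.join(dirpath, name),)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    )
    before = conn.total_changes
    # The UNIQUE constraint makes the database drop duplicates itself,
    # so no SELECT is issued per file.
    conn.executemany("INSERT OR IGNORE INTO files (path) VALUES (?)", paths)
    conn.commit()
    return conn.total_changes - before
```

Running `catalog_tree(conn, root)` a second time on an unchanged tree should insert nothing. One thing I'm unsure about is that MySQL limits the length of an indexed key, so a unique index on a very long path column might need to be replaced by a fixed-length hash of the path.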
Thanks,
Vic