Hi,
I'm wondering how I could implement an efficient fuzzy-match algorithm in either C# or SQL. I was using Levenshtein distance on my test data (~10 records), but once I moved to a larger data set (~800 records) I realized this may not be the best way to go: it took approximately 4 minutes to run through the ~800 records to find one match. This is for a web form our clients will use, so we are looking for a response time of ~10 seconds or better.
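For context, a Levenshtein-based comparison of this kind can be sketched as below. This is only a minimal illustration, not the exact code used in the test above; the class and method names (`FuzzyText`, `Levenshtein`, `Similarity`) are hypothetical, and the normalization (trim plus upper-casing) is an assumption.

```csharp
using System;

public static class FuzzyText
{
    // Levenshtein edit distance using two rolling rows
    // instead of a full (n+1) x (m+1) matrix.
    public static int Levenshtein(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int insertOrDelete = Math.Min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.Min(insertOrDelete, prev[j - 1] + cost);
            }
            // Swap the rows so "prev" always holds the last completed row.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Similarity in [0, 1]: 1.0 means identical after normalization.
    public static double Similarity(string a, string b)
    {
        a = (a ?? string.Empty).Trim().ToUpperInvariant();
        b = (b ?? string.Empty).Trim().ToUpperInvariant();
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0) return 1.0;
        return 1.0 - (double)Levenshtein(a, b) / longest;
    }
}
```

Each comparison costs time proportional to the product of the two string lengths. With this scoring, `FuzzyText.Similarity("Saun", "Saunders")` works out to 0.5, because four of the eight characters are missing.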
I have to search through customer data and match on firstname, lastname, social, birthdate, and the first line of the address. I need to account for typos and missing values, so I can't just match on social. The database I am using currently has about 800 customers, but eventually this program will be used with larger databases (in the thousands).
I am looking for free tools that I may be able to use, and for records that match at 80% or greater. I have also tried Soundex, but I do not think it would be accurate enough for this. For example, I have a customer Jim Saunders; if I search for firstname=Jim, lastname=Saun, Jim Saunders does not come up. I need to account for incomplete names as well.
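To make the 80% requirement concrete, here is one rough way a record-level score could be computed. It is a sketch under assumptions, not an established method: the names `RecordMatcher`, `FieldScore`, `RecordScore` and `IsMatch` are hypothetical, every field is weighted equally, blank search fields are skipped, and a typed prefix counts as a full match for every field (which is probably too lenient for social or birthdate).

```csharp
using System;
using System.Collections.Generic;

public static class RecordMatcher
{
    // Hypothetical field scorer: 1.0 when the candidate equals or starts
    // with the typed value (case-insensitive), otherwise 0.0. A fuzzy
    // scorer such as a normalized edit distance could be swapped in here.
    public static double FieldScore(string query, string candidate)
    {
        string q = (query ?? string.Empty).Trim();
        string c = (candidate ?? string.Empty).Trim();
        if (q.Length == 0 || c.Length == 0) return 0.0;
        return c.StartsWith(q, StringComparison.OrdinalIgnoreCase) ? 1.0 : 0.0;
    }

    // Averages the scores of the fields the user actually filled in,
    // so a blank search field neither helps nor hurts the result.
    public static double RecordScore(
        IDictionary<string, string> query,
        IDictionary<string, string> candidate)
    {
        double total = 0.0;
        int compared = 0;
        foreach (KeyValuePair<string, string> field in query)
        {
            if (string.IsNullOrWhiteSpace(field.Value)) continue;
            string other;
            candidate.TryGetValue(field.Key, out other); // null if absent
            total += FieldScore(field.Value, other);
            compared++;
        }
        return compared == 0 ? 0.0 : total / compared;
    }

    // A record counts as a match when its score reaches the threshold.
    public static bool IsMatch(
        IDictionary<string, string> query,
        IDictionary<string, string> candidate,
        double threshold = 0.8)
    {
        return RecordScore(query, candidate) >= threshold;
    }
}
```

With this sketch, a search for firstname=Jim, lastname=Saun against a stored Jim Saunders scores 1.0, while firstname=Jim, lastname=Smith scores 0.5 and falls below the threshold.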
Thanks.
Edit: I just tried a different approach that may work...
I use Soundex again, but instead of matching the whole name, I take the first three characters of the first and last names, e.g. JIM SAU, and that matches. But is there a way to match numbers? I cannot use Soundex to match on birthdate or social.
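One possible direction for the numeric fields, shown only as a sketch: strip everything except digits and compare position by position. The names `NumericMatcher`, `DigitsOnly` and `DigitSimilarity` are hypothetical, and the sketch assumes both values are already in the same layout (for example, birthdates always stored and entered as yyyyMMdd). A dropped or extra digit shifts every later position, so values with different digit counts simply score 0.0 here.

```csharp
using System;
using System.Linq;

public static class NumericMatcher
{
    // Keeps only ASCII digits, so "123-45-6789" and "123 45 6789"
    // both become "123456789".
    public static string DigitsOnly(string value)
    {
        return new string((value ?? string.Empty)
            .Where(c => c >= '0' && c <= '9')
            .ToArray());
    }

    // Fraction of digit positions that agree, in [0, 1].
    // Returns 0.0 when either value has no digits or the digit counts differ.
    public static double DigitSimilarity(string a, string b)
    {
        string x = DigitsOnly(a);
        string y = DigitsOnly(b);
        if (x.Length == 0 || x.Length != y.Length) return 0.0;

        int same = 0;
        for (int i = 0; i < x.Length; i++)
        {
            if (x[i] == y[i]) same++;
        }
        return (double)same / x.Length;
    }
}
```

For a nine-digit social with two adjacent digits swapped, seven of nine positions agree, giving about 0.78. A single mistyped digit gives about 0.89.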