I want to develop an ETL application for a data warehouse project (a decision support system) for a university.
I am extracting data from Excel, Access, and CSV files. The extraction is done, but now the problem is the transformation of the data: I have to clean and standardize it. Is there an algorithm that can find the number of occurrences (or the probability) of a name, make the most common spelling the standard, and replace the other spellings of that name with the standard?
For example:
Malik, Maliq, Malique, Mallick, Malick, Malicq, Malik
Here Malik occurs more often than the other spellings.
I want to find the probability of Malik, make it the standard, and replace the other spellings of the same name with the standard, MALIK.
Please help me with how to do this.
I am working in C++ using .NET 2008.
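
To make the idea concrete, here is a rough sketch of the kind of thing I have in mind, written in plain standard C++ rather than C++/CLI. It rests on several assumptions of mine: Levenshtein edit distance is used as the similarity measure, the most frequent spelling in a group becomes the standard, and two spellings count as the same name when they are within `maxDistance` edits of each other. The function names (`toLower`, `editDistance`, `buildStandardMap`) and the threshold are hypothetical, and the threshold would certainly need tuning on real data.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Lower-case copy, so that "MALIK" and "Malik" compare as equal.
std::string toLower(const std::string& s) {
    std::string out(s);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(out[i])));
    return out;
}

// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions and substitutions that turn a into b.
std::size_t editDistance(const std::string& a, const std::string& b) {
    std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= a.size(); ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
            cur[j] = std::min(subst, std::min(prev[j], cur[j - 1]) + 1);
        }
        prev.swap(cur);  // the row just computed becomes the previous row
    }
    return prev[b.size()];
}

typedef std::pair<std::string, std::size_t> SpellingCount;

// Order by descending frequency, then alphabetically for a repeatable result.
bool moreFrequent(const SpellingCount& x, const SpellingCount& y) {
    if (x.second != y.second) return x.second > y.second;
    return x.first < y.first;
}

// Maps every distinct spelling in `names` to a standard spelling.
// Assumption: spellings within `maxDistance` edits are the same name.
std::map<std::string, std::string> buildStandardMap(
        const std::vector<std::string>& names, std::size_t maxDistance) {
    // Count how often each exact spelling occurs.
    std::map<std::string, std::size_t> freq;
    for (std::size_t i = 0; i < names.size(); ++i) ++freq[names[i]];

    // Visit the most frequent spellings first, so they become the standards.
    std::vector<SpellingCount> ordered(freq.begin(), freq.end());
    std::sort(ordered.begin(), ordered.end(), moreFrequent);

    std::vector<std::string> standards;
    std::map<std::string, std::string> result;
    for (std::size_t i = 0; i < ordered.size(); ++i) {
        const std::string& name = ordered[i].first;
        bool found = false;
        std::string best;
        std::size_t bestDist = maxDistance + 1;
        // Pick the closest existing standard; ties go to the more frequent one.
        for (std::size_t k = 0; k < standards.size(); ++k) {
            std::size_t d = editDistance(toLower(name), toLower(standards[k]));
            if (d < bestDist) { bestDist = d; best = standards[k]; found = true; }
        }
        if (!found) { best = name; standards.push_back(name); }
        result[name] = best;
    }
    return result;
}
```

With the sample list above and a threshold of 3, every variant maps to Malik, because Malik occurs twice and the farthest variant (Malique) is 3 edits away. The replacement step is then a lookup of each extracted value in the returned map. A threshold this loose can also merge genuinely different short names, so it is only a starting point.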