Hello,
I'm working on a website that will be based on user contributed data, submitted using a regular HTML form.
To simplify my question, let's say that there will be two fields in the form: "User Name" and "Country" (this is just an example, not the actual site).
There will be two tables in the database : "countries" and "users," with "users.country_id" being a foreign key to the "countries" table (one-to-many).
The initial database will be empty. Users from all over the world will submit their names and the countries they live in and eventually the "countries" table will get filled out with all of the country names in the world.
Since one country can have several alternative names, input like Chile, Chili, Chilli will generate 3 different records in the countries table, but in fact there is only one country. When I search for records from Chile, Chili and Chilli will not be included.
So my question is - what would be the best way to deal with a situation like this, with conditions such that the initial database is empty, no other resources are available and everything is based on user input?
How can I organize it in such way that Chile, Chili and Chilli would be treated as one country, with minimum manual interference.
What are the best practices when it comes to normalizing user submitted data and is there a scientific term for this? I'm sure this is a common problem.
Again, I used country names just to simplify my question, it can be anything that has possible different spellings.