Hi.

This is the first Topic I've posted at (the cool) daniweb site so go easy on me.

I want to put the cities of England with all their districts in a database. I can think of the following two ways to do this:

1)

cities (cityID, cityName)
districts (districtID, cityID, districtName)

2)

cities (cityName)
london (districtID, districtName)
manchester (districtID, districtName)
liverpool (districtID, districtName)
...

If I go with the 1st way - the districts table will have a lot of records/rows. If I go with the 2nd - I'll have to create a table for each city. I have very little experience with databases so I'd want to ask you which way is better.

Any help would be fantastic.
Cheers.

The first approach is the logical thing for a database design. Having a lot of data is not a consideration while designing a database. The essence of a database design is to model generic entities which can assume several representable values.

In your case, you are trying to store cities and districts and model their relationship. Your problem statement defines two generic entities (usually identified by nouns) - cities and districts. London, Liverpool, Manchester are all city names each of which is a specific data row of the cities table. Data is related to some other data, like in your case the data in cities has to be related to data in the district itself.

Your second approach is not relational modelling as it has been defined in the book. What you are trying to do is some kind of a direct lookup where the data is linked to an entity (I would term that as a lookup). The interesting thing to note here is that if you have table by the city names, you would have several tables with the same logical design and each containing one row. You might realize that this is pure redundancy. You would be putting related things in unrelated entities when they belong to the same class of data.

Pardon my explanation because I haven't any database specific terms here. I just wrote whatever logically came to my mind. :D

Thank you aniseed for the detailed reply. I read some articles about database design these days. Now I can clearly agree the first approach is the logical thing for a database design. I have a problem however with the this first approach. I read about normal forms and I think the districts table may violate the 3NF since actually the districtName column is dependent on the cityID column in a way and as far as I understood - this is not allowed with the 3NF. I'd love to know what you think about that. Do you think the table really violates the 3NF?

Here's the table again:

districts (districtID, cityID, districtName)

And here's the best article I read about 3NF.

Yeah, it's not in 3NF. I thought it was obvious with the example on the Wikipedia link. It's similar to your own example.

cities (cityID, cityName)
districts (districtID, cityID, districtName)
These are the tables you have. The reason why you need further normalization is that you can have multiple cities within one district. With this table structure for districts, you will
1. store only one cityId associated with a district, in which case you are missing out other city-district relationships.
2. store the same districtName several times for each combination of districtId and cityId. This will mean redundant data. You only need to store the districtName in one place and you would be storing it in several places.

As per the third normal form, the districtName is an attribute which is not required to express the relationship between cities and districts. You can express the relation with only the cityId and districtId.

So, with 3NF, your tables will look like:
cities (cityId, cityName)
districts (districtId, districtName)
city_district (cityId, districtId) [<-- the mapping table (pardon my unimaginative nomenclature)]

Great reply aniseed. Thank you a lot for the time you spend to answer my questions.

I thought about the approach with a table like city_district (cityId, districtId). The thing is - if I use such table won't it be much harder to use the data? If I need for example to retrieve from the database all the districts in London. I don't have much experience with server languages but as far as I know such task would require much more code.

If I use:
districts (districtID, cityID, districtName) Then my query is simple - select * from districts where cityID = the ID of London

If I use:
city_district (cityId, districtId), I would have create an array with all the districtIDs that belong to London then find select these districts in the districts table.

Is this complication worth it? What do you think?

The complication is not always worth it. As per my experience, normalization does not always mean better performance. It is sometimes better to introduce redundancy to improve performance. But that's a totally different topic and quite a time consuming one to understand.

For simplicity, it is better to normalize your database structure in most of the cases. :)

Thank you again aniseed. While designing the database for this project I'm working on I often considered some of your advices. Highly apprecieate your help.

your many - many relationship assumes there can be many cities in a district and many districts in a city.

Is that the case? Correct me if I'm wrong but I don't believe it is. It seems that Districts are the largest geographical entity which contain cities.

So assuming that is the case your model should be:

districts (districtId, districtName)
cities (cityId, districtId, cityName)

Where cities refer to the distric in which they are located.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.