I know everyone and his cat has one of these up but I'm just so stuck... I mean in theory I think I understand normalization and then I get down to it and I just... don't. I've been googling and looking through examples and I'm ready to tear my hair out. This is what I have thus far, any help your willing to give would be greatly appreciated


Relational Model

Authors(aID, aname, abio)
Books(ISBN, libref, title, summary)
Format(fid, type)
Section(sectionId, sectionName)
Cardholders(cardNum, cname)
Series(serID, title)
Status(sid, status)
Publishers(pID, name)
Checkout(cardNum, libref, date)
Return(cardNum, libref, date)
haveFormat(libref, fid)
areInSection(libref, sectionID)
areInSeries(libref, serID, volumeNum)
areAvailable(libref, sid)
authorsWrite(aid, libref)
booksPub(libref, pid, year)
authorsPub(aid, pid)
Quantity(ISBN, numOfBooks)
Normalization

According to 1NF the tables in a database should not contain repeating groups. The only tables in violation of this are the Books table and the areInSeries tables. Due to the fact that each title can come in a variety of formats, each format gets its own unique ISBN and library reference number. It would be better than to break down the Books table by Format.

booksHardcover(libref, ISBN, title, summary)
booksPaperback(libref, ISBN, title, summary)
booksAudio(libref, ISBN, title, summary)
booksTradePaperback(libref, ISBN, title, summary)
breaking the inSeries table down by format gets rid of the duplicates:
seriesPaperback(libref, serID, volumeNum)
seriesTradePaperback(libref, serID, volumeNum)
seriesAudio(libref, serID, volumeNum)
seriesHardcover(libref, serID, volumeNum)
BCNF
R = Authors(aid, aname, abio)
F = aname -> abio

***didn't copy my formatting but the first value in each is the key and was underlined in my word doc

Your understanding of what constitutes a repeating group is incorrect. Taking your Books example, you said "each title can come in a variety of formats, each format gets its own unique ISBN and library reference number". Therefore, it has a unique key. An actual repeating group would be more like this:
Books(title, ISBN1, libref1, ISBN2, libref2, ISBN3, libref3, summary)
So you can see that the columns ISBNx and librefx repeat 3 times.

what if I had something like this:

(12345, 123, Harry Potter and the Goblet of Fire, [summary])
(98765, 876, Harry Potter and the Goblet of Fire, [summary])

one being a hardcover edition and the other being a paperback? would violate?

ETA: my understanding of the entire concept of normalization is thin, I just can't get it

Yes, but not of 1st normal form.

(Harry Potter, 12345, 123, 98765, 876, [summary]) would be a violation of 1nf.

One way to paraphrase the 3 forms is: "Data depends on the key, the whole key, and nothing but the key, so help me Codd" So 1nf is satisfied (in your example) because the ISBN is a unique key. Second normal form is violated, because ISBN does not identify the Book itself only an edition (Title should not appear in multiple records). 3nf means that the key uniquely identifies all the data it covers, and all data covered is unique to that key. So, you could have a set of tables like:
Book (BookId [pk], Title, summary)
Edition (isbn [pk], BookId [fk], covertype [fk])
Cover (covertype [pk], covertypedesc)
LibraryBook (Libref [pk], isbn [fk]) would do the trick.

Hope this helps.

I think that helps, thanks. I'm unlikely to forget that little phrase I think, it made me laugh.

Now unless I'm totally mistaken the only other problem area is the Series table, which can be solved similar to the way you've provided above. I think everything else is all ready normalized?

Hard to tell, because none of the specs indicate what the key attributes are. You can handle that little omission by just putting [pk] or [fk] after the attributes. Keep in mind that if a pk is more than one attribute, you can put [pk1] [pk2] after the relevant attributes. That gives a hint about what order the attributes are considered.

Authors(aID, aname, abio)
Books(ISBN, libref, title, summary)
Format(fid, type)
Section(sectionId, sectionName)
Cardholders(cardNum, cname)
Series(serID, title)
Status(sid, status)
Publishers(pID, name)
Checkout(cardNum, libref, date)
Return(cardNum, libref, date)
haveFormat(libref, fid)
areInSection(libref, sectionID)
areInSeries(libref, serID, volumeNum)
areAvailable(libref, sid)
authorsWrite(aid, libref)
booksPub(libref, pid, year)
authorsPub(aid, pid)
Quantity(ISBN, numOfBooks)

Fixed with original formatting, since it didn't copy from Word right before

Hmm...where to start.

I could just give you the real normalization and let you compare, but 8 students out of 10 would just take that and turn it in. Not that I'm saying you're one of the 80%, I'm just sayin'.

Instead, let me give you some hints that will help you figure it out yourself.

1. Are there tables that have a compound primary key (more than one column in the key)? If so, is there any reason the key has to be compound? HINT: Look at the BOOKS table. Both ISBN and LIBREF are unique, right? So pick one or the other.
2. Are there tables that have keys that won't make their data unique enough? HINT: Look at AreInSeries. Ask yourself "Does the book identified by libref appear in more than one series?" You could also try to come up with examples. For Example, "The Fellowship of the Ring" is in the series "Lord Of The Rings". HINT2: The structure you have is correct.
3. Are there tables that have the same key? HINT: Look at Checkout and Return. If you are identifying one transaction (a checkout/return pair...no book can be returned without being checked out) then consider unifying the two tables like
Checkout(cardNum, libref, checkoutdate, checkindate)
4. Are there tables that don't appear to relate via key to any other table? HINT: Look at table Quantity. Could libref be the correct key for this?

Other than those few things, a very good effort.

thanks a lot, your tips are really more than enough I believe. The theory of normalization was just going over my head, so trying to take that technical jargon and whittle it down to something that made more sense was kinda hard. And actually a few of the things you've mentioned are all ready things I had noticed and made a note to change. The Quantity was a late addition to deal with the what if I had two copies with the same ISBN, they would have separate reference numbers in the library but the same ISBN because they are the same title and format. But I can probably change that once I've reworked some of this.

okay I have one last and perhaps stupid question and I'm really wishing I knew for my notes on normalization or like an old assignment cause they had good examples. However; I lack either. As part of my report on the design of this database I have to show that's its been normalized. I'm wondering if there's an 'official' process or procedure for proving this? This'll teach me to get rid of class notes before graduating.

Nope. Just go table by table, apply the rules and prove they are right.

Here's a little bitmap that might help you visualize. Please pay special attention to Quantity (maybe unify with Books?) and areAvailable (one book has many statusses...but when?)

Okay hi again, I have one last request, if you don't mind.

I've been thinking how best to organize this section for my report and I think the best way is to take each table and show how it complies or violates each form and I was wondering if you might take one of the tables there and provide an example for me. I believe with that I can finally put this to rest and get on with the rest of my design.

Thanks again

Just go with the simplest:
Table: Authors
Desc: List of people who write books.
Violations: none

Table: Quantity
Desc: Number of volumes of this book.
Violations: the key is the same as the parent's key.

etc., etc.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.