I'm currently working on a little project for a friend of mine. He runs a computer shop and needs a small application to register sales and print out a sales ticket. I've done this in Open Office and it works fine (hey, stop throwing things at me!). The problem is that I have to store sales in a database and be able to print out a sales list at the end of each month.

I have an idea about a database that holds 3 tables:
- customer details
- product details
- sale details

I believe the names are self-explanatory.

I have some problems designing the third table ("sale details"). A customer can buy an arbitrary number of different products and an arbitrary number of each of those at once. What would be the best way to design the table? My best guess would be to insert a new row for each type of product and to somehow connect it to other rows with a sale ID.

Hello mad pat

You need a further table, name it salesitems, where its primary key must be formed from salesID and productID because of many-to-many relationship. Both are also foreign keys. Here can you also place quantity, item price, total price etc. Ok, write the create-table statements for all four tables down, submit them and we will further discuss your assignment.

krs,
tesu

Of course! A fourth table is the solution! THX!!!

Well, it's been some time since I worked on databases, so I'm a bit rusty on SQL. I believe the following schema will clarify my goal.

First come the customer and product tables:

customer_details:
- ID, integer, auto increment, primary key
- name
- address

product_details
- ID, integer, auto increment, primary key
- name
- manufacturer
- category
- unit price

Then come the sales and sale_items tables. The sales table contains the customer ID, the date and a reference number for the items. The sale_items table has all individual items in it. It has the product ID and the sale ID.

sales
- ID, integer, auto increment, primary key
- customer (customer_details.ID)
- sale date
- total price, single
- items_ref, sales_items.groupSaleID, integer, unique

sales_items
- product, product_details.ID
- groupSaleID, integer
- amount, small int
- price 'in case of discounts

Hi,

sales
- ID, integer, auto increment, primary key
- customer (customer_details.ID)
- sale date
- total price, single
- items_ref, sales_items.groupSaleID, integer, unique

sales_items
- product, product_details.ID
- groupSaleID, integer
- amount, small int
- price 'in case of discounts

sales and salesitems must be improved: Primary key of salesitems is and only is: (productID, salesID) and nothing else! productID is the primary key of products (you named it ID there), salesID is primary key of sales (you named it ID there!)

I ll give you an example how to create the many-to-many-relationship table sales_items:

create table sales_items
(
   productID integer not null, salesID integer not null, 
   amount integer not null, price decimal (10,2) not null,
   primary key (productID, salesID);
   -- and most important:
   foreign key (productID) references products,
   foreign key (salesID) references sales
);

Because of foreign-key attributes, productID must be primary key of products, there you may name it ID or bla and vice versa. Also salesID must be primary key of sales, there you may name it ID or blabla or something else.

If your database engine is not able to handle foreign keys (for example mysql isamdb does not like them) then change engines, for example innodb can handle them.
You should never omit primary keys and foreign keys.

btw, whats your target database?

-----
tesu

OK, but what is the benefit of using a double primary key?

Currently I'm running tests with Open Office's internal database application Base, but the goal is to use a server that runs MySQL, since I'm familiar with it and I can use it for free.

Hi Mad Pat

I am back again. It's really great that you are using open-office db. You know it's hsqldb, that famous pure Java database system what follows official SQL standard nearly completely. You might also know that contrary to open-office's embedded solution this Java db can also run as server (I myself prefer its competitor Derby from Apache).

> Benefit of double primary key?

Well, there is no other way for mapping a many-to-many relationship into relational data model correctly. Also database system then assures integrity of data, so that must not be programmed in application programs explicitly.

Necessary (but not sufficient) for correct normalizations. Therefore, no abnormalities when inserting, updating or deleting rows.

Virtually all kinds of data grouping usually needed in application programs can be completely formulated at SQL level.

Functioning relational data models do economize programming expenses considerably. And don't forget rise of performance due to indexes automatically and correctly defined by database if primary and foreign keys are properly set up. Usually this cannot be achieved by surrogate keys.

Test it! As usual, design data model without composite primary keys or foreign keys, even do so for linking tables which implements many-to-many relationships. Best: replace all of them by surrogate keys having auto-increment attribute. Then try to get the total sales of your customers grouped by customer's name by a simple sql-select statement (no php, java, c++ code) like that one:

select c."name" as Customer, sum(quantity*s.salesprice) as "Total sales"
from salesitems s join sales join products join customers c
group by c."name" order by "Total sales" desc

(what really works on my simple sales data model (ok, I did some adaptation so it met your introduced data model).

krs,
tesu

> Benefit of double primary key?

Well, there is no other way for mapping a many-to-many relationship into relational data model correctly. Also database system then assures integrity of data, so that must not be programmed in application programs explicitly.

You're talking about "referential integrity", wright? Someone of the PostGreSQL community once said that MySQL has only very poor support for it.

But imagine the following situation:

I have a table "product details" with one off the columns holding information in regard to what category the product belongs to ("laptop", "hard disk", "screen", etc.). The application has two list boxes: one lists all available categories, and the other lists all products in that category. When the user selects a category all products in that category are being looked up in the database and displayed in the seconds list box.

Suppose a specific type of laptop is no longer for sale. If I would remove it from table "product_details", some rows in the "sales" table may refer to a no longer existing row within the table "product_details". I could add a column "available" to the table "product_details". This would however make the application having to check one more field while querying the database + it has to go through a lot of redundant data. This could slow down the applicaton considerably.

Hi Mad Pat

In the past all database engines of mysql didn't support any referential integrity by foreign keys. Today, only innodb is able to handle foreign keys correctly (of a kind). Yet my isam db allows definition of foreign keys BUT is not able to keep referential integrity! (Though this be madness, yet there is method in it! Therefore, you must always be using innodb engine to ensure referential integrity.)

No, a simple status field every time tested when selecting product data preselected by primary key of categories would never never slow down performance. Let me compare it with a very large application SAP R/3, oracle db, about 800 users on-line, I am involved with: Filling a combo-box with about 35000 rows of material data via a many-to-many linking table by same way I already described in prior post will last less than 5 seconds. Any idea, how long it will last if there is an additional mere status field in where clause?

Ok, less than 5,01 seconds or so, so lets turn back to your conception of linking table sales_items which implements a many-to-many relationships between two associate tables. To connect sales_details with product_details a third table sales_items had been introduced what has been given its own primary key named ID to assure "unique" rows. To do this efficiently almost all relational database systems (rdbms) automatically add an unique index on that primary key (except postgresql, what explicitly requires an unique constraint).

We are not done so far because we need two further columns to manage the many-to-many relationships: primary keys of product_details and sales_details must be added to our linking table firstly filling it with useful information. For following mainstream, we should not care referential integrity. Therefore,we generously leave off foreign-key constraints. But stop, what about performance? Shall we really allow to scan that linking table completely every time users select new sales_detail? No, that sounds too badly, so let us put a simple index on that column. (consider it's still the column what points to primary key of sales_detail.)

Coming here, did we really do enough? (Asking so) Possibly not, as everybody knows inner key joins are the fasted joins ever been invented (we really do need some joins to clue all those "fine" normalized tables together). To tell the machine that key joins were possible we shall put a further simple index on the column that points to primary key of product_details. We are finally done, really? No, what about the select statement to fill our combo box showing all items of a particular sales? Here is it:

select s.productID, p.name from sales_items s inner join product_details p 
  where s.salesID = salesID_the_user_selects_from_first_combobox

So we are able to admire all items of sales of salesID_the_user_selects_from_first _combobox.

But stop, what about that sole primary key ID??? We are missing it! Isn't it contain in above select statement? No, and unfortunately nobody has got any droll idea of any useful consideration of it. So we can drop it ! By dropping this ugly column, suddenly we got remember our both simple indexes once we put on the fields pointing to primary keys productID and salesID boosting performance. He, what, if we replace those plain index definitions by foreign key constraints? WOW !!! Suddenly, all ends well: FIRST, because we have spent foreign-key constraints on both columms, rdbms now knows that key join is possible what will boost performance dramatically. And secondly, not enough: rdbms is completely happy because it is now able to ensure REFERENTIAL INTEGRITY, really without any extra expense!

Did we have had enough of words? No, too badly, because of dropping ID from table sales_items it has lost its primary key. To keep a long story short: The pair (salesID, productID) is the only true candidate to be primary key of sales_items. And that is still the creating statement of sales_items:

create table sales_items
(
   salesID integer not null, productID integer not null, 
   quantity integer not null, salesprice decimal (10,2) not null,
   primary key (productID, salesID),
   foreign key (productID) references products on delete restrict,
   foreign key (salesID) references sales on delete restrict
);

I'll close here by citing famous Joe Celko:
"I call people who put a magical, universal "id" column with an auto-increment on all their tables "id-iots" and charge a few thousand dollars per day to fix the problems they cause, Ever run into Khablah numerology and the belief that God put a 17 digit Hebreew number on everything in creation? That His true name is a 211 digit number? Etc. What id-iots are doing is mim(i)cking a magnetic tape file and record numbers -- a physical locator, and not a relational concept at all."

To be honest: I would never name opponents to be id-iots !

krs,
tesu

THX again. I will study your explanation when I have time.

I know that the arbitrary use of a "ID column" on each table doesn't bring me much. My consideration for using in nonetheless was the following:
Suppose the shopkeeper just sold a number of items that were entered into the database and he sees he made a mistake (wrong item, wront amount of that item, wrong pricing, etc.), he would have to go back to correct it. I could list all items that belong to that saleID. But if he would want to make a change to the table sales_items, he would need to be able to clearly identify a specific row, wouldn't he?

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.