Database Tip: Use an Artificial Primary Key
Monday, 19 November 2007
If your data has a clear "master" field or combination of fields, which can uniquely identify each row (such as customer name in
a table of customers or ISBN for a table of books), it is tempting to use that as the primary key of the table. However, my advice
is: don't do that, use a separate, numeric, artificial primary key instead. Yes, it's an extra column on the table,
requiring extra space, and you will have to generate it somehow, but that's not a big deal. Every database vendor provides some way
of auto-generating unique key values (e.g. SEQUENCE
s in Oracle, and AUTOINCREMENT
fields in SQLite), so
populating it is easy, and the complications it saves are more than worth the trade-off. You can still maintain the uniqueness of
the master columns by applying a unique index to those columns.
Save Space on Foreign Keys
Firstly, if you have any tables that are associated with the master table, and therefore have foreign key columns that refer to
the primary key of your master, then having a separate primary key can actually save space overall, as the data for the
master columns doesn't have to be duplicated across all the linked tables. This is especially important if there is more than one
"master column", such as customer_first_name
and customer_last_name
, or if the data for these columns is
large.
Changing the master data
Secondly, if the "master columns" are actually the primary key of your table, changing the data in them is potentially
problematic, especially if they are used as a foreign key in other tables. Many online services use a customer's email address as
their "master column": each customer has one email address, and one email address refers to one customer. That's fine until a
customer changes their email address. Obviously, you don't want to lose all data associated with a given customer just because they
changed their email address, so you need to update the row rather than delete the old one and insert a new one. If the
email_address
column is the primary key of the table, and therefore used as the foreign key in other tables, then
you've got to update the data not just in the master table, but in each dependent table too.
This is not impossible, but it's certainly more complex and time consuming. If you miss a table, the transaction may not complete
due to foreign key constraint violations, or (worse) the transaction may complete, but some of the data may be orphaned. Also, in
some database engines, the constraint violation will fire when you change either the master table or the dependent table, so you
need to execute a special SQL statement to defer the constraint checking until COMMIT
time. If you use an auto-generated
primary key, then only the data in the master table needs changing.
Changing the master columns
Finally, if the primary key is auto-generated, then not only is it easy to change the data in the master columns, but you
can actually change what the master columns are. For example, if you initially decide that customer_first_name
and customer_last_name
make an ideal primary key, then you're stuck if you then get another customer with the same
name. OK, so you make customer_first_name
, customer_last_name
and customer_address
the
primary key. Oops — now you've got to duplicate the address information across all the dependent tables. Now you encounter two
people with the same name at the same address (e.g. father and son), so you need to add a new designator to the key (e.g. Henry
Jones Junior, Richard Wilkins III). Again, you need to update all the dependent tables. If the primary key is
auto-generated, there's no problem — just update the unique constraint on the master table to include the appropriate columns,
and all is well, with the minimum of fuss.
Simplify your code
It's not going to simplify your code much, but using an auto-generated numeric key means that this is all you need to store as an identifier inside your program to refer to a particular row: much easier than storing the data from a combination of columns. Also, it's much easier to write code to update the data on one table than on multiple tables.
Conclusion
Don't use real table data as the primary key for a table: instead, use a separate, numeric, auto-generated column as the primary key. This will simplify the connections between tables, and make your life easier if the structure of the database or the data in the key columns changes.
Related Posts
In previous posts on Database Design, I've talked about:
- Using parameterized queries,
- Creating appropriate indexes,
- Using transactions, and
- Delaying the use of a database
Posted by Anthony Williams
[/ database /] permanent link
Tags: database, primary key, foreign key
Stumble It! | Submit to Reddit | Submit to DZone
If you liked this post, why not subscribe to the RSS feed or Follow me on Twitter? You can also subscribe to this blog by email using the form on the left.
Design and Content Copyright © 2005-2024 Just Software Solutions Ltd. All rights reserved. | Privacy Policy
1 Comment
Hello friends, If you are searching for how to get free club penguin membership codes online within minute. You can also get free club penguin codes totally free of cost.