Database Normalization Basics
If you've been working with databases for a while, chances are you've heard the term normalization. Perhaps someone's asked you "Is that database normalized?" or "Is that in BCNF?" All too often, the reply is "Uh, yeah." Normalization is often brushed aside as a luxury that only academics have time for. However, knowing the principles of normalization and applying them to your daily database design tasks really isn't all that complicated and it could drastically improve the performance of your DBMS.
In this article, we'll introduce the concept of normalization and take a brief look at the most common normal forms. Future articles will provide in-depth explorations of the normalization process.
#What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
#The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.
#First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
In this article, we'll introduce the concept of normalization and take a brief look at the most common normal forms. Future articles will provide in-depth explorations of the normalization process.
#What is Normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
#The Normal Forms
The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.
Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.
#First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
- Eliminate duplicative columns from the same table.
- Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
#Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
Second normal form (2NF) further addresses the concept of removing duplicative data:
- Meet all the requirements of the first normal form.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
- Create relationships between these new tables and their predecessors through the use of foreign keys.
#Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
Third normal form (3NF) goes one large step further:
- Meet all the requirements of the second normal form.
- Remove columns that are not dependent upon the primary key.
#Boyce-Codd Normal Form (BCNF or 3.5NF)
The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:
The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:
- Meet all the requirements of the third normal form.
- Every determinant must be a candidate key.
A candidate key is a combination of attributes that can be uniquely used to identify a database record without any extraneous data. Each table may have one or more candidate keys. One of these candidate keys is selected as the table primary key. (Mike Chapple)
A candidate key is a single column or group of columns that uniquely identify a row. You can have multiple candidate keys in a single table because it's possible to have multiple unique constraints. One of the candidate keys can be designated as the primary key for the table.
A candidate key is a single column or group of columns that uniquely identify a row. You can have multiple candidate keys in a single table because it's possible to have multiple unique constraints. One of the candidate keys can be designated as the primary key for the table.
A composite key is simply any key that is composed of more than one column. This means that when a candidate key has multiple columns in it, it is also a composite key. (Matt Nelson)
#Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
#Fourth Normal Form (4NF)
Finally, fourth normal form (4NF) has one additional requirement:
- Meet all the requirements of the third normal form.
- A relation is in 4NF if it has no multi-valued dependencies.
Multivalued dependencies occur when the presence of one or more rows in a table implies the presence of one or more other rows in that same table.
Examples:
For example, imagine a car company that manufactures many models of car, but always makes both red and blue colors of each model. If you have a table that contains the model name, color and year of each car the company manufactures, there is a multivalued dependency in that table. If there is a row for a certain model name and year in blue, there must also be a similar row corresponding to the red version of that same car.
For example, imagine a car company that manufactures many models of car, but always makes both red and blue colors of each model. If you have a table that contains the model name, color and year of each car the company manufactures, there is a multivalued dependency in that table. If there is a row for a certain model name and year in blue, there must also be a similar row corresponding to the red version of that same car.
Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.
#Some Good Reasons Not To Normalize
That said, there are some good reasons not to normalize your database. Let’s look at a few:
1. Joins are expensive.
Normalizing your database often involves creating lots of tables. In fact, you can easily wind up with what might seem like a simple query spanning five or ten tables. If you’ve ever tried doing a five-table join, you know that it works in principle, but its painstakingly slow in practice. If you’re building a web application that relies upon multiple-join queries against large tables, you might find yourself thinking: “If only this database wasn’t normalized!” When you hear that thought in your head, it’s a good time to consider denormalizing. If you can stick all of the data used by that query into a single table without really jeopardizing your data integrity, go for it! Be a rebel and denormalize your database. You won’t look back!
2. Normalized design is difficult.
If you’re working with a complex database schema, you’ll probably find yourself banging your head against the table over the complexity of normalization. As a simple rule of thumb, if you’ve been banging your head against the table for an hour or two trying to figure out how to move to the fourth normal form, you might be taking normalization too far. Step back and ask yourself if it’s really worth continuing.
3. Quick and dirty should be quick and dirty.
If you’re just developing a prototype, just do whatever works quickly. Really. It’s OK. Rapid application development is sometimes more important than elegant design. Just remember to go back and take a careful look at your design once you’re ready to move beyond the prototyping phase. The price you pay for a quick and dirty database design is that you might need to throw it away and start over when it’s time to build for production.
As a parting thought, I offer some words of caution. When you do choose to vary from the rules of normalization, you’ll need to be extra vigilant about the way you enforce database integrity. If you store redundant information, put triggers and other controls in place to make sure that information stays consistent.
Source 2 : About.Com - Mike Chapple - Candidate Key
No comments:
Post a Comment
Share Your Inspiration...