The database tables used in this book were designed for simplicity's sake, to help you understand the basic interaction between PHP and MySQL. These are not "normalized" databases. "Normalization" is a word you'll hear a lot when you begin to create detailed database-driven applications, and it requires a different type of thought process—thinking in relational terms before seeing the relationships in front of you. In this appendix, you learn the basics of database normalization, along with some key elements of the SQL language.
Database normalization is essentially a set of rules that allows you to organize your database in such a way that your tables are related, where appropriate, and flexible for future growth and relationships. The sets of rules used in normalization are called normal forms. If your database design follows the first set of rules, it's considered in the first normal form. If the first three sets of rules of normalization are followed, your database is said to be in the third normal form. This appendix goes through the normal forms, using the concept of students and courses in a school, and shows you how to normalize the my_contacts table used previously in the book.
Before explaining the first normal form, let's start with something that needs to be normalized. In the case of a database, a flat table is a prime example of something needing to be normalized. A flat table is like a spreadsheet with many columns of data. In a flat table, there are no relationships between multiple tables, as all the data you could possibly want is right there in that single flat table. This scenario is not the most efficient design and will consume more physical space on your hard drive than a set of normalized database tables.
Suppose you have a table that holds student and course information for a school. You might have the following fields in your flat table, as shown in Table D.1.
|
Field Name |
Description |
|---|---|
|
StudentName |
Name of the student |
|
CourseID1 |
ID of the first course taken by the student |
|
CourseDescription1 |
Description of the first course taken by the student |
|
CourseInstructor1 |
Instructor of the first course taken by the student |
|
CourseID2 |
ID of the second course taken by the student |
|
CourseDescription2 |
Description of the second course taken by the student |
|
CourseInstructor2 |
Instructor of the second course taken by the student |
You might then repeat CourseID, CourseDescription, and CourseInstructor columns many more times to account for all the classes a student can take during their academic career. Although redundant, this is the method used when creating a single flat table to store information. Eliminating this redundancy is the first step in database normalization, so next you'll take this flat table to first normal form. If your table remained in its flat format, you could have a lot of unclaimed space and a lot of space being used unnecessarily—not an efficient table design!
The main rules for the first normal form are as follows:
Eliminate repeating information
Create separate tables for related data
Looking at the flat table design, with its many repeated sets of fields for students and courses, you can identify students and courses as its two distinct topics. Taking your student and courses flat table to the first normal form would mean that you create two tables: one for students (call it students) and one for students plus courses (call it students_courses). You can see the new table designs in Tables D.2 and D.3.
|
Field Name |
Description |
|---|---|
|
StudentID |
A unique ID for the student. This new field is now a primary key. |
|
StudentName |
Name of the student. |
|
Field Name |
Description |
|---|---|
|
StudentID |
Unique ID of the student, matching an entry in the students table. |
|
CourseID |
ID of the course being taken by the student. |
|
CourseDescription |
Description of the course taken by the student. |
|
CourseInstructor |
Instructor of the course taken by the student. |
Your two new tables now represent a one-to-many relationship of one student to many courses. Students can take as many courses as they want, and are not limited to the number of CourseID/CourseDescription/CourseInstructor groupings that exist in the flat table.
You still have some work to do, and the next step is to put these tables into second normal form.
The basic rule for the second normal form is
No non-key attributes depend on a portion of the primary key
In plain English, this means that if fields in your table are not entirely related to a primary key, you have to keep working on them. In the students and courses example, this means breaking out the courses into their own table so that the original flat table is now just a table full of unique students.
CourseID, CourseDesc, and CourseInstructor can become a table called courses with a primary key of CourseID. The students_courses table should then just contain two fields: StudentID and CourseID. You can see the new table designs in Tables D.4 and D.5.
|
Field Name |
Description |
|---|---|
|
CourseID |
Unique ID of the course |
|
CourseDescription |
Description of the course |
|
CourseInstructor |
Instructor of the course |
|
Field Name |
Description |
|---|---|
|
StudentID |
Unique ID of the student, matching an entry in the students table. |
|
CourseID |
Unique ID of the course being taken, matching an entry in the courses table. |
Believe it or not, you can go even further with this example, to the third normal form.
The rule for the third normal form is
No attributes depend on other non-key attributes
This rule simply means that you need to look at your tables and determine whether more fields exist that can be broken down further and that aren't dependent on a key. Think about removing repeated data and you'll find your answer—instructors. Usually, an instructor will teach more than one class. However, the CourseInstructor field in the courses table is not a key of any sort. So if you break out this information and create a separate table purely for the sake of efficiency and maintenance, that's the third normal form. Take a look at the new courses table, and the instructors table, in Tables D.6 and D.7.
|
Field Name |
Description |
|---|---|
|
CourseID |
Unique ID of a course |
|
CourseDescription |
Description of the course |
|
CourseInstructorID |
ID of the instructor, matching an entry in the instructors table |
|
Field Name |
Description |
|---|---|
|
InstructorID |
Unique ID of an instructor |
|
InstructorName |
Name of the instructor |
|
InstructorNotes |
Any notes regarding the instructor |
The third normal form is usually adequate for removing redundancy and allowing for flexibility and growth. Next, you normalize the my_contacts table, used previously in this book.
In the original my_contacts table, there's not a lot of repeating information, but there very easily could be if you expanded this to be an actual address book. In an address book, people usually have contact information for home and work, multiple phone methods (land line, cell phone, and so on), and even multiple e-mail addresses. It would make much more sense to break all of those elements out into separate tables and attach the information to people through a primary key. Table D.8 shows the original my_contacts table as reference.
|
Field Name |
Description |
|---|---|
|
id |
Creates a unique ID number for the entry |
|
f_name |
The person's first name |
|
l_name |
The person's last name |
|
address1 |
First line of the address |
|
address2 |
Second line of the address |
|
address3 |
Third line of the address |
|
postcode |
ZIP or postal code |
|
country |
Country in which the person resides |
|
prim_tel |
Primary telephone number |
|
sec_tel |
Secondary telephone number |
|
|
E-mail address |
|
birthday |
The person's birthday |
Now identify the different areas for which different tables will exist: address, phone, and e-mail are adequate for this example. Tables D.9, D.10, and D.11 show the fields for these new tables.
|
Field Name |
Description |
|---|---|
|
id |
Creates a unique ID number for the address entry |
|
contact_id |
ID corresponding to a person in the master contact table |
|
address1 |
First line of the address |
|
address2 |
Second line of the address |
|
address3 |
Third line of the address |
|
postcode |
ZIP or postal code |
|
address_type |
Type of address, such as home, work, or other |
|
Field Name |
Description |
|---|---|
|
id |
Creates a unique ID number for the phone entry |
|
contact_id |
ID corresponding to a person in the master contact table |
|
phone_number |
Phone number |
|
phone_type |
Type of phone number, such as home, work, cell, or fax |
|
Field Name |
Description |
|---|---|
|
id |
Creates a unique ID number for the e-mail entry |
|
contact_id |
ID corresponding to a person in the master contact table |
|
country |
Country in which the person resides |
|
|
E-mail address |
|
email_type |
Type of e-mail address, such as home or work |
These new tables all contain the contact_id key, which corresponds to an entry in the new master contact table. The basic my_contacts table, used as the master contact table, should now look something like Table D.12.
|
Field Name |
Description |
|---|---|
|
id |
Creates a unique ID number for the entry |
|
f_name |
Person's first name |
|
l_name |
Person's last name |
|
birthday |
Person's birthday |
With these new tables in place, you will have a much more flexible (and normalized!) set of tables for maintaining contact information.