Team LiB
Previous Section Next Section

Appendix D: Database Normalization and SQL Reference

The database tables used in this book were designed for simplicity's sake, to help you understand the basic interaction between PHP and MySQL. These are not "normalized" databases. "Normalization" is a word you'll hear a lot when you begin to create detailed database-driven applications, and it requires a different type of thought process—thinking in relational terms before seeing the relationships in front of you. In this appendix, you learn the basics of database normalization, along with some key elements of the SQL language.

Understanding Database Normalization

Database normalization is essentially a set of rules that allows you to organize your database in such a way that your tables are related, where appropriate, and flexible for future growth and relationships. The sets of rules used in normalization are called normal forms. If your database design follows the first set of rules, it's considered in the first normal form. If the first three sets of rules of normalization are followed, your database is said to be in the third normal form. This appendix goes through the normal forms, using the concept of students and courses in a school, and shows you how to normalize the my_contacts table used previously in the book.

Applying the Normal Forms

Before explaining the first normal form, let's start with something that needs to be normalized. In the case of a database, a flat table is a prime example of something needing to be normalized. A flat table is like a spreadsheet with many columns of data. In a flat table, there are no relationships between multiple tables, as all the data you could possibly want is right there in that single flat table. This scenario is not the most efficient design and will consume more physical space on your hard drive than a set of normalized database tables.

Suppose you have a table that holds student and course information for a school. You might have the following fields in your flat table, as shown in Table D.1.

Table D.1: The Student and Courses Table

Field Name

Description

StudentName

Name of the student

CourseID1

ID of the first course taken by the student

CourseDescription1

Description of the first course taken by the student

CourseInstructor1

Instructor of the first course taken by the student

CourseID2

ID of the second course taken by the student

CourseDescription2

Description of the second course taken by the student

CourseInstructor2

Instructor of the second course taken by the student

You might then repeat CourseID, CourseDescription, and CourseInstructor columns many more times to account for all the classes a student can take during their academic career. Although redundant, this is the method used when creating a single flat table to store information. Eliminating this redundancy is the first step in database normalization, so next you'll take this flat table to first normal form. If your table remained in its flat format, you could have a lot of unclaimed space and a lot of space being used unnecessarily—not an efficient table design!

Taking a Table to First Normal Form

The main rules for the first normal form are as follows:

  • Eliminate repeating information

  • Create separate tables for related data

Looking at the flat table design, with its many repeated sets of fields for students and courses, you can identify students and courses as its two distinct topics. Taking your student and courses flat table to the first normal form would mean that you create two tables: one for students (call it students) and one for students plus courses (call it students_courses). You can see the new table designs in Tables D.2 and D.3.

Table D.2: The students Table

Field Name

Description

StudentID

A unique ID for the student. This new field is now a primary key.

StudentName

Name of the student.

Table D.3: The students_courses Table

Field Name

Description

StudentID

Unique ID of the student, matching an entry in the students table.

CourseID

ID of the course being taken by the student.

CourseDescription

Description of the course taken by the student.

CourseInstructor

Instructor of the course taken by the student.

Your two new tables now represent a one-to-many relationship of one student to many courses. Students can take as many courses as they want, and are not limited to the number of CourseID/CourseDescription/CourseInstructor groupings that exist in the flat table.

You still have some work to do, and the next step is to put these tables into second normal form.

Taking Tables to Second Normal Form

The basic rule for the second normal form is

  • No non-key attributes depend on a portion of the primary key

In plain English, this means that if fields in your table are not entirely related to a primary key, you have to keep working on them. In the students and courses example, this means breaking out the courses into their own table so that the original flat table is now just a table full of unique students.

CourseID, CourseDesc, and CourseInstructor can become a table called courses with a primary key of CourseID. The students_courses table should then just contain two fields: StudentID and CourseID. You can see the new table designs in Tables D.4 and D.5.

Table D.4: The courses Table

Field Name

Description

CourseID

Unique ID of the course

CourseDescription

Description of the course

CourseInstructor

Instructor of the course

Table D.5: The New students_courses Table

Field Name

Description

StudentID

Unique ID of the student, matching an entry in the students table.

CourseID

Unique ID of the course being taken, matching an entry in the courses table.

Believe it or not, you can go even further with this example, to the third normal form.

Taking Tables to Third Normal Form

The rule for the third normal form is

  • No attributes depend on other non-key attributes

This rule simply means that you need to look at your tables and determine whether more fields exist that can be broken down further and that aren't dependent on a key. Think about removing repeated data and you'll find your answer—instructors. Usually, an instructor will teach more than one class. However, the CourseInstructor field in the courses table is not a key of any sort. So if you break out this information and create a separate table purely for the sake of efficiency and maintenance, that's the third normal form. Take a look at the new courses table, and the instructors table, in Tables D.6 and D.7.

Table D.6: The courses Table

Field Name

Description

CourseID

Unique ID of a course

CourseDescription

Description of the course

CourseInstructorID

ID of the instructor, matching an entry in the instructors table

Table D.7: The instructors Table

Field Name

Description

InstructorID

Unique ID of an instructor

InstructorName

Name of the instructor

InstructorNotes

Any notes regarding the instructor

The third normal form is usually adequate for removing redundancy and allowing for flexibility and growth. Next, you normalize the my_contacts table, used previously in this book.

Normalizing the my_contacts Table

In the original my_contacts table, there's not a lot of repeating information, but there very easily could be if you expanded this to be an actual address book. In an address book, people usually have contact information for home and work, multiple phone methods (land line, cell phone, and so on), and even multiple e-mail addresses. It would make much more sense to break all of those elements out into separate tables and attach the information to people through a primary key. Table D.8 shows the original my_contacts table as reference.

Table D.8: The Original my_contacts Table

Field Name

Description

id

Creates a unique ID number for the entry

f_name

The person's first name

l_name

The person's last name

address1

First line of the address

address2

Second line of the address

address3

Third line of the address

postcode

ZIP or postal code

country

Country in which the person resides

prim_tel

Primary telephone number

sec_tel

Secondary telephone number

email

E-mail address

birthday

The person's birthday

Now identify the different areas for which different tables will exist: address, phone, and e-mail are adequate for this example. Tables D.9, D.10, and D.11 show the fields for these new tables.

Table D.9: Fields for the address Table

Field Name

Description

id

Creates a unique ID number for the address entry

contact_id

ID corresponding to a person in the master contact table

address1

First line of the address

address2

Second line of the address

address3

Third line of the address

postcode

ZIP or postal code

address_type

Type of address, such as home, work, or other

Table D.10: Fields for the phone Table

Field Name

Description

id

Creates a unique ID number for the phone entry

contact_id

ID corresponding to a person in the master contact table

phone_number

Phone number

phone_type

Type of phone number, such as home, work, cell, or fax

Table D.11: Fields for the email Table

Field Name

Description

id

Creates a unique ID number for the e-mail entry

contact_id

ID corresponding to a person in the master contact table

country

Country in which the person resides

email

E-mail address

email_type

Type of e-mail address, such as home or work

These new tables all contain the contact_id key, which corresponds to an entry in the new master contact table. The basic my_contacts table, used as the master contact table, should now look something like Table D.12.

Table D.12: The New my_contacts Table

Field Name

Description

id

Creates a unique ID number for the entry

f_name

Person's first name

l_name

Person's last name

birthday

Person's birthday

With these new tables in place, you will have a much more flexible (and normalized!) set of tables for maintaining contact information.


Team LiB
Previous Section Next Section