Building Your Data Tables

Once you have designed the data according to the rules of normalization, you are ready to actually build sample data tables in SQL. It pays to build your tables carefully to avoid problems. I prefer to build all my tables in an SQL script so I can easily rebuild my database if (okay, when) my programs mess up the data structure. Besides, enemy agents are always lurking about preparing to sabotage my operations.

I also add plenty of sample data in the script. You don't want to work with actual data early on, because you are guaranteed to mess up somewhere during the process. However, it is a good idea to work with sample data that is a copied subset of the actual data. Your sample data should anticipate some of the anomalies that might occur in actual data (for example, what if a person doesn't have a middle name?). My entire script for the spy database is available on the CD-Rom as buildSpy.sql. All SQL code fragments shown in the rest of this chapter come from that file.

Setting Up the System

I began my SQL script with some comments that describe the database and a few design decisions I made when building the database.

######################################
# buildSpy.sql
# builds and populates all databases for spy examples
# uses mysql -should adapt easily to other rdbms
# by Andy Harris for PHP/MySQL for Abs. Beg
######################################

######################################
# conventions
######################################
# primary key = table name . ID
# primary key always first field
# all primary keys autonumbered
# all field names camel-cased
# only link tables use underscore
# foreign keys indicated although mySQL does not enforce
# every table used as foreign reference has a name field
######################################

######################################
#housekeeping
######################################

use chapter9;
DROP TABLE IF EXISTS badSpy;
DROP TABLE IF EXISTS agent;
DROP TABLE IF EXISTS operation;
DROP TABLE IF EXISTS specialty;
DROP TABLE IF EXISTS agent_specialty;
DROP TABLE IF EXISTS spyFirst;

Notice that I specified a series of conventions. These self-imposed rules will help make my database easier to manage when things get complicated. Some of the rules might not make sense yet (because I haven't identified what a foreign key is, for instance), but the important thing is I have clearly identified some rules that will help me later on.

The code then specifies the chapter9 database, and deletes all tables if they already existed. This behavior ensures I'll start with a fresh version of the data.

Creating the agent Table

The normalized agent table is actually quite simple. The actual table is shown in Table 9.3.

Table 9.3: THE AGENT TABLE
agentID	name	operationID
1	Bond	1
2	Falcon	1
3	Cardinal	2
4	Blackford	2

The only data remaining in the agent table is the agent's name and a numerical field for the operation. The operationID field is used as the glue that holds together the agent and operation tables.

I've added a few things to improve the SQL code for creating the agent table to ensure that it behaves well.

CREATE TABLE agent (
  agentID int(11) NOT NULL AUTO_INCREMENT,
  name varchar(50) default NULL,
  operationID int(11) default NULL,
  PRIMARY KEY (agentID),
  FOREIGN KEY (operationID) REFERENCES operation (operationID)
);

Recall that the first field in a table is usually called the primary key. Primary keys must be unique and each record must have one. I've also chosen to name each primary key according to a special convention. Primary key names will always begin with the table name and end with "ID." I added this convention because it will make things easier later on when I write programs to work with this data. The NOT NULL modifier ensures that all records of this table must have a primary key. The AUTO_INCREMENT identifier is a special tool that allows MySQL to pick a new value for this field if no value is specified. This will ensure that all entries are unique.

TRAP

Not all databases use the AUTO_INCREMENT feature in the same way. You might need to look up some other way to automatically generate key fields if you aren't using MySQL.

I've added an indicator at the end of the CREATE TABLE statement to indicate that agentID is the primary key of the agent table.

Creating a Reference to the operation Table

Take a careful look at the operationID field. This field contains an integer, which will be used to refer to a particular operation. I also added an indicator specifying operationID as a foreign key reference to the operation table. The operationID field in the agent table will contain a reference to the primary key of the operation table. This type of field is referred to as a foreign key.

TRICK

Some DBMS systems require you to specify primary and foreign keys. MySQL currently does not require this, but it's a good idea to do so anyway for two reasons. First, it's likely that future versions of MySQL will require these statements, because they are used to improve the reliability of a database. Second, it's very good to specify in the code when you wish a field to have a special purpose, even if the DBMS doesn't do anything with that information.

Inserting a Value into the agent Table

The INSERT statements for the agent table have one new trick made possible by the AUTO_INCREMENT designation of the primary key.

INSERT INTO agent VALUES(
  null, 'Bond', 1
);

The primary key is initialized with the value null. This might be surprising because primary keys are explicitly designed to never contain a null value. Since the agentID field is set to AUTO_INCREMENT, the null value is automatically replaced with an unused integer. This trick ensures that each primary key value will be unique.

Building the operation Table

The new operation table contains information referring to an operation. (See Table 9.4 for descriptions of some operation IDs.)

Table 9.4: THE OPERATION TABLE
operationID	name	description	location
1	Dancing Elephant	Infiltrate suspicious zoo	London
2	Enduring Angst	Make bad guys feel really guilty	Lower Volta
3	Furious Dandelion	Plant crabgrass in enemy lawns	East Java

Each operation gets its own record in the operation table. All the data corresponding to an operation is stored in the operation record. Each operation's data is stored only one time. This has a number of positive effects:

It's only necessary to enter operation data once per op, saving time on data entry.
Since there's no repeated data, you won't have data inconsistency problems (like the circus/zoo problem).
The new database will take less space, because there's no repeated data.
The operation is not necessarily tied to an agent, so you won't accidentally delete all references to an operation by deleting the only agent assigned to that mission (remember, this could happen with the original data design).
If you need to update operation data, you don't need to go through every agent to figure out which ones were assigned to that operation (again, you would have had to do this with the old database design).

The SQL used to create the operation table is much like that used for the agent table:

CREATE TABLE operation (
  operationID int(11) NOT NULL AUTO_INCREMENT,
  name varchar(50) default NULL,
  description varchar(50) default NULL,
  location varchar(50) default NULL,
  PRIMARY KEY ('OperationID')
);


INSERT INTO operation VALUES(
  null, 'Dancing Elephant',
  'Infiltrate suspicious zoo', 'London'
);

As you can see, the operation table conforms to the rules of normalization, and it also is much like the agent table. Notice that I'm being very careful about how I name things. SQL is (theoretically) case-insensitive, but I've found that this is not always true (especially in MySQL, where the Windows versions appear not concerned about case, but UNIX versions will treat operationID and OperationID as different field names). I specified that all field names will use 'camel-case' (just like you've been doing with your PHP variables). I have also named the key field according to my own formula (table name followed by "ID").

Using a Join to Connect Tables

The only downside to disconnecting the data tables is the necessity to somehow rejoin the data when needed. The user really doesn't care that the operation and the agent are in different tables, but he or she still will want the data to be visible as if they were on the same table. The secret to re-attaching tables is a tool called the inner join. Take a look at the following SELECT statement in SQL:

SELECT agent.name AS agent, operation.name AS operation
FROM agent, operation
WHERE agent.operationID = operation.operationID
ORDER BY agent.name;

At first glance this looks like an ordinary query, but it is a little different because it joins up data from two different tables. Table 9.5 illustrates the results of this query.

Table 9.5: COMBINING TWO TABLES
agent	operation
Blackford	Enduring Angst
Bond	Dancing Elephant
Cardinal	Enduring Angst
Falcon	Dancing Elephant
Rahab	Furious Dandelion

Creating Useful Joins

An SQL query can pull data from more than one table. To do this, there a couple of basic rules.

First, you might need to specify the field names more formally. Notice that the SELECT statement specifies agent.name rather than simply name. This is necessary because both tables contain a field called name. Using the table.field syntax is much like using a person's first and last name. It's not necessary if there's no chance of confusion, but in a larger environment the more complete naming scheme can avoid confusion.

Also, note the use of the AS clause. This provides an alias for the column and provides a nicer output.

The FROM clause up to now has only specified one table. In this example, it's necessary to specify that data will be coming from two different tables.

Examining a Join Without a WHERE Clause

The WHERE clause helps to clarify the relationship between the two tables. As an explanation, consider the following query:

SELECT
  agent.name AS 'agent',
  agent.operationID as 'agent opID',
  operation.operationID as 'op opID',
  operation.name AS 'operation'
FROM agent, operation
ORDER BY agent.name;

This query is much like the earlier query, except it includes the operationID field from each table and it omits the WHERE clause. You might be surprised by the results.

Adding a WHERE Clause to Make a Proper Join

Without a WHERE clause, all possible combinations are returned. The only records we're concerned with are those where the operationID fields in the agent table and in the operation table have the same value. The WHERE clause returns only these values joined by a common operation ID.

The secret to making this work is the operationID fields in the two tables. You've already learned that each table should have a primary key. The primary key field is used to uniquely identify each record in a database. In the agents table, agentID is the primary key. In operations, operationID is the primary key. (You might note my unimaginative but very useful naming convention here) I was able to take all data that refers to the operation out of the agent table by replacing those fields with a field that points to the primary key of the operations table. A field that references the primary key of another table is called a foreign key. Primary and foreign keys cement the relationships between tables. See Table 9.6.

Table 9.6: JOINING AGENT AND OPERATION WITHOUT A WHERE CLAUSE
agent	agent opID	op opID operation
Blackford	1	1 Dancing Elephant
Blackford	1	2 Enduring Angst
Blackford	1	3 Furious Dandelion
Bond	1	1 Dancing Elephant
Bond	1	2 Enduring Angst
Bond	1	3 Furious Dandelion
Cardinal	2	2 Enduring Angst
Cardinal	2	3 Furious Dandelion
Cardinal	2	1 Dancing Elephant
Falcon	1	1 Dancing Elephant
Falcon	1	2 Enduring Angst
Falcon	1	3 Furious Dandelion
Rahab	3	1 Dancing Elephant
Rahab	3	2 Enduring Angst
Rahab	3	3 Furious Dandelion

Adding a Condition to a Joined Query

Of course, you can still use the WHERE clause to limit which records are shown. Use the AND structure to build compound conditions. For example, this code:

SELECT
  agent.name AS 'agent',
  operation.name AS operation
FROM agent, operation
WHERE agent.operationID = operation.operationID
  AND agent.name LIKE 'B%';

will return the code name and operation name of every agent whose code name begins with "B."

You should know that the syntax I provided here is a convenient shortcut supported by most DBMS systems. The formal syntax of the inner join looks like this:

SELECT agent.name, operation.name
FROM
  agent INNER JOIN operation
  ON agent.OperationID = operation.OperationID
ORDER BY agent.name;

Many data programmers prefer to think of the join as part of the WHERE clause and use the WHERE syntax. A few SQL databases (notably many offerings from Microsoft) do not allow the WHERE syntax for inner joins, and require the INNER JOIN to be specified as part of the FROM clause. When you use this INNER JOIN syntax, the ON clause indicates how the tables will be joined.