CS571_Week3_Assignment3.1
Provided below are tables with a small amount of data. For each, identify the normal form the table currently is in, and then describe the procedure you would follow to bring the design to the next normal form. Then follow that procedure and present the data in the new design. (The videos for this lesson should be able to help with this a lot.) For example, if a table meets the criteria for first normal form but fails to meet the criteria for second normal form, state the table is in first normal form, and then describe the process you would follow to bring the design into second normal form. If a table is not even in first normal form just state “Not even 1NF”.
- We want to capture: the pizzas eaten by each person.
Name |
Pizzas |
Amy |
mushroom, pepperoni |
Ben |
cheese, pepperoni |
Cal |
supreme |
Dan |
cheese, mushroom, pepperoni, sausage, supreme |
Not in 1NF, due to a list of values in one column “pizzas”. Rules of 1NF: Each column should contain atomic values, a column should contain values that are the same type, each column should have a unique name, and the order in which data is saved doesn’t matter. The list of multi-values should be broken up into separate row for each value in the multi-list.
NAME |
PIZZA |
Amy |
mushroom |
Amy |
Pepperoni |
Ben |
Cheese |
Ben |
Pepperoni |
Cal |
Supreme |
Dan |
Cheese |
Dan |
Mushroom |
Dan |
Pepperoni |
Dan |
Sausage |
Dan |
Supreme |
- We want to capture: The project codes associated with a client code, as well as the client contact and the project name.
ClientCode |
ClientContact |
ProjectCode |
ProjectName |
BV |
Lambright, Teodoro |
LDU |
Aberrant Longitude |
BV |
Lambright, Teodoro |
MEK |
Pink Cloud |
HK |
Boyers |
EFS |
Intensive Indigo Moose |
JU |
Valley, Russell |
XDB |
Strong Omega |
JU |
Valley, Russell |
JPQ |
Lonesome Cloud |
Table is in 1NF, but not 2NF. There is redundancy in the client code and client contact. The client code functionally determines the client contact. Process to achieve consists of decomposition-splitting the single table into multiple tables.
Client Code |
Client Contact |
BV |
Lambright, Teodoro |
HK |
Boyers |
JU |
Valley, Russell |
Client Code |
ProjectCode |
ProjectName |
BV |
LDU |
Aberrant Longitude |
BV |
MEK |
Pink Cloud |
HK |
EFS |
Intensive Indigo Moose |
JU |
XDB |
Strong Omega |
JU |
JPQ |
Lonesome Cloud |
- We want to capture: Employees’ names as well as those employees’ dependents names.
Lname |
Fname |
Dependent1 |
Dependent2 |
Dependent3 |
Smith |
John |
Alice |
Elizabeth |
Michael |
Wong |
Franklin |
Alice |
Joy |
Theodore |
Wallace |
Jennifer |
Abner |
Not in 1NF.
- We want to capture: Employees’ names along with the department they belong to, as well as the location of the main office for each department.
EmployeeName |
DepartmentName |
DepartmentMainOffice |
Smith, John |
Research |
Sugarland |
Wong, Franklin |
Research |
Sugarland |
English, Joyce |
Research |
Sugarland |
Narayan, Ramesh |
Research |
Sugarland |
Borg, James |
Headquarters |
Houston |
Wallace, Jennifer |
Administration |
Stafford |
Jabbar, Ahmad |
Administration |
Stafford |
Zelaya, Alicia |
Administration |
Stafford |
Often when normalizing a table design you don’t yet have data in your table. For instance, in cases where you are designing a new data structure, not analyzing an existing table. So, the small amount of data provided in the above examples cannot always be counted on. However, sometimes you might have to determine whether a table is in 2NF or 3NF based solely on the knowledge of what attributes functionally determine other attributes. Presented here is a new type of notation that enables you to do that type of analysis without being able to see data.
A -> B, C
This means that column A is the key for the table and it functionally determines columns B and C. In other words, this means that for every unique value in column A, there can be only one row and thus only one distinct value for the data in columns B and C.
Determining if a column’s data functionally determines another column’s data can usually be accomplished by asking the question “Can X have only one Y or can it have more than one Y?” An example might be “Can a doctor have only one patient or more than one patient?” If the answer is only one, then doctor functionally determines patient, otherwise if a doctor can have more than one patient, the doctor does not functionally determine the patient. In other words, if doctor functionally determines patient, then if you know the doctor, you can know the one and only patient he/she has. If doctor does not functionally determine patient (which of course it doesn’t), then you can’t just talk about the doctor and his/her patient. You have to specify the patient. This means that the key would have to include both doctor and patient.
- Is the following true? Why or why not?
Doctor, Patient -> Prescription
- Is the following true? Why or why not?
Doctor, Patient, Prescription -> Refills Remaining
Sometimes a key that is made up of more than one attribute (like Doctor, Patient) can technically functionally determine another attribute (like the Patient’s birth date). But it is also true that you don’t need Doctor in order to functionally determine a Patient’s birth date. This situation is what 2NF is all about. Making sure that all functionally determined attributes need all of the key columns to functionally determine them. In cases where you don’t, you need to decompose the related data into two or more tables. One for the full key (and any attributes you need the full key to determine), and one for each attribute that is only determined by a part of they key (along with that part of the key). For example, if you had:
Doctor, Patient -> Last Appointment Date and Time, Patient Birth Date
You would want to create two tables. One for:
Doctor, Patient -> Last Appointment Date and Time
And one for:
Patient -> Patient Birth Date
The problem this solves is that if a patient is seeing more than one doctor (perhaps a general practitioner and a neurologist), you don’t have to duplicate the patient’s birth date for each doctor he/she visits. This is redundant, and additionally makes it possible for the “two” birth dates to differ in the database when clearly a person doesn’t have two birth dates.
- Should the following design be decomposed into two or more tables? Why or why not? And if decomposing, what columns belong in each resultant table?
Patient, Hospital Room Number -> Hospital Floor
- Should the following design be decomposed into two or more tables? Why or why not? And if decomposing, what columns belong in each resultant table?
Doctor, Patient -> Hospital Admit Date and Time, Doctor’s Pager #
Sometimes a key can technically functionally determine two attributes but does so transitively. An example would be:
Employee -> Employee’s Boss, Employee’s Boss’s Email Address
The problem is that a boss likely has several people that report to them. So, if everywhere we record an employee’s boss, we also record their boss’s email address, it’s possible that the same boss might appear to have different email addresses when looked up by one employee vs. another. Solving this problem is what 3NF is all about. Again, this is not just about reducing redundancy but also reducing the chance for inconsistent data. To solve this problem, we’d want to decompose the design into two tables:
Employee -> Employee’s Boss
Boss -> Boss’s Email Address
Aside:
This decomposition is sufficient to meet the requirements of 3NF. However, an even better design would involve realizing that a boss is also an employee and store all employees in one table and have the relationship between employee and boss recursively refer to other rows in the same table:
Employee -> Employee Email Address, Boss
The Boss attribute would refer to the Employee column of another row in that same table.
- Should the following design be decomposed into two or more tables? Why or why not? And if decomposing, what columns belong in each resultant table?
Doctor -> Department, Department Office Location
(Note: a department can have only one office location)
- Should the following design be decomposed into two or more tables? Why or why not? And if decomposing, what columns belong in each resultant table?
Doctor, Patient, Date and Time -> Diagnosis, Typical Prognosis
(By Typical Prognosis, assume of the diagnosis in general, not the particular doctor diagnosing the particular patient.)