Abstract
Functional dependencies and Normalization play an important role in relational database design. Functional dependencies are key to establishing the relationship between key and non-key attributes in a relation. Normalization process works towards removing anomalies in a relation and prevents data redundancy. This paper, intended to be a graduate research paper, establishes the definitions of these concepts. The paper introduces Functional Dependency and explains its inference rules. The paper also introduces Normalization and various normal forms 1NF thru 5NF including the BCNF. The paper also explains how Functional Dependencies and Normalization are related, why they are important with regards to relational databases, and advantages of designing a normalized database.
Relational Databases: Functional Dependency and Normalization
Definitions and Concepts
Functional Dependency
A functional dependency is a constraint between two sets of attributes from the database. A functional dependency, represented by X → Y, between two sets of attributes X and Y that are subsets of a relation R specifies a constraint that, for any two tuples t1 and t2 in R that have t1[X] = t2[X],they must also have t1[Y] = t2[Y].
This means the values of the set of attributes Y of a tuple in R are determined by the set of attributes X. In other words, the values of the set of attributes X functionally determine the values of the set of attributes Y. We can also say that Y is functionally dependent on Y.
The set of attributes X on the left-hand side of the functional dependency is called determinant and the set of attributes Y on the right-hand side of the functional dependency is called dependent. Despite the mathematical definition a functional dependency cannot be determined automatically. It is a property of the semantics of attributes – the database designers will have to understand how the attributes are related to each other to specify a functional dependency. (Elmasri, Ramez and Shamkant B. Navathe. 2006)
Example
Consider the example of an SSN (Social Security Number) database. Every individual has a unique SSN. So, the other attributes of the relation, like name, address etc. can be determined using the SSN. That makes SSN a determinant, and name, address, the dependents – thus establishing the functional dependencies:
SSN name
SSN address
Inference Rules for Functional Dependency
Armstrong’s axioms are a set of inference rules used to infer all the functional dependencies on a relational database. They were developed by William W. Armstrong.
Axiom of reflexivity: if Y is a subset of X, then X determines Y
If Y is a subset of X then X Y
Axiom of augmentation: if X determines Y, then XZ determines YZ for any Z
If X Y, then XZ YZ
Axiom of transitivity: if X determines Y and Y determines Z, then X must determine Z
If X Y and Y Z, then X Z
Union: if X determines Y and X determines Z then X must also determine Y and Z
If X Y and X Z, then X YZ
Decomposition: if X determines Y and Z, then X determines Y and X determines Z separately
If X YZ, then X Y and X Z
Normalization
Database Normalization is a process that allows the storage of data without unnecessary redundancy and thereby eliminate data inconsistency. A normalized database eliminates anomalies in updating, inserting, and deleting data, which improves the efficiency and effectiveness of the database. Users can maintain and retrieve data from a normalized database without difficulty. Data Normalization can be used by the designer of a database to identify and group together related data elements (attributes) and establish relationships between the groups.
Database Normalization concept and its ‘Normal Forms’ were originally invented by Edgar Codd, the inventor of the relational model. The ‘Normal Forms’ provide the criteria for determining a table’s degree of vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less vulnerable it is.
First Normal Form (1NF)
An entity type or table is in 1NF when each of its attributes contain simple values which are atomic and contains no repeating groups of data. The domain of an attribute in an 1NF table must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
Example
Consider the address attribute in a sales database. It is not an atomic attribute, because it is made up of atomic attributes as street, city, state and zip. For the relation to be in 1NF, the appropriate database design should have the atomic attributes street, city, state and zip instead of an address attribute.
Un-Normalized: sales (date, order_no, product_no, product_description, price, quantity_sold, cust_name, cust_address)
1NF: sales (date, order_no, product_no, product_description, price, quantity_sold, cust_name, cust_street, cust_city, cust_state, cust_zip)
Second Normal Form (2NF)
An entity type or table is in 2NF when it is in 1NF and all its non-key attributes depend on the whole key (i.e., functional dependency). There cannot be partial dependencies.
Example
Continuing with the sales database, the order_no and the product_no form the composite key for the table. There are partial dependencies – date is dependent on order_no, but not product_no – which violates the requirement for 2NF. The product_description is dependent on product_no and not on order_no. Removing these partial dependencies will result in 2NF.
1NF: sales (date, order_no, product_no, product_description, price, quantity_sold, customer_name, customer_street, customer_city, customer_state, customer_zip)
2NF: order (date, order_no, cust_no);
product (product_no, product_description, price);
order_detail (order_no, product_no, quantity_sold);
customer (cust_no, cust_name, cust_street, cust_city, cust_state, cust_zip)
Third Normal Form (3NF)
An entity type or table is in 3NF when it is in 2NF and non-key attributes do not depend on other non-key attributes (i.e., there is no transitive dependency).
Example
Continuing with the sales database, the non-key attributes cust_city and cust_state are dependent on cust_zip which is a non-key attribute. Creating a separate zip table will transform the design into 3NF, where in there are no more dependencies between non-key attributes.
2NF: order (date, order_no, cust_no);
product (product_no, product_description, price);
order_detail (order_no, product_no, quantity_sold);
customer (cust_no, cust_name, cust_street, cust_city, cust_state, cust_zip)
3NF: order (date, order_no, cust_no);
product (product_no, product_description, price);
order_detail (order_no, product_no, quantity_sold);
customer (cust_no, cust_name, cust_street, zip_code);
zip (zip_code, city, state)
Boyce Codd Normal Form (BCNF)
An entity type or table is in BCNF when it is in 3NF and all candidate keys defined for the relation satisfy the test for third normal form.
Example
Continuing with the sales database, all the candidate keys already satisfy the 3NF requirements.
Fourth Normal Form (4NF)
An entity type or table is in 4NF when it is in BCNF and there are no non-trivial multi-valued dependencies. To move from BCNF to 4NF, remove any independently multi-valued components of the primary key to two new parent entities.
Example
For example, a professor can teach multiple subjects and can also mentor multiple students. To be in 4NF, the professor to subjects should be a separate relation and professor to students should be a separate relation – since they are independent of each other
Fifth Normal Form (5NF)
To be in 5NF, a relation decomposed into two relations must have lossless-join property, which ensures that no spurious tuples are generated when relations are reunited through a natural join.
Example
In the sales database example, when the sales database was split into order and product, the natural join of those two tables does not result in loss of data (tuples).
(Russell, Gordon. Chapter 4; Nguyen Kim Anh, Relational Design Theory)
Importance of Functional Dependency and Normalization to Relational Model
How are they related
Normalization theory draws heavily on the theory of functional dependencies. When a database designer sets out to design a database, it is essential to understand the semantics of the data – how the attributes are related to one another. This helps in establishing the functional dependencies between attributes. Once the functional dependencies are identified, the design the database in to a ‘normal form’ of the highest order possible is easier. Rules for each normal form, starting from the 1NF are invariably framed around maintaining the functional dependencies and are also based on the inference rules for functional dependencies (refer Inference Rules section). For example, to be in 2NF the non-key attributes should be dependent on the whole-key, which means the functional dependencies should be satisfied. Similarly, to be in 3NF, transitive dependency should be removed, which can be done if the functional dependencies are established correctly.
In other words, database normalization process ensures an efficient organization of data in database tables, which results in guaranteeing that data dependencies make sense, and also reducing the space occupied by the database via eliminating redundant data.
Why are they necessary for Relational Database model?
Functional dependencies play an important role in relational database design. They are used to establish keys that are used to define normal forms for relations. In addition, they help in deriving constraints based on the relationships between attributes. As a database grows in size and complexity it is essential that order and organization be maintained to control these complexities and minimize errors and redundancy in the associated data. This goal is managed by normalization. Database normalization minimizes data duplication to safeguard databases against logical and structural problems, such as data anomalies.
Get Help With Your Essay
If you need assistance with writing your essay, our professional essay writing service is here to help!
Normalization can help keep the data free of errors and can also help ensure that the size of the database doesn’t grow large with duplicated data. Normalization permits us to design our relational database tables so that they “(1) contain all the data necessary for the purposes that the database is to serve, (2) have as little redundancy as possible, (3) accommodate multiple values for types of data that require them, (4) permit efficient updates of the data in the database, and (5) avoid the danger of losing data unknowingly (Wyllys, R. E., 2002).”
The resulting normalized database is highly efficient, which can be characterized by –
- Increased Consistency: Information is stored in one place and one place only, reducing the possibility of inconsistent data.
- Easier object-to-data mapping: Highly-normalized data schemas in general are closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes results in similar solutions.
Moreover, a normalized database is advantageous when operations will be write-intensive or when ACID (Atomicity, Consistency, Isolation, Durability) compliance is required. Some advantages include:
- Updates run quickly since no data being duplicated in multiple locations.
- Inserts run quickly since there is only a single insertion point for a piece of data and no duplication is required.
- Tables are typically smaller than the tables found in non-normalized databases. This usually allows the tables to fit into the buffer, thus offering faster performance.
- Data integrity and consistency is an absolute must if the database must be ACID compliant. A normalized database helps immensely with such an undertaking.
- Searching, sorting, and creating indexes can be faster, since tables are narrower, and more rows fit on a data page.
- Minimizes/avoids data modification issues.
(https://en.wikipedia.org/wiki/ACID_(computer_science))
Summary
The paper defined the concept of functional dependency, which is the basic tool for analyzing relational schemas, and discussed some of its properties. Functional dependencies specify semantic constraints among the attributes of a relation schema. Next it described the normalization process for achieving good designs It presented examples to illustrate how by using the general definition of the normal forms, a given relation may be analyzed and decomposed to eventually yield a set of relations in 3NF. The paper also touches not often used BCNF, 4NF and 5NF normal forms.
Then the paper explains how functional dependencies and normalization are inter-related in the design of a relational model database. It explains the importance of functional dependency and normalization in the design of a relational database. A normalized database is highly efficient and has many advantages.
References
- Wyllys, R. E., 2002. Database management principles and applications
- Elmasri, Ramez and Shamkant B. Navathe. 2006. Fundamentals of Database Systems. 5th ed. Reading, MA: Addison-Wesley
- Russell, Gordon. Chapter 4 – Normalization. Database eLearning
- Nguyen Kim Anh, Relational Design Theory. OpenStax CNX
- Gaikwad, A.S., Kadri, F.A., Khandagle, S.S., Tava, N.I. (2017) Review on Automation Tool for ERD Normalization. International Research Journal of Engineering and Technology (IRJET) [Online]. 4 (2), pp. 1323-1325. [Accessed 07 May 2017]. Available from: https://www.irjet.net/archives/V4/i2/IRJET-V4I2259.pdf
- https://en.wikipedia.org/wiki/ACID_(computer_science)
Tables
Un-normalized Table: sales
sales
date |
order_no |
cust_no |
product_description |
price |
quantity_sold |
cust_name |
cust_address |
12/12/2018 |
1001 |
A320 |
MP3 |
10.00 |
8 |
Tom |
1 Main St, Hartford, CT 06106 |
12/12/2015 |
1001 |
B101 |
Ipod |
100.00 |
4 |
Tom |
1 Main St, Hartford, CT 06106 |
01/05/2019 |
1002 |
C101 |
Blu Ray |
80.00 |
3 |
Aaron |
1 Holy Lane, Manchester, 06040 |
1NF Table: sales
sales
date |
order_no |
product_no |
product_description |
price |
quantity_sold |
cust_name |
…. |
12/12/2018 |
1001 |
A320 |
MP3 |
10.00 |
8 |
Tom |
|
12/12/2015 |
1001 |
B101 |
Ipod |
100.00 |
4 |
Tom |
|
01/05/2019 |
1002 |
C101 |
Blu Ray |
80.00 |
3 |
Aaron |
Continued..
sales
…. |
cust_street |
cust_city |
cust_state |
cust_zip |
1 Main St |
Hartford |
CT |
06106 |
|
1 Main St |
Hartford |
CT |
06106 |
|
1 Holy Lane |
Manchester |
CT |
06040 |
2NF Table: order
order
date |
order_no |
cust_no |
12/12/2018 |
1001 |
101 |
01/05/2019 |
1002 |
102 |
2NF Table: product
product
product_no |
product_description |
price |
A320 |
MP3 |
10.00 |
B101 |
Ipod |
100.00 |
C101 |
Blu Ray |
80.00 |
2NF Table: order_detail
order_detail
order_no |
product_no |
quantity_sold |
1001 |
A320 |
8 |
1001 |
B101 |
4 |
1002 |
C101 |
3 |
2NF Table: customer
customer
cust_no |
cust_name |
cust_street |
cust_city |
cust_state |
cust_zip |
101 |
Tom |
1 Main Street |
Hartford |
CT |
06106 |
102 |
Aaron |
1 Holy Lane |
Manchester |
CT |
06040 |
3NF Table: customer
customer
cust_no |
cust_name |
cust_street |
zip_code |
101 |
Tom |
1 Main Street |
06106 |
102 |
Aaron |
1 Holy Lane |
06040 |
3NF Table: zip
zip
zip_code |
city |
State |
06106 |
Hartford |
CT |
06040 |
Manchester |
CT |
Cite This Work
To export a reference to this article please select a referencing style below: