Database Design
116K - views

Database Design

Functional Dependences. Normal Forms. D. Christozov / G.Tuparov. INF 280 Database Systems: DB design: Normal Forms. 1. Objectives. Purpose of normalization.. Problems associated with redundant data..

Download Presentation

Database Design




Download Presentation - The PPT/PDF document "Database Design" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.



Presentation on theme: "Database Design"— Presentation transcript:

Slide1

Database Design

Functional DependencesNormal Forms

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

1

Slide2

Objectives

Purpose of normalization.Problems associated with redundant data.Identification of various types of update anomalies such as insertion, deletion, and modification anomalies.How to recognize appropriateness or quality of the design of relations.Use of functional dependencies to group attributes.The process of normalization.Normal forms: 1NF, 2NF, 3NFBoyce–Codd (BCNF) normal form.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

2

Slide3

Database Design

What is relational database design?The grouping of attributes to form "good" relation schemas.Two levels of relation schemasThe logical "user view" levelThe storage "base relation" level Design is concerned mainly with base relations. What are the criteria for "good" base relations? 

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

3

Slide4

Data Redundancy

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

4

Slide5

Database Design

Informal guidelines for good relational design.Formal concepts of functional dependencies and normal forms: 1NF (First Normal Form); 2NF (Second Normal Form); 3NF (Third Normal Form); BCNF (Boyce-Codd Normal Form).Normalization vs. de-normalization.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

5

Slide6

Database Design: Guideline 1

GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes).Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.Only foreign keys should be used to refer to other entities.Entity and relationship attributes should be kept apart as much as possible.Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

6

Slide7

Database Design: Guideline 1

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

7

Employee-Department

DMGRSSN

DName

DNum

Address

Bdate

SSN

Ename

PLocation

Pname

Ename

Hours

PNumber

SSN

Employee-Project

Slide8

Database Design: Anomalies (1)

Redundant Information in Tuples and Update AnomaliesMixing attributes of multiple entities may cause problems.Information is stored redundantly wasting storage.Problems with update anomalies:Insertion anomalies;Deletion anomalies;Modification anomalies.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

8

Slide9

Database Design: Anomalies (2)

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

9

Slide10

Database Design: Anomalies (3)

Consider the relation:EMP_PROJ ( SSN, PNumber, Hours, Ename, Pname, ..)Update Anomaly: Changing the name of project number P1 from “ProjectX” to “ProjectM” may cause this update to be made for all employees working on the project P1.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

10

PLocation

Pname

Ename

Hours

PNumber

SSN

EMP_PROJ

Slide11

Database Design: Anomalies (4)

Insert Anomaly: Cannot insert a project unless an employee is assigned to.Inversely - Cannot insert an employee unless an he/she is assigned to a project.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

11

PLocation

Pname

Ename

Hours

PNumber

SSN

EMP_PROJ

Slide12

Database Design: Anomalies (5)

Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

12

PLocation

Pname

Ename

Hours

PNumber

SSN

EMP_PROJ

Slide13

Database Design: Anomalies (6)

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

13

EMP_DEPT

DMGRSSN

DName

DNum

Address

Bdate

SSN

Ename

Slide14

Database Design: Guideline 2

GUIDELINE 2: Design a schema that does not suffer from the insertion, deletion and update anomalies. If there are any present, then note them so that applications can be made to take them into account.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

14

Slide15

Database Design: Guideline 3

GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible.Attributes that are NULL frequently could be placed in separate relations (with the primary key).Reasons for nulls:attribute not applicable or invalid;attribute value unknown (may exist);value known to exist, but unavailable.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

15

Slide16

Database Design: Guideline 4

Bad designs for a relational database may result in erroneous results for certain JOIN operations.The "lossless join" property is used to guarantee meaningful results for join operations.GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.There are two important properties of decompositions: non-additive or losslessness of the corresponding join;preservation of the functional dependencies.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

16

Slide17

Database Design: Guideline 4

Lossless-join property enables us to find any instance of original relation from corresponding instances in the smaller relations. Must be achieved at any cost.Dependency preservation property enables us to enforce a constraint on original relation by enforcing some constraint on each of the smaller relations.See 15.4, 15.5 and 15.6 from textbook.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

17

Slide18

Functional Dependences (1)

Functional Dependency is a property of the meaning (or semantics) of the attributes in a relation:Describes relationship between attributes in a relation. If A and B are attributes of relation R, B is functionally dependent on A (denoted A  B), if each value of A in R is associated with exactly one value of B in R.Determinant of a functional dependency refers to attribute or group of attributes on left-hand side of the arrow.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

18

Slide19

Functional Dependences (2)

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

19

Slide20

Functional Dependences (3)

Examples of FD constraints:social security number determines employee name: SSN  ENAMEproject number determines project name and location: PNUMBER  {PNAME, PLOCATION}employee ssn and project number determines the hours per week that the employee works on the project: {SSN, PNUMBER}  HOURS

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

20

Slide21

Functional Dependences (4)

X  Y holds if whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): If t1[X]=t2[X], then t1[Y]=t2[Y] X  Y in R specifies a constraint on all relation instances r(R). Written as X  Y; can be displayed graphically on a relation schema as in Figures ( denoted by an arrow). FDs are derived from the real-world constraints on the attributes . An FD is a property of the attributes in the schema R. The constraint must hold on every relation instance r(R). If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K]=t2[K]).

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

21

Slide22

The Process of Normalization

Formal technique for analyzing a relation based on its primary key and functional dependencies between its attributes.Often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.As normalization proceeds, relations become progressively more restricted (stronger) in format and also less vulnerable to update anomalies.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

22

Slide23

Relationship Between Normal Forms

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

23

Slide24

Normalization: 1NF

Definitions:A relation is in 1NF if all attribute values are atomic.Atomic: cannot be further broken down:NO multivalued attributes;NO nested attributes.Cure for Non-1NF Relations.Multivalued attribute is eliminated by projection:{DNUM, DNAME, MSSN, DLOC}  {DNUM, DNAME, MSSN}, {DNUM, DLOC}Nested attribute is eliminated by flattening:ENAME(FNAME MI LNAME) {FNAME, MI, LNAME} and ENAME is discarded.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

24

Slide25

Normalization: 2NF

Definitions:Prime attribute - attribute that is member of the primary key K.Non-prime attribute – attribute that is not a member of the PK.Full functional dependency - a FD Y  Z where removal of any attribute from Y means the FD does not hold any more.Examples:1) {SSN, PNUMBER}  HOURS is a full FD since neither SSN  HOURS nor PNUMBER  HOURS hold 2) {SSN, PNUMBER} ENAME is not a full FD (it is called a partial dependency ) since SSN  ENAME also holds

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

25

Slide26

Normalization: 2NF

A relation R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

26

PLocation

PName

Ename

Hours

PNumber

SSN

Employee-Project

SSN

PNumber

Hours

SSN

EName

PNumber

PName

PLocation

fd1

fd2

fd3

Slide27

Normalization: 3NF

Definition: Transitive functional dependency - a FD X  Z that can be derived from two FDs X  Y and Y  Z Examples:SSN  DMGRSSN is a transitive FD since SSN  DNUMBER and DNUMBER  DMGRSSN hold SSN  ENAME is non-transitive since there is no set of attributes X where SSN  X and X  ENAME

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

27

Slide28

Normalization: 3NF

A relation R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., Consider EMP (SSN, Emp#, Salary ). Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

28

Slide29

Normalization: 3NF

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

29

Employee-Department

DMGRSSN

DName

DNum

Address

Bdate

SSN

Ename

DNum

Address

BDate

SSN

Ename

DMGRSSN

DName

DNum

SSN

Dnum

Dname

/DMGRSSN

Slide30

Normalization: 3NF

Converting from 2NF to 3NF:Identify the primary key in the 2NF relation.Identify functional dependencies in the relation.If transitive dependencies exist on the primary key remove them by placing them in a new relation along with a copy of their dominant.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

30

Slide31

General Definitions of 2NF and 3NF

Second normal form (2NF)A relation R that is in 1NF is in 2NF if every non-primary-key attribute A is not partially functionally dependent on any candidate key in R.Third normal form (3NF)A relation R that is in 2NF is in 3NF if no non-primary-key attribute A is transitively dependent on any candidate key in R.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

31

Slide32

Normalization: BCNF

A relation R is in Boyce-Codd Normal Form (BCNF) if whenever an FD X  A holds in R, then X is a superkey of R.Notes:Each normal form is strictly stronger than the previous one:Every 2NF relation is in 1NF;Every 3NF relation is in 2NF;Every BCNF relation is in 3NF.There exist relations that are in 3NF but not in BCNF.The goal is to have each relation in BCNF (or 3NF).

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

32

Slide33

BCNF

A relation R is in BCNF if a functional dependency XA holds in R, then X is a superkey of R.Based on functional dependency that takes into account all candidate keys in a relation.For a relation with only one candidate key, 3NF and BCNF are equivalent.A relation is in BCNF, if and only if every determinant is a candidate key.Violation of BCNF may occur in a relation that:Contains 2 (or more) composite keys;Which candidate keys overlap and share at least 1 attribute.

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

33

Slide34

Example 1

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

34

Atributes:DescriptionEqpt_numberOrder number of a piece of the equipment from given type.LocationPlace, where the piece of the equipment operatesOprtr_ID_1Identifier of the first shift operator (person operates on that equipment)Oprtr_Name_1Name of the first shift operatorOprtr_Phone_1Phone number of the first shift operatorOprtr_ID_2Identifier of the second shift operator (person operates on that equipment)Oprtr_Name_2Name of the second shift operatorOprtr_Phone_2Phone number of the second shift operatorOprtr_ID_3Identifier of the third shift operator (person operates on that equipment)Oprtr_Name_3Name of the third shift operatorOprtr_Phone_3Phone number of the third shift operatorEqpt_TypeType of the equipmentProducerProducer of this type of equipmentInst_DateDate of installation of the piece of the equipmentCons_PowerConsumed electricity of this type of the equipmentMaint_TimeRequired period of maintenance for this type of equipmentMaint_Lst_DateDate of the last maintenanceMaint_IDIdentifier of the person responsible for the last maintenanceMaint_NameName of the person responsible for the last maintenanceMaint_PhonePhone of the person responsible for the last maintenanceMaint_Act_1Description of the first activity in last maintenanceMaint_Act_2Description of the second activity in last maintenanceMaint_Act_3Description of the third activity in last maintenance

“Equipment” holds data about the usage and maintenance activities of equipment in an enterprise

Notes:

Eqpt_Number

,

Eqpt_Type

,

Location

, and

Maint_Lst_Date

composed the primary key - identify any piece of equipment and last maintenance activities.

Producer,

Cons_Power

,

Maint_Time

are specific for an equipment type, identified by

Eqpt_Type

.

Operators (from any shift) operate on a particular piece of equipment and are identified by

Oprts_ID

. The relation keeps track to their

names

and

phones

.

For any piece of equipment, maintenance is done periodically or when fails.

Maintenance activities are predefined list for a given Type of Equipment. For maintenance of a particular piece of equipment up to three activities are listed.

Relation holds information about the person, who performed the maintenance activities, identified by

Maint_ID

, their

names

and

phones

.

Slide35

Example 2

Chains’ supplier

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

35

Attribute

Description

Code

Identifier of the product

Name

Supplier’s name of the product

Price

Supplier’s regular price

Dealer

Identifier of the Dealer

DName

Dealer’s name

DCode

Dealer’s identifier of the product

Dprice

Dealer’s price of the product

DPhone

Dealer’s phone

DPName

Dealer’s name of the product

DQuantity

Quantity of the product ordered by the Dealer

Date

Date of the order

Slide36

Example 3

Home delivery Pizzas

D. Christozov / G.Tuparov

INF 280 Database Systems: DB design: Normal Forms

36

Attribute

Description

Order_id

Identifier of the ordered (automatic)

Pizza_id

Identifier of the products

Quantity

Quantity of the product, purchased by the customer

Ingredient_id

Identifier of additional ingredient ordered (one or more)

Description

Description of the ingredient

Unit

Measurement unit used for ingredient

Amount

Quantity per unit: quantity of ingredient used in a product unit

I_price

Price of the ingredients

Name

Name of the customer

Address

Address of the customer

Price

Price of the product

Slide37

Q & A

Attention ! Next classQuiz 3: Normal Forms

D. Christozov / G.Tuparov

INF 280 Database Systems: Relational Model

37