/
Lecture Lecture

Lecture - PowerPoint Presentation

kittie-lecroy
kittie-lecroy . @kittie-lecroy
Follow
372 views
Uploaded On 2016-03-27

Lecture - PPT Presentation

10 Database Theory amp Practice 4 Data Normalization UFCEUS202 Web Programming Normalization 1 What is Normalization Informally Normalization can be thought of as a process defined within the theory of relational database to break up larger relations into many ID: 270511

key normalization normal relation normalization key relation normal dependent cleaner form roster attribute primary attributes functionally type bus cno

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Lecture" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

Lecture 10Database Theory & Practice (4) :Data Normalization

UFCEUS-20-2 : Web

ProgrammingSlide2

Normalization (1)What is Normalization? Informally, Normalization

can be thought of as a process defined within the theory of relational database to break up larger relations into many small ones using a set of rules.

Normalization

resolve problems with data anomalies and redundancy. It is essentially a two-step process to:

1. put the data into tabular form (by removing repeating groups);

and

2. to remove duplicated records to separate tables.

As we work through the

Normalization

process, we will make use of data that relates to the Bus Depots’ Database – a description and E-R model of which was handed out in last weeks session and is also available from the resource area. Slide3

Normalization (2)Un-normalized data (1) Well-normalised databases have a design that reflects the true dependencies between entities, allowing the data to be updated quickly with little risk of introducing inconsistencies. Before discussing how to design a well-

normalised

database using

Codd's

Normalization techniques, we first consider a poor database design. Consider for example a relation 'bus' which includes bus registration number, model, type number, type description, depot name (note that names have changed slightly from the study for the purposes of this example):

registration no

model

type number

type description

depot

Al 23ABC

Routemaster

1

doubledecker

Holloway

D678FGH

Volvo 8700

2

metrobus

Holloway

H2591JK

Daf SB220

3

midibus

Hornsey

P200IJK

Mercedes 709D

2

metrobus

Hornsey

P300RTY

Mercedes Citaro

4

bendy-bus

Hornsey

R678FDS

Daf SB220

1

doubledecker

W653TJH

Routemaster

1

doubledeckerSlide4

Normalization (3)Un-normalised data (2) There are several problems with the previous relation:

Redundancy

-

the 'type description' is repeated for each 'type number' in the relation. The 'model' is also repeated for a particular 'type description', for example a

Routemaster

is always a doubledecker busUpdate anomalies

-

as a consequence of the redundancy, we could update the 'type description' in one tuple, while leaving it fixed in

another

Deletion anomalies

-

if we should delete all the buses of a particular type, we might lose all the information about that

type

Insertion anomalies

-

the inverse to deletion anomalies is we cannot record a new type in our table unless there exists a bus of that type - for example if there is the type 'open top' we cannot store this in our database. To get around this we might put null values in the type number and description components of a tuple for that bus, but when we enter an item for that supplier, will we remember to delete the tuple with nulls?Slide5

Normalization (4)Functional dependencies (1) Determinants A formal definition for the term

functional dependence

is:

Given

a relation which has attributes (x, y, ...), we say that an attribute y is functionally dependent on another attribute x, if (and only if) each x value has associated with it precisely one y value (at any one time).

For

example, examine the following relation:

Cleaner no.

(

cno

)

Cleaner name

(cname)

Cleaner salary

(csalary)

Depot no.

(dno)

110

John

2550

101

111

Jean

2500

101

112

Betty

2400

102

113

Vince

2800

102

114

Jay

3000

102Slide6

Normalization (5) In the previous diagram, attributes cname, csalary and dno are each functionally dependent on attribute

cno

- given a particular

cno

value, there exists precisely one corresponding value for each of the cname

,

csalary

and

dno

.

In general then, the same x-values may appear in many different tuples of the relation; if y is functionally dependent on x, then every one of these tuples must contain the same

value

. Going back to the cleaner example, we can represent these functional dependencies diagrammatically as:Slide7

Normalization (6) The previous figure is an example of a determinacy diagram. The arrow line can be read as 'depends on' (reading from left to right). So we say, for example, 'cno depends on

cname

'. We can also 'read' the diagram from right to left. This time the arrowed line is read as

'

functionally dependent on

'. So we say, for example 'cname is functionally dependent on cno

'.

The attribute or group of attributes on the left-hand side are called the

determinant

.

The determinant of a value is not necessarily the primary key. In the example,

cno

is a determinant of

cname because knowing the cleaner's number we can determine the cleaner's name.

Recognizing the functional dependencies is an essential part of understanding the meaning or semantics of the data. The fact that cname,

csalary

and

dno

are functionally

dependent on

cno

means that each cleaner has one name, has one salary and works at precisely one depot.Slide8

Normalization (7)Functional dependencies (2) Composite attributes The notion of functional dependence can be extended to cover the case where the determinant (particularly the primary key) is composite, i.e. it consists of more that one attribute.

Full functional dependence

An attribute y is defined to be fully

functionally

dependent on attribute x if it is functionally dependent on x and not functionally dependent on any subset of the attributes of x where it is a composite attribute.

Partial dependencies

The opposite of full functional dependence is partial dependence. Where we have data values that depend on only a part of the primary key, then we have a partial dependency.

Transitive dependencies

This occurs when the value of an attribute is not determined directly from the primary key, but through the value of another attribute and this attribute in turn is determined by the primary key.Slide9

Normalization (8)The normal forms A number of normal forms have been proposed, but the first five normal forms have been widely accepted. The normal forms progress from first normal form, to second, and so on. Data in second normal form implies that it is also in first normal form - i.e. each level of Normalization implies that the previous level has been met.

Other normal forms such as Boyce-

Codd

(BCNF) which is an extension of

3NF

.Slide10

Normalization (9)

Correspondence between the normal forms:Slide11

Normalization (10)Normal form example Consider the following example forms that record information about cleaners at the Middlesex Depot and the buses they look after. Note that three extra attributes, roster number, roster date and job complete have been added to the original model. The cleaner ticks against the appropriate job after he/she has completed the cleaning of a particular bus.Slide12

Normalization (11)The un-normalised relation:Slide13

Normalization (12)First normal form (1 NF) The next step in the Normalization process is to remove the repeating groups from the un­normalised relation. A relation is in 1 NF if - and only if - all domains contain only atomic or single values, i.e. all repeating groups of data are removed.

A repeating group is a group of attributes that occurs a number of times for each record in the relation. So for example, in the Roster relation, each roster record has a group of buses (roster record 104 has 6 buses).

Selecting

a suitable key for the table

In order to convert an un-

normalised

relation into first normal form, we must identify the key attribute(s) involved. From the un-

normalised

relation we can see that each roster has a

roster_no

, each cleaner a

cno, each depot a

dno, each bus a reg_no and each type a tno. In order to convert an un-normalised

relation into normal form, we also have to identify a key for the whole relation. Bearing this definition in mind, on examination the primary key of the relation is

roster-no

,

reg_no

.

We now draw the determinacy diagram for the roster relation, showing the attributes which are dependent on the primary key:Slide14

Normalization (13)

Determinacy diagram for the first normal form:Slide15

Normalization (14)Roster relation in first normal form:Slide16

Normalization (15)The problems with 1 NF are:Redundancy - e.g. roster date, cleaner name etc. repeatedInsertion anomaly

- a cleaner cannot be inserted into the database unless he/she has a bus to clean

Deletion anomaly

- deleting a tuple might lose information from the database. For example, if a cleaner cleaning a particular bus leaves the company, then we lose information for the buses he cleaned

Update anomaly

- e.g. a change to the cleaner name means it must change in all tuples which include that cleaner name.Slide17

Normalization (16)Second normal form (2NF) We now describe the second step in the Normalization process using the relation above which is in first normal form. Firstly we determine the functional dependencies on the identifying attributes (i.e. the primary key (

roster_no

,

reg_no

) and its parts.

If the key is composite, the other attributes must be functionally dependent on the

whole of the key

. In other words we are looking for partial functional dependencies. In the example, roster date is functionally dependent on the partial key

roster_no

- there is only one

roster_date

for a particular

roster_no

. Also cno,

cname, dno, dname

etc

are all functionally dependent on the partial

key

reg_no

. The attribute '

status

', however, is the only attribute fully functionally dependent on the whole of the primary key.Slide18

Normalization (17)

Determinacy diagram for the second normal form:Slide19

Normalization (18)

Roster in first second normal form:Slide20

Normalization (19) 2NF has less redundancy than 1NF as we have removed repeating groups. However there are still a number of problems:

Redundancy

- for example, in the Bus relation, cleaner name is repeated for each cleaner number

Insertion anomaly

- a cleaner cannot be inserted into the database unless he/she is responsible for at least one busDeletion anomaly - deleting a tuple might lose information from the database. For example, if we delete a cleaner who is only responsible for that one bus, then we lose information about the cleaner

Update anomaly

- e.g. a change to the cleaner name means changes must be made in all tuples which include that cleaner name.Slide21

Normalization (20)Third normal form (3NF) A 3NF relation is in 2NF but also it must satisfy the non-transitive dependency rule, which states that every non-key attribute must be non-transitively dependent on the primary key. Another way of saying this is that a relation is in 3NF if all its non-key attributes are directly dependent on the primary key. Transitive dependencies are resolved by creating new relations for each entity.

There are three transitive dependencies in the Bus relation above as is illustrated by vertical lines in the 2NF determinacy diagram. For example:

cno

is functionally dependent on

reg_no

; cname is functionally dependent on

reg_no

. Additionally,

cname

is functionally dependent

cno

.

We therefore have the transitive dependency: reg_no determines

cno

and

cno

determines

cname

then

reg_no

determines

cname

Two other transitive dependencies are identified involving

tname

and

dname

. The determinacy diagrams for third normal form are given on the next slide:Slide22

Normalization (21)

Determinacy diagram for the third normal form:Slide23

Normalization (22)

Roster in

third

normal form:Slide24

Normalization (24)

Steps of the

Normalization

process (1) :Slide25

Normalization (24)

Normal

form

What is it?

What does

process do?

How is it achieved?

1 NF

Relation in 1 NF if

- it contains scalar (atomic)

values only

- Removes

repeating groups

- Make a separate relation

for each group of related

attributes

- Give each new relation a

primary key

2 NF

Relation in 2NF if

- in I NF

- all non-key attributes are

dependent on the whole

of the primary key and

not part of it

- Removes

redundant data

- If an attribute depends on

only part of a multi­

value key, remove it to a

separate table

3 NF

Relation in 3NF if

- in 2NF

- non-key attributes are

dependent on primary

key and independent of

each other

- i.e. non-key attribute

must be non-transitively

dependent on the primary

key

- a non-key attribute is

changed, that change

should not affect the

others

- Removes

attributes not

dependent on

the key thereby

further reducing

redundancy

- Make a separate relation

for attributes transitively

dependent on the

primary key

- Give each new relation a

primary key

- Original relation will

include a foreign key to

link to new relationSlide26

BibliographyBibliographyAn Introduction to Database Systems (8th ed.), C J Date, Addison Wesley 2004Database Management Systems, P Ward & G Defoulas

, Thomson

2006

Database Systems Concepts (4

th

ed.), A Silberschatz, H F Korth & S Sudarshan, McGraw-Hill 2002