/
NormalisationDatabase Systems Lectures 11-12Natasha Alechina NormalisationDatabase Systems Lectures 11-12Natasha Alechina

NormalisationDatabase Systems Lectures 11-12Natasha Alechina - PDF document

faustina-dinatale
faustina-dinatale . @faustina-dinatale
Follow
395 views
Uploaded On 2015-09-12

NormalisationDatabase Systems Lectures 11-12Natasha Alechina - PPT Presentation

1 In This Lecture ID: 127187

1 In This Lecture

Share:

Link:

Embed:

Download Presentation from below link

Download Pdf The PPT/PDF document "NormalisationDatabase Systems Lectures 1..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

1 NormalisationDatabase Systems Lectures 11-12Natasha Alechina In This Lecture•Idea of normalisation•Functional dependencies•Normal forms •Redundancy is often caused by a functional •A functional dependency (FD) is a link between two sets of attributes in a relation•We can normalise a •{ID, modCode} {First, Last, modName}•{modCode} IDmodCodemodNameFirstLast 111G51PRGProgrammingJoeBloggs222G51DBSDatabasesAnneSmith FDsand Normalisation•We define a set of 'normal forms'•Each normal form has fewer FDsthan the last•Since FDsrepresent redundancy, each normal form has less redundancy than the last•Not all FDscause a Key attributes •We callan attribute key attribute 2 Partial FDsand 2NF•Partial FDs:•A FD, is a partial FD, if some attribute of can be removed and the FD still holds•Formally, there is some proper subset of , such that •Let us call attributes which are part of some candidate key, key attributes, and the rest non-key attributes.Second normal form:•A relation is in second normal form (2NF) if it is in 1NF and no non-key attribute is partially dependent on a candidate key•In other words, no where C is a strict subset of a candidate key and B is a non-key attribute. •1NF is not in 2NF•We have the FD{Module, Text} {Lecturer, Dept}•But also{Module} {Lecturer, Dept}•And so Lecturer and Dept are partially dependent on the primary keyModule Dept Lecturer TextM1 D1 L1 T1M1 D1 L1 T2M2 D1 L1 T1M2 D1 L1 T3M3 D1 L2 T4M4 D2 L3 T1M4 D2 L3 T5M5 D2 L4 T6 •Suppose we have a relation R with scheme S and the FD A B where B = { }•Let C = S –(A U B)•In other words:•A –attributes on the left hand side of the FD•B –attributes on the right hand side of the FD•C –all other attributes•It turns out that we can split R into two parts:•R1, with scheme C U A•R2, with scheme A U B•The original relation can be recovered as the natural join of R1 and R2: •R = R1 NATURAL JOIN R2 1NF to 2NF –ExampleModule Dept Lecturer TextM1 D1 L1 T1M1 D1 L1 T2M2 D1 L1 T1M2 D1 L1 T3M3 D1 L2 T4M4 D2 L3 T1M4 D2 L3 T5M5 D2 L4 T6 Module Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 Module TextM1 T1M1 T2M2 T1M2 T3M3 T4M4 T1M4 T5M1 T6 A A, B where A is the `bad’dependency –violating 2NF Problems Resolved in 2NF•Problems in 1NF•INSERT –Can't add a module with no texts•UPDATE –To change lecturer for M1, we have to change two rows•DELETE –If we remove M3, we remove L2 as well•In 2NF the first two are resolved, but not the third oneModule Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 Problems Remaining in 2NF•INSERT anomalies•Can't add lecturers who teach no modules•UPDATE anomalies•To change the department for L1 we must alter two rows•DELETE anomalies•If we delete M3 we delete L2 as wellModule Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 3 Transitive FDsand 3NF•Transitive FDs:•A FD, is a transitive FD, if there is some set such that are non-trivial FDsnon-trivial means: is not a subset of •We have•Third normal form •A relation is in third normal form (3NF) if it is in 2NF and no non-key attribute is transitively dependent on a candidate key•Alternative (simpler) definition: a relation is in 3NF if in every non-trivial fdA either B is a key attribute or A is a •2NFa is not in 3NF•We have the FDs{Module} {Lecturer}{Dept}•So there is a transitive FD from the primary key {Module} to {Dept}Module Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 2NF to 3NF –ExampleModule Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 Lecturer DeptL1 D1L2 D1L3 D2L4 D2 Module LecturerM1 L1M2 L1M3 L2M4 L3M5 L4 Problems Resolved in 3NF•Problems in 2NF•INSERT –Can't add lecturers who teach no modules•UPDATE –To change the department for L1 we must alter two rows•DELETE –If we delete M3 we delete L2 as •In 3NF all of these are resolved (for this relation –but 3NF can still have anomalies!)Lecturer DeptL1 D1L2 D1L3 D2L4 D2 Module LecturerM1 L1M2 L1M3 L2M4 L3M5 L4 Normalisation so Far•First normal form•All data values are atomic•Second normal form•In 1NF plus no non-key attribute is partially dependent on a candidate key•Third normal form•In 2NF plus no non-key attribute depends transitively on a candidate key (or, no dependencies of non-key on non-superkey) The Stream Relation•Consider a relation, Stream, which stores information about times for various streams of courses •For example: labs for first years•Each course has several streams•Only one stream (of any course at all) takes place at any given time•Each student taking a course is assigned to a single stream for it 4 The Stream Relation StudentCourseTimeJohnDatabases12:00MaryDatabases12:00RichardDatabases15:00RichardProgramming10:00MaryProgramming10:00RebeccaProgramming13:00 Candidate keys: {Student, Course} and {Student, Time} FDsin the Stream Relation•Stream has the following non-trivial •{Student, Course} {Time}•{Time} {Course}•Since all attributes are key attributes, Stream is in 3NF Anomalies in Stream•INSERT anomalies•You can’t add an empty stream•UPDATE anomalies•Moving the 12:00 class to 9:00 means changing two rows•DELETE anomalies•Deleting Rebecca removes a stream StudentCourseTimeJohnDatabases12:00MaryDatabases12:00RichardDatabases15:00RichardProgramming10:00MaryProgramming10:00RebeccaProgramming13:00 Boyce-CoddNormal Form•A relation is in Boyce-Coddnormal form (BCNF) if for every FD A B either•B is contained in A (the FD is trivial), or•A contains a candidate key of the relation, •In other words: every determinant in a non-trivial dependency is a •The same as 3NF except in 3NF we only worry about non-key Bs•If there is only one candidate key then 3NF and BCNF are the same •Stream is not in BCNF as the FD {Time} {Course} is non-trivial and {Time} does not contain a candidate StudentCourseTimeJohnDatabases12:00MaryDatabases12:00RichardDatabases15:00RichardProgramming10:00MaryProgramming10:00RebeccaProgramming13:00 Conversion to BCNFStudent Time Course Time Stream has been put into BCNF but we have lost the FD{Student, Course} {Time}Student Course Time 5 Decomposition Properties•Lossless: Data should not be lost or created when splitting relations up•Dependency preservation: It is desirable that FDsare preserved when splitting relations up•Normalisation to 3NF is always lossless and dependency preserving•Normalisation to BCNF is lossless, but may not preserve all dependencies Higher Normal Forms•BCNF is as far as we can go with FDs•Higher normal forms are based on other sorts of dependency•Fourth normal form removes multi-valued dependencies•Fifth normal form removes join dependencies 1NF Relations 2NF Relations 3NF Relations BCNF Relations 4NF Relations 5NF Relations Denormalisation•Normalisation•Removes data •Solves INSERT, anomalies•This makes it easier to maintain the information in the database in a consistent state•However•It leads to more tables in the database•Often these need to be joined back together, which is expensive to do•So sometimes (not often) it is worth ‘denormalising’ Denormalisation•You mightwant to denormaliseif•Database speeds are unacceptable (not just a bit slow)•There are going to be very few INSERTs, UPDATEs, or DELETEs•There are going to be lots of SELECTsthat involve the joining of tables Street Postcode CityNot normalised since {Postcode} {City} Street Postcode City Postcode Lossless decomposition•To normalise a relation, we used projections•If R(A,B,C) satisfies Athen we can project it on A,B and A,C without losing information•Lossless decomposition: R = (R) is projection of R on AB and is natural join.•Reminder of projection: ABC (R) Relational algebra reminder: 1cc2yde3zaa4ubc5wcd 1cc3zaa C=D 6 Connection to SQLWHERE (some property holds)translates into relational algebra Relational algebra reminder: ABAC1x1w1x2v1x3u2y1w2y2v2y3u Relational algebra: natural join R2 = R1.A,B,CR2) ABC1xw2yv ModuleLecturerTextDBSnzaCBDBSnzaUWRDBnzaUWAPSrcbB ModuleLecturerDBSnza RDBnzaAPSrcbModule,Lecturer ModuleTextDBSCB RDBAPSModule,Text DBS lossless: no fd FirstAgeJohnSmith20 FirstLastJohnSmith First,Last FirstAgeJohn20 First,Age JohnJohnBrown30MarySmith20TomBrown10JohnBrownMarySmithTomBrown lossless: no fd FirstAgeJohnSmith20 First,LastFirst,Last FirstLastJohnSmith First,Last FirstAgeJohn20 First,Age JohnJohnBrown30MarySmith20TomBrown10JohnBrownMarySmithTomBrownJohnSmith30JohnBrown20 7 Heath’s theorem•A relation R(A,B,C) that satisfies a functional B can always be non-loss decomposed into its projections R1=(R) and R2=Proof. •Firstwe show that R (R). This actually holds for any relation, does not have to satisfy A •Assume r R. We need to show r (R). Since r R, r(A,B) (R) and r(A,C) (R). Since r(A,B) and r(A,C) have the same value for A, their join r(A,B,C) = r is in Heath’s theorem•Now we show that R. This only holds if R satisfies A •Assume r (R). •So, r(A,B) (R) and r(A,C) (R). •By the definition of projection, if r(A,B) (R), then there is a tuplesR such that s(A,B) = r(A,B). Similarly, since r(A,C) (R), there is sR such that (A,C) = r(A,C). •Since s(A,B) = r(A,B) and s(A,C) = r(A,C), s(A) = (A). So because of A B, s(B) = s(B). This means that s(A,B,C) = s(A,B,C) = r and r •Consider a relation Book with attributes Author, Title, Publisher, City, Country, Year, ISBN. There are two candidate keys: ISBN and (Author, Title, Publisher, Year). City is the place where the book is published, and there are functional dependencies Publisher City and City Country. Is this relation in 2NF? Explain your answer. (4 marks)•Is this relation in 3NF? Explain your answer. (5 marks)•Is the relation above in BCNF? If not, decompose it to BCNF and explain why the resulting tables are in BCNF. (5 marks). Next Lecture•Physical DB Issues•RAID arrays for recovery and speed•Indexes and query efficiency•Query optimisation•Query trees•For more information•Connolly and Beggchapter 21 and appendix C.5, Ullmanand Widom5.2.8 Next Lecture•More normalisation•Lossless decomposition; why our reduction to 2NF and 3NF is lossless•Boyce-Coddnormal form (BCNF)•Higher normal forms•Denormalisation•For more information•Connolly and Beggchapter 14•Ullmanand Widomchapter 3.6