1 In This Lecture ID: 127187
Download Pdf The PPT/PDF document "NormalisationDatabase Systems Lectures 1..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
1 NormalisationDatabase Systems Lectures 11-12Natasha Alechina In This LectureIdea of normalisationFunctional dependenciesNormal forms Redundancy is often caused by a functional A functional dependency (FD) is a link between two sets of attributes in a relationWe can normalise a {ID, modCode} {First, Last, modName}{modCode} IDmodCodemodNameFirstLast 111G51PRGProgrammingJoeBloggs222G51DBSDatabasesAnneSmith FDsand NormalisationWe define a set of 'normal forms'Each normal form has fewer FDsthan the lastSince FDsrepresent redundancy, each normal form has less redundancy than the lastNot all FDscause a Key attributes We callan attribute key attribute 2 Partial FDsand 2NFPartial FDs:A FD, is a partial FD, if some attribute of can be removed and the FD still holdsFormally, there is some proper subset of , such that Let us call attributes which are part of some candidate key, key attributes, and the rest non-key attributes.Second normal form:A relation is in second normal form (2NF) if it is in 1NF and no non-key attribute is partially dependent on a candidate keyIn other words, no where C is a strict subset of a candidate key and B is a non-key attribute. 1NF is not in 2NFWe have the FD{Module, Text} {Lecturer, Dept}But also{Module} {Lecturer, Dept}And so Lecturer and Dept are partially dependent on the primary keyModule Dept Lecturer TextM1 D1 L1 T1M1 D1 L1 T2M2 D1 L1 T1M2 D1 L1 T3M3 D1 L2 T4M4 D2 L3 T1M4 D2 L3 T5M5 D2 L4 T6 Suppose we have a relation R with scheme S and the FD A B where B = { }Let C = S (A U B)In other words:A attributes on the left hand side of the FDB attributes on the right hand side of the FDC all other attributesIt turns out that we can split R into two parts:R1, with scheme C U AR2, with scheme A U BThe original relation can be recovered as the natural join of R1 and R2: R = R1 NATURAL JOIN R2 1NF to 2NF ExampleModule Dept Lecturer TextM1 D1 L1 T1M1 D1 L1 T2M2 D1 L1 T1M2 D1 L1 T3M3 D1 L2 T4M4 D2 L3 T1M4 D2 L3 T5M5 D2 L4 T6 Module Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 Module TextM1 T1M1 T2M2 T1M2 T3M3 T4M4 T1M4 T5M1 T6 A A, B where A is the `baddependency violating 2NF Problems Resolved in 2NFProblems in 1NFINSERT Can't add a module with no textsUPDATE To change lecturer for M1, we have to change two rowsDELETE If we remove M3, we remove L2 as wellIn 2NF the first two are resolved, but not the third oneModule Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 Problems Remaining in 2NFINSERT anomaliesCan't add lecturers who teach no modulesUPDATE anomaliesTo change the department for L1 we must alter two rowsDELETE anomaliesIf we delete M3 we delete L2 as wellModule Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 3 Transitive FDsand 3NFTransitive FDs:A FD, is a transitive FD, if there is some set such that are non-trivial FDsnon-trivial means: is not a subset of We haveThird normal form A relation is in third normal form (3NF) if it is in 2NF and no non-key attribute is transitively dependent on a candidate keyAlternative (simpler) definition: a relation is in 3NF if in every non-trivial fdA either B is a key attribute or A is a 2NFa is not in 3NFWe have the FDs{Module} {Lecturer}{Dept}So there is a transitive FD from the primary key {Module} to {Dept}Module Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 2NF to 3NF ExampleModule Dept LecturerM1 D1 L1M2 D1 L1M3 D1 L2M4 D2 L3M5 D2 L4 Lecturer DeptL1 D1L2 D1L3 D2L4 D2 Module LecturerM1 L1M2 L1M3 L2M4 L3M5 L4 Problems Resolved in 3NFProblems in 2NFINSERT Can't add lecturers who teach no modulesUPDATE To change the department for L1 we must alter two rowsDELETE If we delete M3 we delete L2 as In 3NF all of these are resolved (for this relation but 3NF can still have anomalies!)Lecturer DeptL1 D1L2 D1L3 D2L4 D2 Module LecturerM1 L1M2 L1M3 L2M4 L3M5 L4 Normalisation so FarFirst normal formAll data values are atomicSecond normal formIn 1NF plus no non-key attribute is partially dependent on a candidate keyThird normal formIn 2NF plus no non-key attribute depends transitively on a candidate key (or, no dependencies of non-key on non-superkey) The Stream RelationConsider a relation, Stream, which stores information about times for various streams of courses For example: labs for first yearsEach course has several streamsOnly one stream (of any course at all) takes place at any given timeEach student taking a course is assigned to a single stream for it 4 The Stream Relation StudentCourseTimeJohnDatabases12:00MaryDatabases12:00RichardDatabases15:00RichardProgramming10:00MaryProgramming10:00RebeccaProgramming13:00 Candidate keys: {Student, Course} and {Student, Time} FDsin the Stream RelationStream has the following non-trivial {Student, Course} {Time}{Time} {Course}Since all attributes are key attributes, Stream is in 3NF Anomalies in StreamINSERT anomaliesYou cant add an empty streamUPDATE anomaliesMoving the 12:00 class to 9:00 means changing two rowsDELETE anomaliesDeleting Rebecca removes a stream StudentCourseTimeJohnDatabases12:00MaryDatabases12:00RichardDatabases15:00RichardProgramming10:00MaryProgramming10:00RebeccaProgramming13:00 Boyce-CoddNormal FormA relation is in Boyce-Coddnormal form (BCNF) if for every FD A B eitherB is contained in A (the FD is trivial), orA contains a candidate key of the relation, In other words: every determinant in a non-trivial dependency is a The same as 3NF except in 3NF we only worry about non-key BsIf there is only one candidate key then 3NF and BCNF are the same Stream is not in BCNF as the FD {Time} {Course} is non-trivial and {Time} does not contain a candidate StudentCourseTimeJohnDatabases12:00MaryDatabases12:00RichardDatabases15:00RichardProgramming10:00MaryProgramming10:00RebeccaProgramming13:00 Conversion to BCNFStudent Time Course Time Stream has been put into BCNF but we have lost the FD{Student, Course} {Time}Student Course Time 5 Decomposition PropertiesLossless: Data should not be lost or created when splitting relations upDependency preservation: It is desirable that FDsare preserved when splitting relations upNormalisation to 3NF is always lossless and dependency preservingNormalisation to BCNF is lossless, but may not preserve all dependencies Higher Normal FormsBCNF is as far as we can go with FDsHigher normal forms are based on other sorts of dependencyFourth normal form removes multi-valued dependenciesFifth normal form removes join dependencies 1NF Relations 2NF Relations 3NF Relations BCNF Relations 4NF Relations 5NF Relations DenormalisationNormalisationRemoves data Solves INSERT, anomaliesThis makes it easier to maintain the information in the database in a consistent stateHoweverIt leads to more tables in the databaseOften these need to be joined back together, which is expensive to doSo sometimes (not often) it is worth denormalising DenormalisationYou mightwant to denormaliseifDatabase speeds are unacceptable (not just a bit slow)There are going to be very few INSERTs, UPDATEs, or DELETEsThere are going to be lots of SELECTsthat involve the joining of tables Street Postcode CityNot normalised since {Postcode} {City} Street Postcode City Postcode Lossless decompositionTo normalise a relation, we used projectionsIf R(A,B,C) satisfies Athen we can project it on A,B and A,C without losing informationLossless decomposition: R = (R) is projection of R on AB and is natural join.Reminder of projection: ABC (R) Relational algebra reminder: 1cc2yde3zaa4ubc5wcd 1cc3zaa C=D 6 Connection to SQLWHERE (some property holds)translates into relational algebra Relational algebra reminder: ABAC1x1w1x2v1x3u2y1w2y2v2y3u Relational algebra: natural join R2 = R1.A,B,CR2) ABC1xw2yv ModuleLecturerTextDBSnzaCBDBSnzaUWRDBnzaUWAPSrcbB ModuleLecturerDBSnza RDBnzaAPSrcbModule,Lecturer ModuleTextDBSCB RDBAPSModule,Text DBS lossless: no fd FirstAgeJohnSmith20 FirstLastJohnSmith First,Last FirstAgeJohn20 First,Age JohnJohnBrown30MarySmith20TomBrown10JohnBrownMarySmithTomBrown lossless: no fd FirstAgeJohnSmith20 First,LastFirst,Last FirstLastJohnSmith First,Last FirstAgeJohn20 First,Age JohnJohnBrown30MarySmith20TomBrown10JohnBrownMarySmithTomBrownJohnSmith30JohnBrown20 7 Heaths theoremA relation R(A,B,C) that satisfies a functional B can always be non-loss decomposed into its projections R1=(R) and R2=Proof. Firstwe show that R (R). This actually holds for any relation, does not have to satisfy A Assume r R. We need to show r (R). Since r R, r(A,B) (R) and r(A,C) (R). Since r(A,B) and r(A,C) have the same value for A, their join r(A,B,C) = r is in Heaths theoremNow we show that R. This only holds if R satisfies A Assume r (R). So, r(A,B) (R) and r(A,C) (R). By the definition of projection, if r(A,B) (R), then there is a tuplesR such that s(A,B) = r(A,B). Similarly, since r(A,C) (R), there is sR such that (A,C) = r(A,C). Since s(A,B) = r(A,B) and s(A,C) = r(A,C), s(A) = (A). So because of A B, s(B) = s(B). This means that s(A,B,C) = s(A,B,C) = r and r Consider a relation Book with attributes Author, Title, Publisher, City, Country, Year, ISBN. There are two candidate keys: ISBN and (Author, Title, Publisher, Year). City is the place where the book is published, and there are functional dependencies Publisher City and City Country. Is this relation in 2NF? Explain your answer. (4 marks)Is this relation in 3NF? Explain your answer. (5 marks)Is the relation above in BCNF? If not, decompose it to BCNF and explain why the resulting tables are in BCNF. (5 marks). Next LecturePhysical DB IssuesRAID arrays for recovery and speedIndexes and query efficiencyQuery optimisationQuery treesFor more informationConnolly and Beggchapter 21 and appendix C.5, Ullmanand Widom5.2.8 Next LectureMore normalisationLossless decomposition; why our reduction to 2NF and 3NF is losslessBoyce-Coddnormal form (BCNF)Higher normal formsDenormalisationFor more informationConnolly and Beggchapter 14Ullmanand Widomchapter 3.6