/
Database Management Systems (CS 564) Database Management Systems (CS 564)

Database Management Systems (CS 564) - PowerPoint Presentation

conchita-marotz
conchita-marotz . @conchita-marotz
Follow
342 views
Uploaded On 2019-12-05

Database Management Systems (CS 564) - PPT Presentation

Database Management Systems CS 564 Fall 2017 Lecture 7 Recap Schema Refinement Redundancy causes various kinds of anomalies To refine schemas Detect anomalies Find FDs in the relations schemas ID: 769263

564 fall normal dayton fall 564 dayton normal cid decomposition schema bcnf deptname fds form database management set 3nf

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Database Management Systems (CS 564)" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Database Management Systems (CS 564) Fall 2017Lecture 7

Recap: Schema RefinementRedundancy causes various kinds of anomalies To refine schemas:Detect anomaliesFind FDs in the relations’ schemasApply Armstrong’s axioms to expand these FDsUse the FDs to find the anomalies in the schemasRemove anomaliesDecompose the anomalous schemas CS 564 (Fall'17) 2

How to Find FDs Create ER modelTranslate it into a relational schemaThink about FDs that are validFrom the application point of viewGiven a table with a set of tuples, the best you can do is to Confirm that a FD seems to be valid, or Prove that a FD is definitely not valid (through counterexamples) You cannot prove that a FD is valid CS 564 (Fall'17) 3

How to Find FDs (Cont.) Example: does CID → Instructor hold for the following instance of Section?CS 564 (Fall'17) 4 SecID CID Semester Year Instructor 30098 MATH240 Fall 2017 Euclid 40026 CS367 Fall2016Dijkstra1005CS367Spring2004Gauss30451CS764Fall2017Patel20006CS564Spring2001Codd Section CID InstructorMATH240EuclidCS367DijkstraCS367GaussCS764PatelCS564Codd

Check FDs Let S be a set of FDs defined on the attributes in the set Xe.g. X={SID, Name, SSN}, S ={(SID → Name, SSN), (SSN → SID) } Question: is Y ⊆ X a superkey?To answer this question among others, we find all the attribute sets that Y determines CS 564 (Fall'17) 5

Attribute Set Closures Given a set X of attributes and a set S of FDs, the closure of Y ⊆ X (under S ), called Y + , is the set of all attributes Z ∈ X such that Y → Z e.g. X ={ SecID , CID, CName, Year, Department}, S={(SecID → CID, CName, Year, Department), (CID → Department)} Y={CName, CID}Y+={CName, CID, Department}CS 564 (Fall'17)6

Compute Attribute Set Closures CS 564 (Fall'17)7 Y + = Y loop if ∃ FD Z → T in S s.t. Z ⊆ Y+ : Y+ = Y+ ∪ Tuntil Y+ does not change any further

Use Attribute Set Closures Test if Y is a superkeyCompute Y+ and check if Y + contains all the attributes of R Test if a given FD Y → Z holds (without computing S +)Compute Y + and check if Z is in Y + CS 564 (Fall'17)8

Minimal Basis of FD Sets Opposite of closureS is a minimal basis for a set F of FDs if   S + = F+ Every FD in S has one attribute on the RHS If we remove any FD from S , the closure would not be F + anymoreIf for any FD in S we remove one or more attributes from the LHS, the closure would not be F+ anymoreCS 564 (Fall'17)9

Minimal Basis of FD Sets (Cont.) S is a minimal basis for a set F of FDs if S + = F + Every FD in S has one attribute on the RHSIf we remove any FD from S , the closure is not F+ If for any FD in S we remove one or more attributes from the LHS, the closure is not F +CS 564 (Fall'17)10Example a ⟶ b a,b,c,d ⟶ e e,f ⟶ g,h a,c,d,f ⟶ e,gWhat is the minimal basis for the above FD set? a ⟶ b a,c,d ⟶ e e,f ⟶ g e,f ⟶ h

Recap: Schema RefinementRedundancy causes various kinds of anomalies To refine schemas:Detect anomaliesFind FDs in the relations’ schemasApply Armstrong’s axioms to expand these FDsUse the FDs to find the anomalies in the schemasRemove anomaliesDecompose the anomalous schemas CS 564 (Fall'17) 11

Detect AnomaliesAnomaly 1 CS 564 (Fall'17)12 CID Name Credits DeptName Dept Address CS564 Database Management Systems 3 CS W. Dayton St. MATH240Discrete Mathematics4MATHCharter St.CS367Intro to Data Structures3CSW. Dayton St.CS764Adv. Database Management3CSW. Dayton St.CourseQ: What is the source of redundancy in the above relation?A: The moment we know DeptName, the value of DeptAddress is fixed, i.e . the FD DeptName→ DeptAddress holds. But DeptName in not a superkey.

Detect Anomalies (Cont.)In general, when a non-trivial FD X → Y holds for relation R, but X is not a superkey, multiple tuples can have the same value(s) for attributes in X Which automatically means than the same value(s) for attributes in Y would be repeated for those tuples, causing redundancy To deal with this situation, we decompose the anomalous relation schema CS 564 (Fall'17) 13

Schema DecompositionLet 𝓡 (A) be a relational schema with schema name 𝓡 and schema attribute set AWe can decompose 𝓡 into two schemas 𝓡1(B) and 𝓡2 ( C ) such that B ∪ C = AIn case of the previously discussed anomaly, B = A ∖ Y and C = X∪YCS 564 (Fall'17)14

Schema Decomposition: ExampleCS 564 (Fall'17) 15 CID Name Credits DeptName Dept Address CS564 Database Management Systems 3 CS W. Dayton St. MATH240Discrete Mathematics4MATHCharter St.CS367Intro to Data Structures3CSW. Dayton St.CS764Adv. Database Management3CSW. Dayton St.CourseCIDNameCredits DeptName CS564 Database Management Systems3CSMATH240Discrete Mathematics4MATHCS367Intro to Data Structures3CS CS764 Adv. Database Management 3 CS Course DeptName DeptAddress CS W. Dayton St. MATH Charter St. Department A X Y → B C B ∩ C = X C ∖ B = Y

Schema Decomposition Desiderata Minimize redundancyAvoid information loss (lossless join)Preserve the FDs (dependency-preserving)Ensure good query performance CS 564 (Fall'17) 16

Lossless Join DecompositionJoin refresher: querying related tuples from two or more relations ExampleCS 564 (Fall'17)17 SELECT D.Name FROM Student AS S, Department AS D WHERE S.Major = D.DID AND S.SID = 17;

Lossy Join DecompositionCS 564 (Fall'17) 18 CID Name Credits DeptName Dept Address CS564 Database Management Systems 3 CS W. Dayton St. MATH240Discrete Mathematics4MATHCharter St.CS367Intro to Data Structures4CSW. Dayton St.CS764Adv. Database Management3CSW. Dayton St.CourseCIDNameDeptName Dept Address CS564Database Management SystemsCSW. Dayton St.MATH240Discrete MathematicsMATHCharter St.CS367Intro to Data StructuresCS W. Dayton St. CS764 Adv. Database Management CS W. Dayton St. Course DeptName Credits CS 3 MATH 4 CS 4 DeptCreds CID Name Credits DeptName Dept Address CS564 Database Management Systems 3 CS W. Dayton St. MATH240 Discrete Mathematics 4 MATH Charter St. CS367 Intro to Data Structures 4 CS W. Dayton St. CS764 Adv. Database Management 3 CS W. Dayton St. CS564 Database Management Systems 4 CS W. Dayton St. CS367 Intro to Data Structures 3 CS W. Dayton St. CS764 Adv. Database Management 4 CS W. Dayton St. Course Extra data! i.e. lossy join decomposition

Lossless Join Decomposition (Cont.)Suppose you decompose relation schema 𝓡 into two schemas 𝓡1 and 𝓡2 If for any instance R (with schema 𝓡), you can decompose R into R 1 (with schema 𝓡 1 ) and R 2 (with schema 𝓡2) such that R can be recovered by joining R 1 and R2 , then the decomposition is a lossless join decomposition CS 564 (Fall'17) 19

Dependency-preserving DecompositionCS 564 (Fall'17) 20 CID Name Credits DeptName Dept Address CS564 Database Management Systems 3 CS W. Dayton St. MATH240Discrete Mathematics4MATHCharter St.CS367Intro to Data Structures4CSW. Dayton St.CS764Adv. Database Management3CSW. Dayton St.CourseCIDNameDeptName Dept Address CS564Database Management SystemsCSW. Dayton St.MATH240Discrete MathematicsMATHCharter St.CS367Intro to Data StructuresCS W. Dayton St. CS764 Adv. Database Management CS W. Dayton St. Course DeptName Credits CS 3 MATH 4 CS 4 DeptCreds F={CID → Name, CID → Credits, CID → DeptName , CID → DeptAddress, DeptName → DeptAddress} F={CID → Name, CID → DeptName , CID → DeptAddress, DeptName → DeptAddress} F={ } Semantic is lost, since if you are only given Course and DeptCreds there is no way to account for the FD CID → Credits.

Dependency-preserving Decomposition (Cont.) Suppose you decompose relation schema 𝓡 with FD set F into two schemas 𝓡1 and 𝓡2 with FD sets F1 and F 2 respectively If all the FDs inferable from F can also be inferred from the union of F 1 and F2 , we say that the decomposition is dependency preservingi.e. if (F1 ∪ F 2 ) + = F+CS 564 (Fall'17)21

Schema Decomposition Desiderata (Cont.) Example: let 𝓡(J, K, L) have FDs F={J → K, KL → J}We cannot achieve all the three properties above𝓡1(J , K) and 𝓡 2 (K, L) 𝓡 1 (J, L) and 𝓡2 (K, L)𝓡1 (J, K) and 𝓡 2 (J, L), …If we want both lossless join and dependency preservation, we have to tolerate some redundancyFortunately, there situations are rare in real-worldCS 564 (Fall'17)22Q: Is it always possible to find a redundancy-reducing, lossless join, dependency-preserving decomposition?A: No.

Normal FormsTo make anomaly detection and decomposition more systematic, we define normal formsInformally, a relation is said to be in a particular normal form if it doesn’t have certain anomaliesCS 564 (Fall'17) 23

Normal Forms (Cont.)CS 564 (Fall'17) 24 BCNF Third Normal Form (3NF) Second Normal Form (2NF) First Normal Form (1NF) Fourth Normal Form (4NF) Fifth Normal Form (5NF)

Boyce-Codd Normal Form (BCNF) One of the most commonly used normal formsLet 𝓡 be a relation schema with the FD set F. 𝓡 is in BCNF if for every FD X → Y in F X → Y is trivial (i.e. Y ⊆ X), or X is a superkey of 𝓡 CS 564 (Fall'17) 25

F={CID →Name,CID→ Credits, CID → DeptName , CID →DeptAddress, DeptName→DeptAddress } BCNF (Cont.) CS 564 (Fall'17) 26 CID Name CreditsDeptNameDeptAddressCS564Database Management Systems3CSW. Dayton St.MATH240Discrete Mathematics4MATHCharter St.CS367Intro to Data Structures3CSW. Dayton St.CS764Adv. Database Management3CSW. Dayton St.CourseCID Name Credits DeptNameCS564Database Management Systems3CSMATH240Discrete Mathematics4MATHCS367 Intro to Data Structures 3 CS CS764 Adv. Database Management 3 CS Course DeptName DeptAddress CS W. Dayton St. MATH Charter St. Department Q: Is this relation in BCNF? A : No. Q: Are these relations in BCNF? A : Yes. Such a decomposition is called a BCNF decomposition.

BCNF DecompositionGiven a relation schema 𝓡(Z) with BCNF-violating FD X → Y, decompose 𝓡 into 𝓡1(X, Y) and 𝓡2(Z ∖ Y) Example CS 564 (Fall'17) 27 Course(CID, Name, Credits, DeptName , DeptAddress ) Course(CID, Name, Credits, DeptName ) Department(DeptName, DeptAddress)DeptName → DeptAddress

BCNF Decomposition PropertiesRemoves certain types of redundancy See multivalued dependency for a type of redundancy BCNF decomposition does NOT removeIs lossless joinIs NOT ALWAYS dependency preservingAs we saw beforeCS 564 (Fall'17) 28 A: No. Why? Q: Is it possible for a binary relation R (x 1 , x 2 ) to be non-BCNF?

BCNF Decomposition ExampleCS 564 (Fall'17) 29 Person(Name, SSN, Age, EyeColor , PhoneNo ) Phone(SSN, PhoneNumber ) Person(Name, SSN, Age, EyeColor ) SSN → Name, Age, EyeColor

BCNF Decomposition Example (Cont.)CS 564 (Fall'17) 30 Person(Name, SSN, Age, EyeColor , PhoneNo , CanDrink ) Phone(SSN, PhoneNumber ) Person(Name, SSN, Age, EyeColor , CanDrink)SSN → Name, Age, EyeColorAge → CanDrinkSSN → Name, Age, EyeColorAge → CanDrink⇒ SSN → Name, Age, EyeColor, CanDrinkPerson(Name, SSN, Age, EyeColor)Drink(Age, CanDrink)The final schemasSSN is not a superkey.Age is not a superkey.

Recap: Schema RefinementRedundancy causes various kinds of anomalies To refine schemas:Detect anomaliesFind FDs in the relations’ schemasApply Armstrong’s axioms to expand these FDsUse the FDs to find the anomalies in the schemasRemove anomaliesDecompose the anomalous schemas (i.e. achieve desired normal forms) CS 564 (Fall'17) 31

Recap: Normal FormsCS 564 (Fall'17) 32 BCNF Third Normal Form (3NF) Second Normal Form (2NF) First Normal Form (1NF) Fourth Normal Form (4NF) Fifth Normal Form (5NF)

Third Normal Form (3NF)Let 𝓡 be a relation schema with the FD set F. 𝓡 is in 3NF if for every FD X → Y in FX → Y is trivial (i.e. Y ⊆ X ), or X is a superkey of 𝓡, or Y is part of a key of 𝓡If 𝓡 is in BCNF, then it certainly is in 3NF, but not necessarily the other way around CS 564 (Fall'17) 33

3NF Violation A FD X → Y of 𝓡 violates 3NF conditions ifeither X is a subset of some key or X is NOT a proper subset of any key CS 564 (Fall'17) 34 Y X Key Partial Dependency Key Y X Y XKeyTransitive DependencyRedundant storage of (X, Y) pairsCannot store X without storing Y

3NF Decomposition CS 564 (Fall'17)35 PropertyID CountyName LotNo Area Price 1000 Dane 12 10 25 1001 Milwaukee 1 301071002Milwaukee2301071003Milwaukee335242LotPropertyID → CountyName, LotNo, Area, PriceCountyName, LotNo → PropertyID, Area, PriceArea → PriceArea → CountyNamePropertyIDCountyName LotNo Area 1000Dane12101001Milwaukee1301002Milwaukee230 1003 Milwaukee 3 35 Lot Area Price 10 25 30 107 35 242 Price PropertyID → CountyName , LotNo , Area, Price CountyName , LotNo → PropertyID, Area, Price Area → CountyName Area → Price A: No. Why? Q: Are these relations in BCNF ? Problem: transitive dependency Area → Price

3NF Decomposition (Cont.) Follow the BCNF decomposition algorithmTypically, you can stop earlierIf losing FD X → Y, you can add a relation 𝓡3( X , Y )3NF allows this type of redundancy CS 564 (Fall'17) 36 Hence, it is always possible to find a lossless join , dependency-preserving 3NF decomposition.

3NF Decomposition (Cont.) Can make the algorithm more efficient by using the minimal basis of F instead of FCompute a minimal basis M for FFind a lossless join decomposition of 𝓡 ( which might miss some FDs in M) Add additional relations to cover any missing FDs CS 564 (Fall'17) 37

Fourth Normal Form (4NF)When there are no multivalued dependencies (MVDs) among the attributes in the schemaMVD: given a schema 𝓡(X, Y, Z ), there exists a MVD X ↠ Y if for any value of X, the set of values of Y is independent of Zi.e. decomposition 𝓡 1 ( X , Y ) and 𝓡2(X,Z) is losslessAny FD is a MVD as wellCS 564 (Fall'17)38

MVD ExampleCS 564 (Fall'17) 39 SSN Phone No Course 123-32-99 99 (206) 572-4312 CS564 123-32-99 99 (206) 572-4312 CS367 123-32-99 99(206) 432-8954CS564123-32-9999(206) 432-8954CS367PersonTypically, MVD problems arise when two many-to-one relationships are mixed into one relationSSN ↠ PhoneNo SSN ↠ Course

4NF (Cont.)Let 𝓡 be a relation schema with the MVD set V. 𝓡 is in 4NF if for every MVD X ↠ Y in V X ↠ Y is trivial (i.e. Y ⊆ X), or X is a superkey of 𝓡 Same as BCNF definition, with FD replaced by MVD CS 564 (Fall'17) 40

Fifth Normal Form (5NF)Sometimes, a relation cannot be lossless join decomposed into two relations, but can be into three or more5NF captures the idea that a relation schema must have some particular lossless join decomposition Concept of join dependency Finding actual 5NF cases is difficult CS 564 (Fall'17) 41

Schema Refinement (Cont.)Aim for all the relations to be in BCNF, settle for 3NF In practice, when your relations are in BCNF, usually they are in 5NF as wellCS 564 (Fall'17) 42

Schema Refinement in Practice Tradeoff between redundancy and query answering performanceNormalization makes answering particular queries more difficulte.g. queries involving many tablesSo in “read-heavy” applications, you might decide to denormalize your schema in favor of answering queries faster CS 564 (Fall'17) 43

Recap: Schema RefinementRedundancy causes various kinds of anomalies To refine schemas:Detect anomaliesFind FDs , apply Armstrong’s axioms, find anomaliesRemove anomaliesDecompose the anomalous schemasDesired decomposition propertiesRedundancy reducing, lossless join, dependency preserving Normal forms 3NF, BCNF, 4NF, … CS 564 (Fall'17) 44

Recap: Normal FormsCS 564 (Fall'17) 45 BCNF Third Normal Form (3NF) Second Normal Form (2NF) First Normal Form (1NF) Fourth Normal Form (4NF) Fifth Normal Form (5NF)

Relational Algebra: Foundations of Operating on Relational Data Next UpCS 564 (Fall'17)46 Questions?