/
CSCI 2141 – Intro to CSCI 2141 – Intro to

CSCI 2141 – Intro to - PowerPoint Presentation

karlyn-bohler
karlyn-bohler . @karlyn-bohler
Follow
342 views
Uploaded On 2019-11-26

CSCI 2141 – Intro to - PPT Presentation

CSCI 2141 Intro to Database Systems Summary The Design Process Normalization process produces good tables from poor ones Ensure that the proposed entities meet the required normal form normally 3NF before creating table structures ID: 768143

design product job components product design components job employee serial table tables exercise database component chg erd hours improving

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "CSCI 2141 – Intro to" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

CSCI 2141 – Intro to Database Systems Summary

The Design ProcessNormalization process produces good tables from poor onesEnsure that the proposed entities meet the required normal form (normally 3NF) before creating table structuresNormalization should hence be part of the design processCommon for experienced designers to make mistakes These mistakes can come to light during normalization checks Database design is an iterative process You normally might not get it right the first time

The Design ProcessFirst step – Producing an Entity-Relationship Diagram (ERD)Creating an ERD is an iterative process Identify relevant entities, their attributes and relationships Create an ERD Examine the ERD to identify additional entities and attributes ERD provides the macro view of an organizations data requirements and operations Second step – Normalize the tables Normalization focuses on specific entities and its attributes Provides a micro view of entities within the ERD Go to step 1 and repeat if required

ExampleLet us examine the operations of a contracting companyThe company manages many projectsEach project requires the services of many employees An employee may be assigned to several different projects Some employees are not assigned to any project and perform other duties (e.g. executive secretary) Some employees are part of a labor pool, to be shared by all project teams Each employee has a single primary job classification that determines the hourly billing rate Many employees can have the same job classification

ExampleInitial ERDPROJECT (PROJ_NUM, PROJ_NAME) EMPLOYEE ( EMP_NUM , EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOURS) Identify the normal forms of the above tables

ExamplePROJECT is in 3NFEMPLOYEE is in 2NFSingle attribute PK in 1NF => 2NFTransitive dependency JOB_DESCRIPTION → JOB_CHG_HOURS Converting Employee to 3NF results in another table JOB (JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOURS) Redraw modified ERD

ExampleAt this time EMPLOYEE and PROJECT have a many to many relationshipThis needs to be broken downAdd an ASSIGNMENT entity

ExampleNote the use of surrogate PK in several tablesDesign properly reflects requirements, and all tables are in 3NF

Improving the Database DesignPrimary Key assignmentsUsing integers as PK instead of characters or stringsFewer chances of making mistakes, hence lesser referential integrity errorsNaming conventions Associate attributes with entities using entity prefixes E.g. EMP_ID or EMPLOYEE_ID or EMP_NUM JOB_DESC or JOB_DESCRIPTION

Improving the Database DesignAttribute atomicityMake attributes atomicAtomic attribute cannot be further sub-dividedE.g. LNAME, FNAME, INITIAL instead of NAME ST_NUM, ST_NAME, CITY, STATE, COUNTRY, ZIP instead of ADDRESS What about DAY, MONTH, YEAR instead of DATE ? Evaluate benefits of programming overhead against need Not usually needed as processing date is easy, and functions available in DB to get date, month and year without much additional overhead

Improving the Database DesignIdentify new attributesConsider the real world implementation and future extensionE.g. Employee table may need EMPLOYEE_HIREDATE if organization has decisions based on job longevity (bonus?) SALARY, INSURANCE, BENEFITS payments (if required) Identify new relationships Employee – Manager relationship, Better to have a new table or simply a “manages” relation between Employee and Project table?

Improving the Database DesignPK refinement for data granularityWhat does PROJ_CHG_HOURS representDaily, weekly, monthly, yearly or life-time total?Different ways to store PROJ_CHG_HOURS Evaluate benefit of one against the other

Improving the Database DesignMaintain historical accuracySee tables in 3NFCan we calculate total amount charged for an employee on a particular project using these tables? ASSIGN_HOURS x CHG_HOUR Will historical accuracy be maintained if employee details change? ASSIGN_CHG_HOUR needs to be stored in the Assignment table along with ASSIGN_HOURS

Improving the Database DesignConsider using derived attributesASSIGN_HOURS x ASSIGN_CHG_HOUR = ASSIGN_CHGEvaluate storing ASSIGN_CHG as the actual charged amount in the tableThis creates a transitive dependency but maintains historical accuracy, reduces run-time calculation complexity and human-errors Saves reporting time if many transactions need to be reported or summarized, and simplifies writing of application software

Improving the Database DesignUsing the enhancements suggested, the new DB is as shown:

DenormalizationPrevious example showed us that database design can be improved using DenormalizationDenormalization is the process of introducing redundancy in tables to improve performanceA good RDBMS design is normalized to eliminate any unnecessary duplicationsThis produces many tables and relationships In order to generate information, data must be put together from many tables Joining several tables takes additional I/O operations and processing logic, thus reducing speed

DenormalizationDenormalization introduces redundanciesAdvantage of higher processing speed must be carefully weighed against the disadvantage of data anomaliesIn addition, Denormalization may produce larger tables, making updates and indexing slower Consider a ZIP table: ZIP ( ZIP_CODE , CITY, STATE) Do we really need to store this in a separate table?

Design Exercise 1Suppose a manufacturer produces three high-cost, low-volume products: P1, P2, and P3. Product P1 is assembled with components C1 and C2; product P2 is assembled with components Cl, C3, and C4; and product P3 is assembled with components C2 and C3. Components may be purchased from several vendors, as shown in the following table. Vendor Components Supplied V1 V2 V3 C1, C2 C1, C2, C3, C4 C1, C2, C4

Design Exercise 1Each product has a unique serial number, as does each component. To track product performance, careful records are kept to ensure that each product’s components can be traced to the component supplier.Products are sold directly to final customers; that is, no wholesale operations are permitted. The sales records include the customer identification and the product serial number. Using the preceding information, do the following: Write the business rules governing the production and sale of the products. Create an ER diagram capable of supporting the manufacturer’s product/component tracking requirements.

Design Exercise 1 Business Rule PRODUCT COMPONENTS P1 C1 C2 P2 C1 C3 C4 P3 C2 C3 1. A component can be part of several products, and a product is made up of several components. VENDOR COMPONENTS SUPPLIED V1 C1 C2 V2 C1 C2 C3 C4 V3 C1 C2 C4 2. A component can be supplied by several vendors, and a vendor supplies several components. Business Rule

Design Exercise 1 – Initial ERD

Design Exercise 1 – Sample Data

Design Exercise 1 – Problems in ERDWe used default optionalitiesVarious PRODUCTs do not necessarily contain all available COMPONENTs, All VENDORs do not supply all COMPONENTsBusiness Rule 3: Each product has a unique serial number, as does each component . To keep track of product performance, careful records are kept to ensure that each product's components can be traced to the component supplier . Currently, each product represents a product line

Design Exercise 1 – Solution One way to produce the tracking capability required by business rule 3 is to use a ternary relationship between PRODUCT, COMPONENT, and VENDOR

Ternary RelationshipThe ER diagram we have just shown represents a many‑to‑many‑to‑many TERNARY relationship, expressed by M:N:P. This ternary relationship indicates that:A product is composed of many components and a component appears in many products.A component is provided by many vendors and a vendor provides many components. A product contains components of many vendors and a vendor's components appear in many products.

Design Exercise 1 – Solution Assigning attributes to the SERIALS entity, we may draw the dependency diagram shown below: P_SERIAL partial dependency partial dependency C_SERIAL PROD_TYPE COMP_TYPE VEND_CODE

Design Exercise 1 – Solution P_SERIAL PROD_TYPE C_SERIAL P_SERIAL COMP_TYPE C_SERIAL VEND_CODE Table name: P_SERIAL Table name: C_SERIAL Table name: SERIAL The Normalized Dependency Diagrams

Design Exercise 1 – Solution