/
A Database Reverse Engineering Case Study Michael R. Blaha, DSc. A Database Reverse Engineering Case Study Michael R. Blaha, DSc.

A Database Reverse Engineering Case Study Michael R. Blaha, DSc. - PowerPoint Presentation

tatiana-dople
tatiana-dople . @tatiana-dople
Follow
346 views
Uploaded On 2019-11-02

A Database Reverse Engineering Case Study Michael R. Blaha, DSc. - PPT Presentation

A Database Reverse Engineering Case Study Michael R Blaha DSc blahacomputerorg wwwsuperdataguycom DAMA Chicago Aug 17 2016 What is Database Reverse Engineering Reverse engineering is the inverse to normal development ID: 762462

sales production fqir person production sales person fqir tables dbre data database humanresources purchasing model reverse wordpress case study

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "A Database Reverse Engineering Case Stud..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

A Database Reverse Engineering Case Study Michael R. Blaha, DSc.blaha@computer.orgwww.superdataguy.com DAMA Chicago Aug 17, 2016

What is Database Reverse Engineering? Reverse engineering is the inverse to normal developmentStart with an application and work backwards to understand the software and infer its intentReverse engineering can apply to a variety of artifacts Hardware, programming code, databases, … Our focus here is on databases

Why Would Anyone Want to Do DBRE? To elicit requirementsDBRE is not intended to perpetuate past flawsDBRE is merely a source of tentative requirementsTo convert legacy dataTo integrate application stovepipes To assess software To assist maintenance To construct documentation

Case Study 1: Reverse Engineer WordPress

Rationale WordPress is an interesting DBRE case study because...WordPress is a well-known applicationThe case study has a populated database The data is real (not synthetic) The data is not proprietary Illustrate DBRE techniques for a small database

Processing Details Export MySQL db from www.superdataguy.com websiteThe exported localhost.sql is unreadable Import SQL code into a local MySQL db Export schema only from local MySQL db The exported file is readable Manually edit the SQL by deleting… `, unsigned, COLLATE, KEY, UNIQUE KEY, ENGINE Reverse engineer schema with ERwin

Initial ERwin Model

Record Counts From querying the MySQL database wp_fqir_commentmeta 68 wp_fqir_comments 23 wp_fqir_links 0 wp_fqir_options 359 wp_fqir_postmeta 656 wp_fqir_posts 406 wp_fqir_term_relationships 59 wp_fqir_term_taxonomy 15 wp_fqir_termmeta 0 wp_fqir_terms 15 wp_fqir_usermeta 47 wp_fqir_users 1

Manually Add FKs Look for name similarityVerify with data analysisSELECT * FROM wp_fqir_commentmetaWHERE comment_ID NOT IN ( SELECT comment_ID FROM wp_fqir_comments); -- 0 records SELECT * FROM wp_fqir_commentmeta WHERE comment_ID IS NULL ; -- 0 records

DBRE ERwin Model

Data Dictionary By inspecting the data of each table… wp_fqir_commentmeta Key-value pairs from Akismet plugin wp_fqir_comments Comments posted to the Website (mostly spam) wp_fqir_links wp_fqir_options Key-value pairs for WordPress and plugins wp_fqir_postmeta Key-value pairs for WordPress, embedded media, and plugins wp_fqir_posts Each record is a WordPress page or post Appears to keep a history wp_fqir_term_relationships wp_fqir_term_taxonomy wp_fqir_termmeta wp_fqir_terms Mostly user defined categories in WordPress wp_fqir_usermeta Key-value pairs for users wp_fqir_users Authorized users for the Website

Commentary WordPress has a very small databaseOnly 12 tablesI had expected more tablesNote the lack of connectivity among tablesTypically tables are more highly interconnected WordPress has no dangling references I don’t think WordPress uses RI ERwin only partially reverse engineers MySQL It chokes on some keywords

Case Study 2: Reverse Engineer Adventure Works 2012

Rationale Adventure Works 2012 is an interesting DBRE case study because...Adventure Works is a free database provided with MS SQL ServerThe case study has a populated database The data is not proprietary The database is of medium size (71 tables) The database defines referential integrity Only one FK is missing

Mechanical Approach Strip down the schema to get to a core modelThis is like skimming a bookWe are working towards an abridgement of a modelWe can quickly get a sense of a schema We will use ER/Studio

Record Counts Query the SQL Server databasedbo.AWBuildVersion 1 dbo.DatabaseLog 1597 dbo.ErrorLog 0 HumanResources.Department 16 HumanResources.Employee 290 HumanResources. EmployeeDepartmentHistory 296 HumanResources.EmployeePayHistory 316 HumanResources.JobCandidate 13 HumanResources.Shift 3 Person.Address 19614 Person.AddressType 6 Person.BusinessEntity 20777 Person.BusinessEntityAddress 19614 Person.BusinessEntityContact 909 Person.ContactType 20 Person.CountryRegion 238 Person.EmailAddress 19972 Person.Password 19972 Person.Person 19972 Person.PersonPhone 19972 Person.PhoneNumberType 3 Person.StateProvince 181

Record Counts Record counts partially indicate table purposeProduction.BillOfMaterials 2679 Production.Culture 8 Production.Document 13 Production.Illustration 5 Production.Location 14 Production.Product 504 Production.ProductCategory 4 Production.ProductCostHistory 395 Production.ProductDescription 762 Production.ProductDocument 32 Production.ProductInventory 1069 Production.ProductListPriceHistory 395 Production.ProductModel 128 Production.ProductModelIllustration 7 Production.ProductModel ProductDescriptionCulture 762 Production.ProductPhoto 101 Production.ProductProductPhoto 504 Production.ProductReview 4 Production.ProductSubcategory 37 Production.ScrapReason 16 Production.TransactionHistory 113443 Production.TransactionHistoryArchive 89253 Production.UnitMeasure 38 Production.WorkOrder 72591

Record Counts Note different counts for ‘types’ and ‘instances’ Purchasing.WorkOrderRouting 67131 Purchasing.ProductVendor 460 Purchasing.PurchaseOrderDetail 8845 Purchasing.PurchaseOrderHeader 4012 Purchasing.ShipMethod 5 Purchasing.Vendor 104 Sales.CountryRegionCurrency 109 Sales.CreditCard 19118 Sales.Currency 105 Sales.CurrencyRate 13532 Sales.Customer 19820 Sales.PersonCreditCard 19118 Sales.SalesOrderDetail 121317 Sales.SalesOrderHeader 31465 Sales.SalesOrderHeaderSalesReason 27647 Sales.SalesPerson 17 Sales.SalesPersonQuotaHistory 163 Sales.SalesReason 10 Sales.SalesTaxRate 29 Sales.SalesTerritory 10 Sales.SalesTerritoryHistory 17 Sales.ShoppingCartItem 3 Sales.SpecialOffer 16 Sales.SpecialOfferProduct 538 Sales.Store 701

Processing Details Import schema into ER/StudioFile / New / Reverse engineer AdventureWorks2012All owners, user tables, no inferencesAdd FK Production.WorkOrderRouting.ProductID -> Production.Product.ProductID Successively delete all entity types with 0,1 connections Delete entity types with few (<= 3) connections

Step 1: ER/Studio Model

Step 2: ER/Studio Model

Step 3: ER/Studio Model

Step 3: Final Tables Remaining tables in blackdbo.AWBuildVersion 1 dbo.DatabaseLog 1597 dbo.ErrorLog 0 HumanResources.Department 16 HumanResources.Employee 290 HumanResources. EmployeeDepartmentHistory 296 HumanResources.EmployeePayHistory 316 HumanResources.JobCandidate 13 HumanResources.Shift 3 Person.Address 19614 Person.AddressType 6 Person.BusinessEntity 20777 Person.BusinessEntityAddress 19614 Person.BusinessEntityContact 909 Person.ContactType 20 Person.CountryRegion 238 Person.EmailAddress 19972 Person.Password 19972 Person.Person 19972 Person.PersonPhone 19972 Person.PhoneNumberType 3 Person.StateProvince 181

Step 3: Final Tables Deleted tables in red Production.BillOfMaterials 2679 Production.Culture 8 Production.Document 13 Production.Illustration 5 Production.Location 14 Production.Product 504 Production.ProductCategory 4 Production.ProductCostHistory 395 Production.ProductDescription 762 Production.ProductDocument 32 Production.ProductInventory 1069 Production.ProductListPriceHistory 395 Production.ProductModel 128 Production.ProductModelIllustration 7 Production.ProductModel ProductDescriptionCulture 762 Production.ProductPhoto 101 Production.ProductProductPhoto 504 Production.ProductReview 4 Production.ProductSubcategory 37 Production.ScrapReason 16 Production.TransactionHistory 113443 Production.TransactionHistoryArchive 89253 Production.UnitMeasure 38 Production.WorkOrder 72591

Step 3: Final Tables Purchasing.WorkOrderRouting 67131 Purchasing.ProductVendor 460 Purchasing.PurchaseOrderDetail 8845 Purchasing.PurchaseOrderHeader 4012 Purchasing.ShipMethod 5 Purchasing.Vendor 104 Sales.CountryRegionCurrency 109 Sales.CreditCard 19118 Sales.Currency 105 Sales.CurrencyRate 13532 Sales.Customer 19820 Sales.PersonCreditCard 19118 Sales.SalesOrderDetail 121317 Sales.SalesOrderHeader 31465 Sales.SalesOrderHeaderSalesReason 27647 Sales.SalesPerson 17 Sales.SalesPersonQuotaHistory 163 Sales.SalesReason 10 Sales.SalesTaxRate 29 Sales.SalesTerritory 10 Sales.SalesTerritoryHistory 17 Sales.ShoppingCartItem 3 Sales.SpecialOffer 16 Sales.SpecialOfferProduct 538 Sales.Store 701

Commentary DBRE depends on having FKsThe 0,1 connection deletions lose little infoThe “few” connections deletions are speculative Supertype/subtypes are troublesome From a separate manual DBRE BusinessEntity -> Employee, Vendor, Person, Store Employee -> SalesPerson

Supertype / Subtype

Case Study 3: Core DBRE

Rationale A project building a very large data warehouse100 facts200 dimensionsThe primary operational feeder application has 8500 tables I was new to the project and there was a lot to learn I wanted to reverse engineer the feeder application so that I could understand it

Available Inputs We had the following inputs (paper printouts) for the feeder applicationA thorough data dictionaryPrimary key definitionsForeign key definitions

The DBRE Problem Reverse engineer a database with 8500 tablesWith smaller schema, we could type the database structure into a modeling tool and then analyze itHowever, 8500 tables would take too longWe decided to determine the tightly connected tables and hope that would yield a much smaller model We presume that the tightly connected tables are the most important ones

DBRE Approach Do a graph analysisCreate a meta-table with FK to PK referencesThe FK in the source table points to the PK in the target tableUsing SQL, successively delete tables with 0,1 FK connections The final result is the multiply connected tables

Example

Example Repeatedly subtract tables with one reference until there is no change

Finding Core Tables DELETE FROM TableReferences AS T3WHERE EXISTS ( SELECT T1.sourceTable FROM TableReferences AS T1 WHERE NOT EXISTS ( SELECT * FROM TableReferences AS T2 WHERE T1.sourceTable = T2.targetTable ) AND T3.sourceTable = T1.sourceTable GROUP BY T1.sourceTable HAVING COUNT(*)= 1 ); The middle query finds tables with one source reference The innermost query limits the one-source tables to those that are not the target of any other sources The outer query does the deletion

Results Initial: 8500 tablesSeveral thousand FK definitions854 tables have FK columns254 tables are referenced by FKs Final result : 553 core tables

Case Study 4:Enterprise Data Model

Rationale Construct an enterprise data modelMy client – a financial software vendor – was a fusion of five formerly separate companiesThe applications were greatly dissimilar because they were built by separate organizations The purpose of the EDM was to provide a basis for integrating the applications and help the new company strengthen their brand

DBRE Approach We wanted to seed the EDM with application contentWe tried full DBRE but it was not helpful because the resulting models were so differentWe tried core DBRE but the models were still confusing because they were so different Finally we decided to count the FK references to each table We included the tables with the highest counts This worked

Example AddressType 2Applicant 29ApplicantAddressHistory 7ApplicantType 2 LenderApplicantDetails 2 Country 2 OverseasCorrespondence 2 ExistingInsuranceCover 6 Provider 3 PaymentFrequency 2 Fee 11 FeeDueType 2 FeeType 2 ProductFee 2

Results

DBRE and Zachman FW The “What” columnPopulated database – Technician perspectiveDatabase schema – Engineer perspective Physical data model – Architect perspective Logical data model – Business mgmt. perspective Conceptual data model – Executive perspective

Thank you for attending… Any questions???