Michael R Blaha DSc blahacomputerorg wwwsuperdataguycom DAMA MN Nov 16 2016 What is Database Reverse Engineering Reverse engineering is the inverse to normal development Start with an application and work backwards to understand the software and infer its intent ID: 777558
Download The PPT/PDF document "A Database Reverse Engineering Case Stud..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.
Slide1
A Database Reverse Engineering Case Study
Michael R. Blaha, DSc.blaha@computer.orgwww.superdataguy.com
DAMA
MN
Nov. 16,
2016
Slide2What is Database Reverse Engineering?
Reverse engineering is the inverse to normal developmentStart with an application and work backwards to understand the software and infer its intentReverse engineering can apply to a variety of artifacts
Hardware, programming code, databases, …
Our focus here is on databases
Slide3Why Would Anyone Want to Do DBRE?
To elicit requirementsDBRE is not intended to perpetuate past flawsDBRE is merely a source of tentative requirementsTo convert legacy dataTo integrate application stovepipes
To assess software
To assist maintenance
To construct documentation
Slide4Inputs to DBRE
The available information varies widely…Database structureDocumentationApplication understanding
Data
Database queries
Forms and reports
Slide5Key Themes for DBRE
Don’t mistake hypotheses for conclusionsExpect multiple interpretationsDon’t be discouraged by approximate resultsExpect odd constructs
Watch for consistent style
Slide6Case Study 1:
Reverse Engineer WordPress
Slide7Rationale
WordPress is an interesting DBRE case study because...WordPress is a well-known applicationWordPress is a framework that generalizes the process of building a website
The case study has a populated database
The data is real (not synthetic)
The data is not proprietary
Illustrate DBRE techniques for a small database
Slide8Processing Details
Export MySQL db from www.superdataguy.com websiteThe exported localhost.sql is unreadable
Import
SQL code into a local MySQL
db
Export schema only from local MySQL
db
The exported file is readable
Manually edit the SQL by deleting…
`, unsigned, COLLATE, KEY, UNIQUE KEY, ENGINE
Reverse engineer schema with ERwin
Slide9Initial ERwin Model
Slide10Record Counts
From querying the MySQL database
wp_fqir_commentmeta
90
wp_fqir_comments
31
wp_fqir_links
0
wp_fqir_options
377
wp_fqir_postmeta
703
wp_fqir_posts
434
wp_fqir_term_relationships
64
wp_fqir_term_taxonomy
15
wp_fqir_termmeta
0
wp_fqir_terms
15
wp_fqir_usermeta
49
wp_fqir_users
1
Slide11Manually Add FKs
Look for name similarityVerify with data analysisSELECT * FROM wp_fqir_commentmetaWHERE comment_ID NOT IN
(
SELECT comment_ID
FROM wp_fqir_comments);
--
0
records
SELECT * FROM wp_fqir_commentmeta
WHERE comment_ID
IS NULL
;
--
0
records
Slide12DBRE ERwin Model
Slide13Commentary
WordPress has a very small schemaOnly 12 tablesI had expected more tablesWordPress has no dangling references
WordPress lacks RI
WordPress compensates with careful programming
ERwin only partially reverse engineers MySQL
It chokes on some keywords
Slide14Case Study 2:
Reverse Engineer Adventure Works 2012
Slide15Rationale
Adventure Works 2012 is an interesting DBRE case study because...Adventure Works is a free database provided with MS SQL ServerThe case study has a populated database
The data is not proprietary
The database is of medium size (71 tables)
The database defines referential integrity
Only one FK is missing
Slide16Mechanical Approach
Strip down the schema to get to a core modelThis is like skimming a bookWe are working towards an abridgement of a modelWe can quickly get a sense of a schema
We will use ER/Studio
Slide17Record Counts
Query the SQL Server databasedbo.AWBuildVersion
1
dbo.DatabaseLog
1597
dbo.ErrorLog
0
HumanResources.Department
16
HumanResources.Employee
290
HumanResources.
EmployeeDepartmentHistory
296
HumanResources.EmployeePayHistory
316
HumanResources.JobCandidate
13
HumanResources.Shift
3
Person.Address
19614
Person.AddressType
6
Person.BusinessEntity
20777
Person.BusinessEntityAddress
19614
Person.BusinessEntityContact
909
Person.ContactType
20
Person.CountryRegion
238
Person.EmailAddress
19972
Person.Password
19972
Person.Person
19972
Person.PersonPhone
19972
Person.PhoneNumberType
3
Person.StateProvince
181
Slide18Record Counts
Record counts partially indicate table purposeProduction.BillOfMaterials
2679
Production.Culture
8
Production.Document
13
Production.Illustration
5
Production.Location
14
Production.Product
504
Production.ProductCategory
4
Production.ProductCostHistory
395
Production.ProductDescription
762
Production.ProductDocument
32
Production.ProductInventory
1069
Production.ProductListPriceHistory
395
Production.ProductModel
128
Production.ProductModelIllustration
7
Production.ProductModel
ProductDescriptionCulture
762
Production.ProductPhoto
101
Production.ProductProductPhoto
504
Production.ProductReview
4
Production.ProductSubcategory
37
Production.ScrapReason
16
Production.TransactionHistory
113443
Production.TransactionHistoryArchive
89253
Production.UnitMeasure
38
Production.WorkOrder
72591
Slide19Record Counts
Note different counts for ‘types’ and ‘instances’
Purchasing.WorkOrderRouting
67131
Purchasing.ProductVendor
460
Purchasing.PurchaseOrderDetail
8845
Purchasing.PurchaseOrderHeader
4012
Purchasing.ShipMethod
5
Purchasing.Vendor
104
Sales.CountryRegionCurrency
109
Sales.CreditCard
19118
Sales.Currency
105
Sales.CurrencyRate
13532
Sales.Customer
19820
Sales.PersonCreditCard
19118
Sales.SalesOrderDetail
121317
Sales.SalesOrderHeader
31465
Sales.SalesOrderHeaderSalesReason
27647
Sales.SalesPerson
17
Sales.SalesPersonQuotaHistory
163
Sales.SalesReason
10
Sales.SalesTaxRate
29
Sales.SalesTerritory
10
Sales.SalesTerritoryHistory
17
Sales.ShoppingCartItem
3
Sales.SpecialOffer
16
Sales.SpecialOfferProduct
538
Sales.Store
701
Slide20Processing Details
Import schema into ER/StudioFile / New / Reverse engineer AdventureWorks2012All owners, user tables, no inferencesAdd FK Production.WorkOrderRouting.ProductID -> Production.Product.ProductID
Successively delete all entity types with 0,1 connections
Delete entity types with few (<= 3) connections
Slide21Step 1: ER/Studio Model
Slide22Step 2: ER/Studio Model
Slide23Step 3: ER/Studio Model
Slide24Step 3: Final Tables
Remaining tables in blackdbo.AWBuildVersion
1
dbo.DatabaseLog
1597
dbo.ErrorLog
0
HumanResources.Department
16
HumanResources.Employee
290
HumanResources.
EmployeeDepartmentHistory
296
HumanResources.EmployeePayHistory
316
HumanResources.JobCandidate
13
HumanResources.Shift
3
Person.Address
19614
Person.AddressType
6
Person.BusinessEntity
20777
Person.BusinessEntityAddress
19614
Person.BusinessEntityContact
909
Person.ContactType
20
Person.CountryRegion
238
Person.EmailAddress
19972
Person.Password
19972
Person.Person
19972
Person.PersonPhone
19972
Person.PhoneNumberType
3
Person.StateProvince
181
Slide25Step 3: Final Tables
Deleted tables in red
Production.BillOfMaterials
2679
Production.Culture
8
Production.Document
13
Production.Illustration
5
Production.Location
14
Production.Product
504
Production.ProductCategory
4
Production.ProductCostHistory
395
Production.ProductDescription
762
Production.ProductDocument
32
Production.ProductInventory
1069
Production.ProductListPriceHistory
395
Production.ProductModel
128
Production.ProductModelIllustration
7
Production.ProductModel
ProductDescriptionCulture
762
Production.ProductPhoto
101
Production.ProductProductPhoto
504
Production.ProductReview
4
Production.ProductSubcategory
37
Production.ScrapReason
16
Production.TransactionHistory
113443
Production.TransactionHistoryArchive
89253
Production.UnitMeasure
38
Production.WorkOrder
72591
Slide26Step 3: Final Tables
Purchasing.WorkOrderRouting
67131
Purchasing.ProductVendor
460
Purchasing.PurchaseOrderDetail
8845
Purchasing.PurchaseOrderHeader
4012
Purchasing.ShipMethod
5
Purchasing.Vendor
104
Sales.CountryRegionCurrency
109
Sales.CreditCard
19118
Sales.Currency
105
Sales.CurrencyRate
13532
Sales.Customer
19820
Sales.PersonCreditCard
19118
Sales.SalesOrderDetail
121317
Sales.SalesOrderHeader
31465
Sales.SalesOrderHeaderSalesReason
27647
Sales.SalesPerson
17
Sales.SalesPersonQuotaHistory
163
Sales.SalesReason
10
Sales.SalesTaxRate
29
Sales.SalesTerritory
10
Sales.SalesTerritoryHistory
17
Sales.ShoppingCartItem
3
Sales.SpecialOffer
16
Sales.SpecialOfferProduct
538
Sales.Store
701
Slide27Commentary
DBRE depends on having FKsThe 0,1 connection deletions lose little infoThe “few” connections deletions are speculativeSupertype/subtypes are troublesome
From a separate manual DBRE
BusinessEntity -> Employee, Vendor, Person, Store
Employee -> SalesPerson
Slide28Supertype / Subtype
Slide29Case Study 3:
Core DBRE
Slide30Rationale
A project building a very large data warehouse100 facts200 dimensionsThe primary operational feeder application has 8500 tablesI was new to the project and there was a lot to learn
I wanted to reverse engineer the feeder application so that I could understand it
Slide31Available Inputs
We had the following inputs (paper printouts) for the feeder applicationA thorough data dictionaryPrimary key definitionsForeign key definitions
Slide32The DBRE Problem
Reverse engineer a database with 8500 tablesWith smaller schema, we could type the database structure into a modeling tool and then analyze itHowever, 8500 tables would take too longWe decided to determine the tightly connected tables and hope that would yield a much smaller model
We presume that the tightly connected tables are the most important ones
Slide33DBRE Approach
Do a graph analysisCreate a meta-table with FK to PK referencesThe FK in the source table points to the PK in the target tableUsing SQL, successively delete tables with 0,1 FK connections
The final result is the multiply connected tables
Slide34Example
Slide35Example
Repeatedly subtract tables with one reference until
there
is no
change
Slide36Finding Core Tables
DELETE FROM TableReferences AS T3WHERE EXISTS ( SELECT T1.sourceTable FROM TableReferences AS T1 WHERE NOT EXISTS
(
SELECT *
FROM
TableReferences AS
T2
WHERE
T1.sourceTable =
T2.targetTable )
AND
T3.sourceTable =
T1.sourceTable
GROUP BY
T1.sourceTable
HAVING COUNT(*)=
1 );
The middle query finds tables with one source
reference
The
innermost query limits the one-source tables to those that are not the target of any other
sources
The
outer query does the
deletion
Slide37Results
Initial: 8500 tablesSeveral thousand FK definitions854 tables have FK columns254 tables are referenced by FKs
Final result
: 553 core tables
Slide38Case Study 4:
Enterprise Data Model
Slide39Rationale
Construct an enterprise data modelMy client – a financial software vendor – was a fusion of five formerly separate companiesThe applications were greatly dissimilar because they were built by separate organizations
The purpose of the EDM was to provide a basis for integrating the applications and help the new company strengthen their
brand
Slide40DBRE Approach
We wanted to seed the EDM with application contentWe tried full DBRE but it was not helpful because the resulting models were so differentWe tried core DBRE but the models were still confusing because they were so different
Finally we decided to count the FK references to each
table
We included the tables with the highest counts
This worked
Slide41Example
AddressType 2Applicant 29ApplicantAddressHistory 7ApplicantType
2
LenderApplicantDetails
2
Country
2
OverseasCorrespondence
2
ExistingInsuranceCover
6
Provider
3
PaymentFrequency
2
Fee
11
FeeDueType
2
FeeType
2
ProductFee
2
Slide42Results
Slide43Thank you for attending…
Any questions???