/
A Database Reverse Engineering Case Study A Database Reverse Engineering Case Study

A Database Reverse Engineering Case Study - PowerPoint Presentation

beastialitybiker
beastialitybiker . @beastialitybiker
Follow
346 views
Uploaded On 2020-06-15

A Database Reverse Engineering Case Study - PPT Presentation

Michael R Blaha DSc blahacomputerorg wwwsuperdataguycom DAMA MN Nov 16 2016 What is Database Reverse Engineering Reverse engineering is the inverse to normal development Start with an application and work backwards to understand the software and infer its intent ID: 777558

sales production tables person production sales person tables fqir dbre database purchasing humanresources reverse data study case 19972 counts

Share:

Link:

Embed:

Download Presentation from below link

Download The PPT/PDF document "A Database Reverse Engineering Case Stud..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

A Database Reverse Engineering Case Study

Michael R. Blaha, DSc.blaha@computer.orgwww.superdataguy.com

DAMA

MN

Nov. 16,

2016

Slide2

What is Database Reverse Engineering?

Reverse engineering is the inverse to normal developmentStart with an application and work backwards to understand the software and infer its intentReverse engineering can apply to a variety of artifacts

Hardware, programming code, databases, …

Our focus here is on databases

Slide3

Why Would Anyone Want to Do DBRE?

To elicit requirementsDBRE is not intended to perpetuate past flawsDBRE is merely a source of tentative requirementsTo convert legacy dataTo integrate application stovepipes

To assess software

To assist maintenance

To construct documentation

Slide4

Inputs to DBRE

The available information varies widely…Database structureDocumentationApplication understanding

Data

Database queries

Forms and reports

Slide5

Key Themes for DBRE

Don’t mistake hypotheses for conclusionsExpect multiple interpretationsDon’t be discouraged by approximate resultsExpect odd constructs

Watch for consistent style

Slide6

Case Study 1:

Reverse Engineer WordPress

Slide7

Rationale

WordPress is an interesting DBRE case study because...WordPress is a well-known applicationWordPress is a framework that generalizes the process of building a website

The case study has a populated database

The data is real (not synthetic)

The data is not proprietary

Illustrate DBRE techniques for a small database

Slide8

Processing Details

Export MySQL db from www.superdataguy.com websiteThe exported localhost.sql is unreadable

Import

SQL code into a local MySQL

db

Export schema only from local MySQL

db

The exported file is readable

Manually edit the SQL by deleting…

`, unsigned, COLLATE, KEY, UNIQUE KEY, ENGINE

Reverse engineer schema with ERwin

Slide9

Initial ERwin Model

Slide10

Record Counts

From querying the MySQL database

wp_fqir_commentmeta

90

wp_fqir_comments

31

wp_fqir_links

0

wp_fqir_options

377

wp_fqir_postmeta

703

wp_fqir_posts

434

wp_fqir_term_relationships

64

wp_fqir_term_taxonomy

15

wp_fqir_termmeta

0

wp_fqir_terms

15

wp_fqir_usermeta

49

wp_fqir_users

1

Slide11

Manually Add FKs

Look for name similarityVerify with data analysisSELECT * FROM wp_fqir_commentmetaWHERE comment_ID NOT IN

(

SELECT comment_ID

FROM wp_fqir_comments);

--

0

records

SELECT * FROM wp_fqir_commentmeta

WHERE comment_ID

IS NULL

;

--

0

records

Slide12

DBRE ERwin Model

Slide13

Commentary

WordPress has a very small schemaOnly 12 tablesI had expected more tablesWordPress has no dangling references

WordPress lacks RI

WordPress compensates with careful programming

ERwin only partially reverse engineers MySQL

It chokes on some keywords

Slide14

Case Study 2:

Reverse Engineer Adventure Works 2012

Slide15

Rationale

Adventure Works 2012 is an interesting DBRE case study because...Adventure Works is a free database provided with MS SQL ServerThe case study has a populated database

The data is not proprietary

The database is of medium size (71 tables)

The database defines referential integrity

Only one FK is missing

Slide16

Mechanical Approach

Strip down the schema to get to a core modelThis is like skimming a bookWe are working towards an abridgement of a modelWe can quickly get a sense of a schema

We will use ER/Studio

Slide17

Record Counts

Query the SQL Server databasedbo.AWBuildVersion

1

dbo.DatabaseLog

1597

dbo.ErrorLog

0

HumanResources.Department

16

HumanResources.Employee

290

HumanResources.

EmployeeDepartmentHistory

296

HumanResources.EmployeePayHistory

316

HumanResources.JobCandidate

13

HumanResources.Shift

3

Person.Address

19614

Person.AddressType

6

Person.BusinessEntity

20777

Person.BusinessEntityAddress

19614

Person.BusinessEntityContact

909

Person.ContactType

20

Person.CountryRegion

238

Person.EmailAddress

19972

Person.Password

19972

Person.Person

19972

Person.PersonPhone

19972

Person.PhoneNumberType

3

Person.StateProvince

181

Slide18

Record Counts

Record counts partially indicate table purposeProduction.BillOfMaterials

2679

Production.Culture

8

Production.Document

13

Production.Illustration

5

Production.Location

14

Production.Product

504

Production.ProductCategory

4

Production.ProductCostHistory

395

Production.ProductDescription

762

Production.ProductDocument

32

Production.ProductInventory

1069

Production.ProductListPriceHistory

395

Production.ProductModel

128

Production.ProductModelIllustration

7

Production.ProductModel

ProductDescriptionCulture

762

Production.ProductPhoto

101

Production.ProductProductPhoto

504

Production.ProductReview

4

Production.ProductSubcategory

37

Production.ScrapReason

16

Production.TransactionHistory

113443

Production.TransactionHistoryArchive

89253

Production.UnitMeasure

38

Production.WorkOrder

72591

Slide19

Record Counts

Note different counts for ‘types’ and ‘instances’

Purchasing.WorkOrderRouting

67131

Purchasing.ProductVendor

460

Purchasing.PurchaseOrderDetail

8845

Purchasing.PurchaseOrderHeader

4012

Purchasing.ShipMethod

5

Purchasing.Vendor

104

Sales.CountryRegionCurrency

109

Sales.CreditCard

19118

Sales.Currency

105

Sales.CurrencyRate

13532

Sales.Customer

19820

Sales.PersonCreditCard

19118

Sales.SalesOrderDetail

121317

Sales.SalesOrderHeader

31465

Sales.SalesOrderHeaderSalesReason

27647

Sales.SalesPerson

17

Sales.SalesPersonQuotaHistory

163

Sales.SalesReason

10

Sales.SalesTaxRate

29

Sales.SalesTerritory

10

Sales.SalesTerritoryHistory

17

Sales.ShoppingCartItem

3

Sales.SpecialOffer

16

Sales.SpecialOfferProduct

538

Sales.Store

701

Slide20

Processing Details

Import schema into ER/StudioFile / New / Reverse engineer AdventureWorks2012All owners, user tables, no inferencesAdd FK Production.WorkOrderRouting.ProductID -> Production.Product.ProductID

Successively delete all entity types with 0,1 connections

Delete entity types with few (<= 3) connections

Slide21

Step 1: ER/Studio Model

Slide22

Step 2: ER/Studio Model

Slide23

Step 3: ER/Studio Model

Slide24

Step 3: Final Tables

Remaining tables in blackdbo.AWBuildVersion

1

dbo.DatabaseLog

1597

dbo.ErrorLog

0

HumanResources.Department

16

HumanResources.Employee

290

HumanResources.

EmployeeDepartmentHistory

296

HumanResources.EmployeePayHistory

316

HumanResources.JobCandidate

13

HumanResources.Shift

3

Person.Address

19614

Person.AddressType

6

Person.BusinessEntity

20777

Person.BusinessEntityAddress

19614

Person.BusinessEntityContact

909

Person.ContactType

20

Person.CountryRegion

238

Person.EmailAddress

19972

Person.Password

19972

Person.Person

19972

Person.PersonPhone

19972

Person.PhoneNumberType

3

Person.StateProvince

181

Slide25

Step 3: Final Tables

Deleted tables in red

Production.BillOfMaterials

2679

Production.Culture

8

Production.Document

13

Production.Illustration

5

Production.Location

14

Production.Product

504

Production.ProductCategory

4

Production.ProductCostHistory

395

Production.ProductDescription

762

Production.ProductDocument

32

Production.ProductInventory

1069

Production.ProductListPriceHistory

395

Production.ProductModel

128

Production.ProductModelIllustration

7

Production.ProductModel

ProductDescriptionCulture

762

Production.ProductPhoto

101

Production.ProductProductPhoto

504

Production.ProductReview

4

Production.ProductSubcategory

37

Production.ScrapReason

16

Production.TransactionHistory

113443

Production.TransactionHistoryArchive

89253

Production.UnitMeasure

38

Production.WorkOrder

72591

Slide26

Step 3: Final Tables

Purchasing.WorkOrderRouting

67131

Purchasing.ProductVendor

460

Purchasing.PurchaseOrderDetail

8845

Purchasing.PurchaseOrderHeader

4012

Purchasing.ShipMethod

5

Purchasing.Vendor

104

Sales.CountryRegionCurrency

109

Sales.CreditCard

19118

Sales.Currency

105

Sales.CurrencyRate

13532

Sales.Customer

19820

Sales.PersonCreditCard

19118

Sales.SalesOrderDetail

121317

Sales.SalesOrderHeader

31465

Sales.SalesOrderHeaderSalesReason

27647

Sales.SalesPerson

17

Sales.SalesPersonQuotaHistory

163

Sales.SalesReason

10

Sales.SalesTaxRate

29

Sales.SalesTerritory

10

Sales.SalesTerritoryHistory

17

Sales.ShoppingCartItem

3

Sales.SpecialOffer

16

Sales.SpecialOfferProduct

538

Sales.Store

701

Slide27

Commentary

DBRE depends on having FKsThe 0,1 connection deletions lose little infoThe “few” connections deletions are speculativeSupertype/subtypes are troublesome

From a separate manual DBRE

BusinessEntity -> Employee, Vendor, Person, Store

Employee -> SalesPerson

Slide28

Supertype / Subtype

Slide29

Case Study 3:

Core DBRE

Slide30

Rationale

A project building a very large data warehouse100 facts200 dimensionsThe primary operational feeder application has 8500 tablesI was new to the project and there was a lot to learn

I wanted to reverse engineer the feeder application so that I could understand it

Slide31

Available Inputs

We had the following inputs (paper printouts) for the feeder applicationA thorough data dictionaryPrimary key definitionsForeign key definitions

Slide32

The DBRE Problem

Reverse engineer a database with 8500 tablesWith smaller schema, we could type the database structure into a modeling tool and then analyze itHowever, 8500 tables would take too longWe decided to determine the tightly connected tables and hope that would yield a much smaller model

We presume that the tightly connected tables are the most important ones

Slide33

DBRE Approach

Do a graph analysisCreate a meta-table with FK to PK referencesThe FK in the source table points to the PK in the target tableUsing SQL, successively delete tables with 0,1 FK connections

The final result is the multiply connected tables

Slide34

Example

Slide35

Example

Repeatedly subtract tables with one reference until

there

is no

change

Slide36

Finding Core Tables

DELETE FROM TableReferences AS T3WHERE EXISTS ( SELECT T1.sourceTable FROM TableReferences AS T1 WHERE NOT EXISTS

(

SELECT *

FROM

TableReferences AS

T2

WHERE

T1.sourceTable =

T2.targetTable )

AND

T3.sourceTable =

T1.sourceTable

GROUP BY

T1.sourceTable

HAVING COUNT(*)=

1 );

The middle query finds tables with one source

reference

The

innermost query limits the one-source tables to those that are not the target of any other

sources

The

outer query does the

deletion

Slide37

Results

Initial: 8500 tablesSeveral thousand FK definitions854 tables have FK columns254 tables are referenced by FKs

Final result

: 553 core tables

Slide38

Case Study 4:

Enterprise Data Model

Slide39

Rationale

Construct an enterprise data modelMy client – a financial software vendor – was a fusion of five formerly separate companiesThe applications were greatly dissimilar because they were built by separate organizations

The purpose of the EDM was to provide a basis for integrating the applications and help the new company strengthen their

brand

Slide40

DBRE Approach

We wanted to seed the EDM with application contentWe tried full DBRE but it was not helpful because the resulting models were so differentWe tried core DBRE but the models were still confusing because they were so different

Finally we decided to count the FK references to each

table

We included the tables with the highest counts

This worked

Slide41

Example

AddressType 2Applicant 29ApplicantAddressHistory 7ApplicantType

2

LenderApplicantDetails

2

Country

2

OverseasCorrespondence

2

ExistingInsuranceCover

6

Provider

3

PaymentFrequency

2

Fee

11

FeeDueType

2

FeeType

2

ProductFee

2

Slide42

Results

Slide43

Thank you for attending…

Any questions???