/
Compression Aware Physical Database Design Compression Aware Physical Database Design

Compression Aware Physical Database Design - PowerPoint Presentation

BelieveInYou
BelieveInYou . @BelieveInYou
Follow
342 views
Uploaded On 2022-07-28

Compression Aware Physical Database Design - PPT Presentation

Vivek Narasayya Manoj Syamala Microsoft Research Brown University Hideaki Kimura hkimuracsbrownedu viveknarmanojsy microsoftcom Graduates soon On Job Market Every Major DBMS ID: 930803

indexes compression 000aa 00bbb compression indexes 00bbb 000aa compressed space size bbb index budget design select dict overheads shipdate

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Compression Aware Physical Database Desi..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

Compression Aware Physical Database Design

Vivek NarasayyaManoj Syamala

Microsoft Research

Brown University

Hideaki Kimura*

hkimura@cs.brown.edu

{viveknar,manojsy}@microsoft.com

(*) Graduates soon. On Job Market.

Slide2

Every Major DBMS Supports

Saves Storage ConsumptionSaves I/O BandwidthBackground: Compression in DB

Tables,

Indexes

SELECT

Compressed

Data

INSERT

Decompress

C

ompress

Query Process

Engine

DBMS A:

4x

!

DBMS B:

10x

!

DBMS C:

12x

!

Slide3

Compression Schemes in DB

City

Seattle

San JoseSeattle

..

Dictionary Encoding

1

2

1

..

Dict.

1:Seattle

2:San

Jose

+

Local dict. (Oracle, SQL Server)

Global dict. (DB2)

NULL Suppression

LZO

, RLE…

Price000321

000054000015

..

@321

@54

@15

..

Prefix Suppression,

Slide4

Two Types of Compression in DB

Order

Independent

Order

Dependent

A

000AA

000AA

000AA

000AA

00BBB

00BBB

00BBB

00BBB

B

X

X

Y

Y

X

X

Y

Y

I

AB

A

000AA

000AA

00BBB

00BBB

000AA

000AA

00BBB

00BBB

B

X

X

X

X

Y

Y

Y

Y

I

BA

A

@AA

@AA

@AA@AA@BBB@BBB@BBB@BBB

A@AA@AA@BBB@BBB@AA@AA@BBB@BBB

IAB

IBA

A000AA↑↑↑00BBB↑↑↑

A000AA↑00BBB↑000AA↑00BBB↑

IAB

IBA

page

=

fragmented

NULL-Supp.

Global dict.

Run Length Enc.

Local dict.

Slide5

Saves Storage Space, I/O

CPU Overhead to Compress & DecompressDifferent Compression Scheme= Different Saving

↔ Overhead

Benefits and Overheads

How Do We Use It?

DBA

Slide6

Depends on Workload

SELECTs/INSERTs FrequencyCPU bottleneck? IO bottleneck?

Issue 1: To Compress or not..

Depends on

Data

9GB

10GB

-90%

10GB

1GB

-10%

High Compression Ratio

Low Compression Ratio

Slide7

Issue 2: What Index to Create

I

1

I

2

I

3

I

4

Q1

Q2

I

1

I

3

I

5

I

5

Syntactically

Relevant

Indexes

 

 

 

Select

Candidate Configurations

Enumerate

Best

Configuration

 

Configuration

I

1

I

5

 

Physical DB Design Tool

DBMS

Query

Optimizer

Hypothetical

Indexes

Estimate Runtime

What-if

A

nalysis

Prune

Slide8

Run Design Tool to Select I

ndexesCompress them, then Repeat.Naïve Solution: Staged Design

Stage 1

Stage 2

Compress!

100 MB

Budget

Idx

MV

100 MB

Idx

MV

5

0 MB

Idx

MV

100 MB

Workload

Slide9

Misses an index that makes sense only with compression

Problem in tight space budget

Shipdate

State

PriceDiscount

Feb 21CA

$12310%Jan 9

RI

$222

0%

Jul

5

TX

$213

5%

Sales

SELECT SUM(Price*Discount) FROM Sales

WHERE State='CA' and Jul 01 < Shipdate < Sep 01

I1

(State, Shipdate):

95 MB → 50 MB

I

2 (State, Shipdate)Include (Price, Discount):

170 MB

9

0 MB

Choice for 100 MB?

Slide10

Example: Tight Space Budget

?

Good design:

175MB

CREATE COMPRESSED INDEX (L_PARTKEY,L_ORDERKEY,L_SUPPKEY) INCLUDE (L_QUANTITY,L_EXTENDEDPRICE,L_DISCOUNT)Staged:

155MBCREATE INDEX (L_ORDERKEY) INCLUDE(L_SUPPKEY,L_COMMITDATE,L_RECEIPTDATE)

Slide11

Result in too high CPU overheads for compression/decompression.

Problem in plenty space budget

I

2

(State, Shipdate)Include (Price, Discount):

170 MB

→ 90 MB

Choice for 200 MB?

UPDATE Sales SET Price=..

I

1

(State, Shipdate):

95 MB

50 MB

INSERT INTO Sales …

CPU Overheads

Slide12

Example: Plenty Space Budget

Worse with More Budget!

Slide13

How to Estimate Index-size

after compression?How to Evaluate benefits/overheads of compression?How Compression affects Candidate Selection/Enumeration?

Integrated Solution N

eeded!

Slide14

Essential Metric of Indexes

To Fit Space BudgetTo Estimate I/O costNeed Compression FractionSize Estimation

Col-A

Width=8

Col-B

Width=4

Col-C

Width=10

S

tats

Table

#tuple=

1M

Size (

I

ABC) = (8 + 4 + 10 + 4) * 1M = 26 MB

Clust. Key

Width=4Comp. Size (

IABC) = 26 MB * CF (IABC

)

Slide15

SampleCF

[Idreos et al. ICDE'10]

Prior work

Sample Size: Cost

Accuracy

Still Expensive for

1,000s

of indexes

1GB

Table

10MB

Sample

CREATE

COMPRESSED

INDEX

SampleCF

Overheads

Slide16

Solution Overview

Slide17

Index Size Deduction

I

b

I

a

I

a,b

SampleCF

Col-Ext Deduction

I

a

I

b

NULL supp. (

ORD-IND

)

I

a,b

I

b,a

Col-Set

Deduction

A

000AA

/

/

/

00BBB

/

/

/

A

000AA

/

00BBB

/

000AA

/

00BBB

/

I

AB

I

BA

Local dict. (

ORD-DEP

)

Estimate From Run-Length

Sum-up

Savings

More Details in paper

Slide18

Size-Estimation Strategy

Sample Size?Deduction Path?Expected Errors?Formulate as Graph Problem

Greedy algorithm to solve(details in the paper)

Optimize Accuracy-Cost Trade-off

Slide19

Query Cost model to consider (De)Compression CPU costCandidate Selection/Enumeration

Issues in Design Tool

Key Challenge:

Space-Performance

Trade-off

Slide20

Candidate Selection:Space-Performance Trade-off

I

A

I

B

I

C

I

D

Q1

Q2

Select

Fastest

I

A

I

C

I

A

I

B

I

C

I

D

Compressed Versions

Add Compressed

Indexes

Compressed Indexes

are often

Slower-but-Smaller

Most of them

are Ignored!

(exception: very high

compression ratio)

Slide21

Skyline Candidate Selection

Construct

Skyline

of ConfigurationsPick Both Fast-Indexes

and Small-Indexes

Slide22

Greedy picks un-compressed indexes too early

Enumeration: Problem

I

A

I

B

I

C

B

I

C

10MB

5MB

10MB

Comp.

Seed

I

A

 

I

C

15MB Room

I

A

 

I

B

I

A

 

I

C

B

I

A

 

I

B

I

C

B

I

A

 

I

B

I

C

I

A

 

I

C

I

C

B

Optimal Design

Slide23

Recover oversized configurations

Compress indexes in the config.Local Backtrack in Enumeartion

I

A

I

A

 

I

B

I

A

 

I

B

I

C

I

A

 

I

C

I

C

B

Recover

If Oversized

I

A

 

I

B

I

C

C

 

Slide24

Implemented on SQL Server 2008Modified Database Tuning Advisor (DTA) "

DTAc"Modified Query Cost Model

TPC-H Scale-1 (more results in paper)SELECT-intensive/UPDATE-intensiveCompared Estimated Runtime

Experimental Results

Slide25

Both Skyline & Backtrack are required esp. for tight budget

Candidate Selection/Enumeration

Clustered/

2ndary

Indexes

Slide26

Especially better in tight budgetChoose lightly compressed designs in UPDATE-intensive

DTAc vs. DTA

Clustered/

2ndary

/MV Indexes

Slide27

Reduce Size Estimation Overheads for a factor of 3

Mostly <10% Estimation ErrorOverhead in DTA

Slide28

Opportunities and ChallengesIntegrated Approach to exploit

compression in physical designSpace-Performance TradeoffSize EstimationOpen IssuesColumn-Store

Conclusion