Ripple Joins for Online Aggregation

By: Peter J. Haas and Joseph M. Hellerstein (published June 1999)
Presented by: Sthuti Kripanidhi

9/28/2010

CSE 6339 - Data Exploration

Overview

What the paper is all about
Traditional Algorithms
Online Aggregation
Ripple Joins: Introduction
How different is Ripple join
Ripple Join variants
Aspect ratios
Future Work

What the paper is about

The paper presents ripple joins, a class of join algorithms for the online processing of multi-table aggregation queries.

It shows how to join several tables and compute SUM, COUNT, or AVG aggregates (optionally with GROUP BY clauses), displaying approximate results and their confidence intervals as soon as the first few tuples are retrieved.

Traditional Algorithms

Traditional algorithms take a long time because they must process entire tables or relations.
Users have to wait a long time before any results are returned.
A better method is online aggregation.

Online Aggregation

A running estimate of the final aggregates is continuously displayed to the user.
The goal is quick results rather than minimal time to completion.
The proximity of the running estimate to the final result is also displayed to the user, as a confidence interval.
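As a rough illustration of a running estimate with a confidence interval, the sketch below tracks a running AVG over a single stream of values. It is not the paper's estimator: it assumes the values arrive in random order and uses a plain large-sample normal approximation, and the function name `running_avg_with_ci` and the `z` parameter are made up for this example.

```python
import math


def running_avg_with_ci(values, z=1.96):
    """Yield (n, running mean, CI half-width) after each value.

    Assumption: values arrive in random order, so a normal approximation
    gives a rough 95% interval when z = 1.96.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1:
            sample_variance = m2 / (n - 1)
            half_width = z * math.sqrt(sample_variance / n)
        else:
            half_width = float("inf")  # no spread information yet
        yield n, mean, half_width


if __name__ == "__main__":
    for n, mean, half in running_avg_with_ci([4.0, 6.0, 5.0, 7.0, 3.0]):
        print(f"after {n} tuples: AVG ~ {mean:.2f} +/- {half:.2f}")
```

The interval tends to narrow as more tuples are seen, which is what lets a user stop the query once the estimate is precise enough.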

GUI

[Slide image not included in the transcript.]

Ripple Joins: Introduction

Ripple joins generalize the traditional block nested-loops and hash joins.
They are non-blocking.
Square ripple join: samples are drawn from both relations at the same rate.
Rectangular ripple join: samples are drawn from one relation at a higher rate than from the other.

Ripple Join: Introduction

Typical query form:

SELECT op(expression) FROM R1, R2, …, RK
WHERE predicate
GROUP BY columns;

How different is Ripple join?

A traditional hash join blocks until the entire query output is finished.
Ripple join reports approximate results after each sampling step and allows user intervention.
In a traditional nested-loops join, each pass of the inner loop scans an entire table.
Ripple join expands the sample set incrementally, so it avoids a complete scan of the relations.

How Ripple Join works

Assume a ripple join of relations R and S. At each sampling step:

1. Select a random tuple r from R.
2. Join r with the previously selected S tuples.
3. Select a random tuple s from S.
4. Join s with the previously selected R tuples.
5. Join r and s.

Ripple Join: Square two-table join

[Figure: grid with the tuples of R along one axis and the tuples of S along the other; X marks a pair of tuples that has been examined. The examined region is 1 × 1 at N = 1, 2 × 2 at N = 2, and 3 × 3 at N = 3.]

Ripple Join Algorithm

for (max = 1 to infinity) {
    for (i = 1 to max-1)
        if (predicate(R[i], S[max]))
            output(R[i], S[max]);
    for (i = 1 to max)
        if (predicate(R[max], S[i]))
            output(R[max], S[i]);
}
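A minimal runnable version of the pseudocode above, written as a Python generator. It assumes R and S are in-memory lists already in random order, uses 0-based indices, and stops at the shorter input instead of running to infinity. The name `square_ripple_join` is chosen for this sketch only.

```python
def square_ripple_join(R, S, predicate):
    """Yield matching (r, s) pairs in square ripple order.

    Assumption: R and S are lists that are already randomly ordered.
    """
    n = min(len(R), len(S))
    for step in range(n):
        # New S tuple against all previously seen R tuples.
        for i in range(step):
            if predicate(R[i], S[step]):
                yield R[i], S[step]
        # New R tuple against all S tuples seen so far, including the new one.
        for i in range(step + 1):
            if predicate(R[step], S[i]):
                yield R[step], S[i]


if __name__ == "__main__":
    R = [1, 2, 3]
    S = [2, 3, 4]
    for pair in square_ripple_join(R, S, lambda r, s: r == s):
        print(pair)
```

After step k the generator has examined every pair in the top-left (k+1) × (k+1) corner of the R × S grid, and each pair exactly once.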


Ripple Join Iterator

An iterator-based DBMS invokes an iterator's next() method each time an output tuple is needed.
The iterator needs to store the next position to be fetched from each of its inputs R and S.
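One way to picture the stored state is an iterator object whose fields record where the scan stopped, so that each call to next() can resume from there. The sketch below assumes a square ripple join over in-memory lists; the class name and its fields (`step`, `cursor`, `phase`) are hypothetical and not taken from the paper.

```python
class RippleJoinIterator:
    """Square ripple join with explicit state, resumable on each next() call.

    Assumption: R and S are randomly ordered lists.
    """

    def __init__(self, R, S, predicate):
        self.R, self.S, self.predicate = R, S, predicate
        self.limit = min(len(R), len(S))
        self.step = 0    # index of the newest tuple fetched from R and S
        self.cursor = 0  # position inside the current pass
        self.phase = 0   # 0: old R vs new S, 1: new R vs S seen so far

    def __iter__(self):
        return self

    def __next__(self):
        while self.step < self.limit:
            if self.phase == 0:
                if self.cursor < self.step:
                    r, s = self.R[self.cursor], self.S[self.step]
                    self.cursor += 1
                    if self.predicate(r, s):
                        return r, s
                else:
                    self.phase, self.cursor = 1, 0
            else:
                if self.cursor <= self.step:
                    r, s = self.R[self.step], self.S[self.cursor]
                    self.cursor += 1
                    if self.predicate(r, s):
                        return r, s
                else:
                    self.step += 1
                    self.phase, self.cursor = 0, 0
        raise StopIteration


if __name__ == "__main__":
    it = RippleJoinIterator([1, 2, 3], [2, 3, 4], lambda r, s: r == s)
    print(next(it))  # first matching pair; the iterator remembers where it stopped
```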


Pipelining

Ripple joins can easily be pipelined for multiple binary joins.
Cannot do three-table joins as two binary ripple joins.

Ripple Join Variants

Block Ripple Join
Hash Ripple Join
Index Ripple Join

Block Ripple Join

Takes disk blocks of R and S in turn (not tuples).
Read a disk block of R and scan it against the old S tuples.
Evict the block from memory.
Read a block of S and compare it with the older R tuples.
This saves I/O because a whole block is read at a time.
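The I/O saving can be illustrated with a small sketch in which in-memory lists stand in for disk-resident relations and a counter stands in for the number of rescans of previously read data. The function name `block_ripple_join`, the `block_size` parameter, and the rescan counter are assumptions of this example, not part of the paper.

```python
def block_ripple_join(R, S, predicate, block_size):
    """Return (matching pairs, number of rescans of previously read data).

    Assumption: lists stand in for disk files, and one rescan serves a
    whole block of new tuples rather than a single tuple.
    """
    results = []
    rescans = 0
    seen_r, seen_s = [], []
    for start in range(0, max(len(R), len(S)), block_size):
        block_r = R[start:start + block_size]
        block_s = S[start:start + block_size]
        if block_r:
            rescans += 1  # one pass over old S for the whole R block
            results.extend((r, s) for s in seen_s for r in block_r
                           if predicate(r, s))
            seen_r.extend(block_r)
        if block_s:
            rescans += 1  # one pass over R seen so far for the whole S block
            results.extend((r, s) for r in seen_r for s in block_s
                           if predicate(r, s))
            seen_s.extend(block_s)
    return results, rescans


if __name__ == "__main__":
    R = list(range(6))
    S = list(range(6))
    for size in (1, 3):
        pairs, rescans = block_ripple_join(R, S, lambda r, s: r == s, size)
        print(f"block_size={size}: {len(pairs)} matches, {rescans} rescans")
```

With a block size of 1 this degenerates to the tuple-at-a-time ripple join; larger blocks produce the same matches with fewer rescans.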


Index and Hash Ripple Joins

Index Ripple Join: identical to an index-enhanced nested-loops join.
Hash Ripple Join: used only for equijoin queries.
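A hedged sketch of the hash variant for an equijoin: each newly sampled tuple probes a hash table built over the tuples already seen from the other relation, so no rescan is needed. The in-memory tables, the function name `hash_ripple_join`, and the `key_r` / `key_s` key extractors are assumptions made for this example.

```python
from collections import defaultdict
from itertools import zip_longest

_MISSING = object()  # marks the end of the shorter input


def hash_ripple_join(R, S, key_r, key_s):
    """Yield (r, s) pairs with key_r(r) == key_s(s), one sampling step at a time.

    Assumption: R and S are randomly ordered and the seen tuples fit in memory.
    """
    table_r = defaultdict(list)  # join key -> R tuples seen so far
    table_s = defaultdict(list)  # join key -> S tuples seen so far
    for r, s in zip_longest(R, S, fillvalue=_MISSING):
        if r is not _MISSING:
            k = key_r(r)
            for old_s in table_s.get(k, ()):  # probe previously seen S tuples
                yield r, old_s
            table_r[k].append(r)
        if s is not _MISSING:
            k = key_s(s)
            for seen_r in table_r.get(k, ()):  # includes the r added just above
                yield seen_r, s
            table_s[k].append(s)


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (2, "c")]
    S = [(2, "x"), (3, "y"), (2, "z"), (1, "w")]
    for pair in hash_ripple_join(R, S, lambda r: r[0], lambda s: s[0]):
        print(pair)
```

The restriction to equijoins follows from the probe step: a hash lookup can only find tuples whose join key is equal.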

Statistical Considerations

Goal: to provide efficient, accurate, interactive estimation.
The estimator is unbiased and consistent.
The running average is biased but consistent.
The estimators are capable of giving tight confidence intervals.
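To make the bias remark concrete, here is a sketch of scale-up estimators for a two-table join. It assumes the samples are independent simple random samples from R and S, scales the sampled SUM and COUNT by |R||S| / (n_R n_S), and takes the running AVG as the ratio of the two, which is why it can be biased while the scaled SUM and COUNT are not. All names (`join_aggregate_estimates`, `size_r`, `size_s`, `expression`) are hypothetical.

```python
def join_aggregate_estimates(sample_r, sample_s, size_r, size_s,
                             predicate, expression):
    """Return (COUNT estimate, SUM estimate, AVG estimate) for a join.

    Assumption: sample_r and sample_s are independent random samples,
    drawn without replacement, from relations of size_r and size_s tuples.
    """
    values = [expression(r, s)
              for r in sample_r for s in sample_s if predicate(r, s)]
    # Each pair of the full cross product is sampled with the same probability,
    # so scaling by the inverse of that probability gives unbiased totals.
    scale = (size_r * size_s) / (len(sample_r) * len(sample_s))
    count_estimate = scale * len(values)
    sum_estimate = scale * sum(values)
    # AVG is a ratio of two estimates; the scale factor cancels.
    avg_estimate = sum(values) / len(values) if values else None
    return count_estimate, sum_estimate, avg_estimate


if __name__ == "__main__":
    R = [1, 2, 3]
    S = [2, 3, 4]
    print(join_aggregate_estimates(R[:2], S[:2], len(R), len(S),
                                   lambda r, s: r == s, lambda r, s: r + s))
```

When the samples cover both relations entirely, the scale factor is 1 and the estimates coincide with the exact answers.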

Aspect Ratios

Aspect ratio: how many tuples are retrieved from each base relation per sampling step, e.g. β1 = 1, β2 = 3, …

Ripple join adjusts the aspect ratio adaptively, according to the statistical properties of the data in the base relations.
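A fixed aspect ratio can be sketched by fetching `beta_r` new tuples from R and `beta_s` new tuples from S at each sampling step. The version below keeps the ratio constant and leaves out the adaptive adjustment entirely; the function and parameter names are assumptions of this example, and both rates are assumed to be at least 1.

```python
def rectangular_ripple_join(R, S, predicate, beta_r=1, beta_s=1):
    """Yield matching pairs, sampling beta_r R tuples and beta_s S tuples per step.

    Assumptions: R and S are randomly ordered lists, and both beta values
    are at least 1 and stay fixed for the whole run.
    """
    seen_r = seen_s = 0
    while seen_r < len(R) or seen_s < len(S):
        # New R tuples against the S tuples seen in earlier steps.
        new_r = R[seen_r:seen_r + beta_r]
        for r in new_r:
            for s in S[:seen_s]:
                if predicate(r, s):
                    yield r, s
        seen_r += len(new_r)
        # New S tuples against every R tuple seen so far, including new_r.
        new_s = S[seen_s:seen_s + beta_s]
        for s in new_s:
            for r in R[:seen_r]:
                if predicate(r, s):
                    yield r, s
        seen_s += len(new_s)


if __name__ == "__main__":
    R = list(range(4))
    S = list(range(12))
    pairs = list(rectangular_ripple_join(R, S, lambda r, s: r == s % 4,
                                         beta_r=1, beta_s=3))
    print(len(pairs), "matching pairs")
```

With `beta_r=1` and `beta_s=3` the examined region grows as a 1 × 3, 2 × 6, 3 × 9, … rectangle, which matches the example ratio given above.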


Why is it called Ripple Join?

The algorithm seems to ripple out from a corner of the join.
Acronym: "Rectangles of Increasing Perimeter Length"

Performance

[Slide image not included in the transcript.]

Conclusions and Future Work

A complete implementation of online aggregation must be able to handle multi-table queries.
This paper introduces ripple joins, a family of join algorithms designed to meet the performance needs of an online aggregation system.
Though ripple joins are symmetric, it is still not clear how a query optimizer should choose among the ripple join variants, nor how it should order a sequence of ripple joins.

References

Haas & Hellerstein, “Ripple Joins for Online Aggregation” (SIGMOD ’99)
Haas & Hellerstein, “Online Query Processing: A Tutorial”
J. M. Hellerstein, P. J. Haas, and H. J. Wang, “Online Aggregation,” in Proc. 1997 ACM SIGMOD Intl. Conf. on Management of Data