Evaluating Window Joins over Unbounded Streams

Uploaded On 2016-08-02

Authors: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas (University of Wisconsin-Madison, CS Dept.). Presenter: Yang Ying-Chia (楊應甲, R01922018), CSIE, National Taiwan University.

Presentation Transcript

Slide1

Evaluating Window Joins over Unbounded Streams

Authors: Jaewoo Kang, Jeffrey F. Naughton, Stratis D. Viglas

University of Wisconsin-Madison CS Dept.

Presenter: Yang Ying-Chia (R01922018)

CSIE, National Taiwan University

Slide2

Outline

Abstract

Background

Introduction

Related Work

Estimating the Cost of Moving Window Joins

On Maximizing the Efficiency of Processing Joins

Conclusion

Slide3

Abstract – Problem and Solution

Problem: Process joins over unbounded streams.

Solution: Moving Window Join

Queries have “window predicates”

Slide4

Abstract – Central Point of the Thesis

The paper proposes a unit-time-basis cost model for evaluating moving window joins.

Using this cost model, it proposes strategies for maximizing the efficiency of processing joins in different scenarios.

Slide5

Abstract

Background

Introduction

Related Work

Estimating the Cost of Moving Window Joins

On Maximizing the Efficiency of Processing Joins

Conclusion

Slide6

Background

Join

Nested Loops Join (NLJ)

Hash Join (HJ)

Moving Window Join

Slide7

Background – Join

Slide8

Background – Nested Loops Join (NLJ)

Slide9

Background – Hash Join (HJ)

Slide10

Background – Moving Window Join

Slide11

Background – Moving Window Join

Instead of saying we want to join all tuples of A and B, we say we want to join all tuples that have arrived on A in the last t1 seconds with all the tuples that have arrived on B in the last t2 seconds.
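
The window predicate above can be sketched in a few lines of Python. This is only a minimal illustration, not the paper's algorithm: the function name `window_join`, the `(timestamp, stream, value)` event layout, and the default equality match are all assumptions made for the example. Each arriving tuple first invalidates expired tuples in both windows, then probes the opposite window with a nested loops scan, and is finally inserted into its own window.

```python
from collections import deque


def window_join(events, t1, t2, match=lambda a, b: a == b):
    """Join tuples seen on A in the last t1 seconds with tuples seen on B in the last t2.

    events: iterable of (timestamp, stream, value), sorted by timestamp,
            where stream is "A" or "B" (hypothetical layout for this sketch).
    Returns a list of (a_value, b_value) result pairs.
    """
    win_a, win_b = deque(), deque()   # each holds (timestamp, value), oldest first
    results = []
    for ts, stream, value in events:
        # Invalidate tuples that have fallen out of their time window.
        while win_a and win_a[0][0] < ts - t1:
            win_a.popleft()
        while win_b and win_b[0][0] < ts - t2:
            win_b.popleft()
        if stream == "A":
            # Probe window B with the new A tuple, then insert it into window A.
            results.extend((value, b) for _, b in win_b if match(value, b))
            win_a.append((ts, value))
        else:
            # Probe window A with the new B tuple, then insert it into window B.
            results.extend((a, value) for _, a in win_a if match(a, value))
            win_b.append((ts, value))
    return results
```

With t1 = t2 = 3, a B tuple arriving at time 1 still sees an A tuple from time 0, whereas a B tuple arriving at time 5 does not, because that A tuple has already been invalidated.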

Slide12

Abstract

Background

Introduction

Related Work

Estimating the Cost of Moving Window Joins

On Maximizing the Efficiency of Processing Joins

Conclusion

Slide13

Introduction – Questions

How can we measure the efficiency of a moving window join evaluation strategy, since the traditional metric of execution time to completion does not apply?

Can an algorithm for a moving window join take advantage of asymmetries in the rates of the input streams?

How can we deal with cases in which an input stream is so fast that the system cannot keep up?

If memory is the bottleneck, how should we allocate memory between the two windows for the two inputs?

Slide14

Introduction – The Three Scenarios

One stream is much faster than the other.

System resources are insufficient to keep up with the input streams.

Memory is limited.

Slide15

Abstract

Background

Introduction

Related Work

Estimating the Cost of Moving Window Joins

On Maximizing the Efficiency of Processing Joins

Conclusion

Slide16

Related Work

Predicate grouping and group optimization techniques

Adaptive query processing and query scrambling

Symmetric Hash Join and symmetric nested loops join

Diag-Join for data warehouse environments

Rate-based streaming query optimization framework

Slide17

Abstract

Background

Introduction

Related Work

Estimating the Cost of Moving Window Joins

On Maximizing the Efficiency of Processing Joins

Conclusion

Slide18

Estimating the Cost of Moving Window Joins

Cost model

Cost of a single join operation

Slide19

Cost of Nested Loop Join A to B

Number of tuples accessed in a time unit

Cost of accessing a single tuple

Number of tuples accessed to search for matches in window B

Number of tuples accessed for insertion and invalidation

Slide20

Cost of Hash Join A to B

Cost of probe(b) and invalidate(b) is a function of the hash bucket size in window B

Cost of accessing a single tuple in a specific hash table implementation

Slide21

Cost of Full Join

Symmetric Join

HHJ, NNJ

Slide22

Cost of Full Join

Asymmetric Join

HNJ
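
The cost formulas themselves did not survive in this transcript, so the following is only a hedged sketch of a unit-time cost model in the spirit of the annotations above: the cost of one join direction is the probing work plus the insert and invalidate work per second, and the full join cost is the sum of the two directions. The function names, the per-tuple constants `c_list` and `c_hash`, the bucket count `n_buckets`, and the exact form of each term are assumptions for illustration, not the paper's definitions. Since the slides do not say which direction uses which method in HNJ, the sketch takes the method for each direction explicitly.

```python
def one_way_cost(probe_rate, build_rate, window_secs, method,
                 c_list=1.0, c_hash=2.0, n_buckets=64):
    """Assumed cost per second of one join direction.

    probe_rate : arrival rate of the stream that probes the window (tuples/sec)
    build_rate : arrival rate of the stream stored in the window (tuples/sec)
    method     : "nlj" (window kept as a list) or "hash" (window kept as a hash table)
    """
    window_size = build_rate * window_secs       # tuples held in the window
    if method == "nlj":
        probe = window_size * c_list             # scan the whole window per probe
        maintain = 2 * c_list                    # one access to insert, one to invalidate
    elif method == "hash":
        bucket = window_size / n_buckets         # average bucket size
        probe = bucket * c_hash                  # scan one bucket per probe
        maintain = c_hash + bucket * c_hash      # insert, plus a bucket scan to invalidate
    else:
        raise ValueError(f"unknown method: {method}")
    return probe_rate * probe + build_rate * maintain


def full_join_cost(rate_a, rate_b, t_a, t_b, a_to_b, b_to_a):
    """Sum of two independent terms: A probing window B, and B probing window A."""
    return (one_way_cost(rate_a, rate_b, t_b, a_to_b)
            + one_way_cost(rate_b, rate_a, t_a, b_to_a))
```

Under these assumed constants, `full_join_cost(10, 10, 10, 10, "hash", "hash")` is cheaper than the all-NLJ variant, while for very uneven rates such as `rate_a=1, rate_b=1000` the mixed choice `a_to_b="nlj", b_to_a="hash"` is cheaper than using hash joins in both directions.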

Slide23

Cost Curves for Full Joins

σa = 1/|A| = 1/Nkey(A)

σb = 1/|B| = 1/Nkey(B)

Slide24

Observation from the Previous Graphs

When the difference in input stream speeds is minimal, HJ outperforms every other join combination.

As the speed gap increases, the cost of HJ increases considerably and exceeds that of HNJ at around 70 tuples/sec and 140 tuples/sec.

Here we have a performance crossover point.

Slide25

Estimating the Weight Factors

The crossover points can be calculated by equating the two cost formulas.

For two given streams, we can determine when NLJ will outperform HJ, depending on the ratio of the arrival rates of the input streams.
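
As an illustration of equating two cost formulas, the sketch below solves for the crossover in a toy model. The model is an assumption, not the paper's formula: NLJ pays `window_size * c_list` per probe and `2 * c_list` per stored tuple, while HJ pays `bucket * c_hash` per probe and `c_hash + bucket * c_hash` per stored tuple, with `bucket = window_size / n_buckets`. The function name `crossover_ratio` and all constants are hypothetical.

```python
def crossover_ratio(window_size, c_list=1.0, c_hash=2.0, n_buckets=64):
    """Probe-rate / build-rate ratio at which the assumed NLJ and HJ costs are equal.

    Setting
        probe * window_size * c_list + build * 2 * c_list
      = probe * bucket * c_hash + build * (c_hash + bucket * c_hash)
    and solving for probe / build gives maintain_gap / probe_gap below.
    Below the returned ratio NLJ is cheaper; above it HJ is cheaper.
    """
    bucket = window_size / n_buckets
    probe_gap = window_size * c_list - bucket * c_hash    # extra probe cost of NLJ
    maintain_gap = c_hash + bucket * c_hash - 2 * c_list  # extra maintenance cost of HJ
    if probe_gap <= 0 or maintain_gap <= 0:
        # Outside the regime this sketch handles: it expects hash probes to be
        # cheaper than list scans and hash maintenance to be costlier.
        raise ValueError("no crossover handled for these parameters")
    return maintain_gap / probe_gap
```

In this toy model the crossover depends only on the ratio of the probing stream's rate to the stored stream's rate for a given window size, which is the kind of dependence the slide describes.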

…

Slide26

Abstract

Background

Introduction

Related Work

Estimating the Cost of Moving Window Joins

On Maximizing the Efficiency of Processing Joins

Conclusion

Slide27

Recall the three scenarios

One stream is much faster than the other.

System resources are insufficient to keep up with the input streams.

Memory is limited.

Slide28

Exploiting Asymmetry in Input Streams Speed

Assumptions:

The two time windows are fixed.

The aggregate speed of the two streams is less than the system’s service rate μ (i.e., λa + λb < μ).

The following inequality determines the likely winner between NLJ and HJ:

If the inequality holds, NLJ will outperform HJ; otherwise, HJ outperforms NLJ.

Slide29

Graphs to Prove the Previous Hypothesis

Slide30

Observation from the Previous Graphs

HHJ costs the least until the input rate reaches about 70 tuples/sec; then HNJ takes over. Hence, either HHJ or HNJ is the winner.

Both hash join output rates decrease drastically after passing their thrashing point.

Slide31

Maximizing the Number of Result Tuples with Limited Computing Resources

This scenario arises under the following conditions:

The system evaluates very expensive predicates.

The aggregate speed of the input streams is faster than the join operator’s service rate, i.e., λa + λb > μ.

Hence, not all answer tuples can be generated and input streams need to be “regulated”.

But, what policy?
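
One way to compare regulation policies is a small numerical experiment. The sketch below assumes a simple output-rate model that is not taken from the slides: each admitted A tuple meets `rate_b * t_b` stored B tuples, each admitted B tuple meets `rate_a * t_a` stored A tuples, and a fraction `selectivity` of those pairs match. It also assumes that the admitted rates add up to the service rate μ and that both streams arrive fast enough to fill any share they are given. The function names are hypothetical.

```python
def output_rate(rate_a, rate_b, t_a, t_b, selectivity):
    """Assumed expected result tuples per second of a window join."""
    return selectivity * rate_a * rate_b * (t_a + t_b)


def best_split(service_rate, t_a, t_b, selectivity, steps=100):
    """Try giving i/steps of the service rate to stream A; return the best fraction."""
    def rate_for(i):
        share_a = service_rate * i / steps
        return output_rate(share_a, service_rate - share_a, t_a, t_b, selectivity)
    return max(range(steps + 1), key=rate_for) / steps
```

In this model the output rate is proportional to the product of the two admitted rates, so the best fraction found by `best_split` does not change when the window sizes or the selectivity change.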

Slide32

Performance Comparison between Policies

The winner is the equal distribution strategy, regardless of time window sizes and window selectivity factors.

Slide33

Maximizing the Number of Result Tuples with Limited Memory

Assumptions:

The two time window sizes can be adjusted to fully utilize available memory.

The two arrival rates are constant.

Hence, memory allocation strategies are necessary. But, what policy? Will equal distribution win again?
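
The allocation question can be explored with a small grid search. This sketch uses an assumed output-rate model, not one given on the slides: a window that holds `mem` tuples of a stream arriving at `rate` tuples/sec covers `mem / rate` seconds, and the join produces `selectivity * rate_a * rate_b * (t_a + t_b)` results per second. Memory is counted in tuples, and the function names are hypothetical.

```python
def output_rate_for_allocation(mem_a, mem_b, rate_a, rate_b, selectivity):
    """Assumed result tuples per second when window A holds mem_a tuples and window B mem_b."""
    t_a = mem_a / rate_a   # seconds of stream A that fit in its share of memory
    t_b = mem_b / rate_b   # seconds of stream B that fit in its share of memory
    return selectivity * rate_a * rate_b * (t_a + t_b)


def best_allocation(total_mem, rate_a, rate_b, selectivity, steps=10):
    """Return the (mem_a, mem_b) pair with the highest assumed output rate on an even grid."""
    candidates = [(total_mem * i / steps, total_mem * (steps - i) / steps)
                  for i in range(steps + 1)]
    return max(candidates,
               key=lambda m: output_rate_for_allocation(m[0], m[1],
                                                        rate_a, rate_b, selectivity))
```

Under this model the output rate is linear in `mem_a`, so the best allocation lies at one extreme of the grid rather than at an even split; for example, `best_allocation(1000, 10, 100, 0.01)` gives all memory to stream A, the slower stream.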

Slide34

Performance Comparison between Policies

The winner is the Max A strategy, which allocates all memory to the slower stream.

Keep the slower stream in memory and let the faster one probe against it and pass by.

Slide35

Maximizing the Number of Result Tuples with Limited Memory

Another assumption:

Variable time windows

Variable arrival rates

Slide36

Performance Comparison between Policies

The best policy is either to maximize stream A’s time window together with stream B’s arrival rate or, alternatively, to maximize B’s time window together with A’s arrival rate.

Slide37

Abstract

Background

Introduction

Related Work

Estimating the Cost of Moving Window Joins

On Maximizing the Efficiency of Processing Joins

Conclusion

Slide38

Conclusion

A unit-time-basis model for analyzing the expected performance of moving window joins is introduced.

The proposed cost model divides the join cost into two independent terms, each corresponding to one of the two join directions.

This work can be extended to a cost model that goes beyond single joins and covers full query plans.

Other algorithms apart from NLJ and HJ can be modeled and evaluated.

Slide39

The End

Thanks for your attention