Data stream Unbounded Table - PowerPoint Presentation

Uploaded by SupremeGoddess (@SupremeGoddess) on 2022-08-04



Presentation Transcript

Slide1

Data stream as an unbounded table

[Diagram: Data stream -> Unbounded Table]
New data in the data stream = new rows appended to an unbounded table.
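A minimal plain-Python sketch of the equation above (not Spark code): `UnboundedTable` is a hypothetical name, and a list stands in for the table. The only operation is append, so each arrival on the stream becomes new rows at the end and nothing already in the table changes.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class UnboundedTable:
    """Hypothetical stand-in for the input table: rows are only ever appended."""
    rows: List[Any] = field(default_factory=list)

    def append(self, new_data: List[Any]) -> None:
        # New data in the data stream = new rows appended to the table.
        self.rows.extend(new_data)


table = UnboundedTable()
table.append(["cat dog", "dog dog"])  # first arrival on the stream
table.append(["owl cat"])             # second arrival
print(table.rows)  # ['cat dog', 'dog dog', 'owl cat']
```

In Spark itself, the streaming DataFrame returned by `spark.readStream...load()` plays the role of this table, although Structured Streaming does not actually materialize the whole table.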

Slide2

Programming Model for Structured Streaming

[Diagram: time axis with Trigger: every 1 sec, at t = 1, 2, 3]
Input: data up to t=1, data up to t=2, data up to t=3
Query: applied at each trigger to the input
Result: result up to t=1, result up to t=2, result up to t=3
Output: complete mode
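The trigger loop in the diagram can be simulated as follows. This is a hypothetical sketch of the semantics only (`run_triggers` and `count_rows` are made-up names): on every trigger the new batch is appended to the input, the query runs over all data up to t, and in complete mode the entire result table is output. Spark produces the same result incrementally instead of recomputing from scratch; in PySpark the 1-second trigger would be requested with `.trigger(processingTime="1 second")` on the stream writer.

```python
from typing import Callable, Dict, List

Row = str
Result = Dict[str, int]


def run_triggers(batches: List[List[Row]],
                 query: Callable[[List[Row]], Result]) -> List[Result]:
    """Hypothetical driver: one loop iteration per trigger (t = 1, 2, 3, ...)."""
    input_table: List[Row] = []
    outputs: List[Result] = []
    for batch in batches:
        input_table.extend(batch)      # data up to t
        result = query(input_table)    # result up to t
        outputs.append(dict(result))   # complete mode: the whole result is output
    return outputs


def count_rows(rows: List[Row]) -> Result:
    # Toy query: how many input rows have been seen so far.
    return {"rows": len(rows)}


print(run_triggers([["a", "b"], ["c"], ["d", "e"]], count_rows))
# [{'rows': 2}, {'rows': 3}, {'rows': 5}]
```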

Slide3

Model of the Quick Example

Input (unbounded table of all input), sent through nc:
data up to t=1: "cat dog", "dog dog"
data up to t=2: adds "owl cat"
data up to t=3: adds "dog", "owl"

Query: word count

Result (table of word counts):
result up to t=1: cat 1, dog 3
result up to t=2: cat 2, dog 3, owl 1
result up to t=3: cat 2, dog 4, owl 2

Output (Complete Mode): print all the counts to console
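The word counts above can be reproduced with a short plain-Python simulation (not Spark; the batches follow the slide's t=1, 2, 3). In Spark's own quick example, the equivalent query splits lines into words with `split` and `explode`, groups by word and counts, and writes with `outputMode("complete")` to the console sink.

```python
from collections import Counter
from typing import Dict, List


def word_counts(lines: List[str]) -> Dict[str, int]:
    # The word count query: split each line on whitespace and count every word.
    return dict(sorted(Counter(w for line in lines for w in line.split()).items()))


# Lines sent through nc, grouped by the trigger at which they arrive (t = 1, 2, 3).
batches = [["cat dog", "dog dog"], ["owl cat"], ["dog", "owl"]]

input_table: List[str] = []                   # unbounded table of all input
results: List[Dict[str, int]] = []
for t, batch in enumerate(batches, start=1):
    input_table.extend(batch)                 # data up to t
    results.append(word_counts(input_table))  # result up to t
    print(f"t={t}:", results[-1])             # complete mode: print all the counts
```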

Slide4

Windowed Grouped Aggregation with 10 min windows, sliding every 5 mins

Input stream (event time, data):
12:02  cat dog
12:03  dog dog
12:07  owl cat
12:11  dog
12:13  owl

Result tables after 5 minute triggers (window, word, count):

After the 12:05 trigger
12:00 - 12:10  cat  1
12:00 - 12:10  dog  3

After the 12:10 trigger (counts incremented for windows 12:00 - 12:10 and 12:05 - 12:15)
12:00 - 12:10  cat  2
12:00 - 12:10  dog  3
12:00 - 12:10  owl  1
12:05 - 12:15  cat  1
12:05 - 12:15  owl  1

After the 12:15 trigger (counts incremented for windows 12:05 - 12:15 and 12:10 - 12:20)
12:00 - 12:10  cat  2
12:00 - 12:10  dog  3
12:00 - 12:10  owl  1
12:05 - 12:15  cat  1
12:05 - 12:15  owl  2
12:05 - 12:15  dog  1
12:10 - 12:20  dog  1
12:10 - 12:20  owl  1
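Assigning each event to every sliding window that contains it is the core of this aggregation. A plain-Python sketch, assuming half-open [start, end) windows aligned to 5-minute boundaries (the kind of windows Spark's `window(timeColumn, "10 minutes", "5 minutes")` produces); `minutes`, `hhmm` and `windows_for` are hypothetical helpers. The sketch also yields an 11:55 - 12:05 window for the 12:02 and 12:03 events, which the slide does not draw.

```python
from collections import Counter
from typing import Dict, List, Tuple

WINDOW, SLIDE = 10, 5  # window length and slide interval, in minutes


def minutes(clock: str) -> int:
    # "HH:MM" -> minutes since midnight
    hours, mins = map(int, clock.split(":"))
    return hours * 60 + mins


def hhmm(total: int) -> str:
    return f"{total // 60:02d}:{total % 60:02d}"


def windows_for(clock: str) -> List[Tuple[str, str]]:
    """Every [start, end) window, starting on a SLIDE boundary, that contains the event time."""
    ts = minutes(clock)
    first = ts - ts % SLIDE  # latest slide boundary <= ts
    return sorted((hhmm(s), hhmm(s + WINDOW)) for s in range(first, ts - WINDOW, -SLIDE))


# (event time, line) pairs grouped by the 5-minute trigger that processes them.
batches = {
    "12:05": [("12:02", "cat dog"), ("12:03", "dog dog")],
    "12:10": [("12:07", "owl cat")],
    "12:15": [("12:11", "dog"), ("12:13", "owl")],
}

counts: Counter = Counter()
snapshots: Dict[str, Dict[Tuple[str, str, str], int]] = {}
for trigger, events in batches.items():
    for event_time, line in events:
        for word in line.split():
            for start, end in windows_for(event_time):
                counts[(start, end, word)] += 1  # every window containing the event
    snapshots[trigger] = dict(counts)            # result table after this trigger

for key, n in sorted(snapshots["12:15"].items()):
    print(key, n)
```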

Slide5

Late data handling in Windowed Grouped Aggregation

Input stream (event time, data):
12:02  cat dog
12:03  dog dog
12:07  owl cat
12:13  owl
12:04  dog  (late data that was generated at 12:04 but arrived at 12:11)

Result tables after 5 minute triggers (window, word, count):

After the 12:05 trigger
12:00 - 12:10  cat  1
12:00 - 12:10  dog  3

After the 12:10 trigger
12:00 - 12:10  cat  2
12:00 - 12:10  dog  3
12:00 - 12:10  owl  1
12:05 - 12:15  cat  1
12:05 - 12:15  owl  1

After the 12:15 trigger (counts updated for window 12:00 - 12:10)
12:00 - 12:10  cat  2
12:00 - 12:10  dog  4
12:00 - 12:10  owl  1
12:05 - 12:15  cat  1
12:05 - 12:15  owl  2
12:10 - 12:20  owl  1
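Because windows are keyed by event time rather than arrival time, the late (12:04, dog) record simply increments the existing 12:00 - 12:10 window when it shows up in the 12:15 trigger. The plain-Python sketch below (hypothetical helper names, half-open windows assumed) compares the result tables before and after that trigger and lists the rows that changed.

```python
from collections import Counter
from typing import Dict, List, Tuple

WINDOW, SLIDE = 10, 5  # minutes


def minutes(clock: str) -> int:
    hours, mins = map(int, clock.split(":"))
    return hours * 60 + mins


def hhmm(total: int) -> str:
    return f"{total // 60:02d}:{total % 60:02d}"


def windows_for(clock: str) -> List[Tuple[str, str]]:
    # Every [start, end) window, on a SLIDE boundary, containing the event time.
    ts = minutes(clock)
    first = ts - ts % SLIDE
    return sorted((hhmm(s), hhmm(s + WINDOW)) for s in range(first, ts - WINDOW, -SLIDE))


# Grouped by arrival (processing-time trigger), but windowed by *event* time.
batches = {
    "12:05": [("12:02", "cat dog"), ("12:03", "dog dog")],
    "12:10": [("12:07", "owl cat")],
    "12:15": [("12:04", "dog"), ("12:13", "owl")],  # (12:04, dog) arrived late, at 12:11
}

counts: Counter = Counter()
snapshots: Dict[str, Dict[Tuple[str, str, str], int]] = {}
for trigger, events in batches.items():
    for event_time, line in events:
        for word in line.split():
            for start, end in windows_for(event_time):
                counts[(start, end, word)] += 1
    snapshots[trigger] = dict(counts)

before, after = snapshots["12:10"], snapshots["12:15"]
changed = sorted(k for k in after if after[k] != before.get(k))
print(changed)
```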

Slide6

Watermarking in Windowed Grouped Aggregation with Update Mode

[Diagram: processing time axis with 5 min triggers at 12:05, 12:10, 12:15, 12:20 and 12:25; event time axis from 12:00 to 12:20. Legend: data as (event time, word); data late but within watermark; data too late, outside watermark; max event time seen till now.]

Watermark = max event time - late threshold. The watermark is updated every trigger using late threshold = 10 min:
wm = 12:14 - 10m = 12:04
wm = 12:21 - 10m = 12:11

Data as (event time, word): (12:07, dog), (12:08, owl), (12:14, dog), (12:09, cat), (12:15, cat), (12:08, dog), (12:13, owl), (12:21, owl), (12:17, owl), (12:04, donkey)

Result tables after each trigger (window, word, count). In the diagram, purple rows are updated rows that are written to the sink as output.

After the 12:10 trigger
12:00 - 12:10  dog  1
12:00 - 12:10  owl  1
12:05 - 12:15  dog  1
12:05 - 12:15  owl  1

After the 12:15 trigger
12:00 - 12:10  dog  1
12:00 - 12:10  owl  1
12:00 - 12:10  cat  1
12:05 - 12:15  dog  2
12:05 - 12:15  owl  1
12:05 - 12:15  cat  1
12:10 - 12:20  dog  1

After the 12:20 trigger
12:00 - 12:10  dog  2
12:00 - 12:10  owl  1
12:00 - 12:10  cat  1
12:05 - 12:15  dog  3
12:05 - 12:15  owl  2
12:05 - 12:15  cat  2
12:10 - 12:20  dog  1
12:10 - 12:20  cat  1
12:10 - 12:20  owl  1

After the 12:25 trigger: the table is updated with the late data (12:17, owl) but not with the too-late data (12:04, donkey), which is ignored in counts. The intermediate state for 12:00 - 12:10 is dropped because the watermark (12:11) > 12:10.
12:00 - 12:10  dog  2
12:00 - 12:10  owl  1
12:00 - 12:10  cat  1
12:05 - 12:15  dog  3
12:05 - 12:15  owl  2
12:05 - 12:15  cat  2
12:10 - 12:20  dog  1
12:10 - 12:20  cat  1
12:10 - 12:20  owl  2
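The watermark rules on this slide can be simulated in plain Python. This is a hypothetical sketch, not Spark's implementation (in Spark the query would use `withWatermark("timestamp", "10 minutes")` and `outputMode("update")`). Assumptions, also noted in the code: events are grouped into triggers as inferred from the result tables; the watermark computed at the end of one trigger (max event time minus 10 minutes) applies from the next trigger; an event older than the current watermark is ignored; and a window's state is dropped once its end is at or before the watermark. With half-open windows, (12:15, cat) falls in 12:10 - 12:20 and 12:15 - 12:25, so the sketch reports cat 1 for 12:05 - 12:15 where the slide shows 2, and it also reports 12:15 - 12:25 and 12:20 - 12:30 rows that the diagram omits.

```python
from typing import Dict, List, Optional, Set, Tuple

WINDOW, SLIDE, LATE_THRESHOLD = 10, 5, 10  # minutes
Key = Tuple[int, str]  # (window start in minutes since midnight, word)


def minutes(clock: str) -> int:
    hours, mins = map(int, clock.split(":"))
    return hours * 60 + mins


def hhmm(total: int) -> str:
    return f"{total // 60:02d}:{total % 60:02d}"


def window_starts(ts: int) -> range:
    # Starts of every [start, start + WINDOW) window that contains ts.
    first = ts - ts % SLIDE
    return range(first, ts - WINDOW, -SLIDE)


def run_update_mode(batches: List[List[Tuple[str, str]]]):
    """Hypothetical simulation of watermarked windowed counts in update mode."""
    state: Dict[Key, int] = {}       # intermediate counts per (window, word)
    max_event: Optional[int] = None  # max event time seen till now
    watermark: Optional[int] = None  # assumed: no watermark before the first trigger
    outputs = []
    for batch in batches:            # one batch per 5-minute trigger
        updated: Set[Key] = set()
        for event_time, word in batch:
            ts = minutes(event_time)
            max_event = ts if max_event is None else max(max_event, ts)
            if watermark is not None and ts < watermark:
                continue             # too late, outside the watermark: ignored
            for start in window_starts(ts):
                state[(start, word)] = state.get((start, word), 0) + 1
                updated.add((start, word))
        # Update mode: only rows changed by this trigger are written to the sink.
        outputs.append({(hhmm(s), hhmm(s + WINDOW), w): state[(s, w)] for s, w in sorted(updated)})
        # Assumed eviction rule: drop state for windows ending at or before the watermark.
        if watermark is not None:
            state = {k: v for k, v in state.items() if k[0] + WINDOW > watermark}
        # Assumed: the new watermark takes effect from the next trigger.
        if max_event is not None:
            watermark = max_event - LATE_THRESHOLD
    return outputs, state


batches = [
    [("12:07", "dog"), ("12:08", "owl")],                                      # 12:10
    [("12:14", "dog"), ("12:09", "cat")],                                      # 12:15
    [("12:15", "cat"), ("12:08", "dog"), ("12:13", "owl"), ("12:21", "owl")],  # 12:20
    [("12:17", "owl"), ("12:04", "donkey")],                                   # 12:25
]
outputs, final_state = run_update_mode(batches)
for i, rows in enumerate(outputs, start=1):
    print(i, rows)
```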

Slide7

Watermarking in Windowed Grouped Aggregation with Append Mode

[Diagram: processing time axis with 5 min triggers from 12:05 to 12:30; event time axis from 12:00 to 12:25. Legend: data as (event time, word); data late but within watermark; data too late, outside watermark; max event time seen till now.]

Watermark = max event time - late threshold:
wm = 12:14 - 10m = 12:04
wm = 12:21 - 10m = 12:11
wm = 12:26 - 10m = 12:16

Data as (event time, word): (12:07, dog), (12:08, owl), (12:14, dog), (12:09, cat), (12:15, cat), (12:08, dog), (12:13, owl), (12:21, owl), (12:17, owl), (12:04, donkey), (12:26, owl). The data (12:04, donkey) is too late and ignored in counts.

Partial counts for window 12:00 - 12:10 (owl 1, cat 1, dog 2) are maintained as internal state while waiting for late data, so they are not yet added to the result table.

Final counts for 12:00 - 12:10 are added to the table when the watermark > 12:10; late data is counted, and the intermediate state for the window is dropped.

Result tables after each trigger (window, word, count):
12:00 - 12:10  owl  1
12:00 - 12:10  cat  1
12:00 - 12:10  dog  2
12:05 - 12:15  owl  2
12:05 - 12:15  cat  2
12:05 - 12:15  dog  3
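Append mode changes only what reaches the sink: a window's counts are emitted once, when the watermark passes the window's end, and its state is dropped at that moment. A hypothetical plain-Python sketch (`run_append_mode` is a made-up name, not Spark's implementation), assuming: the trigger grouping is inferred from the slides; (12:26, owl) arrives in the 12:30 trigger and is followed by one empty trigger; the watermark computed at the end of a trigger applies from the next one, so 12:05 - 12:15 is emitted one trigger after the watermark reaches 12:16. With half-open windows, (12:15, cat) is not in 12:05 - 12:15, so the sketch emits cat 1 there where the slide shows 2.

```python
from typing import Dict, List, Optional, Tuple

WINDOW, SLIDE, LATE_THRESHOLD = 10, 5, 10  # minutes


def minutes(clock: str) -> int:
    hours, mins = map(int, clock.split(":"))
    return hours * 60 + mins


def hhmm(total: int) -> str:
    return f"{total // 60:02d}:{total % 60:02d}"


def window_starts(ts: int) -> range:
    # Starts of every [start, start + WINDOW) window that contains ts.
    first = ts - ts % SLIDE
    return range(first, ts - WINDOW, -SLIDE)


def run_append_mode(batches: List[List[Tuple[str, str]]]) -> List[Dict[Tuple[str, str, str], int]]:
    """Hypothetical simulation: each window is emitted exactly once, when it is final."""
    state: Dict[Tuple[int, str], int] = {}  # partial counts kept as internal state
    max_event: Optional[int] = None
    watermark: Optional[int] = None
    outputs = []
    for batch in batches:                   # one batch per trigger
        for event_time, word in batch:
            ts = minutes(event_time)
            max_event = ts if max_event is None else max(max_event, ts)
            if watermark is not None and ts < watermark:
                continue                    # too late: ignored in counts
            for start in window_starts(ts):
                state[(start, word)] = state.get((start, word), 0) + 1
        emitted: Dict[Tuple[str, str, str], int] = {}
        if watermark is not None:
            for start, word in sorted(state):    # sorted() copies keys, so popping is safe
                if start + WINDOW <= watermark:  # window can no longer change
                    emitted[(hhmm(start), hhmm(start + WINDOW), word)] = state.pop((start, word))
        outputs.append(emitted)                  # append mode: only final rows reach the sink
        if max_event is not None:
            watermark = max_event - LATE_THRESHOLD  # assumed: applies from the next trigger
    return outputs


batches = [
    [("12:07", "dog"), ("12:08", "owl")],                                      # 12:10
    [("12:14", "dog"), ("12:09", "cat")],                                      # 12:15
    [("12:15", "cat"), ("12:08", "dog"), ("12:13", "owl"), ("12:21", "owl")],  # 12:20
    [("12:17", "owl"), ("12:04", "donkey")],                                   # 12:25
    [("12:26", "owl")],                                                        # 12:30 (assumed)
    [],                                                                        # later, empty
]
for i, rows in enumerate(run_append_mode(batches), start=1):
    print(i, rows)
```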