
Matrix Profile III: The Matrix Profile allows Visualization - PowerPoint Presentation

Uploaded On 2017-04-20


Presentation Transcript

Slide1

Matrix Profile III: The Matrix Profile allows Visualization of Salient Subsequences in Massive Time Series

Chin-Chia Michael Yeh, Helga Van Herle, Eamonn Keogh

http://www.cs.ucr.edu/~eamonn/MatrixProfile.html

Slide2

Outline

Motivation
Proposed method
Experiment result
Conclusion

Slide3

Outline

Motivation
Proposed method
Experiment result
Conclusion

Slide4

Motivation

You have a heartbeat time series.

Slide5

Motivation

You know where the heartbeats are.

Slide6

Motivation

You can easily visualize the heartbeats by mapping them into 2D with an algorithm such as Multidimensional Scaling (MDS).

If the scatter plot and the corresponding subsequences are shown to a domain expert, the correct labels can be easily recovered.

Slide7

Motivation

Normal Beat

Abnormal Beat

Normal beats form one cluster, while abnormal beats form two clusters.
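As a minimal sketch of the projection step, assuming the beats have already been segmented into equal-length subsequences: the code below z-normalizes each one, builds the pairwise Euclidean distance matrix, and embeds it in 2D with classical (Torgerson) MDS. The synthetic "beats", the function names, and the choice of classical MDS are illustrative assumptions; the slides do not say which MDS variant was used.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array (zero mean, unit variance)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

def pairwise_euclidean(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, dims=2):
    """Classical (Torgerson) MDS: embed a distance matrix D in `dims` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dims]          # indices of the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# Hypothetical data: 5 noisy copies each of two made-up beat shapes.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
shape_a = np.exp(-((t - 0.3) / 0.05) ** 2)       # narrow early peak
shape_b = np.exp(-((t - 0.6) / 0.15) ** 2)       # wide late peak
beats = np.array(
    [znorm(shape_a + 0.05 * rng.standard_normal(t.size)) for _ in range(5)]
    + [znorm(shape_b + 0.05 * rng.standard_normal(t.size)) for _ in range(5)]
)

D = pairwise_euclidean(beats)
points_2d = classical_mds(D)   # one (x, y) row per beat, ready for a scatter plot
```

Plotting the rows of points_2d as a scatter plot should show the two made-up shapes as two separate groups, which is the kind of picture a domain expert could label.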

Slide8

Motivation

However, a segmentation of the time series is rarely available, because annotation is usually expensive (when it is possible at all).

Slide9

Motivation

If we simply slide a window across the time series, the resulting scatter plot is not interpretable, because being forced to “explain” all subsequences is condemned to be meaningless [a].

[a] J. Lin, E. Keogh and W. Truppel, “Clustering of time-series subsequences is meaningless: implications for previous and future research,” in Knowledge and Information Systems, 2005.

Slide10

Motivation

This is a chicken-and-egg paradox: we only want to explain the subsequences that are explainable.

Slide11

Problem statement

Given a time series T and a desired subsequence length m, how do we select a set of subsequences of length m from T so that the resulting low-dimensional projection is meaningful?

[Figure: T, time series; m, subsequence length]

Slide12

Problem statement

Given a time series T and a desired subsequence length m, how do we select a set of subsequences of length m from T so that the resulting low-dimensional projection is meaningful?

We want to find subsequences that produce a meaningful low-dimensional projection.

Slide13

Outline

Motivation
Proposed method
Experiment result
Conclusion

Slide14

Minimum description length principle

Minimum Description Length (MDL) principle: the best hypothesis for a given set of data is the one that leads to the best compression of the data [a].

Given the set of all possible subsequences of a time series, how do we pick a set of hypotheses that optimally compresses it?

[a] https://en.wikipedia.org/wiki/Minimum_description_length

Slide15

Toy example in text

Given a string with the relevant substrings' locations:

a fat cat plays hide and seek in fog with dog

The two rhyming pairs form two clusters in the scatter plot (projected with Hamming distance and MDS).
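The distance behind this toy projection can be sketched in a few lines. The code below computes the pairwise Hamming distances between the four substrings; the function and variable names are illustrative, and the MDS step itself is left out.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))

words = ["fat", "cat", "fog", "dog"]

# Full pairwise distance matrix; this is what an MDS routine would take as input.
dist = [[hamming(a, b) for b in words] for a in words]

for word, row in zip(words, dist):
    print(word, row)
```

Each rhyming pair (fat/cat, fog/dog) is at distance 1, while every distance across the pairs is 2 or 3, which is why the pairs separate into two clusters.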

[Scatter plot labels: fat, cat, fog, dog]

Slide16

Toy example in text

Given a string without the relevant substrings' locations:

afatcatplayshideandseekinfogwithdog

To make this string more like a “time series”, the spaces are removed.

Slide17

Toy example in text

Given a string without the relevant substrings' locations:

afatcatplayshideandseekinfogwithdog

If each character requires 8 bits to store, storing the 35-character string takes 280 bits in total.

Slide18

Toy example in text

Given a string without the relevant substrings' locations, take two hypotheses: fog and fat.

a __ __ playshideandseekin __ with __

With these two hypotheses, the string can be stored in 206 bits (280 bits without compression).

If the hypothesis substrings {fog, fat} and the compressed substrings {dog, cat} are projected to 2D with MDS, the two clusters of rhyming pairs are recovered.

[Scatter plot labels: fat, cat, fog, dog]

Slide19

Brute force solution

If the time series' length is n and the desired subsequence length is m, the set of all possible subsequences contains n − m + 1 subsequences.

If we know in advance how many hypotheses there are in the time series, the time complexity of brute-force search is [expression missing].

However, the number of hypotheses is unknown in most cases.

The true time complexity is therefore even higher, and is intractable for most real time series.
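To give a feel for the size of this search space, the sketch below counts the candidate subsequences of a length-n time series for subsequence length m, and the number of ways to pick a fixed number of them as hypotheses. It assumes brute force must examine every size-k subset of the n − m + 1 subsequences; the symbol k, the function names, and the example values are illustrative, and the count ignores the cost of evaluating each subset.

```python
from math import comb

def num_subsequences(n, m):
    """Number of length-m subsequences in a time series of length n."""
    return n - m + 1

def num_candidate_sets(n, m, k):
    """Number of ways to choose k hypotheses among all subsequences."""
    return comb(num_subsequences(n, m), k)

n, m = 3000, 300   # illustrative values
for k in (1, 2, 3):
    print(k, num_candidate_sets(n, m, k))
```

Even for this short series, two hypotheses already give millions of candidate sets, and the count keeps growing quickly with k.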

 Slide20

Heuristic rule for approximate search

A subsequence with a closer nearest neighbor is more likely to be a good hypothesis.

[Figure: time series from 0 to 3,000 with a nearest-neighbor pair highlighted]

Storing 3,000 floats takes 96,000 bits.

Slide21

Heuristic rule for approximate search

A subsequence with a closer nearest neighbor is more likely to be a good hypothesis.

Noise section: 2,400 floats = 76,800 bits
Pattern: 300 floats = 9,600 bits
Pattern positions: 2 ints = 64 bits
Total = 86,464 bits (was 96,000 bits)
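The arithmetic above can be reproduced with a short sketch. It assumes 32 bits per float and per int (which matches 3,000 floats = 96,000 bits) and that the second occurrence of the pattern is identical to the first, so it costs nothing beyond its position. The function names are illustrative.

```python
BITS_PER_VALUE = 32  # assumed width of one float or one int

def raw_bits(n_values):
    """Bits needed to store the time series without compression."""
    return n_values * BITS_PER_VALUE

def compressed_bits(n_values, pattern_len, n_occurrences):
    """Bits needed when one copy of the pattern stands in for every occurrence."""
    noise_values = n_values - pattern_len * n_occurrences
    noise = noise_values * BITS_PER_VALUE          # everything outside the pattern
    pattern = pattern_len * BITS_PER_VALUE         # the pattern, stored once
    positions = n_occurrences * BITS_PER_VALUE     # one int per occurrence
    return noise + pattern + positions

print(raw_bits(3000))                 # 96000
print(compressed_bits(3000, 300, 2))  # 86464
```

In real data the two occurrences are rarely identical, so an actual description length would also have to pay for the differences between them; this sketch leaves that term out.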

 

[Figure: the same time series from 0 to 3,000, with the two pattern occurrences blanked out]

Slide22

Matrix profile

The matrix profile [a] is a meta time series that annotates T: it compactly stores the nearest-neighbor information of every subsequence.

Time complexity is [expression missing].

[a] http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
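To show what the matrix profile contains, here is a brute-force sketch. It is not the efficient algorithm from the page cited in [a]: it compares every subsequence with every other one, which is only practical for short series. The z-normalized Euclidean distance, the exclusion zone of m // 2, the function names, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D array; constant arrays map to all zeros."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)

def naive_matrix_profile(T, m):
    """Brute-force matrix profile: for every length-m subsequence of T, the
    z-normalized Euclidean distance to its nearest non-trivial neighbor."""
    T = np.asarray(T, dtype=float)
    n_sub = len(T) - m + 1
    subs = np.array([znorm(T[i:i + m]) for i in range(n_sub)])
    excl = m // 2                                 # exclusion zone (assumed size)
    profile = np.full(n_sub, np.inf)
    index = np.full(n_sub, -1)
    for i in range(n_sub):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        lo, hi = max(0, i - excl), min(n_sub, i + excl + 1)
        d[lo:hi] = np.inf                         # ignore trivial matches near i
        index[i] = int(np.argmin(d))
        profile[i] = d[index[i]]
    return profile, index

# Hypothetical data: random noise with the same pattern planted twice.
rng = np.random.default_rng(0)
T = rng.standard_normal(600)
pattern = 5.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, 50))
T[100:150] = pattern
T[400:450] = pattern

P, I = naive_matrix_profile(T, m=50)
motif = int(np.argmin(P))        # location of the lowest point of the profile
```

In this synthetic series the profile drops to zero at the two planted locations, and the index array I records that each one is the other's nearest neighbor.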

[Figure: T, synthetic data (0 to 3,000), and P, its matrix profile, for subsequence length m; the local minima of P are motifs]

By searching only the subsequences around the local minima of the matrix profile, a good hypothesis set can be recovered more efficiently.

Slide23

Outline

Motivation
Proposed method
Experiment result
Conclusion

Slide24

Heartbeat

[Figure: two scatter plots]

Ground truth: Normal Beats; Premature Ventricular Contraction (PVC) Beats

Our method: Normal Beats; PVC Beats Type A; PVC Beats Type B; False Positive

Slide25

Heartbeat

Ground truth: Normal Beats; Premature Ventricular Contraction (PVC) Beats

Our method: Normal Beats; PVC Beats Type A; PVC Beats Type B; False Positive

While A and B are both PVCs, their morphologies (which are related to where in the ventricle they initiate) are different. It appears that Type B is a right bundle branch pattern, coming from the right side of the heart, and Type A is more likely to be the fusion of a normal beat and an aberrant beat. Moreover, there is also evidence of a retrograde P-wave in Type B.

Slide26

Human motions

From this time series, our algorithm selects 11 subsequences.

They form three clusters.

Slide27

Human motions

Bowing

Waving

Crouching

When we check the class label of each subsequence, the three clusters indeed correspond to different classes.

Slide28

Human motions

[Figure: subsequences from each cluster (Bowing, Waving, Crouching), time axis from 0 to 120]

Subsequences from the same cluster have very similar shapes.

Slide29

Nursery rhyme: London bridge falling down

[Figure: scatter plot; points are labeled with the following lyric and piano fragments]

What'll you take to set him fr..
broke my chain, broke my chain
(piano) Gb-Bb-Db F-Ab-Db Gb-Bb-Db
…it up with penny loaves
My fair lady
silver and gold, silver and g…
silver and gold, silver and..
My fair lady
(piano) Gb-Bb-Db F-Ab-Db Gb-Bb-Db
…d it up with penny loaves
What'll you take to set him fr..
…fair lady, buil..
..fair lady, pin..
broke my chain, broke my chain
….. man to watch all night
….. man to watch all night

Slide30

Small extension: from ED-MDS to DTW-MDS

[Figure: two scatter plots, ED-MDS and DTW-MDS; activity labels: walking very slow, normal walking, Nordic walking, running, cycling, rope jumping; one point marked “miss”]
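DTW-MDS replaces the Euclidean distance (ED) with dynamic time warping (DTW) before running MDS. As a minimal sketch of that distance, assuming the textbook dynamic-programming formulation with squared point-wise cost and no warping window (the slides do not specify these details, and the function names and toy sequences are illustrative):

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences (no window constraint)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated squared error aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = step + min(cost[i - 1][j],      # a advances, b repeats
                                    cost[i][j - 1],      # b advances, a repeats
                                    cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The same bump, shifted by one sample.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]
print(euclidean(x, y))      # 2.0
print(dtw_distance(x, y))   # 0.0
```

The shifted bump is far apart under Euclidean distance but identical under DTW, which is the warping invariance at stake. Each DTW call fills a table whose size is the product of the two sequence lengths, so a full pairwise matrix is only affordable for a small set of subsequences.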

Because matrix profile + MDL selects only a small set of subsequences, applying MDS with DTW becomes computationally feasible (some datasets require DTW for warping invariance).

Slide31

Outline

Motivation
Proposed method
Experiment result
Conclusion

Slide32

Conclusion

Projecting subsequences into 2D space is a good way to explore time series data.

We generally should not attempt to explain all the data, but rather consider only the salient subsequences.

Matrix profile + MDL can be used as a heuristic for selecting salient subsequences for visualization.

Limitation: only repeated subsequences are selected; sometimes the more interesting subsequence is the unique one (an anomaly).