Matrix Profile III: The Matrix Profile allows Visualization of Salient Subsequences in Massive Time Series

Uploaded On 2023-08-23



Presentation Transcript

1. Matrix Profile III: The Matrix Profile allows Visualization of Salient Subsequences in Massive Time Series. Chin-Chia Michael Yeh, Helga Van Herle, Eamonn Keogh. http://www.cs.ucr.edu/~eamonn/MatrixProfile.html

2. Outline: Motivation; Proposed method; Experiment result; Conclusion

3. Outline: Motivation; Proposed method; Experiment result; Conclusion

4. Motivation: You have a heartbeat time series.

5. Motivation: You know where the heartbeats are.

6. Motivation: You can easily visualize the heartbeats by mapping them into 2D with an algorithm such as multidimensional scaling (MDS). If the scatter plot and the corresponding subsequences are shown to a domain expert, the correct labels can be easily recovered.

7. Motivation: [Figure labels: Normal beat, Abnormal beat.] Normal beats form one cluster, while abnormal beats form two clusters.

8. Motivation: However, a segmentation of the time series is rarely available, because annotation is usually expensive (when it is possible at all).

9. Motivation: If we simply slide a window across the time series, the resulting scatter plot is not interpretable, because any attempt to “explain” all subsequences is condemned to be meaningless [a]. [a] J. Lin, E. Keogh and W. Truppel, “Clustering of time-series subsequences is meaningless: implications for previous and future research,” in Knowledge and Information Systems, 2005.

10. Motivation: This is a chicken-and-egg paradox: we only want to explain the subsequences that are explainable.

11. Problem statement: Given a time series T and a desired subsequence length m, how do we select a set of subsequences of length m from T so that the resulting low-dimensional projection is meaningful? [Figure: T, time series; m, subsequence length.]

12. Problem statement: We want to find the subsequences that produce a meaningful low-dimensional projection.

13. Outline: Motivation; Proposed method; Experiment result; Conclusion

14. Minimum description length principle: The minimum description length (MDL) principle states that the best hypothesis for a given set of data is the one that leads to the best compression of the data [a]. Given the set of all possible subsequences of a time series, how do we pick a set of hypotheses that optimally compresses it? [a] https://en.wikipedia.org/wiki/Minimum_description_length
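For reference, the principle is often written in a two-part form. The symbols below (a hypothesis class, a description-length function DL, and data D) are notation chosen here for illustration and do not appear on the slides:

$$
H^{*} = \arg\min_{H \in \mathcal{H}} \big[ DL(H) + DL(D \mid H) \big]
$$

Here DL(H) is the number of bits needed to describe the hypothesis, and DL(D | H) is the number of bits needed to describe the data once the hypothesis is known.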

15. Toy example in text: Given a string with the locations of the relevant substrings: "a fat cat plays hide and seek in fog with dog". The two rhyming pairs (fat/cat and fog/dog) form two clusters in the scatter plot (projected with Hamming distance and MDS).
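The projection described on this slide can be sketched in a few lines. This is a minimal illustration, assuming classical (Torgerson) MDS as the MDS variant, which the slide does not specify; the function names `hamming` and `classical_mds` are hypothetical helpers introduced here.

```python
import numpy as np

words = ["fat", "cat", "fog", "dog"]

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

# Pairwise Hamming distance matrix between the four words.
D = np.array([[hamming(a, b) for b in words] for a in words], dtype=float)

def classical_mds(D, dims=2):
    """Classical MDS: embed points so that Euclidean distances
    approximate the entries of the distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]     # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

coords = classical_mds(D)
for w, (x, y) in zip(words, coords):
    print(f"{w}: ({x:+.2f}, {y:+.2f})")
```

In the resulting 2D coordinates, fat lands closer to cat than to fog, and fog lands closer to dog than cat does, which matches the two rhyming clusters on the slide.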

16. Toy example in text: Given a string without the locations of the relevant substrings: afatcatplayshideandseekinfogwithdog. To make the string more like a time series, the spaces are removed.

17. Toy example in text: If each character requires 8 bits to store, storing the 35-character string afatcatplayshideandseekinfogwithdog takes 280 bits in total.
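The 280-bit figure can be checked directly. This small sketch only assumes the 8 bits per character stated on the slide:

```python
s = "afatcatplayshideandseekinfogwithdog"
BITS_PER_CHAR = 8  # as stated on the slide

total_bits = len(s) * BITS_PER_CHAR
print(len(s), total_bits)  # 35 280
```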

18. Toy example in text: Given a string without the locations of the relevant substrings, take the hypotheses {fog} and {fat}: a____playshideandseekin__with__. With these two hypotheses, the string can be stored in 206 bits (280 bits without compression). If the hypothesis substrings {fog, fat} and the compressed substrings {dog, cat} are projected to 2D with MDS, the two clusters of rhyming pairs (fat/cat and fog/dog) are recovered.

19. Brute force solution: If the time series has length n and the desired subsequence length is m, the set of all possible subsequences contains n − m + 1 subsequences. If we know in advance that there are k hypotheses in the time series, the time complexity of brute-force search is [expression missing in the transcript]. However, k is unknown in most cases. The true time complexity is even higher and intractable for most real time series.
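To give a sense of the size of the candidate set, the sketch below extracts every subsequence with a sliding window. The sizes n = 3000 and m = 300 are borrowed from the synthetic example later in the talk, k = 2 is a hypothetical choice, and the random data is purely illustrative. The count of k-element subsets is shown only for scale; it is not an expression taken from the slides.

```python
import math
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

n, m, k = 3000, 300, 2                     # illustrative sizes (k is hypothetical)
T = np.random.default_rng(0).normal(size=n)

# One row per subsequence: there are n - m + 1 of them.
subsequences = sliding_window_view(T, m)
print(subsequences.shape)                  # (2701, 300)

# Number of ways to pick k candidates out of the n - m + 1 subsequences.
n_subsets = math.comb(n - m + 1, k)
print(n_subsets)                           # 3646350
```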

20. Heuristic rule for approximate search: A subsequence with a closer nearest neighbor is more likely to be a good hypothesis. [Figure: a time series with the axis running from 0 to 3,000 and a neighbor pair marked.] 3,000 floats take 96,000 bits to store.

21. Heuristic rule for approximate search: A subsequence with a closer nearest neighbor is more likely to be a good hypothesis. Storage after compression: noise section, 2,400 floats = 76,800 bits; pattern, 300 floats = 9,600 bits; pattern positions, 2 ints = 64 bits. Total = 86,464 bits (was 96,000). [Figure: the same time series, axis from 0 to 3,000, with the two pattern occurrences blanked out.]
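The bit counts on these two slides can be reproduced with simple arithmetic. The sketch assumes 32 bits per float and per int, which is what the slide figures imply (3,000 floats = 96,000 bits) but is not stated explicitly:

```python
BITS_PER_VALUE = 32  # assumption: 32-bit floats and ints, implied by 3,000 floats = 96,000 bits

raw_bits = 3000 * BITS_PER_VALUE        # whole series stored as-is
noise_bits = 2400 * BITS_PER_VALUE      # everything outside the two pattern occurrences
pattern_bits = 300 * BITS_PER_VALUE     # the pattern, stored once
position_bits = 2 * BITS_PER_VALUE      # where the two occurrences start

compressed_bits = noise_bits + pattern_bits + position_bits
print(raw_bits, compressed_bits, raw_bits - compressed_bits)  # 96000 86464 9536
```

Storing the repeated pattern once and pointing to its two positions saves 9,536 bits in this example, which is why a subsequence with a close nearest neighbor is a promising hypothesis.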

22. Matrix profile: The matrix profile [a] is a meta time series that annotates T by compactly storing the nearest-neighbor information of each subsequence. Time complexity is [expression missing in the transcript]. [Figure: T, synthetic data (axis from 0 to 3,000), with subsequence length m; P, matrix profile; the local minima are motifs.] By searching only the subsequences around the local minima of the matrix profile, a good hypothesis set can be recovered more efficiently. [a] http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
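To make the idea concrete, here is a naive quadratic-memory sketch of a matrix profile. It is not the authors' algorithm: it assumes z-normalized Euclidean distance and an exclusion zone of half the subsequence length around each position, and the function name `naive_matrix_profile` and the synthetic data are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def naive_matrix_profile(T, m):
    """Return (P, I): for each length-m subsequence of T, the distance to
    its nearest non-trivial neighbor and that neighbor's start index."""
    subs = sliding_window_view(np.asarray(T, dtype=float), m)
    mu = subs.mean(axis=1, keepdims=True)
    sigma = subs.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0                      # guard against constant windows
    Z = (subs - mu) / sigma                      # z-normalize every subsequence
    sq_norms = (Z ** 2).sum(axis=1)
    sq = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Z @ Z.T)
    D = np.sqrt(np.clip(sq, 0.0, None))          # all pairwise distances
    idx = np.arange(len(Z))
    # Exclusion zone: ignore trivial matches that overlap the query heavily.
    D[np.abs(idx[:, None] - idx[None, :]) < m // 2] = np.inf
    I = D.argmin(axis=1)
    P = D[idx, I]
    return P, I

# Synthetic data: noise with the same pattern planted at positions 100 and 400.
rng = np.random.default_rng(0)
T = rng.normal(size=600)
pattern = 5.0 * np.sin(np.linspace(0.0, 4.0 * np.pi, 50))
T[100:150] = pattern
T[400:450] = pattern

P, I = naive_matrix_profile(T, m=50)
best = int(np.argmin(P))
print(best, int(I[best]), float(P[best]))
```

The lowest point of P falls on one planted occurrence and its nearest-neighbor index points at the other, which is the "local minima are motifs" behavior the slide relies on.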

23. Outline: Motivation; Proposed method; Experiment result; Conclusion

24. Heartbeat: [Figure: scatter plots for the ground truth and for our method. Labels: normal beats, premature ventricular contraction (PVC) beats, PVC beats type A, PVC beats type B, false positive.]

25. Heartbeat: [Figure: same scatter plots as on the previous slide.] While A and B are both PVCs, their morphologies (which are related to where in the ventricle they initiate) are different. It appears that type B is a right bundle branch pattern, coming from the right side of the heart, and type A is more likely to be a fusion of a normal beat and an aberrant beat. Moreover, there is also evidence of a retrograde P-wave in type B.

26. Human motions: From this time series, our algorithm selects 11 subsequences. They form three clusters.

27. Human motions: [Cluster labels: Bowing, Waving, Crouching.] When we check the class label of each subsequence, the clusters do indeed come from different classes.

28. Human motions: [Figure: subsequences from the Bowing, Waving, and Crouching clusters, with axis ticks at 0, 60, and 120.] Subsequences from the same cluster have very similar shapes.

29. Nursery rhyme: London Bridge Is Falling Down. [Figure: scatter plot of the selected subsequences, each labeled with the fragment it covers: "What'll you take to set him fr..", "broke my chain, broke my chain", "(piano) Gb-Bb-Db F-Ab-Db Gb-Bb-Db", "…it up with penny loaves", "My fair lady", "silver and gold, silver and g…", "silver and gold, silver and..", "…fair lady, buil..", "..fair lady, pin..", "….. man to watch all night".]

30. Small extension: from ED-MDS to DTW-MDS. [Figure: ED-MDS and DTW-MDS scatter plots, with one point marked "miss". Activity labels: walking very slow, normal walking, Nordic walking, running, cycling, rope jumping.] Because matrix profile + MDL selects only a small set of subsequences, applying MDS with DTW is computationally feasible (some datasets require DTW for warping invariance).
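The warping invariance mentioned here can be illustrated with a textbook dynamic-programming DTW. This is a minimal sketch, assuming unconstrained warping and an absolute-difference cost; the slides do not say which DTW variant was used, and `dtw_distance` is a hypothetical helper.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference cost
    and no warping-window constraint."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

# The same bump, shifted by one time step.
a = [0, 0, 1, 2, 1, 0, 0]
b = [0, 1, 2, 1, 0, 0, 0]

dtw = dtw_distance(a, b)
euclid = float(np.linalg.norm(np.subtract(a, b)))
print(dtw, euclid)
```

DTW aligns the two shifted bumps at zero cost, while the Euclidean distance between them is 2. Each DTW computation is quadratic in the subsequence length, so a full pairwise DTW matrix for MDS is only practical when the number of selected subsequences is small.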

31. Outline: Motivation; Proposed method; Experiment result; Conclusion

32. Conclusion: Projecting subsequences into 2D space is a good way to explore time series data. We generally should not attempt to explain all the data, but rather consider only the salient subsequences. Matrix profile + MDL can be used as a heuristic for selecting salient subsequences for visualization. Limitation: only repeated subsequences are selected; sometimes the more interesting subsequence is the unique one (an anomaly).