Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins

Uploaded On 2023-06-22


Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk, Eamonn Keogh
http://www.cs.ucr.edu/~eamonn/MatrixProfile.html




Presentation Transcript

1. Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins
Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk, Eamonn Keogh
http://www.cs.ucr.edu/~eamonn/MatrixProfile.html
Or, how to do four hundred ninety-nine quadrillion, nine hundred ninety-nine trillion, nine hundred ninety-nine billion, five hundred million pairwise comparisons very fast.

2. Definition Review: Distance Profile
Take a seismology time series $T = t_1, t_2, \ldots, t_n$ with two repeated earthquake patterns, and use its 1st subsequence (window) of length $m$ as the query. Computing the z-normalized Euclidean distance between the query and each window of the time series gives a vector
$D_1 = [d_{1,1}, d_{2,1}, \ldots, d_{n-m+1,1}]$,
where $d_{i,j}$ is the distance between the $i$th subsequence and the $j$th subsequence. We can obtain $D_2, D_3, \ldots, D_{n-m+1}$ similarly.
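To make the definition concrete, here is a minimal pure-Python sketch of a distance profile that computes each z-normalized Euclidean distance directly, in O(nm) time per profile. The function names, the 0-based indexing, the toy series and the use of the population standard deviation are our own choices rather than anything from the slides, and constant windows (zero standard deviation) are assumed not to occur.

```python
import math


def znorm(x):
    """Z-normalize a window: subtract its mean and divide by its (population) std.
    Assumes the window is not constant (std > 0)."""
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]


def znorm_dist(a, b):
    """Euclidean distance between the z-normalized versions of windows a and b."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(znorm(a), znorm(b))))


def distance_profile(T, m, j):
    """D_j: distance from the j-th window (the query) to every window of T.
    Python indices are 0-based, so windows start at 0 .. len(T) - m."""
    query = T[j:j + m]
    return [znorm_dist(T[i:i + m], query) for i in range(len(T) - m + 1)]


# A toy series in which the window [0, 1, 3, 2] occurs twice (at 0 and at 4).
T = [0.0, 1.0, 3.0, 2.0, 0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
D0 = distance_profile(T, 4, 0)
print([round(d, 3) for d in D0])
```

The query matches itself and its exact repeat with distance 0, and no z-normalized distance can exceed 2√m.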

3. Definition Review: From Distance Profile to Matrix Profile
Stacking the distance profiles gives the distance matrix $[d_{i,j}]$, $1 \le i, j \le n-m+1$, where $d_{i,j}$ is the distance between the $i$th window and the $j$th window of the time series. Note: this distance matrix is symmetric ($d_{i,j} = d_{j,i}$)!
Matrix Profile: a vector of the distance between each subsequence and its nearest neighbor, $P = [P_1, P_2, \ldots, P_{n-m+1}]$ with $P_i = \min(D_i)$, where trivial matches near the diagonal (such as $d_{i,i} = 0$) are excluded from the minimum.
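A brute-force sketch of the matrix profile builds every distance profile and keeps its minimum. It assumes an exclusion zone of ceil(m/4) windows around the diagonal to skip trivial matches; that value is a common convention, not something stated on the slide. Alongside P it records I, the index of each nearest neighbor (a name we introduce here).

```python
import math


def znorm(x):
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]


def znorm_dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(znorm(a), znorm(b))))


def matrix_profile_brute_force(T, m):
    """P[i] = min over non-trivial j of d(i, j); I[i] = the minimizing j.
    Evaluates the whole distance matrix explicitly: O(n^2 m) time."""
    n_sub = len(T) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone around the diagonal
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue  # trivial match: the window overlaps itself too much
            d = znorm_dist(T[i:i + m], T[j:j + m])
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I


T = [0.0, 1.0, 3.0, 2.0, 0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
P, I = matrix_profile_brute_force(T, 4)
print([round(p, 3) for p in P], I)
```

Without the exclusion zone every $P_i$ would trivially be $d_{i,i} = 0$.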

4. From Matrix Profile to Motif
The matrix profile has two minimum points of equal value. This pair of minimum points corresponds to the 1st motif in the time series (the closest pair of subsequences in the time series).
[Figure: a time series and its matrix profile, with the pair of minimum points marked]
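Reading the 1st motif off the matrix profile amounts to an argmin. In the sketch below the planted sine pattern, the random seed, the brute-force helper and the exclusion zone of ceil(m/4) are illustrative assumptions. Because the two planted copies are identical, their distance is exactly zero and both windows share the minimum value, which is the pair of minimum points described above.

```python
import math
import random


def znorm(x):
    mu = sum(x) / len(x)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sigma for v in x]


def matrix_profile_brute_force(T, m):
    """Matrix profile P and nearest-neighbor indices I by exhaustive search."""
    n_sub = len(T) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone for trivial matches
    Z = [znorm(T[i:i + m]) for i in range(n_sub)]
    P, I = [math.inf] * n_sub, [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) > excl:
                d = math.dist(Z[i], Z[j])
                if d < P[i]:
                    P[i], I[i] = d, j
    return P, I


def top_motif(P, I):
    """1st motif: the window with the smallest matrix-profile value and its
    nearest neighbor; both members of the pair share that minimum value."""
    a = min(range(len(P)), key=P.__getitem__)
    return a, I[a], P[a]


random.seed(0)
T = [random.gauss(0, 1) for _ in range(120)]
pattern = [math.sin(k / 2) for k in range(12)]  # hypothetical repeated pattern
T[10:22] = pattern  # plant two identical copies
T[80:92] = pattern
P, I = matrix_profile_brute_force(T, 12)
motif = top_motif(P, I)
print(motif)  # (10, 80, 0.0)
```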

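5. Question: How do we compute the matrix profile very fast?
Answer: We have an $O(n^2)$ time, $O(n)$ space algorithm called STOMP. To see how it works, let us first introduce an important formula. Let $QT_{i,j} = \sum_{k=0}^{m-1} t_{i+k}\, t_{j+k}$ be the dot product of the $i$th window and the $j$th window, and let $\mu_i$ and $\sigma_i$ be the mean and standard deviation of the $i$th window. Then
$d_{i,j} = \sqrt{2m\left(1 - \frac{QT_{i,j} - m\,\mu_i\,\mu_j}{m\,\sigma_i\,\sigma_j}\right)}$.
Once we know $QT_{i,j}$, it takes $O(1)$ time to compute $d_{i,j}$. We precompute and store the means and standard deviations in $O(n)$ space.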
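The sketch below checks the formula numerically. Distances computed in O(1) from $QT_{i,j}$ and the precomputed means and standard deviations should agree with distances computed by explicitly z-normalizing both windows. The clamp to zero, which guards against tiny negative values caused by floating-point rounding, is our addition, as are the helper names and the 0-based indices.

```python
import math


def window_stats(T, m):
    """Mean and population std of every length-m window, stored in O(n) space.
    Computed directly here for clarity rather than with running sums."""
    mu, sd = [], []
    for i in range(len(T) - m + 1):
        w = T[i:i + m]
        u = sum(w) / m
        mu.append(u)
        sd.append(math.sqrt(sum((v - u) ** 2 for v in w) / m))
    return mu, sd


def dist_from_qt(qt, m, mu_i, sd_i, mu_j, sd_j):
    """O(1) z-normalized Euclidean distance from the dot product QT_{i,j}."""
    corr = (qt - m * mu_i * mu_j) / (m * sd_i * sd_j)
    return math.sqrt(max(0.0, 2 * m * (1 - corr)))  # clamp rounding below 0


def znorm_dist(a, b):
    """Reference distance computed by explicitly z-normalizing both windows."""
    def z(x):
        u = sum(x) / len(x)
        s = math.sqrt(sum((v - u) ** 2 for v in x) / len(x))
        return [(v - u) / s for v in x]
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(z(a), z(b))))


T = [0.0, 1.0, 3.0, 2.0, 0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
m = 4
mu, sd = window_stats(T, m)
n_sub = len(T) - m + 1
max_err = 0.0
for i in range(n_sub):
    for j in range(n_sub):
        qt = sum(T[i + k] * T[j + k] for k in range(m))  # QT_{i,j}
        fast = dist_from_qt(qt, m, mu[i], sd[i], mu[j], sd[j])
        slow = znorm_dist(T[i:i + m], T[j:j + m])
        max_err = max(max_err, abs(fast - slow))
print(max_err)
```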

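6. The relationship between $QT_{i,j}$ and $QT_{i-1,j-1}$
$QT_{i,j} = QT_{i-1,j-1} - t_{i-1}\, t_{j-1} + t_{i+m-1}\, t_{j+m-1}$
Each $QT_{i,j}$ can therefore be obtained from $QT_{i-1,j-1}$ in $O(1)$ time complexity!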
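The recurrence can be verified exhaustively on a small example. This sketch uses 0-based indices and hypothetical helper names. Integer samples keep the arithmetic exact, so every $QT_{i,j}$ obtained from $QT_{i-1,j-1}$ must equal the directly computed dot product.

```python
import random


def dot(T, i, j, m):
    """QT_{i,j}: dot product of the windows starting at i and j (0-based)."""
    return sum(T[i + k] * T[j + k] for k in range(m))


def next_qt(qt_prev, T, m, i, j):
    """QT_{i,j} from QT_{i-1,j-1} in O(1): drop the product of the two samples
    that leave the windows, add the product of the two that enter."""
    return qt_prev - T[i - 1] * T[j - 1] + T[i + m - 1] * T[j + m - 1]


random.seed(1)
T = [random.randint(-5, 5) for _ in range(40)]  # integers keep the check exact
m = 8
n_sub = len(T) - m + 1
mismatches = 0
for i in range(1, n_sub):
    for j in range(1, n_sub):
        if next_qt(dot(T, i - 1, j - 1, m), T, m, i, j) != dot(T, i, j, m):
            mismatches += 1
print(mismatches)  # 0
```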

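7. STOMP Algorithm: Computing the $i$th line
We pre-calculate $QT_{x,1}$ and $QT_{1,x}$ ($x = 1, 2, 3, \ldots, n-m+1$); the first line of the distance matrix follows directly from $QT_{1,x}$. Then we iterate through $i = 2, 3, \ldots, n-m+1$: in the $i$th line, $QT_{i,1}$ is one of the pre-calculated values and each $QT_{i,j}$ with $j \ge 2$ is derived from $QT_{i-1,j-1}$ of the previous line. These give the distance profile $D_i = [d_{i,1}, d_{i,2}, \ldots, d_{i,n-m+1}]$, and each entry of the matrix profile $[P_1, P_2, \ldots, P_{n-m+1}]$ is updated if smaller: $P_j \leftarrow \min(P_j, d_{i,j})$. After the last line, each $P_j$ is the minimum of the $j$th column of the distance matrix.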
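Putting the pieces together, a compact CPU sketch of STOMP might look as follows. It keeps only one line of QT values in memory and overwrites it from right to left, so that $QT_{i-1,j-1}$ is still available when $QT_{i,j}$ is computed. The exclusion zone of ceil(m/4), the planted test pattern and all names are assumptions, and this is not the authors' released code.

```python
import math
import random


def stomp(T, m):
    """Matrix profile P and nearest-neighbor indices I, one distance-matrix
    line at a time: O(n^2) time and O(n) extra space."""
    n_sub = len(T) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone for trivial matches
    mu, sd = [], []  # means and stds of all windows, precomputed once
    for i in range(n_sub):
        w = T[i:i + m]
        u = sum(w) / m
        mu.append(u)
        sd.append(math.sqrt(sum((v - u) ** 2 for v in w) / m))

    def dot(i, j):
        return sum(T[i + k] * T[j + k] for k in range(m))

    first_col = [dot(x, 0) for x in range(n_sub)]  # QT_{x,1} on the slide
    qt = [dot(0, x) for x in range(n_sub)]         # QT_{1,x}: the first line
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    for i in range(n_sub):
        if i > 0:
            # Right to left, so qt[j - 1] still holds line i - 1 when it is read.
            for j in range(n_sub - 1, 0, -1):
                qt[j] = qt[j - 1] - T[i - 1] * T[j - 1] + T[i + m - 1] * T[j + m - 1]
            qt[0] = first_col[i]
        for j in range(n_sub):  # distance profile of line i
            if abs(i - j) <= excl:
                continue
            corr = (qt[j] - m * mu[i] * mu[j]) / (m * sd[i] * sd[j])
            d = math.sqrt(max(0.0, 2 * m * (1 - corr)))
            if d < P[j]:  # update if smaller
                P[j], I[j] = d, i
    return P, I


random.seed(0)
T = [random.gauss(0, 1) for _ in range(300)]
pattern = [math.sin(k / 2) for k in range(16)]  # hypothetical repeated pattern
T[40:56] = pattern
T[200:216] = pattern
P, I = stomp(T, 16)
a = min(range(len(P)), key=P.__getitem__)
print(a, I[a], P[a])  # the planted pair, at a distance close to 0
```

Only the current QT line, the window statistics, P and I are stored, which is where the linear space bound comes from.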

8. Porting the algorithm to GPU
Optimize: because the distance matrix is symmetric, the $i$th line only needs to be evaluated from the diagonal onward, i.e. $QT_{i,i}, QT_{i,i+1}, \ldots, QT_{i,n-m+1}$ (each derived from $QT_{i-1,j-1}$) and the corresponding distances $d_{i,i}, d_{i,i+1}, \ldots, d_{i,n-m+1}$.
First kernel launch: update $P_i$ to $P_{n-m+1}$, replacing each $P_j$ by $d_{i,j}$ if smaller.
Second kernel launch: evaluate the final value of $P_i$ as the minimum $d_{min}$ of $d_{i,i+1}, d_{i,i+2}, \ldots, d_{i,n-m+1}$, updating $P_i$ if smaller.
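The two-launch scheme can be emulated sequentially, with each "kernel" written as an ordinary loop. This is only a CPU illustration of the idea under our own assumptions (exclusion zone ceil(m/4), hypothetical names), not the authors' CUDA implementation. It compares its output against a brute-force matrix profile to show that evaluating only the upper triangle loses nothing.

```python
import math
import random


def stomp_upper_triangle(T, m):
    """Emulates the two-launch GPU scheme on the CPU. Line i only evaluates
    columns j > i; the lower triangle is covered by symmetry (d_{j,i} = d_{i,j})."""
    n_sub = len(T) - m + 1
    excl = math.ceil(m / 4)  # assumed exclusion zone for trivial matches
    mu, sd = [], []
    for i in range(n_sub):
        w = T[i:i + m]
        u = sum(w) / m
        mu.append(u)
        sd.append(math.sqrt(sum((v - u) ** 2 for v in w) / m))

    def dist(qt, i, j):
        corr = (qt - m * mu[i] * mu[j]) / (m * sd[i] * sd[j])
        return math.sqrt(max(0.0, 2 * m * (1 - corr)))

    # Only the first line is precomputed: QT_{i,j} with j >= i needs QT_{i-1,j-1},
    # which also lies on or above the diagonal.
    qt = [sum(T[k] * T[x + k] for k in range(m)) for x in range(n_sub)]
    P = [math.inf] * n_sub
    I = [-1] * n_sub
    for i in range(n_sub):
        if i > 0:
            # Right to left, stopping at the diagonal: entries j < i are never read again.
            for j in range(n_sub - 1, i - 1, -1):
                qt[j] = qt[j - 1] - T[i - 1] * T[j - 1] + T[i + m - 1] * T[j + m - 1]
        row = [(dist(qt[j], i, j), j) for j in range(i + excl + 1, n_sub)]
        # "First kernel": one independent update per column j of line i.
        for d, j in row:
            if d < P[j]:
                P[j], I[j] = d, i
        # "Second kernel": min-reduction over line i gives the final value of P_i.
        if row:
            d, j = min(row)
            if d < P[i]:
                P[i], I[i] = d, j
    return P, I


def brute_force_profile(T, m):
    """Reference matrix profile from explicitly z-normalized windows."""
    n_sub = len(T) - m + 1
    excl = math.ceil(m / 4)
    Z = []
    for i in range(n_sub):
        w = T[i:i + m]
        u = sum(w) / m
        s = math.sqrt(sum((v - u) ** 2 for v in w) / m)
        Z.append([(v - u) / s for v in w])
    return [min(math.dist(Z[i], Z[j]) for j in range(n_sub) if abs(i - j) > excl)
            for i in range(n_sub)]


random.seed(2)
T = [random.gauss(0, 1) for _ in range(200)]
P, I = stomp_upper_triangle(T, 16)
P_ref = brute_force_profile(T, 16)
max_gap = max(abs(p - q) for p, q in zip(P, P_ref))
print(max_gap)
```

On a GPU each column update in the first phase belongs to a different $P_j$, so the threads do not write to the same entry; the second phase is a standard min-reduction.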

9. Comparison of STAMP, STOMP and GPU-STOMP
Time for a fixed subsequence length m = 256:
Algorithm    n = 2^17    n = 2^18     n = 2^19     n = 2^20
STAMP        15.1 min    1.17 hours   5.4 hours    24.4 hours
STOMP        4.21 min    0.3 hours    1.26 hours   5.22 hours
GPU-STOMP    10 sec      18 sec       46 sec       2.5 min
Time for large data, reaching n = 100,000,000 for the very first time in the literature:
Algorithm            m = 2,000, n = 17,279,800   m = 400, n = 100,000,000
STAMP (estimated)    36.5 weeks                  25.5 years
STOMP (estimated)    8.4 weeks                   5.4 years
GPU-STOMP            9.27 hours                  12.13 days
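As a back-of-envelope check, the STOMP estimates for the large data agree with scaling the measured n = 2^20 time quadratically in n. The sketch assumes exactly O(n^2) growth and ignores the different values of m. We do not know how the authors derived their estimates, and we apply the rule only to STOMP.

```python
def extrapolate_quadratic(hours_ref, n_ref, n_target):
    """Scale a measured runtime by (n_target / n_ref) ** 2, i.e. assume O(n^2) time."""
    return hours_ref * (n_target / n_ref) ** 2


STOMP_HOURS_2_20 = 5.22  # measured STOMP time at n = 2**20, m = 256 (slide 9)

weeks_17m = extrapolate_quadratic(STOMP_HOURS_2_20, 2**20, 17_279_800) / (24 * 7)
years_100m = extrapolate_quadratic(STOMP_HOURS_2_20, 2**20, 100_000_000) / (24 * 365)
print(f"{weeks_17m:.1f} weeks, {years_100m:.1f} years")  # 8.4 weeks, 5.4 years
```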

10. Comparing the speed of STOMP with existing algorithms
For a time series of length …: CPU time (memory usage)
Algorithm     m = 512         m = 1,024      m = 2,048      m = 4,096
STOMP         501s (14MB)     506s (14MB)    490s (14MB)    490s (14MB)
Quick-Motif   27s (65MB)      151s (90MB)    630s (295MB)   695s (101MB)
MK            2040s (1.1GB)   N/A (>2GB)     N/A (>2GB)     N/A (>2GB)
Note: the time and space cost of STOMP is independent of how the data looks.

11. Case Study I: Parameter Setting
There is only one parameter to set, the subsequence length m, and the result is not sensitive to it.
[Figure: raw seismograph data (0 to 30 min) and its matrix profiles for m = 1000, 2000 and 4000]

12. Case Study II: The Benefit of Using the Matrix Profile for Motif Discovery
[Figure: seismograph recordings (1996, ID: 30104990; 2009, ID: 37132770) with their motifs marked]
The 1st motif is a pair of sensor defects; the 5th motif is a pair of matching seismology patterns.

13. Case Study III: Earthquake Swarms
The matrix profile of a seven-minute snippet from a seismograph recording at Mount St. Helens.
“... so regularly that we dubbed them ‘drumbeats’. The period between successive drumbeats shifted slowly with time, but was 30–300 seconds”
We are not only providing a linear-space algorithm that is much faster than all existing motif-discovery algorithms; with STOMP we are also providing much more information than just the top-k motifs in the time series.

14. Case Study IV: Penguin Telemetry
[Figure: Y-axis magnetometry, samples 514,000 to 524,000]
7.5 hours of recording at 40 Hz took GPU-STOMP only 2.5 minutes to run.
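A quick size check relates this run to the first table of slide 9: 7.5 hours at 40 Hz is about 1.08 million samples, close to 2^20, the largest length in that table, where GPU-STOMP also needed 2.5 minutes.

```python
hours, rate_hz = 7.5, 40  # recording length and sampling rate from the slide
n = round(hours * 3600 * rate_hz)  # number of samples
print(n, round(n / 2**20, 2))  # 1080000 1.03
```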

15. Summary
We introduced STOMP and GPU-STOMP, the first algorithms capable of discovering motifs in the longest time series in the literature, of length 100,000,000.
The algorithm costs only linear space, and its speed is independent of what the data looks like.
The matrix profile provides the nearest neighbor of every subsequence in the time series.
STOMP can discover much more than just motifs.
Paper, code and datasets available at: http://www.cs.ucr.edu/~eamonn/MatrixProfile.html

16. Questions?