Finding Time Series Discords Based on Haar Transform (A. W.-c. Fu et al.)


Uploaded On 2015-06-01


[...] cuhk.edu.hk. Department of Computer Science and Engineering, University of California, Riverside, CA 92521, eamonn@cs.ucr.edu. Department of Information and Software Engineering, George Mason University, jessica@ise.gmu.edu.




Presentation Transcript

Abstract. The problem of finding anomalies has received much attention recently. However, most anomaly detection algorithms depend on an explicit definition of anomaly, which may be impossible to elicit from a domain expert. Using discords as anomaly detectors is useful since less parameter setting is required. Keogh et al. proposed an efficient [...] highly effective.

1 Introduction

In many applications, time series has been found to be a natural and useful form of data representation. Some of the important applications include financial data that is changing over time, electrocardiograms (ECG) and other medical records, [...] series data are accumulated over time, it is of interest to uncover interesting patterns on top of the large data sets. The target of such data mining is often the common features that occur frequently. However, looking for the unusual pattern is found to be useful in some cases. For example, an unusual pattern in an ECG [...] some critical changes in the environments.

Algorithms for finding the most unusual time series subsequences are proposed by Keogh et al. in [6]. Such a subsequence is also called a time series discord, which is essentially a subsequence that is the least similar to all other subsequences.

[...] largest distance to its nearest non-self match. That is, for all subsequences $C$ of $T$, non-self match $M_D$ of $D$, and non-self match $M_C$ of $C$, the minimum Euclidean distance of $D$ to $M_D$ is greater than the minimum Euclidean distance of $C$ to $M_C$.

The problem of finding discords can obviously be solved by a brute force algorithm which considers all the possible subsequences and, for each one, finds the distance to its nearest non-self match. The subsequence which has the greatest such value is the discord. However, the time complexity of this algorithm is $O(m^2)$, where $m$ is the length of the time series. Obviously, this algorithm is not suitable for large datasets.
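The brute force search described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that a non-self match of the window starting at p is any window starting at q with |p - q| >= n, a condition that does not appear in this excerpt.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(ts, n):
    """Return (start, distance) of the length-n discord of ts.

    Assumption: windows starting at p and q are non-self matches
    when |p - q| >= n, i.e. when they do not overlap.
    """
    starts = range(len(ts) - n + 1)
    best_loc, best_dist = None, -1.0
    for p in starts:                      # every candidate subsequence
        nearest = math.inf
        for q in starts:                  # every possible match
            if abs(p - q) >= n:
                d = euclidean(ts[p:p + n], ts[q:q + n])
                nearest = min(nearest, d)
        # Keep the candidate whose nearest non-self match is farthest away.
        if nearest != math.inf and nearest > best_dist:
            best_loc, best_dist = p, nearest
    return best_loc, best_dist
```

Both loops range over all windows, which is where the quadratic cost in the length of the series comes from.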
Keogh et al. introduced a heuristic discord discovery algorithm based on the brute force algorithm and some observations [5]. They found that we do not actually need to find the nearest non-self match for each possible candidate subsequence. According to the definition of time series discord, a candidate cannot be a discord if we can find any subsequence that is closer to the current candidate than the largest nearest non-self match distance found so far. This basic idea prunes away many unnecessary searches and saves a large amount of computation time.

2.2 Haar Transform

The Haar wavelet transform is widely used in different applications such as computer graphics, image and signal processing, and time series querying [7]. We propose to apply this technique to approximate the time series discord, as the resulting wavelet can represent the general shape of a time sequence. The Haar transform can be seen as a series of averaging and differencing operations on a discrete time function. We compute the average and difference between every two adjacent values of $f(x)$. The procedure to find the Haar transform of a discrete function $f(x) = (9\ 7\ 3\ 5)$ is shown below.

Example

Resolution   Averages    Coefficients
4            (9 7 3 5)
2            (8 4)       (1 -1)
1            (6)         (2)

Resolution 4 is the full resolution of the discrete function $f(x)$. In resolution 2, (8 4) are obtained by taking the averages of (9 7) and (3 5) at resolution 4, respectively. (1 -1) are the differences of (9 7) and (3 5) divided by two, respectively. This process is continued until a resolution of 1 is reached. The Haar transform $H(f(x)) = (c\ d^0_0\ d^1_0\ d^1_1) = (6\ 2\ 1\ {-1})$ is obtained, which is composed of the last average value 6 and the coefficients found in the rightmost column, 2, 1 and -1. It should be pointed out that $c$ is the overall average value of the whole time sequence, which is equal to $(9 + 7 + 3 + 5)/4 = 6$. Different resolutions can be obtained by adding difference values to, or subtracting them from, an average. For instance, $(8\ 4) = (6 + 2\ \ 6 - 2)$, where 6 and 2 are the first and second coefficients, respectively.

The Haar transform can be realized by a series of matrix multiplications as illustrated in Equation (1). Envisioning the example input signal $x$ as a column [...]
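The averaging and differencing procedure of Section 2.2 can be sketched in a few lines. This is an illustrative sketch with hypothetical function names; it follows the unnormalized convention of the worked example (pairwise average, and pairwise difference divided by two) rather than the orthonormal Haar scaling, and it assumes the input length is a power of two.

```python
def haar_transform(values):
    """Haar transform by repeated pairwise averaging and differencing.

    Returns [overall average, coarsest coefficient, ..., finest coefficients].
    Assumption: len(values) is a power of two.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(values)
    coefficients = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        # Coefficients of coarser resolutions go in front of finer ones.
        coefficients = details + coefficients
    return averages + coefficients


def inverse_haar(coeffs):
    """Rebuild the signal by adding and subtracting differences from averages."""
    averages = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(averages)]
        averages = [v for a, d in zip(averages, details) for v in (a + d, a - d)]
        pos += len(details)
    return averages


print(haar_transform([9, 7, 3, 5]))   # [6.0, 2.0, 1.0, -1.0]
```

The inverse mirrors the remark in the text that (8 4) is recovered as (6 + 2, 6 - 2); applying it once more with the coefficients 1 and -1 gives back (9 7 3 5).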
[...] candidates at Line 12. Given the subsequence $p$, the Inner heuristic order should pick the subsequence $q$ closest to $p$ first, since it will give the smallest Dist value and will therefore have the best chance of breaking the loop at Line 12. In this section, we discuss our suggested heuristic search order, so that the inner loop can often be broken in the first few iterations, saving a lot of running time.

3.1 Discretization

We shall impose the heuristic Outer and Inner orders based on the Haar transformation of subsequences. We first transform all of the incoming sequences by the Haar wavelet transform. In order to reduce the complexity of time series comparison, we further transform each of the transformed sequences into a sequence (word) of finite symbols. The alphabet mapping is decided by discretizing the value range for each Haar wavelet coefficient. We assume that for all $i$, the $i$-th coefficient of all Haar wavelets in the same database tends to be evenly distributed between its minimum and maximum value, so we can determine the "cutpoints" by partitioning this range into several equal segments. The cutpoints define the discretization of the $i$-th coefficient.

Definition 3. Cutpoints: For the $i$-th coefficient, cutpoints are a sorted list of numbers $B_i = \beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,m}$, where $m$ is the number of symbols in the alphabet, and

$\beta_{i,j+1} - \beta_{i,j} = \frac{\beta_{i,a} - \beta_{i,0}}{a}$   (2)

$\beta_{i,0}$ and $\beta_{i,a}$ are defined as the smallest and the largest possible value of the $i$-th coefficient, respectively.

We can then make use of the cutpoints to map all Haar coefficients into different symbols. For example, if the $i$-th coefficient from a Haar wavelet is between $\beta_{i,0}$ and $\beta_{i,1}$, it is mapped to the first symbol a. If the $i$-th coefficient is between $\beta_{i,j-1}$ and $\beta_{i,j}$, it is mapped to the $j$-th symbol, and so on. In this way we form a word for each subsequence.

Definition 4. Word mapping: A word is a string over the alphabet. A subsequence $C$ of length $n$ can be mapped to a word $\hat{C} = \hat{c}_1, \hat{c}_2, \ldots, \hat{c}_n$. Suppose that $C$ is transformed to a Haar wavelet $\bar{C} = \{\bar{c}_1, \bar{c}_2, \ldots, \bar{c}_n\}$. Let $\alpha_j$ denote the $j$-th element of the alphabet, e.g., $\alpha_1 = a$ and $\alpha_2 = b$, .... Let $B_i = \beta_{i,1}, \ldots, \beta_{i,m}$ be the cutpoints for the $i$-th coefficient of the Haar transform. Then the mapping from $\bar{C}$ to a word $\hat{C}$ is obtained as follows:

$\hat{c}_i = \alpha_j \iff \beta_{i,j-1} \le \bar{c}_i < \beta_{i,j}$   (3)

3.2 Outer Loop Heuristic

First, we transform all the subsequences, which are extracted by sliding a window of length $n$ across the time series $T$, by means of the Haar transform. The transformed subsequences are then transformed into words by using our proposed discretizing algorithm. Finally, all the words are placed in an array with a pointer referring back to the original sequences. Figure 1 illustrates this idea.
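The discretization of Section 3.1 and the word array of Section 3.2 can be sketched as below. This is a minimal sketch under several assumptions that are not stated in the text: the function names are hypothetical, the alphabet size `a` is taken to equal the number of symbols, the smallest and largest cutpoints are taken from the observed minimum and maximum of each coefficient, a coefficient equal to the maximum is placed in the last symbol, and the alphabet is the lowercase letters. The input is assumed to be a list of already Haar-transformed subsequences of equal length.

```python
from bisect import bisect_right
import string


def cut_points(lo, hi, a):
    """Return [beta_0, ..., beta_a]: a equal-width segments over [lo, hi]."""
    width = (hi - lo) / a
    return [lo + j * width for j in range(a + 1)]


def to_symbol(value, betas):
    """Map value to the j-th letter, where beta_{j-1} <= value < beta_j."""
    a = len(betas) - 1
    # bisect_right counts the cutpoints <= value, which is exactly j.
    # Clamp so that the maximum value falls into the last symbol (assumption).
    j = min(max(bisect_right(betas, value), 1), a)
    return string.ascii_lowercase[j - 1]


def words_for(transformed, a):
    """Return a list of (word, index) pairs, one per transformed subsequence.

    The index plays the role of the pointer back to the original subsequence.
    """
    n = len(transformed[0])
    betas = []
    for i in range(n):
        column = [t[i] for t in transformed]   # i-th coefficient of every wavelet
        betas.append(cut_points(min(column), max(column), a))
    return [
        ("".join(to_symbol(t[i], betas[i]) for i in range(n)), idx)
        for idx, t in enumerate(transformed)
    ]
```

Each coefficient position gets its own list of cutpoints, so the same numeric value can map to different symbols at different positions of the word.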