[...]cuhk.edu.hk
Department of Computer Science and Engineering, University of California, Riverside, CA 92521, eamonn@cs.ucr.edu
Department of Information and Software Engineering, George Mason University, jessica@ise.gmu.edu
Finding Time Series Discords Based on Haar Transform

Abstract. The problem of finding anomalies has received much attention recently. However, most anomaly detection algorithms depend on an explicit definition of anomaly, which may be impossible to elicit from a domain expert. Using discords as anomaly detectors is useful since fewer parameters need to be set. Keogh et al. proposed an efficient [...] highly effective.

1 Introduction

In many applications, time series have been found to be a natural and useful form of data representation. Important applications include financial data that changes over time, electrocardiograms (ECG) and other medical records, [...] series data are accumulated over time, it is of interest to uncover interesting patterns in the large data sets. The target of such data mining is often the common features that occur frequently. However, looking for unusual patterns is also useful in some cases. For example, an unusual pattern in an ECG [...] some critical changes in the environments.

Algorithms for finding the most unusual time series subsequences were proposed by Keogh et al. in [6]. Such a subsequence is also called a time series discord, which is essentially the subsequence that is least similar to all other subsequences.

[...] largest distance to its nearest non-self match. That is, for all subsequences $C$ of $T$, non-self matches $M_D$ of $D$, and non-self matches $M_C$ of $C$, the minimum Euclidean distance of $D$ to $M_D$ is greater than the minimum Euclidean distance of $C$ to $M_C$.

The problem of finding discords can obviously be solved by a brute force algorithm that considers every possible subsequence and finds the distance to its nearest non-self match. The subsequence with the greatest such value is the discord. However, the time complexity of this algorithm is $O(m^2)$, where $m$ is the length of the time series, so it is not suitable for large data sets.
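The brute force search described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that two subsequences of length `n` are non-self matches when their start positions differ by at least `n`, since the definition of a non-self match is not spelled out in this excerpt.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, n):
    """Return (start, distance) of the length-n discord of `series`.

    Every subsequence is compared with every other one, so the number of
    distance computations is quadratic in the length of the series.
    """
    starts = range(len(series) - n + 1)
    best_dist, best_loc = -1.0, None
    for p in starts:
        nearest = math.inf
        for q in starts:
            # Assumption: a non-self match must not overlap the candidate.
            if abs(p - q) >= n:
                d = euclidean(series[p:p + n], series[q:q + n])
                nearest = min(nearest, d)
        # Keep the candidate whose nearest non-self match is farthest away.
        if nearest != math.inf and nearest > best_dist:
            best_dist, best_loc = nearest, p
    return best_loc, best_dist


if __name__ == "__main__":
    data = [0, 1, 0, 1, 0, 1, 5, 1, 0, 1, 0, 1]
    print(brute_force_discord(data, 2))
```

On ties the sketch keeps the earliest candidate, which is an arbitrary choice made for this illustration.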
Keogh et al. introduced a heuristic discord discovery algorithm based on the brute force algorithm and some observations [5]. They found that we do not actually need to find the nearest non-self match for each possible candidate subsequence. According to the definition of a time series discord, a candidate cannot be a discord if we can find any subsequence that is closer to it than the largest nearest non-self match distance found so far. This basic idea prunes away many unnecessary searches and saves a great deal of computation time.

2.2 Haar Transform

The Haar wavelet transform is widely used in applications such as computer graphics, image and signal processing, and time series querying [7]. We propose to apply this technique to approximate the time series discord, as the resulting wavelet can represent the general shape of a time sequence. The Haar transform can be seen as a series of averaging and differencing operations on a discrete time function. We compute the average and the difference between every two adjacent values of $f(x)$. The procedure for finding the Haar transform of a discrete function $f(x) = (9\ 7\ 3\ 5)$ is shown below.

Example

Resolution   Averages    Coefficients
4            (9 7 3 5)
2            (8 4)       (1 -1)
1            (6)         (2)

Resolution 4 is the full resolution of the discrete function $f(x)$. At resolution 2, (8 4) are obtained by taking the averages of (9 7) and (3 5) at resolution 4, respectively, and (1 -1) are the differences of (9 7) and (3 5) divided by two, respectively. This process is continued until a resolution of 1 is reached. The Haar transform $H(f(x)) = (c\ d^{0}_{0}\ d^{1}_{0}\ d^{1}_{1}) = (6\ 2\ 1\ {-1})$ is then obtained. It is composed of the last average value, 6, and the coefficients found in the rightmost column, 2, 1 and -1. It should be pointed out that $c$ is the overall average value of the whole time sequence, which is equal to $(9 + 7 + 3 + 5)/4 = 6$. Different resolutions can be obtained by adding difference values back to, or subtracting them from, an average. For instance, (8 4) = (6+2 6-2), where 6 and 2 are the first and second coefficients, respectively.

The Haar transform can be realized by a series of matrix multiplications, as illustrated in Equation (1). Envisioning the example input signal $x$ as a column [...]

A. W.-c. Fu et al.
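The averaging and differencing procedure of Section 2.2 can be sketched in a few lines. This is an illustrative sketch with hypothetical function names. It uses the unnormalised form of the worked example (plain averages and half-differences), not the matrix form of Equation (1), and it assumes the input length is a power of two.

```python
def haar_transform(values):
    """Haar transform by repeated pairwise averaging and differencing.

    Returns the overall average followed by the detail coefficients,
    ordered from the coarsest resolution to the finest.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(values)
    coefficients = []
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        details = [(a - b) / 2 for a, b in pairs]
        averages = [(a + b) / 2 for a, b in pairs]
        # Coarser details are computed later, so they go in front.
        coefficients = details + coefficients
    return averages + coefficients


def inverse_haar(coeffs):
    """Rebuild the signal by adding and subtracting the details level by level."""
    averages = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(averages)]
        averages = [v for a, d in zip(averages, details) for v in (a + d, a - d)]
        pos += len(details)
    return averages


if __name__ == "__main__":
    print(haar_transform([9, 7, 3, 5]))      # the worked example from the text
    print(inverse_haar([6, 2, 1, -1]))
```

Stopping `inverse_haar` after its first pass gives the resolution-2 averages (8 4) = (6+2 6-2) from the example.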
[...] candidates at Line 12. Given the subsequence $p$, the Inner heuristic order should pick the subsequence $q$ closest to $p$ first, since it gives the smallest Dist value and therefore has the best chance of breaking the loop at Line 12. In this section, we discuss our suggested heuristic search order, with which the inner loop can often be broken in the first few iterations, saving a lot of running time.

3.1 Discretization

We shall impose the heuristic Outer and Inner orders based on the Haar transformation of subsequences. We first transform all of the incoming sequences by the Haar wavelet transform. In order to reduce the complexity of time series comparison, we further transform each of the transformed sequences into a sequence (word) of finite symbols. The alphabet mapping is decided by discretizing the value range of each Haar wavelet coefficient. We assume that, for all $i$, the $i$-th coefficient of all Haar wavelets in the same database tends to be evenly distributed between its minimum and maximum value, so we can determine the cutpoints by partitioning this range into several equal segments. The cutpoints define the discretization of the $i$-th coefficient.

Definition 3. Cutpoints: For the $i$-th coefficient, cutpoints are a sorted list of numbers $B_i = \beta_{i,1}, \beta_{i,2}, \ldots, \beta_{i,m}$, where $m$ is the number of symbols in the alphabet, and

$$\beta_{i,j+1} - \beta_{i,j} = \frac{\beta_{i,a} - \beta_{i,0}}{a} \qquad (2)$$

$\beta_{i,0}$ and $\beta_{i,a}$ are defined as the smallest and the largest possible value of the $i$-th coefficient, respectively. Since each of the $a$ equal segments corresponds to one symbol, $a = m$.

We can then make use of the cutpoints to map all Haar coefficients into different symbols. For example, if the $i$-th coefficient of a Haar wavelet lies between $\beta_{i,0}$ and $\beta_{i,1}$, it is mapped to the first symbol, a. If the $i$-th coefficient lies between $\beta_{i,j-1}$ and $\beta_{i,j}$, it is mapped to the $j$-th symbol, and so on. In this way we form a word for each subsequence.

Definition 4. Word mapping: A word is a string over the alphabet. A subsequence $C$ of length $n$ can be mapped to a word $\hat{C} = \hat{c}_1, \hat{c}_2, \ldots, \hat{c}_n$. Suppose that $C$ is transformed to a Haar wavelet $\bar{C} = \{\bar{c}_1, \bar{c}_2, \ldots, \bar{c}_n\}$. Let $\alpha_j$ denote the $j$-th element of the alphabet, e.g., $\alpha_1 = a$, $\alpha_2 = b$, and so on. Let $B_i = \beta_{i,1}, \ldots, \beta_{i,m}$ be the cutpoints for the $i$-th coefficient of the Haar transform. Then the mapping from $\bar{C}$ to a word $\hat{C}$ is obtained as follows:

$$\hat{c}_i = \alpha_j \iff \beta_{i,j-1} \le \bar{c}_i < \beta_{i,j} \qquad (3)$$

3.2 Outer Loop Heuristic

First, we extract all the subsequences by sliding a window of length $n$ across the time series $T$ and transform them by means of the Haar transform. The transformed subsequences are then converted into words by using our proposed discretization algorithm. Finally, all the words are placed in an array, each with a pointer referring back to the original sequence. Figure 1 illustrates this idea.
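A minimal sketch of the discretization in Definitions 3 and 4 and of the word array of Section 3.2 is given below. The function names are hypothetical, the input is assumed to be a list of coefficient vectors that have already been Haar transformed, and the smallest and largest values of each coefficient are taken from the data at hand. Mapping the maximum value to the last symbol is also an assumption, made so that every coefficient receives a symbol.

```python
def cutpoints(lo, hi, m):
    """Equal-width cutpoints beta_1..beta_m, with beta_0 = lo and beta_m = hi."""
    width = (hi - lo) / m
    return [lo + width * j for j in range(1, m + 1)]


def to_word(coeffs, all_cutpoints, alphabet):
    """Map each coefficient to the symbol of the segment it falls in."""
    word = []
    for c, cuts in zip(coeffs, all_cutpoints):
        # First j with c < beta_j; the maximum value goes to the last symbol.
        j = next((k for k, b in enumerate(cuts) if c < b), len(cuts) - 1)
        word.append(alphabet[j])
    return "".join(word)


def build_word_array(transformed, alphabet="abc"):
    """Return (word, start) pairs; `start` points back to the subsequence."""
    m = len(alphabet)
    # The i-th column collects the i-th coefficient of every subsequence.
    columns = list(zip(*transformed))
    cuts = [cutpoints(min(col), max(col), m) for col in columns]
    return [(to_word(vec, cuts, alphabet), start)
            for start, vec in enumerate(transformed)]


if __name__ == "__main__":
    vectors = [[0.0, -1.0], [3.0, 0.0], [6.0, 1.0]]
    print(build_word_array(vectors))
```

Keeping the start position next to each word plays the role of the pointer back to the original sequence.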