Data Association for Topic Intensity Tracking

Andreas Krause (krausea@cs.cmu.edu), Jure Leskovec (jure@cs.cmu.edu), Carlos Guestrin (guestrin@cs.cmu.edu)
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Abstract. We present a unified model of what was traditionally viewed as two separate tasks: data association …

In the following sections we will use email topic detection and tracking as our running example. We also use the terms topic and class as synonyms. Note also that our approach is not limited to the text domain. All our methods are general in the sense that they can be applied to any problem with simultaneous classification and class intensity tracking (e.g., activity recognition).

2. Classification and intensity tracking in the static case

Traditionally, classification refers to the task of assigning a class label c to an unlabeled example x, given a set of training examples x_i and corresponding classes c_i. Classification can be performed by calculating the probability distribution over the class assignments, P(c | x), using Bayes' rule, P(c | x) ∝ P(c) P(x | c), where the class prior P(c) and the conditional probability of the data P(x | c) are estimated from the training set.

Work in the areas of clustering, topic detection and tracking, e.g., (Allan et al., 1998; Yang et al., 2000), and text mining, e.g., (Swan & Allan, 2000; Blei et al., 2003), has explored techniques for identifying topics in document streams using a combination of content analysis and time-series modeling. Most of these techniques are guided by the intuition that the appearance of a topic in a document stream is signaled by a burst, a sharp increase in the intensity of document arrivals. For example, in the problem of classifying emails into topics, the focus of attention might change from one topic to another, and hence taking the topic intensity into account should help us in the classification task.

To define the notion of intensity, consider a task where we are given a sequence of n email messages, x_1, …, x_n, and are asked to assign a topic c to each email. We also observe the message arrival times t_1, …, t_n. The intensity λ_c of a topic c is defined as the rate at which documents of that topic appear, or equivalently as the inverse expected interarrival time E[Δ_c]^(−1) of the topic c, where Δ_{c,i} = t_{c,i} − t_{c,i−1} is the time difference between two subsequent emails from the same topic c. A natural model of interarrival times is the exponential distribution (Kleinberg, 2003), i.e., Δ ~ Exp(λ), with density p(Δ | λ) = λ exp(−λΔ).

Let us first consider the case of a single topic. A naive solution to estimating intensity dynamics would be to compute average intensities over
fixed time windows. Since the exponential distribution has very high variance, this procedure is likely not to be very robust. Furthermore, it is not easy to select the appropriate length for the time window, since, depending on the topic intensity, the same time window will contain very different numbers of messages. Also, from the perspective of identifying bursts in the data, a set of discrete levels of intensity is preferable (Aizen et al., 2004). To overcome these problems, Kleinberg (2003) proposed a weighted automaton model (WAM), an infinite-state automaton, where each state corresponds to a particular discrete level of intensity. For each email, a transition is made in the automaton, whereby changes in intensities are penalized. This can be interpreted as a Hidden Markov Model, where the search for the most likely parameters of the exponentially distributed topic deltas Δ_{c,i} reduces to the Viterbi algorithm.

Since the WAM model operates on a single topic only, hard assignments of messages to topics have to be made in advance. Although classification can be done using methods as described in (Blei & Lafferty, 2005; Segal & Kephart, 1999), these hard assignments imply that topic detection and identification of bursts are separated. However, our intuition is that temporal information should help us assign the right topic, and that the topic of an email will influence the topic intensity. For example, if we are working on a topic with a very high intensity and the next email arrives at the right moment, then this will influence our belief about the email's topic. On the other hand, if an email arrives late and we are very sure about its topic, we will have to revise our belief about the intensity of the topic. In the following sections, we propose a suite of models which simultaneously reason about topic labels and topic intensities. In Section 6 we show how a little class (topic) assignment noise can confuse WAM, while our model still identifies the true topic intensity level.

3. Classification and intensity tracking in the dynamic case

Given a stream of data points (we can think of them as emails) on K topics (classes) together with their arrival times, (x_1, t_1), (x_2, t_2), (x_3, t_3), …, we want to simultaneously classify the emails into topics and detect bursts in the intensity of each of the topics. We have a data association problem: we observe the message deltas Δ_i = t_i − t_{i−1}, the time between arrivals of consecutive emails. One first needs to associate each email with the correct topic to find the topic deltas, the time between messages of the same topic. Given the topic deltas, one can then determine the topic intensity. For example, Figure 1(a) shows arrival times for email data and indicates the importance of the data association part. Each dot represents an email message and we plot the message number vs. the time of a message. Vertical parts of the plot correspond to bursts of activity. Horizontal parts correspond to low activity (long message deltas).

(C_t | L_t = l) ~ argmin { Exp[λ(l_1)], …, Exp[λ(l_K)] },   (1)
(Δ_t | L_t = l) ~ min { Exp[λ(l_1)], …, Exp[λ(l_K)] }.   (2)

Both conditional probability distributions (CPDs) rely on exponential order statistics: the observed message delta is the minimum of several exponential distributions (Eq. 2), whereas the selected topic is the corresponding index of the smallest variable (Eq. 1). At first glance, since these CPDs represent complex order statistics, it is not obvious whether they can be represented compactly and evaluated efficiently. The following result (Trivedi, 2002) gives simple closed-form expressions for the CPDs (1) and (2):

Proposition 1. Let λ_1, …, λ_n > 0 and Z_1 ~ Exp(λ_1), …, Z_n ~ Exp(λ_n). Then min{Z_1, …, Z_n} ~ Exp(Σ_j λ_j) and P(Z_i = min{Z_1, …, Z_n}) = λ_i / Σ_j λ_j.

Using these CPDs, we arrive at the model presented in Figure 2(a). We retain the intensity processes L(k)_t, but instead of keeping track of τ(k)_t, the time of the last email of each topic, and deriving the topic label c_t from it, we use the intensities L_t directly to model the topic prior. In this model, the association of message deltas (time between consecutive emails) to topic deltas (time between consecutive emails of the same topic) is implicitly represented. We refer to this model as IDA–IT, Implicit Data Association for Intensity Tracking. The order statistics simplification is an approximation, since in general the topic intensities are not constant during the interval between emails. Our model makes the simplifying assumption that the topic is conditionally independent of the message delta given the topic intensities. However, our experimental results indicate that this approximation is very powerful and performs very well in practice. Moreover, the IDA–IT model now lends itself to exact inference (for a small number of topics). IDA–IT is a simple extension of the Factorial Hidden Markov Model (Ghahramani & Jordan, 1995), for which a large variety of efficient approximate inference methods are readily available. Note that the IDA–IT model is a special case of continuous-time models such as continuous time Bayesian networks (CTBNs) (Nodelman et al., 2003). Unlike our model, CTBNs are in general intractable, and one has to resort to approximate inference (cf. Ng et al., 2005).

4.2. IDA–ITT: Unsupervised topic and intensity tracking

In a truly dynamic setting, such as a stream of documents, we do not only expect the topic intensities to change over time, but the vocabulary of the topic itself is also likely to change, an effect known as topic drift. Next, we present an extension of the IDA–IT model that

Figure 2 ((a) IDA–IT; (b) IDA–ITT).
Proposed graphical models: (a) implicit (and tractable) data association and intensity tracking; (b) implicit data association with intensity and topic tracking.

also allows for tracking the evolution of the content of the topics. Here we use the Switching Kalman Filter to track the time evolution of the words associated with each topic. We represent each topic with its centroid, the center of the documents in the topic. As the topic content changes, the Kalman filter tracks the centroid of the topic over time. Since representing documents in the bag-of-words fashion results in extremely high dimensional spaces, where modeling topic drift becomes difficult, we adopt the commonly used Latent Semantic Indexing (Deerwester et al., 1990) to represent documents as vectors in a low dimensional space.

Using the Gaussian Naive Bayes model, the observation model for documents becomes P(W_{t,i} | C_t = k) ~ N(μ_{i,k}, σ²_{i,k}), where we represent each topic k by its mean μ(k) and variance σ(k). For simplicity of presentation, we will assume that only the topic centers change over time, while the variances remain constant. Assuming a normal prior on the mean, and a normal drift, i.e., μ(k)_{t+1} = μ(k)_t + ε for ε ~ N(0, σ_ε²), we can model the topic drift μ(k)_1, …, μ(k)_T by plugging a Switching Kalman Filter (SKF) into our IDA–IT model. We call this model Implicit Data Association for Intensity and Topic Tracking (IDA–ITT), presented in Figure 2(b).

The SKF model fits in the following way: the continuous state vector μ_t = (μ(1)_t, …, μ(K)_t) describes the prior for the topic means. The linear transition model is simply the identity, i.e., μ_{t+1} = μ_t + ε. This means that we expect the prior to stay constant, but allow a small Gaussian drift. The observation model is a Gaussian distribution dependent on the topic: W_t | [μ_t, C_t = c] ~ N(H_c μ_t, Σ_c). Hereby, H_c is a matrix selecting the mean μ(c)_t from the state vector μ_t. For example, in the case of two classes and documents represented as points in R², H_1 = [I_2 0] selects the first two coordinates of μ_t and H_2 = [0 I_2] the last two. We can estimate Σ_c from training data and keep it constant, or associate it with a Wishart prior. In this paper, we select the first option for clarity of presentation.
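For a single topic in one LSI dimension, the drift model above reduces to a standard random-walk Kalman filter. The following is a minimal sketch of that special case only (parameter values and all names are our own illustration, not the paper's implementation, which switches H_c per topic inside the SKF):

```python
import random

def kalman_track(obs, q, r, mu0=0.0, p0=1.0):
    """Random-walk Kalman filter: mu_{t+1} = mu_t + eps, eps ~ N(0, q);
    observations W_t ~ N(mu_t, r). Returns the filtered means."""
    mu, p, out = mu0, p0, []
    for w in obs:
        p += q                # predict: variance grows by the drift noise q
        k = p / (p + r)       # Kalman gain
        mu += k * (w - mu)    # correct the mean with the observation
        p *= (1 - k)          # posterior variance shrinks
        out.append(mu)
    return out

rng = random.Random(0)
true_mu, truth, obs = 0.0, [], []
for _ in range(2000):
    true_mu += rng.gauss(0.0, 0.1)              # topic centroid drifts slowly
    truth.append(true_mu)
    obs.append(true_mu + rng.gauss(0.0, 1.0))   # noisy document coordinate

est = kalman_track(obs, q=0.01, r=1.0)
```

In the full IDA–ITT model, the discrete topic variable additionally selects which block of the state vector (via H_c) each document updates, and the resulting Gaussian mixture is collapsed as described in Section 4.5.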
Unfortunately, we cannot expect to do exact inference anymore, since inference in such hybrid models is intractable (Lerner & Parr, 2001). However, there are very good approximations for inference in Switching Kalman Filters (Lerner, 2002). We will briefly explain our approach to inference in Section 4.5.

4.3. Active Learning for IDA–ITT

We also extended our model to the semi-supervised, expert-guided classification case, where occasional expert labels for the hidden variables are available, and investigated an active learning method for selecting the most informative such labels. Due to space constraints we do not present the model derivation and experimental results for this case. Please refer to (Krause et al., 2006) for further details on the model for the semi-supervised case.

4.4. Generalizations

Our approach is general in at least three ways. Firstly, as argued in Section 1, the application is not limited to document streams. Other possible applications of our models are fault diagnosis in a system of machines with different failure rates, or activity recognition, where the observed person is working on several tasks in parallel with dynamic intensities. Secondly, our models fit well in the supervised, unsupervised and semi-supervised cases, as demonstrated in the paper. Lastly, instead of using a Naive Bayes classifier as done here, any other generative model for classification can be "plugged" into our model, such as TAN trees (Friedman et al., 1997) or more complex graphical models. Instead of using Latent Semantic Indexing to represent documents, it is possible to use topic mixture proportions computed using Latent Dirichlet Allocation (LDA) (Blei et al., 2003) or some other method. In the LDA example, one can either apply the SKF directly to the numerical topic mixture proportions, or track the mixture proportions using the Dirichlet distribution (which makes inference more difficult).

Most generally, our model can be considered a principled way of adapting class priors according to class frequencies that change over time. Instead of assuming that the transition probabilities stay constant between any two subsequent events, a possible extension is to let them depend on the actual observed message deltas, by modeling the L(k)_t as continuous-time Markov processes. We experimented with this extension, but did not observe a significant difference in behavior, since in our datasets the actual observed deltas were rather uniform (Figure 1(b)). Similarly, the Gaussian topic drift in the IDA–ITT model can be made dependent on the observed message delta, allowing larger drifts when the interval between messages is longer.

4.5. Scalability and implementation details

For a small number of topics, exact inference in the IDA–IT model is feasible. The variables L(k)_t and C_t are discrete, and the continuous variables are all observed. Hence, the standard forward-backward and Viterbi algorithms for Hidden Markov Models can be used for inference. Unfortunately, even though the intensity processes L(k)_t are all marginally independent, they become fully connected upon observing the documents and the arrival times, and the tree-width of the model increases linearly (and hence the complexity of exact inference increases exponentially) in the number of topics. Exact inference has complexity O(T K² |L|^(2K)), where L is the set of intensity levels, and K and T are the numbers of topics and documents, respectively. However, there are several algorithms available for approximate inference in such Factorial Hidden Markov Models (Ghahramani & Jordan, 1995). We implemented an approach based on particle filtering, and fully-factorized mean
field variational inference. In Section 6, we present the results of our comparison of these methods with exact inference.

Our implementation of the topic tracking model IDA–ITT is based on the algorithm for inference in SKFs proposed by Lerner (2002). At each time step, the algorithm maintains a belief state over possible locations of the topic centers, represented by a mixture of Gaussians. To avoid the multiplicative increase of mixture components during each time update step, and the resulting exponential blow-up in representation complexity, the mixture is collapsed into a mixture with fewer components. In our implementation, we keep the four components with the largest weight for each topic and each intensity.

5. Experimental setup

Synthetic datasets. First, we evaluate our models on two synthetic datasets. The first dataset (S1) was designed to test whether implicit data association recovers the true topic intensity levels. For each of the two topics, we generated a sequence of 300 observations, with exponentially distributed time differences. Every hundred samples, we changed the topic intensity, in the sequence [1/4, 1/128, 1/32] for topic 1 and [1/128, 1/32, 1/4] for topic 2. The observed feature W_t is a noisy copy of the topic variable C_t, correct with probability 0.9, to introduce additional classification uncertainty. The second dataset (S2) tests the resilience towards noise in the assignments of messages to topics. Observations in the dataset are uniformly spaced four hours apart.
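The S1 setup just described can be generated along the following lines. This is a sketch: the per-topic generation, the merge by arrival time, and all names are our own illustration of the description above, not the authors' code.

```python
import random

def gen_topic_stream(intensities, per_level, topic, rng):
    """One topic's arrival times: exponential interarrival deltas whose
    rate switches to the next intensity every `per_level` messages."""
    t, events = 0.0, []
    for lam in intensities:
        for _ in range(per_level):
            t += rng.expovariate(lam)   # delta ~ Exp(lam), mean 1/lam
            events.append((t, topic))
    return events

rng = random.Random(1)
# Intensities from the text: [1/4, 1/128, 1/32] and [1/128, 1/32, 1/4],
# 100 messages per level, 300 messages per topic; merge by arrival time.
s1 = sorted(gen_topic_stream([1/4, 1/128, 1/32], 100, 0, rng)
            + gen_topic_stream([1/128, 1/32, 1/4], 100, 1, rng))

# Observed feature W_t: noisy copy of C_t, correct with probability 0.9
# (with two topics, "wrong" simply flips the label).
obs = [(t, c, c if rng.random() < 0.9 else 1 - c) for (t, c) in s1]
```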
Figure 3. (a) True and recovered topic intensities (topic deltas) using various inference techniques with IDA–IT on synthetic dataset 1. (b) Classification noise confuses the traditional approach of separate classification and topic intensity tracking; by coupling classification and intensity tracking, IDA–IT recovers the true topic intensities. (c)-(f): Comparison of IDA–IT and WAM on the Enron ((c) topic 1, (d) topic 2) and Reuters ((e) topic 1, (f) topic 4) datasets. We plot intensity level vs. message number. The dashed line presents the true intensity and the solid lines present the recovered intensity level. We circled the areas where the WAM model significantly deviates from the truth. Only in one case (first ellipse in (c)) does WAM perform better.

Notice how in the Reuters dataset (Figure 1(b)) the observed message deltas are almost uniform, but the individual topics exhibit strong bursts of activity (sharp vertical jumps on the plot).

Figure 3 shows the results on intensity tracking. Figures 3(c) and 3(d) compare our IDA–IT with the traditional approach, where each message is first assigned a topic and then WAM is run separately on each topic. We circled the spots where the WAM model gets confused due to misclassifications and determines the wrong intensity level. On the contrary, IDA–IT can compensate for classification noise and more accurately recover the true intensity levels.

Similarly, Figures 3(e) and 3(f) show the results for 2 out of the 4 topics from the Reuters dataset. Notice how topic 1 interchanges low and high activity, and how using hard classification with the WAM model misses several transitions between intensities. In a data-mining application aiming at the detection of bursts, these lapses would be highly problematic.

6.2. Improved classification

In the previous section, we showed how coupling classification and intensity tracking models the intensity better than if classification and tracking are done separately. Next, we evaluate how classification accuracy is influenced by combining it with intensity tracking. Figure 4 compares the overall classification error of the baseline, the Gaussian Naive Bayes classifier, with the proposed IDA–IT model. We ran 3 experiments: Enron emails, topics 1 and 2 from Reuters, and all 4 Reuters topics. We used the same preprocessing of the data as in the other experiments (see Section 5). For Enron we determined a set of intensity levels 1, 1/4, 1/16, 1/64 and a transition probability of 0.1 using cross-validation. The error rate of Gaussian Naive Bayes (GNB) is 0.053; IDA–IT scores 0.036, which is a 32% relative decrease in error.

We ran two experiments with Reuters. For both experiments we used intensity levels 1/2, 1/8, 1/32 and transition probability 0.2. In the first experiment, we used only topics 1 and 2. The classification error of GNB is 0.121 and the error of IDA–IT is 0.068, which means a 45% relative decrease in classification error. The second experiment uses all 4 topics, so the overall performance of both classifiers is lower, but we still get a 22% relative improvement in classification.

Note that our models do not have an explicit class prior, but model it through the topic intensity. This has the effect that the topic which is currently at high activity also has a higher prior topic probability. Therefore the precision of the bursty topic increases at the cost of reduced recall for classes with lower intensity. This usually leads to an overall improvement of classification accuracy, but there are cases where the improvement is marginal, or accuracy even decreases, due to the lower recall on low intensity topics.

Figure 4. Reduction in error for Enron and Reuters.

6.3. Topic tracking in the unsupervised case

Next, we present the application of the implicit data association and intensity tracking model in the unsupervised setting, where we use the Switching Kalman Filter as introduced in IDA–ITT (Section 4.2). For this experiment we chose two Reuters topics, wholesale prices and environment issues. Using LSI, we reduce the dimensionality of the data to two dimensions. We then represent each document as a point in this two-dimensional space and use IDA–ITT to track the evolution of the content and intensity of the topics. Exploring the most important words from the cluster centroid of the topic wholesale prices, measured by the magnitude of the LSI coefficients, we see that the words economist, price, bank, index, industry, percent are important throughout the time. However, at the beginning and the end of the dataset, important words are also bureau, indicator, national, office, period, report. Then, for a few weeks in December and early January, the topic drifts towards expected, higher, impact, market, strong, which are terms used when last year's trends are analyzed and estimates for the next year are announced.

7. Conclusion

We presented a general approach to the simultaneous classification of a stream of data points and identification of bursts in class intensity. Unlike the traditional approach, we simultaneously address data association (classification, clustering) and intensity tracking. We showed how to combine an extension of Factorial Hidden Markov Models for topic intensity tracking with exponential order statistics for implicit data association, which allows efficient inference. Additionally, we applied a Switching Kalman Filter to track the time evolution of the words associated with each topic. Our approach is general in the sense that it can be combined with a variety of learning techniques. We demonstrated this flexibility by applying it in supervised and unsupervised settings. Extensive evaluation on real and synthetic datasets showed that the interplay of classification and topic intensity tracking improves the accuracy of both classification and intensity tracking.

Acknowledgements. We would like to thank David Blei for helpful discussions. This work was supported by NSF Grants No. CNS-0509383, IIS-0209107, IIS-0205224, INT-0318547, SENSOR-0329549, EF-0331657, IIS-0326322, the Pennsylvania Infrastructure Technology Alliance (PITA), and a donation from Hewlett-Packard. Carlos Guestrin was partly supported by an Alfred P. Sloan Fellowship.

References

Aizen, J., Huttenlocher, D., Kleinberg, J., & Novak, A. (2004). Traffic-based feedback on the web. Proc. Natl. Acad. Sci., 101, 5254-5260.
Allan, J., Papka, R., & Lavrenko, V. (1998). On-line new event detection and tracking. SIGIR '98.
Blei, D., & Lafferty, J. (2005). Correlated topic models. NIPS '05.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. JMLR.
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. (1990). Indexing by latent semantic analysis. J. of the Am. Soc. of Inf. Sci., 41.
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29.
Ghahramani, Z., & Jordan, M. I. (1995). Factorial hidden Markov models. NIPS '95.
Kleinberg, J. (2003). Bursty and hierarchical structure in streams. KDD '03.
Krause, A., Leskovec, J., & Guestrin, C. (2006). Data association for topic intensity tracking (Technical Report CMU-ML-06-100). Carnegie Mellon University.
Lerner, U. (2002). Hybrid Bayesian networks for reasoning about complex systems. Ph.D. thesis, Stanford University.
Lerner, U., & Parr, R. (2001). Inference in hybrid networks: Theoretical limits and practical algorithms. UAI.
Ng, B., Pfeffer, A., & Dearden, R. (2005). Continuous time particle filtering. IJCAI.
Nodelman, U., Shelton, C., & Koller, D. (2003). Learning continuous time Bayesian networks. UAI.
Segal, R. B., & Kephart, J. O. (1999). MailCat: An intelligent assistant for organizing e-mail. AGENTS '99.
Swan, R., & Allan, J. (2000). Automatic generation of overview timelines. SIGIR '00.
Trivedi, K. (2002). Probability and statistics with reliability, queuing, and computer science applications. Prentice Hall.
Yang, Y., Ault, T., Pierce, T., & Lattimer, C. W. (2000). Improving text categorization methods for event tracking. SIGIR '00.