Data Association for Topic Intensity Tracking

Jure Leskovec (jure@cs.cmu.edu), Carlos Guestrin (guestrin@cs.cmu.edu)
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Abstract. We present a unified model of what was traditionally viewed as two separate tasks: data association […]
In the following sections we will use email topic detection and tracking as our running example, and we use the terms topic and class as synonyms. Also note that our approach is not limited to the text domain: all our methods are general in the sense that they can be applied to any problem with simultaneous classification and class intensity tracking (e.g., activity recognition).

2. Classification and intensity tracking in the static case

Traditionally, classification refers to the task of assigning a class label c to an unlabeled example x, given a set of training examples x_i and corresponding classes c_i. Classification can be performed by calculating the probability distribution over the class assignments, P(c | x), using Bayes' rule, P(c | x) ∝ P(c) P(x | c), where the class prior P(c) and the conditional probability of the data P(x | c) are estimated from the training set.

Work in the areas of clustering, topic detection and tracking, e.g., (Allan et al., 1998; Yang et al., 2000), and text mining, e.g., (Swan & Allan, 2000; Blei et al., 2003), has explored techniques for identifying topics in document streams using a combination of content analysis and time-series modeling. Most of these techniques are guided by the intuition that the appearance of a topic in a document stream is signaled by a burst, a sharp increase in the intensity of document arrivals. For example, in the problem of classifying emails into topics, the focus of attention might change from one topic to another, and hence taking the topic intensity into account should help us in the classification task.

To define the notion of intensity, consider a task where we are given a sequence of n email messages, x_1, …, x_n, and are asked to assign a topic c to each email. We also observe the message arrival times t_1, …, t_n. The intensity λ_c of a topic c is defined as the rate at which documents of that topic appear, or equivalently as the inverse expected interarrival time E[Δ_c]^(-1) of the topic c, where Δ_{c,i} = t_{c,i} − t_{c,i−1} is the time difference between two subsequent emails from the same topic c. A natural model of interarrival times is the exponential distribution (Kleinberg, 2003), i.e., Δ ~ Exp(λ), with density p(Δ | λ) = λ exp(−λΔ).

Let us first consider the case of a single topic. A naive solution to estimating intensity dynamics would be to compute average intensities over fixed time windows. Since the exponential distribution has very high variance, this procedure is likely not to be very robust. Furthermore, it is not easy to select the appropriate length for the time window, since, depending on the topic intensity, the same time window will contain very different numbers of messages. Also, from the perspective of identifying bursts in the data, a set of discrete levels of intensity is preferable (Aizen et al., 2004). To overcome these problems, Kleinberg (2003) proposed a weighted automaton model (WAM), an infinite-state automaton in which each state corresponds to a particular discrete level of intensity. For each email, a transition is made in the automaton, whereby changes in intensity are penalized. This can be interpreted as a Hidden Markov Model, where the search for the most likely parameters of the exponentially distributed topic deltas Δ_{c,i} reduces to the Viterbi algorithm.

Since the WAM model operates on a single topic only, hard assignments of messages to topics have to be made in advance. Although classification can be done using methods as described in (Blei & Lafferty, 2005; Segal & Kephart, 1999), these hard assignments imply that topic detection and identification of bursts are separated. However, our intuition is that temporal information should help us assign the right topic, and that the topic of an email will influence the topic intensity. For example, if we are working on a topic with a very high intensity and the next email arrives at the right moment, then this will influence our belief about the email's topic. On the other hand, if an email arrives late and we are very sure about its topic, we will have to revise our belief about the intensity of the topic. In the following sections, we propose a suite of models which simultaneously reason about topic labels and topic intensities. In Section 6 we show how a little class assignment noise can confuse WAM, while our model still identifies the true topic intensity level.

3. Classification and intensity tracking in the dynamic case

Given a stream of data points (we can think of them as emails) on K topics (classes) together with their arrival times, (x_1, t_1), (x_2, t_2), (x_3, t_3), …, we want to simultaneously classify the emails into topics and detect bursts in the intensity of each of the topics. We have a data association problem: we observe the message deltas Δ_i = t_i − t_{i−1}, the time between arrivals of consecutive emails. One first needs to associate each email with the correct topic to find the topic deltas, the time between messages of the same topic. Given the topic deltas, one can then determine the topic intensity. For example, Figure 1(a) shows arrival times for email data and indicates the importance of the data association part. Each dot represents an email message, and we plot the message number vs. the time of the message. Vertical parts of the plot correspond to bursts of activity; horizontal parts correspond to low activity (long […])

[…]

  (C_t | L_t = l) ~ argmin { Exp[λ(l_1)], …, Exp[λ(l_K)] },   (1)
  (Δ_t | L_t = l) ~ min { Exp[λ(l_1)], …, Exp[λ(l_K)] }.      (2)

Both conditional probability distributions (CPDs) rely on exponential order statistics: the observed message delta is the minimum of several exponential distributions (Eq. 2), whereas the selected topic is the corresponding index of the smallest variable (Eq. 1). At first glance, since these CPDs represent complex order statistics, it is not obvious whether they can be represented compactly and evaluated efficiently. The following result (Trivedi, 2002) gives simple closed-form expressions for the CPDs (1) and (2):

Proposition 1. Let λ_1, …, λ_n ≥ 0 and Z_1 ~ Exp(λ_1), …, Z_n ~ Exp(λ_n). Then min{Z_1, …, Z_n} ~ Exp(Σ_j λ_j) and P(Z_i = min{Z_1, …, Z_n}) = λ_i / Σ_j λ_j.

Using these CPDs, we arrive at the model presented in Figure 2(a). We retain the intensity processes L_t^(k), but instead of keeping track of the time of the last email of each topic and deriving the topic label c_t from it, we use the intensities L_t directly to model the topic prior. In this model, the association of message deltas (time between consecutive emails) to topic deltas (time between consecutive emails of the same topic) is implicitly represented. We refer to this model as IDA–IT, Implicit Data Association for Intensity Tracking.

The order statistics simplification is an approximation, since in general the topic intensities are not constant during the interval between emails. Our model makes the simplifying assumption that the topic is conditionally independent of the message delta given the topic intensities. However, our experimental results indicate that this approximation is very powerful and performs very well in practice. Moreover, the IDA–IT model now lends itself to exact inference (for a small number of topics). IDA–IT is a simple extension of the Factorial Hidden Markov Model (Ghahramani & Jordan, 1995), for which a large variety of efficient approximate inference methods are readily available. Note that the IDA–IT model is a special case of continuous time models such as continuous time Bayesian networks (CTBNs) (Nodelman et al., 2003). Unlike our model, CTBNs are in general intractable, and one has to resort to approximate inference (cf. Ng et al., 2005).

4.2. IDA–ITT: Unsupervised topic and intensity tracking

In a truly dynamic setting, such as a stream of documents, we do not only expect the topic intensities to change over time; the vocabulary of the topic itself is also likely to change, an effect known as topic drift. Next, we present an extension of the IDA–IT model that also allows for tracking the evolution of the content of the topics.

Figure 2. Proposed graphical models. (a) IDA–IT: implicit (and tractable) data association and intensity tracking; (b) IDA–ITT: implicit data association with intensity and topic tracking.

Here we use the Switching Kalman Filter to track the time evolution of the words associated with each topic. We represent each topic with its centroid, the center of the documents in the topic. As the topic content changes, the Kalman filter tracks the centroid of the topic over time. Since representing documents in the bag-of-words fashion results in extremely high dimensional spaces, where modeling topic drift becomes difficult, we adopt the commonly used Latent Semantic Indexing (Deerwester et al., 1990) to represent documents as vectors in a low dimensional space.

Using the Gaussian Naive Bayes model, the observation model for documents becomes P(W_{t,i} | C_t = k) ~ N(μ_{i,k}, σ²_{i,k}), where we represent each topic k by its mean μ^(k) and variance. For simplicity of presentation, we will assume that only the topic centers change over time, while the variances remain constant. Assuming a normal prior on the mean, and a normal drift, i.e., μ_{t+1}^(k) = μ_t^(k) + ε for ε ~ N(0, σ_ε²), we can model the topic drift μ_1^(k), …, μ_T^(k) by plugging a Switching Kalman Filter (SKF) into our IDA–IT model. We call this model Implicit Data Association for Intensity and Topic Tracking (IDA–ITT), presented in Figure 2(b).

The SKF model fits in as follows: the continuous state vector θ_t = (μ_t^(1), …, μ_t^(K)) describes the prior for the topic means. The linear transition model is simply the identity, i.e., θ_{t+1} = θ_t + ε. This means that we expect the prior to stay constant, but allow a small Gaussian drift. The observation model is a Gaussian distribution dependent on the topic: W_t | [θ_t, C_t = c] ~ N(H_c θ_t, Σ_c). Hereby, H_c is a matrix selecting the mean μ_t^(c) from the state vector θ_t. For example, in the case of two classes, with the documents represented as points in R², H_1 = (1, 1, 0, 0) and H_2 = (0, 0, 1, 1). We can estimate Σ_c from training data and keep it constant, or associate it with a Wishart prior. In this paper, we select the first option for clarity of presentation.
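The closed-form expressions in Proposition 1 are easy to verify by simulation. The following sketch (not from the paper; the rate values are arbitrary illustrations) draws Z_i ~ Exp(λ_i) repeatedly and checks that the minimum has mean 1/Σ_j λ_j, as a sample from Exp(Σ_j λ_j) should, and that index i attains the minimum with probability λ_i / Σ_j λ_j:

```python
import random

def sample_min_and_argmin(rates, n_samples=100_000, seed=0):
    """Draw Z_i ~ Exp(rates[i]) n_samples times; return the empirical mean
    of min_i Z_i and the empirical probability that each i attains the minimum."""
    rng = random.Random(seed)
    argmin_counts = [0] * len(rates)
    min_sum = 0.0
    for _ in range(n_samples):
        draws = [rng.expovariate(r) for r in rates]
        i = min(range(len(rates)), key=draws.__getitem__)
        argmin_counts[i] += 1
        min_sum += draws[i]
    return min_sum / n_samples, [c / n_samples for c in argmin_counts]

# Illustrative intensity levels for K = 3 topics (arbitrary, not the paper's data).
rates = [1/4, 1/32, 1/128]
total = sum(rates)
mean_min, argmin_probs = sample_min_and_argmin(rates)

# Proposition 1: min{Z_1,...,Z_n} ~ Exp(sum_j lambda_j), so E[min] = 1/total,
# and P(Z_i is the minimum) = lambda_i / total.
print(mean_min, 1 / total)
print(argmin_probs, [r / total for r in rates])
```

This is exactly why the CPDs (1) and (2) can be evaluated cheaply: both the density of the observed message delta and the topic prior reduce to sums and ratios of the per-topic rates, with no integration over order statistics.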
Unfortunately, we cannot expect to do exact inference anymore, since inference in such hybrid models is intractable (Lerner & Parr, 2001). However, there are very good approximations for inference in Switching Kalman Filters (Lerner, 2002). We will briefly explain our approach to inference in Section 4.5.

4.3. Active learning for IDA–ITT

We also extended our model to the semi-supervised, expert-guided classification case, where occasional expert labels for the hidden variables are available, and investigated an active learning method for selecting the most informative such labels. Due to space constraints we do not present the model derivation and experimental results for this case. Please refer to (Krause et al., 2006) for further details on the model for the semi-supervised case.

4.4. Generalizations

Our approach is general in at least three ways. Firstly, as argued in Section 1, the application is not limited to document streams. Other possible applications of our models are fault diagnosis in a system of machines with different failure rates, or activity recognition, where the observed person is working on several tasks in parallel with dynamic intensities. Secondly, our models fit well in the supervised, unsupervised and semi-supervised cases, as demonstrated in the paper. Lastly, instead of using a Naive Bayes classifier as done here, any other generative model for classification can be "plugged" into our model, such as TAN trees (Friedman et al., 1997) or more complex graphical models. Instead of using Latent Semantic Indexing to represent documents, it is possible to use topic mixture proportions computed using Latent Dirichlet Allocation (LDA) (Blei et al., 2003) or some other method. In the LDA example, one can either apply the SKF directly to the numerical topic mixture proportions, or track the mixture proportions using the Dirichlet distribution (which makes inference more difficult).

Most generally, our model can be considered a principled way of adapting class priors according to class frequencies changing over time. Instead of assuming that the transition probabilities stay constant between any two subsequent events, a possible extension is to let them depend on the actual observed message deltas, by modeling L_t^(k) as continuous-time Markov processes. We experimented with this extension, but did not observe a significant difference in behavior, since in our datasets the actual observed deltas were rather uniform (Figure 1(b)). Similarly, the Gaussian topic drift in the IDA–ITT model can be made dependent on the observed message delta, allowing larger drifts when the interval between messages is longer.

4.5. Scalability and implementation details

For a small number of topics, exact inference in the IDA–IT model is feasible. The variables L_t^(k) and C_t are discrete, and the continuous variables are all observed. Hence, the standard forward-backward and Viterbi algorithms for Hidden Markov Models can be used for inference. Unfortunately, even though the intensity processes L_t^(k) are all marginally independent, they become fully connected upon observing the documents and the arrival times, and the tree-width of the model increases linearly (the complexity of exact inference increases exponentially) in the number of topics. Exact inference has complexity O(T K² |L|^(2K)), where L is the set of intensity levels, and K and T are the numbers of topics and documents, respectively. However, there are several algorithms available for approximate inference in such Factorial Hidden Markov Models (Ghahramani & Jordan, 1995). We implemented an approach based on particle filtering, and fully-factorized mean field variational inference. In Section 6, we present results of our comparison of these methods with exact inference.

Our implementation of the topic tracking model IDA–ITT is based on the algorithm for inference in SKFs proposed by Lerner (2002). At each time step, the algorithm maintains a belief state over possible locations of the topic centers, represented by a mixture of Gaussians. To avoid the multiplicative increase of mixture components during each time update step, and the resulting exponential blow-up in representation complexity, the mixture is collapsed into a mixture with fewer components. In our implementation, we keep the four components with the largest weight for each topic and each intensity.

5. Experimental setup

Synthetic datasets. First, we evaluate our models on two synthetic datasets. The first dataset (S1) was designed to test whether implicit data association recovers true topic intensity levels. For each of the two topics, we generated a sequence of 300 observations, with exponentially distributed time differences. Every hundred samples, we changed the topic intensity, in the sequence [1/4, 1/128, 1/32] for topic 1 and [1/128, 1/32, 1/4] for topic 2. The observed feature W_t is a noisy copy of the topic variable C_t, taking probability 0.9 for the correct topic to introduce additional classification uncertainty.

The second dataset (S2) tests the resilience towards noise in the assignments of messages to topics. Observations in the dataset are uniformly spaced four hours […]

Figure 3. (a) True and recovered topic intensities (topic deltas) using various inference techniques with IDA–IT on synthetic dataset 1. (b) Classification noise confuses the traditional approach of separate classification and topic intensity tracking; by coupling classification and intensity tracking, IDA–IT recovers true topic intensities. (c)-(f): Comparison of IDA–IT and WAM on the Enron and Reuters datasets. We plot intensity level vs. message number. The dashed line presents the true intensity and the solid lines present the recovered intensity level. We circled the areas where the WAM model significantly deviates from the truth; only in one case (first ellipse in (c)) does WAM perform better. Panels: (a) Synthetic 1, (b) Synthetic 2, (c) Enron topic 1, (d) Enron topic 2, (e) Reuters topic 1, (f) Reuters topic 4.

[…] how in the Reuters dataset (Figure 1(b)) the observed message deltas are almost uniform, but the individual topics exhibit strong bursts of activity (sharp vertical jumps on the plot).

Figure 3 shows the results on intensity tracking. Figures 3(c) and 3(d) compare our IDA–IT with the traditional approach, where each message is first assigned a topic and then WAM is run separately on each topic. We circled the spots where the WAM model gets confused due to misclassifications and determines the wrong intensity level. On the contrary, IDA–IT can compensate for classification noise and more accurately recover the true intensity levels.

Similarly, Figures 3(e) and 3(f) show the results for 2 out of 4 topics from the Reuters dataset. Notice how topic 1 interchanges low and high activity, and how using hard classification with the WAM model misses several transitions between intensities. In a data-mining application aiming at the detection of bursts, these lapses would be highly problematic.

6.2. Improved classification

In the previous section, we showed how coupling classification and intensity tracking models the intensity better than if classification and tracking are done separately. Next, we evaluate how classification accuracy is influenced by combining it with intensity tracking. Figure 4 compares the overall classification error of the baseline, the Gaussian Naive Bayes classifier, with the proposed IDA–IT model. We ran 3 experiments: Enron emails, topics 1 and 2 from Reuters, and all 4 Reuters topics. We used the same preprocessing of the data as in the other experiments (see Section 5). For Enron we determined a set of intensity levels 1, 1/4, 1/16, 1/64 and a transition probability of 0.1 using cross-validation. The error rate of Gaussian Naive Bayes (GNB) is 0.053; IDA–IT scores 0.036, which is a 32% relative decrease in error.

We ran two experiments with Reuters. For both experiments we used intensity levels 1/2, 1/8, 1/32 and transition probability 0.2. In the first experiment, we used only topics 1 and 2. The classification error of GNB is 0.121 and the error of IDA–IT is 0.068, a 45% relative decrease in classification error. The second experiment uses all 4 topics, so the overall performance of both classifiers is lower, but we still get a 22% relative improvement in classification.

Note that our models do not have an explicit class prior but model it through topic intensity. This has the effect that the topic which is currently at high activity also has a higher prior topic probability. Therefore the precision on the bursty topic increases at the cost of reduced recall on classes with lower intensity.
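The effect of modeling the class prior through topic intensity can be illustrated with a toy calculation (not from the paper; all numbers are made up). By Proposition 1, the topic prior implied by the current intensities is λ_k / Σ_j λ_j, so an ambiguous observation is pulled towards whichever topic is currently bursting:

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior(x, means, variances, rates):
    """P(C = k | x) ∝ (lambda_k / sum_j lambda_j) * N(x; mu_k, sigma_k^2):
    a Gaussian class-conditional likelihood combined with the
    intensity-derived prior from Proposition 1."""
    total_rate = sum(rates)
    scores = [(r / total_rate) * gaussian_pdf(x, m, v)
              for r, m, v in zip(rates, means, variances)]
    z = sum(scores)
    return [s / z for s in scores]

# Two topics with overlapping feature distributions (illustrative values).
means, variances = [0.0, 1.0], [1.0, 1.0]
x = 0.5  # an ambiguous observation, equidistant from both topic means

# With a uniform prior the observation is a coin flip between the topics.
p_uniform = posterior(x, means, variances, rates=[1.0, 1.0])  # [0.5, 0.5]

# With topic 1 currently bursting at 8x the intensity of topic 2, the same
# observation is attributed to topic 1 with probability 8/9.
p_bursty = posterior(x, means, variances, rates=[8.0, 1.0])

print(p_uniform)
print(p_bursty)
```

This is the mechanism behind the precision/recall trade-off described above: raising the effective prior of the bursty topic resolves ambiguous messages in its favor, at the expense of recall on the quieter topics.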
Figure 4. Reduction in error for Enron and Reuters.

This usually leads to an overall improvement in classification accuracy, but there are cases where the improvement is marginal, or accuracy even decreases due to the lower recall on low intensity topics.

6.3. Topic tracking in the unsupervised case

Next, we present the application of the implicit data association and intensity tracking model in the unsupervised setting, using the Switching Kalman Filter as introduced in IDA–ITT (Section 4.2). For this experiment we chose two Reuters topics, wholesale prices and environment issues. Using LSI, we reduce the dimensionality of the data to two dimensions. We then represent each document as a point in this two-dimensional space and use IDA–ITT to track the evolution of the content and intensity of the topics.

Exploring the most important words from the cluster centroid of the topic wholesale prices, measured by the magnitude of the LSI coefficients, we see that the words economist, price, bank, index, industry, percent are important throughout. However, at the beginning and the end of the dataset, important words also include bureau, indicator, national, office, period, report. Then, for a few weeks in December and early January, the topic drifts towards expected, higher, impact, market, strong, which are terms used when last year's trends are analyzed and estimates for the next year are announced.

7. Conclusion

We presented a general approach to the simultaneous classification of a stream of data points and identification of bursts in class intensity. Unlike the traditional approach, we simultaneously address data association (classification, clustering) and intensity tracking. We showed how to combine an extension of Factorial Hidden Markov Models for topic intensity tracking with exponential order statistics for implicit data association, which allows efficient inference. Additionally, we applied a Switching Kalman Filter to track the time evolution of the words associated with each topic. Our approach is general in the sense that it can be combined with a variety of learning techniques. We demonstrated this flexibility by applying it in supervised and unsupervised settings. Extensive evaluation on real and synthetic datasets showed that the interplay of classification and topic intensity tracking improves the accuracy of both classification and intensity tracking.

Acknowledgements. We would like to thank David Blei for helpful discussions. This work was supported by NSF Grants No. CNS-0509383, IIS-0209107, IIS-0205224, INT-0318547, SENSOR-0329549, EF-0331657, IIS-0326322, the Pennsylvania Infrastructure Technology Alliance (PITA) and a donation from Hewlett-Packard. Carlos Guestrin was partly supported by an Alfred P. Sloan Fellowship.

References

Aizen, J., Huttenlocher, D., Kleinberg, J., & Novak, A. (2004). Traffic-based feedback on the web. Proc. Natl. Acad. Sci., 101, 5254-5260.
Allan, J., Papka, R., & Lavrenko, V. (1998). On-line new event detection and tracking. SIGIR '98.
Blei, D., & Lafferty, J. (2005). Correlated topic models. NIPS '05.
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. JMLR.
Deerwester, S. C., Dumais, S. T., Landauer, T. K., Furnas, G. W., & Harshman, R. A. (1990). Indexing by latent semantic analysis. J. of the Am. Soc. of Inf. Sci., 41.
Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29.
Ghahramani, Z., & Jordan, M. I. (1995). Factorial hidden Markov models. NIPS '95.
Kleinberg, J. (2003). Bursty and hierarchical structure in streams. KDD '03.
Krause, A., Leskovec, J., & Guestrin, C. (2006). Data association for topic intensity tracking (Technical Report CMU-ML-06-100). Carnegie Mellon University.
Lerner, U. (2002). Hybrid Bayesian networks for reasoning about complex systems. Ph.D. thesis, Stanford University.
Lerner, U., & Parr, R. (2001). Inference in hybrid networks: Theoretical limits and practical algorithms. UAI.
Ng, B., Pfeffer, A., & Dearden, R. (2005). Continuous time particle filtering. IJCAI.
Nodelman, U., Shelton, C., & Koller, D. (2003). Learning continuous time Bayesian networks. UAI.
Segal, R. B., & Kephart, J. O. (1999). MailCat: an intelligent assistant for organizing e-mail. AGENTS '99.
Swan, R., & Allan, J. (2000). Automatic generation of overview timelines. SIGIR '00.
Trivedi, K. (2002). Probability and statistics with reliability, queuing, and computer science applications. Prentice Hall.
Yang, Y., Ault, T., Pierce, T., & Lattimer, C. W. (2000). Improving text categorization methods for event tracking. SIGIR '00.