Dense Subgraph Maintenance under Streaming Edge Weight Updates for Real-time Story Identification


Albert Angel, University of Toronto, albert@cs.toronto.edu
Nick Koudas, University of Toronto, koudas@cs.toronto.edu
Nikos Sarkas, University of Toronto, nsarkas@cs.toronto.edu
Divesh Srivastava, AT&T Labs-Research, divesh@research.att.com

ABSTRACT

Recent years have witnessed an unprecedented proliferation of social media. People around the globe author, every day, millions of blog posts, micro-blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories, and events that capture popular attention. Stories can be identified via groups of tightly-coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point of time.

The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly-coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DYNDENS, which outperforms adaptations of existing techniques to this setting, and yields meaningful results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.

1. INTRODUCTION

Recent years have witnessed an unprecedented proliferation of social media. Millions of people around the globe author on a daily basis millions of blog posts, micro-blog posts and social network status updates. This content offers an uncensored window into current events, and emerging stories capturing popular attention. For instance, consider the U.S. military strike in Abbottabad, Pakistan in early May 2011, which resulted in the death of Osama bin Laden. This event was extensively covered on Twitter,
the popular micro-blogging service, significantly in advance of traditional media, starting with the live coverage of the operation by an (unwitting) local witness, to millions of tweets around the world providing a multifaceted commentary on every aspect of the story. Similar, if fewer, online discussions cover important events on an everyday basis, from politics and sports, to the economy and culture (notable examples from recent years range from the death of Michael Jackson, to revolutions in the Middle East and the economic recession). In all cases, stories have a strong temporal component, making timeliness a prime concern in their identification.

[Figure 1: Real-time identification of “bin Laden raid” story, and connection to ENGAGEMENT]

Interestingly, such stories can be identified by leveraging the real-world entities involved in them (e.g. people, politicians, products and locations) [26]. The key observation is that each post on the story will tend to mention the same set of entities, around which the story is centered. In particular, as post length restrictions or conventions typically limit the number of entities mentioned in a single post, each post will tend to mention entities corresponding to a single facet of a story. Thus, by identifying pairs of entities that are strongly associated (recurrently mentioned together), one can implicitly detect facets of the underlying event of which they are the main actors. By piecing together these aspects, the overall event of interest can be inferred.

For example, in the case of the U.S. military strike mentioned above, one facet, consisting of people discussing the raid, is centered around “Abbottabad” where the raid took place, and the involvement of the “C.I.A.”; another thread, commenting on the presidential announcement, involves “Barack Obama” and “Osama bin
Laden”; and so on. The resulting overall story at some point of time involves the union of these entities. Such sets of entities can then be used by users of systems such as Grapevine [3] to enable the interactive exploration of the story.

Given a measure to quantify the strength of association between two entities (such as the log-likelihood ratio [26], the $\chi^2$ measure, or the correlation coefficient [5], etc.), one can abstract the real-time stream of posts as giving rise to an evolving (weighted) entity graph, denoting the pairwise entity association strength¹. An important story can then be identified via a cohesive group of strongly associated entity pairs; i.e. a dense subgraph in the entity graph, given an appropriate definition of density. Moreover, note that, as the entities in a story need to be presented to users to facilitate navigation, story cardinality needs to be constrained to moderate sizes; after all, it would not be very interesting or helpful to present users with a story centered around 100 main entities.

This process is illustrated in Figure 1. Every post that is published results in the weight update of one or more edges in the entity graph. The high frequency of post generation, coupled with our need for timely reporting of emerging stories, necessitates that the identification of dense structures in the entity graph be highly efficient. This work thus addresses the problem of dENse subGrAph maintenance for edGE-weight update streaMs under sizE constraiNTs, or ENGAGEMENT for brevity. Besides being useful as-is for identifying stories from social media in real-time, solutions to this problem can also be used as building blocks for more complex computations; e.g. identified dense subgraphs can undergo diversification before being presented to the user [2], or they can be reranked taking their external sparsity into account, in order to identify (soft) clusters of associated entities.

Addressing ENGAGEMENT at web scales presents several challenges. Principal among these is that a change in the weight of a single edge can impact the density of many subgraphs, necessitating a potentially unbounded exploration of the entity graph. Thus, any efficient solution to ENGAGEMENT needs to incrementally maintain dense subgraphs, without recomputing them from scratch. Moreover, there does not exist a single definition of graph density suitable for all scenarios; selecting the most appropriate definition for a given setting depends, for instance, on the perceived relative importance of having large, versus well-connected, dense subgraphs. Thus, solutions to ENGAGEMENT need to be applicable under general notions of density; however, existing techniques are only applicable to limited subsets of this problem.

In this context, in this work we propose DYNDENS, an efficient algorithm for ENGAGEMENT. We theoretically quantify the magnitude of change in dense subgraphs that a single edge weight update can cause. Based on this, we show how maintaining some sparse subgraphs, in addition to dense ones, enables the incremental maintenance of dense subgraphs. The resulting algorithm, DYNDENS, makes use of an efficient index for subgraphs, which decreases memory consumption and processing effort. It is complemented by theoretically sound heuristics that can offer improved performance. A comprehensive experimental evaluation on real and synthetic data highlights the effectiveness of our approach.

To summarize, our main contributions in this work are:
i) Motivated by the need to identify emerging stories in real-time, for a wide range of measures of entity association, we formalize the problem of dENse subGrAph maintenance for edGE-weight update streaMs under sizE constraiNTs (ENGAGEMENT), for a very broad notion of graph density.
ii) We propose an efficient algorithm DYNDENS, based on a
novel quantification of the maximum possible change caused by a single edge weight update. By maintaining a small number of sparse subgraphs, DYNDENS is able to efficiently and incrementally compute dense subgraphs.
iii) We design an efficient dense subgraph index, which decreases memory consumption and processing effort, and propose theoretically sound heuristics for DYNDENS that can offer improved performance.
iv) We validate our techniques via a thorough experimental evaluation on both real and synthetic datasets.

¹ The association measure can also incorporate notions of recency of association, e.g. by including some form of temporal decay.

The remainder of this paper is organized as follows: After providing a formal problem statement in Section 2, we present our proposed algorithm DYNDENS in Section 3. We explore the theoretical basis for DYNDENS in Section 4, evaluate the proposed techniques in Section 5, and discuss some improvements to DYNDENS in Section 6. Finally, we review related work in Section 7, and conclude in Section 8.

2. FORMALIZATION

Let us now turn to defining ENGAGEMENT. At a high level, let us consider a weighted graph, with a constant number of vertices. At every discrete time interval, the weights of one or more edges are adjusted (including potentially edge additions and removals). The goal is to maintain, at each point of time, all subgraphs with “density” greater than a given threshold.

Connections to real-time story identification: Before fully formalizing the problem, let us first draw some connections to its application in real-time story identification. In this context, vertices correspond to real-world entities, and edge weights to their (current) pairwise association strengths (the choice of association strength measure will depend on characteristics of the specific problem instance; in Section 5 we discuss several such choices). We assume that a procedure exists for processing streams of (entity-annotated²) posts, and generating the appropriate edge weight updates at each time interval (in Section 5 we discuss such procedures for a variety of measures of interest).

² The precise procedure used for identifying named entities in documents, e.g. [3], is orthogonal to this work.

Data model: We represent the problem domain as i) a complete weighted graph $G = (V, E)$ with $N$ vertices, where $w_{ij}$ is the weight of the edge between nodes $i$ and $j$; and ii) a stream of edge weight updates of the form $update_i = (a, b, \delta)$, signifying that at time instant $i$, the weight of the edge between vertices $a$ and $b$ changed from $w_{ab}$ to $w_{ab} + \delta$.

Density: We define subgraph density as follows: for every subgraph $C \subseteq V$, its density is $dens(C) = \frac{score(C)}{S_{|C|}}$, where $score(C) = \sum_{i,j \in C \wedge i < j} w_{ij}$. $S_n$ is a function quantifying the relative importance of a subgraph's cardinality, $n$, to its density; with the appropriate choice of $S_n$, virtually all quantifications of graph density can be represented.

Note that we do not consider counter-intuitive quantifications of graph density, such as (but not limited to) a definition of density where the removal of a vertex from an unweighted clique results in an increase of its density. To safeguard against such quantifications of density, we require that $S_n$ have the following intuitive monotonicity properties:³

$$\frac{n}{n-1} \le \frac{S_n}{S_{n-1}} \le \frac{n}{n-2}$$

³ Observe that if $\frac{S_n}{S_{n-1}} > \frac{n}{n-2}$, the density of an unweighted clique will increase if vertices are removed. Moreover, observe that if $\frac{n}{n-1} > \frac{S_n}{S_{n-1}}$, in an unweighted graph the density of a 3-vertex clique $K_3$ will increase if it is augmented by a single vertex, connected with a single edge to one of the clique vertices.

This encompasses
the full spectrum of choices of density functions commonly used in the literature; typical choices include $S_n = \frac{n(n-1)}{2}$ (thus density is defined as the average edge weight, favoring small, dense subgraphs; we term this instantiation AVGWEIGHT), and $S_n = n$ (thus density represents a generalized average node “degree”, favoring large subgraphs; we term this case AVGDEGREE).

Cardinality constraint: Finally, let $N_{max}$ be a (user-specified) maximum cardinality for subgraphs of interest. (In the context of real-time story identification, this constraint ensures that any subgraphs identified are small enough to be used for navigation/exploration purposes - cf. Section 1.)

ENGAGEMENT: Given the above, the goal of ENGAGEMENT is to maintain, at every point of time $i$, the subgraphs (vertex subsets) with density over a given threshold $T$, subject to cardinality constraints, i.e. $\{V_j \mid V_j \subseteq V \wedge dens(V_j) \ge T \wedge |V_j| \le N_{max}\}$. We term these output-dense subgraphs.

Notation: Before going into the details of our proposed approach, let us introduce some useful notation. We denote each vertex by a natural number, so $V = \{1, \ldots, N\}$ denotes the set of vertices in $G$. Let $\hat{e}_i$ be the $i$'th basis vector (an $N$-dimensional vector, with value 1 in its $i$'th coordinate, and 0 elsewhere). We will denote a subset $C \subseteq V$ by its corresponding vector $\vec{c} = \sum_{i \in C} \hat{e}_i$, and will sometimes refer to either interchangeably; we will also on occasion denote the cardinality of subset $C$ as $|\vec{c}|$. Let $\vec{\Gamma}_u$ be the neighborhood vector of vertex $u$: $\vec{\Gamma}_u = (w_{1u}, w_{2u}, \ldots, w_{Nu})$.

For convenience, we will also make use of the following normalized version of $S_n$: Let $g_n = \frac{S_n}{n(n-1)}$. By the monotonicity properties of $S_n$, it follows that $g_n \le g_{n-1}$.

Unless explicitly stated, we will focus on the time instant where the weight of the edge between vertices $a$ and $b$ is updated from $w_{ab} = w$ to $w + \delta$. Whenever a quantity $X$ can be affected by this update, we will denote its value before the update as $X^-$ and its value after the update as $X^+$. We omit this superscript when it does not affect results in any way. For example, $w_{ab}^- = w$, $w_{ab}^+ = w + \delta$.

3. THE DYNDENS APPROACH

Let us now discuss how our proposed algorithm, DYNDENS, identifies, at every point of time, all output-dense subgraphs.

Dense subgraphs and growth property: Observe that there is an inherent tradeoff in the set of subgraphs that DYNDENS will maintain, which we term “dense” subgraphs. At one extreme, DYNDENS could opt to maintain only output-dense subgraphs, with the other extreme being to maintain all subgraphs. However, neither of these is desirable: the former because it does not enable incremental computation of output-dense subgraphs, the latter due to its prohibitive costs. We will subsequently (Section 4.2) formally quantify this tradeoff. For now, loosely speaking, we will say that $C$ is a dense subgraph iff it has density greater than a given threshold $T_{|C|}$ (which is a function of the cardinality of $C$), and cardinality of at most $N_{max}$ (for a complete list of density-related terms used in this work cf. Table 1). $T_n$ is defined in a manner that ensures that every dense graph with $n$ vertices has at least one dense subgraph with $n-1$ vertices (thus it is possible to identify all dense subgraphs by “growing” dense subgraphs of smaller cardinalities). Specifically, $T_n$ is a monotonically increasing function of $n$ with the property $T_n g_n > T_{n-1} g_{n-1}$. At a high level, this monotonicity property ensures the desired containment property mentioned earlier (see Section 4 for details⁴). Moreover, we require that $T_{N_{max}} = T$.⁵ We discuss the concrete instantiation of $T_n$ used by DYNDENS in Section 4.2.

⁴ Another way to view dense graphs is the following: Consider the measure $normDens(C) = \frac{dens(C)}{T_{|C|}}$, consisting of a density measure, normalized by the threshold function $T_n$; a graph $C$ is dense iff it has $normDens(C) \ge 1$. While $normDens(C)$ is not a suitable measure of density per se, it has the following important growth property: every graph $C$ has a subgraph $C'$ of cardinality $|C'| = |C| - 1$ with $normDens(C') \ge normDens(C)$. This containment/growth property additionally implies that, if there are no dense subgraphs of cardinality $n$, there can be no dense subgraphs of any cardinality $> n$.

⁵ Recall that $T_n$ is an increasing function of $n$, and the set of maintained subgraphs needs to include all output-dense subgraphs of cardinality $N_{max}$ having density $\ge T$.

Table 1: Definitions of density-related properties

  Subgraph C is    iff
  Static properties
  dense            $dens(C) \ge T_{|C|}$
  sparse           $dens(C) < T_{|C|}$
  output-dense     $dens(C) \ge T$
  too-dense        $dens(C) \ge T_{|C|+1}$
  Dynamic properties
  stable-dense     $dens^-(C) \ge T_{|C|} \wedge dens^+(C) \ge T_{|C|}$
  newly-dense      $dens^-(C) < T_{|C|} \wedge dens^+(C) \ge T_{|C|}$
  losing-dense     $dens^-(C) \ge T_{|C|} \wedge dens^+(C) < T_{|C|}$

Table 2: Summary of main symbols used

  Symbol              Description
  $V$                 Set of vertices in graph
  $N$                 Number of vertices in graph
  $w_{ij}$            Weight of edge between vertices $i$ and $j$
  $\vec{\Gamma}_u$    Neighborhood vector of vertex $u$
  $dens(C)$           Density of $C$: $dens(C) = \frac{\sum_{i,j \in C \wedge i < j} w_{ij}}{S_{|C|}}$
  $S_n$               Quantifies relative importance of subgraph cardinality $n$ to density
  $g_n$               Normalized version of $S_n$: $g_n = \frac{S_n}{n(n-1)}$
  AVGWEIGHT           Case where $S_n = n(n-1)/2$
  SQRTDENS            Case where $S_n = \sqrt{n(n-1)}$
  AVGDEGREE           Case where $S_n = n$
  $N_{max}$           Max. cardinality of subgraph to be returned
  $T$                 Min. density for a subgraph to be returned
  $T_n$               Min. density for subgraph of cardinality $n$ to be dense
  $\delta_{it}$       Tunable parameter of DYNDENS, influences $T_n$
  $a, b$              Vertices that were just updated
  $x^-$               Quantity $x$ before the update
  $x^+$               Quantity $x$ after the update
  $w$                 Weight of edge $(a, b)$ before the update, i.e. $w_{ab}^-$
  $w + \delta$        Weight of edge $(a, b)$ after the update, i.e. $w_{ab}^+$

Edge weight updates: The basic operation of DYNDENS is to maintain dense subgraphs, following the update of the weight of an edge $(a, b)$, from $w$ to $w + \delta$. If this impacts the set of output-dense subgraphs, the latter is updated as well. Handling updates with $\delta < 0$ (i.e. where the weight of an edge decreases) is straightforward: all dense subgraphs containing both $a$ and $b$ are examined, and their density is decreased by an appropriate amount. If they are no longer output-dense, this is reported; if, in addition, they are no longer dense (losing-dense), they are evicted from the index.

Positive updates: Of greater interest is the case where $\delta > 0$, i.e. the edge weight update corresponds to an increase in weight. In this case, additional subgraphs, that were not dense prior to the update, might now be dense (newly-dense subgraphs). DYNDENS leverages the growth property to compute these as follows:

Cheap explore: DYNDENS will try to augment all dense subgraphs containing either $a$ or $b$, with $b$ or $a$, respectively; resulting newly-dense subgraphs will be inserted into the dense subgraph index. In some cases, this step alone is sufficient and/or can be applied only to a subset of these subgraphs (cf. Section 6 for details).

Explore: DYNDENS will try to augment dense subgraphs containing both $a$ and $b$, with one neighboring vertex; resulting newly-dense subgraphs will be inserted into the dense subgraph index.

Exploration iterations: The above procedure may need to be performed iteratively for newly-dense subgraphs discovered via exploration or cheap exploration. Interestingly, the iteration depth is upper bounded by a corollary of the growth property. Specifically, in Section 4.2, we define $T_n$ parametrized by a parameter $\delta_{it}$ that indirectly controls the number of dense subgraphs maintained by DYNDENS. As we show in Section 4, we can guarantee that at most $\lceil \delta / \delta_{it} \rceil$ exploration iterations need to be performed, in order to identify all newly-dense subgraphs, following an edge weight update of magnitude $\delta$.

Explore all: In a few cases, the above exploration may need to be performed on non-neighboring nodes as well, resulting in a very costly procedure. In most cases, DYNDENS avoids performing this procedure via a better, implicit representation of some dense subgraphs in the index (cf. Section 3.2.3).

In one sentence, DYNDENS explores the neighborhood of some materialized dense subgraphs, using pruning conditions for when to stop exploring around a subgraph. The remainder of this section aims to fill in the blanks in the preceding sentence. We discuss the workings of DYNDENS, and illustrate them with a practical example in Section 3.1, followed by important technical details in Section 3.2. We defer the exposition of the theoretical results on which DYNDENS is based till Section 4.

3.1 The DYNDENS Algorithm

Let us now discuss DYNDENS in greater detail, with reference to Algorithm 1. At a high level, DYNDENS maintains an in-memory index of all dense subgraphs (we defer discussing index implementation details to Section 3.2); at every edge weight update, it outputs information regarding subgraphs that became, or stopped being, output-dense.

Algorithm 1: Algorithm DYNDENS
Input: Updated edge $(a, b)$, magnitude of update $\delta$
 1: if $\delta < 0$ then
 2:     Update the density of all dense subgraphs containing $a$ and $b$; evict losing-dense subgraphs from the index; report any subgraphs that are no longer output-dense
 3:     return
 4: for all dense subgraphs $C$ s.t. $a \in C \vee b \in C$ do   // including $C = \{a, b\}$ if it is newly-dense
 5:     if $a \notin C$ or $b \notin C$ then
 6:         if $C$ should be cheap-explored and $C \cup \{a, b\}$ is newly-dense then
 7:             Add $C \cup \{a, b\}$ to the index, report it if it is output-dense
 8:             explore($C \cup \{a, b\}$, 2)
 9:     else
10:         Update the density of $C$, report it if it just became output-dense
11:         explore($C$, 1)

If the edge weight update was negative, only some index maintenance needs to be done (line 2). Otherwise, some stable-dense subgraphs containing $a$ and/or $b$ are further examined (lines 4-11). Note that, to ensure correctness, the subgraph $\{a, b\}$ may also be examined, even if it was not present in the index (base case in line 4). Subgraphs in the index containing only one of $a$, $b$ are cheap-explored, if needed⁶ (line 6).

⁶ For instance, subgraphs that were too-dense need not be explored, as, by definition, their dense supergraphs would have been stable-dense, and hence already identified. Moreover, this step can also be skipped in other circumstances, cf. Section 6 for details.

Subgraphs in the index that contain both $a$ and $b$, as well as newly-dense subgraphs previously identified, are subsequently explored (line 11) - i.e. DYNDENS will try to augment them with a neighboring node (we defer discussing the precise details on how this is done efficiently to Section 3.2). This will be recursively repeated on any newly-dense subgraphs discovered, up to $\lceil \delta / \delta_{it} \rceil$ times (the theoretical results that enable this bounding are discussed in Section 4). A high-level description of the exploration procedure is shown in Algorithm 2.

Algorithm 2: Procedure explore($C$, $i$)
Input: Subgraph $C$. Iteration number $i$
 1: if $C$ was not too-dense before the update and $i \le \lceil \delta / \delta_{it} \rceil$ and $|C| < N_{max}$ then
 2:     if $C$ is too-dense then
 3:         for all $y \notin C$ do   // Explore-All
 4:             Add $C \cup \{y\}$ to the index; report it if it is output-dense
 5:             explore($C \cup \{y\}$, $i+1$)
 6:     else
 7:         for all neighbors $y$ of $C$ do
 8:             if $C \cup \{y\}$ is newly-dense then
 9:                 Add $C \cup \{y\}$ to the index; report it if it is output-dense
10:                 explore($C \cup \{y\}$, $i+1$)

Algorithm 2 will first ensure that the subgraph should be explored. Specifically, the subgraph should not have been too-dense before the update (line 1), for otherwise its dense supergraphs would have been stable-dense, and hence already identified. Moreover, as previously mentioned, DYNDENS will not explore around any subgraph more times than necessary. Finally, in a few cases, explored subgraphs will need to be augmented with every other vertex, not just neighboring ones (Explore-All; line 2). As the latter is a costly procedure, in Section 3.2.3 we will present a way to mitigate the associated cost.

Execution example. To illustrate the workings of DYNDENS, let us examine a simple example of its execution. Consider the sample entity graph of Figure 2(a), and assume an AVGWEIGHT definition of density (i.e. the density of a subgraph is its average edge weight), a density threshold of $T = 1$, and a maximum desired subgraph cardinality of $N_{max} = 4$. Assume that $\delta_{it}$ has been set to 0.15, so that the thresholds $T_n$, for subgraphs of cardinality $n$ to be considered dense, are $T_2 = 0.9$, $T_3 = 0.975$ and $T_4 = T = 1$ (cf. Section 4.2 for details). Thus, the dense subgraphs for this graph are shown in Figure 2(b) (output-dense subgraphs are emphasized). Finally, assume that the weight of edge $(1, 2)$ is updated from 0.8 to 0.95 ($\delta = \delta_{it} = 0.15$). Let us examine how DYNDENS will handle this update; to facilitate this discourse, the newly-dense subgraphs that are inserted into the index are shown in the bottom half of Figure 2(b).

Figure 2: Execution example
(a) Entity graph
(b) Dense subgraph index

  Subgraph      Density              output-dense?
  Dense, before update
  1,3           1.0                  Y
  1,4           1.0                  Y
  2,3           1.1                  Y
  2,4           1.0                  Y
  3,4           1.0                  Y
  1,3,4         1.0                  Y
  2,3,4         $1.0\overline{3}$    Y
  newly-dense, after update
  1,2           0.95                 N
  1,2,3         $1.01\overline{6}$   Y
  1,2,4         $0.98\overline{3}$   N
  1,2,3,4       $1.008\overline{3}$  Y

At a high level, DYNDENS will examine {1,2}, as well as all dense subgraphs containing vertex 1 and/or 2 (Algorithm 1, line 4), i.e. {1,3}, {1,4}, {2,3}, {2,4}, {1,3,4}, {2,3,4}. {1,2} will be added to the index (Algorithm 1, line 10), and will be explored (line 11). Its exploration will entail the addition of newly-dense subgraphs {1,2,3} and {1,2,4} to the index (Algorithm 2, line 8); the former will also be reported as output-dense. Since $\lceil \delta / \delta_{it} \rceil = 1$, these newly-dense subgraphs will not be further explored (Algorithm 2, line 10 and line 1). Moreover, during this exploration subgraph {1,2,5} will be examined, but as its density is less than $T_3$, it will not be added to the index.

DYNDENS will also cheap-explore subgraphs {1,3}, {1,4}, {2,3}, {2,4} (Algorithm 1, line 6). This will result in subgraphs {1,2,3}, {1,2,4} being examined (twice) (Algorithm 1, line 7); as they are already present in the index, this will not affect anything. Moreover, DYNDENS will attempt to explore these subgraphs (Algorithm 1, line 8); however, since $\lceil \delta / \delta_{it} \rceil = 1$, they will not be explored (Algorithm 2, line 1).

Finally, DYNDENS will cheap-explore subgraphs {1,3,4} and {2,3,4}. The first cheap exploration will result in newly-dense subgraph {1,2,3,4} being added to the index, and reported as output-dense (Algorithm 1, line 7); the second one will revisit this subgraph, and do nothing. Moreover, in both cases, since $|\{1,2,3,4\}| = 4 \ge N_{max}$, these subgraphs will not be explored (Algorithm 2, line 1).

Observation: From the simplified execution example presented above, one can observe that DYNDENS (as currently presented) can end up performing redundant computations; e.g. some subgraphs are examined unnecessarily many times. Subsequently, in Section 3.2.2 and Section 6, we discuss how to reduce such unnecessary computations.

3.2 Implementation Considerations

Having presented DYNDENS at a high level, let us now see some important considerations that arise when implementing it in practice. We first introduce the underlying indexing structure used by DYNDENS in Section 3.2.1; this index also enables DYNDENS to avoid redundant computations (Section 3.2.2) as well as the costly operation of explore-all (Algorithm 2, line 2; cf. Section 3.2.3).

3.2.1 Index

DYNDENS requires an efficient index for both the evolving graph itself, as well as for dense subgraphs. For the graph index, maintaining node adjacency lists is sufficient (i.e. a mapping $\forall u \in V: u \to \vec{\Gamma}_u$); this also enables the efficient exploration of a subgraph
(via merging the relevant adjacency lists⁷).

⁷ Specifically, when exploring subgraph $C$, DYNDENS will compute $\vec{\Gamma}_C = \sum_{v \in C} \vec{\Gamma}_v$; for every vertex $u \notin C$, the score of $C \cup \{u\}$ can be computed as $score(C \cup \{u\}) = score(C) + \vec{\Gamma}_C \cdot \hat{e}_u$.

[Figure 3: Dense subgraph index]

The dense subgraph index is more interesting to examine, as it needs to efficiently support several functionalities. To name a few: for every dense subgraph, access to its vertices, cardinality and density; insertion, update and deletion of dense subgraphs from the index; iteration over all dense subgraphs containing vertices $a$ or $b$, where each subgraph must be accessed exactly one time (needed for positive edge weight updates); and for a given dense subgraph $C$, and a given vertex $u$, access to subgraph $C \cup \{u\}$, and insertion of $C \cup \{u\}$ into the index if it is not already present (needed for exploration). Moreover, as DYNDENS needs to perform frequent random accesses on dense subgraphs, the index needs to be in-memory, so maintaining a low memory footprint is important. As most dense subgraphs will tend to have high overlap, the dense subgraph index should minimize the amount of redundant information stored.

To address these requirements posed by DYNDENS, we propose the following in-memory index. Each subgraph has a unique id corresponding to its location in memory; it is also represented by its (sorted) set of vertices. DYNDENS will maintain a prefix tree of dense subgraphs, illustrated in Figure 3. Each node in the prefix tree contains pointers to its children, indexed by vertex id, a pointer to its parent, as well as information (such as cardinality and density) on the dense subgraph it represents, if applicable. Figure 3 shows a view of the index when subgraphs {1,3}, {1,3,4}, {1,3,5}, {3,4,5}, {4,5} are dense (ignore the node labeled $\star$ for now), along with the density of each subgraph.

Additionally, to enable effective iteration over dense subgraphs containing one or two given vertices, DYNDENS will also maintain inverted lists, i.e. a mapping from vertices to (pointers to) all subgraphs containing a vertex. To decrease the size of inverted lists, the inverted list of a vertex $u$ will only contain tree nodes where the lexicographically largest vertex is $u$. Thus, in order to iterate over all subgraphs containing $u$, DYNDENS will iterate over all subgraphs in its inverted list, and their tree descendants. Furthermore, to facilitate inverted list maintenance, inverted lists are implemented as linked lists of prefix tree nodes (shown in Figure 3 as dashed arrows). Inverted lists are updated whenever a new node is created, or when a leaf node is deleted. Moreover, if the deletion of a leaf node results in its parent having no children, and representing no dense subgraph, the parent will be recursively deleted.

Our proposed dense subgraph index efficiently addresses the requirements of DYNDENS. Specifically, the prefix tree enables DYNDENS to reduce its memory footprint, by not storing redundantly many overlapping dense subgraphs. Moreover, looking up $C \cup \{u\}$ is $O(|C| + 1)$ in all cases (and $O(1)$ if vertex $u$ is lexicographically greater than any other vertex in $C$); after a look-up, update or insertion into the index is $O(1)$. Enumerating the vertices in a subgraph $C$ is $O(|C|)$, via parent pointer traversal. Deleting a subgraph $C$ from the index is $O(\text{number of leaf nodes deleted})$; this is typically $O(1)$ and at worst $O(|C|)$, due to the design of the prefix tree with embedded inverted lists.

3.2.2 Avoiding redundant computation

Besides efficiently providing the requisite functionality for DYNDENS, our proposed dense subgraph index can also be used (i) to ensure that subgraphs that were dense before the update are examined exactly once (required for the correctness of DYNDENS), and (ii) to greatly reduce the number of newly-dense subgraphs examined more than once (without sacrificing correctness).

The former (i) can be guaranteed by fixing the order in which dense subgraphs are examined. Specifically, if subgraphs containing vertices $a$ and/or $b$ need to be examined, and assuming $a < b$ (lexicographically), DYNDENS will traverse the subtrees of all index nodes on the inverted list corresponding to $b$. Subsequently, it will traverse the subtrees of index nodes on the inverted list corresponding to $a$, stopping the traversal whenever a $b$ node is encountered. This procedure is aided by flags that are set on a per-index node basis, to help DYNDENS distinguish newly-dense subgraphs in the index.

For the latter (ii), we leverage the theoretical result that all newly-dense subgraphs can be identified in at most $\lceil \delta / \delta_{it} \rceil$ exploration iterations (Section 4). Upon insertion into the index, dense subgraphs are annotated with the exploration iteration at which they were identified ($i$ in Algorithm 2); these annotations persist until the end of Algorithm 1. Algorithm 2 will operate as above for subgraphs not annotated with an iteration number, or annotated with an iteration number greater than the current $i$. Otherwise, the subgraph does not need to be further examined.

3.2.3 Implicit representation of too-dense subgraphs

Having introduced the dense subgraph index used by DYNDENS, let us revisit a challenge posed by the presence of too-dense subgraphs, and show how the index can be leveraged to overcome it.

Recall that a subgraph is too-dense iff, after adding any other vertex to it, it is still dense. Thus, when exploring a too-dense subgraph, DYNDENS needs to consider its cartesian product with the entire set of vertices $V$, resulting in $|V|$ dense subgraph insertions into the index (explore-all, Algorithm 2, line 2). This is a very costly procedure; unsurprisingly, it was experimentally found to dominate all other processing costs, in cases where too-dense subgraphs existed (cf. Section 5.1).

To avoid this cost, we propose a modification to the dense subgraph index, which we term IMPLICITTOODENSE. At a high level, it entails the implicit representation of supergraphs of too-dense subgraphs, so that explore-all will only examine/insert into the index a small number of dense subgraphs.

Specifically, we introduce a fictitious vertex named $\star$, which is lexicographically larger than all other vertices. For every too-dense subgraph $C$, the index will store a subgraph $C \cup \{\star\}$, representing all $C \cup \{y\}$ where $y$ is a vertex disconnected from $C$; these supergraphs of $C$ will not be explicitly inserted in the index. Given this convention, DYNDENS will handle the explore-all procedure of a subgraph $C$ that just became too-dense by normally exploring all neighbors of $C$ (as in Algorithm 2, line 7), and inserting the subgraph $C \cup \{\star\}$ into the index. For instance, revisiting Figure 3, assume subgraph {1,3} is too-dense. Rather than exploring, and inserting into the index, all its disconnected supergraphs $\{1,3,6\}, \{1,3,7\}, \ldots, \{1,3,|V|\}$, DYNDENS has only inserted a node representing $\{1,3,\star\}$.

In the unlikely event $C \cup \{\star\}$ needs to be explored at any time (corresponding to the exploration of all supergraphs of $C$ augmented with one disconnected vertex), DYNDENS will try instead to augment $C$ with all edges in the graph that are not incident on $C$.

Because every vertex $a$ is potentially contained in $C \cup \{\star\}$, whenever an iteration is performed on the index (Algorithm 1, line 4), the inverted list corresponding to $\star$ needs to be examined as well. This inverted list also needs to be maintained during negative edge weight updates, if a subgraph stops being too-dense. Finally, note that whenever dealing with a subgraph represented by a $\star$ index entry, DYNDENS also needs to ensure that the subgraph is not explicitly represented elsewhere in the index, which is, however, a very efficient operation.

As we verify experimentally (Section 5.1), the above IMPLICITTOODENSE modification to the index offers significant performance benefits to DYNDENS.

4. THEORETICAL RESULTS

Having introduced our proposed DYNDENS algorithm, in this section we elaborate on its theoretical underpinnings. We first prove its correctness, by deriving a bound on the number of exploration iterations that are required, as a function of the magnitude of the edge weight update performed (this is the basis of DYNDENS, cf. Algorithm 2, line 1). Specifically, Section 4.1 presents a general result, on when a single exploration iteration per stable-dense subgraph is sufficient. Section 4.2 provides a concrete instantiation for $T_n$ (recall that $T_n$ determines the relationship between dense and output-dense subgraphs), based on which the desired bound is then obtained in Section 4.3. Due to space constraints, detailed proofs, and results pertaining to the complexity of DYNDENS are omitted; these can be found in [4].

Formalization: The notion of exploration iterations performed by DYNDENS has been used throughout its description; before presenting theoretical results on them, this would be a good opportunity to formalize this notion.

Let $\mathcal{C}_A = \{C \cup \{b\} \mid C \subseteq V \wedge a \in C \wedge b \notin C \wedge C \text{ is stable-dense}\}$ be the set of graphs consisting of a stable-dense subgraph containing $a$, augmented with $b$ (similarly, let $\mathcal{C}_B = \{C \cup \{a\} \mid C \subseteq V \wedge b \in C \wedge a \notin C \wedge C \text{ is stable-dense}\}$). Let $\mathcal{C}_0 = \mathcal{C}_A \cup \mathcal{C}_B$; this is the set of all subgraphs that will be examined via cheap-exploration only.

Let $\mathcal{C}_{AB} = \{C \cup \{y\} \mid C \subseteq V \wedge a, b \in C \wedge C \text{ is stable-dense} \wedge y \text{ is a neighbor of some node in } C\}$ be the set of graphs consisting of a stable-dense subgraph containing $a$ and $b$, augmented with some other node; this is the set of all subgraphs that will be examined via a single exploration iteration.

Let $\mathcal{C}_1 = \mathcal{C}_0 \cup \mathcal{C}_{AB}$; this is the set of graphs containing $a$ and $b$ that consist of a stable-dense subgraph, augmented with one node.

For $i > 1$, let $\mathcal{C}_i = \{C \cup \{y\} \mid C \in \mathcal{C}_{i-1} \wedge C \text{ is newly-dense} \wedge y \text{ is a neighbor of some node in } C\}$. $\mathcal{C}_i$ is the set of graphs containing a newly-dense subgraph that contains $a$, $b$, and is discoverable after $i$ exploration iterations.

4.1 When is a Single Exploration Sufficient?

Let us now provide a sufficient condition for all newly-dense subgraphs $C$ of cardinality $|C| = n \ge 3$ to contain a stable-dense subgraph of cardinality $n-1$. Specifically, it is sufficient that:

$$\delta \le (n-2)(n-1)\left(g_n T_n - g_{n-1} T_{n-1}\right) \qquad (1)$$

(recall that $g_n = \frac{S_n}{n(n-1)}$, and that the properties of $T_n$ guarantee that the above bound on $\delta$ is strictly positive)

Proof sketch: (pigeonhole argument) If all subgraphs of $C$ with $n-1$ vertices were sparse before the update, then the contributions of each vertex in $C$ to $dens^-(C)$ should be large. Hence, $C$ must be very dense. However, $C$ was sparse before the update. Thus, the update must have been very large. If the update is not very large, then there will exist a subgraph with $n-1$ vertices that was dense before the update.

Corollary: The $(n-1)$-vertex subgraph of $C$ that was dense before the update will either not contain one of $a$ or $b$ (so augmenting it with that vertex will yield $C$), or it will contain both $a$ and $b$. Consequently, for values of $n$ where Equation 1 holds, all newly-dense subgraphs of cardinality $n$ will be contained in $\mathcal{C}_A \cup \mathcal{C}_B \cup \mathcal{C}_{AB} = \mathcal{C}_1$.

4.2 Instantiating $T_n$

Based on the form of Equation 1, let us now propose a convenient instantiation for $T_n$, that will satisfy the requisite monotonicity properties, while greatly simplifying the bounds we subsequently derive, thus providing additional intuitions. Specifically, the instantiation of $T_n$ that will be used throughout this work is:

$$T_n = \frac{1}{g_n}\left[g_{N_{max}} T + \delta_{it}\left(\frac{n-2}{n-1} - \frac{N_{max}-2}{N_{max}-1}\right)\right] \qquad (2)$$

where $\delta_{it}$ is a tunable parameter. Note that this is a reasonable value for $T_n$ from a maintenance perspective; for instance, if $S_n = n$, then $T_n = (n-1)T_2 + (n-2)\delta_{it} = \frac{n-1}{N_{max}-1}(T + \delta_{it}) - \delta_{it} = O(n)$, while if $S_n = n(n-1)$, then $T_n = T_2 + \left(1 - \frac{1}{n-1}\right)\delta_{it} = T - \delta_{it}\left(\frac{1}{n-1} - \frac{1}{N_{max}-1}\right) = O(1)$.

Importantly, this instantiation results in a much simplified form of Equation 1, specifically $\delta \le \delta_{it}$. In the following, we will leverage this fact, to obtain a bound on the number of exploration iterations that DYNDENS needs to perform.

Moreover, for our proposed techniques to be meaningful, it must be the case that $T_n > 0 \;\; \forall n \in \{2, \ldots, N_{max}\}$. This, along with the above simplified form of Equation 1, leads to the following validity range for $\delta_{it}$: $\delta_{it} \in \left(0, \frac{S_{N_{max}} T}{N_{max}(N_{max}-2)}\right)$. The lower bound would correspond to maintaining the smallest possible number of subgraphs, and the upper bound to maintaining most subgraphs (specifically, all subgraphs of cardinality $N_{max}$, and most subgraphs of lower cardinalities) - realistically speaking, one should not set $\delta_{it}$ to any value close to its upper bound.

4.3 Bounding the Number of Iterations

We are now able to extend Equation 1 to cases where $\delta > \delta_{it}$. Specifically, we will show that all newly-dense subgraphs of cardinality $n$ are contained in $\mathcal{C}_0 \cup \mathcal{C}_1 \cup \cdots \cup \mathcal{C}_{\lceil \delta / \delta_{it} \rceil}$; thus, in order to compute all newly-dense subgraphs, it is sufficient to explore around stable-dense and newly-dense subgraphs contained in $\mathcal{C}_0 \cup \mathcal{C}_1 \cup \cdots \cup \mathcal{C}_{\lceil \delta / \delta_{it} \rceil}$.

Proof sketch: An update of magnitude $\delta$ is equivalent to $\lceil \delta / \delta_{it} \rceil$ updates of magnitude up to $\delta_{it}$; furthermore, re-exploring stable-dense subgraphs will not yield any new dense subgraphs, thus only newly-dense subgraphs will need to be explored subsequently.

Discussion: As witnessed from the above result, the magnitude of $\delta$ is directly correlated with the impact on dense subgraphs. A useful analogy is that of an edge weight update as a perturbation: the greater its magnitude, the further away in the graph its effects can be potentially felt (i.e. the further away dense subgraphs will need to be explored).

In this context, parameter $\delta_{it}$ offers a tunable space-time tradeoff. By setting $\delta_{it}$ to higher values, more dense subgraphs will be maintained, but fewer exploration iterations will be required per edge update. By setting $\delta_{it}$ to lower values, the space overhead (i.e. the number of dense subgraphs maintained that are not output-dense) can be made minimal: nearly 0 for AVGWEIGHT, and comparable to an offline approach otherwise⁸. Consequently, selecting a good value for $\delta_{it}$ is data-dependent; in practice, we observe that DYNDENS performs well for a wide range of $\delta_{it}$ values.

5. EVALUATION

Let us now discuss the experimental validation of our techniques. We will first briefly go over the experimental setup. In Section 5.1 we will present experimental evidence for the feasibility of real-time story identification via ENGAGEMENT, as well as the scalability of our proposed approach. We will also examine the main factors that contribute to the efficiency of DYNDENS. As we have seen throughout this work, there is a lack of existing techniques for efficiently addressing ENGAGEMENT. Nevertheless, in Section 5.2 we evaluate adaptations of relevant techniques to this problem, so as to have a basis for comparison. Finally, although efficiency has been our main focus in this work, in Section 5.3 we present some qualitative results that highlight the effectiveness of our approach.

Experimental setup: All algorithms evaluated were implemented in Java, and executed on a 64-bit Hotspot VM, on a machine with 8 Intel(R) Xeon(R) CPU E5540 cores clocked at 2.53GHz. In our experiments, only one core was used, and the memory usage of the JVM was capped at 25G of RAM (the actual memory consumption was typically lower). Finally, in all performance experiments, the time reported is the median time of 3 identical runs.

Datasets: Unless o
therwisenoted,allourexperimentswererunusingreal-worlddatasets,basedonasampleofalltweetsforMay1st,2011(Ourdatasetconsistedof13.8Mtweets.ThesamplingwasperformedbyTwitteritself,aspartoftherestrictedaccesspro-videdtoitsdatastream;fordetailscf.tinyurl.com/twsam).Fromthese,weremovednon-Englishtweets,andtweetsthatwerelabeledasspam(usinganin-housetweetspamlter[24]),resultingin3.8Mtweets.Subsequently,weusedanin-houseentityextrac-tor[3]toidentifymentionsofreal-worldentities(suchaspeople,politicians,products,etc).76.5%ofthetweetsdidnotmentionanyentityofinterest;18.3%mentionedone;4.3%mentionedtwo,andunder1%mentionedthreeormoreentities.Theentireproceduretookunder1h20'(under350secpertweetonaverage).Measuringcorrelation:Giventhesesetsofco-occurringenti-ties,therearemanywaysinwhichentityassociationcanbemea-sured;ourtechniquesareequallyapplicable,irrespectiveofthemeasureused.Forourevaluation,weselectedtwomeasuresfromtheliteraturethatwefoundtoyieldmeaningfulresultsunderdi-versecircumstances:acombinationofthe2measureandthecorrelationcoefcientinspiredby[5](weighteddataset),thathasbeenfoundtobehighlyeffectiveinidentifyingstoriesintheblogo-sphere,aswellasathresholdedvariantofthelog-likelihoodratio[26](unweighteddataset)thathasbeensuccessfullyusedtoiden-tifystoriesinGrapevineoveranextendedperiodoftime.Ingen-eral,wenotethatanymeasurethatmeasuresstrengthofpairwiseassociation,basedonentityoccurrencesandpairwiseco-occurrencescanequallybeusedbyourtechniques.Identifyingemergingstories:Sincethegoalofourtechniques 8Allexactofineapproaches,tothebestofourknowledge,utilizesomeformofagrowthproperty,henceneedtocomputeasmanysubgraphsasDYNDENSwithit'0 
is to identify stories in real-time, i.e. “stories happening now”, a mechanism for discounting older stories is required. To achieve this, we modify our measures of correlation, by applying exponential decay to all entity occurrences and co-occurrences; for instance, in our experiments we used a mean life for a tweet of 2 hours. Note that our techniques are equally applicable without applying any decay, but the stories identified would then correspond to “cumulative stories to date” (cf. Table 3 showing stories for the entire day) as opposed to “current emerging stories” (cf. online demo www.onthegrapevine.ca/now.jsp).

Approximating complex association measures: Finally, for many measures of association (e.g. statistical measures, such as the log-likelihood ratio), the appearance of a document with just a single entity can influence the weight of all edges in the graph (e.g. the log-likelihood ratio of a pair of entities is a function of the number of documents that have appeared to date). This would pose a significant challenge to incremental computations; to overcome it, we make use of the following approximation, that is applicable to any measure: the weight of an edge connecting entities $e_1, e_2$ is computed by ignoring all documents that have appeared after the latest time that either $e_1$ or $e_2$ appeared in some document. Intuitively, this will not significantly affect edges connecting popular entities; indeed we observed that in practice the resulting drop in precision entailed by this approximation was fairly low⁹. Importantly, this approximation enables us, after observing a document that mentions entities $e_1, \ldots, e_j$, to only update the weights of edges that are incident to at least one of these entities, i.e. only the weights of edges $\{(e_i, X) \mid i \in \{1, \ldots, j\}, X \in V\}$ will be updated.

⁹ Specifically, we measured the error entailed by this approximation, i.e. the absolute difference of the approximated value of each edge weight, minus the actual value of the correlation measure, for all edges, at 100 uniformly distributed time instants. The median error over all edges was invariably 0; the average absolute error over all edges and all time instants was 0.0003 for the weighted dataset, and 0.002 for the unweighted one, and the average relative error was 10% and 6% respectively.

Taking the above into account, the precise manner in which our experimental datasets were created is as follows. For every tweet where at least one entity was identified, entity occurrences and co-occurrences were updated (taking exponential decay into account, with a mean tweet life of two hours). Thereafter, in the case of the weighted dataset, the $\chi^2$ and correlation coefficient of salient entity pairs were updated; the updated edge weight was computed as max(correlation coefficient, 0) if $\chi^2$ showed significant correlation ($p < 5\%$), and 0 otherwise. This procedure resulted in 952K positive and 40.5M negative edge weight updates (recall that the latter are very cheap to process). In the case of the unweighted dataset, the log-likelihood ratio of salient entity pairs was updated. Two entities were connected with an edge iff each entity appeared in at least 5 tweets, and log-likelihood showed significant correlation ($p < 1\%$). This procedure resulted in 43K positive edge weight updates (edge additions), and 41K negative ones (edge removals). In either case, this step took under 90 seconds for the entire day.

The streams of edge weight updates were loaded to memory before initiating our experiments, and the updates were provided to DYNDENS sequentially, and in-memory. This reflects the expected usage of DYNDENS, as the edge weight updates that constitute its input will typically be generated by another process in real-time. All times reported correspond to the time required to process all edge weight updates resulting from a dataset, while maintaining output-dense subgraphs after each update. Specifically, they do not include the time required to preprocess the dataset (e.g. entity extraction, correlation computation), nor do they include the fixed initialization costs of DYNDENS (such as JVM initialization and initialization of necessary indexing structures). It is worth noting, however, that the throughput of DYNDENS can more than match the stream rate, even after factoring in all preprocessing steps (in total, the overhead for all preprocessing and execution of DYNDENS for our dataset of one day was generally under 90 minutes; moreover the most costly preprocessing steps, i.e. named entity extraction, are inherently parallelizable).

5.1 Efficiency and Scalability

Let us now examine some of our experimental findings. Figures 4(a)-4(d) show the time required to process all updates from either dataset, for a variety of definitions of density (experiments involving additional density functions can be found in [4]), and for a wide range of values of density threshold $T$ and maximum dense subgraph cardinality $N_{max}$. In these figures, $\delta_{it}$ has been set to 1% of its maximum value, given the values of the other parameters (thus the number of maintained dense subgraphs is typically close to the number of output-dense subgraphs). All runs were capped at 10 minutes (runs that took longer than that were terminated); all figures are cropped to exclude such time-outs¹⁰.

We observe that DYNDENS is able to very efficiently process large datasets, across a wide range of useful operating parameters, validating its applicability for efficiently addressing ENGAGEMENT. The chosen parameters range from instances with none, or only a few output-dense subgraphs, to instances with too many output-dense subgraphs (in the thousands); i.e. the extremal parameter values correspond to instances of less practical interest. Interestingly, one can observe a sharp increase in performance beyond certain values of parameters $T$ and $N_{max}$. This is due to the ensuing sharp drop in the average number of output-dense subgraphs. For instance, with reference to Figure 4(c), the average¹¹ number of output-dense subgraphs of cardinality at most 6, for $T = 1$ is 3.4K; for $T = 0.8$ it is 13.4K; while for $T = 0.7$ it is over 52K. Similar trends can be observed in the other figures as well; cf. [4].

¹⁰ The only data points that had terminated runs are outside the displayed range; these instances had too large a number of output-dense subgraphs, as a result of unrealistic values for $T$, $N_{max}$ and/or $\delta_{it}$, and were not expected to finish in a reasonable time.
¹¹ Averaged over all updates, and excluding output-dense subgraphs that are not represented in the index, e.g. most too-dense subgraphs, augmented with a non-neighboring node (cf. Section 3.2.3).

Having discussed the scalability and efficiency of DYNDENS, let us now turn to evaluating its inner workings. Firstly, let us examine the effects of the $\delta_{it}$ parameter. Recall that low values of $\delta_{it}$ correspond to DYNDENS materializing fewer dense subgraphs, and, correspondingly, having to perform potentially more explorations. In our experiments, we found our techniques to perform equally well for a wide range of values of $\delta_{it}$; however, selecting a value for $\delta_{it}$ based on characteristics of the dataset can be beneficial to performance. In Figure 4(e), we show the time taken by DYNDENS to process the unweighted dataset (note the semilog scale), for $N_{max} = 10$ and AVGWEIGHT, across all possible values for $\delta_{it}$ (shown normalized to its maximum value for each threshold). We observe an interesting local optimum w.r.t. $\delta_{it}$, arising from the tradeoff of having to materialize more subgraphs, while enabling faster updates; i.e. increasing $\delta_{it}$ improves performance, up to a point where the additional dense subgraphs that need to be maintained make this a performance drain. For instance, this point is around 0.2 for $T = 0.8$, around 0.1 for $T = 0.9$, and around 0.6 for $T = 1$. It is also interesting to note that this tradeoff comes into play again for $T = 1$ and high $\delta_{it}$.

Figure 4: Experimental evaluation. Panels: (a) AVGWEIGHT, weighted; (b) AVGDEGREE, weighted; (c) AVGWEIGHT, unweighted; (d) AVGDEGREE, unweighted; (e) Effects of $\delta_{it}$, unweighted; (f) Recall of GRASP, unweighted; (g) Performance of GRASP relative to DYNDENS, unweighted; (h) Effect of heuristics, synthetic.

As we previously saw in Section 3.2.3, IMPLICITTOODENSE is crucially important for DYNDENS to operate efficiently, in the presence of too-dense subgraphs. We validated this intuition experimentally, by executing a variant of DYNDENS that did not make use of IMPLICITTOODENSE, on the weighted dataset, and comparing its runtime to that of DYNDENS. We experimented with execution parameters $N_{max} \in \{9, 10\}$, $T \in [0.44, 0.5]$, and with $\delta_{it}$ between 1% and 50% of its maximum value, given the values of the other parameters. Invariably, the variant without IMPLICITTOODENSE took longer than 20 minutes to complete (and was killed after 20 minutes, in the interests of brevity), while DYNDENS took 40-85 seconds to complete.

5.2 Comparison with Other Techniques

As we have already discussed throughout this work, to the best of our knowledge, prior to DYNDENS, no techniques have been proposed for efficiently addressing ENGAGEMENT in its general form. Thus, in order to have a basis for comparison, in this section we evaluate adaptations of relevant techniques to subsets of ENGAGEMENT, namely the dynamic maximal clique algorithm proposed in [27] (STIX), the Greedy Randomized Adaptive Search Procedure used to identify large quasi-cliques in [1] (GRASP), as well as a baseline efficient offline procedure that periodically recomputes all AVGWEIGHT dense subgraphs (BASELINE). We wish to stress that, by their very nature, these comparisons are not fair, as the goals of the aforementioned techniques are entirely different from those of ENGAGEMENT, while said techniques are not as general as DYNDENS.

Let us review each comparison in detail. The STIX algorithm [27] identifies all maximal cliques in dynamic unweighted graphs. This is similar to ENGAGEMENT for $T = 1$, AVGWEIGHT and unweighted graphs, but subtly different, in that ENGAGEMENT requires the identification of all cliques. Recall that the output of ENGAGEMENT will be used to present stories to a human user, thus the subgraphs produced cannot be too large. If STIX were used to address ENGAGEMENT, and a maximal clique of cardinality e.g. 20 were identified, all its subgraphs of cardinality e.g. 5 or less would need to be enumerated, and provided as output.

Keeping in mind the caveats above, we implemented STIX using an efficient in-memory hash-based index¹², and executed it on the unweighted dataset, measuring its execution time, and ignoring the time that would be needed for enumerating all subgraphs of maximal cliques. We compared this runtime to DYNDENS with AVGWEIGHT, $T = 1$ (so as to have a basis for comparison), $N_{max} = 5$,¹³ and set $\delta_{it}$ to half its maximum value, given the values of the other parameters. Even though a comparison of STIX and DYNDENS is entirely artificial, the runtimes of STIX and DYNDENS were roughly equal: STIX took 958 seconds to process the dataset, compared to 936 sec for DYNDENS. DYNDENS performed even better for lower $N_{max}$, and took more time for higher $N_{max}$. Thus, we conclude that DYNDENS is best suited to applications of ENGAGEMENT, while STIX is preferable for applications that require identifying maximal cliques in unweighted subgraphs.

¹² [27] does not provide indexing details, so we opted for an efficient solution, albeit with high memory consumption. We also experimented with an adaptation of STIX that used our proposed index, which has much lower memory requirements, but this invariably resulted in increased runtime for STIX.
¹³ Since the goal is story identification, we set $N_{max}$ to a low value, corresponding to story cardinalities suitable for humans.

Table 3: Top stories, May 1st 2011

Pres. Obama announces killing of Osama bin Laden
    involving: Barack Obama, U.S. House Permanent Select Committee on Intelligence, Osama bin Laden, NBC News
Commentary on death of bin Laden, comparison to famous athletes
    involving¹⁴: Barack Obama, LeBron James, Delonte West, Osama bin Laden
Discussions on Lady Gaga's activities
    involving: Lady Gaga, Galeria
Libya crisis: NATO Airstrike results in death of 3 grandchildren of Gaddafi
    involving: NATO, Libya
Discussions on Harry Potter
    involving: Hermione Granger, Draco Malfoy, Bella Swan
News on Osama Bin Laden's Death Spreads On Twitter
    involving¹⁵: Clint Eastwood, Barack Obama, U.S. House Permanent Select Committee on Intelligence, Osama bin Laden, CBS News

Let us now review the comparison to GRASP, proposed in [1]. This is an approximate randomized algorithm for identifying large dense subgraphs in unweighted graphs. While [1] has significantly more general contributions, for the purposes of this discussion, the algorithm proposed therein can be used to identify subgraphs with density over a given threshold $T$, under AVGWEIGHT, in unweighted graphs. GRASP will not necessarily identify all dense subgraphs, but can be executed multiple times per update, to identify an increasingly larger number of such subgraphs. It is important to note that, again, the comparison with DYNDENS is not straightforward, as GRASP is geared towards identifying a few large dense subgraphs, as opposed to all dense subgraphs.

Nevertheless, we implemented GRASP, using an efficient hash-based in-memory index¹⁶. We set the parameter
$\alpha$ that controls its greediness vs. randomness tradeoff to 0.5, after ensuring this did not result in any significant performance differences¹⁷. We executed GRASP on the unweighted dataset, for a varying number of iterations per edge weight update (more iterations mean higher runtime, and a higher likelihood of identifying more dense subgraphs), and measured its runtime, and recall (fraction of output-dense subgraphs that it identified, excluding disconnected subgraphs, which it does not produce). We limited GRASP to searching for subgraphs of cardinalities up to $N_{max} = 5$, and normalized the runtime of GRASP to the runtime of DYNDENS for the same parameters¹⁸ (i.e. the normalized runtime of DYNDENS is 1). The normalized runtime of GRASP is reported in Figure 4(g), and its recall in Figure 4(f). As we can see, GRASP offers a runtime/recall tradeoff, and can thus be at times more efficient than DYNDENS (however, in such cases, it offers recall of under 80%). Moreover, GRASP offers diminishing returns w.r.t. recall (i.e. it takes increasingly many iterations to achieve arbitrarily high recall; even though the increase in runtime is linear w.r.t. the number of iterations, the increase in recall is decidedly sublinear). Thus, in this context, GRASP is best suited to identifying a sample of all dense subgraphs. However, since high recall is of crucial importance in story identification (missing 20% of important stories would not generally be acceptable), DYNDENS is best suited to addressing ENGAGEMENT in this setting.

Finally, we also investigated a simple baseline approach (BASELINE), which periodically recomputes all output-dense subgraphs w.r.t. AVGWEIGHT. The aim of this comparison was to validate the necessity for incremental computation as opposed to periodic offline recomputation. We implemented BASELINE using an efficient hash-based in-memory index, and executed it on our experimental datasets with varying parameters ($T$, $N_{max}$), and at varying uniform sampling intervals (i.e. every $X$ tweets). We measured
the number of recomputations that BASELINE was able to perform, given the same time as DYNDENS took for the entire dataset. Even given the above restricted problem setting, we observed that BASELINE was generally not up to the task of real-time story identification. In our weighted dataset, and for a wide range of parameters, it was able to perform up to 15-30 recomputations in the same time that DYNDENS processed the entire dataset (corresponding to identifying new stories every 48-96 minutes¹⁹). In the unweighted dataset (which had on average fewer edges, and was thus more amenable to reprocessing from scratch), BASELINE did somewhat better, performing 135-300 recomputations for the parameters we experimented with (corresponding to identifying new stories about every 5-10 minutes). More detailed results can be found in [4]. We conclude that, although periodic recomputation may be an option in limited scenarios (e.g. unweighted graphs, AVGWEIGHT, not very strict real-time requirements), in general the performance benefits of incremental recomputation are needed to support real-time story identification.

¹⁴ A Cleveland blogger compared Osama bin Laden to athlete LeBron James; the discussion continued on Twitter, resulting in a sports-related meme around the death of bin Laden.
¹⁵ C. Eastwood was mentioned in conjunction with this story as part of a humorous meme started by comedian Steve Martin on Twitter.
¹⁶ The index used in [1] is optimized for secondary storage, hence not very useful for the purposes of our comparison.
¹⁷ The average (over the values of all other parameters tested) standard deviation when varying $\alpha \in (0, 1)$ was 4%, and the median standard deviation was 1%.
¹⁸ For DYNDENS we selected a reasonable value of $\delta_{it}$, given the values of the rest of the parameters.
¹⁹ As our dataset corresponds to tweets made in one day.

5.3 Qualitative Results

Whereas the focus of this work is to efficiently identify dense subgraphs in an incremental manner, we also provide evidence of the effectiveness of our approach. Evaluating the quality of our results for real-time story identification is both inherently challenging, due to the lack of a ground truth for what constitutes an important story for a given medium (e.g. a micro-blogging site vs. a news agency), as well as beyond the scope of this work. We will thus present some sample results of utilizing dense subgraphs for story identification. We have also built a live demo for our techniques, which we will briefly discuss, and encourage interested readers to visit so as to view this work in action.

In order to present sample results, we chose to focus on stories at the granularity of a single day (since presenting stories that were heavily discussed at a specific date and time would be hard to process out of context). We used a dataset similar to the “unweighted” one from our performance experiments, with the following two modifications: entity correlations were computed over the entire dataset, as opposed to using exponential decay; and edge weights were retained for pairs of entities with log likelihood of over 5% significance, rather than being thresholded and restricted to $\{0, 1\}$. We computed dense subgraphs of cardinality up to $N_{max} = 5$, using AVGDEGREE to quantify density, so as to favor larger dense subgraphs; for presentation purposes these were subsequently re-ranked in a diversity-aware manner [2] (subgraph overlap was penalized by multiplying subgraph density by $1 - 0.8 \cdot (\text{fraction of story entities covered by previous stories})$).

Table 3 presents the resulting top stories. We observe that discussions on bin Laden's death feature prominently in the list; moreover, given the typical conversation tone on Twitter, distinct discussions involved comparing the presidential announcement to famous athletes¹⁴, and even the rapid propagation of the news on Twitter. Other stories cover the evolving crisis in Libya, as well as lighter, ongoing issues, such as Harry Potter, and Lady Gaga's antics.

For comparative purposes, we also performed the same procedure on a dataset consisting of all blog posts made on major blog hosting platforms during the same day; due to space constraints the results can be found in [4].

Finally, to validate the effectiveness of our approach, we have built a live demo of our techniques, in the context of Grapevine [3]. This prototype processes millions of blog posts on a daily basis, and computes important stories in real-time. It consists of a pipeline
that processes blog posts as they are crawled, rejecting spam and non-English language posts, extracts named entity mentions, updates the entity graph, and uses DYNDENS to update the set of current dense subgraphs, as in the “unweighted” dataset used in our experiments. It also keeps track of output-dense subgraphs, which are reported to the user upon request. Besides the entities involved in each output-dense subgraph/story, a few links to relevant blog posts are provided, as well as a link back to Grapevine for further exploration of the historical evolution of the story. Interested readers are encouraged to explore this prototype, available at www.onthegrapevine.ca/now.jsp.

6. HEURISTICS

In concluding our exposition of DYNDENS, let us also examine two additional heuristics that can offer modest performance improvements, without affecting the quality of results. Both are related to limiting the number of explorations, and cheap explorations performed. Due to space constraints, the full details for these, and proofs of their correctness, are omitted, and can be found in [4].

MAXEXPLORE: Whereas it serves to prove the correctness of DYNDENS, the previous bound on exploration iterations that need to be performed on a subgraph $C$ is overly pessimistic, as it is based on several worst-case assumptions. To overcome this challenge, we developed MAXEXPLORE, an improvement over the previous bound, that takes the graph neighborhood of the updated edge, as well as the cardinality of the subgraph being explored, into account. As it is a fairly cheap bound to compute, we can expect MAXEXPLORE to lead to performance improvements in the case of dense subgraphs on which multiple exploration iterations would have otherwise been performed.

DEGREEPRIORITIZE: Another challenge in the basic form of DYNDENS discussed so far is that a single graph might be explored multiple times, by exploration procedures originating from each of its dense subgraphs. To mitigate the adverse effects this can have on performance, we developed DEGREEPRIORITIZE, a way to organize the search space, and thus often avoid performing redundant explorations, inspired by the degree-based criterion proposed in [28]. At a high level, it guarantees that DYNDENS does not need to explore (or cheap-explore) a subgraph with vertices having dense connections to the subgraph. We thus expect DEGREEPRIORITIZE to offer the greatest benefit to performance in cases of dense subgraphs on which redundant, multiple-iteration explorations would have otherwise been performed.

Evaluation: In our evaluation of DYNDENS, the above heuristics were enabled. Thus, to evaluate their performance benefits, we also evaluated variants of DYNDENS where DEGREEPRIORITIZE and/or MAXEXPLORE were disabled, on both our weighted and unweighted datasets. We observed that these heuristics were responsible for very modest performance improvements of up to 4%, and sometimes even resulted in worse performance.

By design, we expect the proposed heuristics to offer performance benefits in cases where many explorations would have otherwise been performed in their absence. To validate this, and further investigate their potential to improve performance, we evaluated them on a synthetic dataset that consisted of near-cliques, mixed with random edges, that was generated as follows: In an initially empty graph with 100K vertices, 250K updates were generated, each of magnitude in $(0, 0.1]$ (with probability 0.3 the update was negative). With probability 0.9, the update occurred within one of 100 predefined sets of 10 vertices each; otherwise, it was uniformly randomly distributed to the remainder of the graph. Finally, in order to evaluate the proposed heuristics in the absence of too-dense subgraphs, updates that would result in too-dense subgraphs for $T = 0.7$ and $\delta_{it}$ at 40% of its maximum value were rejected.

Figure 4(h) shows the time taken by each DYNDENS variant (no heuristics enabled, only DEGREEPRIORITIZE enabled, only MAXEXPLORE enabled, both heuristics enabled), normalized by the time taken by the first variant; the operating parameters were $T = 0.7$, $N_{max} \in \{8, 9, 10\}$, and $\delta_{it}$ at 40% of its maximum value (note that the Y axis does not start at 0). The proposed heuristics are seen to offer performance improvements of more than 10% in the best case; thus, while not as crucial as IMPLICITTOODENSE to performance, we believe that the low effort required to implement these heuristics makes them worthwhile for inclusion in DYNDENS.

7. RELATED WORK

While we are not aware of any work that addresses the maintenance of dense subgraphs in weighted graphs, under streaming edge weight updates, for a broad definition of density, there exists a rich literature of works dealing with related problems.

[27] addresses incremental maximal clique maintenance, from a mostly theoretical perspective, and using a growth property. This is very closely related to a special case of ENGAGEMENT (namely, for unweighted graphs, AVGWEIGHT, and $T = 1$). An important difference is that our instantiation of ENGAGEMENT deals with all cliques, with cardinality constraints, as opposed to maximal cliques of unconstrained cardinality. As discussed in Section 5.2, while the former is better suited to real-time story identification, the latter may be preferable in other scenarios.

[28] addresses near-clique identification, in an offline setting, again from a mostly theoretical perspective, and using a growth property; this corresponds to the offline version of ENGAGEMENT for unweighted graphs, and AVGWEIGHT. The techniques proposed therein cannot be efficiently dynamized in a straightforward fashion, as the information they rely upon cannot be efficiently maintained across updates. Our DEGREEPRIORITIZE pruning condition is inspired by the parent degree-based criterion proposed in this work. [23] addresses the same problem, using a similar growth property, and with a focus on a parallel implementation. As with the other works, the techniques developed therein are not straightforward to efficiently dynamize.

Max (quasi-)clique: Related problems occur in the maximum clique [25] and quasi-clique literature. To overcome the intractability and inapproximability of this problem, heuristics (typically randomized) have been used to discover large (quasi-)cliques. A crucial difference is that ENGAGEMENT requires the enumeration of all dense subgraphs (as from an application perspective, each subgraph corresponds to a story of interest). In contrast, works in the maximum (quasi-)clique domain are geared towards identifying one “good” subgraph per execution iteration. Moreover, most such heuristic techniques are not straightforward to efficiently dynamize. Perhaps most closely related is the state-of-the-art Greedy Randomized Adaptive Search Procedure used in [1] to identify large dense subgraphs (quasi-cliques). Although this work is more focused towards developing techniques for limited main-memory scenarios, their techniques can be dynamized in an efficient manner to address ENGAGEMENT for unweighted graphs and AVGWEIGHT (cf. Section 5.2).

Local density: Other works have dealt with edge-weight update semantics, albeit with much simpler definitions of density. For instance [30] and others maintain dense subgraphs over sliding windows using neighbor-based patterns (i.e. whether a dense subgraph should be augmented with an additional node is decided based on local information only). As the problem being addressed therein is very different from ENGAGEMENT, the proposed techniques are inapplicable in the latter domain.

Max-flow: [12], [20] and others use (primarily) max-flow based algorithms to identify dense subgraphs. While max-flow algorithms can be dynamized [22], [18], these algorithms can only identify and maintain clusters containing user-specified nodes. In a related vein, [14] uses max-flow to find the top-1 dense subgraph (for AVGDEGREE); however their techniques cannot be efficiently applied to a top-k or threshold variant, nor can they be efficiently dynamized.

Dynamic graphs: Other works (e.g. [10], [6]) have dealt with dynamic graph algorithms under edge weight updates, but do not deal with density problems, focusing instead on properties such as planarity, connectivity, triangle counting, etc. A notable exception is [17], which discusses approximation algorithms to general maximization problems in dynamic graphs. It is, however, theoretical in nature, and its focus is on the approximation ratio of the resulting algorithm, not on efficiency.

Clustering: Related problems are also dealt with in the incremental clustering literature (e.g. [11], [15], [8]); however, these deal with graph node insertion and deletion, and the proposed techniques cannot directly accommodate streaming edge weight updates. A tangentially related problem is evolutionary clustering ([7], [21]) which identifies clusters based on both density and historical data; the goal is to introduce temporal smoothing, so that clusters behave in a stable fashion over time.

Communities of interest: [9], and its extension [19], address the problem of supporting efficient retrieval of important 2-neighbors of any node, where the importance of a neighbor is related to local and global edge thresholds. The focus is on better representation of actual interactions, and removal of spurious information, and the provided insights are invaluable for any applications that involve dynamic graphs. However, the problem examined in these works is substantially different from ENGAGEMENT, hence techniques proposed in these works do not apply in ENGAGEMENT.

Shingling: [13] proposes techniques to identify large dense subgraphs in an offline fashion via recursive shingling. While this could potentially be dynamized, it is geared towards large subgraphs (100-10K nodes), and would not be effective on smaller subgraphs. [29] also uses LSH to identify cliques of moderate size in large graphs; it is however not easily amenable to dynamization, as it has a significant preprocessing phase.

Data structures: Finally, the index structure used by DYNDENS resembles the FP-tree [16], in that both store overlapping subsets in a prefix tree, with inverted lists embedded into the tree structure. However, the FP-tree is optimized for static data, and assumes that tree nodes can be statically ordered in a way that heuristically decreases tree size; this makes it unsuitable for ENGAGEMENT, where tree nodes dynamically change. Moreover, other improvements of the FP-tree over a plain prefix tree are not applicable to ENGAGEMENT, as the problems solved are different.

8. CONCLUSIONS

Motivated by the need to mine important stories and events from the social media collective, as they emerge, in this work we examine the problem of maintaining dense subgraphs under streaming edge weight updates. For a broad definition of graph density, we propose the first efficient algorithm, DYNDENS, which is based on novel theoretical results regarding the magnitude of change that a single edge weight update can cause. DYNDENS is highly efficient, and able to gracefully scale to rapidly evolving datasets, and we validate the efficiency and effectiveness of our approach via a thorough evaluation on real and synthetic datasets.

Moreover, there are many exciting new directions stemming from this work. For example, an important problem in the social media space is the timely identification of online communities. While it is easy to see how ENGAGEMENT can be applied to this domain, its characteristics are somewhat different from those of real-time story identification (e.g. social graphs are frequently directed, communities are typically subgraphs of larger cardinality than stories, etc.), and it would be interesting to explore how to adapt DYNDENS to the diverse challenges this domain imposes. Another interesting technical problem arises when considering the need for adjusting the density threshold $T$ during execution, e.g. in order to adapt to changes in the dataset. We are actively exploring adapting the techniques used in DYNDENS to more efficiently perform this task.

9. REFERENCES

[1] J. Abello, M. G. C. Resende, and S. Sudarsky. Massive quasi-clique detection. In LATIN, pages 598–612, 2002.
[2] A. Angel and N. Koudas. Efficient diversity-aware search. In SIGMOD, pages 781–792, 2011.
[3] A. Angel, N. Koudas, N. Sarkas, and D. Srivastava. What's on the grapevine? In SIGMOD, pages 1047–1050, 2009.
[4] A. Angel, N. Koudas, N. Sarkas, and D. Srivastava. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. TR, 2011. Available at http://tinyurl.com/dyndens.
[5] N. Bansal, F. Chiang, N. Koudas, and F. W. Tompa. Seeking stable clusters in the blogosphere. In VLDB, pages 806–817, 2007.
[6] Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In SODA, pages 623–632, 2002.
[7] D. Chakrabarti, R. Kumar, and A. Tomkins. Evolutionary clustering. In KDD, pages 554–560, 2006.
[8] M. Charikar, C. Chekuri, T. Feder, and R. Motwani. Incremental clustering and dynamic information retrieval. In STOC, pages 626–635, 1997.
[9] C. Cortes, D. Pregibon, and C. Volinsky. Computational methods for dynamic graphs. JCGS, 12(4):950–970, 2003.
[10] D. Eppstein, Z. Galil, and G. F. Italiano. Dynamic graph algorithms. In Algorithms and Theory of Computation Handbook, chapter 8. 1999.
[11] M. Ester, H.-P. Kriegel, J. Sander, M. Wimmer, and X. Xu. Incremental clustering for mining in a data warehousing environment. In VLDB, pages 323–333, 1998.
[12] G. W. Flake, S. Lawrence, and C. L. Giles. Efficient identification of web communities. In KDD, pages 150–160, 2000.
[13] D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, pages 721–732, 2005.
[14] A. Goldberg. Finding a maximum density subgraph. Technical report, University of California at Berkeley, 1984.
[15] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. TKDE, 15(3):515–528, 2003.
[16] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD, pages 1–12, 2000.
[17] J. Hartline and A. Sharp. An incremental model for combinatorial maximization problems. In WEA, pages 36–48, 2006.
[18] J. Hartline and A. Sharp. Incremental flow. Networks, 50(1):77–85, 2007.
[19] S. Hill, D. K. Agarwal, R. Bell, and C. Volinsky. Building an effective representation for dynamic networks. Journal of Computational and Graphical Statistics, 15(3):584–608, 2006.
[20] S. Khuller and B. Saha. On finding dense subgraphs. In ICALP, pages 597–608, 2009.
[21] M.-S. Kim and J. Han. Chronicle: A two-stage density-based clustering algorithm for dynamic networks. In Discovery Science, pages 152–167, 2009.
[22] S. Kumar and P. Gupta. An incremental algorithm for the maximum flow problem. JMMA, 2(1):1–16, 2003.
[23] J. Long and C. Hartman. ODES: an overlapping dense sub-graph algorithm. Bioinformatics, 26(21):2788–2789, 2010.
[24] M. Mathioudakis and N. Koudas. Twittermonitor: trend detection over the twitter stream. In SIGMOD, pages 1155–1158, 2010.
[25] P. M. Pardalos and J. Xue. The maximum clique problem. Journal of Global Optimization, 4(3):301–328, 1994.
[26] N. Sarkas, A. Angel, N. Koudas, and D. Srivastava. Efficient identification of coupled entities in document collections. In ICDE, pages 769–772, 2010.
[27] V. Stix. Finding all maximal cliques in dynamic graphs. Computational Optimization and Applications, 27(2):173–186, 2004.
[28] T. Uno. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 56(1):3–16, 2010.
[29] N. Wang, S. Parthasarathy, K.-L. Tan, and A. K. H. Tung. Csv: visualizing and mining cohesive subgraphs. In SIGMOD, pages 445–458, 2008.
[30] D. Yang, E. A. Rundensteiner, and M. O. Ward. Neighbor-based pattern detection for windows over streaming data. In EDBT, pages 529–540, 2009.
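
Supplementary sketch: the instantiation of $T_n$ in Section 4.2 (Equation 2), the validity range for $\delta_{it}$, and the iteration bound $\lceil \delta / \delta_{it} \rceil$ of Section 4.3 can be checked numerically. The Python below is a minimal illustration under stated assumptions, not the authors' Java implementation: all function names (`g`, `dense_threshold`, `single_iteration_bound`, `max_delta_it`, `exploration_iterations`) are hypothetical, the normalization function $S_n$ is passed in as a callable returning an integer, and exact rational arithmetic is used so that the right-hand side of Equation 1 can be compared to $\delta_{it}$ without rounding error.

```python
import math
from fractions import Fraction


def g(n, S):
    """g_n = S_n / (n (n - 1)); S is a callable n -> S_n (assumed integer-valued)."""
    return Fraction(S(n), n * (n - 1))


def dense_threshold(n, T, delta_it, n_max, S):
    """T_n as instantiated in Equation 2, for 2 <= n <= n_max."""
    slack = Fraction(n - 2, n - 1) - Fraction(n_max - 2, n_max - 1)
    return (g(n_max, S) * T + delta_it * slack) / g(n, S)


def single_iteration_bound(n, T, delta_it, n_max, S):
    """Right-hand side of Equation 1 for cardinality n >= 3."""
    current = g(n, S) * dense_threshold(n, T, delta_it, n_max, S)
    previous = g(n - 1, S) * dense_threshold(n - 1, T, delta_it, n_max, S)
    return (n - 2) * (n - 1) * (current - previous)


def max_delta_it(T, n_max, S):
    """Upper end of the validity range for delta_it (needs n_max >= 3)."""
    return Fraction(S(n_max)) * T / (n_max * (n_max - 2))


def exploration_iterations(delta, delta_it):
    """ceil(delta / delta_it): the iteration bound of Section 4.3."""
    return math.ceil(Fraction(delta) / Fraction(delta_it))


if __name__ == "__main__":
    # S_n = n (n - 1) is one of the two examples discussed in Section 4.2.
    S = lambda n: n * (n - 1)
    T, n_max = Fraction(4, 5), 5
    # 1% of the maximum value, mirroring the setting used for Figures 4(a)-4(d).
    d_it = max_delta_it(T, n_max, S) / 100
    for n in range(2, n_max + 1):
        print(n, dense_threshold(n, T, d_it, n_max, S))
```

With this instantiation, `single_iteration_bound` evaluates to exactly `delta_it` for every cardinality, which is the simplification of Equation 1 that Section 4.2 relies on; `dense_threshold` reaches 0 at `n = 2` when `delta_it` equals `max_delta_it`, which is why the validity range is open at its upper end.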