The K-clique Densest Subgraph Problem

Charalampos E. Tsourakakis
Harvard School of Engineering and Applied Sciences
babis@seas.harvard.edu

ABSTRACT

Numerous graph mining applications rely on detecting subgraphs which are large near-cliques. Since formulations that are geared towards finding large near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem, the poly-time solvable densest subgraph problem which maximizes the average degree over all possible subgraphs "lies at the core of large scale data mining" [10]. However, frequently the densest subgraph problem fails in detecting large near-cliques in networks. In this work, we introduce the $k$-clique densest subgraph problem, $k \ge 2$. This generalizes the well studied densest subgraph problem which is obtained as a special case for $k = 2$. For $k = 3$ we obtain a novel formulation which we refer to as the triangle densest subgraph problem: given a graph $G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*) = \max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced by the set $S$.

On the theory side, we prove that for any constant $k$, there exists an exact polynomial time algorithm for the $k$-clique densest subgraph problem. Furthermore, we propose an efficient $\frac{1}{k}$-approximation algorithm which generalizes the greedy peeling algorithm of Asahiro and Charikar [8, 18] for $k = 2$. Finally, we show how to implement this peeling framework efficiently on MapReduce for any $k \ge 3$, generalizing the work of Bahmani, Kumar and Vassilvitskii for the case $k = 2$ [10]. On the empirical side, our two main findings are that (i) the triangle densest subgraph is consistently closer to being a large near-clique compared to the densest subgraph and (ii) the peeling approximation algorithms for both $k = 2$ and $k = 3$ achieve on real-world networks approximation ratios closer to 1 rather than the pessimistic $\frac{1}{k}$ guarantee. An interesting consequence of our work is that triangle counting, a well-studied computational problem in the context of social network analysis, can be used to detect large near-cliques. Finally, we evaluate our proposed method on a popular graph mining application.

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media.
WWW 2015, May 18-22, 2015, Florence, Italy.
ACM 978-1-4503-3469-3/15/05.
http://dx.doi.org/10.1145/2736277.2741128

Categories and Subject Descriptors
G.2.2 [Graph Theory]: Graph Algorithms; H.2.8 [Database Applications]: Data mining

General Terms
Theory, Experimentation

Keywords
Densest subgraph problem; Graph algorithms; Graph Mining; Near-clique extraction

1. INTRODUCTION

A wide variety of data mining applications relies on extracting dense subgraphs from large graphs. In bioinformatics dense subgraphs are used for detecting protein complexes in protein-protein interaction networks [9] and for finding regulatory motifs in DNA [24]. They are also used for detecting link spam in Web graphs [26], graph compression [17] and mining micro-blogging streams [6].

Among the various formulations for finding dense subgraphs, the densest subgraph problem (DS-Problem) stands out for the fact that it is solvable in polynomial time [28] and $\frac{1}{2}$-approximable in linear time [18, 40]. To state the DS-Problem we first introduce the necessary notation. In this work we focus on simple unweighted, undirected graphs. Given a graph $G = (V,E)$ and a subset of vertices $S \subseteq V$, let $G(S) = (S, E(S))$ be the subgraph induced by $S$, and let $e(S) = |E(S)|$ be the size of $E(S)$. Also, the edge density of the set $S$ is defined as $f_e(S) = e(S) / \binom{|S|}{2}$. The DS-Problem maximizes the degree density $\frac{e(S)}{|S|}$ over all subgraphs $S \subseteq V$. Notice that this is equivalent to maximizing the average degree [footnote 1]. The DS-Problem is a powerful primitive for many graph applications including social piggybacking [27], reachability and distance query indexing [19, 37].

Footnote 1: In graph theory the term edge density refers by default to $f_e = e(S) / \binom{|S|}{2} \in [0,1]$. However, since direct maximization of $f_e$ is not a meaningful problem (even a single edge achieves the maximum possible edge density), the DS-Problem maximizes the average degree. In the following, we refer to the average degree of a set as its degree density.

However, a large number of applications aims to find subgraphs which are large near-cliques rather than the subgraph that maximizes the average degree. Frequently the DS-Problem fails in finding large near-cliques because it favors large subgraphs with low edge density $f_e$. For this reason other formulations have been proposed, see Section 2. Unfortunately, these formulations are NP-hard and also inapproximable due to the connections with the Maximum Clique problem [32].

The main contribution of this work is a family of tractable formulations which attacks efficiently the problem of extracting large near-cliques and, in contrast to well-performing heuristics, comes with strong theoretical guarantees. In detail, our contributions are summarized as follows.

New formulation: We introduce the $k$-clique densest subgraph problem (k-Clique-DS-Problem) which generalizes the well-studied DS-Problem [18, 25, 28, 40]. The goal is to maximize the average number of $k$-cliques induced by a set $S \subseteq V$ over all possible vertex subsets. Notice that the DS-Problem is obtained as a special case for $k = 2$. We introduce also the special case obtained for $k = 3$ as the triangle densest subgraph problem (TDS-Problem).

Exact algorithms. We present two exact algorithms for the TDS-Problem. The algorithm which achieves the best running time is based on maximum flow computations and uses $O(n + t)$ space. It is worth outlining that Goldberg's network construction for the DS-Problem, which uses $O(n + m)$ space [25, 28], does not generalize to the TDS-Problem. The slower one is based on supermodular maximization and uses linear space $O(n + m)$. Here, $n, m, t$ are the number of vertices, edges and triangles in the input graph. Our algorithms can be modified to yield polynomial time algorithms for the k-Clique-DS-Problem when $k = \Theta(1)$.

Approximation algorithm. We propose a $\frac{1}{3}$-approximation algorithm for the TDS-Problem which runs asymptotically faster than any of the exact algorithms. We also propose a $\frac{1}{3+3\epsilon}$-approximation algorithm for any $\epsilon > 0$ which can be implemented efficiently in MapReduce. The algorithm requires $O(\log(n)/\epsilon)$ rounds and is MapReduce efficient [39] due to the existence of efficient MapReduce triangle counting algorithms, e.g., [48]. Our algorithms can be adapted to the k-Clique-DS-Problem for any $k = \Theta(1)$.

Experimental evaluation. We evaluate our exact and approximation algorithms on numerous real-world networks. Among our findings, we observe that the optimal triangle densest subgraph is consistently closer to being a large near-clique compared to the optimal densest subgraph. For instance, in the Football network (see Table 1 for a description of the dataset) the DS-Problem returns the whole graph as the densest subgraph, with $f_e = 0.094$, whereas the TDS-Problem returns a subgraph on 18 vertices with $f_e = 0.48$. We also observe that the peeling approximation algorithms for both $k = 2$ and $k = 3$ achieve on real-world networks approximation ratios closer to 1 rather than the pessimistic $\frac{1}{k}$ guarantee.

Graph mining application. We propose a modified version of the TDS-Problem, the constrained triangle densest subgraph problem (Constrained-TDS-Problem), which aims to maximize the triangle density subject to the constraint that the output should contain a specified set of vertices $Q$. We show how to solve the Constrained-TDS-Problem exactly. This variation is useful in various data-mining and bioinformatics tasks, see [49].

2. RELATED WORK

Since dense subgraph discovery constitutes a main research topic in graph analysis, a wide variety of related methods exists: heuristics [49, 52, 54], algorithmic contributions on NP-hard formulations [5, 12, 22, 40, 49] and poly-time solvable formulations [18, 40, 49]. We focus on the latter.

Densest Subgraph. In the densest subgraph problem we are given a graph $G$ and we wish to find the set $S \subseteq V$ which maximizes the average degree [28, 38]. The densest subgraph can be identified in polynomial time by solving a maximum flow problem [25, 28]. Charikar [18] proved that the greedy algorithm proposed by Asahiro et al. [8] produces a $\frac{1}{2}$-approximation of the densest subgraph in linear time. Asahiro et al. study the complexity of finding dense subgraphs by introducing a generalization of the DS-Problem and the maximum clique problem [7]. A $k$-core is a maximal connected subgraph of $G$ in which all vertices have degree at least $k$. It is worth remarking that the same algorithm provides a $k$-core decomposition of the graph and solves the problem of finding the degeneracy [11]. In the case of directed graphs, the densest subgraph problem is solved in polynomial time as well [18]. Khuller and Saha [40] provide a linear time $\frac{1}{2}$-approximation algorithm for the case of directed graphs among other contributions. Two interesting variations of the DkS problem were introduced by Andersen and Chellapilla [5]. The two problems ask for the set $S$ that maximizes the density subject to $|S| \le k$ (DamkS) and $|S| \ge k$ (DalkS). When restrictions on the size of $S$ are imposed the problem becomes NP-hard [40]. Finally, the densest subgraph problem has been considered in various settings, including MapReduce [10], the streaming setting [10], the dynamic setting [21, 45] and, recently, their combination [13].

Triangle Counting and Listing. The state of the art algorithm for exact triangle counting is due to Alon, Yuster and Zwick [4] and runs in $O(m^{2\omega/(\omega+1)})$ time, where currently the fast matrix multiplication exponent $\omega$ is 2.3729 [53]. Thus, their algorithm currently runs in $O(m^{1.4081})$ time. The best known listing algorithm until recently was due to Itai and Rodeh [33] which runs in $O(m\,\alpha(G))$ time, where $\alpha(G)$ is the graph arboricity. Since $\alpha(G) = O(\sqrt{m})$, the running time is always $O(m^{3/2})$. Recently, Björklund, Pagh, Williams and Zwick gave refined, output sensitive algorithms [14]. Finally, a wealth of approximate triangle counting methods exists [35, 41, 44, 51].

3. PROBLEM DEFINITION

We define the notion of average triangle density.

Definition 1 (Triangle Density). Let $G(V,E)$ be an undirected graph. For any $S \subseteq V$ we define its triangle density $\tau(S)$ as $\tau(S) = \frac{t(S)}{s}$, where $t(S)$ is the number of triangles induced by $S$ and $s = |S|$.

Notice that $3\tau(S)$ is the average number of (induced) triangles per vertex in $S$. The optimization problem we focus on follows.

Problem 1 (TDS-Problem). Given $G(V,E)$, find a subset of vertices $S^*$ such that $\tau(S^*) = \tau^*_G$ where $\tau^*_G = \max_{S \subseteq V} \tau(S)$.

It is clear that the DS-Problem and the TDS-Problem in general can result in radically different solutions. Consider for instance a graph $G$ on $2n+3$ vertices which is
the union of a triangle $K_3$ and of a bipartite clique $K_{n,n}$. The optimal solutions of the DS-Problem and the TDS-Problem are the bipartite clique and the triangle respectively. Therefore, the interesting question is whether maximizing the average degree and the triangle density give different results in real-world networks. Our results in Section 5 indicate that the answer is positive: compared to the densest subgraph, the triangle densest subgraph is smaller and exhibits a stronger near-clique structure.

4. PROPOSED METHOD

Section 4.1 provides two algorithms which solve the TDS-Problem exactly. Sections 4.2 and 4.3 provide two approximation algorithms for the TDS-Problem. Finally, Section 4.4 provides a generalization of the DS-Problem and the TDS-Problem to maximizing the average $k$-clique density and shows how the results from previous Sections adapt to this problem.

4.1 Exact Solutions

Let $n, m, t$ be the number of vertices, edges and triangles in graph $G$ respectively. The algorithm presented in Section 4.1.1 achieves the best running time. We present an algorithm which relies on the supermodularity property of our objective in Section 4.1.2. The latter algorithm, even if slower, requires $O(n + m)$ space, whereas the former requires $O(n + t)$ space. In real-world networks, typically $m \ll t$. Finally, it is worth mentioning that Charikar's linear program, see §2 in [18], can be extended to a linear program (LP) which solves the TDS-Problem, see [50] for the details.

4.1.1 An $O(m^{3/2} + nt + \min(n,t)^3)$-time exact solution

Algorithm 1 triangle-densest subgraph($G$)
 1: List the set of triangles $T(G)$, $t \leftarrow |T(G)|$
 2: $l \leftarrow \frac{t}{n}$, $u \leftarrow \frac{(n-1)(n-2)}{6}$
 3: $S^* \leftarrow \emptyset$
 4: while $u - l \ge \frac{1}{n(n-1)}$ do
 5:   $\alpha \leftarrow \frac{l+u}{2}$
 6:   $H_\alpha \leftarrow$ Construct-Network($G, \alpha, T(G)$)
 7:   $(S,T) \leftarrow$ minimum $st$-cut in $H_\alpha$
 8:   if $S = \{s\}$ then
 9:     $u \leftarrow \alpha$
10:   else
11:     $l \leftarrow \alpha$
12:     $S^* \leftarrow (S \setminus \{s\}) \cap V(G)$
13:   end if
14: end while
15: Return $S^*$

Algorithm 2 Construct-Network($G, \alpha, T(G)$)
 1: $V(H) \leftarrow \{s\} \cup V(G) \cup T(G) \cup \{t\}$.
 2: For each vertex $v \in V(G)$ add an arc of capacity 1 to each triangle $t_i$ it participates in.
 3: For each triangle $\Delta = (u,v,w) \in T(G)$ add arcs to $u, v, w$ of capacity 2.
 4: Add directed arc $(s,v) \in A(H)$ of capacity $t_v$ for each $v \in V(G)$.
 5: Add weighted directed arc $(v,t) \in A(H)$ of capacity $3\alpha$ for each $v \in V(G)$.
 6: Return network $H(V(H), A(H), w)$, $s, t \in V(H)$.

Our main theoretical result is the following theorem. Its proof is constructive.

Theorem 1. There exists an algorithm which solves the TDS-Problem and runs in $O(m^{3/2} + nt + \min(n,t)^3)$ time.

The first term $O(m^{3/2})$ comes from using the Itai-Rodeh algorithm [33] as our triangle listing black box. If we use the naive $O(n^3)$ triangle listing algorithm then the running time expression is simplified to $O(n^3 + nt)$. On the other hand, if we use the algorithms of Björklund et al. [14] the first term becomes $\tilde{O}(n^{\omega} + n^{3(\omega-1)/(5-\omega)} t^{2(3-\omega)/(5-\omega)})$ for dense graphs and $\tilde{O}(m^{2\omega/(\omega+1)} + m^{3(\omega-1)/(\omega+1)} t^{(3-\omega)/(\omega+1)})$ for sparse graphs, where $\omega$ is the matrix multiplication exponent. Currently $\omega < 2.3729$ due to [53]. We maintain [33] as our black box to keep the expressions simpler. However, the reader should keep in mind that the result presented in [14] improves the total running time of the first term.

We work our way to proving Theorem 1 by first proving the next key lemma. Then, we remove the logarithmic factor.

Lemma 1. Algorithm 1 solves the TDS-Problem and runs in $O(m^{3/2} + (nt + \min(n,t)^3)\log(n))$ time.

Algorithm 1 uses maximum flow computations to solve the TDS-Problem. It is worth outlining that Goldberg's maximum flow algorithm [28] for the DS-Problem is based on a network construction that does not adapt to the case of the TDS-Problem. Algorithm 1 returns an optimal subgraph $S^*$, i.e., $\tau(S^*) = \tau^*_G$. The algorithm performs a binary search on the triangle density value $\alpha$. Specifically, each binary search query corresponds to the question: does there exist a set $S \subseteq V$ such that $t(S)/|S| > \alpha$? For each binary search query, we construct a bipartite network $H_\alpha$ by invoking Algorithm 2. Let $T(G)$ be the set of triangles in $G$. The vertex set of $H_\alpha$ is $V(H_\alpha) = \{s\} \cup A \cup B \cup \{t\}$, where $A = V(G)$ and $B = T(G)$. Notice that we overload the notation in order to use the frequently used notation for the sink vertex $t$. It should always be clear from the context to which entity (number of triangles vs. sink vertex) we refer. For the purpose of finding $T(G)$, a triangle listing algorithm is required [14, 33].

The arc set of graph $H_\alpha$ is created as follows. For each vertex $r \in B$ corresponding to triangle $\Delta(u,v,w)$ we add three incoming and three outgoing arcs. The incoming arcs come from the vertices $u, v, w \in A$ which form triangle $\Delta(u,v,w)$. Each of these arcs has capacity equal to 1. The outgoing arcs go to the same set of vertices $u, v, w$, but their capacities are equal to 2. In addition to the arcs of capacity 1 from each vertex $v \in A$ to the triangles it participates in, we add an outgoing arc of capacity $3\alpha$ from $v$ to the sink vertex $t$. From the source vertex $s$ we add an outgoing arc to each $v \in A$ of capacity $t_v$, where $t_v$ is the number of triangles vertex $v$ participates in $G$. As we have already noticed, $H_\alpha$ can be constructed in $O(m^{3/2})$ time [33]. It is worth outlining that after computing $H_\alpha$ for the first time, subsequent networks need to update only the arcs that depend on the parameter $\alpha$, something not shown in the pseudocode for simplicity.

To prove that Algorithm 1 solves the TDS-Problem and runs in $O(m^{3/2} + (nt + \min(n,t)^3)\log(n))$ time we proceed in steps. For the sake of the proof, we introduce the following definitions and notation. For a given set of vertices $S$ let $t_i(S)$ be the number of triangles that involve exactly $i$ vertices from $S$, $i \in \{1,2,3\}$. Notice that $t_3(S)$ is the number of triangles induced by $S$, for which we have been using the simpler notation $t(S)$ so far. We use the following claim as our criterion to set the initial values $l, u$ in the binary search.

Claim 1. $\frac{t}{n} \le \tau^*_G \le \frac{(n-1)(n-2)}{6}$; more precisely, $\tau(S) \le \frac{(n-1)(n-2)}{6}$ for any nonempty $S \subseteq V$.

The lower bound is obvious since $\tau(V) = \frac{t}{n}$. The upper bound also follows trivially by observing that $\tau(S) \le \binom{n}{3}/n$ for any $\emptyset \ne S \subseteq V$. This suggests that the optimal value is always $O(n^2)$. The next claim serves as a criterion to decide when to stop the binary search.

Claim 2. The difference between two distinct values $\tau(S_1), \tau(S_2)$ is at least $\frac{1}{n(n-1)}$ in absolute value.

To see why, notice that the difference $\delta$ between two different triangle density values is

$$\delta = \frac{t(S_1)|S_2| - t(S_2)|S_1|}{|S_1||S_2|}.$$

If $|S_1| = |S_2|$ then $|\delta| \ge \frac{1}{n} \ge \frac{1}{n(n-1)}$, otherwise $|\delta| \ge \frac{1}{|S_1||S_2|} \ge \frac{1}{n(n-1)}$. Notice that combining the above two claims shows that the binary search terminates in at most $\lceil 4\log n \rceil$ queries. The following lemma is a structural lemma for the optimal $st$-cut in the network $H_\alpha$.

Lemma 2. Consider any minimum $st$-cut $(S,T)$ in the network $H_\alpha$. Let $A_1 = S \cap A$, $B_1 = S \cap B$ and $A_2 = T \cap A$, $B_2 = T \cap B$. The cost of the min-cut is equal to

$$\sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|.$$

Proof. Case I: $A_1 = \emptyset$. In this case the proposition trivially holds, as the cost is equal to $\sum_{v \in A} t_v = 3t$. It is worth noticing that in this case $B_1$ has to be empty as well, otherwise we contradict the optimality of $(S,T)$. Hence $S = \{s\}$, $T = A \cup B \cup \{t\}$.

Case II: $A_1 \ne \emptyset$. Consider the cost of the arcs from $A_1 \cup B_1$ to $A_2 \cup B_2$. We consider three different subcases, one per type of triangle with respect to set $A_1$.

Type 3: If there exist three vertices $u, v, w \in A_1$ that form a triangle $\Delta(u,v,w)$, then the vertex $r \in B$ corresponding to this specific triangle has to be in $B_1$. If not, then $r \in B_2$, and we could reduce the cost of the min-cut by 3 by moving the triangle to $B_1$. Therefore the cost we pay for triangles of type three is 0.

Type 2: Consider three vertices $u, v, w$ such that they form a triangle $\Delta(u,v,w)$ and $u, v \in A_1$, $w \in A_2$. Then, the vertex $r \in B$ corresponding to this triangle can be either in $B_1$ or $B_2$. In both cases we always pay 2 in the cut for each triangle of type two.

Type 1: Finally, in the case where $u, v, w$ form a triangle with $u \in A_1$ and $v, w \in A_2$, the vertex $r \in B$ corresponding to triangle $\Delta(u,v,w)$ will be in $B_2$. If not, then it lies in $B_1$ and we could decrease the cost of the cut by 3 by moving it to $B_2$. Hence, we pay 1 in the cut for each triangle of type one.

Therefore the cost due to the various types of triangles with respect to $A_1$ is equal to $2t_2(A_1) + t_1(A_1)$. Furthermore, the cost of the arcs from the source $s$ to $T$ is equal to $\sum_{v \in A_2} t_v = \sum_{v \notin A_1} t_v$. The cost of the arcs from $A_1$ to the sink $t$ is equal to $3\alpha|A_1|$. Summing up the individual cost terms, we obtain that the total cost is equal to $\sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|$.

The next lemma proves the correctness of the binary search in Algorithm 1.

Lemma 3. (a) If there exists a set $W \subseteq V(G)$ such that $t_3(W) > \alpha|W|$ then any minimum $st$-cut $(S,T)$ in $H_\alpha$ satisfies $S \setminus \{s\} \ne \emptyset$. (b) Furthermore, if there does not exist a set $W$ such that $t_3(W) > \alpha|W|$ then the cut $(\{s\}, A \cup B \cup \{t\})$ is a minimum $st$-cut.

Proof. (a) Let $W \subseteq V$ be such that

$$t_3(W) > \alpha|W|. \quad (1)$$

Suppose for the sake of contradiction that the minimum $st$-cut is achieved by $(\{s\}, A \cup B \cup \{t\})$. In this case the cost of the minimum $st$-cut is $\sum_{v \in A} t_v = 3t$. Now, consider the following $(S,T)$ cut. Set $S$ consists of the source vertex $s$, $A_1 = W$ and $B_1$, the set of triangles of type 3 and 2 induced by $A_1$. Let $T$ be the rest of the vertices in $H_\alpha$. The cost of this cut is

$$cap(S,T) = \sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|.$$

Therefore, by our assumption that the minimum $st$-cut is achieved by $(\{s\}, A \cup B \cup \{t\})$ we obtain

$$3t \le \sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|. \quad (2)$$

Now, notice that by double counting

$$\sum_{v \in A_1} t_v = 3t_3(A_1) + 2t_2(A_1) + t_1(A_1).$$

Furthermore, we observe

$$\sum_{v \in A_1} t_v + \sum_{v \notin A_1} t_v = 3t.$$

By combining these two facts, and the fact that $3t$ is the capacity of the minimum cut, we obtain the following contradiction of Inequality (1):

$$3t \le \sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1| \Leftrightarrow t_3(W) \le \alpha|W|.$$

(b) By Lemma 2, for any minimum $st$-cut $(S,T)$ the capacity of the cut is equal to $\sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|$, where $A_1 = A \cap S$, $A_2 = A \cap T$. Suppose for the sake of contradiction that the cut $(\{s\}, A \cup B \cup \{t\})$ is not a minimum cut. Therefore,

$$cap(\{s\}, A \cup B \cup \{t\}) = 3t > \sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|.$$

Using the same algebraic analysis as in (a), the above statement implies the contradiction $t_3(W) > \alpha|W|$, where $W = A_1$.

Now we can complete the proof of Lemma 1.

Proof. The termination of Algorithm 1 follows directly from Claims 1, 2. The correctness follows from Lemmata 2, 3. The running time follows from Claims 1, 2, which show that the number of binary search queries is $O(\log(n))$, and from the fact that each binary search query can be performed in $O(nt + \min(n,t)^3)$ time using the algorithm due to Ahuja, Orlin, Stein and Tarjan [3] [footnote 2] or Gusfield's algorithm [31].
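The binary search of Algorithm 1 and the network of Algorithm 2 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation evaluated in the paper: the graph is assumed to be a dict mapping integer vertex labels to neighbour sets, triangles are listed naively in $O(n^3)$ time, and a plain Edmonds-Karp routine stands in for the parametric maximum flow algorithm of [3]. The function names and the `("v", ...)` / `("tri", ...)` node tags are hypothetical. Capacities are exact `Fraction` values, so the stopping rule of Claim 2 is not affected by rounding, and $S^*$ is initialised to $V$ (density $t/n$, the initial lower bound) rather than to the empty set.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations

SOURCE, SINK = ("s", None), ("t", None)

def list_triangles(adj):
    """Naive O(n^3) listing; adj maps each vertex to its set of neighbours."""
    return [(u, v, w) for u, v, w in combinations(sorted(adj), 3)
            if v in adj[u] and w in adj[u] and w in adj[v]]

def build_network(adj, triangles, alpha):
    """Algorithm 2: residual capacities of H_alpha as a dict of dicts."""
    cap = {}
    def add_arc(a, b, c):
        cap.setdefault(a, {})
        cap.setdefault(b, {})
        cap[a][b] = cap[a].get(b, 0) + c
        cap[b].setdefault(a, 0)                 # reverse residual arc
    t_v = {v: 0 for v in adj}
    for tri in triangles:
        for v in tri:
            t_v[v] += 1
            add_arc(("v", v), ("tri", tri), 1)  # vertex -> triangle, capacity 1
            add_arc(("tri", tri), ("v", v), 2)  # triangle -> vertex, capacity 2
    for v in adj:
        add_arc(SOURCE, ("v", v), t_v[v])       # source -> vertex, capacity t_v
        add_arc(("v", v), SINK, 3 * alpha)      # vertex -> sink, capacity 3*alpha
    return cap

def source_side_of_min_cut(cap):
    """Edmonds-Karp max flow; returns nodes reachable from the source at the end."""
    while True:
        parent = {SOURCE: None}
        queue = deque([SOURCE])
        while queue and SINK not in parent:     # BFS for a shortest augmenting path
            a = queue.popleft()
            for b, c in cap[a].items():
                if c > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if SINK not in parent:
            return set(parent)
        path, node = [], SINK
        while parent[node] is not None:
            path.append((parent[node], node))
            node = parent[node]
        push = min(cap[a][b] for a, b in path)
        for a, b in path:
            cap[a][b] -= push
            cap[b][a] += push

def triangle_densest_subgraph(adj):
    """Algorithm 1: binary search on alpha with one min-cut query per step."""
    n, triangles = len(adj), list_triangles(adj)
    best = set(adj)                             # density t/n, the initial lower bound
    if n < 2 or not triangles:
        return best
    lo, hi = Fraction(len(triangles), n), Fraction((n - 1) * (n - 2), 6)
    while hi - lo >= Fraction(1, n * (n - 1)):
        alpha = (lo + hi) / 2
        side = source_side_of_min_cut(build_network(adj, triangles, alpha))
        inside = {label for kind, label in side if kind == "v"}
        if inside:                              # some set has density > alpha
            lo, best = alpha, inside
        else:                                   # ({s}, rest) is a minimum cut
            hi = alpha
    return best
```

On a $K_4$ joined by one edge to a triangle, the sketch returns the four vertices of the $K_4$ (triangle density 1), whereas the whole graph only has density $5/7$.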
The proof of Theorem 1 follows from Lemma 1 and the fact that the parametric maximum flow algorithm of Ahuja, Orlin, Stein and Tarjan [3], see also [25], shaves the logarithmic factor from the running time.

Footnote 2: Notice that the network $H_\alpha$ has $O(n + t)$ arcs, therefore the running time of [3] is $O(\min(n,t)(n+t) + \min(n,t)^3) = O(nt + \min(n,t)^3)$.

4.1.2 An $O((n^5 m^{1.4081} + n^6)\log(n))$-time exact solution

In this Section we provide a second exact algorithm for the TDS-Problem. First, we provide the necessary theoretical background.

Definition 2 (Supermodular function). Let $V$ be a finite set. The set function $f : 2^V \to \mathbb{R}$ is supermodular if and only if for all $A, B \subseteq V$

$$f(A \cup B) \ge f(A) + f(B) - f(A \cap B).$$

A function $f$ is supermodular if and only if $-f$ is submodular.

Sub- and supermodular functions constitute an important class of functions with various special properties. In this work, we are primarily interested in the fact that maximizing a supermodular function is solvable in strongly polynomial time [30, 34, 42, 46]. For our purposes, we state the following result which we use as a subroutine in our proposed algorithm.

Theorem 2 ([43]). There exists an algorithm for maximizing an integer valued supermodular function $f$ which runs in $O(n^5 EO + n^6)$ time, where $n = |V|$ is the size of the ground set $V$ and $EO$ is the maximum amount of time to evaluate $f(S)$ for a subset $S \subseteq V$.

We show in the following that when the ground set is the set of vertices $V$ and $f_\alpha : 2^V \to \mathbb{R}$ is defined by $f_\alpha(S) = t(S) - \alpha|S|$ where $\alpha \in \mathbb{R}^+$, we can solve the TDS-Problem in polynomial time.

Theorem 3. Function $f : 2^V \to \mathbb{R}$ where $f(S) = t(S) - \alpha|S|$ is supermodular.

Proof. Let $A, B \subseteq V$. Let $t : 2^V \to \mathbb{R}$ be the function which for each set of vertices $S$ returns the number of induced triangles $t(S)$. By careful counting

$$t(A \cup B) = t(A) + t(B) - t(A \cap B) + t_1(A : B \setminus A) + t_2(A : B \setminus A),$$

where $t_1(A : B \setminus A)$, $t_2(A : B \setminus A)$ are the number of triangles with one, two vertices in $A$ and two, one vertices in $B \setminus A$ respectively. Hence, for any $A, B \subseteq V$

$$t(A \cup B) + t(A \cap B) \ge t(A) + t(B)$$

and the function $t$ is supermodular. Furthermore, for any $\alpha > 0$ the function $-\alpha|S|$ is modular and hence supermodular. Since the sum of two supermodular functions is supermodular, the result follows.
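Theorem 3 can be sanity-checked by brute force on a small graph. The sketch below is only an illustration, not part of the paper: it assumes a dict-of-sets adjacency structure with integer labels, the helper names are hypothetical, and it enumerates all pairs of subsets, so it is only usable for a handful of vertices. It tests the inequality of Definition 2 directly for $f_\alpha(S) = t(S) - \alpha|S|$, using exact `Fraction` arithmetic.

```python
from fractions import Fraction
from itertools import combinations

def count_triangles(adj, S):
    """Number of triangles induced by the vertex set S."""
    return sum(1 for u, v, w in combinations(sorted(S), 3)
               if v in adj[u] and w in adj[u] and w in adj[v])

def is_supermodular(adj, alpha):
    """Check f(A | B) + f(A & B) >= f(A) + f(B) for all A, B, with f = t - alpha*|S|."""
    vertices = sorted(adj)
    subsets = [frozenset(c) for r in range(len(vertices) + 1)
               for c in combinations(vertices, r)]
    f = {S: count_triangles(adj, S) - alpha * len(S) for S in subsets}
    return all(f[A | B] + f[A & B] >= f[A] + f[B]
               for A in subsets for B in subsets)
```

Calling `is_supermodular(adj, Fraction(3, 4))` on any small graph is expected to return `True`; the value of $\alpha$ does not matter for the check because the term $-\alpha|S|$ cancels on both sides.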
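Theorem 3 naturally suggests Algorithm 3. The algorithm will run in a logarithmic number of rounds. In each round we maximize function $f_\alpha$ using Orlin-Supermodular-Opt, which takes as input arguments the graph $G$ and the parameter $\alpha > 0$ [43]. We assume for simplicity that within the procedure Orlin-Supermodular-Opt function $f_\alpha$ is evaluated using an efficient exact triangle counting algorithm [4]. The algorithm of Alon, Yuster and Zwick [4] runs in $O(m^{2\omega/(\omega+1)})$ time where $\omega < 2.3729$ [53]. This suggests $EO = O(m^{1.4081})$. The overall running time of Algorithm 3 is $O((n^5 m^{1.4081} + n^6)\log(n))$ and the space usage is $O(n + m)$ rather than $O(n + t)$.

Algorithm 3 triangle-densest subgraph($G$)
 1: $l \leftarrow 0$, $u \leftarrow \frac{(n-1)(n-2)}{6}$, $S^* \leftarrow V$
 2: while $u - l \ge \frac{1}{n(n-1)}$ do
 3:   $\alpha \leftarrow \frac{l+u}{2}$
 4:   $(val, S) \leftarrow$ Orlin-Supermodular-Opt($G, \alpha$)
 5:   if $val < 0$ then
 6:     $u \leftarrow \alpha$
 7:   else
 8:     $l \leftarrow \alpha$
 9:     $S^* \leftarrow S$
10:   end if
11: end while
12: Return $S^*$

4.2 A $\frac{1}{3}$-approximation algorithm

In this Section we provide an algorithm for the TDS-Problem which provides a $\frac{1}{3}$-approximation. Our algorithm follows the peeling paradigm, see [8, 18, 40, 36]. Specifically, in each round it removes the vertex which participates in the smallest number of triangles, and at the end it returns the intermediate subgraph that achieves the largest triangle density. The pseudocode is shown in Algorithm 4.

Algorithm 4 ($G$)
 1: Count the number of triangles $t_v$ for each vertex $v \in V$
 2: $H_n \leftarrow G$
 3: for $i \leftarrow n$ to 2 do
 4:   Let $v$ be the vertex of $H_i$ with the minimum number of triangles
 5:   $H_{i-1} \leftarrow H_i \setminus v$
 6: end for
 7: Return the $H_j$ that achieves maximum triangle density among the $H_i$'s, $i = 1, \ldots, n$.

Theorem 4. Algorithm 4 is a $\frac{1}{3}$-approximation algorithm for the TDS-Problem.

Proof. Let $S^*$ be an optimal set. Let $v \in S^*$, $|S^*| = s$ and let $t_A(v)$ be the number of triangles induced by $A$ that $v$ participates in. Then,

$$\tau^*_G = \frac{t(S^*)}{s} \ge \frac{t(S^* \setminus \{v\})}{s-1} \Leftrightarrow t_{S^*}(v) \ge \tau^*_G,$$

since $t(S^* \setminus \{v\}) = t(S^*) - t_{S^*}(v)$. Consider the iteration before the algorithm removes the first vertex $v$ that belongs to $S^*$. Call the set of vertices at that point $W$. Clearly, $S^* \subseteq W$ and for each vertex $u \in W$ the following lower bound holds due to the greediness of Algorithm 4:

$$t_W(u) \ge t_W(v) \ge t_{S^*}(v) \ge \tau^*_G.$$

This provides a lower bound on the total number of triangles induced by $W$:

$$t(W) = \frac{1}{3}\sum_{u \in W} t_W(u) \ge \frac{1}{3}|W|\tau^*_G \Rightarrow \frac{t(W)}{|W|} \ge \frac{1}{3}\tau^*_G.$$

To complete the proof, notice that the algorithm returns a subgraph $S$ such that $\tau(S) \ge \tau(W) \ge \frac{1}{3}\tau^*_G$.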
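The peeling procedure of Algorithm 4 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the C++ implementation used in the experiments: the graph is a dict of neighbour sets with integer labels, ties are broken by the smallest label (an arbitrary choice), the minimum is found by a linear scan instead of a priority queue, and the function name `peel_triangles` is hypothetical. When a vertex is removed, the triangle counts of its neighbours are updated by inspecting pairs of its remaining neighbours.

```python
from itertools import combinations

def peel_triangles(adj):
    """Greedy peeling: repeatedly delete a vertex of minimum triangle count."""
    adj = {v: set(nb) for v, nb in adj.items()}       # work on a local copy
    tri = {v: sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
           for v in adj}                              # t_v for every vertex
    total = sum(tri.values()) // 3                    # each triangle counted 3 times
    remaining = set(adj)
    best, best_density = set(remaining), total / len(remaining)
    while len(remaining) > 1:
        v = min(remaining, key=lambda x: (tri[x], x))
        # every adjacent pair of neighbours of v closes a triangle that disappears
        for a, b in combinations(adj[v], 2):
            if b in adj[a]:
                tri[a] -= 1
                tri[b] -= 1
                total -= 1
        for a in adj[v]:
            adj[a].discard(v)
        remaining.remove(v)
        del adj[v]
        density = total / len(remaining)
        if density > best_density:
            best, best_density = set(remaining), density
    return best, best_density
```

On a $K_4$ attached by a single edge to a triangle, the sketch peels the three triangle vertices first and returns the $K_4$ with triangle density 1.0, which happens to be optimal for this instance.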
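The key difference compared to the DS-Problem peeling algorithm [18] is that when we remove a vertex, the counts of its neighbors may decrease by more than 1. Therefore, when vertex $v$ is removed, we update the counts of its neighbors in $O\left(\binom{deg(v)}{2}\right)$ time, by checking how many triangles each of its neighbors has after $v$ is removed. Notice that $O\left(\sum_v \binom{deg(v)}{2}\right) = O(mn)$.

4.3 MapReduce Implementation

The MapReduce framework [20] has become the de facto standard for processing large-scale datasets. In the following, we show how we can approximate the TDS-Problem efficiently in MapReduce. Before we describe the algorithm, we show that Algorithm 5 terminates for any $\epsilon > 0$ and provides a $\frac{1}{3+3\epsilon}$-approximation. The idea behind this algorithm is to peel vertices in batches [10, 29] rather than one by one.

Algorithm 5 ($G$, $\epsilon > 0$)
 1: $S_{out}, S \leftarrow V$
 2: while $S \ne \emptyset$ do
 3:   $A(S) \leftarrow \{i \in S : t_S(i) \le 3(1+\epsilon)\tau(S)\}$
 4:   $S \leftarrow S \setminus A(S)$
 5:   if $\tau(S) \ge \tau(S_{out})$ then
 6:     $S_{out} \leftarrow S$
 7:   end if
 8: end while
 9: Return $S_{out}$.

Lemma 4. For any $\epsilon > 0$, Algorithm 5 provides a $\frac{1}{3+3\epsilon}$-approximation to the TDS-Problem. Furthermore, it terminates in $O(\log_{1+\epsilon}(n))$ passes.

Proof. Let $S^*$ be an optimal solution to the TDS-Problem. As we proved in Theorem 4, for any $v \in S^*$ it is true that $t_{S^*}(v) \ge \tau^*_G$. Furthermore, in each round at least one vertex is removed. To see why, assume for the sake of contradiction that $A(S) = \emptyset$ for some $S \ne \emptyset$ during the execution of the algorithm. Then, we obtain the contradiction that $3|S|\tau(S) = \sum_{v \in S} t_S(v) > (3+3\epsilon)|S|\tau(S)$. Consider the round where the algorithm for the first time removes a vertex $v \in S^*$. Let $W$ be the corresponding set of vertices. Since $v \in A(W)$ is peeled off, we obtain an upper bound on its induced triangle count, namely $v \in A(W) \Rightarrow t_W(v) \le (3+3\epsilon)\tau(W)$. Since $S^* \subseteq W$, we obtain

$$(3+3\epsilon)\tau(W) \ge t_W(v) \ge t_{S^*}(v) \ge \tau(S^*),$$

which proves that Algorithm 5 is a $\frac{1}{3+3\epsilon}$-approximation to the TDS-Problem. To see why the algorithm terminates in a logarithmic number of rounds, notice that

$$3t(S) \ge \sum_{v \in S \setminus A(S)} t_S(v) > (3+3\epsilon)\left(|S| - |A(S)|\right)\frac{t(S)}{|S|} \Leftrightarrow |A(S)| > \frac{\epsilon}{1+\epsilon}|S| \Leftrightarrow |S \setminus A(S)| < \frac{1}{1+\epsilon}|S|.$$

Since $S$ decreases by a factor of $\frac{1}{1+\epsilon}$ in each round, the algorithm terminates in $O(\log_{1+\epsilon}(n)) = O\left(\frac{\log(n)}{\epsilon}\right)$ rounds.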
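A sequential simulation of the batch peeling of Algorithm 5 may help make the round structure concrete. The sketch below is only an illustration under assumptions: it runs on a single machine rather than on MapReduce, recounts the triangles of every surviving vertex from scratch in each round, and uses hypothetical names (`local_triangles`, `batch_peel`). The density of the current set is evaluated at the start of each round, which visits the same candidate sets as the pseudocode.

```python
from itertools import combinations

def local_triangles(adj, S):
    """t_S(v): number of triangles inside S that contain v, for every v in S."""
    return {v: sum(1 for a, b in combinations(adj[v] & S, 2) if b in adj[a])
            for v in S}

def batch_peel(adj, eps):
    """Remove, in each round, all vertices with t_S(v) <= 3(1+eps)*tau(S)."""
    S = set(adj)
    best, best_density, rounds = set(S), None, 0
    while S:
        t_S = local_triangles(adj, S)
        density = sum(t_S.values()) / (3 * len(S))    # tau(S) = t(S)/|S|
        if best_density is None or density >= best_density:
            best, best_density = set(S), density
        threshold = 3 * (1 + eps) * density
        S -= {v for v in S if t_S[v] <= threshold}    # peel the whole batch at once
        rounds += 1
    return best, best_density, rounds
```

With `eps = 0.1` on a $K_4$ attached by one edge to a triangle, the first round removes the three triangle vertices, the second round removes the $K_4$, and the sketch reports the $K_4$ with density 1.0 after 2 rounds.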
MapReduce Implementation: Now we are able to describe our algorithm in MapReduce. It uses any of the efficient algorithms of Suri and Vassilvitskii [48] as a subroutine to count triangles per vertex in each round. The removal of the vertices which participate in fewer triangles than the threshold is done in two rounds, as in [10]. For completeness, we describe the procedure here. The vertices of the set $S$ to be peeled off in each round are marked by adding a key-value pair $\langle v; \$ \rangle$ for each $v \in S$. Each edge $(u,v)$ is mapped to $\langle u; v \rangle$. The reducer for key $v$ receives all endpoints of the edges incident with $v$, and the symbol \$ in case the vertex is marked for deletion. In case the vertex is marked, the reduce task returns nothing, otherwise it copies its input. In the second round, we perform the same procedure with the only difference being that we map each edge $(u,v)$ to $\langle v; u \rangle$. Therefore, the edges which remain have both endpoints unmarked. The algorithm runs in $O(\log(n)/\epsilon)$ rounds, as it takes $O(\log(n)/\epsilon)$ peeling rounds, and in each peeling round a constant number of MapReduce rounds is needed to count triangles per vertex, mark vertices for deletion and remove the corresponding vertex set.

4.4 k-clique Densest Subgraph

We outline how our proposed methods can be adapted to the following generalization of the DS-Problem and the TDS-Problem.

Definition 3 ($k$-clique-densest subgraph). Let $G(V,E)$ be an undirected graph. For any $S \subseteq V$ we define its $k$-clique density $h_k(S)$, $k \ge 2$, as $h_k(S) = \frac{c_k(S)}{s}$, where $c_k(S)$ is the number of $k$-cliques induced by $S$ and $s = |S|$.

Problem 2 (k-Clique-DS-Problem). Given $G(V,E)$, find a subset of vertices $S^*$ such that $h_k(S^*) = h^*_k$ where $h^*_k = \max_{S \subseteq V} h_k(S)$.

As in the triangle densest subgraph problem, we create a network $H_\alpha$ parameterized by the value $\alpha$ on which we perform our binary search. The procedure is described in Algorithm 6. The set $C(G)$ is the set of $k$-cliques in $G$. We then invoke Algorithm 1, with the upper bound $u$ set to $n^k$. Following the analysis of Theorem 1, we see that the k-Clique-DS-Problem is solvable in polynomial time. For instance, using Gusfield's algorithm [31] or [3] in each binary search query we get an overall running time $O(n^k + (n|C(G)| + n^3)\log(n)) = O(n^{k+1}\log(n))$. Using the improved result due to Ahuja, Orlin, Stein and Tarjan for parametric max flows in unbalanced bipartite graphs [3], we save the logarithmic factor in the running time.
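To make Definition 3 and Problem 2 concrete, the $k$-clique density can be computed by brute force on tiny graphs. The sketch below is an illustration only and is exponential in the number of vertices; it is not one of the algorithms proposed in this paper. It assumes a dict-of-sets adjacency structure with integer labels, and the function names are hypothetical.

```python
from itertools import combinations

def count_k_cliques(adj, S, k):
    """c_k(S): number of k-cliques induced by the vertex set S."""
    return sum(1 for c in combinations(sorted(S), k)
               if all(b in adj[a] for a, b in combinations(c, 2)))

def brute_force_k_clique_densest(adj, k):
    """Enumerate every nonempty subset S and keep one maximizing h_k(S) = c_k(S)/|S|."""
    vertices = sorted(adj)
    best, best_density = None, -1.0
    for r in range(1, len(vertices) + 1):
        for S in combinations(vertices, r):
            density = count_k_cliques(adj, S, k) / r
            if density > best_density:
                best, best_density = set(S), density
    return best, best_density
```

On the disjoint union of a triangle and a bipartite clique $K_{3,3}$, the call with `k = 2` returns the six vertices of $K_{3,3}$ (density 1.5), while `k = 3` returns the triangle (density 1/3), mirroring the $K_3 \cup K_{n,n}$ example of Section 3.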
Algorithm 6 ($G, \alpha, C(G), k$)
 1: $V(H) \leftarrow \{s\} \cup V(G) \cup C(G) \cup \{t\}$.
 2: For each vertex $v \in V(G)$ add an arc of capacity 1 to each $k$-clique $c_i$ it participates in.
 3: For each $k$-clique $(u_{i_1}, \ldots, u_{i_k}) \in C(G)$ add arcs to $u_{i_1}, \ldots, u_{i_k}$ of capacity $k-1$.
 4: Add directed arc $(s,v) \in A(H)$ of capacity $c_v$ for each $v \in V(G)$.
 5: Add weighted directed arc $(v,t) \in A(H)$ of capacity $k\alpha$ for each $v \in V(G)$.
 6: Return network $H(V(H), A(H), w)$, $s, t \in V(H)$.

Furthermore, Algorithm 4 can also be modified, by removing in each round the vertex with the smallest number of $k$-cliques, to obtain Corollary 1, the analogue of Theorem 4.

Corollary 1. The algorithm which peels off in each round the vertex with the minimum number of $k$-cliques and returns the subgraph that achieves the largest $k$-clique density is a $\frac{1}{k}$-approximation algorithm for the k-Clique-DS-Problem.

Similarly, Algorithm 5 and the MapReduce implementation can be modified to solve the k-Clique-DS-Problem. We omit the details.

Corollary 2. The algorithm which peels off in each round the set of vertices that participate in fewer than $k(1+\epsilon)h_k(S)$ $k$-cliques, where $h_k(S)$ is the $k$-clique density in that round, terminates in $O(\log_{1+\epsilon}(n))$ rounds and provides a $\frac{1}{k(1+\epsilon)}$-approximation guarantee for the k-Clique-DS-Problem. Furthermore, using [23], we obtain an efficient MapReduce implementation.

We illustrate an example where choosing a larger $k$ value yields benefits. Let $G \sim G(n,p)$ be an Erdős-Rényi graph, where $p = p(n)$. Assume that we plant a clique $K$ of size $n^\gamma$ for some constant $\gamma > 0$. We wish to show a non-trivial range of $p = p(n)$ values such that the following conditions hold:

$$h_2(K) = \frac{|E(K)|}{|K|} = \frac{\binom{n^\gamma}{2}}{n^\gamma} \le \frac{p\binom{n}{2}}{n} \approx E[h_2(V)],$$

and for $k \ge 3$

$$h_k(K) = \frac{\binom{n^\gamma}{k}}{n^\gamma} \ge \frac{p^{\binom{k}{2}}\binom{n}{k}}{n} \approx E[h_k(V)].$$

By simple algebraic manipulation we see that $p$ satisfies both conditions if $\Omega(n^{-(1-\gamma)}) \le p \le O(n^{-\frac{2}{k}(1-\gamma)})$ [footnote 3]. Clearly, for larger $k$ values, we allow ourselves a larger range of $p$ values for which we can find the hidden clique in expectation.

Our preliminary experimental results for $k = 4$ indicate that the 4-clique-densest subgraph gets closer to a large near-clique compared to the triangle densest subgraph. However, the gain of moving from the densest subgraph to the triangle densest subgraph with respect to extracting large near-cliques is larger than the gain of moving from the triangle densest subgraph to the 4-clique-densest subgraph.
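The peeling algorithm of Corollary 1 can be sketched for a general $k$ as follows. This is a minimal illustration under assumptions, not an efficient implementation: the $k$-cliques are listed once by brute force over all $k$-subsets, per-vertex counts are recomputed in every round, ties are broken by the smallest vertex label, and the names `list_k_cliques` and `peel_k_cliques` are hypothetical. The graph is again a dict of neighbour sets with integer labels.

```python
from itertools import combinations

def list_k_cliques(adj, k):
    """All k-cliques of the graph, found by testing every k-subset of vertices."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(b in adj[a] for a, b in combinations(c, 2))]

def peel_k_cliques(adj, k):
    """Repeatedly delete a vertex lying in the fewest surviving k-cliques."""
    remaining = set(adj)
    cliques = list_k_cliques(adj, k)
    best, best_density = set(remaining), len(cliques) / len(remaining)
    while len(remaining) > 1:
        count = {v: 0 for v in remaining}
        for c in cliques:
            for v in c:
                count[v] += 1
        victim = min(remaining, key=lambda v: (count[v], v))
        remaining.remove(victim)
        cliques = [c for c in cliques if victim not in c]   # cliques destroyed by the removal
        density = len(cliques) / len(remaining)             # h_k of the surviving set
        if density > best_density:
            best, best_density = set(remaining), density
    return best, best_density
```

For a $K_5$ joined by one edge to a $K_4$, the call with `k = 4` peels the $K_4$ first and returns the five vertices of the $K_5$, whose 4-clique density is $5/5 = 1.0$.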
3Noticethatforthisrangeofp,thegraphisconnectedandthecliquenumberisconstantwithhighprobability[ 15 ]5.EXPERIMENTALEVALUATIONBeforewepresentourndingsindetail,wesummarizethem:(i)theTDS-Problemandtheproposedalgorithmsconstitutenewvaluablegraphminingprimitivesforndinglargenear-cliques,(ii)the1 3-approximationalgorithm(Al-gorithm4)achievessignicantlybetterapproximationsthanthepessimistic1 3guarantee.Alsoitissignicantlyfaster.(iii)TryingasmallrangeofvaluesforAlgorithm5isingeneralasaferstrategycomparedtorunningexperimentswithxedchoice.5.1ExperimentalSetupThedatasetsweuseareshowninTable 1 .Allgraphsweremadesimpleandundirectedbyignoringtheedgedi-rection,whenthegraphisdirected,andremovingself-loopsandmultipleedges,ifany.Theexperimentswereperformedonasinglemachine,withIntel(R)Core(TM)i5CPUat2.40GHz,with3.86GBofmainmemory.Wehaveimple-mentedAlgorithm1inMatlabR2011ausingamaximum\rowimplementationduetoKolmogorovandBoykov[ 16 ]asoursubroutinewhichrunsintimeO(t(nt)3).Thisimple-mentationisprohibitivelyexpensiveevenforsmallgraphswhichhavealargenumberoftriangles.WehavecodedthepeelingalgorithminC++usingpriorityqueues.Asourtrianglelistingalgorithm,weusethesimplenodeiter-atoralgorithmwhichchecksforeachvertexthenumberofedgesamongitsneighbors.Thecodeispubliclyavailableat http://people.seas.harvard.edu/~babis/code.html .Wemeasurethequalityofeachextractedsubgraphbytwomea-sures:theedgedensityoftheextractedsubgraphfee(S)= jSj2andtheoutputsizejSj.Noticethatwhenfeiscloseto1,theextractedsubgraphisclosetobeingaclique.5.2MainFindingsTable 2 showstheresultsobtainedonseveralpopularsmall-andmedium-sizedgraphs.Eachcolumncorrespondstoadataset.Therowscorrespondtomeasurementsforeachmethodweusetoextractasubgraph.Specically,therst(DS),second(1 2-DS),third(TDS)andfourth(1 3-TDS)rowscorrespondtothesubgraphextractedbyGold-berg'sexactalgorithm[ 28 ]fortheDS-Problem,Charikar's1 2-approximationalgorithm[ 18 
]fortheDS-Problem,Al-gorithm1andAlgorithm4fortheTDS-Problemrespec-tively.ForeachoptimalextractedsubgraphS,weshowitssizeasafractionofthetotalnumberofvertices,theedgedensityfe(S),theaveragedegree2(S)=2e(S)=jSjandtheaveragenumberoftrianglespervertex3(S)=3t(S)=jSj.Weobservethatthetriangle-densestsubgraphisclosertobeinganear-cliquecomparedtothedensestsubgraph.ApronouncedexampleistheFootballnetworkwheretheopti-maldensestsubgraphisthewholenetworkwithfe=0:0094,whereastheoptimaltriangle-densestsubgraphisasetof18verticeswithedgedensity0.48.Finally,weobservethatthequalityofAlgorithm's4outputisveryclosetotheoptimalsolutionandsometimesevenbetter.ThesameobservationholdsforthecaseofCharikar's1 2-approximationalgorithm[ 18 ].WeusetheC++implementationofAlgorithm4andaC++implementationofCharikar's1 2-approximationalgo-rithmontherestofthedatasetsofTable 1 .TheresultsareshowninTable 3 ,uptotwodecimaldigitsofaccuracy.Theca-HepThdatasetisanexception,astheoptimalsolutionscoincide.Wealsonoticethattheruntimesappearthesame Name Nodes Edges Description Adjnoun 112 425 Generatedbyprocessingtextdata AS-735 6475 12572 AutonomousSystems AS-caida 26475 53381 AutonomousSystems ca-Astro 17903 196972 Co-authorship ca-GrQC 4158 13422 Co-authorship ca-HepTh 11204 117619 Co-authorship Celegans 297 4296 NeuralnetworkofC.Elegans DBLP 53442 255936 Co-authorship Epinions 75877 405739 Socialnetwork Enron 33696 180811 Email EuAll 224832 339925 Email Football 115 613 NCAAfootballgamenetwork Karate 34 78 Socialnetwork Lesmis 77 254 Generatedbyprocessingtextdata Politicalblogs 1490 16715 Generatedbyprocessingsalesdata Politicalbooks 105 441 Blognetwork soc-Slashdot0811 77360 469180 PersontoPerson soc-Slashdot0902 82168 504230 PersontoPerson wb-cs-Stanford 8929 26320 WebGraph web-Google 855802 4291352 WebGraph web-NotreDame 325729 1090108 WebGraph Wiki-vote 7066 100736 Wikipedia\who-votes-whom" Table1:Datasetsusedinourexperiments. 
for ca-HepTh due to using two decimal digits of accuracy. On other datasets, we observed differences between the two solutions. For instance, for the collaboration network ca-Astro the densest subgraph is a subgraph with 1184 vertices and $f_e = 0.05$. The triangle-densest subgraph is a clique with 57 vertices. Overall, we verify the fact that the triangle-densest subgraph is closer to being a near-clique. Finally, the runtimes are shown in the T columns of Table 3. Notice that the runtimes reported for Algorithm 4 include both the triangle counting and the peeling phase.

Method    Measure        Adjnoun  Celegans  Football  Karate  Lesmis  Polblogs  Polbooks
DS        |S|/|V| (%)    42.86    45.8      100       47.1    29.9    19.1      51.4
          2e(S)/|S|      9.58     17.16     10.66     5.25    10.78   55.82     9.40
          f_e            0.20     0.13      0.094     0.35    0.49    0.196     0.18
          3t(S)/|S|      14       45.93     21.12     5.64    41.61   768.87    22.68
1/2-DS    |S|/|V| (%)    41.1     42.4      100       52.9    29.9    18.7      57.1
          2e(S)/|S|      9.57     17.1      10.66     5.2     10.78   55.8      9.3
          f_e            0.21     0.14      0.094     0.31    0.49    0.20      0.16
          3t(S)/|S|      14.16    46.5      21.12     5.16    41.61   774.6     22.68
TDS       |S|/|V| (%)    36.6     10.4      15.7      17.7    16.9    8.1       19.1
          2e(S)/|S|      9.37     13.81     8.22      4.67    10.62   55.72     9.34
          f_e            0.23     0.46      0.48      0.93    0.89    0.46      0.50
          3t(S)/|S|      15       56.82     28        8.01    47.31   972.36    25.95
1/3-TDS   |S|/|V| (%)    36.6     9.1       15.7      17.7    16.9    8.1       15.2
          2e(S)/|S|      9.37     13.56     8.22      4.67    10.62   55.72     9.13
          f_e            0.23     0.52      0.48      0.93    0.89    0.46      0.61
          3t(S)/|S|      15       56.55     28        8.01    47.31   972.36    25.5

Table 2: Comparison of the subgraphs extracted by Goldberg's exact algorithm for the DS-Problem (DS), Charikar's 1/2-approximation algorithm (1/2-DS), our exact algorithm for the TDS-Problem (TDS) and our 1/3-approximation algorithm (1/3-TDS). Here, $f_e(S) = e(S)/\binom{|S|}{2}$ is the edge density of the extracted subgraph, $2e(S)/|S|$ is the average degree and $3t(S)/|S|$ is the average number of triangles per vertex.

                     1/2-DS                  1/3-TDS
Dataset              |S|    f_e     T        |S|   f_e     T
AS-735               59     0.28    0.00     13    0.8     0.07
AS-caida             143    0.14    0.02     27    0.52    0.63
ca-Astro             1184   0.05    0.06     57    1       1.42
ca-GrQC              42     0.79    0.00     14    0.89    0.02
ca-HepTh             32     1       0.02     32    1       0.02
Epinions             999    0.121   0.08     431   0.256   8.75
Enron                555    0.14    0.02     390   0.19    2.01
EuAll                507    0.13    0.08     200   0.29    9.52
soc-Slashdot0811     207    0.41    0.13     253   0.49    6.85
soc-Slashdot0902     219    0.40    0.16     173   0.50    7.72
wb-cs-Stanford       84     0.64    0.48     26    0.80    0.67
web-Google           240    0.23    2.54     120   0.44    79.5
web-NotreDame        1367   0.115   0.50     457   0.345   16.3
Wiki-vote            846    0.11    0.00     464   0.80    0.19

Table 3: Comparison of the subgraphs extracted by the 1/2-approximation algorithm of Charikar and the 1/3-approximation algorithm, Algorithm 4. The respective runtimes T are shown in seconds.

5.3 Exploring parameter $\epsilon$ in Algorithm 5

In this Section we present the results of Algorithm 5 on the DBLP graph. This is a particularly interesting instance, as it indicates that instead of trying to select a single good $\epsilon$ value, it is worth trying out at least a few values, assuming computational resources are available. We range $\epsilon$ from 0.1 to 1.8 with a step of 0.1. Figure 1(a) plots the number of rounds Algorithm 5 takes to terminate as a function of $\epsilon$. We observe that even for small $\epsilon$ values the number of rounds is at most 6. The reader should compare this to the upper bound predicted by Lemma 4, which exceeds 100.

Figure 1: Exploring the trade-off between the number of rounds and accuracy as a function of the parameter $\epsilon$ for Algorithm 5. Let $S, S^*$ be the subgraphs extracted by Algorithms 5 and 1 respectively. (a) Number of rounds, (b) relative average triangle density ratio $\frac{t(S)/|S|}{t(S^*)/|S^*|}$ (blue) and the approximation guarantee $\frac{1}{3+3\epsilon}$ (red), and (c) relative ratios $\frac{f_e(S)}{f_e(S^*)}$, $\frac{f_t(S)}{f_t(S^*)}$ as functions of $\epsilon$.

Figure 1(b) plots the relative ratio $\frac{t(S)/|S|}{t(S^*)/|S^*|}$, where $S$ is the output of Algorithm 5. For convenience, the lower bound $\frac{1}{3+3\epsilon}$ is plotted in red. Besides the ratio $\frac{f_e(S)}{f_e(S^*)}$, Figure 1(c) also plots the relative ratio $\frac{f_t(S)}{f_t(S^*)}$ as a function of $\epsilon$. Here $f_t(S) = t(S)/\binom{|S|}{3}$. As we observe, the quality of Algorithm 5 is close to the optimal solution except for $\epsilon = 0.7$ and $\epsilon = 0.8$. By inspecting why this happens, we observe that the optimal triangle-densest subgraph is a clique of 44 vertices. It turns out that for $\epsilon = 0.7, 0.8$ the best subgraph, which is found in the last round of the execution of the algorithm (the latter happens for all $\epsilon$ values), consists of 98 and 74 vertices respectively, and contains the optimal $K_{44}$ as a subgraph. For other values of $\epsilon$, the subgraph in the last round is either the optimal $K_{44}$ or close to it, with a few extra vertices. This example shows the potential danger of using a single value for $\epsilon$, and suggests that trying out a small number of $\epsilon$ values can be significantly beneficial in terms of the approximation quality.

6. APPLICATION: ORGANIZING COCKTAIL PARTIES

A graph mining problem that comes up in various applications is the following: given a set of vertices $Q \subseteq V$, find a dense subgraph containing $Q$. We refer to this type of graph mining problems as cocktail problems, due to the following motivation, cf. [47]. Suppose that a set of people $Q$ wants to organize a cocktail party. How do they invite other people to the party so that all the participants, including $Q$, are as similar as possible? A variation of the TDS-Problem which addresses this graph mining problem follows.

Problem 3 (Constrained-TDS-Problem). Given a graph $G(V, E)$ and $Q \subseteq V$, find the subset of vertices $S^*$ that maximizes the triangle density subject to $Q \subseteq S^*$, i.e., $S^* = \arg\max_{Q \subseteq S \subseteq V} t(S)/|S|$.

The Constrained-TDS-Problem can be solved by modifying our proposed algorithms accordingly. A useful corollary follows.

Corollary 3. The Constrained-TDS-Problem is solvable in polynomial time by adding arcs from $s$ to $v \in A$ of large enough capacities, e.g., capacities equal to $n^3 + 1$ are sufficiently large. Furthermore, the peeling algorithm which avoids removing vertices from $Q$ is a 1/3-approximation algorithm for the Constrained-TDS-Problem.

In the following we evaluate the 1/3-approximation algorithm on two datasets. The two experiments indicate two different types of performance that should be expected in real-world applications. The first is a positive case, whereas the second is a negative one. Both experiments here serve as sanity checks.⁴
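The peeling variant mentioned in Corollary 3 can be illustrated with a minimal sketch. This is not the implementation used in the experiments: the function name `constrained_triangle_peel`, the tie-breaking rule and the quadratic-time selection of the next vertex are assumptions made for brevity. The sketch repeatedly removes the non-query vertex that participates in the fewest triangles and remembers the intermediate vertex set with the largest triangle density $t(S)/|S|$.

```python
from itertools import combinations


def constrained_triangle_peel(edges, query):
    """Greedy peeling for triangle density t(S)/|S| that never removes `query`.

    Illustrative sketch only; names and tie-breaking are hypothetical.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    query = set(query)
    for q in query:
        adj.setdefault(q, set())  # isolated query vertices still belong to S
    if not adj:
        return set(), 0.0

    # tri[v] = number of triangles of the current subgraph that contain v
    tri = {
        v: sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
        for v in adj
    }
    total = sum(tri.values()) // 3  # every triangle is counted at its 3 corners

    alive = set(adj)
    best_set, best_density = set(alive), total / len(alive)
    while alive - query:
        # Pick the removable vertex in the fewest triangles (ties by label).
        v = min(alive - query, key=lambda x: (tri[x], str(x)))
        for a, b in combinations(adj[v], 2):
            if b in adj[a]:  # triangle (v, a, b) disappears with v
                tri[a] -= 1
                tri[b] -= 1
        total -= tri[v]
        for u in adj[v]:
            adj[u].discard(v)
        alive.remove(v)
        if alive and total / len(alive) > best_density:
            best_set, best_density = set(alive), total / len(alive)
    return best_set, best_density


# Toy input: a 4-clique {0, 1, 2, 3} with a path 3-4-5 attached; vertex 5 is the query.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
best_set, best_density = constrained_triangle_peel(edges, query={5})
print(sorted(best_set), best_density)
```

On this toy input the sketch keeps the 4-clique together with the query vertex 5 (4 triangles on 5 vertices), so the returned subgraph is the union of a dense part and a query vertex that is not attached to it.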
Political vote data. We obtain Senate data for the first session (2006) of the 109th Congress, which spanned the period from January 3, 2005 to January 3, 2007, during the fifth and sixth years of George W. Bush's presidency [1]. In this Congress, there were 55, 45 and 1 Republican, Democratic and independent senators respectively. The dataset can be downloaded from the US Senate web page http://www.senate.gov. We preprocess the dataset in the following way: we add an edge between two senators if, among the bills for which they both cast a vote, they voted the same way at least 80% of the time. The resulting graph has 100 vertices and 2034 edges. We run the 1/3-approximation algorithm on this graph using as our set $Q$ the first three Republicans according to lexicographic order: Alexander (R-TN), Allard (R-CO) and Allen (R-VA). We obtain as output a subgraph consisting of 47 vertices. By inspecting their party, we find that 100% of them are Republicans. This shows that our algorithm in this case succeeds in finding the large majority of the cluster of Republicans. It is interesting that the 8 remaining Republicans do not enter the triangle-densest subgraph. A careful inspection of the data, cf. [2], indicates that 6 Republicans agree with the party vote on at most 79% of the bills, and 8 of them on at most 85% of the bills.

⁴ For instance, preprocessing the political vote data from a matrix form to a graph using a threshold for edge additions results in information loss.

DBLP graph. We input as a query set $Q$ a set of scientists who have established themselves in theory and algorithm design: Richard Karp, Christos Papadimitriou, Mihalis Yannakakis and Santosh Vempala. The algorithm returns as output the query set and a set $S$ of 44 vertices corresponding to a clique of (mostly) Italian computer scientists. We list a subset of the 44 vertices here: M. Bencivenni, M. Canaparo, F. Capannini, L. Carota, M. Carpene, R. Veraldi, P. Veronesi, M. Vistoli, R. Zappi. The output graph induced by $S \cup Q$ is disconnected. This can be easily explained by the following (folklore) inequality, given that $|Q| = 4$ is small compared to $|S| = 44$ in our example.

Claim 1. Let $a, b, c, d$ be non-negative, with $c, d > 0$ so that the ratios are defined. Then,

$$\max\left(\frac{a}{c}, \frac{b}{d}\right) \ge \frac{a+b}{c+d} \ge \min\left(\frac{a}{c}, \frac{b}{d}\right) \qquad (3)$$

In our example, we take $a = t(S)$, $c = |S|$, $b = t(Q)$, $d = |Q|$. When no triangle uses vertices from both $S$ and $Q$, the union $S \cup Q$ has triangle density $\frac{t(S)+t(Q)}{|S|+|Q|}$, which by (3) lies between the densities of $S$ and $Q$. In such a scenario, where the output consists of the union of a dense subgraph and the query set $Q$, a different strategy should be preferred in practice: rather than peeling vertices from $V$ down to $Q$, the algorithm builds itself up from $Q$ (assuming $Q$ is not an independent set) towards $V$ by adding vertices which create as many triangles as possible, and returns the maximum density subgraph; see also [49].

7. CONCLUSION

In this work we introduce the average triangle density as a novel objective for attacking the important problem of finding near-cliques. We propose exact and approximation algorithms. Furthermore, our techniques can solve the more general problem of maximizing the $k$-clique density. Experimentally, we verify the value of the TDS-Problem as a novel addition to the graph mining toolbox. Our work leaves numerous problems open, including the following: (a) Can we obtain a faster exact algorithm by improving the space usage of the network construction? (b) Can we use sparsification to obtain faster approximate solutions [44]?

8. REFERENCES

[1] http://tinyurl.com/bwgpka.
[2] http://tinyurl.com/zgdam.
[3] R. K. Ahuja, J. B. Orlin, C. Stein, and R. E. Tarjan. Improved algorithms for bipartite network flow. SIAM Journal on Computing, 23(5):906-933, 1994.
[4] N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209-223, 1997.
[5] R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In WAW, 2009.
[6] A. Angel, N. Sarkas, N. Koudas, and D. Srivastava. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. Proc. VLDB Endow., 5(6):574-585, Feb. 2012.
[7] Y. Asahiro, R. Hassin, and K. Iwama. Complexity of finding dense subgraphs. Discr. Ap. Math., 121(1-3), 2002.
[8] Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. Greedily finding a dense subgraph. J. Algorithms, 34(2), 2000.
[9] G. D. Bader and C. W. Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4(1):2, 2003.
[10] B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and MapReduce. Proceedings of the VLDB Endowment, 5(5):454-465, 2012.
[11] V. Batagelj and M. Zaversnik. An O(m) algorithm for cores decomposition of networks. Arxiv, arXiv.cs/0310049, 2003.
[12] A. Bhaskara, M. Charikar, E. Chlamtac, U. Feige, and A. Vijayaraghavan. Detecting high log-densities: an $O(n^{1/4})$ approximation for densest $k$-subgraph. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 201-210. ACM, 2010.
[13] S. Bhattacharya, M. Henzinger, D. Nanongkai, and C. E. Tsourakakis. Space- and time-efficient algorithms for maintaining dense subgraphs on one-pass dynamic streams. In Proceedings of the 47th ACM Symposium on Theory of Computing, 2015.
[14] A. Björklund, R. Pagh, V. Vassilevska Williams, and U. Zwick. Listing triangles. In Proceedings of the 41st International Colloquium on Automata, Languages and Programming (ICALP), 2014.
[15] B. Bollobás. Random graphs, volume 73 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, second edition, 2001.
[16] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(9):1124-1137, 2004.
[17] G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM, pages 95-106. ACM, 2008.
[18] M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, 2000.
[19] E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. In SODA, 2002.
[20] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, Jan. 2008.
[21] A. Epasto, S. Lattanzi, and M. Sozio. Efficient densest subgraph computation in evolving graphs. WWW'15 (to appear), 2015.
[22] U. Feige, G. Kortsarz, and D. Peleg. The dense k-subgraph problem. Algorithmica, 29(3), 2001.
[23] I. Finocchi, M. Finocchi, and E. G. Fusco. Counting small cliques in MapReduce. arXiv preprint arXiv:1403.0734, 2014.
[24] E. Fratkin, B. T. Naughton, D. L. Brutlag, and S. Batzoglou. MotifCut: regulatory motifs finding with maximum density subgraphs. Bioinformatics, 22(14):e150-e157, 2006.
[25] G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing, 18(1):30-55, 1989.
[26] D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, 2005.
[27] A. Gionis, F. Junqueira, V. Leroy, M. Serafini, and I. Weber. Piggybacking on social networks. Proc. VLDB Endow., 6(6):409-420, Apr. 2013.
[28] A. V. Goldberg. Finding a maximum density subgraph. Technical report, University of California at Berkeley, 1984.
[29] M. T. Goodrich and P. Pszona. External-memory network analysis algorithms for naturally sparse graphs. In Algorithms - ESA 2011, pages 664-676. Springer, 2011.
[30] M. Grötschel, L. Lovász, and A. Schrijver. Geometric algorithms and combinatorial optimization. Springer, Berlin, 1988.
[31] D. Gusfield. Computing the strength of a graph. SIAM Journal on Computing, 20(4):639-654, 1991.
[32] J. Håstad. Clique is hard to approximate within $n^{1-\epsilon}$. Acta Mathematica, 182(1), 1999.
[33] A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal on Computing, 7(4):413-423, 1978.
[34] S. Iwata, L. Fleischer, and S. Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. Journal of the ACM (JACM), 48(4):761-777, 2001.
[35] M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 589-597. ACM, 2013.
[36] J. Jiang, M. Mitzenmacher, and J. Thaler. Parallel peeling algorithms. arXiv preprint arXiv:1302.7014, 2013.
[37] R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, 2009.
[38] R. Kannan and V. Vinay. Analyzing the structure of large graphs, 1999.
[39] H. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for MapReduce. In SODA, pages 938-948. Society for Industrial and Applied Mathematics, 2010.
[40] S. Khuller and B. Saha. On finding dense subgraphs. In ICALP, 2009.
[41] M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1-2):161-185, 2012.
[42] L. Lovász. Submodular functions and convexity. In Mathematical Programming: The State of the Art, pages 235-257. Springer, 1983.
[43] J. B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. Mathematical Programming, 118(2):237-251, 2009.
[44] R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a MapReduce implementation. Information Processing Letters, 112(7):277-281, 2012.
[45] A. D. Sarma, A. Lall, D. Nanongkai, and A. Trehan. Dense subgraphs on dynamic networks. In Distributed Computing, pages 151-165. Springer, 2012.
[46] A. Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. Journal of Combinatorial Theory, Series B, 80(2):346-355, 2000.
[47] M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In KDD, pages 939-948. ACM, 2010.
[48] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607-614. ACM, 2011.
[49] C. E. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. A. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In KDD, pages 104-112. ACM, 2013.
[50] C. E. Tsourakakis. A novel approach to finding near-cliques: The triangle-densest subgraph problem. arXiv preprint arXiv:1405.1477, 2014.
[51] C. E. Tsourakakis, M. N. Kolountzakis, and G. L. Miller. Triangle sparsifiers. J. Graph Algorithms Appl., 15(6):703-726, 2011.
[52] N. Wang, J. Zhang, K.-L. Tan, and A. K. Tung. On triangulation-based dense neighborhood graph discovery. Proceedings of the VLDB Endowment, 4(2):58-68, 2010.
[53] V. V. Williams. Multiplying matrices faster than Coppersmith-Winograd. In STOC, pages 887-898. ACM, 2012.
[54] Y. Zhang and S. Parthasarathy. Extracting analyzing and visualizing triangle k-core motifs within networks. In ICDE, pages 1049-1060. IEEE, 2012.