The K-clique Densest Subgraph Problem

Uploaded On 2017-04-06


Charalampos E. Tsourakakis
Harvard School of Engineering and Applied Sciences
babis@seas.harvard.edu

ABSTRACT

Numerous graph mining applications rely on detecting subgraphs which are large near-cliques. Since formulations that are geared towards finding large near-cliques are NP-hard and frequently inapproximable due to connections with the Maximum Clique problem, the poly-time solvable densest subgraph problem, which maximizes the average degree over all possible subgraphs, "lies at the core of large scale data mining" [10]. However, the densest subgraph problem frequently fails to detect large near-cliques in networks.

In this work, we introduce the $k$-clique densest subgraph problem, $k \ge 2$. This generalizes the well-studied densest subgraph problem, which is obtained as a special case for $k = 2$. For $k = 3$ we obtain a novel formulation which we refer to as the triangle densest subgraph problem: given a graph $G(V, E)$, find a subset of vertices $S^*$ such that $\tau(S^*) = \max_{S \subseteq V} \frac{t(S)}{|S|}$, where $t(S)$ is the number of triangles induced by the set $S$.

On the theory side, we prove that for any constant $k$ there exists an exact polynomial time algorithm for the $k$-clique densest subgraph problem. Furthermore, we propose an efficient $\frac{1}{k}$-approximation algorithm which generalizes the greedy peeling algorithm of Asahiro and Charikar [8, 18] for $k = 2$. Finally, we show how to implement this peeling framework efficiently on MapReduce for any $k \ge 3$, generalizing the work of Bahmani, Kumar and Vassilvitskii for the case $k = 2$ [10]. On the empirical side, our two main findings are that (i) the triangle densest subgraph is consistently closer to being a large near-clique compared to the densest subgraph and (ii) the peeling approximation algorithms for both $k = 2$ and $k = 3$ achieve on real-world networks approximation ratios closer to 1 rather than the pessimistic $\frac{1}{k}$ guarantee. An interesting consequence of our work is that triangle counting, a well-studied computational problem in the context of social network analysis, can be used to detect large near-cliques. Finally, we evaluate our proposed method on a popular graph mining application.

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW 2015, May 18–22, 2015, Florence, Italy. ACM 978-1-4503-3469-3/15/05. http://dx.doi.org/10.1145/2736277.2741128.

Categories and Subject Descriptors
G.2.2 [Graph Theory]: Graph Algorithms; H.2.8 [Database Applications]: Data mining

General Terms
Theory, Experimentation

Keywords
Densest subgraph problem; Graph algorithms; Graph Mining; Near-clique extraction

1. INTRODUCTION

A wide variety of data mining applications relies on extracting dense subgraphs from large graphs. In bioinformatics dense subgraphs are used for detecting protein complexes in protein-protein interaction networks [9] and for finding regulatory motifs in DNA [24]. They are also used for detecting link spam in Web graphs [26], graph compression [17] and mining micro-blogging streams [6].

Among the various formulations for finding dense subgraphs, the densest subgraph problem (DS-Problem) stands out because it is solvable in polynomial time [28] and $\frac{1}{2}$-approximable in linear time [18, 40]. To state the DS-Problem we first introduce the necessary notation. In this work we focus on simple unweighted, undirected graphs. Given a graph $G = (V, E)$ and a subset of vertices $S \subseteq V$, let $G(S) = (S, E(S))$ be the subgraph induced by $S$, and let $e(S) = |E(S)|$ be the size of $E(S)$. Also, the edge density of the set $S$ is defined as $f_e(S) = e(S) / \binom{|S|}{2}$. The DS-Problem maximizes the degree density $\frac{e(S)}{|S|}$ over all subgraphs $S \subseteq V$. Notice that this is equivalent to maximizing the average degree (see footnote 1). The DS-Problem is a powerful primitive for many graph applications including social piggybacking [27] and reachability and distance query indexing [19, 37]. However, a large number of applications aim to find subgraphs which are large near-cliques rather than the subgraph that maximizes the average degree. Frequently the DS-Problem fails in
finding large near-cliques because it favors large subgraphs with low edge density $f_e$. For this reason other formulations have been proposed, see Section 2. Unfortunately, these formulations are NP-hard and also inapproximable due to the connections with the Maximum Clique problem [32].

Footnote 1. In graph theory the term edge density refers by default to $f_e = e(S) / \binom{|S|}{2} \in [0, 1]$. However, since direct maximization of $f_e$ is not a meaningful problem (even a single edge achieves the maximum possible edge density), the DS-Problem maximizes the average degree. In the following, we refer to the average degree of a set as its degree density.

The main contribution of this work is a family of tractable formulations which attack efficiently the problem of extracting large near-cliques and, in contrast to well-performing heuristics, come with strong theoretical guarantees. In detail, our contributions are summarized as follows.

New formulation: We introduce the $k$-clique densest subgraph problem (k-Clique-DS-Problem), which generalizes the well-studied DS-Problem [18, 25, 28, 40]. The goal is to maximize the average number of $k$-cliques induced by a set $S \subseteq V$ over all possible vertex subsets. Notice that the DS-Problem is obtained as a special case for $k = 2$. We also introduce the special case obtained for $k = 3$ as the triangle densest subgraph problem (TDS-Problem).

Exact algorithms. We present two exact algorithms for the TDS-Problem. The algorithm which achieves the best running time is based on maximum flow computations and uses $O(n + t)$ space. It is worth noting that Goldberg's network construction for the DS-Problem, which uses $O(n + m)$ space [25, 28], does not generalize to the TDS-Problem. The slower one is based on supermodular maximization and uses linear space $O(n + m)$. Here, $n, m, t$ are the number of vertices, edges and triangles in the input graph. Our algorithms can be modified to yield polynomial time algorithms for the k-Clique-DS-Problem when $k = \Theta(1)$.

Approximation algorithm. We propose a $\frac{1}{3}$-approximation algorithm for the TDS-Problem which runs asymptotically faster than any of the exact algorithms. We also propose a $\frac{1}{3 + 3\epsilon}$-approximation algorithm for any $\epsilon > 0$ which can be implemented efficiently in MapReduce. The algorithm requires $O(\log(n)/\epsilon)$ rounds and is MapReduce efficient [39] due to the existence of efficient MapReduce triangle counting algorithms, e.g., [48]. Our algorithms can be adapted to the k-Clique-DS-Problem for any $k = \Theta(1)$.

Experimental evaluation. We evaluate our exact and approximation algorithms on numerous real-world networks. Among our findings, we observe that the optimal triangle densest subgraph is consistently closer to being a large near-clique compared to the optimal densest subgraph. For instance, in the Football network (see Table 1 for a description of the dataset) the DS-Problem returns the whole graph as the densest subgraph, with $f_e = 0.094$, whereas the TDS-Problem returns a subgraph on 18 vertices with $f_e = 0.48$. We also observe that the peeling approximation algorithms for both $k = 2$ and $k = 3$ achieve on real-world networks approximation ratios closer to 1 rather than the pessimistic $\frac{1}{k}$ guarantee.

Graph mining application. We propose a modified version of the TDS-Problem, the constrained triangle densest subgraph problem (Constrained-TDS-Problem), which aims to maximize the triangle density subject to the constraint that the output should contain a specified set of vertices $Q$. We show how to solve this constrained problem exactly. This variation is useful in various data-mining and bioinformatics tasks, see [49].

2. RELATED WORK

Since dense subgraph discovery constitutes a main research topic in graph analysis, a wide variety of related methods exists: heuristics [49, 52, 54], algorithmic contributions on NP-hard formulations [5, 12, 22, 40, 49] and poly-time solvable formulations [18, 40, 49]. We focus on the latter.

Densest Subgraph. In the densest subgraph problem we are given a graph $G$ and we wish to find the set $S \subseteq V$ which maximizes the average degree [28, 38]. The densest subgraph can be identified in polynomial time by solving a maximum flow problem [25, 28]. Charikar [18] proved that the greedy algorithm proposed by Asahiro et al. [8] produces a $\frac{1}{2}$-approximation of the densest subgraph in linear time. Asahiro et al. study the complexity of finding dense subgraphs by introducing a generalization of the DS-Problem and the maximum clique problem [7]. A $k$-core is a maximal connected subgraph of $G$ in which all vertices have degree at least $k$. It is worth remarking that the same algorithm provides a $k$-core decomposition of the graph and solves the problem of finding the degeneracy [11]. In the case of directed graphs, the densest subgraph problem is solved in polynomial time as well [18]. Khuller and Saha [40] provide a linear time $\frac{1}{2}$-approximation algorithm for the case of directed graphs among other contributions. Two interesting variations of the densest $k$-subgraph (DkS) problem were introduced by Andersen and Chellapilla [5]. The two problems ask for the set $S$ that maximizes the density subject to $|S| \le k$ (DamkS) and $|S| \ge k$ (DalkS). When restrictions on the size of $S$ are imposed the problem becomes NP-hard [40]. Finally, the densest subgraph problem has been considered in various settings, including MapReduce [10], the streaming setting [10], the dynamic setting [21, 45] and, recently, their combination [13].

Triangle Counting and Listing. The state of the art algorithm for exact triangle counting is due to Alon, Yuster and Zwick [4] and runs in $O(m^{2\omega/(\omega+1)})$ time, where currently the fast matrix multiplication exponent $\omega$ is 2.3729 [53]. Thus, their algorithm currently runs in $O(m^{1.4081})$ time. The best known listing algorithm until recently was due to Itai and Rodeh [33], which runs in $O(m\,\alpha(G))$ time, where $\alpha(G)$ is the graph arboricity. Since $\alpha(G) = O(\sqrt{m})$, the running time is always $O(m^{3/2})$. Recently, Björklund, Pagh, Williams and Zwick gave refined, output-sensitive algorithms [14]. Finally, a wealth of approximate triangle counting methods exist [35, 41, 44, 51].

3. PROBLEM DEFINITION

We define the notion of average triangle density.

Definition 1 (Triangle Density). Let $G(V, E)$ be an undirected graph. For any $S \subseteq V$ we define its triangle density $\tau(S)$ as $\tau(S) = \frac{t(S)}{s}$, where $t(S)$ is the number of triangles induced by $S$ and $s = |S|$.

Notice that $3\tau(S)$ is the average number of (induced) triangles per vertex in $S$. The optimization problem we focus on follows.

Problem 1 (TDS-Problem). Given $G(V, E)$, find a subset of vertices $S^*$ such that $\tau(S^*) = \tau^*_G$, where $\tau^*_G = \max_{S \subseteq V} \tau(S)$.

It is clear that the DS-Problem and the TDS-Problem in general can result in radically different solutions. Consider for instance a graph $G$ on $2n + 3$ vertices which is
the union of a triangle $K_3$ and of a bipartite clique $K_{n,n}$. The optimal solutions of the DS-Problem and the TDS-Problem are the bipartite clique and the triangle respectively. Therefore, the interesting question is whether maximizing the average degree and maximizing the triangle density yield different results in real-world networks. Our results in Section 5 indicate that the answer is positive: the triangle densest subgraph is smaller than the densest subgraph and exhibits a stronger near-clique structure.

4. PROPOSED METHOD

Section 4.1 provides two algorithms which solve the TDS-Problem exactly. Sections 4.2 and 4.3 provide two approximation algorithms for the TDS-Problem. Finally, Section 4.4 provides a generalization of the DS-Problem and the TDS-Problem to maximizing the average $k$-clique density and shows how the results from previous Sections adapt to this problem.

4.1 Exact Solutions

Let $n, m, t$ be the number of vertices, edges and triangles in graph $G$ respectively. The algorithm presented in Section 4.1.1 achieves the best running time. We present an algorithm which relies on the supermodularity property of our objective in Section 4.1.2. The latter algorithm, even if slower, requires $O(n + m)$ space, whereas the former requires $O(n + t)$ space. In real-world networks, typically $m \ll t$. Finally, it is worth mentioning that Charikar's linear program, see §2 in [18], can be extended to a linear program (LP) which solves the TDS-Problem, see [50] for the details.

4.1.1 An $O(m^{3/2} + nt + \min(n, t)^3)$-time exact solution

Algorithm 1 Triangle-Densest Subgraph(G)
  1: List the set of triangles $T(G)$; $t \leftarrow |T(G)|$
  2: $l \leftarrow \frac{t}{n}$, $u \leftarrow \frac{(n-1)(n-2)}{6}$
  3: $S^* \leftarrow \emptyset$
  4: while $u - l \ge \frac{1}{n(n-1)}$ do
  5:   $\alpha \leftarrow \frac{l + u}{2}$
  6:   $H_\alpha \leftarrow$ Construct-Network($G, \alpha, T(G)$)
  7:   $(S, T) \leftarrow$ minimum st-cut in $H_\alpha$
  8:   if $S = \{s\}$ then
  9:     $u \leftarrow \alpha$
 10:   else
 11:     $l \leftarrow \alpha$
 12:     $S^* \leftarrow (S \setminus \{s\}) \cap V(G)$
 13:   end if
 14: end while
 15: Return $S^*$

Our main theoretical result is the following theorem. Its proof is constructive.

Theorem 1. There exists an algorithm which solves the TDS-Problem and runs in $O(m^{3/2} + nt + \min(n, t)^3)$ time.

The first term $O(m^{3/2})$ comes from using the Itai-Rodeh algorithm [33] as our triangle listing black box. If we use the naive $O(n^3)$ triangle listing algorithm then the running time expression is simplified to $O(n^3 + nt)$. On the other hand, if we use the algorithms of Björklund et al. [14] the first term becomes $\tilde{O}\big(n^{\omega} + n^{3(\omega-1)/(5-\omega)}\, t^{2(3-\omega)/(5-\omega)}\big)$ for dense graphs and $\tilde{O}\big(m^{2\omega/(\omega+1)} + m^{3(\omega-1)/(\omega+1)}\, t^{(3-\omega)/(\omega+1)}\big)$ for sparse graphs, where $\omega$ is the matrix multiplication exponent. Currently $\omega < 2.3729$ due to [53]. We maintain [33] as our black box to keep the expressions simpler. However, the reader should keep in mind that the result presented in [14] improves the running time of the first term.

Algorithm 2 Construct-Network(G, α, T(G))
  1: $V(H) \leftarrow \{s\} \cup V(G) \cup T(G) \cup \{t\}$.
  2: For each vertex $v \in V(G)$ add an arc of capacity 1 to each triangle $t_i$ it participates in.
  3: For each triangle $\Delta = (u, v, w) \in T(G)$ add arcs to $u, v, w$ of capacity 2.
  4: Add directed arc $(s, v) \in A(H)$ of capacity $t_v$ for each $v \in V(G)$.
  5: Add weighted directed arc $(v, t) \in A(H)$ of capacity $3\alpha$ for each $v \in V(G)$.
  6: Return network $H(V(H), A(H), w)$, $s, t \in V(H)$.

We work our way to proving Theorem 1 by first proving the next key lemma. Then, we remove the logarithmic factor.

Lemma 1. Algorithm 1 solves the TDS-Problem and runs in $O\big(m^{3/2} + (nt + \min(n, t)^3)\log(n)\big)$ time.

Algorithm 1 uses maximum flow computations to solve the TDS-Problem. It is worth noting that Goldberg's maximum flow algorithm [28] for the DS-Problem is based on a network construction that does not adapt to the case of the TDS-Problem. Algorithm 1 returns an optimal subgraph $S^*$, i.e., $\tau(S^*) = \tau^*_G$. The algorithm performs a binary search on the triangle density value $\alpha$. Specifically, each binary search query asks: does there exist a set $S \subseteq V$ such that $t(S)/|S| > \alpha$? For each binary search query, we construct a bipartite network $H_\alpha$ by invoking Algorithm 2. Let $T(G)$ be the set of triangles in $G$. The vertex set of $H$ is $V(H) = \{s\} \cup A \cup B \cup \{t\}$, where $A = V(G)$ and $B = T(G)$. Notice that we overload the notation in order to use the customary symbol $t$ for the sink vertex. It should always be clear from the context which entity (number of triangles vs. sink vertex) we refer to. For the purpose of finding $T(G)$, a triangle listing algorithm is required [14, 33].

The arc set of graph $H$ is created as follows. For each vertex $r \in B$ corresponding to triangle $\Delta(u, v, w)$ we add three incoming and three outgoing arcs. The incoming arcs come from the vertices $u, v, w \in A$ which form triangle $\Delta(u, v, w)$. Each of these arcs has capacity equal to 1. The outgoing arcs go to the same set of vertices $u, v, w$, but their capacities are equal to 2. In addition to the arcs of capacity 1 from each vertex $v \in A$ to the triangles it participates in, we add an outgoing arc of capacity $3\alpha$ from $v$ to the sink vertex $t$. From the source vertex $s$ we add an outgoing arc to each $v \in A$ of capacity $t_v$, where $t_v$ is the number of triangles vertex $v$ participates in within $G$. As we have already noticed, $H$ can be constructed in $O(m^{3/2})$ time [33]. It is worth noting that after computing $H$ for the first time, subsequent networks need to update only the arcs that depend on the parameter $\alpha$, something not shown in the pseudocode for simplicity.

To prove that Algorithm 1 solves the TDS-Problem and runs in $O\big(m^{3/2} + (nt + \min(n, t)^3)\log(n)\big)$ time we proceed in steps. For the sake of the proof, we introduce the following definitions and notation. For a given set of vertices $S$ let $t_i(S)$ be the number of triangles that involve exactly $i$ vertices from $S$, $i \in \{1, 2, 3\}$. Notice that $t_3(S)$ is the number of triangles induced by $S$, for which we have been using the simpler notation $t(S)$ so far.

We use the following claim as our criterion to set the initial values $l, u$ in the binary search.

Claim 1.
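The optimal triangle density satisfies $\frac{t}{n} \le \tau^*_G \le \frac{(n-1)(n-2)}{6}$.

The lower bound is obvious since $\tau(V) = \frac{t}{n}$. The upper bound also follows trivially by observing that $\tau(S) \le \binom{n}{3}/n$ for any $\emptyset \ne S \subseteq V$. This suggests that the optimal value $\tau^*_G$ is always $O(n^2)$. The next claim serves as a criterion to decide when to stop the binary search.

Claim 2. The smallest possible difference between two distinct values $\tau(S_1), \tau(S_2)$ is at least $\frac{1}{n(n-1)}$.

To see why, notice that the difference $\delta$ between two possible different triangle density values is

$$\delta = \frac{t(S_1)|S_2| - t(S_2)|S_1|}{|S_1||S_2|}.$$

If $|S_1| = |S_2|$ then $|\delta| \ge \frac{1}{n} \ge \frac{1}{n(n-1)}$, otherwise $|\delta| \ge \frac{1}{|S_1||S_2|} \ge \frac{1}{n(n-1)}$. Notice that combining the above two claims shows that the binary search terminates in at most $\lceil 4 \log n \rceil$ queries. The following lemma is a structural lemma for the optimal st-cut in the network $H_\alpha$.

Lemma 2. Consider any minimum st-cut $(S, T)$ in the network $H_\alpha$. Let $A_1 = S \cap A$, $B_1 = S \cap B$ and $A_2 = T \cap A$, $B_2 = T \cap B$. The cost of the min-cut is equal to

$$\sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|.$$

Proof. Case I: $A_1 = \emptyset$. In this case the proposition trivially holds, as the cost is equal to $\sum_{v \in A} t_v = 3t$. It is worth noticing that in this case $B_1$ has to be empty as well, otherwise we contradict the optimality of $(S, T)$. Hence $S = \{s\}$, $T = A \cup B \cup \{t\}$.

Case II: $A_1 \ne \emptyset$. Consider the cost of the arcs from $A_1 \cup B_1$ to $A_2 \cup B_2$. We consider three different subcases, one per type of triangle with respect to the set $A_1$.

Type 3: If there exist three vertices $u, v, w \in A_1$ that form a triangle $\Delta(u, v, w)$, then the vertex $r \in B$ corresponding to this specific triangle has to be in $B_1$. If not, then $r \in B_2$, and we could reduce the cost of the min-cut by 3 by moving the triangle to $B_1$. Therefore the cost we pay for triangles of type three is 0.

Type 2: Consider three vertices $u, v, w$ such that they form a triangle $\Delta(u, v, w)$ and $u, v \in A_1$, $w \in A_2$. Then, the vertex $r \in B$ corresponding to this triangle can be either in $B_1$ or in $B_2$. In both cases we pay 2 in the cut for each triangle of type two.

Type 1: Finally, in the case where $u, v, w$ form a triangle with $u \in A_1$ and $v, w \in A_2$, the vertex $r \in B$ corresponding to triangle $\Delta(u, v, w)$ will be in $B_2$. If not, then it lies in $B_1$ and we could decrease the cost of the cut by 3 by moving it to $B_2$. Hence, we pay 1 in the cut for each triangle of type one.

Therefore the cost due to the various types of triangles with respect to $A_1$ is equal to $2t_2(A_1) + t_1(A_1)$. Furthermore, the cost of the arcs from the source $s$ to $T$ is equal to $\sum_{v \in A_2} t_v = \sum_{v \notin A_1} t_v$. The cost of the arcs from $A_1$ to the sink $t$ is equal to $3\alpha|A_1|$. Summing up the individual cost terms, we obtain that the total cost is equal to $\sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|$.

The next lemma proves the correctness of the binary search in Algorithm 1.

Lemma 3. (a) If there exists a set $W \subseteq V(G)$ such that $t_3(W) > \alpha|W|$ then any minimum st-cut $(S, T)$ in $H_\alpha$ satisfies $S \setminus \{s\} \ne \emptyset$. (b) Furthermore, if there does not exist a set $W$ such that $t_3(W) > \alpha|W|$ then the cut $(\{s\}, A \cup B \cup \{t\})$ is a minimum st-cut.

Proof. (a) Let $W \subseteq V$ be such that

$$t_3(W) > \alpha|W|. \quad (1)$$

Suppose for the sake of contradiction that the minimum st-cut is achieved by $(\{s\}, A \cup B \cup \{t\})$. In this case the cost of the minimum st-cut is $\sum_{v \in A} t_v = 3t$. Now, consider the following $(S, T)$ cut. Set $S$ consists of the source vertex $s$, $A_1 = W$, and $B_1$, the set of triangles of type 3 and 2 induced by $A_1$. Let $T$ be the rest of the vertices in $H$. The cost of this cut is

$$\mathrm{cap}(S, T) = \sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|.$$

Therefore, by our assumption that the minimum st-cut is achieved by $(\{s\}, A \cup B \cup \{t\})$ we obtain

$$3t \le \sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|. \quad (2)$$

Now, notice that by double counting

$$\sum_{v \in A_1} t_v = 3t_3(A_1) + 2t_2(A_1) + t_1(A_1).$$

Furthermore, we observe

$$\sum_{v \in A_1} t_v + \sum_{v \notin A_1} t_v = 3t.$$

By combining these two facts, and the fact that $3t$ is the capacity of the minimum cut, we obtain the following contradiction of Inequality (1):

$$3t \le \sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1| \iff t_3(W) \le \alpha|W|.$$

(b) By Lemma 2, for any minimum st-cut $(S, T)$ the capacity of the cut is equal to $\sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|$, where $A_1 = A \cap S$, $A_2 = A \cap T$. Suppose for the sake of contradiction that the cut $(\{s\}, A \cup B \cup \{t\})$ is not a minimum cut. Therefore,

$$\mathrm{cap}(\{s\}, A \cup B \cup \{t\}) = 3t > \sum_{v \notin A_1} t_v + 2t_2(A_1) + t_1(A_1) + 3\alpha|A_1|.$$

Using the same algebraic analysis as in (a), the above statement implies the contradiction $t_3(W) > \alpha|W|$, where $W = A_1$.

Now we can complete the proof of Lemma 1.

Proof. The termination of Algorithm 1 follows directly from Claims 1, 2. The correctness follows from Lemmata 2, 3. The running time follows from Claims 1, 2, which show that the number of binary search queries is $O(\log(n))$, and from the fact that each binary search query can be performed in $O(nt + \min(n, t)^3)$ time using the algorithm due to Ahuja, Orlin, Stein and Tarjan [3] (see footnote 2) or Gusfield's algorithm [31].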
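The binary search of Algorithm 1 over the network of Algorithm 2 can be sketched as follows. This is a minimal illustration, not the implementation evaluated in the paper: it swaps in naive $O(n^3)$ triangle listing and a plain Edmonds-Karp max-flow for the faster subroutines cited above, uses exact `Fraction` arithmetic, and initializes the answer to $V$ (an assumption made here so that a set is returned even when no query succeeds). All function names are hypothetical.

```python
from collections import defaultdict, deque
from fractions import Fraction
from itertools import combinations


def list_triangles(adj):
    """Naive triangle listing; adj maps each vertex to its set of neighbours."""
    return [(u, v, w) for u, v, w in combinations(sorted(adj), 3)
            if v in adj[u] and w in adj[u] and w in adj[v]]


def build_network(adj, triangles, alpha):
    """Residual capacities of H_alpha, following the arcs of Algorithm 2."""
    cap = defaultdict(lambda: defaultdict(Fraction))
    t_v = defaultdict(int)
    for tri in triangles:
        for x in tri:
            t_v[x] += 1
            cap[("v", x)][("tri", tri)] = Fraction(1)  # vertex -> triangle
            cap[("tri", tri)][("v", x)] = Fraction(2)  # triangle -> vertex
    for x in adj:
        cap["s"][("v", x)] = Fraction(t_v[x])          # source -> vertex
        cap[("v", x)]["t"] = 3 * alpha                 # vertex -> sink
    return cap


def min_cut_source_side(cap, s="s", t="t"):
    """Edmonds-Karp max flow; returns the nodes reachable from s at the end."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            x = queue.popleft()
            for y, c in list(cap[x].items()):
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            return set(parent)            # no augmenting path: min cut found
        path, y = [], t
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        push = min(cap[x][y] for x, y in path)
        for x, y in path:
            cap[x][y] -= push
            cap[y][x] += push


def triangle_densest_subgraph(adj):
    """Binary search on alpha as in Algorithm 1 (requires at least 2 vertices)."""
    n = len(adj)
    triangles = list_triangles(adj)
    low = Fraction(len(triangles), n)
    high = Fraction((n - 1) * (n - 2), 6)
    best = set(adj)                       # assumption: fall back to V
    while high - low >= Fraction(1, n * (n - 1)):
        alpha = (low + high) / 2
        side = min_cut_source_side(build_network(adj, triangles, alpha))
        chosen = {node[1] for node in side
                  if isinstance(node, tuple) and node[0] == "v"}
        if not chosen:                    # S = {s}: no set beats alpha
            high = alpha
        else:
            low, best = alpha, chosen
    return best
```

For example, on a $K_4$ with a two-edge path attached, the sketch returns the four clique vertices. Rebuilding the network for every query keeps the code short; updating only the $3\alpha$ arcs, as the text suggests, would avoid that cost.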
The proof of Theorem 1 follows from Lemma 1 and the fact that the parametric maximum flow algorithm of Ahuja, Orlin, Stein and Tarjan [3], see also [25], shaves the logarithmic factor off the running time.

Footnote 2. Notice that the network $H$ has $O(n + t)$ arcs, therefore the running time of [3] is $O(\min(n, t)(n + t) + \min(n, t)^3) = O(nt + \min(n, t)^3)$.

4.1.2 An $O\big((n^5 m^{1.4081} + n^6)\log(n)\big)$-time exact solution

In this Section we provide a second exact algorithm for the TDS-Problem. First, we provide the necessary theoretical background.

Definition 2 (Supermodular function). Let $V$ be a finite set. The set function $f : 2^V \to \mathbb{R}$ is supermodular if and only if for all $A, B \subseteq V$

$$f(A \cup B) \ge f(A) + f(B) - f(A \cap B).$$

A function $f$ is supermodular if and only if $-f$ is submodular.

Sub- and supermodular functions constitute an important class of functions with various special properties. In this work, we are primarily interested in the fact that maximizing a supermodular function is solvable in strongly polynomial time [30, 34, 42, 46]. For our purposes, we state the following result, which we use as a subroutine in our proposed algorithm.

Theorem 2 ([43]). There exists an algorithm for maximizing an integer valued supermodular function $f$ which runs in $O(n^5\, EO + n^6)$ time, where $n = |V|$ is the size of the ground set $V$ and $EO$ is the maximum amount of time to evaluate $f(S)$ for a subset $S \subseteq V$.

We show in the following that when the ground set is the set of vertices $V$ and $f_\alpha : 2^V \to \mathbb{R}$ is defined by $f_\alpha(S) = t(S) - \alpha|S|$ where $\alpha \in \mathbb{R}$, we can solve the TDS-Problem in polynomial time.

Theorem 3. The function $f : 2^V \to \mathbb{R}$ where $f(S) = t(S) - \alpha|S|$ is supermodular.

Proof. Let $A, B \subseteq V$. Let $t : 2^V \to \mathbb{R}$ be the function which for each set of vertices $S$ returns the number of induced triangles $t(S)$. By careful counting

$$t(A \cup B) = t(A) + t(B) - t(A \cap B) + t_1(A : B \setminus A) + t_2(A : B \setminus A),$$

where $t_1(A : B \setminus A)$ and $t_2(A : B \setminus A)$ count the triangles of $A \cup B$ that are contained in neither $A$ nor $B$ and have one, respectively two, vertices in $A$ and two, respectively one, vertices in $B \setminus A$. Hence, for any $A, B \subseteq V$

$$t(A \cup B) + t(A \cap B) \ge t(A) + t(B)$$

and the function $t$ is supermodular. Furthermore, for any $\alpha > 0$ the function $-\alpha|S|$ is supermodular. Since the sum of two supermodular functions is supermodular, the result follows.
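As a sanity check of Theorem 3, the supermodular inequality can be verified exhaustively on a small graph. The sketch below is illustrative only: the function names are hypothetical, the enumeration is exponential in $|V|$, and $\alpha$ is passed as a `Fraction` so that the comparison is exact.

```python
from fractions import Fraction
from itertools import combinations


def count_triangles(adj, subset):
    """t(S): number of triangles induced by the vertex set `subset`."""
    return sum(1 for u, v, w in combinations(sorted(subset), 3)
               if v in adj[u] and w in adj[u] and w in adj[v])


def f_alpha(adj, subset, alpha):
    """The objective f_alpha(S) = t(S) - alpha * |S|."""
    return count_triangles(adj, subset) - alpha * len(subset)


def is_supermodular(adj, alpha):
    """Check f(A | B) + f(A & B) >= f(A) + f(B) for every pair of subsets."""
    vertices = sorted(adj)
    subsets = [frozenset(c) for r in range(len(vertices) + 1)
               for c in combinations(vertices, r)]
    value = {s: f_alpha(adj, s, alpha) for s in subsets}
    return all(value[a | b] + value[a & b] >= value[a] + value[b]
               for a in subsets for b in subsets)
```

Because $-\alpha|S|$ is modular, the check succeeds for any value of $\alpha$; the triangle count alone carries the inequality.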
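Theorem 3 naturally suggests Algorithm 3. The algorithm runs in a logarithmic number of rounds. In each round we maximize the function $f_\alpha$ using Orlin-Supermodular-Opt, which takes as input arguments the graph $G$ and the parameter $\alpha > 0$ [43]. We assume for simplicity that within the procedure Orlin-Supermodular-Opt the function $f$ is evaluated using an efficient exact triangle counting algorithm [4]. The algorithm of Alon, Yuster and Zwick [4] runs in $O(m^{2\omega/(\omega+1)})$ time where $\omega < 2.3729$ [53]. This suggests $EO = O(m^{1.4081})$. The overall running time of Algorithm 3 is $O\big((n^5 m^{1.4081} + n^6)\log(n)\big)$ and the space usage is $O(n + m)$ rather than $O(n + t)$.

Algorithm 3 Triangle-Densest Subgraph(G)
  1: $l \leftarrow 0$, $u \leftarrow \frac{(n-1)(n-2)}{6}$, $S^* \leftarrow V$
  2: while $u - l \ge \frac{1}{n(n-1)}$ do
  3:   $\alpha \leftarrow \frac{l + u}{2}$
  4:   $(val, S) \leftarrow$ Orlin-Supermodular-Opt($G, \alpha$)
  5:   if $val \le 0$ then
  6:     $u \leftarrow \alpha$
  7:   else
  8:     $l \leftarrow \alpha$
  9:     $S^* \leftarrow S$
 10:   end if
 11: end while
 12: Return $S^*$

4.2 A $\frac{1}{3}$-approximation algorithm

In this Section we provide an algorithm for the TDS-Problem which provides a $\frac{1}{3}$-approximation. Our algorithm follows the peeling paradigm, see [8, 18, 40, 36]. Specifically, in each round it removes the vertex which participates in the smallest number of triangles, and at the end it returns the subgraph that achieves the largest triangle density. The pseudocode is shown in Algorithm 4.

Algorithm 4 (G)
  1: Count the number of triangles $t_v$ for each vertex $v \in V$
  2: $H_n \leftarrow G$
  3: for $i \leftarrow n$ to 2 do
  4:   Let $v$ be the vertex of $H_i$ with the minimum number of triangles
  5:   $H_{i-1} \leftarrow H_i \setminus v$
  6: end for
  7: Return the $H_j$ that achieves the maximum triangle density among the $H_i$, $i = 1, \ldots, n$.

Theorem 4. Algorithm 4 is a $\frac{1}{3}$-approximation algorithm for the TDS-Problem.

Proof. Let $S^*$ be an optimal set. Let $v \in S^*$, $|S^*| = s$, and let $t_A(v)$ be the number of triangles induced by $A$ that $v$ participates in. Then,

$$\tau^*_G = \frac{t(S^*)}{s} \ge \frac{t(S^* \setminus \{v\})}{s - 1} \iff t_{S^*}(v) \ge \tau^*_G,$$

since $t(S^* \setminus \{v\}) = t(S^*) - t_{S^*}(v)$. Consider the iteration before the algorithm removes the first vertex $v$ that belongs to $S^*$. Call the set of vertices at that point $W$. Clearly, $S^* \subseteq W$ and for each vertex $u \in W$ the following lower bound holds due to the greediness of Algorithm 4:

$$t_W(u) \ge t_W(v) \ge t_{S^*}(v) \ge \tau^*_G.$$

This provides a lower bound on the total number of triangles induced by $W$:

$$t(W) = \frac{1}{3}\sum_{u \in W} t_W(u) \ge \frac{1}{3}|W|\,\tau^*_G \;\Rightarrow\; \frac{t(W)}{|W|} \ge \frac{1}{3}\tau^*_G.$$

To complete the proof, notice that the algorithm returns a subgraph $S$ such that $\tau(S) \ge \tau(W) \ge \frac{1}{3}\tau^*_G$.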
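The peeling procedure of Algorithm 4 can be sketched in a few lines. This is a simplified illustration with hypothetical names: it selects the minimum-count vertex by a linear scan rather than with priority queues, and it updates the triangle counts of the removed vertex's neighbours by examining pairs of neighbours.

```python
from fractions import Fraction
from itertools import combinations


def peel_triangle_densest(adj):
    """Greedy peeling in the spirit of Algorithm 4.

    adj maps each vertex to its set of neighbours. Returns the vertex set of
    the densest intermediate subgraph together with its triangle density.
    """
    adj = {u: set(vs) for u, vs in adj.items()}      # private, mutable copy
    tri = {u: sum(1 for a, b in combinations(adj[u], 2) if b in adj[a])
           for u in adj}                             # t_v for every vertex
    total = sum(tri.values()) // 3                   # each triangle counted 3 times
    removed = []
    best_density, best_removed = Fraction(total, len(adj)), 0
    while len(adj) > 1:
        v = min(adj, key=tri.get)                    # vertex in fewest triangles
        neighbours = adj.pop(v)
        for a, b in combinations(neighbours, 2):
            if b in adj[a]:                          # triangle (v, a, b) disappears
                tri[a] -= 1
                tri[b] -= 1
        total -= tri.pop(v)
        for a in neighbours:
            adj[a].discard(v)
        removed.append(v)
        density = Fraction(total, len(adj))
        if density > best_density:
            best_density, best_removed = density, len(removed)
    survivors = (set(adj) | set(removed)) - set(removed[:best_removed])
    return survivors, best_density
```

On a $K_4$ with a two-edge path attached, the two path vertices are peeled first and the clique is returned with triangle density 1, which here coincides with the optimum.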
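The key difference compared to the DS-Problem peeling algorithm [18] is that when we remove a vertex, the counts of its neighbors may decrease by more than 1. Therefore, when vertex $v$ is removed, we update the counts of its neighbors in $O(\deg(v)^2)$ time, by looking at how many triangles each of its neighbors has after $v$ is removed. Notice that $O\big(\sum_v \deg(v)^2\big) = O(mn)$.

4.3 MapReduce Implementation

The MapReduce framework [20] has become the de facto standard for processing large-scale datasets. In the following, we show how we can approximate efficiently the TDS-Problem in MapReduce. Before we describe the algorithm, we show that Algorithm 5 for any $\epsilon > 0$ terminates and provides a $\frac{1}{3 + 3\epsilon}$-approximation. The idea behind this algorithm is to peel vertices in batches [10, 29] rather than one by one.

Algorithm 5 (G, ε > 0)
  1: $S_{out}, S \leftarrow V$
  2: while $S \ne \emptyset$ do
  3:   $A(S) \leftarrow \{i \in S : t_S(i) \le 3(1 + \epsilon)\tau(S)\}$
  4:   $S \leftarrow S \setminus A(S)$
  5:   if $\tau(S) \ge \tau(S_{out})$ then
  6:     $S_{out} \leftarrow S$
  7:   end if
  8: end while
  9: Return $S_{out}$.

Lemma 4. For any $\epsilon > 0$, Algorithm 5 provides a $\frac{1}{3 + 3\epsilon}$-approximation to the TDS-Problem. Furthermore, it terminates in $O(\log_{1+\epsilon}(n))$ passes.

Proof. Let $S^*$ be an optimal solution to the TDS-Problem. As we proved in Theorem 4, for any $v \in S^*$ it is true that $t_{S^*}(v) \ge \tau^*_G$. Furthermore, in each round at least one vertex is removed. To see why, assume for the sake of contradiction that $A(S) = \emptyset$ for some $S \ne \emptyset$ during the execution of the algorithm. Then, we obtain the contradiction that

$$3|S|\tau(S) = \sum_{v \in S} t_S(v) > (3 + 3\epsilon)|S|\tau(S).$$

Consider the round where the algorithm for the first time removes a vertex $v \in S^*$. Let $W$ be the corresponding set of vertices. Since $v \in A(W)$ is peeled off, we obtain an upper bound on its induced triangle count, namely $v \in A(W) \Rightarrow t_W(v) \le (3 + 3\epsilon)\tau(W)$. Since $S^* \subseteq W$, we obtain

$$(3 + 3\epsilon)\tau(W) \ge t_W(v) \ge t_{S^*}(v) \ge \tau(S^*),$$

which proves that Algorithm 5 is a $\frac{1}{3 + 3\epsilon}$-approximation to the TDS-Problem. To see why the algorithm terminates in a logarithmic number of rounds, notice that

$$3t(S) \ge \sum_{v \in S \setminus A(S)} t_S(v) > (3 + 3\epsilon)\big(|S| - |A(S)|\big)\frac{t(S)}{|S|} \iff |A(S)| > \frac{\epsilon}{1 + \epsilon}|S| \iff |S \setminus A(S)| < \frac{1}{1 + \epsilon}|S|.$$

Since $S$ decreases by a factor of $\frac{1}{1 + \epsilon}$ in each round, the algorithm terminates in $O(\log_{1+\epsilon}(n)) = O\big(\frac{\log(n)}{\epsilon}\big)$ rounds.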
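A sequential, single-machine sketch of the batch peeling of Algorithm 5 follows. It is not a MapReduce implementation: triangle counts are recomputed from scratch in each round, the names are hypothetical, and the density comparison skips the empty set, whose density is undefined (a guard added here as an assumption).

```python
from fractions import Fraction
from itertools import combinations


def triangle_counts(adj, subset):
    """t_S(v) for every v in `subset`, counting only triangles inside `subset`."""
    counts = {}
    for v in subset:
        inside = adj[v] & subset
        counts[v] = sum(1 for a, b in combinations(inside, 2) if b in adj[a])
    return counts


def density(counts):
    """Triangle density t(S)/|S|; the counts sum to 3 t(S)."""
    return Fraction(sum(counts.values()), 3 * len(counts))


def batch_peel(adj, eps):
    """Peel all vertices with t_S(v) <= 3 (1 + eps) tau(S) in each round."""
    current = set(adj)
    counts = triangle_counts(adj, current)
    best, best_density = set(current), density(counts)
    rounds = 0
    while current:
        threshold = 3 * (1 + eps) * density(counts)
        batch = {v for v in current if counts[v] <= threshold}
        current -= batch
        rounds += 1
        if current:                                  # density of the empty set is undefined
            counts = triangle_counts(adj, current)
            if density(counts) >= best_density:
                best, best_density = set(current), density(counts)
    return best, best_density, rounds
```

With $\epsilon = 0.1$ on a $K_4$ with a two-edge path attached, the first round removes the two path vertices and the second removes the clique, so the clique is returned after two rounds. Larger values of $\epsilon$ peel more aggressively and may return a sparser set, which is consistent with the weaker $\frac{1}{3 + 3\epsilon}$ guarantee.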
MapReduce Implementation: Now we are able to describe our algorithm in MapReduce. It uses any of the efficient algorithms of Suri and Vassilvitskii [48] as a subroutine to count triangles per vertex in each round. The removal of the vertices whose triangle count does not exceed the threshold is done in two rounds, as in [10]. For completeness, we describe the procedure here. The vertices of the set $S$ to be peeled off in each round are marked by adding a key-value pair $\langle v; \$ \rangle$ for each $v \in S$. Each edge $(u, v)$ is mapped to $\langle u; v \rangle$. The reducer receives all endpoints of the edges incident with $v$, and the symbol $\$$ in case the vertex is marked for deletion. In case the vertex is marked, the reduce task returns nothing, otherwise it copies its input. In the second round, we perform the same procedure with the only difference being that we map each edge $(u, v)$ to $\langle v; u \rangle$. Therefore, the edges which remain have both endpoints unmarked. The algorithm runs in $O(\log(n)/\epsilon)$ rounds, as it takes $O(\log(n)/\epsilon)$ peeling rounds, and in each peeling round a constant number of MapReduce rounds is needed to count triangles per vertex, mark vertices for deletion and remove the corresponding vertex set.

4.4 k-clique Densest Subgraph

Our proposed methods can be adapted to the following generalization of the DS-Problem and the TDS-Problem.

Definition 3 ($k$-clique-densest subgraph). Let $G(V, E)$ be an undirected graph. For any $S \subseteq V$ we define its $k$-clique density $h_k(S)$, $k \ge 2$, as $h_k(S) = \frac{c_k(S)}{s}$, where $c_k(S)$ is the number of $k$-cliques induced by $S$ and $s = |S|$.

Problem 2 (k-Clique-DS-Problem). Given $G(V, E)$, find a subset of vertices $S^*$ such that $h_k(S^*) = h^*_k$, where $h^*_k = \max_{S \subseteq V} h_k(S)$.

As in the triangle densest subgraph problem, we create a network $H$ parameterized by the value $\alpha$ on which we perform our binary search. The procedure is described in Algorithm 6. The set $C(G)$ is the set of $k$-cliques in $G$. We then invoke Algorithm 1, with the upper bound $u$ set to $n^k$. Following the analysis of Theorem 1, we see that the k-Clique-DS-Problem is solvable in polynomial time. For instance, using Gusfield's algorithm [31] or [3] in each binary search query we get an overall running time of $O\big(n^k + (n|C(G)| + n^3)\log(n)\big) = O(n^{k+1}\log(n))$. Using the improved result due to Ahuja, Orlin, Stein and Tarjan for parametric max flows in unbalanced bipartite graphs [3], we save the logarithmic factor in the running time.
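To make Definition 3 and Problem 2 concrete, the sketch below computes $h_k(S)$ directly from the definition and solves the k-Clique-DS-Problem by exhaustive search. It is exponential in $|V|$ and meant only for tiny graphs; the function names and the tie-breaking rule (prefer the smaller set) are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import combinations


def count_k_cliques(adj, subset, k):
    """c_k(S): number of k-cliques induced by `subset`."""
    return sum(1 for c in combinations(sorted(subset), k)
               if all(b in adj[a] for a, b in combinations(c, 2)))


def k_clique_density(adj, subset, k):
    """h_k(S) = c_k(S) / |S| for a non-empty vertex set."""
    return Fraction(count_k_cliques(adj, subset, k), len(subset))


def brute_force_k_clique_densest(adj, k):
    """Exhaustive search over all non-empty vertex subsets (tiny graphs only)."""
    vertices = sorted(adj)
    candidates = (set(c) for r in range(1, len(vertices) + 1)
                  for c in combinations(vertices, r))
    # Ties in density are broken in favour of the smaller set.
    return max(candidates,
               key=lambda s: (k_clique_density(adj, s, k), -len(s)))
```

On the disjoint union of a triangle and $K_{3,3}$, the instance from Section 3 with $n = 3$, the search returns the bipartite clique for $k = 2$ and the triangle for $k = 3$.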
Algorithm 6 (G, α, C(G), k)
  1: $V(H) \leftarrow \{s\} \cup V(G) \cup C(G) \cup \{t\}$.
  2: For each vertex $v \in V(G)$ add an arc of capacity 1 to each $k$-clique $c_i$ it participates in.
  3: For each $k$-clique $(u_{i_1}, \ldots, u_{i_k}) \in C(G)$ add arcs to $u_{i_1}, \ldots, u_{i_k}$ of capacity $k - 1$.
  4: Add directed arc $(s, v) \in A(H)$ of capacity $c_v$ for each $v \in V(G)$.
  5: Add weighted directed arc $(v, t) \in A(H)$ of capacity $k\alpha$ for each $v \in V(G)$.
  6: Return network $H(V(H), A(H), w)$, $s, t \in V(H)$.

Furthermore, Algorithm 4 can also be modified, by removing in each round the vertex with the smallest number of $k$-cliques, to obtain Corollary 1, the analog of Theorem 4.

Corollary 1. The algorithm which peels off in each round the vertex with the minimum number of $k$-cliques and returns the subgraph that achieves the largest $k$-clique density is a $\frac{1}{k}$-approximation algorithm for the k-Clique-DS-Problem.

Similarly, Algorithm 5 and the MapReduce implementation can be modified to solve the k-Clique-DS-Problem. We omit the details.

Corollary 2. The algorithm which peels off in each round the set of vertices with fewer than $k(1 + \epsilon)h_k(S)$ $k$-cliques, where $h_k(S)$ is the $k$-clique density in that round, terminates in $O(\log_{1+\epsilon}(n))$ rounds and provides a $\frac{1}{k(1 + \epsilon)}$-approximation guarantee for the k-Clique-DS-Problem. Furthermore, using [23], we obtain an efficient MapReduce implementation.

We illustrate an example where choosing a larger $k$ value yields benefits. Let $G \sim G(n, p)$ be an Erdős–Rényi graph, where $p = p(n)$. Assume that we plant a clique $K$ of size $n^\gamma$ for some constant $\gamma > 0$. We wish to show a non-trivial range of $p = p(n)$ values such that the following conditions hold:

$$h_2(K) = \frac{|E(K)|}{|K|} = \frac{\binom{n^\gamma}{2}}{n^\gamma} \le \frac{p\binom{n}{2}}{n} = E[h_2(V)],$$

and for $k \ge 3$

$$h_k(K) = \frac{\binom{n^\gamma}{k}}{n^\gamma} \ge \frac{p^{\binom{k}{2}}\binom{n}{k}}{n} = E[h_k(V)].$$

By simple algebraic manipulation we see that $p$ satisfies both conditions if $\Omega\big(n^{-(1-\gamma)}\big) \le p \le O\big(n^{-\frac{2}{k}(1-\gamma)}\big)$ (see footnote 3). Clearly, for larger $k$ values, we allow ourselves a larger range of $p$ values for which we can find the hidden clique in expectation.

Our preliminary experimental results for $k = 4$ indicate that the 4-clique-densest subgraph gets closer to a large near-clique compared to the triangle densest subgraph. However, the gain of moving from the densest subgraph to the triangle densest subgraph with respect to extracting large near-cliques is larger than the gain of moving from the triangle densest subgraph to the 4-clique-densest subgraph.
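The two conditions of the planted clique example can be evaluated numerically. The sketch below computes $h_k(K)$ for a planted clique and the expression used above for $E[h_k(V)]$, with exact rational arithmetic; the function name and the concrete parameter values in the usage note are illustrative assumptions, not values from the paper.

```python
from fractions import Fraction
from math import comb


def planted_clique_densities(n, clique_size, p, k):
    """Return (h_k of a planted clique, expected h_k of the whole G(n, p)).

    The second value is p^C(k,2) * C(n,k) / n, the expression used in the
    text; it ignores the contribution of the planted clique itself.
    """
    clique = Fraction(comb(clique_size, k), clique_size)
    whole = p ** comb(k, 2) * Fraction(comb(n, k), n)
    return clique, whole
```

For instance, with $n = 10^4$, a planted clique of size $100$ (so $\gamma = 1/2$) and $p = 1/50$, the planted clique has a lower value than the whole graph for $k = 2$ but a higher value for $k = 3$, so in expectation only the triangle objective prefers the clique.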
3Noticethatforthisrangeofp,thegraphisconnectedandthecliquenumberisconstantwithhighprobability[ 15 ]5.EXPERIMENTALEVALUATIONBeforewepresentour ndingsindetail,wesummarizethem:(i)theTDS-Problemandtheproposedalgorithmsconstitutenewvaluablegraphminingprimitivesfor ndinglargenear-cliques,(ii)the1 3-approximationalgorithm(Al-gorithm4)achievessigni cantlybetterapproximationsthanthepessimistic1 3guarantee.Alsoitissigni cantlyfaster.(iii)TryingasmallrangeofvaluesforAlgorithm5isingeneralasaferstrategycomparedtorunningexperimentswith xedchoice.5.1ExperimentalSetupThedatasetsweuseareshowninTable 1 .Allgraphsweremadesimpleandundirectedbyignoringtheedgedi-rection,whenthegraphisdirected,andremovingself-loopsandmultipleedges,ifany.Theexperimentswereperformedonasinglemachine,withIntel(R)Core(TM)i5CPUat2.40GHz,with3.86GBofmainmemory.Wehaveimple-mentedAlgorithm1inMatlabR2011ausingamaximum\rowimplementationduetoKolmogorovandBoykov[ 16 ]asoursubroutinewhichrunsintimeO(t(nt)3).Thisimple-mentationisprohibitivelyexpensiveevenforsmallgraphswhichhavealargenumberoftriangles.WehavecodedthepeelingalgorithminC++usingpriorityqueues.Asourtrianglelistingalgorithm,weusethesimplenodeiter-atoralgorithmwhichchecksforeachvertexthenumberofedgesamongitsneighbors.Thecodeispubliclyavailableat http://people.seas.harvard.edu/~babis/code.html .Wemeasurethequalityofeachextractedsubgraphbytwomea-sures:theedgedensityoftheextractedsubgraphfee(S)=jSj2andtheoutputsizejSj.Noticethatwhenfeiscloseto1,theextractedsubgraphisclosetobeingaclique.5.2MainFindingsTable 2 showstheresultsobtainedonseveralpopularsmall-andmedium-sizedgraphs.Eachcolumncorrespondstoadataset.Therowscorrespondtomeasurementsforeachmethodweusetoextractasubgraph.Speci cally,the rst(DS),second(1 2-DS),third(TDS)andfourth(1 3-TDS)rowscorrespondtothesubgraphextractedbyGold-berg'sexactalgorithm[ 28 ]fortheDS-Problem,Charikar's1 2-approximationalgorithm[ 18 
]fortheDS-Problem,Al-gorithm1andAlgorithm4fortheTDS-Problemrespec-tively.ForeachoptimalextractedsubgraphS,weshowitssizeasafractionofthetotalnumberofvertices,theedgedensityfe(S),theaveragedegree2(S)=2e(S)=jSjandtheaveragenumberoftrianglespervertex3(S)=3t(S)=jSj.Weobservethatthetriangle-densestsubgraphisclosertobeinganear-cliquecomparedtothedensestsubgraph.ApronouncedexampleistheFootballnetworkwheretheopti-maldensestsubgraphisthewholenetworkwithfe=0:0094,whereastheoptimaltriangle-densestsubgraphisasetof18verticeswithedgedensity0.48.Finally,weobservethatthequalityofAlgorithm's4outputisveryclosetotheoptimalsolutionandsometimesevenbetter.ThesameobservationholdsforthecaseofCharikar's1 2-approximationalgorithm[ 18 ].WeusetheC++implementationofAlgorithm4andaC++implementationofCharikar's1 2-approximationalgo-rithmontherestofthedatasetsofTable 1 .TheresultsareshowninTable 3 ,uptotwodecimaldigitsofaccuracy.Theca-HepThdatasetisanexception,astheoptimalsolutionscoincide.Wealsonoticethattheruntimesappearthesame Name Nodes Edges Description Adjnoun 112 425 Generatedbyprocessingtextdata AS-735 6475 12572 AutonomousSystems AS-caida 26475 53381 AutonomousSystems ca-Astro 17903 196972 Co-authorship ca-GrQC 4158 13422 Co-authorship ca-HepTh 11204 117619 Co-authorship Celegans 297 4296 NeuralnetworkofC.Elegans DBLP 53442 255936 Co-authorship Epinions 75877 405739 Socialnetwork Enron 33696 180811 Email EuAll 224832 339925 Email Football 115 613 NCAAfootballgamenetwork Karate 34 78 Socialnetwork Lesmis 77 254 Generatedbyprocessingtextdata Politicalblogs 1490 16715 Generatedbyprocessingsalesdata Politicalbooks 105 441 Blognetwork soc-Slashdot0811 77360 469180 PersontoPerson soc-Slashdot0902 82168 504230 PersontoPerson wb-cs-Stanford 8929 26320 WebGraph web-Google 855802 4291352 WebGraph web-NotreDame 325729 1090108 WebGraph Wiki-vote 7066 100736 Wikipedia\who-votes-whom" Table1:Datasetsusedinourexperiments. 
for ca-HepTh due to using two decimal digits of accuracy. On other datasets, we observe differences between the two solutions. For instance, for the collaboration network ca-Astro the densest subgraph is a subgraph with 1184 vertices and $f_e = 0.05$, whereas the triangle-densest subgraph is a clique with 57 vertices. Overall, we verify that the triangle-densest subgraph is closer to being a near-clique. Finally, the runtimes are shown in Table 3. Notice that the runtimes reported for Algorithm 4 include both the triangle counting and the peeling phase.

| Method | Measure | Adjnoun | Celegans | Football | Karate | Lesmis | Polblogs | Polbooks |
|---|---|---|---|---|---|---|---|---|
| DS | Size (% of vertices) | 42.86 | 45.8 | 100 | 47.1 | 29.9 | 19.1 | 51.4 |
| DS | Average degree | 9.58 | 17.16 | 10.66 | 5.25 | 10.78 | 55.82 | 9.40 |
| DS | $f_e$ | 0.20 | 0.13 | 0.094 | 0.35 | 0.49 | 0.196 | 0.18 |
| DS | Triangles per vertex | 14 | 45.93 | 21.12 | 5.64 | 41.61 | 768.87 | 22.68 |
| 1/2-DS | Size (% of vertices) | 41.1 | 42.4 | 100 | 52.9 | 29.9 | 18.7 | 57.1 |
| 1/2-DS | Average degree | 9.57 | 17.1 | 10.66 | 5.2 | 10.78 | 55.8 | 9.3 |
| 1/2-DS | $f_e$ | 0.21 | 0.14 | 0.094 | 0.31 | 0.49 | 0.20 | 0.16 |
| 1/2-DS | Triangles per vertex | 14.16 | 46.5 | 21.12 | 5.16 | 41.61 | 774.6 | 22.68 |
| TDS | Size (% of vertices) | 36.6 | 10.4 | 15.7 | 17.7 | 16.9 | 8.1 | 19.1 |
| TDS | Average degree | 9.37 | 13.81 | 8.22 | 4.67 | 10.62 | 55.72 | 9.34 |
| TDS | $f_e$ | 0.23 | 0.46 | 0.48 | 0.93 | 0.89 | 0.46 | 0.50 |
| TDS | Triangles per vertex | 15 | 56.82 | 28 | 8.01 | 47.31 | 972.36 | 25.95 |
| 1/3-TDS | Size (% of vertices) | 36.6 | 9.1 | 15.7 | 17.7 | 16.9 | 8.1 | 15.2 |
| 1/3-TDS | Average degree | 9.37 | 13.56 | 8.22 | 4.67 | 10.62 | 55.72 | 9.13 |
| 1/3-TDS | $f_e$ | 0.23 | 0.52 | 0.48 | 0.93 | 0.89 | 0.46 | 0.61 |
| 1/3-TDS | Triangles per vertex | 15 | 56.55 | 28 | 8.01 | 47.31 | 972.36 | 25.5 |

Table 2: Comparison of the subgraphs extracted by Goldberg's exact algorithm for the DS-Problem (DS), Charikar's 1/2-approximation algorithm (1/2-DS), our exact algorithm for the TDS-Problem (TDS) and our 1/3-approximation algorithm (1/3-TDS). Here, $f_e(S) = e(S)/\binom{|S|}{2}$ is the edge density of the extracted subgraph, $2e(S)/|S|$ is the average degree and $3t(S)/|S|$ is the average number of triangles per vertex.

| Dataset | 1/2-DS size | 1/2-DS $f_e$ | 1/2-DS T (s) | 1/3-TDS size | 1/3-TDS $f_e$ | 1/3-TDS T (s) |
|---|---|---|---|---|---|---|
| AS-735 | 59 | 0.28 | 0.00 | 13 | 0.8 | 0.07 |
| AS-caida | 143 | 0.14 | 0.02 | 27 | 0.52 | 0.63 |
| ca-Astro | 1184 | 0.05 | 0.06 | 57 | 1 | 1.42 |
| ca-GrQC | 42 | 0.79 | 0.00 | 14 | 0.89 | 0.02 |
| ca-HepTh | 32 | 1 | 0.02 | 32 | 1 | 0.02 |
| Epinions | 999 | 0.121 | 0.08 | 431 | 0.256 | 8.75 |
| Enron | 555 | 0.14 | 0.02 | 390 | 0.19 | 2.01 |
| EuAll | 507 | 0.13 | 0.08 | 200 | 0.29 | 9.52 |
| soc-Slashdot0811 | 207 | 0.41 | 0.13 | 253 | 0.49 | 6.85 |
| soc-Slashdot0902 | 219 | 0.40 | 0.16 | 173 | 0.50 | 7.72 |
| wb-cs-Stanford | 84 | 0.64 | 0.48 | 26 | 0.80 | 0.67 |
| web-Google | 240 | 0.23 | 2.54 | 120 | 0.44 | 79.5 |
| web-NotreDame | 1367 | 0.115 | 0.50 | 457 | 0.345 | 16.3 |
| Wiki-vote | 846 | 0.11 | 0.00 | 464 | 0.80 | 0.19 |

Table 3: Comparison of the subgraphs extracted by the 1/2-approximation algorithm of Charikar (1/2-DS) and by the 1/3-approximation algorithm, Algorithm 4 (1/3-TDS). For each method we report the output size $|S|$, the edge density $f_e$ and the runtime T in seconds.

5.3 Exploring parameter $\epsilon$ in Algorithm 5

In this Section we present the results of Algorithm 5 on the DBLP graph. This is a particularly interesting instance, as it indicates that instead of trying to select a single good $\epsilon$ value, it is worth trying out at least a few values, assuming computational resources are available. We range $\epsilon$ from 0.1 to 1.8 with a step of 0.1. Figure 1(a) plots the number of rounds Algorithm 5 takes to terminate as a function of $\epsilon$. We observe that even for small $\epsilon$ values the number of rounds is 6. The reader should compare this to the upper bound predicted by Lemma 4, which exceeds 100. Figure 1(b) plots the relative ratio $\tau(S)/\tau(S^*)$, where $S$ is the output of Algorithm 5. For convenience, the lower bound $1/(3+3\epsilon)$ is plotted in red. Besides the ratio $f_e(S)/f_e(S^*)$, Figure 1(c) also plots the relative ratio $f_t(S)/f_t(S^*)$ as a function of $\epsilon$. Here $f_t(S) = t(S)/\binom{|S|}{3}$.

Figure 1: Exploring the trade-off between the number of rounds and accuracy as a function of the parameter $\epsilon$ for Algorithm 5. Let $S$, $S^*$ be the subgraphs extracted by Algorithms 5 and 1 respectively. (a) Number of rounds, (b) relative average triangle density ratio $\tau(S)/\tau(S^*)$ (blue) and the approximation guarantee $1/(3+3\epsilon)$ (red), and (c) relative ratios $f_e(S)/f_e(S^*)$ and $f_t(S)/f_t(S^*)$ as functions of $\epsilon$.

As we observe, the quality of Algorithm 5 is close to the optimal solution except for $\epsilon = 0.7$ and $\epsilon = 0.8$. By inspecting why this happens, we observe that the optimal triangle-densest subgraph is a clique of 44 vertices. It turns out that for $\epsilon = 0.7$ and $\epsilon = 0.8$ the best subgraph, which is found in the last round of the execution of the algorithm (the latter happens for all $\epsilon$ values), consists of 98 and 74 vertices respectively, and contains the optimal $K_{44}$ as a subgraph. For other values of $\epsilon$, the subgraph in the last round is either the optimal $K_{44}$ or close to it, with a few extra vertices. This example shows the potential danger of using a single value for $\epsilon$, and suggests that trying out a small number of $\epsilon$ values can be significantly beneficial in terms of the approximation quality.

6. APPLICATION: ORGANIZING COCKTAIL PARTIES

A graph mining problem that comes up in various applications is the following: given a set of vertices $Q \subseteq V$, find a dense subgraph containing $Q$. We refer to this type of graph mining problems as cocktail problems, due to the following motivation, c.f. [47]. Suppose that a set of people $Q$ wants to organize a cocktail party. How do they invite other people to the party so that the set of all the participants, including $Q$, are as similar as possible? A variation of the TDS-Problem which addresses this graph mining problem follows.

Problem 3 (Constrained-TDS-Problem). Given a graph $G(V,E)$ and $Q \subseteq V$, find the subset of vertices $S^*$ that maximizes the triangle density such that $Q \subseteq S^*$: $S^* = \arg\max_{Q \subseteq S \subseteq V} \tau(S)$.

The Constrained-TDS-Problem can be solved by modifying our proposed algorithms accordingly. A useful corollary follows.

Corollary 3. The Constrained-TDS-Problem is solvable in polynomial time by adding arcs from $s$ to $v \in A$ of large enough capacities, e.g., capacities equal to $n^3+1$ are sufficiently large. Furthermore, the peeling algorithm which avoids removing vertices from $Q$ is a 1/3-approximation algorithm for the Constrained-TDS-Problem.

In the following we evaluate the 1/3-approximation algorithm on two datasets. The two experiments indicate two different types of performance that should be expected in real-world applications. The first is a positive case, whereas the second is a negative one. Both experiments here serve as sanity checks.^4

^4 For instance, preprocessing the political vote data from matrix form into a graph, using a threshold for edge additions, results in information loss.

Political vote data. We obtain Senate data for the first session (2006) of the 109th congress, which spanned the period from January 3, 2005 to January 3, 2007, during the
fifth and sixth years of George W. Bush's presidency [1]. In this Congress, there were 55, 44 and 1 Republican, Democratic and independent senators respectively. The dataset can be downloaded from the US Senate web page http://www.senate.gov . We preprocess the dataset in the following way: we add an edge between two senators if, among the bills for which they both cast a vote, they voted the same way at least 80% of the time. The resulting graph has 100 vertices and 2034 edges. We run the 1/3-approximation algorithm on this graph using as our set $Q$ the first three Republicans in lexicographic order: Alexander (R-TN), Allard (R-CO) and Allen (R-VA). We obtain as output a subgraph consisting of 47 vertices. By inspecting their party, we find that 100% of them are Republicans. This shows that our algorithm in this case succeeds in finding the large majority of the cluster of Republicans. It is interesting that the 8 remaining Republicans do not enter the triangle-densest subgraph. A careful inspection of the data, c.f. [2], indicates that 6 Republicans agree with the party vote on at most 79% of the bills, and 8 of them on at most 85% of the bills.

DBLP graph. We input as a query set $Q$ a set of scientists who have established themselves in theory and algorithm design: Richard Karp, Christos Papadimitriou, Mihalis Yannakakis and Santosh Vempala. The algorithm returns as output the query set and a set $S$ of 44 vertices corresponding to a clique of (mostly) Italian computer scientists. We list a subset of the 44 vertices here: M. Bencivenni, M. Canaparo, F. Capannini, L. Carota, M. Carpene, R. Veraldi, P. Veronesi, M. Vistoli, R. Zappi. The output graph induced by $S \cup Q$ is disconnected. This can be easily explained by the following (folklore) inequality, given that $|Q|$ is much smaller than $|S|$ in our example.

Claim 1. Let $a, b, c, d$ be non-negative, with $c, d > 0$. Then,

$$\max\left(\frac{a}{c}, \frac{b}{d}\right) \ge \frac{a+b}{c+d} \ge \min\left(\frac{a}{c}, \frac{b}{d}\right). \quad (3)$$

In our example, we get $a = t(S)$, $c = |S|$, $b = t(Q)$, $d = |Q|$, so the triangle density of $S \cup Q$, namely $(t(S)+t(Q))/(|S|+|Q|)$, lies between the densities of $Q$ and $S$ and stays close to that of $S$ because $|Q|$ is small. In such a scenario, where the output consists of the union of a dense subgraph and the query set $Q$, a different strategy should be preferred in practice: rather than peeling vertices from $V$ down to $Q$, build up from $Q$ towards $V$ (assuming $Q$ is not an independent set) by adding vertices which create as many triangles as possible, and return the maximum density subgraph; see also [49].

7. CONCLUSION

In this work we introduce the average triangle density as a novel objective for attacking the important problem of finding near-cliques. We propose exact and approximation algorithms. Furthermore, our techniques can solve the more general problem of maximizing the $k$-clique density. Experimentally, we verify the value of the TDS-Problem as a novel addition to the graph mining toolbox. Our work leaves numerous problems open, including the following: (a) Can we obtain a faster exact algorithm by improving the space usage of the network construction? (b) Can we use sparsification to obtain faster approximate solutions [44]?

8. REFERENCES

[1] http://tinyurl.com/bwgpka .
[2] http://tinyurl.com/zgdam .
[3] R. K. Ahuja, J. B. Orlin, C. Stein, and R. E. Tarjan. Improved algorithms for bipartite network flow. SIAM Journal on Computing, 23(5):906-933, 1994.
[4] N. Alon, R. Yuster, and U. Zwick. Finding and counting given length cycles. Algorithmica, 17(3):209-223, 1997.
[5] R. Andersen and K. Chellapilla. Finding dense subgraphs with size bounds. In WAW, 2009.
[6] A. Angel, N. Sarkas, N. Koudas, and D. Srivastava. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. Proc. VLDB Endow., 5(6):574-585, Feb. 2012.
[7] Y. Asahiro, R. Hassin, and K. Iwama. Complexity of finding dense subgraphs. Discr. Ap. Math., 121(1-3), 2002.
[8] Y. Asahiro, K. Iwama, H. Tamaki, and T. Tokuyama. Greedily finding a dense subgraph. J. Algorithms, 34(2), 2000.
[9] G. D. Bader and C. W. Hogue. An automated method for finding molecular complexes in large protein interaction networks. BMC Bioinformatics, 4(1):2, 2003.
[10] B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and mapreduce. Proceedings of the VLDB Endowment, 5(5):454-465, 2012.
[11] V. Batagelj and M. Zaversnik. An O(m) algorithm for cores decomposition of networks. Arxiv, arXiv.cs/0310049, 2003.
[12] A. Bhaskara, M. Charikar, E. Chlamtac, U. Feige, and A. Vijayaraghavan. Detecting high log-densities: an $O(n^{1/4})$ approximation for densest k-subgraph. In Proceedings of the 42nd ACM Symposium on Theory of Computing, pages 201-210. ACM, 2010.
[13] S. Bhattacharya, M. Henzinger, D. Nanongkai, and C. E. Tsourakakis. Space- and time-efficient algorithms for maintaining dense subgraphs on one-pass dynamic streams. In Proceedings of the 47th ACM Symposium on Theory of Computing, 2015.
[14] A. Björklund, R. Pagh, V. Vassilevska Williams, and U. Zwick. Listing triangles. In Proceedings of the 41st International Colloquium on Automata, Languages and Programming (ICALP), 2014.
[15] B. Bollobás. Random graphs, volume 73 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, second edition, 2001.
[16] Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(9):1124-1137, 2004.
[17] G. Buehrer and K. Chellapilla. A scalable pattern mining approach to web graph compression with communities. In WSDM, pages 95-106. ACM, 2008.
[18] M. Charikar. Greedy approximation algorithms for finding dense components in a graph. In APPROX, 2000.
[19] E. Cohen, E. Halperin, H. Kaplan, and U. Zwick. Reachability and distance queries via 2-hop labels. In SODA, 2002.
[20] J. Dean and S. Ghemawat. Mapreduce: simplified data processing on large clusters. Commun. ACM, 51(1):107-113, Jan. 2008.
[21] A. Epasto, S. Lattanzi, and M. Sozio. Efficient densest subgraph computation in evolving graphs. In WWW '15 (to appear), 2015.
[22] U. Feige, G. Kortsarz, and D. Peleg. The dense k-subgraph problem. Algorithmica, 29(3), 2001.
[23] I. Finocchi, M. Finocchi, and E. G. Fusco. Counting small cliques in mapreduce. arXiv preprint arXiv:1403.0734, 2014.
[24] E. Fratkin, B. T. Naughton, D. L. Brutlag, and S. Batzoglou. Motifcut: regulatory motifs finding with maximum density subgraphs. Bioinformatics, 22(14):e150-e157, 2006.
[25] G. Gallo, M. D. Grigoriadis, and R. E. Tarjan. A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing, 18(1):30-55, 1989.
[26] D. Gibson, R. Kumar, and A. Tomkins. Discovering large dense subgraphs in massive graphs. In VLDB, 2005.
[27] A. Gionis, F. Junqueira, V. Leroy, M. Serafini, and I. Weber. Piggybacking on social networks. Proc. VLDB Endow., 6(6):409-420, Apr. 2013.
[28] A. V. Goldberg. Finding a maximum density subgraph. Technical report, University of California at Berkeley, 1984.
[29] M. T. Goodrich and P. Pszona. External-memory network analysis algorithms for naturally sparse graphs. In Algorithms - ESA 2011, pages 664-676. Springer, 2011.
[30] M. Grötschel, L. Lovász, and A. Schrijver. Geometric algorithms and combinatorial optimization. Springer, Berlin, 1988.
[31] D. Gusfield. Computing the strength of a graph. SIAM Journal on Computing, 20(4):639-654, 1991.
[32] J. Håstad. Clique is hard to approximate within $n^{1-\epsilon}$. Acta Mathematica, 182(1), 1999.
[33] A. Itai and M. Rodeh. Finding a minimum circuit in a graph. SIAM Journal on Computing, 7(4):413-423, 1978.
[34] S. Iwata, L. Fleischer, and S. Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. Journal of the ACM (JACM), 48(4):761-777, 2001.
[35] M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 589-597. ACM, 2013.
[36] J. Jiang, M. Mitzenmacher, and J. Thaler. Parallel peeling algorithms. arXiv preprint arXiv:1302.7014, 2013.
[37] R. Jin, Y. Xiang, N. Ruan, and D. Fuhry. 3-hop: a high-compression indexing scheme for reachability query. In SIGMOD, 2009.
[38] R. Kannan and V. Vinay. Analyzing the structure of large graphs, 1999.
[39] H. Karloff, S. Suri, and S. Vassilvitskii. A model of computation for mapreduce. In SODA, pages 938-948. Society for Industrial and Applied Mathematics, 2010.
[40] S. Khuller and B. Saha. On finding dense subgraphs. In ICALP, 2009.
[41] M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. Efficient triangle counting in large graphs via degree-based vertex partitioning. Internet Mathematics, 8(1-2):161-185, 2012.
[42] L. Lovász. Submodular functions and convexity. In Mathematical Programming: The State of the Art, pages 235-257. Springer, 1983.
[43] J. B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. Mathematical Programming, 118(2):237-251, 2009.
[44] R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a mapreduce implementation. Information Processing Letters, 112(7):277-281, 2012.
[45] A. D. Sarma, A. Lall, D. Nanongkai, and A. Trehan. Dense subgraphs on dynamic networks. In Distributed Computing, pages 151-165. Springer, 2012.
[46] A. Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. Journal of Combinatorial Theory, Series B, 80(2):346-355, 2000.
[47] M. Sozio and A. Gionis. The community-search problem and how to plan a successful cocktail party. In KDD, pages 939-948. ACM, 2010.
[48] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In WWW, pages 607-614. ACM, 2011.
[49] C. E. Tsourakakis, F. Bonchi, A. Gionis, F. Gullo, and M. A. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In KDD, pages 104-112. ACM, 2013.
[50] C. E. Tsourakakis. A novel approach to finding near-cliques: The triangle-densest subgraph problem. arXiv preprint arXiv:1405.1477, 2014.
[51] C. E. Tsourakakis, M. N. Kolountzakis, and G. L. Miller. Triangle sparsifiers. J. Graph Algorithms Appl., 15(6):703-726, 2011.
[52] N. Wang, J. Zhang, K.-L. Tan, and A. K. Tung. On triangulation-based dense neighborhood graph discovery. Proceedings of the VLDB Endowment, 4(2):58-68, 2010.
[53] V. V. Williams. Multiplying matrices faster than Coppersmith-Winograd. In STOC, pages 887-898. ACM, 2012.
[54] Y. Zhang and S. Parthasarathy. Extracting analyzing and visualizing triangle k-core motifs within networks. In ICDE, pages 1049-1060. IEEE, 2012.
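As a supplementary illustration of the peeling approach evaluated in Section 5 and of its constrained variant from Section 6, the following is a minimal Python sketch. It is not the author's C++ implementation: the function names (`build_adjacency`, `triangle_counts`, `edge_density`, `peel_triangle_density`) and the `protected` parameter are hypothetical, and the minimum-triangle vertex is found by a linear scan rather than with the priority queues mentioned in Section 5.1. The sketch assumes a simple undirected graph given as an edge list; it counts triangles with the node iterator idea (edges among each vertex's neighbors), repeatedly removes the vertex that participates in the fewest triangles while never removing vertices of `protected` (playing the role of $Q$), and returns the intermediate vertex set with the largest $t(S)/|S|$.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build a simple undirected adjacency map (self-loops and duplicates ignored)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle_counts(adj):
    """Node iterator: for each vertex, count the edges among its neighbors."""
    return {
        v: sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        for v, nbrs in adj.items()
    }


def edge_density(adj, S):
    """f_e(S) = e(S) / C(|S|, 2); defined as 0.0 for fewer than two vertices."""
    S = set(S)
    if len(S) < 2:
        return 0.0
    e = sum(1 for a, b in combinations(S, 2) if b in adj.get(a, ()))
    return e / (len(S) * (len(S) - 1) / 2)


def peel_triangle_density(edges, protected=()):
    """Greedy peeling for triangle density t(S)/|S|.

    Vertices in `protected` are never removed (constrained variant).
    Returns (best vertex set, its triangle density).
    """
    adj = build_adjacency(edges)
    protected = set(protected)
    for q in protected:
        adj.setdefault(q, set())
    alive = set(adj)
    if not alive:
        return set(), 0.0
    tri = triangle_counts(adj)
    total = sum(tri.values()) // 3  # each triangle is counted at its 3 vertices
    best_set, best_density = set(alive), total / len(alive)
    while True:
        removable = alive - protected
        if not removable:
            break
        v = min(removable, key=lambda x: tri[x])  # fewest triangles first
        nbrs = adj[v]
        # Every adjacent pair of v's neighbors closes a triangle that disappears.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                tri[a] -= 1
                tri[b] -= 1
                total -= 1
        for u in nbrs:
            adj[u].discard(v)
        adj[v] = set()
        alive.discard(v)
        if alive and total / len(alive) > best_density:
            best_set, best_density = set(alive), total / len(alive)
    return best_set, best_density
```

For example, on a 4-clique with a short path attached, `peel_triangle_density(edges)` would be expected to return the clique, while passing `protected={...}` keeps the query vertices in the output, which mirrors the behaviour discussed for the DBLP experiment. A priority queue keyed on triangle counts would be the natural replacement for the linear scan on larger graphs.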