In this paper we present an algorithm for local graph partitioning using personalized PageRank vectors. We develop an improved algorithm for computing approximate PageRank vectors and derive a mixing result for PageRank vectors similar to that for ra...
In this paper, we present a local graph partitioning algorithm that uses personalized PageRank vectors to produce cuts. Because a PageRank vector is defined recursively (as we will describe in section 2), we can consider a single PageRank vector in place of a sequence of random walk vectors, which simplifies the process of finding cuts and allows greater flexibility when computing approximations. We show directly that a sweep over a single approximate PageRank vector can produce cuts with small conductance. In contrast, Spielman and Teng show that when a good cut can be found from a series of walk distributions, a similar cut can be found from a series of approximate walk distributions. Our method of analysis allows us to find cuts using approximations with larger amounts of error, which improves the running time.

The analysis of our algorithm is based on the following results:

We give an improved algorithm for computing approximate PageRank vectors. We use a technique introduced by Jeh-Widom [7], and further developed by Berkhin in his Bookmark Coloring Algorithm [1]. The algorithms of Jeh-Widom and Berkhin compute many personalized PageRank vectors simultaneously, more quickly than they could be computed individually. Our algorithm computes a single approximate PageRank vector more quickly than the algorithms of Jeh-Widom and Berkhin by a factor of $\log n$.

We prove a mixing result for PageRank vectors that is similar to the Lovász-Simonovits mixing result for random walks. Using this mixing result, we show that if a sweep over a PageRank vector does not produce a cut with small conductance, then that PageRank vector is close to the stationary distribution. We then show that for any set $C$ with small conductance, and for many starting vertices contained in $C$, the resulting PageRank vector is not close to the stationary distribution, because it has significantly more probability within $C$. Combining these results yields a local version of the Cheeger inequality for PageRank vectors: if $C$ is a set with conductance $\Phi(C) \le f(\phi)$, then a sweep over a PageRank vector $\mathrm{pr}(\alpha, \chi_v)$ finds a set with conductance at most $\phi$, provided that $\alpha$ is set correctly depending on $\phi$, and that $v$ is one of a significant number of good starting vertices within $C$. This holds for a function $f(\phi)$ that satisfies $f(\phi) = \Omega(\phi^2 / \log m)$.

Using the results described above, we produce a local partitioning algorithm PageRank-Nibble which improves both the running time and approximation ratio of Nibble. PageRank-Nibble takes as input a starting vertex $v$, a target conductance $\phi$, and an integer $b \in [1, \log m]$. When $v$ is a good starting vertex for a set $C$ with conductance $\Phi(C) \le g(\phi)$, there is at least one value of $b$ where PageRank-Nibble produces a set $S$ with the following properties: the conductance of $S$ is at most $\phi$, the volume of $S$ is at least $2^{b-1}$ and at most $(2/3)\,\mathrm{vol}(G)$, and the intersection of $S$ and $C$ satisfies $\mathrm{vol}(S \cap C) \ge 2^{b-2}$. This holds for a function $g(\phi)$ that satisfies $g(\phi) = \Omega(\phi^2 / \log^2 m)$. The running time of PageRank-Nibble is $O(2^b \log^3 m / \phi^2)$, which is nearly linear in the volume of $S$. In comparison, the Nibble algorithm requires that $C$ have conductance $O(\phi^3 / \log^2 m)$, and runs in time $O(2^b \log^4 m / \phi^5)$.

PageRank-Nibble can be used interchangeably with Nibble, leading immediately to faster algorithms with improved approximation ratios in several applications. In particular, we obtain an algorithm PageRank-Partition that finds cuts with small conductance and approximately optimal balance: if there exists a set $C$ satisfying $\Phi(C) \le g(\phi)$ and $\mathrm{vol}(C) \le \frac{1}{2}\mathrm{vol}(G)$, then the algorithm finds a set $S$ such that $\Phi(S) \le \phi$ and $\frac{1}{2}\mathrm{vol}(C) \le \mathrm{vol}(S) \le \frac{5}{6}\mathrm{vol}(G)$, in time $O(m \log^4 m / \phi^3)$. This holds for a function $g(\phi)$ that satisfies $g(\phi) = \Omega(\phi^2 / \log^2 m)$.

We remark that $\mathrm{vol}(V) = 2m$, and we will sometimes write $\mathrm{vol}(G)$ in place of $\mathrm{vol}(V)$. The edge boundary of a set is defined to be
\[ \partial(S) = \{ \{x, y\} \in E \mid x \in S,\ y \notin S \}, \]
and the conductance of a set is
\[ \Phi(S) = \frac{|\partial(S)|}{\min(\mathrm{vol}(S),\ 2m - \mathrm{vol}(S))}. \]

2.3. Distributions

Two distributions we will use frequently are the stationary distribution,
\[ \psi_S(x) = \begin{cases} \dfrac{d(x)}{\mathrm{vol}(S)} & \text{if } x \in S, \\ 0 & \text{otherwise,} \end{cases} \]
and the indicator function,
\[ \chi_v(x) = \begin{cases} 1 & \text{if } x = v, \\ 0 & \text{otherwise.} \end{cases} \]
The amount of probability from a distribution $p$ on a set $S$ of vertices is written
\[ p(S) = \sum_{x \in S} p(x). \]
We will sometimes refer to the quantity $p(S)$ as an amount of probability even if $p(V)$ is not equal to 1. As an example of this notation, the PageRank vector with teleportation constant $\alpha$ and preference vector $\chi_v$ is written $\mathrm{pr}(\alpha, \chi_v)$, and the amount of probability from this distribution on a set $S$ is written $[\mathrm{pr}(\alpha, \chi_v)](S)$. The support of a distribution is $\mathrm{Supp}(p) = \{ v \mid p(v) \ne 0 \}$.

2.4. Sweeps

A sweep is an efficient technique for producing cuts from an embedding of a graph, and is often used in spectral partitioning [11, 14]. We will use the following degree-normalized version of a sweep. Given a distribution $p$, with support size $N_p = |\mathrm{Supp}(p)|$, let $v_1, \dots, v_{N_p}$ be an ordering of the vertices such that
\[ \frac{p(v_i)}{d(v_i)} \ge \frac{p(v_{i+1})}{d(v_{i+1})}. \]
This produces a collection of sets, $S_j^p = \{v_1, \dots, v_j\}$ for each $j \in \{0, \dots, N_p\}$, which we call sweep sets. We let
\[ \Phi(p) = \min_{j \in [1, N_p]} \Phi(S_j^p) \]
be the smallest conductance of any of the sweep sets. A cut with conductance $\Phi(p)$ can be found by sorting $p$ and computing the conductance of each sweep set, which can be done in time $O(\mathrm{vol}(\mathrm{Supp}(p)) \log n)$.

2.5. Measuring the spread of a distribution

We measure how well a distribution $p$ is spread in the graph using a function $p[k]$ defined for all integers $k \in [0, 2m]$. This function is determined by setting
\[ p[k] = p(S_j^p) \]
for those values of $k$ where $k = \mathrm{vol}(S_j^p)$, and the remaining values are set by defining $p[k]$ to be piecewise linear between these points. In other words, for any integer $k \in [0, 2m]$, if $j$ is an index such that $\mathrm{vol}(S_j^p) \le k \le \mathrm{vol}(S_{j+1}^p)$, then
\[ p[k] = p(S_j^p) + \frac{k - \mathrm{vol}(S_j^p)}{d(v_{j+1})}\, p(v_{j+1}). \]
This implies that $p[k]$ is an increasing function of $k$, and a concave function of $k$. It is not hard to see that $p[k]$ is an upper bound on the amount of probability from $p$ on any set with volume $k$; for any set $S$, we have
\[ p(S) \le p[\mathrm{vol}(S)]. \]

... which ensures that $p$ is an approximate PageRank vector for $\mathrm{pr}(\alpha, \chi_v)$ after any sequence of push operations. We now formally define $\mathrm{push}_u$, which performs this push operation on the distributions $p$ and $r$ at a chosen vertex $u$.

$\mathrm{push}_u(p, r)$:
1. Let $p' = p$ and $r' = r$, except for the following changes:
   (a) $p'(u) = p(u) + \alpha r(u)$.
   (b) $r'(u) = (1 - \alpha)\, r(u) / 2$.
   (c) For each $v$ such that $(u, v) \in E$: $r'(v) = r(v) + (1 - \alpha)\, r(u) / (2 d(u))$.
2. Return $(p', r')$.
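As an illustration, the push operation defined above can be sketched in Python. This is a minimal sketch rather than the authors' implementation: the adjacency-list dictionary `graph`, the function name `push`, and the four-vertex example graph are assumptions made for the example, and the graph is assumed to be simple and undirected so that $d(u)$ equals the length of `graph[u]`.

```python
from collections import defaultdict


def push(u, p, r, graph, alpha):
    """Apply one push at vertex u, updating p and r in place.

    graph is an assumed adjacency-list dict {vertex: [neighbors]} for a
    simple undirected graph, so d(u) == len(graph[u]).
    """
    ru = r[u]
    p[u] += alpha * ru                       # (a) an alpha fraction moves into p
    r[u] = (1 - alpha) * ru / 2              # (b) half of the remainder stays at u
    share = (1 - alpha) * ru / (2 * len(graph[u]))
    for v in graph[u]:                       # (c) the other half is spread over neighbors
        r[v] += share


# Hypothetical example: a triangle 0-1-2 with a pendant vertex 3 attached to 2.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
p, r = defaultdict(float), defaultdict(float)
r[0] = 1.0                                   # r = chi_v with v = 0
push(0, p, r, graph, alpha=0.2)
```

Each push moves $\alpha r(u)$ from the residual into $p$ and leaves the total mass of $p$ and $r$ unchanged, which is the bookkeeping that the bound on the number of pushes relies on.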
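Lemma 1. Let $p'$ and $r'$ be the result of the operation $\mathrm{push}_u$ on $p$ and $r$. Then
\[ p' + \mathrm{pr}(\alpha, r') = p + \mathrm{pr}(\alpha, r). \]

The proof of Lemma 1 can be found in the Appendix. During each push, some probability is moved from $r$ to $p$, where it remains, and after sufficiently many pushes $r$ can be made small. We can bound the number of pushes required by the following algorithm.

ApproximatePageRank$(v, \alpha, \epsilon)$:
1. Let $p = \vec{0}$, and $r = \chi_v$.
2. While $\max_{u \in V} \frac{r(u)}{d(u)} \ge \epsilon$:
   (a) Choose any vertex $u$ where $\frac{r(u)}{d(u)} \ge \epsilon$.
   (b) Apply $\mathrm{push}_u$ at vertex $u$, updating $p$ and $r$.
3. Return $p$, which satisfies $p = \mathrm{apr}(\alpha, \chi_v, r)$ with $\max_{u \in V} \frac{r(u)}{d(u)} < \epsilon$.

Lemma 2. Let $T$ be the total number of push operations performed by ApproximatePageRank, and let $d_i$ be the degree of the vertex $u$ used in the $i$th push. Then
\[ \sum_{i=1}^{T} d_i \le \frac{1}{\epsilon \alpha}. \]

Proof. The amount of probability on the vertex pushed at time $i$ is at least $\epsilon d_i$, therefore $|r|_1$ decreases by at least $\alpha \epsilon d_i$ during the $i$th push. Since $|r|_1 = 1$ initially, we have $\alpha \epsilon \sum_{i=1}^{T} d_i \le 1$, and the result follows.

To implement ApproximatePageRank, we determine which vertex to push at each step by maintaining a queue containing those vertices $u$ with $r(u)/d(u) \ge \epsilon$. At each step, push operations are performed on the first vertex in the queue until $r(u)/d(u) < \epsilon$ for that vertex, which is then removed from the queue. If a push operation raises the value of $r(x)/d(x)$ above $\epsilon$ for some vertex $x$, that vertex is added to the back of the queue. This continues until the queue is empty, at which point every vertex has $r(u)/d(u) < \epsilon$. We will show that this algorithm has the properties promised in Theorem 1. The proof is contained in the Appendix.

When a lazy walk step is applied to the distribution $p$, the amount of probability that moves from $u$ to $v$ is $\frac{1}{2} p(u, v)$. For any set $S$ of vertices, we have the set of directed edges into $S$, and the set of directed edges out of $S$, defined by
\[ \mathrm{in}(S) = \{ (u, v) \in E \mid u \in S \}, \quad \text{and} \quad \mathrm{out}(S) = \{ (u, v) \in E \mid v \in S \}, \]
respectively.

Lemma 4. For any distribution $p$, and any set $S$ of vertices,
\[ pW(S) \le \frac{1}{2} \Big( p\big(\mathrm{in}(S) \cup \mathrm{out}(S)\big) + p\big(\mathrm{in}(S) \cap \mathrm{out}(S)\big) \Big). \]

The proof of Lemma 4 can be found in the Appendix. We now combine this result with the inequality from Lemma 3 to relate $\mathrm{apr}(\alpha, s, r)$ to itself. In contrast, the proof of Lovász and Simonovits [9, 10] relates the walk distributions $p^{(t)}$ and $p^{(t+1)}$, where $p^{(t+1)} = p^{(t)} W$, and $p^{(0)} = s$.

Lemma 5. If $p = \mathrm{apr}(\alpha, s, r)$ is an approximate PageRank vector, then for any set $S$ of vertices,
\[ p(S) \le \alpha s(S) + (1 - \alpha) \frac{1}{2} \Big( p\big(\mathrm{in}(S) \cup \mathrm{out}(S)\big) + p\big(\mathrm{in}(S) \cap \mathrm{out}(S)\big) \Big). \]
Furthermore, for each $j \in [1, n-1]$,
\[ p\big[\mathrm{vol}(S_j^p)\big] \le \alpha s\big[\mathrm{vol}(S_j^p)\big] + (1 - \alpha) \frac{1}{2} \Big( p\big[\mathrm{vol}(S_j^p) - |\partial(S_j^p)|\big] + p\big[\mathrm{vol}(S_j^p) + |\partial(S_j^p)|\big] \Big). \]

The proof of Lemma 5 is included in the Appendix. The following theorem uses the result from Lemma 5 to place an upper bound on $\mathrm{apr}(\alpha, s, r)[k]$. More precisely, it shows that if a certain upper bound on $\mathrm{apr}(\alpha, s, r)[k] - \frac{k}{2m}$ does not hold, then one of the sweep sets from $\mathrm{apr}(\alpha, s, r)$ has both small conductance and a significant amount of probability from $\mathrm{apr}(\alpha, s, r)$. This lower bound on probability will be used in Section 6 to control the volume of the resulting sweep set.

Theorem 3. Let $p = \mathrm{apr}(\alpha, s, r)$ be an approximate PageRank vector with $|s|_1 \le 1$. Let $\phi$ and $\gamma$ be any constants in $[0, 1]$. Either the following bound holds for any integer $t$ and any $k \in [0, 2m]$:
\[ p[k] - \frac{k}{2m} \le \gamma + \alpha t + \sqrt{\min(k,\ 2m - k)} \left( 1 - \frac{\phi^2}{8} \right)^{t}, \]
or else there exists a sweep cut $S_j^p$ with the following properties:
1. $\Phi(S_j^p) < \phi$,
2. $p(S_j^p) - \frac{\mathrm{vol}(S_j^p)}{2m} > \gamma + \alpha t + \sqrt{\min(\mathrm{vol}(S_j^p),\ 2m - \mathrm{vol}(S_j^p))} \left( 1 - \frac{\phi^2}{8} \right)^{t}$, for some integer $t$,
3. $j \in [1, |\mathrm{Supp}(p)|]$.

The proof can be found in the Appendix. We can rephrase the sequence of bounds from Theorem 3 to prove the theorem promised at the beginning of this section. Namely, we show that if there exists a set of vertices, of any size, that contains a constant amount more probability from $\mathrm{apr}(\alpha, s, r)$ than from the stationary distribution, then the sweep over $\mathrm{apr}(\alpha, s, r)$ finds a cut with conductance roughly $\sqrt{\alpha \ln m}$. We remark that this applies to any approximate PageRank vector, regardless of the size of the residual vector: the residual vector only needs to be small to ensure that $\mathrm{apr}(\alpha, s, r)$ is large enough that the theorem applies. The proof is given in the appendix.

6 An algorithm for nearly linear time graph partitioning

In this section, we extend our local partitioning techniques to find a set with small conductance, while providing more control over the volume of the set produced. The result is an algorithm called PageRank-Nibble that takes a scale $b$ as part of its input, runs in time proportional to $2^b$, and only produces a cut when it finds a set with conductance $\phi$ and volume roughly $2^b$. We prove that PageRank-Nibble finds a set with these properties for at least one value of $b \in [1, \lceil \log m \rceil]$, provided that $v$ is a good starting vertex for a set of conductance at most $g(\phi)$, where $g(\phi) = \Omega(\phi^2 / \log^2 m)$.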
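The two building blocks that PageRank-Nibble relies on, the queue-based ApproximatePageRank and the degree-normalized sweep, can be sketched together in Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the adjacency-list dictionary `graph`, and the example graph (two triangles joined by a single edge) are all hypothetical, and the graph is assumed simple and undirected.

```python
from collections import defaultdict, deque


def approximate_pagerank(graph, v, alpha, eps):
    """Queue-based pushes from chi_v until r(u)/d(u) < eps at every vertex."""
    p, r = defaultdict(float), defaultdict(float)
    r[v] = 1.0
    queue = deque([v])
    while queue:
        u = queue[0]
        du = len(graph[u])
        while r[u] >= eps * du:              # push at u until it drops below eps
            ru = r[u]
            p[u] += alpha * ru
            r[u] = (1 - alpha) * ru / 2
            for w in graph[u]:
                before = r[w]
                r[w] += (1 - alpha) * ru / (2 * du)
                # enqueue w only when this push lifts it over the threshold
                if before < eps * len(graph[w]) <= r[w]:
                    queue.append(w)
        queue.popleft()
    return p, r


def sweep(graph, p):
    """Return the sweep set of smallest conductance and that conductance."""
    two_m = sum(len(nbrs) for nbrs in graph.values())
    order = sorted((u for u in p if p[u] > 0),
                   key=lambda u: p[u] / len(graph[u]), reverse=True)
    in_set, vol, cut = set(), 0, 0
    best_phi, best_j = float("inf"), 0
    for j, u in enumerate(order, start=1):
        in_set.add(u)
        vol += len(graph[u])
        inside = sum(1 for w in graph[u] if w in in_set)
        cut += len(graph[u]) - 2 * inside    # new boundary edges minus absorbed ones
        denom = min(vol, two_m - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_j = cut / denom, j
    return set(order[:best_j]), best_phi


# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p, r = approximate_pagerank(graph, v=0, alpha=0.1, eps=1e-4)
best_set, best_phi = sweep(graph, p)
```

The sweep updates the boundary size incrementally as each vertex is added, so its cost after sorting is proportional to the volume of the support, in line with the $O(\mathrm{vol}(\mathrm{Supp}(p)) \log n)$ bound quoted for sweeps.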
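PageRank-Nibble$(v, \phi, b)$:
Input: a vertex $v$, a constant $\phi \in (0, 1]$, and an integer $b \in [1, B]$, where $B = \lceil \log_2 m \rceil$.
1. Let $\alpha = \dfrac{\phi^2}{225 \ln(100 \sqrt{m})}$.
2. Compute an approximate PageRank vector $p = \mathrm{apr}(\alpha, \chi_v, r)$ with residual vector $r$ satisfying $\max_{u \in V} \frac{r(u)}{d(u)} \le 2^{-b} \frac{1}{48B}$.
3. Check each set $S_j^p$ with $j \in [1, |\mathrm{Supp}(p)|]$, to see if it obeys the following conditions:
   Conductance: $\Phi(S_j^p) < \phi$,
   Volume: $2^{b-1} < \mathrm{vol}(S_j^p) < \frac{2}{3} \mathrm{vol}(G)$,
   Probability Change: $p[2^b] - p[2^{b-1}] > \frac{1}{48B}$.
4. If some set $S_j^p$ satisfies all of these conditions, return $S_j^p$. Otherwise, return nothing.

Theorem 7. PageRank-Nibble$(v, \phi, b)$ can be implemented with running time $O(2^b \log^3 m / \phi^2)$.

Theorem 8. Let $C$ be a set satisfying $\Phi(C) \le \phi^2 / (22500 \log^2 100m)$ and $\mathrm{vol}(C) \le \frac{1}{2} \mathrm{vol}(G)$, and let $v$ be a vertex in $C_\alpha$ for $\alpha = \phi^2 / (225 \ln(100 \sqrt{m}))$. Then, there is some integer $b \in [1, \lceil \log_2 m \rceil]$ for which PageRank-Nibble$(v, \phi, b)$ returns a set $S$. Any set $S$ returned by PageRank-Nibble$(v, \phi, b)$ has the following properties:
1. $\Phi(S) < \phi$,
2. $2^{b-1} < \mathrm{vol}(S) < \frac{2}{3} \mathrm{vol}(G)$,
3. $\mathrm{vol}(S \cap C) > 2^{b-2}$.

The proofs of Theorems 7 and 8 are included in the Appendix.

PageRank-Nibble improves both the running time and approximation ratio of the Nibble algorithm of Spielman and Teng, which runs in time $O(2^b \log^4 m / \phi^5)$, and requires $\Phi(C) = O(\phi^3 / \log^2 m)$. PageRank-Nibble can be used interchangeably with Nibble in several important applications. For example, both PageRank-Nibble and Nibble can be applied recursively to produce cuts with nearly optimal balance. An algorithm PageRank-Partition with the following properties can be created in essentially the same way as the algorithm Partition in [15], so we omit the details.

Theorem 9. The algorithm PageRank-Partition takes as input a parameter $\phi$, and has expected running time $O(m \log(1/p) \log^4 m / \phi^3)$. If there exists a set $C$ with $\mathrm{vol}(C) \le \frac{1}{2} \mathrm{vol}(G)$ and $\Phi(C) \le \phi^2 / (1845000 \log^2 m)$, then with probability at least $1 - p$, PageRank-Partition produces a set $S$ satisfying $\Phi(S) \le \phi$ and $\frac{1}{2} \mathrm{vol}(C) \le \mathrm{vol}(S) \le \frac{5}{6} \mathrm{vol}(G)$.

7 Appendix

To demonstrate the equivalence of lazy and standard PageRank vectors, let $\mathrm{rpr}(\alpha, s)$ be the standard PageRank vector, defined to be the unique solution $p$ of the equation $p = \alpha s + (1 - \alpha) p M$, where $M$ is the random walk transition matrix $M = D^{-1} A$. We prove the following proposition.

Proposition 3. $\mathrm{pr}(\alpha, s) = \mathrm{rpr}\big(\frac{2\alpha}{1 + \alpha}, s\big)$.

Proof. We have the following sequence of equations.
\begin{align*}
\mathrm{pr}(\alpha, s) &= \alpha s + (1 - \alpha)\, \mathrm{pr}(\alpha, s) W \\
\mathrm{pr}(\alpha, s) &= \alpha s + \Big(\frac{1 - \alpha}{2}\Big) \mathrm{pr}(\alpha, s) + \Big(\frac{1 - \alpha}{2}\Big) \mathrm{pr}(\alpha, s) (D^{-1} A) \\
\Big(\frac{1 + \alpha}{2}\Big) \mathrm{pr}(\alpha, s) &= \alpha s + \Big(\frac{1 - \alpha}{2}\Big) \mathrm{pr}(\alpha, s) (D^{-1} A) \\
\mathrm{pr}(\alpha, s) &= \Big(\frac{2\alpha}{1 + \alpha}\Big) s + \Big(\frac{1 - \alpha}{1 + \alpha}\Big) \mathrm{pr}(\alpha, s) M
\end{align*}
Since $\mathrm{pr}(\alpha, s)$ satisfies the equation for $\mathrm{rpr}\big(\frac{2\alpha}{1 + \alpha}, s\big)$, and since this equation has a unique solution, the result follows.

Proof of Proposition 1. The equation $p = \alpha s + (1 - \alpha) p W$ is equivalent to $\alpha s = p [I - (1 - \alpha) W]$. The matrix $(I - (1 - \alpha) W)$ is nonsingular, since it is strictly diagonally dominant, so this equation has a unique solution $p$.

Proof of Proposition 2. The sum in equation (2) that defines $R_\alpha$ is convergent for $\alpha \in (0, 1]$, and the following computation shows that $s R_\alpha$ obeys the steady state equation for $\mathrm{pr}(\alpha, s)$.
\begin{align*}
\alpha s + (1 - \alpha)\, s R_\alpha W &= \alpha s + (1 - \alpha)\, s \Big( \alpha \sum_{t=0}^{\infty} (1 - \alpha)^t W^t \Big) W \\
&= \alpha s + s \Big( \alpha \sum_{t=1}^{\infty} (1 - \alpha)^t W^t \Big) \\
&= s \Big( \alpha \sum_{t=0}^{\infty} (1 - \alpha)^t W^t \Big) \\
&= s R_\alpha.
\end{align*}
Since the solution to the steady state equation is unique by Proposition 1, it follows that $\mathrm{pr}(\alpha, s) = s R_\alpha$.

Proof of Lemma 1. After the push operation, we have
\begin{align*}
p' &= p + \alpha r(u) \chi_u, \\
r' &= r - r(u) \chi_u + (1 - \alpha) r(u) \chi_u W.
\end{align*}
Using equation (5),
\begin{align*}
p + \mathrm{pr}(\alpha, r) &= p + \mathrm{pr}(\alpha, r - r(u)\chi_u) + \mathrm{pr}(\alpha, r(u)\chi_u) \\
&= p + \mathrm{pr}(\alpha, r - r(u)\chi_u) + \big[ \alpha r(u) \chi_u + (1 - \alpha)\, \mathrm{pr}(\alpha, r(u) \chi_u W) \big] \\
&= \big[ p + \alpha r(u) \chi_u \big] + \mathrm{pr}\big(\alpha, [\, r - r(u)\chi_u + (1 - \alpha) r(u) \chi_u W \,]\big) \\
&= p' + \mathrm{pr}(\alpha, r').
\end{align*}

... This proves the first part of the lemma. To prove the second part, recall that $p[\mathrm{vol}(S_j^p)] = p(S_j^p)$ for any integer $j \in [0, n]$. Also, for any set of directed edges $A$, we have the bound $p(A) \le p[|A|]$. Therefore,
\begin{align*}
p\big[\mathrm{vol}(S_j^p)\big] &= p(S_j^p) \\
&\le \alpha s(S_j^p) + (1 - \alpha) \frac{1}{2} \Big( p\big(\mathrm{in}(S_j^p) \cup \mathrm{out}(S_j^p)\big) + p\big(\mathrm{in}(S_j^p) \cap \mathrm{out}(S_j^p)\big) \Big) \\
&\le \alpha s\big[\mathrm{vol}(S_j^p)\big] + (1 - \alpha) \frac{1}{2} \Big( p\big[\, |\mathrm{in}(S_j^p) \cup \mathrm{out}(S_j^p)| \,\big] + p\big[\, |\mathrm{in}(S_j^p) \cap \mathrm{out}(S_j^p)| \,\big] \Big).
\end{align*}
All that remains is to bound the sizes of the sets in the inequality above. Notice that
\[ |\mathrm{in}(S_j^p) \cup \mathrm{out}(S_j^p)| + |\mathrm{in}(S_j^p) \cap \mathrm{out}(S_j^p)| = 2\, \mathrm{vol}(S_j^p), \]
and
\[ |\mathrm{in}(S_j^p) \cup \mathrm{out}(S_j^p)| - |\mathrm{in}(S_j^p) \cap \mathrm{out}(S_j^p)| = 2\, |\partial(S_j^p)|. \]
This implies that
\[ |\mathrm{in}(S_j^p) \cup \mathrm{out}(S_j^p)| = \mathrm{vol}(S_j^p) + |\partial(S_j^p)|, \]
and
\[ |\mathrm{in}(S_j^p) \cap \mathrm{out}(S_j^p)| = \mathrm{vol}(S_j^p) - |\partial(S_j^p)|. \]
The result follows.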
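The counting identities used at the end of the proof of Lemma 5 can be checked on a small example. The following Python sketch is only an illustration: the helper name `edge_set_sizes` and the example graph are assumptions, the graph is taken to be simple and undirected, and each undirected edge is represented by two directed edges, following the definitions of in$(S)$ and out$(S)$ given in the text.

```python
def edge_set_sizes(graph, S):
    """Return |in(S) ∪ out(S)|, |in(S) ∩ out(S)|, vol(S) and |∂(S)|.

    graph is an assumed adjacency-list dict for a simple undirected graph;
    every undirected edge {u, v} contributes the directed edges (u, v), (v, u).
    """
    directed = [(u, v) for u in graph for v in graph[u]]
    in_S = {(u, v) for (u, v) in directed if u in S}     # in(S), as defined in the text
    out_S = {(u, v) for (u, v) in directed if v in S}    # out(S), as defined in the text
    vol = sum(len(graph[u]) for u in S)
    boundary = sum(1 for (u, v) in directed if u in S and v not in S)
    return len(in_S | out_S), len(in_S & out_S), vol, boundary


# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
union, inter, vol, boundary = edge_set_sizes(graph, {0, 1, 2})
```

For this choice of $S$ the volume is 7 and the boundary is the single edge between vertices 2 and 3, so the union and intersection should have sizes $7 + 1$ and $7 - 1$ respectively.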
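Proof of Theorem 3. Let $k_j = \mathrm{vol}(S_j^p)$, let $\bar{k}_j = \min(k_j,\ 2m - k_j)$, and let
\[ f_t(k) = \gamma + \alpha t + \sqrt{\min(k,\ 2m - k)} \left( 1 - \frac{\phi^2}{8} \right)^{t}. \]
Assuming that there does not exist a sweep cut with all of the properties stated in the theorem, we will prove by induction that the following holds for all $t \ge 0$:
\[ p[k] - \frac{k}{2m} \le f_t(k), \quad \text{for any } k \in [0, 2m]. \tag{7} \]
For the base case, equation (7) holds for $t = 0$, with any choice of $\gamma$ and $\phi$. To see this, notice that for each integer $k \in [1, 2m - 1]$,
\[ p[k] - \frac{k}{2m} \le 1 \le \sqrt{\min(k,\ 2m - k)} \le f_0(k). \]
For $k = 0$ and $k = 2m$ we have $p[k] - \frac{k}{2m} \le 0 \le f_0(k)$. The claim follows because $f_0$ is concave, $p[k]$ is less than $f_0$ for each integer value of $k$, and $p[k]$ is linear between these integer values.

Assume for the sake of induction that equation (7) holds for $t$. To prove that equation (7) holds for $t + 1$, which will complete the proof of the theorem, it suffices to show that the following equation holds for each $j \in [1, |\mathrm{Supp}(p)|]$:
\[ p[k_j] - \frac{k_j}{2m} \le f_{t+1}(k_j). \tag{8} \]

...

Proof of Theorem 2. Let $\phi = \Phi(\mathrm{apr}(\alpha, s, r))$. Theorem 3 implies
\[ \mathrm{apr}(\alpha, s, r)(S) - \frac{\mathrm{vol}(S)}{\mathrm{vol}(G)} \le \alpha t + \sqrt{\min(\mathrm{vol}(S),\ 2m - \mathrm{vol}(S))} \left( 1 - \frac{\phi^2}{8} \right)^{t}, \]
for any integer $t \ge 0$ and any $k \in [0, 2m]$. If we let $t = \left\lceil \frac{8}{\phi^2} \ln \frac{2\sqrt{m}}{\delta} \right\rceil$, then we have
\[ \mathrm{apr}(\alpha, s, r)(S) - \frac{\mathrm{vol}(S)}{\mathrm{vol}(G)} \le \alpha \left\lceil \frac{8}{\phi^2} \ln \frac{2\sqrt{m}}{\delta} \right\rceil + \frac{\delta}{2}, \]
which implies
\[ \frac{\delta}{2} < \alpha \left\lceil \frac{8}{\phi^2} \ln \frac{2\sqrt{m}}{\delta} \right\rceil \le \alpha \frac{9}{\phi^2} \ln m. \]
The result follows by solving for $\phi$.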
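The final step of the proof of Theorem 2, solving for $\phi$, can be written out as follows. This is a sketch under an assumption: the statement of Theorem 2 is not reproduced in this excerpt, so we take $\delta$ to denote the constant by which the probability on $S$ exceeds $\mathrm{vol}(S)/\mathrm{vol}(G)$, and the resulting constant should be checked against that statement.

```latex
\begin{align*}
\frac{\delta}{2} < \alpha \, \frac{9}{\phi^{2}} \ln m
  \quad &\Longrightarrow \quad
  \phi^{2} < \frac{18\, \alpha \ln m}{\delta} \\
  \quad &\Longrightarrow \quad
  \phi < \sqrt{\frac{18\, \alpha \ln m}{\delta}} .
\end{align*}
```

Since $\phi$ was defined as the smallest conductance found by the sweep, this matches the earlier informal description of a cut with conductance roughly $\sqrt{\alpha \ln m}$ when $\delta$ is a constant.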
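Proof of Lemma 6. We first prove the following monotonicity property for the PageRank operator: for any starting distribution $s$, and any $k \in [0, 2m]$,
\[ \mathrm{pr}(\alpha, s)[k] \le s[k]. \tag{9} \]
This is a consequence of Lemma 5; if we let $p = \mathrm{pr}(\alpha, s)$, then for each $j \in [1, n-1]$ we have
\begin{align*}
p\big[\mathrm{vol}(S_j^p)\big] &\le \alpha s\big[\mathrm{vol}(S_j^p)\big] + (1 - \alpha) \frac{1}{2} \Big( p\big[\mathrm{vol}(S_j^p) - |\partial(S_j^p)|\big] + p\big[\mathrm{vol}(S_j^p) + |\partial(S_j^p)|\big] \Big) \\
&\le \alpha s\big[\mathrm{vol}(S_j^p)\big] + (1 - \alpha)\, p\big[\mathrm{vol}(S_j^p)\big],
\end{align*}
where the last line follows from the concavity of $p[k]$. This implies that $\mathrm{pr}(\alpha, s)[k_j] \le s[k_j]$, where $k_j = \mathrm{vol}(S_j^{\mathrm{pr}(\alpha, s)})$, for each $j \in [1, n-1]$. The result follows, since $s[k]$ is concave, and $\mathrm{pr}(\alpha, s)[k]$ is linear between the points where $k = k_j$.

The amount of probability that moves from $C$ to $\bar{C}$ in the step from $\mathrm{pr}(\alpha, \psi_C)$ to $\mathrm{pr}(\alpha, \psi_C) W$ is bounded by $\frac{1}{2} \mathrm{pr}(\alpha, \psi_C)[|\partial(C)|]$, since $|\partial(C)|$ is the number of directed edges from $C$ to $\bar{C}$. By the monotonicity property,
\[ \mathrm{pr}(\alpha, \psi_C)[|\partial(C)|] \le \psi_C[|\partial(C)|] = \frac{|\partial(C)|}{\mathrm{vol}(C)} = \Phi(C). \]
Using the recursive property of PageRank,
\begin{align*}
[\mathrm{pr}(\alpha, \psi_C)](\bar{C}) &= [\alpha \psi_C + (1 - \alpha)\, \mathrm{pr}(\alpha, \psi_C) W](\bar{C}) \\
&\le (1 - \alpha) [\mathrm{pr}(\alpha, \psi_C)](\bar{C}) + \frac{1}{2} \mathrm{pr}(\alpha, \psi_C)[|\partial(C)|] \\
&\le (1 - \alpha) [\mathrm{pr}(\alpha, \psi_C)](\bar{C}) + \frac{1}{2} \Phi(C).
\end{align*}
This implies
\[ [\mathrm{pr}(\alpha, \psi_C)](\bar{C}) \le \frac{\Phi(C)}{2\alpha}. \]

Proof of Theorem 7. An approximate PageRank vector $p = \mathrm{apr}(\alpha, \chi_v, r)$, with residual vector $r$ satisfying $\max_{u \in V} \frac{r(u)}{d(u)} \le \frac{2^{-b}}{48B}$, can be computed in time $O(2^b \frac{\log m}{\alpha})$ using ApproximatePageRank. By Theorem 1, we have $\mathrm{vol}(\mathrm{Supp}(p)) = O(2^b \frac{\log m}{\alpha})$. It is possible to check each of the conditions in step 3, for every set $S_j^p$ with $j \in [1, |\mathrm{Supp}(p)|]$, in time
\[ O\big(\mathrm{vol}(\mathrm{Supp}(\mathrm{apr}(\alpha, \chi_v, r))) \log n\big) = O\Big(2^b \frac{\log^2 m}{\alpha}\Big). \]
Therefore, the running time of PageRank-Nibble is
\[ O\Big(2^b \frac{\log^2 m}{\alpha}\Big) = O\Big(2^b \frac{\log^3 m}{\phi^2}\Big). \]

Proof of Theorem 8. Consider the PageRank vector $\mathrm{pr}(\alpha, \chi_v)$. Since $v$ is in $C_\alpha$, and since $\frac{\Phi(C)}{\alpha} \le \frac{1}{96B}$, we have
\[ \mathrm{pr}(\alpha, \chi_v)[\mathrm{vol}(C)] - \frac{\mathrm{vol}(C)}{2m} \ge \Big(1 - \frac{\Phi(C)}{\alpha}\Big) - \frac{1}{2} \ge \frac{1}{2} - \frac{1}{96}. \]
We have set $\alpha$ so that $\alpha t \le 1/25$ when $t = \lceil \frac{8}{\phi^2} \ln(100 \sqrt{m}) \rceil$, and with this choice of $t$ we have
\[ \alpha t + \sqrt{\min(\mathrm{vol}(C),\ 2m - \mathrm{vol}(C))} \left( 1 - \frac{\phi^2}{8} \right)^{t} < \frac{1}{25} + \frac{1}{100}. \]
Since $\frac{1}{2} - \frac{1}{96} \ge \frac{5}{12} + \frac{1}{25} + \frac{1}{100}$, the following equation holds with $\gamma = \frac{5}{12}$.
\[ \mathrm{pr}(\alpha, \chi_v)[\mathrm{vol}(C)] - \frac{\mathrm{vol}(C)}{2m} > \gamma + \alpha t + \sqrt{\min(\mathrm{vol}(C),\ 2m - \mathrm{vol}(C))} \left( 1 - \frac{\phi^2}{8} \right)^{t}. \tag{10} \]
Let $B = \lceil \log_2 m \rceil$. For each integer $b$ in $[1, B]$, let $\gamma_b = \gamma \big( \frac{9}{10} + \frac{1}{10} \frac{b}{B} \big)$. Consider the smallest value of $b$ in $[1, B]$ for which the following equation holds for some $k \le 2^b$.
\[ \mathrm{pr}(\alpha, \chi_v)[k] - \frac{k}{2m} > \gamma_b + \alpha t + \sqrt{\min(k,\ 2m - k)} \left( 1 - \frac{\phi^2}{8} \right)^{t}, \quad \text{for some integer } t \ge 0. \tag{11} \]
Equation (10) shows that this equation holds with $b = B$ and $k = m$. Let $b_0$ be the smallest value of $b$ for which this equation holds, and let $k_0$ be some value such that $k_0 \le m$ and such that this equation holds with $b = b_0$ and $k = k_0$. Notice that $2^{b_0 - 1} < k_0 \le 2^{b_0}$: if $k_0$ were at most $2^{b_0 - 1}$, then equation (11) would also hold for $b = b_0 - 1$ and $k = k_0$, because $\gamma_{b_0 - 1} < \gamma_{b_0}$, contradicting the minimality of $b_0$.

When PageRank-Nibble is run with $b = b_0$, the approximate PageRank vector $\mathrm{apr}(\alpha, \chi_v, r)$ computed by PageRank-Nibble has only a small amount of error on a set of volume $k_0$: the error is small ...

To prove that there is a significant difference between $\mathrm{apr}(\alpha, \chi_v, r)[2^{b_0}]$ and $\mathrm{apr}(\alpha, \chi_v, r)[2^{b_0 - 1}]$, observe that equation (12) does not hold with $b = b_0 - 1$ and $k = 2^{b_0 - 1}$. Therefore, for every integer $t \ge 0$,
\[ \mathrm{apr}(\alpha, \chi_v, r)\big[2^{b_0 - 1}\big] - \frac{k_0}{2m} \le \gamma_{b_0 - 1} + \alpha t + \sqrt{\min(2^{b_0 - 1},\ 2m - 2^{b_0 - 1})} \left( 1 - \frac{\phi^2}{8} \right)^{t}. \tag{13} \]
We also know that for some integer $t$,
\[ \mathrm{apr}(\alpha, \chi_v, r)[k_0] - \frac{k_0}{2m} > \Big( \gamma_{b_0 - 1} + \frac{1}{48B} \Big) + \alpha t + \sqrt{\min(k_0,\ 2m - k_0)} \left( 1 - \frac{\phi^2}{8} \right)^{t}. \tag{14} \]
Since $2^{b_0 - 1} \le k_0 \le m$, we have $\sqrt{\min(2^{b_0 - 1},\ 2m - 2^{b_0 - 1})} \le \sqrt{\min(k_0,\ 2m - k_0)}$. Taking an integer $t$ that makes equation (14) true, and plugging this value of $t$ into equations (14) and (13), yields the following inequality.
\[ \mathrm{apr}(\alpha, \chi_v, r)\big[2^{b_0}\big] - \mathrm{apr}(\alpha, \chi_v, r)\big[2^{b_0 - 1}\big] \ge \mathrm{apr}(\alpha, \chi_v, r)[k_0] - \mathrm{apr}(\alpha, \chi_v, r)\big[2^{b_0 - 1}\big] > \frac{1}{48B}. \]
We have shown that $S_j$ meets all the requirements of PageRank-Nibble, which proves that the algorithm outputs some cut when run with $b = b_0$.

We now prove a lower bound on $\mathrm{vol}(S \cap C)$, which holds for any cut $S$ output by PageRank-Nibble, with any value of $b$. Let $p'[k] = p[k] - p[k - 1]$. Since $p'[k]$ is a decreasing function of $k$,
\[ p'\big[2^{b-1}\big] \ge \frac{p[2^b] - p[2^{b-1}]}{2^b - 2^{b-1}} > \frac{1}{2^{b-1} \cdot 48B}. \]
It is not hard to see that combining this lower bound on $p'[2^{b-1}]$ with the upper bound $p(\bar{C}) \le \frac{\Phi(C)}{\alpha}$ gives the following bound on the volume of the intersection.
\[ \mathrm{vol}(S_j \cap C) \ge 2^{b-1} - \frac{p(\bar{C})}{p'[2^{b-1}]} > 2^{b-1} - 2^{b-1} \Big( 48B\, \frac{\Phi(C)}{\alpha} \Big). \]
Since we have assumed that $\frac{\Phi(C)}{\alpha} \le \frac{1}{96B}$, we have
\[ \mathrm{vol}(S \cap C) > 2^{b-1} - 2^{b-2} = 2^{b-2}. \]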
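The numerical constants in the proof of Theorem 8 can be sanity-checked with a short Python sketch. This is an illustration, not part of the proof: the function name `check_constants` and the sample values of $m$ and $\phi$ are arbitrary assumptions, and the formulas for $\alpha$ and $t$ are those of PageRank-Nibble as read from the text, namely $\alpha = \phi^2 / (225 \ln(100\sqrt{m}))$ and $t = \lceil (8/\phi^2) \ln(100\sqrt{m}) \rceil$.

```python
import math
from fractions import Fraction


def check_constants(m, phi):
    """Evaluate the two error terms in the proof of Theorem 8 for given m, phi."""
    log_term = math.log(100 * math.sqrt(m))
    alpha = phi ** 2 / (225 * log_term)             # alpha as set by PageRank-Nibble
    t = math.ceil(8 * log_term / phi ** 2)          # the choice of t in the proof
    walk_term = alpha * t                           # should be at most 1/25
    # sqrt(min(k, 2m - k)) is at most sqrt(m), so this bounds the decay term
    decay_term = math.sqrt(m) * (1 - phi ** 2 / 8) ** t
    return alpha, t, walk_term, decay_term


# Arbitrary sample values, chosen only for illustration.
alpha, t, walk_term, decay_term = check_constants(m=10 ** 6, phi=0.1)

# Exact check of the inequality that justifies gamma = 5/12.
lhs = Fraction(1, 2) - Fraction(1, 96)
rhs = Fraction(5, 12) + Fraction(1, 25) + Fraction(1, 100)

# Final step: with Phi(C)/alpha <= 1/(96B), the bound 2^(b-1) - 2^(b-1) * 48B * Phi(C)/alpha
# is at least 2^(b-1) - 2^(b-2) = 2^(b-2); checked here for one sample b and B.
b, B = 5, 20
final_bound = Fraction(2) ** (b - 1) * (1 - 48 * B * Fraction(1, 96 * B))
```

The exact-fraction comparison avoids floating-point rounding in the inequality for $\gamma$, while the floating-point terms only illustrate that $\alpha t$ and the decay term stay below $1/25$ and $1/100$ for the sample inputs.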