Efficient Peer-to-Peer Keyword Searching

Patrick Reynolds and Amin Vahdat
Department of Computer Science, Duke University
{reynolds,vahdat}@cs.duke.edu

Abstract. The recent file storage applications built on top of peer-to-peer distributed hash tables lack search capabilities. We believe that search is an important part of any document publication system. To that end, we have designed and analyzed a distributed search engine based on a distributed hash table. Our simulation results predict that our search engine can answer an average query in under one second, using under one kilobyte of bandwidth.

Keywords: search, distributed hash table, peer-to-peer, Bloom filter, caching

* This research is supported in part by the National Science Foundation (EIA-99772879, ITR-0082912), Hewlett Packard, IBM, Intel, and Microsoft. Vahdat is also supported by an NSF CAREER award (CCR-9984328), and Reynolds is also supported by an NSF fellowship.

1 Introduction

Recent work on distributed hash tables (DHTs) such as Chord [19], CAN [16], and Pastry [17] has addressed some of the scalability and reliability problems that plagued earlier peer-to-peer overlay networks such as Napster [14] and Gnutella [8]. However, the useful keyword searching present in Napster and Gnutella is absent in the DHTs that endeavor to replace them. In this paper, we present a symmetrically distributed peer-to-peer search engine based on a DHT and intended to serve DHT-based file storage systems.

[Fig. 1. Distributing an inverted index across a peer-to-peer network: words in the index map into the hash range, which is in turn divided among the peers.]

Applications built using the current generation of DHTs request documents using an opaque key. The means for choosing the key is left for the application built on top of the DHT to determine. For example, the Chord File System, CFS [6], uses hashes of content blocks as keys. Freenet [5,9], which shares some characteristics of DHTs, uses hashes of file names as keys. In each case, users must have a single, unique name to retrieve content. No functionality is provided for keyword searches. The system described in this paper provides keyword search functionality for a DHT-based file system or archival storage system, to map keyword queries to the unique routing keys described above. It does so by mapping each keyword to a node in the DHT that will store a list of documents containing that keyword. Figure 1 shows how keywords in the index map into the hash range and, in turn, to nodes in the DHT.
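The keyword-to-node mapping just described can be made concrete with a short sketch. The Python below is ours rather than the paper's: the MD5 hash, the 128-bit ID width, and the closest-ID assignment rule are assumptions patterned on typical DHT designs.

```python
import hashlib

def keyword_id(keyword: str) -> int:
    """Hash a keyword into a 128-bit identifier space (assumed width)."""
    return int.from_bytes(hashlib.md5(keyword.encode("utf-8")).digest(), "big")

def responsible_node(keyword: str, node_ids: list[int]) -> int:
    """Assign the keyword to the node whose ID is closest to the
    keyword's hash, mimicking the mapping sketched in Figure 1."""
    kid = keyword_id(keyword)
    return min(node_ids, key=lambda nid: abs(nid - kid))

nodes = [keyword_id(f"node-{i}") for i in range(4)]
print(hex(responsible_node("peer-to-peer", nodes)))
```

Every query or update for a keyword is then routed to that one node, which stores the keyword's match-list of document IDs.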

[Fig. 2. Number of keywords per search operation in the IRCache for a ten-day period in January 2002.]

We believe that end-user latency is the most important performance metric for a search engine. Most end-user latency in a distributed search engine comes from network transfer times. Thus, minimizing the number of bytes sent and the number of times they are sent is crucial. Both bytes and hops are easy to minimize for queries that can be answered by a single host. Most queries, however, contain several keywords and must be answered by several cooperating hosts. Using a trace of 99,405 queries sent through the IRCache proxy system to Web search engines during a ten-day period in January 2002, we determined that 71.5% of queries contain two or more keywords. The entire distribution of keywords per query is shown in Figure 2. Because multiple-keyword queries dominate the search workload, optimizing them is important for end-user performance. This paper focuses on minimizing network traffic for multiple-keyword queries.

1.1 Non-goals

One extremely useful feature of distributed hash tables is that they provide a simple service model that hides request routing, churn costs, load balancing, and unavailability. Most DHTs route requests to nodes that can serve them in expected O(lg n) steps, for networks of n hosts. They keep churn costs [11], the costs associated with managing node joins and departures, logarithmic with the size of the network. Using consistent hashing [10], they divide load roughly evenly among available hosts. Finally, they perform replication to ensure availability even when individual nodes fail. Our design uses a DHT as its base; thus, it does not directly address these issues.

1.2 Overview

This paper describes our search model, design, and simulation experiments as follows. In Section 2 we describe several aspects of the peer-to-peer search problem space, along with the parts of the problem space we chose to explore. Section 3 describes our approach to performing peer-to-peer searches efficiently. Section 4 details our simulation environment, and Section 5 describes the simulation results. We present related work in Section 6 and conclude in Section 7.

2 System Model

Fundamentally, search is the task of associating keywords with document identifiers and later retrieving document identifiers that match combinations of keywords. Most text searching systems use inverted indices, which map each word found in any document to a list of the documents in which the word appears. Beyond this simple description, many design trade-offs exist. How will the index be partitioned, if at all? Should it be distributed, or would a centralized index suffice? In what order will matching documents be listed? How are document changes reflected in the index? We address these questions below.

[Fig. 3. A horizontally partitioned index stores part of every keyword match-list on each node, often divided by document identifiers; here we divide the index into document identifiers 1-3, 4-6, and 7-9. A vertically partitioned index assigns each keyword to a single node.]

2.1 Partitioning

Although a sufficiently small index need not be partitioned at all, our target application is a data set large enough to overwhelm the storage and processing capacities of any single node. Thus, some partitioning scheme is required. There are two straightforward partitioning schemes: horizontal and vertical. For each keyword an index stores, it must store a match-list of identifiers for all of the documents containing the keyword. A horizontally partitioned index divides this list among several nodes, either sequentially or by partitioning the document identifier space. Google [3] operates in this manner. A vertically partitioned index assigns each keyword, undivided, to a single node. Figure 3 shows a small sample index partitioned horizontally and vertically, with K1 through K5 representing keywords and doc1 through doc9 representing documents that contain those keywords.

A vertically partitioned index minimizes the cost of searches by ensuring that no more than k servers must participate in answering a query containing k keywords. A horizontally partitioned index requires that all nodes be contacted, regardless of the number of keywords in the query. However, horizontal indices partitioned by document identifier can insert or update a document at a single node, while vertically partitioned indices require that up to k servers participate to insert or update a document with k keywords. As long as more servers participate in the overlay than there are keywords associated with an average document, these costs favor vertical partitioning. Furthermore, in file systems, most files change rarely, and those that change often change in bursts and may be removed shortly after creation, allowing us to optimize updates by propagating changes lazily. In archival storage systems, files change rarely if at all. Thus, we believe that queries will outnumber updates for our proposed uses, further increasing the cost advantage for vertically partitioned systems.

Vertically partitioned indices send queries to a constant number of hosts, while horizontally partitioned indices must broadcast queries to all nodes. Thus, the throughput of a vertically partitioned index theoretically grows linearly as more nodes are added. Query throughput in a horizontally partitioned index does not benefit at all from additional nodes. Thus, we chose vertical partitioning for our search engine.
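As a toy illustration of the two schemes (ours, not the paper's; the index contents and the three-node split are invented, loosely following Figure 3):

```python
# Toy inverted index: keyword -> match-list of document IDs.
index = {
    "K1": [1, 2, 4], "K2": [2, 3, 5], "K3": [4, 6, 7],
    "K4": [5, 8, 9], "K5": [1, 7, 9],
}

# Horizontal partitioning: each node holds part of every match-list,
# split here by document-identifier ranges 1-3, 4-6, and 7-9.
horizontal = [
    {k: [d for d in docs if lo <= d <= hi] for k, docs in index.items()}
    for lo, hi in [(1, 3), (4, 6), (7, 9)]
]

# Vertical partitioning: each keyword, undivided, lives on one node.
vertical = [{} for _ in range(3)]
for k, docs in index.items():
    vertical[hash(k) % 3][k] = docs

# A k-keyword query touches at most k nodes under vertical partitioning
# but every node under horizontal partitioning.
print(horizontal, vertical, sep="\n")
```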

2.2 Centralized or Distributed Organization

Google has had great success providing centralized search services for the Web. However, we believe that for peer-to-peer file systems and archival storage networks, a distributed search service is better than a centralized one. First, centralized systems provide a single point of failure. Failures may be network outages; denial-of-service attacks, as plagued several Web sites in February of 2000; or censorship by domestic or foreign authorities. In all such cases, a replicated distributed system may be more robust. Second, many uses of peer-to-peer distributed systems depend on users voluntarily contributing computing resources. A centralized search engine would concentrate both load and trust on a small number of hosts, which is impractical if those hosts are voluntarily contributed by end users.

Both centralized and distributed search systems benefit from replication. Replication improves availability and throughput in exchange for additional hardware and update costs. A distributed search engine benefits more from replication, however, because replicas are less susceptible to correlated failures such as attacks or network outages. Distributed replicas may also allow nodes closer to each other or to the client to respond to queries, reducing latency and network traffic.

2.3 Ranking of Results

One important feature of search engines is the order in which results are presented to the user. Many documents may match a given set of keywords, but some may be more useful to the end user than others. Google's PageRank algorithm [15] has successfully exploited the hyperlinked nature of the Web to give high scores to pages linked to by other pages with high scores. Several search engines have successfully used words' proximity to each other or to the beginning of the page to rank results. Peer-to-peer systems lack the linking structure necessary for PageRank but may be able to take advantage of word position or proximity heuristics. We will discuss specific interactions between ranking techniques and our design in Section 3.5 after we have presented the design.

2.4 Update Discovery

A search engine must discover new, removed, or modified documents. Web search engines have traditionally relied on enumerating the entire Web using crawlers, which results in either lag or inefficiency if the frequency of crawling differs from the frequency of updates for a given page. Popular file-sharing systems use a "push" model for updates instead: clients that have new or modified content notify servers directly. Even with pushed updates, the process of determining keywords and reporting them to servers should occur automatically to ensure uniformity. The Web could support either crawled or pushed updates; crawled updates are currently the norm. Peer-to-peer services may lack hyperlinks or any other mechanism for enumeration, leaving them dependent on pushed updates. We believe that pushed updates are superior because they promote both efficiency and currency of index information.

2.5 Placement

All storage systems need techniques for placing and finding content. Distributed search systems additionally need techniques for placing index partitions. We use a DHT to map keywords to nodes for the index, and we claim that the placement of content is an orthogonal problem. There is little or no benefit to placing documents and their keywords in the same place. First, very few documents indicated as results for a search are later retrieved; thus, most locality would be wasted. Second, there is no overlap between an index entry and the document it indicates; both still must be retrieved and sent over the network.

A search engine is a layer of indirection; it is expected that documents and their keywords may appear in unrelated locations.

3 Efficient Support for Peer-to-Peer Search

In the previous section, we discussed the architecture and potential benefits of a fully distributed peer-to-peer search infrastructure. The primary contribution of this work is to demonstrate the feasibility of this approach with respect to individual end user requests. Conducting a search for a single keyword consists of looking up the keyword's mapping in the index to reveal all of the documents containing that keyword. This involves contacting a single remote server, an operation with network costs comparable to accessing a traditional search service. A boolean "AND" search consists of looking up the sets for each keyword and returning the intersection. As with traditional search engines, we return a small subset of the matching documents. This operation requires contacting multiple peers across the wide area, and the requisite intersection operation across the sets returned by each peer can become prohibitively expensive, both in terms of consumed network bandwidth and the latency incurred from transmitting this data across the wide area.

Consider the example in Figure 4(a), which shows a simple network with servers sA and sB. Server sA contains the set of documents A for a given keyword kA, and server sB contains the set of documents B for another keyword kB. |A| and |B| are the numbers of documents containing kA and kB, respectively. A ∩ B is the set of all documents containing both kA and kB.

The primary challenge in performing efficient keyword searches in a distributed inverted index is limiting the amount of bandwidth used for multiple-keyword searches. The naive approach, shown in Figure 4(a), consists of the first server, sA, sending its entire set of matching document IDs, A, to the second server, sB, so that sB can calculate A ∩ B and send the results to the client. This is wasteful because the intersection, A ∩ B, is likely to be far smaller than A, resulting in most of the information in A getting discarded at sB. Furthermore, the size of A (i.e., the number of occurrences of the keyword kA) scales roughly with the number of documents in the system. Thus, the cost of naive search operations grows linearly with the number of documents in the system. We propose three techniques to limit wasted bandwidth, to ensure scalability, and to reduce end-client latency: Bloom filters, caches, and incremental results. We discuss each of these approaches in turn and present analytical results showing the potential benefits of each technique under a variety of conditions before exploring these tradeoffs in more detail through simulation in Section 5.

[Fig. 4. Network architecture and protocol overview. (a) A simple approach to "AND" queries: each server stores a list of document IDs corresponding to one keyword. (b) Bloom filters help reduce the bandwidth requirement of "AND" queries; the gray box represents the Bloom filter F(A) of the set A. Note the false positive in the set B ∩ F(A) that server sB sends back to server sA.]
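The naive protocol of Figure 4(a) is simple to state in code. This sketch is ours, with invented names, and it models document IDs as small integers rather than 128-bit hashes.

```python
class KeywordServer:
    """One index node holding the match-list for a single keyword."""
    def __init__(self, docs: set[int]):
        self.docs = docs

def naive_and_query(s_a: KeywordServer, s_b: KeywordServer) -> set[int]:
    a = s_a.docs            # the wasteful step: all |A| document IDs
    return a & s_b.docs     # cross the network, most to be discarded at sB

s_a = KeywordServer({1, 2, 3, 4})
s_b = KeywordServer({3, 4, 5, 6})
assert naive_and_query(s_a, s_b) == {3, 4}
```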

3.1 Bloom filters

A Bloom filter [2,7,13] is a hash-based data structure that summarizes membership in a set. By sending a Bloom filter based on A instead of sending A itself, we reduce the amount of communication required for sB to determine A ∩ B. The membership test returns false positives with a tunable, predictable probability and never returns false negatives. Thus, the intersection calculated by sB will contain all of the true intersection, as well as a few hits that contain only kB and not kA. The number of false positives falls exponentially as the size of the Bloom filter increases. Given optimal choice of hash functions, the probability of a false positive is

    p_{fp} = 0.6185^{m/n}    (1)

where m is the number of bits in the Bloom filter and n is the number of elements in the set [7]. Thus, to maintain a fixed probability of false positives, the size of the Bloom filter must be proportional to the number of elements represented.

Our method for using Bloom filters to determine remote set intersections is shown in Figure 4(b) and proceeds as follows. A and B are the document sets to intersect, each containing a large number of document IDs for the keywords kA and kB, respectively. The client wishes to retrieve the intersection A ∩ B. Server sA sends a Bloom filter F(A) of set A to server sB. Server sB tests each member of set B for membership in F(A). Server sB sends the matching elements, B ∩ F(A), back to server sA, along with some textual context for each match. Server sA removes the false positives from sB's results by calculating A ∩ (B ∩ F(A)), which is equivalent to A ∩ B.

False positives in B ∩ F(A) do not affect the correctness of the final intersection but do waste bandwidth. They are eliminated in the final step, when sA intersects B ∩ F(A) against A. It is also possible to send B ∩ F(A) directly from sB to the client rather than first sending it to sA and removing the false positives. Doing so eliminates the smaller transfer and its associated latency at the expense of correctness. Given reasonable values for |A|, |B|, the size of each document record, and the cache hit rate (see Section 3.2), the false-positive rate may be as high as 0.05 or as low as 0.00003. This means that B ∩ F(A) will have from 0.00003|B| to 0.05|B| extra elements that do not contain kA. For example, if 5% of the elements of B actually contain kA, then returning the rough intersection B ∩ F(A) to the client results in between 0.00003|B| / ((0.05 + 0.00003)|B|) = 0.06% and 0.05|B| / ((0.05 + 0.05)|B|) = 50% of the results being incorrect and not actually containing kA, where each expression represents the ratio of the number of false positives to the total number of elements in B ∩ F(A). The decision to use this optimization is made at runtime, when the parameters are known and p_{fp} can be predicted. Server sA may choose an m value slightly larger than optimal to reduce p_{fp} and improve the likelihood that sB can return B ∩ F(A) directly to the client.
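A minimal Bloom filter and the exchange of Figure 4(b) might look as follows. This is our illustration, not the paper's implementation: the hash construction (SHA-256 with an index prefix) is an assumption, and using k = (m/n) ln 2 hash functions is the standard choice under which Equation 1's 0.6185^{m/n} rate holds.

```python
import hashlib
import math

class BloomFilter:
    def __init__(self, m: int, n_expected: int):
        self.m = m
        # Optimal number of hash functions for n expected elements.
        self.k = max(1, round(m / n_expected * math.log(2)))
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: int):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: int) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: int) -> bool:
        return all(self.bits[p // 8] >> (p % 8) & 1 for p in self._positions(item))

def bloom_and_query(a: set[int], b: set[int], m: int) -> set[int]:
    """Figure 4(b): sA sends F(A); sB returns B ∩ F(A), which may hold
    false positives; sA removes them by intersecting against A."""
    f_a = BloomFilter(m, len(a))
    for doc in a:
        f_a.add(doc)
    b_candidates = {doc for doc in b if doc in f_a}  # B ∩ F(A)
    return a & b_candidates                          # exact A ∩ B

print(bloom_and_query({1, 2, 3, 4}, {3, 4, 5, 6}, m=64))  # {3, 4}
```

Only the filter (m bits) and the candidate set cross the network; the final local intersection at sA is what makes the result exact despite false positives.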

The total number of bits sent during the exchange shown in Figure 4(b) is m + p_{fp} |B| j + |A ∩ B| j, where j is the number of bits in each document identifier. For this paper, we assume that document identifiers are 128-bit hashes of document contents; thus, j is 128. The final term, |A ∩ B| j, is the size of the intersection itself. It can be ignored in our optimization, because it represents the resulting intersection, which must be sent regardless of our choice of algorithm. The resulting total number of excess bits sent (i.e., excluding the intersection itself) is m + p_{fp} |B| j. Substituting for p_{fp} from Equation 1 yields the total number of excess bits as

    m + 0.6185^{m/|A|} |B| j    (2)

Taking the first derivative with respect to m and solving for zero yields an optimal Bloom filter size of

    m = |A| \log_{0.6185}(2.081 |A| / (|B| j))    (3)

Figure 5(a) shows the minimum number of excess bits sent for three sets of values for |A|, |B|, and j. The optimal m for any given |A|, |B|, and j is unique and directly determines the minimum number of excess bits sent. For example, when |A| and |B| are 10,000 and j is 128, m is 85,734, and the minimum number of excess bits sent is 106,544, representing 12.01:1 compression when compared to the cost of sending all 1,280,000 bits (10,000 documents, each with a 128-bit ID) of either A or B.

As also shown in Figure 5(a), performance is not symmetric when A and B differ in size. With j constant at 128, the minimum number of excess bits for |A| = 2,000 and |B| = 10,000 is 28,008, lower than the minimum number for |A| = 10,000 and |B| = 2,000, which is 73,046. 28,008 bits represents 9.14:1 compression when compared with the 256,000 bits needed to send all of A. The server with the smaller set should always initiate the transfer.

[Fig. 5. Effects of Bloom filter size and cache hit rate. (a) Expected excess traffic sent as a function of the filter size m, for several combinations of |A|, |B|, and j. (b) Improving cache hit rates reduces the amount of data sent and increases the size of the optimal Bloom filter.]

Our Bloom filter intersection technique can be expanded to arbitrary numbers of keywords. Server sA sends F(A) to server sB, which sends F(B ∩ F(A)) to sC, and so on. The final server, sZ, sends its intersection back to sA. Each server that encoded its transmission using a Bloom filter must process the intersection once more to remove any false positives introduced by its filter. Thus, the intersection is sent to each server except sZ a second time. As above, the expected number of excess bits is minimized when |A| ≤ |B| ≤ |C| ≤ ... ≤ |Z|.
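Equations 2 and 3 are easy to check numerically; this sketch (ours) reproduces the example figures above for |A| = |B| = 10,000 and j = 128, up to rounding.

```python
import math

C = 0.6185  # base of the false-positive probability in Equation 1

def excess_bits(m: float, a: int, b: int, j: int = 128) -> float:
    """Equation 2: filter size plus expected false-positive traffic."""
    return m + C ** (m / a) * b * j

def optimal_m(a: int, b: int, j: int = 128) -> float:
    """Equation 3: the minimizer of Equation 2."""
    return a * math.log(2.081 * a / (b * j), C)

m_opt = optimal_m(10_000, 10_000)
# Prints roughly 85733 and 106542, matching the paper's 85,734 and
# 106,544 up to rounding of the constants.
print(round(m_opt), round(excess_bits(m_opt, 10_000, 10_000)))
```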

3.2 Caches

Caching can eliminate the need for sA to send A or F(A) if server sB already has A or F(A) stored locally. We derive more benefit from caching Bloom filters than from caching entire document match-lists because the smaller size of the Bloom representation means that a cache of fixed size can store data for more keywords. The benefit of caching depends on the presence of locality in the list of words searched for by a user population at any given time. To quantify this intuition, we use the same ten-day IRCache trace described in Section 1 to determine word search popularity. There were a total of 251,768 words searched for across the 99,405 searches, 45,344 of them unique. Keyword popularity roughly followed a Zipf distribution, with the most common keyword searched for 4,365 times. The dominance of popular keywords suggests that even a small cache of either the Bloom filter or the actual document list on A is likely to produce high hit rates.

When server sB already has the Bloom filter F(A) in its cache, a search operation for the keywords kA and kB may skip the first step, in which server sA sends its Bloom filter to sB. On average, a Bloom filter will be in another server's cache with probability r equal to the cache hit rate. The excess bits formula in Equation 2 can be adapted to consider the cache hit rate, r, as follows:

    (1 - r) m + 0.6185^{m/|A|} |B| j    (4)

Setting the derivative of this with respect to m to zero yields the optimal m as

    m = |A| \log_{0.6185}(2.081 (1 - r) |A| / (|B| j))    (5)

Figure 5(b) shows the effect of cache hit rates on the excess bits curves, assuming |A| and |B| are both 10,000 and j is 128. Each curve still has a unique minimum. For example, when the hit rate, r, is 0.5, the minimum excess number of bits sent is 60,486, representing 21.16:1 compression when compared with sending A or B. Improvements in the cache hit rate always reduce the minimum expected number of excess bits and increase the optimal m. The reduction in the expected number of excess bits sent is nearly linear with improvements in the hit rate. The optimal m increases because as we become less likely to send the Bloom filter, we can increase its size slightly to reduce the false-positive rate. Even with these increases in m, we can store hundreds of cache entries per megabyte of available local storage. We expect such caching to yield high hit rates given even moderate locality in the request stream.

Cache consistency is handled with a simple time-to-live field. Updates only occur at a keyword's primary location, and slightly stale match-list information is acceptable, especially given the current state of Internet search services, where some degree of staleness is unavoidable. Thus, more complex consistency protocols should not be necessary.
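The same numerical check extends to Equations 4 and 5 (again our sketch), showing how a higher hit rate shifts the optimum:

```python
import math

C = 0.6185

def excess_bits_cached(m: float, a: int, b: int, r: float, j: int = 128) -> float:
    """Equation 4: the filter is only sent on a cache miss (probability 1 - r)."""
    return (1 - r) * m + C ** (m / a) * b * j

def optimal_m_cached(a: int, b: int, r: float, j: int = 128) -> float:
    """Equation 5: rarer sends justify a larger, more accurate filter."""
    return a * math.log(2.081 * (1 - r) * a / (b * j), C)

m_opt = optimal_m_cached(10_000, 10_000, r=0.5)
# Prints roughly 100161 and 60486: a 50% hit rate cuts the minimum
# excess traffic from ~106,544 bits to the paper's 60,486.
print(round(m_opt), round(excess_bits_cached(m_opt, 10_000, 10_000, r=0.5)))
```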

3.3 Incremental results

Clients rarely need all of the results of a keyword search. By using streaming transfers and returning only the desired number of results, we can greatly reduce the amount of information that needs to be sent. This is, in fact, critical for scalability: the number of results for any given query is roughly proportional to the number of documents in the network. Thus, the bandwidth cost of returning all results to the client will grow linearly with the size of the network. Bloom filters and caches can yield a substantial constant-factor improvement, but neither technique eliminates the linear growth in cost. Truncating the results is the only way to achieve constant cost independent of the number of documents in the network.

When a client searches for a fixed number of results, servers sA and sB communicate incrementally until that number is reached. Server sA sends its Bloom filter in chunks, and server sB sends a block of results (true intersections and false positives) for each chunk until server sA has enough results to return to the client. Because a single Bloom filter cannot be divided and still retain any meaning, we divide the set A into chunks and send a full Bloom filter of each chunk. The chunk size can be set adaptively based on how many elements of A are likely to be needed to produce the desired number of results. This protocol is shown in Figure 6. Note that sA and sB overlap their communication: sA sends F(A2) as sB sends B ∩ F(A1). This protocol can be extended logically to more than two participants. Chunks are streamed in parallel from server sA to sB, from sB to sC, and so on. The protocol is an incremental version of the multi-server protocol described at the end of Section 3.1.

[Fig. 6. Servers sA and sB send their data one chunk at a time until the desired intersection size is reached.]

When the system streams data in chunks, caches can store several fractional Bloom filters for each keyword rather than storing the entire Bloom filter for each keyword. This allows servers to retain or discard partial entries in the cache. A server may get a partial cache hit for a given keyword if it needs several chunks but already has some of them stored locally. Storing only a fraction of each keyword's Bloom filter also reduces the amount of space in the cache that each keyword consumes, which increases the expected hit rate.

Sending Bloom filters incrementally substantially increases the CPU costs involved in processing a search. The cost for server sB to calculate each intersection B ∩ F(Ai) is the same as the cost to calculate the entire intersection B ∩ F(A) at once, because each element of B must be tested against each chunk. This added cost can be avoided by sending contiguous portions of the hash space in each chunk and indicating to sB which fraction of B (described as a portion of the hash space) it needs to test against F(A).
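The hash-space chunking described in the last paragraph can be sketched as follows. This is our illustration: plain sets stand in for the per-chunk Bloom filters, and the ID width, chunk count, and example sets are invented.

```python
def hash_space_chunks(a: set[int], num_chunks: int, id_bits: int = 10):
    """Split A into contiguous slices of the document-ID space, so the
    receiver only tests the matching slice of B against each chunk."""
    width = (1 << id_bits) // num_chunks
    for i in range(num_chunks):
        lo, hi = i * width, (i + 1) * width
        yield lo, hi, {d for d in a if lo <= d < hi}

def incremental_and(a: set[int], b: set[int], desired: int, num_chunks: int = 4):
    """Stream chunks until `desired` results are found (or A is exhausted).
    A real implementation would send F(A_i), not A_i itself."""
    results = set()
    for lo, hi, a_chunk in hash_space_chunks(a, num_chunks):
        b_slice = {d for d in b if lo <= d < hi}  # the only part sB must test
        results |= b_slice & a_chunk
        if len(results) >= desired:
            break  # enough results; later chunks are never sent
    return results

print(incremental_and(set(range(0, 1000, 3)), set(range(0, 1000, 5)), desired=10))
```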

3.4 Virtual hosts

One key concern in a peer-to-peer system is the inherent heterogeneity of such systems. Randomly distributing functionality (e.g., keywords) across the system runs the risk of assigning a popular keyword to a relatively under-provisioned machine in terms of memory, CPU, or network capacity. Further, no hash function will uniformly distribute functionality across a hash range. Thus, individual machines may be assigned disproportionate numbers of keywords (recall that each keyword is assigned to the host whose ID is closest to its own in the hash range). Virtual hosts [6] are one technique to address this potential limitation. Using this approach, a node participates in a peer-to-peer system as several logical hosts, proportional to its request processing capacity. A node that participates as several virtual hosts is assigned proportionally more load, addressing heterogeneous node capabilities. Thus, a node with ten times the capacity of some baseline measure would be assigned ten virtual IDs (which means that it is mapped to ten different IDs in the hash range). An optional system-wide scaling factor for each node's number of virtual hosts further reduces the probability that any single node is assigned a disproportionately large portion of the hash range. This effect is quantified in Section 5, but consider the following example: with 100 hosts of equal power, it is likely that one or more hosts will be assigned significantly more than 1% of the hash range. However, with a scaling factor of 100, it is much less likely that any host will be assigned much more than 1% of the range, because an "unlucky" hash (a large portion of the hash region) for one virtual host is likely to be canceled out by a "lucky" hash (a small portion of the hash region) for another virtual host on the same physical node.
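A sketch of virtual-ID assignment (ours; deriving IDs from a node name plus an index is an invented convention):

```python
import hashlib

def virtual_ids(node_name: str, capacity_factor: int, scaling: int = 1) -> list[int]:
    """Map one physical node to capacity_factor * scaling IDs in a
    128-bit hash range; more capable nodes get more IDs, and scaling
    smooths out "lucky" and "unlucky" hash placements."""
    count = capacity_factor * scaling
    return [
        int.from_bytes(hashlib.md5(f"{node_name}#{i}".encode()).digest(), "big")
        for i in range(count)
    ]

# A node with ten times the baseline capacity, with scaling factor 10:
ids = virtual_ids("node-42", capacity_factor=10, scaling=10)
print(len(ids))  # 100 positions in the hash range
```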

3.5 Discussion

Two of the techniques described here, Bloom filters and caching, yield constant-factor improvements in terms of the number of bytes sent and the end-to-end query latency. Bloom filters compress document ID sets by about one order of magnitude, in exchange for either added latency or a configurable probability of false positives. Caching exploits re-referencing and sharing in the query workload to reduce the probability that document ID sets need to be sent. However, even together, these techniques leave both bytes sent and end-to-end query time roughly proportional to the number of documents in the system. The third technique, incremental results, reduces the number of bytes sent and the end-to-end query latency to a constant in most cases. As long as the user wants only a constant number of results, only a constant amount of work will be done, regardless of how many possible results exist in the system.

Incremental results yield no improvement in some unusual cases, however. If the user searches for several keywords that are individually popular but mostly uncorrelated in the document space, there may be a small but nonzero number of valid results.[1] If the number of results is nonzero but smaller than the number that the client requests, the system must consider the entire search space, rendering incremental results useless; in such cases incremental results will increase, rather than decrease, the number of bytes sent and the end-to-end query latency. However, caching may alleviate the problem if the words used are popular in search queries, and Bloom filters still yield approximately a ten-to-one compression factor.

We expect that searches containing popular but uncorrelated keywords will be rare. In our IRCache search trace, most of the queries with small numbers of results had uncommon (often misspelled) keywords. Uncommon keywords, i.e., those with few matching documents, are easy to handle, as discussed in Section 3.1. The system considers the least common keyword first, bounding the maximum size of any intersection set sent for the remainder of the query.

[1] One example of a difficult search is "OpenBSD birthday pony," suggested by David Mazieres at New York University. In recent Google searches, these three keywords match two million, eight million, and two million documents, respectively. Only fifteen documents contain all three.
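The least-common-keyword-first rule is a one-line sort; the match-list sizes below are sample data taken from the approximate counts quoted in the footnote.

```python
def query_order(keywords, matchlist_size):
    """Visit servers in increasing match-list size, so the first (and
    smallest) set bounds every intersection sent afterwards."""
    return sorted(keywords, key=matchlist_size)

sizes = {"openbsd": 2_000_000, "birthday": 8_000_000, "pony": 2_000_000}
print(query_order(sizes, sizes.get))  # ['openbsd', 'pony', 'birthday']
```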

3.6 Ranking of Results

Two of our optimization techniques, Bloom filters and incremental results, complicate the problem of ranking results. Bloom filters roughly convey membership in a set, but they do not provide the ability to order set members or to convey additional data with each member, such as a word's position in a document. The uncompressed response message containing B ∩ F(A) can contain document-ranking or word-position information, which would give server sA enough information to generate rankings based on both keywords, kA and kB. However, in Section 3.1, we suggested eliminating this uncompressed response message. Doing so eliminates the ability to consider kA in any ranking techniques.

Incremental results can alleviate the problems with Bloom filters. If each chunk sent contains document IDs with strictly lower rankings than in previous chunks, then the first results returned to the client will be the best, though order within a chunk will not be preserved. However, in Section 3.3 we suggested sending contiguous portions of the hash space in each chunk to save processing time on server sB. These two techniques are mutually exclusive. We believe that ranking documents is more important than eliminating one additional message or saving processing time. However, this trade-off can be determined at runtime according to user preference.

3.7 Load balancing

A vertically partitioned index distributes keywords randomly, resulting in a binomial (roughly normal) distribution of the number of keywords on each node. However, keyword appearance popularity (i.e., the size of the keyword's match-list) and search popularity are both roughly Zipf-distributed. Keyword appearance popularity determines the storage required, and keyword search popularity determines processing loads. Both contribute to network loads. The resulting storage, processing, and network loads are less evenly distributed than with a horizontally partitioned index. Virtual hosts alleviate the problem by assigning larger loads to more capable nodes, but they do not make load any more balanced. Increasing the size of the network and the number of documents results in somewhat more balanced load. As long as the network is over-provisioned, which many peer-to-peer networks are, we believe that load balancing will not be a problem.

4 Simulation Infrastructure

The simple analysis described above in Section 3 provides some insight into the potential benefits of our three approaches toward efficiently supporting peer-to-peer search. However, the actual benefits and tradeoffs depend heavily upon target system characteristics and access patterns. To test the validity of our approach under a range of realistic circumstances, we developed a simulation infrastructure implementing our three techniques. In this section, we discuss the details of this simulation infrastructure before presenting the results of our evaluation in Section 5.

4.1 Goals

Our goal in writing the simulator was to test the system with a realistic workload and to test the effects of parameters and features that did not lend themselves to tractable analysis. In particular, we tested the effects of the number of hosts in the network, the use of virtual hosts, the Bloom filter threshold, Bloom filter sizes, caching techniques, and the use of incremental results. We also tested the system's sensitivity to varying network characteristics.

The Bloom filter threshold refers to the document set size below which a host transmits a full list rather than a Bloom-compressed set. For small document sets, the total bandwidth consumed for transmission to a remote host (for set intersection) may be so small that it may not be worth the CPU time required to compress the set. Eliminating the Bloom step further eliminates the need to return to the transmitting host to eliminate false positives from the intersection. Typically, we find that the extra CPU overhead and network overhead of returning the result is worth the substantial saving in network bandwidth realized by using Bloom filters. In Section 5, we quantify this effect for a variety of Bloom thresholds.
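The threshold decision itself reduces to a simple predicate. The sketch is ours; the value 300 is the optimum reported for this trace in Section 5.2.

```python
BLOOM_THRESHOLD = 300  # optimal for the paper's trace (Section 5.2)

def choose_encoding(set_size: int, threshold: int = BLOOM_THRESHOLD) -> str:
    """Below the threshold, sending the full list is cheaper than a Bloom
    filter plus the follow-up pass that removes false positives."""
    return "full-list" if set_size <= threshold else "bloom-filter"

assert choose_encoding(200) == "full-list"
assert choose_encoding(5_000) == "bloom-filter"
```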

Bloom filter sizes affect the number of false positives transmitted during the search process. If the client is willing to accept some probability of false positives (a returned document containing only a subset of the requested keywords), sufficiently large Bloom filters can meet the client's accepted false-positive rate and eliminate the need to revisit nodes to remove false positives, as described in Section 3.1. That is, small Bloom filters result in significant compression of a keyword-set size at the cost of either generating more false positives in the result returned to the client or requiring the transmission of the intersection back to the originating host for false-positive elimination.

4.2 Design

The simulator runs as a single-threaded Java application. We implement the inverted index, word-to-host mapping, and host measurement (in this case, random generation) in separate classes so that much of the simulator could be reused in a full implementation of our protocol.

Our simulations use a real document set and search trace. The document set totals 1.85 GB of HTML data, comprising 1.17 million unique words in 105,593 documents, retrieved by crawling to a recursion depth of five from 100 seed URLs [4]. The searches performed are read from a list of 95,409 searches containing 45,344 unique keywords. The search trace is the IRCache log file described in Section 1. Note that the results presented in this paper are restricted to these particular traces. However, we do not expect the benefits of our techniques to differ significantly for other workloads.

Hosts in the network are generated at random based on configurable distributions for upload speed, download speed, CPU speed, and local storage capacity. We use three distributions for network speeds: one with all modems, one with all backbone links, and one based on the measurements of the Gnutella network performed by Saroiu et al. [18]. This last heterogeneous set contains a mixture of modems, broadband connections (cable/DSL), and high-speed LAN connections. Our CPU speed distribution is roughly a bell curve, with a mean of 750 MIPS, and our local storage distribution is a heavy-tailed piecewise function ranging from 1 MB to 100 MB. We experimented with a broad range of host characteristics and present the results for this representative subset in this paper.

To generate random latencies, we place hosts at random in a 2,500-mile square grid and assume that network packets travel an average of 100,000 miles per second. The time required to send a network message is the propagation time, as determined by the distance between the hosts involved, plus the transmission time, as determined by the minimum of the sender's upload speed and the recipient's download speed, and the size of the packet. The total network time for a search is the sum of the latency and transmission time for all packets sent among server nodes processing the query. We ignore the time spent by the client sending the initial query and receiving the results because these times are constant and independent of any search architecture, whether centralized or distributed.
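This cost model reduces to a few lines. The sketch is ours; the example distance, message size, and link speeds are arbitrary.

```python
def message_time_ms(miles: float, size_bits: int,
                    sender_upload_bps: float, recipient_download_bps: float) -> float:
    """Propagation at 100,000 miles/s plus transmission at the slower
    of the sender's upload and the recipient's download speeds."""
    propagation = miles / 100_000.0
    transmission = size_bits / min(sender_upload_bps, recipient_download_bps)
    return (propagation + transmission) * 1000.0

# e.g. an 8 KB message over 1,000 miles from a 56 Kbit/s modem: ~1180 ms
print(round(message_time_ms(1_000, 8 * 8192, 56_000, 1_500_000)))
```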

Document IDs are assumed to be 128 bits. The time required to look up words in a local index or to perform intersections or Bloom filter operations is based on the CPU speed and the following assumptions for operation costs: 1,500 simple operations per hit to look up words in an index, 500 simple operations per element to intersect two result sets, and 10,000 simple operations per document ID inserted into a Bloom filter or checked against a Bloom filter received from another host. We believe that in general, these assumptions place an upper bound on the CPU cost of these operations. Even with these assumptions, we find that network time typically dominates CPU time for our target scenarios.

We determine the number of virtual hosts to assign each simulated node based on its network and CPU speeds when compared to a baseline host. The baseline host has a 57.5 MIPS CPU and 30 Kbit/s network links. These speeds were chosen as those required to compute and transmit 5,000 Bloom operations per second. Each node is compared to the baseline host in three categories: upload speed, download speed, and CPU speed. The node's minimum margin over the baseline host in these three categories is rounded down and taken to be its number of virtual hosts.

To perform each query, the simulator looks up each keyword in the inverted index, obtaining up to M results for each, where M is the incremental result size. Each host intersects its set with the data from the previous host and forwards it to the subsequent host, as described in Section 3.1. Each node forwards its current intersected set as either a Bloom filter or a full set, depending on whether or not the set is larger than the Bloom threshold. After each peer performs its part of the intersection, any node that sent a Bloom filter in the first pass is potentially revisited to remove false positives. If the number of resulting documents is at least as large as the desired number, the search is over. Otherwise, M is increased adaptively to twice what appears to be needed to produce the desired number of results, and the search is rerun. At each step, a host checks to see if it has data for the subsequent host's document list in its local cache. If so, it performs the subsequent host's portion of the intersection locally and skips that host in the sending sequence.
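The simulator's outer query loop, with its adaptive doubling of M, might look like the following reconstruction of the description above. The callables `lookup` and `intersect_chain` are hypothetical stand-ins for the simulator's classes, and the growth estimate paraphrases "twice what appears to be needed".

```python
def run_query(keywords, lookup, intersect_chain, desired: int, m: int = 64):
    """Fetch up to M results per keyword, intersect along the server
    chain, and rerun with a larger M until enough results survive."""
    while True:
        partials = [lookup(k, m) for k in keywords]    # up to M IDs each
        results = intersect_chain(partials)
        exhausted = all(len(p) < m for p in partials)  # no more IDs to fetch
        if len(results) >= desired or exhausted:
            return results
        # Double what appears necessary to yield `desired` results.
        survival = max(len(results), 1) / m
        m = int(2 * desired / survival)
```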

4.3 Validation

We validated our simulator in two ways. First, we calculated the behavior and performance of short, artificial traces by hand and confirmed that the simulator returns the same results. Second, we varied the Bloom filter size, m, in the simulator and compared the results to the analytical results presented in Section 3.1. The analytical results shown in Figure 5(b) closely resemble the simulated results shown in Figure 9(a).

5 Experimental Results

The goal of this section is to understand the performance effects of our proposed techniques on a peer-to-peer search infrastructure. Ideally, we wish to demonstrate that our proposed peer-to-peer search system scales with system size (total resource consumption per search grows sub-linearly with the number of participating hosts) and that techniques such as Bloom filters and caching improve the performance of individual requests. Primarily, we focus on the metric of bytes sent per request. Techniques such as caching and the use of Bloom filters largely serve to reduce this metric. Reducing bytes per request has the added benefit of reducing total time spent in the network and hence end-to-end client-perceived latency. We also study the effects of the distribution of network and CPU characteristics on overall system performance. One challenge with peer-to-peer systems is addressing the subset of hosts that have significantly less computation power and network bandwidth than is required to support a high-performance search infrastructure. Finally, although we implemented incremental results, we do not present results for this technique here because our target document set is not large enough to return large numbers of hits for most queries. For our workload, this optimization reduces network utilization by at most 30% in the best case. However, we believe this technique will be increasingly valuable as the document space increases in size.

[Fig. 7. Network scaling and virtual hosts. (a) The number of bytes sent increases very little beyond networks of 100 hosts; enabling virtual hosts reduces the number of bytes sent by about 18%, and scaling the number of virtual hosts reduces it by an additional 18%. (b) Virtual hosts cut the amount of time spent transmitting by up to 60%; scaling the number of virtual hosts yields a small additional improvement.]

5.1 Scalability and Virtual Hosts

A key goal of our work is to demonstrate that a peer-to-peer search infrastructure scales with the number of participating hosts. Unless otherwise specified, the results presented in this section all assume the heterogeneous distribution [18] of per-peer network connectivity and the default distribution of CPU power described in Section 4. Caching and Bloom filters are both initially turned off. As shown in Figure 7(a), increasing the number of hosts in the simulation has little effect on the total number of bytes sent. With very small networks, several keywords from a query may be located on a single host, resulting in entirely local handling of parts of the query. However, beyond 100 hosts, this probability becomes insignificant, and each n-keyword query must contact n hosts, independent of the size of the system.

In addition to demonstrating the scalability of the system, Figures 7(a) and 7(b) also quantify the benefits of the use of virtual hosts in the system. Recall that when virtual hosts are turned on, each node is assigned a number of hosts based on its capacity relative to the predefined baseline described in Section 4. The virtual host scaling factor further multiplies this number of hosts by some constant value to ensure that each physical host is assigned a uniform portion of the overall hash range, as discussed in Section 4. Overall, virtual hosts have a small effect on the number of total bytes sent per query. This is because enabling virtual hosts concentrates data mostly on powerful hosts, increasing the probability that parts of a query can be handled entirely locally. Virtual host scaling results in better expected load balancing, which very slightly decreases the amount of data that must be sent on average. Although virtual hosts have little effect on how much data must be sent, they can significantly decrease the amount of time spent sending the data, as shown in Figure 7(b).

By assigning more load to more capable hosts, the virtual hosts technique can cut network times by nearly 60%. Using virtual host scaling further decreases expected network times by reducing the probability that a bottleneck host will be assigned a disproportionate amount of load by mistake. Thus, while total bytes sent decreases only slightly as a result of better load balancing, total network time decreases significantly because more capable hosts (with faster network connections) become responsible for a larger fraction of requests.

5.2 Bloom Filters and Caching

[Fig. 8. Using Bloom filters less often significantly reduces the amount of data sent by eliminating the need to revisit nodes to eliminate false positives.]

Having established the scalability of our general approach, we now turn our attention to the additional benefits available from the use of Bloom filters to reduce network utilization. In particular, we focus on how large the Bloom filter should be and for what minimum data set size it should be invoked. Using Bloom filters for every transfer results in substantial unnecessary data transmissions. Any time a Bloom filter is used, the host using it must later revisit the same query to eliminate any false positives. Thus, Bloom filters should only be used when the time saved will outweigh the time spent sending the clean-up message. Figure 8 shows the total bytes transmitted per query as a function of the Bloom filter threshold, assuming the default value of 6 bits per Bloom entry. We find that the optimal Bloom filter threshold for our trace was approximately 300. Any set below this size should be sent in its entirety, as the savings from using Bloom filters do not outweigh the network (not to mention latency) overhead of revisiting the host to eliminate false positives.

[Fig. 9. Network costs as a function of Bloom filter size: (a) bytes per query and (b) latency plus transmission time, with caching on and off, for allowable false-positive rates of 0%, 1%, and 10%.]

Next, we consider the effects on total network traffic of varying the number of bits per entry in the Bloom filter and of caching. Figure 9(a) plots the total number of bytes transmitted as a function of the Bloom filter size. The two sets of curves represent the cases where we enable and disable caching.

Within each set, we set a maximum rate of allowable false positives in the set of documents returned to the user for a particular query, at 0%, 1%, and 10%. When the client allows 1% or 10% false positives, false-positive removal steps may sometimes be eliminated; increasing the Bloom filter size enhances this effect. Figure 9(b) shows that allowing false positives has significantly more effect on total network time than it does on bytes transferred, as it eliminates a number of required message transmissions.

The effects of caching shown in Figure 9(a) are similar to those derived analytically in Figure 5(b). Caching decreases the total amount of data sent and increases the optimal Bloom filter size: in this case, from 18 bits per entry to 24 bits per entry. For optimal Bloom filter sizes of 18 and 24 bits per entry in the no-caching and caching cases respectively, our caching technique introduces more than a 50% reduction in the total number of bytes transmitted per query.

5.3 Putting It All Together

We now present the end-to-end average query times considering all of our optimizations under a variety of assumed network conditions. We break down this end-to-end time into the three principal components that contribute to end-to-end latency: CPU processing time, network transmission time (bytes transferred divided by the slower of the two communicating peers' connection speeds), and latency (determined by the distance between communicating peers). Recall from Section 4 that we do not measure the time associated with either the client request or the final response, as the size of these messages is independent of our optimization techniques.

Figure 10 shows three bar charts that break down total end-to-end search time under the three network conditions described in Section 4: WAN, Heterogeneous, and Modem. For each network setting there are four individual bars, representing the effects of virtual hosts on or off and of caching on or off. Each bar is further broken down into network transmission time, CPU processing time, and network latency. In the case of an all-modem network, end-to-end query time is dominated by network transmission time. The use of virtual hosts has no effect on query times because the network set is homogeneous. Caching does reduce the network transmission portion by roughly 30%. All queries still manage to complete in 1 second or less because, as shown in Figure 9(a), the use of all our optimizations reduces the total bytes transferred per query to less than 1,000 bytes for our target workload; a 56K modem can transfer 6 KB/sec in the best case. However, our results are limited by the fact that our simulator does not model network contention. In general, we expect the per-query average to be worse than our reported results if any individual node's network connection becomes saturated. This limitation is significantly mitigated under different network conditions, as individual nodes are more likely to have additional bandwidth available and the use of virtual hosts will spread the load to avoid underprovisioned hosts.

[Fig. 10. Isolating the effects of caching, virtual hosts, and different network characteristics for the optimal Bloom threshold (300) and Bloom filter sizes (18/24 bits per entry for caching off/on). Each bar breaks total time per query into CPU time, latency, and transmission time.]

In the homogeneous WAN case, network time is negligible in all cases given the very high transmission speeds. The use of caching reduces latency and CPU time by 48% and 30%, respectively, by avoiding the need to calculate and transmit Bloom filters in the case of a cache hit. Enabling virtual hosts reduces the CPU time by concentrating requests on the subset of WAN nodes with more CPU processing power. Recall that although the network is homogeneous in this case, we still have heterogeneity in CPU processing power, as described in Section 4.

Finally, the use of virtual hosts and caching together has the most pronounced effect on the heterogeneous network, together reducing average per-query response times by 59%. In particular, the use of virtual hosts reduces the network transmission portion of average query response times by 48% by concentrating keywords on the subset of nodes with more network bandwidth. Caching uniformly reduces all aspects of the average query time, in particular reducing the latency components by 47% in each case by eliminating the need for a significant portion of network communication.

6 Related Work

Work related to ours can be divided into four categories: the first generation of peer-to-peer systems; the second generation, based on distributed hash tables; Web search engines; and database semijoin reductions. We dealt with DHT-based systems in Section 1. The others, we describe here.

The first generation of peer-to-peer systems consists of Napster [14], Gnutella [8], and Freenet [5,9]. Napster and Gnutella both use searches as their core location determination technique. Napster performs searches centrally on well-known servers that store the metadata, location, and keywords for each document. Gnutella broadcasts search queries to all nodes and allows each node to perform the search in an implementation-specific manner. Yang and Garcia-Molina suggest techniques to reduce the number of nodes contacted in a Gnutella search while preserving the implementation-specific search semantics and a satisfactory number of responses [20].

Freenet provides no search mechanism and depends instead on well-known names and well-known directories of names.

Web search engines such as Google [3] operate in a centralized manner. A farm of servers retrieves all reachable content on the Web and builds an inverted index. Another farm of servers performs lookups in this inverted index. When the inverted index is all in one location, multiple-keyword searches can be performed with entirely local-area communication, and the optimizations presented here are not needed. Distributing the index over a wide area provides greater availability than the centralized approach. Because our system can take advantage of the explicit insert operations in peer-to-peer systems, we also provide more up-to-date results than any crawler-based approach can.

The general problem of remotely intersecting two sets of document IDs is equivalent to the database problem of performing a remote natural join. We are using two ideas from the database literature. Sending only the data necessary for the intersection (i.e., the join) comes from work on semijoin reductions [1]. Using a Bloom filter to summarize the set of document IDs comes from work on Bloom joins [12,13].

7 Conclusions

This paper presents the design and evaluation of a peer-to-peer search infrastructure. In this context we make the following contributions. First, we show that our architecture is scalable; global network state and message traffic grow sub-linearly with increasing network size. Next, relative to a centralized search infrastructure, our approach can maintain high performance and availability in the face of individual failures and performance fluctuations through replication. Finally, through explicit document publishing, our distributed keyword index delivers improved completeness and accuracy relative to traditional spidering techniques.

One important consideration in our architecture is reducing the overhead of multi-keyword conjunctive searches. We describe and evaluate a number of cooperating techniques (Bloom filters, virtual hosts, caching, and incremental results) that, taken together, reduce both consumed network resources and end-to-end perceived client search latency by an order of magnitude for our target workload.

Acknowledgments

We are grateful to Duane Wessels of the IRCache project (supported by NSF grants NCR-9616602 and NCR-9521745) for access to their trace data files. We would also like to thank Lipyeow Lim for access to the 1.85 GB HTML data set we used for our document trace. Finally, Rebecca Braynard, Jun Yang, and Terence Kelly provided helpful comments on drafts of this paper.

References

1. Philip Bernstein and Dah-Ming Chiu. Using semi-joins to solve relational queries. Journal of the Association for Computing Machinery, 28(1):25-40, January 1981.
2. Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422-426, 1970.

3. Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. In 7th International World Wide Web Conference, 1998.
4. Junghoo Cho and Hector Garcia-Molina. The evolution of the web and implications for an incremental crawler. In The VLDB Journal, September 2000.
5. I. Clarke. A distributed decentralised information storage and retrieval system, 1999.
6. Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Wide-area cooperative storage with CFS. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), October 2001.
7. Li Fan, Pei Cao, Jussara Almeida, and Andrei Broder. Summary cache: A scalable wide-area web cache sharing protocol. In Proceedings of ACM SIGCOMM '98, pages 254-265, 1998.
8. Gnutella. http://gnutella.wego.com/.
9. T. Hong. Freenet: A distributed anonymous information storage and retrieval system. In ICSI Workshop on Design Issues in Anonymity and Unobservability, 2000.
10. David R. Karger, Eric Lehman, Frank Thomson Leighton, Rina Panigrahy, Matthew S. Levine, and Daniel Lewin. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web. In ACM Symposium on Theory of Computing, pages 654-663, 1997.
11. David Liben-Nowell, Hari Balakrishnan, and David Karger. Analysis of the evolution of peer-to-peer systems. In Proceedings of the ACM Conference on Principles of Distributed Computing (PODC), 2002.
12. Lothar Mackert and Guy Lohman. R* optimizer validation and performance evaluation for local queries. In ACM-SIGMOD Conference on Management of Data, 1986.
13. James Mullin. Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16(5):558-560, May 1990.
14. Napster. http://www.napster.com/.
15. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University, 1998.
16. Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker. A scalable content-addressable network. In Proceedings of ACM SIGCOMM '01, 2001.
17. Antony Rowstron and Peter Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), 2001.
18. Stefan Saroiu, P. Krishna Gummadi, and Steven D. Gribble. A measurement study of peer-to-peer file sharing systems. In Proceedings of Multimedia Computing and Networking 2002 (MMCN '02), January 2002.
19. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In Proceedings of ACM SIGCOMM '01, 2001.
20. Beverly Yang and Hector Garcia-Molina. Efficient search in peer-to-peer networks. Technical Report 2001-47, Stanford University, October 2001.
