USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation

Document on Subject: "USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation". Transcript:

1

Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks
Hongyi Zeng, Shidong Zhang, Fei Ye, Vimalkumar Jeyakumar
(Hongyi Zeng and Vimalkumar Jeyakumar were interns at Google when this work was done. Hongyi Zeng is currently with Facebook.)

[...] by rare boundary conditions are particularly difficult to find. Common routing failures include routing loops and black-holes (where traffic to one part of the network disappears).

Figure 1: Libra divides the network into multiple forwarding graphs in the mapping phase, and checks graph properties in the reducing phase.

Figure 2: Small network example for describing the types of forwarding error found by Libra. (Switches S11, S12, S21, S22 with per-subnet forwarding rules for 192.168.0/24 and 192.168.1/24.)

Libra uses MapReduce for verification. It starts with the full graph of switches, each with its own prefix table. As depicted in Figure 1, Libra completes verification in two phases. In the map phase, it breaks the graph into a number of slices, one for each prefix. The slice consists of only those forwarding rules used to route packets to the destination. In the reduce phase, Libra independently analyzes each slice, represented as a forwarding graph, in parallel for routing failures.

We evaluate Libra on the forwarding tables from three different networks. First, "DCN" is an emulated data center network with 2 million rules and 10,000 switches. Second, "DCN-G" is made from 100 replicas of DCN connected together; i.e., 1 million switches. Third, "INET" is a network with 300 IPv4 routers, each containing the full BGP table with half a million rules. The results are encouraging. Libra takes one minute to check for loops and black-holes in DCN, 15 minutes for DCN-G, and 1.5 minutes for INET.

Forwarding Errors

A small toy network can illustrate three common types of error found in forwarding tables. In the two-level tree network in Figure 2, two top-of-rack (ToR) switches (S11, S12) are connected to two spine switches (S21, S22).

Figure 3: Forwarding graphs for 192.168.0/24 as in Figure 2 and potential abnormalities: (a) Normal, (b) Loops, (c) Black-hole, (d) Incorrect Snapshot.

The downlinks from S11 and S12 connect to up to 254 servers on the same /24 subnet. The figure shows a "correct" set of forwarding tables. Note that our example network uses multipath routing. Packets arriving at S12 on the right and destined to subnet 192.168.0/24 on the left are load-balanced over switches S21 and S22. Our toy network has 8 rules and 2 subnets.

A forwarding graph is a directed graph that defines the network behavior for each subnet. It contains a list of (local switch, remote switch) pairs. For example, in Figure 3(a), an arrow from S12 to S21 means the packets of subnet 192.168.0/24 can be forwarded from S12 to S21. Multipath routing can be represented by a node that has more than one outgoing edge. Figure 3 illustrates three types of forwarding error in our simple network, depicted in forwarding graphs.

Loops: Figure 3(b) shows how an error in S11's forwarding tables causes a loop. Instead of forwarding 192.168.0/24 down to the servers, S11 forwards packets up, i.e., to S21 and S22. S11's forwarding table is now: [...]. The network has two loops, S21-S11-S21 and S22-S11-S22, and packets addressed to 192.168.0/24 will never reach their destination.
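The forwarding-graph abstraction above can be made concrete with a short sketch (not Libra's actual code; the dict-of-adjacency-lists representation and switch names follow the toy example): a per-subnet graph of (local switch, remote switch) edges, a DFS-based loop check, and a zero-out-degree black-hole check.

```python
# Minimal sketch of per-subnet forwarding graphs and the two basic checks.
from collections import defaultdict

# Correct graph for subnet 192.168.0/24 (Figure 3(a)); S11 delivers directly.
graph_ok = {"S12": ["S21", "S22"], "S21": ["S11"], "S22": ["S11"], "S11": []}

# Faulty tables: S11 forwards 192.168.0/24 back up (Figure 3(b)).
graph_loop = {"S12": ["S21", "S22"], "S21": ["S11"], "S22": ["S11"],
              "S11": ["S21", "S22"]}

def has_loop(graph):
    """Detect a cycle with a DFS using white/grey/black coloring."""
    color = defaultdict(int)          # 0=unvisited, 1=on stack, 2=done
    def dfs(u):
        color[u] = 1
        for v in graph.get(u, []):
            if color[v] == 1 or (color[v] == 0 and dfs(v)):
                return True
        color[u] = 2
        return False
    return any(color[u] == 0 and dfs(u) for u in graph)

def black_holes(graph, last_hop):
    """Switches with no outgoing edge that are not the legitimate last hop."""
    return [u for u, nh in graph.items() if not nh and u != last_hop]

print(has_loop(graph_ok))            # False
print(has_loop(graph_loop))          # True: S21-S11-S21 and S22-S11-S22
print(black_holes(graph_ok, "S11"))  # []
```

With the erroneous table, the loop shows up immediately; a switch that silently drops a subnet would instead appear in the black-hole list.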

2

Black-holes: Figure 3(c) shows what happens if S22 loses one of its forwarding entries: [...]. In this case, if S12 spreads packets destined to 192.168.0/24 over both S21 and S22, packets arriving at S22 will be dropped.

Incorrect Snapshot: Figure 3(d) shows a subtle problem that can lead to false positives when verifying forwarding tables. Suppose the link between S11-S22 goes down. Two events take place (shown as dashed arrows in the figure): 1: S22 deletes [its forwarding entry]; 2: S12 stops forwarding packets to S22. Because of the asynchronous nature of routing updates, the two events could take place in either order. A snapshot may capture one event but not the other, or might detect them happening in the reverse order. The [reversed] sequence creates a temporary black-hole as in Figure 3(c), whereas the desired sequence does not. To avoid raising an unnecessary alarm (by [reporting the black-hole] even though it did not happen), or missing an error altogether (by incorrectly assuming that [the desired sequence] happened), Libra must detect the correct state of the network.

Figure 4: Routing related tickets by month and type.

Real-world Failure Examples: To understand how often forwarding errors take place, we examined a log of "bug tickets" from 14 months of operation in a large Google data center. Figure 4 categorizes the tickets: 35 tickets for missing forwarding entries, 11 for loops, and 11 for black-holes. On average, four issues are reported per month. Today, forwarding errors are tracked down by hand, which, given the size of the network and the number of entries, often takes many hours. And because the diagnosis is done after the error occurred, the sequence of events causing the error has usually long since disappeared before the diagnosis starts. This makes it hard to reproduce the error.

Case 1: Detecting Loops. One type of loop is caused by prefix aggregation. Prefixes are aggregated to compact the forwarding tables: a cluster can advertise a prefix to reach all of the servers connected "below" it to the core, which usually includes the addresses of servers that have not yet been deployed. However, packets destined to these non-deployed addresses (e.g., due to machine maintenance) can get stuck in loops. This is because [the core] believes these packets are destined to [the cluster], while [the cluster] lacks the forwarding rules to digest these packets due to the incomplete deployment; instead, [the cluster]'s default rules lead packets back to [the core]. This failure does not cause a service to fail (because the service will use other servers instead), but it does degrade the network, causing unnecessary congestion. In the past, these errors were ignored because of the prohibitive cost of performing a full cluster check. Libra can finish checking in less than a minute, and identify and report the specific switch and prefix entry that are at risk.

Case 2: Discovering Black-holes. In one incident, traffic was interrupted to hundreds of servers. Initial investigation showed that some prefixes had a high packet loss rate, but packets seemed to be discarded randomly. It took several days to finally uncover the root cause: a subset of routing information was lost during BGP updates between domains, likely due to a bug in the routing software, leading to black-holes. Libra will detect missing forwarding entries quickly, reducing the outage time. Libra's stable snapshots also allow it to disambiguate temporary states during updates from long-term black-holes.

Case 3: Identifying Inconsistencies. Network control runs across several instances, which may fail from time to time. When a secondary becomes the primary, it results in a flurry of changes to the forwarding tables. The changes may temporarily or permanently conflict with the previous forwarding state, particularly if the changeover itself fails before completing. The network can be left in an inconsistent state, leading to packet loss, black-holes and loops.

Lessons Learned

Simple things go wrong: Routing errors occur even in networks using relatively simple IP forwarding.

3

They also occur due to firmware upgrades, controller failure and software bugs. It is essential to check the forwarding state outside the control software.

Multiple moving parts: The network consists of multiple interacting subsystems. For example, in Case 1 above, intra-DC routing is handled locally, but routing is a global property. This can create loops that are hard to detect locally within a subsystem. There are also multiple network controllers. Inconsistent state makes it hard for the control plane to detect failures on its own.

Scale matters: Large data center networks use multipath routing, which means there are many forwarding paths to check. As the number of switches grows, the number of paths and prefix tables grows, and the complexity of checking all routes grows with them. It is essential for a static checker to scale linearly with the network.

Stable Snapshots

It is not easy to take an accurate snapshot of the forwarding state of a large, constantly changing network. But if Libra runs its static checks on a snapshot of the state that never actually occurred, it will raise false alarms and miss real errors. We therefore need to capture, and check, a snapshot of the global forwarding state that actually existed at one instant in time. We call these stable snapshots.

When is the state stable? A large network is usually controlled by multiple routing processes, each responsible for one or more switches. Each process sends timestamped updates, which we call routing events, to add, modify and delete forwarding entries in the switches it is responsible for. Libra monitors the stream of routing events to learn the global network state. Finding the stable state of a switch is easy: each table is only written by one routing process using a single clock, and all events are processed in order. Hence, Libra can reconstruct a stable state simply by replaying events in timestamp order. By contrast, it is not obvious how to take a stable snapshot of the state when different routing processes update their switches using different, unsynchronized clocks. Because the clocks are different, and events may be delayed in the network, simply replaying the events in timestamp order can result in a state that did not actually occur in practice, leading to false positives or missed errors (Section 2).

However, even if we cannot precisely synchronize clocks, we can bound the difference between any pair of clocks with high confidence using NTP []. And we can bound how out-of-date an event packet is, by prioritizing event packets in the network. Thus, every timestamp can be treated as lying in an interval that bounds the uncertainty of when the event took place. The interval represents the notion that network state changes atomically at some unknown time instant within the interval.

Figure 5 shows an example of finding a stable snapshot instant. It is easy to see that if no routing events are recorded during a 2e period (twice the uncertainty e), we can be confident that no routing changes actually took place. Therefore, the snapshot of the current state is stable (i.e., accurate). The order of any two past events from different processes is irrelevant to the current state, since they are [...]

(Footnotes: Note that a stable snapshot is not the same as a [consistent] snapshot [], which is only one possible state of a distributed system that might not actually have occurred in practice. Libra only considers processes that can directly modify tables. While multiple high-level protocols can co-exist (e.g., OSPF and BGP), there is usually one common low-level table manipulation API. The positive and negative uncertainties can be different, but here we assume they are the same for simplicity. A formal proof can be found in [].)

4

Figure 5: Libra's reconstruction of the timeline of routing events, taking into account bounded timestamp uncertainty. Libra waits for twice the uncertainty to ensure there are no outstanding events, which is sufficient to deduce that routing has [stabilized].

Figure 6: CDF of inter-arrival times of routing events from a large production data center. Routing events are very bursty: over 95% of events happen within 400ms of another event.

[The past events are] applied to different tables without interfering with each other (recall that each table is controlled by only one process). So Libra only needs to replay all events in timestamp order (to ensure events for the same table are played in order) to accurately reconstruct the current state. This observation suggests a simple way to create a stable snapshot: simply wait for a quiet 2e period with no routing update events.

Feasibility: The scheme only works if there are frequent windows of size 2e in which no routing events take place. Luckily, we found that these quiet periods happen frequently: we analyzed a day of logs from all routing processes in a large Google data center with a few thousand [switches]. Figure 6 shows the CDF of the inter-arrival times for the 28,445 routing events reported by the routing processes during the day. The first thing to notice is the burstiness: over 95% of events occur within 400ms of another event, which means there are long periods when the state is stable. Table 1 shows the fraction of time the network is stable, for different values of e. As expected, a larger e leads to fewer stable states and a smaller percentage of stable time. For example, when e = 100ms, only 2,137 out of all 28,445 states are stable.

e/ms     # of stable states    time in stable state /%
0        28,445                100.00
1        16,957                99.97
100      2,137                 99.90
1,000    456                   99.75
10,000   298                   99.60

Table 1: As the uncertainty (e) in routing event timestamps increases, the number of stable states decreases. However, since routing events are bursty, the state is stable most of the time.

However, because the event stream is so bursty, the unstable states are extremely short-lived, occupying in total only 0.1% (about 1.5 min) of the entire day. Put another way, for 99.9% of the time, snapshots are stable and the static analysis result is trustworthy.

Taking stable snapshots: The stable snapshot instant provides a reference point to reconstruct the global state. Libra's stable snapshot process works as follows: 1) Take an initial snapshot as the combination of all switches' forwarding tables. At this stage, each table can be recorded at a slightly different time. 2) Subscribe to timestamped event streams from all routing processes, and apply each event, in the order of their timestamps, to update the state from [the initial snapshot]. 3) After applying [the events], if no event is received for 2e time, declare the current snapshot stable. In other words, [the initial snapshot] and all past events form a stable state that actually existed at this time instant.

Divide and Conquer

After Libra has taken a stable snapshot of the forwarding state, it sets out to statically check its correctness. Given our goal of checking networks with over 10,000 switches and millions of forwarding rules, we will need to break down the task into smaller, parallel computations. There are two natural ways to consider partitioning the problem:

Partition based on switches: Each server could hold the forwarding state for a cluster of switches, partitioning the network into a number of clusters. We found this approach does not scale well because checking a forwarding rule means checking the rules in many (or all) partitions; the computation is quickly bogged down by communication between servers. Also, it is hard to balance the computation among servers because some switches have very different numbers of forwarding rules (e.g. spine and leaf switches).

Partition based on subnets: Each server could hold the forwarding state to reach a set of subnets.
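The quiet-window rule above can be sketched in a few lines (illustrative only: Libra consumes a live event stream, while this toy works on a fixed list of timestamps, and the function name is hypothetical). A snapshot taken right after an event is stable if the next event is more than 2e away.

```python
def stable_states(timestamps_ms, e_ms):
    """Replaying events in timestamp order, the state right after an event
    is a stable snapshot if no further event arrives within a 2e window,
    where e is the timestamp uncertainty (the 2e quiet-period rule)."""
    ts = sorted(timestamps_ms)
    stable = []
    for i, t in enumerate(ts):
        nxt = ts[i + 1] if i + 1 < len(ts) else float("inf")
        if nxt - t > 2 * e_ms:
            stable.append(t)
    return stable

# Larger uncertainty e => fewer stable states, echoing Table 1's trend.
events = [0, 50, 60, 500, 5000]
print(len(stable_states(events, 1)))     # 5
print(len(stable_states(events, 100)))   # 3
print(len(stable_states(events, 1000)))  # 2
```

The same monotone trend appears in Table 1: as e grows from 0 to 10 seconds, the number of stable states shrinks, but bursty gaps keep the network in a stable state almost all of the time.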

5

The server computes the forwarding graph to reach each subnet, then checks the graph for abnormalities. The difficulty with this approach is that each server must hold the entire set of forwarding tables in memory, and any update to the forwarding rules affects all servers.

Figure 7: Steps to check the routing correctness in Figure 2.

Libra partitions the network based on subnets, for reasons that will become clear. We observe that the route checker's task can be divided into two steps. First, Libra associates forwarding rules with subnets, by finding the set of forwarding rules relevant to a subnet (i.e., they are associated if the subnet is included in the rule's prefix). Second, Libra builds a forwarding graph to reach each subnet, by assembling all forwarding rules for the subnet. Both steps are embarrassingly parallel: matching is done per (subnet, forwarding rule) pair, and each subnet's forwarding graph can be analyzed independently. Libra therefore proceeds in three steps using [multiple] servers:

Step 1 - Matching: Each server is initialized with the entire list of subnets, and each server is assigned a fraction of all forwarding rules. The server considers each forwarding rule in turn to see if it belongs to the forwarding graph to a subnet (i.e. the forwarding rule is a prefix of the subnet). If there is a match, the server outputs the (subnet, rule) pair. Note that a rule may match more than one subnet.

Step 2 - Slicing: The (subnet, rule) pairs are grouped by subnet. We call each group a slice, because it contains all the rules and switches related to this subnet.

Step 3 - Graph Computing: The slices are distributed [across] servers. Each server constructs a forwarding graph based on the rules contained in the slice. Standard graph algorithms are used to detect network abnormalities, such as loops and black-holes.

Figure 7 shows the steps to check the network in Figure 2. After the slicing stage, the forwarding rules are organized into two slices, corresponding to the two subnets 192.168.0/24 and 192.168.1/24. The forwarding graph for each slice is calculated and checked in parallel.

(Footnote: Otherwise, a subnet will be fragmented by a more specific rule, leading to a complex forwarding graph. See the last paragraph in Section 9 for detailed discussion.)

Figure 8: Libra workflow.

If a routing error occurs and the second rule in S11 [is wrong], the loop will show up in the forwarding graph for 192.168.0/24. S11 will point back to S21 and S22, which will be caught by the graph loop detection algorithm. Our three-step process is easily mapped to MapReduce, which we describe in the next section.

Libra consists of two main components: a route [dumper] and a MapReduce-based route checker. Figure 8 shows Libra's workflow. The route dumper takes stable snapshots from switches or controllers, and stores them in a distributed file system. Next, the snapshot is processed by a MapReduce-based checker.

A quick review of MapReduce: MapReduce [] divides computation into two phases: mapping and reducing. In the mapping phase, the input is partitioned into small "shards". Each of them is processed by a mapper in parallel. The mapper reads in the shard line by line and outputs a list of (key, value) pairs. After the mapping phase, the MapReduce system shuffles outputs from different mappers by sorting by the key.
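The three steps can be played out end to end on the Figure 2 toy network in plain Python (a single-process stand-in for the MapReduce job; the rule tuples and the "DIRECT" marker are illustrative, not Libra's wire format):

```python
# Toy sketch of Libra's Matching / Slicing / Graph Computing steps.
import ipaddress
from collections import defaultdict

# (switch, prefix, next_hops) rules as in Figure 2; "DIRECT" = local delivery.
rules = [
    ("S11", "192.168.0.0/24", ["DIRECT"]),
    ("S11", "192.168.1.0/24", ["S21", "S22"]),
    ("S12", "192.168.0.0/24", ["S21", "S22"]),
    ("S12", "192.168.1.0/24", ["DIRECT"]),
    ("S21", "192.168.0.0/24", ["S11"]),
    ("S21", "192.168.1.0/24", ["S12"]),
    ("S22", "192.168.0.0/24", ["S11"]),
    ("S22", "192.168.1.0/24", ["S12"]),
]
subnets = ["192.168.0.0/24", "192.168.1.0/24"]

# Step 1 - Matching: emit (subnet, rule) when the subnet falls in the prefix.
pairs = [(s, r) for r in rules for s in subnets
         if ipaddress.ip_network(s).subnet_of(ipaddress.ip_network(r[1]))]

# Step 2 - Slicing: group the pairs by subnet.
slices = defaultdict(list)
for subnet, rule in pairs:
    slices[subnet].append(rule)

# Step 3 - Graph Computing: build each slice's forwarding graph.
graphs = {s: {sw: [h for h in hops if h != "DIRECT"]
              for sw, _, hops in rs} for s, rs in slices.items()}
print(graphs["192.168.0.0/24"]["S12"])  # ['S21', 'S22']
```

Each slice's graph can then be handed to the loop and black-hole checks independently, which is what makes the decomposition embarrassingly parallel.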

6

After shuffling, each reducer receives a (key, values) pair, where values is a list of all values corresponding to the key. The reducer processes this list and outputs the final result. The MapReduce system also handles checkpointing and failure recovery.

In Libra, the set of forwarding rules is partitioned into small shards and delivered to mappers. Each mapper also takes a full set of subnets to check, which by default contains all subnets in the cluster, but alternatively can be subsets selected by the user. Mappers generate intermediate keys and values, which are shuffled by MapReduce. The reducers compile the values that belong to the same subnet and generate final reports.

Figure 9: Find all matching subnets in the trie. [The found node] is the smallest matching trie node bigger than the [rule's prefix]. Hence, its children with subnets match the rule.

Mappers are responsible for slicing networks by subnet. Each mapper reads one forwarding rule at a time. If a subnet matches the rule, the mapper outputs the subnet prefix as the intermediate key, along with the value [(switch, priority, next hops)]. The following is an example ([...] is omitted): [...]

Since each mapper only sees a portion of the forwarding rules, there may be a longer and more specific (but unseen) matching prefix for the subnet in the same forwarding table. We [defer] finding the longest matching [rule] to the reducers, which see all matching rules.

Mappers are first initialized with a full list of subnets, which are stored in an in-memory binary trie for fast prefix matching. After initialization, each mapper takes a shard of the routing table, and matches the rules against the subnet trie. This process is different from conventional longest prefix matching: First, in conventional packet matching, rules are placed in a trie and packets are matched one by one. In Libra, we build the trie with [subnets]. Second, the goal is different. In conventional packet matching, one looks for the longest matching rule. Here, mappers simply output [all] matching subnets in the trie, where matching has the same meaning: the subnet's prefix must fully fall within the rule's prefix. We use a trie to efficiently find "all matching prefixes," by searching for the smallest matching trie node that is bigger than or equal to the rule prefix. Here, "small" and "big" refer to the lexicographical order (not address space size) over the bits of an IP address. [A trie node] may or may not contain a subnet. If [the smallest matching node] exists, we enumerate all its non-empty descendants (including itself). Otherwise, we declare that there exist no matching subnets in the trie. Figure 9 shows an example.

Proof: We briefly prove why this algorithm is correct. In an uncompressed trie, each bit in the IP address is represented by one level, and so the algorithm is correct by definition: if there exist matching subnets in the trie,

7

[the rule's node] must exist in the trie and its descendants contain all matching prefixes. In a compressed trie, nearby nodes may be combined, so [the rule's node] may or may not exist in the trie. If it exists, the problem reduces to the uncompressed trie scenario. If it does not exist in the trie, [the smallest matching node] (if it exists) contains all matching subnets in its descendants. This is because: Any node smaller than [it] does not match, because there is no node bigger than [the rule's prefix] and smaller than [it] ([it] is the smallest matching node); as a result, [such a node] cannot fall within the rule's range [...]. Any node bigger than the biggest descendant of [it] does not match; otherwise, [the two] must have a common ancestor [...] (a node is always smaller than its descendants), which contradicts the assumption that [it] is the smallest matching node.

Time complexity: We can break down the time consumed by the mapping phase into two parts. The time to construct the subnet trie is O(S), where S is the number of subnets, because inserting an IP prefix into a trie takes constant time (bounded by the length of an IP address). If we consider a single thread, it takes O(R) time to match R rules against the trie, so the total time complexity is O(S + R). If [k] mappers share the same trie, we can reduce the time to O(S + R/k). Here, we assume [there are fewer subnets than rules]. If [not], one may want to construct a trie with rules rather than subnets (as in conventional longest-prefix-matching).

The outputs from the mapping phase are shuffled by intermediate keys, which are the subnets. When shuffling finishes, a reducer will receive a subnet, along with an unordered set of values, each containing [(switch, priority, next hops)]. The reducer first selects the highest-priority rule per [switch]: for the same [switch], the rule with higher [priority] is selected; if two rules have the same priority, the one with larger [prefix length] is chosen. The reducer then constructs a directed forwarding graph using the selected rules. Once the graph is constructed, the reducer uses a graph library to verify the properties of the graph, for example, to check if the graph is loop-free.

Time complexity: In most networks we have seen, a subnet matches at most 2-4 rules in the routing table. Hence, selecting the highest priority rule and constructing the graph takes time [linear in] the number of physical links in the network. However, the total runtime depends on the verification task, as we will discuss in Section 6.

Incremental Updates

Until now, we have assumed Libra checks the forwarding correctness from scratch each time it runs. Libra also supports incremental updates of subnets and forwarding rules, allowing it to be used as an independent "correctness checking service" similar to NetPlumber [] and VeriFlow []. In this way, Libra could be used to check forwarding rules quickly, before they are added to the forwarding tables in the switches. Here, an in-memory, "streaming" MapReduce runtime (such as []) is needed to speed up the event processing.

Subnet updates. Each time we add a subnet for verification, we need to rerun the whole MapReduce pipeline. The mappers take [O(R)] time to find the relevant rules, and a single reducer takes [...] time to construct the directed graph slice for the new subnet. If one has several subnets to add, it is faster to run them in a batch, which takes [one pass over the rules] instead of [one pass per subnet] to map. Removing subnets is trivial: all results related to the subnets are simply discarded.

Forwarding rule updates. Figure 10 shows the workflow to add new forwarding rules. To support incremental updates of rules, reducers need to store the forwarding graph for each slice they are responsible for. The reducer could keep the graph in memory or on disk; the trade-off is a larger memory footprint. If the graphs are on disk, a fixed number of idle reducer processes live in memory and fetch graphs upon request. Similarly, the mappers need to keep the subnet trie. To add a rule, a mapper is spawned just as if it sees another line of input (Step 1).
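A minimal uncompressed version of the subnet trie can illustrate the "all matching subnets" search described above (Libra's trie is compressed and searches for the smallest node greater than or equal to the rule prefix; this sketch, with hypothetical helper names, simply walks to the rule's prefix and enumerates the subtree):

```python
# Uncompressed binary trie over prefix bits; subnets matching a rule are
# exactly the subnets stored in the subtree under the rule's prefix.
def prefix_bits(cidr):
    """First `plen` bits of the address in a CIDR string like '10.0.0.0/24'."""
    ip, plen = cidr.split("/")
    val = 0
    for part in ip.split("."):
        val = (val << 8) | int(part)
    return format(val, "032b")[: int(plen)]

class TrieNode:
    def __init__(self):
        self.children = {}
        self.subnet = None

def insert(root, cidr):
    node = root
    for b in prefix_bits(cidr):
        node = node.children.setdefault(b, TrieNode())
    node.subnet = cidr

def matching_subnets(root, rule_cidr):
    """Every stored subnet whose prefix falls within the rule's prefix."""
    node = root
    for b in prefix_bits(rule_cidr):
        node = node.children.get(b)
        if node is None:
            return []             # no subnet lies under this rule
    out, stack = [], [node]
    while stack:                  # enumerate the whole subtree
        n = stack.pop()
        if n.subnet:
            out.append(n.subnet)
        stack.extend(n.children.values())
    return sorted(out)

root = TrieNode()
for s in ["192.168.0.0/24", "192.168.1.0/24", "10.0.0.0/24"]:
    insert(root, s)
print(matching_subnets(root, "192.168.0.0/16"))  # both 192.168.x subnets
```

The trie walk costs at most 32 steps per rule, which is why the mapping phase scales linearly in the number of rules.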

8

Matching subnets from the trie are shuffled to multiple reducers (Step 2). Each reducer reads the previous slice graph (Step 3), and recalculates it with the new rule (Step 4). Deleting a rule is similar: the mapper tags the rule as "to be deleted" and passes it to reducers for updating the slice graph.

(Footnote: At any time instance, only a small fraction of graphs will be updated, and so keeping all states in-memory can be quite inefficient.)

Figure 10: Incremental rule updates in Libra. Mappers dispatch matching (subnet, rule) pairs to reducers, indexed by subnet. Reducers update the forwarding graph and recompute graph properties.

However, in the graph's adjacency list, the reducer not only needs to store the highest priority rule, but also [all] matching rules. This is because if a highest priority rule is deleted, the reducer must use the second highest priority rule to update the graph. Besides updating graphs, in certain cases, graph properties can also be checked incrementally, since the update only affects a small part of the graph. For example, in loop-detection, adding an edge only requires a Depth-First-Search (DFS) starting from the new edge's destination node, which normally will not traverse the entire graph. Unlike NetPlumber and VeriFlow, Libra does not need to explicitly remember the dependency between rules. This is because the dependency is already encoded in the matching and shuffling phases.

Route Dumper

The route dumper records each rule using five fields: [switch] is the unique ID of the switch; [prefix] is the prefix; [next hops] is a list of port names, because of multipath; [priority] is an integer field serving as a tie-breaker in longest prefix matching. By storing the egress ports in [next hops], Libra encodes the topology information in the forwarding table. Although the forwarding table format is mostly straightforward, two cases need special handling:

Ingress port dependent rules. Some forwarding rules depend on particular ingress ports. For example, a rule may only be in effect for packets entering the switch from [a particular port]. In reducers we want to construct a simple directed graph that can be represented by an adjacency list. Passing this ingress port dependency to the route checker would complicate the reducer design, since the next hop in the graph would depend not only on the current hop, but also on the previous hop. We use the notion of logical switches to solve this problem. First, if a switch has rules that depend on the ingress port, we split the switch into multiple logical switches. Each logical switch is given a new name and contains the rules depending on one ingress port, so that the port is "owned" by the new logical switch. We copy rules from the original switch to the logical switch. Second, we update the rules in upstream switches to forward to the logical switch.

Figure 11: Virtual Routing and Forwarding (VRFs) are multiple tables within the same physical switch. The tables have dependencies (inter-VRF rules) between them.

Multiple tables. Modern switches can have multiple forwarding tables that are chained together by arbitrary matching rules, usually called "Virtual Routing and Forwarding" (VRF). Figure 11 depicts an example VRF setup: incoming packets are matched against VRF OVERRIDE. If no rule is matched, they enter VRF 1 to VRF 16 according to some "triggering" rules. If all matching fails, the packet enters VRF DEFAULT. The route dumper maps multiple tables in a physical switch into multiple logical switches, each containing one forwarding table. Each logical switch connects to other logical switches directly.
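Returning to the incremental loop check mentioned above, it fits in a few lines (the graph encoding and function name are illustrative): when an edge (u, v) is added to a slice's forwarding graph, a new loop exists iff v can already reach u, so a DFS from the new edge's destination suffices.

```python
def creates_loop(graph, u, v):
    """Adding edge u->v creates a cycle iff v can already reach u:
    DFS from the new edge's destination, looking for its source."""
    stack, seen = [v], set()
    while stack:
        n = stack.pop()
        if n == u:
            return True
        if n not in seen:
            seen.add(n)
            stack.extend(graph.get(n, []))
    return False

# Correct slice for 192.168.0/24 (Figure 3(a)).
g = {"S12": ["S21", "S22"], "S21": ["S11"], "S22": ["S11"], "S11": []}
print(creates_loop(g, "S11", "S21"))  # True: S21 already reaches S11
print(creates_loop(g, "S12", "S11"))  # False: S11 reaches nothing
```

Because the DFS stops as soon as it finds the source (or exhausts the small reachable region), the check normally touches only a fraction of the slice graph.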

9

The rules chaining these VRFs are added as lowest priority rules in the logical switch's table. Hence, if no rule is matched, the packet will continue to the next logical switch in the chain.

Use Cases

In Libra, the directed graph constructed by the reducer [encodes the] data plane information for a particular subnet. In this graph, each vertex corresponds to a forwarding table the subnet matched, and each edge represents a possible link the packet can traverse. This graph also encodes multipath information. Therefore, routing correctness directly corresponds to graph properties.

Reachability: A reachability check ensures the subnet can be reached from any switch in the network. This property can be verified by doing a (reverse) DFS from the subnet's switch, and checking if the resulting vertex set contains all switches in the network. The verification takes O(V + E) time, where V is the number of switches and E the number of links.

Loop detection: A loop in the graph is equivalent to at least one strongly connected component in the directed graph. Two vertices belong to a strongly connected component if there is a path from [one to the other, and] a path [back]. We find strongly connected components using Tarjan's Algorithm [], whose time complexity is O(V + E).

Black-holes: A switch is a black-hole for a subnet if the switch does not have a matching route entry for the subnet. Some black-holes are legitimate: if the switch is the last hop for the subnet, or there is an explicit drop rule. Implicit drop rules need to be checked [to see] if that is by design. Black-holes map to vertices with zero out-degree, which can therefore be enumerated in O(V) time.

Waypoint routing: Network operators may require traffic destined to certain subnets to go through a "waypoint," such as a firewall or a middlebox. Such behavior can be verified in the forwarding graph by checking if the waypoint exists on all the forwarding paths. Specifically, one can remove the waypoint and the associated links, and verify that no edge switches appear anymore in a DFS originating from the subnet's first hop switch, with a runtime complexity of O(V + E).

Implementation

We have implemented Libra for checking the correctness of Software-Defined Network (SDN) clusters. Each cluster is divided into several domains, where each domain is controlled by a controller. Controllers exchange routing information and build the routing tables for each [domain]. Our Libra prototype has two software components. The route dumper, implemented in Python, connects to each controller and downloads routing events, forwarding tables and VRF configurations in Protocol Buffers [17] format in parallel. It also consults the topology database to identify the peer of each switch link. Once the routing information is downloaded, we preprocess the data as described in Section 5.4 and store it in a distributed file system. The route checker is implemented in C++ as a MapReduce application in about 500 lines of code. We use a trie library for storing subnets, and use the Boost Graph Library [] for all graph computation. The same binary can run at different levels of parallelism (on a single machine with multiple processes, or on a cluster with multiple machines) simply by changing command line flags. Although Libra's design supports incremental updates, our current prototype only does batch processing. We use micro-benchmarks to evaluate the specific costs for incremental processing in Section 8.5, on a simplified prototype with one mapper and one reducer.

Evaluation

To evaluate Libra's performance, we first measure start-to-finish runtime on a single machine with multi-threading, as well as on multiple machines in a cluster. We also demonstrate Libra's linear scalability as well as its incremental update capability.

Dataset   Switches    Rules         Subnets
DCN       11,260      2,657,422     11,136
DCN-G     1,126,001   265,742,626   1,113,600
INET      316         151,649,486   482,966

Table 2: Datasets used for evaluating Libra.
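The per-slice checks from the Use Cases section can be sketched in plain Python (hedged: these are stand-ins for the Boost-based checker, and Tarjan's SCC algorithm is omitted in favor of the DFS-based checks):

```python
# Reachability (reverse DFS), black-hole (zero out-degree), and waypoint
# (DFS with a vertex removed) checks on a toy slice graph.
def reverse_graph(graph):
    """Reverse all edges, keeping every vertex present."""
    rg = {u: [] for u in graph}
    for u, vs in graph.items():
        for v in vs:
            rg.setdefault(v, []).append(u)
    return rg

def reachable_from(graph, src, skip=None):
    """DFS from src; `skip` pretends one vertex (a waypoint) was removed."""
    seen, stack = set(), [src]
    while stack:
        u = stack.pop()
        if u in seen or u == skip:
            continue
        seen.add(u)
        stack.extend(graph.get(u, []))
    return seen

g = {"S12": ["S21", "S22"], "S21": ["S11"], "S22": ["S11"], "S11": []}

# Reachability: reverse DFS from the subnet's switch must cover all switches.
print(reachable_from(reverse_graph(g), "S11") == set(g))  # True

# Black-holes: zero out-degree vertices (S11 is the legitimate last hop).
print([u for u, vs in g.items() if not vs])               # ['S11']

# Waypoint: with S21 removed, S12 still reaches S11, so S21 is not on
# every forwarding path and a waypoint requirement on S21 would fail.
print("S11" in reachable_from(g, "S12", skip="S21"))      # True
```

Each check is a single linear-time traversal, which is what lets a reducer verify its slice in time proportional to the number of links.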

10 Table2:DatasetsusedforevaluatingLibra.D
Table2:DatasetsusedforevaluatingLibra.DataSetsWeusethreedatasetstoevaluatetheperformanceofLibra.ThedetailedstatisticsareshowninTable2:DCNisanSDNtestbedusedtoevaluatethescal-abilityofthecontrollersoftware.SwitchesareemulatedbyOpenFlowagentsrunningoncommoditymachinesandconnectedtogetherthroughavirtualnetworkfabric.Thenetworkispartitionedamongcontrollers,whichex-changeroutinginformationtocomputetheforwardingstateforswitchesintheirpartition.DCNcontainsabout10thousandswitchesand2.6millionIPv4forwardingrules.VRFisusedthroughoutthenetwork.:TostresstestLibra,wereplicateDCN100timesbyshiftingtheaddressspaceinDCNsuchthateachDCN-parthasauniqueIPaddressspace.Asin-gletop-levelswitchinterconnectsalltheDCNpiecesto-gether.DCN-Ghas1millionswitchesand265millionforwardingrules.:INETisasyntheticwideareabackbonenetwork.First,weusetheSprintnetworktopologydiscoveredbyRocketFuelproject[],whichcontainsroughly300routers.Then,wecreateaninterfaceforeachprexfoundinafullBGPtablefromRouteViews[entriesasofJuly2013),andspreadthemrandomlyanduniformlytoeachrouteras“localprexes.”Finally,wecomputeforwardingtablesusingshortestpathrouting.SingleMachinePerformanceWestartourevaluationofLibrabyrunningloopde-tectionlocallyonadesktopwithIntel6-coreCPUand32GBmemory.Table3summarizestheresults.WehavelearnedseveralaspectsofLibrafromsinglema-chineevaluation:I/Obottlenecks:StandardMapReduceisdisk-based:Inputsarepipedintothesystemfromdisks,whichcancreateI/Obottlenecks.Forexample,onINET,readingfromdisktake15timeslongerthangraphcomputation.OnDCN,theI/Otimeismuchshorterduetothesmallernumberofforwardingrules.Infact,inbothcases,theI/OisthebottleneckandtheCPUisnotfullyutilized.Theruntimeremainsthesamewithorwithoutmapping.Hence,themappingphaseisomittedinTable3Memoryconsumption:InstandardMapReduce,inter-mediateresultsareushedtodiskafterthemappingphasebeforeshufing,whichisveryslowonasinglemachine.Weavoidthisbykeepingallintermediatestates 9 9611th USENIX Symposium on Networked Systems Design and Implementation USENIX 
Threads       1     2     4     6     8
Read/s     13.7
Shuffle/s   7.4
Reduce/s   46.3  25.8  15.6  12.1  11.1
Speedup    1.00  1.79  2.96  3.82  4.17
(a) DCN with 2000 subnets

Threads       1     2     4     6     8
Read/s      170
Shuffle/s   3.8
Reduce/s   11.3   5.9   3.2   2.7   2.1
Speedup    1.00  1.91  3.53  4.18  5.38
(b) INET with 10,000 subnets

Table 3: Runtime of loop detection on the DCN and INET datasets on a single machine. The number of subnets is reduced compared to Table 2 so that all intermediate states can fit in memory. Read and shuffle phases are single-threaded due to a framework limitation.

Figure 12: Example progress percentage (in bytes) of Libra on DCN. The three curves represent (from left to right) the mapping, shuffling, and reducing phases, which partially overlap. The whole process ends in 57 seconds.

However, keeping state in memory limits the number of subnets that can be verified at a time, since the intermediate results are all matching (subnet, rule) pairs. On a single machine, we have to limit the number of subnets to 2000 in DCN and 10,000 in INET to avoid running out of memory.

Graph size dominates the reducing phase: The reducing phase on DCN is significantly slower than on INET, despite INET having 75 times more forwarding rules. With a single thread, Libra can only process 43.2 subnets per second on DCN, compared with 885.0 subnets per second on INET (20.5 times faster). Note that DCN has 35.6 times more nodes. This explains the faster running time on INET, since the time to detect loops grows linearly with the number of edges and nodes in the graph.

Multi-threading: Libra is multi-threaded, but the multi-thread speedup is not linear. For example, on DCN, using 8 threads only resulted in a 4.17x speedup. This effect is likely due to inefficiencies in the threading implementation of the underlying MapReduce framework, although theoretically all reducer threads should run in parallel without state sharing.

                      DCN     DCN-G     INET
Machines               50    20,000       50
Map Input/Byte       844M    52.41G   12.04G
Shuffle Input/Byte  1.61G    16.95T    5.72G
Reduce Input/Byte  15.65G      132T   15.71G
Map Time/s             31       258     76.8
Shuffle Time/s         32       768     76.2
Reduce Time/s          25       672       16
Total Time/s           57       906       93

Table 4: Running time summary for the three datasets. Shuffle input is compressed, while map and reduce inputs are uncompressed. DCN-G results are extrapolated from processing 1% of subnets with 200 machines as a single job.

We use Libra to check for loops in our three datasets on a computing cluster. Table 4 summarizes the results. Libra spends 57 seconds on DCN, 15 minutes on DCN-G, and 93 seconds on INET. To avoid overloading the cluster, the DCN-G result is extrapolated from the runtime of 1% of DCN-G subnets with 200 machines. We assume 100 such jobs running in parallel, each processing 1% of the subnets against all rules. All the jobs are independent of each other. We make the following observations from our cluster-based evaluation.

Runtime in different phases: In all datasets, the sum of the runtimes of the individual phases is larger than the start-to-end runtime. This is because the phases can overlap: there is no dependency between different mapping shards, so a shard that finishes the mapping phase can enter the shuffling phase without waiting for other shards. However, there is a global barrier between the mapping and reducing phases, since MapReduce requires a reducer to receive all intermediate values before starting. Hence, the sum of the runtimes of the mapping and reducing phases roughly equals the total runtime. Figure 12 shows the overlapping progress (in bytes) of all three phases in an analysis of DCN.

Shared-cluster overhead: These numbers are a lower bound on what Libra can achieve, for two reasons. First, the cluster is shared with other processes and lacks performance isolation. In all experiments, Libra uses 8 threads; however, the CPU utilization is between 100% and 350% on 12-core machines, whereas it can achieve 800% on a dedicated machine. Second, the machines start processing at different times.
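The reduce phase's core operation, detecting a loop in the per-subnet forwarding graph, is at heart a linear-time depth-first search, which is why runtime grows with the number of nodes and edges. A minimal iterative sketch (hypothetical graph encoding; not Libra's implementation):

```python
# Three-color DFS loop check over a forwarding graph, encoded as a dict
# mapping each switch to its list of next hops. Illustrative only.
def has_loop(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, nbrs = stack[-1]
            advanced = False
            for nxt in nbrs:
                if color.get(nxt, BLACK) == GRAY:
                    return True  # back edge to a node on the stack: a loop
                if color.get(nxt, BLACK) == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph[nxt])))
                    advanced = True
                    break
            if not advanced:
                color[node] = BLACK  # all next hops explored, no loop via node
                stack.pop()
    return False

print(has_loop({"s1": ["s2"], "s2": ["s3"], "s3": []}))      # False
print(has_loop({"s1": ["s2"], "s2": ["s3"], "s3": ["s1"]}))  # True
```

Each node and edge is visited at most once, so the check is O(V + E), consistent with the DCN/INET comparison above.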
Each machine may need a different amount of time for initialization; hence, not all machines are running at full speed from the start.

Parallelism amortizes I/O overhead: Through detailed counters, we found that unlike in the single-machine case (where I/O is the bottleneck), the mapping and reducing time dominates the total runtime; we have seen 75%–80% of the time spent on mapping/reducing. This is because the aggregate I/O bandwidth of all machines in a cluster is much higher than that of a single machine. The I/O is faster than the computation throughput, which means threads will not starve.

Figure 13: Libra runtime increases linearly with network size.

Linear scalability

Figure 13 shows how Libra scales with the size of the network. We change the number of devices in the DCN network, effectively changing both the size of the forwarding graph and the number of rules; we do not change the number of subnets. Our experiments run the loop detection on 50 machines, as in the previous section. The figure shows that the Libra runtime scales linearly with the number of rules. The reduce phase grows more erratically than the mapping time, because it is affected by both nodes and edges in the network, while mapping only depends on the number of rules.

Libra's runtime is not necessarily inversely proportional to the number of machines used. The linear scalability only applies when mapping and reducing time dominate. In fact, we observe that more machines can take longer to finish a job, because the overhead of the MapReduce system can slow down Libra. For example, if we have more mapping/reducing shards, we need to spend additional overhead on disk shuffling. We omit the detailed discussion as it depends on the specifics of the underlying MapReduce framework.

Incremental Updates

Libra can update forwarding graphs incrementally as we add and delete rules in the network, as shown in Section 5.3. To understand its performance, we break down Libra's incremental computation into two steps: (1) time spent in prefix matching (map phase) to find which subnets are affected, and (2) time to do an incremental DFS starting from the node whose routing entries have changed (reduce phase). We also report the total heap memory allocated.

         Map (µs)   Reduce (ms)   Memory (MB)
DCN         0.133          0.62            12
DCN-G       0.156          1.76           412
INET        0.158          0.01             7

Table 5: Breakdown of runtime for incremental loop checks. The unit for the map phase is microseconds and the unit for the reduce phase is milliseconds.

We measured the time for each of the components as follows: (1) for prefix matching, we randomly select rules and find all matching subnets using the algorithm described in Section 5.1, and (2) for incremental DFS, we start a new DFS from randomly chosen nodes in the graph. Both results are averaged across 1000 tests. The results are shown in Table 5.

First, we verified that no matter how large the subnet trie is, prefix matching takes almost constant time: DCN-G's subnet trie is 100 times larger than DCN's but takes almost the same time. Second, the results also show that the runtime for incremental DFS is likely to be dominated by I/O rather than compute, because the size of the forwarding graph does not exceed the size of the physical network. Even the largest dataset, DCN-G, has only about a million nodes and 10 million edges, which fits into 412 MB of memory. This millisecond runtime is comparable to results reported in [] and [], but now on much bigger networks.

Limitations of Libra

Libra is designed for static headers: Libra is faster and more scalable than existing tools because it solves a narrower problem; it assumes packets are only forwarded based on IP prefixes, and that headers are not modified along the way. Unlike, say, HSA, Libra cannot process a graph that forwards on an arbitrary mix of headers, since it is not obvious how to carry matching information from mappers to reducers, or how to partition the problem. As with other static checkers, Libra cannot handle non-deterministic network behavior or dynamic forwarding state (e.g., NAT). It requires a complete, static snapshot of the forwarding state to verify correctness. Moreover, Libra cannot tell why a forwarding state is incorrect or how it will evolve, as it does not interpret control logic.

Libra is designed to slice the network by IP subnet: If headers are transformed in a deterministic way (e.g., static NAT and IP tunnels), Libra can be extended by combining results from multiple forwarding graphs at the end. For example, 192.168.0/24 in the intra-DC network may be translated to 10.0.0/24 in the inter-DC network. Libra can construct forwarding graphs for both 192.168.0/24 and 10.0.0/24. When analyzing the two subgraphs, we can add an edge to connect them.

Forwarding graph too big for a single server: Libra scales linearly with both subnets and rules. However, a single reducer still computes the entire forwarding graph, which might be too large for a single server. Since the reduce speed depends on the size of the graph, we could use distributed graph libraries [] in the reduce phase to accelerate Libra.

Subnets must be contained by a forwarding rule: In order to break the network into one forwarding graph per subnet, Libra examines all the forwarding rules to decide which rules contain the subnet. This is a practical assumption because, in most networks, the rule is a prefix that aggregates many subnets. However, if the rule has a longer, more specific prefix than the subnet's (e.g., it is for routing to a specific end-host or router console), the forwarding graph becomes complicated, since the rule, represented as an edge in the graph, does not apply to all addresses of the subnet. In this case, one can use VeriFlow []'s notion of equivalence classes to acquire subnets directly from the rules themselves. This technique may serve as an alternative way to find all matching (subnet, rule) pairs. We leave this for future work.

Related Work

Static data plane checkers: Xie et al. introduced algorithms to analyze reachability in IP networks []. Anteater [] makes them practical by converting the checking problem into a Boolean satisfiability problem and solving it with SAT solvers. Header space analysis [] tackles general protocol-independent static checking using a geometric model and functional simulation. Recently, NetPlumber [] and VeriFlow [] show that, for small networks (compared to the ones we consider here), static checking can be done in milliseconds by tracking the dependencies between rules. Specifically, VeriFlow slices the network into equivalence classes and builds a forwarding graph for each class, in a similar fashion to Libra.

However, with the exception of NetPlumber, all of these tools and algorithms assume centralized computing. NetPlumber introduces a "rule clustering" technique for scalability, observing that rule dependencies can be separated into several relatively independent clusters. Each cluster is assigned to a process so that rule updates can be handled individually. However, the benefits of parallelism diminish when the number of workers exceeds the number of natural clusters in the rule set. In contrast, Libra scales linearly with both rules and subnets. Specifically, even if two rules have a dependency, Libra can still place them into different map shards and allow reducers to resolve the conflicts.

Other network troubleshooting techniques: Network troubleshooting tools focus on a variety of network components. Specifically, the explicitly layered design of SDN facilitates systematic troubleshooting []. Efforts in formal language foundations [] and model-checking control programs [] reduce the probability of buggy control planes. This effort has recently been extended to the embedded software on switches []. However, based on our experience, multiple simultaneous writers in a dynamic environment make developing a bug-free control plane extremely difficult.

Active testing tools [] reveal inconsistencies between the forwarding table and the actual forwarding state by sending out specially designed probes. They can discover runtime properties such as congestion, packet loss, or faulty hardware, which cannot be detected by static checking tools. Libra is orthogonal to these tools since we focus on forwarding table correctness.

Researchers have proposed systems to extract abnormalities from event histories. STS [] extracts "minimal causal sequences" from control plane event history to explain a particular crash or other abnormality. NDB [] compiles packet histories and reasons about data plane correctness. These methods avoid taking a stable snapshot of the network.

Conclusion

Today's networks require far too much human intervention to keep them working. As networks get larger and larger, there is huge interest in automating the control, error-reporting, troubleshooting, and debugging. Until now, there has been no way to automatically verify all the forwarding behavior in a network with tens of thousands of switches. Libra is fast because it focuses on checking the IP-only fabric commonly used in data centers. Libra is scalable because it can be implemented using MapReduce, allowing it to harness large numbers of servers. In our experiments, Libra meets the benchmark goal we set out to achieve: it can verify the correctness of a 10,000-node network in 1 minute using 50 servers. In the future, we expect tools like

Libra to check the correctness of even larger networks in real time. Modern large networks have gone far beyond what human operators can debug with their wisdom and intuition. Our experience shows that verification also goes beyond what a single machine can comfortably handle. We hope that Libra is just the beginning of bringing distributed computing into the network verification world.

References

Boost Graph Library.
M. Canini, D. Venzano, P. Peresini, D. Kostic, and J. Rexford. A NICE Way to Test OpenFlow Applications.
K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM ToCS, 1985.
T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online.
J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 2004.
N. Foster, A. Guha, M. Reitblatt, A. Story, M. Freedman, N. Katta, C. Monsanto, J. Reich, J. Rexford, C. Schlesinger, D. Walker, and R. Harrison. Languages for Software-Defined Networks. IEEE Communications Magazine, 2013.
N. Handigol, B. Heller, V. Jeyakumar, D. Mazieres, and N. McKeown. Where is the Debugger for my Software-Defined Network? 2012.
B. Heller, C. Scott, N. McKeown, S. Shenker, A. Wundsam, H. Zeng, S. Whitlock, V. Jeyakumar, N. Handigol, J. McCauley, K. Zarifis, and P. Kazemian. Leveraging SDN layering to systematically troubleshoot networks. 2013.
P. Kazemian, M. Chang, H. Zeng, G. Varghese, N. McKeown, and S. Whyte. Real Time Network Policy Checking using Header Space Analysis. 2013.
P. Kazemian, G. Varghese, and N. McKeown. Header Space Analysis: Static Checking for Networks.
A. Khurshid, X. Zou, W. Zhou, M. Caesar, and P. B. Godfrey. VeriFlow: Verifying Network-Wide Invariants in Real Time. 2013.
M. Kuzniar, P. Peresini, M. Canini, D. Venzano, and D. Kostic. A SOFT Way for OpenFlow Switch Interoperability Testing. 2012.
H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P. B. Godfrey, and S. T. King. Debugging the data plane with Anteater. 2011.
K. Marzullo and G. Neiger. Detection of global state. Springer, 1992.
D. L. Mills. Internet time synchronization: the network time protocol. IEEE Transactions on Communications.
The Parallel Boost Graph Library.
Protocol Buffers.
RouteViews.
C. Scott, A. Wundsam, S. Whitlock, A. Or, E. Huang, K. Zarifis, and S. Shenker. How Did We Get Into This Mess? Isolating Fault-Inducing Inputs to SDN Control Software. Technical Report UCB/EECS-2013-8, 2013.
N. Spring, R. Mahajan, D. Wetherall, and T. Anderson. Measuring ISP topologies with Rocketfuel. IEEE/ACM TON, 2004.
R. Tarjan. Depth-first search and linear graph algorithms. 12th Annual Symposium on Switching and Automata Theory, 1971.
G. Xie, J. Zhan, D. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, and J. Rexford. On static reachability analysis of IP networks. 2005.
H. Zeng, P. Kazemian, G. Varghese, and N. McKeown. Automatic Test Packet Generation. 2012.

Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks
Hongyi Zeng, Stanford University; Shidong Zhang and Fei Ye, Google; Vimalkumar Jeyakumar, Stanford University; Mickey Ju and Junda Liu, Google; Nick McKeown, Stanford University; Amin Vahdat, Google and University of California, San Diego
https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/zeng
This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14), April 2–4, 2014, Seattle, WA, USA. ISBN 978-1-931971-09-6. Open access to the Proceedings of NSDI '14 is sponsored by USENIX.