USENIX Association, 11th USENIX Symposium on Networked Systems Design and Implementation

Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks

Hongyi Zeng, Shidong Zhang, Fei Ye, Vimalkumar Jeyakumar

Hongyi Zeng and Vimalkumar Jeyakumar were interns at Google when this work was done. Hongyi Zeng is currently with Facebook.

[...] by rare boundary conditions are particularly difficult to find. Common routing failures include routing loops and black-holes (where traffic to one part of the network dis[...]

Figure 1: Libra divides the network into multiple forwarding graphs in the mapping phase, and checks graph properties in the reducing phase.

Libra uses MapReduce for verification. It starts with the full graph of switches, each with its own prefix table. As depicted in Figure 1, Libra completes verification in two phases. In the map phase, it breaks the graph into a number of slices, one for each prefix. The slice consists of only those forwarding rules used to route packets to the destination. In the reduce phase, Libra independently analyzes each slice, represented as a forwarding graph, in parallel for routing failures.

We evaluate Libra on the forwarding tables from three different networks. First, DCN is an emulated data center network with 2 million rules and 10,000 switches. Second, DCN-G is made from 100 replicas of DCN connected together, i.e., 1 million switches. Third, INET is a network with 300 IPv4 routers, each containing the full BGP table with half a million rules. The results are encouraging. Libra takes one minute to check for loops and black-holes in DCN, 15 minutes for DCN-G and 1.5 minutes for INET.

Forwarding Errors

A small toy network can illustrate three common types of error found in forwarding tables. In the two-level tree network in Figure 2, two top-of-rack (ToR) switches (S11, S12) are connected to two spine switches (S21, S22). The downlinks from S11 and S12 connect to up to 254 servers on the same /24 subnet. The figure shows a correct set of forwarding tables. Note that our example network uses multipath routing. Packets arriving at S12 on the right and destined to subnet 192.168.0/24 on the left are load-balanced over switches S21 and S22. Our toy network has 8 rules and 2 subnets.

  S21:  192.168.0/24 → S11         192.168.1/24 → S12
  S22:  192.168.0/24 → S11         192.168.1/24 → S12
  S11:  192.168.1/24 → S21, S22    192.168.0/24 → DIRECT
  S12:  192.168.0/24 → S21, S22    192.168.1/24 → DIRECT

Figure 2: Small network example for describing the types of forwarding error found by Libra.

A forwarding graph is a directed graph that defines the network behavior for each subnet. It contains a list of (local switch, remote switch) pairs. For example, in Figure 3(a), an arrow from S12 to S21 means the packets of subnet 192.168.0/24 can be forwarded from S12 to S21. Multipath routing can be represented by a node that has more than one outgoing edge.

Figure 3: Forwarding graphs for 192.168.0/24 as in Figure 2 and potential abnormalities. Panels: (a) Normal, (b) Loops, (c) Blackhole, (d) Incorrect Snapshot.

Figure 3 illustrates three types of forwarding error in our simple network, depicted in forwarding graphs.

Loops: Figure 3(b) shows how an error in S11's forwarding tables causes a loop. Instead of forwarding 192.168.0/24 down to the servers, S11 forwards packets up, i.e., to S21 and S22, so S11's forwarding table now maps 192.168.0/24 to S21 and S22 rather than to DIRECT. The network has two loops, S21-S11-S21 and S22-S11-S22, and packets addressed to 192.168.0/24 will never reach their destination.

Black-holes: Figure 3(c) shows what happens if S22 loses one of its forwarding entries, the entry for 192.168.0/24. In this case, if S12 spreads packets destined to 192.168.0/24 over both S21 and S22, packets arriving at S22 will be dropped.

Incorrect Snapshot: Figure 3(d) shows a subtle problem that can lead to false positives when verifying forwarding tables. Suppose the link between S11 and S22 goes down. Two events take place (shown as dashed arrows in the figure): (1) S22 deletes its entry for 192.168.0/24; (2) S12 stops forwarding packets to S22. Because of the asynchronous nature of routing updates, the two events could take place in either order. A snapshot may capture one event but not the other, or might detect them happening in the reverse order. The sequence (1), (2) creates a temporary black-hole as in Figure 3(c), whereas the desired sequence (2), (1) does not. To avoid raising an unnecessary alarm (by reporting the first sequence even though it did not happen), or missing an error altogether (by incorrectly assuming that the second sequence happened), Libra must detect the correct state of the network.

Real-world Failure Examples

To understand how often forwarding errors take place, we examined a log of bug tickets from 14 months of operation in a large Google data center. Figure 4 categorizes 35 tickets for missing forwarding entries, 11 for loops, and 11 for black-holes. On average, four issues are reported per month.

Figure 4: Routing related tickets by month and type.

Today, forwarding errors are tracked down by hand, which, given the size of the network and the number of entries, often takes many hours. And because the diagnosis is done after the error occurred, the sequence of events causing the error has usually long since disappeared before the diagnosis starts. This makes it hard to reproduce the error.

Case 1: Detecting Loops. One type of loop is caused by prefix aggregation. Prefixes are aggregated to compact the forwarding tables: a cluster can advertise a single prefix to the core to reach all of the servers connected below it, which usually includes the addresses of servers that have not yet been deployed. However, packets destined to these non-deployed addresses (e.g., due to machine maintenance) can get stuck in loops. This is because the core believes these packets are destined to the cluster, while the cluster lacks the forwarding rules to digest these packets due to the incomplete deployment; instead, the cluster's default rules lead packets back to the core. This failure does not cause a service to fail (because the service will use other servers instead), but it does degrade the network, causing unnecessary congestion. In the past, these errors were ignored because of the prohibitive cost of performing a full cluster check. Libra can finish checking in less than a minute, and identify and report the specific switch and prefix entry that are at risk.

Case 2: Discovering Black-holes. In one incident, traffic was interrupted to hundreds of servers. Initial investigation showed that some prefixes had a high packet loss rate, but packets seemed to be discarded randomly. It took several days to finally uncover the root cause: a subset of routing information was lost during BGP updates between domains, likely due to a bug in the routing software, leading to black-holes. Libra will detect missing forwarding entries quickly, reducing the outage time. Libra's stable snapshots also allow it to disambiguate temporary states during updates from long-term black-holes.

Case 3: Identifying Inconsistencies. Network control runs across several instances, which may fail from time to time. When a secondary becomes the primary, it results in a flurry of changes to the forwarding tables. The changes may temporarily or permanently conflict with the previous forwarding state, particularly if the changeover itself fails before completing. The network can be left in an inconsistent state, leading to packet loss, black-holes and loops.

Lessons Learned

Simple things go wrong: Routing errors occur even in networks using relatively simple IP forwarding. They also occur due to firmware upgrades, controller failure and software bugs. It is essential to check the forwarding state, outside the control software.

Multiple moving parts: The network consists of multiple interacting subsystems. For example, in Case 1 above, intra-DC routing is handled locally, but routing is a global property. This can create loops that are hard to detect locally within a subsystem. There are also multiple network controllers. Inconsistent state makes it hard for the control plane to detect failures on its own.

Scale matters: Large data center networks use multipath routing, which means there are many forwarding paths to check. As the number of switches grows, the number of paths and prefix tables grows, and the complexity of checking all routes grows with it. It is essential for a static checker to scale linearly with the network.
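The forwarding-graph abnormalities of Figure 3 can be made concrete with a small sketch. The adjacency lists below encode panels (a) to (c) for subnet 192.168.0/24; the names `DIRECT`, `find_black_holes` and `has_loop` are illustrative choices for this example, not identifiers from Libra, and the depth-first search is a generic cycle check rather than the authors' implementation.

```python
# Forwarding graphs for subnet 192.168.0/24, one per panel of Figure 3.
# Each graph maps a switch to its next hops; DIRECT marks delivery to the
# attached servers (a legitimate last hop). All names here are hypothetical.
DIRECT = "DIRECT"

normal = {"S11": [DIRECT], "S12": ["S21", "S22"], "S21": ["S11"], "S22": ["S11"]}
loops = {"S11": ["S21", "S22"], "S12": ["S21", "S22"], "S21": ["S11"], "S22": ["S11"]}
blackhole = {"S11": [DIRECT], "S12": ["S21", "S22"], "S21": ["S11"], "S22": []}


def find_black_holes(graph):
    """Return switches with no forwarding entry for the subnet (out-degree 0)."""
    return sorted(switch for switch, hops in graph.items() if not hops)


def has_loop(graph):
    """Depth-first search that reports a cycle when it meets a switch
    that is still on the current search path."""
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {switch: UNSEEN for switch in graph}

    def visit(switch):
        state[switch] = ON_PATH
        for nxt in graph[switch]:
            if nxt == DIRECT:
                continue  # delivered to servers, not a switch-to-switch edge
            if state[nxt] == ON_PATH:
                return True
            if state[nxt] == UNSEEN and visit(nxt):
                return True
        state[switch] = DONE
        return False

    return any(state[s] == UNSEEN and visit(s) for s in graph)


if __name__ == "__main__":
    for name, graph in [("normal", normal), ("loops", loops), ("blackhole", blackhole)]:
        print(name, "loop:", has_loop(graph), "black-holes:", find_black_holes(graph))
```

In this encoding the loop of Figure 3(b) appears as the cycle S11 → S21 → S11, and the black-hole of Figure 3(c) as S22 having an empty next-hop list.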
Stable Snapshots

It is not easy to take an accurate snapshot of the forwarding state of a large, constantly changing network. But if Libra runs its static checks on a snapshot of the state that never actually occurred, it will raise false alarms and miss real errors. We therefore need to capture, and check, a snapshot of the global forwarding state that actually existed at one instant in time. We call these stable snapshots.

When is the state stable? A large network is usually controlled by multiple routing processes, each responsible for one or more switches. Each process sends timestamped updates, which we call routing events, to add, modify and delete forwarding entries in the switches it is responsible for. Libra monitors the stream of routing events to learn the global network state.

Footnote: Libra only considers processes that can directly modify tables. While multiple high-level protocols can co-exist (e.g., OSPF and BGP), there is usually one common low-level table manipulation API.

Finding the stable state of a single switch is easy: each table is only written by one routing process using a single clock, and all events are processed in order. Hence, Libra can reconstruct a stable state simply by replaying events in timestamp order.

By contrast, it is not obvious how to take a stable snapshot of the state when different routing processes update their switches using different, unsynchronized clocks. Because the clocks are different, and events may be delayed in the network, simply replaying the events in timestamp order can result in a state that did not actually occur in practice, leading to false positives or missed errors (Section 2).

Footnote: Note that a stable snapshot is not the same as a consistent snapshot, which is only one possible state of a distributed system that might not actually have occurred in practice.

However, even if we cannot precisely synchronize clocks, we can bound the difference between any pair of clocks with high confidence using NTP. And we can bound how out-of-date an event packet is, by prioritizing event packets in the network. Thus, every timestamp τ can be treated as lying in an interval [τ − ε, τ + ε], where ε bounds the uncertainty of when the event took place. The interval represents the notion that network state changes atomically at some unknown time instant within the interval.

Footnote: The positive and negative uncertainties can be different, but here we assume they are the same for simplicity.

Figure 5 shows an example of finding a stable snapshot instant. It is easy to see that if no routing events are recorded during a 2ε period, we can be confident that no routing changes actually took place. Therefore, the snapshot of the current state is stable (i.e., accurate). The order of any two past events from different processes is irrelevant to the current state, since they are applied to different tables without interfering with each other (recall that each table is controlled by only one process). So Libra only needs to replay all events in timestamp order (to ensure events for the same table are played in order) to accurately reconstruct the current state.

Footnote: A formal proof can be found in the reference cited in the original.

Figure 5: Libra's reconstruction of the timeline of routing events, taking into account bounded timestamp uncertainty ε. Libra waits for twice the uncertainty to ensure there are no outstanding events, which is sufficient to deduce that routing has stabilized.

This observation suggests a simple way to create a stable snapshot: wait for a quiet 2ε period with no routing update events.

Feasibility: The scheme only works if there are frequent windows of size 2ε in which no routing events take place. Luckily, we found that these quiet periods happen frequently: we analyzed a day of logs from all routing processes in a large Google data center with a few thousand switches. Figure 6 shows the CDF of the inter-arrival times for the 28,445 routing events reported by the routing processes during the day. The first thing to notice is the burstiness: over 95% of events occur within 400 ms of another event, which means there are long periods when the state is stable.

Figure 6: CDF of inter-arrival times of routing events from a large production data center (x-axis: RIB updates inter-arrival time, 100 ms to 1000 s; y-axis: percentiles). Routing events are very bursty: over 95% of events happen within 400 ms of another event.

Table 1 shows the fraction of time the network is stable, for different values of ε. As expected, larger ε leads to fewer stable states and a smaller percentage of stable time. For example, when ε = 100 ms, only 2,137 out of all 28,445 states are stable. However, because the event stream is so bursty, the unstable states are extremely short-lived, occupying in total only 0.1% (about 1.5 min) of the entire day. Put another way, for 99.9% of the time, snapshots are stable and the static analysis result is trustworthy.

  ε/ms      # of stable states    time in stable state/%
  0         28,445                100.00
  1         16,957                99.97
  100       2,137                 99.90
  1,000     456                   99.75
  10,000    298                   99.60

Table 1: As the uncertainty in routing event timestamps (ε) increases, the number of stable states decreases. However, since routing events are bursty, the state is stable most of the time.

Taking stable snapshots: The stable snapshot instant provides a reference point to reconstruct the global state. Libra's stable snapshot process works as follows:

1) Take an initial snapshot as the combination of all switches' forwarding tables. At this stage, each table can be recorded at a slightly different time.
2) Subscribe to timestamped event streams from all routing processes, and apply each event, in the order of their timestamps, to update the state.
3) After applying an event, if no event is received for 2ε time, declare the current snapshot stable. In other words, the initial snapshot and all past events form a stable state that actually existed at this time instant.

Divide and Conquer

After Libra has taken a stable snapshot of the forwarding state, it sets out to statically check its correctness. Given our goal of checking networks with over 10,000 switches and millions of forwarding rules, we will need to break down the task into smaller, parallel computations. There are two natural ways to consider partitioning the problem:

Partition based on switches: Each server could hold the forwarding state for a cluster of switches, partitioning the network into a number of clusters. We found this approach does not scale well because checking a forwarding rule means checking the rules in many (or all) partitions, so the computation is quickly bogged down by communication between servers. Also, it is hard to balance the computation among servers because some switches have very different numbers of forwarding rules (e.g., spine and leaf switches).

Partition based on subnets: Each server could hold the forwarding state to reach a set of subnets. The server computes the forwarding graph to reach each subnet, then checks the graph for abnormalities. The difficulty with this approach is that each server must hold the entire set of forwarding tables in memory, and any update to the forwarding rules affects all servers.

Libra partitions the network based on subnets, for reasons that will become clear. We observe that the route checker's task can be divided into two steps. First, Libra associates forwarding rules with subnets, by finding the set of forwarding rules relevant to a subnet (i.e., they are associated if the subnet is included in the rule's prefix). Second, Libra builds a forwarding graph to reach each subnet, by assembling all forwarding rules for the subnet. Both steps are embarrassingly parallel: matching is done per (subnet, forwarding rule) pair, and each subnet's forwarding graph can be analyzed independently.

Footnote: Otherwise, a subnet will be fragmented by a more specific rule, leading to a complex forwarding graph. See the last paragraph in Section 9 for detailed discussion.

Libra therefore proceeds in three steps using N servers:

Step 1 - Matching: Each server is initialized with the entire list of subnets, and each server is assigned 1/N of all forwarding rules. The server considers each forwarding rule in turn to see if it belongs to the forwarding graph to a subnet (i.e., the forwarding rule is a prefix of the subnet). If there is a match, the server outputs the (subnet, rule) pair. Note that a rule may match more than one subnet.

Step 2 - Slicing: The (subnet, rule) pairs are grouped by subnet. We call each group a slice, because it contains all the rules and switches related to this subnet.

Step 3 - Graph Computing: The slices are distributed across servers. Each server constructs a forwarding graph based on the rules contained in the slice. Standard graph algorithms are used to detect network abnormalities, such as loops and black-holes.

Figure 7 shows the steps to check the network in Figure 2. After the slicing stage, the forwarding rules are organized into two slices, corresponding to the two subnets 192.168.0/24 and 192.168.1/24. The forwarding graph for each slice is calculated and checked in parallel.

  Forwarding tables (input):
    S11:  192.168.1/24 → S21, S22    192.168.0/24 → DIRECT
    S12:  192.168.0/24 → S21, S22    192.168.1/24 → DIRECT
    S21:  192.168.0/24 → S11         192.168.1/24 → S12
    S22:  192.168.0/24 → S11         192.168.1/24 → S12

  Matching/Slicing (output, one slice per subnet):
    192.168.0/24:  S11 → DIRECT;  S12 → S21, S22;  S21 → S11;  S22 → S11
    192.168.1/24:  S12 → DIRECT;  S11 → S21, S22;  S21 → S12;  S22 → S12

  Graph Computing: one forwarding graph over S11, S12, S21, S22 for each of the two subnets.

Figure 7: Steps to check the routing correctness in Figure 2.

If a routing error occurs and the second rule in S11 becomes 192.168.0/24 → S21, S22, the loop will show up in the forwarding graph for 192.168.0/24. S11 will point back to S21 and S22, which will be caught by a graph loop detection algorithm.

Our three-step process is easily mapped to MapReduce, which we describe in the next section.

Libra

Libra consists of two main components: a route dumper and a MapReduce-based route checker. Figure 8 shows Libra's workflow. The route dumper takes stable snapshots from switches or controllers, and stores them in a distributed file system. Next, the snapshot is processed by a MapReduce-based checker.

Figure 8: Libra workflow.

A quick review of MapReduce: MapReduce divides computation into two phases: mapping and reducing. In the mapping phase, the input is partitioned into small shards. Each of them is processed by a mapper in parallel. The mapper reads in the shard line by line and outputs a list of <key, value> pairs. After the mapping phase, the MapReduce system shuffles outputs from different mappers by sorting by the key. After shuffling, each reducer receives a <key, values> pair, where values is a list of all values corresponding to the key. The reducer processes this list and outputs the final result. The MapReduce system also handles checkpointing and failure recovery.

In Libra, the set of forwarding rules is partitioned into small shards and delivered to mappers. Each mapper also takes a full set of subnets to check, which by default contains all subnets in the cluster, but alternatively can be a subset selected by the user. Mappers generate intermediate keys and values, which are shuffled by MapReduce. The reducers compile the values that belong to the same subnet and generate final reports.
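The three steps (matching, slicing, graph computing) can be sketched on the toy network of Figure 2. This is a minimal single-process illustration, not Libra's C++ MapReduce code: the function names `map_rules`, `shuffle` and `reduce_slice` are hypothetical, prefixes are written in full dotted form so that Python's standard `ipaddress` module accepts them, and rule priority is left out, so ties are broken by prefix length only.

```python
from collections import defaultdict
from ipaddress import ip_network

# Forwarding tables of the Figure 2 toy network: (switch, prefix, next hops).
RULES = [
    ("S11", "192.168.1.0/24", ["S21", "S22"]),
    ("S11", "192.168.0.0/24", ["DIRECT"]),
    ("S12", "192.168.0.0/24", ["S21", "S22"]),
    ("S12", "192.168.1.0/24", ["DIRECT"]),
    ("S21", "192.168.0.0/24", ["S11"]),
    ("S21", "192.168.1.0/24", ["S12"]),
    ("S22", "192.168.0.0/24", ["S11"]),
    ("S22", "192.168.1.0/24", ["S12"]),
]
SUBNETS = ["192.168.0.0/24", "192.168.1.0/24"]


def map_rules(shard, subnets):
    """Step 1 (matching): emit (subnet, rule) whenever the subnet falls
    within the rule's prefix. A rule may match several subnets."""
    for switch, prefix, hops in shard:
        rule_net = ip_network(prefix)
        for subnet in subnets:
            if ip_network(subnet).subnet_of(rule_net):
                yield subnet, (switch, rule_net.prefixlen, hops)


def shuffle(pairs):
    """Step 2 (slicing): group the emitted rules by subnet."""
    slices = defaultdict(list)
    for subnet, rule in pairs:
        slices[subnet].append(rule)
    return slices


def reduce_slice(rules):
    """Step 3 (graph computing): keep the most specific rule per switch
    and return the forwarding graph as an adjacency list."""
    best = {}
    for switch, prefix_len, hops in rules:
        if switch not in best or prefix_len > best[switch][0]:
            best[switch] = (prefix_len, hops)
    return {switch: hops for switch, (_, hops) in best.items()}


def check(rules, subnets, num_shards=2):
    """Run the three steps with the rules split into num_shards shards."""
    shards = [rules[i::num_shards] for i in range(num_shards)]
    pairs = [pair for shard in shards for pair in map_rules(shard, subnets)]
    return {subnet: reduce_slice(s) for subnet, s in shuffle(pairs).items()}


if __name__ == "__main__":
    for subnet, graph in check(RULES, SUBNETS).items():
        print(subnet, graph)
```

Each mapper in this sketch scans the subnet list linearly for every rule; the result for 192.168.0.0/24 is the slice shown in Figure 7, and it does not depend on how the rules are split into shards.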
Figure 9: Find all matching subnets in the trie. 10.1.0.0/30 (X) is the smallest matching trie node bigger than the rule 10.1.0.0/16 (A). Hence, its children with subnets 10.1.0.1/32 and 10.1.0.2/32 match the rule.

Mappers are responsible for slicing networks by subnet. Each mapper reads one forwarding rule at a time. If a subnet matches the rule, the mapper outputs the subnet prefix as the intermediate key, along with a value that describes the rule (its prefix length, its action, the switch it belongs to, and its priority).

Since each mapper only sees a portion of the forwarding rules, there may be a longer and more specific, but unseen, matching prefix for the subnet in the same forwarding table. We defer finding the longest matching rule to the reducers, which see all matching rules.

Mappers are first initialized with a full list of subnets, which are stored in an in-memory binary trie for fast prefix matching. After initialization, each mapper takes a shard of the routing table, and matches the rules against the subnet trie. This process is different from conventional longest prefix matching. First, in conventional packet matching, rules are placed in a trie and packets are matched one by one. In Libra, we build the trie with subnets. Second, the goal is different. In conventional packet matching, one looks for the longest matching rule. Here, mappers simply output all matching subnets in the trie. Here, matching has the same meaning: the subnet's prefix must fully fall within the rule's prefix.

We use a trie to efficiently find all matching prefixes, by searching for the smallest matching trie node (called X) that is bigger than or equal to the rule prefix (called A). Here, small and big refer to the lexicographical order (not address space size), where for each bit in an IP address, wildcard < 0 < 1. X may or may not contain a subnet. If X exists, we enumerate all its non-empty descendants (including X itself). Otherwise, we declare that there exist no matching subnets in the trie. Figure 9 shows an example. 10.1.0.0/30 (X) is the smallest matching trie node bigger than the rule 10.1.0.0/16 (A), since 10.1.0.0/16 < 10.1.0.0/30 < 10.1.0.1/32. Hence, its children with subnets 10.1.0.1/32 and 10.1.0.2/32 match the rule.

Proof: We briefly prove why this algorithm is correct. In an uncompressed trie, each bit in the IP address is represented by one level, and so the algorithm is correct by definition: if there exist matching subnets in the trie, A must exist in the trie and its descendants contain all matching prefixes, which means X = A.

In a compressed trie, nearby nodes may be combined. A may or may not exist in the trie. If it exists, the problem reduces to the uncompressed trie scenario. If A does not exist in the trie, X (if it exists) contains all matching subnets in its descendants. This is because:

a) Any node Y smaller than X does not match A. Because there is no matching node bigger than A and smaller than X (otherwise X is not the smallest matching node), Y < X also means Y < A. As a result, Y cannot fall within A's range. This is because for Y to fall within A, all of A's non-wildcard bits should appear in Y, which implies Y ≥ A.

b) Any node Y bigger than the biggest descendant of X does not match A. Otherwise, X and Y must have a common ancestor Z, where Z matches A because both X and Y match A, and Z < X because Z is the ancestor of X (a node is always smaller than its descendants). This contradicts the assumption that X is the smallest matching node of A.

Time complexity: We can break down the time consumed by the mapping phase into two parts. The time to construct the subnet trie is O(T), where T is the number of subnets, because inserting an IP prefix into a trie takes constant time (bounded by the length of an IP address). If we consider a single thread, it takes O(R) time to match R rules against the trie. So the total time complexity is O(R + T). If N mappers share the same trie, we can reduce the time to O(R/N + T). Here, we assume R ≫ T. If T ≫ R, one may want to construct a trie with rules rather than subnets (as in conventional longest-prefix matching).

The outputs from the mapping phase are shuffled by intermediate keys, which are the subnets. When shuffling finishes, a reducer will receive a subnet, along with an unordered set of values, each describing one matching rule. The reducer first selects the highest priority rule per switch: for the same switch, the rule with higher priority is selected; if two rules have the same priority, the one with the larger prefix length is chosen. The reducer then constructs a directed forwarding graph using the selected rules. Once the graph is constructed, the reducer uses a graph library to verify the properties of the graph, for example, to check if the graph is loop-free.

Time complexity: In most networks we have seen, a subnet matches at most 2-4 rules in the routing table. Hence, selecting the highest priority rule and constructing the graph takes O(E) time, where E is the number of physical links in the network. However, the total runtime depends on the verification task, as we will discuss in Section 6.

Incremental Updates

Until now, we have assumed Libra checks the forwarding correctness from scratch each time it runs. Libra also supports incremental updates of subnets and forwarding rules, allowing it to be used as an independent correctness checking service similar to NetPlumber and VeriFlow. In this way, Libra could be used to check forwarding rules quickly, before they are added to the forwarding tables in the switches. Here, an in-memory, streaming MapReduce runtime is needed to speed up the event processing.

Subnet updates. Each time we add a subnet for verification, we need to rerun the whole MapReduce pipeline. The mappers take O(R/N) time to find the relevant rules, and a single reducer takes O(E) time to construct the directed graph slice for the new subnet. If one has several subnets to add, it is faster to run them in a batch, which takes a single mapping pass instead of one pass per subnet. Removing subnets is trivial. All results related to the subnets are simply discarded.

Forwarding rule updates. Figure 10 shows the workflow to add new forwarding rules. To support incremental updates of rules, reducers need to store the forwarding graph for each slice they are responsible for. The reducer could keep the graph in memory or on disk; the trade-off is a larger memory footprint. If the graphs are on disk, a fixed number of idle reducer processes live in memory and fetch graphs upon request. Similarly, the mappers need to keep the subnet trie.

Footnote: At any time instance, only a small fraction of graphs will be updated, and so keeping all states in memory can be quite inefficient.

Figure 10: Incremental rule updates in Libra. Mappers dispatch matching <subnet, rule> pairs to reducers, indexed by subnet. Reducers update the forwarding graph and recompute graph properties.

To add a rule, a mapper is spawned just as if it sees another line of input (Step 1). Matching subnets from the trie are shuffled to multiple reducers (Step 2). Each reducer reads the previous slice graph (Step 3), and recalculates it with the new rule (Step 4).

Deleting a rule is similar. The mapper tags the rule as to be deleted and passes it to reducers for updating the slice graph. However, in the graph's adjacency list, the reducer needs to store not only the highest priority rule, but also all matching rules. This is because if a highest priority rule is deleted, the reducer must use the second highest priority rule to update the graph.

Besides updating graphs, in certain cases graph properties can also be checked incrementally, since the update only affects a small part of the graph. For example, in loop detection, adding an edge only requires a Depth-First Search (DFS) starting from the new edge's destination node, which normally will not traverse the entire graph.

Unlike NetPlumber and VeriFlow, Libra does not need to explicitly remember the dependency between rules. This is because the dependency is already encoded in the matching and shuffling phases.

Route Dumper

The route dumper records each rule using five fields. The switch field is the unique ID of the switch;
the IP address and mask together give the prefix. The egress-ports field is a list of port names because of multipath. The priority field is an integer serving as a tie-breaker in longest prefix matching. By storing the egress ports, Libra encodes the topology information in the forwarding table.

Although the forwarding table format is mostly straightforward, two cases need special handling:

Ingress port dependent rules. Some forwarding rules depend on particular ingress ports. For example, a rule may only be in effect for packets entering the switch from a particular port. In reducers we want to construct a simple directed graph that can be represented by an adjacency list. Passing this ingress port dependency to the route checker will complicate the reducer design, since the next hop in the graph depends not only on the current hop, but also on the previous hop.

We use the notion of logical switches to solve this problem. First, if a switch has rules that depend on the ingress port, we split the switch into multiple logical switches. Each logical switch is given a new name and contains the rules depending on one ingress port, so that the port is owned by the new logical switch. We copy rules from the original switch to the logical switch. Second, we update the rules in upstream switches to forward to the logical switch.

Multiple tables. Modern switches can have multiple forwarding tables that are chained together by arbitrary matching rules, usually called Virtual Routing and Forwarding (VRF). Figure 11 depicts an example VRF setup: incoming packets are matched against VRF_OVERRIDE. If no rule is matched, they enter VRF_1 to VRF_16 according to some triggering rules. If all matching fails, the packet enters VRF_DEFAULT.

Figure 11: Virtual Routing and Forwarding (VRFs) are multiple tables within the same physical switch (shown: VRF_OVERRIDE, VRF_1, VRF_2, VRF_DEFAULT, VRF_FALLBACK). The tables have dependencies (inter-VRF rules) between them.

The route dumper maps multiple tables in a physical switch into multiple logical switches, each containing one forwarding table. Each logical switch connects to other logical switches directly. The rules chaining these VRFs are added as lowest priority rules in the logical switch's table. Hence, if no rule is matched, the packet will continue to the next logical switch in the chain.

Use cases

In Libra, the directed graph constructed by the reducer contains data plane information for a particular subnet. In this graph, each vertex corresponds to a forwarding table the subnet matched, and each edge represents a possible link the packet can traverse. This graph also encodes multipath information. Therefore, routing correctness directly corresponds to graph properties.

Reachability: A reachability check ensures the subnet can be reached from any switch in the network. This property can be verified by doing a (reverse) DFS from the subnet switch, and checking if the resulting vertex set contains all switches in the network. The verification takes O(V + E) time, where V is the number of switches and E the number of links.

Loop detection: A loop in the graph is equivalent to at least one non-trivial strongly connected component in the directed graph. Two vertices belong to a strongly connected component if there is a path from the first to the second and a path from the second to the first. We find strongly connected components using Tarjan's algorithm, whose time complexity is O(V + E).

Black-holes: A switch is a black-hole for a subnet if the switch does not have a matching route entry for the subnet. Some black-holes are legitimate: if the switch is the last hop for the subnet, or there is an explicit drop rule. Implicit drop rules need to be checked to confirm that they are by design. Black-holes map to vertices with zero out-degree, which can therefore be enumerated in O(V).

Waypoint routing: Network operators may require traffic destined to certain subnets to go through a waypoint, such as a firewall or a middlebox. Such behavior can be verified in the forwarding graph by checking if the waypoint exists on all the forwarding paths. Specifically, one can remove the waypoint and the associated links, and verify that no edge switches appear any more in a DFS originated from the subnet's first hop switch, with a runtime complexity of O(V + E).

Implementation

We have implemented Libra for checking the correctness of Software-Defined Network (SDN) clusters. Each cluster is divided into several domains, where each domain is controlled by a controller. Controllers exchange routing information and build the routing tables for each switch.

Our Libra prototype has two software components. The route dumper, implemented in Python, connects to each controller and downloads routing events, forwarding tables and VRF configurations in Protocol Buffers [17] format in parallel. It also consults the topology database to identify the peer of each switch link. Once the routing information is downloaded, we preprocess the data as described in Section 5.4 and store it in a distributed file system.

The route checker is implemented in C++ as a MapReduce application in about 500 lines of code. We use a Trie library for storing subnets, and use the Boost Graph Library for all graph computation. The same binary can run at different levels of parallelism, on a single machine with multiple processes or on a cluster with multiple machines, simply by changing command line flags.

Although Libra's design supports incremental updates, our current prototype only does batch processing. We use micro-benchmarks to evaluate the specific costs for incremental processing in Section 8.5, on a simplified prototype with one mapper and one reducer.

Evaluation

To evaluate Libra's performance, we first measure start-to-finish runtime on a single machine with multi-threading, as well as on multiple machines in a cluster. We also demonstrate Libra's linear scalability as well as its incremental update capability.

Data Sets

We use three data sets to evaluate the performance of Libra. The detailed statistics are shown in Table 2.

  Dataset    Switches     Rules          Subnets
  DCN        11,260       2,657,422      11,136
  DCN-G      1,126,001    265,742,626    1,113,600
  INET       316          151,649,486    482,966

Table 2: Data sets used for evaluating Libra.

DCN: DCN is an SDN testbed used to evaluate the scalability of the controller software. Switches are emulated by OpenFlow agents running on commodity machines and connected together through a virtual network fabric. The network is partitioned among controllers, which exchange routing information to compute the forwarding state for switches in their partition. DCN contains about 10 thousand switches and 2.6 million IPv4 forwarding rules. VRF is used throughout the network.

DCN-G: To stress test Libra, we replicate DCN 100 times by shifting the address space in DCN such that each DCN part has a unique IP address space. A single top-level switch interconnects all the DCN pieces together. DCN-G has 1 million switches and 265 million forwarding rules.

INET: INET is a synthetic wide area backbone network. First, we use the Sprint network topology discovered by the RocketFuel project, which contains roughly 300 routers. Then, we create an interface for each prefix found in a full BGP table from Route Views (entries as of July 2013), and spread them randomly and uniformly to each router as local prefixes. Finally, we compute forwarding tables using shortest path routing.

Single Machine Performance

We start our evaluation of Libra by running loop detection locally on a desktop with an Intel 6-core CPU and 32 GB memory. Table 3 summarizes the results. We have learned several aspects of Libra from the single machine evaluation:

I/O bottlenecks: Standard MapReduce is disk-based: inputs are piped into the system from disks, which can create I/O bottlenecks. For example, on INET, reading from disk takes 15 times longer than graph computation. On DCN, the I/O time is much shorter due to the smaller number of forwarding rules. In fact, in both cases, the I/O is the bottleneck and the CPU is not fully utilized. The runtime remains the same with or without mapping. Hence, the mapping phase is omitted in Table 3.

Memory consumption: In standard MapReduce, intermediate results are flushed to disk after the mapping phase before shuffling, which is very slow on a single machine. We avoid this by keeping all intermediate states in memory. However, this limits the number of subnets that can be verified at a time, because the intermediate results are all matching (subnet, rule) pairs. On a single machine, we have to limit the number of subnets to 2000 in DCN and 10,000 in INET to avoid running out of memory.

  a) DCN with 2000 subnets
  Threads      1       2       4       6       8
  Read/s       13.7
  Shuffle/s    7.4
  Reduce/s     46.3    25.8    15.6    12.1    11.1
  Speedup      1.00    1.79    2.96    3.82    4.17

  b) INET with 10,000 subnets
  Threads      1       2       4       6       8
  Read/s       170
  Shuffle/s    3.8
  Reduce/s     11.3    5.9     3.2     2.7     2.1
  Speedup      1.00    1.91    3.53    4.18    5.38

Table 3: Runtime of loop detection on DCN and INET data sets on a single machine. The number of subnets is reduced compared to Table 2 so that all intermediate states can fit in memory. Read and shuffle phases are single-threaded due to the framework limitation.

Figure 12: Example progress percentage (in bytes) of Libra on DCN. The three curves represent (from left to right) Mapping, Shuffling, and Reducing phases, which are partially overlapping. The whole process ends in 57 seconds.

Graph size dominates reducing phase: The reducing phase on DCN is significantly slower than on INET, despite INET having 75 times more forwarding rules. With a single thread, Libra can only process 43.2 subnets per second on DCN, compared with 885.0 subnets per second on INET (20.5 times faster). Note that DCN has 35.6 times more nodes. This explains the faster running time on INET, since the time to detect loops grows linearly with the number of edges and nodes in the graph.

Multi-thread: Libra is multi-threaded, but the multi-thread speedup is not linear. For example, on DCN, using 8 threads only resulted in a 4.17× speedup. This effect is likely due to inefficiencies in the threading implementation in the underlying MapReduce framework, although theoretically all reducer threads should run in parallel without state sharing.
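The speedup and throughput figures quoted from Table 3 follow from simple ratios, which the sketch below recomputes. The reduce-phase times and subnet counts are copied from Table 3; the helper names (`speedup`, `efficiency`, `subnets_per_second`) and the per-thread efficiency metric are additions for illustration and do not appear in the original evaluation.

```python
# Reduce-phase runtimes in seconds, indexed by thread count (from Table 3).
REDUCE_SECONDS = {
    "DCN": {1: 46.3, 2: 25.8, 4: 15.6, 6: 12.1, 8: 11.1},
    "INET": {1: 11.3, 2: 5.9, 4: 3.2, 6: 2.7, 8: 2.1},
}
# Number of subnets checked in each single-machine run (from Table 3).
SUBNETS_CHECKED = {"DCN": 2000, "INET": 10000}


def speedup(times, threads):
    """Single-thread reduce time divided by the time with `threads` threads."""
    return times[1] / times[threads]


def efficiency(times, threads):
    """Speedup per thread; 1.0 would be perfectly linear scaling."""
    return speedup(times, threads) / threads


def subnets_per_second(dataset):
    """Single-thread reduce throughput for a data set."""
    return SUBNETS_CHECKED[dataset] / REDUCE_SECONDS[dataset][1]


if __name__ == "__main__":
    for name, times in REDUCE_SECONDS.items():
        for threads in sorted(times):
            print(f"{name} {threads} threads: "
                  f"speedup {speedup(times, threads):.2f}, "
                  f"efficiency {efficiency(times, threads):.0%}")
        print(f"{name}: {subnets_per_second(name):.1f} subnets/s on one thread")
```

On DCN the 8-thread run works out to a per-thread efficiency of roughly one half, which is consistent with the observation that the speedup is not linear; note that the desktop has 6 cores, so 8 threads cannot all run on separate cores.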
                      DCN       DCN-G     INET
Machines              50        20,000    50
Map Input/Byte        844M      52.41G    12.04G
Shuffle Input/Byte    1.61G     16.95T    5.72G
Reduce Input/Byte     15.65G    132T      15.71G
Map Time/s            31        258       76.8
Shuffle Time/s        32        768       76.2
Reduce Time/s         25        672       16
Total Time/s          57        906       93

Table 4: Running time summary of the three datasets. Shuffle input is compressed, while map and reduce inputs are uncompressed. DCN-G results are extrapolated from processing 1% of subnets with 200 machines as a single job.

We use Libra to check for loops against our three datasets on a computing cluster. Table 4 summarizes the results. Libra spends 57 seconds on DCN, 15 minutes on DCN-G, and 93 seconds on INET. To avoid overloading the cluster, the DCN-G result is extrapolated from the runtime of 1% of DCN-G subnets with 200 machines. We assume 100 such jobs running in parallel, where each job processes 1% of subnets against all rules. All the jobs are independent of each other.

We make the following observations from our cluster-based evaluation.

Runtime in different phases: In all datasets, the sum of the runtime in the phases is larger than the start-to-end runtime. This is because the phases can overlap each other. There is no dependency between different mapping shards. A shard that finishes the mapping phase can enter the shuffling phase without waiting for other shards. However, there is a global barrier between the mapping and reducing phases, since MapReduce requires a reducer to receive all intermediate values before starting. Hence, the sum of the runtime of the mapping and reducing phases roughly equals the total runtime. Figure 12 shows the overlapping progress (in bytes) of all three phases in an analysis of DCN.

Shared-cluster overhead: These numbers are a lower bound on the performance Libra can achieve, for two reasons. First, the cluster is shared with other processes and lacks performance isolation. In all experiments, Libra uses 8 threads. However, the CPU utilization is between 100% and 350% on 12-core machines, whereas it can achieve 800% on a dedicated machine. Second, the machines start processing at different times, because each machine may need a different amount of time for initialization. Hence, not all the machines are running at full speed from the start.

Parallelism amortizes I/O overhead: Through detailed counters, we found that unlike in the single-machine case (where I/O is the bottleneck), the mapping and
reducing time dominates the total runtime. We have seen 75% to 80% of the time spent on mapping/reducing. This is because the aggregated I/O bandwidth of all machines in a cluster is much higher than that of a single machine. The I/O is faster than the computation throughput, which means threads will not starve.

Linear scalability

Figure 13 shows how Libra scales with the size of the network. We change the number of devices in the DCN network, effectively changing both the size of the forwarding graph and the number of rules. We do not change the number of subnets. Our experiments run the loop detection on 50 machines, as in the previous section. The figure shows that the Libra runtime scales linearly with the number of rules. The reduce phase grows more erratically than the mapping time, because it is affected by both nodes and edges in the network, while mapping only depends on the number of rules.

Figure 13: Libra runtime increases linearly with network size.

Libra's runtime is not necessarily inversely proportional to the number of machines used. The linear scalability only applies when mapping and reducing time dominate. In fact, we observe that more machines can take longer to finish a job, because the overhead of the MapReduce system can slow down Libra. For example, if we have more mapping/reducing shards, we need to spend additional overhead on disk shuffling. We omit the detailed discussion as it depends on the specifics of the underlying MapReduce framework.

Incremental Updates

Libra can update forwarding graphs incrementally as we add and delete rules in the network, as shown in Section 5.3. To understand its performance, we can break down Libra's incremental computation into two steps: (1) time spent in prefix matching (map phase) to find which subnets are affected, and (2) time to do an incremental DFS starting from the node whose routing entries have changed (reduce phase). We also report the total heap memory allocated.

         Map (µs)    Reduce (ms)    Memory (MB)
DCN      0.133       0.62           12
DCN-G    0.156       1.76           412
INET     0.158       0.01           7

Table 5: Breakdown of runtime for incremental loop checks. The unit for the map phase is microseconds and the unit for the reduce phase is milliseconds.

We measured the time for each of the components as follows: (1) for prefix matching, we randomly select rules and find all matching subnets using the algorithm described in Section 5.1, and (2) for incremental DFS, we started a new DFS from randomly chosen nodes in the graph. Both results are averaged across 1000 tests. The results are shown in Table 5.

First, we verified that no matter how large the subnet trie is, prefix matching takes almost constant time: DCN-G's subnet trie is 100 times larger than DCN's but takes almost the same time. Second, the results also show that the runtime for incremental DFS is likely to be dominated by I/O rather than compute, because the size of the forwarding graph does not exceed the size of the physical network. Even the largest dataset, DCN-G, has only about a million nodes and 10 million edges, which fits into 412 MBytes of memory. This millisecond runtime is comparable to results reported in [] and [], but now on much bigger networks.

Limitations of Libra

Libra is designed for static headers: Libra is faster and more scalable than existing tools because it solves a narrower problem; it assumes packets are only forwarded based on IP prefixes, and that headers are not modified along the way. Unlike, say, HSA, Libra cannot process a graph that forwards on an arbitrary mix of headers, since it is not obvious how to carry matching information from mappers to reducers, or how to partition the problem.

As with other static checkers, Libra cannot handle non-deterministic network behavior or dynamic forwarding state (e.g., NAT). It requires a complete, static snapshot of forwarding state to verify correctness. Moreover, Libra cannot tell why a forwarding state is incorrect or how it will evolve, as it does not interpret control logic.

Libra is designed to slice the network by IP subnet: If headers are transformed in a deterministic way (e.g., static NAT and IP tunnels), Libra can be extended by combining results from multiple forwarding graphs at the end. For example, 192.168.0/24 in the Intra-DC network may be translated to 10.0.0/24 in the Inter-DC network. Libra can construct forwarding graphs for both 192.168.0/24 and 10.0.0/24. When analyzing the two subgraphs, we can add an edge to connect them.

Forwarding graph too big for a single server: Libra scales linearly with both subnets and rules. However, a single reducer still computes the entire forwarding graph, which might still be too large for a single server. Since the reduce speed depends on the size of the graph, we could use distributed graph libraries [] in the reduce phase to accelerate Libra.

Subnets must be contained by a forwarding rule: In order to break the network into one forwarding graph per subnet, Libra examines all the forwarding rules to decide which rules contain the subnet. This is a practical assumption because, in most networks, the rule is a prefix that aggregates many subnets. However, if the rule has a longer, more specific prefix than the subnets (e.g., it is for routing to a specific end-host or router console), the forwarding graph would be complicated, since the rule, represented as an edge in the graph, does not apply to all addresses of the subnet. In this case, one can use VeriFlow []'s notion of equivalence classes to acquire subnets directly from the rules themselves. This technique may serve as an alternative way to find all matching (subnet, rule) pairs. We leave this for future work.

Related Work

Static data plane checkers: Xie et al. introduced algorithms to analyze reachability in IP networks []. Anteater [] makes them practical by converting the checking problem into a Boolean satisfiability problem and solving it with SAT solvers. Header space analysis [] tackles general protocol-independent static checking using a geometric model and functional simulation. Recently, NetPlumber [] and VeriFlow [] show that, for small networks (compared to the ones we consider here), static checking can be done in milliseconds by tracking the dependency between rules. Specifically, VeriFlow slices the network into equivalence classes and builds a forwarding graph for each class, in a similar fashion to Libra.

However, with the exception of NetPlumber, all of these tools and algorithms assume centralized computing. NetPlumber introduces a rule clustering technique for scalability, observing that rule dependencies can be separated into several relatively independent clusters. Each cluster is assigned to a process so that rule updates can be handled individually. However, the benefits of parallelism diminish when the number of workers exceeds the number of natural clusters in the ruleset. In contrast, Libra scales linearly with both rules and subnets. Specifically, even if two rules have a dependency, Libra can still place them into different map shards, and allow reducers to resolve the conflicts.

Other network troubleshooting techniques: Network troubleshooting tools focus on a variety of network components. Specifically, the explicitly layered design of SDN facilitates systematic troubleshooting []. Efforts in formal language foundations [] and model-checking control programs [] reduce the probability of buggy control planes. This effort has been recently extended to the embedded software on switches []. However, based on our experience, multiple simultaneous writers in a dynamic environment make developing a bug-free control plane extremely difficult.

Active testing tools [] reveal the inconsistency between the forwarding table and the actual forwarding state by sending out specially designed probes. They can discover runtime properties such as congestion, packet loss, or faulty hardware, which cannot be detected by static checking tools. Libra is orthogonal to these tools since we focus on forwarding table correctness.

Researchers have proposed systems to extract abnormalities from event histories. STS [] extracts minimal causal sequences from control plane event history to explain a particular crash or other abnormalities. NDB [] compiles packet histories and reasons about data plane correctness. These methods avoid taking a stable snapshot from the network.

Today's networks require way too much human intervention to keep them working. As networks get larger and larger, there is huge interest in automating the control, error-reporting, troubleshooting and debugging. Until now, there has been no way to automatically verify all the forwarding behavior in a network with tens of thousands of switches. Libra is fast because it focuses on checking the IP-only fabric commonly used in datacenters. Libra is scalable because it can be implemented using MapReduce, allowing it to harness large numbers of servers. In our experiments, Libra can meet the benchmark goal we set out to achieve: it can verify the correctness of a 10,000-node network in 1 minute using 50 servers. In the future, we expect tools
like Libra to check the correctness of even larger networks in real-time.

Modern large networks have gone far beyond what human operators can debug with their wisdom and intuition. Our experience shows that it also goes beyond what a single machine can comfortably handle. We hope that Libra is just the beginning of bringing distributed computing into the network verification world.

References

Boost Graph Library.
M. Canini, D. Venzano, P. Peresini, D. Kostic, and J. Rexford. A NICE Way to Test OpenFlow Applications.
K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM TOCS, 1985.
T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online.
J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 2004.
N. Foster, A. Guha, M. Reitblatt, A. Story, M. Freedman, N. Katta, C. Monsanto, J. Reich, J. Rexford, C. Schlesinger, D. Walker, and R. Harrison. Languages for Software-Defined Networks. IEEE Communications Magazine, 2013.
N. Handigol, B. Heller, V. Jeyakumar, D. Mazieres, and N. McKeown. Where is the Debugger for my Software-Defined Network? 2012.
B. Heller, C. Scott, N. McKeown, S. Shenker, A. Wundsam, H. Zeng, S. Whitlock, V. Jeyakumar, N. Handigol, J. McCauley, K. Zarifis, and P. Kazemian. Leveraging SDN layering to systematically troubleshoot networks. 2013.
P. Kazemian, M. Chang, H. Zeng, G. Varghese, N. McKeown, and S. Whyte. Real Time Network Policy Checking using Header Space Analysis. 2013.
P. Kazemian, G. Varghese, and N. McKeown. Header Space Analysis: Static Checking for Networks.
A. Khurshid, X. Zou, W. Zhou, M. Caesar, and P. B. Godfrey. VeriFlow: Verifying Network-Wide Invariants in Real Time. 2013.
M. Kuzniar, P. Peresini, M. Canini, D. Venzano, and D. Kostic. A SOFT Way for OpenFlow Switch Interoperability Testing. 2012.
H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P. B. Godfrey, and S. T. King. Debugging the data plane with Anteater. 2011.
K. Marzullo and G. Neiger. Detection of global state. Springer, 1992.
D. L. Mills. Internet time synchronization: the network time protocol. IEEE Transactions on Communications.
The Parallel Boost Graph Library.
Protocol Buffers.
RouteViews.
C. Scott, A. Wundsam, S. Whitlock, A. Or, E. Huang, K. Zarifis, and S. Shenker. How Did We Get Into This Mess? Isolating Fault-Inducing Inputs to SDN Control Software. Technical Report UCB/EECS-2013-8, 2013.
N. Spring, R. Mahajan, D. Wetherall, and T. Anderson. Measuring ISP topologies with Rocketfuel. IEEE/ACM TON, 2004.
R. Tarjan. Depth-first search and linear graph algorithms. 12th Annual Symposium on Switching and Automata Theory, 1971.
G. Xie, J. Zhan, D. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, and J. Rexford. On static reachability analysis of IP networks. 2005.
H. Zeng, P. Kazemian, G. Varghese, and N. McKeown. Automatic Test Packet Generation. 2012.

Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks
Hongyi Zeng, Stanford University; Shidong Zhang and Fei Ye, Google; Vimalkumar Jeyakumar, Stanford University; Mickey Ju and Junda Liu, Google; Nick McKeown, Stanford University; Amin Vahdat, Google and University of California, San Diego
https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/zeng
This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14). April 2–4, 2014 • Seattle, WA, USA. ISBN 978-1-931971-09-6
Open access to the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14) is sponsored by USENIX.
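The loop checks evaluated in this paper come down to cycle detection on one forwarding graph per subnet, which depth-first search handles in time linear in nodes plus edges (Tarjan, cited above). The following is a minimal Python sketch of such a check, not Libra's implementation. The graph layout (a dict from switch name to next hops for a single subnet), the function name `find_loop`, and the switch names in the usage lines are all assumptions made for illustration.

```python
def find_loop(graph):
    """Return one forwarding loop as a list of nodes, or None if loop-free.

    graph: dict mapping node -> iterable of next-hop nodes for ONE subnet
    (hypothetical layout). Uses an iterative depth-first search with
    white/grey/black colouring, so deep graphs do not hit the recursion limit.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    for root in graph:
        if color.get(root, WHITE) != WHITE:
            continue  # already explored from an earlier root
        color[root] = GREY
        path = [root]                        # nodes on the current DFS path
        stack = [iter(graph.get(root, ()))]  # one next-hop iterator per path node
        while stack:
            advanced = False
            for nxt in stack[-1]:
                state = color.get(nxt, WHITE)
                if state == GREY:
                    # Back edge: nxt is on the current path, so this is a loop.
                    return path[path.index(nxt):] + [nxt]
                if state == WHITE:
                    color[nxt] = GREY
                    path.append(nxt)
                    stack.append(iter(graph.get(nxt, ())))
                    advanced = True
                    break
            if not advanced:
                # All next hops explored: mark the node as finished.
                color[path.pop()] = BLACK
                stack.pop()
    return None


if __name__ == "__main__":
    # Hypothetical three-switch loop and a loop-free two-level tree.
    looping = {"A": ["B"], "B": ["C"], "C": ["A"]}
    tree = {"S1": ["T1"], "S2": ["T1"], "T1": []}
    print(find_loop(looping))
    print(find_loop(tree))
```

In a MapReduce setting, a reducer would call a check like this once per subnet on the graph assembled from that subnet's matching rules. Because each call touches only its own graph, the calls share no state.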