USENIX Association, 11th USENIX Symposium on Networked Systems Design and Implementation

Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks
Hongyi Zeng, Shidong Zhang, Fei Ye, Vimalkumar Jeyakumar
Hongyi Zeng and Vimalkumar Jeyakumar were interns at Google when this work was done. Hongyi Zeng is currently with Facebook.


…by rare boundary conditions are particularly difficult to find. Common routing failures include routing loops and black-holes (where traffic to one part of the network disappears).

Figure 1: Libra divides the network into multiple forwarding graphs in the mapping phase, and checks graph properties in the reducing phase.

Figure 2: Small network example for describing the types of forwarding error found by Libra. The forwarding tables are:
S21: 192.168.0/24 -> S11; 192.168.1/24 -> S12
S22: 192.168.0/24 -> S11; 192.168.1/24 -> S12
S11: 192.168.1/24 -> S21, S22; 192.168.0/24 -> DIRECT
S12: 192.168.0/24 -> S21, S22; 192.168.1/24 -> DIRECT

Libra uses MapReduce for verification. It starts with the full graph of switches, each with its own prefix table. As depicted in Figure 1, Libra completes verification in two phases. In the map phase, it breaks the graph into a number of slices, one for each prefix. The slice consists of only those forwarding rules used to route packets to the destination. In the reduce phase, Libra independently analyzes each slice, represented as a forwarding graph, in parallel for routing failures.

We evaluate Libra on the forwarding tables from three different networks. First, "DCN" is an emulated data center network with 2 million rules and 10,000 switches. Second, "DCN-G" is made from 100 replicas of DCN connected together, i.e., 1 million switches. Third, "INET" is a network with 300 IPv4 routers, each containing the full BGP table with half a million rules. The results are encouraging: Libra takes one minute to check for loops and black-holes in DCN, 15 minutes for DCN-G, and 1.5 minutes for INET.

Forwarding Errors

A small toy network can illustrate three common types of error found in forwarding tables. In the two-level tree network in Figure 2, two top-of-rack (ToR) switches (S11, S12) are connected to two spine switches (S21, S22).

Figure 3: Forwarding graphs for 192.168.0/24 as in Figure 2 and potential abnormalities: (a) normal, (b) loops, (c) black-hole, (d) incorrect snapshot.

The downlinks from S11 and S12 connect to up to 254 servers on the same /24 subnet. The figure shows a "correct" set of forwarding tables. Note that our example network uses multipath routing. Packets arriving at S12 on the right and destined to subnet 192.168.0/24 on the left are load-balanced over switches S21 and S22. Our toy network has 8 rules and 2 subnets.

A forwarding graph is a directed graph that defines the network behavior for each subnet. It contains a list of (local switch, remote switch) pairs. For example, in Figure 3(a), an arrow from S12 to S21 means that packets of subnet 192.168.0/24 can be forwarded from S12 to S21. Multipath routing can be represented by a node that has more than one outgoing edge.

Figure 3 illustrates three types of forwarding error in our simple network, depicted as forwarding graphs.

Loops: Figure 3(b) shows how an error in S11's forwarding tables causes a loop. Instead of forwarding 192.168.0/24 down to the servers, S11 forwards packets up, i.e., to S21 and S22. The network has two loops, S21-S11-S21 and S22-S11-S22, and packets addressed to 192.168.0/24 will never reach their destination.

Black-holes: Figure 3(c) shows what happens if S22 loses one of its forwarding entries. In this case, if S12 spreads packets destined to 192.168.0/24 over both S21 and S22, packets arriving at S22 will be dropped.

Incorrect Snapshot: Figure 3(d) shows a subtle problem that can lead to false positives when verifying forwarding tables. Suppose the link between S11 and S22 goes down. Two events take place (shown as dashed arrows
in the figure): (1) S22 deletes its forwarding entry for the subnet; (2) S12 stops forwarding packets to S22. Because of the asynchronous nature of routing updates, the two events could take place in either order. A snapshot may capture one event but not the other, or might detect them happening in the reverse order. Capturing (1) without (2) shows a temporary black hole as in Figure 3(c), whereas capturing (2) without (1) does not. To avoid raising an unnecessary alarm (by reporting a black hole even though it did not happen), or missing an error altogether (by incorrectly assuming a transient state), Libra must detect the correct state of the network.

Real-world Failure Examples: To understand how often forwarding errors take place, we examined a log of "bug tickets" from 14 months of operation in a large Google data center. Figure 4 categorizes these tickets: 35 tickets for missing forwarding entries, 11 for loops, and 11 for black-holes. On average, four issues are reported per month.

Figure 4: Routing-related tickets by month and type.

Today, forwarding errors are tracked down by hand, which, given the size of the network and the number of entries, often takes many hours. And because the diagnosis is done after the error occurred, the sequence of events causing the error has usually long since disappeared before the diagnosis starts. This makes it hard to reproduce the error.

Case 1: Detecting Loops. One type of loop is caused by prefix aggregation. Prefixes are aggregated to compact the forwarding tables: a cluster can advertise a single prefix to reach all of the servers connected "below" it to the core, which usually includes the addresses of servers that have not yet been deployed. However, packets destined to these non-deployed addresses (e.g., due to machine maintenance) can get stuck in loops. This is because the core believes these packets are destined to the cluster, while the cluster lacks the forwarding rules to digest these packets due to the incomplete deployment; instead, its default rules lead packets back up toward the core. This failure does not cause a service to fail (because the service will use other servers instead), but it does degrade the network, causing unnecessary congestion. In the past, these errors were ignored because of the prohibitive cost of performing a full cluster check. Libra can finish checking in less than a minute, and identify and report the specific switch and prefix entry that are at risk.

Case 2: Discovering Black-holes. In one incident, traffic was interrupted to hundreds of servers. Initial investigation showed that some prefixes had high packet loss rates, but packets seemed to be discarded randomly. It took several days to finally uncover the root cause: a subset of routing information was lost during BGP updates between domains, likely due to a bug in the routing software, leading to black-holes. Libra will detect missing forwarding entries quickly, reducing the outage time. Libra's stable snapshots also allow it to disambiguate temporary states during updates from long-term black-holes.

Case 3: Identifying Inconsistencies. Network control runs across several instances, which may fail from time to time. When a secondary becomes the primary, it results in a flurry of changes to the forwarding tables. The changes may temporarily or permanently conflict with the previous forwarding state, particularly if the changeover itself fails before completing. The network can be left in an inconsistent state, leading to packet loss, black-holes and loops.

Lessons Learned

Simple things go wrong: Routing errors occur even in networks using relatively simple IP forwarding. They also occur due to firmware upgrades, controller failures and software bugs. It is essential to check the forwarding state itself, outside the control software.

Multiple moving parts: The network consists of multiple interacting subsystems. For example, in Case 1 above, intra-DC routing is handled locally, but routing is a global property. This can create loops that are hard to detect locally within a subsystem. There are also multiple network controllers. Inconsistent state makes it hard for the control plane to detect failures on its own.

Scale matters: Large data center networks use multipath routing, which means there are many forwarding paths to check. As the number of switches grows, the number of paths and prefix tables grows, and the complexity of checking all routes grows with it. It is essential for a static checker to scale linearly with the network.
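The error types above reduce to simple checks on per-subnet forwarding graphs. The following is a minimal Python sketch (illustrative only, not Libra's code): it models Figure 2's toy network, with a hypothetical pseudo-node "servers" standing in for DIRECT delivery, and flags loops and black-holes.

```python
# Sketch: a forwarding graph for one subnet as an adjacency list,
# plus loop and black-hole checks (names are illustrative).

def has_loop(graph):
    """Detect a cycle in a directed graph with iterative DFS."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    for start in graph:
        if color[start] != WHITE:
            continue
        stack = [(start, iter(graph.get(start, [])))]
        color[start] = GRAY
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                color[node] = BLACK      # fully explored
                stack.pop()
            elif color.get(child, WHITE) == GRAY:
                return True              # back edge: loop found
            elif color.get(child, WHITE) == WHITE:
                color[child] = GRAY
                stack.append((child, iter(graph.get(child, []))))
    return False

def black_holes(graph, destination):
    """Switches with no outgoing edge that are not the destination."""
    return [n for n, hops in graph.items() if not hops and n != destination]

# Forwarding graph for 192.168.0/24 in Figure 2 ("servers" models DIRECT).
correct = {"S12": ["S21", "S22"], "S21": ["S11"], "S22": ["S11"],
           "S11": ["servers"], "servers": []}
# Figure 3(b): S11 mistakenly forwards back up to the spines.
looped = {"S12": ["S21", "S22"], "S21": ["S11"], "S22": ["S11"],
          "S11": ["S21", "S22"]}
# Figure 3(c): S22 lost its entry for the subnet.
holed = {"S12": ["S21", "S22"], "S21": ["S11"], "S22": [],
         "S11": ["servers"], "servers": []}

print(has_loop(correct), has_loop(looped))   # False True
print(black_holes(holed, "servers"))         # ['S22']
```

In Libra these checks run once per slice, in parallel, using standard graph algorithms (e.g., strongly connected components for loop detection, as described later).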
Stable Snapshots

It is not easy to take an accurate snapshot of the forwarding state of a large, constantly changing network. But if Libra runs its static checks on a snapshot of a state that never actually occurred, it will raise false alarms and miss real errors. We therefore need to capture, and check, a snapshot of the global forwarding state that actually existed at one instant in time. We call these stable snapshots. (Note that a stable snapshot is not the same as a consistent snapshot, which is only one possible state of a distributed system that might not actually have occurred in practice.)

When is the state stable? A large network is usually controlled by multiple routing processes, each responsible for one or more switches. Each process sends timestamped updates, which we call routing events, to add, modify and delete forwarding entries in the switches it is responsible for. Libra monitors the stream of routing events to learn the global network state. (Libra only considers processes that can directly modify tables. While multiple high-level protocols can co-exist, e.g., OSPF and BGP, there is usually one common low-level table manipulation API.)

Finding the stable state of a single switch is easy: each table is only written by one routing process using a single clock, and all events are processed in order. Hence, Libra can reconstruct a stable state simply by replaying events in timestamp order. By contrast, it is not obvious how to take a stable snapshot of the global state when different routing processes update their switches using different, unsynchronized clocks. Because the clocks are different, and events may be delayed in the network, simply replaying the events in timestamp order can result in a state that did not actually occur in practice, leading to false positives or missed errors (Section 2).

However, even if we cannot precisely synchronize clocks, we can bound the difference between any pair of clocks with high confidence using NTP. And we can bound how out-of-date an event packet is, by prioritizing event packets in the network. Thus, every timestamp t can be treated as lying in an interval [t - e, t + e] that bounds the uncertainty of when the event took place. (The positive and negative uncertainties can be different, but here we assume they are the same for simplicity.) The interval represents the notion that network state changes atomically at some unknown time instant within the interval.

Figure 5 shows an example of finding a stable snapshot instant. It is easy to see that if no routing events are recorded during a 2e period, we can be confident that no routing changes actually took place. Therefore, the snapshot of the current state is stable (i.e., accurate). The order of any two past events from different processes is irrelevant to the current state, since they are applied to different tables without interfering with each other (recall that each table is controlled by only one process). So Libra only needs to replay all events in timestamp order (to ensure events for the same table are played in order) to accurately reconstruct the current state. (A formal proof can be found in the cited reference.)

Figure 5: Libra's reconstruction of the timeline of routing events, taking into account bounded timestamp uncertainty. Libra waits for twice the uncertainty to ensure there are no outstanding events, which is sufficient to deduce that routing has stabilized.

This observation suggests a simple way to create a stable snapshot: wait for a quiet 2e period with no routing update events.

Feasibility: The scheme only works if there are frequent windows of size 2e in which no routing events take place. Luckily, we found that these quiet periods happen frequently: we analyzed a day of logs from all routing processes in a large Google data center with a few thousand switches. Figure 6 shows the CDF of the inter-arrival times for the 28,445 routing events reported by the routing processes during the day. The first thing to notice is the burstiness: over 95% of events occur within 400 ms of another event, which means there are long periods when the state is stable.

Figure 6: CDF of inter-arrival times of routing events from a large production data center. Routing events are very bursty: over 95% of events happen within 400 ms of another event.

Table 1 shows the fraction of time the network is stable, for different values of e. As expected, larger e leads to fewer stable states and a smaller percentage of stable time. For example, when e = 100 ms, only 2,137 out of all 28,445 states are stable. However, because the event stream is so bursty, the unstable states are extremely short-lived, occupying in total only 0.1% (about 1.5 min) of the entire day. Put another way, for 99.9% of the time, snapshots are stable and the static analysis result is trustworthy.

e/ms | # of stable states | time in stable state /%
0      | 28,445 | 100.00
1      | 16,957 | 99.97
100    | 2,137  | 99.90
1,000  | 456    | 99.75
10,000 | 298    | 99.60

Table 1: As the uncertainty in routing event timestamps (e) increases, the number of stable states decreases. However, since routing events are bursty, the state is stable most of the time.

Taking stable snapshots: The stable snapshot instant provides a reference point to reconstruct the global state. Libra's stable snapshot process works as follows:
1) Take an initial snapshot S0 as the combination of all switches' forwarding tables. At this stage, each table can be recorded at a slightly different time.
2) Subscribe to timestamped event streams from all routing processes, and apply each event, in the order of their timestamps, to update the state from S0.
3) After applying an event, if no event is received for 2e time, declare the current snapshot stable. In other words, S0 and all past events form a stable state that actually existed at this time instant.

Divide and Conquer

After Libra has taken a stable snapshot of the forwarding state, it sets out to statically check its correctness. Given our goal of checking networks with over 10,000 switches and millions of forwarding rules, we need to break down the task into smaller, parallel computations. There are two natural ways to consider partitioning the problem:

Partition based on switches: Each server could hold the forwarding state for a cluster of switches, partitioning the network into a number of clusters. We found this approach does not scale well, because checking a forwarding rule means checking the rules in many (or all) partitions; the computation is quickly bogged down by communication between servers. Also, it is hard to balance the computation among servers, because switches can have very different numbers of forwarding rules (e.g., spine and leaf switches).

Partition based on subnets: Each server could hold the forwarding state to reach a set of subnets. The server computes the forwarding graph to reach each subnet, then checks the graph for abnormalities. The difficulty with this approach is that each server must hold the entire set of forwarding tables in memory, and any update to the forwarding rules affects all servers.

Figure 7: Steps to check the routing correctness of the network in Figure 2.

Libra partitions the network based on subnets, for reasons that will become clear. We observe that the route checker's task can be divided into two steps. First, Libra associates forwarding rules with subnets, by finding the set of forwarding rules relevant to a subnet (i.e., they are associated if the subnet is included in the rule's prefix). Second, Libra builds a forwarding graph to reach each subnet, by assembling all forwarding rules for the subnet. Both steps are embarrassingly parallel: matching is done per (subnet, forwarding rule) pair, and each subnet's forwarding graph can be analyzed independently. Libra therefore proceeds in three steps using N servers:

Step 1 - Matching: Each server is initialized with the entire list of subnets, and each server is assigned 1/N of all forwarding rules. The server considers each forwarding rule in turn to see if it belongs to the forwarding graph of a subnet (i.e., the forwarding rule is a prefix of the subnet). If there is a match, the server outputs the (subnet, rule) pair. Note that a rule may match more than one subnet.

Step 2 - Slicing: The (subnet, rule) pairs are grouped by subnet. We call each group a slice, because it contains all the rules and switches related to this subnet.

Step 3 - Graph Computing: The slices are distributed across servers. Each server constructs a forwarding graph based on the rules contained in the slice. Standard graph algorithms are used to detect network abnormalities, such as loops and black-holes.

Figure 7 shows the steps to check the network in Figure 2. After the slicing stage, the forwarding rules are organized into two slices, corresponding to the two subnets 192.168.0/24 and 192.168.1/24. The forwarding graph for each slice is calculated and checked in parallel. (We slice by subnet; otherwise, a subnet would be fragmented by a more specific rule, leading to a complex forwarding graph. See the last paragraph in Section 9 for a detailed discussion.) If a routing error changes the second rule in S11, the loop will show up in the forwarding graph for 192.168.0/24: S11 will point back to S21 and S22, which will be caught by the graph loop detection algorithm.

Our three-step process is easily mapped to MapReduce, which we describe in the next section.

Libra consists of two main components: a route dumper and a MapReduce-based route checker. Figure 8 shows Libra's workflow. The route dumper takes stable snapshots from switches or controllers, and stores them in a distributed file system. Next, the snapshot is processed by the MapReduce-based checker.

Figure 8: Libra workflow.

A quick review of MapReduce: MapReduce divides computation into two phases: mapping and reducing. In the mapping phase, the input is partitioned into small "shards". Each of them is processed by a mapper in parallel. The mapper reads in the shard line by line and outputs a list of (key, value) pairs. After the mapping phase, the MapReduce system shuffles the outputs from different mappers by sorting by the key. After shuffling, each reducer receives a (key, values) pair, where values is a list of all values corresponding to the key. The reducer processes this list and outputs the final result. The MapReduce system also handles checkpointing and failure recovery.

In Libra, the set of forwarding rules is partitioned into small shards and delivered to mappers. Each mapper also takes a full set of subnets to check, which by default contains all subnets in the cluster, but alternatively can be a subset selected by the user. Mappers generate intermediate keys and values, which are shuffled by MapReduce. The reducers compile the values that belong to the same subnet and generate final reports.
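As a rough illustration of the three steps, here is a toy single-process map/shuffle/reduce pipeline in Python. It is a sketch only: the rule layout and function names are invented for this example, and Libra's real checker is a distributed MapReduce application, not this code.

```python
import ipaddress
from collections import defaultdict

# Toy rules: (switch, prefix, next_hops), mirroring Figure 2.
# A rule matches a subnet when the subnet falls inside the rule's prefix.
RULES = [
    ("S21", "192.168.0.0/24", ["S11"]), ("S21", "192.168.1.0/24", ["S12"]),
    ("S22", "192.168.0.0/24", ["S11"]), ("S22", "192.168.1.0/24", ["S12"]),
    ("S11", "192.168.1.0/24", ["S21", "S22"]), ("S11", "192.168.0.0/24", ["DIRECT"]),
    ("S12", "192.168.0.0/24", ["S21", "S22"]), ("S12", "192.168.1.0/24", ["DIRECT"]),
]
SUBNETS = ["192.168.0.0/24", "192.168.1.0/24"]

def map_phase(rules, subnets):
    """Step 1 - Matching: emit a (subnet, rule) pair for every match."""
    nets = [ipaddress.ip_network(s) for s in subnets]
    for switch, prefix, hops in rules:
        rule_net = ipaddress.ip_network(prefix)
        for net in nets:
            if net.subnet_of(rule_net):
                yield str(net), (switch, hops)

def shuffle(pairs):
    """Step 2 - Slicing: group matched rules by subnet."""
    slices = defaultdict(list)
    for subnet, rule in pairs:
        slices[subnet].append(rule)
    return slices

def reduce_phase(subnet, rules):
    """Step 3 - Graph computing: build the forwarding graph for a slice."""
    graph = defaultdict(list)
    for switch, hops in rules:
        graph[switch].extend(hops)
    return dict(graph)   # ready for loop / black-hole checks

slices = shuffle(map_phase(RULES, SUBNETS))
for subnet, rules in sorted(slices.items()):
    print(subnet, reduce_phase(subnet, rules))
```

In the real system the matching step also has to handle rules whose prefixes are shorter than, not equal to, the subnet, and rule priorities are resolved in the reducer, as described in the next section.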
1202020218 12040101134 12010204134 12010201134 12010202132 12010202116 Uubpet No Uubpet 1201020211612010202132< 12010201134Figure9:Findallmatchingsubnetsinthetrie.)isthesmallestmatchingtrienodebiggerthanthe).Hence,itschildrenwithsubnetsmatchtherule.Mappersareresponsibleforslicingnetworksbysubnet.Eachmapperreadsoneforwardingruleatatime.Ifasubnetmatchestherule,themapperout-putsthesubnetprexastheintermediatekey,alongwiththevalue      \f.Thefollowingisanexample( , isomitted):\f\f\fSinceeachmapperonlyseesaportionoftheforward-ingrules,theremaybealongerandmorespecic—butunseen—matchingprexforthesubnetinthesamefor-wardingtable.Wendingthelongestmatchingtothereducers,whichseeallmatchingrules.Mappersarerstinitializedwithafulllistofsubnets,whicharestoredinanin-memorybinarytrieforfastpre-xmatching.Afterinitialization,eachmappertakesashardoftheroutingtable,andmatchestherulesagainstthesubnettrie.Thisprocessisdifferentfromthecon-ventionallongestprexmatching:First,inconventionalpacketmatching,rulesareplacedinatrieandpacketsarematchedonebyone.InLibra,webuildthetriewith.Second,thegoalisdifferent.Inconventionalpacketmatching,onelooksforthelongestmatchingrule.Here,mapperssimplyoutputmatchingsubnetsinthetrie.Here,matchinghasthesamemeaning—thesubnet’sprexmustfullyfallwithintherule’sprex.Weuseatrietoefcientlynd“allmatchingprexes,”bysearchingforthesmallestmatchingtrienode)thatisbiggerorequaltotheruleprex(called 6 USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation93 
).Here,“small”and“big”refertothelexico-graphicalorder(notaddressspacesize),whereforeachbitinanIPaddress,wildcardmayormaynotcontainasubnet.Ifexists,weenumerateallitsnon-emptydecedents(includingitself).Otherwise,wede-clarethatthereexistnomatchingsubnetsinthetrie.ure9showsanexample.)isthesmall-estmatchingtrienodebiggerthantherule).Hence,itschildrenwithsubnetsmatchtherule.Proof:Webrieyprovewhythisalgorithmiscorrect.Inanuncompressedtrie,eachbitintheIPaddressisrepresentedbyonelevel,andsothealgorithmiscorrectbydenition:ifthereexistmatchingsubnetsinthetrie,mustexistinthetrieanditsdescendantscontainallmatchingprexes,whichmeansInacompressedtrie,nearbynodesmaybecombined.mayormaynotexistinthetrie.Ifitexists,theprob-lemreducestotheuncompressedtriescenario.Ifnotexistinthetrie,(ifitexists)containsallmatchingsubnetsinitsdescendants.Thisisbecause:Anynodesmallerthandoesnotmatch.Be-causethereisnonodebiggerthanandsmallerthanisnotthesmallestmatchingnode),alsomeans.Asaresult,cannotfallwithinrange.Thisisbecausefortofallwithin,all’snon-wildcardbitsshouldappearin,whichimpliesAnynodebiggerthanthebiggestdescendantsofdoesnotmatch.Otherwise,musthaveacommonancestor,wherebecauseboth,andistheancestorof(anodeisalwayssmallerthanitsdescendants).ThiscontradictstheassumptionthatissmallestmatchingnodeofTimecomplexity:Wecanbreakdownthetimecon-sumedbythemappingphaseintotwoparts.Thetimetoconstructthesubnettrieis,whereisthenumberofsubnets,becauseinsertinganIPprexintoatrietakesconstanttime(lengthofIPaddress).Ifweconsiderasinglethread,ittakestimetomatchrulesagainstthetrie.Sothetotaltimecomplexityis.Ifmapperssharethesametrie,wecanreducethetimeto .Here,weassume.If,onemaywanttoconstructatriewithrulesratherthansubnets(asinconventionallongest-prex-matching).Theoutputsfromthemappingphaseareshufedbyintermediatekeys,whicharethesubnets.Whenshufingnishes,areducerwillreceiveasubnet,alongwithanunorderedsetofvalues,eachcontaining  , \r\f \n,  .Thereducerrstselectsthehigh-estpriorityruleper :Forthesame 
,therulewithhigherisse-lected;iftworuleshavethesamepriority,theonewithlarger ischosen.Thereducerthenconstructsadirectedforwardinggraphusingtheselectedrules.Oncethegraphisconstructed,thereducerusesgraphlibrarytoverifythepropertiesofthegraph,forexample,tocheckifthegraphisloop-free.Timecomplexity:Inmostnetworkswehaveseen,asubnetmatchesatmost2-4rulesintheroutingtable.Hence,selectingthehighestpriorityruleandconstruct-ingthegraphtakestime,whereisthenumberofphysicallinksinthenetwork.However,thetotalrun-timedependsonthevericationtask,aswewilldiscussSection6IncrementalUpdatesUntilnow,wehaveassumedLibracheckstheforward-ingcorrectnessfromscratcheachtimeitruns.Libraalsosupportsincrementalupdatesofsubnetsandforwardingrules,allowingittobeusedasanindependent“correct-nesscheckingservice”similartoNetPlumber[]andVeriow[].Inthisway,Libracouldbeusedtocheckforwardingrulesquickly,beforetheyareaddedtotheforwardingtablesintheswitches.Here,ain-memory,“streaming”MapReduceruntime(suchas[])isneededtospeeduptheeventprocessing.Subnetupdates.Eachtimeweaddasubnetforveri-cation,weneedtorerunthewholeMapReducepipeline.Themapperstakes timetondtherelevantrules.Andasinglereducertakestimetoconstructthedi-rectedgraphsliceforthenewsubnet.Ifonehasseveralsubnetstoadd,itisfastertoruntheminabatch,whichtakes insteadof 
tomap.Removingsubnetsistrivial.Allresultsrelatedtothesubnetsaresimplydiscarded.Forwardingruleupdates.Figure10showsthework-owtoaddnewforwardingrules.Tosupportincremen-talupdatesofrules,reducersneedtostoretheforward-inggraphforeachsliceitisresponsiblefor.Thereducercouldkeepthegraphinmemoryordisk—thetrade-offisalargermemoryfootprint.Ifthegraphsareindisk,axednumberofidlereducerprocessesliveinthemem-oryandfetchgraphsuponrequest.Similarly,themap-persneedtokeepthesubnettrie.Toaddarule,amapperisspawnedjustasitseesan-otherlineofinput(Step1).Matchingsubnetsfromthetrieareshufedtomultiplereducers(Step2).Eachre-ducerreadsthepreviousslicegraph(Step3),andrecal-culatesitwiththenewrule(Step4).Deletingaruleissimilar.Themappertagstheruleas“tobedeleted”andpassittoreducersforupdatingthe Atanytimeinstance,onlyasmallfractionofgraphswillbeup-dated,andsokeepingallstatesin-memorycanbequiteinefcient. 7 9411th USENIX Symposium on Networked Systems Design and Implementation USENIX Association Request(s) Uubpets Figure10:IncrementalruleupdatesinLibra.Mappersdis-patchmatchingsubnet,rulepairtoreducers,indexedbysubnet.Reducersupdatetheforwardinggraphandrecomputegraphproperties.slicegraph.However,inthegraph’sadjacencylist,thereducernotonlyneedstostorethehighestpriorityrule,butalsomatchingrules.Thisisbecauseifahighestpriorityruleisdeleted,thereducermustusethesecondhighestpriorityruletoupdatethegraph.Besidesupdatinggraphs,incertaincases,graphprop-ertiescanalsobecheckedincrementally,sincetheupdateonlyaffectsasmallpartofgraph.Forexample,inloop-detection,addinganedgeonlyrequiresaDepth-First-Search(DFS)startingfromthenewedge’sdestinationnode,whichnormallywillnottraversetheentiregraph.UnlikeNetPlumberandVeriow,Libradoesnotneedtoexplicitlyrememberthedependencybetweenrules.Thisisbecausethedependencyisalreadyencodedinthematchingandshufingphases.RouteDumperTheroutedumperrecordseachruleusingveelds: \r\fistheuniqueIDoftheswitch; 
istheprex.isalistofportnamesbecauseofmultipath.isanintegereldservingasatie-breakerinlongestprexmatching.Bystoringtheegressportsin,Libraencodesthetopologyinformationintheforwardingtable.Althoughtheforwardingtableformatismostlystraightforward,twocasesneedspecialhandling:Ingressportdependentrules.Someforwardingrulesdependonparticularingressports.Forexample,arulemayonlybeineffectforpacketsenteringtheswitchfrom.Inreducerswewanttoconstructasimpledi-rectedgraphthatcanberepresentedbyanadjacencylist.Passingthisingressportdependencytotheroutecheckerwillcomplicatethereducerdesign,sincethenexthopinthegraphdependsnotonlyonthecurrenthop,butalsoprevioushopWeusethenotionoflogicalswitchestosolvethisproblem.First,ifaswitchhasrulesthatdependontheingressport,wesplittheswitchintomultiplelogical VRFaOVERRIDE VRFaDEFAWLT VRFa1 VRFa4 Iptet-VRF VRFaFALLBACK Figure11:VirtualRoutingandForwarding(VRFs)aremul-tipletableswithinthesamephysicalswitch.Thetableshavedependency(inter-VRFrules)betweenthem.switches.Eachlogicalswitchisgivenanewnameandcontainstherulesdependingononeingressport,sothattheportis“owned”bythenewlogicalswitch.Wecopyrulesfromtheoriginalswitchtothelogicalswitch.Sec-ond,weupdatetherulesinupstreamswitchestoforwardtothelogicalswitch.Multipletables.Modernswitchescanhavemulti-pleforwardingtablesthatarechainedtogetherbyar-bitrarymatchingrules,usuallycalled“VirtualRoutingandForwarding”(VRF).Figure11depictsanexam-pleVRFsetup:incomingpacketsarematchedagainst OVERRIDE.Ifnoruleismatched,theyenter 1toVRF 16accordingtosome“triggering”rules.Ifallmatchingfails,thepacketentersVRF 
DEFAULT.Theroutedumpermapsmultipletablesinaphysicalswitchintomultiplelogicalswitches,eachcontainingoneforwardingtable.Eachlogicalswitchconnectstootherlogicalswitchesdirectly.TheruleschainingtheseVRFsareaddedaslowestpriorityrulesinthelogicalswitch’stable.Hence,ifnoruleismatched,thepacketwillcontinuetothenextlogicalswitchinthechain.UsecasesInLibra,thedirectedgraphconstructedbythereducerdataplaneinformationforaparticularsub-net.Inthisgraph,eachvertexcorrespondstoaforward-ingtablethesubnetmatched,andeachedgerepresentsapossiblelinkthepacketcantraverse.Thisgraphalsoencodesmultipathinformation.Therefore,routingcor-rectnessdirectlycorrespondstographproperties.:Areachabilitycheckensuresthesubnetcanbereachedfromanyswitchinthenetwork.Thispropertycanbeveriedbydoinga(reverse)DFSfromthesubnetswitch,andcheckingiftheresultingvertexsetcontainsallswitchesinthenetwork.Thevericationtakestimewhereisthenumberofswitchesthenumberoflinks.Loopdetection:Aloopinthegraphisequivalenttoatleastonestronglyconnectedcomponentinthedirectedgraph.Twoverticesbelongtoastronglycon-nectedcomponent,ifthereisapathfromapathfrom.Wendstronglyconnectedcom-ponentsusingTarjan’sAlgorithm[]whosetimecom- 8 USENIX Association 11th USENIX Symposium on Networked Systems Design and Implementation95 
plexityis:Aswitchisablack-holeforasubnetiftheswitchdoesnothaveamatchingrouteentryforthesub-net.Someblack-holesarelegitimate:iftheswitchisthelasthopforthesubnet,orthereisanexplicitdroprule.Implicitdroprulesneedtobecheckedifthatisbyde-sign.Black-holesmaptoverticeswithzeroout-degree,whichcanthereforebeenumeratedinWaypointrouting:Networkoperatorsmayrequiretraf-cdestinedtocertainsubnetstogothrougha“way-point,”suchasarewalloramiddlebox.Suchbehaviorcanbeveriedintheforwardinggraphbycheckingifthewaypointexistsonalltheforwardingpaths.Specically,onecanremovethewaypointandtheassociatedlinks,andverifythatnoedgeswitchesappearanymoreinaDFSoriginatedfromthesubnet’srsthopswitch,withtheruntimecomplexityofWehaveimplementedLibraforcheckingthecorrect-nessofSoftware-DenedNetwork(SDN)clusters.Eachclusterisdividedintoseveraldomainswhereeachdo-mainiscontrolledbyacontroller.ControllersexchangeroutinginformationandbuildtheroutingtablesforeachOurLibraprototypehastwosoftwarecomponents.Theroutedumper,implementedinPython,connectstoeachcontrolleranddownloadsroutingevents,forward-ingtablesandVRFcongurationsinProtocolBuffersfers17]formatinparallel.Italsoconsultsthetopologydatabasetoidentifythepeerofeachswitchlink.Oncetheroutinginformationisdownloaded,wepreprocessthedataasdescribedinSection5.4andstoreitinadis-tributedlesystem.TheroutecheckerisimplementedinC++asaMapRe-duceapplicationinabout500linesofcode.WeuseaTrielibraryforstoringsubnets,anduseBoostGraphLi-brary[]forallgraphcomputation.Thesamebinarycanrunatdifferentlevelsofparallelism—onasinglema-chinewithmultipleprocesses,oronaclusterwithmul-tiplemachines,simplybychangingcommandlineags.AlthoughLibra’sdesignsupportsincrementalup-dates,ourcurrentprototypeonlydoesbatchprocessing.Weusemicro-benchmarkstoevaluatethespeciccostsforincrementalprocessinginSection8.5,onasimpliedprototypewithonemapperandonereducer.EvaluationToevaluateLibra’sperformance,werstmeasurestart-to-nishruntimeonasinglemachinewithmulti-threading,aswellasonmultiplemachinesinacluster.Weals
o demonstrate Libra's linear scalability as well as its incremental update capability.

Dataset   Switches    Rules        Subnets
DCN       11,260      2,657,422    11,136
DCN-G     1,126,001   265,742,626  1,113,600
INET      316         151,649,486  482,966

Table 2: Datasets used for evaluating Libra.

Data Sets

We use three datasets to evaluate the performance of Libra. The detailed statistics are shown in Table 2.

DCN: DCN is an SDN testbed used to evaluate the scalability of the controller software. Switches are emulated by OpenFlow agents running on commodity machines and connected together through a virtual network fabric. The network is partitioned among controllers, which exchange routing information to compute the forwarding state for the switches in their partition. DCN contains about 10 thousand switches and 2.6 million IPv4 forwarding rules. VRF is used throughout the network.

DCN-G: To stress test Libra, we replicate DCN 100 times by shifting the address space in DCN such that each DCN part has a unique IP address space. A single top-level switch interconnects all the DCN pieces together. DCN-G has 1 million switches and 265 million forwarding rules.

INET: INET is a synthetic wide-area backbone network. First, we use the Sprint network topology discovered by the RocketFuel project, which contains roughly 300 routers. Then, we create an interface for each prefix found in a full BGP table from RouteViews (about half a million entries as of July 2013), and spread them randomly and uniformly to each router as "local prefixes." Finally, we compute forwarding tables using shortest path routing.

Single Machine Performance

We start our evaluation of Libra by running loop detection locally on a desktop with an Intel 6-core CPU and 32 GB of memory. Table 3 summarizes the results.

a) DCN with 2,000 subnets
Threads    1     2     4     6     8
Read/s     13.7
Shuffle/s  7.4
Reduce/s   46.3  25.8  15.6  12.1  11.1
Speedup    1.00  1.79  2.96  3.82  4.17

b) INET with 10,000 subnets
Threads    1     2     4     6     8
Read/s     170
Shuffle/s  3.8
Reduce/s   11.3  5.9   3.2   2.7   2.1
Speedup    1.00  1.91  3.53  4.18  5.38

Table 3: Runtime of loop detection on the DCN and INET datasets on a single machine. The number of subnets is reduced compared with Table 2 so that all intermediate states can fit in memory. The read and shuffle phases are single-threaded due to a framework limitation.

Figure 12: Example progress percentage (in bytes) of Libra on DCN. The three curves represent (from left to right) the mapping, shuffling, and reducing phases, which are partially overlapping. The whole process ends in 57 seconds.

We have learned several aspects of Libra from the single-machine evaluation:

I/O bottlenecks: Standard MapReduce is disk-based: inputs are piped into the system from disks, which can create I/O bottlenecks. For example, on INET, reading from disk takes 15 times longer than the graph computation. On DCN, the I/O time is much shorter due to the smaller number of forwarding rules. In fact, in both cases the I/O is the bottleneck and the CPU is not fully utilized. The runtime remains the same with or without mapping; hence, the mapping phase is omitted in Table 3.

Memory consumption: In standard MapReduce, intermediate results are flushed to disk after the mapping phase before shuffling, which is very slow on a single machine. We avoid this by keeping all intermediate state in memory. However, this limits the number of subnets that can be verified at a time, since the intermediate results are all matching (subnet, rule) pairs. On a single machine, we have to limit the number of subnets to 2,000 in DCN and 10,000 in INET to avoid running out of memory.

Graph size dominates the reducing phase: The reducing phase on DCN is significantly slower than on INET, despite INET having 75 times more forwarding rules. With a single thread, Libra can only process 43.2 subnets per second on DCN, compared with 885.0 subnets per second on INET (20.5 times faster). Note that DCN has 35.6 times more nodes. This explains the faster running time on INET, since the time to detect loops grows linearly with the number of edges and nodes in the graph.

Multi-thread: Libra is multi-threaded, but the multi-thread speedup is not linear. For example, on DCN, using 8 threads only resulted in a 4.17x speedup. This effect is likely due to inefficiencies in the threading implementation of the underlying MapReduce framework, although theoretically, all reducer threads should run in parallel without state sharing.
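The two phases measured above can be illustrated with a small, self-contained sketch: the map phase selects, for each subnet, every switch's longest matching prefix and emits the corresponding graph edges; the reduce phase runs a DFS over one subnet's forwarding graph and reports a loop when it finds a back edge. This is a minimal single-process illustration under our own assumptions, not Libra's implementation: the helper names (`map_rules`, `has_loop`), the linear scan standing in for Libra's trie-based matching, and the toy switch names and prefixes are all ours.

```python
import ipaddress
from collections import defaultdict

def map_rules(subnets, tables):
    """Map phase (sketch): for each subnet, pick each switch's
    longest prefix containing the subnet and emit one edge per
    next hop (ECMP rules may emit several)."""
    slices = defaultdict(list)          # subnet -> [(switch, next_hop)]
    for subnet in subnets:
        s = ipaddress.ip_network(subnet)
        for switch, rules in tables.items():
            best = None                 # (prefix, next_hops), longest match wins
            for prefix, next_hops in rules:
                p = ipaddress.ip_network(prefix)
                if s.subnet_of(p) and (best is None or p.prefixlen > best[0].prefixlen):
                    best = (p, next_hops)
            if best:
                for nh in best[1]:
                    slices[subnet].append((switch, nh))
    return slices

def has_loop(edges):
    """Reduce phase (sketch): three-color DFS on one subnet's
    forwarding graph; a GRAY neighbor is a back edge, i.e. a loop."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def dfs(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in list(graph))

# Hypothetical two-switch example: 10.0.1.0/24 bounces between s1 and
# s2 (a loop), while 10.0.2.0/24 falls through to the /8 rule and exits.
tables = {"s1": [("10.0.0.0/8", ["s2"])],
          "s2": [("10.0.1.0/24", ["s1"]), ("10.0.0.0/8", ["s3"])]}
slices = map_rules(["10.0.1.0/24", "10.0.2.0/24"], tables)
```

In the real system, the (subnet, rule) pairs produced in the map phase are shuffled to reducers keyed by subnet, so each forwarding graph is checked independently and in parallel; `subnet_of` requires Python 3.7+.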
                     DCN     DCN-G    INET
Machines             50      20,000   50
Map input/Byte       844M    52.41G   12.04G
Shuffle input/Byte   1.61G   16.95T   5.72G
Reduce input/Byte    15.65G  132T     15.71G
Map time/s           31      258      76.8
Shuffle time/s       32      768      76.2
Reduce time/s        25      672      16
Total time/s         57      906      93

Table 4: Running time summary of the three datasets. Shuffle input is compressed, while map and reduce inputs are uncompressed. DCN-G results are extrapolated from processing 1% of subnets with 200 machines as a single job.

We use Libra to check for loops against our three datasets on a computing cluster. Table 4 summarizes the results. Libra spends 57 seconds on DCN, 15 minutes on DCN-G, and 93 seconds on INET. To avoid overloading the cluster, the DCN-G result is extrapolated from the runtime of 1% of DCN-G subnets on 200 machines. We assume 100 such jobs running in parallel, each processing 1% of the subnets against all rules. All the jobs are independent of each other.

We make the following observations from our cluster-based evaluation.

Runtime in different phases: In all datasets, the sum of the runtimes of the three phases is larger than the start-to-end runtime. This is because the phases can overlap each other. There is no dependency between different mapping shards: a shard that finishes the mapping phase can enter the shuffling phase without waiting for other shards. However, there is a global barrier between the mapping and reducing phases, since MapReduce requires a reducer to receive all intermediate values before starting. Hence, the sum of the runtimes of the mapping and reducing phases roughly equals the total runtime. Figure 12 shows the overlapping progress (in bytes) of all three phases in an analysis of DCN.

Shared-cluster overhead: These numbers are a lower bound on what Libra can achieve, for two reasons. First, the cluster is shared with other processes and lacks performance isolation. In all experiments, Libra uses 8 threads; however, the CPU utilization is between 100% and 350% on 12-core machines, whereas it can achieve 800% on a dedicated machine. Second, the machines start processing at different times, as each machine may need a different amount of time for initialization. Hence, not all machines are running at full speed from the start.

Parallelism amortizes I/O overhead: Through detailed counters, we found that, unlike in the single-machine case (where I/O is the bottleneck), the mapping and reducing time dominates the total runtime. We have seen 75%-80% of the time spent on mapping/reducing. This is because the aggregate I/O bandwidth of all machines in a cluster is much higher than that of a single machine. The I/O is faster than the computation throughput, which means threads will not starve.

Figure 13: Libra runtime increases linearly with network size.

Linear Scalability

Figure 13 shows how Libra scales with the size of the network. We change the number of devices in the DCN network, effectively changing both the size of the forwarding graph and the number of rules. We do not change the number of subnets. Our experiments run the loop detection on 50 machines, as in the previous section. The figure shows that the Libra runtime scales linearly with the number of rules. The reduce phase grows more erratically than the mapping time, because it is affected by both the nodes and the edges in the network, while mapping only depends on the number of rules.

Libra's runtime is not necessarily inversely proportional to the number of machines used. The linear scalability only applies when the mapping and reducing times dominate. In fact, we observe that more machines can take longer to finish a job, because the overhead of the MapReduce system can slow down Libra. For example, if we have more mapping/reducing shards, we need to spend additional overhead on disk shuffling. We omit the detailed discussion as it depends on the specifics of the underlying MapReduce framework.

Incremental Updates

Libra can update forwarding graphs incrementally as we add and delete rules in the network, as shown in Section 5.3. To understand its performance, we can break down Libra's incremental computation into two steps: (1) the time spent in prefix matching (map phase) to find which subnets are affected, and (2) the time to do an incremental DFS starting from the node whose routing entries have changed (reduce phase). We also report the total heap memory allocated.

        Map/μs  Reduce/ms  Memory/MB
DCN     0.133   0.62       12
DCN-G   0.156   1.76       412
INET    0.158   0.01       7
Table 5: Breakdown of runtime for incremental loop checks. The unit for the map phase is microseconds and the unit for the reduce phase is milliseconds.

We measured the time for each of the components as follows: (1) for prefix matching, we randomly select rules and find all matching subnets using the algorithm described in Section 5.1, and (2) for incremental DFS, we start a new DFS from randomly chosen nodes in the graph. Both results are averaged across 1,000 tests and shown in Table 5.

First, we verified that no matter how large the subnet trie is, prefix matching takes almost constant time: DCN-G's subnet trie is 100 times larger than DCN's, but matching takes almost the same time. Second, the results also show that the runtime for incremental DFS is likely to be dominated by I/O rather than compute, because the size of the forwarding graph does not exceed the size of the physical network. Even the largest dataset, DCN-G, has only about a million nodes and 10 million edges, which fits into 412 MBytes of memory. This millisecond runtime is comparable to the results reported for NetPlumber and VeriFlow, but now on much bigger networks.

Limitations of Libra

Libra is designed for static headers: Libra is faster and more scalable than existing tools because it solves a narrower problem; it assumes packets are only forwarded based on IP prefixes, and that headers are not modified along the way. Unlike, say, HSA, Libra cannot process a graph that forwards on an arbitrary mix of headers, since it is not obvious how to carry matching information from mappers to reducers, or how to partition the problem.

As with other static checkers, Libra cannot handle non-deterministic network behavior or dynamic forwarding state (e.g., NAT). It requires a complete, static snapshot of the forwarding state to verify correctness. Moreover, Libra cannot tell why a forwarding state is incorrect or how it will evolve, as it does not interpret control logic.

Libra is designed to slice the network by IP subnet: If headers are transformed in a deterministic way (e.g., static NAT and IP tunnels), Libra can be extended by combining results from multiple forwarding graphs at the end. For example, 192.168.0/24 in the intra-DC network may be translated to 10.0.0/24 in the inter-DC network. Libra can construct forwarding graphs for both 192.168.0/24 and 10.0.0/24. When analyzing the two subgraphs, we can add an edge to connect them.

Forwarding graph too big for a single server: Libra scales linearly with both subnets and rules. However, a single reducer still computes the entire forwarding graph, which might be too large for a single server. Since the reduce speed depends on the size of the graph, we could use distributed graph libraries (e.g., the Parallel Boost Graph Library) in the reduce phase to accelerate Libra.

Subnets must be contained by a forwarding rule: In order to break the network into one forwarding graph per subnet, Libra examines all the forwarding rules to decide which rules match the subnet. This is a practical assumption because, in most networks, a rule is a prefix that aggregates many subnets. However, if a rule has a longer, more specific prefix than the subnet's (e.g., it is for routing to a specific end-host or a router console), the forwarding graph becomes complicated, since the rule, represented as an edge in the graph, does not apply to all addresses of the subnet. In this case, one can use VeriFlow's notion of equivalence classes to acquire subnets directly from the rules themselves. This technique may serve as an alternative way to find all matching (subnet, rule) pairs. We leave this for future work.

Related Work

Static data plane checkers: Xie et al. introduced algorithms to analyze reachability in IP networks. Anteater makes them practical by converting the checking problem into a Boolean satisfiability problem and solving it with SAT solvers. Header space analysis tackles general protocol-independent static checking using a geometric model and functional simulation. Recently, NetPlumber and VeriFlow showed that, for small networks (compared to the ones we consider here), static checking can be done in milliseconds by tracking the dependencies between rules. Specifically, VeriFlow slices the network into equivalence classes and builds a forwarding graph for each class, in a similar fashion to Libra.

However, with the exception of NetPlumber, all of these tools and algorithms assume centralized computing. NetPlumber introduces a "rule clustering" technique for scalability, observing that rule dependencies can be separated into several relatively independent clusters. Each cluster is assigned to a process so that rule updates can be handled individually. However, the benefits of parallelism diminish when the number of workers exceeds the number of natural clusters in the rule set. In contrast, Libra scales linearly with both rules and subnets. Specifically, even if two rules have a dependency, Libra can still place them into different map shards and allow the reducers to resolve the conflicts.

Other network troubleshooting techniques: Network troubleshooting tools focus on a variety of network components. Specifically, the explicitly layered design of SDN facilitates systematic troubleshooting. Efforts in formal language foundations and in model-checking control programs reduce the probability of buggy control planes; this effort has recently been extended to the embedded software on switches. However, based on our experience, multiple simultaneous writers in a dynamic environment make developing a bug-free control plane extremely difficult.

Active testing tools reveal inconsistencies between the forwarding table and the actual forwarding state by sending out specially designed probes. They can discover runtime properties such as congestion, packet loss, or faulty hardware, which cannot be detected by static checking tools. Libra is orthogonal to these tools, since we focus on forwarding table correctness.

Researchers have also proposed systems to extract abnormalities from event histories. STS extracts "minimal causal sequences" from the control plane event history to explain a particular crash or other abnormality. NDB compiles packet histories and reasons about data plane correctness. These methods avoid taking a stable snapshot of the network.

Conclusion

Today's networks require way too much human intervention to keep them working. As networks get larger and larger, there is huge interest in automating the control, error reporting, troubleshooting, and debugging. Until now, there has been no way to automatically verify all the forwarding behavior in a network with tens of thousands of switches. Libra is fast because it focuses on checking the IP-only fabric commonly used in data centers. Libra is scalable because it can be implemented using MapReduce, allowing it to harness large numbers of servers. In our experiments, Libra meets the benchmark goal we set out to achieve: it can verify the correctness of a 10,000-node network in 1 minute using 50 servers. In the future, we expect tools like Libra to check the correctness of even larger networks in real time.

Modern large networks have gone far beyond what human operators can debug with their wisdom and intuition. Our experience shows that they have also gone beyond what a single machine can comfortably handle. We hope that Libra is just the beginning of bringing distributed computing into the network verification world.

References

Boost Graph Library.
M. Canini, D. Venzano, P. Peresini, D. Kostic, and J. Rexford. A NICE Way to Test OpenFlow Applications.
K. M. Chandy and L. Lamport. Distributed snapshots: determining global states of distributed systems. ACM ToCS, 1985.
T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, and R. Sears. MapReduce Online.
J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. 2004.
N. Foster, A. Guha, M. Reitblatt, A. Story, M. Freedman, N. Katta, C. Monsanto, J. Reich, J. Rexford, C. Schlesinger, D. Walker, and R. Harrison. Languages for Software-Defined Networks. IEEE Communications Magazine, 2013.
N. Handigol, B. Heller, V. Jeyakumar, D. Mazieres, and N. McKeown. Where is the Debugger for my Software-Defined Network? 2012.
B. Heller, C. Scott, N. McKeown, S. Shenker, A. Wundsam, H. Zeng, S. Whitlock, V. Jeyakumar, N. Handigol, J. McCauley, K. Zarifis, and P. Kazemian. Leveraging SDN layering to systematically troubleshoot networks. 2013.
P. Kazemian, M. Chang, H. Zeng, G. Varghese, N. McKeown, and S. Whyte. Real Time Network Policy Checking using Header Space Analysis. 2013.
P. Kazemian, G. Varghese, and N. McKeown. Header Space Analysis: Static Checking for Networks.
A. Khurshid, X. Zou, W. Zhou, M. Caesar, and P. B. Godfrey. VeriFlow: Verifying Network-Wide Invariants in Real Time. 2013.
M. Kuzniar, P. Peresini, M. Canini, D. Venzano, and D. Kostic. A SOFT Way for OpenFlow Switch Interoperability Testing. 2012.
H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P. B. Godfrey, and S. T. King. Debugging the data plane with Anteater. 2011.
K. Marzullo and G. Neiger. Detection of global state predicates. Springer, 1992.
D. L. Mills. Internet time synchronization: the network time protocol. IEEE Transactions on Communications.
The Parallel Boost Graph Library.
Protocol Buffers.
RouteViews.
C. Scott, A. Wundsam, S. Whitlock, A. Or, E. Huang, K. Zarifis, and S. Shenker. How Did We Get Into This Mess? Isolating Fault-Inducing Inputs to SDN Control Software. Technical Report UCB/EECS-2013-8, 2013.
N. Spring, R. Mahajan, D. Wetherall, and T. Anderson. Measuring ISP topologies with Rocketfuel. IEEE/ACM ToN, 2004.
R. Tarjan. Depth-first search and linear graph algorithms. 12th Annual Symposium on Switching and Automata Theory, 1971.
G. Xie, J. Zhan, D. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, and J. Rexford. On static reachability analysis of IP networks. 2005.
H. Zeng, P. Kazemian, G. Varghese, and N. McKeown. Automatic Test Packet Generation. 2012.

Libra: Divide and Conquer to Verify Forwarding Tables in Huge Networks
Hongyi Zeng, Stanford University; Shidong Zhang and Fei Ye, Google; Vimalkumar Jeyakumar, Stanford University; Mickey Ju and Junda Liu, Google; Nick McKeown, Stanford University; Amin Vahdat, Google and University of California, San Diego
https://www.usenix.org/conference/nsdi14/technical-sessions/presentation/zeng

This paper is included in the Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI '14), April 2-4, 2014, Seattle, WA, USA. ISBN 978-1-931971-09-6. Open access to the Proceedings of NSDI '14 is sponsored by USENIX.