    last_finish[f] = p.start + p.length / f.weight
    p.rank = p.start

[...], but dequeues elements from the head. Elements with a lower rank are dequeued first; if two elements have the same value, the element enqueued earlier is dequeued first.

2. The computation of an element's rank before it is enqueued into a PIFO. We model this computation as a packet transaction [37], an atomically executed block of code that is executed once for each element before enqueuing it in a PIFO.

[...] scheduling transaction before enqueuing an element into that node's PIFO; this element is either a packet or a reference to a child PIFO of the node. Second, a scheduling transaction that specifies how the rank is computed for elements (packet or PIFO references) that are enqueued into the node's PIFO. Figure 2b shows an example for HPFQ.

When a packet is enqueued into a scheduling tree, it executes one transaction at each node whose packet predicate matches the arriving packet. These nodes form a path from [...] shaping PIFO [...] several more examples (§3.1 through §3.4) and also describe the limitations of our programming model (§3.5).

3.1 Least Slack-Time First

Least Slack-Time First (LSTF) [29, 31] schedules packets at each switch in increasing order of packet slacks, i.e., the time remaining until each packet's deadline. Packet slacks are initialized at an end host or edge switch and are decremented by the wait time at each switch's queue. We can program LSTF using a simple scheduling transaction:

    p.rank = p.slack + p.arrival_time

The addition of the packet's arrival time to the slack already carried in the packet ensures that packets are dequeued in order of their slack at the time of dequeue, not enqueue. Then, after packets are dequeued, we subtract the time at which the packet is dequeued from the packet's slack, which has the effect of decrementing the slack by the wait time at the switch's queue. This subtraction can be achieved by programming the egress pipeline of a programmable switch [17] to decrement one header field by another.

3.2 Stop-and-Go Queueing

    if (now >= frame_end_time):
        frame_begin_time = frame_end_time
        frame_end_time = frame_begin_time + T
    p.rank = frame_end_time

Figure 6: Shaping transaction for Stop-and-Go Queueing.

Stop-and-Go Queueing [24] is a non-work-conserving algorithm that provides bounded delays to packets using a framing strategy. Time is divided into non-overlapping frames of equal length T, where every packet arriving within a frame is transmitted at the end of the frame, smoothing out any burstiness in traffic patterns induced by previous hops. The shaping transaction in Figure
6 specifies the scheme. frame_begin_time and frame_end_time are two state variables that track the beginning and end of the current frame [...]

When a packet is enqueued, we execute a FIFO scheduling transaction at its leaf node, setting its rank to the wall-clock time on arrival. At the root, a PIFO reference (the packet's flow identifier) is pushed into the root PIFO using a rank that reflects whether the flow is above or below its rate limit after the arrival of the current packet. To determine this, we run the scheduling transaction in Figure 7 that uses a token bucket (the state variable tb) that can be filled up until BURST_SIZE to decide if the arriving packet puts the flow above or below min_rate.

Note that a single PIFO node with the scheduling transaction in Figure 7 is not sufficient. It causes packet reordering within a flow: an arriving packet can cause a flow to move from a lower to a higher priority and, in the process, leave before low priority packets from the same flow that arrived earlier. The two-level tree solves this problem by attaching priorities to transmission opportunities for a specific flow, not specific packets. Now if an arriving packet causes a flow to move from low to high priority, the next packet scheduled from this flow is the earliest packet of that flow chosen in FIFO order, not the arriving packet.

3.4 Other examples

We now briefly describe several more scheduling algorithms that can be programmed using PIFOs.

1. Fine-grained priority scheduling. Many algorithms schedule the packet with the lowest value of a field initialized by the end host. These algorithms can be programmed by setting the packet's rank to the appropriate field. Examples of such algorithms and the fields they use are strict priority scheduling (IP TOS field), Shortest
Flow First (flow size), Shortest Remaining Processing Time (remaining flow size), Least Attained Service (bytes received for a flow), and Earliest Dead- [...]

[...] scheduler based on PIFOs. We target shared-memory switches such as Broadcom's Trident II [3] (Figure 8). In these switches, a parser feeds packets from all ports into [...]

[...] scheduling tree and most practical hierarchical scheduling algorithms we know of do not require more than a few levels [...]

[...]erations on each PIFO. Figures 11a and 12a show this tree for Figures 2 and 4 respectively. It then overlays this tree over a PIFO mesh by assigning every level of the tree to a PIFO block and configuring the lookup tables to connect PIFO blocks as required by the tree. Figure 11b shows the PIFO mesh for Figure 2, while Figure 12b shows the PIFO mesh for Figure 4. If a particular level of the tree has more than one enqueue or dequeue from another level, which arises in the presence of shaping transactions (§), we allocate new PIFO blocks [...]. This creates a conflict because there are two enqueue operations in the same cycle.

Conflicts may also occur on dequeues. For instance, if TBF_Right shared its PIFO block with another logical PIFO, dequeue operations to the two logical PIFOs could occur at the same time because TBF_Right can be dequeued at any arbitrary wall-clock time.

In a conflict, only one of the two operations can proceed. We resolve this conflict in favor of scheduling PIFOs. Shaping PIFOs are used for rate limiting to a rate lower than the line rate. Therefore, they can afford to be delayed by a few clocks until there are no conflicts. By contrast, delaying scheduling decisions of a scheduling PIFO would mean that the switch would idle and not satisfy its line-rate guarantee. As a result, shaping PIFOs only get best-effort
service. There are workarounds to this. One is overclocking the pipeline at (say) 1.25 GHz instead of 1 GHz, providing spare clock cycles for such best-effort processing. Another is to provide multiple ports to a PIFO block to support multiple operations every clock. These techniques are commonly used in switches for background tasks such as reclaiming buffer space, and can be applied to the PIFO mesh as well.

5. HARDWARE IMPLEMENTATION

[...] is a single cell. Hence, up to 60K packets/elements per PIFO block can be spread out over multiple logical PIFOs. Based on these requirements, our baseline design targets a PIFO block that supports 64K packets and 1024 flows that can be shared across 256 logical PIFOs. Further, we target a [...]

Footnote 6: Packets in a shared-memory switch are allocated in small units called cells.

Figure 13: Block diagram of PIFO block with a flow scheduler and a rank store.
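
As an illustration of the two components named in the Figure 13 caption, the sketch below models a PIFO block in Python: a flow scheduler that holds only the head element of each flow, sorted by rank, and a rank store that keeps each flow's remaining elements in FIFO order. It is a software approximation and not the hardware design. The class name FlowSchedulerPIFO, its method names, and the use of a sorted list in place of parallel comparators are assumptions made for this example, and the model relies on ranks never decreasing within a flow.

```python
import bisect
from collections import deque
from itertools import count


class FlowSchedulerPIFO:
    """Hypothetical software model of a PIFO block (not the hardware design).

    Assumes ranks never decrease across a flow's consecutive elements,
    so every flow can be served in FIFO order.
    """

    def __init__(self):
        self._heads = []         # flow scheduler: sorted (rank, seq, flow), one per flow
        self._head_item = {}     # flow -> payload of that flow's head element
        self._rank_store = {}    # rank store: flow -> deque of (rank, seq, item)
        self._seq = count()      # enqueue order, used to break rank ties

    def enqueue(self, flow, rank, item):
        seq = next(self._seq)
        if flow in self._head_item:
            # Flow already has a head element: park this one in the rank store.
            self._rank_store.setdefault(flow, deque()).append((rank, seq, item))
        else:
            # Flow is idle: the new element becomes its head in the flow scheduler.
            bisect.insort(self._heads, (rank, seq, flow))
            self._head_item[flow] = item

    def dequeue(self):
        if not self._heads:
            return None
        rank, _, flow = self._heads.pop(0)   # lowest rank, earliest enqueue on ties
        item = self._head_item.pop(flow)
        backlog = self._rank_store.get(flow)
        if backlog:
            # Promote the flow's next element from the rank store to the flow scheduler.
            next_rank, next_seq, next_item = backlog.popleft()
            bisect.insort(self._heads, (next_rank, next_seq, flow))
            self._head_item[flow] = next_item
        return flow, rank, item


pifo = FlowSchedulerPIFO()
pifo.enqueue("A", 5, "a1")
pifo.enqueue("A", 9, "a2")   # waits in the rank store behind a1
pifo.enqueue("B", 7, "b1")
pifo.enqueue("C", 5, "c1")   # ties with a1 on rank, but was enqueued later
order = [pifo.dequeue() for _ in range(4)]
```

In this model the sorted structure never holds more than one entry per flow, so its size is bounded by the number of flows and not by the number of buffered packets.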
Logical PIFOs and metadata are not shown for simplicity.

[...] 16-bit rank field and a 32-bit metadata field (e.g., p.length in Figure 1) for our PIFO block. We put 5 such blocks together into a 5-block PIFO mesh that can support up to 5 levels of hierarchy in a scheduling algorithm, which is sufficient for most practical hierarchical schedulers we know of.

5.2 A single PIFO block

[...]ison needs one comparator circuit, and supporting 64K of these is infeasible.

At the same time, nearly all practical scheduling algorithms group packets into flows or classes, e.g., based on traffic type, ports, or addresses. They then schedule a flow's packets in FIFO order because packet ranks increase across a flow's consecutive packets. This motivates a design with two parts (Figure 13):

1. A flow scheduler that picks the element to dequeue based on the rank of the head (earliest) elements of each flow. The flow scheduler is effectively a PIFO consisting of the head elements of all flows.
2. A rank store, a FIFO bank that stores the ranks of elements beyond the head for each flow in FIFO order.

[...]mate of 200 mm² provided by Gibb et al. [23]. In return for this 3.7%, we get a significantly more flexible packet scheduler than current switches, which provide fixed two or three-level hierarchical scheduling. Our 3.7% area overhead is similar to the overhead for other programmable switch functions, e.g., 2% for programmable parsing [23] and 15% for programmable header processing [ [...]

[Chip-area table, partially recovered]
[...]: 0.1476 (from synthesis)
One PIFO block: 0.224 + 0.445 + 0.148 + 0.148 + 0.1476 = 1.11 mm²
5-block PIFO mesh: 5.55
300 atoms spread out over the 5-block PIFO mesh for rank computations: 6000 µm [...]

Priority Flow Control. Priority Flow Control (PFC) [7] is a standard that allows a switch to send a pause message to an upstream switch requesting it to cease transmission of packets belonging to particular flows. PFC can be integrated into our hardware design by masking out certain flows in the flow scheduler during the dequeue operation if they have been paused because of a PFC pause message, and unmasking them when a PFC resume message is received.

Multi-pipeline switches. The highest end switches today, such as the Broadcom Tomahawk [], support aggregate capacities exceeding 3 Tbit/sec. At a minimum packet size of 64 bytes, this corresponds to an aggregate packet rate of ~6 billion packets/s. Because a single switch pipeline (Figure 8) typically runs at 1 GHz and processes a billion packets/s, such switches require multiple ingress and egress pipelines that share access to the scheduler subsystem alone. In multi-pipeline switches, each PIFO block needs to support multiple enqueue and dequeue operations per clock cy- [...]

[...]stractions [ [...] a separate P-heap instance for each port []. This per-port design incurs prohibitive area overhead on a shared-memory switch, and prevents sharing of the data buffer and binary heap across output ports. Conversely, it isn't easy to overlay multiple logical PIFOs over a single P-heap, which would [...]

[...] (WFQ) or minimum flow completion time (SRPT). Past work has demonstrated significant performance benefits resulting from switch support for flexible allocation [12, 21, 32, 39]. However, these benefits have remained unrealized, [...] and make it more predictable. That said, our current design is only a first step and can be improved in several ways.

[1] Barefoot: The World's Fastest and Most Programmable Networks. https://barefootnetworks.com/media/white_papers/Barefoot-Worlds-Fastest-Most-Programmable-Networks. [...]
[...] com/products/physical-ip/embedded-memory-ip/sram.php.
[9] SystemVerilog. https://en.wikipedia.org/wiki/SystemVerilog.
[10] Token Bucket. https://en.wikipedia.org/wiki/Token_bucket [...]
[...], 17(6):1030-1039, Jun 1999.
[20] A. Demers, S. Keshav, and S. Shenker. Analysis and Simulation of a Fair Queueing Algorithm. In SIGCOMM, 1989.
[21] F. R. Dogar, T. Karagiannis, H. Ballani, and A. Rowstron. Decentralized Task-aware Scheduling for Data Center Networks. In SIGCOMM, 2014.
[22] S. Floyd and V. Jacobson. Random Early Detection Gateways for Congestion Avoidance. IEEE/ACM Transactions on Networking, 1(4):397-413, Aug. 1993.
[23] G. Gibb, G. Varghese, M. Horowitz, and N. McKeown. Design Principles for Packet Parsers. In, 2013.
[24] S. J. Golestani. A Stop-and-Go Queueing Framework for Congestion Management. In SIGCOMM, 1990.
[25] P. Goyal, H. M. Vin, and H. Chen. Start-time
Fair Queueing: A Scheduling Algorithm for Integrated Services Packet Switching Networks. In SIGCOMM, 1996.
[26] V. Jeyakumar, M. Alizadeh, D. Mazières, B. Prabhakar, A. Greenberg, and C. Kim. EyeQ: Practical Network Performance Isolation at the Edge. In NSDI, 2013.
[27] C. R. Kalmanek, H. Kanakia, and S. Keshav. Rate Controlled Servers for Very High-Speed Networks. In GLOBECOM, 1990.
[28] S. Keshav. Packet-Pair Flow Control. IEEE/ACM Transactions on Networking [...]
[...] In HotNets, 2015.
[39] A. Sivaraman, K. Winstein, S. Subramanian, and H. Balakrishnan. No Silver Bullet: Extending SDN to the Data Plane. In HotNets, 2013.
[40] H. Song. Protocol-Oblivious Forwarding: Unleash the Power of SDN Through a Future-proof Forwarding Plane. In HotSDN, 2013.
[41] D. Verma, H. Zhang, and D. Ferrari. Guaranteeing Delay Jitter Bounds in Packet Switching Networks. In TRICOMM, 1991.