Uploaded On 2021-08-16

SP-PIFO: Approximating Push-In First-Out Behaviors using Strict-Priority Queues
Albert Gran Alcoz (ETH Zürich), Alexander Dietmüller (ETH Zürich), Laurent Vanbever (ETH Zürich)

Abstract
Push-In First-Out (PIFO) queues are hardware primitives which enable programmable packet scheduling by providing the abstraction of a priority queue at line rate. However, implementing them at scale is not easy: only hardware designs (not implementations) exist, and they support only about 1k flows. In this paper, we introduce SP-PIFO, a programmable packet scheduler which closely approximates the behavior of PIFO queues using strict-priority queues: at line rate, at scale, and on existing devices. The key insight behind SP-PIFO is to dynamically adapt the mapping between packet ranks and available strict-priority queues to minimize the scheduling errors with respect to an ideal PIFO. We present a mathematical formulation of the problem and derive an adaptation technique which closely approximates the optimal queue mapping without any traffic knowledge. We fully implement SP-PIFO in P4 and evaluate it on real workloads. We show that SP-PIFO: (i) closely matches PIFO, with as few as 8 priority queues; (ii) scales to large numbers of flows and ranks; and (iii) quickly adapts to traffic variations. We also show that SP-PIFO runs at line rate on existing hardware (Barefoot Tofino), with a negligible memory footprint.

1 Introduction
Until recently, packet scheduling was one of the last bastions standing in the way of complete data-plane programmability. Indeed, unlike forwarding, whose behavior can be adapted thanks to languages such as P4 [7] and reprogrammable hardware [2], scheduling behavior is mostly set in stone, with hardware implementations that can, at best, be configured.
To enable programmable packet scheduling, the main challenge was to find an appropriate abstraction which is flexible enough to express a wide variety of scheduling algorithms and yet can be implemented efficiently in hardware [22]. In [23], Sivaraman et al. proposed to use Push-In First-Out (PIFO) queues as such an abstraction. PIFO queues allow enqueued packets to be pushed into arbitrary positions (according to the packet's rank) while being drained from the head.
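To make the abstraction concrete, here is a minimal software model of a PIFO queue. It is only a sketch under stated assumptions: a binary heap keyed on (rank, arrival counter) stands in for the hardware sorting that [23] proposes, and the class name `PIFO` and its methods are illustrative, not part of any switch API. The arrival counter makes packets with equal ranks leave in FIFO order.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class PIFO:
    """Software model of a Push-In First-Out queue (illustrative only).

    push() inserts a packet at the position given by its rank (lower rank =
    higher priority); pop() always removes the head. An arrival counter breaks
    ties, so packets with equal ranks leave in FIFO order.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Any]] = []
        self._arrivals = itertools.count()

    def push(self, rank: int, packet: Any = None) -> None:
        heapq.heappush(self._heap, (rank, next(self._arrivals), packet))

    def pop(self) -> Tuple[int, Any]:
        rank, _, packet = heapq.heappop(self._heap)
        return rank, packet

    def __len__(self) -> int:
        return len(self._heap)


pifo = PIFO()
for rank in [3, 4, 1, 4, 5, 2]:  # arrival order of the Fig. 1 example
    pifo.push(rank)
print([pifo.pop()[0] for _ in range(len(pifo))])  # [1, 2, 3, 4, 4, 5]
```

In software each push and pop costs O(log n); the difficulty the paper points to is performing this arbitrary sorting at line rate in hardware.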
[Figure 1: SP-PIFO approximates the behavior of PIFO queues by adapting how packet ranks are mapped to priority queues. The figure shows the incoming packet sequence 3 4 1 4 5 2 (the first five packets already enqueued), the output of a theoretical PIFO queue (1 2 3 4 4 5), and two SP-PIFO approximations with two queues: strategy A, mapping ranks [1–3] and [4–5], yields the suboptimal output 3 1 2 4 4 5; strategy B, mapping ranks [1–2] and [3–5], yields the optimal output 1 2 3 4 4 5.]

While PIFO queues enable programmable scheduling, implementing them in hardware is hard due to the need to arbitrarily sort packets at line rate. [23] described a possible hardware design (not implementation) supporting PIFO on top of Broadcom Trident II [1]. While promising, realizing this design in an ASIC is likely to take years [6], not including deployment. Even ignoring deployment considerations, the design of [23] is limited, as it only supports ~1000 flows and relies on the assumption that packet ranks increase monotonically within each flow, which is not always the case.

Our work. In this paper, we ask whether it is possible to approximate PIFO queues at scale, in existing programmable data planes. We answer positively and present SP-PIFO, an adaptive scheduling algorithm that closely approximates PIFO behaviors on top of widely-available Strict-Priority (SP) queues. The key insight behind SP-PIFO is to dynamically adapt the mapping between packet ranks and SP queues in order to minimize the number of scheduling mistakes relative to a hypothetical ideal PIFO implementation.

Example. First, we provide an intuition of how SP-PIFO approximates PIFO behaviors using SP queues in Fig. 1. The example illustrates the scheduling behavior of two SP-PIFO systems which receive the input packet sequence 3 4 1 4 5 2. By convention, we write the first packet being enqueued on the far right (3) and the last one on the far left (2). Similarly to [23], we also consider that lower-rank packets have higher priority (and use corresponding color codes). The figure illustrates the scheduling decision of each system for the sixth packet (2), assuming the first 5 have been enqueued already. A PIFO queue always schedules incoming packets perfectly, leading to the sorted output 1 2 3 4 4 5. In contrast, the quality of the scheduling of an SP-PIFO scheme depends on: (i) the number of SP queues available (here, two); and (ii) the mapping of packet ranks to those queues. Fig. 1 illustrates two such mapping strategies. Strategy A maps ranks 1–3 (resp. 4–5) to the highest (resp. lowest) SP queue, while Strategy B maps ranks 1–2 (resp. 3–5) to the highest (resp. lowest) SP queue. We see that Strategy B is capable of perfectly sorting the input sequence, i.e. it behaves like a perfect PIFO queue. In contrast, Strategy A leads to sub-optimal packet inversions, e.g. 1 is incorrectly scheduled after 3.

Insights. The key challenge in SP-PIFO is to design adaptation strategies that can: (i) closely approximate PIFO behavior; and (ii) be implemented in programmable data planes. These are hard challenges, as the best mapping strategy depends on the traffic mix and the actual ranks being enqueued, both of which can change on a per-packet basis.
SP-PIFO approximates the best mapping strategy by dynamically shifting the ranks mapped to each queue to reduce the scheduling mistakes it observes in real time. We show that SP-PIFO's adaptation strategy achieves almost the same performance as provably-correct adaptation strategies while being implementable in programmable data planes.

Performance. We use SP-PIFO to implement a wide variety of scheduling objectives, ranging from minimizing flow completion times to achieving max-min fairness. For all cases, we show that SP-PIFO achieves performance on par with the state of the art. We also demonstrate that SP-PIFO runs at line rate on existing programmable hardware.

Contributions. Our main contributions are:
- A novel approach for approximating PIFO queues using strict-priority queues (§3).
- An adaptation algorithm which dynamically adapts the queue mapping according to the network conditions, closely approximating an optimal scheme (§4).
- An implementation¹ of SP-PIFO in Java and P4 (§5).
- A comprehensive evaluation showing SP-PIFO's effectiveness in approximating perfect PIFO behavior with as few as 8 queues and on actual hardware switches (§6).
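The two static mappings of Fig. 1 can be checked with a short sketch. As in the figure, it assumes that all six packets are enqueued before any dequeue. It counts inversions as out-of-order pairs, which is simpler than the per-poll metric used later in §4.2. The helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def strict_priority_output(arrivals: Sequence[int],
                           ranges: Sequence[Tuple[int, int]]) -> List[int]:
    """Dequeue order of strict-priority queues under a static rank mapping.

    ranges[i] is the inclusive rank range (lo, hi) mapped to queue i, where
    queue 0 has the highest priority. All packets are enqueued before the
    first dequeue, as in the Fig. 1 example.
    """
    queues: List[List[int]] = [[] for _ in ranges]
    for rank in arrivals:
        for i, (lo, hi) in enumerate(ranges):
            if lo <= rank <= hi:
                queues[i].append(rank)  # FIFO order within a queue
                break
        else:
            raise ValueError(f"rank {rank} is not mapped to any queue")
    # Drain the highest-priority queue completely before the next one.
    return [rank for queue in queues for rank in queue]


def count_inversions(output: Sequence[int]) -> int:
    """Count pairs where a higher rank leaves before a lower rank."""
    return sum(1 for i in range(len(output))
               for j in range(i + 1, len(output)) if output[i] > output[j])


arrivals = [3, 4, 1, 4, 5, 2]  # first packet (3) to last packet (2)
strategy_a = strict_priority_output(arrivals, [(1, 3), (4, 5)])
strategy_b = strict_priority_output(arrivals, [(1, 2), (3, 5)])
print(strategy_a, count_inversions(strategy_a))  # [3, 1, 2, 4, 4, 5] 2
print(strategy_b, count_inversions(strategy_b))  # [1, 2, 3, 4, 4, 5] 0
```

Strategy A places 3 in the same queue as the later 1 and 2. Both lower ranks therefore wait behind it, which produces the two inversions.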
¹ Available at https://github.com/nsg-ethz/sp-pifo

[Figure 2: Overview of the SP-PIFO data-plane pipeline. Incoming packets pass through a mapping stage (decision r ≥ q_i?, scanned bottom-up, with example queue bounds q = {0, 3, 5}, i.e. q1 = 0, q2 = 3, q3 = 5) and an adaptation stage (Strategy #1: gradient descent, Section 3; Strategy #2: SP-PIFO, Section 4). Priority queuing over Queue 1 … Queue n then produces the outgoing packets.]

2 Overview
In this section, we provide an informal overview of how SP-PIFO manages to closely approximate PIFO behaviors. At a high level, SP-PIFO is a priority-queuing scheduling discipline (see Fig. 2) which maps incoming packets to n priority queues. SP-PIFO assumes that packets are tagged with a rank indicating the intended scheduling order, with lower ranks being preferred over higher ones. Packets enqueued in a queue are scheduled according to their order of arrival (i.e., First-In First-Out), after all packets enqueued in any higher-priority queue have been scheduled. Unlike classical priority-queuing disciplines [20], SP-PIFO dynamically adapts the mapping between the packet ranks and the priority queues according to the observed network conditions. In particular, SP-PIFO adapts the mapping so as to minimize the scheduling “unpifoness”, that is, the number of times a higher-rank packet is scheduled before an enqueued lower-rank packet. We refer to such scheduling mistakes as inversions.

Mapping. SP-PIFO maps each incoming packet to a queue according to the queue bounds. These queue bounds identify, for each queue i, the smallest packet rank that can be enqueued. Whenever a packet is received, SP-PIFO scans the queue bounds bottom-up, starting from the lowest-priority queue, and enqueues the packet in the first queue with a bound smaller than or equal to the packet rank. Given a packet with rank $r \in \mathbb{Z}_{\geq 0}$ and $n$ priority queues, let $\mathbf{q}$ be the vector of queue bounds $(q_1, \ldots, q_n) \in \mathbb{Z}^n$ such that $0 \leq q_1 \leq q_2 \leq \cdots \leq q_n$. For instance, consider a vector $\mathbf{q} = \{0, 3, 5\}$ indicating the bounds of 3 priority queues, with 0 (resp. 5) indicating the bound of the highest- (resp. lowest-) priority queue. Given $\mathbf{q}$, SP-PIFO enqueues packets with rank $r \leq 2$ in the first (highest-priority) queue, packets with rank $3 \leq r \leq 4$ in the second queue, and packets with rank $r \geq 5$ in the third (lowest-priority) queue.

Adaptation. “Unpifoness” can be minimized across multiple packets, e.g. by monitoring the rank distribution over periodic time windows and adapting the bounds through gradient descent, or on a per-packet basis (see Fig. 2). Depending on the characteristics of the rank distribution, the first strategy can provably converge to the optimal mapping. Unfortunately, its requirements exceed the capabilities of existing programmable data planes. SP-PIFO addresses these two limitations: it works for any rank distribution, on existing hardware.
SP-PIFO dynamically adapts $\mathbf{q}$ such that the resulting scheduling closely approximates an ideal PIFO queue, minimizing the number of observed inversions by dynamically shifting the ranks mapped to each queue. SP-PIFO operates online, without prior knowledge of the incoming packet ranks. SP-PIFO's adaptation mechanism consists of two stages: a push-up stage, where future low-rank (i.e. high-priority) packets are pushed to higher-priority queues; and a push-down stage, where future high-rank (i.e. low-priority) packets are pushed down to lower queues.

Stage 1: Push-up. Whenever SP-PIFO enqueues a packet, it updates the corresponding queue bound to the rank of the enqueued packet. Doing so, SP-PIFO aims at ensuring that future lower-rank packets will not be enqueued in the same queue, but in a more preferred one. Intuitively, SP-PIFO “pushes up” packets with low ranks to the highest-priority queues, where they will be drained first. Of course, as the number of queues is finite (and often much smaller than the number of ranks), this is not always possible, leading to inversions.

Stage 2: Push-down. Whenever SP-PIFO detects an inversion in the highest-priority queue (i.e., the packet rank is smaller than the highest-priority queue bound), it decreases the queue bounds of all queues. Doing so, SP-PIFO ensures that future higher-rank packets will be enqueued in lower-priority queues. Intuitively, after an inversion, SP-PIFO “pushes down” packets with high ranks to the lower-priority queues in order to prevent them from causing inversions in the highest-priority queue. SP-PIFO decreases the queue bounds according to the magnitude of the inversion, i.e. the difference between the packet rank and the corresponding queue bound: the bigger the inversion, the more ranks are pushed down.

Example. Fig. 3 illustrates the execution of SP-PIFO with two priority queues when receiving 3 4 1 4 5 2 1. Without loss of generality, we consider that the queue bounds are initialized to 0. SP-PIFO enqueues the first packet (3) in the lowest-priority queue and updates its queue bound to 3. Likewise, SP-PIFO also enqueues the second packet, 4, in the lowest-priority queue. As its rank (4) is higher than the queue bound (3), it then updates the queue bound to 4. The same process is applied to the subsequent packets until the second 1 is encountered, creating an inversion (grayed area in Fig. 3). Indeed, SP-PIFO enqueues 1 in the highest-priority queue after having enqueued 2. Once the inversion is detected, SP-PIFO adapts the queue bounds to 1 and 5 − 1 = 4, respectively. Observe that if 1 and 2 keep arriving, the bound of the lowest-priority queue will decrease, eventually reaching 2. At this point, future 1s will not experience inversions anymore, as they will have a dedicated queue.

[Figure 3: SP-PIFO mapping and adaptation mechanisms. The figure traces the bounds and contents of two priority queues as the packets 3 4 1 4 5 2 1 arrive, and shows how SP-PIFO reacts to the inversion caused by the last packet by setting the bounds to 1 and 5 − 1 = 4.]

3 SP-PIFO design
In this section, we describe the theoretical basis supporting the design of SP-PIFO. We first phrase the problem of finding the optimal queue bounds as an empirical risk minimization problem in which a loss function (how “unpifo” the current mapping is) is minimized (§3.1). We then develop an algorithm based on gradient descent which provably converges to the optimal bounds for stable rank distributions (§3.2). We show how the convergence requirements make the algorithm impractical (§3.3). Finally, we present SP-PIFO, which relaxes the requirements for the benefit of practicality (§4).

3.1 Problem statement
Let $U: \mathbb{R}^n \times \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ be a loss function such that $U(\mathbf{q}, r)$ quantifies the approximation error of scheduling a packet with rank $r$ based on queue bounds $\mathbf{q}$, compared to an ideal PIFO queue. Intuitively, a smaller loss equals a better approximation. Note that U stands for unpifoness. The adaptation goal is to find the optimal queue bounds $\mathbf{q}^*$ that minimize the expected loss for all possible ranks. Let $Q$ be the space of all valid bound vectors and $R$ the distribution of packet ranks; then the optimal queue bounds $\mathbf{q}^*$ are:

$$\mathbf{q}^* = \arg\min_{\mathbf{q} \in Q} \; \mathbb{E}_{r \sim R}\left[U(\mathbf{q}, r)\right] \tag{1}$$

Finding $\mathbf{q}^*$ directly is intractable, though. Indeed, evaluating the expected loss is impossible since the distribution of packet ranks $R$ is unknown. We address this problem by considering the empirical loss $U_{emp}$ observed over a set $D$ of i.i.d. rank samples. Doing so, we phrase the problem of finding $\mathbf{q}^*$ as an empirical risk minimization (ERM) problem:

$$\mathbf{q}^* = \arg\min_{\mathbf{q} \in Q} \; \frac{1}{|D|} \sum_{r \in D} U_{emp}(D, \mathbf{q}, r) \tag{2}$$

Evaluating empirical losses. For a given rank $r$, we measure the empirical loss $U_{emp}$ as the expected number of inversions that $r$ would encounter if the rank distribution $D$ was scheduled given the queue bounds $\mathbf{q}$, weighted by the cost that each inversion would cause to the system performance. This cost can be just a constant value, if all inversions are treated the same, or it can measure the magnitude of the inversion (i.e., how big the difference is between the ranks causing it). Since $r$ receives inversions only from higher ranks in the distribution, $U_{emp}$ can be rewritten as:

$$U_{emp}(D, \mathbf{q}, r) = \frac{1}{|D|} \sum_{r' \in D,\; r' > r} \mathrm{cost}_{\mathbf{q}}(r', r) \tag{3}$$

Having formulated the adaptation goal as an empirical risk minimization, we aim to solve it by analyzing how changes in $\mathbf{q}$ influence the empirical risk, and to design an iterative algorithm capable of converging to the minimal risk.

3.2 Gradient-based adaptation algorithm
We first introduce a greedy, gradient-based algorithm, which provably converges to the optimal queue bounds $\mathbf{q}^*$ provided that the rank distribution stays constant. The algorithm builds upon the fact that inversions cannot occur between ranks mapped to different priority queues. This allows us to instantiate the empirical risk minimization in eq. 2 at the queue level by simply adding the individual losses of each queue. Letting $U(q_i)$ be the loss function corresponding to the queue with bound $q_i$, this is:

$$\mathbf{q}^* = \arg\min_{\mathbf{q} \in Q} \sum_{q_i \in \mathbf{q}} U(q_i) \tag{4}$$

Letting $p_D(r)$ and $p_D(r')$ be the empirical probabilities of ranks $r$ and $r'$, respectively, both mapped to the queue with bound $q_i$, we can define the unpifoness of the queue as:

$$U(q_i) = \sum_{q_i \leq r < q_{i+1}} \; \sum_{r < r' < q_{i+1}} p_D(r)\, p_D(r')\, \mathrm{cost}(r', r) \tag{5}$$

where $q_{n+1} = \infty$ for the lowest-priority queue.

Overview. Considering this problem instantiation, the greedy algorithm first computes the rank distribution over a set of k packets before minimizing the expected per-queue unpifoness by incrementing (resp. decrementing) the queue bounds. Specifically, after processing the k-th packet, the greedy algorithm selects, for each queue, the bound that most decreases the overall system unpifoness. Although comparing the performance of all bound combinations is not possible, we introduce an efficient computation mechanism that allows us to prune the search space while preserving convergence. We prove the optimality of the algorithm in Appendix A.
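As a sketch of eqs. (4) and (5), the following code computes the per-queue unpifoness from an empirical rank distribution. It assumes the magnitude cost cost(r', r) = r' − r, exact fractions, and ranks no smaller than q_1. `greedy_step` is a hypothetical, simplified single-bound update that only tries q_i ± 1; it is not the pruned search proven optimal in Appendix A. The inputs reproduce the Fig. 4 example.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def rank_distribution(window: Sequence[int]) -> Dict[int, Fraction]:
    """Empirical probability p_D(r) of each rank seen in the adaptation window."""
    counts = Counter(window)
    return {r: Fraction(c, len(window)) for r, c in counts.items()}


def queue_unpifoness(p: Dict[int, Fraction], lo: float, hi: float) -> Fraction:
    """Eq. (5) for one queue: ranks lo <= r < r' < hi, cost(r', r) = r' - r."""
    ranks = sorted(r for r in p if lo <= r < hi)
    return sum((p[r] * p[r2] * (r2 - r) for r, r2 in combinations(ranks, 2)),
               Fraction(0))


def total_unpifoness(p: Dict[int, Fraction], bounds: Sequence[int]) -> Fraction:
    """Eq. (4): sum of per-queue losses, with q_{n+1} = infinity."""
    uppers: List[float] = list(bounds[1:]) + [float("inf")]
    return sum((queue_unpifoness(p, lo, hi) for lo, hi in zip(bounds, uppers)),
               Fraction(0))


def greedy_step(p: Dict[int, Fraction], bounds: Sequence[int],
                i: int) -> Tuple[List[int], Fraction]:
    """Try q_i - 1 and q_i + 1; keep the bounds with the lowest unpifoness."""
    best, best_u = list(bounds), total_unpifoness(p, bounds)
    for delta in (-1, 1):
        candidate = list(bounds)
        candidate[i] += delta
        # Only consider valid bound vectors: 0 <= q_1 <= ... <= q_n.
        if candidate[0] < 0 or any(a > b for a, b in zip(candidate, candidate[1:])):
            continue
        u = total_unpifoness(p, candidate)
        if u < best_u:
            best, best_u = candidate, u
    return best, best_u


p = rank_distribution([3, 4, 1, 4, 5, 1, 2])  # one adaptation window (k = 7)
a = Fraction(1, 7) * Fraction(1, 7)
print(total_unpifoness(p, [1, 4]) / a)  # 9
print(greedy_step(p, [1, 4], 1))  # ([1, 3], Fraction(8, 49))
```

Incrementing q_2 to 5 would give 25a, so the step keeps the decrement to q_2 = 3 (8a). This matches the gradient-up and gradient-down comparison in the example.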
[Figure 4: The gradient-based algorithm greedily minimizes the expected unpifoness. For an adaptation window of k = 7 packets (3 4 1 4 5 1 2), the figure shows the packet rank distribution over ranks r1 to r5 (probabilities 1/7 or 2/7). It compares the current allocation (unpifoness 9a) with a worsening allocation (25a > 9a) and an improving allocation (8a < 9a), which yields the updated bounds [q1 = 1, q2 = 3].]

Example. We illustrate the execution of the algorithm in Fig. 4. We assume a system with two priority queues and assume that the packet sequence 3 4 1 4 5 1 2 is received over and over again. We set the adaptation window k to 7 packets. We initialize the queue bounds to 1 and 4. The algorithm starts by computing the observed rank distribution after receiving the 7th packet. Here, it estimates the probability of receiving a packet of rank 1 as p(1) = 2/7. Similarly, p(2) = 1/7, p(3) = 1/7, p(4) = 2/7 and p(5) = 1/7. It then computes the expected unpifoness that this distribution would have generated with the current queue bounds (eq. 5). For the higher-priority queue, this is $U_1 = p(1)p(2)\,\mathrm{cost}(2,1) + p(1)p(3)\,\mathrm{cost}(3,1) + p(2)p(3)\,\mathrm{cost}(3,2) = (2/7 \cdot 1/7)(2-1) + (2/7 \cdot 1/7)(3-1) + (1/7 \cdot 1/7)(3-2)$. This equation can be simplified to $U_1 = 7a$, where $a = 1/7 \cdot 1/7$. Similarly, $U_2 = p(4)p(5)\,\mathrm{cost}(5,4) = 2a$, adding up to a total of $U = 9a$. Next, the algorithm compares the expected unpifoness that would be obtained if the queue bound was incremented (gradient up) or decremented (gradient down), and adapts the queue bound in the direction resulting in the biggest decrease of unpifoness.

Gradient up. Incrementing $q_2$ from 4 to 5 means that only rank {5} would be mapped to the lower-priority queue. The resulting unpifoness is $U = 25a$. The higher unpifoness (25a instead of 9a) indicates that, by incrementing $q_2$, the system gets further away from the PIFO behavior. Note that the increase in unpifoness comes from the higher-priority queue, as rank {5} gets an exclusive queue.

Gradient down. In contrast, the system unpifoness decreases from 9a to 8a when decrementing $q_2$ from 4 to 3. Indeed, $U_1 = p(1)p(2)\,\mathrm{cost}(2,1) = 2a$ and $U_2 = p(3)p(4)\,\mathrm{cost}(4,3) + p(3)p(5)\,\mathrm{cost}(5,3) + p(4)p(5)\,\mathrm{cost}(5,4) = 6a$, adding up to $U = 8a$. As such, the adaptation mechanism updates the queue bound: $q_2 = 3$. The above process repeats every 7th packet, estimating the rank distribution before greedily adapting the queue bounds.

3.3 Limitations
While the adaptation algorithm described above provably converges to the optimal mapping (see A.1), two key limitations make it impractical. First, it is not currently implementable in existing programmable data planes due to resource constraints. Second, the algorithm only converges for stable rank distributions, which is rarely the case, and its convergence time directly depends on the distribution size, which can be large. We explain how to overcome these limitations in §4.

Hardware restrictions. Monitoring the rank distributions over periodic adaptation windows requires a large amount of memory and computational resources, both of which are scarce in current programmable data planes. In particular, implementing the greedy algorithm in hardware (see A.2) requires us to: (i) store the value of each queue bound; (ii) compute the current unpifoness; and (iii) estimate the unpifoness obtained by incrementing or decrementing each queue bound. As we explain in A.3, the amount of resources required to run the algorithm on a practical number of queues (8 queues or more) exceeds the capabilities of current switch designs.

Convergence. In A.4, we study the performance of the gradient-based algorithm and analyze the effects on convergence when the adaptation window, the number of queues, and the rank range are modified. We show that, for the algorithm to converge, the rank distribution needs to be stable in time. However, this is unrealistic in most practical scenarios, where the rank distribution is not only unknown but also varies over time (e.g., virtual times in fair-queuing schemes).

4 Our approach: SP-PIFO
We now present SP-PIFO, an approximation of the gradient-based adaptation algorithm (§3.2) which is implementable in existing data planes and rapidly adapts to varying rank distributions. SP-PIFO substitutes the gradient computation with a simpler adaptation process which minimizes the probability of inversions per packet, rather than per k packets. In the following, we first show how to instantiate the empirical risk minimization problem (eq. 2) at the packet level and describe how SP-PIFO solves it (§4.1). We then systematically characterize how SP-PIFO handles inversions (§4.2).

4.1 Per-packet adaptation algorithm
The SP-PIFO adaptation algorithm (Alg. 1) is based on two competing stages that act in opposing directions. We show that this combination manages to strike a balance in the number of inversions observed by all queues, resulting in a good PIFO approximation. In the following, we first show how to phrase the empirical risk minimization problem at the per-packet level before describing both mechanisms.

Problem statement. In contrast to §3.2, we aim at minimizing the cost generated by scheduling each individual packet. Formally, we aim to find the optimal bound vector $\mathbf{q}^*$ that minimizes the unpifoness for all enqueued packets $P$:

$$\mathbf{q}^* = \arg\min_{\mathbf{q} \in Q} U(P, \mathbf{q}) \tag{6}$$

Let $r(p)$ be the rank of a given packet $p \in P$, and let $r_p(p, \mathbf{q})$ be the rank perceived as a result of the mapping decision, which is identified as the highest rank amongst those of the packets sharing the same queue. Considering that the objective for the bound vector $\mathbf{q}$ is to perfectly approximate PIFO behaviors, we can estimate the unpifoness at enqueue as:

$$U(P, \mathbf{q}) = \sum_{p \in P} \mathrm{cost}_{\mathbf{q}}(p) \tag{7}$$

$$\text{where} \quad \mathrm{cost}_{\mathbf{q}}(p) = r_p(p, \mathbf{q}) - r(p) \tag{8}$$

Computing the perceived rank requires determining the highest rank among all packets sharing the queue at any given moment. This not only requires keeping track of all ranks in each queue, but also selecting the highest, which is computationally expensive. Since one of the premises of SP-PIFO is to be implementable in the data plane, we relax this condition and keep track of only a single parameter $q_i$ per queue. These parameters, the bounds $\mathbf{q}$, simplify the cost estimation of a potential mapping decision at enqueue. We discuss how we update these parameters, as well as the tradeoffs of this relaxation, below.

Stage 1: “Push-up”. The first stage increases $\mathbf{q}$ to minimize the unpifoness of the queue to which the incoming packet is mapped. Specifically, the mapping process scans the queues bottom-up and enqueues the packet in the first queue that satisfies $r(p) \geq q_i$. It then sets $q_i$ to the rank of the enqueued packet. By doing so, the mechanism minimizes (i) the cost for each packet $p$ (at enqueue time); as well as (ii) the impact that this decision may have on future packets.
This mapping process guarantees a zero-cost packet allocation for all packets within a queue. That is, as we effectively keep track of the highest rank per queue, we ensure that no packet with a lower rank is mapped to the same queue. This holds for all queues except for the highest-priority queue. There, packets are enqueued even if $r(p) < q_1$.
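The following sketch applies only Stage 1 (push-up) and scores each packet with the exact cost of eq. (8). It assumes that no packet is dequeued while the sequence arrives, so the perceived rank is the highest rank in the chosen queue once the packet has joined it. Tracking every rank per queue, as the Python lists do here, is exactly what the data plane cannot afford. The sketch only serves to illustrate that non-zero costs arise solely in the highest-priority queue. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def push_up_only(arrivals: Sequence[int],
                 n_queues: int) -> Tuple[List[List[int]], List[int]]:
    """Stage 1 mapping without push-down, scored with the exact cost of eq. (8).

    Index 0 is the highest-priority queue. Nothing is dequeued, so the perceived
    rank r_p(p, q) is the highest rank in the chosen queue after p joins it.
    """
    bounds = [0] * n_queues
    queues: List[List[int]] = [[] for _ in range(n_queues)]
    costs: List[int] = []
    for rank in arrivals:
        # Scan bottom-up; the highest-priority queue (0) accepts any rank.
        i = next((j for j in reversed(range(n_queues)) if rank >= bounds[j]), 0)
        queues[i].append(rank)
        bounds[i] = rank                      # push-up: bound := enqueued rank
        costs.append(max(queues[i]) - rank)   # eq. (8): r_p(p, q) - r(p)
    return queues, costs


queues, costs = push_up_only([3, 4, 1, 4, 5, 2, 1], n_queues=2)
print(queues)  # [[1, 2, 1], [3, 4, 4, 5]]
print(costs)   # [0, 0, 0, 0, 0, 0, 1]
```

With the Fig. 3 sequence, only the last packet has a non-zero cost. That packet is the rank-1 packet placed in the top queue behind rank 2.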
Algorithm 1 SP-PIFO adaptation algorithm
Require: An incoming packet with rank r.
  procedure PUSH-UP
    for q_i : q_n to q_1, q_i ∈ q do                 ▷ Scan bottom-up
      if r ≥ q_i or i = 1 then
        q_i ← r                                       ▷ Update queue bound
        ENQUEUE(r, i)                                 ▷ Select queue
        break                                         ▷ Stop at the first matching queue
  procedure PUSH-DOWN
    if r < q_1 then                                   ▷ Detect inversion (q_1 as read before PUSH-UP overwrites it)
      cost ← q_1 − r                                  ▷ Compute inversion cost
      for q_j ∈ q, j ≠ 1 do
        q_j ← q_j − cost                              ▷ Adapt queue bounds

Stage 2: “Push-down”. As illustrated in §2, the first stage can lead to inversions in the highest-priority queue. The second stage aims at counteracting that effect by reducing the number of ranks enqueued in the highest-priority queue. This is achieved by decreasing all queue bounds by some given amount. Different decreasing strategies exist. In SP-PIFO, we decrease each $q_i$ proportionally to the cost of the inversion. That is, we decrease all queue bounds by $q_1 - r(p)$. This choice is both (i) practical, as it can be efficiently implemented in hardware; and (ii) functional, as it results in a reasonable balance between inversions in the highest-priority queue and shifts in the other queues. Below, we provide some insights on the nature of this balance and why it is important for a good PIFO approximation. We simulate the performance of different decreasing strategies in §4.2.

Tradeoffs. Unlike the gradient-based algorithm (§3.2), SP-PIFO may converge to a sub-optimal solution exhibiting inversions. One can distinguish three sources of inversions. First, there can be inversions in the highest-priority queue. These inversions are proportional to the probability of observing packets with rank $r(p) < q_1$. Second, after the “push-down” stage, the queue bounds do not necessarily match the highest-rank packet in the queue anymore. This may lead to inversions for future packets and is proportional to how often, and how much, queue bounds are decreased. Finally, because only the highest rank in a queue is tracked, it can happen that a packet is enqueued in a higher-priority queue because $r(p) < q_i$, while $r(p)$ is greater than the lowest rank in queue $i$, causing an inversion. This is proportional to the number of ranks between the minimum rank in the queue and the queue bound.

Average-case analysis. The exact number of inversions introduced by each of these three sources is hard to quantify, as queue bounds shift with (almost) every packet. Yet, on average, we can show that the dynamics of SP-PIFO counteract all three sources. On the one hand, SP-PIFO equalizes the probability of $r(p) < q_1$ with the probability of packets being mapped to a specific queue. This strikes a balance between inversions caused by the absence of higher-priority queues and inversions caused by queue-bound mismatch. Furthermore, for this equalizing, the probabilities of specific ranks are weighted more if they are far away from the queue bounds, which keeps queues more compact and reduces the chance of overlap. As a result, on average workloads, SP-PIFO provides a good approximation and can adapt to arbitrary rank distributions. Nevertheless, there are adversarial packet orderings circumventing these mechanisms, resulting in large unpifoness (§7). We provide the theoretical foundations for these statements in Appendix B and verify them by simulation in §4.2.

4.2 SP-PIFO analysis
We now dive deeper into understanding SP-PIFO using switch-level simulations. We compare its behavior to that of an ideal PIFO queue, along with several well-known scheduling schemes (e.g., FIFO). We first describe the high-level behavior using a uniform rank distribution (§4.2.1), before systematically exploring the design space (§4.2.2).

Methodology. We implement various scheduling schemes (including SP-PIFO, FIFO, and our gradient-based algorithm) in Netbench [3, 15], a packet-level simulator. We analyze the performance of a single switch scheduling 1500 flows of 1 MB each (fixed size), which start according to a Poisson distribution. We run the simulation for one second. We limit the transmission through an output link of 10 Gbps, which corresponds to an average port utilization of 75%. We measure the number of inversions generated by each rank at dequeue. Whenever a packet is polled, we check whether its rank is higher than any of the ranks remaining in any of the queues. When this occurs, we count an inversion to the rank generating it (i.e., the rank of the polled packet), making sure that inversions are counted at most once per polled packet, regardless of the number of packets affected by it.
We compare four scheduling schemes: (i) SP-PIFO (§4); (ii) the gradient-based algorithm (§3, see the implementation in A.2); (iii) a strict-priority scheme fixed to the optimal mapping for a uniform distribution (i.e., bounds distributed uniformly across ranks, $q_i = 12i$); and (iv) a FIFO queue, as a baseline. All strict-priority schemes (SP schemes) use 8 queues of 10 packets, while the FIFO queue has a capacity of 80 packets.

4.2.1 Characterizing general SP-PIFO behavior
We start by showcasing how SP-PIFO handles inversions by analyzing its behavior under a uniform rank distribution. That is, we tag the packets with a rank drawn from a uniform distribution (between 0 and 100). Fig. 5a illustrates the number of inversions generated by each rank for the different SP schemes in comparison with FIFO. We see that a FIFO queue generates a uniform number of inversions across all ranks (since they all share the same queue). In contrast, SP schemes (all the others in Fig. 5a) generate a progressively higher number of inversions as rank values increase. This occurs because higher ranks are mapped to lower-priority queues, which drain packets less frequently. Since those queues have a higher occupancy on average, the potential number of inversions increases. This behavior, however, is not preserved for the lowest-priority queue (the far-right peak in the graph) as a result of starvation. Despite having the largest average queue size, this queue drains fewer packets and, as such, the number of inversions it sees decreases.

[Figure 5: SP-PIFO performance (uniform rank distribution). Panels: (a) uniform, 8 queues; (b) uniform, 32 queues; (c) adaptation strategies; (d) utilization.]

For the fixed queue bounds, we see that a saw shape delineates the inversions observed across ranks in different queues, reaching the x axis for the ranks corresponding to the queue bounds. Indeed, the lowest rank within each queue never generates inversions, since the other ranks sharing the queue have higher values. The second-lowest rank can only generate inversions to the lowest, and the progression continues until the highest rank, which can generate inversions to all the lower ranks sharing the queue.
When considering the gradient-based greedy algorithm (which is optimal) and SP-PIFO, we see that the saw shape vanishes. This is because queue bounds are not fixed anymore, and successive packets of a given rank can be mapped to multiple queues. In particular, since the rank distribution sampled at each adaptation window varies, the queue bounds in the gradient-based algorithm oscillate. In SP-PIFO, as a higher variability is produced, the number of inversions delineates the envelope of the optimal schemes.

4.2.2 Characterizing the SP-PIFO design space
We now systematically explore the design space of SP-PIFO along four dimensions: the number of queues, the adaptation strategy when encountering an inversion (in the push-down stage, §4.1), the utilization levels, and the rank distributions.

[Figure 6: SP-PIFO performance (alternative distributions). Panels: (a) exponential; (b) inverse exponential; (c) Poisson; (d) convex.]

SP-PIFO manages to approximate the optimal algorithms in all rank distributions and utilization levels, with as few as 8 queues. The best performance is obtained under low utilization and with 32 queues.

Number of queues (Fig. 5b). When using only 8 queues, SP-PIFO is already within 20–29% of the gradient-descent algorithm and the optimal mapping. With 32 queues, it gets even closer, producing only 22% more inversions than the optimal and achieving behavior on par with the gradient-descent algorithm. Overall, it improves on FIFO performance by 3.3× (resp. 10×) when only 8 (resp. 32) queues are used.

Push-down strategies (Fig. 5c). We evaluate four adaptation strategies for decreasing each queue bound in the push-down stage: (i) to the value of the next-higher queue bound (“Queue Bound”); (ii) by the cost of the inversion ($q_1 - r(p)$, the strategy in SP-PIFO, “Cost”); (iii) by the rank of the packet causing the inversion (“Rank”); and (iv) by 1 (“1”). The best performance is obtained for “Queue Bound”, which produces 15% more inversions than the gradient-based algorithm. This is followed by “Cost” and “Rank”, with 22%, and “1”, with 33%. While the first three techniques produce similar results, the “push-down” effect of “1” is too small to balance the “push-up” stage, resulting in many inversions. While “Queue Bound” is marginally better than “Cost”, it is more costly to implement; thus SP-PIFO uses the latter.

Utilization (Fig. 5d). SP-PIFO performance is close to that of the gradient-based algorithm. For utilizations below 60%, SP-PIFO is on par with the gradient-based algorithm. The number of inversions slightly increases at higher utilizations: 26% and 38% at 80% and 90% utilization, respectively.
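A software sketch of Algorithm 1, together with the inversion metric from the methodology above, may help when experimenting with the design. It is not the authors' Java or P4 code. The sketch makes these assumptions:
- Queue index 0 is the highest-priority queue.
- The inversion cost is computed from q_1 before push-up overwrites it, which reproduces the Fig. 3 trace.
- Bounds are not clamped at zero, and queue capacities are ignored.
- The metric drains the queues only after all packets are enqueued, whereas Netbench interleaves arrivals and departures.

Class and function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence


class SPPIFO:
    """Sketch of Algorithm 1: strict-priority queues with adaptive bounds.

    Index 0 is the highest-priority queue, so bounds[0] plays the role of q_1.
    """

    def __init__(self, n_queues: int) -> None:
        self.bounds: List[int] = [0] * n_queues
        self.queues: List[Deque[int]] = [deque() for _ in range(n_queues)]

    def enqueue(self, rank: int) -> int:
        """Map a packet to a queue, adapt the bounds, and return the queue index."""
        n = len(self.bounds)
        for i in reversed(range(n)):  # scan bottom-up
            if rank >= self.bounds[i] or i == 0:
                if i == 0 and rank < self.bounds[0]:
                    # Push-down: inversion in the top queue; cost uses the old q_1.
                    cost = self.bounds[0] - rank
                    for j in range(1, n):
                        self.bounds[j] -= cost
                self.bounds[i] = rank  # push-up: bound of the selected queue
                self.queues[i].append(rank)
                return i
        raise ValueError("SP-PIFO needs at least one queue")

    def dequeue(self) -> int:
        """Strict priority: serve the first non-empty queue, FIFO within it."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        raise IndexError("all queues are empty")

    def __len__(self) -> int:
        return sum(len(queue) for queue in self.queues)


def drain_and_count(queues: Sequence[Deque[int]]) -> int:
    """Drain copies of the queues in priority order and count polled inversions.

    A poll counts once if its rank is higher than any rank still waiting.
    """
    pending = [deque(queue) for queue in queues]  # leave the caller's queues intact
    inversions = 0
    while any(pending):
        rank = next(queue for queue in pending if queue).popleft()
        remaining = [r for queue in pending for r in queue]
        if remaining and rank > min(remaining):
            inversions += 1
    return inversions


arrivals = [3, 4, 1, 4, 5, 2, 1]  # sequence of Fig. 3
sp = SPPIFO(2)
for r in arrivals:
    sp.enqueue(r)
print(sp.bounds)                           # [1, 4]
print([list(q) for q in sp.queues])        # [[1, 2, 1], [3, 4, 4, 5]]
print(drain_and_count(sp.queues))          # 1
print(drain_and_count([deque(arrivals)]))  # 5 (single FIFO queue)
```

On the Fig. 3 sequence, the bounds end at 1 and 4, as described in §2. Draining counts one inversion for SP-PIFO against five for a single FIFO queue holding the same packets.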
Rankdistributions(Fig.6)WeanalyzetheperformanceofSP-PIFOunderfouralternativerankdistributions:expo-nential,inverseexponential,Poissonandconvex.SP-PIFOperformsbetterthanFIFOandisclosetothegradient-basedalgorithmforeachdistribution.TheperformanceofSP-PIFOisbetterforrankdistributionsinwhichmoreranksappearinhigher-priorityqueues.ThenumberofinversionsforSP-PIFOinconvexandexponentialdistributionsisonly21–24%higherthanthegradient-basedalgorithm.ThecorrespondingnumbersforPoissonandin-verseexponentialamountto49–55%.Inallcases,SP-PIFOperformsbetween2.5–3.5betterthanaFIFO,withonly8priorityqueues.5ImplementationInthissection,wedescribeourimplementationofSP-PIFOinP416[7]andP414.2Ourimplementationfollowsthealgo-rithmdescribedin§4andspans190(P416)and735(P414)linesofcode.Itperformsthreemainoperations:(i)comput-ing/extractingtherankfromapacketheader;(ii)mappingpacketstoqueues(§2);and(iii)updatingthequeuebounds.RankcomputationWeimplementedandtestedmultiplerankcomputationfunctionssuchasLSTF[17],STFQ[23],andFIFO+[9]inP416.WenotethatthereducedmemoryusageinSP-PIFOleavesroomtocomputeranksdirectlyontheswitch.Thatsaid,mostrankingalgorithmscandirectlybecomputedbytheend-hosts[17].MappingWestorethequeue-boundvaluesinindividualreg-istersandaccessthemsequentiallyusinganif-elsecondi-tionaltree.Foreachregisteraccess,weleveragetheALUtoperformthreeoperations:(i)wereadthequeue-boundvalueandcompareittothepacketrank;(ii)wenotifythequeue-selectionresulttothecontrolowusingasingle-bitmetadata;and(iii)weupdatethequeue-boundvaluetothepacketrankifthequeueisselected.IntheALUofthelastqueue,insteadoftransferringthemappingdecisiontothecontrolowusingabinarymetadata,werstcheckwhetheraninversionhasoccurredbeforetransferringthepotentialinversioncostusinglargermetadata.AdaptationWhenthemappingprocessdetectsaninversion,weneedtoupdateallqueuebounds.WhileaccessingmultipleregistersisnotrestrictedbytheP4specication[10],currentarchitecturesdonotsupportit(amongothers,toguaranteelinerate).Weaddressthisproblembyrelyingonthepa
cket-resubmissionprimitivetoaccessthequeueboundsasecondtimeandupdatethemwiththemeasuredinversioncost.Whileresubmissioncanpossiblybreaktheline-rateguarantees,weonlyrequireitoccasionally,uponinversions. 2TheP414codeisusedforrunningSP-PIFOontheTonoplatform[2].MemoryrequirementsOurimplementationonlyrequiresnre

9 gisterswherenisthenumberofqueues.Welever
gisterswherenisthenumberofqueues.WeleveragenALUstoaccessregistersduringthemappingprocessandn�1additionalALUstoupdateregistersfromtheresubmissionpipelineincaseofinversions.Weusen�1bitsofmetadatatoaccessthemappingresultsofnon-top-priorityqueuesintheirrespectiveALUsfromthecontrolow(i.e.,asingle1-bitmetadataeldforeachqueue)andanextra32-biteldforthetop-priorityqueueto(potentially)transfertheinversioncost.Regardingthenumberofstages,ourimplementationusesmorestagesthanthenumberofqueuesinordertoperformthesequentialaccesstoqueue-boundregistersduringthemap-pingprocess.Notethatalternativedesignswouldbepossiblebutwouldcomeattheexpenseofline-rateguarantees.6EvaluationWenowevaluateSP-PIFOperformanceandpracticality.Werstusepacket-levelsimulationstoevaluatehowSP-PIFOap-proximateswell-knownschedulingobjectivesunderrealistictrafcworkloads(§6.1).WethenevaluateSP-PIFOschedul-ingperformancewhendeployedonhardwareswitches(§6.2).6.1PerformanceanalysisWeconsidertwoschedulingobjectives:(i)minimizingFlowCompletionTimes(FCTs);and(ii)enforcingfairness.Weconsiderthatranksaresetattheendhostsfortheformerobjectiveandcomputedintheswitchforthelatter.Forbothobjectives,weshowthatSP-PIFOschedulingcapabilitiesachievenear-optimalperformance,withaslittleas8queues.MethodologyWeintegratedSP-PIFOinNetbench[3,15],apacket-levelsimulator.Similarto[4],weusealeaf-spinetopologywith144serversconnectedthrough9leafand4spineswitches.Wesettheaccessandleaf-spinelinksto1Gbpsand4Gbps,respectively.Thisresultsinatheoreticalend-to-endRound-Trip-Time(RTT)of32.12µswhencross-ingthespine(4hops)and26µsundertheleaf(2hops).Wegeneratetrafcowsfollowingtwowidely-usedheavy-tailedworkloads:pFabricwebapplicationanddatamining[4].FlowarrivalsarePoisson-distributedandweadapttheirstartingratestoachievedifferentutilizationlevels.WeuseECMPanddrawsource-destinationpairsuniformlyatrandom.6.1.1MinimizingFlowCompletionTimesRankdenition&benchmarksWeminimizeFCTsbyimplementingthepFabricalgorithm[4]whichsetsthepacketranksaccordingtotheirremainingowsizes
Specifically, we compare pFabric performance when run on top of PIFO and SP-PIFO. We also analyze TCP NewReno with traditional drop-tail queues and DCTCP with ECN-marking drop-tail queues. Our pFabric implementation does not consider starvation prevention. As suggested in [4], we approximate pFabric rate control by using standard TCP with a retransmission timeout (RTO) of 3 RTTs, balancing the difference in RTOs between schemes with the proportional queue size. That is, we use an RTO of 96 µs and 8 queues of 10 packets for SP-PIFO (resp. 1 queue of 80 packets for PIFO), and an RTO of 300 µs and 146 KB drop-tail queues for both TCP and DCTCP, with ECN marking at 14.6 KB, i.e., 10 packets.

Figure 7: pFabric: FCT statistics across different flow sizes in the data mining workload. (a) (0, 100KB): average; (b) (0, 100KB): 99th percentile; (c) [1MB, ∞): average.

Figure 8: pFabric: FCT statistics across different flow sizes in the web search workload. (a) (0, 100KB): average; (b) (0, 100KB): 99th percentile; (c) [1MB, ∞): average.

Summary. Fig. 7 and Fig. 8 depict the average and 99th-percentile FCTs of large (≥1MB) and small (<100KB) flows for both the data mining and web search workloads. We see that SP-PIFO achieves close-to-PIFO performance in both distributions. When comparing performance across flow sizes, we see that SP-PIFO achieves better performance for small flows. This is not surprising, since those flows are mapped into higher-priority queues. As discussed in §4.2, strict-priority schemes provide higher unpifoness protection for packets mapped into higher-priority queues. When comparing the two traffic distributions, we see that SP-PIFO performs better under the data mining workload. This is again expected. While both distributions are heavy-tailed, the data mining one is more skewed [4] and therefore easier to handle for SP-PIFO. Indeed, the probability of having large flows simultaneously sharing the same port (potentially blocking smaller flows) is lower for the data mining workload.

Data mining (Fig. 7). The average FCTs achieved by PIFO and SP-PIFO are similar for small flows, i.e., within 0.4–11%. Concretely, SP-PIFO outperforms DCTCP and TCP by a factor of 2–5 and 8–30, respectively. When considering the 99th percentile, the gap between PIFO and SP-PIFO slightly widens to 9.6–26.6%. Still, SP-PIFO outperforms DCTCP and TCP by a factor of 1.5–4.7 and 12.5–22, respectively. The largest performance gap between PIFO and SP-PIFO occurs at low utilization. In this regime, the number of packets scheduled is low and the transient adaptation of SP-PIFO is more visible. Whenever the utilization is above 40%, the difference is consistently below 20%. Finally, SP-PIFO and PIFO still perform similarly among large flows: within 1.9–9%, representing improvements with respect to TCP and DCTCP by factors of 1.4–2.7 and 1.5–2.8, respectively.

Web search (Fig. 8). The results are similar to the data mining ones, with slightly worse performance for SP-PIFO, especially among big flows. Indeed, since the distribution is less skewed, bigger flows have higher chances of reaching higher-priority queues, blocking transmissions of smaller flows. Still, we see that the performance of SP-PIFO is within 16.54–32.5% of PIFO for small flows, and 1.3–4.4× and 4.7–16.7× better than DCTCP and TCP, respectively. Even at the 99th percentile, the difference between SP-PIFO and PIFO stays within 20.7–32%. Note that, while the percentages might seem high, the absolute values we are looking at are very small.
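The statistics reported in Fig. 7 and Fig. 8 (average and 99th-percentile FCT for small flows, average FCT for large flows) and the relative gaps quoted above could be derived from per-flow results along the lines of the sketch below. The (size, FCT) record format, the nearest-rank percentile convention, and reading 100KB as 100,000 bytes and 1MB as 1,000,000 bytes are assumptions; the paper does not describe how the simulator output is post-processed.

```python
import math


def percentile(values, pct):
    # Nearest-rank percentile (one of several common conventions).
    ordered = sorted(values)
    index = max(0, math.ceil(pct * len(ordered) / 100) - 1)
    return ordered[index]


def fct_summary(flows):
    # flows: iterable of (size_bytes, fct_seconds) records; materialize it once.
    flows = list(flows)
    small = [fct for size, fct in flows if 0 < size < 100_000]
    large = [fct for size, fct in flows if size >= 1_000_000]
    return {
        "small_avg": sum(small) / len(small) if small else None,
        "small_p99": percentile(small, 99) if small else None,
        "large_avg": sum(large) / len(large) if large else None,
    }


def relative_gap(candidate, reference):
    # "SP-PIFO is within x% of PIFO" read as (SP-PIFO - PIFO) / PIFO.
    return (candidate - reference) / reference


records = [(50_000, 1.0), (80_000, 3.0), (500_000, 5.0), (2_000_000, 10.0)]
print(fct_summary(records))  # {'small_avg': 2.0, 'small_p99': 3.0, 'large_avg': 10.0}
print(relative_gap(1.1, 1.0))
```

Flows between the two buckets (here the 500 KB flow) are simply not reported, mirroring the two size ranges shown in the figures.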
Figure 9: Fairness: FCT statistics for all flows at different loads, over the web search workload. (a) (0, 100KB): average on 8 queues; (b) (0, 100KB): average on 32 queues; (c) FCT breakdown at 70%: average on 32 queues.

6.1.2 Enforcing fairness across flows

Rank definition & benchmarks. We enforce fairness across flows by implementing the Start-Time Fair Queueing (STFQ) rank design [13] on top of PIFO and SP-PIFO. We benchmark our solution against AFQ [21] (§8). We analyze the performance for different flow sizes and numbers of queues. Specifically, we use 8 queues of 10 packets in SP-schemes (resp. 1 queue of 80 packets for single-queue schemes) and 32 queues of 10 packets in SP-schemes (resp. 1 queue of 320 packets for single-queue schemes). For AFQ, we select the bytes-per-round (BpR) parameter which gives the best performance. In our testbed, this is 320 and 80 BpR for the 8-queue and 32-queue scenarios, respectively. As in [21], we use DCTCP as the transport layer for AFQ, PIFO and SP-PIFO (with an RTO of 300 µs). We set ECN marking to 48 KB, i.e., 32 packets. We generate traffic following the pFabric web search distribution.

Summary. Fig. 9a and Fig. 9b depict the average FCTs of small flows across different levels of utilization, when 8 and 32 queues are used, respectively. Fig. 9c depicts the FCTs across flow sizes at 70% utilization and for 32 queues. In all cases, SP-PIFO achieves near-PIFO behavior and is on par with AFQ (the current state of the art).

Impact of the utilization (Fig. 9a & Fig. 9b). SP-PIFO stays within 23–28% (resp. 21–28%) of ideal PIFO across all levels of utilization when 8 (resp. 32) queues are used. Even at the highest utilizations, the gap is consistently below 26% (resp. 25%). SP-PIFO performance is at the level of AFQ, within 3–10% (resp. 0.5–11%), yielding improvements of 1.4–2.3× and 2.7–4.2× (resp. 1.4–2.3× and 3.7–7.4×) over DCTCP and TCP. The fact that SP-PIFO performance is equivalent with 8 and 32 queues is not surprising: as the bandwidth-delay product is low, only a small queue size is required for efficient link utilization.

Impact of flow sizes (Fig. 9c). At 70% utilization, we see that SP-PIFO lies within 10–30% of PIFO performance for all flow sizes and is on par with AFQ. The only exception is very small flows (10 KB), for which AFQ performs 20% better. For small flows, SP-PIFO improves on DCTCP and TCP by 1.5–3× and 2–13×, respectively. Considering the 99th percentile, we see that SP-PIFO stays within 8–35% of PIFO and improves by 12–78% and by a factor of 1.5–10.76 with respect to DCTCP and TCP, respectively.

Impact of the number of queues (Fig. 10). We analyze the impact of the number of queues on average FCTs for both AFQ and SP-PIFO. We set the BpR
at MSS for all queue configurations, as in [21], which prevents AFQ from dropping packets too often when fewer queues are used. We see that, while AFQ is more sensitive to the number of queues, SP-PIFO preserves a similar level of performance, without any configuration or prior traffic knowledge.

Figure 10: Fairness: FCT statistics for all flows at different loads, when the number of queues is modified. (a) All: average; (b) All: average.

6.2 Hardware testbed

We finally evaluate our hardware-based implementation of SP-PIFO on the Barefoot Tofino Wedge 100BF-32X platform [2]. We perform two experiments. First, we analyze the bandwidth allocated by SP-PIFO to flows with different ranks when scheduled over a bottleneck link. Second, we measure the impact on FCTs when SP-PIFO runs pFabric. We show that SP-PIFO efficiently schedules traffic at potentially Tbps rates.

Bandwidth shares. We transmit 8 UDP flows of 20 Gbps between two servers. We generate the flows progressively, in increasing order of priority (decreasing rank). We use 4 priority queues and schedule the flows over a 10 Gbps interface. We generate the flows using MoonGen [12] and use an intermediate switch to amplify them to the required throughput. Fig. 11 depicts the flows' bandwidth and how SP-PIFO manages to virtually extend the number of queues. As expected, the first 4 flows receive the complete bandwidth, since they are mapped to dedicated queues. As the number of flows exceeds the number of queues, flows start to share queue space and see a reduced bandwidth.

Flow completion times. We simultaneously generate 1000 TCP flows of different sizes, going from 1 GB to 100 GB in steps of 100 MB, and schedule them over a bottleneck link of 7 Gbps. We set the rank of each flow to the absolute flow size,
following [4]. We compare the FCTs achieved by SP-PIFO scheduling with those achieved by a FIFO queue. Fig. 12 shows the resulting FCTs. As expected, the FIFO queue leads to increased FCTs because it does not consider flow size. In contrast, SP-PIFO prioritizes short flows over long ones, minimizing their FCTs and the overall transmission time.

Figure 11: Tofino: Bandwidth allocation under progressive flow generation with increasing priorities.

7 Discussion

In this section, we discuss the limitations of SP-PIFO and how we can mitigate them. We first discuss intrinsic limitations that come from using PIFO as a scheduling scheme. We then discuss specific limitations of SP-PIFO together with the problem of adversarial workloads. Finally, we suggest potential hardware primitives that could facilitate PIFO implementations in the future.

PIFO-inherited limitations. Individual PIFO queues suffer from two main limitations. First, they cannot rate-limit their egress throughput, which prevents them from implementing non-work-conserving scheduling algorithms. SP-PIFO shares the same limitation. Second, PIFO queues cannot directly implement hierarchical scheduling algorithms. Yet, as proposed by [23], multiple SP-PIFO schemes (i.e., using different sets of priority queues) can be grouped as a tree to approximate hierarchical scheduling algorithms. The key challenge consists
in figuring out how to allow access to multiple queues with existing traffic manager capabilities. While this is orthogonal to this paper, one option would be to recirculate packets, enabling access to the traffic manager (and therefore the queues) multiple times in the same pipeline. Doing so while limiting the impact on performance is an interesting open question.

Figure 12: Tofino: FCT statistics across different flow sizes with pFabric ranks.

SP-PIFO-specific limitations. The main limitation of SP-PIFO is that, as an approximation scheme, it cannot guarantee to perfectly emulate the behavior of a theoretical PIFO queue for all ranks. We note two things. First, our evaluation (§6) shows that, for realistic workloads, SP-PIFO performance is often on par with PIFO performance. Second, SP-PIFO can provide strong PIFO-like guarantees for some ranks by dedicating some queues to them, at the price of reduced performance for the other ranks. We believe this is an interesting tradeoff, as current switches can support up to 32 queues per port [21].

Adversarial workloads. We have argued that, on average, SP-PIFO can adapt to any kind of rank distribution. This has certain limitations. First, we assume that all queues are drained at some point. Nonetheless, a malicious host could send a large number of high-priority packets and, as a result, packets in lower-priority queues would never be drained. Such “starvation” attacks are common to any type of priority scheme. For instance, a malicious host could try to grab a bigger slice of the network resources by setting ranks to 0 in slack-based algorithms [4, 9, 17] or resetting flow identifiers in fair-queuing schemes [23]. The problem of starvation in strict-priority scheduling is also well known in the context of QoS and is typically addressed by policing high-priority traffic at the edge of the network [18].

Aside from starvation attacks, we also assume that, for a given rank distribution, the particular order of ranks is random. In practice, this is reasonable. While the ranks of individual flows might have some structure (e.g., monotonically-increasing ranks), when various flows are scheduled together, the ordering of their packet ranks is randomized. Yet, attackers could try to coordinate large numbers of flows to create adversarial orderings which “outplay” the adaptation mechanisms (§B.3). Nevertheless, any non-malicious flow which is active at the same time can thwart such strategies by randomly breaking the adversarial order. Aside from that, the network could be monitored to detect such adversarial attacks.

Facilitating PIFO in the future. From a forward-looking perspective, we note some improvements in hardware primitives that would facilitate PIFO implementations in the future. As we already discussed in §5, a higher number of stages would facilitate per-queue state storage, and a higher number of queues would directly increase PIFO performance. Further, multiple and dynamic memory accesses between the ingress and egress pipelines would allow state updates after inversions in the highest-priority queue without having to rely on resubmission techniques. In the same direction, access to queue information from the ingress pipeline, or enhanced flexibility in the management of strict-priority queues directly from the data plane, would enable more accurate unpifoness prediction at enqueue, opening the door to higher-performance SP-PIFO algorithms.

8 Related work

Programmable packet scheduling. While scheduling has been extensively studied over the years, the idea of making it programmable is relatively recent [17, 22]. In [24], Sivaraman et al. suggested programmable scheduling by proving that the best scheduling algorithm to use depends on the desired performance objective. In [17], Mittal et al. made the observation that, although certain algorithms accept configurations to approximate a wide range of objectives, a universal packet scheduling algorithm that outperforms all others in all scenarios does not exist. Several abstractions for programmable scheduling have been proposed since. In addition to PIFO [23], Eiffel [19] presents an alternative queue structure which approximates fine priorities by exploiting the characteristics that define packet ranks in most scenarios, reducing the required computational complexity. In contrast to [19, 23], which rely on new hardware designs, SP-PIFO shows that efficient programmable packet scheduling can be achieved today, at scale, and on existing devices.

Exploiting priority queues. Other (recent) schemes leverage multiple priority queues for specific performance objectives. They highlight the need for programmable scheduling in existing devices [16], and illustrate how rank designs producing close-to-optimal results can already be implemented in existing data planes. For enforcing fairness, FDPA [8] reduces the computational cost of per-flow virtual counters or individual user queues in traditional fair-queuing schemes by using arrival-rate information at a user level. AFQ [21], instead, emulates ideal fair queuing by implementing per-flow counters on a count-min sketch and dynamically rotating priorities in a strict-priority scheme to imitate round-robin behavior. SP-PIFO differs by fixing queue priorities and dynamically adapting the mapping of packets to those queues. This makes SP-PIFO implementable at line rate in existing data planes. pFabric [4] and PIAS [5] show the use of priority queues for flow completion time minimization. While pFabric generally relies on a PIFO-queue design, [4] includes experiments in which flows are mapped to priority queues based on their size. While the pFabric experiments use thresholds fixed from the knowledge of
flow distributions, SP-PIFO adapts the mapping automatically per packet, without requiring any traffic knowledge in advance. PIAS [5] addresses the case of unknown flow sizes and uses Multi-Level Feedback Queues (MLFQ) [11] to achieve the desired Shortest Job First (SJF) behavior, by gradually moving flows from higher- to lower-priority queues as their number of transmitted bytes increases. In contrast to these proposals, SP-PIFO supports a much wider range of performance objectives. SP-PIFO (like PIFO [23]) can be used to implement any scheduling algorithm in which the relative scheduling order does not change with future packet arrivals. As illustrated in the evaluation section (§6), the algorithms presented in AFQ [21], FDPA [8], pFabric [4] and PIAS [5] can be used as ranking designs (i.e., setting packet ranks to scheduling virtual rounds, estimated arrival rates, shortest remaining processing time of flows, or number of packets transmitted under the MLFQ aging design) to be run on top of SP-PIFO.

9 Conclusions

We presented SP-PIFO, a programmable packet scheduler which closely approximates the theoretical behavior of PIFO queues, today, on programmable data planes. The key insight behind SP-PIFO is to dynamically adapt the mapping between packet ranks and a (fixed) set of strict-priority queues. Our evaluation on realistic workloads shows that SP-PIFO is practical: it closely approximates PIFO behaviors and, in many cases, perfectly matches them. We also confirm that SP-PIFO runs on actual programmable hardware. Overall, we believe that our work shows that the benefits of programmable packet scheduling (experimenting with new scheduling algorithms) can be realized today, in existing networks.

Acknowledgments

We are grateful to the NSDI reviewers and our shepherd, Anirudh Sivaraman, for their insightful comments. We also thank the members of the Networked Systems Group at ETH Zürich (especially Edgar Costa Molero), together with Changhoon Kim from Barefoot, for their valuable feedback.
References

[1] Broadcom Trident II. https://www.broadcom.com/products/Switching/DataCenter/BCM56850-Series, 2016.
[2] Barefoot Tofino. http://barefootnetworks.com/products/brief-tofino/, 2017.
[3] Netbench. http://github.com/ndal-eth/netbench, 2018.
[4] Mohammad Alizadeh, Shuang Yang, Milad Sharif, Sachin Katti, Nick McKeown, Balaji Prabhakar, and Scott Shenker. pFabric: Minimal Near-optimal Datacenter Transport. In ACM SIGCOMM, Hong Kong, China, 2013.
[5] Wei Bai, Li Chen, Kai Chen, Dongsu Han, Chen Tian, and Hao Wang. Information-Agnostic Flow Scheduling for Commodity Data Centers. In USENIX NSDI, Oakland, CA, USA, 2015.
[6] Alexander Barkalov, Larysa Titarenko, and Malgorzata Mazurkiewicz. Foundations of Embedded Systems. Springer International Publishing, 2019.
[7] Pat Bosshart, Dan Daly, Glen Gibb, Martin Izzard, Nick McKeown, Jennifer Rexford, Cole Schlesinger, Dan Talayco, Amin Vahdat, George Varghese, and David Walker. P4: Programming Protocol-independent Packet Processors. 2014.
[8] Carmelo Cascone, Nicola Bonelli, Luca Bianchi, Antonio Capone, and Brunilde Sansò. Towards Approximate Fair Bandwidth Sharing via Dynamic Priority Queuing. In IEEE LANMAN, Osaka, Japan, 2017.
[9] David D. Clark, Scott Shenker, and Lixia Zhang. Supporting Real-time Applications in an Integrated Services Packet Network: Architecture and Mechanism. In ACM SIGCOMM, Baltimore, MD, USA, 1992.
[10] The P4 Language Consortium. P4-16 Language Specification, version 1.1.0-rc. https://p4.org/p4-spec/docs/P4-16-v1.1.0-draft.pdf, 2018.
[11] Fernando J. Corbató, Marjorie Merwin-Daggett, and Robert C. Daley. An Experimental Time-sharing System. In ACM AIEE-IRE, New York, NY, USA, 1962.
[12] Sebastian Gallenmüller, Paul Emmerich, Daniel Raumer, and Georg Carle. MoonGen: Software Packet Generation for 10 Gbit and Beyond. In USENIX NSDI, Oakland, CA, USA, 2015.
[13] Pawan Goyal, Harrick M. Vin, and Haichen Chen. Start-time Fair Queueing: A Scheduling Algorithm for Integrated Services Packet Switching Networks. In ACM SIGCOMM, Palo Alto, CA, USA, 1996.
[14] Xin Jin, Xiaozhou Li, Haoyu Zhang, Robert Soulé, Jeongkeun Lee, Nate Foster, Changhoon Kim, and Ion Stoica. NetCache: Balancing Key-Value Stores with Fast In-Network Caching. In ACM SOSP, Shanghai, China, 2017.
[15] Simon Kassing, Asaf Valadarsky, Gal Shahaf, Michael Schapira, and Ankit Singla. Beyond Fat-trees Without Antennae, Mirrors, and Disco-balls. In ACM SIGCOMM, Los Angeles, CA, USA, 2017.
[16] James McCauley, Aurojit Panda, Arvind Krishnamurthy, and Scott Shenker. Thoughts on Load Distribution and the Role of Programmable Switches. In ACM SIGCOMM, New York, NY, USA, 2019.
[17] Radhika Mittal, Rachit Agarwal, Sylvia Ratnasamy, and Scott Shenker. Universal Packet Scheduling. In USENIX NSDI, Santa Clara, CA, USA, 2016.
[18] Juniper Networks. Class of Service Feature Guide for Security Devices. Page 115, 2018.
[19] Ahmed Saeed, Yimeng Zhao, Nandita Dukkipati, Ellen Zegura, Mostafa Ammar, Khaled Harras, and Amin Vahdat. Eiffel: Efficient and Flexible Software Packet Scheduling. In USENIX NSDI, Boston, MA, USA, 2019.
[20] Chuck Semeria. Supporting Differentiated Service Classes: Queue Scheduling Disciplines. In Juniper Networks White Paper, Sunnyvale, CA, USA, 2001.
[21] Naveen Kr. Sharma, Ming Liu, Kishore Atreya, and Arvind Krishnamurthy. Approximating Fair Queueing on Reconfigurable Switches. In USENIX NSDI, Renton, WA, USA, 2018.
[22] Anirudh Sivaraman, Suvinay Subramanian, Anurag Agrawal, Sharad Chole, Shang-Tse Chuang, Tom Edsall, Mohammad Alizadeh, Sachin Katti, Nick McKeown, and Hari Balakrishnan. Towards Programmable Packet Scheduling. In ACM HotNets, Philadelphia, PA, USA, 2015.
[23] Anirudh Sivaraman, Suvinay Subramanian, Mohammad Alizadeh, Sharad Chole, Shang-Tse Chuang, Anurag Agrawal, Hari Balakrishnan, Tom Edsall, Sachin Katti, and Nick McKeown. Programmable Packet Scheduling at Line Rate. In ACM SIGCOMM, Florianopolis, Brazil, 2016.
[24] Anirudh Sivaraman, Keith Winstein, Suvinay Subramanian, and Hari Balakrishnan. No Silver Bullet: Extending SDN to the Data Plane. In ACM HotNets, College Park, MD, USA, 2013.
A Gradient-based algorithm

In this appendix, we detail the greedy iterative algorithm presented in §3.2. We first motivate and prove how the algorithm converges to the optimal solution (A.1). Second, we show how to effectively prune the search space, making computation efficient while preserving convergence (A.2). Finally, we analyze its implementation (A.3) and convergence requirements (A.4).

A.1 Greedy optimization

The algorithm (Alg. 2) iteratively minimizes the risk by adjusting queue bounds, one queue and one step at a time, until reaching convergence. At each iteration, the algorithm predicts, for every q_i, whether moving the bound by one (in either direction) decreases the expected risk, and moves the bound in the direction of maximum decrease. In the following, we discuss first how the algorithm can predict the expected change in risk, and second why checking a single step is sufficient to converge.

Algorithm 2 Greedy optimization
Require: k: step size, q_init: initial bounds
procedure ADAPTATION
    D ← ∅
    q ← q_init                                ▷ Initialize bounds
    for all p: incoming packet do
        D ← D ∪ {rank(p)}                     ▷ Collect samples
        if |D| = k then                       ▷ Adapt bounds
            P ← COMPUTERANKPROBABILITIES(D)
            repeat
                q ← UPDATEMAPPING(q, P)
            until q converges
            D ← ∅                             ▷ Reset samples
function UPDATEMAPPING(q, P)
    for q_i ∈ q do
        Δ+ ← RISKFROMINCREMENT(q_i, P)
        Δ− ← RISKFROMDECREMENT(q_i, P)
        if (Δ+ < 0) and (Δ+ < Δ−) then
            q_i ← q_i + 1
        else if (Δ− < 0) and (Δ− < Δ+) then
            q_i ← q_i − 1
    return q
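A minimal Python sketch of Algorithm 2 follows. Since the risk itself is defined in §3.2, the sketch assumes, consistently with eq. 9 and eq. 10 below, that the risk is the sum of p(r)·p(r')·cost(r, r') over every pair of ranks sharing a queue, and it uses cost(r, r') = |r − r'| as a stand-in cost function. RISKFROMINCREMENT and RISKFROMDECREMENT are evaluated by brute force rather than with the counters of A.2, the highest-priority bound is pinned to rank 0, and a bound only moves if both affected queues keep at least one rank. All of these choices, and the function names, are assumptions of the sketch.

```python
from collections import Counter


def cost(a, b):
    # Stand-in cost: symmetric, non-decreasing in |a - b|, zero iff a == b.
    return abs(a - b)


def compute_rank_probabilities(samples):
    counts = Counter(samples)
    return {rank: count / len(samples) for rank, count in counts.items()}


def risk(bounds, probs, max_rank):
    # Assumed risk: p(r) * p(r') * cost(r, r') over all rank pairs sharing a queue.
    # Queue j holds ranks [bounds[j], bounds[j + 1]); the last one ends at max_rank.
    edges = list(bounds) + [max_rank + 1]
    total = 0.0
    for lo, hi in zip(edges, edges[1:]):
        for a in range(lo, hi):
            for b in range(a + 1, hi):
                total += probs.get(a, 0.0) * probs.get(b, 0.0) * cost(a, b)
    return total


def update_mapping(bounds, probs, max_rank):
    # UPDATEMAPPING: move every bound by at most one rank, in place and in order.
    bounds = list(bounds)
    for i in range(1, len(bounds)):  # bounds[0] stays pinned to rank 0
        upper = bounds[i + 1] if i + 1 < len(bounds) else max_rank + 1
        base = risk(bounds, probs, max_rank)
        d_plus = d_minus = float("inf")
        if bounds[i] + 1 < upper:  # queue i keeps at least one rank
            d_plus = risk(bounds[:i] + [bounds[i] + 1] + bounds[i + 1:], probs, max_rank) - base
        if bounds[i] - 1 > bounds[i - 1]:  # queue i - 1 keeps at least one rank
            d_minus = risk(bounds[:i] + [bounds[i] - 1] + bounds[i + 1:], probs, max_rank) - base
        if d_plus < 0 and d_plus < d_minus:
            bounds[i] += 1
        elif d_minus < 0 and d_minus < d_plus:
            bounds[i] -= 1
    return bounds


def adaptation(packet_ranks, init_bounds, k, max_rank):
    # ADAPTATION: gather k rank samples, then repeat UPDATEMAPPING until stable.
    bounds, samples = list(init_bounds), []
    for rank in packet_ranks:
        samples.append(rank)
        if len(samples) == k:
            probs = compute_rank_probabilities(samples)
            while True:
                new_bounds = update_mapping(bounds, probs, max_rank)
                if new_bounds == bounds:
                    break
                bounds = new_bounds
            samples = []
    return bounds


ranks = list(range(16)) * 10  # uniform ranks 0..15, one window of k = 160 packets
print(adaptation(ranks, [0, 1, 2, 3], k=160, max_rank=15))
```

Every accepted move strictly lowers the risk, so the inner repeat loop always terminates; the bounds it stops at are a local optimum for single-step moves, which is the situation the argument in the rest of A.1 addresses.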
Risk difference. In §3.2, we demonstrated that the risk can be analyzed on a per-queue basis from the cost of mapping packets with different ranks to the same queue. Consequently, changes in the risk resulting from changing the bound vector $\mathbf{q}$ can be analyzed by comparing the risk difference in the affected queues. To be precise, every change of a single element $q_i$ in $\mathbf{q}$ affects two queues, queue $i$ and queue $i-1$, as ranks are either moved from $i$ to $i-1$ (increase in $q_i$) or moved from $i-1$ to $i$ (decrease in $q_i$).

Theorem 1. Let $\bar r = q_i$, and let $Q_i$ be the set of ranks mapped to queue $i$ (before any changes). Increasing $q_i$ by 1 changes the risk by:
$$\Delta_i^+ = p(\bar r)\Big(\sum_{r \in Q_{i-1}} p(r)\,\mathrm{cost}(\bar r, r) - \sum_{r \in Q_i} p(r)\,\mathrm{cost}(r, \bar r)\Big) \tag{9}$$
Let $\bar r = q_i - 1$. Decreasing $q_i$ by 1 changes the risk by:
$$\Delta_i^- = p(\bar r)\Big(\sum_{r \in Q_i} p(r)\,\mathrm{cost}(\bar r, r) - \sum_{r \in Q_{i-1}} p(r)\,\mathrm{cost}(r, \bar r)\Big) \tag{10}$$

Proof. Increasing $q_i$ effectively removes the lowest rank from queue $i$, which now becomes the highest rank in queue $i-1$. As the new highest rank in queue $i-1$, it causes possible inversions, and therefore risk, for all other ranks in queue $i-1$, resulting in the first, positive term in eq. 9. Conversely, as the lowest rank in queue $i$, it was prone to receive inversions from any other element in that queue, contributing a risk in queue $i$ that is removed with the change. This risk reduction results in the second, negative term. The proof for decreasing $q_i$ is symmetrical, with the main difference that now rank $q_i - 1$ is the one moving from queue $i-1$ to queue $i$.

Greedy step. Based on the theory presented, the algorithm computes the risk differences and, for every $q_i$, either:
(a) does not move $q_i$, if neither incrementing nor decrementing reduces the expected risk;
(b) increments $q_i$, if incrementing decreases the risk more than decrementing;
(c) decrements $q_i$, if decrementing decreases the risk more than incrementing.
This effectively prunes the search space. At every iteration, the algorithm only requires a constant number of comparisons, and it does not explore a direction further if a step in that direction increases the risk. In the following, we show why deciding not to explore a direction further after a single step is reasonable.

Theorem 2. Let $\Delta_i^+$ and $\Delta_i^-$ denote the prospective changes in risk from incrementing and decrementing $q_i$ by 1, respectively. Let $\Delta_i^{++}$ and $\Delta_i^{--}$ denote the changes in risk from incrementing and decrementing $q_i$ by more than 1. Let the cost function used to compute the differences be non-decreasing in $|r - \bar r|$, and 0 if and only if $r = \bar r$. Then:
1. If $\Delta_i^+ > 0$, then $\Delta_i^{++} > 0$.
2. If $\Delta_i^- > 0$, then $\Delta_i^{--} > 0$.
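Proof. 1: If $\Delta_i^+ > 0$, then
$$\sum_{r \in Q_{i-1}} p(r)\,\mathrm{cost}(\bar r, r) > \sum_{r \in Q_i} p(r)\,\mathrm{cost}(r, \bar r) \tag{11}$$
Let $\tilde r = q_i + 1$, i.e., the second-lowest rank in queue $i$, which would be moved if we moved the queue bound by more than 1. Moving both $\bar r$ and $\tilde r$ would cause the following change in risk:
$$\Delta_i^{++} = \underbrace{p(\bar r)\Big(\sum_{r \in Q_{i-1}} p(r)\,\mathrm{cost}(\bar r, r) - \sum_{r \in Q_i} p(r)\,\mathrm{cost}(r, \bar r)\Big)}_{(13)} + \underbrace{p(\tilde r)\Big(\sum_{r \in Q_{i-1}} p(r)\,\mathrm{cost}(\tilde r, r) - \sum_{r \in Q_i} p(r)\,\mathrm{cost}(r, \tilde r)\Big)}_{(14)} \tag{12}$$
Note that we can omit the cost between $\bar r$ and $\tilde r$ in eq. 14: as the cost function is by definition symmetric, the additional increase in the left-hand term is exactly equal in magnitude to the additional decrease in the right-hand term, and thus they cancel each other. We therefore omit this term to not clutter the notation.
Next, again by definition of the cost function, if $\tilde r > \bar r > r$, then $\mathrm{cost}(\bar r, r) \le \mathrm{cost}(\tilde r, r)$, and if $r > \tilde r > \bar r$, then $\mathrm{cost}(r, \tilde r) \le \mathrm{cost}(r, \bar r)$. Additionally, the order of arguments in the cost function does not matter, as it is symmetric. Applied to the sums over $Q_{i-1}$ and $Q_i$ in eq. 14, respectively, this gives:
$$\sum_{r \in Q_{i-1}} p(r)\,\mathrm{cost}(\bar r, r) \le \sum_{r \in Q_{i-1}} p(r)\,\mathrm{cost}(\tilde r, r), \qquad \sum_{r \in Q_i} p(r)\,\mathrm{cost}(r, \tilde r) \le \sum_{r \in Q_i} p(r)\,\mathrm{cost}(r, \bar r) \tag{15}$$
In conclusion, the left-hand term in eq. 14 is at least as large as the left-hand term in eq. 13, and the right-hand term in eq. 14 is at most as large as the right-hand term in eq. 13. Consequently, if the bracket in eq. 13 is positive, the bracket in eq. 14 must also be positive, and since probabilities are non-negative, $\Delta_i^{++} > 0$. This proves that if one step increases the risk, two steps will also increase the risk. The exact same procedure can be repeated for larger step sizes, which we omit here.
2: This proof is conceptually identical to the other direction, and we thus omit it. The guiding principle is the same: compared to the previous ranks, moving more than one rank can only cause a higher increase in risk in the queue the ranks are moved to, and a lower decrease in risk in the queue the ranks are taken from. Thus, if moving one rank already causes a higher increase in risk in one queue than the decrease in the other, moving additional ranks does not change this.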
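Eq. 9, eq. 10 and Theorem 2 can be sanity-checked numerically. The sketch below assumes the pairwise risk model implied by the derivation (the sum of p(r)·p(r')·cost(r, r') over rank pairs sharing a queue) and the stand-in cost |r − r'|. It compares the closed forms of eq. 9 and eq. 10 with brute-force risk differences, and counts random configurations in which a single step increases the risk while a longer step in the same direction does not. Function names and the random test setup are illustrative choices, not part of the paper.

```python
import random


def cost(a, b):
    # Stand-in cost: symmetric, non-decreasing in |a - b|, zero iff a == b.
    return abs(a - b)


def queue_ranks(bounds, j, n_ranks):
    # Ranks mapped to queue j: [bounds[j], bounds[j + 1]); the last queue ends at n_ranks.
    hi = bounds[j + 1] if j + 1 < len(bounds) else n_ranks
    return range(bounds[j], hi)


def risk(bounds, probs):
    # Brute-force risk: p(r) p(r') cost(r, r') over rank pairs sharing a queue.
    total = 0.0
    for j in range(len(bounds)):
        ranks = list(queue_ranks(bounds, j, len(probs)))
        for x, a in enumerate(ranks):
            for b in ranks[x + 1:]:
                total += probs[a] * probs[b] * cost(a, b)
    return total


def delta_plus(bounds, i, probs):
    # Eq. 9: r_bar = q_i moves from queue i to queue i - 1.
    r_bar, n = bounds[i], len(probs)
    gain = sum(probs[r] * cost(r_bar, r) for r in queue_ranks(bounds, i - 1, n))
    loss = sum(probs[r] * cost(r, r_bar) for r in queue_ranks(bounds, i, n))
    return probs[r_bar] * (gain - loss)


def delta_minus(bounds, i, probs):
    # Eq. 10: r_bar = q_i - 1 moves from queue i - 1 to queue i.
    r_bar, n = bounds[i] - 1, len(probs)
    gain = sum(probs[r] * cost(r_bar, r) for r in queue_ranks(bounds, i, n))
    loss = sum(probs[r] * cost(r, r_bar) for r in queue_ranks(bounds, i - 1, n))
    return probs[r_bar] * (gain - loss)


def shifted(bounds, i, step):
    return bounds[:i] + [bounds[i] + step] + bounds[i + 1:]


def check(trials=200, n_ranks=12, n_queues=3, seed=7):
    # Returns (max |closed form - brute force|, number of Theorem 2 violations).
    rng = random.Random(seed)
    max_err, violations = 0.0, 0
    for _ in range(trials):
        weights = [rng.random() for _ in range(n_ranks)]
        total = sum(weights)
        probs = [w / total for w in weights]
        bounds = [0] + sorted(rng.sample(range(1, n_ranks), n_queues - 1))
        base = risk(bounds, probs)
        for i in range(1, n_queues):
            upper = bounds[i + 1] if i + 1 < n_queues else n_ranks
            up1 = risk(shifted(bounds, i, 1), probs) - base
            down1 = risk(shifted(bounds, i, -1), probs) - base
            max_err = max(max_err, abs(up1 - delta_plus(bounds, i, probs)),
                          abs(down1 - delta_minus(bounds, i, probs)))
            for s in range(2, upper - bounds[i]):  # Theorem 2, case 1
                if up1 > 1e-9 and risk(shifted(bounds, i, s), probs) - base <= 0:
                    violations += 1
            for s in range(2, bounds[i] - bounds[i - 1]):  # Theorem 2, case 2
                if down1 > 1e-9 and risk(shifted(bounds, i, -s), probs) - base <= 0:
                    violations += 1
    return max_err, violations


print(check())
```

Longer steps are only tried while the queue losing ranks stays non-empty, which matches the setting of the proof.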
Figure 13: Greedy convergence for uniform rank distribution.

Conclusion. We have explained how the greedy algorithm only requires exploring the direction which offers a potential decrease in risk, and we have proved that the risk does not decrease with the distance between ranks (it cannot be better to have a bigger inversion, only equal or worse). This allows the greedy algorithm to quickly decide if a direction is not worth investigating, effectively pruning the search space.

A.2 Efficient computation

As tracking the complete rank distribution at each iteration might be too expensive in terms of memory, and repeating the adaptation until convergence too costly in terms of complexity, we show in the following how the mathematical formulation of the problem allows a simplified implementation which only requires 4 counters per queue. From the empirical probability definition, $p_D(r) = |r_D| / |D|$, we can rewrite eq. 9 and eq. 10 as follows (writing $|r|$ for $|r_D|$):
$$\begin{aligned}
\Delta_i^+ &= \frac{|q_i|}{|D|^2}\Big(\sum_{r \in Q_{i-1}} |r|\,\mathrm{cost}(q_i, r) - \sum_{r \in Q_i} |r|\,\mathrm{cost}(r, q_i)\Big)\\
\Delta_i^- &= \frac{|q_i - 1|}{|D|^2}\Big(\sum_{r \in Q_i} |r|\,\mathrm{cost}(q_i - 1, r) - \sum_{r \in Q_{i-1}} |r|\,\mathrm{cost}(r, q_i - 1)\Big)
\end{aligned} \tag{16}$$
Since the queue bound $q_i$ stays constant throughout the adaptation window, each of the summations in eq. 16 can be implemented through a counter which gets updated every time a new packet arrives, with its carried rank. Note that the number of counters required increases linearly with the number of queues. Also, observe that the counters in eq. 16 only allow the computation of one step in the gradient. However, this is enough since, as can be seen in Fig. 13, the one-step version manages to converge in practice.

Figure 14: Greedy algorithm adaptation microbenchmark. (a) Convergence vs. adaptation window; (b) convergence vs. number of queues; (c) convergence vs. rank range.

A.3 Implementation requirements

With the computation presented in A.2, implementing the gradient-based algorithm on top of n priority queues requires n registers for queue-bound storage and 4n registers for the gradient computation. The mapping process (§2) requires packets to potentially read all the queue-bound values (i.e., for packets scheduled in the highest-priority queue). Similarly, while most packets only need to update the two counters corresponding to their queue, the k-th packet in each sequence needs to access all counters to perform the adaptation decision. This implies being able to read n + 4n different registers for a single packet (without even considering the updates). Since existing devices only support up to 12–16 stages, with a single register access per stage [14], the implementation of the greedy algorithm is not feasible for a practical number of queues (i.e., n ≥ 8).

A.4 Convergence analysis

We now show how the greedy algorithm's performance varies when modifying the three main degrees of freedom: (i) the adaptation window (i.e., the number of packets that are monitored before the adaptation mechanism is executed); (ii) the number of queues available in the strict-priority scheme; and (iii) the number of ranks in the distribution. To do so, we analyze the unpifoness evolution of a single switch running the greedy algorithm for a uniform rank distribution from 0 to 100 until convergence. We compute unpifoness as specified in §3.1, based on the packets scheduled and the queue
bounds used during the adaptation window.

Effects of varying the adaptation window. Fig. 14a shows the unpifoness evolution when we run the greedy algorithm on top of a strict-priority scheme of 8 queues and vary the adaptation window from 50 to 7000 packets. We observe that, for the algorithm to converge, the adaptation window needs to be broad enough to cover a complete sample of the rank distribution (i.e., one that characterizes all its representative behaviors). In our case, no adaptation window below 100 packets can completely characterize the rank distribution. Indeed, Fig. 14a depicts how the greedy algorithm correctly converges as soon as more than 200 packets are monitored per iteration. In general, the broader the adaptation window, the more precise the rank distribution estimate, and the better the adaptation decision. However, while a too narrow adaptation window can miss important information about the rank distribution and break convergence guarantees, a too broad adaptation window can make the algorithm too slow to converge, negatively impacting performance. Finally, the greedy algorithm only converges if the rank distribution varies more slowly than the algorithm adapts (i.e., the rank distribution is stable during the time it takes for the algorithm to converge). Relating this to the previous point, simpler rank distributions, which require narrower adaptation windows, can afford higher levels of variability. In contrast, complex distributions take longer to adapt to and must remain stable for longer for the algorithm to converge.

Effects of varying the number of queues. Fig. 14b depicts the case in which we fix an adaptation window of 1000 packets and vary the number of queues from 8 to 32. All queues have a constant size of 10 packets. We see that the higher the number of queues, the lower the unpifoness, and the better the PIFO approximation. This is expected, since each queue can be perceived as an opportunity to sort packets with different ranks, and therefore to reduce the number of inversions. Also, we can see that the number of iterations required by the algorithm to converge does not directly depend on the number of queues. This results from the fact that each adaptation decision analyzes (and, if required, updates) potential redesigns for all the different queue bounds.

Effects of varying the number of ranks. Fig. 14c presents the effects of modifying the range of the uniform rank distribution from 100 to 1000 ranks, when we fix the number of queues to 8 and the adaptation window to 1000 packets. As expected, under the same number of queues, a higher number of ranks implies an increase in unpifoness. Also, as the rank range gets closer to the adaptation window, the distribution estimates get worse, and the adaptation gets tougher.

B Theoretical analysis of SP-PIFO

SP-PIFO is a highly-dynamic probabilistic system. In particular, its queue bounds $\mathbf{q}$ change with nearly every incoming packet. Nevertheless, in this section we show that the system has an attractive equilibrium $\mathbf{q}^*$ (B.1) and how this equilibrium balances the different causes of inversions (B.2), and we discuss the limitations and open questions of our analysis (B.3).

B.1 Stable equilibrium

Queue-bound dynamics. Consider SP-PIFO as a discrete-time system, where each time step corresponds to an arriving packet. Let $\mathbf{q}^t$ be the queue bounds at step $t$, when the $t$-th packet arrives. Then, the queue bounds at step $t+1$ are:
$$\mathbf{q}^{t+1} = \mathbf{q}^{t} + \Delta(r^t) \tag{17}$$
where $r^t$ is the rank of the $t$-th packet, and $\Delta(r^t)$ is the change this packet causes on the queue bounds. The queue-bound change is given by the “push-down” and “push-up” stages of SP-PIFO, respectively. If the packet causes an inversion in the highest-priority queue, all queue bounds are decreased by $q_1^t - r^t$. Otherwise, there is exactly one queue $i$ such that $q_i^t \le r^t < q_{i+1}^t$, and only $q_i$ is set to $r^t$, or equivalently, is increased by $r^t - q_i^t$. Finally, let $p(r^t)$ be the probability of rank $r$ for the $t$-th packet. Then, the expected value of the queue bounds at step $t+1$, and the expected difference to the queue bounds at step $t$, are, respectively:³
$$\begin{aligned}
\mathbb{E}\big[q_i^{t+1}\big] = {} & \mathbb{E}\big[q_i^{t}\big] && (18)\\
& + \underbrace{\sum_{q_i^t \le r^t < q_{i+1}^t} p(r^t)\,(r^t - q_i^t)}_{\Delta_i^+(\mathbf{q}^t,\, r^t)} && (19)\\
& - \underbrace{\sum_{r^t < q_1^t} p(r^t)\,(q_1^t - r^t)}_{\Delta^-(\mathbf{q}^t,\, r^t)} && (20)
\end{aligned}$$
$$\mathbb{E}\big[q_i^{t+1} - q_i^t\big] = \Delta_i^+(\mathbf{q}^t, r^t) - \Delta^-(\mathbf{q}^t, r^t) \tag{21}$$
³ For queue $i = n$, there is no $q_{i+1}^t$ and there is no upper bound on $r^t$.

Equilibrium. As expected, we can see from eq. 21 that the change of queue bounds is determined by the “push-up” ($\Delta_i^+$) and “push-down” ($\Delta^-$) stages working against each other. Indeed, if $\Delta_i^+$ is larger than $\Delta^-$, the queue bound increases, and vice versa. The system has an equilibrium $\mathbf{q}^*$, where $\Delta_i^+ = \Delta^-$ and the expected change is 0. Note that this equilibrium depends on the rank probability.

Attraction. The equilibrium $\mathbf{q}^*$ is attractive, i.e., if $q_i^t < q_i^*$, then $\mathbb{E}[q_i^{t+1} - q_i^t] > 0$, and vice versa. For small perturbations, this is straightforward. Assume that all queue bounds are in equilibrium, except $q_i$. If $q_i^t < q_i^*$, then $\Delta_i^+(\mathbf{q}^t, r^t) > \Delta_i^+(\mathbf{q}^*, r^t)$, because the sum in eq. 19 has (i) more (non-negative) terms; and (ii) each term is weighted more strongly, as the difference $r^t - q_i^t$ is larger. On the other hand, $\Delta^-(\mathbf{q}^t, r^t)$ is either equal to $\Delta^-(\mathbf{q}^*, r^t)$ (for $i > 1$) or even smaller (for $i = 1$, as there are fewer, and less strongly weighted, terms in the sum in eq. 20). Thus, the increase is larger than the decrease, and the expected change to $q_i$ is positive. The argument for $q_i^t > q_i^*$ is symmetrical.

For larger disturbances, the equilibrium is also attractive, but it might take more than a single time step, as the “push-up” stage for $q_i$ also depends on $q_{i+1}$: if both $q_i^t < q_i^*$ and $q_{i+1}^t < q_{i+1}^*$, the “push-up” might be too weak to pull $q_i$ towards the equilibrium. However, this is not the case for the lowest-priority queue bound $q_n$, for which the “push-up” does not depend on another queue. Thus, lower-priority queue bounds (at least $q_n$) might be pulled towards the equilibrium at first, while other $q_i$ are not. Notice that an expected increase of $q_{i+1}^t$ strengthens the “push-up” mechanism for $q_i^{t+1}$ and weakens it for $q_{i+1}^{t+1}$ (eq. 19). Eventually, as the lower-priority queue bound gets closer to the equilibrium, the higher-priority queue bound is also pulled towards the equilibrium. This continues up to the highest-priority queue, where an expected increase of $q_1^t$ also strengthens the “push-down” mechanism for all bounds at step $t+1$ (eq. 20). As a result, over multiple time steps, the expected effects of the “push-up” and “push-down” stages equalize, eventually pulling every $q_i$ towards $q_i^*$.

B.2 Balance

As explained in §4, there are three main reasons for unpifoness: (i) inversions in the highest-priority queue, after which all queue bounds are decreased; (ii) inversions in a lower-priority queue after its queue bound has been decreased; (iii) inversions in a lower-priority queue, if its highest rank “overtakes” the lowest rank of a higher-priority queue. As we can see in eq. 19, eq. 20, and eq. 21, all these factors play a role in the dynamics of SP-PIFO. At the equilibrium, the probability of a “push-down”, which is exactly the probability of an inversion in the highest-priority queue (weighted by its severity), is equalized with the probability of a packet being mapped to any other queue (again weighted; more on this below). While this does not directly correspond to inversions, the more packets are mapped to lower-priority queues, the higher the probability of an inversion in those queues after a “push-down”. SP-PIFO thus keeps a balance between inversions (i) and (ii), as decreasing (i) would require a stronger “push-down”, which would then increase (ii), and vice versa. Finally, as mentioned above, the ranks in a queue are weighted by how far away they are from the queue bound ($r^t - q_i^t$). This penalizes long (in terms of distinct ranks) queues, which helps to reduce (iii): the probability of one queue “overtaking” another increases the further the actual queue bound is from the highest-rank packet in the queue, a distance which grows with the length of the queue.
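The queue-bound dynamics of B.1 can be reproduced with the small model below, which only tracks the bounds and the queue each packet is mapped to; buffering and dequeueing are left out. It follows the description above: scanning from the lowest-priority queue upwards, the packet goes to the first queue whose bound does not exceed its rank, and that bound is set to the rank (push-up); if no bound qualifies, the packet goes to the highest-priority queue and every bound is decreased by q_1 − r (push-down). The class and function names are hypothetical, and starting all bounds at 0 is an assumption of this sketch.

```python
import random


class SPPIFOBounds:
    """Queue-bound model of SP-PIFO; index 0 is the highest-priority queue."""

    def __init__(self, n_queues):
        self.bounds = [0] * n_queues  # assumed initial bounds

    def enqueue(self, rank):
        # Push-up: scan from the lowest-priority queue towards the highest one and
        # take the first queue whose bound is <= rank; its bound becomes the rank.
        for i in range(len(self.bounds) - 1, -1, -1):
            if self.bounds[i] <= rank:
                self.bounds[i] = rank
                return i
        # Push-down: no bound is <= rank (in particular q_1 > rank), which is treated
        # as an inversion in queue 0. Every bound drops by q_1 - rank.
        shift = self.bounds[0] - rank
        self.bounds = [b - shift for b in self.bounds]
        return 0


def simulate(n_queues, n_packets, n_ranks, seed=1):
    # Feed uniformly random ranks and count how many packets each queue receives.
    rng = random.Random(seed)
    model = SPPIFOBounds(n_queues)
    per_queue = [0] * n_queues
    for _ in range(n_packets):
        per_queue[model.enqueue(rng.randrange(n_ranks))] += 1
    return model.bounds, per_queue


print(simulate(n_queues=8, n_packets=10_000, n_ranks=100))
```

Recording the bounds after every packet in such a run shows them moving up and down under the two stages; the exact trajectory depends on the random seed and rank distribution.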
B.3 Assumptions and limitations

The analysis presented above relies on a few assumptions which we argue are justified, yet which pose some open questions. First, we assume that there exists a finite distribution of ranks. This holds in practice: since ranks need to be processed and stored in hardware, which offers restricted resources, rank ranges must have a limited size. Second, although SP-PIFO can rapidly adapt to varying rank distributions (in particular, faster than the greedy algorithm), we assume that the rank distribution is stable enough for an equilibrium to exist at all. It remains an open question whether there is a point at which the rank distribution varies too fast for the system to actually converge to an equilibrium. In that (hypothetical) case, the analysis presented here would not provide any additional insight into the performance of SP-PIFO. Finally, we assume that the ranks appear in random order, independently from each other. At first glance, this may seem unreasonable, as many scheduling algorithms assign ranks to the packets of a given flow in a structured way. Nevertheless, in practical scenarios many flows are scheduled together, and even though the ranks of individual flows might be structured, the combined ranks of packets across flows become randomized.

Adversarial workloads. Based on the previous assumptions, we have shown that SP-PIFO is attracted towards an expected equilibrium in which the different sources of unpifoness are balanced. However, there are also some limitations. On the one hand, this equilibrium exists only in expectation, and the queue bounds are also only attracted to it in expectation. The actual queue bounds depend on the order in which packets arrive, as do inversions. So, even though the system might be balanced on average, assuming a random rank ordering, there exist particular adversarial rank orderings which “outplay” the two stages to create events of large unpifoness. An adversary might attempt to abuse this by coordinating a large number of flows to force an adversarial ordering of packet ranks. As an example, she might try to increase all queue bounds as much as possible before triggering a “push-down” reaction (e.g., by generating sequences of monotonically increasing packet ranks). With the sudden decrease in queue-bound values, the high-rank packets already mapped to the queues would generate inversions with respect to the new packets. Nevertheless, any non-malicious coexisting flow can easily thwart such strategies by just randomly breaking the adversarial order. Still, it might be interesting to classify all adversarial orderings, and subsequently monitor the network to actively detect such
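A toy simulation can make the adversarial effect concrete. The Python sketch below is not the paper's evaluation methodology: it assumes two strict-priority queues with bounds initialised to 0, that every packet is enqueued before any dequeue (so the queues drain one after another in priority order), and it counts inversions as pairs in which a higher-rank packet leaves before a lower-rank one. The rank sequence and the names `sp_pifo_enqueue`, `output_order`, and `inversions` are hypothetical.

```python
def sp_pifo_enqueue(queues, bounds, rank):
    """Map one packet to a strict-priority queue and adapt the bounds.

    queues[0] / bounds[0] belong to the highest-priority queue.
    """
    for i in range(len(queues) - 1, -1, -1):  # lowest priority first
        if rank >= bounds[i]:
            queues[i].append(rank)
            bounds[i] = rank  # push-up
            return
    cost = bounds[0] - rank  # rank below every bound
    queues[0].append(rank)
    for i in range(len(bounds)):
        bounds[i] -= cost  # push-down of all bounds


def output_order(ranks, n_queues):
    """Enqueue every rank, then drain the queues in strict-priority order."""
    queues = [[] for _ in range(n_queues)]
    bounds = [0] * n_queues  # assumed initial bounds
    for r in ranks:
        sp_pifo_enqueue(queues, bounds, r)
    return [r for q in queues for r in q]


def inversions(order):
    """Pairs in which a higher-rank packet leaves before a lower-rank one."""
    return sum(
        1
        for a in range(len(order))
        for b in range(a + 1, len(order))
        if order[a] > order[b]
    )


if __name__ == "__main__":
    ramp_then_drop = [5, 3, 6, 4, 0, 1, 2]  # hypothetical adversarial ordering
    for ranks in (ramp_then_drop, sorted(ramp_then_drop)):
        order = output_order(ranks, n_queues=2)
        print(ranks, "->", order, "inversions:", inversions(order))
```

In the hypothetical ordering, the interleaved increasing ranks (3, 4 for queue 1 and 5, 6 for queue 2) raise the bounds to (4, 6) before the rank-0 packet triggers a push-down to (0, 2). Ranks 0 and 1 then land behind 3 and 4 in the highest-priority queue, and rank 2 lands behind 5 and 6, giving the output order [3, 4, 0, 1, 5, 6, 2] with 8 inversions; the same ranks in increasing order produce no inversions in this model.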
