Congestion Avoidance and Control

Van Jacobson, Lawrence Berkeley Laboratory
Michael J. Karels, University of California at Berkeley
November, 1988

Introduction

Computer networks have experienced an explosive growth over the past few years and with that growth have come severe congestion problems. For example, it is now common to see internet gateways drop 10% of the incoming packets because of local buffer overflows. Our investigation of some of these problems has shown that much of the cause lies in transport protocol implementations (not in the protocols themselves): The 'obvious' ways to implement a window-based transport protocol can result in exactly the wrong behavior in response to network congestion. We give examples of 'wrong' behavior and describe some simple algorithms that can be used to make right things happen. The algorithms are rooted in the idea of achieving network stability by forcing the transport connection to obey a 'packet conservation' principle. We show how the algorithms derive from this principle and what effect they have on traffic over congested networks.

In October of '86, the Internet had the first of what became a series of 'congestion collapses'. During this period, the data throughput from LBL to UC Berkeley (sites separated by 400 yards and two IMP hops) dropped from 32 Kbps to 40 bps. We were fascinated by this sudden factor-of-thousand drop in bandwidth and embarked on an investigation of why things had gotten so bad. In particular, we wondered if the 4.3BSD (Berkeley UNIX) TCP was mis-behaving or if it could be tuned to work better under abysmal network conditions. The answer to both of these questions was "yes".

Note: This is a very slightly revised version of a paper originally presented at SIGCOMM '88 [12]. If you wish to reference this work, please cite [12]. This work was supported in part by the U.S. Department of Energy under Contract Number DE-AC03-76SF00098. This work was supported by the U.S. Department of Commerce, National Bureau of Standards, under Grant Number 60NANB8D0830.
Since that time, we have put seven new algorithms into the 4BSD TCP:
(i) round-trip-time variance estimation
(ii) exponential retransmit timer backoff
(iii) slow-start
(iv) more aggressive receiver ack policy
(v) dynamic window sizing on congestion
(vi) Karn's clamped retransmit backoff
(vii) fast retransmit
Our measurements and the reports of beta testers suggest that the final product is fairly good at dealing with congested conditions on the Internet.

This paper is a brief description of (i)–(v) and the rationale behind them. (vi) is an algorithm recently developed by Phil Karn of Bell Communications Research, described in [16]. (vii) is described in a soon-to-be-published RFC (ARPANET "Request for Comments"). Algorithms (i)–(v) spring from one observation: The flow on a TCP connection (or ISO TP-4 or Xerox NS SPP connection) should obey a 'conservation of packets' principle. And, if this principle were obeyed, congestion collapse would become the exception rather than the rule. Thus congestion control involves finding places that violate conservation and fixing them.

By 'conservation of packets' we mean that for a connection 'in equilibrium', i.e., running stably with a full window of data in transit, the packet flow is what a physicist would call 'conservative': A new packet isn't put into the network until an old packet leaves. The physics of flow predicts that systems with this property should be robust in the face of congestion. Observation of the Internet suggests that it was not particularly robust. Why the discrepancy?

Note: A conservative flow means that for any given time, the integral of the packet density around the sender–receiver–sender loop is a constant. Since packets have to 'diffuse' around this loop, the integral is sufficiently continuous to be a Lyapunov function for the system. A constant function trivially meets the conditions for Lyapunov stability so the system is stable and any superposition of such systems is stable. (See [3], chap. 11–12 or [21], chap. 9 for excellent introductions to system stability theory.)

There are only three ways for packet conservation to fail:
1. The connection doesn't get to equilibrium, or
2. A sender injects a new packet before an old packet has exited, or
3. The equilibrium can't be reached because of resource limits along the path.
In the following sections, we treat each of these in turn.

1 Getting to Equilibrium: Slow-start

Failure (1) has to be from a connection that is either starting or restarting after a packet loss. Another way to look at the conservation property is to say that the sender uses acks as a 'clock' to strobe new packets into the network. Since the receiver can generate acks no faster than data packets can get through the network, the protocol is 'self clocking' (fig. 1). Self clocking systems automatically adjust to bandwidth and delay variations and have a wide dynamic range (important considering that TCP spans a range from 800 Mbps Cray channels to 1200 bps packet radio links). But the same thing that makes a self-clocked system stable when it's running makes it hard to start: to get data flowing there must be acks to clock out packets, but to get acks there must be data flowing.

Figure 1: Window Flow Control 'Self-clocking'
[Diagram: sender and receiver on fast networks joined by a slower long-haul link; packet and ack spacings marked $P_b$, $P_r$, $A_r$ and $A_s$.]
This is a schematic representation of a sender and receiver on high bandwidth networks connected by a slower, long-haul net. The sender is just starting and has shipped a window's worth of packets, back-to-back. The ack for the first of those packets is about to arrive back at the sender (the vertical line at the mouth of the lower left funnel). The vertical dimension is bandwidth, the horizontal dimension is time. Each of the shaded boxes is a packet. Bandwidth × Time = Bits so the area of each box is the packet size. The number of bits doesn't change as a packet goes through the network so a packet squeezed into the smaller long-haul bandwidth must spread out in time. The time $P_b$ represents the minimum packet spacing on the slowest link in the path (the bottleneck). As the packets leave the bottleneck for the destination net, nothing changes the inter-packet interval so on the receiver's net packet spacing $P_r = P_b$. If the receiver processing time is the same for all packets, the spacing between acks on the receiver's net $A_r = P_r = P_b$. If the time slot $P_b$ was big enough for a packet, it's big enough for an ack so the ack spacing is preserved along the return path. Thus the ack spacing on the sender's net $A_s = P_b$. So, if packets after the first burst are sent only in response to an ack, the sender's packet spacing will exactly match the packet time on the slowest link in the path.

To start the 'clock', we developed a slow-start algorithm to gradually increase the amount of data in transit. Although we flatter ourselves that the design of this algorithm is rather subtle, the implementation is trivial: one new state variable and three lines of code in the sender:
- Add a congestion window, cwnd, to the per-connection state.
- When starting or restarting after a loss, set cwnd to one packet.
- On each ack for new data, increase cwnd by one packet.
- When sending, send the minimum of the receiver's advertised window and cwnd.

Note: Slow-start is quite similar to the CUTE algorithm described in [14]. We didn't know about CUTE at the time we were developing slow-start but we should have: CUTE preceded our work by several months.

Note: When describing our algorithm at the Feb., 1987, Internet Engineering Task Force (IETF) meeting, we called it soft-start, a reference to an electronics engineer's technique to limit in-rush current. The name slow-start was coined by John Nagle in a message to the IETF mailing list in March, '87. This name was clearly superior to ours and we promptly adopted it.

Actually, the slow-start window increase isn't that slow: it takes time $R \log_2 W$, where $R$ is the round-trip-time and $W$ is the window size in packets (fig. 2), because every round trip doubles the window. This means the window opens quickly enough to have a negligible effect on performance, even on links with a large bandwidth–delay product. And the algorithm guarantees that a connection will source data at a rate at most twice the maximum possible on the path. Without slow-start, by contrast, when 10 Mbps Ethernet hosts talk over the 56 Kbps Arpanet via IP gateways, the first-hop gateway sees a burst of eight packets delivered at 200 times the path bandwidth. This burst of packets often puts the connection into a persistent failure mode of continuous retransmissions (figures 3 and 4).

Figure 2: The Chronology of a Slow-start
[Diagram: the timeline cut into one-round-trip-time rows (0R, 1R, ...), with scale marks for One Round Trip Time and One Packet Time.]
The horizontal direction is time. The continuous time line has been chopped into one-round-trip-time pieces stacked vertically with increasing time going down the page. The grey, numbered boxes are packets. The white numbered boxes are the corresponding acks. As each ack arrives, two packets are generated: one for the ack (the ack says a packet has left the system so a new packet is added to take its place) and one because an ack opens the congestion window by one packet. It may be clear from the figure why an add-one-packet-to-window policy opens the window exponentially in time. If the local net is much faster than the long haul net, the ack's two packets arrive at the bottleneck at essentially the same time. These two packets are shown stacked on top of one another (indicating that one of them would have to occupy space in the gateway's outbound queue). Thus the short-term queue demand on the gateway is increasing exponentially and opening a window of size $W$ packets will require $W/2$ packets of buffer capacity at the bottleneck.

Figure 3: Startup behavior of TCP without Slow-start
[Plot: send time (sec) vs. packet sequence number (KB).]
Trace data of the start of a TCP conversation between two Sun 3/50s running Sun […] (the 4.3BSD TCP). The two Suns were on different Ethernets connected by IP gateways driving a 230.4 Kbps point-to-point link (essentially the setup shown in fig. 7). The window size for the connection was 16 KB (32 512-byte packets) and there were 30 packets of buffer available at the bottleneck gateway. The actual path contains six store-and-forward hops so the pipe plus gateway queue has enough capacity for a full window but the gateway queue alone does not. Each dot is a 512 data-byte packet. The x-axis is the time the packet was sent. The y-axis is the sequence number in the packet header. Thus a vertical array of dots indicates back-to-back packets and two dots with the same y but different x indicate a retransmit. 'Desirable' behavior on this graph would be a relatively smooth line of dots extending diagonally from the lower left to the upper right. The slope of this line would equal the available bandwidth. Nothing in this trace resembles desirable behavior. The dashed line shows the 20 KBps bandwidth available for this connection. Only 35% of this bandwidth was used; the rest was wasted on retransmits. Almost everything is retransmitted at least once and data from 54 to 58 KB is sent five times.

2 Conservation at equilibrium: round-trip timing

Once data is flowing reliably, problems (2) and (3) should be addressed. Assuming that the protocol implementation is correct, (2) must represent a failure of the sender's retransmit timer. A good round trip time estimator, the core of the retransmit timer, is the single most important feature of any protocol implementation that expects to survive heavy load. And it is frequently botched ([26] and [13] describe typical problems).

Figure 4: Startup behavior of TCP with Slow-start
[Plot: send time (sec) vs. packet sequence number (KB).]
Same conditions as the previous figure (same time of day, same Suns, same network path, same buffer and window sizes), except the machines were running the 4BSD TCP with slow-start. No bandwidth is wasted on retransmits but two seconds is spent on the slow-start so the effective bandwidth of this part of the trace is 16 KBps, two times better than figure 3. (This is slightly misleading: Unlike the previous figure, the slope of the trace is 20 KBps and the effect of the 2 second offset decreases as the trace lengthens. E.g., if this trace had run a minute, the effective bandwidth would have been 19 KBps. The effective bandwidth without slow-start stays at 7 KBps no matter how long the trace.)

One mistake is not estimating the variation, $\sigma_R$, of the round trip time, $R$. From queuing theory we know that $R$ and the variation in $R$ increase quickly with load. If the load is $\rho$ (the ratio of average arrival rate to average departure rate), $R$ and $\sigma_R$ scale like $(1-\rho)^{-1}$. To make this concrete, if the network is running at 75% of capacity, as the Arpanet was in last April's collapse, one should expect round-trip-time to vary by a factor of sixteen ($-2\sigma$ to $+2\sigma$). The TCP protocol specification [2] suggests estimating mean round trip time via the low-pass filter
$R \leftarrow \alpha R + (1-\alpha) M$
where $R$ is the average RTT estimate, $M$ is a round trip time measurement from the most recently acked data packet, and $\alpha$ is a filter gain constant with a suggested value of 0.9. Once the $R$ estimate is updated, the retransmit timeout interval, rto, for the next packet sent is set to $\beta R$. The parameter $\beta$ accounts for RTT variation (see [5], section 5). The suggested $\beta$ can adapt to loads of at most 30%. Above this point, a connection will respond to load increases by retransmitting packets that have only been delayed in transit. This forces the network to do useless work, wasting bandwidth on duplicates of packets that will eventually be delivered, at a time when it's known to be having trouble with useful work. I.e., this is the network equivalent of pouring gasoline on a fire.

We developed a cheap method for estimating variation (see appendix A) and the resulting retransmit timer essentially eliminates spurious retransmissions. A pleasant side effect of estimating $\beta$ rather than using a fixed value is that low load as well as high load performance improves, particularly over high delay paths such as satellite links (figures 5 and 6).

Figure 5: Performance of an RFC793 retransmit timer
[Plot: packet number vs. RTT (sec.).]
Trace data showing per-packet round trip time on a well-behaved Arpanet connection. The x-axis is the packet number (packets were numbered sequentially, starting with one) and the y-axis is the elapsed time from the send of the packet to the sender's receipt of its ack. During this portion of the trace, no packets were dropped or retransmitted. The packets are indicated by a dot. A dashed line connects them to make the sequence easier to follow. The solid line shows the behavior of a retransmit timer computed according to the rules of RFC793.

Another timer mistake is in the backoff after a retransmit: If a packet has to be retransmitted more than once, how should the retransmits be spaced? For a transport endpoint embedded in a network of unknown topology and with an unknown, unknowable and constantly changing population of competing conversations, only one scheme has any hope of working (exponential backoff), but a proof of this is beyond the scope of this paper.
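To make the RFC 793 timer and the exponential backoff discussed above concrete, here is a minimal Python sketch. The class and attribute names are our own. β = 2.0 is an assumption (RFC 793 itself gives 1.3 to 2.0 as an example range), and the 1 second and 60 second clamps follow RFC 793's example LBOUND and UBOUND values rather than anything stated in this paper.

```python
class Rfc793Timer:
    """RFC 793-style retransmit timer with exponential backoff (a sketch)."""

    def __init__(self, first_rtt: float, alpha: float = 0.9, beta: float = 2.0,
                 lbound: float = 1.0, ubound: float = 60.0) -> None:
        self.srtt = first_rtt   # R: smoothed round trip time estimate, seconds
        self.alpha = alpha      # filter gain; the text's suggested value is 0.9
        self.beta = beta        # delay variance factor (assumed 2.0 here)
        self.lbound = lbound    # RFC 793's example lower bound on rto
        self.ubound = ubound    # RFC 793's example upper bound on rto
        self.backoff = 0        # retransmissions of the oldest unacked packet

    def on_rtt_sample(self, m: float) -> None:
        """Fold in a measurement M. Pass only samples from packets that were
        never retransmitted, since the ack for a retransmit is ambiguous."""
        self.srtt = self.alpha * self.srtt + (1.0 - self.alpha) * m
        self.backoff = 0        # a clean sample ends the backoff sequence

    def on_retransmit(self) -> None:
        self.backoff += 1       # each retransmit doubles the next timeout

    def rto(self) -> float:
        """rto = beta * R, doubled per retransmit, clamped to [lbound, ubound]."""
        backed_off = self.beta * self.srtt * (2 ** self.backoff)
        return min(self.ubound, max(self.lbound, backed_off))
```

Starting from R = 1 s, a 2 s sample moves R to 1.1 s and rto to 2.2 s; two consecutive retransmits stretch the timeout to 8.8 s, and further ones pin it at the upper bound. Because rto is a fixed multiple of R, a load increase that widens the RTT spread beyond β still produces the spurious retransmits described above; appendix A's variation estimate is the paper's remedy and is not reproduced in this sketch.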
To finesse a proof, note that a network is, to a very good approximation, a linear system. That is, it is composed of elements that behave like linear operators: integrators, delays, gain stages, etc. Linear system theory says that if a system is stable, the stability is exponential. This suggests that an unstable system (a network subject to random load shocks and prone to congestive collapse) can be stabilized by adding some exponential damping (exponential timer backoff) to its primary excitation (senders, traffic sources).

Note: We are far from the first to recognize that transport needs to estimate both mean and variation. See, for example, [6]. But we do think our estimator is simpler than most.

Note: See [8].

Note: Several authors have shown that backoffs 'slower' than exponential are stable given finite populations and knowledge of the global traffic. However, [17] shows that nothing slower than exponential behavior will work in the general case. To feed your intuition, consider that an IP gateway has essentially the same behavior as the 'ether' in an ALOHA net or Ethernet. Justifying exponential retransmit backoff is the same as showing that no collision backoff slower than an exponential will guarantee stability on an Ethernet. Unfortunately, with an infinite user population even exponential backoff won't guarantee stability (although it 'almost' does; see [1]). Fortunately, we don't (yet) have to deal with an infinite user population.

Figure 6: Performance of a Mean+Variance retransmit timer
[Plot: packet number vs. RTT (sec.).]
Same data as above but the solid line shows a retransmit timer computed according to the algorithm in appendix A.

3 Adapting to the path: congestion avoidance

If the timers are in good shape, it is possible to state with some confidence that a timeout indicates a lost packet and not a broken timer. At this point, something can be done about (3). Packets get lost for two reasons: they are damaged in transit, or the network is congested and somewhere on the path there was insufficient buffer capacity. On most network paths, loss due to damage is rare (≪1%) so it is probable that a packet loss is due to congestion in the network.

Note: The phrase congestion collapse (describing a positive feedback instability due to poor retransmit timers) is again the coinage of John Nagle, this time from [23].

Note: Because a packet loss empties the window, the throughput of any window flow control protocol is quite sensitive to damage loss. For an RFC793 standard TCP running with window $w$ (where $w$ is at most the bandwidth-delay product), a loss probability of $p$ degrades throughput by a factor of $(1 + 2pw)^{-1}$. E.g., a 1% damage loss rate on an Arpanet path (8 packet window) degrades TCP throughput by 14%. The congestion control scheme we propose is insensitive to damage loss until the loss rate is on the order of one loss per window equilibration length (the number of packets it takes the window to regain its original size after a loss). If the pre-loss size is $w$, equilibration takes roughly $w^2/3$ packets so, for the Arpanet, the loss sensitivity threshold is about 5%. At this high loss rate, the empty window effect described above has already degraded throughput by 44% and the additional degradation from the congestion avoidance window shrinking is the least of one's problems. We are concerned that the congestion control noise sensitivity is quadratic in $w$ but it will take at least another generation of network evolution to reach window sizes where this will be significant. If experience shows this sensitivity to be a liability, a trivial modification to the algorithm makes it linear in $w$. An in-progress paper explores this subject in detail.

A 'congestion avoidance' strategy, such as the one proposed in [15], will have two components: The network must be able to signal the transport endpoints that congestion is occurring (or about to occur). And the endpoints must have a policy that decreases utilization if this signal is received and increases utilization if the signal isn't received.

If packet loss is (almost) always due to congestion and if a timeout is (almost) always due to a lost packet, we have a good candidate for the 'network is congested' signal. Particularly since this signal is delivered automatically by all existing networks, without special modification (as opposed to [15] which requires a new bit in the packet headers and a modification to existing gateways to set this bit).

The other part of a congestion avoidance strategy, the end node action, is almost identical in the DEC scheme and our TCP and follows directly from a first-order time-series model of the network: Say network load is measured by average queue length over fixed intervals of some appropriate length (something near the round trip time). If $L_i$ is the load at interval $i$, an uncongested network can be modeled by saying $L_i$ changes slowly compared to the sampling time. I.e.,
$L_i = N$ ($N$ constant).
If the network is subject to congestion, this zeroth order model breaks down. The average queue length becomes the sum of two terms, the $N$ above that accounts for the average arrival rate of new traffic and intrinsic delay, and a new term that accounts for the fraction of traffic left over from the last time interval and the effect of this left-over traffic (e.g., induced retransmits):
$L_i = N + \gamma L_{i-1}$
(These are the first two terms in a Taylor series expansion of $L(t)$. There is reason to believe one might eventually need a three term, second order model, but not until the Internet has grown substantially.)

When the network is congested, $\gamma$ must be large and the queue lengths will start increasing exponentially. The system will stabilize only if the traffic sources throttle back at least as quickly as the queues are growing. Since a source controls load in a window-based protocol by adjusting the size of the window, $W$, we end up with the sender policy
On congestion: $W_i = d\,W_{i-1}$ ($d < 1$)
I.e., a multiplicative decrease of the window size (which becomes an exponential decrease over time if the congestion persists).

Note: This is not an accident: We copied Jain's scheme after hearing his presentation at [10] and realizing that the scheme was, in a sense, universal.

Note: See any good control theory text for the relationship between a system model and admissible controls for that system. A nice introduction appears in [21], chap. 8.

Note: I.e., the system behaves like $L_i \approx \gamma L_{i-1}$, a difference equation with the solution $L_n = \gamma^n L_0$, which goes exponentially to infinity for any $\gamma > 1$.
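The first-order load model and the damage-loss footnote both reduce to a few lines of arithmetic. The Python sketch below is illustrative only: the function names are ours and the parameter values in the checks are arbitrary. It iterates $L_i = N + \gamma L_{i-1}$, applies the multiplicative decrease $W_i = d\,W_{i-1}$, and reproduces the footnote's Arpanet figures for an 8-packet window.

```python
def load_series(n: float, gamma: float, l0: float, steps: int) -> list[float]:
    """Iterate the first-order model L_i = N + gamma * L_{i-1}."""
    loads = [l0]
    for _ in range(steps):
        loads.append(n + gamma * loads[-1])
    return loads


def decrease_series(w0: float, d: float, steps: int) -> list[float]:
    """Apply the multiplicative decrease W_i = d * W_{i-1} (d < 1) repeatedly."""
    windows = [w0]
    for _ in range(steps):
        windows.append(d * windows[-1])
    return windows


def damage_throughput(p: float, w: float) -> float:
    """Throughput factor (1 + 2pw)^-1 for loss probability p and window w."""
    return 1.0 / (1.0 + 2.0 * p * w)


def equilibration_packets(w: float) -> float:
    """Packets needed to regain a pre-loss window of w: roughly w**2 / 3."""
    return w * w / 3.0


# Footnote check for an 8-packet Arpanet window:
# 1% loss -> about 14% lower throughput; threshold ~ 1/(64/3), about 5%.
print(round(1 - damage_throughput(0.01, 8), 2), 1 / equilibration_packets(8))
```

With $\gamma = 0$ the load stays at $N$; with $0 < \gamma < 1$ it settles at $N/(1-\gamma)$; with $N = 0$ and $\gamma > 1$ it grows as $\gamma^n L_0$, the exponential blow-up the text describes, and only a decrease at least that fast (here halving per step) can keep up.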
If there's no congestion, $\gamma$ must be near zero and the load approximately constant. The network announces, via a dropped packet, when demand is excessive but says nothing if a connection is using less than its fair share (since the network is stateless, it cannot know this). Thus a connection has to increase its bandwidth utilization to find out the current limit. E.g., you could have been sharing the path with someone else and converged to a window that gives you each half the available bandwidth. If she shuts down, 50% of the bandwidth will be wasted unless your window size is increased. What should the increase policy be?

The first thought is to use a symmetric, multiplicative increase, possibly with a longer time constant, $W_i = b\,W_{i-1}$, $1 < b \le 1/d$. This is a mistake. The result will oscillate wildly and, on the average, deliver poor throughput. The analytic reason for this has to do with the fact that it is easy to drive the net into saturation but hard for the net to recover (what [18], chap. 2.1, calls the rush-hour effect). Thus overestimating the available bandwidth is costly. But an exponential, almost regardless of its time constant, increases so quickly that overestimates are inevitable.

Without justification, we'll state that the best increase policy is to make small, constant changes to the window size:
On no congestion: $W_i = W_{i-1} + u$ ($u \ll W_{max}$)
where $W_{max}$ is the pipesize (the delay-bandwidth product of the path minus protocol overhead, i.e., the largest sensible window for the unloaded path). This is the additive increase / multiplicative decrease policy suggested in [15] and the policy we've implemented in TCP. The only difference between the two implementations is the choice of constants for $d$ and $u$. We used 0.5 and 1 for reasons partially explained in appendix D. A more complete analysis is in yet another in-progress paper.

The preceding has probably made the congestion control algorithm sound hairy but it's not. Like slow-start, it's three lines of code:
- On any timeout, set cwnd to half the current window size (this is the multiplicative decrease).
- On each ack for new data, increase cwnd by 1/cwnd (this is the additive increase).
- When sending, send the minimum of the receiver's advertised window and cwnd.

Note that this algorithm is only congestion avoidance; it doesn't include the previously described slow-start. Since the packet loss that signals congestion will result in a re-start, it will almost certainly be necessary to slow-start in addition to the above. But, because both congestion avoidance and slow-start are triggered by a timeout and both manipulate the congestion window, they are frequently confused. They are actually independent algorithms with completely different objectives. To emphasize the difference, the two algorithms have been presented separately even though in practice they should be implemented together. Appendix B describes a combined slow-start/congestion avoidance algorithm.

Figures 7 through 12 show the behavior of TCP connections with and without congestion avoidance. Although the test conditions (e.g., 16 KB windows) were deliberately chosen to stimulate congestion, the test scenario isn't far from common practice: The Arpanet IMP end-to-end protocol allows at most eight packets in transit between any pair of gateways. The default 4.3BSD window size is eight packets (4 KB). Thus simultaneous conversations between, say, any two hosts at Berkeley and any two hosts at MIT (two 8-packet windows, or 16 packets, on a path that allows 8) would exceed the network capacity of the MIT IMP path and would lead to the type of behavior shown.

4 Future work: the gateway side of congestion control

While algorithms at the transport endpoints can ensure the network capacity isn't exceeded, they cannot ensure fair sharing of that capacity. Only in gateways, at the convergence of flows, is there enough information to control sharing and fair allocation. Thus, we view the gateway 'congestion detection' algorithm as the next big step.

Note: In fig. 1, note that the 'pipe size' is 16 packets, 8 in each path, but the sender is using a window of 22 packets. The six excess packets will form a queue at the entry to the bottleneck and that queue cannot shrink, even though the sender carefully clocks out packets at the bottleneck link rate. This stable queue is another, unfortunate, aspect of conservation: The queue would shrink only if the gateway could move packets into the skinny pipe faster than the sender dumped packets into the fat pipe. But the system tunes itself so each time the gateway pulls a packet off the front of its queue, the sender lays a new packet on the end. A gateway needs excess output capacity (i.e., $\rho < 1$) to dissipate a queue and the clearing time will scale like $(1-\rho)^{-2}$ ([18], chap. 2 is an excellent discussion of this). Since at equilibrium our transport connection 'wants' to run the bottleneck link at 100% ($\rho = 1$), we have to be sure that during the non-equilibrium window adjustment, our control policy allows the gateway enough free bandwidth to dissipate queues that inevitably form due to path testing and traffic fluctuations. By an argument similar to the one used to show exponential timer backoff is necessary, it's possible to show that an exponential (multiplicative) window increase policy will be 'faster' than the dissipation time for some traffic mix and, thus, leads to an unbounded growth of the bottleneck queue.

Note: See [4] for a complete analysis of these increase and decrease policies. Also see [8] and [9] for a control-theoretic analysis of a similar class of control policies.

Note: This increment rule may be less than obvious. We want to increase the window by at most one packet over a time interval of length $R$ (the round trip time). To make the algorithm 'self-clocked', it's better to increment by a small amount on each ack rather than by a large amount at the end of the interval. (Assuming, of course, the sender has effective silly window avoidance (see [5], section 3) and doesn't attempt to send packet fragments because of the fractionally sized window.) A window of size cwnd packets will generate at most cwnd acks in one $R$. Thus an increment of 1/cwnd per ack will increase the window by at most one packet in one $R$. In TCP, windows and packet sizes are in bytes so the increment translates to maxseg*maxseg/cwnd where maxseg is the maximum segment size and cwnd is expressed in bytes, not packets.

Note: We have also developed a rate-based variant of the congestion avoidance algorithm to apply to connectionless traffic (e.g., domain server queries, RPC requests). Remembering that the goal of the increase and decrease policies is bandwidth adjustment, and that 'time' (the controlled parameter in a rate-based scheme) appears in the denominator of bandwidth, the algorithm follows immediately: The multiplicative decrease remains a multiplicative decrease (e.g., double the interval between packets). But subtracting a constant amount from the interval does not result in an additive increase in bandwidth. This approach has been tried, e.g., [19] and [24], and appears to oscillate badly. To see why, note that for an inter-packet interval $t$ and decrement $c$, the bandwidth change of a decrease-interval-by-constant policy is
$\frac{1}{t} \rightarrow \frac{1}{t-c}$,
a non-linear, and destabilizing, increase. An update policy that does result in a linear increase of bandwidth over time is
$t_i = \frac{\alpha\, t_{i-1}}{\alpha + t_{i-1}}$
where $t_i$ is the interval between sends when the $i$th packet is sent and $\alpha$ sets the desired rate of increase: taking reciprocals gives $1/t_i = 1/t_{i-1} + 1/\alpha$, so each packet raises the send rate by the constant amount $1/\alpha$. We have simulated the above algorithm and it appears to perform well. To test the predictions of that simulation against reality, we have a cooperative project with Sun Microsystems to prototype RPC dynamic congestion control algorithms using […] as a test-bed (since […] is known to have congestion problems yet it would be desirable to have it work over the same range of networks as TCP).
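The sender-side rules of Sections 1 and 3, plus the two increase-policy footnotes above, fit in one short Python sketch. Several assumptions are ours, not the paper's: cwnd is counted in packets; the hand-over from slow-start to congestion avoidance uses a threshold called `ssthresh` here (a hypothetical name; this section only says the two algorithms should be implemented together and defers the combined version to appendix B), set on timeout to half the window with a two-packet floor borrowed from the later RFC 5681; and `one_round_trip` models an idealized, loss-free round trip. `byte_increment` and `next_interval` transcribe the footnote formulas directly.

```python
from typing import Callable


class CongestionWindow:
    """Slow-start plus additive increase / multiplicative decrease (a sketch)."""

    def __init__(self, receiver_window: float) -> None:
        self.rwnd = receiver_window        # receiver's advertised window
        self.cwnd = 1.0                    # slow-start begins at one packet
        self.ssthresh = receiver_window    # assumption: start effectively "high"

    def send_window(self) -> float:
        # Send the minimum of the receiver's advertised window and cwnd.
        return min(self.rwnd, self.cwnd)

    def on_timeout(self) -> None:
        # Multiplicative decrease (d = 0.5): remember half the current window,
        # floored at two packets (an assumption taken from RFC 5681) ...
        self.ssthresh = max(self.send_window() / 2, 2.0)
        # ... then slow-start again from one packet to restart the ack clock.
        self.cwnd = 1.0

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0               # slow-start: +1 packet per ack
        else:
            self.cwnd += 1.0 / self.cwnd   # additive increase: about +1 per RTT

    def one_round_trip(self) -> None:
        # Idealized round trip: every packet in the window is acked, none lost.
        for _ in range(int(self.send_window())):
            self.on_new_ack()


def byte_increment(maxseg: int, cwnd_bytes: float) -> float:
    """Per-ack additive increase when cwnd is kept in bytes."""
    return maxseg * maxseg / cwnd_bytes


def next_interval(t: float, alpha: float) -> float:
    """Rate-based additive increase: t_i = alpha*t_{i-1} / (alpha + t_{i-1})."""
    return alpha * t / (alpha + t)


def rate_trace(update: Callable[[float], float], t0: float,
               steps: int) -> list[float]:
    """Send rates 1/t produced by repeatedly applying an interval update."""
    t, rates = t0, [1.0 / t0]
    for _ in range(steps):
        t = update(t)
        rates.append(1.0 / t)
    return rates
```

Five loss-free round trips take cwnd from 1 to 32 packets ($\log_2 32 = 5$), matching the $R \log_2 W$ cost of slow-start; after a timeout the window climbs back to the threshold and then grows by just under one packet per round trip. Repeatedly applying `next_interval` raises the send rate by $1/\alpha$ per packet, whereas subtracting a constant from the interval makes each rate step larger than the last.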
Figure 7: Multiple conversation test setup
[Diagram: hosts (Sun 3/50s, VAX 750s, a VAX 8600 and a CCI) on 10 Mbs Ethernets at each site, joined by a 230.4 Kbs microwave link.]
Test setup to examine the interaction of multiple, simultaneous TCP conversations sharing a bottleneck link. 1 MByte transfers (2048 512-data-byte packets) were initiated 3 seconds apart from four machines at LBL to four machines at UCB, one conversation per machine pair (the dotted lines above show the pairing). All traffic went via a 230.4 Kbps link connecting IP router csam at LBL to an IP router at UCB. The microwave link queue can hold up to 50 packets. Each connection was given a window of 16 KB (32 512-byte packets). Thus any two connections could overflow the available buffering and the four connections exceeded the queue capacity by 160%.

The goal of the gateway congestion-detection algorithm is to send a signal to the end nodes as early as possible, but not so early that the gateway becomes starved for traffic. Since we plan to continue using packet drops as a congestion signal, gateway 'self protection' from a mis-behaving host should fall out for free: That host will simply have most of its packets dropped as the gateway tries to tell it that it's using more than its fair share. Thus, like the end node algorithm, the gateway algorithm should reduce congestion even if no end node is modified to do congestion avoidance. And nodes that do implement congestion avoidance will get their fair share of bandwidth and a minimum number of packet drops.

Since congestion grows exponentially, detecting it early is important. If detected early, small adjustments to the senders' windows will cure it. Otherwise massive adjustments are necessary to give the net enough spare capacity to pump out the backlog. But, given the bursty nature of traffic, reliable detection is a non-trivial problem. Jain [15] proposes a scheme based on averaging between queue regeneration points. This should yield good burst filtering but we think it might have convergence problems under high load or significant second-order dynamics in the traffic. We plan to use some of our earlier work on […] models for round-trip-time/queue length prediction as the basis of detection.
Preliminary results suggest that this approach works well at high load, is immune to second-order effects in the traffic and is computationally cheap enough to not slow down kilopacket-per-second gateways.

Note: These problems stem from the fact that the average time between regeneration points scales like $(1-\rho)^{-1}$ and the variance like $(1-\rho)^{-3}$ (see Feller [7], chap. VI.9). Thus the congestion detector becomes sluggish as congestion increases and its signal-to-noise ratio decreases dramatically.

Figure 8: Multiple, simultaneous TCPs with no congestion avoidance
[Plot: time (sec) vs. sequence number (KB).]
Trace data from four simultaneous TCP conversations without congestion avoidance over the paths shown in figure 7. 4,000 of 11,000 packets sent were retransmissions (i.e., half the data packets were retransmitted). Since the link data bandwidth is 25 KBps, each of the four conversations should have received 6 KBps. Instead, one conversation got 8 KBps, two got 5 KBps, one got 0.5 KBps and 6 KBps has vanished.

Acknowledgements

We are grateful to the members of the Internet Activity Board's End-to-End and Internet-Engineering task forces for this past year's interest, encouragement, cogent questions and network insights. Bob Braden of ISI and Craig Partridge of BBN were particularly helpful in the preparation of this paper: their careful reading of early drafts improved it immensely. The first author is also deeply in debt to Jeff Mogul of DEC Western Research Lab. Without Jeff's interest and patient prodding, this paper would never have existed.
Figure 9: Multiple, simultaneous TCPs with congestion avoidance
[Plot: time (sec) vs. sequence number (KB).]
Trace data from four simultaneous TCP conversations using congestion avoidance over the paths shown in figure 7. 89 of 8281 packets sent were retransmissions (i.e., 1% of the data packets had to be retransmitted). Two of the conversations got 8 KBps and two got 4.5 KBps (i.e., all the link bandwidth is accounted for; see fig. 11). The difference between the high and low bandwidth senders was due to the receivers. The 4.5 KBps senders were talking to 4.3 receivers which would delay an ack until 35% of the window was filled or 200 ms had passed (i.e., an ack was delayed for 5–7 packets on the average). This meant the sender would deliver bursts of 5–7 packets on each ack. The 8 KBps senders were talking to newer 4.3 receivers which would delay an ack for at most one packet (because of an ack's 'clock' rôle, the authors believe that the minimum ack frequency should be every other packet). I.e., the sender would deliver bursts of at most two packets. The probability of loss increases rapidly with burst size so senders talking to old-style receivers saw three times the loss rate (1.8% vs. 0.5%). The higher loss rate meant more time spent in retransmit wait and, because of the congestion avoidance, smaller average window sizes.
Figure 10: Total bandwidth used by old and new TCPs
[Plot: time (sec) vs. relative bandwidth.]
The thin line shows the total bandwidth used by the four senders without congestion avoidance (fig. 8), averaged over 5 second intervals and normalized to the 25 KBps link bandwidth. Note that the senders send, on the average, 25% more than will fit in the wire. The thick line is the same data for the senders with congestion avoidance (fig. 9). The first 5 second interval is low (because of the slow-start), then there is about 20 seconds of damped oscillation as the congestion control 'regulator' for each TCP finds the correct window size. The remaining time the senders run at the wire bandwidth. (The activity around 110 seconds is a bandwidth 're-negotiation' due to connection one shutting down. The activity around 80 seconds is a reflection of the 'flat spot' in fig. 9 where most of conversation two's bandwidth is suddenly shifted to conversations three and four. Competing conversations frequently exhibit this type of 'punctuated equilibrium' behavior and we hope to investigate its dynamics in a future paper.)

Figure 11: Effective bandwidth of old and new TCPs
[Plot: time (sec) vs. relative bandwidth.]
Figure 10 showed the old TCPs were using 25% more than the bottleneck link bandwidth. Thus, once the bottleneck queue filled, 25% of the senders' packets were being discarded. If the discards, and only the discards, were retransmitted, the senders would have received the full 25 KBps link bandwidth (i.e., their behavior would have been anti-social but not self-destructive). But fig. 8 noted that around 25% of the link bandwidth was unaccounted for. Here we average the total amount of data acked per five second interval. (This gives the effective delivered bandwidth of the link.) The thin line is once again the old TCPs. Note that only 75% of the link bandwidth is being used for data (the remainder must have been used by retransmissions of packets that didn't need to be retransmitted). The thick line shows delivered bandwidth for the new TCPs. There is the same slow-start and turn-on transient followed by a long period of operation right at the link bandwidth.
Figure 12: Window adjustment detail
[Plot: relative bandwidth vs. time (sec)]
Because of the five second averaging time (needed to smooth out the spikes in the old TCP data), the congestion avoidance window policy is difficult to make out in figures 10 and 11. Here we show effective throughput (data acked) for TCPs with congestion control, averaged over a three second interval. When a packet is dropped, the sender sends until it fills the window, then stops until the retransmission timeout. Since the receiver cannot ack data beyond the dropped packet, on this plot we'd expect to see a negative-going spike whose amplitude equals the sender's window size (minus one packet). If the retransmit happens in the next interval (the intervals were chosen to match the retransmit timeout), we'd expect to see a positive-going spike of the same amplitude when the receiver acks the out-of-order data it cached. Thus the height of these spikes is a direct measure of the sender's window size. The data clearly shows three of these events (at 15, 33 and 57 seconds) and the window size appears to be decreasing exponentially. The dotted line is a least squares fit to the six window size measurements obtained from these events (each event contributes a negative-going and a positive-going spike). The fit time constant was 28 seconds. (The long time constant is due to lack of a congestion avoidance algorithm in the gateway. With a 'drop' algorithm running in the gateway, the time constant should be around 4 seconds.)
A  A fast algorithm for rtt mean and variation

A.1 Theory

The RFC793 algorithm for estimating the mean round trip time is one of the simplest examples of a class of estimators called recursive prediction error or stochastic gradient algorithms. In the past 20 years these algorithms have revolutionized estimation and control theory [20] and it's probably worth looking at the RFC793 estimator in some detail.

Given a new measurement $m$ of the RTT (round trip time), TCP updates an estimate of the average RTT $a$ by

$$a \leftarrow (1 - g)\,a + g\,m$$

where $g$ is a 'gain' ($0 < g < 1$) that should be related to the signal-to-noise ratio (or, equivalently, variance) of $m$. This makes more sense, and computes faster, if we rearrange and collect terms multiplied by $g$ to get

$$a \leftarrow a + g\,(m - a)$$

Think of $a$ as a prediction of the next measurement. $m - a$ is the error in that prediction, and the expression above says we make a new prediction based on the old prediction plus some fraction of the prediction error. The prediction error is the sum of two components: (1) error due to 'noise' in the measurement (random, unpredictable effects like fluctuations in competing traffic) and (2) error due to a bad choice of $a$. Calling the random error $E_r$ and the estimation error $E_e$,

$$a \leftarrow a + g\,E_r + g\,E_e$$

The $g E_e$ term gives $a$ a kick in the right direction while the $g E_r$ term gives it a kick in a random direction. Over a number of samples, the random kicks cancel each other out, so this algorithm tends to converge to the correct average. But $g$ represents a compromise: we want a large $g$ to get mileage out of $E_e$ but a small $g$ to minimize the damage from $E_r$. Since the $E_e$ terms move $a$ toward the real average no matter what value we use for $g$, it's almost always better to use a gain that's too small rather than one that's too large. Typical gain choices are 0.1–0.2 (though it's a good idea to take a long look at your raw data before picking a gain).

It's probably obvious that $a$ will oscillate randomly around the true average. If successive measurements are independent, the standard deviation of $a$ will be $\sqrt{g/(2-g)}\,\mathrm{sdev}(m)$, roughly $\sqrt{g/2}\,\mathrm{sdev}(m)$ for small $g$. Also, $a$ converges to the true average exponentially with time constant $1/g$ (in samples). So a smaller $g$ gives a stabler $a$ at the expense of taking a much longer time to get to the true average.

If we want some measure of the variation in $m$, say to compute a good value for the TCP retransmit timer, there are several alternatives. Variance, $\sigma^2$, is the conventional choice because it has some nice mathematical properties. But computing variance requires squaring $(m - a)$, so an estimator for it will contain a multiply with a danger of integer overflow. Also, most applications will want variation in the same units as $a$ and $m$, so we'll be forced to take the square root of the variance to use it (i.e., at least a divide, multiply and two adds).

A variation measure that's easy to compute is the mean prediction error or mean deviation, the average of $|m - a|$. Over $n$ samples $m_1, \dots, m_n$,

$$\mathrm{mdev}^2 = \left(\frac{1}{n}\sum_{k=1}^{n} |m_k - a|\right)^2 \le \frac{1}{n}\sum_{k=1}^{n} (m_k - a)^2 = \sigma^2$$

so mean deviation is never larger than standard deviation. There's often a simple relation between mdev and sdev. E.g., if the prediction errors are normally distributed, $\mathrm{mdev} = \sqrt{2/\pi}\,\mathrm{sdev}$. For most common distributions the factor to go from mdev to sdev is near one ($\sqrt{\pi/2} \approx 1.25$). I.e., mdev is a good approximation of sdev and is much easier to compute.

A.2 Practice

Fast estimators for average $a$ and mean deviation $v$ given measurement $m$ follow directly from the above. Both estimators compute means, so there are two instances of the RFC793 algorithm:

$$\mathrm{Err} \equiv m - a$$
$$a \leftarrow a + g\,\mathrm{Err}$$
$$v \leftarrow v + g\,(|\mathrm{Err}| - v)$$

To be computed quickly, the above should be done in integer arithmetic. But the expressions contain fractions ($g < 1$), so some scaling is needed to keep everything integer. A reciprocal power of 2 (i.e., $g = 1/2^n$ for some $n$) is a particularly good choice for $g$ since the scaling can be implemented with shifts. Multiplying through by $1/g$ gives

$$2^n a \leftarrow 2^n a + \mathrm{Err}$$
$$2^n v \leftarrow 2^n v + (|\mathrm{Err}| - v)$$

To minimize round-off error, the scaled versions of $a$ and $v$, sa and sv, should be kept rather than the unscaled versions. Picking $g = .125 = 1/8$ (close to the .1 suggested in RFC793) and expressing the above in C:

    /* update Average estimator */
    m -= (sa >> 3);
    sa += m;
    /* update Deviation estimator */
    if (m < 0)
        m = -m;
    m -= (sv >> 3);
    sv += m;

It's not necessary to use the same gain for $a$ and $v$. To force the timer to go up quickly in response to changes in the RTT, it's a good idea to give $v$ a larger gain. In particular, because of window–delay mismatch there are often RTT artifacts at integer multiples of the window size. To filter these, one would like $1/g$ in the $a$ estimator to be at least as large as the window size (in packets) and $1/g$ in the $v$ estimator to be less than the window size.
Footnote: Purists may note that the previous inequality needs the averaging factor 1/n (n being the number of samples) inside both means; dropping it reverses the direction of the inequality.

Footnote: E.g., see packets 10–50 of figure 5. Note that these window effects are due to characteristics of the Arpa/Milnet subnet. In general, window effects on the timer are at most a second-order consideration and depend a great deal on the underlying network. E.g., if one were using the Wideband with a 256 packet window, 1/256 would not be a good gain for $a$ (1/16 might be).

Footnote: Although it may not be obvious, the absolute value in the calculation of $v$ introduces an asymmetry in the timer: because $v$ has the same sign as an increase and the opposite sign of a decrease, more gain in $v$ makes the timer go up quickly and come down slowly, 'automatically' giving the behavior suggested in [22]. E.g., see the region between packets 50 and 80 in figure 6.

Using a gain of .25 on the deviation and computing the retransmit timer, rto, as $a + 4v$, the final timer code looks like:

    m -= (sa >> 3);
    sa += m;
    if (m < 0)
        m = -m;
    m -= (sv >> 2);
    sv += m;
    rto = (sa >> 3) + sv;

With a gain of 1/4 on the deviation, sv holds $4v$, so the last line computes $a + 4v$ directly. In general this computation will correctly round rto: because of the sa truncation when computing $m - a$, sa will converge to the true mean rounded up to the next tick. Likewise with sv. Thus, on the average, there is half a tick of bias in each. The rto computation should be rounded by half a tick and one tick needs to be added to account for sends being phased randomly with respect to the clock. So, the 1.75 tick bias contribution from $4v$ approximately equals the desired half tick rounding plus one tick phase correction.

B  The combined slow-start with congestion avoidance algorithm

The sender keeps two state variables for congestion control: a slow-start/congestion window, cwnd, and a threshold size, ssthresh, to switch between the two algorithms. The sender's output routine always sends the minimum of cwnd and the window advertised by the receiver. On a timeout, half the current window size is recorded in ssthresh (this is the multiplicative decrease part of the congestion avoidance algorithm), then cwnd is set to 1 packet (this initiates slow-start). When new data is acked, the sender does

    if (cwnd < ssthresh)
        /* if we're still doing slow-start
         * open window exponentially */
        cwnd += 1;
    else
        /* otherwise do Congestion
         * Avoidance increment-by-1 */
        cwnd += 1/cwnd;

Thus slow-start opens the window quickly to what congestion avoidance thinks is a safe operating point (half the window that got us into trouble), then congestion avoidance takes over and slowly increases the window size to probe for more bandwidth becoming available on the path.

Footnote: Note that the else clause of the above code will malfunction if cwnd is an integer in unscaled, one-packet units. I.e., if the maximum window for the path is $w$ packets, cwnd must cover the range $0..w$ with resolution of at least $1/w$.

Footnote: Since sending packets smaller than the maximum transmission unit for the path lowers efficiency, the implementor must take care that the fractionally sized cwnd does not result in small packets being sent. In TCP implementations, existing silly-window avoidance code should prevent runt packets, but this point should be carefully checked.

C  Interaction of window adjustment with round-trip timing

Some TCP connections, particularly those over a very low speed link such as a dial-up SLIP line [25], may experience an unfortunate interaction between congestion window adjustment and retransmit timing. Network paths tend to divide into two classes: delay-dominated, where the store-and-forward and/or transit delays determine the RTT, and bandwidth-dominated, where (bottleneck) link bandwidth and average packet size determine the RTT. On a bandwidth-dominated path of bandwidth $b$, a congestion-avoidance window increment of $\Delta W$ will increase the RTT of post-increment packets by

$$\Delta R \approx \frac{\Delta W}{b}$$

If the path RTT variation $V$ is small, $\Delta R$ may exceed the $4V$ cushion in rto, a retransmit timeout will occur and, after a few cycles of this, ssthresh (and, thus, cwnd) end up clamped at small values.

The rto calculation in appendix A was designed to prevent this type of spurious retransmission timeout during slow-start. In particular, the RTT variation $V$ is multiplied by four in the rto calculation because of the following: a spurious retransmit occurs if the retransmit timeout computed at the end of slow-start round $i$, $\mathrm{rto}_i$, is ever less than or equal to the actual RTT of the next round. In the worst case of all the delay being due to the window, $R$ doubles each round (since the window size doubles). Thus $R_{i+1} = 2R_i$ (where $R_i$ is the measured RTT at slow-start round $i$). But

$$V_i = R_i - R_{i-1} = R_i / 2$$

and

$$\mathrm{rto}_i = R_i + 4V_i = 3R_i > 2R_i = R_{i+1}$$

so spurious retransmit timeouts cannot occur.
Spurious retransmission due to a window increase can occur during the congestion avoidance window increment, since the window can only be changed in one packet increments. So, for a window $W$ (in bytes) and a packet size $s$, there may be as many as $W/s - 1$ packets between increments, long enough for any $\Delta R$ increase due to the last window increment to decay away to nothing. But this problem is unlikely on a bandwidth-dominated path, since the increments would have to be more than twelve packets apart (the decay time of the $V$ filter times its gain in the rto calculation), which implies that a ridiculously large window is being used for the path. Thus one should regard these timeouts as appropriate punishment for gross mis-tuning, and their effect will simply be to reduce the window to something more appropriate for the path.

Although slow-start and congestion avoidance are designed to not trigger this kind of spurious retransmission, an interaction with higher level protocols frequently does. Application protocols like SMTP and NNTP have a 'negotiation' phase where a few packets are exchanged stop-and-wait, followed by a data transfer phase where all of a mail message or news article is sent. Unfortunately, the 'negotiation' exchanges open the congestion window, so the start of the data transfer phase will dump several packets into the network with no slow-start and, on a bandwidth-dominated path, faster than rto can track the RTT increase caused by these packets. The root cause of this problem is the same one described in sec. 1: dumping too many packets into an empty pipe (the pipe is empty since the negotiation exchange was conducted stop-and-wait) with no ack 'clock'. The fix proposed in sec. 1, slow-start, will also prevent this problem if the TCP implementation can detect the phase change. And detection is simple: the pipe is empty because we haven't sent anything for at least a round-trip-time (another way to view RTT is as the time it takes the pipe to empty after the most recent send). So, if nothing has been sent for at least one RTT, the next send should set cwnd to one packet to force a slow-start. I.e., if the connection state variable lastsnd holds the time the last packet was sent, the following code should appear early in the TCP output routine:

    int idle = (snd_max == snd_una);
    if (idle && now - lastsnd > rto)
        cwnd = 1;

The boolean idle is true if there is no data in transit (all data sent has been acked), so the if says 'if there's nothing in transit and we haven't sent anything for "a long time", slow-start.' Our experience has been that either the current RTT estimate or the rto estimate can be used for 'a long time' with good results.

Footnote: For TCP, the resolution requirement on cwnd noted in appendix B is met automatically, since windows are expressed in bytes, not packets. For protocols such as ISO TP4, the implementor should scale cwnd so that the calculations above can be done with integer arithmetic, and the scale factor should be large enough to avoid the fixed point (zero) of $\lfloor 1/\mathrm{cwnd} \rfloor$ in the congestion avoidance increment.

Footnote: E.g., TCP over a 2400 baud packet radio link is bandwidth-dominated since the transmission time for a (typical) 576 byte IP packet is 2.4 seconds, longer than any possible terrestrial transit delay.

Footnote: The original SIGCOMM '88 version of this paper suggested calculating rto as $R + 2V$ rather than $R + 4V$. Since that time we have had much more experience with low speed SLIP links and observed spurious retransmissions during connection startup. An investigation of why these occurred led to the analysis above and the change to the rto calculation in app. A.

Footnote: The largest sensible window for a path is the bottleneck bandwidth times the round-trip delay and, by definition, the delay is negligible for a bandwidth-dominated path, so the window should only be a few packets.

Footnote: The rto estimate is more convenient since it is kept in units of time while RTT is scaled.

D  Window Adjustment Policy

A reason for using 1/2 as the decrease term, as opposed to the 7/8 in [15], was the following handwaving: when a packet is dropped, you're either starting (or restarting after a drop) or steady-state sending. If you're starting, you know that half the current window size 'worked', i.e., that a window's worth of packets were exchanged with no drops (slow-start guarantees this). Thus on congestion you set the window to the largest size that you know works, then slowly increase the size. If the connection is steady-state running and a packet is dropped, it's probably because a new connection started up and took some of your bandwidth. We usually run our nets with $\rho \le 0.5$, so it's probable that there are now exactly two conversations sharing the bandwidth. I.e., you should reduce your window by half because the bandwidth available to you has been reduced by half. And, if there are more than two conversations sharing the bandwidth, halving your window is conservative, and being conservative at high traffic intensities is probably wise.

Although a factor of two change in window size seems a large performance penalty, in system terms the cost is negligible: currently, packets are dropped only when a large queue has formed. Even with the ISO IP 'congestion experienced' bit [11] to force senders to reduce their windows, we're stuck with the queue because the bottleneck is running at 100% utilization with no excess bandwidth available to dissipate the queue. If a packet is tossed, some sender shuts up for two RTT, exactly the time needed to empty the queue. If that sender restarts with the correct window size, the queue won't reform. Thus the delay has been reduced to a minimum without the system losing any bottleneck bandwidth.

The 1-packet increase has less justification than the 0.5 decrease. In fact, it's almost certainly too large. If the algorithm converges to a window size of $w$, there are $O(w^2)$ packets between drops with an additive increase policy. We were shooting for an average drop rate of $< 1\%$ and found that on the Arpanet (the worst case of the four networks we tested), windows converged to 8–12 packets. This yields 1 packet increments for a 1% average drop rate.

But, since we've done nothing in the gateways, the window we converge to is the maximum the gateway can accept without dropping packets. I.e., in the terms of [15], we are just to the left of the cliff rather than just to the right of the knee. If the gateways are fixed so they start dropping packets when the queue gets pushed past the knee, our increment will be much too aggressive and should be dropped by about a factor of four (since our measurements on an unloaded Arpanet place its 'pipe size' at 4–5 packets). It appears trivial to implement a second order control loop to adaptively determine the appropriate increment to use for a path. But second order problems are on hold until we've spent some time on the first order part of the algorithm for the gateways.
References

[1] ALDOUS, D. J. Ultimate instability of exponential back-off protocol for acknowledgment-based transmission control of random access communication channels. IEEE Transactions on Information Theory IT-33, 2 (Mar. 1987).
[2] ARPANET WORKING GROUP REQUESTS FOR COMMENT, DDN NETWORK INFORMATION CENTER. Transmission Control Protocol Specification. SRI International, Menlo Park, CA, Sept. 1981. RFC-793.
[3] BORRELLI, R., AND COLEMAN, C. Differential Equations. Prentice-Hall Inc., 1987.
[4] CHIU, D.-M., AND JAIN, R. Networks with a connectionless network layer; part iii: Analysis of the increase and decrease algorithms. Tech. Rep. DEC-TR-509, Digital Equipment Corporation, Stanford, CA, Aug. 1987.
[5] CLARK, D. Window and Acknowledgement Strategy in TCP. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, July 1982. RFC-813.
[6] EDGE, S. W. An adaptive timeout algorithm for retransmission across a packet switching network. In Proceedings of SIGCOMM '83 (Mar. 1983), ACM.
[7] FELLER, W. Probability Theory and its Applications, second ed., vol. II. John Wiley & Sons, 1971.
[8] HAJEK, B. Stochastic approximation methods for decentralized control of multi-access communications. IEEE Transactions on Information Theory IT-31, 2 (Mar. 1985).
[9] HAJEK, B., AND VAN LOON, T. Decentralized dynamic control of a multiaccess broadcast channel. IEEE Transactions on Automatic Control AC-27, 3 (June 1982).
[10] Proceedings of the Sixth Internet Engineering Task Force (Boston, MA, Apr. 1987). Proceedings available as NIC document IETF-87/2P from DDN Network Information Center, SRI International, Menlo Park, CA.
[11] INTERNATIONAL ORGANIZATION FOR STANDARDIZATION. ISO International Standard 8473, Information Processing Systems – Open Systems Interconnection – Connectionless-mode Network Service Protocol Specification, Mar. 1986.
[12] JACOBSON, V. Congestion avoidance and control. In Proceedings of SIGCOMM '88 (Stanford, CA, Aug. 1988), ACM.
[13] JAIN, R. Divergence of timeout algorithms for packet retransmissions. In Proceedings Fifth Annual International Phoenix Conference on Computers and Communications (Scottsdale, AZ, Mar. 1986).
[14] JAIN, R. A timeout-based congestion control scheme for window flow-controlled networks. IEEE Journal on Selected Areas in Communications SAC-4, 7 (Oct. 1986).
[15] JAIN, R., RAMAKRISHNAN, K., AND CHIU, D.-M. Congestion avoidance in computer networks with a connectionless network layer. Tech. Rep. DEC-TR-506, Digital Equipment Corporation, Aug. 1987.
[16] KARN, P., AND PARTRIDGE, C. Estimating round-trip times in reliable transport protocols. In Proceedings of SIGCOMM '87 (Aug. 1987), ACM.
[17] KELLY, F. P. Stochastic models of computer communication systems. Journal of the Royal Statistical Society B 47, 3 (1985), 379–395.
[18] KLEINROCK, L. Queueing Systems, vol. II. John Wiley & Sons, 1976.
[19] KLINE, C. Supercomputers on the Internet: A case study. In Proceedings of SIGCOMM '87 (Aug. 1987), ACM.
[20] LJUNG, L., AND SÖDERSTRÖM, T. Theory and Practice of Recursive Identification. MIT Press, 1983.
[21] LUENBERGER, D. G. Introduction to Dynamic Systems. John Wiley & Sons, 1979.
[22] MILLS, D. Internet Delay Experiments. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, Dec. 1983. RFC-889.
[23] NAGLE, J. Congestion Control in IP/TCP Internetworks. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, Jan. 1984. RFC-896.
[24] PRUE, W., AND POSTEL, J. Something A Host Could Do with Source Quench. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, July 1987. RFC-1016.
[25] ROMKEY, J. A Nonstandard for Transmission of IP Datagrams Over Serial Lines: SLIP. ARPANET Working Group Requests for Comment, DDN Network Information Center, SRI International, Menlo Park, CA, June 1988. RFC-1055.
[26] ZHANG, L. Why TCP timers don't work well. In Proceedings of SIGCOMM '86 (Aug. 1986), ACM.