Chapter 6

The Memory Hierarchy

To this point in our study of systems, we have relied on a simple model of a computer system as a CPU that executes instructions and a memory system that holds instructions and data for the CPU. In our simple model, the memory system is a linear array of bytes, and the CPU can access each memory location in a constant amount of time. While this is an effective model as far as it goes, it does not reflect the way that modern systems really work.

In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access times. CPU registers hold the most frequently used data. Small, fast cache memories nearby the CPU act as staging areas for a subset of the data and instructions stored in the relatively slow main memory. The main memory stages data stored on large, slow disks, which in turn often serve as staging areas for data stored on the disks or tapes of other machines connected by networks.

Memory hierarchies work because well-written programs tend to access the storage at any particular level more frequently than they access the storage at the next lower level. So the storage at the next level can be slower, and thus larger and cheaper per bit. The overall effect is a large pool of memory that costs as much as the cheap storage near the bottom of the hierarchy, but that serves data to programs at the rate of the fast storage near the top of the hierarchy.

As a programmer, you need to understand the memory hierarchy because it has a big impact on the performance of your applications. If the data your program needs are stored in a CPU register, then they can be accessed in zero cycles during the execution of the instruction. If stored in a cache, 1 to 30 cycles. If stored in main memory, 50 to 200 cycles. And if stored in disk, tens of millions of cycles!

Here, then, is a fundamental and enduring idea in computer systems: If you understand how the system moves data up and down the memory hierarchy, then you can write your application programs so that their data items are stored higher in the hierarchy, where the CPU can access them more quickly.

This idea centers around a fundamental property of computer programs known as locality. Programs with good locality tend to access the same set of data items over and over again, or they tend to access sets of nearby data items. Programs with good locality tend to access more data items from the upper levels of the memory hierarchy than programs with poor locality, and thus run faster. For example, the running times of different matrix multiplication kernels that perform the same number of arithmetic operations, but have different degrees of locality, can vary by a factor of 20!

In this chapter, we will look at the basic storage technologies (SRAM memory, DRAM memory, ROM memory, and rotating and solid state disks) and describe how they are organized into hierarchies. In particular, we focus on the cache memories that act as staging areas between the CPU and main memory, because they have the most impact on application program performance. We show you how to analyze your C programs for locality and we introduce techniques for improving the locality in your programs. You will also learn an interesting way to characterize the performance of the memory hierarchy on a particular machine as a memory mountain that shows read access times as a function of locality.

6.1 Storage Technologies

Much of the success of computer technology stems from the tremendous progress in storage technology. Early computers had a few kilobytes of random-access memory. The earliest IBM PCs didn't even have a hard disk. That changed with the introduction of the IBM PC-XT in 1982, with its 10-megabyte disk. By the year 2010, typical machines had 150,000 times as much disk storage, and the amount of storage was increasing by a factor of 2 every couple of years.

6.1.1 Random-Access Memory

Random-access memory (RAM) comes in two varieties: static and dynamic. Static RAM (SRAM) is faster and significantly more expensive than Dynamic RAM (DRAM). SRAM is used for cache memories, both on and off the CPU chip. DRAM is used for the main memory plus the frame buffer of a graphics system. Typically, a desktop system will have no more than a few megabytes of SRAM, but hundreds or thousands of megabytes of DRAM.

Static RAM

SRAM stores each bit in a bistable memory cell. Each cell is implemented with a six-transistor circuit. This circuit has the property that it can stay indefinitely in either of two different voltage configurations, or states. Any other state will be unstable: starting from there, the circuit will quickly move toward one of the stable states. Such a memory cell is analogous to the inverted pendulum illustrated in Figure 6.1.
[Figure 6.1 diagram: three pendulum positions labeled Stable Left, Unstable, and Stable Right.]
Figure 6.1: Inverted pendulum. Like an SRAM cell, the pendulum has only two stable configurations, or states.

The pendulum is stable when it is tilted either all the way to the left or all the way to the right. From any other position, the pendulum will fall to one side or the other. In principle, the pendulum could also remain balanced in a vertical position indefinitely, but this state is metastable: the smallest disturbance would make it start to fall, and once it fell it would never return to the vertical position.

Due to its bistable nature, an SRAM memory cell will retain its value indefinitely, as long as it is kept powered. Even when a disturbance, such as electrical noise, perturbs the voltages, the circuit will return to the stable value when the disturbance is removed.

Dynamic RAM

DRAM stores each bit as charge on a capacitor. This capacitor is very small, typically around 30 femtofarads, that is, $30 \times 10^{-15}$ farads. Recall, however, that a farad is a very large unit of measure. DRAM storage can be made very dense: each cell consists of a capacitor and a single access transistor. Unlike SRAM, however, a DRAM memory cell is very sensitive to any disturbance. When the capacitor voltage is disturbed, it will never recover. Exposure to light rays will cause the capacitor voltages to change. In fact, the sensors in digital cameras and camcorders are essentially arrays of DRAM cells.

Various sources of leakage current cause a DRAM cell to lose its charge within a time period of around 10 to 100 milliseconds. Fortunately, for computers operating with clock cycle times measured in nanoseconds, this retention time is quite long. The memory system must periodically refresh every bit of memory by reading it out and then rewriting it. Some systems also use error-correcting codes, where the computer words are encoded using a few more bits (e.g., a 32-bit word might be encoded using 38 bits), such that circuitry can detect and correct any single erroneous bit within a word.

Figure 6.2 summarizes the characteristics of SRAM and DRAM memory. SRAM is persistent as long as power is applied. Unlike DRAM, no refresh is necessary. SRAM can be accessed faster than DRAM. SRAM is not sensitive to disturbances such as light and electrical noise. The trade-off is that SRAM cells use more transistors than DRAM cells, and thus have lower densities, are more expensive, and consume more power.

        Transistors   Relative                                  Relative
        per bit       access time   Persistent?   Sensitive?    cost       Applications
SRAM    6             1X            Yes           No            100X       Cache memory
DRAM    1             10X           No            Yes           1X         Main mem, frame buffers

Figure 6.2: Characteristics of DRAM and SRAM memory.

Conventional DRAMs

The cells (bits) in a DRAM chip are partitioned into d supercells, each consisting of w DRAM cells. A d × w DRAM stores a total of dw bits of information. The supercells are organized as a rectangular array with r rows and c columns, where rc = d. Each supercell has an address of the form (i, j), where i denotes the row, and j denotes the column.

For example, Figure 6.3 shows the organization of a 16 × 8 DRAM chip with d = 16 supercells, w = 8 bits per supercell, r = 4 rows, and c = 4 columns. The shaded box denotes the supercell at address (2, 1). Information flows in and out of the chip via external connectors called pins. Each pin carries a 1-bit signal. Figure 6.3 shows two of these sets of pins: eight data pins that can transfer 1 byte in or out of the chip, and two addr pins that carry two-bit row and column supercell addresses. Other pins that carry control information are not shown.
[Figure 6.3 diagram: a memory controller (connected to the CPU) attached to a DRAM chip by a 2-bit addr connection and an 8-bit data connection. The chip holds rows and columns of supercells, with supercell (2,1) marked, and an internal row buffer.]
Figure 6.3: High-level view of a 128-bit 16 × 8 DRAM chip.

Aside: A note on terminology
The storage community has never settled on a standard name for a DRAM array element. Computer architects tend to refer to it as a cell, overloading the term with the DRAM storage cell. Circuit designers tend to refer to it as a word, overloading the term with a word of main memory. To avoid confusion, we have adopted the unambiguous term supercell.
End Aside.

Each DRAM chip is connected to some circuitry, known as the memory controller, that can transfer w bits at a time to and from each DRAM chip. To read the contents of supercell (i, j), the memory controller sends the row address i to the DRAM, followed by the column address j. The DRAM responds by sending the contents of supercell (i, j) back to the controller. The row address i is called a RAS (Row Access Strobe) request. The column address j is called a CAS (Column Access Strobe) request. Notice that the RAS and CAS requests share the same DRAM address pins.

For example, to read supercell (2, 1) from the 16 × 8 DRAM in Figure 6.3, the memory controller sends row address 2, as shown in Figure 6.4(a). The DRAM responds by copying the entire contents of row 2 into an internal row buffer. Next, the memory controller sends column address 1, as shown in Figure 6.4(b). The DRAM responds by copying the 8 bits in supercell (2, 1) from the row buffer and sending them to the memory controller.

One reason circuit designers organize DRAMs as two-dimensional arrays instead of linear arrays is to reduce the number of address pins on the chip. For example, if our example 128-bit DRAM were organized as a linear array of 16 supercells with addresses 0 to 15, then the chip would need four address pins instead of two. The disadvantage of the two-dimensional array organization is that addresses must be sent in two distinct steps, which increases the access time.
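The pin-count argument above can be checked with a few lines of C. This is only an illustrative sketch: the function names `bits_needed`, `shared_address_pins`, and `split_address` are hypothetical, and the row-major numbering of supercells (linear address = i × c + j) is an assumption that the text does not state.

```c
#include <assert.h>

/* Number of address bits needed to select one of n items.
 * Hypothetical helper; n must be between 1 and 2^31. */
unsigned bits_needed(unsigned n)
{
    assert(n >= 1 && n <= 0x80000000u);
    unsigned bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Address pins needed by a two-dimensional DRAM: the RAS and CAS
 * requests share the same pins, so the wider of the two decides. */
unsigned shared_address_pins(unsigned rows, unsigned cols)
{
    unsigned br = bits_needed(rows);
    unsigned bc = bits_needed(cols);
    return br > bc ? br : bc;
}

/* Split a linear supercell number into (row, col).
 * Assumes row-major numbering, which is an illustrative choice. */
void split_address(unsigned linear, unsigned cols,
                   unsigned *row, unsigned *col)
{
    assert(cols >= 1 && row != 0 && col != 0);
    *row = linear / cols;
    *col = linear % cols;
}
```

For the 16 × 8 chip of Figure 6.3, `bits_needed(16)` gives the four pins of the linear organization, `shared_address_pins(4, 4)` gives the two pins of the 4 × 4 organization, and under the row-major assumption `split_address(9, 4, ...)` yields supercell (2, 1).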
[Figure 6.4 diagram, two panels, each showing the memory controller, the 2-bit addr and 8-bit data connections, the DRAM chip, and its internal row buffer. (a) Select row 2 (RAS request): RAS = 2, and row 2 is copied into the row buffer. (b) Select column 1 (CAS request): CAS = 1, and supercell (2,1) is returned.]
Figure 6.4: Reading the contents of a DRAM supercell.

Memory Modules

DRAM chips are packaged in memory modules that plug into expansion slots on the main system board (motherboard). Common packages include the 168-pin dual inline memory module (DIMM), which transfers data to and from the memory controller in 64-bit chunks, and the 72-pin single inline memory module (SIMM), which transfers data in 32-bit chunks.

Figure 6.5 shows the basic idea of a memory module. The example module stores a total of 64 MB (megabytes) using eight 64-Mbit 8M × 8 DRAM chips, numbered 0 to 7. Each supercell stores 1 byte of main memory, and each 64-bit doubleword [1] at byte address A in main memory is represented by the eight supercells whose corresponding supercell address is (i, j). In the example in Figure 6.5, DRAM 0 stores the first (lower-order) byte, DRAM 1 stores the next byte, and so on.

[1] IA32 would call this 64-bit quantity a quadword.

To retrieve a 64-bit doubleword at memory address A, the memory controller converts A to a supercell address (i, j) and sends it to the memory module, which then broadcasts i and j to each DRAM. In response, each DRAM outputs the 8-bit contents of its (i, j) supercell. Circuitry in the module collects these outputs and forms them into a 64-bit doubleword, which it returns to the memory controller.

Main memory can be aggregated by connecting multiple memory modules to the memory controller. In this case, when the controller receives an address A, the controller selects the module k that contains A, converts A to its (i, j) form, and sends (i, j) to module k.

[Figure 6.5 diagram: a 64 MB memory module consisting of eight 8M × 8 DRAMs. The memory controller sends addr (row = i, col = j) to the module. Supercell (i, j) of DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, and so on up to DRAM 7, which supplies bits 56-63 of the 64-bit doubleword at main memory address A. The controller passes the 64-bit doubleword to the CPU chip.]
Figure 6.5: Reading the contents of a memory module.

Practice Problem 6.1:
In the following, let r be the number of rows in a DRAM array, c the number of columns, b_r the number of bits needed to address the rows, and b_c the number of bits needed to address the columns. For each of the following DRAMs, determine the power-of-two array dimensions that minimize max(b_r, b_c), the maximum number of bits needed to address the rows or columns of the array.

Organization    r    c    b_r    b_c    max(b_r, b_c)
16 × 1
16 × 4
128 × 8
512 × 4
1024 × 4

Enhanced DRAMs

There are many kinds of DRAM memories, and new kinds appear on the market with regularity as manufacturers attempt to keep up with rapidly increasing processor speeds. Each is based on the conventional DRAM cell, with optimizations that improve the speed with which the basic DRAM cells can be accessed.

• Fast page mode DRAM (FPM DRAM). A conventional DRAM copies an entire row of supercells into its internal row buffer, uses one, and then discards the rest. FPM DRAM improves on this by allowing consecutive accesses to the same row to be served directly from the row buffer. For example, to read four supercells from row i of a conventional DRAM, the memory controller must send four RAS/CAS requests, even though the row address i is identical in each case. To read supercells from the same row of an FPM DRAM, the memory controller sends an initial RAS/CAS request, followed by three CAS requests. The initial RAS/CAS request copies row i into the row buffer and returns the supercell addressed by the CAS. The next three supercells are served directly from the row buffer, and thus more quickly than the initial supercell.
• Extended data out DRAM (EDO DRAM). An enhanced form of FPM DRAM that allows the individual CAS signals to be spaced closer together in time.

• Synchronous DRAM (SDRAM). Conventional, FPM, and EDO DRAMs are asynchronous in the sense that they communicate with the memory controller using a set of explicit control signals. SDRAM replaces many of these control signals with the rising edges of the same external clock signal that drives the memory controller. Without going into detail, the net effect is that an SDRAM can output the contents of its supercells at a faster rate than its asynchronous counterparts.

• Double Data-Rate Synchronous DRAM (DDR SDRAM). DDR SDRAM is an enhancement of SDRAM that doubles the speed of the DRAM by using both clock edges as control signals. Different types of DDR SDRAMs are characterized by the size of a small prefetch buffer that increases the effective bandwidth: DDR (2 bits), DDR2 (4 bits), and DDR3 (8 bits).

• Rambus DRAM (RDRAM). This is an alternative proprietary technology with a higher maximum bandwidth than DDR SDRAM.

• Video RAM (VRAM). Used in the frame buffers of graphics systems. VRAM is similar in spirit to FPM DRAM. Two major differences are that (1) VRAM output is produced by shifting the entire contents of the internal buffer in sequence, and (2) VRAM allows concurrent reads and writes to the memory. Thus, the system can be painting the screen with the pixels in the frame buffer (reads) while concurrently writing new values for the next update (writes).

Aside: Historical popularity of DRAM technologies
Until 1995, most PCs were built with FPM DRAMs. From 1996 to 1999, EDO DRAMs dominated the market, while FPM DRAMs all but disappeared. SDRAMs first appeared in 1995 in high-end systems, and by 2002 most PCs were built with SDRAMs and DDR SDRAMs. By 2010, most server and desktop systems were built with DDR3 SDRAMs. In fact, the Intel Core i7 supports only DDR3 SDRAM.
End Aside.

Nonvolatile Memory

DRAMs and SRAMs are volatile in the sense that they lose their information if the supply voltage is turned off. Nonvolatile memories, on the other hand, retain their information even when they are powered off. There are a variety of nonvolatile memories. For historical reasons, they are referred to collectively as read-only memories (ROMs), even though some types of ROMs can be written to as well as read. ROMs are distinguished by the number of times they can be reprogrammed (written to) and by the mechanism for reprogramming them.

A programmable ROM (PROM) can be programmed exactly once. PROMs include a sort of fuse with each memory cell that can be blown once by zapping it with a high current.

An erasable programmable ROM (EPROM) has a transparent quartz window that permits light to reach the storage cells. The EPROM cells are cleared to zeros by shining ultraviolet light through the window. Programming an EPROM is done by using a special device to write ones into the EPROM. An EPROM can be erased and reprogrammed on the order of 1000 times. An electrically erasable PROM (EEPROM) is akin to an EPROM, but does not require a physically separate programming device, and thus can be reprogrammed in-place on printed circuit cards. An EEPROM can be reprogrammed on the order of $10^5$ times before it wears out.

Flash memory is a type of nonvolatile memory, based on EEPROMs, that has become an important storage technology. Flash memories are everywhere, providing fast and durable nonvolatile storage for a slew of electronic devices, including digital cameras, cell phones, music players, PDAs, and laptop, desktop, and server computer systems. In Section 6.1.3, we will look in detail at a new form of flash-based disk drive, known as a solid state disk (SSD), that provides a faster, sturdier, and less power-hungry alternative to conventional rotating disks.

Programs stored in ROM devices are often referred to as firmware. When a computer system is powered up, it runs firmware stored in a ROM. Some systems provide a small set of primitive input and output functions in firmware, for example, a PC's BIOS (basic input/output system) routines. Complicated devices such as graphics cards and disk drive controllers also rely on firmware to translate I/O (input/output) requests from the CPU.

Accessing Main Memory

Data flows back and forth between the processor and the DRAM main memory over shared electrical conduits called buses. Each transfer of data between the CPU and memory is accomplished with a series of steps called a bus transaction. A read transaction transfers data from the main memory to the CPU. A write transaction transfers data from the CPU to the main memory.

A bus is a collection of parallel wires that carry address, data, and control
signals. Depending on the particular bus design, data and address signals can share the same set of wires, or they can use different sets. Also, more than two devices can share the same bus. The control wires carry signals that synchronize the transaction and identify what kind of transaction is currently being performed. For example, is this transaction of interest to the main memory, or to some other I/O device such as a disk controller? Is the transaction a read or a write? Is the information on the bus an address or a data item?

Figure 6.6 shows the configuration of an example computer system. The main components are the CPU chip, a chipset that we will call an I/O bridge (which includes the memory controller), and the DRAM memory modules that make up main memory. These components are connected by a pair of buses: a system bus that connects the CPU to the I/O bridge, and a memory bus that connects the I/O bridge to the main memory.

The I/O bridge translates the electrical signals of the system bus into the electrical signals of the memory bus. As we will see, the I/O bridge also connects the system bus and memory bus to an I/O bus that is shared by I/O devices such as disks and graphics cards. For now, though, we will focus on the memory bus.

[Figure 6.6 diagram: a CPU chip containing the register file, ALU, and bus interface; the system bus from the bus interface to the I/O bridge; the memory bus from the I/O bridge to main memory.]
Figure 6.6: Example bus structure that connects the CPU and main memory.

Aside: A note on bus designs
Bus design is a complex and rapidly changing aspect of computer systems. Different vendors develop different bus architectures as a way to differentiate their products. For example, Intel systems use chipsets known as the northbridge and the southbridge to connect the CPU to memory and I/O devices, respectively. In older Pentium and Core 2 systems, a front side bus (FSB) connects the CPU to the northbridge. Systems from AMD replace the FSB with the HyperTransport interconnect, while newer Intel Core i7 systems use the QuickPath interconnect. The details of these different bus architectures are beyond the scope of this text. Instead, we will use the high-level bus architecture from Figure 6.6 as a running example throughout the text. It is a simple but useful abstraction that allows us to be concrete, and captures the main ideas without being tied too closely to the detail of any proprietary designs.
End Aside.

Consider what happens when the CPU performs a load operation such as

movl A,%eax

where the contents of address A are loaded into register %eax. Circuitry on the CPU chip called the bus interface initiates a read transaction on the bus. The read transaction consists of three steps. First, the CPU places the address A on the system bus. The I/O bridge passes the signal along to the memory bus (Figure 6.7(a)). Next, the main memory senses the address signal on the memory bus, reads the address from the memory bus, fetches the data word from the DRAM, and writes the data to the memory bus. The I/O bridge translates the memory bus signal into a system bus signal, and passes it along to the system bus (Figure 6.7(b)). Finally, the CPU senses the data on the system bus, reads it from the bus, and copies it to register %eax (Figure 6.7(c)).

Conversely, when the CPU performs a store instruction such as

movl %eax,A

where the contents of register %eax are written to address A, the CPU initiates a write transaction. Again, there are three basic steps. First, the CPU places the address on the system bus. The memory reads the address from the memory bus and waits for the data to arrive (Figure 6.8(a)). Next, the CPU copies the data word in %eax to the system bus (Figure 6.8(b)). Finally, the main memory reads the data word from the memory bus and stores the bits in the DRAM (Figure 6.8(c)).

6.1.2 Disk Storage

Disks are workhorse storage devices that hold enormous amounts of data, on the order of hundreds to thousands of gigabytes, as opposed to the hundreds or thousands of megabytes in a RAM-based memory. However, it takes on the order of milliseconds to read information from a disk, a hundred thousand times longer than from DRAM and a million times longer than from SRAM.

[Figure 6.7 diagram, panels (a) and (b), each showing the register file with %eax, the ALU, the bus interface, the I/O bridge, and main memory holding word x at address A. (a) CPU places address A on the memory bus. (b) Main memory reads A from the bus, retrieves word x, and places it on the bus.]
[Figure 6.7 diagram, panel (c), same components as the other panels. (c) CPU reads word x from the bus, and copies it into register %eax.]
Figure 6.7: Memory read transaction for a load operation: movl A,%eax.

[Figure 6.8 diagram, three panels, each showing the register file with %eax holding word y, the ALU, the bus interface, the I/O bridge, and main memory with address A. (a) CPU places address A on the memory bus. Main memory reads it and waits for the data word. (b) CPU places data word y on the bus. (c) Main memory reads data word y from the bus and stores it at address A.]
Figure 6.8: Memory write transaction for a store operation: movl %eax,A.

Disk Geometry

Disks are constructed from platters. Each platter consists of two sides, or surfaces, that are coated with magnetic recording material. A rotating spindle in the center of the platter spins the platter at a fixed rotational rate, typically between 5400 and 15,000 revolutions per minute (RPM). A disk will typically contain one or more of these platters encased in a sealed container.

Figure 6.9(a) shows the geometry of a typical disk surface. Each surface consists of a collection of concentric rings called tracks. Each track is partitioned into a collection of sectors. Each sector contains an equal number of data bits (typically 512 bytes) encoded in the magnetic material on the sector. Sectors are separated by gaps where no data bits are stored. Gaps store formatting bits that identify sectors.

[Figure 6.9 diagram, panel (a) Single-platter view: spindle, surface, tracks, track k, sectors, and gaps.]
[Figure 6.9 diagram, panel (b) Multiple-platter view: platters 0 to 2 on a common spindle, surfaces 0 to 5, and cylinder k.]
Figure 6.9: Disk geometry.

A disk consists of one or more platters stacked on top of each other and encased in a sealed package, as shown in Figure 6.9(b). The entire assembly is often referred to as a disk drive, although we will usually refer to it as simply a disk. We will sometimes refer to disks as rotating disks to distinguish them from flash-based solid state disks (SSDs), which have no moving parts.

Disk manufacturers describe the geometry of multiple-platter drives in terms of cylinders, where a cylinder is the collection of tracks on all the surfaces that are equidistant from the center of the spindle. For example, if a drive has three platters and six surfaces, and the tracks on each surface are numbered consistently, then cylinder k is the collection of the six instances of track k.

Disk Capacity

The maximum number of bits that can be recorded by a disk is known as its maximum capacity, or simply capacity. Disk capacity is determined by the following technology factors:

• Recording density (bits/in): The number of bits that can be squeezed into a 1-inch segment of a track.

• Track density (tracks/in): The number of tracks that can be squeezed into a 1-inch segment of the radius extending from the center of the platter.
• Areal density (bits/in$^2$): The product of the recording density and the track density.

Disk manufacturers work tirelessly to increase areal density (and thus capacity), and this is doubling every few years. The original disks, designed in an age of low areal density, partitioned every track into the same number of sectors, which was determined by the number of sectors that could be recorded on the innermost track. To maintain a fixed number of sectors per track, the sectors were spaced farther apart on the outer tracks. This was a reasonable approach when areal densities were relatively low. However, as areal densities increased, the gaps between sectors (where no data bits were stored) became unacceptably large. Thus, modern high-capacity disks use a technique known as multiple zone recording, where the set of cylinders is partitioned into disjoint subsets known as recording zones. Each zone consists of a contiguous collection of cylinders. Each track in each cylinder in a zone has the same number of sectors, which is determined by the number of sectors that can be packed into the innermost track of the zone. Note that diskettes (floppy disks) still use the old-fashioned approach, with a constant number of sectors per track.

The capacity of a disk is given by the following formula:

$$\text{Disk capacity} = \frac{\#\,\text{bytes}}{\text{sector}} \times \frac{\text{average}\ \#\,\text{sectors}}{\text{track}} \times \frac{\#\,\text{tracks}}{\text{surface}} \times \frac{\#\,\text{surfaces}}{\text{platter}} \times \frac{\#\,\text{platters}}{\text{disk}}$$

For example, suppose we have a disk with five platters, 512 bytes per sector, 20,000 tracks per surface, and an average of 300 sectors per track. Then the capacity of the disk is

$$\text{Disk capacity} = \frac{512\ \text{bytes}}{\text{sector}} \times \frac{300\ \text{sectors}}{\text{track}} \times \frac{20{,}000\ \text{tracks}}{\text{surface}} \times \frac{2\ \text{surfaces}}{\text{platter}} \times \frac{5\ \text{platters}}{\text{disk}} = 30{,}720{,}000{,}000\ \text{bytes} = 30.72\ \text{GB}$$

Notice that manufacturers express disk capacity in units of gigabytes (GB), where 1 GB = $10^9$ bytes.

Aside: How much is a gigabyte?
Unfortunately, the meanings of prefixes such as kilo (K), mega (M), giga (G), and tera (T) depend on the context. For measures that relate to the capacity of DRAMs and SRAMs, typically K = $2^{10}$, M = $2^{20}$, G = $2^{30}$, and T = $2^{40}$. For measures related to the capacity of I/O devices such as disks and networks, typically K = $10^3$, M = $10^6$, G = $10^9$, and T = $10^{12}$. Rates and throughputs usually use these prefix values as well.
Fortunately, for the back-of-the-envelope estimates that we typically rely on, either assumption works fine in practice. For example, the relative difference between $2^{20} = 1{,}048{,}576$ and $10^6 = 1{,}000{,}000$ is small: $(2^{20} - 10^6)/10^6 \approx 5\%$. Similarly for $2^{30} = 1{,}073{,}741{,}824$ and $10^9 = 1{,}000{,}000{,}000$: $(2^{30} - 10^9)/10^9 \approx 7\%$.
End Aside.

Practice Problem 6.2:
What is the capacity of a disk with two platters, 10,000 cylinders, an average of 400 sectors per track, and 512 bytes per sector?

Disk Operation

Disks read and write bits stored on the magnetic surface using a read/write head connected to the end of an actuator arm, as shown in Figure 6.10(a). By moving the arm back and forth along its radial axis, the drive can position the head over any track on the surface. This mechanical motion is known as a seek. Once the head is positioned over the desired track, then as each bit on the track passes underneath, the head can either sense the value of the bit (read the bit) or alter the value of the bit (write the bit). Disks with multiple platters have a separate read/write head for each surface, as shown in Figure 6.10(b). The heads are lined up vertically and move in unison. At any point in time, all heads are positioned on the same cylinder.
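Returning briefly to the capacity formula from the Disk Capacity discussion: its worked example can be reproduced mechanically. The following is a minimal sketch, not part of the original text; the struct `disk_geometry`, its field names, and the two functions are hypothetical names chosen for illustration. It uses the disk convention 1 GB = $10^9$ bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Geometry of a disk. The struct and its field names are
 * illustrative; they mirror the five factors of the formula. */
struct disk_geometry {
    uint64_t bytes_per_sector;
    uint64_t avg_sectors_per_track;
    uint64_t tracks_per_surface;
    uint64_t surfaces_per_platter;
    uint64_t platters;
};

/* Capacity in bytes: the product of the five factors. */
uint64_t disk_capacity_bytes(const struct disk_geometry *g)
{
    assert(g != NULL);
    return g->bytes_per_sector
         * g->avg_sectors_per_track
         * g->tracks_per_surface
         * g->surfaces_per_platter
         * g->platters;
}

/* Capacity in manufacturer gigabytes, where 1 GB = 10^9 bytes. */
double disk_capacity_gb(const struct disk_geometry *g)
{
    return (double)disk_capacity_bytes(g) / 1e9;
}
```

With the example values (512 bytes per sector, 300 sectors per track, 20,000 tracks per surface, 2 surfaces per platter, 5 platters), `disk_capacity_bytes` returns 30,720,000,000, matching the 30.72 GB computed in the text. A 64-bit type is used because the product does not fit in 32 bits.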
[Figure 6.10 diagram, two panels. (a) Single-platter view: the disk surface spins at a fixed rotational rate around the spindle; the read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air; by moving radially, the arm can position the read/write head over any track. (b) Multiple-platter view: an arm with one read/write head per surface, and the spindle.]
Figure 6.10: Disk dynamics.

The read/write head at the end of the arm flies (literally) on a thin cushion of air over the disk surface at a height of about 0.1 microns and a speed of about 80 km/h. This is analogous to placing the Sears Tower on its side and flying it around the world at a height of 2.5 cm (1 inch) above the ground, with each orbit of the earth taking only 8 seconds! At these tolerances, a tiny piece of dust on the surface is like a huge boulder. If the head were to strike one of these boulders, the head would cease flying and crash into the surface (a so-called head crash). For this reason, disks are always sealed in airtight packages.

Disks read and write data in sector-sized blocks. The access time for a sector has three main components: seek time, rotational latency, and transfer time:

• Seek time: To read the contents of some target sector, the arm first positions the head over the track that contains the target sector. The time required to move the arm is called the seek time. The seek time, $T_{\text{seek}}$, depends on the previous position of the head and the speed that the arm moves across the surface. The average seek time in modern drives, $T_{\text{avg seek}}$, measured by taking the mean of several thousand seeks to random sectors, is typically on the order of 3 to 9 ms. The maximum time for a single seek, $T_{\text{max seek}}$, can be as high as 20 ms.

• Rotational latency: Once the head is in position over the track, the drive waits for the first bit of the target sector to pass under the head. The performance of this step depends on both the position of the surface when the head arrives at the target sector and the rotational speed of the disk. In the worst case, the head just misses the target sector and waits for the disk to make a full rotation. Thus, the maximum rotational latency, in seconds, is given by

$$T_{\text{max rotation}} = \frac{1}{\text{RPM}} \times \frac{60\ \text{secs}}{1\ \text{min}}$$

The average rotational latency, $T_{\text{avg rotation}}$, is simply half of $T_{\text{max rotation}}$.

• Transfer time: When the first bit of the target sector is under the head, the drive can begin to read or write the contents of the sector. The transfer time for one sector depends on the rotational speed and the number of sectors per track. Thus, we can roughly estimate the average transfer time for one sector in seconds as

$$T_{\text{avg transfer}} = \frac{1}{\text{RPM}} \times \frac{1}{\text{(average \# sectors/track)}} \times \frac{60\ \text{secs}}{1\ \text{min}}$$

We can estimate the average time to access the contents of a disk sector as the sum of the average seek time, the average rotational latency, and the average transfer time. For example, consider a disk with the following parameters:

Parameter                  Value
Rotational rate            7200 RPM
T_avg seek                 9 ms
Average # sectors/track    400

For this disk, the average rotational latency (in ms) is

$$T_{\text{avg rotation}} = 1/2 \times T_{\text{max rotation}} = 1/2 \times (60\ \text{secs} / 7200\ \text{RPM}) \times 1000\ \text{ms/sec} \approx 4\ \text{ms}$$

The average transfer time is

$$T_{\text{avg transfer}} = 60 / 7200\ \text{RPM} \times 1 / 400\ \text{sectors/track} \times 1000\ \text{ms/sec} \approx 0.02\ \text{ms}$$

Putting it all together, the total estimated access time is

$$T_{\text{access}} = T_{\text{avg seek}} + T_{\text{avg rotation}} + T_{\text{avg transfer}} = 9\ \text{ms} + 4\ \text{ms} + 0.02\ \text{ms} = 13.02\ \text{ms}$$

This example illustrates some important points:

• The time to access the 512 bytes in a disk sector is dominated by the seek time and the rotational latency.

• Accessing the first byte in the sector takes a long time, but the remaining bytes are essentially free.

• Since the seek time and rotational latency are roughly the same, twice the seek time is a simple and reasonable rule for estimating disk access time.
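The access-time estimate above is simple enough to express as three small C functions. This is a sketch under stated assumptions rather than a definitive model: the function names are hypothetical, all times are in milliseconds, and the code keeps the unrounded intermediate values, so for the example disk it gives about 13.19 ms instead of the 13.02 ms obtained in the text by rounding the rotational latency down to 4 ms.

```c
#include <assert.h>

/* Average rotational latency in ms: half of one full revolution. */
double avg_rotational_latency_ms(double rpm)
{
    assert(rpm > 0.0);
    double max_rotation_ms = 60.0 / rpm * 1000.0; /* one revolution */
    return 0.5 * max_rotation_ms;
}

/* Average time in ms for one sector to pass under the head. */
double avg_transfer_time_ms(double rpm, double avg_sectors_per_track)
{
    assert(rpm > 0.0 && avg_sectors_per_track > 0.0);
    return 60.0 / rpm / avg_sectors_per_track * 1000.0;
}

/* Estimated average access time in ms: seek + rotation + transfer. */
double avg_access_time_ms(double avg_seek_ms, double rpm,
                          double avg_sectors_per_track)
{
    assert(avg_seek_ms >= 0.0);
    return avg_seek_ms
         + avg_rotational_latency_ms(rpm)
         + avg_transfer_time_ms(rpm, avg_sectors_per_track);
}
```

Calling `avg_access_time_ms(9.0, 7200.0, 400.0)` reproduces the example: roughly 4.17 ms of rotational latency and 0.02 ms of transfer time on top of the 9 ms seek, which shows how little the transfer itself contributes.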