Uploaded On 2014-12-28

Chapter 6 The Memory Hierarchy

To this point in our study of systems, we have relied on a simple model of a computer system as a CPU that executes instructions and a memory system that holds instructions and data for the CPU. In our simple model, the memory system is a linear array of bytes, and the CPU can access each memory location in a constant amount of time. While this is an effective model as far as it goes, it does not reflect the way that modern systems really work.

In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access times. CPU registers hold the most frequently used data. Small, fast cache memories near the CPU act as staging areas for a subset of the data and instructions stored in the relatively slow main memory. The main memory stages data stored on large, slow disks, which in turn often serve as staging areas for data stored on the disks or tapes of other machines connected by networks.

Memory hierarchies work because well-written programs tend to access the storage at any particular level more frequently than they access the storage at the next lower level. So the storage at the next level can be slower, and thus larger and cheaper per bit. The overall effect is a large pool of memory that costs as much as the cheap storage near the bottom of the hierarchy, but that serves data to programs at the rate of the fast storage near the top of the hierarchy.

As a programmer, you need to understand the memory hierarchy because it has a big impact on the performance of your applications. If the data your program needs are stored in a CPU register, then they can be accessed in zero cycles during the execution of the instruction. If stored in a cache, 1 to 30 cycles. If stored in main memory, 50 to 200 cycles. And if stored on disk, tens of millions of cycles!

Here, then, is a fundamental and enduring idea in computer systems: If you understand how the system moves data up and down the memory hierarchy, then you can write your application programs so that their data items are stored higher in the hierarchy, where the CPU can access them more quickly.

This idea centers around a fundamental property of computer programs known as locality. Programs with good locality tend to access the same set of data items over and over again, or they tend to access sets of nearby data items. Programs with good locality tend to access more data items from the upper levels of the memory hierarchy than programs with poor locality, and thus run faster. For example, the running times of different matrix multiplication kernels that perform the same number of arithmetic operations, but have different degrees of locality, can vary by a factor of 20!

In this chapter, we will look at the basic storage technologies (SRAM memory, DRAM memory, ROM memory, and rotating and solid state disks) and describe how they are organized into hierarchies. In particular, we focus on the cache memories that act as staging areas between the CPU and main memory, because they have the most impact on application program performance. We show you how to analyze your C programs for locality and we introduce techniques for improving the locality in your programs. You will also learn an interesting way to characterize the performance of the memory hierarchy on a particular machine as a "memory mountain" that shows read access times as a function of locality.

6.1 Storage Technologies

Much of the success of computer technology stems from the tremendous progress in storage technology. Early computers had a few kilobytes of random-access memory. The earliest IBM PCs didn't even have a hard disk. That changed with the introduction of the IBM PC-XT in 1982, with its 10-megabyte disk. By the year 2010, typical machines had 150,000 times as much disk storage, and the amount of storage was increasing by a factor of 2 every couple of years.

6.1.1 Random-Access Memory

Random-access memory (RAM) comes in two varieties: static and dynamic. Static RAM (SRAM) is faster and significantly more expensive than Dynamic RAM (DRAM). SRAM is used for cache memories, both on and off the CPU chip. DRAM is used for the main memory plus the frame buffer of a graphics system. Typically, a desktop system will have no more than a few megabytes of SRAM, but hundreds or thousands of megabytes of DRAM.

Static RAM

SRAM stores each bit in a bistable memory cell. Each cell is implemented with a six-transistor circuit. This circuit has the property that it can stay indefinitely in either of two different voltage configurations, or states. Any other state will be unstable: starting from there, the circuit will quickly move toward one of the stable states. Such a memory cell is analogous to the inverted pendulum illustrated in Figure 6.1.
[Figure 6.1: Inverted pendulum, shown in three positions: stable left, unstable, and stable right. Like an SRAM cell, the pendulum has only two stable configurations, or states.]

The pendulum is stable when it is tilted either all the way to the left or all the way to the right. From any other position, the pendulum will fall to one side or the other. In principle, the pendulum could also remain balanced in a vertical position indefinitely, but this state is metastable: the smallest disturbance would make it start to fall, and once it fell it would never return to the vertical position.

Due to its bistable nature, an SRAM memory cell will retain its value indefinitely, as long as it is kept powered. Even when a disturbance, such as electrical noise, perturbs the voltages, the circuit will return to the stable value when the disturbance is removed.

Dynamic RAM

DRAM stores each bit as charge on a capacitor. This capacitor is very small, typically around 30 femtofarads, that is, $30 \times 10^{-15}$ farads. Recall, however, that a farad is a very large unit of measure. DRAM storage can be made very dense: each cell consists of a capacitor and a single access transistor. Unlike SRAM, however, a DRAM memory cell is very sensitive to any disturbance. When the capacitor voltage is disturbed, it will never recover. Exposure to light rays will cause the capacitor voltages to change. In fact, the sensors in digital cameras and camcorders are essentially arrays of DRAM cells.

Various sources of leakage current cause a DRAM cell to lose its charge within a time period of around 10 to 100 milliseconds. Fortunately, for computers operating with clock cycle times measured in nanoseconds, this retention time is quite long. The memory system must periodically refresh every bit of memory by reading it out and then rewriting it. Some systems also use error-correcting codes, where the computer words are encoded using a few more bits (e.g., a 32-bit word might be encoded using 38 bits), such that circuitry can detect and correct any single erroneous bit within a word.

Figure 6.2 summarizes the characteristics of SRAM and DRAM memory. SRAM is persistent as long as power is applied. Unlike DRAM, no refresh is necessary. SRAM can be accessed faster than DRAM. SRAM is not sensitive to disturbances such as light and electrical noise. The trade-off is that SRAM cells use more transistors than DRAM cells, and thus have lower densities, are more expensive, and consume more power.

|      | Transistors per bit | Relative access time | Persistent? | Sensitive? | Relative cost | Applications               |
|------|---------------------|----------------------|-------------|------------|---------------|----------------------------|
| SRAM | 6                   | 1X                   | Yes         | No         | 100X          | Cache memory               |
| DRAM | 1                   | 10X                  | No          | Yes        | 1X            | Main memory, frame buffers |

Figure 6.2: Characteristics of DRAM and SRAM memory.

Conventional DRAMs

The cells (bits) in a DRAM chip are partitioned into d supercells, each consisting of w DRAM cells. A $d \times w$ DRAM stores a total of $dw$ bits of information. The supercells are organized as a rectangular array with r rows and c columns, where $rc = d$. Each supercell has an address of the form (i, j), where i denotes the row, and j denotes the column.

For example, Figure 6.3 shows the organization of a $16 \times 8$ DRAM chip with d = 16 supercells, w = 8 bits per supercell, r = 4 rows, and c = 4 columns. The shaded box denotes the supercell at address (2, 1). Information flows in and out of the chip via external connectors called pins. Each pin carries a 1-bit signal. Figure 6.3 shows two of these sets of pins: eight data pins that can transfer 1 byte in or out of the chip, and two addr pins that carry two-bit row and column supercell addresses. Other pins that carry control information are not shown.
[Figure 6.3: High-level view of a 128-bit $16 \times 8$ DRAM chip. The diagram shows the rows and columns of supercells with supercell (2,1) shaded, the internal row buffer, and a memory controller (connected to the CPU) attached to the chip by 2 addr pins and 8 data pins.]

Aside: A note on terminology
The storage community has never settled on a standard name for a DRAM array element. Computer architects tend to refer to it as a "cell," overloading the term with the DRAM storage cell. Circuit designers tend to refer to it as a "word," overloading the term with a word of main memory. To avoid confusion, we have adopted the unambiguous term "supercell." End Aside.

Each DRAM chip is connected to some circuitry, known as the memory controller, that can transfer w bits at a time to and from each DRAM chip. To read the contents of supercell (i, j), the memory controller sends the row address i to the DRAM, followed by the column address j. The DRAM responds by sending the contents of supercell (i, j) back to the controller. The row address i is called a RAS (Row Access Strobe) request. The column address j is called a CAS (Column Access Strobe) request. Notice that the RAS and CAS requests share the same DRAM address pins.

For example, to read supercell (2, 1) from the $16 \times 8$ DRAM in Figure 6.3, the memory controller sends row address 2, as shown in Figure 6.4(a). The DRAM responds by copying the entire contents of row 2 into an internal row buffer. Next, the memory controller sends column address 1, as shown in Figure 6.4(b). The DRAM responds by copying the 8 bits in supercell (2, 1) from the row buffer and sending them to the memory controller.

One reason circuit designers organize DRAMs as two-dimensional arrays instead of linear arrays is to reduce the number of address pins on the chip. For example, if our example 128-bit DRAM were organized as a linear array of 16 supercells with addresses 0 to 15, then the chip would need four address pins instead of two. The disadvantage of the two-dimensional array organization is that addresses must be sent in two distinct steps, which increases the access time.
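The RAS/CAS read sequence and the address-pin argument above can be sketched as a toy software model. This is only an illustration under stated assumptions: the class `DramChip`, its methods, and the helper `bits_needed` are hypothetical names invented here, and the model ignores timing, refresh, and control pins entirely.

```python
def bits_needed(n):
    """Number of address bits needed to name n distinct items (n >= 1)."""
    return (n - 1).bit_length()


class DramChip:
    """Toy model of a DRAM chip with r rows and c columns of supercells."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        # Each entry stands for the contents of one supercell.
        self.cells = [[0] * cols for _ in range(rows)]
        self.row_buffer = None  # filled by a RAS request

    def ras(self, i):
        """RAS request: copy the entire row i into the internal row buffer."""
        self.row_buffer = list(self.cells[i])

    def cas(self, j):
        """CAS request: return supercell j of the currently buffered row."""
        if self.row_buffer is None:
            raise RuntimeError("CAS issued before any RAS")
        return self.row_buffer[j]

    def address_pins_2d(self):
        # Row and column addresses share the same pins, so the chip needs
        # only enough pins for the larger of the two addresses.
        return max(bits_needed(self.rows), bits_needed(self.cols))

    def address_pins_linear(self):
        # A linear array must address every supercell in a single step.
        return bits_needed(self.rows * self.cols)


# The 16 x 8 example chip: 4 rows by 4 columns of 8-bit supercells.
chip = DramChip(rows=4, cols=4)
chip.cells[2][1] = 0xAB       # pretend supercell (2, 1) holds this byte
chip.ras(2)                   # select row 2
value = chip.cas(1)           # select column 1
print(hex(value), chip.address_pins_2d(), chip.address_pins_linear())
```

For the 4-by-4 example the model reports two shared address pins for the two-dimensional organization and four for the linear one, matching the counts given in the text.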
[Figure 6.4: Reading the contents of a DRAM supercell. Panel (a): Select row 2 (RAS request). Panel (b): Select column 1 (CAS request).]

Memory Modules

DRAM chips are packaged in memory modules that plug into expansion slots on the main system board (motherboard). Common packages include the 168-pin dual inline memory module (DIMM), which transfers data to and from the memory controller in 64-bit chunks, and the 72-pin single inline memory module (SIMM), which transfers data in 32-bit chunks.

Figure 6.5 shows the basic idea of a memory module. The example module stores a total of 64 MB (megabytes) using eight 64-Mbit $8\text{M} \times 8$ DRAM chips, numbered 0 to 7. Each supercell stores 1 byte of main memory, and each 64-bit doubleword (see footnote 1) at byte address A in main memory is represented by the eight supercells whose corresponding supercell address is (i, j). In the example in Figure 6.5, DRAM 0 stores the first (lower-order) byte, DRAM 1 stores the next byte, and so on.

Footnote 1: IA32 would call this 64-bit quantity a "quadword."

To retrieve a 64-bit doubleword at memory address A, the memory controller converts A to a supercell address (i, j) and sends it to the memory module, which then broadcasts i and j to each DRAM. In response, each DRAM outputs the 8-bit contents of its (i, j) supercell. Circuitry in the module collects these outputs and forms them into a 64-bit doubleword, which it returns to the memory controller.

[Figure 6.5: Reading the contents of a memory module. The diagram shows a 64 MB memory module consisting of eight 8M x 8 DRAMs. The memory controller sends addr (row = i, col = j); DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, and so on up to DRAM 7, which supplies bits 56-63 of the 64-bit doubleword at main memory address A. The assembled 64-bit doubleword is returned to the CPU chip.]

Main memory can be aggregated by connecting multiple memory modules to the memory controller. In this case, when the controller receives an address A, the controller selects the module k that contains A, converts A to its (i, j) form, and sends (i, j) to module k.

Practice Problem 6.1:
In the following, let $r$ be the number of rows in a DRAM array, $c$ the number of columns, $b_r$ the number of bits needed to address the rows, and $b_c$ the number of bits needed to address the columns. For each of the following DRAMs, determine the power-of-two array dimensions that minimize $\max(b_r, b_c)$, the maximum number of bits needed to address the rows or columns of the array.

| Organization    | $r$ | $c$ | $b_r$ | $b_c$ | $\max(b_r, b_c)$ |
|-----------------|-----|-----|-------|-------|------------------|
| $16 \times 1$   |     |     |       |       |                  |
| $16 \times 4$   |     |     |       |       |                  |
| $128 \times 8$  |     |     |       |       |                  |
| $512 \times 4$  |     |     |       |       |                  |
| $1024 \times 4$ |     |     |       |       |                  |

Enhanced DRAMs

There are many kinds of DRAM memories, and new kinds appear on the market with regularity as manufacturers attempt to keep up with rapidly increasing processor speeds. Each is based on the conventional DRAM cell, with optimizations that improve the speed with which the basic DRAM cells can be accessed.

Fast page mode DRAM (FPM DRAM). A conventional DRAM copies an entire row of supercells into its internal row buffer, uses one, and then discards the rest. FPM DRAM improves on this by allowing consecutive accesses to the same row to be served directly from the row buffer. For example, to read four supercells from row i of a conventional DRAM, the memory controller must send four RAS/CAS requests, even though the row address i is identical in each case. To read supercells from the same row of an FPM DRAM, the memory controller sends an initial RAS/CAS request, followed by three CAS requests. The initial RAS/CAS request copies row i into the row buffer and returns the supercell addressed by the CAS. The next three supercells are served directly from the row buffer, and thus more quickly than the initial supercell.
Extended data out DRAM (EDO DRAM). An enhanced form of FPM DRAM that allows the individual CAS signals to be spaced closer together in time.

Synchronous DRAM (SDRAM). Conventional, FPM, and EDO DRAMs are asynchronous in the sense that they communicate with the memory controller using a set of explicit control signals. SDRAM replaces many of these control signals with the rising edges of the same external clock signal that drives the memory controller. Without going into detail, the net effect is that an SDRAM can output the contents of its supercells at a faster rate than its asynchronous counterparts.

Double Data-Rate Synchronous DRAM (DDR SDRAM). DDR SDRAM is an enhancement of SDRAM that doubles the speed of the DRAM by using both clock edges as control signals. Different types of DDR SDRAMs are characterized by the size of a small prefetch buffer that increases the effective bandwidth: DDR (2 bits), DDR2 (4 bits), and DDR3 (8 bits).

Rambus DRAM (RDRAM). This is an alternative proprietary technology with a higher maximum bandwidth than DDR SDRAM.

Video RAM (VRAM). Used in the frame buffers of graphics systems. VRAM is similar in spirit to FPM DRAM. Two major differences are that (1) VRAM output is produced by shifting the entire contents of the internal buffer in sequence, and (2) VRAM allows concurrent reads and writes to the memory. Thus, the system can be painting the screen with the pixels in the frame buffer (reads) while concurrently writing new values for the next update (writes).

Aside: Historical popularity of DRAM technologies
Until 1995, most PCs were built with FPM DRAMs. From 1996 to 1999, EDO DRAMs dominated the market, while FPM DRAMs all but disappeared. SDRAMs first appeared in 1995 in high-end systems, and by 2002 most PCs were built with SDRAMs and DDR SDRAMs. By 2010, most server and desktop systems were built with DDR3 SDRAMs. In fact, the Intel Core i7 supports only DDR3 SDRAM. End Aside.

Nonvolatile Memory

DRAMs and SRAMs are volatile in the sense that they lose their information if the supply voltage is turned off. Nonvolatile memories, on the other hand, retain their information even when they are powered off. There are a variety of nonvolatile memories. For historical reasons, they are referred to collectively as read-only memories (ROMs), even though some types of ROMs can be written to as well as read. ROMs are distinguished by the number of times they can be reprogrammed (written to) and by the mechanism for reprogramming them.

A programmable ROM (PROM) can be programmed exactly once. PROMs include a sort of fuse with each memory cell that can be blown once by zapping it with a high current.

An erasable programmable ROM (EPROM) has a transparent quartz window that permits light to reach the storage cells. The EPROM cells are cleared to zeros by shining ultraviolet light through the window. Programming an EPROM is done by using a special device to write ones into the EPROM. An EPROM can be erased and reprogrammed on the order of 1000 times. An electrically erasable PROM (EEPROM) is akin to an EPROM, but does not require a physically separate programming device, and thus can be reprogrammed in-place on printed circuit cards. An EEPROM can be reprogrammed on the order of $10^5$ times before it wears out.

Flash memory is a type of nonvolatile memory, based on EEPROMs, that has become an important storage technology. Flash memories are everywhere, providing fast and durable nonvolatile storage for a slew of electronic devices, including digital cameras, cell phones, music players, PDAs, and laptop, desktop, and server computer systems. In Section 6.1.3, we will look in detail at a new form of flash-based disk drive, known as a solid state disk (SSD), that provides a faster, sturdier, and less power-hungry alternative to conventional rotating disks.

Programs stored in ROM devices are often referred to as firmware. When a computer system is powered up, it runs firmware stored in a ROM. Some systems provide a small set of primitive input and output functions in firmware, for example, a PC's BIOS (basic input/output system) routines. Complicated devices such as graphics cards and disk drive controllers also rely on firmware to translate I/O (input/output) requests from the CPU.

Accessing Main Memory

Data flows back and forth between the processor and the DRAM main memory over shared electrical conduits called buses. Each transfer of data between the CPU and memory is accomplished with a series of steps called a bus transaction. A read transaction transfers data from the main memory to the CPU. A write transaction transfers data from the CPU to the main memory.

A bus is a collection of parallel wires that carry address, data, and control signals. Depending on the particular bus design, data and address signals can share the same set of wires, or they can use different sets. Also, more than two devices can share the same bus. The control wires carry signals that synchronize the transaction and identify what kind of transaction is currently being performed. For example, is this transaction of interest to the main memory, or to some other I/O device such as a disk controller? Is the transaction a read or a write? Is the information on the bus an address or a data item?

Figure 6.6 shows the configuration of an example computer system. The main components are the CPU chip, a chipset that we will call an I/O bridge (which includes the memory controller), and the DRAM memory modules that make up main memory. These components are connected by a pair of buses: a system bus that connects the CPU to the I/O bridge, and a memory bus that connects the I/O bridge to the main memory.

The I/O bridge translates the electrical signals of the system bus into the electrical signals of the memory bus. As we will see, the I/O bridge also connects the system bus and memory bus to an I/O bus that is shared by I/O devices such as disks and graphics cards. For now, though, we will focus on the memory bus.

Aside: A note on bus designs
Bus design is a complex and rapidly changing aspect of computer systems. Different vendors develop different bus architectures as a way to differentiate their products. For example, Intel systems use chipsets known as the northbridge and the southbridge to connect the CPU to memory and I/O devices, respectively. In older Pentium and Core 2 systems, a front side bus (FSB) connects the CPU to the northbridge. Systems from AMD replace the FSB with the HyperTransport interconnect, while newer Intel Core i7 systems use the QuickPath interconnect. The details of these different bus architectures are beyond the scope of this text. Instead, we will use the high-level bus architecture from Figure 6.6 as a running example throughout the text. It is a simple but useful abstraction that allows us to be concrete, and captures the main ideas without being tied too closely to the detail of any proprietary designs. End Aside.

[Figure 6.6: Example bus structure that connects the CPU and main memory. The diagram shows the CPU chip (register file, ALU, and bus interface) connected by the system bus to the I/O bridge, which is connected by the memory bus to main memory.]

Consider what happens when the CPU performs a load operation such as

    movl A,%eax

where the contents of address A are loaded into register %eax. Circuitry on the CPU chip called the bus interface initiates a read transaction on the bus. The read transaction consists of three steps. First, the CPU places the address A on the system bus. The I/O bridge passes the signal along to the memory bus (Figure 6.7(a)). Next, the main memory senses the address signal on the memory bus, reads the address from the memory bus, fetches the data word from the DRAM, and writes the data to the memory bus. The I/O bridge translates the memory bus signal into a system bus signal, and passes it along to the system bus (Figure 6.7(b)). Finally, the CPU senses the data on the system bus, reads it from the bus, and copies it to register %eax (Figure 6.7(c)).

Conversely, when the CPU performs a store instruction such as

    movl %eax,A

where the contents of register %eax are written to address A, the CPU initiates a write transaction. Again, there are three basic steps. First, the CPU places the address on the system bus. The memory reads the address from the memory bus and waits for the data to arrive (Figure 6.8(a)). Next, the CPU copies the data word in %eax to the system bus (Figure 6.8(b)). Finally, the main memory reads the data word from the memory bus and stores the bits in the DRAM (Figure 6.8(c)).

6.1.2 Disk Storage

Disks are workhorse storage devices that hold enormous amounts of data, on the order of hundreds to thousands of gigabytes, as opposed to the hundreds or thousands of megabytes in a RAM-based memory. However, it takes on the order of milliseconds to read information from a disk, a hundred thousand times longer than from DRAM and a million times longer than from SRAM.

[Figure 6.7, panel (a): CPU places address A on the memory bus. Panel (b): Main memory reads A from the bus, retrieves word x, and places it on the bus.]
[Figure 6.7, panel (c): CPU reads word x from the bus, and copies it into register %eax.]

Figure 6.7: Memory read transaction for a load operation: movl A,%eax.

[Figure 6.8, panel (a): CPU places address A on the memory bus. Main memory reads it and waits for the data word. Panel (b): CPU places data word y on the bus. Panel (c): Main memory reads data word y from the bus and stores it at address A.]

Figure 6.8: Memory write transaction for a store operation: movl %eax,A.

Disk Geometry

Disks are constructed from platters. Each platter consists of two sides, or surfaces, that are coated with magnetic recording material. A rotating spindle in the center of the platter spins the platter at a fixed rotational rate, typically between 5400 and 15,000 revolutions per minute (RPM). A disk will typically contain one or more of these platters encased in a sealed container.

Figure 6.9(a) shows the geometry of a typical disk surface. Each surface consists of a collection of concentric rings called tracks. Each track is partitioned into a collection of sectors. Each sector contains an equal number of data bits (typically 512 bytes) encoded in the magnetic material on the sector. Sectors are separated by gaps where no data bits are stored. Gaps store formatting bits that identify sectors.

[Figure 6.9, panel (a): Single-platter view, showing the spindle, a surface, tracks, track k, sectors, and gaps.]
[Figure 6.9, panel (b): Multiple-platter view, showing platters 0 to 2, surfaces 0 to 5, the spindle, and cylinder k.]

Figure 6.9: Disk geometry.

A disk consists of one or more platters stacked on top of each other and encased in a sealed package, as shown in Figure 6.9(b). The entire assembly is often referred to as a disk drive, although we will usually refer to it as simply a disk. We will sometimes refer to disks as rotating disks to distinguish them from flash-based solid state disks (SSDs), which have no moving parts.

Disk manufacturers describe the geometry of multiple-platter drives in terms of cylinders, where a cylinder is the collection of tracks on all the surfaces that are equidistant from the center of the spindle. For example, if a drive has three platters and six surfaces, and the tracks on each surface are numbered consistently, then cylinder k is the collection of the six instances of track k.

Disk Capacity

The maximum number of bits that can be recorded by a disk is known as its maximum capacity, or simply capacity. Disk capacity is determined by the following technology factors:

Recording density (bits/in): The number of bits that can be squeezed into a 1-inch segment of a track.

Track density (tracks/in): The number of tracks that can be squeezed into a 1-inch segment of the radius extending from the center of the platter.
Areal density (bits/in$^2$): The product of the recording density and the track density.

Disk manufacturers work tirelessly to increase areal density (and thus capacity), and this is doubling every few years. The original disks, designed in an age of low areal density, partitioned every track into the same number of sectors, which was determined by the number of sectors that could be recorded on the innermost track. To maintain a fixed number of sectors per track, the sectors were spaced farther apart on the outer tracks. This was a reasonable approach when areal densities were relatively low. However, as areal densities increased, the gaps between sectors (where no data bits were stored) became unacceptably large. Thus, modern high-capacity disks use a technique known as multiple zone recording, where the set of cylinders is partitioned into disjoint subsets known as recording zones. Each zone consists of a contiguous collection of cylinders. Each track in each cylinder in a zone has the same number of sectors, which is determined by the number of sectors that can be packed into the innermost track of the zone. Note that diskettes (floppy disks) still use the old-fashioned approach, with a constant number of sectors per track.

The capacity of a disk is given by the following formula:

$$\text{Disk capacity} = \frac{\text{\# bytes}}{\text{sector}} \times \frac{\text{average \# sectors}}{\text{track}} \times \frac{\text{\# tracks}}{\text{surface}} \times \frac{\text{\# surfaces}}{\text{platter}} \times \frac{\text{\# platters}}{\text{disk}}$$

For example, suppose we have a disk with five platters, 512 bytes per sector, 20,000 tracks per surface, and an average of 300 sectors per track. Then the capacity of the disk is

$$\text{Disk capacity} = \frac{512\ \text{bytes}}{\text{sector}} \times \frac{300\ \text{sectors}}{\text{track}} \times \frac{20{,}000\ \text{tracks}}{\text{surface}} \times \frac{2\ \text{surfaces}}{\text{platter}} \times \frac{5\ \text{platters}}{\text{disk}} = 30{,}720{,}000{,}000\ \text{bytes} = 30.72\ \text{GB}$$

Notice that manufacturers express disk capacity in units of gigabytes (GB), where 1 GB = $10^9$ bytes.

Aside: How much is a gigabyte?
Unfortunately, the meanings of prefixes such as kilo (K), mega (M), giga (G), and tera (T) depend on the context. For measures that relate to the capacity of DRAMs and SRAMs, typically $K = 2^{10}$, $M = 2^{20}$, $G = 2^{30}$, and $T = 2^{40}$. For measures related to the capacity of I/O devices such as disks and networks, typically $K = 10^3$, $M = 10^6$, $G = 10^9$, and $T = 10^{12}$. Rates and throughputs usually use these prefix values as well.

Fortunately, for the back-of-the-envelope estimates that we typically rely on, either assumption works fine in practice. For example, the relative difference between $2^{20} = 1{,}048{,}576$ and $10^6 = 1{,}000{,}000$ is small: $(2^{20} - 10^6)/10^6 \approx 5\%$. Similarly for $2^{30} = 1{,}073{,}741{,}824$ and $10^9 = 1{,}000{,}000{,}000$: $(2^{30} - 10^9)/10^9 \approx 7\%$. End Aside.

Practice Problem 6.2:
What is the capacity of a disk with two platters, 10,000 cylinders, an average of 400 sectors per track, and 512 bytes per sector?

Disk Operation

Disks read and write bits stored on the magnetic surface using a read/write head connected to the end of an actuator arm, as shown in Figure 6.10(a). By moving the arm back and forth along its radial axis, the drive can position the head over any track on the surface. This mechanical motion is known as a seek. Once the head is positioned over the desired track, then as each bit on the track passes underneath, the head can either sense the value of the bit (read the bit) or alter the value of the bit (write the bit). Disks with multiple platters have a separate read/write head for each surface, as shown in Figure 6.10(b). The heads are lined up vertically and move in unison. At any point in time, all heads are positioned on the same cylinder.
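Returning briefly to the capacity formula and the prefix comparison given earlier in this section, the arithmetic can be checked with a few lines of code. This is a minimal sketch: the function names `disk_capacity_bytes` and `relative_difference` are hypothetical names chosen for this illustration, and the inputs are the worked example's values (five platters, 512-byte sectors, 20,000 tracks per surface, 300 sectors per track on average).

```python
def disk_capacity_bytes(bytes_per_sector, avg_sectors_per_track,
                        tracks_per_surface, surfaces_per_platter, platters):
    """Multiply out the five factors of the capacity formula."""
    return (bytes_per_sector * avg_sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters)


def relative_difference(binary_value, decimal_value):
    """Relative gap between a power-of-two prefix and its power-of-ten twin."""
    return (binary_value - decimal_value) / decimal_value


# Worked example from the text; each platter has two surfaces.
capacity = disk_capacity_bytes(512, 300, 20_000, 2, 5)
print(capacity, "bytes")
print(capacity / 10**9, "GB (1 GB = 10**9 bytes)")

# Prefix comparison from the aside, expressed as percentages.
print(f"mega: {100 * relative_difference(2**20, 10**6):.1f}%")
print(f"giga: {100 * relative_difference(2**30, 10**9):.1f}%")
```

The first two lines of output reproduce the 30,720,000,000 bytes and 30.72 GB of the worked example. The last two show the unrounded percentages behind the "about 5%" and "about 7%" figures in the aside.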
[Figure 6.10: Disk dynamics. Panel (a): Single-platter view. The disk surface spins at a fixed rotational rate; the read/write head is attached to the end of the arm and flies over the disk surface on a thin cushion of air; by moving radially, the arm can position the read/write head over any track. Panel (b): Multiple-platter view, showing the arm, the read/write heads, and the spindle.]

The read/write head at the end of the arm flies (literally) on a thin cushion of air over the disk surface at a height of about 0.1 microns and a speed of about 80 km/h. This is analogous to placing the Sears Tower on its side and flying it around the world at a height of 2.5 cm (1 inch) above the ground, with each orbit of the earth taking only 8 seconds! At these tolerances, a tiny piece of dust on the surface is like a huge boulder. If the head were to strike one of these boulders, the head would cease flying and crash into the surface (a so-called head crash). For this reason, disks are always sealed in airtight packages.

Disks read and write data in sector-sized blocks. The access time for a sector has three main components: seek time, rotational latency, and transfer time:

Seek time: To read the contents of some target sector, the arm first positions the head over the track that contains the target sector. The time required to move the arm is called the seek time. The seek time, $T_{\text{seek}}$, depends on the previous position of the head and the speed that the arm moves across the surface. The average seek time in modern drives, $T_{\text{avg seek}}$, measured by taking the mean of several thousand seeks to random sectors, is typically on the order of 3 to 9 ms. The maximum time for a single seek, $T_{\text{max seek}}$, can be as high as 20 ms.

Rotational latency: Once the head is in position over the track, the drive waits for the first bit of the target sector to pass under the head. The performance of this step depends on both the position of the surface when the head arrives at the target sector and the rotational speed of the disk. In the worst case, the head just misses the target sector and waits for the disk to make a full rotation. Thus, the maximum rotational latency, in seconds, is given by

$$T_{\text{max rotation}} = \frac{1}{\text{RPM}} \times \frac{60\ \text{secs}}{1\ \text{min}}$$
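The average rotational latency, $T_{\text{avg rotation}}$, is simply half of $T_{\text{max rotation}}$.

Transfer time: When the first bit of the target sector is under the head, the drive can begin to read or write the contents of the sector. The transfer time for one sector depends on the rotational speed and the number of sectors per track. Thus, we can roughly estimate the average transfer time for one sector in seconds as

$$T_{\text{avg transfer}} = \frac{1}{\text{RPM}} \times \frac{1}{\text{(average \# sectors/track)}} \times \frac{60\ \text{secs}}{1\ \text{min}}$$

We can estimate the average time to access the contents of a disk sector as the sum of the average seek time, the average rotational latency, and the average transfer time. For example, consider a disk with the following parameters:

| Parameter                | Value    |
|--------------------------|----------|
| Rotational rate          | 7200 RPM |
| $T_{\text{avg seek}}$    | 9 ms     |
| Average # sectors/track  | 400      |

For this disk, the average rotational latency (in ms) is

$$T_{\text{avg rotation}} = \frac{1}{2} \times T_{\text{max rotation}} = \frac{1}{2} \times \frac{60\ \text{secs}}{7200\ \text{RPM}} \times 1000\ \text{ms/sec} \approx 4\ \text{ms}$$

The average transfer time is

$$T_{\text{avg transfer}} = \frac{60}{7200\ \text{RPM}} \times \frac{1}{400\ \text{sectors/track}} \times 1000\ \text{ms/sec} \approx 0.02\ \text{ms}$$

Putting it all together, the total estimated access time is

$$T_{\text{access}} = T_{\text{avg seek}} + T_{\text{avg rotation}} + T_{\text{avg transfer}} = 9\ \text{ms} + 4\ \text{ms} + 0.02\ \text{ms} = 13.02\ \text{ms}$$

This example illustrates some important points:

- The time to access the 512 bytes in a disk sector is dominated by the seek time and the rotational latency. Accessing the first byte in the sector takes a long time, but the remaining bytes are essentially free.

- Since the seek time and rotational latency are roughly the same, twice the seek time is a simple and reasonable rule for estimating disk access time.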
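The access-time estimate above can be written as a small calculation. This is a sketch under the same simplifying assumptions as the text (average seek time given, average rotational latency taken as half a rotation, one sector transferred); the function names are hypothetical and chosen only for this illustration.

```python
def max_rotation_ms(rpm):
    """Time for one full rotation, in milliseconds."""
    return 60.0 / rpm * 1000.0


def avg_rotation_ms(rpm):
    """Average rotational latency: half of a full rotation."""
    return max_rotation_ms(rpm) / 2.0


def avg_transfer_ms(rpm, avg_sectors_per_track):
    """Time for one sector to pass under the head."""
    return max_rotation_ms(rpm) / avg_sectors_per_track


def avg_access_ms(avg_seek_ms, rpm, avg_sectors_per_track):
    """Sum of average seek time, rotational latency, and transfer time."""
    return (avg_seek_ms + avg_rotation_ms(rpm)
            + avg_transfer_ms(rpm, avg_sectors_per_track))


# Parameters from the example: 7200 RPM, 9 ms average seek, 400 sectors/track.
rotation = avg_rotation_ms(7200)
transfer = avg_transfer_ms(7200, 400)
access = avg_access_ms(9.0, 7200, 400)
print(f"rotation {rotation:.2f} ms, transfer {transfer:.2f} ms, "
      f"access {access:.2f} ms")
```

Without intermediate rounding the total comes out near 13.19 ms rather than 13.02 ms. The difference arises only because the text rounds the rotational latency down to 4 ms before adding, and either figure supports the same conclusion: seek time and rotational latency dominate.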