DDM – A Cache Only Memory Architecture

Uploaded On 2024-02-03



Presentation Transcript

1. DDM – A Cache Only Memory Architecture
Hagersten, Landin, and Haridi (1991)
Presented by Patrick Eibl

2. Outline
- Basics of Cache-Only Memory Architectures
- The Data Diffusion Machine (DDM)
- DDM Coherence Protocol
- Examples of Replacement, Reading, Writing
- Memory Overhead
- Simulated Performance
- Strengths and Weaknesses
- Alternatives to DDM Architecture

3. The Big Idea: UMA → NUMA → COMA
- UMA: centralized shared memory feeds data through a network to individual caches; access time to all memory is uniform
- NUMA: shared memory is distributed among processors (DASH); data can move from its home memory to other caches as needed
- COMA: no notion of “home” for data, which moves to wherever it is needed; individual memories behave like caches

4. COMA: The Basics
- Individual memories are called Attraction Memories (AM) – each processor “attracts” its working data set
- AM also contains data that has never been accessed (+/-?)
- Uses the shared memory programming model, but with no pressure to optimize static partitioning
- Limited duplication of shared memory
- The Data Diffusion Machine is the specific COMA presented here

5. Data Diffusion Machine
- Directory hierarchy allows scaling to an arbitrary number of processors
- Branching factors and bottlenecks are a consideration
- Hierarchy can be split into different address domains to improve bandwidth

6. Coherence Protocol
- Transient states support a split-transaction bus
- Fairly standard protocol, with the important exception of replacement, which must be managed carefully (example to come)
- Sequential consistency is guaranteed, but at the cost that writes must wait for an acknowledgment before continuing
Item States
- I: Invalid
- E: Exclusive
- S: Shared
- R: Reading
- W: Waiting
- RW: Reading and Waiting
Bus Transactions
- e: erase
- x: exclusive
- r: read
- d: data
- i: inject
- o: out
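The states and transactions listed on this slide can be written down as a small table of local accesses. The sketch below is a simplified reading, not the full protocol: the function name `processor_access` and the choice to raise an error on transient states are assumptions made for illustration, and replacement and incoming bus traffic are left out.

```python
from enum import Enum

class State(Enum):
    I = "Invalid"
    E = "Exclusive"
    S = "Shared"
    R = "Reading"
    W = "Waiting"
    RW = "Reading and Waiting"

class Tx(Enum):
    ERASE = "e"
    EXCLUSIVE = "x"
    READ = "r"
    DATA = "d"
    INJECT = "i"
    OUT = "o"

# Transient states: a split transaction is still outstanding.
TRANSIENT = {State.R, State.W, State.RW}

def processor_access(state, is_write):
    """Return (next_state, transaction_or_None) for a local access.

    Hypothetical helper: covers only accesses issued by the local
    processor to an item in a stable state.
    """
    if state in TRANSIENT:
        raise ValueError("access stalls until the outstanding transaction completes")
    if not is_write:
        if state is State.I:
            return State.R, Tx.READ      # read miss: request the data
        return state, None               # read hit on S or E
    if state is State.E:
        return State.E, None             # write hit on the only copy
    if state is State.S:
        return State.W, Tx.ERASE         # must erase other copies, then wait
    return State.RW, Tx.READ             # write to Invalid: read first, erase later
```

The write-to-Invalid case matches the note in the write example that a cache in RW issues a read request before another erase.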

7. Replacement Example
[Figure: hierarchy of Processors, Caches, and Directories annotated with item states and out/inject/data transactions]
Legend: I: Invalid, S: Shared; o: out, i: inject
1. A block needs to be brought into a full AM, necessitating a replacement and an out transaction
2. Out propagates up until it finds another copy of block in S, R, W, or RW
3. Out reaches top and is converted to inject, meaning this is the last copy of the data and it needs a new home
4. Inject finds space in new AM
5. Data is transferred to new home
6. States change accordingly
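The replacement steps can be sketched as a walk up a tree. This is a toy model under stated assumptions: the `Node` class, the `evict` function, and the per-AM `capacity` field are hypothetical, and a directory "finds another copy" here by scanning the leaves below it, whereas a real directory would consult its own state bits.

```python
class Node:
    """A directory (has children) or an attraction memory (leaf)."""
    def __init__(self, name, children=(), capacity=0):
        self.name = name
        self.children = list(children)
        self.parent = None
        self.capacity = capacity   # only meaningful for leaves
        self.blocks = set()        # only meaningful for leaves
        for child in self.children:
            child.parent = self

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

def evict(am, block):
    """Replace `block` in attraction memory `am`; return what happened."""
    am.blocks.remove(block)
    node = am
    while node.parent is not None:
        node = node.parent         # the out transaction moves up one level
        if any(block in leaf.blocks for leaf in node.leaves() if leaf is not am):
            return ("out absorbed", node.name)   # another copy exists
    # Top reached with no other copy: out becomes inject.
    for leaf in node.leaves():
        if leaf is not am and len(leaf.blocks) < leaf.capacity:
            leaf.blocks.add(block)               # data moves to its new home
            return ("injected", leaf.name)
    raise RuntimeError("no attraction memory has free space")

# Two-level example: am0 holds the last copy of "x", am1 is full.
am0, am1, am2, am3 = (Node(f"am{i}", capacity=1) for i in range(4))
top = Node("top", [Node("d0", [am0, am1]), Node("d1", [am2, am3])])
am0.blocks.add("x")
am1.blocks.add("y")
last_copy_result = evict(am0, "x")   # ("injected", "am2")
```

If another AM had held a copy of "x", the out would have stopped at the first directory covering both copies and no data would have moved.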

8. Multilevel Read Example
[Figure: hierarchy of Processors, Caches, and Directories annotated with item states and read/data transactions]
Legend: I: Invalid, R: Reading, A: Answering, S: Shared; r: read, d: data
First request:
1. First cache issues read request
2. Read propagates up hierarchy
3. Read reaches directory with block in shared state
4. Directories change to answering state while waiting for data
5. Data moves back along same path, changing states to shared as it goes
Concurrent second request for the same block:
2. Second cache issues request for same block
3. Request for same block encountered; directory simply waits for data reply from other request
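The way a directory combines two reads for the same block can be sketched with a single directory object. This is a minimal sketch, assuming a hypothetical `Directory` class that tracks one item; the return strings are illustrative labels, not transaction names from the paper.

```python
class Directory:
    """One directory's view of a single item (hypothetical model)."""
    def __init__(self, state="I"):
        self.state = state
        self.waiters = []          # requesters to serve when data arrives

    def on_read(self, requester):
        self.waiters.append(requester)
        if self.state == "I":
            self.state = "R"       # Reading: pass the request upward
            return "forward up"
        if self.state in ("R", "A"):
            return "wait"          # a reply is already on its way
        self.state = "A"           # Shared below: Answering until data arrives
        return "fetch from subsystem"

    def on_data(self):
        """Data reply passes through: become Shared, release all waiters."""
        self.state = "S"
        served, self.waiters = self.waiters, []
        return served

d = Directory()
first = d.on_read("cache0")    # "forward up"
second = d.on_read("cache1")   # "wait": combined with the first request
served = d.on_data()           # ["cache0", "cache1"]
```

Only the first request generates traffic above this directory; the second is satisfied by the same data reply.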

9. Multilevel Write Example
[Figure: hierarchy of Processors, Caches, and Directories annotated with item states and erase/exclusive transactions]
Legend: I: Invalid, R: Reading, W: Waiting, E: Exclusive, S: Shared; e: erase, x: exclusive
First write:
1. Cache issues write request
2. Erase propagates up hierarchy and back down, invalidating all other copies
4. Top of hierarchy responds with acknowledge
5. ACK propagates back down, changing states from Waiting to Exclusive
Concurrent second write to the same block:
2. Second cache issues write to same block
3. Second exclusive request encounters other write to same block; first one won because it arrived first; other erase is propagated back down
4. State of second cache changed to RW, and will issue a read request before another erase (not shown)
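The outcome of the write race can be summarized as a function of arrival order. This sketch only computes the final states the slide describes; the function `resolve_write_race` and its arguments are hypothetical, and the intermediate Waiting states and message passing are not modeled.

```python
def resolve_write_race(sharers, erase_arrival_order):
    """Final item state per cache after competing writes to one block.

    sharers: caches that held the block in Shared state beforehand.
    erase_arrival_order: writers, ordered by when their erase arrived.
    """
    if not erase_arrival_order:
        raise ValueError("at least one writer is required")
    states = {cache: "I" for cache in sharers}    # erase invalidates all copies
    for loser in erase_arrival_order[1:]:
        states[loser] = "RW"                      # must re-read, then erase again
    states[erase_arrival_order[0]] = "E"          # first arrival is acknowledged
    return states

final = resolve_write_race(["a", "b", "c"], ["b", "a"])
```

Here cache "b" ends up Exclusive, cache "a" is left in RW, and the bystander "c" is Invalid.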

10. Memory Overhead
- Inclusion is necessary for directories, but not for data
- Directories only need state bits and address tags
- For the two sample configurations given, overheads were 6% for a one-level 32-processor machine and 16% for a two-level 256-processor machine
- Larger item size reduces overhead
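Why a larger item size reduces overhead can be seen from a crude back-of-the-envelope model. It assumes one tag-plus-state entry per item in the attraction memory and one more in each directory level above it; the function name and all parameter values are hypothetical, and the model is not meant to reproduce the 6% and 16% figures.

```python
def overhead_fraction(item_bytes, tag_bits, state_bits, directory_levels):
    """Bookkeeping bits per data bit under the assumptions stated above."""
    entries_per_item = 1 + directory_levels      # AM entry plus one per directory level
    bookkeeping_bits = entries_per_item * (tag_bits + state_bits)
    return bookkeeping_bits / (8 * item_bytes)

# Hypothetical parameters: only the trend matters, not the absolute values.
small_items = overhead_fraction(item_bytes=16, tag_bits=20, state_bits=4, directory_levels=1)
large_items = overhead_fraction(item_bytes=32, tag_bits=20, state_bits=4, directory_levels=1)
deeper = overhead_fraction(item_bytes=16, tag_bits=20, state_bits=4, directory_levels=2)
```

Doubling the item size halves the overhead in this model, and adding a directory level raises it, which is the same direction as the one-level versus two-level figures on the slide.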

11. Simulated Performance
- Minimal success on programs in which each processor operates on the entire shared data set
- MP3D was rewritten to improve performance by exploiting the fact that data has no home
- OS, hardware, and emulator were in development at the time
- Different DDM topology for each program (-)

12. Strengths
- Each processor attracts the data it’s using into its own memory space
- Data doesn’t need to be duplicated at a home node
- Ordinary shared memory programming model
- No need to optimize static partitioning (there is none)
- Directory hierarchy scales reasonably well
- Good when data is moved around in smaller chunks

13. Weaknesses
- Attraction memories hold data that isn’t being used, making them bigger and slower
- A different DDM hierarchy topology was used for each program in simulations
- Does not fully exploit large spatial locality; software wins in that case (S-COMA)
- Branching hierarchy is prone to bottlenecks and hotspots
- No way to know where data is except through an expensive tree traversal (NUMA wins here)

14. Alternatives to COMA/DDM
- Flat-COMA: blocks are free to migrate, but have home nodes with directories corresponding to physical address
- Simple-COMA: allocation managed by OS and done at page granularity
- Reactive-NUMA: switches between S-COMA and NUMA with remote cache on a per-page basis
- Good summary of COMAs: http://ieeexplore.ieee.org/iel5/2/16679/00769448.pdf?tp=&isnumber=&arnumber=769448