CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories

Uploaded On 2019-11-25



Presentation Transcript

CACTI 7: New Tools for Interconnect Exploration in Innovative Off-Chip Memories
Rajeev Balasubramonian, Andrew B. Kahng, Naveen Muralimanohar, Ali Shafiee, Vaishnav Srinivas

Main Memory Matters
- Software: in-memory DBs, key-value stores, graph algorithms, deep learning
- Technology: DDR4, HMC, HBM, NVM
- Architecture: commodity CPUs, accelerators
- Shift in bottlenecks
- Example innovations: NDP; DDR to GDDR5 → 3x TOPS in TPU
- The innovation hub is moving to memory

Two Silos
- CACTI 7 can be used out of the box when defining memory parameters for traditional memory systems.
- CACTI 7 primitives can be leveraged to model and evaluate new memory architectures.

Talk Outline
- CACTI for the main memory
  - Inputs/outputs
  - The nuts and bolts
  - Modeling I/O power
  - Design space exploration
- Case studies: two novel architectures
  - Cascaded Channels
  - Narrow Channels

CACTI for Memory: Inputs and Outputs
- Inputs: capacity, number of channels, ECC vs. no ECC
- DRAM type: DDR3, DDR4
- Access pattern: bandwidth, row buffer hits, read/write ratio
- Tables: cost table, bandwidth table, power parameters
- Engine: exhaustive search
- Outputs: channel configurations, energy per access
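The exhaustive search over channel configurations can be sketched as follows. This is a minimal illustration, not CACTI 7's actual code: the function name `enumerate_configs`, the dictionary layout, and the four-channel limit are hypothetical. The prices and frequencies are copied from the DDR4 rows of the cost and bandwidth slides, assuming prices map to capacities in slide order and that the cost table's RDIMM and LRDIMM rows correspond to the RDIMM-DR and LRDIMM-QR bandwidth rows.

```python
from itertools import product

# Values copied from the DDR4 rows of the cost and bandwidth slides.
# ASSUMPTION: prices map to capacities in slide order, and the cost table's
# RDIMM/LRDIMM rows correspond to the RDIMM-DR/LRDIMM-QR bandwidth rows.
COST_USD = {  # (dimm_type, capacity_gb) -> price in dollars
    ("RDIMM", 4): 33, ("RDIMM", 8): 60, ("RDIMM", 16): 126, ("RDIMM", 32): 310,
    ("LRDIMM", 16): 279, ("LRDIMM", 32): 331, ("LRDIMM", 64): 1474,
}
FREQ_MHZ = {  # (dimm_type, dimms_per_channel) -> bus frequency in MHz
    ("RDIMM", 1): 1066, ("RDIMM", 2): 933, ("RDIMM", 3): 800,
    ("LRDIMM", 1): 1066, ("LRDIMM", 2): 1066, ("LRDIMM", 3): 800,
}


def enumerate_configs(capacity_gb, max_channels=4):
    """Yield every (DIMM, channels, DPC) choice that meets the capacity."""
    for (dimm_type, dimm_gb), price in COST_USD.items():
        for channels, dpc in product(range(1, max_channels + 1), (1, 2, 3)):
            n_dimms = channels * dpc
            if n_dimms * dimm_gb < capacity_gb:
                continue  # not enough capacity, skip this point
            freq = FREQ_MHZ[(dimm_type, dpc)]
            yield {
                "dimm": f"{dimm_gb}GB {dimm_type}",
                "channels": channels,
                "dpc": dpc,
                "cost_usd": n_dimms * price,
                # Peak rate: 2 transfers per clock on a 64-bit (8-byte) bus.
                "peak_gbs": channels * freq * 2 * 8 / 1000,
            }


configs = list(enumerate_configs(capacity_gb=64))
cheapest = min(configs, key=lambda c: c["cost_usd"])
fastest = max(configs, key=lambda c: c["peak_gbs"])
print("lowest cost:", cheapest)
print("highest bandwidth:", fastest)
```

Because the tables are small, brute-force enumeration is cheap, and the same list of candidates can be re-ranked by any other metric such as energy per access.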

DIMM Cost
- Cost factors: technology, capacity, support for ECC, max bandwidth, vendor
- Costs are aggregated from online sources
- Cost is volatile and should be updated periodically
- The relationship between cost and capacity is not linear

Cost in dollars:

                4GB    8GB   16GB   32GB   64GB
DDR3  UDIMM      40     76
      RDIMM      42     64    122    304
      LRDIMM                  211    287   1079
DDR4  UDIMM      26     46
      RDIMM      33     60    126    310
      LRDIMM                  279    331   1474
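The non-linear relationship between cost and capacity is easy to see by computing dollars per gigabyte. The sketch below uses the DDR4 RDIMM prices from the slide, assuming the four prices correspond to the 4GB, 8GB, 16GB, and 32GB columns in order; the helper name `cost_per_gb` is hypothetical.

```python
# DDR4 RDIMM prices from the slide.
# ASSUMPTION: the four prices map to 4, 8, 16, and 32 GB in slide order.
DDR4_RDIMM_USD = {4: 33, 8: 60, 16: 126, 32: 310}


def cost_per_gb(prices):
    """Return dollars per gigabyte for each DIMM capacity."""
    return {cap_gb: usd / cap_gb for cap_gb, usd in prices.items()}


per_gb = cost_per_gb(DDR4_RDIMM_USD)
for cap_gb, usd in per_gb.items():
    print(f"{cap_gb:>2} GB: ${usd:.2f}/GB")

# The cheapest capacity per gigabyte is not the largest or the smallest DIMM.
best = min(per_gb, key=per_gb.get)
print("lowest $/GB at", best, "GB")
```

Under that assumption the per-gigabyte price dips at mid-range capacities and rises again for the densest DIMM, which is why the cheapest way to reach a given capacity is often several smaller DIMMs rather than one large one.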

Bandwidth
Bandwidth depends on load, voltage, and DIMM type. Values are bus frequencies in MHz by DIMMs per channel (DPC), listed in slide order; empty cells are omitted.

DDR3 (columns: 1DPC, 2DPC, 3DPC, each at 1.35V and 1.5V):
  UDIMM-DR:   533, 667, 533, 667
  RDIMM-DR:   667, 800, 667, 667, 533
  RDIMM-QR:   667, 667
  LRDIMM-QR:  667, 667, 667, 667, 533, 533

DDR4 (columns: 1DPC, 2DPC, 3DPC, all at 1.2V):
  RDIMM-DR:   1066, 933, 800
  RDIMM-QR:   933, 800
  LRDIMM-QR:  1066, 1066, 800
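To relate these bus frequencies to channel bandwidth, the sketch below converts MHz to peak GB/s for a standard 64-bit DDR data bus, which moves two transfers per clock. The function name `peak_bandwidth_gbs` is hypothetical, and the figures are theoretical peaks, not sustained bandwidth. The DDR4 RDIMM-DR row is used because it has one value for each of the three DPC columns.

```python
def peak_bandwidth_gbs(bus_mhz, bus_bytes=8):
    """Peak DDR channel bandwidth in GB/s: 2 transfers per clock cycle."""
    return bus_mhz * 2 * bus_bytes / 1000


# DDR4 RDIMM-DR frequencies from the slide for 1, 2, and 3 DIMMs per channel.
DDR4_RDIMM_DR_MHZ = {1: 1066, 2: 933, 3: 800}

one_dpc = peak_bandwidth_gbs(DDR4_RDIMM_DR_MHZ[1])
for dpc, mhz in DDR4_RDIMM_DR_MHZ.items():
    gbs = peak_bandwidth_gbs(mhz)
    print(f"{dpc}DPC: {mhz} MHz -> {gbs:.1f} GB/s ({gbs / one_dpc:.0%} of 1DPC)")
```

The loop makes the trade-off explicit: adding DIMMs to a channel raises capacity but lowers the frequency, and therefore the peak bandwidth, of that channel.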

Power Modeling
- Extending CACTI-I/O
- DDR4 and SerDes support added
- SerDes parameters taken from the literature for different lengths/speeds
- For parallel buses, support for more accurate termination power with HSPICE simulations
- Different termination models for each bus type
  - Different frequency, DIMMs per channel
  - On-DIMM and on-board
  - Different range (short or long)

Interconnect Model API

Power Analysis (DDR3)

Power Analysis (DDR4)

Cost and Bandwidth Analysis
- Highest possible BW for the demanded capacity
- Lowest possible cost for the demanded capacity

Two Case Studies
- Key observations
  - High DPC → less BW
  - More channels → high BW and low cost
- New Idea I: Cascaded Segments
  - Each segment has few DIMMs → higher BW
- New Idea II: Narrow Channels
  - Partition the channel into many parallel channels
  - Fewer DIMMs per data wire, new ECC → higher BW
  - Lower power on DIMM

Cascaded Channels
- Same DPC, higher BW: a channel of three DIMMs at 533 MHz versus cascaded segments at 667 MHz each
- Same BW, lower cost: 64 GB + 64 GB DIMMs at 667 MHz versus 64 GB + 32 GB + 32 GB DIMMs on cascaded 667 MHz segments
- One memory cycle increase in latency
- RoB: Relay on Board chip

Hybrid Memory
- NVM is slow → software is optimized to access DRAM more
- One channel DRAM, one channel NVM: unbalanced channel load
- Frontend DRAM, backend NVM: balanced channel load

Narrow Channels
- Higher bandwidth but higher latency
- Lower frequency/power for DRAM chips!
- ECC on DIMM and CRC for link to reduce BW
- Command/address bus is shared between channels

Methodology
- Trace-based simulation
  - Trace fed to USIMM
  - Memory-intensive benchmarks (NPB and SPEC2006)
  - Trace generated by Simics
  - 8-core at 3.2 GHz
  - L1D = 32KB, L1I = 32KB, L2 = 8MB
- Power: CACTI 7

Cascaded Channels
- DDR3: 25% higher BW, 22% higher IPC
- DDR4: 13% higher BW, 12% higher IPC

Cascaded Latency

Cascaded Power: DRAM Cartridge

            DIMM     BoB     I/O      Total    Power/BW
Baseline    23.2W    5.5W    9.4W     38.1W    7.9 nJ/B
Cascaded    22.6W    6.4W    12.2W    41.2W    6.7 nJ/B

Channel utilization: 533 MHz at 70% (baseline); 667 MHz at 70% and 667 MHz at 35% (cascaded segments).
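The table can be checked with a few lines of arithmetic. The sketch below assumes that Total is the sum of the DIMM, BoB, and I/O columns and that Power/BW is total power divided by delivered bandwidth; under those assumptions it recovers the bandwidth each row implies. The function name `implied_bandwidth_gbs` is hypothetical.

```python
# Rows copied from the power table: watts per component and nJ per byte.
ROWS = {
    "Baseline": {"dimm": 23.2, "bob": 5.5, "io": 9.4, "total": 38.1, "nj_per_b": 7.9},
    "Cascaded": {"dimm": 22.6, "bob": 6.4, "io": 12.2, "total": 41.2, "nj_per_b": 6.7},
}


def implied_bandwidth_gbs(total_w, nj_per_byte):
    """W / (nJ/B) = 1e9 B/s, so the quotient is already in GB/s.

    ASSUMPTION: Power/BW means total power divided by delivered bandwidth.
    """
    return total_w / nj_per_byte


for name, r in ROWS.items():
    component_sum = r["dimm"] + r["bob"] + r["io"]
    bw = implied_bandwidth_gbs(r["total"], r["nj_per_b"])
    print(f"{name}: components sum to {component_sum:.1f} W, "
          f"implied bandwidth {bw:.2f} GB/s")
```

Read this way, the cascaded design draws more total power but spends less energy per byte because it delivers more bandwidth.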

Cascaded Cost

Cascaded Hybrid: Percentage of Load on DRAM

Narrow Channel: Performance
Performance improvement:
- 2-channel-x36 → 18%
- 3-channel-x24 → 17%
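The configuration names are consistent with splitting a standard 72-bit ECC data bus (64 data bits plus 8 check bits) into equal narrow channels, since 2 x 36 = 3 x 24 = 72. The sketch below only illustrates that arithmetic; the 72-bit starting point is an assumption about the baseline, and `narrow_channel_width` is a hypothetical helper.

```python
# ASSUMPTION: the baseline is a standard 72-bit ECC data bus (64 data + 8 check).
ECC_BUS_BITS = 72


def narrow_channel_width(n_channels, total_bits=ECC_BUS_BITS):
    """Width in bits of each channel when the bus is split evenly."""
    if total_bits % n_channels:
        raise ValueError(f"{total_bits} bits do not split evenly {n_channels} ways")
    return total_bits // n_channels


for n in (1, 2, 3):
    print(f"{n}-channel-x{narrow_channel_width(n)}")
```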

Narrow Channel: Power
23% overall memory power reduction

Conclusion
- CACTI 7 models off-chip memories and I/O
  - Detailed I/O power model
  - Design space exploration
  - Analyzes trade-offs: capacity, power, bandwidth, and cost
- Two novel architectures
  - Cascaded channels
  - Narrow channels