18-447 Computer Architecture
Uploaded On 2020-01-30

Presentation Transcript

18-447 Computer Architecture
Lecture 30: In-memory Processing
Vivek Seshadri
Carnegie Mellon University
Spring 2015, 4/13/2015

Goals for This Lecture
- Understand DRAM technology
  - How is it built?
  - How does it operate?
  - What are the trade-offs?
- Can we use DRAM for more than just storage?
  - In-DRAM copying
  - In-DRAM bitwise operations

DRAM Module and Chip

Goals of DRAM Design
- Cost
- Latency
- Bandwidth
- Parallelism
- Power
- Energy
- Reliability

DRAM Chip
[Figure: a DRAM chip, with one of its banks labeled]

DRAM Cell: Capacitor
[Figure: empty state = logical "0"; fully charged state = logical "1"]
1. Small: cannot drive circuits
2. Reading destroys the state

Sense Amplifier
[Figure: two cross-coupled inverters with a top and a bottom terminal and an enable signal]

Sense Amplifier: Two Stable States
[Figure: when enabled, the amplifier holds either top = VDD, bottom = 0 (logical "1") or top = 0, bottom = VDD (logical "0")]

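Sense Amplifier Operation
[Figure: while disabled, the top terminal is at V_T and the bottom at V_B, with V_T > V_B; once enabled, the amplifier drives top to VDD and bottom to 0]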
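
The amplifier's behavior can be captured by an idealized model. The sketch below reduces it to a comparator with positive feedback: once enabled, it drives the higher terminal to VDD and the lower one to 0. It assumes ideal inverters and a hypothetical VDD of 1.0 V, and it ignores timing and noise margins. The function name `sense_amplify` is illustrative only.

```python
VDD = 1.0  # hypothetical supply voltage, in volts


def sense_amplify(v_top: float, v_bottom: float, enabled: bool = True) -> tuple[float, float]:
    """Idealized cross-coupled-inverter sense amplifier.

    Disabled: the terminals keep their input voltages.
    Enabled: positive feedback drives the higher terminal to VDD and the lower to 0.
    """
    if not enabled:
        return v_top, v_bottom
    if v_top > v_bottom:
        return VDD, 0.0
    if v_top < v_bottom:
        return 0.0, VDD
    # A perfectly balanced input has no defined outcome in this simple model
    raise ValueError("equal inputs: outcome is undefined in this model")


# V_T > V_B, as on the slide: top resolves to VDD, bottom to 0
print(sense_amplify(0.55, 0.50))  # (1.0, 0.0)
```

The small initial difference only picks the direction. The enabled amplifier then amplifies that difference to full rail voltages.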

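Capacitor to Sense Amplifier
[Figure: a cell capacitor connected directly to an enabled sense amplifier that is in either of its stable states (0 / VDD); the outcome is marked "?"]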

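DRAM Cell Operation
[Figure: both amplifier terminals start precharged to ½VDD with the amplifier disabled. Connecting a charged cell moves the bitline to ½VDD + δ, and the cell loses charge. Enabling the amplifier drives the bitline to VDD and the other terminal to 0, and the cell regains charge.]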
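
The size of δ follows from charge conservation. Suppose a cell of capacitance C_c at voltage V_cell connects to a bitline of capacitance C_b precharged to ½VDD. Both then settle to V = (C_c·V_cell + C_b·½VDD) / (C_c + C_b), so δ = (V_cell − ½VDD) · C_c / (C_c + C_b). The capacitance and voltage values below are hypothetical, chosen only to make the arithmetic easy. Real values differ by process. The helper names are illustrative.

```python
VDD = 1.0            # hypothetical supply voltage (V)
C_CELL = 20e-15      # hypothetical cell capacitance (F)
C_BITLINE = 80e-15   # hypothetical bitline capacitance (F)


def charge_share(v_cell, v_bitline=VDD / 2, c_cell=C_CELL, c_bitline=C_BITLINE):
    """Voltage shared by cell and bitline after the access transistor turns on.

    Total charge is conserved, so both settle to the capacitance-weighted average.
    This is also why the read is destructive: the cell no longer holds VDD or 0.
    """
    return (c_cell * v_cell + c_bitline * v_bitline) / (c_cell + c_bitline)


def read_cell(v_cell):
    """Charge sharing followed by sensing against the 1/2 VDD reference.

    Returns (bit, restored_cell_voltage). The sense amplifier drives the bitline
    to a full rail while the cell is still connected, so the cell is restored.
    """
    delta = charge_share(v_cell) - VDD / 2
    bit = 1 if delta > 0 else 0
    restored = VDD if bit else 0.0
    return bit, restored


print(round(charge_share(VDD), 3))  # 0.6, i.e. delta = +0.1 V for a stored "1"
print(read_cell(VDD))               # (1, 1.0)
print(read_cell(0.0))               # (0, 0.0)
```

Because C_b is several times larger than C_c, δ is only a fraction of VDD. That is why the design needs a sense amplifier rather than using the cell to drive logic directly.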

Amortizing Cost: DRAM Tile
[Figure: a tile of cells; label: row driver]

DRAM Subarray
[Figure: tiles, each with a row driver, arranged in a grid and sharing a row decoder]

DRAM Bank
[Figure: several subarrays, each with a row decoder, cell arrays, and an array of sense amplifiers (8 Kb); a 64-bit bank I/O connects them to the address and data lines]

DRAM Chip
[Figure: eight banks, each with row decoders, cell arrays, 8 Kb sense-amplifier arrays, and a 64-bit bank I/O, joined by a shared internal bus to an 8-bit memory channel]
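
The widths labeled in the bank and chip figures give a sense of scale. The constants below are taken from those labels, except `CHIPS_PER_RANK`. That constant is a hypothetical assumption of eight such chips side by side on a module, forming a 64-bit channel. Treat the results as an illustration, not datasheet values.

```python
ROW_BITS = 8 * 1024    # one array of sense amplifiers holds one 8 Kb row
BANK_IO_BITS = 64      # bank I/O width (64 b)
CHIP_IO_BITS = 8       # this chip's share of the memory channel (8 bits)
CHIPS_PER_RANK = 8     # hypothetical: eight x8 chips form a 64-bit channel

# Column accesses needed to stream one whole open row through the bank I/O
column_accesses_per_row = ROW_BITS // BANK_IO_BITS                 # 128

# Channel transfers needed to move one 64-bit bank I/O word off an 8-bit chip
transfers_per_column = BANK_IO_BITS // CHIP_IO_BITS                # 8

# Bytes delivered by one column access when all chips work in lockstep
bytes_per_column_access = CHIPS_PER_RANK * BANK_IO_BITS // 8       # 64

# Bytes held open in sense amplifiers across the module after one ACTIVATE
bytes_open_across_rank = CHIPS_PER_RANK * ROW_BITS // 8            # 8192

print(column_accesses_per_row, transfers_per_column,
      bytes_per_column_access, bytes_open_across_rank)  # 128 8 64 8192
```

Under these assumptions, one column access returns a 64-byte cache line. A single ACTIVATE latches far more data (8 KB) than one access moves over the channel.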

DRAM Operation
[Figure: a bank with row decoders, cell arrays, an array of sense amplifiers, and bank I/O; the row address and column address select the data]
1. ACTIVATE row (row address)
2. READ/WRITE column (column address)
3. PRECHARGE
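
The three commands form a small per-bank state machine. The sketch below tracks which row is latched in the sense amplifiers and issues the commands an access needs. It assumes an open-row policy, meaning a row stays open until another row is needed. The `Bank` class and its methods are hypothetical names for illustration, not a real controller API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Bank:
    """Minimal model of one DRAM bank (open-row policy assumed)."""
    open_row: Optional[int] = None
    log: list = field(default_factory=list)

    def activate(self, row: int) -> None:
        # ACTIVATE: raise one wordline; the sense amplifiers latch the whole row
        assert self.open_row is None, "must PRECHARGE before activating another row"
        self.open_row = row
        self.log.append(f"ACTIVATE row {row}")

    def read(self, column: int) -> None:
        # READ: select a column of the latched row and move it through the bank I/O
        assert self.open_row is not None, "no row is open"
        self.log.append(f"READ col {column}")

    def precharge(self) -> None:
        # PRECHARGE: close the row and return the bitlines to 1/2 VDD
        self.open_row = None
        self.log.append("PRECHARGE")

    def access(self, row: int, column: int) -> str:
        """Issue the commands needed to read (row, column); classify the access."""
        if self.open_row == row:
            kind = "row hit"          # row already latched: READ only
        elif self.open_row is None:
            kind = "row closed"       # ACTIVATE, then READ
            self.activate(row)
        else:
            kind = "row conflict"     # PRECHARGE, ACTIVATE, then READ
            self.precharge()
            self.activate(row)
        self.read(column)
        return kind


bank = Bank()
print([bank.access(r, c) for r, c in [(5, 0), (5, 1), (9, 0)]])
# ['row closed', 'row hit', 'row conflict']
print(bank.log)
# ['ACTIVATE row 5', 'READ col 0', 'READ col 1', 'PRECHARGE', 'ACTIVATE row 9', 'READ col 0']
```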

Trade-offs in DRAM Design
- Goals: cost, latency, bandwidth, parallelism, power, energy, reliability
- Design parameters: rows per subarray; data width and chips per DIMM; banks per chip

RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization
Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, P. B. Gibbons, M. A. Kozuch, T. C. Mowry
Vivek Seshadri

Memory Channel: Bottleneck
[Figure: cores and a shared cache reach DRAM through the memory controller (MC) and the memory channel]
- Limited bandwidth
- High energy

Goal: Reduce Memory Bandwidth Demand
[Figure: cores, cache, memory controller (MC), and memory channel]
Reduce unnecessary data movement.

Bulk Data Copy and Initialization
- Bulk data copy: copy a large block from src to dst
- Bulk data initialization: fill a large block dst with a value val

Bulk Copy and Initialization: Applications
- Forking
- Zero initialization (e.g., for security)
- VM cloning
- Deduplication
- Checkpointing
- Page migration
- Many more

Shortcomings of the Existing Approach
[Figure: data copied from src to dst travels across the channel, through the memory controller (MC), and back]
- High latency (1046 ns to copy 4 KB)
- Interference
- High energy (3600 nJ to copy 4 KB)

Our Approach: In-DRAM Copy with Low Cost
[Figure: src is copied to dst inside DRAM, without crossing the channel; high latency, interference, and high energy are crossed out]

RowClone: In-DRAM Copy

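Bulk Copy in DRAM: RowClone
[Figure: both amplifier terminals start at ½VDD. Activating the source row (holding 1) moves the bitline to ½VDD + δ, and the sense amplifier latches VDD/0. The destination row (holding 0) is then connected to the same bitline, and the amplifier drives it to the same value, so the data gets copied.]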
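
The copy mechanism can be mimicked at the logic level. The sketch assumes, as the slide shows, that the second ACTIVATE is issued while the sense amplifiers still hold the source row, with no PRECHARGE in between. It models bit values only, not voltages or timing. `Subarray` and `rowclone_fpm` are hypothetical names for illustration.

```python
class Subarray:
    """Toy subarray: rows of cells sharing one array of sense amplifiers."""

    def __init__(self, rows):
        self.cells = [list(r) for r in rows]
        self.sense_amps = None  # None models bitlines precharged to 1/2 VDD

    def activate(self, row):
        if self.sense_amps is None:
            # Normal ACTIVATE: charge sharing plus sensing latch this row's values
            # (and restore the row, so its cells are unchanged)
            self.sense_amps = list(self.cells[row])
        else:
            # ACTIVATE while a row is latched: the enabled sense amplifiers
            # overpower the newly connected cells and overwrite them
            self.cells[row] = list(self.sense_amps)

    def precharge(self):
        self.sense_amps = None


def rowclone_fpm(sub, src, dst):
    """Fast Parallel Mode copy: ACTIVATE src, ACTIVATE dst, PRECHARGE."""
    sub.activate(src)
    sub.activate(dst)
    sub.precharge()


sub = Subarray([[1, 0, 1, 1], [0, 0, 0, 0]])
rowclone_fpm(sub, src=0, dst=1)
print(sub.cells[1])  # [1, 0, 1, 1]
```

Every bitline of the row is copied at once, which is where the "parallel" in Fast Parallel Mode comes from. It also explains why source and destination must share the same sense amplifiers.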

Fast Parallel Mode: Benefits
Bulk data copy (4 KB across a module):
- Latency: 1046 ns to 90 ns (11X)
- Energy: 3600 nJ to 40 nJ (74X)
- No bandwidth consumption
- Very few changes to the DRAM chip

Fast Parallel Mode: Constraints
1. Location constraint: source and destination must be in the same subarray.
2. Size constraint: the entire row gets copied (no partial copy).
Even so, Fast Parallel Mode can accelerate many existing primitives (copy-on-write, bulk zeroing). An alternate mechanism, Pipelined Serial Mode, copies data across banks, with lower benefits than Fast Parallel Mode.
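
The two constraints can be turned into a mode-selection check for a single-row copy request. The geometry constants are hypothetical assumptions, as is the rule that a row index maps to a subarray by integer division. Real address mappings vary. Any case the slide does not assign to Fast or Pipelined Serial Mode falls back to a conventional copy through the memory controller. That fallback is also an assumption of this sketch.

```python
from typing import NamedTuple

ROWS_PER_SUBARRAY = 512   # hypothetical geometry
ROW_BYTES = 8192          # hypothetical row size across a module


class RowAddr(NamedTuple):
    bank: int
    row: int  # row index within the bank


def subarray_of(addr: RowAddr) -> int:
    # Assumed mapping: consecutive rows share a subarray
    return addr.row // ROWS_PER_SUBARRAY


def choose_copy_mode(src: RowAddr, dst: RowAddr, size_bytes: int) -> str:
    """Pick a copy mechanism for a single-row copy request."""
    if size_bytes != ROW_BYTES:
        return "baseline"  # size constraint: in-DRAM copy moves entire rows only
    if src.bank == dst.bank and subarray_of(src) == subarray_of(dst):
        return "FPM"       # location constraint satisfied: Fast Parallel Mode
    if src.bank != dst.bank:
        return "PSM"       # across banks: Pipelined Serial Mode
    return "baseline"      # same bank, different subarray: not handled here


print(choose_copy_mode(RowAddr(0, 10), RowAddr(0, 300), 8192))  # FPM
print(choose_copy_mode(RowAddr(0, 10), RowAddr(3, 10), 8192))   # PSM
print(choose_copy_mode(RowAddr(0, 10), RowAddr(0, 300), 64))    # baseline
```

A check like this explains why OS page allocation matters. The more often source and destination pages land in the same subarray, the more copies take the FPM path.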

End-to-end System Design
- Software interface: memcpy and meminit instructions
- Managing cache coherence: use existing DMA support!
- Maximizing use of Fast Parallel Mode: smart OS page allocation

Applications Summary

Results Summary

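Triple Row Activation
[Figure: cells A, B, and C share a bitline precharged to ½VDD. With the sense amplifier disabled, all three rows are activated, and charge sharing moves the bitline to ½VDD + δ. Enabling the amplifier (0 / VDD) then drives all three cells to the final state.]
Final state: $AB + BC + AC = C(A + B) + \overline{C}(AB)$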
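
The final-state formula can be checked by brute force. The sketch first models the charge sharing among three cells and the bitline, assuming all three cells have the same capacitance; the capacitance values are hypothetical. It then confirms over all eight inputs that the sensed value equals both forms of the formula. With equal cell capacitances, the bitline ends above ½VDD exactly when at least two cells hold 1, whatever the capacitance values. The operation is therefore a bitwise majority.

```python
from itertools import product

VDD = 1.0
C_CELL = 20e-15     # hypothetical cell capacitance (F), same for all three cells
C_BITLINE = 80e-15  # hypothetical bitline capacitance (F)


def bitline_after_tra(a: int, b: int, c: int) -> float:
    """Charge sharing among three cells (logic value x VDD) and a 1/2 VDD bitline."""
    charge = C_CELL * (a + b + c) * VDD + C_BITLINE * VDD / 2
    return charge / (3 * C_CELL + C_BITLINE)


def tra_result(a: int, b: int, c: int) -> int:
    """Value the sense amplifier settles to (and writes back into all three cells)."""
    return 1 if bitline_after_tra(a, b, c) > VDD / 2 else 0


for a, b, c in product((0, 1), repeat=3):
    sop = (a & b) | (b & c) | (a & c)                # AB + BC + AC
    mux_form = (c & (a | b)) | ((1 - c) & (a & b))   # C(A + B) + ~C(AB)
    assert tra_result(a, b, c) == sop == mux_form
print("final-state formula holds for all 8 input combinations")
```

The second form shows why C acts as a control input: C = 0 yields A AND B, and C = 1 yields A OR B.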

In-DRAM Bitwise AND/OR
Required operation: perform a bitwise AND of two rows A and B and store the result in row C.
- R0: reserved zero row
- R1: reserved one row
- D1, D2, D3: designated rows for triple activation
Steps:
1. RowClone A into D1
2. RowClone B into D2
3. RowClone R0 into D3
4. ACTIVATE D1, D2, D3
5. RowClone the result into C
For a bitwise OR, RowClone R1 (instead of R0) into D3 in step 3.
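
The five steps can be followed at the logic level by treating each row as a Python integer bitmask. The sketch assumes a hypothetical 16-bit row width; real rows are far wider. The function and row names are illustrative. Triple activation overwrites all three designated rows with the result, which is why A and B are first copied into D1 and D2 instead of being activated in place.

```python
WIDTH = 16                  # hypothetical row width in bits
MASK = (1 << WIDTH) - 1
R0, R1 = 0, MASK            # reserved all-zeros and all-ones rows


def maj(x: int, y: int, z: int) -> int:
    """Bitwise majority: every bit is AB + BC + AC."""
    return (x & y) | (y & z) | (x & z)


def in_dram_and_or(rows: dict, a: str, b: str, c: str, op: str = "and") -> None:
    """Compute rows[c] = rows[a] AND/OR rows[b] using the RowClone + TRA steps."""
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    rows["D1"] = rows[a]                              # 1. RowClone A into D1
    rows["D2"] = rows[b]                              # 2. RowClone B into D2
    rows["D3"] = R0 if op == "and" else R1            # 3. RowClone R0 (AND) or R1 (OR) into D3
    result = maj(rows["D1"], rows["D2"], rows["D3"])  # 4. ACTIVATE D1, D2, D3
    rows["D1"] = rows["D2"] = rows["D3"] = result     #    all three end in the final state
    rows[c] = rows["D1"]                              # 5. RowClone the result into C


rows = {"A": 0b1100_1010_0000_1111, "B": 0b1010_0110_1111_0000}
in_dram_and_or(rows, "A", "B", "C", op="and")
print(format(rows["C"], "016b"))     # 1000001000000000
in_dram_and_or(rows, "A", "B", "C_or", op="or")
print(format(rows["C_or"], "016b"))  # 1110111011111111
```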

Throughput Results
[Figure: throughput results; labels: L1, L2, L3, Memory]

Bitmap Index
- Alternative to B-tree and its variants
- Efficient for performing range queries and joins
[Figure: one bitmap per age range: Bitmap 1 (age < 18), Bitmap 2 (18 < age < 25), Bitmap 3 (25 < age < 60), Bitmap 4 (age > 60)]
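
A range query over a bitmap index reduces to bulk bitwise OR and AND over long bit vectors. Those are the operations that in-DRAM AND/OR would accelerate. The sketch uses Python integers as bitmaps, one bit per record. The half-open bucket boundaries are a hypothetical choice that assigns ages 18, 25, and 60 to the higher bucket, and the second attribute (`premium`) is invented for illustration.

```python
# Hypothetical half-open [lo, hi) age ranges matching Bitmaps 1-4 on the slide
BUCKETS = [(0, 18), (18, 25), (25, 60), (60, 200)]


def build_bitmaps(ages):
    """One bitmap per age bucket; bit i is set if record i falls in that bucket."""
    bitmaps = [0] * len(BUCKETS)
    for i, age in enumerate(ages):
        for k, (lo, hi) in enumerate(BUCKETS):
            if lo <= age < hi:
                bitmaps[k] |= 1 << i
    return bitmaps


def bitmap_from_flags(flags):
    """Bitmap for a boolean attribute: bit i is set if flags[i] is true."""
    return sum(1 << i for i, f in enumerate(flags) if f)


def records(bitmap):
    """Indices of the set bits, in increasing order."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


ages = [12, 19, 33, 70, 24, 45, 17]
premium = bitmap_from_flags([True, False, False, True, True, False, True])
bm = build_bitmaps(ages)

under_25 = bm[0] | bm[1]              # range query: OR of adjacent buckets
print(records(under_25))              # [0, 1, 4, 6]
print(records(under_25 & premium))    # [0, 4, 6]  (AND with a second predicate)
```

Each query touches whole bitmaps, not individual records. Wide, row-sized bitwise operations inside DRAM map naturally onto this access pattern.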

Performance Evaluation