Synergistic Processing in Cell’s Multicore Architecture
Michael Gschwind, et al.
Presented by: Jia Zou
CS258, 3/5/08
Goal for Cell
Increase processor efficiency: the most performance per area
Reduce area per core to fit more cores in a given chip area
Take advantage of application parallelism
Aimed at data-processing-intensive applications
Cell Architecture
Design Philosophy
Simple cores, lots of them
Any complexity reduction directly translates into increased performance
Exploit the compiler to eliminate hardware complexity
PPE serves as controller; SPEs provide performance
PPE and SPEs share address translation and the virtual memory architecture
Synergistic Processing Unit
Data Alignment for Scalar and Vector Processing
SPU has no separate support for scalar processing
Unified scalar/SIMD register file
Unified execution unit
Simpler control unit
Software-controlled data-alignment approach
Simplifies scalar data extraction, insertion, and sharing between scalar and vector data
Increases compiler efficiency
Scalar Layering
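The scalar layering described above can be sketched in plain C. This is a toy model, not SPU code: it imagines a 128-bit register as four 32-bit words, with word 0 as the "preferred slot" where scalars live, and shows the rotate that the compiler emits to align a scalar load.

```c
#include <stdint.h>
#include <stddef.h>

/* Toy model of an SPU 128-bit register as four 32-bit words.
 * Word 0 plays the role of the "preferred slot" holding a scalar. */
typedef struct { uint32_t w[4]; } qword_t;

/* Scalar load, modeled in software: fetch the aligned quadword that
 * contains element i, then rotate it so the addressed word lands in
 * the preferred slot (word 0). This rotation is the data-alignment
 * step the compiler inserts, since the hardware loads only aligned
 * 16-byte quantities. */
static qword_t scalar_load(const uint32_t *base, size_t i) {
    size_t q = (i / 4) * 4;   /* start of the aligned quadword */
    size_t pos = i % 4;       /* position of the word within it  */
    qword_t r;
    for (int k = 0; k < 4; k++)
        r.w[k] = base[q + (pos + k) % 4];   /* rotate left by pos words */
    return r;
}
```

A scalar store is the mirror image: rotate the value into position, then read-modify-write the whole aligned quadword, again in software.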
Data-Parallel Conditional Execution
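The idea behind data-parallel conditional execution can be sketched in portable C (the function name and lane count are illustrative): both arms of the conditional are evaluated for every lane, and a per-lane bit mask selects the result, in the spirit of the SPU's select-bits instruction, so no branch is needed.

```c
#include <stdint.h>

enum { LANES = 4 };

/* Branchless per-lane select: out[i] = (a[i] > b[i]) ? a[i] : b[i].
 * The mask is all-ones where the condition holds and all-zeros
 * otherwise, so the bitwise blend picks one arm per lane. */
void select_max(const int32_t a[LANES], const int32_t b[LANES],
                int32_t out[LANES]) {
    for (int i = 0; i < LANES; i++) {
        uint32_t mask = -(uint32_t)(a[i] > b[i]);
        out[i] = (int32_t)(((uint32_t)a[i] & mask) |
                           ((uint32_t)b[i] & ~mask));
    }
}
```

Because every lane takes the same instruction path, this avoids branch penalties entirely, which matters on a core with no hardware branch predictor.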
Deterministic Data Delivery
SPE has a local store
4 KB – 4 GB address range
Stores both instructions and data
All memory operations that the SPU executes refer to the address space of this local store
Differs from cache memory:
No cache coherency problem
Offers low and deterministic access latency
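The local-store model above can be illustrated with a toy C sketch. The names here are illustrative, not the real MFC DMA API: `dma_get` stands in for a completed asynchronous transfer, and a small static buffer stands in for the local store. The point is that software moves a whole tile in explicitly, so every access the compute kernel makes has known latency.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

enum { TILE = 64 };

/* Stand-in for the SPE local store (256 KB on the actual chip). */
static int32_t local_store[TILE];

/* Stand-in for an MFC "get" plus completion wait: software explicitly
 * copies a tile from main memory into the local store, instead of a
 * cache fetching lines on demand. */
static void dma_get(const int32_t *main_mem, size_t offset) {
    memcpy(local_store, main_mem + offset, TILE * sizeof(int32_t));
}

/* The compute kernel touches only the local store, so its memory
 * latency is fixed and known at compile time. */
static int64_t sum_tile(void) {
    int64_t s = 0;
    for (int i = 0; i < TILE; i++)
        s += local_store[i];
    return s;
}
```

In real SPE code the transfer is asynchronous, which lets software double-buffer: fetch the next tile while computing on the current one.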
Statically Scheduled ILP
Instruction fetches are scheduled statically
Delivers up to two instructions per cycle, one to each execution complex
Static branch prediction: a prepare-to-branch instruction initiates instruction prefetch
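With no hardware branch predictor, the prediction above is supplied by software: the compiler places a branch-hint instruction ahead of a predicted-taken branch so the target can be prefetched in time. On conventional compilers, GCC/Clang's `__builtin_expect` is a rough analogue, a static, software-supplied prediction that shapes code layout; the macro names below are a common convention, not part of any Cell API.

```c
/* Static branch hints: the programmer or compiler, not hardware,
 * declares which way a branch usually goes. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

int clamp_positive(int v) {
    if (UNLIKELY(v < 0))   /* predicted not-taken: fall-through is the fast path */
        return 0;
    return v;
}
```

On the SPU, a mispredicted branch stalls fetch until the correct target arrives from the local store, which is why hinting (or eliminating branches via select) pays off.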
SPE Microarchitecture
Design Goals and Decisions