Spring 2009, Prof. Hyesoon Kim
Thanks to Prof. Loh & Prof. Prvulovic

One approach: add sockets to your motherboard (mobo)
- Minimal changes to existing CPUs
- Power delivery, heat removal and I/O not too bad, since each chip has its own set of pins and cooling
[Figure: four CPU boxes. Pictures found from Google Images.]

Simple SMP on the same chip
[Figures: Intel Smithfield block diagram; AMD dual-core Athlon FX. Pictures found from Google Images.]

Resources can be shared between CPUs (ex. IBM Power 5)
[Diagram: CPU, CPU]
- L2 cache shared between both CPUs (no need to keep two copies coherent)
- L3 cache is also shared (only tags are on-chip; data are off-chip)

Cheaper than mobo-based SMP
- All/most interface logic integrated onto the main chip (fewer total chips, single CPU socket, single interface to main memory)
- Less power than mobo-based SMP as well (communication on-die is more power-efficient than chip-to-chip communication)

Performance
- On-chip communication is faster

Efficiency
- Potentially better use of hardware resources than trying to make a wider/more OOO single-threaded CPU

Single thread in superscalar execution: dependences cause most of the stalls
- Idea: when one thread is stalled, another can go
- Different granularities of multithreading
  - Coarse MT: can change thread every few cycles
  - Fine MT: can change thread every cycle
  - Simultaneous Multithreading (SMT): instrs from different threads even in the same cycle; AKA Hyperthreading

Uni-processor: 4-6 wide, lucky if you get 1-2 IPC
- Poor utilization
SMP: 2-4 CPUs, but need independent tasks
- Else poor utilization as well
SMT: idea is to use a single large uni-processor as a multi-processor

[Diagram: regular CPU; CMP, 2x HW cost; SMT (4 threads), approx. 1x HW cost]

For an N-way (N threads) SMT, we need:
- Fetch: ability to fetch from N threads, multiple PCs
- Rename: N rename tables (RATs)
- N ARFs
- Need to maintain interrupts, exceptions, faults on a per-thread basis
But we don't need to replicate the entire OOO execution engine (schedulers, execution units, bypass networks, ROBs, etc.)

Each process has its own virtual address space
- TLB must be thread-aware: translate (thread-id, virtual page) -> physical page
- Virtual portion of caches must also be thread-aware
  - VIVT cache must now be (virtual addr, thread-id)-indexed, (virtual addr, thread-id)-tagged
  - Similar for VIPT cache
  - No changes needed if using PIPT cache (like L2)

Can have a system that supports SMP, CMP and SMT at the same time
- Take a dual-socket SMP motherboard
- Insert two chips, each with a dual-core CMP
- Where each core supports two-way SMT (Nehalem)
- This example provides 8 threads' worth of execution, shared on 4 actual cores, split across two physical packages

SMT/CMP is supposed to look like multiple CPUs to the software/OS
[Diagram: 2 cores (either SMP/CMP), each 2-way SMT, shown as four CPUs; tasks A and B; CPUs labeled A/B and idle]
- Say the OS has two tasks to run
- Schedule tasks to (virtual) CPUs
- Performance worse than if SMT was turned off and used 2-way SMP on
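
The topology count and the scheduling pitfall from the last two slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `LogicalCPU`, `build_topology`, `schedule_naive`, `schedule_core_aware` and `cores_used` are hypothetical names, not part of the slides or of any real OS. The naive policy assumes logical CPUs are numbered so that the SMT siblings of one core are adjacent, and the core-aware policy is a deliberately simple stand-in for what a real scheduler might do.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class LogicalCPU:
    """One (virtual) CPU as the OS sees it. Hypothetical model for illustration."""
    socket: int
    core: int
    thread: int  # SMT context within the core


def build_topology(sockets, cores_per_socket, threads_per_core):
    """Enumerate logical CPUs; SMT siblings of a core end up adjacent (assumption)."""
    return [
        LogicalCPU(s, c, t)
        for s, c, t in product(
            range(sockets), range(cores_per_socket), range(threads_per_core)
        )
    ]


def schedule_naive(cpus, tasks):
    """Treat all logical CPUs as equal and fill them in enumeration order."""
    return dict(zip(tasks, cpus))


def schedule_core_aware(cpus, tasks):
    """Place each task on a free logical CPU whose physical core is least loaded.

    Assumes len(tasks) <= len(cpus).
    """
    busy = {}          # (socket, core) -> number of tasks already on that core
    free = list(cpus)  # logical CPUs not yet assigned
    placement = {}
    for task in tasks:
        cpu = min(free, key=lambda c: busy.get((c.socket, c.core), 0))
        free.remove(cpu)
        key = (cpu.socket, cpu.core)
        busy[key] = busy.get(key, 0) + 1
        placement[task] = cpu
    return placement


def cores_used(placement):
    """Set of physical cores that received at least one task."""
    return {(cpu.socket, cpu.core) for cpu in placement.values()}


if __name__ == "__main__":
    # Dual-socket board, dual-core chips, two-way SMT per core.
    big = build_topology(2, 2, 2)
    print(len(big), "logical CPUs on", len({(c.socket, c.core) for c in big}), "cores")

    # Two cores, each 2-way SMT, two tasks A and B.
    small = build_topology(1, 2, 2)
    print("naive:     ", cores_used(schedule_naive(small, ["A", "B"])))
    print("core-aware:", cores_used(schedule_core_aware(small, ["A", "B"])))
```

With two tasks on the two-core machine, the naive policy puts A and B on the two SMT contexts of the same core and leaves the other core idle, which is the case the slide warns about. The core-aware policy gives each task its own core.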