Slide 1: The Effect of Multi-core on HPC Applications in Virtualized Systems
Jaeung Han¹, Jeongseob Ahn¹, Changdae Kim¹, Youngjin Kwon¹, Young-ri Choi², and Jaehyuk Huh¹
¹ KAIST (Korea Advanced Institute of Science and Technology)
² KISTI (Korea Institute of Science and Technology Information)
Slide 2: Outline
Virtualization for HPC
Virtualization on Multi-core
Virtualization for HPC on Multi-core
Methodology
PARSEC – shared memory model
NPB – MPI model
Conclusion
Slide 3: Outline
Virtualization for HPC
Virtualization on Multi-core
Virtualization for HPC on Multi-core
Methodology
PARSEC – shared memory model
NPB – MPI model
Conclusion
Slide 4: Benefits of Virtualization
[Diagram: multiple VMs on a Virtual Machine Monitor over shared hardware]
Improve system utilization by consolidation
Slide 5: Benefits of Virtualization
[Diagram: Windows, Linux, and Solaris VMs on a Virtual Machine Monitor over shared hardware]
Improve system utilization by consolidation
Support for multiple types of OSes on a system
Slide 6: Benefits of Virtualization
[Diagram: Windows, Linux, and Solaris VMs on a Virtual Machine Monitor over shared hardware]
Improve system utilization by consolidation
Support for multiple types of OSes on a system
Fault isolation
Slide 7: Benefits of Virtualization
[Diagram: Windows, Linux, and Solaris VMs placed flexibly across two virtualized hosts]
Improve system utilization by consolidation
Support for multiple types of OSes on a system
Fault isolation
Flexible resource management
Slide 8: Benefits of Virtualization
Improve system utilization by consolidation
Support for multiple types of OSes on a system
Fault isolation
Flexible resource management
[Diagram: Windows, Linux, and Solaris VMs across two virtualized hosts]
Slide 9: Benefits of Virtualization
Improve system utilization by consolidation
Support for multiple types of OSes on a system
Fault isolation
Flexible resource management
Cloud computing
[Diagram: VMs of different OSes hosted on virtualized hardware in the cloud]
Slide 10: Virtualization for HPC
Benefits of virtualization
  Improve system utilization by consolidation
  Support for multiple types of OSes on a system
  Fault isolation
  Flexible resource management
  Cloud computing
HPC is performance- and resource-sensitive
Virtualization can help HPC workloads
Slide 11: Outline
Virtualization for HPC
Virtualization on Multi-core
Virtualization for HPC on Multi-core
Methodology
PARSEC – shared memory model
NPB – MPI model
Conclusion
Slide 12: Virtualization on Multi-core
More VMs on a physical machine
More complex memory hierarchy (NUCA, NUMA)
[Diagram: many VMs sharing the cores of two sockets, each socket with its own shared cache and memory]
Slide 13: Challenges
VM management cost
  Scheduling, memory, communication, I/O multiplexing…
Semantic gaps
  vCPU scheduling, NUMA
[Diagram: a VMM multiplexing many VMs over cores, shared caches, and memories, contrasted with a native OS running directly on the hardware]
Slide 14: Outline
Virtualization for HPC
Virtualization on Multi-core
Virtualization for HPC on Multi-core
Methodology
PARSEC – shared memory model
NPB – MPI model
Conclusion
Slide 15: Virtualization for HPC on Multi-core
Virtualization may help HPC
Virtualization on multi-core may have some overheads
For servers, improving system utilization is the key factor
For HPC, performance is the key factor
How much overhead is there?
Where does it come from?
Slide 16: Outline
Virtualization for HPC
Virtualization on Multi-core
Virtualization for HPC on Multi-core
Methodology
PARSEC – shared memory model
NPB – MPI model
Conclusion
Slide 17: Machines
Single socket system
  12-core AMD processor
  Uniform memory access latency
  Two 6MB L3 caches, each shared by 6 cores
Dual socket system
  2x 4-core Intel processors
  Non-uniform memory access latency
  Two 8MB L3 caches, each shared by 4 cores
[Diagram: single socket: one 12-core CPU with two shared L3 caches and one memory; dual socket: two 4-core CPUs, each socket with its own L3 cache and local memory]
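As an aside (not from the talk): the local vs. remote latency gap on a machine like the dual socket system can be observed with a small user-level probe. This is a minimal sketch assuming Linux with libnuma installed (link with -lnuma); the buffer size and touch stride are arbitrary choices.

/* Touch memory bound to each NUMA node while running on node 0;
 * on the dual socket machine, the remote node's pass should be
 * measurably slower. Illustrative sketch, not the paper's tool. */
#include <numa.h>
#include <stdio.h>
#include <time.h>

#define BUF_SIZE (64UL << 20)   /* 64 MB, well beyond the 8MB L3 */

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support\n");
        return 1;
    }
    numa_run_on_node(0);                       /* execute on socket 0 */
    for (int node = 0; node <= numa_max_node(); node++) {
        volatile char *buf = numa_alloc_onnode(BUF_SIZE, node);
        if (!buf)
            continue;
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (unsigned long i = 0; i < BUF_SIZE; i += 64)  /* one touch per cache line */
            buf[i]++;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e6;
        printf("node %d: %.2f ms\n", node, ms);
        numa_free((void *)buf, BUF_SIZE);
    }
    return 0;
}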
Slide 18: Workloads
PARSEC
  Shared memory model
  Input: native
  On one machine (single and dual socket)
  Fixed: one VM
  Varied: 1, 4, and 8 vCPUs
NAS Parallel Benchmarks (NPB)
  MPI model
  Input: class C
  On two machines (dual socket), connected by a 1Gb Ethernet switch
  Fixed: 16 vCPUs in total
  Varied: 2 ~ 16 VMs
[Diagram: the two experimental setups, annotated with the two overhead sources: semantic gaps and VM management cost]
Slide 19: Outline
Virtualization for HPC
Virtualization on Multi-core
Virtualization for HPC on Multi-core
Methodology
PARSEC – shared memory model
NPB – MPI model
Conclusion
Slide 20: PARSEC – Single Socket
Single socket: no NUMA effect
Very low virtualization overheads (2~4%)
[Chart: execution times normalized to native runs]
Slide 21: PARSEC – Single Socket
Single socket + each vCPU pinned to a pCPU
  Reduces semantic gaps by preventing vCPU migration
vCPU migration has a negligible effect: results are similar to the unpinned case
[Chart: execution times normalized to native runs]
Slide 22: PARSEC – Dual Socket
Dual socket, unpinned vCPUs
NUMA effect: a semantic gap
Significant increase in overheads (16~37%)
[Chart: execution times normalized to native runs]
Slide 23: PARSEC – Dual Socket
Dual socket, pinned vCPUs
Pinning may also reduce the NUMA effect
Reduced overheads with 1 and 4 vCPUs
[Chart: execution times normalized to native runs]
Slide 24: Xen and NUMA Machines
Memory allocation policy
  Allocates up to a 4GB chunk on one socket
Scheduling policy
  Pinning to the allocated socket, nothing more
Pinning 1 ~ 4 vCPUs on the socket where the memory is allocated is possible
  Impossible with 8 vCPUs (only 4 cores per socket)
[Diagram: memory chunks of VM0–VM3 spread across the two sockets' memories]
Slide 25: Mitigating NUMA Effects
Range pinning (see the sketch below)
  Pin the vCPUs of a VM to one socket
  Works only if # of vCPUs ≤ # of cores on a socket
  Range-pinned (best): memory of the VM on the same socket
  Range-pinned (worst): memory of the VM on the other socket
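A minimal sketch of the range-pinning idea, using Linux thread affinity as a stand-in for Xen's per-vCPU pinning (which the host toolstack, not the guest, would perform). The contiguous per-socket core numbering (cores 0-3 on socket 0, 4-7 on socket 1) is an assumption matching the dual socket machine of slide 17.

/* Confine a thread (standing in for one vCPU of a VM) to the cores
 * of a single socket, but let it float among them. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

#define CORES_PER_SOCKET 4

/* Allow `thread` to run on any core of `socket`, and nowhere else. */
int range_pin(pthread_t thread, int socket)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int c = 0; c < CORES_PER_SOCKET; c++)
        CPU_SET(socket * CORES_PER_SOCKET + c, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}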
NUMA-first scheduler (sketched below)
  If there is an idle core on the socket where the VM's memory is allocated, pick it
  If not, pick any core in the machine anyway
  Not all vCPUs are active all the time (synchronization or I/O), so an idle core on the home socket is often available
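A minimal sketch of the NUMA-first pick described above; the toy topology model and idle flag are hypothetical stand-ins, not Xen's actual scheduler interface.

/* NUMA-first core selection: prefer an idle core on the VM's home
 * socket (where its memory lives); otherwise take any idle core. */
#include <stdbool.h>

#define NCORES 8            /* dual socket, 4 cores each (slide 17) */
#define CORES_PER_SOCKET 4

struct core {
    bool idle;
};

int numa_first_pick(const struct core cores[NCORES], int home_socket)
{
    /* Pass 1: look for an idle core on the home socket. */
    for (int c = home_socket * CORES_PER_SOCKET;
         c < (home_socket + 1) * CORES_PER_SOCKET; c++)
        if (cores[c].idle)
            return c;
    /* Pass 2: fall back to any idle core in the machine. */
    for (int c = 0; c < NCORES; c++)
        if (cores[c].idle)
            return c;
    return -1;  /* nothing idle; caller uses the default policy */
}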
Slide 26: Range Pinning
For the 4 vCPUs case
Range-pinned (best) ≈ pinned
[Chart: execution times normalized to native runs]
Slide 27: NUMA-first Scheduler
For the 8 vCPUs case
Significant improvement with the NUMA-first scheduler
[Chart: execution times normalized to native runs]
Slide 28: Outline
Virtualization for HPC
Virtualization on Multi-core
Virtualization for HPC on Multi-core
Methodology
PARSEC – shared memory model
NPB – MPI model
Conclusion
Slide 29: VM Granularity for the MPI Model
Fine-grained VMs
  Few processes in a VM
  Small VM: few vCPUs, little memory
  Fault isolation among processes in different VMs
  Many VMs on a machine
  MPI communications mostly through the VMM
Coarse-grained VMs
  Many processes in a VM
  Large VM: many vCPUs, much memory
  Single failure point for all processes in a VM
  Few VMs on a machine
  MPI communications mostly within a VM
[Diagram: fine-grained (many small VMs) vs. coarse-grained (few large VMs) configurations on the same hardware]
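To make the placement point concrete, here is a minimal MPI ping-pong in C (standard MPI calls only; not one of the NPB codes). The program is identical under both granularities; only where ranks 0 and 1 land decides whether each message crosses the VMM or stays inside one VM. Message size and iteration count are arbitrary.

/* Two ranks bounce a 4KB message and report the average round trip. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, iters = 1000;
    char buf[4096] = {0};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size < 2) {                  /* need at least two ranks */
        MPI_Finalize();
        return 1;
    }
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof(buf), MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0)
        printf("avg round trip: %.2f us\n",
               (MPI_Wtime() - t0) * 1e6 / iters);
    MPI_Finalize();
    return 0;
}

Launched with two ranks (e.g., via mpirun with a hostfile), placing both ranks in one VM keeps the exchange in shared memory, while placing them in separate VMs routes every message through the VMM.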
Slide 30: NPB – VM Granularity
The amount of work is the same at every granularity
  2 VMs: each VM has 8 vCPUs and 8 MPI processes
  16 VMs: each VM has 1 vCPU and 1 MPI process
Overheads range from 11% to 54%
[Chart: execution times normalized to native runs]
Slide 31: NPB – VM Granularity
Fine-grained VMs: significant overheads (avg. 54%)
  MPI communications mostly through the VMM
  Worst in CG, which has a high communication ratio
  Small memory per VM
  VM management costs in the VMM
Coarse-grained VMs: much lower overheads (avg. 11%)
  Still dual socket, but less overhead than the shared memory model: the bottleneck moves to communication
  MPI communication largely within a VM
  Large memory per VM
Slide 32: Outline
Virtualization for HPC
Virtualization on Multi-core
Virtualization for HPC on Multi-core
Methodology
PARSEC – shared memory model
NPB – MPI model
Conclusion
Slide 33: Conclusion
Questions on virtualization for HPC on multi-core systems
  How much overhead is there?
  Where does it come from?
For the shared memory model
  Without NUMA: little overhead
  With NUMA: large overheads from semantic gaps
For the MPI model
  Less NUMA effect: communication matters more
  Fine-grained VMs have large overheads
    Communication mostly through the VMM
    Small memory per VM / VM management cost
Future work
  NUMA-aware VMM scheduler
  Optimize communication among VMs within a machine
Slide 34
Thank you!
Slide 35
Backup slides
Slide 36: PARSEC CPU Usage
Environment: native Linux, with only 8 cores enabled (8-thread mode)
CPU usage sampled every second, then averaged
For all workloads, usage stays below 800% (the fully parallel ceiling)
  So the NUMA-first scheduler can work