Slide 1
Helios: Heterogeneous Multiprocessing with Satellite Kernels
Ed Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, Galen Hunt
Microsoft Research
Slide 2
Problem: hardware is now heterogeneous, but heterogeneity is ignored by operating systems.
- Programming models are fragmented
- Standard OS abstractions are missing
[Figure: once upon a time, hardware was homogeneous. A progression from a single CPU to SMP, CMP, and NUMA machines, now joined by devices such as a GP-GPU and a programmable NIC, each with its own RAM.]
Slide 3
Solution
Helios manages a 'distributed system in the small':
- Simplify app development, deployment, and tuning
- Provide a single programming model for heterogeneous systems
4 techniques to manage heterogeneity:
- Satellite kernels: the same OS abstraction everywhere
- Remote message passing: transparent IPC between kernels
- Affinity: easily express arbitrary placement policies to the OS
- 2-phase compilation: run apps on arbitrary devices
Slide 4
Results
Helios offloads processes with zero code changes:
- Entire networking stack
- Entire file system
- Arbitrary applications
Helios improves performance on NUMA architectures:
- Eliminates resource contention by running multiple kernels
- Eliminates remote memory accesses
Slide 5
Outline
- Motivation
- Helios design: satellite kernels, remote message passing, affinity, encapsulating many ISAs
- Evaluation
- Conclusion
Slide 6
The driver interface is a poor app interface:
- Hard to perform basic tasks: debugging, I/O, IPC
- The driver encompasses services and a runtime... an OS!
[Figure: the kernel on the host CPU talks to a programmable I/O device through a driver; the device itself runs apps, a JIT, a scheduler, memory management, and IPC.]
Slide 7
Satellite kernels provide a single interface.
Satellite kernels:
- Efficiently manage local resources
- Let apps be developed against a single system-call interface
- Are μkernels: scheduler, memory manager, namespace manager
[Figure: a satellite kernel runs on the host CPU, on each NUMA domain, and on the programmable device; apps, the file system, and the TCP stack run on top of them.]
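To make the single-interface claim concrete, here is a minimal Python sketch, with invented names, of the idea: every satellite kernel implements the same three μkernel services, so an app targets one interface no matter which kernel hosts it. This is an illustration, not Helios code.

    # Illustrative only: each satellite kernel implements the same
    # three local services named on the slide.
    from abc import ABC, abstractmethod

    class SatelliteKernel(ABC):
        @abstractmethod
        def schedule(self, process):    # local scheduler
            ...

        @abstractmethod
        def alloc_pages(self, n):       # local memory manager
            ...

        @abstractmethod
        def register(self, path, svc):  # local namespace manager
            ...

    class X86Kernel(SatelliteKernel):
        def schedule(self, process): pass
        def alloc_pages(self, n): return [0] * n
        def register(self, path, svc): pass

    class XScaleKernel(SatelliteKernel):  # programmable NIC: same surface
        def schedule(self, process): pass
        def alloc_pages(self, n): return [0] * n
        def register(self, path, svc): pass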
Slide 8
Remote Message Passing
- Local IPC uses zero-copy message passing
- Remote IPC transparently marshals data
- Unmodified apps work with multiple kernels
[Figure: the same multi-kernel topology as before; IPC channels between apps, the file system, and the TCP stack now span kernel boundaries transparently.]
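A sketch of the dispatch the slide describes, in Python with invented names: a channel between processes on the same kernel delivers the message by reference (zero-copy), while a channel that crosses kernels marshals it first. The app calls send() the same way in both cases.

    # Illustrative only: local IPC passes a reference, remote IPC marshals.
    import pickle

    class Channel:
        def __init__(self, src_kernel, dst_kernel, deliver):
            self.src_kernel = src_kernel
            self.dst_kernel = dst_kernel
            self.deliver = deliver            # callback into the receiver

        def send(self, msg):
            if self.src_kernel is self.dst_kernel:
                self.deliver(msg)             # zero-copy: same kernel
            else:
                wire = pickle.dumps(msg)      # marshal for the remote kernel
                self.deliver(pickle.loads(wire))

    # Usage: the sender is identical whether the channel is local or remote.
    inbox = []
    local = Channel("k0", "k0", inbox.append)
    remote = Channel("k0", "k1", inbox.append)
    local.send({"op": "read"})
    remote.send({"op": "read"})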
Slide 9
Connecting processes and services
- Applications register in a namespace as services
- Satellite kernels register in the namespace as well
- The namespace is used to connect IPC channels
Example entries: /fs, /dev/nic0, /dev/disk0, /services/TCP, /services/PNGEater, /services/kernels/ARMv5
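A sketch of that flow with a plain dict standing in for the Helios namespace; the paths are the slide's, the functions are hypothetical.

    # Illustrative only: services and kernels register under paths;
    # IPC channels are connected by looking those paths up.
    namespace = {}

    def register(path, endpoint):
        namespace[path] = endpoint

    def connect(path):
        return namespace[path]    # endpoint to bind a new channel to

    tcp_endpoint = object()       # stand-in for a service endpoint
    arm_kernel = object()         # stand-in for a kernel descriptor

    register("/services/TCP", tcp_endpoint)          # app registers as a service
    register("/services/kernels/ARMv5", arm_kernel)  # satellite kernel registers too
    channel_peer = connect("/services/TCP")          # another process connects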
Slide 10
Where should a process execute?
Three constraints impact the initial placement decision:
- Heterogeneous ISAs make migration difficult
- Fast message passing may be expected
- Processes might prefer a particular platform
Helios exports an affinity metric to applications:
- Affinity is expressed in application metadata and acts as a hint
- A positive value emphasizes communication (zero-copy IPC)
- A negative value expresses a desire for non-interference
Slide 11
Affinity Expressed in Manifests
Affinity is easily edited by a dev, admin, or user:

    <?xml version="1.0" encoding="utf-8"?>
    <application name="TcpTest" runtime="full">
      <endpoints>
        <inputPipe id="0" affinity="0" contractName="PipeContract"/>
        <endpoint id="2" affinity="+10" contractName="TcpContract"/>
      </endpoints>
    </application>
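To show how such a hint might be consumed, here is an illustrative parse of the manifest above using Python's standard xml.etree.ElementTree; the parsing code is an assumption, not part of Helios.

    # Illustrative only: read the per-endpoint affinity hints from the manifest.
    import xml.etree.ElementTree as ET

    manifest = """<?xml version="1.0" encoding="utf-8"?>
    <application name="TcpTest" runtime="full">
      <endpoints>
        <inputPipe id="0" affinity="0" contractName="PipeContract"/>
        <endpoint id="2" affinity="+10" contractName="TcpContract"/>
      </endpoints>
    </application>"""

    root = ET.fromstring(manifest.encode())  # bytes, so the declaration is honored
    for ep in root.find("endpoints"):
        print(ep.get("contractName"), int(ep.get("affinity")))
    # Prints: PipeContract 0
    #         TcpContract 10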
Slide 12
Platform Affinity
- Platform affinity is processed first
- Guarantees certain performance characteristics
Example: /services/kernels/vector-CPU, platform affinity = +2; /services/kernels/x86, platform affinity = +1
[Figure: across two x86 NUMA kernels, a GP-GPU, and a programmable NIC, the +2 vector-CPU affinity matches the GP-GPU while the +1 x86 affinity matches each x86 kernel.]
Slide 13
Positive Affinity
- Represents 'tight coupling' between processes
- Ensures fast message passing between processes
- Positive affinities on each kernel are summed
Example: /services/TCP, communication affinity = +1; /services/PNGEater, communication affinity = +2; /services/antivirus, communication affinity = +3
[Figure: TCP runs on the programmable NIC; per-kernel sums of +1, +2, and +5 appear across the candidate kernels, and the kernel with the highest sum is preferred.]
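A sketch of the summation with invented placement data loosely following the slide's example: each candidate kernel's score is the sum of the process's positive affinities toward the services that kernel hosts, and the highest sum wins.

    # Illustrative only: sum positive affinities per kernel, pick the max.
    affinities = {"/services/TCP": 1, "/services/PNGEater": 2, "/services/antivirus": 3}
    hosted = {
        "nic-kernel":   ["/services/TCP"],
        "x86-kernel-0": ["/services/PNGEater", "/services/antivirus"],
        "x86-kernel-1": [],
    }

    scores = {k: sum(affinities.get(s, 0) for s in svcs) for k, svcs in hosted.items()}
    best = max(scores, key=scores.get)
    # scores: {'nic-kernel': 1, 'x86-kernel-0': 5, 'x86-kernel-1': 0}; best: 'x86-kernel-0'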
Slide 14
Negative Affinity
- Expresses a preference for non-interference
- Used as a means of avoiding resource contention
- Negative affinities on each kernel are summed
Example: /services/kernels/x86, platform affinity = +100; /services/antivirus, non-interference affinity = -1
[Figure: the process must run on an x86 kernel; the -1 affinity toward the antivirus service steers it to the x86 NUMA kernel that is not hosting A/V.]
Slide 15
Self-Reference Affinity
A self-referential negative affinity yields a simple scale-out policy across available processors.
Example: /services/webserver, non-interference affinity = -1 (toward itself)
[Figure: webserver workers W1, W2, and W3 each avoid kernels already running another webserver instance, spreading across the machine.]
Slide 16
Turning policies into actions
A priority-based algorithm reduces the set of candidate kernels by:
1. Platform affinities
2. Other positive affinities
3. Negative affinities
4. CPU utilization
The algorithm attempts to balance simplicity and optimality, as sketched below.
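A sketch of that four-stage filter in Python, with invented data shapes: each stage keeps only the kernels that score best on the previous criterion, and utilization breaks the final tie. This mirrors the slide's ordering, not the actual Helios implementation.

    # Illustrative only: successive filtering of candidate kernels.
    def place(kernels, platform_aff, positive_aff, negative_aff, utilization):
        """Each *_aff argument maps kernel -> summed affinity score."""
        def keep_max(cands, score):
            top = max(score[k] for k in cands)
            return [k for k in cands if score[k] == top]

        cands = keep_max(list(kernels), platform_aff)    # 1st: platform affinities
        cands = keep_max(cands, positive_aff)            # 2nd: other positive affinities
        cands = keep_max(cands, negative_aff)            # 3rd: least-negative sums
        return min(cands, key=lambda k: utilization[k])  # 4th: lowest CPU utilization

    choice = place(
        ["k0", "k1"],
        platform_aff={"k0": 1, "k1": 1},
        positive_aff={"k0": 5, "k1": 0},
        negative_aff={"k0": 0, "k1": 0},
        utilization={"k0": 0.7, "k1": 0.2},
    )  # -> "k0": positive affinity dominates once platforms tie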
Slide 17
Encapsulating many architectures
Two-phase compilation strategy:
- All apps are first compiled to MSIL
- At install time, apps are compiled down to the available ISAs
MSIL encapsulates multiple versions of a method. Example: ARM and x86 versions of the Interlocked.CompareExchange function (see the sketch below).
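A sketch of the install-time phase with invented structures: the package carries MSIL for most code plus per-ISA bodies for the handful of methods that need them (like the CompareExchange example), and the installer emits one image per available ISA. Purely illustrative; not the Helios toolchain.

    # Illustrative only: pick the per-ISA body for each specialized method
    # when compiling the MSIL package down to each available ISA.
    msil_package = {
        "App.Main": {"msil": "..."},              # ordinary MSIL method
        "Interlocked.CompareExchange": {          # per-ISA versions shipped in package
            "x86": "lock cmpxchg ...",
            "arm": "ldrex/strex loop ...",
        },
    }

    def install(package, target_isas):
        images = {}
        for isa in target_isas:
            images[isa] = {
                name: bodies.get(isa, bodies.get("msil"))  # ISA body if present, else MSIL
                for name, bodies in package.items()
            }
        return images

    images = install(msil_package, ["x86", "arm"])  # one image per ISA at install time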
Slide 18
Implementation
Based on the Singularity operating system; added satellite kernels, remote message passing, and affinity.
XScale programmable I/O card:
- 2.0 GHz ARM processor, Gig E, 256 MB of DRAM
- Satellite kernel identical to x86 (except for ARM asm bits)
- Roughly 7x slower than a comparable x86 processor
NUMA support on a 2-socket, dual-core AMD machine:
- 2 GHz CPUs, 1 GB RAM per domain
- Satellite kernel on each NUMA domain
Slide 19
Limitations
- Satellite kernels require timers, interrupts, and exceptions; devices must balance device support with support for these basic abstractions. GPUs are headed in this direction (e.g., Intel Larrabee)
- Only two platforms are supported so far; new platforms need new compiler support
- Limited set of applications; creating satellite kernels out of a commodity system would give access to more applications
Slide 20
Outline
- Motivation
- Helios design: satellite kernels, remote message passing, affinity, encapsulating many ISAs
- Evaluation
- Conclusion
Slide 21
Evaluation platform
[Figure: two comparisons. XScale evaluation: a single kernel on x86 driving the NIC versus an x86 kernel plus a satellite kernel on the XScale card. NUMA evaluation: a single kernel spanning both x86 NUMA domains versus a satellite kernel per domain, with benchmark processes A and B placed on separate kernels.]
Slide 22
Offloading Singularity applications
Helios applications are offloaded with very little effort (LOM = lines of manifest):

Name               LOC     LOC changed   LOM changed
Networking stack   9600    0             1
FAT32 FS           14200   0             1
TCP test harness   300     5             1
Disk indexer       900     0             1
Network driver     1700    0             0
Mail server        2700    0             1
Web server         1850    0             1
Slide 23
Netstack offload
- Offloading improves performance as cycles are freed
- Affinity made it easy to experiment with offloading

PNG size   x86-only uploads/sec   x86+XScale uploads/sec   Speedup   Reduction in context switches
28 KB      161                    171                      6%        54%
92 KB      55                     61                       12%       58%
150 KB     35                     38                       10%       65%
290 KB     19                     21                       10%       53%
Slide 24
Email NUMA benchmark
Satellite kernels improve performance by 39%.
Slide 25
Related Work
- Hive [Chapin et al. '95]: multiple kernels, single system image
- Multikernel [Baumann et al. '09]: focus on scale-out performance on large NUMA architectures
- SPINE [Fiuczynski et al. '98] and Hydra [Weinsberg et al. '08]: custom run-time on a programmable device
Slide 26
Conclusions
Helios manages a 'distributed system in the small':
- Simplifies application development, deployment, and tuning
4 techniques to manage heterogeneity:
- Satellite kernels: the same OS abstraction everywhere
- Remote message passing: transparent IPC between kernels
- Affinity: easily express arbitrary placement policies to the OS
- 2-phase compilation: run apps on arbitrary devices
Helios offloads applications with zero code changes. Code release coming soon.