Presentation Transcript

Slide 1

Ed Nightingale, Orion Hodson, Ross McIlroy, Chris Hawblitzel, Galen Hunt
Microsoft Research

Helios: Heterogeneous Multiprocessing with Satellite Kernels

Slide 2

Problem: hardware is now heterogeneous, yet heterogeneity is ignored by operating systems

Once upon a time, hardware was homogeneous: a single CPU, then SMP, then CMP, then NUMA. Today's machines add devices with their own processors and RAM, such as GP-GPUs and programmable NICs.

Programming models are fragmented, and standard OS abstractions are missing on these devices.

[Figure: the evolution from a single CPU to SMP, CMP, and NUMA machines, alongside a GP-GPU and a programmable NIC, each with its own RAM]

Slide 3

Solution

Helios manages a 'distributed system in the small':
Simplify app development, deployment, and tuning
Provide a single programming model for heterogeneous systems

Four techniques to manage heterogeneity:
Satellite kernels: same OS abstraction everywhere
Remote message passing: transparent IPC between kernels
Affinity: easily express arbitrary placement policies to the OS
Two-phase compilation: run apps on arbitrary devices

Slide 4

Results

Helios offloads processes with zero code changes:
The entire networking stack
The entire file system
Arbitrary applications

Helios improves performance on NUMA architectures:
Multiple kernels eliminate resource contention
Remote memory accesses are eliminated

Slide 5

Outline

Motivation
Helios design: satellite kernels, remote message passing, affinity, encapsulating many ISAs
Evaluation
Conclusion

Slide 6

The driver interface is a poor app interface: it is hard to perform basic tasks such as debugging, I/O, and IPC. The driver ends up encompassing services and a runtime... in other words, an OS!

[Figure: apps and a kernel on the host CPU reach a programmable I/O device only through a driver; the device internally runs its own apps, JIT, scheduler, memory manager, and IPC]

Slide 7

Satellite kernels provide a single interface

Satellite kernels:
Efficiently manage local resources
Apps are developed for a single system call interface
Each is a μkernel: scheduler, memory manager, namespace manager

[Figure: a satellite kernel runs on the CPU (hosting the FS app), on a NUMA domain, and on the programmable device (hosting TCP); apps see the same abstraction everywhere]

A rough sketch of this shared interface follows.
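As an illustration only (the class and method names here are hypothetical, not Helios's actual API), a minimal Python sketch of the idea that every satellite kernel exposes the same small set of μkernel services:

# Hypothetical sketch: every satellite kernel exposes the same minimal
# abstraction (scheduler, memory manager, namespace manager), so an app
# written against it runs on a CPU, a NUMA domain, or a programmable
# device alike.
class SatelliteKernel:
    def __init__(self, platform):
        self.platform = platform   # e.g. "x86", "x86-numa", "arm-xscale"
        self.run_queue = []        # scheduler state
        self.heaps = {}            # memory-manager state (local RAM only)
        self.namespace = {}        # namespace-manager state: path -> endpoint

    def spawn(self, process):
        # Scheduler: manage only this kernel's local processing resources.
        self.run_queue.append(process)

    def alloc(self, pid, nbytes):
        # Memory manager: satisfy allocations from local memory.
        self.heaps.setdefault(pid, []).append(bytearray(nbytes))
        return self.heaps[pid][-1]

    def register(self, path, endpoint):
        # Namespace manager: make a service reachable by name.
        self.namespace[path] = endpoint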

Slide 8

Remote message passing

Local IPC uses zero-copy message passing; remote IPC transparently marshals data. As a result, unmodified apps work with multiple kernels.

[Figure: the same three satellite kernels as before, with IPC channels now spanning kernels, e.g., an app on the CPU talking to TCP on the programmable device]

A sketch of the idea appears below.
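Illustrative Python, not Helios code: the application's send path is identical either way; the channel decides whether to hand the buffer over zero-copy or marshal it to a remote kernel:

# Illustrative sketch: a channel hides whether its peer endpoint lives on
# the same satellite kernel (zero-copy ownership hand-off) or on a remote
# one (data is marshaled and shipped), so apps need no changes.
class Channel:
    def __init__(self, local_kernel, peer_kernel, deliver):
        self.local = local_kernel
        self.peer = peer_kernel
        self.deliver = deliver          # enqueues a message at the peer

    def send(self, buf):
        if self.peer is self.local:
            self.deliver(buf)           # local IPC: pass the buffer itself
        else:
            self.deliver(bytes(buf))    # remote IPC: marshal (copy) and ship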

Slide 9

Connecting processes and services

Applications register in a namespace as services, and satellite kernels register in the namespace as well. The namespace is then used to connect IPC channels.

Example namespace entries:
/fs
/dev/nic0
/dev/disk0
/services/TCP
/services/PNGEater
/services/kernels/ARMv5

A sketch of the register/connect pattern follows.
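Under assumed names (register and connect are illustrative, not the real Helios calls), the namespace brokers channels like this:

# Illustrative only: services register under a path; clients connect by
# path and get an endpoint, regardless of which kernel hosts the service.
namespace = {}

def register(path, endpoint):
    namespace[path] = endpoint

def connect(path):
    return namespace[path]

register("/services/TCP", "tcp-endpoint-on-nic-kernel")
register("/services/kernels/ARMv5", "xscale-satellite-kernel")
tcp = connect("/services/TCP")   # same call whether TCP is local or remote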

Slide 10

Where should a process execute?

Three constraints impact the initial placement decision:
Heterogeneous ISAs make migration difficult
Fast message passing may be expected
Processes might prefer a particular platform

Helios exports an affinity metric to applications. Affinity is expressed in application metadata and acts as a hint:
A positive value represents an emphasis on communication (zero-copy IPC)
A negative value represents a desire for non-interference

Slide 11

Affinity expressed in manifests

Affinity is easily edited by a developer, admin, or user:

<?xml version="1.0" encoding="utf-8"?>
<application name="TcpTest" runtime="full">
  <endpoints>
    <inputPipe id="0" affinity="0" contractName="PipeContract"/>
    <endpoint id="2" affinity="+10" contractName="TcpContract"/>
  </endpoints>
</application>

Slide 12

Platform affinity

Platform affinity is processed first; it guarantees certain performance characteristics. For example, a manifest listing /services/kernels/vector-CPU with platform affinity = +2 and /services/kernels/x86 with platform affinity = +1 steers the process toward the GP-GPU (+2) ahead of the x86 and x86 NUMA kernels (+1 each).

[Figure: four platforms (x86, x86 NUMA, GP-GPU, programmable NIC), with the GP-GPU scored +2 and each x86 kernel scored +1]

Slide 13

Positive affinity

Represents 'tight coupling' between processes and ensures fast message passing between them. The positive affinities toward each kernel are summed. For example, with communication affinity +1 to /services/TCP (on the programmable NIC) and +2 and +3 to /services/PNGEater and /services/antivirus (both on one x86 NUMA kernel), that NUMA kernel scores 2 + 3 = +5 versus the NIC's +1, so the process is placed alongside PNGEater and the antivirus.

[Figure: TCP on the programmable NIC scores +1; the x86 NUMA kernel hosting PNGEater and A/V scores +5]

Slide 14

Negative affinity

Expresses a preference for non-interference and is used as a means of avoiding resource contention. The negative affinities toward each kernel are likewise summed. For example, with /services/kernels/x86 platform affinity = +100 and non-interference affinity = -1 toward /services/antivirus, the process is placed on an x86 NUMA kernel that is not running the antivirus.

Slide 15

Self-reference affinity

A simple scale-out policy across available processors: /services/webserver declares non-interference affinity = -1 toward itself, so each new instance (W1, W2, W3) is placed on a kernel not already running the web server.

Slide 16

Turning policies into actions

A priority-based algorithm reduces the set of candidate kernels by:
First: platform affinities
Second: other positive affinities
Third: negative affinities
Fourth: CPU utilization

The algorithm attempts to balance simplicity and optimality; a sketch follows.
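A minimal Python sketch of this four-stage reduction (my reconstruction from the slide, with assumed names, not the actual Helios implementation):

# Sketch: successively narrow the candidate kernels by summed affinity,
# then break remaining ties on CPU utilization.
def place(process, kernels):
    candidates = keep_best(kernels, process.platform_affinity)     # stage 1
    candidates = keep_best(candidates, process.positive_affinity)  # stage 2
    candidates = keep_best(candidates, process.negative_affinity)  # stage 3
    return min(candidates, key=lambda k: k.cpu_utilization)        # stage 4

def keep_best(kernels, affinity_toward):
    # affinity_toward(k) sums the manifest's affinity values against the
    # services currently hosted on kernel k.
    top = max(affinity_toward(k) for k in kernels)
    return [k for k in kernels if affinity_toward(k) == top]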

Slide 17

Encapsulating many architectures

Helios uses a two-phase compilation strategy. All apps are first compiled to MSIL; at install time, apps are compiled down to the available ISAs. MSIL can encapsulate multiple versions of a method, for example ARM and x86 versions of the Interlocked.CompareExchange function. A sketch of the install-time selection appears below.
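As an illustration only (the data layout and names here are hypothetical, not the real MSIL encoding): the shipped binary carries per-ISA bodies for a few primitive methods, and install-time compilation picks the body matching each ISA present in the machine:

# Sketch: most methods are ISA-neutral and get compiled for the target;
# a few carry per-ISA specializations that are selected at install time.
METHOD_VERSIONS = {
    "Interlocked.CompareExchange": {
        "x86": "<x86 body: lock cmpxchg>",
        "arm": "<ARM body: ldrex/strex loop>",
    },
}

def compile_for(isa, app_methods):
    image = {}
    for name in app_methods:
        versions = METHOD_VERSIONS.get(name)
        image[name] = versions[isa] if versions else f"compile {name} for {isa}"
    return image

x86_image = compile_for("x86", ["Main", "Interlocked.CompareExchange"])
arm_image = compile_for("arm", ["Main", "Interlocked.CompareExchange"])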

Slide 18

Implementation

Helios is based on the Singularity operating system, with satellite kernels, remote message passing, and affinity added.

XScale programmable I/O card: 2.0 GHz ARM processor, Gig E, 256 MB of DRAM. The satellite kernel is identical to the x86 one (except for ARM asm bits) and is roughly 7x slower than a comparable x86.

NUMA support on a 2-socket, dual-core AMD machine: 2 GHz CPUs and 1 GB of RAM per domain, with a satellite kernel on each NUMA domain.

Slide 19

Limitations

Satellite kernels require timers, interrupts, and exceptions; device support must be balanced against support for these basic abstractions. GPUs are headed in this direction (e.g., Intel Larrabee).

Helios currently supports only two platforms; new platforms need new compiler support. The set of applications is also limited. Creating satellite kernels out of a commodity system would give access to more applications.

Slide 20

Outline

Motivation
Helios design: satellite kernels, remote message passing, affinity, encapsulating many ISAs
Evaluation
Conclusion

Slide 21

Evaluation platform

[Figure: two experimental setups. XScale evaluation: an x86 kernel plus a satellite kernel on the XScale programmable NIC. NUMA evaluation: configuration A runs a single kernel spanning both x86 NUMA domains; configuration B runs a satellite kernel on each domain]

Slide 22

Offloading Singularity applications

Helios applications are offloaded with very little effort (LOC = lines of code; LOM = lines of manifest):

Name              | LOC   | LOC changed | LOM changed
Networking stack  | 9600  | 0           | 1
FAT 32 FS         | 14200 | 0           | 1
TCP test harness  | 300   | 5           | 1
Disk indexer      | 900   | 0           | 1
Network driver    | 1700  | 0           | 0
Mail server       | 2700  | 0           | 1
Web server        | 1850  | 0           | 1

Slide 23

Netstack offload

Offloading improves performance as cycles are freed, and affinity made it easy to experiment with offloading:

PNG size | X86-only uploads/sec | X86+XScale uploads/sec | Speedup | Reduction in context switches
28 KB    | 161 | 171 | 6%  | 54%
92 KB    | 55  | 61  | 12% | 58%
150 KB   | 35  | 38  | 10% | 65%
290 KB   | 19  | 21  | 10% | 53%

Slide 24

Email NUMA benchmark

Satellite kernels improve performance by 39%.

Slide 25

Related work

Hive [Chapin et al. '95]: multiple kernels, single system image
Multikernel [Baumann et al. '09]: focus on scale-out performance on large NUMA architectures
Spine [Fiuczynski et al. '98] and Hydra [Weinsberg et al. '08]: custom run-times on programmable devices

Slide 26

Conclusions

Helios manages a 'distributed system in the small', simplifying application development, deployment, and tuning. It uses four techniques to manage heterogeneity:
Satellite kernels: same OS abstraction everywhere
Remote message passing: transparent IPC between kernels
Affinity: easily express arbitrary placement policies to the OS
Two-phase compilation: run apps on arbitrary devices

Helios offloads applications with zero code changes. A Helios code release is coming soon.