
Presentation Transcript


Extensible Distributed Tracing from Kernels to Clusters

Úlfar Erlingsson, Google Inc.
Marcus Peinado, Microsoft Research
Simon Peter, Systems Group, ETH Zurich
Mihai Budiu, Microsoft Research

Fay

Wouldn’t it be nice if…

- We could know what our clusters were doing
- We could ask any question, easily, using one simple-to-use system
- We could collect answers extremely efficiently, so cheaply that we may even ask continuously

Let’s imagine...

Applying data-mining to cluster tracing

- Bag-of-words technique
- Compare documents without structural knowledge
- N-dimensional feature vectors
- K-means clustering
- Can apply to clusters, too!
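The technique itself fits in a few lines. Below is a minimal, illustrative Python sketch (not Fay code; the data and names are made up): each machine is an N-dimensional call-frequency vector, and k-means groups machines with similar behavior.

```python
def nearest(pt, centers):
    # Index of the center closest to pt (squared Euclidean distance).
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(pt, centers[i])))

def kmeans_step(vectors, centers):
    # Assign each vector to its nearest center, then recompute each
    # center as the mean of its assigned vectors.
    groups = {}
    for v in vectors:
        groups.setdefault(nearest(v, centers), []).append(v)
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups.values()]

def kmeans(vectors, centers, steps):
    for _ in range(steps):
        centers = kmeans_step(vectors, centers)
    return centers

# Four hypothetical "machines" with 2-D call-frequency vectors:
# two read-heavy, two write-heavy.
vs = [[9, 1], [8, 2], [1, 9], [2, 8]]
cs = kmeans(vs, [[9, 1], [1, 9]], steps=5)
```

With these toy vectors the two centers converge to the means of the read-heavy and write-heavy groups, which is exactly the "categorize cluster behavior" step of the slides.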

Cluster-mining with Fay

Automatically categorize cluster behavior, based on system call activity

Cluster-mining with Fay

Automatically categorize cluster behavior, based on system call activity:
- Without measurable overhead on the execution
- Without any special Fay data-mining support

Fay K-Means Behavior-Analysis Code

var kernelFunctionFrequencyVectors =
    cluster.Function(kernel, "syscalls!*")
           .Where(evt => evt.time < Now.AddMinutes(3))
           .Select(evt => new { Machine  = fay.MachineID(),
                                Interval = evt.Cycles / CPS,
                                Function = evt.CallerAddr })
           .GroupBy(evt => evt, (k,g) => new { key = k, count = g.Count() });

Vector Nearest(Vector pt, Vectors centers) {
    var near = centers.First();
    foreach (var c in centers)
        if (Norm(pt - c) < Norm(pt - near))
            near = c;
    return near;
}

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
    return vs.GroupBy(v => Nearest(v, cs))
             .Select(g => g.Aggregate((x,y) => x+y) / g.Count());
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
    for (int i = 0; i < K; ++i)
        cs = OneKMeansStep(vs, cs);
    return cs;
}
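Read as a dataflow, the tracing query keeps events inside a three-minute window, projects each event to a (machine, time-interval, calling function) key, and counts events per key, yielding the function-frequency vectors. A rough Python rendering of those Where/Select/GroupBy semantics (the event fields, `CPS` constant, and `machine_id` below are illustrative assumptions, not Fay's API):

```python
from collections import Counter

CPS = 2_000_000_000  # assumed cycles per second

def frequency_vectors(events, machine_id, window_end_cycles):
    # .Where:   keep events inside the time window;
    # .Select:  project to a (machine, interval, caller) key;
    # .GroupBy: tally events per key (the Count aggregation).
    keys = ((machine_id, e["cycles"] // CPS, e["caller"])
            for e in events if e["cycles"] < window_end_cycles)
    return Counter(keys)

events = [{"cycles": 10, "caller": "NtReadFile"},
          {"cycles": 20, "caller": "NtReadFile"},
          {"cycles": 30, "caller": "NtClose"},
          {"cycles": 10**20, "caller": "NtClose"}]  # outside the window
hist = frequency_vectors(events, machine_id=7, window_end_cycles=10**12)
```

The resulting per-key counts are exactly the bag-of-words feature vectors that the k-means functions then cluster.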


Fay vs. Specialized Tracing

Could’ve built a specialized tool for this:
- Automatic categorization of behavior (Fmeter)

Fay is general, but can efficiently do:
- Tracing across abstractions, systems (Magpie)
- Predicated and windowed tracing (Streams)
- Probabilistic tracing (Chopstix)
- Flight recorders, performance counters, …

Key Takeaways

Fay: Flexible monitoring of distributed executions
- Can be applied to existing, live Windows servers

Single query specifies both tracing & analysis
- Easy to write & enables automatic optimizations

Pervasively data-parallel, scalable processing
- Same model within machines & across clusters

Inline, safe machine-code at tracepoints
- Allows us to do computation right at the data source

K-Means: Single, Unified Fay Query

var kernelFunctionFrequencyVectors =
    cluster.Function(kernel, "*")
           .Where(evt => evt.time < Now.AddMinutes(3))
           .Select(evt => new { Machine  = fay.MachineID(),
                                Interval = evt.Cycles / CPS,
                                Function = evt.CallerAddr })
           .GroupBy(evt => evt, (k,g) => new { key = k, count = g.Count() });

Vector Nearest(Vector pt, Vectors centers) {
    var near = centers.First();
    foreach (var c in centers)
        if (Norm(pt - c) < Norm(pt - near))
            near = c;
    return near;
}

Vectors OneKMeansStep(Vectors vs, Vectors cs) {
    return vs.GroupBy(v => Nearest(v, cs))
             .Select(g => g.Aggregate((x,y) => x+y) / g.Count());
}

Vectors KMeans(Vectors vs, Vectors cs, int K) {
    for (int i = 0; i < K; ++i)
        cs = OneKMeansStep(vs, cs);
    return cs;
}

Fay is Data-Parallel on Cluster

- View trace query as distributed computation
- Use cluster for analysis

Fay is Data-Parallel on Cluster

- System call trace events
- Fay does early aggregation & data reduction
- Fay knows what’s needed for later analysis

Fay is Data-Parallel on Cluster

- System call trace events
- Fay does early aggregation & data reduction
- K-Means analysis
- Fay builds an efficient processing plan from the query
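A toy Python sketch of why early aggregation pays off (hypothetical event records, not Fay's implementation): each machine reduces its raw event stream to a small histogram at the source, and only those partial aggregates cross the network to the analysis stage.

```python
from collections import Counter

def local_aggregate(events):
    # Runs on each traced machine, near the tracepoints:
    # collapse a stream of raw events into a small per-function histogram.
    return Counter(e["caller"] for e in events)

def cluster_merge(per_machine_histograms):
    # Runs in the cluster-level analysis stage:
    # merging partial aggregates is cheap because they are tiny.
    total = Counter()
    for h in per_machine_histograms:
        total += h
    return total

m1 = local_aggregate([{"caller": "NtReadFile"}] * 1000)
m2 = local_aggregate([{"caller": "NtReadFile"}] * 500 +
                     [{"caller": "NtClose"}] * 500)
merged = cluster_merge([m1, m2])
```

Machine 1 ships one counter entry instead of 1000 raw events; because the query declares the Count aggregation up front, Fay can plan this reduction before any data moves.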

Fay is Data-Parallel within Machines

- Early aggregation: inline, in the OS kernel
- Reduce dataflow & kernel/user transitions
- Data-parallel per each core/thread

Processing w/o Fay Optimizations

- Collect data first (on disk)
- Reduce later
- Inefficient, can suffer data overload

(Diagram: K-Means system-call collection feeding K-Means clustering)

Traditional Trace Processing

- First log all data (a deluge)
- Process later (centrally)
- Compose tools via scripting

(Diagram: K-Means system-call collection feeding K-Means clustering)

Takeaways so far

- Fay: Flexible monitoring of distributed executions
- Single query specifies both tracing & analysis
- Pervasively data-parallel, scalable processing

Safety of Fay Tracing Probes

A variant of XFI used for safety [OSDI’06]:
- Works well in the kernel or any address space
- Can safely use existing stacks, etc.

Instead of a language interpreter (DTrace):
- Arbitrary, efficient, stateful computation
- Probes can access thread-local/global state
- Probes can try to read any address
- I/O registers are protected

Key Takeaways, Again

- Fay: Flexible monitoring of distributed executions
- Single query specifies both tracing & analysis
- Pervasively data-parallel, scalable processing
- Inline, safe machine-code at tracepoints

Installing and Executing Fay Tracing

- Fay runtime on each machine
- Fay module in each traced address space
- Tracepoints at hotpatched function boundaries

(Diagram: a query reaches the per-machine Fay tracing runtime; hotpatching creates an XFI probe in the target, kernel or user-space; trace events flow out via ETW; dispatch costs roughly 200 cycles)

Low-level Code Instrumentation

Replace the 1st opcode of traced functions. Module with a traced function Foo:

Caller:
    ...
    e8ab62ffff      call Foo
    ...
    ff1508e70600    call [Dispatcher]   ; at Foo-6
Foo:
    ebf8            jmp Foo-6
    cccccc                              ; padding
Foo2:
    57              push rdi
    ...
    c3              ret

Low-level Code Instrumentation

As above, the 1st opcode of each traced function is replaced; the Fay dispatcher is then called via a trampoline. Fay platform module:

Dispatcher:
    t = lookup(return_addr)
    ...
    call t.entry_probes
    ...
    call t.Foo2_trampoline
    ...
    call t.return_probes
    ...
    return   /* to after call Foo */
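The dispatch sequence can be mimicked at a high level in Python: look up the tracepoint, run entry probes, invoke the original function body (the Foo2 copy), then run return probes. This is only a loose analogue; Fay does this with binary hotpatching and trampolines, not wrappers.

```python
def make_dispatcher(func, entry_probes, return_probes):
    # Analogue of the Fay dispatcher: run entry probes, then the
    # original function body, then return probes, then return to caller.
    def dispatcher(*args, **kwargs):
        for probe in entry_probes:
            probe(args)
        result = func(*args, **kwargs)
        for probe in return_probes:
            probe(result)
        return result
    return dispatcher

calls = []
foo = make_dispatcher(lambda x: x + 1,
                      entry_probes=[lambda a: calls.append(("entry", a))],
                      return_probes=[lambda r: calls.append(("return", r))])
result = foo(41)
```

The caller is oblivious: it still gets `foo`'s normal return value, while the probes observe arguments on entry and the result on exit.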

Low-level Code Instrumentation

Fay calls the function, and the entry & exit probes: the dispatcher invokes each registered Fay probe (PF3, PF4, PF5 in the diagram), and every probe runs XFI-sandboxed.

What’s Fay’s Performance & Scalability?

- Fay adds 220 to 430 cycles per traced function
- Fay adds 180% CPU to trace all kernel functions
- Both approx. 10x faster than DTrace, SystemTap

(Chart: null-probe overhead, in cycles and as slowdown)
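A back-of-envelope way to read these numbers (the per-function cycle cost below is an assumption for illustration, not a measurement from the talk): the relative overhead of tracing every call is the dispatch cost divided by the traced function's own cost, so very short kernel functions make "trace everything" expensive.

```python
def tracing_overhead(probe_cycles, fn_cycles):
    # Extra CPU, as a fraction, when every call to a function costing
    # fn_cycles also pays probe_cycles of tracing dispatch cost.
    return probe_cycles / fn_cycles

# With ~325 dispatch cycles (midpoint of 220-430) and an *assumed*
# average kernel-function cost of ~180 cycles, the extra CPU comes
# out on the order of the 180% figure above.
overhead = tracing_overhead(probe_cycles=325, fn_cycles=180)
```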

Fay Scalability on a Cluster

Fay tracing memory allocations, in a loop:
- Ran workload on a 128-node, 1024-core cluster
- Spread work over 128 to 1,280,000 threads
- 100% CPU utilization
- Fay overhead was 1% to 11% (mean 7.8%)

More Fay Implementation Details

- Details of query-plan optimizations
- Case studies of different tracing strategies
- Examples of using Fay for performance analysis

Fay is based on LINQ and Windows specifics:
- Could build on Linux using Ftrace, Hadoop, etc.

Some restrictions apply currently:
- E.g., skew towards batch processing due to Dryad

Conclusion

Fay: Flexible tracing of distributed executions
- Both expressive and efficient
- Unified trace queries
- Pervasive data-parallelism
- Safe machine-code probe processing
- Often equally efficient as purpose-built tools

Backup


A Fay Trace Query

from io in cluster.Function("iolib!Read")
where io.time < Now.AddMinutes(5)
let size = io.Arg(2)          // request size in bytes
group io by size/1024 into g
select new { sizeInKilobytes = g.Key,
             countOfReadIOs  = g.Count() };

Aggregates read activity in the iolib module:
- Across the cluster, both user-mode & kernel
- Over 5 minutes

A Fay Trace Query

from io in cluster.Function("iolib!Read")
where io.time < Now.AddMinutes(5)
let size = io.Arg(2)          // request size in bytes
group io by size/1024 into g
select new { sizeInKilobytes = g.Key,
             countOfReadIOs  = g.Count() };

Specifies what to trace:
- 2nd argument of the read function in iolib

And how to aggregate:
- Group into KB-size buckets and count