Extensible Distributed Tracing from Kernels to Clusters
Úlfar Erlingsson, Google Inc.
Marcus Peinado, Microsoft Research
Simon Peter, Systems Group, ETH Zurich
Mihai Budiu, Microsoft Research
Fay
Wouldn’t it be nice if…
We could know what our clusters were doing?
We could ask any question… easily, using one simple-to-use system.
We could collect answers extremely efficiently… so cheaply we may even ask continuously.
Let’s imagine… applying data-mining to cluster tracing
Bag-of-words technique: compare documents w/o structural knowledge
N-dimensional feature vectors
K-means clustering
Can apply to clusters, too!
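The idea above can be sketched in plain Python (a hypothetical illustration only — Fay itself expresses this as a LINQ-style query, shown later): turn each machine's syscall trace into an n-dimensional count vector, then run one k-means assignment/update step over those vectors.

```python
# Sketch (not Fay code): "bag of words" over syscalls + one k-means step.
from collections import Counter

def featurize(syscalls, vocab):
    """Map a machine's syscall trace to an n-dimensional count vector."""
    counts = Counter(syscalls)
    return [counts[name] for name in vocab]

def nearest(pt, centers):
    """Center with the smallest squared Euclidean distance to pt."""
    return min(centers, key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, c)))

def kmeans_step(vectors, centers):
    """Assign each vector to its nearest center, then average each group."""
    groups = {}
    for v in vectors:
        groups.setdefault(tuple(nearest(v, centers)), []).append(v)
    return [[sum(col) / len(vs) for col in zip(*vs)]
            for vs in groups.values()]
```

Machines whose vectors land in the same group exhibit similar system-call behavior — the "automatic categorization" of the next slides.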
Cluster-mining with Fay
Automatically categorize cluster behavior, based on system call activity
Without measurable overhead on the execution
Without any special Fay data-mining support
Fay K-Means Behavior-Analysis Code

    Vector Nearest(Vector pt, Vectors centers) {
        var near = centers.First();
        foreach (var c in centers)
            if (Norm(pt - c) < Norm(pt - near))
                near = c;
        return near;
    }

    var kernelFunctionFrequencyVectors =
        cluster.Function(kernel, "syscalls!*")
               .Where(evt => evt.time < Now.AddMinutes(3))
               .Select(evt => new { Machine = fay.MachineID(),
                                    Interval = evt.Cycles / CPS,
                                    Function = evt.CallerAddr })
               .GroupBy(evt => evt, (k, g) => new { key = k, count = g.Count() });

    Vectors OneKMeansStep(Vectors vs, Vectors cs) {
        return vs.GroupBy(v => Nearest(v, cs))
                 .Select(g => g.Aggregate((x, y) => x + y) / g.Count());
    }

    Vectors KMeans(Vectors vs, Vectors cs, int K) {
        for (int i = 0; i < K; ++i)
            cs = OneKMeansStep(vs, cs);
        return cs;
    }
Fay vs. Specialized Tracing
Could’ve built a specialized tool for this:
Automatic categorization of behavior (Fmeter)
Fay is general, but can efficiently do:
Tracing across abstractions, systems (Magpie)
Predicated and windowed tracing (Streams)
Probabilistic tracing (Chopstix)
Flight recorders, performance counters, …
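As a rough illustration of "predicated and windowed tracing," here is a minimal Python sketch (hypothetical; not from Fay or Streams): keep only the events that satisfy a predicate and fall inside a time window — the role the Where clause plays in Fay's queries.

```python
# Sketch: predicated, windowed event filtering in miniature.
def windowed(events, predicate, t_start, t_end):
    """Yield events with timestamp in [t_start, t_end) that satisfy
    the predicate -- analogous to a Where clause over a time window."""
    for evt in events:
        if t_start <= evt["time"] < t_end and predicate(evt):
            yield evt
```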
Key Takeaways
Fay: Flexible monitoring of distributed executions
Can be applied to existing, live Windows servers
Single query specifies both tracing & analysis
Easy to write & enables automatic optimizations
Pervasively data-parallel, scalable processing
Same model within machines & across clusters
Inline, safe machine-code at tracepoints
Allows us to do computation right at data source
K-Means: Single, Unified Fay Query

    var kernelFunctionFrequencyVectors =
        cluster.Function(kernel, "*")
               .Where(evt => evt.time < Now.AddMinutes(3))
               .Select(evt => new { Machine = fay.MachineID(),
                                    Interval = evt.Cycles / CPS,
                                    Function = evt.CallerAddr })
               .GroupBy(evt => evt, (k, g) => new { key = k, count = g.Count() });

    Vector Nearest(Vector pt, Vectors centers) {
        var near = centers.First();
        foreach (var c in centers)
            if (Norm(pt - c) < Norm(pt - near))
                near = c;
        return near;
    }

    Vectors OneKMeansStep(Vectors vs, Vectors cs) {
        return vs.GroupBy(v => Nearest(v, cs))
                 .Select(g => g.Aggregate((x, y) => x + y) / g.Count());
    }

    Vectors KMeans(Vectors vs, Vectors cs, int K) {
        for (int i = 0; i < K; ++i)
            cs = OneKMeansStep(vs, cs);
        return cs;
    }
Fay is Data-Parallel on Cluster
View trace query as distributed computation
Use cluster for analysis
Fay is Data-Parallel on Cluster
System call trace events
Fay does early aggregation & data reduction
Fay knows what’s needed for later analysis
Fay is Data-Parallel on Cluster
System call trace events → early aggregation & data reduction → K-Means analysis
Fay builds an efficient processing plan from query
Fay is Data-Parallel within Machines
Early aggregation: inline, in OS kernel
Reduce dataflow & kernel/user transitions
Data-parallel per each core/thread
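The effect of early aggregation can be sketched in Python (a hypothetical illustration, not Fay's kernel-resident implementation): each core/thread reduces its raw event stream to a small count table locally, and only those tables — not every event — cross the kernel/user boundary and the network to be merged.

```python
# Sketch: per-thread early aggregation followed by a merge step.
from collections import Counter

def aggregate_locally(events):
    """Reduce one thread's raw event stream to a (function -> count) table."""
    return Counter(evt["function"] for evt in events)

def merge(partials):
    """Combine per-thread / per-machine partial aggregates into one table."""
    total = Counter()
    for p in partials:
        total += p
    return total
```

Because counting commutes with merging, the final table is identical to counting the full event deluge centrally — at a fraction of the dataflow.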
Processing w/o Fay Optimizations
Collect data first (on disk)
Reduce later
Inefficient, can suffer data overload
K-Means: System calls
K-Means: Clustering
Traditional Trace Processing
First log all data (a deluge)
Process later (centrally)
Compose tools via scripting
K-Means: System calls
K-Means: Clustering
Takeaways so far
Fay: Flexible monitoring of distributed executions
Single query specifies both tracing & analysis
Pervasively data-parallel, scalable processing
Safety of Fay Tracing Probes
A variant of XFI used for safety [OSDI’06]
Works well in the kernel or any address space
Can safely use existing stacks, etc.
Instead of language interpreter (DTrace)
Arbitrary, efficient, stateful computation
Probes can access thread-local/global state
Probes can try to read any address
I/O registers are protected
Key Takeaways, Again
Fay: Flexible monitoring of distributed executions
Single query specifies both tracing & analysis
Pervasively data-parallel, scalable processing
Inline, safe machine-code at tracepoints
Installing and Executing Fay Tracing
Fay runtime on each machine
Fay module in each traced address space
Tracepoints at hotpatched function boundary
[Diagram: a user-space query causes the Fay tracing runtime to create XFI probes in the target — kernel or user-space — via hotpatching; trace events flow out through ETW, ~200 cycles]
Low-level Code Instrumentation
Replace 1st opcode of functions

Module with a traced function (Foo):
    Caller:
        ...
        e8ab62ffff      call Foo
        ...
        ff1508e70600    call [Dispatcher]
    Foo:
        ebf8            jmp Foo-6
        cccccc
    Foo2:
        57              push rdi
        ...
        c3              ret
Low-level Code Instrumentation
Replace 1st opcode of functions
Fay dispatcher called via trampoline
Fay calls the function, and entry & exit probes

Module with a traced function (Foo):
    Caller:
        ...
        e8ab62ffff      call Foo
        ...
        ff1508e70600    call [Dispatcher]
    Foo:
        ebf8            jmp Foo-6
        cccccc
    Foo2:
        57              push rdi
        ...
        c3              ret

Fay platform module:
    Dispatcher:
        t = lookup(return_addr)
        ...
        call t.entry_probes
        ...
        call t.Foo2_trampoline
        ...
        call t.return_probes
        ...
        return /* to after call Foo */

Fay probes (PF3, PF4, PF5), each sandboxed with XFI
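The dispatcher pattern above can be mimicked in miniature in Python (purely illustrative — real Fay does this with hotpatched machine code, not decorators): a wrapper plays the dispatcher, running entry probes, the original function body, then return probes.

```python
# Sketch: Fay's dispatch sequence as a Python decorator.
def trace(entry_probes=(), return_probes=()):
    def wrap(fn):
        def dispatcher(*args, **kwargs):
            for p in entry_probes:
                p(fn.__name__, args)        # like "call t.entry_probes"
            result = fn(*args, **kwargs)    # like "call t.Foo2_trampoline"
            for p in return_probes:
                p(fn.__name__, result)      # like "call t.return_probes"
            return result
        return dispatcher
    return wrap
```

The traced function's behavior is unchanged; the probes observe its entry arguments and return value, just as Fay's entry & exit probes do.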
What’s Fay’s Performance & Scalability?
Fay adds 220 to 430 cycles per traced function
Fay adds 180% CPU to trace all kernel functions
Both approx. 10x faster than DTrace, SystemTap
[Chart: null-probe overhead, in cycles and slowdown (x)]
Fay Scalability on a Cluster
Fay tracing memory allocations, in a loop:
Ran workload on a 128-node, 1024-core cluster
Spread work over 128 to 1,280,000 threads
100% CPU utilization
Fay overhead was 1% to 11% (mean 7.8%)
More Fay Implementation Details
Details of query-plan optimizations
Case studies of different tracing strategies
Examples of using Fay for performance analysis
Fay is based on LINQ and Windows specifics
Could build on Linux using Ftrace, Hadoop, etc.
Some restrictions apply currently
E.g., skew towards batch processing due to Dryad
Conclusion
Fay: Flexible tracing of distributed executions
Both expressive and efficient
Unified trace queries
Pervasive data-parallelism
Safe machine-code probe processing
Often equally efficient as purpose-built tools
Backup
A Fay Trace Query

    from io in cluster.Function("iolib!Read")
    where io.time < Now.AddMinutes(5)
    let size = io.Arg(2) // request size in bytes
    group io by size/1024 into g
    select new { sizeInKilobytes = g.Key,
                 countOfReadIOs = g.Count() };

Aggregates read activity in iolib module
Across cluster, both user-mode & kernel
Over 5 minutes
A Fay Trace Query

    from io in cluster.Function("iolib!Read")
    where io.time < Now.AddMinutes(5)
    let size = io.Arg(2) // request size in bytes
    group io by size/1024 into g
    select new { sizeInKilobytes = g.Key,
                 countOfReadIOs = g.Count() };

Specifies what to trace: 2nd argument of the read function in iolib
And how to aggregate: group into kb-size buckets and count