Slide1
Spring 2014
Jim Hogg - UW - CSE - P501
X1-1
CSE P501 – Compiler Construction
Inlining
Devirtualization
Slide2
Inlining
Original:
    long res;
    void foo(long x) { res = 2 * x; }
    void bar() { foo(5); }
After inlining:
    long res;
    void foo(long x) { res = 2 * x; }
    void bar() { res = 2 * 5; }
After constant folding:
    long res;
    void foo(long x) { res = 2 * x; }
    void bar() { res = 10; }
Slide3
Benefits
- Removes overhead of function call
  - No marshalling of arguments
  - No unmarshalling of return value
- Better instruction-cache (I-cache) locality
- Bonus: expands opportunities for further optimization
  - CSE, constant-prop, DCE, ...
  - Poor man's interprocedural optimization
Slide4
Costs
- Code size
  - Typically expands overall program size
  - Can hurt I-cache
- Compilation time
  - Large methods take longer to compile - hurts throughput
  - E.g.: optimizing, instruction-selection, register-allocation
Slide5
Language / Runtime Aspects
- What is the cost of a function call?
  - C: cheap
  - Java: moderate (virtual dispatch)
  - Python: expensive
- Are targets resolved at compile time or run time?
  - C: compile time
  - Java, Python: runtime
- Is the whole program available for analysis?
  - "separate compilation"
- Is profile information available?
  - E.g.: if "f" is rarely called, don't inline it
Slide6
When to Inline?
Jikes RVM (with Hazelwood/Grove adaptations):
- Call Instruction Sequence (CIS) = # of instructions to make call
- Tiny (size < 2x call size): Always inline
- Small (2-5x): Inline subject to space constraints
- Medium (5-25x): Inline if hot (subject to space constraints)
- Large: Never inline
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"
Slide7
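The size classes from the "When to Inline?" slide can be sketched as a small decision function. This is only an illustration under stated assumptions: the class and method names are hypothetical, the "hot" and "space left" flags are supplied by the caller, and the treatment of exact boundaries (for example, a callee of exactly 2x the call size counts as small) is a guess, since the slide does not say.

```java
// Hypothetical sketch of a size-based inlining policy; not the Jikes RVM code.
final class InlinePolicy {
    enum Decision { ALWAYS, IF_SPACE, IF_HOT_AND_SPACE, NEVER }

    // calleeSize: instruction count of the callee.
    // callSize:   Call Instruction Sequence (CIS), the instructions needed to make the call.
    static Decision classify(int calleeSize, int callSize) {
        if (calleeSize < 2 * callSize)  return Decision.ALWAYS;            // tiny
        if (calleeSize < 5 * callSize)  return Decision.IF_SPACE;          // small
        if (calleeSize < 25 * callSize) return Decision.IF_HOT_AND_SPACE;  // medium
        return Decision.NEVER;                                             // large
    }

    // hot and spaceLeft are assumed inputs (profile data and a code-size budget).
    static boolean shouldInline(int calleeSize, int callSize, boolean hot, boolean spaceLeft) {
        switch (classify(calleeSize, callSize)) {
            case ALWAYS:           return true;
            case IF_SPACE:         return spaceLeft;
            case IF_HOT_AND_SPACE: return hot && spaceLeft;
            default:               return false;
        }
    }

    public static void main(String[] args) {
        int cis = 4; // assumed call sequence length, for illustration only
        System.out.println(classify(6, cis));    // tiny callee
        System.out.println(classify(12, cis));   // small callee
        System.out.println(classify(50, cis));   // medium callee
        System.out.println(classify(100, cis));  // large callee
    }
}
```

Expressing the thresholds as multiples of the call sequence keeps the policy independent of the target architecture: the question is always whether the inlined body costs more than the call it replaces.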
Gathering Profile Info
- Counter-based: instrument edges in flowgraph
  - Entry + loop back edges
  - Enough edges (enough to get good results without excessive overhead)
  - Expensive - always removed in optimized code
  - Depends critically on the "training sets"
- Call-stack sampling
  - Periodically walk stack
  - Interrupt-based or instrumentation-based
  - May gather info on what called what (callsite info)
Slide8
Devirtualization
- OO encourages lots of small methods
  - getters, setters, ...
- Inlining is a requirement for performance
  - High call overhead wrt total execution
  - Limited scope for compiler optimizations without it
- For Java, C#, etc., if you're going to do anything, do this!
- But ... virtual methods are a challenge
Slide9
Virtual Methods
- In general, we cannot determine the target until runtime
- Some languages (e.g., Java) allow dynamic class loading: all subclasses of A may not be visible until runtime
    class A {
        int foo() { return 0; }
        int bar() { return 1; }
    }
    class B extends A {
        int foo() { return 2; }
    }
    void baz(A x) {
        = x.foo();
        = x.bar();
    }
- x.foo may return 0 or 2 (depending on x's runtime type - A or B)
- x.bar will return 1 (unless we dynamically loaded C derived from B which overrides method bar)
Slide10
Virtual tables
Object layout in a JVM for object of class B:
[Figure: object layout diagram]
Slide11
Virtual method dispatch
- x is the receiver object
- If x has runtime type B, t2 will refer to B::foo
    t1 = ldvtable x                    // = x.foo();
    t2 = ldvirtfunaddr t1, A::foo
    t3 = call [t2] (x)
    t4 = ldvtable x                    // = x.bar();
    t5 = ldvirtfunaddr t4, A::bar
    t6 = call [t5] (x)
Slide12
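The three-instruction dispatch sequence on the "Virtual method dispatch" slide can be imitated in plain Java by modelling a vtable as an array of function objects. This is a teaching sketch, not how a JVM lays out objects: the `Fn` interface, the `Obj` class, and the slot numbers are all assumptions made up for the example.

```java
// Hypothetical model of vtable dispatch; all names and slot numbers are illustrative.
final class VTableDemo {
    interface Fn { int invoke(Obj self); }

    // A simulated object holds only a reference to its class's vtable.
    static final class Obj {
        final Fn[] vtable;
        Obj(Fn[] vtable) { this.vtable = vtable; }
    }

    // Slot indices are fixed by the class that first declares the method (A).
    static final int SLOT_FOO = 0;
    static final int SLOT_BAR = 1;

    static final Fn A_FOO = self -> 0;
    static final Fn A_BAR = self -> 1;
    static final Fn B_FOO = self -> 2;

    static final Fn[] VTABLE_A = { A_FOO, A_BAR };
    static final Fn[] VTABLE_B = { B_FOO, A_BAR }; // B overrides foo, inherits bar

    static int callFoo(Obj x) {
        Fn[] t1 = x.vtable;        // t1 = ldvtable x
        Fn t2 = t1[SLOT_FOO];      // t2 = ldvirtfunaddr t1, A::foo
        return t2.invoke(x);       // t3 = call [t2] (x)
    }

    static int callBar(Obj x) {
        Fn[] t4 = x.vtable;        // t4 = ldvtable x
        Fn t5 = t4[SLOT_BAR];      // t5 = ldvirtfunaddr t4, A::bar
        return t5.invoke(x);       // t6 = call [t5] (x)
    }

    public static void main(String[] args) {
        Obj a = new Obj(VTABLE_A);
        Obj b = new Obj(VTABLE_B);
        System.out.println(callFoo(a) + " " + callFoo(b) + " " + callBar(b)); // prints "0 2 1"
    }
}
```

Note that the call goes through the loaded function address (t2, t5), never through the vtable pointer itself.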
Devirtualization
- Goal: change virtual calls to static calls at compile-time
- Benefits:
  - enables inlining
  - lowers call overhead
  - better I-cache performance
  - better indirect-branch prediction
- Often optimistic:
  - Make guess at compile time
  - Test guess at run time
  - Fall back to virtual call if necessary
Slide13
Guarded Devirtualization
- Guess receiver type is B (based on profile or other information)
- Call to B::foo is statically known - can be inlined
- But guard inhibits optimization
    t1 = ldvtable x
    t7 = getvtable B
    if t1 == t7
        t3 = call B::foo(x)
    else
        t2 = ldvirtfunaddr t1, A::foo
        t3 = call [t2] (x)
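At source level, the guarded sequence above corresponds roughly to the hand-written Java below. It is a sketch of what a compiler might produce, written out by hand: the exact-class comparison stands in for the vtable comparison, the constant 2 stands in for the inlined body of B::foo, and the extra class C is an assumption added to show the fallback path.

```java
// Hand-written illustration of a class-test guard; a real compiler does this on IR.
final class GuardedCall {
    static class A {
        int foo() { return 0; }
        int bar() { return 1; }
    }
    static class B extends A {
        @Override int foo() { return 2; }
    }
    // Hypothetical later subclass, used only to exercise the fallback.
    static class C extends B {
        @Override int foo() { return 3; }
    }

    // Equivalent of x.foo() after guessing that the receiver is exactly a B.
    static int fooGuarded(A x) {
        if (x.getClass() == B.class) {  // guard: t1 == t7
            return 2;                   // inlined body of B::foo
        }
        return x.foo();                 // fallback: ordinary virtual call
    }

    public static void main(String[] args) {
        System.out.println(fooGuarded(new B())); // guard hits, inlined path
        System.out.println(fooGuarded(new A())); // guard misses, virtual call
        System.out.println(fooGuarded(new C())); // subclass of B also misses the exact-class guard
    }
}
```

The guard compares for the exact class rather than using instanceof, which is why a C receiver correctly takes the fallback even though it is a B.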
Slide14
Guarded by Method Test
- Guess that method is B::foo outside guard
- More robust, but more overhead
- Harder to optimize redundant guards
    t1 = ldvtable x
    t2 = ldvirtfunaddr t1, A::foo
    t7 = getfunaddr B::foo
    if t2 == t7
        t3 = call B::foo(x)
    else
        t2 = ldvirtfunaddr t1, A::foo
        t3 = call [t2] (x)
Slide15
How to guess receiver?
- Profile information
  - Record call site targets and/or frequently executed methods
  - "monomorphic" versus "polymorphic"
- Class hierarchy analysis
  - Walk class hierarchy at compile time
- Type analysis
  - Intra/inter procedural data flow analysis
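As an illustration of the profile-based option, the sketch below records the receiver class seen at one call site and reports whether the site looks monomorphic. The class and its methods are hypothetical names for this example, and real profilers keep such counts far more cheaply (for instance, per-site inline caches or sampling) than a hash map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-call-site receiver profile; illustrative only.
final class CallSiteProfile {
    private final Map<Class<?>, Integer> counts = new HashMap<>();

    // Called each time the instrumented call site executes.
    void record(Object receiver) {
        counts.merge(receiver.getClass(), 1, Integer::sum);
    }

    // Monomorphic: exactly one receiver class has been observed so far.
    boolean isMonomorphic() {
        return counts.size() == 1;
    }

    // Most frequently seen receiver class, or null if nothing was recorded.
    Class<?> hottestReceiver() {
        Class<?> best = null;
        int bestCount = 0;
        for (Map.Entry<Class<?>, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        CallSiteProfile site = new CallSiteProfile();
        site.record("x");
        site.record("y");
        System.out.println(site.isMonomorphic());   // only String receivers so far
        site.record(42);                            // an Integer receiver appears
        System.out.println(site.isMonomorphic());
        System.out.println(site.hottestReceiver()); // String is still the best guess
    }
}
```

A polymorphic site can still be guarded on its hottest receiver; the profile only says how often the guess is likely to hold.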
Slide16
Class Hierarchy Analysis
- Walk class hierarchy at compile-time
- If only one implementation of a method (i.e., in the base class), devirtualize to that target
- Not guaranteed in the presence of runtime class loading
  - Still need runtime test / fallback
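A minimal sketch of the hierarchy walk, assuming the compiler keeps a table of the classes loaded so far: for a call on static type T, resolve the method for every known subclass of T and devirtualize only if they all reach the same implementation. Class names are plain strings here and the `ClassHierarchy` API is invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical class-hierarchy table; names are strings to keep the sketch small.
final class ClassHierarchy {
    private final Map<String, String> parent = new HashMap<>();         // class -> superclass (null for a root)
    private final Map<String, Set<String>> declared = new HashMap<>();  // class -> methods it declares

    // Superclasses must be added before (or along with) their subclasses.
    void addClass(String name, String superName, String... methods) {
        parent.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    private boolean isSubclassOf(String cls, String ancestor) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    // Class whose implementation of the method a receiver of class cls would run.
    private String resolve(String cls, String method) {
        for (String c = cls; c != null; c = parent.get(c)) {
            if (declared.get(c).contains(method)) return c;
        }
        return null;
    }

    // The single implementing class for calls on staticType, or null if there are several.
    String uniqueTarget(String staticType, String method) {
        Set<String> targets = new HashSet<>();
        for (String cls : parent.keySet()) {
            if (isSubclassOf(cls, staticType)) {
                String t = resolve(cls, method);
                if (t != null) targets.add(t);
            }
        }
        return targets.size() == 1 ? targets.iterator().next() : null;
    }

    public static void main(String[] args) {
        ClassHierarchy h = new ClassHierarchy();
        h.addClass("A", null, "foo", "bar");
        h.addClass("B", "A", "foo");
        System.out.println(h.uniqueTarget("A", "bar")); // only A implements bar
        System.out.println(h.uniqueTarget("A", "foo")); // A and B both implement foo
        h.addClass("C", "B", "bar");                    // simulated runtime class load
        System.out.println(h.uniqueTarget("A", "bar")); // no longer unique
    }
}
```

The last call shows why the slide still asks for a runtime test or fallback: the answer for `bar` changes as soon as a class that overrides it is loaded.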
Slide17
Flow-Sensitive Type Analysis
- Perform a forward dataflow analysis propagating type info
- At each callsite compute possible set of types
- Use type info of receiver to narrow targets.
    A a1 = new B();
    a1.foo();
    if (a2 instanceof C)
        a2.bar();
Slide18
Alternatives to Guarding
- Guards impose overhead
  - run-time test on every call, merge points impede optimization
- Often "know" only one target is invoked
  - call site is monomorphic
- Alternative: compile without guards
  - recover when the assumption is violated (e.g., at class load, recompile)
  - cheaper runtime test vs more costly recovery
Slide19
Recompilation Approach
- Optimistically assume current class hierarchy will never change wrt a call
- Devirtualize and/or inline call sites without guard
- On violating class load, recompile caller method
  - Recompiled code installed before new class
  - New invocations will call de-optimized code
  - What about current invocations?
- Nice match with JIT compiling
Slide20
Preexistence analysis
- Idea: if receiver object pre-existed the caller method invocation, then callsite is only affected by a class load in future invocations
- If new class C is loaded during execution of baz, x cannot have type C:
    void baz(A x) {
        ...
        // C loaded here
        x.bar();
    }
- x is bound to object on entry; cannot refer to a C
Slide21
Code-patching
- Pre-generate fallback virtual call out-of-line
- On an invalidating class load, overwrite direct-call with a jump to the fallback code
- Must be thread-safe!
  - On x86, single write within a cache line is atomic
- Self-modifying code (also used for JIT "jump thunks")
- No recompilation necessary
Slide22
Patching
        t3 = 2                         // B::foo; on class-load, stomp with jump fallback
    next:
        ...
    fallback:
        t2 = ldvirtfunaddr t1, A::foo
        t3 = call [t2] (x)
        goto next

    B::foo() { return 2; }
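Java cannot overwrite machine code, so the sketch below imitates the patch with a single volatile reference that is swapped from the inlined fast path to the fallback virtual call. It only mirrors the shape of the technique: the `PatchableCallSite` class, the `Code` interface, and the `onInvalidatingClassLoad` hook are assumptions for illustration, and the reference swap stands in for the atomic single write the slides describe.

```java
// Simulation of code patching with a swappable code reference; illustrative only.
final class PatchableCallSite {
    static class A { int foo() { return 0; } }
    static class B extends A { @Override int foo() { return 2; } }
    // Hypothetical class whose loading invalidates the "only B::foo" assumption.
    static class C extends B { @Override int foo() { return 3; } }

    interface Code { int run(A x); }

    // Unguarded fast path: correct only while every receiver at this site runs B::foo.
    static final Code FAST = x -> 2;            // t3 = 2
    // Pre-generated out-of-line fallback: the ordinary virtual call.
    static final Code FALLBACK = x -> x.foo();  // t3 = call [t2] (x)

    // volatile: one reference write publishes the patch to all threads.
    private volatile Code code = FAST;

    int call(A x) {
        return code.run(x);
    }

    // Stands in for "stomp with jump fallback" when an invalidating class is loaded.
    void onInvalidatingClassLoad() {
        code = FALLBACK;
    }

    public static void main(String[] args) {
        PatchableCallSite site = new PatchableCallSite();
        System.out.println(site.call(new B()));  // fast path
        site.onInvalidatingClassLoad();          // C is about to become visible
        System.out.println(site.call(new C()));  // fallback dispatches to C::foo
        System.out.println(site.call(new B()));  // fallback still correct for B
    }
}
```

As on the slide, nothing is recompiled: the fallback already exists, and the patch is one write that redirects later executions of the call site.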