Spring 2014 - PowerPoint Presentation

tatiana-dople
Uploaded On 2016-05-06

Presentation Transcript

Slide1

Spring 2014

Jim Hogg - UW - CSE - P501

X1-1

CSE P501 – Compiler Construction

Inlining

Devirtualization

Slide2

Inlining

Original:

    long res;
    void foo(long x) { res = 2 * x; }
    void bar() { foo(5); }

After inlining:

    long res;
    void foo(long x) { res = 2 * x; }
    void bar() { res = 2 * 5; }

After constant folding:

    long res;
    void foo(long x) { res = 2 * x; }
    void bar() { res = 10; }

Slide3

Benefits

- Removes the overhead of a function call
  - No marshalling of arguments
  - No unmarshalling of the return value
- Better instruction-cache (I-cache) locality
- Bonus: expands opportunities for further optimization
  - CSE, constant-prop, DCE, ...
  - Poor man's interprocedural optimization

Slide4

Costs

- Code size
  - Typically expands overall program size
  - Can hurt I-cache
- Compilation time
  - Large methods take longer to compile, which hurts throughput
  - E.g., optimization, instruction selection, register allocation

Slide5

Language / Runtime Aspects

- What is the cost of a function call?
  - C: cheap
  - Java: moderate (virtual dispatch)
  - Python: expensive
- Are targets resolved at compile time or at run time?
  - C: compile time
  - Java, Python: run time
- Is the whole program available for analysis?
  - "separate compilation"
- Is profile information available?
  - E.g., if "f" is rarely called, don't inline it

Slide6

When to Inline?

Jikes RVM (with Hazelwood/Grove adaptations):

- Call Instruction Sequence (CIS) = number of instructions needed to make the call
- Tiny (size < 2x call size): always inline
- Small (2-5x): inline subject to space constraints
- Medium (5-25x): inline if hot (subject to space constraints)
- Large: never inline

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"

Slide7

Gathering Profile Info

- Counter-based: instrument edges in the flowgraph
  - Entry + loop back edges
  - Enough edges (enough to get good results without excessive overhead)
  - Expensive: always removed in optimized code
  - Depends critically on the "training sets"
- Call-stack sampling
  - Periodically walk the stack
  - Interrupt-based or instrumentation-based
  - May gather info on what called what (callsite info)
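A minimal sketch of the counter-based approach, assuming a hypothetical `EdgeProfile` store that instrumented code updates on method entry and on each loop back edge. The class names, edge labels and threshold are illustrative assumptions, not taken from Jikes RVM or any real JVM.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical profile store: one counter per instrumented edge.
class EdgeProfile {
    private final Map<String, Long> counts = new HashMap<>();

    // Called by instrumented code each time the named edge executes.
    void hit(String edge) {
        counts.merge(edge, 1L, Long::sum);
    }

    long count(String edge) {
        return counts.getOrDefault(edge, 0L);
    }

    // An edge counts as "hot" once its counter reaches the (assumed) threshold.
    boolean isHot(String edge, long threshold) {
        return count(edge) >= threshold;
    }
}

class ProfiledCode {
    static final EdgeProfile PROFILE = new EdgeProfile();

    // What a simple summing loop might look like after instrumentation.
    static long sum(int n) {
        PROFILE.hit("sum:entry");              // method-entry counter
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += i;
            PROFILE.hit("sum:loop-backedge");  // loop back-edge counter
        }
        return total;
    }
}
```

The counters are plain map updates here, which is exactly the overhead the slide warns about: every instrumented edge pays for a lookup and an increment, so the instrumentation is stripped once the profile has been collected.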

Slide8

Devirtualization

- OO encourages lots of small methods
  - getters, setters, ...
- Inlining is a requirement for performance
  - High call overhead relative to total execution
  - Limited scope for compiler optimizations without it
- For Java, C#, etc.: if you're going to do anything, do this!
- But ... virtual methods are a challenge

Slide9

Virtual Methods

- In general, we cannot determine the target until runtime
- Some languages (e.g., Java) allow dynamic class loading: some subclasses of A may not be visible until runtime

    class A {
        int foo() { return 0; }
        int bar() { return 1; }
    }
    class B extends A {
        int foo() { return 2; }
    }
    void baz(A x) {
        ... = x.foo();
        ... = x.bar();
    }

- x.foo may return 0 or 2 (depending on x's runtime type, A or B)
- x.bar will return 1 (unless we dynamically loaded C, derived from B, which overrides method bar)

Slide10

Virtual tables

Object layout in a JVM for object of class B:

(figure: object layout diagram, not captured in the transcript)

Slide11

Virtual method dispatch

- x is the receiver object
- If x has runtime type B, t2 will refer to B::foo

... = x.foo();

    t1 = ldvtable x
    t2 = ldvirtfunaddr t1, A::foo
    t3 = call [t2] (x)

... = x.bar();

    t4 = ldvtable x
    t5 = ldvirtfunaddr t4, A::bar
    t6 = call [t5] (x)

Slide12

Devirtualization

- Goal: change virtual calls to static calls at compile time
- Benefits:
  - enables inlining
  - lowers call overhead
  - better I-cache performance
  - better indirect-branch prediction
- Often optimistic:
  - Make a guess at compile time
  - Test the guess at run time
  - Fall back to a virtual call if necessary

Slide13

Guarded Devirtualization

- Guess receiver type is B (based on profile or other information)
- Call to B::foo is statically known, so it can be inlined
- But the guard inhibits optimization

    t1 = ldvtable x
    t7 = getvtable B
    if t1 == t7
        t3 = call B::foo(x)
    else
        t2 = ldvirtfunaddr t1, A::foo
        t3 = call [t2] (x)
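At the source level, the class-test guard roughly corresponds to the Java sketch below. Classes `A` and `B` mirror the deck's running example. `C` and the `GuardedCall` helper are assumptions added for illustration. A real JIT compares vtable pointers in its IR rather than calling `getClass()`.

```java
class A {
    int foo() { return 0; }
    int bar() { return 1; }
}

class B extends A {
    @Override
    int foo() { return 2; }
}

// Hypothetical later-loaded subclass that overrides foo again.
class C extends B {
    @Override
    int foo() { return 3; }
}

class GuardedCall {
    // Unoptimized form: an ordinary virtual call.
    static int callVirtual(A x) {
        return x.foo();
    }

    // Guarded form: test the guessed receiver class, inline on a hit.
    static int callGuarded(A x) {
        if (x.getClass() == B.class) {  // guard: exact class test (like t1 == t7)
            return 2;                   // body of B::foo, inlined
        }
        return x.foo();                 // guess failed: fall back to virtual dispatch
    }
}
```

The guard compares the exact runtime class rather than using `instanceof`, because a subclass such as `C` may override `foo`. With `instanceof B`, a `C` receiver would wrongly take the inlined path.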

Slide14

Guarded by Method Test

- Guess that method is B::foo outside guard
- More robust, but more overhead
- Harder to optimize redundant guards

    t1 = ldvtable x
    t2 = ldvirtfunaddr t1, A::foo
    t7 = getfunaddr B::foo
    if t2 == t7
        t3 = call B::foo(x)
    else
        t2 = ldvirtfunaddr t1, A::foo
        t3 = call [t2] (x)

Slide15

How to guess receiver?

- Profile information
  - Record call site targets and/or frequently executed methods
  - "monomorphic" versus "polymorphic"
- Class hierarchy analysis
  - Walk class hierarchy at compile time
- Type analysis
  - Intra/inter procedural data flow analysis

Slide16

Class Hierarchy Analysis

- Walk the class hierarchy at compile time
- If there is only one implementation of a method (i.e., in the base class), devirtualize to that target
- Not guaranteed in the presence of runtime class loading
  - Still need a runtime test / fallback
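The walk can be sketched as follows, assuming a deliberately simplified, hypothetical `ClassHierarchy` model that tracks only class names, superclass links and declared method names, and in which every superclass has been registered. A real compiler would work on loaded class metadata instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified model of the currently known class hierarchy.
class ClassHierarchy {
    private final Map<String, String> superOf = new HashMap<>();        // class -> superclass (null for a root)
    private final Map<String, Set<String>> declared = new HashMap<>();  // class -> methods it declares

    void addClass(String name, String superName, String... methods) {
        superOf.put(name, superName);
        declared.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    // Walk up from cls to the class whose implementation of method it would run.
    private String implementor(String cls, String method) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            if (declared.get(c).contains(method)) {
                return c;
            }
        }
        return null;
    }

    private boolean isSubclassOf(String cls, String ancestor) {
        for (String c = cls; c != null; c = superOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // If every known subtype of staticType dispatches method to the same
    // implementation, return that single target; otherwise return empty.
    Optional<String> uniqueTarget(String staticType, String method) {
        Set<String> targets = new HashSet<>();
        for (String cls : superOf.keySet()) {
            if (isSubclassOf(cls, staticType)) {
                String impl = implementor(cls, method);
                if (impl != null) {
                    targets.add(impl + "::" + method);
                }
            }
        }
        return targets.size() == 1
                ? Optional.of(targets.iterator().next())
                : Optional.empty();
    }
}
```

With only `A` and `B` registered as in the deck's example, `uniqueTarget("A", "bar")` yields `A::bar`, while `foo` has two candidates and stays virtual. Registering a later-loaded `C` that overrides `bar` makes the earlier answer stale, which is why the slide still asks for a runtime test or fallback.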

Slide17

Flow-Sensitive Type Analysis

- Perform a forward dataflow analysis propagating type info
- At each callsite, compute the possible set of types
- Use type info of the receiver to narrow targets

    A a1 = new B();
    a1.foo();
    if (a2 instanceof C)
        a2.bar();

Slide18

Alternatives to Guarding

- Guards impose overhead
  - run-time test on every call, merge points impede optimization
- Often "know" only one target is invoked
  - call site is monomorphic
- Alternative: compile without guards
  - recover when the assumption is violated (e.g., at class load, recompile)
  - cheaper runtime test vs more costly recovery

Slide19

Recompilation Approach

- Optimistically assume the current class hierarchy will never change with respect to a call
- Devirtualize and/or inline call sites without a guard
- On a violating class load, recompile the caller method
  - Recompiled code is installed before the new class
  - New invocations will call the de-optimized code
  - What about current invocations?
- Nice match with JIT compiling

Slide20

Preexistence analysis

- Idea: if the receiver object pre-existed the caller method invocation, then the callsite is only affected by a class load in future invocations
- If new class C is loaded during execution of baz, x cannot have type C:

    void baz(A x) {
        ...
        // C loaded here
        x.bar();
    }

- x is bound to an object on entry; it cannot refer to a C

Slide21

Code-patching

- Pre-generate the fallback virtual call out-of-line
- On an invalidating class load, overwrite the direct call with a jump to the fallback code
- Must be thread-safe!
  - On x86, a single write within a cache line is atomic
- Self-modifying code (also used for JIT "jump thunks")
- No recompilation necessary

Slide22

Patching

    B::foo() { return 2; }

        t3 = 2                  // B::foo, inlined; on class-load, stomp with "jump fallback"
    next:
        ...
    fallback:
        t2 = ldvirtfunaddr t1, A::foo
        t3 = call [t2] (x)
        goto next
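Java source cannot overwrite machine instructions, so the sketch below only models the idea: a call site holds a reference to its current code path, and one atomic store swaps the inlined fast path for the pre-built fallback. `PatchableCallSite`, `patchToFallback` and class `C` are hypothetical names introduced for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.ToIntFunction;

class A { int foo() { return 0; } }
class B extends A { @Override int foo() { return 2; } }
class C extends B { @Override int foo() { return 3; } }  // the invalidating class

// Hypothetical model of one patchable call site for x.foo().
class PatchableCallSite {
    // Fast path: inlined body of B::foo, valid only while B::foo is the sole target.
    private static final ToIntFunction<A> DIRECT = x -> 2;
    // Fallback: the pre-generated, out-of-line virtual call.
    private static final ToIntFunction<A> FALLBACK = x -> x.foo();

    private final AtomicReference<ToIntFunction<A>> target =
            new AtomicReference<>(DIRECT);

    int invoke(A x) {
        return target.get().applyAsInt(x);
    }

    // Called on an invalidating class load: a single atomic write, no recompilation.
    void patchToFallback() {
        target.set(FALLBACK);
    }
}
```

A caller would invoke `patchToFallback()` before any instance of `C` can reach the call site. After the swap, `invoke(new C())` returns 3 through ordinary virtual dispatch, while `B` receivers still get 2.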