
always_inline performance - PowerPoint Presentation

Uploaded On 2020-08-06




Presentation Transcript

Slide1

always_inline performance

Benchmark using Intel Compiler Version 15.0.2.164 Build 20150121
Calebe de Paula Bianchini
IPCC/UNESP

Slide2

The Problem

Some issues were detected using the Intel Compiler with always_inline:
  10x slower to compile VecGeom
    Using -j 1: ± 80 minutes
    Using -j 12: ± 18 minutes
  2x bigger lib file
    libvecgeom.a: ± 70 MB
What happens in GCC?
  gcc version 4.8.4

Slide3

The Benchmark

Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
  6 x 32 KB L1 cache (data/instruction)
  6 x 256 KB L2 cache
  15 MB L3 shared cache
  32 GB of RAM
Vc lib & AVX enabled
Script shape_benchmark.sh
  NPOINTS = 1024
  NREP = 1024
  JOBS = 10
CMS Shapes: Box, Tube, Trapezoid, Cone, Polycone, Polyhedron
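The slide does not say what NPOINTS and NREP mean. As a minimal sketch, assuming NPOINTS is the number of points per batch and NREP the number of times the batch is repeated, a timing loop of this kind could look as follows. Kernel, RunResult and RunBenchmark are hypothetical names for illustration; the actual benchmark calls VecGeom shape methods through shape_benchmark.sh.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Assumed meanings (not stated on the slide): points per batch and repetitions.
constexpr std::size_t kNPoints = 1024;
constexpr std::size_t kNRep = 1024;

// Hypothetical stand-in kernel; the real benchmark exercises shape methods
// such as DistanceToIn().
inline double Kernel(double x) { return x * x + 1.0; }

struct RunResult {
  double checksum;  // accumulated so the compiler cannot discard the loop
  double seconds;   // wall-clock time for all repetitions
};

RunResult RunBenchmark() {
  // Fill one batch of input points: 0, 1, ..., kNPoints - 1.
  std::vector<double> points(kNPoints);
  for (std::size_t i = 0; i < kNPoints; ++i) {
    points[i] = static_cast<double>(i);
  }

  double checksum = 0.0;
  const auto start = std::chrono::steady_clock::now();
  for (std::size_t rep = 0; rep < kNRep; ++rep) {
    for (std::size_t i = 0; i < kNPoints; ++i) {
      checksum += Kernel(points[i]);
    }
  }
  const auto stop = std::chrono::steady_clock::now();

  return {checksum, std::chrono::duration<double>(stop - start).count()};
}
```

Returning the checksum alongside the elapsed time gives the caller something to print or compare, which keeps an optimizing compiler from removing the measured loop.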

Slide4

Intel Compiler Benchmark

inline only
  VECGEOM_INLINE = inline
  Regular compilation for icc (-O3 -Wall -fPIC -diag-disable 3438 -fno-alias -xAVX)
always_inline w/ limit=10M
  VECGEOM_INLINE = inline __attribute__((always_inline))
  Regular compilation + -finline-limit=10000000
always_inline
  VECGEOM_INLINE = inline __attribute__((always_inline))
  Regular compilation
no inline
  VECGEOM_INLINE = inline
  Regular compilation + -fno-inline -inline-level=0 -Winline
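The configurations differ only in how VECGEOM_INLINE expands and in the compiler flags. A minimal sketch of how such a macro could be switched at build time is shown here. BENCH_ALWAYS_INLINE and SafetyToPlane are hypothetical names for illustration and are not taken from VecGeom.

```cpp
// BENCH_ALWAYS_INLINE is a hypothetical build switch (e.g. passed as
// -DBENCH_ALWAYS_INLINE); it is an assumption, not a VecGeom option.
#if defined(BENCH_ALWAYS_INLINE)
// Forces inlining on compilers that accept GNU-style attributes.
#define VECGEOM_INLINE inline __attribute__((always_inline))
#else
// Plain inline: the compiler's own heuristics decide whether to inline.
#define VECGEOM_INLINE inline
#endif

// Hypothetical small helper of the kind that is a candidate for inlining:
// distance from a point to the nearer of two planes at +/- halfLength.
VECGEOM_INLINE double SafetyToPlane(double position, double halfLength) {
  const double absPosition = position < 0.0 ? -position : position;
  return halfLength - absPosition;
}
```

Both expansions give the same results; only the inlining decision, and therefore compile time and code size, can differ.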

Slide5

Intel Compiler Results


Slide6

Intel Compiler Results


Slide7

Intel Compiler Results

Compile time (using -j 12)
  inline only: ± 13.0 minutes
  always_inline w/ 10M: ± 17 minutes
  always_inline: ± 17 minutes
  no inline: ± 2.0 minutes
Lib size
  inline only: 32 MB
  always_inline w/ 10M: 66 MB
  always_inline: 66 MB
  no inline: 53 MB
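Taking inline only as the baseline, the ICC figures on this slide put always_inline at roughly 17 / 13 ≈ 1.3x the compile time and 66 / 32 ≈ 2.1x the library size. The short sketch below reproduces that arithmetic. IccBuild, kIccBuilds and RelativeTo are names introduced here for illustration; the numbers are the approximate values from the slide.

```cpp
#include <cassert>

// One row of the slide: approximate compile time and library size.
struct IccBuild {
  const char* name;
  double minutes;  // compile time with -j 12
  double libMB;    // size of the resulting library
};

// Values copied from the slide (all marked as approximate there).
const IccBuild kIccBuilds[] = {
    {"inline only", 13.0, 32.0},
    {"always_inline w/ 10M", 17.0, 66.0},
    {"always_inline", 17.0, 66.0},
    {"no inline", 2.0, 53.0},
};

// Ratio of a measurement to its baseline; above 1.0 means larger or slower.
double RelativeTo(double baseline, double value) {
  assert(baseline > 0.0);
  return value / baseline;
}
```

For example, RelativeTo(kIccBuilds[0].libMB, kIccBuilds[2].libMB) compares the always_inline library against the inline only one.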

Slide8

Intel Compiler Results

inline is faster than the other modifiers (or combinations)
In the worst case, inline is similar to always_inline
There is only one case where it really loses: Cone::DistanceToIn()

Slide9

GCC Benchmark

inline only
  VECGEOM_INLINE = inline
  Regular compilation for gcc without -finline-limit=10000000
always_inline w/ limit=10M
  VECGEOM_INLINE = inline __attribute__((always_inline))
  Regular compilation (-O2 -finline-limit=10000000 -ffast-math -ftree-vectorize -mavx -fabi-version=6 -Wall -fPIC)
always_inline
  VECGEOM_INLINE = inline __attribute__((always_inline))
  Regular compilation without -finline-limit=10000000
no inline
  VECGEOM_INLINE = inline
  Regular compilation + -Winline -fno-inline

Slide10

GCC Results


Slide11

GCC Results


Slide12

GCC Results

Time (using -j 12)
  inline only: ± 1.0 minutes
  always_inline w/ 10M: ± 2.0 minutes
  always_inline: ± 2.0 minutes
  no inline: ± 1.0 minutes
Lib size
  inline only: 22 MB
  always_inline w/ 10M: 26 MB
  always_inline: 26 MB
  no inline: 44 MB

Slide13

GCC Results

always_inline with 10M is usually faster than the other modifiers
In the worst case, always_inline is similar to inline
Box::DistanceToIn() and Trapezoid::DistanceToIn() are exceptions (?)

Slide14

Next steps…

Build VecGeom with Profile Guided Optimization
  There is some evidence that ICC performance will increase
Compare the results
  ICC & GCC with inline (only)
  ICC & GCC with always_inline w/ 10M

Slide15

Next steps…
