Slide1
always_inline performance
Benchmark using Intel Compiler Version 15.0.2.164 Build 20150121
Calebe de Paula Bianchini
IPCC/UNESP
Slide2
The Problem
Some issues were detected using the Intel Compiler with always_inline:
  10x slower to compile VecGeom
    Using -j 1: ± 80 minutes
    Using -j 12: ± 18 minutes
  2x bigger lib file
    libvecgeom.a: ± 70 MB
What happens with GCC?
  gcc version 4.8.4
Slide3
The Benchmark
Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz
  6 x 32 KB L1 cache (data/instruction)
  6 x 256 KB L2 cache
  15 MB L3 shared cache
  32 GB of RAM
Vc lib & AVX enabled
Script shape_benchmark.sh
  NPOINTS = 1024
  NREP = 1024
  JOBS = 10
CMS Shapes: Box, Tube, Trapezoid, Cone, Polycone, Polyhedron
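The NPOINTS and NREP settings suggest a timing loop that evaluates a shape routine on a batch of points and repeats the batch many times. The following is a minimal sketch of that pattern, not the contents of shape_benchmark.sh: `DemoSlabDistanceToIn`, `DemoResult`, `RunDemoBenchmark`, and the way the points are generated are all assumptions made for illustration and are not part of VecGeom.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a shape routine (not VecGeom's API): distance
// travelled along +x from position x to reach the slab [-half, half].
// Returns -1 when the point is not on the -x side of the slab.
inline double DemoSlabDistanceToIn(double x, double half) {
  return (x < -half) ? (-half - x) : -1.0;
}

struct DemoResult {
  double checksum;  // sum of all returned distances, so the loop has a visible result
  double seconds;   // wall-clock time for npoints * nrep calls
};

// Evaluates the routine on npoints points, repeated nrep times, and times it.
inline DemoResult RunDemoBenchmark(std::size_t npoints, std::size_t nrep) {
  assert(npoints > 0 && nrep > 0);

  // Assumed input: points at x = -2, -3, -4, ... (all outside a slab of half-width 1).
  std::vector<double> xs(npoints);
  for (std::size_t i = 0; i < npoints; ++i) {
    xs[i] = -2.0 - static_cast<double>(i);
  }

  const auto start = std::chrono::steady_clock::now();
  double checksum = 0.0;
  for (std::size_t r = 0; r < nrep; ++r) {
    for (std::size_t i = 0; i < npoints; ++i) {
      checksum += DemoSlabDistanceToIn(xs[i], 1.0);
    }
  }
  const auto stop = std::chrono::steady_clock::now();

  return {checksum, std::chrono::duration<double>(stop - start).count()};
}
```

Calling `RunDemoBenchmark(1024, 1024)` would mirror the NPOINTS and NREP values on the slide. Returning a checksum alongside the time is a common way to keep an optimizing compiler from discarding the measured loop.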
Slide4
Intel Compiler Benchmark
inline only
  VECGEOM_INLINE = inline
  Regular compilation for icc (-O3 -Wall -fPIC -diag-disable 3438 -fno-alias -xAVX)
always_inline w/ limit=10M
  VECGEOM_INLINE = inline __attribute__((always_inline))
  Regular compilation + -finline-limit=10000000
always_inline
  VECGEOM_INLINE = inline __attribute__((always_inline))
  Regular compilation
no inline
  VECGEOM_INLINE = inline
  Regular compilation + -fno-inline -inline-level=0 -Winline
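The two values of VECGEOM_INLINE used in these configurations can be pictured with a small sketch. The macro name and its two expansions come from the slide; the `DEMO_ALWAYS_INLINE` switch and the functions `DemoSquare` and `DemoDistanceSquared` are hypothetical, and how VecGeom itself selects the expansion is not shown in the slides.

```cpp
#include <cassert>

// Hypothetical build switch (an assumption, not from the slides): compile with
// -DDEMO_ALWAYS_INLINE to select the always_inline expansion.
#if defined(DEMO_ALWAYS_INLINE)
#define VECGEOM_INLINE inline __attribute__((always_inline))
#else
#define VECGEOM_INLINE inline
#endif

// Small helper of the kind that benefits from inlining (illustrative only).
VECGEOM_INLINE double DemoSquare(double x) {
  return x * x;
}

// Caller built from the helper; with always_inline the helper body is
// requested to be expanded here instead of being called.
VECGEOM_INLINE double DemoDistanceSquared(double dx, double dy, double dz) {
  assert(dx == dx && dy == dy && dz == dz);  // reject NaN inputs
  return DemoSquare(dx) + DemoSquare(dy) + DemoSquare(dz);
}
```

With GCC, plain `inline` leaves the decision to the compiler's heuristics, while `always_inline` asks for the function to be inlined regardless of them. Both expansions produce the same results; only compile time, code size, and run time are expected to differ.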
Slide5
Intel Compiler Results
Slide6
Intel Compiler Results
Slide7
Intel Compiler Results
Compile time (using -j 12)
  inline only: ± 13.0 minutes
  always_inline w/ 10M: ± 17 minutes
  always_inline: ± 17 minutes
  no inline: ± 2.0 minutes
Lib size
  inline only: 32 MB
  always_inline w/ 10M: 66 MB
  always_inline: 66 MB
  no inline: 53 MB
Slide8
Intel Compiler Results
inline is faster than the other modifiers (or combinations)
In the worst case, inline is similar to always_inline
There is only one case where it clearly loses: Cone::DistanceToIn()
Slide9
GCC Benchmark
inline only
  VECGEOM_INLINE = inline
  Regular compilation for gcc without -finline-limit=10000000
always_inline w/ limit=10M
  VECGEOM_INLINE = inline __attribute__((always_inline))
  Regular compilation (-O2 -finline-limit=10000000 -ffast-math -ftree-vectorize -mavx -fabi-version=6 -Wall -fPIC)
always_inline
  VECGEOM_INLINE = inline __attribute__((always_inline))
  Regular compilation without -finline-limit=10000000
no inline
  VECGEOM_INLINE = inline
  Regular compilation + -Winline -fno-inline
Slide10
GCC Results
Slide11
GCC Results
Slide12
GCC Results
Compile time (using -j 12)
  inline only: ± 1.0 minutes
  always_inline w/ 10M: ± 2.0 minutes
  always_inline: ± 2.0 minutes
  no inline: ± 1.0 minutes
Lib size
  inline only: 22 MB
  always_inline w/ 10M: 26 MB
  always_inline: 26 MB
  no inline: 44 MB
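The cost of always_inline is easier to compare across the two compilers when each figure is expressed relative to the "inline only" build. A minimal sketch of that arithmetic follows; the numbers are the ones reported on the ICC and GCC results slides, while `DemoBuild`, `RelativeToBaseline`, and the constant names are hypothetical. Since the compile times are approximate (±), the time ratios are rough.

```cpp
#include <cassert>

// One build configuration: compile time with -j 12 and size of the library.
struct DemoBuild {
  double minutes;
  double lib_mb;
};

// Ratio of a measurement to the "inline only" baseline of the same compiler.
inline double RelativeToBaseline(double value, double baseline) {
  assert(baseline > 0.0);
  return value / baseline;
}

// Figures as reported on the results slides.
const DemoBuild kIccInline{13.0, 32.0};
const DemoBuild kIccAlwaysInline{17.0, 66.0};
const DemoBuild kGccInline{1.0, 22.0};
const DemoBuild kGccAlwaysInline{2.0, 26.0};

// Library growth caused by always_inline, per compiler.
inline double IccLibGrowth() {
  return RelativeToBaseline(kIccAlwaysInline.lib_mb, kIccInline.lib_mb);  // 66 / 32
}
inline double GccLibGrowth() {
  return RelativeToBaseline(kGccAlwaysInline.lib_mb, kGccInline.lib_mb);  // 26 / 22
}
```

With these inputs the library grows by a factor of about 2.06 under ICC and about 1.18 under GCC, which matches the "2x bigger lib file" observation for the Intel Compiler.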
Slide13
GCC Results
always_inline with 10M is usually faster than the other modifiers
In the worst case, always_inline is similar to inline
Box::DistanceToIn() and Trapezoid::DistanceToIn() are exceptions (?)
Slide14
Next steps…
Build VecGeom with Profile Guided Optimization
  There is some evidence that it will increase performance with ICC
Compare the results
  ICC & GCC with inline (only)
  ICC & GCC with always_inline w/ 10M
Slide15
Next steps…