New Track Seeding Techniques - PowerPoint Presentation

Uploaded On 2017-06-27

Presentation Transcript

Slide1

New Track Seeding Techniques at the CMS experiment

Felice Pantaleo

CERN

felice@cern.ch

Slide2

Overview

Motivations

Heterogeneous Computing

Track seeding on GPUs during Run-3

Run-2 Track Seeding

Online

Offline

Conclusion

Slide3

CMS High-Level Trigger in 2016 (1/2)

Today the CMS online farm consists of ~22k Intel Xeon cores

The current approach: one event per logical core

Pixel Tracks are not reconstructed for all the events at the HLT

This will be even more difficult at higher pile-up

Combinatorial time in pixel seeding: O(m!) in the worst case

More memory per event

Slide4

CMS High-Level Trigger in 2016 (2/2)

Full track reconstruction and particle flow, e.g. jets, tau

Today the CMS online farm consists of ~22k Intel Xeon cores

The current approach: one event per logical core

Pixel Tracks are not reconstructed for all the events at the HLT

This will be even more difficult at higher pile-up

Combinatorial time in pixel seeding: O(m!) in the worst case

More memory per event

Slide5

CMS and LHC Upgrade Schedule

CMS pixel detector upgrade

Slide6

Terminology - CMS Silicon Tracker

Online: Pixel-only tracks used for fast tracking and vertexing

Offline: Pixel tracks are used as seeds for the Kalman filter in the strip detector

Slide7

Phase 1 Pixel detector (1/2)

The already complex online and offline track reconstruction will have to deal not only with a much more crowded environment but also with data coming from a more complex detector.

Slide8

Phase 1 Pixel detector (2/2)

The already complex online and offline track reconstruction will have to deal not only with a much more crowded environment but also with data coming from a more complex detector.

Slide9

Tracking at HLT

Pixel hits are used for pixel tracks, vertices, seeding

HLT Iterative tracking:

Iteration name   Phase0 Seeds   Phase1 Seeds   Target Tracks
Pixel Tracks     triplets       quadruplets
Iter0            Pixel Tracks   Pixel Tracks   Prompt, high pT
Iter1            triplets       quadruplets    Prompt, low pT
Iter2            doublets       triplets       High pT, recovery

Slide10

Pixel Tracks

The combinatorial complexity of evaluating Pixel Tracks could easily be dominated by track density and become the bottleneck of the High-Level Trigger and offline reconstruction execution times.

The CMS HLT farm and its offline computing infrastructure can no longer rely on an exponential growth of clock frequency guaranteed by the manufacturers.

Hardware and algorithmic solutions have been studied

Slide11

Heterogeneous Computing

Slide12

CPU vs GPU architectures

CPU

GPU

Slide13

CPU vs GPU architectures

CPU

Large caches (turning slow memory accesses into quick cache accesses)

Powerful ALUs

Low bandwidth to memory (tens of GB/s)

In CMS:

One event per thread

Rested on our laurels, thanks to the independence of events

HLT/Reco limited by I/O

Memory footprint is an issue

Slide14

CPU vs GPU architectures

GPU

Many Streaming Multiprocessors execute kernels (aka functions) using hundreds of threads concurrently

High bandwidth to memory (up to 1 TB/s)

Number of threads in flight increases

In CMS:

Cannot make threads work on different events due to SIMT architecture

Handle heterogeneity to assign the job to the best match

Long term solution: unroll and offload combinatorics to many threads

Slide15

Track seeding on GPUs during Run-3

Slide16

From RAW to Tracks during Run-3

Profit from the end-of-year upgrade of the Pixel to redesign the seeding code

Exploiting the information coming from the 4th layer would improve efficiency, b-tag, IP resolution

Average trigger latency should stay within 220 ms

Reproducibility of the results (bit-by-bit equivalence CPU-GPU)

Integration in the CMS software framework

Ingredients:

Massive parallelism within the event

Independence from thread ordering in algorithms

Avoid useless data transfers and transformations

Simple data formats optimized for parallel memory access
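Two of the points above, bit-by-bit CPU-GPU equivalence and independence from thread ordering, are linked because floating-point addition is not associative. The following is a minimal Python sketch of that effect, not CMS code; `order_independent_sum` is a hypothetical helper that fixes a canonical order before reducing, which is only one possible way to get a reproducible result.

```python
import itertools

def naive_sum(values):
    """Accumulate in arrival order, as threads finishing in arbitrary order would."""
    total = 0.0
    for v in values:
        total += v
    return total

def order_independent_sum(values):
    """Hypothetical helper: sort first, so every arrival order reduces identically."""
    return naive_sum(sorted(values))

contributions = [0.1, 0.2, 0.3]
orders = list(itertools.permutations(contributions))

# Distinct results obtained over all arrival orders.
naive_results = {naive_sum(p) for p in orders}
stable_results = {order_independent_sum(p) for p in orders}

print(len(naive_results), len(stable_results))
```

The naive accumulation gives more than one distinct value depending on the order, while the canonical-order version gives exactly one.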

Result: a GPU-based application that takes RAW data and gives Tracks as result

Slide17

Algorithm Stack

Raw to Digi

Hits - Pixel Clusterizer

Hit Pairs

Ntuplets - Cellular Automaton

Input, size linear with PU

Output, size ~linear with PU

Slide18

Cellular Automaton (CA)

The CA is a track seeding algorithm designed for parallel architectures

It requires a list of layers and their pairings

A graph of all the possible connections between layers is created

Doublets aka Cells are created for each pair of layers (compatible with a region hypothesis)

Fast computation of the compatibility between two connected cells

No knowledge of the world outside adjacent neighboring cells required, making it easy to parallelize

Slide19

CA: Graph of seeding layers

Seeding layers interconnections

Hit doublets for each layer pair can be computed independently by sets of threads
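As a rough illustration of why doublet creation parallelises well, the Python sketch below builds the cells of each layer pair without reading anything from the other pairs. The `Hit` and `Cell` types, the `make_doublets` function and the azimuthal window `max_dphi` used as a stand-in for the region hypothesis are all assumptions made for this example, not the CMS implementation.

```python
import math
from collections import namedtuple

Hit = namedtuple("Hit", "layer index r z phi")
Cell = namedtuple("Cell", "inner outer")

def make_doublets(inner_hits, outer_hits, max_dphi):
    """Build the cells of one layer pair; no data from other pairs is needed."""
    cells = []
    for hit_in in inner_hits:
        for hit_out in outer_hits:
            # Wrap the azimuthal difference into [-pi, pi] before cutting on it.
            dphi = abs(math.remainder(hit_out.phi - hit_in.phi, 2.0 * math.pi))
            if dphi < max_dphi:
                cells.append(Cell(hit_in, hit_out))
    return cells

hits = {
    1: [Hit(1, 0, 3.0, 0.5, 0.10), Hit(1, 1, 3.0, -1.0, 2.00)],
    2: [Hit(2, 0, 7.0, 1.2, 0.12)],
    3: [Hit(3, 0, 11.0, 1.9, 0.13), Hit(3, 1, 11.0, 4.0, -3.00)],
}
layer_pairs = [(1, 2), (2, 3), (1, 3)]  # edges of the seeding-layer graph

# Each entry could be handed to a separate set of threads.
doublets = {
    pair: make_doublets(hits[pair[0]], hits[pair[1]], max_dphi=0.1)
    for pair in layer_pairs
}
```

Because every dictionary entry depends only on the hits of its own two layers, the pairs can be processed in any order or concurrently.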

Slide20

CA: R-z plane compatibility

The compatibility between two cells is checked only if they share one hit

AB and BC share hit B

In the R-z plane a requirement is alignment of the two cells:

There is a maximum allowed deviation from alignment, which depends on the minimum value of the momentum range that we would like to explore
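A minimal sketch of such an alignment test is given below, assuming the misalignment is measured as the difference between the directions of the two cells in the R-z plane. The function name `rz_aligned`, the `(r, z)` hit tuples and the cut `theta_max` are illustrative choices, not taken from the slides.

```python
import math

def rz_aligned(hit_a, hit_b, hit_c, theta_max):
    """Cells AB and BC share hit B; each hit is an (r, z) tuple."""
    # Direction of each cell in the R-z plane, measured from the z axis.
    theta_ab = math.atan2(hit_b[0] - hit_a[0], hit_b[1] - hit_a[1])
    theta_bc = math.atan2(hit_c[0] - hit_b[0], hit_c[1] - hit_b[1])
    return abs(theta_bc - theta_ab) < theta_max

# Three hits on a straight line in R-z, and a variant with a kink at B.
straight = [(3.0, 1.5), (7.0, 3.5), (11.0, 5.5)]
kinked = [(3.0, 1.5), (7.0, 3.5), (11.0, 9.5)]

print(rz_aligned(*straight, theta_max=0.01))
print(rz_aligned(*kinked, theta_max=0.01))
```

A looser `theta_max` would accept lower-momentum tracks, which bend and scatter more, at the price of more combinatorial background.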

 

Slide21

CA: x-y plane compatibility

In the transverse plane, the intersection between the circle passing through the hits forming the two cells and the beamspot is checked:

They intersect if the distance between the centers d(C,C’) satisfies:

r’-r < d(C,C’) < r’+r
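The condition above can be sketched in Python as follows, taking r’ as the radius of the circle through the three hits and r as the beamspot radius plus an optional tolerance. The function names, the `tolerance` parameter and the handling of perfectly collinear hits are assumptions made for this example.

```python
import math

def circle_through(p1, p2, p3):
    """Return (centre_x, centre_y, radius) of the circle through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if d == 0.0:
        return None  # collinear hits: a straight track of infinite radius
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy, math.hypot(x1 - cx, y1 - cy)

def intersects_beamspot(hits, beamspot_xy, beamspot_r, tolerance=0.0):
    """Apply r' - r < d(C, C') < r' + r to the circle through the three hits."""
    r = beamspot_r + tolerance
    bx, by = beamspot_xy
    circle = circle_through(*hits)
    if circle is None:
        # Straight track: compare the point-to-line distance with r instead.
        (x1, y1), _, (x3, y3) = hits
        num = abs((x3 - x1) * (y1 - by) - (x1 - bx) * (y3 - y1))
        return num / math.hypot(x3 - x1, y3 - y1) < r
    cx, cy, r_track = circle
    d = math.hypot(cx - bx, cy - by)
    return r_track - r < d < r_track + r

def on_circle(cx, cy, radius, angles):
    """Helper for the example: points on a circle of given centre and radius."""
    return [(cx + radius * math.sin(t), cy - radius * math.cos(t)) for t in angles]

prompt = on_circle(0.0, 100.0, 100.0, (0.03, 0.07, 0.11))     # passes through the origin
displaced = on_circle(0.0, 105.0, 100.0, (0.03, 0.07, 0.11))  # misses it by 5 units

print(intersects_beamspot(prompt, (0.0, 0.0), 0.2))
print(intersects_beamspot(displaced, (0.0, 0.0), 0.2))
```

Rejecting small values of `r_track` in the same function would implement a minimum transverse momentum requirement, since the radius of curvature grows with transverse momentum.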

Since it is an Out-In propagation, a tolerance is added to the beamspot radius (in red)

One could also ask for a minimum value of transverse momentum and reject low values of r’

Slide22

Cells Connection

blockIdx.x and threadIdx.x = Cell id in a LayerPair

Each cell asks its innermost hits for cells to check compatibility with.
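The indexing scheme can be emulated sequentially in Python to show the data flow. This is a sketch under assumptions: the cell id is taken to be `blockIdx.x * blockDim.x + threadIdx.x` (the usual CUDA convention), hits carry globally unique ids, and `connect_cells` and its `compatible` callback are hypothetical names.

```python
from collections import defaultdict

def connect_cells(cells_per_pair, compatible, block_dim=2):
    """cells_per_pair[p] lists the cells of layer pair p as (inner_hit, outer_hit)."""
    # Let every hit know which cells end on it, so a cell can ask its inner hit.
    cells_ending_at = defaultdict(list)
    for pair_index, cells in enumerate(cells_per_pair):
        for cell_id, (_, outer_hit) in enumerate(cells):
            cells_ending_at[outer_hit].append((pair_index, cell_id))

    inner_neighbors = {}
    for block_idx_y, cells in enumerate(cells_per_pair):        # blockIdx.y
        n_blocks = (len(cells) + block_dim - 1) // block_dim
        for block_idx_x in range(n_blocks):                     # blockIdx.x
            for thread_idx_x in range(block_dim):               # threadIdx.x
                cell_id = block_idx_x * block_dim + thread_idx_x
                if cell_id >= len(cells):
                    continue                                    # idle padding thread
                inner_hit = cells[cell_id][0]
                for pair_index, other_id in cells_ending_at.get(inner_hit, ()):
                    candidate = cells_per_pair[pair_index][other_id]
                    if compatible(candidate, cells[cell_id]):
                        inner_neighbors.setdefault((block_idx_y, cell_id), []).append(
                            (pair_index, other_id))
    return inner_neighbors

cells_per_pair = [
    [("a0", "b0"), ("a1", "b1")],                 # layer pair 0
    [("b0", "c0"), ("b0", "c1"), ("b1", "c1")],   # layer pair 1
]
# The compatibility test is stubbed out; a real one would apply the geometric cuts.
neighbors = connect_cells(cells_per_pair, compatible=lambda inner, outer: True)
```

Each iteration of the three nested loops touches only its own cell and the cells sharing its inner hit, which is what lets one thread handle one cell.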

blockIdx.y = LayerPairIndex [0,13)

Slide23

Evolution

If two cells satisfy all the compatibility requirements they are said to be neighbors and their state is set to 0

In the evolution stage, their state increases in discrete generations if there is an outer neighbor with the same state
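The evolution rule can be sketched in a few lines of Python. The representation is an assumption made for this example: cells are integer ids, `outer_neighbors` maps a cell to the cells that continue it outwards, and all cells of a generation are updated synchronously from the previous generation.

```python
def evolve(outer_neighbors, n_cells):
    """outer_neighbors[i] lists the cell ids that are outer neighbours of cell i."""
    state = [0] * n_cells
    changed = True
    while changed:
        # Synchronous update: every cell reads the states of the previous generation.
        new_state = [
            s + 1 if any(state[j] == s for j in outer_neighbors.get(i, ())) else s
            for i, s in enumerate(state)
        ]
        changed = new_state != state
        state = new_state
    return state

# Chain 0 -> 1 -> 2 plus a second cell 3 that also connects to cell 2.
outer_neighbors = {0: [1], 1: [2], 3: [2]}
final_state = evolve(outer_neighbors, n_cells=4)
print(final_state)  # [2, 1, 0, 1]
```

Cell 0 ends in state 2 because two further cells follow it outwards, so it is the start of a quadruplet candidate.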

At the end of the evolution stage the state of the cells will contain the information about the length

If one is interested in quadruplets, there will surely be one starting from a state 2 cell, pentuplets state 3, etc.

Slide24

T=0

T=1

T=2

Slide25

Quadruplets finding

blockIdx.x and threadIdx.x = Cell id in an outermost LayerPair

blockIdx.y = LayerPairIndex in ExternalLayerPairs

Each cell on an outermost layer pair will perform a DFS of depth = 4 following inner neighbors.
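A sequential Python sketch of this search is shown below. It assumes that "depth = 4" refers to the four hits of a quadruplet, that is three connected cells; `find_ntuplets` and its arguments are hypothetical names, and on the GPU each start cell would be handled by its own thread rather than by a loop.

```python
def find_ntuplets(outermost_cells, inner_neighbors, cells, n_hits=4):
    """Depth-first search from each outermost cell along inner neighbours."""
    found = []

    def dfs(cell_id, chain):
        chain = chain + [cell_id]              # chain is ordered outermost first
        if len(chain) == n_hits - 1:           # n_hits hits are n_hits - 1 cells
            innermost = chain[-1]
            hits = [cells[innermost][0]] + [cells[c][1] for c in reversed(chain)]
            found.append(tuple(hits))
            return
        for inner in inner_neighbors.get(cell_id, ()):
            dfs(inner, chain)

    for start in outermost_cells:              # one thread per start cell on the GPU
        dfs(start, [])
    return found

# Cells as (inner_hit, outer_hit); cell 3 is a dead end with no inner neighbour.
cells = {0: ("a", "b"), 1: ("b", "c"), 2: ("c", "d"), 3: ("b2", "c")}
inner_neighbors = {2: [1, 3], 1: [0]}
quadruplets = find_ntuplets([2], inner_neighbors, cells)
print(quadruplets)  # [('a', 'b', 'c', 'd')]
```

The branch through cell 3 is abandoned because it cannot reach the required length, so only one quadruplet is returned.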

Slide26

Triplet Propagation

Propagate 1-2-3 triplet to 4th layer and search for compatible hits

Natural continuation of the current approach from pairs to triplets

Slide27

Simulated Physics Performance PixelTracks

CA tuned to have same efficiency as Triplet Propagation

Efficiency significantly larger than 2016, especially in the forward region (|η|>1.5).

Slide28

Simulated Physics Performance PixelTracks

Fake rate up to 40% lower than Triplet Propagation

Two orders of magnitude lower than 2016 tracking thanks to the higher purity of quadruplets with respect to triplets

Slide29

Timing

The physics performance and timing justified the integration of the Cellular Automaton in its sequential implementation at the HLT already in 2017

Hardware: Intel Core i7-4771 @ 3.5 GHz, NVIDIA GTX 1080

Algorithm                 Time per event [ms]
2016 Pixel Tracks         29.3 ± 13.1
Triplet Propagation       72.1 ± 25.7
GPU Cellular Automaton    1.2 ± 0.9
CPU Cellular Automaton    14 ± 6.2

Slide30

Cellular Automaton @ Run-2 Offline Track Seeding

Slide31

CA in offline tracking

The performance of the sequential Cellular Automaton at the HLT justified its integration also in the 2017 offline iterative tracking

Slide32

CA in offline tracking

The performance of the sequential Cellular Automaton at the HLT justified its integration also in the 2017 offline iterative tracking

Slide33

CA Physics performance vs 2016

Reconstruction efficiency increased, especially in the forward region.

Fake rate significantly reduced in the entire pseudo-rapidity range

Slide34

CA Physics performance vs conventional

Overall track reconstruction efficiencies are very similar

Fake rate lower by up to about 40% when transitioning from the barrel to the forward region (|η|>1.5).

Slide35

CA Physics performance vs PU

Efficiency slightly affected by the increasing pile-up

More fake tracks pass the track selection and their hits get removed from the pool of available hits

Fake rate has a ~quadratic dependency on PU

Slide36

Timing vs PU

CA track seeding at the same level as the 2016 seeding

More robust, smaller complexity vs PU than 2016 track seeding despite the increased number of layer combinations involved in the seeding phase with respect to the 2016 seeding

In pattern recognition, the CA seeding brings no additional gain

~20% faster track reconstruction with respect to 2016 tracking at avg PU70

Slide37

Conclusion

Pixel Track seeding algorithms have been redesigned with high-throughput parallel architectures in mind

Improvements in performance may come even when running sequentially

Factors at the HLT, tens of % in the offline, depending on the fraction of the code that uses new algos

CA algorithms with additional graph knowledge capability are very powerful

By adding more Graph Theory sugar, steal some work from the track building and become more flexible

The GPU and CPU algorithms run in CMSSW and produce the same bit-by-bit result

Transition to GPUs@HLT during Run3 smoother

Running Pixel Tracking at the CMS HLT for every event would become cheap @PU ~ 50-70

Integration in the CMS High-Level Trigger farm under study

Slide38

Backup

felice@cern.ch

Slide39

Impact parameters resolution

The 2017 detector shows better performance than 2016 over the whole η spectrum.

Slide40

Transverse momentum resolution

The performance in 2016 and 2017 is comparable.