The LHCb Run 2 Online System
2nd AACL Workshop on DAQ@LHC
Chateau de Bossey, 12 April 2016
Beat Jost / CERN
Disclaimer
Of course I cannot give an in-depth description of all aspects of the system. For that, an entire day would probably not be sufficient…
I will therefore focus on some specific areas.
The LHCb Detector
[Detector layout figure, annotated with the functions of the subsystems: vertexing, K_s identification, tracking, p-measurement, particle ID, calorimetry (trigger support), muon ID (trigger support).]
Design Goals of the LHCb Online System
Commonality and standardisation
As few different flavours of components and protocols as possible
Scalability
The architecture and protocols should not need changes if the requirements increase (new subdetectors, data rates, etc…)
Obviously the system would get bigger, but no basic changes are needed
Simplicity and ease of use
Simple protocols and ‘stupid’ basic components
Same look-and-feel for all components
Central guidelines for ease of integration into the overall system
Overall Architecture
[Architecture diagram: the front-end electronics of the subdetectors (VELO, ST, OT, RICH, ECal, HCal, Muon, L0 trigger) feed the readout boards; event-building switches and the readout network connect the readout boards to the CPUs of the HLT farm and of the MON farm; the TFC system distributes the LHC clock, the L0 trigger decisions and the MEP requests; the Experiment Control System supervises all components. Arrows distinguish event data, timing and fast control signals, and control and monitoring data. Average event size: 60 kB; average rate into the farm: 1 MHz (~60 GB/s); average rate to tape: ~12 kHz (~700 MB/s).]
Main Components
Timing and Fast Control (TFC)
Responsible for distributing the LHC clock and (beam-)synchronous commands, such as the L0 trigger decisions
Data Acquisition System
Responsible for transferring the data from the front-end electronics to storage, via the High-Level Trigger (HLT) farm
Experiment Control System
Responsible for setting up the detector, the TFC and the DAQ system in a coherent way for taking physics data, and for monitoring the proper operation of the entire experiment.
TFC System
Designed around the RD12 TTC system
Functional components:
Clock transmission
TFC signal generation: the Readout Supervisor
TFC signal distribution: switches and the RD12 TTC system (TTCtx’s, optical couplers, TTCrx’s)
TFC signal handling: front-end electronics
Buffer overflow control: central in the Readout Supervisor (synchronous part); throttle signals and infrastructure (ORs and switches)
TFC Features
The main feature is the support of partitioning, i.e. the possibility to run different parts of the detector independently and concurrently
For each accepted event (max. 1 MHz) a TTC broadcast message is sent to let the front-end electronics know what kind of event was triggered
Consecutive events are packed at the readout board (up to 16) to minimise the overheads through the readout network; the event packing is controlled by the TFC system (Readout Supervisor) via TTC broadcasts, as illustrated below
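The packing can be illustrated with a minimal Python sketch (illustrative only; the packing factor of 16 comes from the slide, the events are dummy payloads):

```python
# Sketch of the event-packing idea: consecutive triggered events are grouped
# into multi-event packets (MEPs) of at most `packing_factor` events, the
# value announced by the Readout Supervisor via a TTC broadcast.

def pack_meps(events, packing_factor=16):
    """Group consecutive events into MEPs of at most `packing_factor` events."""
    return [events[i:i + packing_factor]
            for i in range(0, len(events), packing_factor)]

# Example: 40 dummy events of ~60 kB each become MEPs of 16 + 16 + 8 events.
events = [b"\x00" * 60_000 for _ in range(40)]
print([len(mep) for mep in pack_meps(events)])   # -> [16, 16, 8]
```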
Data Acquisition System
The main components of the DAQ system are
A common readout board
Acquires the data from the front-end electronics, performs zero-suppression and formats the data for transfer to the HLT farm
Up to 24 input links at 1.6 Gb/s (GOL)
Max. 4 Gb Ethernet output links
335 units
A large Gb-Ethernet-based network (~1500 ports) that connects the readout boards to the CPU nodes in the HLT farm
Two main routers (Force10)
Up to 1280 Gb Ethernet ports each
Deep buffering
Data Acquisition System (cont.)
HLT CPU farm
Organised in 62 subfarms (28-32 nodes each), 1780 nodes in total
Connected to the main routers via edge routers (1 per subfarm), normally 12 Gb input links from the main routers (some have 2 10G links)
Mainly Intel nodes (some 400 AMD)
Input rate: 1 MHz × ~60 kB
Output rate: ~12.5 kHz × ~60 kB (see the quick check below)
Storage System
4 receive nodes, connected to the farm with 10 Gb Ethernet, receive the data from the farm nodes
6 store nodes, connected with 10G towards the receive nodes and with Fibre Channel to the disk controller
Total disk space available: ~400 TB
Used largely as a temporary buffer for the event data before they are sent to Castor
Also holds the home and group directories; files are exported via NFS/Samba to all other nodes
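A quick back-of-the-envelope check of these figures (a sketch; the even sharing of the load over the 1780 nodes is an assumption for illustration):

```python
# Back-of-the-envelope check of the DAQ throughput figures quoted above.
event_size  = 60e3      # bytes, average event size
input_rate  = 1e6       # Hz, rate of L0-accepted events into the farm
output_rate = 12.5e3    # Hz, rate of HLT-accepted events to storage
n_nodes     = 1780      # HLT farm nodes (even sharing assumed)

input_bw  = event_size * input_rate     # ~60 GB/s into the farm
output_bw = event_size * output_rate    # ~0.75 GB/s towards storage
per_node  = input_bw / n_nodes          # ~34 MB/s per farm node on average

print(f"farm input : {input_bw / 1e9:.1f} GB/s")
print(f"storage out: {output_bw / 1e6:.0f} MB/s")
print(f"per node   : {per_node / 1e6:.1f} MB/s")
```

The result, ~60 GB/s into the farm and ~750 MB/s towards storage, matches the figures quoted with the architecture overview.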
Protocols
Front-End to Readout Board
Pure push protocol, driven by TFC Triggers
No Zero Suppression on Front-End Electronics
The readout board has to be prepared to accept the next trigger; otherwise it raises a throttle signal to stop the triggering
Readout Board to Farm CPU
Pure push protocol of raw IP packets
Destination defined by the TFC (Readout Supervisor, long TTC broadcast), based on a token previously sent by the farm nodes
Thanks to the token mechanism, the farm node is guaranteed to be able to receive the data (see the sketch below)
The protocol relies heavily on deep buffering in the readout network: when the data for an event are sent, the output port of the switch is typically oversubscribed by a factor of ~300!
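The token (MEP-request) mechanism amounts to a simple credit scheme: a farm node announces free buffer space, and the Readout Supervisor only picks destinations for which such a credit exists. A minimal sketch of that idea (not the actual Readout Supervisor logic; the node names are hypothetical):

```python
from collections import deque

# Sketch of the MEP-request (token/credit) idea: a farm node sends a request
# whenever it has buffer space for one more MEP; the Readout Supervisor pops
# a credit to choose the destination broadcast with the next trigger, so the
# selected node is guaranteed to have room for the data.

class ReadoutSupervisor:
    def __init__(self):
        self.credits = deque()                  # FIFO of pending MEP requests

    def mep_request(self, node):
        """A farm node advertises space for one more MEP."""
        self.credits.append(node)

    def next_destination(self):
        """Destination for the next MEP, or None (throttle) if no credit is left."""
        return self.credits.popleft() if self.credits else None

rs = ReadoutSupervisor()
for node in ("hlta0101", "hlta0102"):           # hypothetical node names
    rs.mep_request(node)
print(rs.next_destination())                    # hlta0101
print(rs.next_destination())                    # hlta0102
print(rs.next_destination())                    # None -> would throttle triggers
```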
Inside a Farm Node
Basic building blocks are
A buffer manager that allows data to be shared between processes
Producer and consumer processes attached to the buffer manager
The process architecture is defined by an ‘Architecture File’, built with a graphical interface
[Data-flow diagrams for the farm-node configurations: Passthrough, HLT1 (Physics), HLT2 (Physics).]
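The producer/consumer pattern around the buffer manager can be illustrated as follows (a sketch using an in-process queue; the real buffer manager shares a memory buffer between separate processes on the node):

```python
import queue
import threading

# Illustration of the producer/consumer pattern around a shared buffer: one
# producer fills a bounded buffer with events, two consumers drain it. The
# real buffer manager shares a memory buffer between independent processes.

buffer_manager = queue.Queue(maxsize=100)       # bounded, like a fixed-size buffer

def producer(n_events):
    for i in range(n_events):
        buffer_manager.put(("event", i))        # blocks while the buffer is full
    buffer_manager.put(None)                    # end-of-run marker

def consumer(name):
    while True:
        item = buffer_manager.get()
        if item is None:                        # pass the marker on and stop
            buffer_manager.put(None)
            break
        # ... process the event here (HLT1, HLT2, monitoring, ...) ...

threads = [threading.Thread(target=producer, args=(1000,)),
           threading.Thread(target=consumer, args=("HLT1",)),
           threading.Thread(target=consumer, args=("Moni",))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("all events consumed")
```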
Experiment Control System
LHCb has an integrated control system
Same tools for all aspects of controls, such as
HV/LV controls and monitoring
Configuration of the electronics (front-end, readout boards, etc…)
Configuration and monitoring of the software components (trigger, readout, storage, etc…)
The ECS is based on WinCC-OA (formerly PVSS)
Enhanced by many tools and utilities (from JCOP and LHCb)
This has the advantage that the functionality of WinCC-OA (archiving, smoothing, trending, etc…) is immediately applicable to all aspects
Sequencing and communication to components outside WinCC-OA are based on DIM and SMI++
Both tools are integrated in WinCC-OA for easy use by the users
SMI++ is a rule-based system that allows Finite State Machines (FSMs) to be implemented
ECS Architecture
The design of the ECS is based on a hierarchical, tree-like architecture
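A toy model of such a tree (the idea only, not SMI++ or WinCC-OA code; the node names are made up): commands propagate from a control unit down to its children, and a parent's state is summarised from its children's states.

```python
# Toy model of the control tree: commands propagate downwards, states are
# summarised upwards (the idea only, not SMI++ or WinCC-OA code).

class ControlUnit:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.state = "NOT_READY"

    def command(self, cmd):
        """Forward a command to all children, then update the local state."""
        for child in self.children:
            child.command(cmd)
        if not self.children:                       # device unit (leaf)
            self.state = {"configure": "READY", "start": "RUNNING"}.get(cmd, self.state)
        else:                                       # control unit: summarise children
            states = {c.state for c in self.children}
            if "ERROR" in states:
                self.state = "ERROR"
            elif len(states) == 1:
                self.state = states.pop()
            else:
                self.state = "MIXED"

# Hypothetical slice of the tree: LHCb -> DAQ -> two readout boards.
daq  = ControlUnit("DAQ", [ControlUnit("ROB_1"), ControlUnit("ROB_2")])
lhcb = ControlUnit("LHCb", [daq])
lhcb.command("configure"); print(lhcb.state)        # READY
lhcb.command("start");     print(lhcb.state)        # RUNNING
```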
Access to Electronics
Two basic application areas in LHCb
On/near the detector, with moderate radiation but the potential for Single Event Upsets (SEUs)
Counting room and surface, with no radiation problems
Applied Solutions
1a) SPECS (Serial Protocol for ECS)
An evolution of the ATLAS SPAC; radiation tolerant
1b) ATLAS ELMB-based solution
Based on CAN bus; radiation tolerant
2) Credit-Card PCs
PCs with a very small form factor (85×65 mm²); very flexible; connected via Ethernet
Control & Monitoring of DAQ Components
Two different protocols for the FEE
ELMB through CAN bus (Muon system)
SPECS (Serial Protocol for ECS)
One protocol for all Readout Boards
[Diagram: the ELMBs sit on a CAN bus and are reached through an OPC server; the SPECS master is served by a DIM server and reaches the FEE through SPECS slaves; on each readout board a CCPC runs a DIM server and accesses the glue-logic FPGA, the TTC interface and other components via I2C, JTAG and a parallel bus.]
Hardware Definition for the ECS
A generic tool has been developed that allows hardware components to be defined, down to the individual bits of registers
e.g. the Tell1 board
This hardware type can then be instantiated ad libitum to physical boards (sketched below)
This creates all the necessary WinCC-OA data points and (if necessary) the DIM communication points to talk to the board
WinCC-OA scripts and panels then use these data points to configure and monitor the board
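The mechanism can be sketched as follows (purely illustrative; the register names, bit fields and data-point naming convention are invented, not the output of the actual tool):

```python
# Sketch of the "hardware type -> physical board" idea: a type describes the
# registers down to their bit fields, and instantiating it for a physical
# board yields the data points a control script would read and write.
# Register names, fields and the naming convention are invented.

READOUT_BOARD_TYPE = {
    "ctrl":   {"enable": (0, 1), "mode": (1, 3)},          # field: (bit offset, width)
    "status": {"fifo_full": (0, 1), "seu_count": (8, 16)},
}

def instantiate(board_name, hw_type):
    """Create the data-point descriptions for one physical board."""
    return {f"{board_name}.{reg}.{field}": {"offset": off, "bits": width}
            for reg, fields in hw_type.items()
            for field, (off, width) in fields.items()}

# The same type instantiated for two (hypothetical) physical boards.
for board in ("VELO_TELL1_01", "VELO_TELL1_02"):
    dps = instantiate(board, READOUT_BOARD_TYPE)
    print(board, "->", sorted(dps))
```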
Operating the Experiment
The experiment is operated with 2 persons on shift (since day 1…)
The shift leader (SLIMOS) operates the experiment via (basically) 2 WinCC-OA panels: Run Control and ‘Big Brother’
The Data Manager checks histograms to ensure good data quality
Many operations and error recoveries are handled by an ‘Auto Pilot’; this allows the shifts to be manned by non-experts
The Auto Pilot is a set of FSM rules to recover from errors or anomalies (see the sketch below)
The basic credo is ‘Keep the system in the running state at all costs’
Big Brother takes input from e.g. the LHC and drives the experiment into the desired state and configuration
It handles, for example, the automatic switching from HLT1 operation to HLT2 operation after beam loss
Etc…
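A caricature of an Auto Pilot rule, sketching the ‘keep it running’ credo (not the actual FSM rules; the subsystem names are made up):

```python
# Caricature of an Auto-Pilot rule: whenever a subsystem drops out of RUNNING,
# try to recover it and bring it back, instead of stopping the whole run.

def auto_pilot_step(subsystems):
    """One pass of the 'keep the system in the running state' rule."""
    for name, state in subsystems.items():
        if state == "ERROR":
            subsystems[name] = "READY"          # recover: reset the subsystem
        if subsystems[name] == "READY":
            subsystems[name] = "RUNNING"        # and restart it within the run

# Hypothetical snapshot: one HLT subfarm has dropped out.
subsystems = {"VELO": "RUNNING", "HLTA01": "ERROR", "ECAL": "RUNNING"}
auto_pilot_step(subsystems)
print(subsystems)                               # everything back to RUNNING
```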
Status Panels
Of course there are many panels that show the state of specific areas of the system.
Summary
The LHCb Run-2 Online system is the result of many evolutionary steps since the TDR in 2001.
However, the basic architecture of the system has not fundamentally changed, apart from the removal of obsolete components.
Many new features have been added since the startup of the LHC, such as
the deferred HLT in 2012
the split HLT in 2015
The choices made at the time in favour of homogeneity and standardisation have paid off
Especially the integrated control system allowed major savings in development effort and eases debugging
The system works very well and will surely be a good starting point for the upgraded system post-LS2