
Slide1

Streaming mode data acquisition and data processing at Jefferson Lab

Environment for Real-time Streaming, Acquisition and Processing (ERSAP): an FBP-based, reactive data-stream processing framework

V. Gyurjyan, G. Heyes, D. Lawrence, C. Timmer, N. Brey, C. Cuevas, D. Abbott, B. Raydo, C. Larrieu, M. Diefenthaler

Slide2

Outline

Looking forward to future experiments at JLAB and the EIC

Reevaluating the existing trigger-based readout systems

The reactive, actor-model-based programming paradigm: FBP (Flow-Based Programming)

ERSAP: a reactive, micro-services-based framework for streaming readout (SRO) and data-stream processing that builds on the theoretical FBP paradigm

Slide3

Triggered vs. SRO

[Figure: triggered readout chain (CODA) vs. streaming readout chain (ERSAP). Triggered: a trigger supervisor (TS) exchanges L1/L2/L3 accepts, clear, strobe, trigger-type, ack, and busy signals with the front-end (FE) modules and ROCs, which feed an event builder (EB) and event recorder (ER). Streaming: FE modules feed an EB, event hub (EH), and ER, with users reading from the EH and data landing in a data lake (DL).]

TS: Trigger Supervisor
ROC: Readout Controller
EB: Event Builder
EH: Event Hub
ER: Event Recorder
DL: Data Lake / tiered storage

Triggered readout:
Challenging to deal with event pile-up
Not ideal for reading out general-purpose detectors
Carries a bias against low-energy particles

Streaming readout:
Data is digitized at a fixed rate
Data is read out in continuous, parallel streams
Data is cooled down in tiered storage

Slide4

Flow-Based Programming Paradigm


Proposed in the late 60s by J. Paul Rodker Morrison

“Assembly line” data processing

Data flows through asynchronous, concurrent processors (“black box” micro-services)

Micro-services communicate via data chunks (called information packets or data-quanta)

Data quanta travel across predefined connections (conveyor belts); the connections are specified externally to the processors.

Data is pushed through the processors, and each processor reacts to the data quanta passing by.
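To make the assembly-line idea concrete, the following is a minimal, illustrative FBP-style pipeline in Java: two black-box stages connected by a bounded queue acting as the conveyor belt. The class and variable names are invented for this sketch and are not part of ERSAP or of any FBP library.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal FBP-style pipeline sketch (illustrative only, not the ERSAP API):
// two "black box" stages connected by a bounded queue acting as the
// conveyor belt; each stage reacts to data quanta as they arrive.
public class FbpSketch {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<int[]> belt = new ArrayBlockingQueue<>(16);

        // Source stage: emits raw "data quanta" (here, small int arrays).
        Thread source = new Thread(() -> {
            try {
                for (int i = 0; i < 5; i++) {
                    belt.put(new int[]{i, i * i});
                }
                belt.put(new int[0]); // empty quantum signals end of stream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Sink stage: reacts to each quantum; knows nothing about the source.
        Thread sink = new Thread(() -> {
            try {
                while (true) {
                    int[] quantum = belt.take();
                    if (quantum.length == 0) break;
                    System.out.println("processed quantum: " + quantum[0] + "," + quantum[1]);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        sink.start();
        source.join();
        sink.join();
    }
}
```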

Slide5

Streaming system components: Tiered storage model

[Figure: tiered storage model. Each detector (Detector 1 ... Detector N) has front-end (FE) electronics and a buffer; the buffered streams feed an aggregator and processors writing to online storage (Tier 1); further processors move data to nearline storage (Tier 2 ... Tier N) and finally to permanent storage (final tier).]

At each tier, the storage capacity / retention time must be greater than or equal to the latency of the processors that drain it.
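A rough back-of-the-envelope check of that rule; the rate, latency, and retention numbers below are invented for illustration, not measured values.

```java
// Hypothetical check of the tier-sizing rule
// "storage retention time >= processor latency" (numbers invented for illustration).
public class TierSizing {
    public static void main(String[] args) {
        double inputRateGBps = 5.0;        // assumed aggregate stream rate into the tier
        double processorLatencySec = 120;  // assumed time the next stage needs per batch
        double retentionSec = 300;         // assumed retention time of this tier

        double minCapacityGB = inputRateGBps * processorLatencySec;
        System.out.printf("minimum buffer capacity: %.0f GB%n", minCapacityGB);
        System.out.println("retention >= latency? " + (retentionSec >= processorLatencySec));
    }
}
```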

Components:
A stream-oriented timing system
A standard stream data format
Front-end electronics that output time-based streams of data
Efficient streaming data transport
A stream-oriented, random-access data storage tier

Framework support:
A framework for data-processing tasks (virtual triggers, calibration, reconstruction, monitoring, data storage, etc.)
A framework for data-flow orchestration and for the design and deployment of data-stream processing applications

[Figure: components mapped onto the chain detector, DAQ, online/nearline data processing, processors, offline data processing.]

Slide6

ERSAP architecture

Building blocks: a reactive, event-driven actor (micro-service), a data-stream pipe, and an orchestrator.
A stream of data quanta flows through a directed graph of actors.
An application is a network of independent "black box" actors.

Data moves across actors, not instructions.
Actors communicate by exchanging data quanta over predefined connections using message passing; the connections are specified externally to the actors.
User-provided, single-threaded data-processing algorithms (engines) are presented as fully scalable actors by the framework (see the sketch after the figure below).

[Figure: an ERSAP application as a directed graph of services S1 to S10 connected by data-stream pipes; the legend marks the data-stream pipe, data processing station, workflow manager, and service engine.]
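The last point, that single-threaded user engines are presented as fully scalable actors, can be sketched as follows. This is an assumed, illustrative mechanism (one engine instance per worker thread via ThreadLocal), not the actual ERSAP implementation.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch (not the ERSAP implementation): a single-threaded,
// user-provided engine is scaled by giving each worker thread its own
// engine instance, so the engine code never has to be thread-safe.
public class ScalableActorSketch {
    // Hypothetical single-threaded engine: stateful, not thread-safe.
    static class Engine {
        private long processed = 0;
        String execute(String quantum) {
            processed++;
            return quantum.toUpperCase() + " (#" + processed + ")";
        }
    }

    // One engine instance per worker thread.
    private static final ThreadLocal<Engine> ENGINE = ThreadLocal.withInitial(Engine::new);

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 16; i++) {
            final String quantum = "quantum-" + i;
            pool.submit(() -> System.out.println(ENGINE.get().execute(quantum)));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```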

Slide7

Data processing station: actor

The data processing station is the user engine's run-time environment.

The engine follows a data-in/data-out interface and receives a JSON object for run-time configuration.
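A minimal sketch of what such a data-in/data-out engine could look like. The interface name, method signatures, and data types are assumptions made for this illustration; they are not the actual ERSAP engine API.

```java
// Illustrative engine contract (assumed names, not the ERSAP API):
// the framework configures the engine with a JSON object at run time and
// then repeatedly calls execute() with one data quantum in, one out.
interface StreamEngine {
    void configure(String jsonConfig);      // run-time configuration as JSON text
    byte[] execute(byte[] inputQuantum);    // data-in / data-out processing
}

// Example user engine: passes data through, tagging it with a configured label.
class TagEngine implements StreamEngine {
    private String tag = "untagged";

    @Override
    public void configure(String jsonConfig) {
        // A real engine would parse the JSON properly; keep it trivial here.
        if (jsonConfig.contains("\"tag\"")) {
            tag = jsonConfig.replaceAll(".*\"tag\"\\s*:\\s*\"([^\"]+)\".*", "$1");
        }
    }

    @Override
    public byte[] execute(byte[] inputQuantum) {
        byte[] prefix = (tag + ":").getBytes();
        byte[] out = new byte[prefix.length + inputQuantum.length];
        System.arraycopy(prefix, 0, out, 0, prefix.length);
        System.arraycopy(inputQuantum, 0, out, prefix.length, inputQuantum.length);
        return out;
    }
}
```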

[Figure: a data processing station (data processing actor / micro-service) wraps the user-provided code (engine) in a runtime environment that supplies multi-threading, configuration, and communication.]

Slide8

ERSAP current and potential deployments

ERSAP active deployments:
Successful beam test at CLAS12: an early streaming-readout prototype for a measurement of inclusive electroproduction of neutral pions, including ML for cluster reconstruction.
Ongoing work at INDRA: data processing of GEM detector readout, including autonomous monitoring and calibration using ML.
CODA SRO test setups.

Lightning talks covering 33 AI/ML projects. Topics:
autonomous control and experimentation
reconstruction and physics analysis
physics event generation and phenomenology
detector optimization and design

Projects are limited to EPSCI, experimental physics, and theory, not including various other projects in the Accelerator and Computer Science & Technology Divisions.

Many of these projects can run within ERSAP as micro-service-based applications.

Slide9

Streaming data transport (conveyer belt)

Communication layer: the data-stream pipe carries transient data together with a meta-description and serialization.

Transport implementations: 0MQ / POSIX_SHM / Data-Grid.

Communication patterns: publish/subscribe and P2P.
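As an illustration of the publish/subscribe option in the transport layer, a minimal 0MQ (JeroMQ) sketch in Java follows. The endpoint, topic, and payload are arbitrary example values; this is not the ERSAP transport implementation.

```java
import org.zeromq.SocketType;
import org.zeromq.ZContext;
import org.zeromq.ZMQ;

// Minimal 0MQ publish/subscribe sketch (JeroMQ); endpoint, topic, and payload
// are arbitrary example values, not ERSAP's transport configuration.
public class PubSubSketch {
    public static void main(String[] args) throws InterruptedException {
        try (ZContext context = new ZContext()) {
            ZMQ.Socket pub = context.createSocket(SocketType.PUB);
            pub.bind("tcp://*:5555");

            ZMQ.Socket sub = context.createSocket(SocketType.SUB);
            sub.connect("tcp://localhost:5555");
            sub.subscribe("data".getBytes(ZMQ.CHARSET)); // topic filter

            Thread.sleep(200); // allow the subscription to propagate

            pub.send("data serialized-quantum-bytes");
            System.out.println("received: " + sub.recvStr());
        }
    }
}
```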

Slide10

Workflow orchestrator

The orchestrator provides:
Hardware optimizations
Service registration/discovery
Data-set handling and distribution
Farm (batch or cloud) interface
Application deployment and execution
Application monitoring and real-time benchmarking
Command-line interface
Exception logging and reporting

Slide11

ERSAP DAQ micro-services

[Figure: ERSAP DAQ micro-services. Aggregator (Ag) and event-processor (EP) actors, based on LMAX Disruptor technology, merge interleaved input streams (shown as letter sequences) into combined 65536 ns time frames. Benchmark on an Intel Xeon Gold 5218 @ 2.3 GHz: throughput 5.3 GB/sec, CPU 600%, resident memory 5.4 GByte.]
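Since the aggregator is based on LMAX Disruptor technology, the following minimal Disruptor ring-buffer sketch shows the basic pattern: pre-allocated events published into a lock-free ring buffer and drained by a consumer. The event fields and time-frame handling are simplified for illustration and are not the actual ERSAP aggregator code.

```java
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

// Minimal LMAX Disruptor sketch (illustrative only, not the ERSAP aggregator):
// a producer publishes time-framed data into a lock-free ring buffer and a
// consumer drains it without allocations on the hot path.
public class DisruptorSketch {
    // Pre-allocated event slot reused by the ring buffer.
    static class FrameEvent {
        long frameStartNs;
        byte[] payload;
    }

    public static void main(String[] args) throws InterruptedException {
        Disruptor<FrameEvent> disruptor = new Disruptor<>(
                FrameEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // Consumer: in the real system this would be the stream aggregator/EP.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println("frame @ " + event.frameStartNs
                        + " ns, " + event.payload.length + " bytes"));

        RingBuffer<FrameEvent> ring = disruptor.start();

        // Producer: publish a few example 65536 ns frames.
        for (int i = 0; i < 4; i++) {
            final long start = i * 65536L;
            ring.publishEvent((event, seq) -> {
                event.frameStartNs = start;
                event.payload = new byte[128];
            });
        }

        Thread.sleep(200); // let the daemon consumer thread drain the buffer
        disruptor.shutdown();
    }
}
```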

Slide12

Advantages of the FBP based micro-services model

Artifacts are small, simple, and independent
Easier to understand and develop
Reduced develop-deploy-debug cycle
Easy to migrate processing to the data
Scales independently
Independent optimizations
Improves fault isolation
Easy to embrace hardware as well as software heterogeneity
Eliminates long-term commitment to a single technology stack

Slide13

Summary and future plans

ERSAP is an architecture for streaming readout and real-time processing of NP experiments. It combines decade-long experience from CLARA, JANA2, and CODA.

ERSAP Java binding, beta release: https://github.com/JeffersonLab/ersap-java.git
ERSAP C++ binding, development in progress: https://github.com/JeffersonLab/ersap-cpp.git
ERSAP Python binding in the design stage
Plans to design an ERSAP Julia binding

Many ERSAP engine development projects are in progress:
CODA engines: https://github.com/JeffersonLab/ersap-vtp.git
JANA2-based engines: https://github.com/JeffersonLab/ersap-java.git
TriDAS engines: https://github.com/JeffersonLab/ersap-tridas.git
CLAS12 AI reconstruction engines
INDRA-ASTRA project ML engines

A collaborative effort between the JLAB Physics and CST divisions.

Slide14


Thank you.

Slide15

Actor deployment optimization

[Figure: actor deployment optimization. Streams from Detector-1 and Detector-2 flow through DPE-1, DPE-2, and DPE-3 distributed across Node-1 and Node-2. Actors within a DPE share data through in-process SHM, DPEs on the same node through POSIX shared memory, and nodes exchange data and meta-data through an in-memory data-grid; numbered steps 1-7 in the original figure trace the data path.]
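The deployment choice in the figure can be summarized as a simple rule. The sketch below is an assumed illustration of that rule, not the actual ERSAP deployment optimizer: actors in the same DPE use in-process shared memory, DPEs on the same node use POSIX shared memory, and different nodes go through the data-grid.

```java
// Illustrative sketch (not the ERSAP deployment code): pick the cheapest
// transport for a pair of actors based on where they are deployed.
public class TransportChooser {
    enum Transport { IN_PROCESS_SHM, POSIX_SHM, DATA_GRID }

    record Location(String node, String dpe) {}

    // Hypothetical rule mirroring the figure: same DPE -> in-process shared
    // memory, same node -> POSIX shared memory, different nodes -> data-grid.
    static Transport choose(Location a, Location b) {
        if (a.node().equals(b.node()) && a.dpe().equals(b.dpe())) {
            return Transport.IN_PROCESS_SHM;
        }
        if (a.node().equals(b.node())) {
            return Transport.POSIX_SHM;
        }
        return Transport.DATA_GRID;
    }

    public static void main(String[] args) {
        Location s1 = new Location("node-1", "dpe-1");
        Location s2 = new Location("node-1", "dpe-2");
        Location s3 = new Location("node-2", "dpe-3");
        System.out.println(choose(s1, s1)); // IN_PROCESS_SHM
        System.out.println(choose(s1, s2)); // POSIX_SHM
        System.out.println(choose(s1, s3)); // DATA_GRID
    }
}
```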

Slide16

Heterogeneous deployment algorithm

[Figure: heterogeneous deployment built from CLARA micro-services. A processing node/DPE hosts the data-processing services; a monitor node/DPE hosts a CLARA monitor server (InfluxDB service, Grafana DB, claraweb.jlab.org) fed with JSON-format monitoring data; a DST node/DPE/process receives binary/HIPO DST data through a filter (figure labels: TLS, WS, DST); remote clients include a histogram analyzer and an archiver DB; measured latencies shown: RS 0.07 ms, WS 0.25 ms.]

CLARA micro-service: can be deployed as a separate process or as a thread within a process; multi-threaded.

CLARA transient data-stream: message passing through pub-sub middleware; no dependencies between micro-services.

[Figure: farm node running a Java DPE (FTOF, DCHBc, EC services) and a C++ DPE (DCHBg service), connected through in-process SHM and an in-memory data-grid; Pg = ... ; Pc = ...]

Slide17

Monoliths vs. Micro-services

Monolith
Pros:
Strong coupling, thus better performance
Full control of your application
Cons:
No agility for isolating, compartmentalizing, and decoupling data-processing functionalities so they can run on diverse hardware/software infrastructures
No agility for rapid development or scalability

Micro-services (behind an API gateway/proxy)
Pros:
Technology independent
Fast iterations
Small teams
Fault isolation
Scalable
Cons:
Networking complexity (distributed system)
Requires administration and real-time orchestration

The Art of Scalability, by Martin L. Abbott and Michael T. Fisher, ISBN-13: 978-0134032801.
Horizontal scaling; vertical scaling or multi-threading; Z scaling: micro-services.

Slide18

Event-Driven vs Message-Driven

[Figure: event-driven vs. message-driven communication. Event-driven: S1 broadcasts an event to whichever services subscribe. Message-driven: an S1 message has a clear destination (e.g. S2).]
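A tiny, generic Java contrast of the two patterns (plain in-memory queues, not ERSAP code): the event-driven emitter does not know its consumers, while the message-driven sender names its destination.

```java
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative contrast (not ERSAP code): an event-driven emitter broadcasts
// to whoever subscribed, while a message-driven sender addresses one
// explicitly named destination.
public class EventVsMessageSketch {
    public static void main(String[] args) {
        // Event-driven: S1 broadcasts; it does not know (or care) who listens.
        List<Queue<String>> subscribers =
                List.of(new ConcurrentLinkedQueue<>(), new ConcurrentLinkedQueue<>());
        subscribers.forEach(q -> q.add("event from S1"));

        // Message-driven: S1 sends to a clear destination, here "S2".
        Map<String, Queue<String>> mailboxes = Map.of("S2", new ConcurrentLinkedQueue<>());
        mailboxes.get("S2").add("message from S1 to S2");

        System.out.println("subscriber 0 sees: " + subscribers.get(0).poll());
        System.out.println("S2 mailbox sees:   " + mailboxes.get("S2").poll());
    }
}
```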

Slide19

Sub-event level parallelization

[Figure: CLAS12 reconstruction service graph enabling sub-event-level parallelization: FCAL, FTHODO, FTEB, MAGF, DCHB, DCTB, FTOFHB, FTOFTB, CTOFTB, EC, CVT, CND, LTCC, HTCC, RICH, EBHB, EBTB.]

Slide20

Data-quantum size and GPU occupancy

[Figure: the same CLARA heterogeneous-deployment diagram as above; on the farm node a Java DPE (FTOF, EC) and a C++ DPE (DCHBg) share data through in-process SHM and an in-memory data-grid, and the data-quantum size is set to control GPU occupancy.]

Slide21

Data-processing chain per NUMA

[Figure: the same CLARA heterogeneous-deployment diagram as above. On the farm node, one Java DPE is started per NUMA socket (NUMA 0 and NUMA 1); each DPE is started pinned to its NUMA socket and runs its own in-process-SHM data-processing chain with back-pressure control.]

Slide22

ERSAP potential actors

[Figure: CLAS12 off-line data processing as a network of potential ERSAP actors. Event Reconstruction Application: MAGF, FCAL, FTHODO, FTEB, DCHB, DCTB, FTOFHB, FTOFTB, EC, CVT, CND, BAND, LTCC, HTCC, EBHB, EBTB, feeding an event hub (EH). Downstream: Data Quality Assurance monitoring and a Physics Analysis Application (J/ψ analysis).]