Event Service, Wen Guan, University of Wisconsin

Uploaded On 2020-07-02



Presentation Transcript

Slide1

Event Service

Wen Guan
University of Wisconsin

Slide2

Content

Event Service
  Event Service Introduction
  Event Service queue setup
  Event Service Monitor
Yoda: Event Service on HPC
  Yoda on HPC
  Yoda on Edison
  Yoda on ARC

Slide3

What is Event Service

Event-level processing

Slide4

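Event Service Processing

[Diagram. Labels recovered from the figure:]
Pilot (ES job): GetJob (from Panda), GetEvents, Process, StageOut (to S3 objectstore), updateEvent
Events of the job are split into ranges: (1-10),(10-20)…(990-1000)
Pilot (Merge_job): getJob, stagein, merge, stageout (to dCache/dpm)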
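The flow in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `EventRange`, `split_into_ranges`, `run_pilot` and `merge` are hypothetical, the objectstore is modelled as a plain dictionary, and the ranges are made non-overlapping (1-10, 11-20, ...) even though the slide writes them with shared endpoints.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class EventRange:
    """Hypothetical record for one contiguous range of events."""
    first: int
    last: int
    status: str = "available"


def split_into_ranges(n_events: int, chunk: int) -> List[EventRange]:
    """Split events 1..n_events into consecutive, non-overlapping ranges."""
    return [
        EventRange(start, min(start + chunk - 1, n_events))
        for start in range(1, n_events + 1, chunk)
    ]


def run_pilot(ranges: List[EventRange], objectstore: Dict[str, Tuple[int, int]]) -> None:
    """Walk through the ES loop: get events, process, stage out, update."""
    for r in ranges:                                 # GetEvents
        output = f"events_{r.first}-{r.last}.out"    # Process (placeholder output name)
        objectstore[output] = (r.first, r.last)      # StageOut to the objectstore
        r.status = "finished"                        # updateEvent


def merge(objectstore: Dict[str, Tuple[int, int]]) -> List[int]:
    """Merge job: read every per-range output and combine them in event order."""
    merged: List[int] = []
    for key in sorted(objectstore, key=lambda k: objectstore[k][0]):
        first, last = objectstore[key]
        merged.extend(range(first, last + 1))
    return merged


ranges = split_into_ranges(1000, 10)
store: Dict[str, Tuple[int, int]] = {}
run_pilot(ranges, store)
merged_events = merge(store)
print(len(ranges), "ranges,", len(merged_events), "events merged")
```

The point of the sketch is that each range is staged out and reported on its own, so a merge step is needed afterwards to produce a single output.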

Slide5

Difference between ES job and Normal Job

Pilot runs getJob to request work from Panda.
A payload is returned from Panda, which can be normal or ES work.
  eventService=True for an ES job. A normal job doesn't have it.
Pilot parses the payload.
Pilot automatically selects different processes for different jobs.
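A minimal sketch of that selection step, assuming the payload has already been parsed into a dictionary. The function names and the returned labels are hypothetical; only the `eventService=True` flag comes from the slide.

```python
def is_event_service(payload: dict) -> bool:
    """Return True when the payload carries eventService=True.

    The value is compared as text so that both the string "True"
    and the boolean True are accepted (an assumption about the format).
    """
    return str(payload.get("eventService", "")).strip().lower() == "true"


def select_process(payload: dict) -> str:
    """Pick a processing path; the labels are placeholders for illustration."""
    return "event_service" if is_event_service(payload) else "normal"


print(select_process({"PandaID": "1", "eventService": "True"}))
print(select_process({"PandaID": "2"}))
```

A normal job simply lacks the key, so the absence of `eventService` is treated the same as a false value.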

Slide6

Define ES Queue

Difference from a normal queue:
corecount
  Can be 1, but cannot be None.
catchall:
  localEsMerge
jobseed=es, std (non-ES) or all (ES and non-ES)
  jobseed is used by Panda to schedule ES jobs to the queue.
Attach an objectstore (OS)
  If no OS is attached, ES jobs will fail.
  In AGIS, associate the OS with the queue.
  A default OS is already attached to a queue.
  Example: https://atlas-agis.cern.ch/agis/pandaqueue/detail/Arizona_ES/full/
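The queue requirements above can be expressed as a small check. This is a sketch under assumptions: the queue is represented as a plain dictionary, and the key names `corecount`, `jobseed` and `objectstores` are chosen for illustration rather than taken from the AGIS schema.

```python
from typing import List

VALID_JOBSEEDS = {"es", "std", "all"}


def check_es_queue(queue: dict) -> List[str]:
    """Return a list of problems that would stop ES jobs running on a queue."""
    problems: List[str] = []

    # corecount may be 1, but it must be set.
    if queue.get("corecount") is None:
        problems.append("corecount must not be None")

    # jobseed decides whether Panda schedules ES jobs to the queue.
    jobseed = queue.get("jobseed")
    if jobseed not in VALID_JOBSEEDS:
        problems.append(f"jobseed must be one of {sorted(VALID_JOBSEEDS)}")
    elif jobseed == "std":
        problems.append("jobseed=std accepts only non-ES jobs")

    # Without an objectstore the ES job fails at stage-out.
    if not queue.get("objectstores"):
        problems.append("no objectstore attached")

    return problems


good = {"corecount": 1, "jobseed": "es", "objectstores": ["default-os"]}
bad = {"corecount": None, "jobseed": "std", "objectstores": []}
print(check_es_queue(good))
print(check_es_queue(bad))
```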

Slide7

Attach OS to queue (1)

Slide8

Attach OS to queue (2)

Slide9

Event Service Monitor


Slide10

Event Service Monitor


Slide11

Event Service Monitor


Slide12

Event Service Monitor


Slide13

Summary

Easy to set up an ES queue.
Documentation is available, and comments on it are welcome.
  https://twiki.cern.ch/twiki/bin/view/PanDA/EventServiceOperations
  Includes OS setup for a queue.
  Also some debug info.
https://twiki.cern.ch/twiki/bin/view/PanDA/EventServer
Help: atlas-comp-event-service@cern.ch
Already many ES queues.

Slide14

Content

Event Service
  Event Service Introduction
  Event Service queue setup
  Event Service Monitor
Yoda: Event Service on HPC
  Yoda on HPC
  Yoda on Edison
  Yoda on ARC

Slide15

Yoda on HPC

Purpose
  Make use of HPC with many CPUs in one job.
  HPC nodes have no outbound internet connection, which prevents us from running conventional ES.
Yoda:
  Run ES as a single MPI job.

Slide16

Schematic view of Yoda


Slide17

Yoda on NERSC (in production)

[Diagram. Labels recovered from the figure:]
Frontend (login machine)
  Pilot: RunJobHPCEvent
    getJob (from Panda), stageIn, getEventRanges, getOutputs (from HPCManager), stageOut
  HPCManager with slurm Plugin
    submit job, poll job, poll outputs
HPC cluster
  HPCJob: Yoda (Rank 0), Droid (Rank 1) ... Droid (Rank n)
Shared file system
  Input files, PFC, job.json, events.json
  outputs
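The Yoda and Droid roles in the diagram can be illustrated with a single-process simulation. This is not MPI code and not the Yoda implementation: the function name, the output file names and the round-robin hand-out are all assumptions made to keep the sketch self-contained.

```python
from collections import deque
from typing import Dict, List, Tuple


def run_yoda(event_ranges: List[Tuple[int, int]], n_droids: int) -> Dict[int, List[str]]:
    """Simulate rank 0 (Yoda) handing event ranges to ranks 1..n (Droids).

    Returns, per droid rank, the output files it would write to the
    shared file system. Ranges are handed out round-robin here.
    """
    pending = deque(event_ranges)
    outputs: Dict[int, List[str]] = {rank: [] for rank in range(1, n_droids + 1)}
    while pending:
        for rank in outputs:
            if not pending:
                break
            first, last = pending.popleft()
            # The droid "processes" the range and records its output file name.
            outputs[rank].append(f"rank{rank}_events_{first}-{last}.out")
    return outputs


example_ranges = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
result = run_yoda(example_ranges, n_droids=2)
for rank, files in result.items():
    print(rank, files)
```

In the real setup the Pilot on the login machine would then collect these outputs through the HPCManager and stage them out, since the compute nodes cannot reach Panda or the objectstore directly.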

Slide18

Yoda on ARC (testing)

[Diagram. Labels recovered from the figure:]
Frontend (login machine)
  HPCManager with slurm Plugin
    submit job, poll job, poll outputs
HPC cluster
  HPCJob: Yoda (Rank 0), Droid (Rank 1) ... Droid (Rank n)
Shared file system
  Input files, job.json, events.json
  outputs
RunJobHPCEvent
  getJob (from Panda), stageIn, getEventRanges, getOutputs (from HPCManager), stageOut
HPCManager with MPI Plugin
  mpirun
CE
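The two HPCManager plugins named in the diagrams suggest a common interface with one implementation per launch method. The sketch below is an assumption about how such a split could look: the class names, the method `submit_command` and the script name `yoda_job.sh` are hypothetical, while `sbatch --ntasks` and `mpirun -np` are standard Slurm and MPI launcher options.

```python
from abc import ABC, abstractmethod
from typing import List


class HPCManagerPlugin(ABC):
    """Hypothetical interface: build the command that launches the HPC job."""

    @abstractmethod
    def submit_command(self, script: str, n_ranks: int) -> List[str]:
        ...


class SlurmPlugin(HPCManagerPlugin):
    """Submit through the batch system from the login machine."""

    def submit_command(self, script: str, n_ranks: int) -> List[str]:
        return ["sbatch", f"--ntasks={n_ranks}", script]


class MPIPlugin(HPCManagerPlugin):
    """Start the ranks directly with mpirun inside an existing allocation."""

    def submit_command(self, script: str, n_ranks: int) -> List[str]:
        return ["mpirun", "-np", str(n_ranks), script]


for plugin in (SlurmPlugin(), MPIPlugin()):
    print(" ".join(plugin.submit_command("yoda_job.sh", 4)))
```

Keeping the launch method behind one interface would let the same RunJobHPCEvent logic run either from a login machine or inside a job started by a CE.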

Slide19

Yoda on ARC (testing)

[Diagram. Labels recovered from the figure:]
HPC cluster
  HPCJob: Yoda (Rank 0), Droid (Rank 1) ... Droid (Rank n)
Shared file system
  Pilot, input files, job.json, events.json
  outputs
CE
ARC Control Tower
Release the interactive node

Slide20

Summary

Yoda is an ES solution on HPC.
Production running on NERSC.
  Since last year on Edison HPC.
  Switched from PBS to Slurm on Edison.
  Tested on the new NERSC Cori system.
Yoda on ARC.
  Releases the interactive node.
  Simulated on NERSC Edison.
  Integration testing with ARC-CT.
  Will be tested on ARC sites.