
Slide1

Xrootd usage @ LHC

An up-to-date technical survey about xrootd-based storage solutions

Slide2

Outline

Intro

Main use cases in the storage arena

Generic Pure xrootd @ LHC
The Atlas@SLAC way
The Alice way
CASTOR2
Roadmap
Conclusions

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide3

Introduction and use cases

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide4

The historical Problem: data access

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Physics experiments rely on rare events and statistics

Huge amount of data to get a significant number of events
The typical data store can reach 5-10 PB… now
Millions of files, thousands of concurrent clients

The transaction rate is very high

Not uncommon: O(10^3) file opens/sec per cluster
Average, not peak

Traffic sources: local GRID site, local batch system, WAN

Up to O(10^4) clients per server!

If these demands are not met, the outcome is:

Crashes, instability, workarounds, “need” for crazy things

Scalable high performance direct data access

No imposed limits on performance and size, connectivity

Higher performance, supports WAN direct data access

Avoids WN under-utilization

No need to make inefficient local copies when they are not needed

Do we fetch entire websites to browse one page?

Slide5

The Challenges

LHC User Analysis - Boundary Conditions:
GRID environment: GSI authentication, user-space deployment
High I/O load, moderate namespace load
Many clients: O(1000-10000)
Sequential and sparse file access
Basic analysis (today): RAW, ESD
Advanced analysis (tomorrow): ESD, AOD, Ntuple, histograms
Batch and interactive data access
RAP (Remote Access Protocol): root, dcap, rfio ....

T0/T3 @ CERN - Boundary Conditions:
CC environment: Kerberos, admin deployment
Preferred interface is MFS (Mounted File Systems): easy, intuitive, fast response, standard applications
Moderate I/O load, high namespace load (compilation, software startup, searches)
Fewer clients: O(#users)

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide6

Main requirement

Data access has to work reliably at the desired scale

This also means:

It must not waste resources
F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide7

A simple use case

I am a physicist, waiting for the results of my analysis jobs

Many bunches, several outputs

They will be saved, e.g., to an SE at CERN
My laptop is configured to show histograms etc. with ROOT
I leave for a conference; the jobs finish while I am on the plane
Once there, I want to simply draw the results from my home directory
Once there, I want to save my new histos in the same place

I have no time to lose tweaking things to get a copy of everything. I lose copies in the confusion.

I want to leave the things where they are.

I know nothing about things to tweak.

What can I expect? Can I do it?

F. Furano, A. Hanushevsky - Scalla/xrootd WAN globalization tools: where we are. (CHEP09)
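As a rough illustration of this use case, the sketch below uses ROOT's TFile::Open with root:// URLs to draw a histogram directly from a remote SE and to write a new one back to the same place, with no local copies. The server name, file paths and histogram names are placeholders, not real CERN endpoints.

```cpp
// draw_and_save.C -- minimal ROOT macro sketching direct remote access via xrootd.
// The server name, paths and object names below are hypothetical placeholders.
#include "TFile.h"
#include "TH1F.h"
#include "TCanvas.h"

void draw_and_save()
{
   // Open the job output directly on the remote SE through the xrootd protocol.
   TFile *in = TFile::Open("root://some-se.example.org//store/user/me/job_output.root");
   if (!in || in->IsZombie()) return;

   // Draw a histogram produced by the analysis jobs.
   TH1F *h = nullptr;
   in->GetObject("hInvMass", h);   // "hInvMass" is an assumed histogram name
   if (h) {
      TCanvas *c = new TCanvas("c", "results");
      h->Draw();
   }

   // Save a new histogram next to the originals, again without a local copy.
   TFile *out = TFile::Open("root://some-se.example.org//store/user/me/new_histos.root",
                            "RECREATE");
   if (out) {
      TH1F hnew("hSelected", "selected events", 100, 0., 10.);
      hnew.Write();
      out->Close();
   }
   in->Close();
}
```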

Slide8

Another use case

ALICE analysis on the GRID

Each job reads ~100-150MB from ALICE::CERN::SE

These are conditions data accessed directly, not file copies
i.e. VERY efficient: one job reads only what it needs
It just works, no workarounds
At 10-20 MB/s it takes 5-10 secs

(most common case)

At 5MB/s it takes 20secs

At 1 MB/s it takes 100 secs

Sometimes data are accessed elsewhere

AliEn allows saving a job by making it read data from a different site, with very good performance

Quite often the results are written/merged elsewhere

F. Furano, A. Hanushevsky - Scalla/xrootd WAN globalization tools: where we are. (CHEP09)

Slide9

Pure Xrootd

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide10

xrootd Plugin Architecture

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

[Diagram: the xrootd server plugin stack]
Protocol driver (XRD)
Protocol plugins (1 of n): xrootd
Clustering: cmsd
Authentication: gsi, krb5, etc.
Authorization: name based
lfn2pfn: prefix encoding
File system: ofs, sfs, alice, etc.
Storage system: oss, drm/srm, etc.

Slide11

The client side

Fault tolerance in data access

Meets WAN requirements, reduces jobs mortality

Connection multiplexing (authenticated sessions)
Up to 65536 parallel r/w requests at once per client process
Up to 32767 open files per client process
Opens bunches of up to O(1000) files at once, in parallel

Full support for huge bulk prestages
Smart r/w caching
Supports normal readahead and “Informed Prefetching”

Asynchronous background writes

Boosts writing performance in LAN/WAN

Sophisticated integration with ROOT

Reads in advance the “right” chunks while the app computes the preceding ones
Boosts read performance in LAN/WAN (up to the same order)
F.Furano (CERN IT-DM) - Xrootd usage @ LHC
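The “informed prefetching” side of this ROOT integration can be sketched from the application side with ROOT's TTreeCache: the application declares which branches it will read, so the corresponding chunks can be fetched in bulk ahead of the computation. A minimal sketch; the server, file, tree and branch names are assumed placeholders.

```cpp
// prefetch_sketch.C -- sketch of ROOT-side read-ahead over an xrootd connection.
// Server, file, tree and branch names are hypothetical placeholders.
#include "TFile.h"
#include "TTree.h"

void prefetch_sketch()
{
   TFile *f = TFile::Open("root://some-se.example.org//store/data/run123.root");
   if (!f || f->IsZombie()) return;

   TTree *t = nullptr;
   f->GetObject("events", t);          // assumed tree name
   if (!t) return;

   // Give ROOT a 30 MB cache and declare which branches the analysis will read:
   // the cache can then prefetch the right chunks in large reads while the
   // application computes on the preceding ones.
   t->SetCacheSize(30 * 1024 * 1024);
   t->AddBranchToCache("pt", kTRUE);
   t->AddBranchToCache("eta", kTRUE);

   Long64_t n = t->GetEntries();
   for (Long64_t i = 0; i < n; ++i)
      t->GetEntry(i);                  // reads are served from the prefetched cache

   f->Close();
}
```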

Slide12

The Xrootd “protocol”

The XRootD protocol is a good one
Efficient, clean, supports fault tolerance, etc.
It doesn't do any magic, however
It does not multiply your resources
It does not overcome hw bottlenecks

BUT it allows the true usage of the hw resources

One of the aims of the project still is sw quality
In the carefully crafted pieces of sw which come with the distribution

What makes the difference with Scalla/XRootD is:
Implementation details (performance + robustness)
And bad performance can hurt robustness (and vice versa)
SW architecture (scalability + performance + robustness)
Designed to fit the HEP requirements
You need a clean design in which to insert it
Born with efficient direct access in mind
But with the requirements of high performance computing
Copy-like access becomes a particular case
F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide13

Pure Xrootd @ LHC

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide14

The Atlas@SLAC way with XROOTD

Pure Xrootd + Xrootd-based “filesystem” extension
Adapters to talk to BestMan SRM and GridFTP

More details in A.Hanushevsky's talk @ CHEP09

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

[Diagram: a Scalla cluster (xrootd / cmsd / cnsd) of data servers behind the firewall, exported through FUSE and a FUSE adapter to GridFTP and SRM, serving GRID clients outside]

Slide15

The ALICE way with XROOTD

Pure Xrootd + ALICE strong authz plugin
No difference among T1/T2 (only size and QoS)
WAN-wide globalized deployment, very efficient direct data access
CASTOR at Tier-0 serving data, Pure Xrootd serving conditions to the GRID jobs
“Old” DPM+Xrootd in several Tier-2s

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

[Diagram: a globalized cluster of xrootd sites (GSI, CERN, any other) federated under the ALICE global redirector (xrootd + cmsd)]
Local clients work normally at each site
Missing a file? Ask the global redirector
Get redirected to the right collaborating cluster, and fetch it. Immediately.
A smart client could point directly at the global redirector
A Virtual Mass Storage System… built on data globalization

More details and complete info in “Scalla/Xrootd WAN globalization tools: where we are.” @ CHEP09

Slide16

CASTOR2

Putting everything together @ Tier0/1s

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide17

The CASTOR way

Client connects to a redirector node

The redirector asks CASTOR where the file is

Client then connects directly to the node holding the data
CASTOR handles tapes in the back-end
F.Furano (CERN IT-DM) - Xrootd usage @ LHC

[Diagram: the client asks the redirector to open file X; the redirector asks CASTOR "Where is X?" and gets "On C"; the client is told "Go to C" and reads directly from disk server C; the tape backend triggers migration/recall as needed]
Credits: S.Ponce (IT-DM)
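The redirection handshake above can be modelled in a few lines. The sketch below is only a conceptual simulation of the message flow (it does not use the real XRootD client or CASTOR APIs, and the file path and node names are invented).

```cpp
// redirect_flow.cpp -- conceptual model of the CASTOR/xrootd open redirection.
// This simulates the message flow only; names and the "stager" lookup are invented.
#include <iostream>
#include <map>
#include <string>

// The CASTOR stager knows on which disk server each staged file lives.
std::string stager_lookup(const std::string& path) {
    static const std::map<std::string, std::string> location = {
        {"/castor/user/alice/fileX.root", "diskserverC"}};
    auto it = location.find(path);
    return it != location.end() ? it->second : "";   // empty: file not on disk
}

// The redirector never serves data itself: it answers "go to <server>".
std::string redirector_open(const std::string& path) {
    std::string node = stager_lookup(path);
    if (node.empty()) {
        std::cout << "redirector: file not staged, triggering recall from tape\n";
        node = "diskserverC";   // after the recall the stager reports a disk server
    }
    return node;
}

int main() {
    const std::string path = "/castor/user/alice/fileX.root";
    std::cout << "client: open " << path << " via the redirector\n";
    std::string node = redirector_open(path);
    std::cout << "redirector: go to " << node << "\n";
    std::cout << "client: connects to " << node << " and reads the data directly\n";
}
```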

Slide18

CASTOR 2.1.8
Improving Latency - Read

1st focus on file (read) open latencies
[Chart: read open latencies (ms, log scale 1-1000) for Castor 2.1.7 (rfio), Castor 2.1.8 (xroot) and Castor 2.1.9 (xroot, estimate), compared to the network latency limit; October 2008]
Credits: A.Peters (IT-DM)
F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide19

CASTOR 2.1.8
Improving Latency - Metadata Read

Next focus on metadata (read) latencies
[Chart: stat latencies (ms, log scale 1-1000) for Castor 2.1.7, Castor 2.1.8 and Castor 2.1.9 (estimate), compared to the network latency limit; October 2008]
Credits: A.Peters (IT-DM)
F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide20

Prototype - Architecture

XCFS Overview - xroot + FUSE

[Diagram: XCFS prototype architecture]
CLIENT: a generic application with POSIX access to /xcfs goes through glibc, the VFS and /dev/fuse into libfuse; xcfsd (the FUSE LL implementation) maps the calls onto libXrdPosix (XROOT Posix library) and libXrdClient (XROOT client library)
MD SERVER: xrootd server daemon + libXrdSec<plugin>, with libXrdCatalogOfs/libXrdCatalogFs as name space provider (metadata filesystem on XFS) and libXrdCatalogAuthz as strong auth plugin, issuing capabilities
DISK SERVER: xrootd + libXrdSecUnix with authz, serving the DATA FS
Remote Access Protocol (ROOT plugs here)
Credits: A.Peters (IT-DM)

F.Furano (CERN IT-DM) - Xrootd usage @ LHC
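The point of the FUSE layer is that an unmodified application keeps its ordinary POSIX calls; a minimal sketch, assuming a hypothetical /xcfs mount point and file path:

```cpp
// posix_read.cpp -- a generic application reading from the FUSE-mounted namespace.
// /xcfs and the file path are hypothetical; no xrootd-specific code is needed here,
// the FUSE daemon translates these POSIX calls into xroot protocol requests.
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("/xcfs/user/jdoe/results.root", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));   // served remotely via the xroot client
    if (n >= 0)
        std::printf("read %zd bytes through the FUSE mount\n", n);

    close(fd);
    return 0;
}
```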

Slide21

Early

Prototype

- Evaluation

Meta Data Performance

File Creation*: ~1.000/s
File Rewrite: ~2.400/s
File Read: ~2.500/s
Rm: ~3.000/s
Readdir/Stat Access: Σ = 70.000/s

*These values have been measured executing shell commands on 216 mount clients. Creation performance decreases with the filling of the namespace on a spinning medium. Using an XFS filesystem over a DRBD block device in a high-availability setup, file creation performance stabilizes at 400/s (20 Mio files in the namespace).

Credits: A.Peters (IT-DM)
F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide22

Network usage (or waste!)

Network traffic is an important factor: it has to match the ratio IO(CPU server) / IO(disk server)
Too much unneeded traffic means fewer clients supported (serious bottleneck: 1 client works well, 100-1000 clients do not at all)
Lustre doesn't disable readahead during forward-seeking access and transfers the complete file if reads are found in the buffer cache (the readahead window starts at 1 MB and scales up to 40 MB)

XCFS/LUSTRE/NFS4 network volume without read-ahead is based on 4k pages in Linux

Most of the requests are not page aligned and result in additional pages being transferred (avg. read size 4k), hence they transfer twice as much data (but XCFS can skip this now!)

A 2nd execution plays no real role for analysis, since datasets are usually bigger than the client buffer cache

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Credits: A.Peters (IT-DM) - ACAT2008
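A small worked example of the page-alignment effect described above; the 4k granularity and average read size come from the slide, while the read offset is an arbitrary illustration.

```cpp
// page_alignment.cpp -- why unaligned 4k reads roughly double the network volume
// when the transport works in 4k pages. Offset and sizes are illustrative only.
#include <cstdio>
#include <cstdint>

int main() {
    const uint64_t page = 4096;          // transfer granularity (4k pages)
    const uint64_t read_size = 4096;     // average application read size
    const uint64_t offset = 1500;        // a typical non-aligned read offset

    // The read [offset, offset + read_size) touches these pages:
    uint64_t first_page = offset / page;
    uint64_t last_page  = (offset + read_size - 1) / page;
    uint64_t transferred = (last_page - first_page + 1) * page;

    std::printf("requested %llu bytes, transferred %llu bytes (%.1fx)\n",
                (unsigned long long)read_size,
                (unsigned long long)transferred,
                (double)transferred / read_size);   // prints 2.0x for this case
    return 0;
}
```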

Slide23

Why is that useful?

Users can access data by LFN without specifying the stager
Users are automatically directed to 'their' pool with write permissions

CASTOR 2.1.8-6

Cross Pool Redirection

[Diagram: T3 and T0 stagers, each an xrootd/cmsd manager with its disk servers, clustered under a meta manager that also holds the name space (X = xrootd, cmsd = cluster management)]

Example configuration:
T3 pool subscribed: r/w for /castor/user, r/w for /castor/cms/user/
T0 pool subscribed: ro for /castor, ro for /castor/cms/data

There are even more possibilities if a part of the namespace can be assigned to individual pools for write operations.
Credits: A.Peters (IT-DM)

F.Furano (CERN IT-DM) - Xrootd usage @ LHC
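The pool subscriptions above can be thought of as a longest-prefix match on the namespace, filtered by the requested access mode. The sketch below is a conceptual illustration of that idea using the example configuration from the slide; it is not the actual cmsd/CASTOR logic.

```cpp
// pool_selection.cpp -- conceptual model of cross-pool redirection by namespace prefix.
// An illustration of the idea only, using the example subscriptions from the slide.
#include <iostream>
#include <string>
#include <vector>

struct Subscription {
    std::string pool;    // e.g. "T3" or "T0"
    std::string prefix;  // namespace prefix the pool is subscribed to
    bool writable;       // r/w vs ro
};

// Example configuration from the slide.
const std::vector<Subscription> subs = {
    {"T3", "/castor/user",      true},
    {"T3", "/castor/cms/user/", true},
    {"T0", "/castor",           false},
    {"T0", "/castor/cms/data",  false},
};

// Pick the pool with the longest matching prefix that allows the requested access.
std::string choose_pool(const std::string& lfn, bool want_write) {
    std::string best;
    size_t best_len = 0;
    for (const auto& s : subs) {
        if (lfn.compare(0, s.prefix.size(), s.prefix) != 0) continue;
        if (want_write && !s.writable) continue;
        if (s.prefix.size() >= best_len) { best = s.pool; best_len = s.prefix.size(); }
    }
    return best.empty() ? "no pool" : best;
}

int main() {
    std::cout << choose_pool("/castor/cms/user/jdoe/h.root", true)  << "\n"; // T3
    std::cout << choose_pool("/castor/cms/data/run1.root",   false) << "\n"; // T0
    std::cout << choose_pool("/castor/cms/data/run1.root",   true)  << "\n"; // no pool
}
```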

Slide24

Towards a Production Version
Further Improvements - Security

GSI/VOMS authentication plugin prototype developed
Based on pure OpenSSL, additionally using code from mod_ssl & libgridsite
Significantly faster than the GLOBUS implementation

After Security Workshop with A.Hanushevsky

Virtual Socket Layer introduced into the xrootd authentication plugin base, to allow socket-oriented authentication over the xrootd protocol layer
Final version should be based on OpenSSL and the VOMS library

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide25

The roadmap

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide26

XROOT Roadmap @CERN

XROOT is strategic for scalable analysis support with CASTOR at CERN / T1s

will support other file access protocols until they become obsolete

CASTOR
Secure RFIO has been released in 2.1.8
Deployment impact in terms of CPU may be significant
Secure XROOT is default in 2.1.8 (Kerb. or X509)

Expected lower CPU cost than rfio due to the session model

No plans to provide un-authenticated access via XROOT

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide27

XROOTD Roadmap

CASTOR

Secure RFIO has been released in 2.1.8

Deployment impact in terms of CPU may be significant
Secure XROOT is default in 2.1.8 (Kerb. or X509)
Expected lower CPU cost than rfio due to the session model

No plans to provide un-authenticated access via XROOT

DPM

Support for authentication via xrootd is scheduled; certification starts at the beginning of July

dCache

Relies on a custom full re-implementation of the XROOTD protocol

Protocol docs have been updated by A. Hanushevsky

In contact with the CASTOR/DPM team to add authentication/authorisation on the server side

evaluating common client plug-in / security protocol

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide28

Conclusion

A very dense roadmap

Many, many tech details

Heading for:
Solid and high performance data access
For production and analysis
More advanced user analysis scenarios
Need to match existing architectures, protocols and workarounds

F.Furano (CERN IT-DM) - Xrootd usage @ LHC

Slide29

Thank you

Questions?