The LHC Computing Challenge



Presentation Transcript

Slide1

The LHC Computing Challenge

Tim Bell
Fabric Infrastructure & Operations Group
Information Technology Department
CERN
2nd April 2009

Slide2

The Four LHC Experiments…

ATLAS: general purpose (origin of mass, supersymmetry); 2,000 scientists from 34 countries.

CMS: general purpose (origin of mass, supersymmetry); 1,800 scientists from over 150 institutes.

ALICE: heavy-ion collisions, to create quark-gluon plasmas; 50,000 particles in each collision.

LHCb: to study the differences between matter and antimatter; will detect over 100 million b and b-bar mesons each year.

Slide3

… generate lots of data …

The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors

Slide4

… generate lots of data … reduced by online computers to a few hundred “good” events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec: ~15 PetaBytes per year for all four experiments.
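To put the ~15 PB/year figure in perspective, here is a minimal back-of-the-envelope sketch in Python (averaging over a full calendar year is my simplification; real data taking is not continuous, so instantaneous recording rates are higher):

```python
# Sketch: what sustained rate does ~15 PB/year correspond to?
annual_volume_bytes = 15e15                    # ~15 PB recorded per year
seconds_per_year = 365 * 24 * 3600
avg_rate_mb_s = annual_volume_bytes / seconds_per_year / 1e6
print(f"~{avg_rate_mb_s:.0f} MB/s sustained")  # ~476 MB/s, inside the quoted 100-1,000 MB/s band
```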

Slide5

Data Handling and Computation for Physics Analysis

[Diagram: at CERN the detector feeds an event filter (selection & reconstruction); raw data, event reprocessing and event simulation produce event summary data and other processed data, which batch and interactive physics analysis reduce to analysis objects (extracted by physics topic).]

Slide6

… leading to a high box count

CPU: ~2,500 PCs
Disk and tape: another ~1,500 boxes

Slide7

Computing Service Hierarchy

Tier-0 (the accelerator centre):
Data acquisition & initial processing
Long-term data curation
Distribution of data to the Tier-1 centres

The Tier-1 centres:
Canada – TRIUMF (Vancouver)
France – IN2P3 (Lyon)
Germany – Forschungszentrum Karlsruhe
Italy – CNAF (Bologna)
Netherlands – NIKHEF/SARA (Amsterdam)
Nordic countries – distributed Tier-1
Spain – PIC (Barcelona)
Taiwan – Academia Sinica (Taipei)
UK – CLRC (Oxford)
US – FermiLab (Illinois) and Brookhaven (NY)

Tier-1 role:
“Online” to the data acquisition process, so high availability
Managed mass storage
Data-heavy analysis
National and regional support

Tier-2 (~100 centres in ~40 countries):
Simulation
End-user analysis, batch and interactive
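As a compact restatement of the hierarchy above, a small illustrative data structure (the names and roles come from the slide; the structure itself is only a sketch, not anything CERN uses):

```python
# Illustrative sketch only: the WLCG tier model as a plain data structure,
# summarising the roles listed on the slide.
tier_model = {
    "Tier-0 (CERN)": [
        "data acquisition & initial processing",
        "long-term data curation",
        "distribution of data to Tier-1 centres",
    ],
    "Tier-1 (the 11 national/regional centres listed above)": [
        "online to the data acquisition process (high availability)",
        "managed mass storage",
        "data-heavy analysis",
        "national and regional support",
    ],
    "Tier-2 (~100 centres in ~40 countries)": [
        "simulation",
        "end-user analysis (batch and interactive)",
    ],
}

for tier, roles in tier_model.items():
    print(f"{tier}: {'; '.join(roles)}")
```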

Slide8

The Grid: Timely Technology!

Deployed to meet LHC computing needs. Challenges for the Worldwide LHC Computing Grid project: its worldwide nature… competing middleware… the newness of the technology… the scale…

Slide9

Interoperability in action

Slide10

Reliability

Site reliability of the Tier-2 sites: 83 Tier-2 sites are being monitored.

Slide11

Why Linux?

1990s – Unix wars: 6 different Unix flavours.
Linux allowed all users to align behind a single OS which was low cost and dynamic.
Scientific Linux is based on Red Hat with extensions of key usability and performance features: the AFS global file system and the XFS high-performance file system.
But how to deploy without proprietary tools?
See the EDG/WP4 report on current technology (http://cern.ch/hep-proj-grid-fabric/Tools/DataGrid-04-TED-0101-3_0.pdf) or “Framework for Managing Grid-enabled Large Scale Computing Fabrics” (http://cern.ch/quattor/documentation/poznanski-phd.pdf) for reviews of various packages.

Slide12

Deployment

Commercial management suites:
(Full) Linux support was rare (5+ years ago…).
Much work needed to deal with specialist HEP applications; insufficient reduction in staff costs to justify license fees.

Scalability:
5,000+ machines to be reconfigured
1,000+ new machines per year
Configuration change rate of 100s per day

See the EDG/WP4 report on current technology (http://cern.ch/hep-proj-grid-fabric/Tools/DataGrid-04-TED-0101-3_0.pdf) or “Framework for Managing Grid-enabled Large Scale Computing Fabrics” (http://cern.ch/quattor/documentation/poznanski-phd.pdf) for reviews of various packages.

Slide13

Dataflows and rates (remember this figure)

[Dataflow diagram showing average rates of 1430 MB/s, 700 MB/s, 1120 MB/s, 700 MB/s and 420 MB/s, with additional figures of (1600 MB/s) and (2000 MB/s).]

These are averages! Need to be able to support 2x for recovery! Scheduled work only!
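A trivial sketch of the 2x-headroom point (which link each average belongs to is lost in this transcript, so the list simply reuses the quoted figures):

```python
# The quoted flows are averages; the slide stresses that the fabric must
# sustain roughly twice these rates to recover a backlog after interruptions.
average_rates_mb_s = [1430, 1120, 700, 700, 420]
required_mb_s = [2 * r for r in average_rates_mb_s]
print(required_mb_s)   # [2860, 2240, 1400, 1400, 840] MB/s of provisioned capacity
```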

Slide14

Volumes & Rates

15 PB/year; peak rate to tape >2 GB/s.
3 full SL8500 robots/year.
Requirement in the first 5 years to reread all past data between runs: 60 PB in 4 months, i.e. 6 GB/s.
Drives can run at a sustained 80 MB/s, so 75 drives flat out merely for this controlled access.
Data volume has an interesting impact on the choice of technology: media use is advantageous, so high-end technology (3592, T10K) is favoured over LTO.
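As a quick cross-check of those reread figures, a minimal sketch (30-day months assumed for simplicity):

```python
# Cross-check of the reread requirement: 60 PB in 4 months (30-day months assumed).
volume_bytes = 60e15
window_s = 4 * 30 * 24 * 3600
rate_gb_s = volume_bytes / window_s / 1e9
drives_needed = (rate_gb_s * 1e9) / 80e6       # tape drives at a sustained 80 MB/s each
print(f"{rate_gb_s:.1f} GB/s -> {drives_needed:.0f} drives")
# ~5.8 GB/s -> ~72 drives, consistent with the ~6 GB/s and 75 drives quoted above
```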

Slide15

Castor Architecture (detailed view)

[Architecture diagram: clients interact with the Stager through the RH and RR components and its Scheduler; central services include the NameServer, VDQM and VMGR; the disk cache subsystem comprises the Stager with its DB, Job, Qry and Error services, MigHunter, GC, RTCPClientD, StagerJob and the disk servers running Movers, backed by a database; the tape archive subsystem comprises the tape servers running the TapeDaemon and RTCPD.]

Slide16

Castor Performance

Slide17

Long lifetime

LEP, CERN’s last accelerator, started in 1989 and shut down 10 years later.
The first data were recorded to IBM 3480s; at least 4 different technologies were used over the period.
All data ever taken, right back to 1989, were reprocessed and reanalysed in 2001/2.
LHC starts in 2007 and will run until at least 2020. What technologies will be in use in 2022 for the final LHC reprocessing and reanalysis?
Data repacking is required every 2-3 years: it is time consuming, and data integrity must be maintained.

Slide18

Disk capacity & I/O rates

Year   Drive capacity   Drive I/O   Aggregate I/O per 1 TB
1996   4 GB             10 MB/s     250 x 10 MB/s = 2,500 MB/s
2000   50 GB            20 MB/s     20 x 20 MB/s  =   400 MB/s
2006   500 GB           60 MB/s     2 x 60 MB/s   =   120 MB/s

CERN now purchases two different storage server models, capacity oriented and throughput oriented; this fragmentation increases management complexity (and purchase overhead has also increased…).
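A minimal sketch reproducing the per-terabyte I/O trend from the table above; it is this widening gap between capacity and per-drive bandwidth that motivates the split into capacity-oriented and throughput-oriented servers:

```python
# Aggregate I/O available per TB of disk, from the table above: capacity grows
# ~125x over the decade while per-drive bandwidth grows only ~6x.
generations = {            # year: (drive capacity in GB, drive bandwidth in MB/s)
    1996: (4, 10),
    2000: (50, 20),
    2006: (500, 60),
}
for year, (cap_gb, bw_mb_s) in generations.items():
    drives_per_tb = 1000 / cap_gb
    print(f"{year}: {drives_per_tb:.0f} drives/TB -> {drives_per_tb * bw_mb_s:.0f} MB/s per TB")
# 1996: 250 drives/TB -> 2500 MB/s; 2000: 20 -> 400 MB/s; 2006: 2 -> 120 MB/s
```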

Slide19

… and backup – TSM on Linux

Daily backup volumes of around 18 TB to 10 Linux TSM servers.
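For scale, a minimal sketch of the per-server load this implies (an even spread across the 10 servers is my assumption, not stated on the slide):

```python
# Rough per-server load implied by ~18 TB/day across 10 TSM servers
# (an even spread is assumed purely for illustration).
daily_volume_bytes = 18e12
servers = 10
avg_mb_s = daily_volume_bytes / servers / 86400 / 1e6
print(f"~{avg_mb_s:.0f} MB/s sustained per server")   # ~21 MB/s averaged over 24 hours
```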

Slide20

Capacity Requirements

Slide21

Power Outlook

Slide22

Summary

Immense challenges & complexity: data rates, developing software, lack of standards, worldwide collaboration, …
Considerable progress in the last ~5-6 years: the WLCG service exists and petabytes of data have been transferred.
But more data is coming in November…
Will the system cope with chaotic analysis?
Will we understand the system well enough to identify problems and fix their underlying causes?
Can we meet the requirements given the power available?