Accelerating Science with OpenStack


Presentation Transcript

Slide1

Accelerating Science with OpenStack

Tim Bell
Tim.Bell@cern.ch
@noggin143
OpenStack Summit San Diego
17th October 2012

Slide2

What is CERN?


Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics
Between Geneva and the Jura mountains, straddling the Swiss-French border
Founded in 1954 with an international treaty
Our business is fundamental physics: what is the universe made of and how does it work?

Slide3


Answering fundamental questions…

How to explain particles have mass? We have theories and accumulating experimental evidence… getting close…

What is 96% of the universe made of? We can only see 4% of its estimated mass!

Why isn’t there anti-matter in the universe? Nature should be symmetric.

What was the state of matter just after the “Big Bang”? Travelling back to the earliest instants of the universe would help…

Slide4

Community collaboration on an international scale


Slide5

The Large Hadron Collider


Slide6

The Large Hadron Collider (LHC) tunnel

Slide7


Slide8

Accumulating events in 2009-2011


Slide9


Slide10

Heavy Ion Collisions


Slide11


Slide12


Tier-0 (CERN): data recording, initial data reconstruction, data distribution
Tier-1 (11 centres): permanent storage, re-processing, analysis
Tier-2 (~200 centres): simulation, end-user analysis

Data is recorded at CERN and the Tier-1s and analysed in the Worldwide LHC Computing Grid.
On a normal day, the grid provides 100,000 CPU-days executing over 2 million jobs.

Slide13


Data Centre by Numbers

Hardware installation & retirement

~7,000 hardware movements/year; ~1,800 disk failures/year

High Speed Routers (640 Mbps → 2.4 Tbps)    24
Ethernet Switches                           350
10 Gbps ports                               2,000
Switching Capacity                          4.8 Tbps
1 Gbps ports                                16,939
10 Gbps ports                               558
Racks                                       828
Servers                                     11,728
Processors                                  15,694
Cores                                       64,238
HEPSpec06                                   482,507
Disks                                       64,109
Raw disk capacity (TiB)                     63,289
Memory modules                              56,014
Memory capacity (TiB)                       158
RAID controllers                            3,749
Tape Drives                                 160
Tape Cartridges                             45,000
Tape slots                                  56,000
Tape Capacity (TiB)                         73,000
IT Power Consumption                        2,456 kW
Total Power Consumption                     3,890 kW

Slide14


Slide15

Our Challenges - Data storage


>20 years retention
6 GB/s average, 25 GB/s peaks
30 PB/year to record

Slide16


45,000 tapes holding 73PB of physics data

Slide17

New data centre to expand capacity


Data centre in Geneva at the limit of electrical capacity at 3.5MW

New centre chosen in Budapest, Hungary
Additional 2.7MW of usable power
Hands-off facility
Deploying from 2013 with 200Gbit/s network to CERN

Slide18

Time to change strategy

Rationale
Need to manage twice the servers as today
No increase in staff numbers
Tools becoming increasingly brittle and will not scale as-is

Approach
CERN is no longer a special case for compute
Adopt an open source tool chain model
Our engineers rapidly iterate
Evaluate solutions in the problem domain
Identify functional gaps and challenge them
Select first choice but be prepared to change in future
Contribute new function back to the community


Slide19

Building Blocks


Toolchain building blocks shown on the slide: Bamboo; Koji, Mock; AIMS/PXE; Foreman; Yum repo, Pulp; Puppet-DB; mcollective, yum; JIRA; Lemon / Hadoop; git; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP

Slide20

Training and Support

Buy the book rather than guru mentoring
Follow the mailing lists to learn
Newcomers are rapidly productive (and often know more than us)
Community and Enterprise support means we’re not on our own

Slide21

Staff Motivation

Skills valuable outside of CERN when an engineer’s contract ends

Slide22

Prepare the move to the clouds

Improve operational efficiency
Machine ordering, reception and testing
Hardware interventions with long running programs
Multiple operating system demand

Improve resource efficiency
Exploit idle resources, especially waiting for disk and tape I/O
Highly variable load such as interactive or build machines

Enable cloud architectures
Gradual migration to cloud interfaces and workflows

Improve responsiveness
Self-service with coffee break response time


Slide23

Public Procurement Purchase Model

Step                                 Time (days)                    Elapsed (days)
User expresses requirement                                          0
Market survey prepared               15                             15
Market survey for possible vendors   30                             45
Specifications prepared              15                             60
Vendor responses                     30                             90
Test systems evaluated               30                             120
Offers adjudicated                   10                             130
Finance committee                    30                             160
Hardware delivered                   90                             250
Burn in and acceptance               30 typical (380 worst case)    280
Total                                                               280+ days


Slide24

Service Model


Pets are given names like pussinboots.cern.ch
They are unique, lovingly hand-raised and cared for
When they get ill, you nurse them back to health

Cattle are given numbers like vm0042.cern.ch
They are almost identical to other cattle
When they get ill, you get another one

Future application architectures should use Cattle, but Pets with strong configuration management are viable and still needed

Slide25

Supporting the Pets with OpenStack

Network
Interfacing with legacy site DNS and IP management
Ensuring Kerberos identity before VM start

Puppet
Ease use of configuration management tools with our users
Exploit mcollective for orchestration/delegation

External Block Storage
Currently using nova-volume with Gluster backing store

Live migration to maximise availability
KVM live migration using Gluster
KVM and Hyper-V block migration
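As an aside, a minimal sketch of what attaching Gluster-backed block storage to such a pet might look like through python-novaclient of that era; the endpoint, credentials and names below are hypothetical, not CERN's actual setup:

```python
from novaclient.v1_1 import client  # python-novaclient, Essex/Folsom era

# Hypothetical credentials and Keystone endpoint.
nova = client.Client("tbell", "secret", "IT-project",
                     "http://keystone.example.cern.ch:5000/v2.0")

# Find the "pet" VM and give it a persistent volume.
server = nova.servers.find(name="pussinboots")
volume = nova.volumes.create(size=100, display_name="pussinboots-data")
# nova-volume serves the volume from the Gluster backing store.
nova.volumes.create_server_volume(server.id, volume.id, "/dev/vdb")
```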

Slide26

Current Status of OpenStack at CERN

Working on an Essex code base from the EPEL repository
Excellent experience with the Fedora cloud-sig team
Cloud-init for contextualisation, Oz for images with RHEL/Fedora

Components
Current focus is on Nova with KVM and Hyper-V
Tests with Swift are ongoing but require significant experiment code changes

Pre-production facility with around 150 hypervisors and 2,000 VMs, integrated with CERN infrastructure, deployed with Puppet, and used for simulation of magnet placement (using LHC@Home) and for batch
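To illustrate the contextualisation step, a hedged sketch of handing a cloud-init user-data payload to Nova so a new VM enrols itself in Puppet at first boot; the image, flavor and endpoint names are invented:

```python
from novaclient.v1_1 import client  # python-novaclient, Essex/Folsom era

# Hypothetical credentials and Keystone endpoint.
nova = client.Client("tbell", "secret", "IT-project",
                     "http://keystone.example.cern.ch:5000/v2.0")

# Minimal cloud-config consumed by cloud-init at first boot:
# install the Puppet agent and start it so configuration follows.
user_data = """#cloud-config
packages:
  - puppet
runcmd:
  - [service, puppet, start]
"""

image = nova.images.find(name="slc6-base")    # hypothetical Oz-built image
flavor = nova.flavors.find(name="m1.small")
nova.servers.create("vm0042", image, flavor, userdata=user_data)
```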

Slide27


Slide28

When communities combine…

OpenStack’s many components and options make configuration complex out of the box
The Puppet Forge module from Puppet Labs does our configuration
The Foreman adds OpenStack provisioning: from user kiosk to a configured machine in 15 minutes

Slide29

Foreman to manage Puppetized VM


Slide30

Active Directory Integration

CERN’s Active Directory
Unified identity management across the site
44,000 users
29,000 groups
200 arrivals/departures per month

Full integration with Active Directory via LDAP
Uses the OpenLDAP backend with some particular configuration settings
Aim for minimal changes to Active Directory
7 patches submitted around hard-coded values and additional filtering

Now in use in our pre-production instance
Map project roles (admins, members) to groups
Documentation in the OpenStack wiki
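For a feel of what the LDAP integration involves, a rough sketch (not CERN's actual configuration) of the kind of query Keystone's LDAP backend issues against Active Directory, reproduced with python-ldap; the server, bind DN and account are hypothetical:

```python
import ldap  # python-ldap

# Hypothetical AD endpoint and service account.
conn = ldap.initialize("ldaps://ad.example.cern.ch")
conn.simple_bind_s("CN=svc-openstack,OU=Services,DC=cern,DC=ch", "secret")

# Look up a user and the group memberships that map to project roles.
results = conn.search_s(
    "OU=Users,DC=cern,DC=ch",
    ldap.SCOPE_SUBTREE,
    "(sAMAccountName=tbell)",   # AD filters on sAMAccountName rather than uid
    ["cn", "memberOf"],
)
for dn, attrs in results:
    print(dn, attrs.get("memberOf", []))
```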

Slide31

Welcome Back Hyper-V!

We currently use Hyper-V/System Centre for our server consolidation activities, but need to scale to 100x current installation size

Choice of hypervisors should be tactical
Performance
Compatibility/support with integration components
Image migration from legacy environments

CERN is working closely with the Hyper-V OpenStack team
Puppet to configure hypervisors on Windows
Most functions work well, but further work on Console, Ceilometer, …


Slide32

Opportunistic Clouds in online experiment farms

The CERN experiments have farms of 1000s of Linux servers close to the detectors to filter the 1 PByte/s down to the 6 GByte/s to be recorded to tape
When the accelerator is not running, these machines are currently idle
The accelerator has regular maintenance slots of several days
A Long Shutdown is due from March 2013 to November 2014

One of the experiments is deploying OpenStack on their farm
Simulation (low I/O, high CPU)
Analysis (high I/O, high CPU, high network)


Slide33

Federated European Clouds

Two significant European projects around federated clouds:
European Grid Initiative Federated Cloud: a federation of grid sites providing IaaS
HELiX Nebula: a European Union funded project to create a scientific cloud based on commercial providers

EGI Federated Cloud Sites: CESGA, CESNET, INFN, SARA, Cyfronet, FZ Jülich, SZTAKI, IPHC, GRIF, GRNET, KTH, Oxford, GWDG, IGI, TCD, IN2P3, STFC

Slide34

Federated Cloud Commonalities

Basic building blocks
Each site gives an IaaS endpoint with an API and common security policy: OCCI? CDMI? Libcloud? Jclouds?
Image stores available across the sites
Federated identity management based on X.509 certificates
Consolidation of accounting information to validate pledges and usage

Multiple cloud technologies: OpenStack, OpenNebula, proprietary

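Since libcloud is one of the candidate abstraction layers listed above, here is a hedged sketch of how one client could address two federation sites running different stacks through a single API; the endpoints and credentials are invented:

```python
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

OpenStack = get_driver(Provider.OPENSTACK)
OpenNebula = get_driver(Provider.OPENNEBULA)

# Hypothetical federation member endpoints.
sites = [
    OpenStack("user", "secret",
              ex_force_auth_url="https://keystone.site-a.example.eu:5000/v2.0",
              ex_force_auth_version="2.0_password"),
    OpenNebula("user", "secret", host="occi.site-b.example.eu"),
]

# The same calls work against every site, whatever runs underneath.
for site in sites:
    for node in site.list_nodes():
        print(node.name, node.state)
```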

Slide35

Next Steps

Deploy into production at the start of 2013 with Folsom, running the Grid software on top of OpenStack IaaS
Support multi-site operations with the 2nd data centre in Hungary

Exploit new functionality
Ceilometer for metering
Bare metal for non-virtualised use cases such as high I/O servers
X.509 user certificate authentication
Load balancing as a service

Ramping to 15,000 hypervisors with 100,000 to 300,000 VMs by 2015

Slide36

What are we missing (or haven’t found yet)?

Best practice for
Monitoring and KPIs as part of core functionality
Guest disaster recovery
Migration between versions of OpenStack

Roles within multi-user projects
VM owner allowed to manage their own resources (start/stop/delete)
Project admins allowed to manage all resources
Other members should not have high rights over other members’ VMs

Global quota management for non-elastic private cloud
Manage resource prioritisation and allocation centrally

Capacity management / utilisation for planning

Slide37

Conclusions

Production at CERN in the next few months on Folsom
Our emphasis will shift to focus on stability
Integrate CERN legacy integrations via formal user exits
Work together with others on scaling improvements

Community is key to shared success
Our problems are often resolved before we raise them
Packaging teams are producing reliable builds promptly
CERN contributes and benefits

Thanks to everyone for their efforts and enthusiasm
Not just code but documentation, tests, blogs, …

Slide38

Slide39

References

CERN: http://public.web.cern.ch/public/
Scientific Linux: http://www.scientificlinux.org/
Worldwide LHC Computing Grid: http://lcg.web.cern.ch/lcg/ and http://rtm.hep.ph.ic.ac.uk/
Jobs: http://cern.ch/jobs
Detailed Report on Agile Infrastructure: http://cern.ch/go/N8wp
HELiX Nebula: http://helix-nebula.eu/
EGI Cloud Taskforce: https://wiki.egi.eu/wiki/Fedcloud-tf

Slide40

Backup Slides


Slide41


Slide42

CERN’s tools

The world’s most powerful accelerator: LHC
A 27 km long tunnel filled with high-tech instruments
Equipped with thousands of superconducting magnets
Accelerates particles to energies never before obtained
Produces particle collisions creating microscopic “big bangs”

Very large sophisticated detectors
Four experiments, each the size of a cathedral
Hundred million measurement channels each
Data acquisition systems treating Petabytes per second

Top level computing to distribute and analyse the data
A Computing Grid linking ~200 computer centres around the globe
Sufficient computing power and storage to handle 25 Petabytes per year, making them available to thousands of physicists for analysis

Slide43

Our Infrastructure

Hardware is generally based on commodity, white-box servers
Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
Compute nodes typically dual processor, 2 GB per core
Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
Vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Red Hat Enterprise Linux
Focus is on stability in view of the number of centres on the WLCG


Slide44

New architecture data flows


Slide45

Virtualisation on SCVMM/Hyper-V


Slide46

Scaling up with Puppet and OpenStack

Use LHC@Home, based on BOINC, for simulating the magnets guiding particles around the LHC
Naturally, there is a Puppet module: puppet-boinc
1,000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack (sketched below)
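A minimal sketch of what such a bulk spin-up could look like with python-novaclient; the image, flavor and endpoint are hypothetical, and the real test was driven by Puppet and Foreman:

```python
from novaclient.v1_1 import client  # python-novaclient, Essex/Folsom era

# Hypothetical credentials and Keystone endpoint.
nova = client.Client("tbell", "secret", "IT-stress-test",
                     "http://keystone.example.cern.ch:5000/v2.0")

image = nova.images.find(name="slc6-boinc")   # hypothetical BOINC worker image
flavor = nova.flavors.find(name="m1.small")

# Ask Nova for 1,000 cattle-style workers in one request;
# puppet-boinc configures each one after boot.
nova.servers.create("boinc-worker", image, flavor,
                    min_count=1000, max_count=1000)
```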
