Slide 1: Accelerating Science with OpenStack
Tim Bell, Tim.Bell@cern.ch, @noggin143
OpenStack Summit San Diego, 17th October 2012
Slide 2: What is CERN?
OpenStack Summit October 2012, Tim Bell, CERN
Conseil Européen pour la Recherche Nucléaire, aka European Laboratory for Particle Physics
Between Geneva and the Jura mountains, straddling the Swiss-French border
Founded in 1954 with an international treaty
Our business is fundamental physics: what is the universe made of and how does it work?
Slide 3: Answering fundamental questions…
How to explain that particles have mass? We have theories and accumulating experimental evidence… Getting close…
What is 96% of the universe made of? We can only see 4% of its estimated mass!
Why isn’t there anti-matter in the universe? Nature should be symmetric…
What was the state of matter just after the « Big Bang »? Travelling back to the earliest instants of the universe would help…
Slide 4: Community collaboration on an international scale
Slide 5: The Large Hadron Collider
Slide 6: The Large Hadron Collider (LHC) tunnel
Slide 7
Slide 8: Accumulating events in 2009-2011
Slide 9
Slide 10: Heavy Ion Collisions
Slide 11
Slide 12
Tier-0 (CERN): data recording, initial data reconstruction, data distribution
Tier-1 (11 centres): permanent storage, re-processing, analysis
Tier-2 (~200 centres): simulation, end-user analysis
Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid
In a normal day, the grid provides 100,000 CPU days executing over 2 million jobs
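As a rough back-of-envelope check on the grid figures quoted for this slide, the sketch below derives the average CPU time per job. It assumes a "CPU day" means 24 CPU-hours and treats "over 2 million jobs" as exactly 2 million, so the result is an upper estimate of the average; the function name is illustrative only.

```python
# Figures quoted on the slide.
CPU_DAYS_PER_DAY = 100_000   # CPU days delivered by the grid in a normal day
JOBS_PER_DAY = 2_000_000     # "over 2 million" jobs, taken as a lower bound


def average_cpu_hours_per_job(cpu_days: float, jobs: int) -> float:
    """Average CPU-hours per job, assuming one CPU day is 24 CPU-hours."""
    if jobs <= 0:
        raise ValueError("jobs must be positive")
    return cpu_days * 24 / jobs


avg_hours = average_cpu_hours_per_job(CPU_DAYS_PER_DAY, JOBS_PER_DAY)
print(f"about {avg_hours:.1f} CPU-hours per job on average")
```

With these inputs, the typical job is on the order of an hour of CPU time, which suggests a workload made of many short, independent jobs.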
Slide 13: Data Centre by Numbers
Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year
High Speed Routers (640 Mbps → 2.4 Tbps): 24
Ethernet Switches: 350
10 Gbps ports: 2,000
Switching Capacity: 4.8 Tbps
1 Gbps ports: 16,939
10 Gbps ports: 558
Racks: 828
Servers: 11,728
Processors: 15,694
Cores: 64,238
HEPSpec06: 482,507
Disks: 64,109
Raw disk capacity (TiB): 63,289
Memory modules: 56,014
Memory capacity (TiB): 158
RAID controllers: 3,749
Tape Drives: 160
Tape Cartridges: 45,000
Tape slots: 56,000
Tape Capacity (TiB): 73,000
IT Power Consumption: 2,456 kW
Total Power Consumption: 3,890 kW
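A few ratios can be derived from the numbers on this slide. The sketch below is only an illustration: the ratio of total to IT power has the same form as a power usage effectiveness (PUE) figure, but the slide does not call it that, and the per-server and per-core averages hide the mix of hardware generations.

```python
# Values copied from the "Data Centre by Numbers" slide.
IT_POWER_KW = 2_456
TOTAL_POWER_KW = 3_890
SERVERS = 11_728
CORES = 64_238
HEPSPEC06 = 482_507


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio with a guard against a zero denominator."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


power_overhead = ratio(TOTAL_POWER_KW, IT_POWER_KW)  # total power / IT power
cores_per_server = ratio(CORES, SERVERS)             # fleet-wide average
hs06_per_core = ratio(HEPSPEC06, CORES)              # fleet-wide average

print(f"total/IT power:   {power_overhead:.2f}")
print(f"cores per server: {cores_per_server:.2f}")
print(f"HEPSpec06/core:   {hs06_per_core:.2f}")
```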
Slide 14
Slide 15: Our Challenges - Data storage
>20 years retention
6GB/s average
25GB/s peaks
30 PB/year to record
Slide 16
45,000 tapes holding 73PB of physics data
Slide 17: New data centre to expand capacity
Data centre in Geneva at the limit of electrical capacity at 3.5MW
New centre chosen in Budapest, Hungary
Additional 2.7MW of usable power
Hands-off facility
Deploying from 2013 with 200Gbit/s network to CERN
Slide 18: Time to change strategy
Rationale
  Need to manage twice as many servers as today
  No increase in staff numbers
  Tools becoming increasingly brittle and will not scale as-is
Approach
  CERN is no longer a special case for compute
  Adopt an open source tool chain model
  Our engineers rapidly iterate
  Evaluate solutions in the problem domain
  Identify functional gaps and challenge them
  Select first choice but be prepared to change in future
  Contribute new function back to the community
Slide 19: Building Blocks
Bamboo
Koji, Mock
AIMS/PXE
Foreman
Yum repo
Pulp
Puppet-DB
mcollective, yum
JIRA
Lemon / Hadoop
git
OpenStack Nova
Hardware database
Puppet
Active Directory / LDAP
Slide 20: Training and Support
Buy the book rather than guru mentoring
Follow the mailing lists to learn
Newcomers are rapidly productive (and often know more than us)
Community and Enterprise support means we’re not on our own
Slide 21: Staff Motivation
Skills valuable outside of CERN when an engineer’s contract ends
Slide 22: Prepare the move to the clouds
Improve operational efficiency
  Machine ordering, reception and testing
  Hardware interventions with long running programs
  Multiple operating system demand
Improve resource efficiency
  Exploit idle resources, especially waiting for disk and tape I/O
  Highly variable load such as interactive or build machines
Enable cloud architectures
  Gradual migration to cloud interfaces and workflows
Improve responsiveness
  Self-Service with coffee break response time
Slide 23: Public Procurement Purchase Model
Step                                  Time (Days)                        Elapsed (Days)
User expresses requirement                                               0
Market Survey prepared                15                                 15
Market Survey for possible vendors    30                                 45
Specifications prepared               15                                 60
Vendor responses                      30                                 90
Test systems evaluated                30                                 120
Offers adjudicated                    10                                 130
Finance committee                     30                                 160
Hardware delivered                    90                                 250
Burn in and acceptance                30 days typical, 380 worst case    280
Total                                                                    280+ Days
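The elapsed column in the procurement table is a running total of the per-step times. The sketch below reproduces it from the step durations on the slide. The worst-case total assumes the "380 worst case" figure is in days and simply replaces the typical 30-day burn-in, which is one reading of the slide and not something it states outright.

```python
from itertools import accumulate

# (step, duration in days) as listed on the slide, using the typical burn-in time.
STEPS = [
    ("Market Survey prepared", 15),
    ("Market Survey for possible vendors", 30),
    ("Specifications prepared", 15),
    ("Vendor responses", 30),
    ("Test systems evaluated", 30),
    ("Offers adjudicated", 10),
    ("Finance committee", 30),
    ("Hardware delivered", 90),
    ("Burn in and acceptance", 30),
]

# Running total of days after each step.
elapsed = list(accumulate(days for _, days in STEPS))


def total_days(burn_in_days: int = 30) -> int:
    """Total elapsed days with a given burn-in and acceptance duration."""
    before_burn_in = sum(days for _, days in STEPS[:-1])
    return before_burn_in + burn_in_days


for (step, days), running in zip(STEPS, elapsed):
    print(f"{step:<36}{days:>4}{running:>6}")
print("typical total:", total_days(), "days")
print("worst-case total (assumed reading):", total_days(380), "days")
```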
Slide 24: Service Model
Pets are given names like pussinboots.cern.ch
  They are unique, lovingly hand raised and cared for
  When they get ill, you nurse them back to health
Cattle are given numbers like vm0042.cern.ch
  They are almost identical to other cattle
  When they get ill, you get another one
Future application architectures should use Cattle, but Pets with strong configuration management are viable and still needed
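The pets-versus-cattle distinction can be read as a remediation policy: repair a pet, replace a cow. The sketch below is a toy illustration only. The naming rule (hosts starting with `vm` followed by digits are cattle) is a hypothetical convention modelled on the two example hostnames, not a documented CERN rule.

```python
import re

# Hypothetical rule: numbered hosts such as vm0042.cern.ch are treated as cattle.
CATTLE_PATTERN = re.compile(r"^vm\d+\.")


def service_model(hostname: str) -> str:
    """Classify a host as 'cattle' or 'pet' from its name alone."""
    return "cattle" if CATTLE_PATTERN.match(hostname) else "pet"


def action_when_ill(hostname: str) -> str:
    """Cattle are replaced with a fresh instance; pets are repaired in place."""
    return "replace" if service_model(hostname) == "cattle" else "repair"


for host in ("pussinboots.cern.ch", "vm0042.cern.ch"):
    print(host, "->", service_model(host), "->", action_when_ill(host))
```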
Slide 25: Supporting the Pets with OpenStack
Network
  Interfacing with legacy site DNS and IP management
  Ensuring Kerberos identity before VM start
Puppet
  Ease use of configuration management tools with our users
  Exploit mcollective for orchestration/delegation
External Block Storage
  Currently using nova-volume with Gluster backing store
Live migration to maximise availability
  KVM live migration using Gluster
  KVM and Hyper-V block migration
Slide 26: Current Status of OpenStack at CERN
Working on an Essex code base from the EPEL repository
  Excellent experience with the Fedora cloud-sig team
  Cloud-init for contextualisation, oz for images with RHEL/Fedora
Components
  Current focus is on Nova with KVM and Hyper-V
  Tests with Swift are ongoing but require significant experiment code changes
Pre-production facility with around 150 hypervisors and 2000 VMs, integrated with CERN infrastructure, deployed with Puppet, and used for simulation of magnet placement using LHC@Home and for batch
Slide 27
Slide 28: When communities combine…
OpenStack’s many components and options make configuration complex out of the box
Puppet forge module from PuppetLabs does our configuration
The Foreman adds OpenStack provisioning for user kiosk to a configured machine in 15 minutes
Slide 29: Foreman to manage Puppetized VM
Slide 30: Active Directory Integration
CERN’s Active Directory
  Unified identity management across the site
  44,000 users
  29,000 groups
  200 arrivals/departures per month
Full integration with Active Directory via LDAP
  Uses the OpenLDAP backend with some particular configuration settings
  Aim for minimal changes to Active Directory
  7 patches submitted around hard coded values and additional filtering
Now in use in our pre-production instance
  Map project roles (admins, members) to groups
  Documentation in the OpenStack wiki
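Mapping project roles to directory groups can be pictured as a lookup from group membership to roles. The sketch below is a minimal illustration in plain Python, not the Keystone LDAP backend itself. The group naming scheme (`<project>-admins`, `<project>-members`) and the function name are hypothetical.

```python
# Hypothetical convention: one directory group per (project, role) pair.
ROLE_TO_GROUP_SUFFIX = {
    "admin": "admins",
    "member": "members",
}


def roles_for_user(project: str, user_groups: set[str]) -> set[str]:
    """Return the project roles implied by a user's directory group memberships."""
    roles = set()
    for role, suffix in ROLE_TO_GROUP_SUFFIX.items():
        if f"{project}-{suffix}" in user_groups:
            roles.add(role)
    return roles


# Example: a user who belongs to the members group of one project only.
groups = {"simulation-members", "some-other-group"}
print(roles_for_user("simulation", groups))  # roles in the 'simulation' project
print(roles_for_user("batch", groups))       # no roles in the 'batch' project
```

Keeping the mapping in group membership means arrivals and departures are handled in the directory, with no separate user list to maintain on the cloud side.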
Slide 31: Welcome Back Hyper-V!
We currently use Hyper-V/System Center for our server consolidation activities
  But need to scale to 100x current installation size
Choice of hypervisors should be tactical
  Performance
  Compatibility/Support with integration components
  Image migration from legacy environments
CERN is working closely with the Hyper-V OpenStack team
  Puppet to configure hypervisors on Windows
  Most functions work well but further work on Console, Ceilometer, …
Slide 32: Opportunistic Clouds in online experiment farms
The CERN experiments have farms of 1000s of Linux servers close to the detectors to filter the 1PByte/s down to 6GByte/s to be recorded to tape
When the accelerator is not running, these machines are currently idle
  Accelerator has regular maintenance slots of several days
  Long Shutdown due from March 2013 to November 2014
One of the experiments is deploying OpenStack on their farm
  Simulation (low I/O, high CPU)
  Analysis (high I/O, high CPU, high network)
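The scale of the online filtering can be expressed as a reduction factor between the detector rate and the rate recorded to tape. The sketch below assumes decimal units (1 PByte = 10^15 bytes, 1 GByte = 10^9 bytes) and treats both slide figures as exact, so the result is an order-of-magnitude indication only.

```python
# Rates quoted on the slide, converted to bytes per second (decimal units assumed).
DETECTOR_RATE_BPS = 1 * 10**15   # 1 PByte/s arriving from the detectors
RECORDED_RATE_BPS = 6 * 10**9    # 6 GByte/s recorded to tape


def reduction_factor(input_rate: float, output_rate: float) -> float:
    """How many bytes arrive for every byte that is kept."""
    if output_rate <= 0:
        raise ValueError("output_rate must be positive")
    return input_rate / output_rate


factor = reduction_factor(DETECTOR_RATE_BPS, RECORDED_RATE_BPS)
print(f"roughly 1 byte kept for every {factor:,.0f} bytes produced")
```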
Slide 33: Federated European Clouds
Two significant European projects around Federated Clouds
  European Grid Initiative Federated Cloud as a federation of grid sites providing IaaS
  HELiX Nebula European Union funded project to create a scientific cloud based on commercial providers
EGI Federated Cloud Sites
  CESGA, CESNET, INFN, SARA, Cyfronet, FZ Jülich, SZTAKI, IPHC, GRIF, GRNET, KTH, Oxford, GWDG, IGI, TCD, IN2P3, STFC
Slide 34: Federated Cloud Commonalities
Basic building blocks
  Each site gives an IaaS endpoint with an API and common security policy
    OCCI? CDMI? Libcloud? Jclouds?
  Image stores available across the sites
  Federated identity management based on X.509 certificates
  Consolidation of accounting information to validate pledges and usage
Multiple cloud technologies
  OpenStack
  OpenNebula
  Proprietary
Slide 35: Next Steps
Deploy into production at the start of 2013 with Folsom, running the Grid software on top of OpenStack IaaS
Support multi-site operations with 2nd data centre in Hungary
Exploit new functionality
  Ceilometer for metering
  Bare metal for non-virtualised use cases such as high I/O servers
  X.509 user certificate authentication
  Load balancing as a service
Ramping to 15,000 hypervisors with 100,000 to 300,000 VMs by 2015
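The 2015 target implies a range of VM densities per hypervisor, which can be compared with the pre-production figures quoted earlier in the deck (around 150 hypervisors, 2000 VMs). The sketch below only does that division; the function name is illustrative, and the averages say nothing about how VMs would actually be distributed.

```python
def vms_per_hypervisor(vms: int, hypervisors: int) -> float:
    """Average number of VMs hosted per hypervisor."""
    if hypervisors <= 0:
        raise ValueError("hypervisors must be positive")
    return vms / hypervisors


# Target range for 2015, from the "Next Steps" slide.
target_low = vms_per_hypervisor(100_000, 15_000)
target_high = vms_per_hypervisor(300_000, 15_000)

# Pre-production facility, from the "Current Status" slide.
current = vms_per_hypervisor(2_000, 150)

print(f"2015 target: {target_low:.1f} to {target_high:.1f} VMs per hypervisor")
print(f"pre-production today: {current:.1f} VMs per hypervisor")
```

On these figures, the pre-production density sits inside the target range, so the ramp-up is mostly about adding hypervisors, not packing VMs more tightly.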
Slide 36: What are we missing (or haven’t found yet)?
Best practice for
  Monitoring and KPIs as part of core functionality
  Guest disaster recovery
  Migration between versions of OpenStack
Roles within multi-user projects
  VM owner allowed to manage their own resources (start/stop/delete)
  Project admins allowed to manage all resources
  Other members should not have high rights over other members’ VMs
Global quota management for non-elastic private cloud
  Manage resource prioritisation and allocation centrally
  Capacity management / utilisation for planning
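The role model asked for on this slide can be written down as a small authorisation rule. The sketch below is one possible reading of the three bullet points, not an OpenStack policy file. The role names, the function and the usernames are hypothetical.

```python
def can_manage(user: str, role: str, vm_owner: str) -> bool:
    """Decide whether a project user may start, stop or delete a given VM.

    - project admins may manage every VM in the project
    - ordinary members may manage only the VMs they own
    - any other role gets no management rights
    """
    if role == "admin":
        return True
    if role == "member":
        return user == vm_owner
    return False


# Example: alice owns the VM, bob is another member, carol is the project admin.
print(can_manage("alice", "member", vm_owner="alice"))  # True: owner
print(can_manage("bob", "member", vm_owner="alice"))    # False: someone else's VM
print(can_manage("carol", "admin", vm_owner="alice"))   # True: project admin
```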
Slide 37: Conclusions
Production at CERN in next few months on Folsom
  Our emphasis will shift to focus on stability
  Integrate CERN legacy integrations via formal user exits
  Work together with others on scaling improvements
Community is key to shared success
  Our problems are often resolved before we raise them
  Packaging teams are producing reliable builds promptly
CERN contributes and benefits
Thanks to everyone for their efforts and enthusiasm
  Not just code but documentation, tests, blogs, …
Slide 38
Slide 39: References
CERN: http://public.web.cern.ch/public/
Scientific Linux: http://www.scientificlinux.org/
Worldwide LHC Computing Grid: http://lcg.web.cern.ch/lcg/ and http://rtm.hep.ph.ic.ac.uk/
Jobs: http://cern.ch/jobs
Detailed Report on Agile Infrastructure: http://cern.ch/go/N8wp
HELiX Nebula: http://helix-nebula.eu/
EGI Cloud Taskforce: https://wiki.egi.eu/wiki/Fedcloud-tf
Slide 40: Backup Slides
Slide 41
Slide 42: CERN’s tools
The world’s most powerful accelerator: LHC
  A 27 km long tunnel filled with high-tech instruments
  Equipped with thousands of superconducting magnets
  Accelerates particles to energies never before obtained
  Produces particle collisions creating microscopic “big bangs”
Very large sophisticated detectors
  Four experiments each the size of a cathedral
  Hundred million measurement channels each
  Data acquisition systems treating Petabytes per second
Top level computing to distribute and analyse the data
  A Computing Grid linking ~200 computer centres around the globe
  Sufficient computing power and storage to handle 25 Petabytes per year, making them available to thousands of physicists for analysis
Slide 43: Our Infrastructure
Hardware is generally based on commodity, white-box servers
  Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
  Compute nodes typically dual processor, 2GB per core
  Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
Vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Red Hat Enterprise Linux
  Focus is on stability in view of the number of centres on the WLCG
Slide 44: New architecture data flows
Slide 45: Virtualisation on SCVMM/Hyper-V
Slide 46: Scaling up with Puppet and OpenStack
Use LHC@Home based on BOINC for simulating the magnets guiding particles around the LHC
Naturally, there is a puppet module puppet-boinc
1000 VMs spun up to stress test the hypervisors with Puppet, Foreman and OpenStack