Slide 1: R&D Activities on Storage in CERN-IT’s FIO Group
Helge Meinhard / CERN-IT
HEPiX Fall 2009, LBNL
27 October 2009
Slide 2: Outline
Follow-up of two presentations at the Umeå meeting:
- iSCSI technology (Andras Horvath)
- Lustre evaluation project (Arne Wiebalck)
Slide 3: iSCSI - Motivation
Three approaches:
- Possible replacement for rather expensive setups with Fibre Channel SANs (used e.g. for physics databases with Oracle RAC, and for backup infrastructure) or proprietary high-end NAS appliances
  - Potential cost saving
- Possible replacement for bulk disk servers (Castor)
  - Potential gain in availability, reliability and flexibility
- Possible use for applications for which small disk servers have been used in the past
  - Potential gain in flexibility, cost saving
Focus is on functionality, robustness and large-scale deployment rather than ultimate performance
Slide 4: iSCSI terminology
iSCSI is a set of protocols for block-level access to storage
- Similar to FC, unlike NAS (e.g. NFS)
“Target”: storage unit listening to block-level requests
- Appliances available on the market
- Do-it-yourself: put the software stack on a storage node, e.g. our storage-in-a-box nodes
“Initiator”: unit sending block-level requests (e.g. read, write) to the target (see the sketch below)
- Most modern operating systems feature an iSCSI initiator stack: Linux RH4, RH5; Windows
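To make the initiator side concrete, here is a minimal sketch (not from the slides) of discovering and logging in to a target with the standard open-iscsi command-line tool; the portal address and target IQN are made-up examples.

```python
import subprocess

# Hypothetical portal address and target name, for illustration only.
PORTAL = "192.168.1.10:3260"
TARGET_IQN = "iqn.2009-10.ch.cern:storage.example"

# Ask the portal which targets it exports (SendTargets discovery).
print(subprocess.check_output(
    ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL]))

# Log in to one of the discovered targets; its LUNs then appear on the
# initiator as ordinary block devices (e.g. /dev/sdX).
subprocess.check_call(
    ["iscsiadm", "-m", "node", "-T", TARGET_IQN, "-p", PORTAL, "--login"])
```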
Slide 5: Hardware used
Initiators: a number of different servers, including
- Dell M610 blades
- Storage-in-a-box servers
- All running SLC5
Targets:
- Dell Equallogic PS5000E (12 drives, 2 controllers with 3 GigE each)
- Dell Equallogic PS6500E (48 drives, 2 controllers with 4 GigE each)
- Infortrend A12E-G2121 (12 drives, 1 controller with 2 GigE)
- Storage-in-a-box: various models with multiple GigE or 10GigE interfaces, running Linux
Network (if required): private, HP ProCurve 3500 and 6600
Slide6Target stacks under Linux
RedHat
Enterprise 5 comes with
tgtdSingle-threadedDoes not scale wellTests with IETMulti-threadedNo performance limitation in our tests
Required newer kernel to work out of the box (Fedora and
Ubuntu
server worked for us) In context of collaboration between CERN and Caspur, work going on to understand the steps to be taken for
backporting
IET to RHEL 5
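For the do-it-yourself case, a rough sketch of how a block device might be exported with tgtadm (the administration tool that ships with tgtd) is shown below; the target IQN, target id and backing device are illustrative, not the actual CERN configuration.

```python
import subprocess

# Illustrative values only - not the actual CERN configuration.
TID = "1"                                       # target id
IQN = "iqn.2009-10.ch.cern:diskserver.example"  # made-up target name
BACKING_DEV = "/dev/sdb"                        # device to export as a LUN

def tgtadm(*args):
    """Invoke tgtadm (the tgtd administration tool) for the iSCSI driver."""
    subprocess.check_call(["tgtadm", "--lld", "iscsi"] + list(args))

# Create a new target, attach the backing device as LUN 1,
# and allow any initiator to bind to it.
tgtadm("--op", "new", "--mode", "target", "--tid", TID, "-T", IQN)
tgtadm("--op", "new", "--mode", "logicalunit", "--tid", TID,
       "--lun", "1", "-b", BACKING_DEV)
tgtadm("--op", "bind", "--mode", "target", "--tid", TID, "-I", "ALL")
```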
Slide 7: Performance comparison
8k random I/O test with the Oracle tool Orion
Slide 8: Performance measurement
1 server, 3 storage-in-a-box servers as targets
Each target exporting 14 JBOD disks over 10GigE
Slide 9: Almost production status…
Two storage-in-a-box servers with hardware RAID 5, running SLC5 and tgtd on GigE
- Initiator provides multipathing and software RAID 1 (see the sketch below)
- Used for some grid services
- No issues
Two Infortrend boxes (JBOD configuration)
- Again, the initiator provides multipathing and software RAID 1
- Used as backend storage for the Lustre MDT (see next part)
Tools for setup, configuration and monitoring are in place
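As an illustration of the initiator-side layout described above, here is a hedged sketch of mirroring two multipathed iSCSI volumes with Linux software RAID; the multipath device names are hypothetical and would come from the local multipath configuration.

```python
import subprocess

# Hypothetical device-mapper multipath devices, one per iSCSI target;
# the real names depend on the local multipath.conf aliases.
VOLUME_A = "/dev/mapper/target_a"
VOLUME_B = "/dev/mapper/target_b"

# Mirror the two remote volumes with md RAID 1, so that either
# storage box can fail without the data becoming unavailable.
subprocess.check_call([
    "mdadm", "--create", "/dev/md0",
    "--level=1", "--raid-devices=2",
    VOLUME_A, VOLUME_B,
])

# The mirrored device /dev/md0 can then be formatted and mounted as usual.
```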
Slide 10: Being worked on
Large deployment of Equallogic ‘Sumos’ (48 drives of 1 TB each, dual controllers, 4 GigE/controller): 24 systems, 48 front-end nodes
Experience encouraging, but there are issues:
- Controllers don’t support DHCP, manual configuration required
- Buggy firmware
- Problems with batteries on controllers
- Support not fully integrated into Dell structures yet
Remarkable stability
- We have failed all network and server components that can fail; the boxes kept running
Remarkable performance
Slide 11: Equallogic performance
16 servers, 8 Sumos, 1 GigE per server, iozone
Slide 12: Appliances vs. home-made
Appliances:
- Stable
- Performant
- Highly functional (Equallogic: snapshots, relocation without server involvement, automatic load balancing, …)
Home-made with storage-in-a-box servers:
- Inexpensive
- Complete control over configuration
- Can run things other than the target software stack
- Can select the function at software install time (iSCSI target vs. classical disk server with rfiod or xrootd)
Slide 13: Ideas (partly started testing)
Two storage-in-a-box servers as a highly redundant setup
- Running target and initiator stacks at the same time
- Mounting half the disks locally, half on the other machine
- A heartbeat detects failures and (e.g. by resetting an IP alias) moves functionality to one or the other box
Several storage-in-a-box servers as targets
- Exporting disks either as JBOD or as RAID
- Front-end server creates a software RAID (e.g. RAID 6) over volumes from all storage-in-a-box servers (see the sketch below)
- Any one (or, with software RAID 6, any two) storage-in-a-box servers can fail entirely and the data remain available
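A hedged sketch of that second idea, assuming the front-end server has already logged in to the iSCSI targets and sees one block device per storage-in-a-box server (the device names are purely illustrative):

```python
import subprocess

# One block device per storage-in-a-box target, as seen on the front-end
# server after the iSCSI logins; the names below are made up.
ISCSI_VOLUMES = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"]

# Build a software RAID 6 across the remote volumes: any two complete
# storage boxes can fail and the array stays available.
subprocess.check_call(
    ["mdadm", "--create", "/dev/md0",
     "--level=6", "--raid-devices=%d" % len(ISCSI_VOLUMES)]
    + ISCSI_VOLUMES
)
```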
Slide 14: Lustre Evaluation Project
Tasks and goals:
- Evaluate Lustre as a candidate for storage consolidation
  - Home directories
  - Project space
  - Analysis space
  - HSM
- Reduce the service catalogue
- Increase overlap between service teams
- Integrate with CERN fabric management tools
Slide 15: Areas of interest (1/2)
Installation
- Quattorized installation of Lustre instances
- Client RPMs for SLC5
Backup
- LVM-based snapshots for metadata (see the sketch below)
- Tested with TSM, set up for the PPS instance
- Changelogs feature of v2.0 not yet usable
Strong authentication
- v2.0: early adaptation, full Kerberos in Q1/2011
- Tested & used by other sites (not by us yet)
Fault tolerance
- Lustre comes with built-in failover
- PPS MDS iSCSI setup
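The metadata backup mentioned above is based on LVM snapshots; below is a minimal sketch of the idea, assuming a hypothetical logical volume /dev/vg_mdt/mdt holding the MDT, with the snapshot handed to the regular backup client afterwards.

```python
import subprocess

# Hypothetical volume group / LV names; the real MDT volume depends on the setup.
MDT_LV = "/dev/vg_mdt/mdt"
SNAPSHOT = "/dev/vg_mdt/mdt_snap"

# Take a point-in-time, copy-on-write snapshot of the MDT volume
# while the file system keeps running.
subprocess.check_call(["lvcreate", "--snapshot", "--size", "2G",
                       "--name", "mdt_snap", MDT_LV])

# ... hand SNAPSHOT to the backup client (TSM in our case) here ...

# Drop the snapshot once the backup has finished.
subprocess.check_call(["lvremove", "-f", SNAPSHOT])
```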
Slide 16: FT: MDS PPS Setup
[Diagram: Dell Equallogic iSCSI arrays (16x 500 GB SATA) and Dell PowerEdge M600 blade servers (16 GB) connected over a private iSCSI network; nodes shown: MDS, MDT, OSS, OSS, CLT]
- Fully redundant against component failure
- iSCSI for shared storage
- Linux device mapper + md for mirroring
- Quattorized
- Needs testing
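Since Lustre's built-in failover is the mechanism used here, a minimal sketch of how the MDT might be formatted so that a standby MDS can take over is given below; the file system name, device path and node identifier are hypothetical, not the actual PPS configuration.

```python
import subprocess

# Hypothetical names: the shared MDT device (the mirrored iSCSI volume)
# and the network identifier (NID) of the standby MDS.
MDT_DEVICE = "/dev/md0"
STANDBY_MDS_NID = "mds2@tcp0"

# Format the MDT so that clients know about both metadata servers;
# if the primary MDS fails, the standby can mount the same shared device.
subprocess.check_call([
    "mkfs.lustre", "--fsname=pps", "--mgs", "--mdt",
    "--failnode=" + STANDBY_MDS_NID, MDT_DEVICE,
])
```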
Slide 17: Areas of interest (2/2)
Special performance & optimization
- Small files: “numbers dropped from slides”
- Postmark benchmark (not done yet)
HSM interface
- Active development, driven by CEA
- Access to Lustre HSM code (to be tested with TSM/CASTOR)
Life Cycle Management (LCM) & tools
- Support for day-to-day operations?
- Limited support for setup, monitoring and management
Slide 18: Findings and Thoughts
No strong authentication as of now
- Foreseen for Q1/2011
Strong client/server coupling
- Recovery
Very powerful users
- Striping, pools
Missing support for life cycle management
- No user-transparent data migration
- Lustre/kernel upgrades difficult
Moving targets on the roadmap
- v2.0 not yet stable enough for testing
Slide 19: Summary
Some desirable features not there (yet)
- Wish list communicated to SUN
- SUN interested in the evaluation
Some more tests to be done
- Kerberos, small files, HSM
Documentation