Update on Cloud Performance Optimisations
Arne Wiebalck, CERN IT
Job Efficiency Meeting, June 5, 2015
Arne Wiebalck – Update on Cloud Performance Optimisations
IOwait Recap
ALICE identified a high fraction of their jobs spent time in IOwait
Swapping reduced by reducing the number of job slots (see Jérôme’s slide deck from last time)
IOwait improved for the extreme cases, but a significant background remains
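For reference, the IOwait fraction discussed here can be estimated on a Linux node from two snapshots of `/proc/stat`. The following is a minimal sketch, not the method used by ALICE or in the monitoring behind these slides; the function names are illustrative, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is that of the aggregate `cpu` line.

```python
def cpu_times(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two /proc/stat snapshots."""
    t0, t1 = cpu_times(before), cpu_times(after)
    deltas = [b - a for a, b in zip(t0, t1)]
    # Sum user..steal only: guest times are already counted in user/nice.
    total = sum(deltas[:8])
    # Index 4 is the iowait counter.
    return deltas[4] / total if total else 0.0
```

In practice the two snapshots would be read from `/proc/stat` a few seconds apart, for example with `open("/proc/stat").read()`.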
Small VMs vs Big VMs
KVM caching
Default (‘none’): I/O from the VM goes directly to the disk
Required for live migration
Not optimal for performance
~100 IOPS
‘write-back’: I/O from the VM goes to the hypervisor’s page/buffer cache
Several 1000 IOPS (short term)
[Figure: VM1 and VM2, the hypervisor’s page & buffer cache, and the disk, for ‘none’ and ‘write-back’]
[Figure: Impact on ATLAS SAM VM (‘none’ to ‘write-back’)]
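As an illustration of where this setting lives, libvirt exposes the cache mode as the `cache` attribute of a disk's `<driver>` element (spelled `none`, `writeback`, and so on). The sketch below switches that attribute in a disk definition. The slides do not say how the mode was changed in the CERN cloud, so the sample XML, the file path and the helper name are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down libvirt <disk> element, for illustration only.
DISK_XML = """
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none'/>
  <source file='/var/lib/example/disk.qcow2'/>
  <target dev='vda' bus='virtio'/>
</disk>
"""

# Cache modes accepted by libvirt for the disk driver element.
VALID_MODES = {"default", "none", "writethrough", "writeback", "directsync", "unsafe"}


def set_cache_mode(disk_xml, mode):
    """Return the disk XML with the driver's cache attribute set to `mode`."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown cache mode: {mode}")
    root = ET.fromstring(disk_xml.strip())
    driver = root.find("driver")
    if driver is None:
        raise ValueError("disk element has no <driver> child")
    driver.set("cache", mode)
    return ET.tostring(root, encoding="unicode")


print(set_cache_mode(DISK_XML, "writeback"))
```

With `writeback`, writes are acknowledged once they reach the hypervisor's page cache, which is why the higher IOPS figure is marked as short term.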
Impact on batch
Write-back caching enabled on all “compute” cells.
KVM caching disabled:
[b6ff7c5cf5 ~]# fio --name xyz --rw=randwrite --size=128M --direct=1
…
write: io=6288.0KB, bw=325181B/s, iops=79, runt= 19801msec
KVM caching enabled:
[b6ff7c5cf5 ~]# fio --name xyz --rw=randwrite --size=128M --direct=1
…
write: io=131072KB, bw=22935KB/s, iops=5733, runt= 5715msec
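To put the two runs side by side, the sketch below extracts the `iops` field from the two fio result lines quoted above and computes the ratio. The regular expression and function name are illustrative and only cover this particular output format.

```python
import re

# The two fio result lines quoted above (caching disabled and enabled).
DISABLED = "write: io=6288.0KB, bw=325181B/s, iops=79, runt= 19801msec"
ENABLED = "write: io=131072KB, bw=22935KB/s, iops=5733, runt= 5715msec"


def parse_iops(line):
    """Extract the integer iops value from a fio summary line."""
    match = re.search(r"iops=\s*(\d+)", line)
    if match is None:
        raise ValueError("no iops field in line")
    return int(match.group(1))


speedup = parse_iops(ENABLED) / parse_iops(DISABLED)
print(f"IOPS ratio, write-back vs. no caching: {speedup:.1f}x")
```

The ratio comes out at roughly 70x for this random-write test. The figures are also internally consistent with fio's default 4 KB block size (bandwidth divided by IOPS).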
Impact on ALICE
Ongoing: CPU optimisations
OpenStack host mode: pass-through
Compute VMs have access to all processor features (done)
numad on, ksm off, ept off
Up to 10% HEPSpec06
On two batch cells already, being rolled out on the third, 5 more to go (ongoing)
SLC6 to CC7.1
2 to 20% increase in HEPSpec06 observed (ongoing)
CPU pinning
≈ numad? (under investigation)
Huge pages
Reported to have big performance impact
Relation to ept?
Needs to be looked at
Containers
Performance results promising
Integration in cloud service?
Long term option
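As a rough way to verify the ksm and ept settings listed for the hypervisors, the sketch below reads the corresponding Linux sysfs files. It assumes an Intel host using the `kvm_intel` module. The function names and the `sysfs_root` parameter (which allows testing against a fake tree) are illustrative, and numad is not covered since it runs as a daemon rather than a sysfs switch.

```python
from pathlib import Path


def read_flag(path):
    """Return the stripped content of a sysfs file, or None if it is absent."""
    p = Path(path)
    return p.read_text().strip() if p.is_file() else None


def hypervisor_settings(sysfs_root="/sys"):
    """Report whether KSM and EPT are switched off (None if not detectable)."""
    root = Path(sysfs_root)
    ksm = read_flag(root / "kernel/mm/ksm/run")
    ept = read_flag(root / "module/kvm_intel/parameters/ept")
    return {
        # ksm/run: 1 means KSM is running; 0 and 2 both mean it is stopped.
        "ksm_off": None if ksm is None else ksm != "1",
        # The kvm_intel ept parameter reads Y when EPT is enabled, N otherwise.
        "ept_off": None if ept is None else ept in ("N", "0"),
    }


if __name__ == "__main__":
    print(hypervisor_settings())
```

On a host without KVM or KSM support the corresponding entries are reported as `None` rather than guessed.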