Improve Your Cloud Performance at Your Neighbor's Expense
Venkatanathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift
Department of Computer Sciences
Slide1
Resource-Freeing Attacks: Improve Your Cloud Performance (at Your Neighbor's Expense)
(Venkat)anathan Varadarajan, Thawan Kooburat, Benjamin Farley, Thomas Ristenpart, and Michael Swift
Department of Computer Sciences
Slide2
Public Clouds (EC2, Azure, Rackspace, …)
Multi-tenancy
Different customers’ virtual machines (VMs) share the same server
Why multi-tenancy?
Improved resource utilization
[Figure: several VMs packed onto shared servers]
Slide3
Implications of Multi-tenancy
VMs share many resources: CPU, cache, memory, disk, network, etc.
Virtual Machine Managers (VMMs) goal: provide isolation
Deployed VMMs don’t perfectly isolate VMs
Side-channels [Ristenpart et al. ’09, Zhang et al. ’12]
Today: performance degraded by other customers
[Figure: two VMs on top of one VMM]
Slide4
Contention in Xen
Local Xen testbed
Machine: Intel Xeon E5430, 2.66 GHz
CPU: 2 packages, each with 2 cores
Cache size: 6 MB per package
[Figure: contention between two VMs under non-work-conserving CPU scheduling and under work-conserving scheduling]
3x-6x performance loss
Higher cost
Slide5
What can a tenant do?
Pack up VM and move (see our SOCC 2012 paper)
… but not all workloads are cheap to move
Ask provider for better isolation
… requires overhaul of the cloud
This work: a greedy customer can recover performance by interfering with other tenants
Resource-Freeing Attack
Slide6
Resource-freeing attacks (RFAs)
What is an RFA?
RFA case studies
Two highly loaded web server VMs
Last Level Cache (LLC) bound VM and highly loaded webserver VM
Demonstration on Amazon EC2
Slide7
The Setting
Victim: one or more VMs; public interface (e.g., HTTP)
Beneficiary: VM whose performance we want to improve
Helper: mounts the attack
Beneficiary and victim are fighting over a target resource
[Figure: victim VM, beneficiary VM, and helper]
Slide8
Example: Network Contention
Beneficiary & victim: Apache webservers hosting static and dynamic (CGI) web pages.
Target resource: network bandwidth
Work-conserving scheduler for network bandwidth
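Why a work-conserving scheduler matters here can be sketched in a few lines. This is a minimal illustration, not the testbed's actual scheduler: the function name `allocate`, the equal fair shares, and the bandwidth units are all assumptions made for the example. Under work-conserving sharing, bandwidth a tenant does not use is handed to the others; under non-work-conserving sharing it stays idle.

```python
def allocate(demands, capacity, work_conserving):
    """Split `capacity` among tenants (hypothetical equal-share policy)."""
    n = len(demands)
    fair_share = capacity / n
    if not work_conserving:
        # Each tenant is capped at its fixed share; unused capacity idles.
        return [min(d, fair_share) for d in demands]

    # Work-conserving: max-min fairness, smallest demands satisfied first,
    # leftover capacity redistributed to the tenants that still want more.
    alloc = [0.0] * n
    remaining = capacity
    unsatisfied = sorted(range(n), key=lambda i: demands[i])
    while unsatisfied:
        share = remaining / len(unsatisfied)
        i = unsatisfied[0]
        if demands[i] <= share:
            alloc[i] = demands[i]
            remaining -= demands[i]
            unsatisfied.pop(0)
        else:
            for j in unsatisfied:
                alloc[j] = share
            break
    return alloc


# Tenant 0 = beneficiary, tenant 1 = victim; numbers are illustrative only.
print(allocate([100, 100], 100, True))   # both want everything: equal split
print(allocate([100, 20], 100, True))    # victim uses less: beneficiary gains
print(allocate([100, 20], 100, False))   # fixed caps: beneficiary gains nothing
```

If the victim's demand for bandwidth can be pushed down, only the work-conserving case passes the freed bandwidth on to the beneficiary.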
What can you do?
[Figure: local Xen test bed; clients reach victim and beneficiary over the network (Net)]
Slide9
Ways to Reduce Contention?
Break into victim VM and disable it
The good: frees up resources used by victim
But:
Requires knowledge of vulnerability
Drastic
Easy to detect
[Figure: local Xen test bed with clients, network (Net), helper, victim, and beneficiary]
Slide10
Ways to Reduce Contention?
Do a simple DoS attack?
This may NOT free up target resources
Backfires: may increase the contention
[Figure: local Xen test bed; helper sends a SYN flood to the victim; clients, network (Net), and beneficiary]
Slide11
Recipe for a Successful RFA
Shift the victim's resource usage away from the target resource and towards a bottleneck resource
Shift resource usage via public interface
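The recipe can be illustrated with a toy model of a web server whose CPU is the bottleneck. Everything below is an assumption made for the sketch (the function name, the per-request CPU costs, the request rates); none of it is measured data from the talk. The point is only that CPU spent on expensive dynamic requests is no longer available for serving bandwidth-heavy static pages.

```python
def victim_static_rate(client_demand, cpu_ms_per_sec, static_cost_ms,
                       helper_rate, cgi_cost_ms):
    """Static pages/sec the victim can still serve (toy model).

    client_demand  -- static pages/sec requested by ordinary clients
    cpu_ms_per_sec -- CPU time the victim VM gets per second, in ms
    static_cost_ms -- CPU cost of one static page, in ms (hypothetical)
    helper_rate    -- CGI requests/sec sent through the public interface
    cgi_cost_ms    -- CPU cost of one CGI request, in ms (hypothetical)
    """
    cpu_left = max(cpu_ms_per_sec - helper_rate * cgi_cost_ms, 0.0)
    return min(client_demand, cpu_left / static_cost_ms)


# Without extra CGI load the victim is limited by client demand ...
print(victim_static_rate(2000, 1000, 0.25, helper_rate=0, cgi_cost_ms=20))
# ... with it, the CPU becomes the bottleneck and static traffic drops.
print(victim_static_rate(2000, 1000, 0.25, helper_rate=40, cgi_cost_ms=20))
```

In this model the victim's network usage falls in proportion to its static page rate, which is the "reduce target resource usage" half of the recipe.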
[Figure: proportion of network usage vs. proportion of CPU usage, ranging from static pages to CPU-intensive dynamic pages, with limits marked; push towards CPU bottleneck, reduce target resource usage]
Slide12
An RFA in Our Example
[Figure: helper sends CGI requests to the victim, raising its CPU utilization; clients and network (Net)]
Result in our testbed: increases beneficiary’s share of bandwidth
No RFA: 1800 page requests/sec
W/ RFA: 3026 page requests/sec
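As a quick check of what the two throughput figures imply (assuming, as the slide suggests, that both describe the same server with and without the RFA), the relative gain works out as follows:

```python
no_rfa = 1800     # page requests/sec without the RFA (from the slide)
with_rfa = 3026   # page requests/sec with the RFA (from the slide)

gain = (with_rfa - no_rfa) / no_rfa
print(f"{gain:.0%}")  # prints 68%
```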
[Figure: share of bandwidth, labeled 50% and 85%]
Slide13
Resource-freeing attacks
1) Send targeted requests to victim
2) Shift resource use from target to a bottleneck
Can we mount RFAs when the target resource is CPU cache?
Shared CPU Cache:
Ubiquitous: almost all workloads need cache
Hardware controlled: not easily isolated via software
Performance sensitive: high performance cost!
Slide14
Cache Contention
RFA Goal
Slide15
Case Study: Cache vs. Network
Victim: Apache webserver hosting static and dynamic (CGI) web pages
Beneficiary: synthetic cache-bound workload (LLCProbe)
Target resource: cache
No cache isolation: ~3x slower when sharing cache with webserver
[Figure: local Xen test bed; victim and beneficiary on two cores sharing a cache ($$$); clients reach the victim over the network (Net)]
Slide16
Cache vs. Network
Victim webserver frequently interrupts the beneficiary and pollutes the cache
Reason: Xen gives higher priority to the VM consuming less CPU time
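The priority effect can be mimicked with a deliberately simplified rule. This is not Xen's credit scheduler; the rule, the function names, and the numbers are assumptions for illustration. A VM waking up for a request preempts the running VM only while it has used less CPU than its fair share in the current accounting period.

```python
def wakes_with_boost(cpu_used_ms, fair_share_ms):
    """Toy rule: a waking VM preempts only if it is under its fair share."""
    return cpu_used_ms < fair_share_ms


def interruptions(requests, cpu_per_request_ms, extra_cpu_ms, fair_share_ms):
    """Count request arrivals that preempt the other VM in one period.

    extra_cpu_ms models CPU the webserver has already burned on other work
    (for example CPU-intensive CGI requests) during the same period.
    """
    used = extra_cpu_ms
    count = 0
    for _ in range(requests):
        if wakes_with_boost(used, fair_share_ms):
            count += 1
        used += cpu_per_request_ms
    return count


# Lightly CPU-loaded webserver: every request interrupts the beneficiary.
print(interruptions(20, 0.5, extra_cpu_ms=0, fair_share_ms=15))
# Same requests after heavy CPU use: almost none get priority.
print(interruptions(20, 0.5, extra_cpu_ms=14, fair_share_ms=15))
```

Fewer preemptions mean the cache-bound workload keeps its cache contents for longer stretches.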
[Figure: cache state timeline for two cores sharing a cache ($$$) under a heavily loaded web server: beneficiary starts to run, webserver receives a request, decreased cache efficiency]
Slide17
Cache vs. Network w/ RFA
RFA helps in two ways:
Webserver loses its priority.
Webserver's capacity is reduced.
[Figure: cache state timeline for heavily loaded webserver requests under RFA: helper sends CGI requests, beneficiary starts to run, webserver receives a request]
Slide18
RFA: Performance Improvement
[Figure: performance vs. RFA intensity (time in ms per second); labels: 196% slowdown, 86% slowdown, 60% performance improvement]
Slide19
RFA Effect on Interruptions
Beneficiary: LLCProbe
[Figure: plot with labels 40% and 85%]
Slide20
RFA Effect on Victim’s capacity
Decreases with increasing RFA intensity
Slide21
Experiments on Amazon EC2
Instance type: m1.small
# of co-resident pairs: 9 (23 total instances)
Machine type: Intel Xeon E5507 with 4 MB LLC
Multiple accounts
Co-resident VMs from our accounts: stand-ins for victim and beneficiary
Separate instances for helper and web clients
No direct interaction with any other customers
Indirect interaction akin to normal usage cases
Slide22
LLCProbe Synthetic Benchmark
RFA improved performance of LLCProbe on all experimental EC2 instances!
Highest performance improvement of 13%, recovering 33% of performance lost.
Average performance improvement: 6%
Slide23
mcf from SPEC-CPU
[Figure labels: 10% slowdown, 6% slowdown]
3% performance improvement = 35% reduction in performance loss
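How a small performance improvement maps to a much larger reduction in lost performance can be worked through directly. The definitions below are assumptions (slowdown taken as relative increase in runtime over running alone), and the inputs are the rounded figure labels, so the results land near the quoted 3% and 35% rather than exactly on them; the quoted values presumably come from unrounded measurements.

```python
def loss_reduction(slowdown_before, slowdown_after):
    """Fraction of the contention-induced slowdown that was removed."""
    return (slowdown_before - slowdown_after) / slowdown_before


def improvement(slowdown_before, slowdown_after):
    """Relative speedup, assuming runtime = baseline * (1 + slowdown)."""
    return (1 + slowdown_before) / (1 + slowdown_after) - 1


before, after = 0.10, 0.06   # rounded labels from the figure
print(f"loss reduction: {loss_reduction(before, after):.0%}")
print(f"improvement:    {improvement(before, after):.1%}")
```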
On average, RFA improved performance across all SPEC workloads!
Slide24
Discussion: Practical Aspects
RFA case studies used CPU-intensive CGI requests
Alternative: DoS vulnerabilities (e.g., hash-collision attacks)
Identifying co-resident victims
Easy on most clouds (co-resident VMs have predictable internal IP addresses)
No public interface? Paper discusses possibilities for RFAs
Slide25
Conclusion
Resource-Freeing Attacks
Interfere with victim to shift resource use
Proof-of-concept of efficacy in public clouds
Open questions: Other RFAs?
Countermeasures: detection, stricter isolation, smarter scheduling?
Slide26
References
[MMSys10] Sean K. Barker and Prashant Shenoy. “Empirical evaluation of latency-sensitive application performance in the cloud.” In MMSys, 2010.
[Security10] Thomas Moscibroda and Onur Mutlu. “Memory performance attacks: Denial of memory service in multi-core systems.” In Usenix Security Symposium, 2007.
[CCS09] T. Ristenpart, E. Tromer, H. Shacham, and S. Savage. “Hey, you, get off my cloud: exploring information leakage in third party compute clouds.” In CCS, 2009.
Slide27
Backup Slides
Slide28
Discussion: Countermeasures
Detection? May be hard to differentiate RFA traffic from legitimate traffic
Stricter isolation? Works but expensive
Contention-aware scheduling: not yet used in public IaaS
Slide29
Discussion: Economies
Cost of RFA: helper instance and RFA traffic.
Co-resident helper: an efficient implementation of the helper can run inside the attacker’s VM.
Current helper implementation consumes 15 Kbps of network bandwidth and 0.7% CPU utilization.
Multiplex a single helper instance for many beneficiaries.
Note: Currently, internal EC2 network traffic is free of cost.
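To put the helper's traffic figure in perspective, it can be converted into a daily volume. This assumes "Kbps" means kilobits per second and uses decimal megabytes; both are assumptions, since the slide does not say.

```python
KBPS = 15                             # helper bandwidth from the slide
bytes_per_sec = KBPS * 1000 / 8       # assuming kilobits per second
mb_per_day = bytes_per_sec * 86_400 / 1e6

print(f"{mb_per_day:.0f} MB/day")     # prints 162 MB/day
```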
Slide30
Identifying Co-resident VMs
Identifying the public interface:
Predictable numerical distance between internal IP addresses in public clouds.
Identifying the port used by the victim application (standard ports like http(s), etc.).
Slide31
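For the "Identifying Co-resident VMs" point about numerical distance between internal IP addresses, the distance itself is simple to compute with Python's standard `ipaddress` module. The function names and the `max_distance` threshold are hypothetical; the slides do not give a specific cutoff, and a small distance is only a hint that still needs verification.

```python
import ipaddress


def ip_distance(a, b):
    """Absolute numerical distance between two IP addresses."""
    return abs(int(ipaddress.ip_address(a)) - int(ipaddress.ip_address(b)))


def maybe_coresident(a, b, max_distance):
    """Heuristic only: flag address pairs that are numerically close.

    max_distance is a hypothetical, provider-dependent threshold.
    """
    return ip_distance(a, b) <= max_distance


print(ip_distance("10.0.1.5", "10.0.1.12"))   # prints 7
print(ip_distance("10.0.1.5", "10.0.2.5"))    # prints 256
```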
Experiment: Measuring Resource Contention
Synthetic workloads
Slide32
Other RFAs
RFAs are not limited to the presented case studies.
LLC vs. Disk: sending spurious, random disk requests asynchronously to create a bottleneck for the shared disk resource.
Memory vs. Disk: similar to the above RFA.
Slide33
Discussion: More on Practical Aspects
Work-conserving vs. non-work-conserving schedulers
Public cloud environments are expected to manage resources in a non-work-conserving fashion.
E.g., the Net vs. Net RFA won’t work on Amazon EC2.
Simulated client workload
What is the effect of an RFA in the presence of multiple independent requests originating from numerous clients?
Slide34
Xen Internals
Domain-0
Privileged domain, direct access to I/O devices.
All I/O requests go through Dom-0
Xen scheduler internals
Boost priority for interactive workloads
[Figure: hypervisor with Dom0 and guest VMs on four cores sharing cache, memory, disk, and network (N/W); incoming request]
Slide35
Experiment: Measuring Resource Contention
On a local Xen test bed
Machine: Intel Xeon E5430, 2.66 GHz
Packages: 2, 2 cores per package
LLC size: 6 MB per package
Not all resources conflict
Some have huge performance degradation
[Figure: local Xen test bed with VMs on four cores sharing LLC, memory, disk, and network (N/W)]
Slide36
Boost Priority and Interruptions
Victim: Webserver
Beneficiary: LLCProbe
[Figure labels: 95%, < 30%, 40%, 85%]
Fewer interruptions
Higher cache efficiency
Slide37
Demonstration on EC2
Problem #1: Achieving co-residence
Launching multiple instances simultaneously from two or more accounts.
Problem #2: Verifying co-residency
Numerical distance between internal IP addresses [CCS09].
Faster packet round-trip times.
Using resource contention experiments.
Slide38
Normalized Performance on EC2
[Figure: normalized performance relative to baseline; higher is better; label: 6%]
Aggregate performance degradation is within 5 performance points
On average, all SPEC workloads benefitted from RFA