SPANStore: Cost-Effective Geo-Replicated Storage Spanning Multiple Cloud Services
Zhe Wu, Michael Butkiewicz, Dorian Perkins, Ethan Katz-Bassett, Harsha V. Madhyastha
UC Riverside and USC
Geo-distributed Services for Low Latency
Cloud Services Simplify Geo-distribution
Need for Geo-Replication
Data uploaded by a user may be viewed/edited by users in other locations:
Social networking (Facebook, Twitter)
File sharing (Dropbox, Google Docs)
Geo-replication of data is therefore necessary, but each cloud data center's storage service is isolated, so the application must handle replication itself
Geo-replication on Cloud Services
Lots of recent work on enabling geo-replication
Walter (SOSP '11), COPS (SOSP '11), Spanner (OSDI '12), Gemini (OSDI '12), Eiger (NSDI '13), ...
These target faster performance or stronger consistency
Added consideration on cloud services: minimizing cost
Outline
Problem and motivation
SPANStore overview
Techniques for reducing cost
Evaluation
SPANStore
Key-value store (GET/PUT interface) spanning cloud storage services
Main objective: minimize cost while satisfying application requirements:
Latency SLOs
Consistency (eventual vs. sequential)
Fault tolerance
SPANStore Overview

[Figure: in each data center, the application issues requests through a SPANStore library, which performs metadata lookups and then reads/writes data across data centers A-D according to the optimal replication policy, returning data or an ACK to the application.]
SPANStore Overview

[Figure: applications and SPANStore instances run in data centers A-D. A central Placement Manager takes as input a SPANStore characterization (inter-DC latencies and pricing policies) and application input (workload; latency, consistency, and fault-tolerance requirements), and outputs the replication policy to each SPANStore instance.]
Outline
Problem and motivation
SPANStore overview
Techniques for reducing cost
Evaluation
Questions to be addressed for every object:
Where to store replicas
How to execute PUTs and GETs
Cloud Storage Service Cost
Storage service cost = storage cost + request cost + data transfer cost
Storage cost: the amount of data stored
Request cost: the number of PUT and GET requests issued
Data transfer cost: the amount of data transferred out of the data center
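The three components add up to a simple cost function. A minimal sketch, with a function name of our choosing and default prices that mirror the illustrative S3 US West figures from the pricing slide:

```python
def storage_service_cost(gb_stored, num_puts, num_gets, gb_transferred_out,
                         storage_price_gb=0.095,    # $/GB stored (illustrative)
                         put_price_per_1k=0.005,    # $/1,000 PUT requests (illustrative)
                         get_price_per_10k=0.004,   # $/10,000 GET requests (illustrative)
                         transfer_price_gb=0.12):   # $/GB transferred out (illustrative)
    """Storage service cost = storage cost + request cost + data transfer cost."""
    storage_cost = gb_stored * storage_price_gb
    request_cost = (num_puts / 1000) * put_price_per_1k \
                 + (num_gets / 10000) * get_price_per_10k
    transfer_cost = gb_transferred_out * transfer_price_gb
    return storage_cost + request_cost + transfer_cost
```

SPANStore must weigh all three terms at once: a placement that minimizes one (say, storage) can easily inflate another (say, data transfer).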
Low Latency SLO Requires High Replication in Single Cloud Deployment
[Figure: with AWS regions alone, four replicas are needed to meet a 100ms latency bound.]
Technique 1: Harness Multiple Clouds
[Figure: data centers from multiple cloud providers give denser coverage, making it easier to meet the 100ms latency bound.]
Price Discrepancies across Clouds
Cloud region   Storage price ($/GB)   Data transfer price ($/GB)   GET request price ($/10,000 requests)   PUT request price ($/1,000 requests)
S3 US West     0.095                  0.12                         0.004                                   0.005
Azure Zone 2   0.095                  0.19                         0.001                                   0.0001
GCS            0.085                  0.12                         0.01                                    0.01
...

Leveraging these discrepancies judiciously can reduce cost
Range of Candidate Replication Policies
Strategy 1: single replica in the cheapest storage cloud
Drawback: high latencies
Range of Candidate Replication Policies
Strategy 2: few replicas to reduce latencies
Drawback: high data transfer cost
Range of Candidate Replication Policies
Strategy 3: replicate everywhere
Drawbacks: high latencies and cost of PUTs, high storage cost

The optimal replication policy depends on:
1. application requirements
2. workload properties
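A toy cost model is enough to see why the optimum moves with the workload. This sketch is ours, with made-up uniform prices and the simplifying assumptions noted in the comments; it is not SPANStore's actual formulation:

```python
def policy_cost(num_replicas, num_dcs, obj_gb, gets, puts,
                storage_price=0.1, transfer_price=0.12):
    # Toy model (our assumptions): accesses are spread uniformly over data
    # centers, a GET from a data center without a local replica pays one
    # inter-DC transfer, and every PUT is propagated directly to each
    # remote replica.
    storage_cost = num_replicas * obj_gb * storage_price
    local_fraction = num_replicas / num_dcs
    get_cost = gets * (1 - local_fraction) * obj_gb * transfer_price
    put_cost = puts * (num_replicas - 1) * obj_gb * transfer_price
    return storage_cost + get_cost + put_cost
```

Even in this crude model, a GET-heavy workload favors replicating everywhere (GETs become local) while a PUT-heavy workload favors a single replica (no propagation cost), which is exactly why no one strategy above wins in general.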
High Variability of Individual Objects
We analyze the predictability of a Twitter workload, estimating each object's workload from the same hour in the previous week:
60% of hours have error higher than 50%
20% of hours have error higher than 100%
Error can be as high as 1000%
Technique 2: Aggregate Workload Prediction per Access Set
Observation: stability in aggregate workload
Diurnal and weekly patterns
Classify objects by access set:
Set of data centers from which object is accessed
Leverage application knowledge of the sharing pattern:
Dropbox/Google Docs know the users that share a file
Facebook controls every user's news feed
Technique 2: Aggregate Workload Prediction per Access Set
Aggregate workload is more stable and predictable
(estimated, as before, from the same hour in the previous week)
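The classification step can be sketched in a few lines, assuming per-object access records of the form (data centers, GET count, PUT count); the function name and record shape are our illustration:

```python
from collections import defaultdict

def aggregate_by_access_set(object_accesses):
    """Group per-object GET/PUT counts by access set: the set of data
    centers from which the object is accessed.  Predictions are then made
    on these aggregates, which are more stable than per-object workloads."""
    agg = defaultdict(lambda: {"gets": 0, "puts": 0})
    for obj, (dcs, gets, puts) in object_accesses.items():
        key = frozenset(dcs)  # access set: order of data centers is irrelevant
        agg[key]["gets"] += gets
        agg[key]["puts"] += puts
    return dict(agg)
```

All objects with the same access set then share one replication policy, so a noisy individual object rides on the smoother diurnal pattern of its group.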
Optimizing Cost for GETs and PUTs
[Figure: each GET is served from a replica chosen for its cheap request and data transfer prices.]
Use cheap (request + data transfer) data centers
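One way to sketch the GET decision, assuming per-data-center request and egress prices are known; the function and the price dictionaries in the test loosely follow the earlier pricing table and are illustrative only:

```python
def cheapest_get_replica(replicas, request_price, transfer_price, obj_gb):
    # Per-GET cost at a replica = GET request price (per request) plus
    # egress price times object size.  Names and units are our assumptions.
    return min(replicas,
               key=lambda dc: request_price[dc] + transfer_price[dc] * obj_gb)
```

Note how the winner flips with object size: for large objects the data transfer term dominates, while for small objects the request price discrepancy decides.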
Technique 3: Relay Propagation
23
PUT
Asynchronous propagation (no latency constraint)
R
0.25$/GB
0.19$/GB
0.2$/GB
0.19$/GB
0.12$/GB
R
R
R
RSlide24
Technique 3: Relay Propagation
Asynchronous propagation (no latency constraint)
Synchronous propagation (bounded by latency SLO)
[Figure: with the same egress prices, the cheapest relay path may be too slow and violate the SLO, so synchronous propagation must pick relays that stay within the latency bound.]
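The asynchronous case can be sketched with the egress prices from the figure. The strategy shown here (forward once to the cheapest-egress replica, which relays to the rest) is our simplification of SPANStore's actual optimization, which also handles the latency-bounded synchronous case:

```python
def relay_propagation_cost(origin, replicas, egress_price, obj_gb):
    """Cost of propagating a PUT with no latency constraint: the origin
    forwards the object once to the cheapest-egress member, which relays
    it to all remaining replicas (a simplification, ours)."""
    others = [r for r in replicas if r != origin]
    if not others:
        return 0.0
    relay = min(others + [origin], key=lambda dc: egress_price[dc])
    if relay == origin:  # origin is already the cheapest sender
        return len(others) * egress_price[origin] * obj_gb
    cost = egress_price[origin] * obj_gb                       # origin -> relay
    cost += (len(others) - 1) * egress_price[relay] * obj_gb   # relay -> rest
    return cost
```

With the figure's prices, direct propagation from the 0.25$/GB origin to four replicas costs 1.00$/GB, while relaying through the 0.12$/GB replica costs 0.61$/GB.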
Summary
Insights to reduce cost:
Multi-cloud deployment
Use aggregate workload per access set
Relay propagation
The placement manager uses an ILP to combine these insights
Other techniques:
Metadata management
Two-phase locking protocol
Asymmetric quorum sets
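SPANStore's placement manager solves an ILP; purely as an illustration, a brute-force stand-in over candidate replica sets captures the same kind of decision. The names and the latency-coverage constraint below are our simplification, not the paper's formulation:

```python
from itertools import combinations

def choose_replicas(dcs, latency, slo_ms, cost_of):
    """Brute-force stand-in for the placement ILP: return the cheapest
    replica set such that every data center can reach some replica within
    the latency SLO.  `latency[a][b]` is the a->b latency in ms and
    `cost_of(replicas)` prices a candidate set (both ours, illustrative).
    Exponential in len(dcs); the real system solves an ILP instead."""
    best = None
    for k in range(1, len(dcs) + 1):
        for cand in combinations(dcs, k):
            covered = all(min(latency[dc][r] for r in cand) <= slo_ms
                          for dc in dcs)
            if covered and (best is None or cost_of(cand) < cost_of(best)):
                best = cand
    return best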
Outline
Problem and motivation
SPANStore
overview
Techniques for reducing cost
Evaluation
26Slide27
Evaluation
Scenario
Application is deployed
on
EC2SPANStore is deployed across S3, Azure and GCSSimulations to evaluate cost savings
Deployment to verify application requirementsRetwis
ShareJS
27Slide28
Simulation Settings
Compare
SPANStore
againstReplicate everywhere
Single replicaSingle cloud deployment
Application requirementsSequential consistencyPUT SLO: min SLO satisfies replicate everywhere
GET SLO: min SLO satisfies single replica
28Slide29
SPANStore Enables Cost Savings across Disparate Workloads
29
Savings by relay propagation
#
1: big objects, more GETs
(Lots of data transfers from replicas)
#2: big objects, more PUTs
(
Lots of data transfers to replicas
)
Savings by reducing data transfer
#3:
small
objects, more GETs
(
Lots of GET requests
)
Savings by price discrepancy
of GET request
#4
:
small objects, more PUTs
(
Lots of PUT requests
)
Savings by price discrepancy
of PUT requestSlide30
Deployment Settings
30
Retwis
Scale down Twitter workload
GET: read timeline
PUT: make post
Insert: read follower’s timeline and append post to itRequirements:Eventual consistency
90%ile PUT/GET SLO = 100msSlide31
SPANStore
Meets SLOs
31
SLO
90%ile
Insert SLOSlide32
Conclusions
SPANStore
Minimize cost while satisfying latency, consistency and fault-tolerance requirements
Use multiple cloud providers for greater data center density and pricing discrepancies
Judiciously
determine replication policy based on workload properties and application needs
32Slide33