Chimera: Data Sharing Flexibility, Shared Nothing Simplicity
Umar Farooq Minhas, University of Waterloo
David Lomet, Chandu Thekkath, Microsoft Research
Distributed database architectures
In a shared nothing system, a single node can only access local data
- less complex, easier to implement
- provides good performance if data is partitionable
- e.g., Microsoft SQL Server, IBM DB2/UDB
Data sharing allows multiple nodes to share access to common data
- complex, difficult to implement
- provides increased responsiveness to load imbalances
- e.g., Oracle RAC, IBM Mainframe DB2
Goal: Design and implement a hybrid database system
Shared nothing vs data sharing
[Diagram: two configurations of the same three-node cluster, each node with two CPUs and memory. Shared nothing: every node has its own locally attached disk. Data sharing: all nodes access all disks through a data sharing software layer.]
Hardware configuration can be identical for both systems
Software managing the system is different
Our approach
Start with a shared nothing cluster of low-cost desktop machines
- each node hosts a standalone shared nothing DBMS with locally attached storage
Extend the shared nothing system with data sharing capability
- a remote node can access a database hosted at a local node
Additional code required for
- distributed locking
- cache consistency
Techniques presented here are applicable to any shared nothing DBMS
Outline
Introduction
Chimera: Overview
Chimera: Implementation Details
Experimental Evaluation
Conclusion
Chimera: Best of both worlds
Chimera is an “extension” to a shared nothing DBMS
built using off-the-shelf components
Provides the simplicity of shared nothing, flexibility of data sharing
Provides effective scalability and load balancing with less than 2% overhead
Chimera: Main components
Shared file system
- to store data accessible to all nodes of a cluster
- e.g., Common Internet File System (CIFS) or Network File System (NFS)
Generic distributed lock manager
- provides ownership control
- e.g., ZooKeeper, Chubby, Boxwood
Extra code in the shared nothing DBMS
- for data access and sharing among nodes
Advantages of Chimera
Load balancing at table granularity
- offloads execution cost of database functionality
Scale-out for read-mostly workloads
- read-mostly workloads are very common and important, e.g., a service hosted at Microsoft, Yahoo, or Google
- non-partitionable data is stored in a centralized database
- Chimera provides effective scale-out for such workloads
Close to shared nothing simplicity
- key point: allow only a single node to update a database at a time
- greatly simplifies data sharing, transaction logging, and recovery
Outline
Introduction
Chimera: Overview
Chimera: Implementation Details
Experimental Evaluation
Conclusion
Chimera: Overall system architecture
[Diagram: one local DBMS instance (DBMS 1) and N-1 remote instances (DBMS 2 ... DBMS N), each accepting queries through a Stored Procedure (SP) and containing a Lock Client (LC) and an Enhanced Buffer Manager (EBM). All instances reach the database (DB) over CIFS and coordinate through the Global Lock Manager (GLM).]
SP – Stored Procedure
LC – Lock Client
EBM – Enhanced Buffer Manager
GLM – Global Lock Manager
CIFS – Common Internet File System
Stored Procedure
Most of the required changes are implemented in a user-defined stored procedure
- invoked like a standard stored procedure
An instance of this stored procedure is installed at each node
- accepts user queries
- does appropriate locking and buffer management
- executes the query against a local or remote table
- returns the results to the caller
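As a rough illustration, the per-node control flow might look like the following Python sketch (the actual implementation is a stored procedure inside SQL Server; the names below are illustrative assumptions, not its API):

```python
import threading
from collections import defaultdict

class LockClient:
    # Toy stand-in for the local lock client: one lock per abstract
    # resource name. A real client distinguishes shared vs exclusive
    # modes and coordinates with the global lock manager.
    def __init__(self):
        self._locks = defaultdict(threading.Lock)
    def acquire(self, resource, mode):
        self._locks[resource].acquire()
    def release(self, resource, mode):
        self._locks[resource].release()

LOCK_CLIENT = LockClient()

def execute(sql):
    return f"results of: {sql}"  # placeholder for actual DBMS execution

def chimera_exec(sql, server, db, table, is_update=False):
    """Accept a user query, lock the table resource, execute the query
    against the local or remote table, and return results to the caller."""
    resource = f"{server}.{db}.{table}"  # abstract lock resource name
    mode = "exclusive" if is_update else "shared"
    LOCK_CLIENT.acquire(resource, mode)
    try:
        return execute(sql)
    finally:
        LOCK_CLIENT.release(resource, mode)

print(chimera_exec("SELECT COUNT(*) FROM lineitem", "node1", "tpch", "lineitem"))
```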
Enhanced Buffer Manager
Implement a cross-node cache invalidation scheme
- maintain cache consistency across nodes
Dirty pages need to be evicted from all readers after an update
- we do not know in advance which pages will get updated
Selective cache invalidation
- updating node captures a list of dirty pages
- sends a message to all the readers to evict those pages
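A toy model of selective invalidation (Python; the page IDs, cache structure, and messaging are assumptions for illustration, since the real mechanism lives inside the DBMS buffer manager):

```python
class NodeCache:
    # Minimal stand-in for one node's buffer pool.
    def __init__(self, name):
        self.name = name
        self.pages = {}  # page_id -> cached contents

    def read(self, page_id):
        # Fault the page in from shared storage on first access.
        return self.pages.setdefault(page_id, f"disk copy of {page_id}")

    def evict(self, page_ids):
        # Invalidation message handler: drop exactly the listed pages.
        for pid in page_ids:
            self.pages.pop(pid, None)

def run_update(writer, readers, dirtied_page_ids):
    # The updating node captures the list of pages it dirtied and
    # tells every reader node to evict those pages (and only those).
    for pid in dirtied_page_ids:
        writer.pages[pid] = f"new version of {pid}"
    for r in readers:
        r.evict(dirtied_page_ids)

writer = NodeCache("node1")
readers = [NodeCache("node2"), NodeCache("node3")]
for r in readers:
    r.read("page-42")                      # readers cache the page
run_update(writer, readers, ["page-42"])   # an update dirties it
assert all("page-42" not in r.pages for r in readers)  # stale copies gone
```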
Global Lock Manager
We need a richer lock manager that can handle locks on shared resources across machines
Implemented using an external global lock manager with corresponding local lock clients
- a lock client is integrated with each DBMS instance
Lock types: Shared or Exclusive
Lock resources: an abstract name (string)
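A minimal in-process stand-in for such a shared/exclusive lock table, keyed by abstract resource names (a Python sketch; a real deployment would use an external service such as ZooKeeper, Chubby, or Boxwood, and this API is an assumption):

```python
import threading

class GlobalLockManager:
    # Toy shared/exclusive lock table keyed by resource name strings.
    def __init__(self):
        self._cv = threading.Condition()
        self._readers = {}     # resource -> count of shared holders
        self._writers = set()  # resources currently held exclusively

    def acquire(self, resource, mode):
        with self._cv:
            if mode == "S":    # shared: wait only for an exclusive holder
                while resource in self._writers:
                    self._cv.wait()
                self._readers[resource] = self._readers.get(resource, 0) + 1
            else:              # "X": wait for all holders, shared or exclusive
                while resource in self._writers or self._readers.get(resource, 0) > 0:
                    self._cv.wait()
                self._writers.add(resource)

    def release(self, resource, mode):
        with self._cv:
            if mode == "S":
                self._readers[resource] -= 1
            else:
                self._writers.discard(resource)
            self._cv.notify_all()  # wake any waiters
```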
Read sequence
Acquire a shared lock on the abstract resource (table)
- ServerName.DBName.TableName
On lock acquire, proceed with the Select
Release the shared lock
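Expressed in code, reusing the toy GlobalLockManager sketched above (the resource-name format comes from the slide; run_select and the query are illustrative):

```python
glm = GlobalLockManager()
run_select = lambda sql: f"rows for: {sql}"  # stand-in for the DBMS

resource = "Server1.tpch.lineitem"  # ServerName.DBName.TableName
glm.acquire(resource, "S")          # shared lock: concurrent readers OK
try:
    rows = run_select("SELECT COUNT(*) FROM lineitem")
finally:
    glm.release(resource, "S")      # always release the shared lock
```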
Write sequence
Acquire an exclusive lock on
- ServerName.DBName
- ServerName.DBName.TableName
On lock acquire, proceed with the Update
Do selective cache invalidation on all reader nodes
Release the exclusive locks
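And the corresponding write path, continuing the same toy sketches (the dirty-page capture and reader list are assumed stand-ins):

```python
run_update = lambda sql: None                    # stand-in for the DBMS
capture_dirty_pages = lambda table: ["page-42"]  # assumed page tracking
reader_nodes = [NodeCache("node2"), NodeCache("node3")]

db_res, tbl_res = "Server1.tpch", "Server1.tpch.lineitem"
glm.acquire(db_res, "X")   # exclusive DB lock: a single writer per database
glm.acquire(tbl_res, "X")  # ...and an exclusive lock on the table
try:
    run_update("UPDATE lineitem SET l_comment = 'x' WHERE l_orderkey = 1")
    dirty = capture_dirty_pages("lineitem")  # pages this update touched
    for node in reader_nodes:
        node.evict(dirty)                    # selective cache invalidation
finally:
    glm.release(tbl_res, "X")
    glm.release(db_res, "X")                 # release the exclusive locks
```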
Outline
Introduction
Chimera: Overview
Chimera: Implementation Details
Experimental Evaluation
Conclusion
Experimental setup
We use a 16-node cluster
- 2x AMD Opteron CPU @ 2.0GHz
- 8GB RAM
- Windows Server 2008 Enterprise with SP2
- patched Microsoft SQL Server 2008
- buffer pool size = 1.5GB
Benchmark
- TPC-H: a decision support benchmark
- scale factor 1
- total size on disk ~3GB
Overhead of our prototype
Run the 22 TPC-H queries on a single node with and without the prototype code
Avg Slowdown: 1.006 X

| TPC-H Query | Runtime Without Prototype (ms) | Runtime With Prototype (ms) | Slowdown Factor |
|-------------|--------------------------------|-----------------------------|-----------------|
| Q1          | 4809                           | 5120                        | 1.06            |
| Q6          | 163                            | 171                         | 1.05            |
| Q9          | 2258                           | 2303                        | 1.02            |
| Q11         | 462                            | 431                         | 0.93            |
| Q12         | 1131                           | 1247                        | 1.10            |
| Q13         | 1349                           | 1345                        | 1.00            |
| Q18         | 4197                           | 3895                        | 0.93            |
| Q19         | 183                            | 185                         | 1.01            |
| Q21         | 2655                           | 2673                        | 1.01            |
| Q22         | 457                            | 485                         | 1.06            |
Remote execution overhead (cold cache)
Run the 22 TPC-H queries on the local node and remote node
- measure the query run time and calculate the slowdown factor
- flush the DB cache between subsequent runs
Remote execution overhead (warm cache)
Repeat the previous experiment with warm cache
Avg Slowdown (cold cache): 1.46 X
Avg Slowdown (warm cache): 1.03 X
Cost of updates
Baseline: a simple update on a node with no readers
Test scenarios: perform the update while 1, 2, 4, or 8 other nodes read the database in an infinite loop
Cost of reads with updates
Perform simple updates at the local node with varying frequency: 60s, 30s, 15s, and 5s
Run one of the TPC-H read queries at a remote node for a fixed duration of 300s and calculate
- Response time: average runtime
- Throughput: queries completed per second
Cost of reads with updates (1)
Non-conflicting read

| Query | Update Frequency (secs) | Average Runtime (secs) | Steady State Average (secs) | Queries/sec |
|-------|-------------------------|------------------------|-----------------------------|-------------|
| Q6    | 60                      | 0.21                   | 0.20                        | 4.85        |
|       | 30                      | 0.22                   |                             | 4.54        |
|       | 15                      | 0.23                   |                             | 4.23        |
|       | 5                       | 0.26                   |                             | 3.77        |
| Q13   | 60                      | 1.38                   | 1.43                        | 0.73        |
|       | 30                      | 1.38                   |                             | 0.73        |
|       | 15                      | 1.39                   |                             | 0.72        |
|       | 5                       | 1.38                   |                             | 0.73        |
| Q20   | 60                      | 1.62                   | 1.62                        | 0.62        |
|       | 30                      | 1.65                   |                             | 0.60        |
|       | 15                      | 1.99                   |                             | 0.49        |
|       | 5                       | 2.04                   |                             | 0.49        |
| Q21   | 60                      | 2.78                   | 2.73                        | 0.36        |
|       | 30                      | 2.85                   |                             | 0.35        |
|       | 15                      | 3.01                   |                             | 0.33        |
|       | 5                       | 3.60                   |                             | 0.28        |
Scalability
Run concurrent TPC-H streams
- start with a single local node
- incrementally add remote nodes up to a total of 16 nodes
Conclusion
Data sharing systems are desirable for load balancing
We enable data sharing as an extension to a shared nothing DBMS
We presented the design and implementation of Chimera
- enables data sharing at table granularity
- uses global locks for synchronization
- implements cross-node cache invalidation
- does not require extensive changes to the shared nothing DBMS
Chimera provides effective scalability and load balancing with less than 2% overhead