Parallel Databases
COMP3017 Advanced Databases
Dr Nicholas Gibbins - nmg@ecs.soton.ac.uk
2012-2013
Definitions
Parallelism
An arrangement or state that permits several operations or tasks to be performed simultaneously rather than consecutively
Parallel Databases
have the ability to split
processing of data
access to data
across multiple processors
Why Parallel Databases
Hardware trends
Reduced elapsed time for queries
Increased transaction throughput
Increased scalability
Better price/performance
Improved application availability
Access to more data
In short, for better performance
Shared Memory Architecture
Tightly coupled
Symmetric Multiprocessor (SMP)
P = processor, M = memory
[Diagram: three processors (P) connected to a single global memory]
Software – Shared Memory
Less complex database software
Limited scalability
Single buffer
Single database storage
[Diagram: three processors (P) sharing a single global memory]
Shared Disc Architecture
Loosely coupled
Distributed memory
[Diagram: three processors (P), each with its own memory (M), all connected to shared storage (S)]
Software – Shared Disc
Each processor has its own database buffer
Avoids memory bottleneck
Same page may be in more than one buffer at once – can lead to incoherence
Needs global locking mechanism
Single logical database storage
[Diagram: three processors (P), each with its own memory (M), sharing storage (S)]
Shared Nothing Architecture
Massively parallel
Loosely coupled
High-speed interconnect between processors
[Diagram: three processors (P), each with its own memory (M), linked by a high-speed interconnect]
Software – Shared Nothing
Each processor has its own database buffer
Each processor owns part of the data
One page is only in one local buffer – no buffer incoherence
Needs distributed deadlock detection
Needs multiphase commit protocol
Needs to break SQL requests into multiple sub-requests
[Diagram: three processors (P), each with its own memory (M) and data partition]
Hardware vs. Software Architecture
It is possible to use one software strategy on a different hardware arrangement
Also possible to simulate one hardware configuration on another
Virtual Shared Disk (VSD) makes an IBM SP shared nothing system look like a shared disc setup (for Oracle)
From this point on, we deal only with the shared nothing software mode
Shared Nothing Challenges
Partitioning the data
Keeping the partitioned data balanced
Splitting up queries to get the work done
Avoiding distributed deadlock
Concurrency control
Dealing with node failure
Hash Partitioning
[Diagram: rows of a table distributed across partitions by a hash function]
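As an illustrative sketch (plain Python, not taken from any particular DBMS; the function and column names are made up), hash partitioning assigns each row to a node by hashing its partitioning key:

```python
# Hash partitioning sketch: route each row to a node by hashing
# its partitioning key modulo the number of nodes.

def partition(key, num_nodes):
    """Return the node index responsible for this key."""
    return hash(key) % num_nodes

def hash_partition(rows, key_fn, num_nodes):
    """Split a table (list of rows) into per-node buckets."""
    buckets = [[] for _ in range(num_nodes)]
    for row in rows:
        buckets[partition(key_fn(row), num_nodes)].append(row)
    return buckets

# Synthetic table: ten customer rows spread over three nodes.
rows = [{"custkey": i, "name": f"c{i}"} for i in range(10)]
buckets = hash_partition(rows, lambda r: r["custkey"], 3)
```

Rows with the same key always land on the same node, which is what makes the collocated joins discussed later possible.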
Range Partitioning
[Diagram: table split into key ranges A-H, I-P, Q-Z, one range per partition]
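The slide's A-H / I-P / Q-Z split could be sketched as follows (illustrative only; real systems store the range boundaries in the catalogue):

```python
import bisect

# Range partitioning sketch: each partition owns an inclusive
# upper bound; a key belongs to the first partition whose bound
# is >= the key's leading letter.

BOUNDS = ["H", "P", "Z"]  # upper bounds of partitions A-H, I-P, Q-Z

def range_partition(key):
    """Return the partition index whose range contains the key."""
    return bisect.bisect_left(BOUNDS, key[0].upper())
```

Range partitioning keeps related keys together, which helps range scans but can skew the load if the data are unevenly distributed, hence the rebalancing discussed below.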
Schema Partitioning
[Diagram: whole tables (Table 1, Table 2) assigned to different nodes]
Rebalancing Data
Data in proper balance
Data grows, performance drops
Add new nodes and discs
Redistribute data to new nodes
Dividing up the Work
[Diagram: application talks to a coordinator process, which farms work out to several worker processes]
DB Software on each node
[Diagram: each node runs its own DBMS instance with worker processes (W1, W2); coordinator processes (C1, C2) serve applications App1 and App2]
Transaction Parallelism
Improves throughput
Inter-Transaction
Run many transactions in parallel, sharing the DB
Often referred to as concurrency
Query Parallelism
Improves response times (lower latency)
Intra-operator (horizontal) parallelism
Operators decomposed into independent operator instances, which perform the same operation on different subsets of data
Inter-operator (vertical) parallelism
Operations are overlapped
Pipeline data from one stage to the next without materialisation
Intra-Operator Parallelism
[Diagram: an SQL query decomposed into four subset queries, one per processor]
Intra-Operator Parallelism
Example query:
SELECT c1,c2 FROM t1 WHERE c1>5.5
Assumptions:
100,000 rows
Predicates eliminate 90% of the rows
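A minimal sketch of intra-operator parallelism for this query (illustrative Python with synthetic data; a real DBMS runs one operator instance per node rather than threads in one process):

```python
from concurrent.futures import ThreadPoolExecutor

# Intra-operator parallelism sketch for:
#   SELECT c1, c2 FROM t1 WHERE c1 > 5.5
# Each worker scans its own partition and applies the predicate;
# the coordinator concatenates the partial results.

def scan_subset(rows):
    """Worker: filter one partition and project (c1, c2)."""
    return [(r["c1"], r["c2"]) for r in rows if r["c1"] > 5.5]

def parallel_select(partitions):
    """Coordinator: run one scan per partition, union the results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as ex:
        parts = list(ex.map(scan_subset, partitions))
    return [row for part in parts for row in part]

# Synthetic table of 100 rows, round-robin split over 4 partitions.
partitions = [[{"c1": float(i), "c2": i} for i in range(p, 100, 4)]
              for p in range(4)]
result = parallel_select(partitions)
```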
I/O Shipping
[Diagram: each of four workers ships its 25,000 raw rows over the network; the coordinator applies the predicate and keeps 10,000 (c1,c2) rows]
Function Shipping
[Diagram: each of four workers applies the predicate locally and ships only 2,500 matching rows; the coordinator receives 10,000 (c1,c2) rows in total]
Function Shipping Benefits
Database operations are performed where the data are, as far as possible
Network traffic is minimised
For basic database operators, code developed for serial implementations can be reused
In practice, a mixture of function shipping and I/O shipping has to be employed
Inter-Operator Parallelism
[Diagram: two query plans (Scan → Join → Sort) executing over time]
Resulting Parallelism
[Diagram: with inter-operator parallelism, the Scan, Join and Sort stages of the two plans overlap in time]
Basic Operators
The DBMS has a set of basic operations which it can perform
The result of each operation is another table
Scan a table – sequentially or selectively via an index – to produce another, smaller table
Join two tables together
Sort a table
Perform aggregation functions (sum, average, count)
These are combined to execute a query
The Volcano Architecture
The Volcano Query Processing System was devised by Goetz Graefe (Oregon)
Introduced the Exchange operator
Inserted between the steps of a query to:
Pipeline results
Direct streams of data to the next step(s), redistributing as necessary
Provides mechanism to support both vertical and horizontal parallelism
Informix ‘Dynamic Scalable Architecture’ is based on Volcano
Exchange Operators
Example query:
SELECT county, SUM(order_item)
FROM customer, order
WHERE order.customer_id = customer.customer_id
GROUP BY county
ORDER BY SUM(order_item)
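The heart of the exchange operator is redistribution: it sits between producer and consumer operators and routes each producer's output stream to the consumer instance chosen by a partitioning function. A minimal sketch (illustrative Python; this is not Graefe's actual implementation, and the names are made up):

```python
# Volcano-style exchange sketch: route rows from several producer
# streams to consumer instances, partitioned by a hash of a key
# (e.g. the join key, so matching rows meet at the same instance).

def exchange(producer_streams, key_fn, num_consumers):
    """Redistribute producer output across consumer instances."""
    outputs = [[] for _ in range(num_consumers)]
    for stream in producer_streams:
        for row in stream:
            outputs[hash(key_fn(row)) % num_consumers].append(row)
    return outputs

# Two producer scans of synthetic order rows, exchanged to
# three hash-join instances on customer_id.
orders = [[{"customer_id": i % 5, "item": i} for i in range(10)],
          [{"customer_id": i % 5, "item": 10 + i} for i in range(10)]]
routed = exchange(orders, lambda r: r["customer_id"], 3)
```

Because the exchange hides the redistribution, the operators on either side are ordinary serial operators, which is how Volcano supports both horizontal and vertical parallelism without rewriting them.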
Exchange Operators
[Diagram: serial plan – SCAN Customer and SCAN Order feed a HASH JOIN, then GROUP, then SORT]
Exchange Operators
[Diagram: an EXCHANGE operator distributes the Customer scans across three HASH JOIN instances]
Exchange Operators
[Diagram: a second EXCHANGE distributes the Order scans across the same three HASH JOIN instances]
[Diagram: full parallel plan – exchanges feed three HASH JOIN instances, whose output is exchanged again into parallel GROUP and SORT instances]
Query Processing
Some Parallel Queries
Enquiry
Collocated Join
Directed Join
Broadcast Join
Repartitioned Join
Orders Database
CUSTOMER (CUSTKEY, C_NAME, …, C_NATION, …)
ORDER (ORDERKEY, DATE, …, CUSTKEY, …, SUPPKEY, …)
SUPPLIER (SUPPKEY, S_NAME, …, S_NATION, …)
Enquiry/Query
“How many customers live in the UK?”
[Diagram: slave tasks SCAN and COUNT over multiple partitions of the customer table, returning subcounts to the coordinator, which SUMs them and returns the result to the application]
Collocated Join
“Which customers placed orders in July?”
Requires a JOIN of CUSTOMER and ORDER
Tables both partitioned on CUSTKEY (the same key), therefore corresponding entries are on the same node
[Diagram: on each node, SCAN CUSTOMER and SCAN ORDER feed a local JOIN; the results are UNIONed]
Directed Join
“Which customers placed orders in July?” (tables have different keys)
ORDER partitioned on ORDERKEY, CUSTOMER partitioned on CUSTKEY
Retrieve rows from ORDER, then use ORDER.CUSTKEY to direct appropriate rows to the nodes holding CUSTOMER
[Diagram: slave task 1 SCANs ORDER and directs rows to slave task 2, which SCANs CUSTOMER and JOINs; results return via the coordinator to the application]
Broadcast Join
“Which customers and suppliers are in the same country?”
SUPPLIER partitioned on SUPPKEY, CUSTOMER on CUSTKEY
Join required on *_NATION
Send (BROADCAST) all SUPPLIER rows to each CUSTOMER node
[Diagram: slave task 1 SCANs SUPPLIER and broadcasts it to slave task 2, which SCANs CUSTOMER and JOINs; results return via the coordinator to the application]
Repartitioned Join
“Which customers and suppliers are in the same country?”
SUPPLIER partitioned on SUPPKEY, CUSTOMER on CUSTKEY
Join required on *_NATION. Repartition both tables on *_NATION to localise and minimise the join effort
[Diagram: slave tasks 1 and 2 SCAN SUPPLIER and CUSTOMER and repartition their rows on *_NATION to slave task 3, which JOINs; results return to the coordinator]
Query Handling
Critical aspect of Parallel DBMS
User wants to issue simple SQL, and not be concerned with parallel aspects
DBMS needs structures and facilities to parallelise queries
Good optimiser technology is essential
As the database grows, the effects (good or bad) of the optimiser become increasingly significant
Further Aspects
A single transaction may update data in several different places
Multiple transactions may be using the same (distributed) tables simultaneously
One or several nodes could fail
Requires concurrency control and recovery across multiple nodes for:
Locking and deadlock detection
Two-phase commit to ensure ‘all or nothing’
Deadlock Detection
Locking and Deadlocks
With Shared Nothing architecture, each node is responsible for locking its own data
No global locking mechanism
However:
T1 locks item A on Node 1 and wants item B on Node 2
T2 locks item B on Node 2 and wants item A on Node 1
Distributed Deadlock
Resolving Deadlocks
One approach – timeouts
Timeout T2 after its wait exceeds a certain interval
Interval may need a random element to avoid ‘chatter’, i.e. both transactions giving up at the same time and then retrying
Rollback T2 to let T1 proceed
Restart T2, which can now complete
Resolving Deadlocks
More sophisticated approach (DB2)
Each node maintains a local ‘wait-for’ graph
Distributed deadlock detector (DDD) runs at the catalogue node for each database
Periodically, all nodes send their graphs to the DDD
DDD records all locks found in wait state
A transaction becomes a candidate for termination if it is found in the same lock wait state on two successive iterations
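The core of the scheme above is a cycle check over the union of the local wait-for graphs. A simplified sketch (illustrative Python; it omits the two-iteration confirmation step and all the messaging):

```python
# Distributed deadlock detection sketch: each node reports a local
# wait-for graph mapping transaction -> set of transactions it waits
# for; the detector unions them and searches for a cycle with DFS.

def find_cycle(local_graphs):
    """Return True if the combined wait-for graph contains a cycle."""
    graph = {}
    for g in local_graphs:
        for txn, waits in g.items():
            graph.setdefault(txn, set()).update(waits)

    visited, on_stack = set(), set()

    def dfs(txn):
        visited.add(txn)
        on_stack.add(txn)
        for nxt in graph.get(txn, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(txn)
        return False

    return any(dfs(t) for t in graph if t not in visited)
```

With the earlier example, node 1 reports T1 waiting for T2 and node 2 reports T2 waiting for T1; neither local graph has a cycle, but the union does.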
Reliability
Reliability
We wish to preserve the ACID properties for parallelised transactions
Isolation is taken care of by 2PL protocol
Isolation implies Consistency
Durability can be taken care of node-by-node, with proper logging and recovery routines
Atomicity is the hard part. We need to commit all parts of a transaction, or abort all parts
The two-phase commit protocol (2PC) is used to ensure that atomicity is preserved
Two-Phase Commit (2PC)
Distinguish between:
The global transaction
The local transactions into which the global transaction is decomposed
The global transaction is managed by a single site, known as the coordinator
The local transactions may be executed on separate sites, known as the participants
Phase 1: Voting
Coordinator sends “prepare T” message to all participants
Participants respond with either “vote-commit T” or “vote-abort T”
Coordinator waits for participants to respond within a timeout period
Phase 2: Decision
If all participants return “vote-commit T”, send “commit T” to all participants. Wait for acknowledgements within the timeout period.
If any participant returns “vote-abort T”, send “abort T” to all participants. Wait for acknowledgements within the timeout period.
When all acknowledgements are received, the transaction is completed.
If a site does not acknowledge, resend the global decision until it is acknowledged.
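The coordinator's decision rule can be sketched as follows (illustrative Python; message handling and logging are omitted, and a missing reply is treated the same as a timeout, i.e. abort):

```python
# 2PC decision sketch: commit only if every participant voted
# "vote-commit"; any "vote-abort" or missing reply aborts the
# global transaction.

def decide(votes, num_participants):
    """votes: list of 'vote-commit'/'vote-abort' replies received."""
    if len(votes) < num_participants:  # a participant stayed silent
        return "abort"
    if all(v == "vote-commit" for v in votes):
        return "commit"
    return "abort"
```

Note the asymmetry: commit requires unanimity, while a single abort vote (or silence) is enough to abort, which is what gives 2PC its all-or-nothing guarantee.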
Normal Operation
[Diagram: coordinator (C) sends “prepare T” to participant (P); P replies “vote-commit T”; once “vote-commit T” is received from all participants, C sends “commit T”; P replies “ack”]
Parallel Utilities
Parallel Utilities
Ancillary operations can also exploit the parallel hardware
Parallel Data Loading/Import/Export
Parallel Index Creation
Parallel Rebalancing
Parallel Backup
Parallel Recovery