Cloud Data Management - PowerPoint Presentation

luanne-stotts (@luanne-stotts)
Uploaded On 2015-11-07

Presentation Transcript

Slide1

Cloud Data Management

Slide2

Inexpensive Scalable Information Access

Many Internet applications need to access data for millions of concurrent users.

Relational DBMS technology cannot scale to these workloads using commodity hardware.

The need for low-cost, scalable DBMSs resulted in the advent of key-value stores (e.g., Google’s Bigtable, Yahoo!’s PNUTS, and Amazon’s Dynamo).

Slide3

Key-value Stores

Scalability and availability are more important than rich functionality.

Scalability: scale out to thousands of commodity servers.

Availability: data is replicated across data centers to ensure high availability of user data in the presence of failures.

Slide4

Key-value Data Model

The primary abstraction is a table of rows, or key-value pairs.

Each row is identified by a unique key, and the value can vary in its structure.

Keys are arbitrary strings of up to 64 KB.

A row can have an arbitrary number of columns.

Each column can hold an arbitrary data type (i.e., data validation is done by applications). The value may be an uninterpreted binary string (i.e., a blob), or columns with their own attributes as in relational DBMSs.

Multiple versions of each row can be maintained and accessed through timestamps.

Slide5
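A minimal Python sketch of the key-value data model, assuming an in-memory dictionary stands in for the store. The class name `VersionedTable` and its methods are hypothetical and are not taken from Bigtable or any other system. Values are kept as uninterpreted bytes, and each write adds a new timestamped version.

```python
from collections import defaultdict


class VersionedTable:
    """Toy key-value table: row key -> column -> {timestamp: value}."""

    def __init__(self):
        # Rows are created on first write; each row may have different columns.
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        # Values are stored as uninterpreted bytes; validation is left to the caller.
        if not isinstance(value, bytes):
            raise TypeError("value must be bytes")
        self._rows[row_key][column][timestamp] = value

    def get(self, row_key, column, timestamp=None):
        """Return the newest version at or before `timestamp` (latest if None)."""
        versions = self._rows.get(row_key, {}).get(column, {})
        candidates = [t for t in versions if timestamp is None or t <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]


table = VersionedTable()
table.put("user:42", "name", b"Ada", timestamp=1)
table.put("user:42", "name", b"Ada L.", timestamp=5)
print(table.get("user:42", "name"))               # latest version
print(table.get("user:42", "name", timestamp=3))  # version visible at time 3
```

Keeping every version under its timestamp, rather than overwriting, is what lets a reader ask for the value as of an earlier time.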

From Needs to Constraints

Retrieval: (row, column, timestamp) lookup only. Some systems also support simple relational operations such as selection and projection.

Update: updates and deletes need to specify the primary key.

Atomicity: atomic reads and writes are possible only at the row level.

Slide6
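Row-level atomicity can be pictured with one lock per row, as in the sketch below. This is an illustrative assumption about how a single server might serialize access to a row, not a description of any particular system; `RowStore`, `update_row`, and `read_row` are hypothetical names.

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy single-server store where each row has its own lock."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects creation of per-row locks

    def _lock_for(self, row_key):
        with self._guard:
            return self._locks[row_key]

    def update_row(self, row_key, mutate):
        """Apply `mutate(row_dict)` atomically with respect to this row only."""
        with self._lock_for(row_key):
            mutate(self._rows[row_key])

    def read_row(self, row_key):
        with self._lock_for(row_key):
            return dict(self._rows[row_key])  # consistent snapshot of one row


store = RowStore()


def add_one(row):
    row["visits"] = row.get("visits", 0) + 1


def worker():
    for _ in range(1000):
        store.update_row("page:home", add_one)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store.read_row("page:home"))
```

Nothing here coordinates updates across two different rows, which mirrors the restriction that atomicity stops at the row boundary.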

Scalability & Fault Tolerance Considerations

A logical entity can be effectively represented as a single row.

Each row typically resides on a single server, and data access is restricted to a single key.

Application-level data manipulation is therefore restricted to a single computer, obviating the need for multi-server coordination and synchronization.

Rationale: (1) requests are generally distributed throughout the data set, and (2) the impact of a failure is limited to the rows served by the failed server.

Slide7

Cluster Management – Master-based

A centralized master server keeps track of all data servers using a highly fault-tolerant (FT) service.

This FT service keeps track of the data stored at the different servers.

When a data server fails, the FT service reports the failure and the master can reassign the data to other servers.

If the master fails, a new master is elected to take over.

Slide8
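The master-based reassignment step can be illustrated with a small sketch. It assumes the master keeps an in-memory map from data partitions to servers and is told about failures by some external fault-tolerant service. The `Master` class and its method names are hypothetical, and the least-loaded placement rule is an assumption made for the example.

```python
class Master:
    """Toy master that tracks which server holds which data partition."""

    def __init__(self, servers):
        self.assignments = {s: set() for s in servers}

    def assign(self, partition):
        # Place the partition on the currently least-loaded live server
        # (ties broken by server name so the result is deterministic).
        target = min(self.assignments, key=lambda s: (len(self.assignments[s]), s))
        self.assignments[target].add(partition)
        return target

    def on_server_failure(self, failed):
        """Called when the fault-tolerant service reports `failed` as down."""
        orphaned = self.assignments.pop(failed, set())
        if not self.assignments:
            raise RuntimeError("no live servers left")
        return {p: self.assign(p) for p in sorted(orphaned)}


master = Master(["s1", "s2", "s3"])
for p in ["p1", "p2", "p3", "p4", "p5", "p6"]:
    master.assign(p)
moved = master.on_server_failure("s2")
print(moved)
print(master.assignments)
```

Master election itself is not modeled here; the sketch covers only what a live master does once a data server is reported as failed.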

Cluster Management – Decentralized

Typically based on gossip messages exchanged among the servers continuously.

These messages contain relevant performance measurements.

The failure of a server is detected when a gossip message from that server is missing.

This approach is more fault tolerant, but it incurs message overhead.

Slide9
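One way to picture gossip-based failure detection is a table of last-heard times with a timeout. The sketch below is a simplification under that assumption: real gossip protocols exchange state between peers over the network, whereas here `GossipView`, `on_gossip`, and the timeout value are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class GossipView:
    """One server's local view of its peers, built from gossip messages."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_heard = {}  # peer -> time of the most recent gossip message
        self.load = {}        # peer -> last reported performance measurement

    def on_gossip(self, peer, now, load):
        self.last_heard[peer] = now
        self.load[peer] = load

    def suspected_failed(self, now):
        """Peers whose gossip has been missing for longer than the timeout."""
        return sorted(p for p, t in self.last_heard.items() if now - t > self.timeout)


view = GossipView(timeout=3.0)
view.on_gossip("server-a", now=0.0, load=0.2)
view.on_gossip("server-b", now=0.0, load=0.7)
view.on_gossip("server-a", now=2.5, load=0.3)  # server-b stays silent
print(view.suspected_failed(now=4.0))
```

Every server keeps such a view and must keep receiving messages from its peers, which is where the message overhead comes from.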

Google’s Bigtable

[Figure: Bigtable architecture. Logical view: a Master, a Chubby node, and Tablet Servers i and j serving Tablets 1 to 6. Physical layout (distributed file system): two GFS chunk servers, one holding SSTables 1, 2, 3 and a replica of SSTable 4, the other holding SSTables 4, 5, 6 and a replica of SSTable 2.]

A table is a set of tablets.

A master server allocates tablets among tablet servers and is responsible for load balancing.

A tablet, logically represented as a key range, is the unit of distribution and load balancing.

A tablet is stored as a collection of SSTable files.

Slide10

Tablets

A logical table is divided into multiple tablets, each holding an interval of table rows.

Each tablet is stored in one or more SSTable files.

When a tablet grows beyond a certain size, it is split into two new tablets.

Slide11
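The division of a table into key-range tablets, and the splitting of a tablet that grows too large, can be sketched as follows. This is a toy model, not Bigtable's implementation: the `TabletMap` class, the split threshold of four rows, and the split-at-median rule are all assumptions chosen to keep the example small.

```python
import bisect


class TabletMap:
    """Toy table split into tablets, each holding a sorted interval of row keys."""

    def __init__(self, max_rows=4):
        self.max_rows = max_rows
        self.start_keys = [""]  # start key of each tablet; "" covers the lowest keys
        self.tablets = [[]]     # sorted row keys stored in each tablet

    def _index_for(self, row_key):
        # The owning tablet is the last one whose start key is <= row_key.
        return bisect.bisect_right(self.start_keys, row_key) - 1

    def insert(self, row_key):
        i = self._index_for(row_key)
        rows = self.tablets[i]
        if row_key not in rows:
            bisect.insort(rows, row_key)
        if len(rows) > self.max_rows:
            self._split(i)

    def _split(self, i):
        # Split the tablet at its median key into two new tablets.
        rows = self.tablets[i]
        mid = len(rows) // 2
        self.tablets[i:i + 1] = [rows[:mid], rows[mid:]]
        self.start_keys.insert(i + 1, rows[mid])


table = TabletMap(max_rows=4)
for key in ["apple", "kiwi", "banana", "pear", "cherry", "zucchini"]:
    table.insert(key)
print(table.start_keys)
print(table.tablets)
```

Because tablets are ordered by start key, locating the tablet for a row is a binary search over the start keys.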

Google’s Bigtable - Chubby

[Figure: Bigtable architecture with callouts. Logical view: a Master, a Chubby node, and Tablet Servers i and j serving Tablets 1 to 6. Physical layout: two GFS chunk servers holding SSTables 1 to 6, with replicas of SSTable 4 and SSTable 2.]

Chubby: highly fault tolerant, consisting of five active replicas. The service is live when a majority of the replicas are running. It is used for managing the tablet servers.

Master: determines which server holds a tablet.

A tablet is stored as a collection of SSTables.

Replication is handled by GFS.

Slide12
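The majority rule for Chubby's five replicas amounts to a one-line check, shown here as a small sketch. The function name `service_is_live` is hypothetical.

```python
def service_is_live(running_replicas, total_replicas=5):
    """A replicated service is live when a strict majority of replicas are running."""
    return running_replicas > total_replicas // 2


# With five replicas, the service tolerates up to two failures.
for running in range(6):
    print(running, service_is_live(running))
```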

Google’s Bigtable - Column Families

Related columns are stored in a fixed number of families; a family is the unit of data colocation and access at the storage layer.

Permissions can be applied at the family level to grant access to different applications.

Slide13
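Family-level permissions on column families might look like the sketch below. The family names, application names, and the `FamilyACL` class are invented for illustration, and naming columns as `family:qualifier` is an assumption made for the example; the slides do not specify how access control is configured.

```python
class FamilyACL:
    """Toy access control where permissions attach to column families."""

    def __init__(self, families):
        # The set of families is fixed when the table is created.
        self._readers = {family: set() for family in families}

    def grant_read(self, family, app):
        if family not in self._readers:
            raise KeyError(f"unknown column family: {family}")
        self._readers[family].add(app)

    def can_read(self, app, column):
        # Columns are named "family:qualifier"; access is decided by the family.
        family = column.split(":", 1)[0]
        return app in self._readers.get(family, set())


acl = FamilyACL(["profile", "billing"])
acl.grant_read("profile", "web-frontend")
acl.grant_read("billing", "invoicing")
print(acl.can_read("web-frontend", "profile:name"))
print(acl.can_read("web-frontend", "billing:card"))
```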

Google’s Bigtable - Chubby

The master and every tablet server obtain a timed lease from Chubby that must be periodically renewed.

A server can carry out its responsibilities only if it has an active lease.

Every tablet server periodically reports to the master using heartbeat messages, which also contain load statistics.

The master detects failures based on the heartbeat messages and uses the statistics for load balancing.

Slide14
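The timed lease can be sketched as an expiry time that must be pushed forward before it passes. This is a simplified, single-process illustration: `Lease`, `serve_request`, the duration, and the explicit `now` argument are assumptions, and a real lock service such as Chubby involves sessions and network round trips that are not modeled here.

```python
class Lease:
    """Toy timed lease: valid until `expires_at` unless renewed in time."""

    def __init__(self, holder, duration, now):
        self.holder = holder
        self.duration = duration
        self.expires_at = now + duration

    def is_active(self, now):
        return now < self.expires_at

    def renew(self, now):
        """Extend the lease; fails if it has already expired."""
        if not self.is_active(now):
            return False
        self.expires_at = now + self.duration
        return True


def serve_request(lease, now):
    # A server may carry out its responsibilities only while its lease is active.
    if not lease.is_active(now):
        raise RuntimeError(f"{lease.holder} has no active lease")
    return f"{lease.holder} served request at t={now}"


lease = Lease("tablet-server-i", duration=10, now=0)
print(serve_request(lease, now=4))
print(lease.renew(now=8))        # renewed in time, so the lease runs until t=18
print(lease.is_active(now=15))
print(lease.renew(now=20))       # too late: the lease expired at t=18
```

A server that cannot reach the lock service stops serving once its lease runs out, so the rest of the system can safely hand its work to another server after that point.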

Google’s Bigtable - Server Failure

[Figure: Bigtable architecture with a failure scenario. Logical view: a Master, a Chubby node, and Tablet Servers i and j serving Tablets 1 to 6. Physical layout: two GFS chunk servers holding SSTables 1 to 6, with replicas of SSTable 4 and SSTable 2.]

If the tablet server holding Tablet 4 fails, the master informs Tablet Server i to take over Tablet 4.