
Cassandra Training - PowerPoint Presentation

alida-meadow
Uploaded On 2016-07-29


Presentation Transcript

Slide1

Cassandra Training

Introduction & Data Modeling

Slide2

Aims

Introduction to Cassandra

By the end of today you should know:

How Cassandra organises data

How to configure replicas

How to choose between consistency and availability

How to efficiently model data for both reads and writes

You need to consider Active-Active scenarios

Who to ask to help you & sign off on your data model

HINT: Ask Neil directly or email harch@expedia.com.

Slide3

Agenda – 100ft

Quick Introduction

Data Structures

Efficient Data Modeling

Data Modeling Examples

Slide4

Agenda - Introduction

Elevator Pitch

Brewer’s Theorem & Tuneable Consistency

Distributed Hash Table 101

Write path

Read path

TTL, Deletion & Tombstones

Background Processes

Data Model in 5mins

Thrift vs CQL

Maintaining Consistency

Scaling Cassandra

Slide5

Agenda – Advanced Topics

Data Modelling

Key Concepts

Time Series Modelling

Wide rows

Compound Keys

Code example

Performance Tuning Levers

What is DataStax Enterprise?

Multi DC Support

Virtual Nodes

Nodetool

Slide6

Elevator Pitch

What?

Write path optimised

Eventually consistent (ms)

Distributed Hash Table

Highly durable

Tunable consistency

Slide7

Elevator Pitch

Why?

Linear horizontal read & write scaling

Data is important and should always be there

Often we don’t need a consistency guarantee

Let me choose my tradeoff

Slide8

Elevator Pitch

How?

Data partitioned internally across nodes

Writes only have to hit the commit log

Store data read-optimised to minimise read & write work: no indexes to update, no query to plan

Specify agreement (consistency) per query

Slide9

Elevator Pitch

What it’s Not

No support for transactions - atomicity, isolation mostly not available

Not a silver bullet - easy to design a poorly-performing data model

Slide10

DHT 101

Each physical node is assigned a token

Each node owns the range from the previous node's token up to its own token

Slide11

Cassandra Write Path

The coordinator will send the update to two nodes, starting at the owning node and working clockwise

Slide12
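The ownership rule and the clockwise replica walk described above can be sketched with a sorted map. This is a minimal illustration, not Cassandra's implementation: the class name `TokenRing`, the `long` tokens and the node names are assumptions made for the example, and it assumes one token per physical node as on these slides.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.TreeMap;

/** Toy token ring (hypothetical names): tokens sorted ascending, walked clockwise. */
final class TokenRing {
    private final TreeMap<Long, String> nodeByToken = new TreeMap<>();

    void addNode(long token, String node) {
        nodeByToken.put(token, node);
    }

    /** A node owns (previous token, own token], so the owner holds the first token >= keyToken. */
    String ownerOf(long keyToken) {
        return replicasFor(keyToken, 1).get(0);
    }

    /** The owner plus the next distinct nodes clockwise, replicationFactor nodes in total. */
    List<String> replicasFor(long keyToken, int replicationFactor) {
        if (nodeByToken.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        int distinctNodes = new HashSet<>(nodeByToken.values()).size();
        int wanted = Math.min(replicationFactor, distinctNodes);
        List<String> replicas = new ArrayList<>();
        Long token = nodeByToken.ceilingKey(keyToken);
        while (replicas.size() < wanted) {
            if (token == null) {
                token = nodeByToken.firstKey(); // wrap around the ring
            }
            String node = nodeByToken.get(token);
            if (!replicas.contains(node)) {
                replicas.add(node);
            }
            token = nodeByToken.higherKey(token);
        }
        return replicas;
    }
}
```

With nodes at tokens 100, 200, 300 and 400, a key whose token is 350 is owned by the node at 400, and with a replication factor of 2 the second copy wraps around to the node at 100.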

Cassandra Write Path

128-bit hash of the partition key used to compute the token

Keys are therefore distributed randomly around the ring

If Unavailable - Hinted Handoff

Slide13
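As a rough illustration of how a 128-bit hash spreads keys around the ring, the sketch below hashes a partition key with MD5 and reads the digest as a non-negative integer. The class and method names are hypothetical, and the exact token arithmetic of Cassandra's Random Partitioner is not reproduced here.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Token {
    /** Hashes the partition key with MD5 (128 bits) and reads the digest as a non-negative number. */
    static BigInteger tokenFor(String partitionKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1: treat the 16 bytes as unsigned
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Because the hash output looks random, neighbouring keys such as consecutive dates land at unrelated positions on the ring, which is why ordering has to be modelled inside a row rather than across rows.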

Cassandra Write Path

Concepts

The Snitch – proximity

Random Partitioner – key -> token

Replication Factor – how many replicas

Gossip – discovery protocol

Slide14

Cassandra Write Path

SSTables are sequential and immutable

Data for a single row may be spread across several SSTables

SSTables are periodically compacted together

Slide15
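Since SSTables are never modified in place, compaction amounts to merging several sorted tables into a new one and keeping the newest version of each column. The sketch below shows that idea only; `CompactionSketch` and `Cell` are hypothetical names, and real compaction also handles tombstones, TTLs and on-disk formats.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CompactionSketch {
    /** A column value with the write timestamp used to pick the winning version. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Merges several immutable tables into one sorted table, keeping the newest cell per column. */
    static TreeMap<String, Cell> compact(List<Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (kept, candidate) -> candidate.timestamp > kept.timestamp ? candidate : kept);
            }
        }
        return merged;
    }
}
```

The input tables are left untouched, which mirrors the immutability noted above: a read that started before the compaction can keep using the old tables.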

Cassandra Read Path

Data read command sent to closest replica - snitch

Digest commands sent to other replicas – CL

Read Repair Chance 10% - digest all replicas

Slide16
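The data-plus-digest read above can be pictured as follows: one replica returns the full row, the others return only a hash of their copy, and a mismatch signals that a repair is needed. This is a sketch under assumptions: the names are hypothetical, the row is flattened to a string, and MD5 is used purely as an example digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class DigestReadSketch {
    /** Hash of a replica's answer; much smaller to ship than the data itself. */
    static byte[] digest(String rowData) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(rowData.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** True when every digest reply matches the full data reply; false means the replicas need repair. */
    static boolean replicasAgree(String dataFromClosestReplica, List<byte[]> digestsFromOthers) {
        byte[] expected = digest(dataFromClosestReplica);
        for (byte[] other : digestsFromOthers) {
            if (!MessageDigest.isEqual(expected, other)) {
                return false;
            }
        }
        return true;
    }
}
```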

Start & Interrogate C*

    vagrant box add dse.box http://htraining.s3.amazonaws.com/dse.box
    mkdir ~/vagrant
    curl http://htraining.s3.amazonaws.com/vagrant-dse.tar.gz > ~/vagrant/dse.tar.gz
    cd ~/vagrant && tar xzvf dse.tar.gz
    cd dse && vagrant up
    vagrant ssh node1
    nodetool ring

Slide17

Cassandra Read Path

Read Mechanics

Find Candidate SSTables - Bloom Filters

Seek Through SSTables

Memory Mapped Files

Check Memtable

-> minimise SSTables for best efficiency

Slide18
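A Bloom filter lets the read path skip SSTables that cannot contain a key: a negative answer is always correct, while a positive answer only means the SSTable has to be checked. The sketch below is a generic Bloom filter with hypothetical names and a simple double-hashing scheme chosen for brevity, not the hashing Cassandra uses.

```java
import java.util.BitSet;

final class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** i-th bit position for a key, derived from two hashes (double hashing). */
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String rowKey) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(rowKey, i));
        }
    }

    /** false: the key is definitely not in this SSTable; true: it may be, so the SSTable must be read. */
    boolean mightContain(String rowKey) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(rowKey, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Keeping one such filter per SSTable in memory means that most SSTables not holding the row are ruled out without touching disk.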

Deletion & Tombstones

Deleted data marked as removed – tombstone

Stops zombie data being resurrected by a replica that missed the delete – distributed system

Tombstones collected after a few days – configurable

Slide19
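The reason a delete is written as a tombstone rather than removing data outright can be shown with timestamp reconciliation: the newest write wins, so a tombstone shadows an older value that another replica still holds. The names below are hypothetical and the grace period is a plain parameter standing in for the configurable collection delay mentioned above.

```java
final class TombstoneSketch {
    /** A cell is either a live value or a tombstone (value == null); both carry a timestamp. */
    static final class Cell {
        final String value;
        final long timestampMillis;

        Cell(String value, long timestampMillis) {
            this.value = value;
            this.timestampMillis = timestampMillis;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    /** Newest timestamp wins, so a tombstone beats an older value from a lagging replica. */
    static Cell reconcile(Cell a, Cell b) {
        return b.timestampMillis > a.timestampMillis ? b : a;
    }

    /** A tombstone may be dropped only once the grace period has passed. */
    static boolean canPurge(Cell cell, long nowMillis, long gracePeriodMillis) {
        return cell.isTombstone() && nowMillis - cell.timestampMillis >= gracePeriodMillis;
    }
}
```

If the tombstone were purged before every replica had seen it, the stale value would win the next reconciliation and the deleted data would reappear.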

Brewer’s Theorem

Distributed Data – only 2 at a time:

Consistency

Availability

Partition Tolerance

Slide20

Brewer’s Theorem

CA - normal operation, no partition, consistency and availability provided

Slide21

Brewer’s Theorem

AP - partition occurs, maintaining two mutable, disconnected state copies breaks consistency, availability is conserved

Slide22

Brewer’s Theorem

CP - partition occurs, to maintain consistency we need to take one side offline, sacrificing availability

Slide23

Tuneable Consistency

Cassandra Consistency Level

Specify the number of nodes that must agree on each read/write

Choose consistency or availability: CL.LOCAL_QUORUM, CL.ONE

Eventual consistency will bring both sides back into agreement

Slide24

Background Processes

SSTables Compacted Periodically

Size-Tiered Compaction

– default, no compaction guarantee

Leveled Compaction

– better chance of tombstone compaction

– more continual compaction, 2x I/O – impact on online

– use for update-heavy workloads

– creates many SSTables

Slide25

Agenda – 100ft

Quick Introduction

Data Structures

Efficient Data Modeling

Data Modeling Examples

Slide26

Data Model

Keyspace

Analogous to Database/Schema

Segregate Applications

Replication configured at this level

Slide27

Data Model

Column Family

Analogous to Table

Contains many rows

Caches configurable at this level

Slide28

Data Model

Row

Each one has a partition key, which is hashed

Has many columns – up to 2Bn

Columns don’t have to be defined ahead of time

Rows in the same CF can have different columns

Rows are not sorted; model ordering inside a row

Slide29

Data Model

Columns

Sorted by name before being written to SSTable

Name and Value are typed

Values can be type-validated

Column update is timestamped

Can have TTL

Slide30

Data Model

Counter Columns

Distributed counters

Can get false counts

Slide31

Data Model

Super Columns – Don’t Use

Blob of columns stored inside a single column

Have to read and write whole blob

Memory intensive

Conflicts resolved for whole blob - bad

Slide32

Secondary Indices

Can define an index on a column

Cassandra will maintain an inverted index

Use sparingly

Low Cardinality Columns Only

Often better to maintain your own view

Slide33

Thrift vs CQL

Thrift

Original interface, hash style syntax

CQL

SQL-like syntax but highly limited

Sent over Thrift but plans for own protocol

Slide34

Maintaining Consistency

Consistency Level

Used on read & write operations

ONE, TWO, LOCAL_QUORUM, ALL, ANY

Do you really need a consistency guarantee?

Slide35

Scaling Cassandra

Imagine RF=3, Quorum, Nodes=6

Each query waits synchronously on 2 nodes (a quorum of 3)

Each write will still touch all 3 replicas, the third asynchronously

To scale writes, add more nodes

To scale reads, add more replicas

Slide36
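The arithmetic behind "RF=3, Quorum, 2 nodes" is small enough to write down. The helper names below are hypothetical; the overlap rule (replicas read plus replicas written must exceed the replication factor) is the standard condition for a read to be guaranteed to see the latest acknowledged write.

```java
final class QuorumMath {
    /** A quorum is a majority of the replicas: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Reads are guaranteed to see the latest write when read and write replica sets must overlap. */
    static boolean readSeesLatestWrite(int replicasRead, int replicasWritten, int replicationFactor) {
        return replicasRead + replicasWritten > replicationFactor;
    }
}
```

With RF=3, `quorum(3)` is 2, so quorum reads and quorum writes overlap (2 + 2 > 3), whereas reading and writing at ONE (1 + 1) does not and relies on eventual consistency.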

Advanced Topics

Advanced Topics

Data Modelling

Wide Rows & Clustering

Performance

Solr 4 & Hadoop Integration

Slide37

Agenda – 100ft

Quick Introduction

Data Structures

Efficient Data Modeling

Data Modeling Examples

Slide38

Data Modelling

Data Modelling

Concepts that Drive Data Modeling

Time-series Modeling

Wide Rows (Composite Columns)

Compound Keys & CQL3

Slide39

Data Modelling - Concepts

Rows in same CF will live on different nodes

High cost of multi-get

De-normalise your data into rows

Don’t Put Consistent Load on Single Row

Will heat up replica nodes

Slide40

Data Modelling - Concepts

Writes to Single Row Atomic & Isolated

Columns are Ordered

Column Range Slicing Efficient

Mutating data often needs compaction tuning

Slide41

Wide Rows

Efficient Reads

Store how you want to fetch

Fetch most efficient over few rows

Store what you want to fetch in few rows

Slide42

Time Series

Use Timestamp for Column Name – ordered

Range slicing efficient

Can limit row length by using date partition key

e.g. 20121004

Slide43

Composite Columns

Composite Column

e.g. time1:log_class, time1:log_message, time2:log_class, time2:log_message

Slide44

Time Series

Writing to a Single Row Creates Hotspots

Use Round Robin Over Rows

e.g. 20121004:1, 20121004:2, etc…

Slide45
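One way to picture the date partition key with a round-robin suffix is a small key generator. This is a sketch under assumptions: the class name, the fixed number of buckets per day and the use of UTC dates are illustrative choices, not something the slides prescribe.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class TimeSeriesRowKeys {
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    private final int bucketsPerDay;
    private final AtomicLong writes = new AtomicLong();

    TimeSeriesRowKeys(int bucketsPerDay) {
        this.bucketsPerDay = bucketsPerDay;
    }

    /** Row key for the next write, e.g. "20121004:1": the date bounds row length, the suffix rotates. */
    String nextWriteKey(Instant eventTime) {
        long bucket = writes.getAndIncrement() % bucketsPerDay + 1;
        return DAY.format(eventTime) + ":" + bucket;
    }

    /** Every row key a reader has to fetch to see the whole day. */
    List<String> readKeys(Instant day) {
        List<String> keys = new ArrayList<>();
        for (int bucket = 1; bucket <= bucketsPerDay; bucket++) {
            keys.add(DAY.format(day) + ":" + bucket);
        }
        return keys;
    }
}
```

The trade-off is visible in `readKeys`: spreading writes over more rows cools the hot replicas but turns every read of a day into a multi-row fetch, so the bucket count should stay small.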

Compound Keys

Compound Key in CQL3

Partition Key is the row key

Compound Key = Partition Key + Composite Key

e.g. partition key = 20121004, composite key = time1

20121004 => time1:name, time1:msg, time2:name, time2:msg

Slide46
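The mapping from a compound key to composite columns in one wide row can be modelled with nested sorted maps. The sketch below is an in-memory illustration with hypothetical names; it assumes composite keys sort as plain strings and never contain the ":" separator.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

final class WideRowSketch {
    // partition key -> (composite column name -> value), columns kept sorted by name
    private final Map<String, TreeMap<String, String>> rows = new HashMap<>();

    /** Stores one logical record as composite columns "compositeKey:field" in the partition's row. */
    void insert(String partitionKey, String compositeKey, Map<String, String> fields) {
        TreeMap<String, String> row = rows.computeIfAbsent(partitionKey, k -> new TreeMap<>());
        for (Map.Entry<String, String> field : fields.entrySet()) {
            row.put(compositeKey + ":" + field.getKey(), field.getValue());
        }
    }

    /** Column range slice: all columns whose composite key lies in [fromKey, toKey). */
    NavigableMap<String, String> slice(String partitionKey, String fromKey, String toKey) {
        TreeMap<String, String> row = rows.getOrDefault(partitionKey, new TreeMap<>());
        return row.subMap(fromKey + ":", true, toKey + ":", false);
    }
}
```

Inserting records for `time1` and `time2` under partition `20121004` yields the column order `time1:msg, time1:name, time2:msg, time2:name`, and a slice from `time1` to `time2` returns only the first record's columns without scanning the rest of the row.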

Agenda – 100ft

Quick Introduction

Data Structures

Efficient Data Modeling

Data Modeling Examples

Slide47

Working with CQL

    cqlsh -3 192.168.33.21

    CREATE KEYSPACE my_app_data
      WITH strategy_class = SimpleStrategy
      AND strategy_options:replication_factor = 2;

    DESCRIBE KEYSPACE my_app_data;

Slide48

Compound Keys

    USE my_app_data;

    CREATE COLUMNFAMILY logs (
      day text,            -- partition key
      log_id timeuuid,     -- clustering column
      log_class text,
      log_message text,
      primary key (day, log_id)
    );

    DESCRIBE columnfamilies;

Slide49

Compound Keys

    INSERT INTO logs (day, log_id, log_class, log_message)
      VALUES ('20130604', '2013-06-04 10:05:00', 'error', 'it broke')
      USING CONSISTENCY ONE;

    INSERT INTO logs (day, log_id, log_class, log_message)
      VALUES ('20130604', '2013-06-04 11:05:00', 'error', 'it broke again')
      USING CONSISTENCY QUORUM;

Slide50

Compound Keys

    SELECT * FROM logs USING CONSISTENCY ONE
      WHERE day = '20130604';

    SELECT * FROM logs USING CONSISTENCY QUORUM
      WHERE day = '20130604' AND log_id > '2013-06-04 11:00:00';

TRY WITH CL.TWO:

    vagrant suspend node2

Setting CL and range querying columns, losing consistency

Slide51

Compound Keys

    cassandra-cli -h 192.168.33.21
    use my_app_data;
    list logs;

See the raw Cassandra data

Slide52

Code Example - Clients

Hector

Solid Java Client

In Use in Production

Round Robin

Node Discovery

Slide53

Code Example - Clients

Astyanax

Netflix Open Source Library

Simpler APIs

Slide54

Code Example

Example: Storing Payment Methods

https://github.com/neilbeveridge/example-compoundkeys

Slide55

Code Example

Requirements

Store 1-10 payment methods

Use a single row

Slide56

Code Example

Non-CQL

Define a composite column class

    public static final class Composite {
        private @Component(ordinal = 0) String paymentUuid;
        private @Component(ordinal = 1) String field;
        // remaining members are not shown on the slide
    }

Slide57

Code Example

Writing Data

    UUID paymentUUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
    String sPaymentUUID = paymentUUID.toString();

    // one composite column per field, all in the user's row
    batch.withRow(PAYMENTS_CF, userId)
        .putColumn(new Composite(sPaymentUUID, "pvtoken"), paymentInfo.pvToken, null)
        .putColumn(new Composite(sPaymentUUID, "name"), paymentInfo.name, null)
        .putColumn(new Composite(sPaymentUUID, "number"), paymentInfo.number, null);

Slide58

Code Example

Reading Data

Need some logic to handle record boundaries

    // handle the payment info boundary
    if (lastSeen != null && !column.getName().getPaymentUuid().equals(lastSeen)) {
        payments.add(payment);
        payment = new PaymentInfo();
        payment.paymentUUID = UUID.fromString(column.getName().getPaymentUuid());
    }
    lastSeen = column.getName().getPaymentUuid();

Slide59
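The boundary handling above can be tried out without a cluster. The sketch below is a self-contained stand-in, not the Astyanax code from the slides: `Column` is a hypothetical plain class replacing the composite column, and each payment is a simple field map. It registers a record as soon as its first column is seen, so the last record is not lost when the loop ends.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PaymentGrouping {
    /** One composite column: (paymentUuid, field) -> value. */
    static final class Column {
        final String paymentUuid;
        final String field;
        final String value;

        Column(String paymentUuid, String field, String value) {
            this.paymentUuid = paymentUuid;
            this.field = field;
            this.value = value;
        }
    }

    /** Groups columns, already sorted by paymentUuid, into one field map per payment. */
    static List<Map<String, String>> group(List<Column> sortedColumns) {
        List<Map<String, String>> payments = new ArrayList<>();
        Map<String, String> current = null;
        String lastSeen = null;
        for (Column column : sortedColumns) {
            // a change of paymentUuid marks the boundary between two records
            if (current == null || !column.paymentUuid.equals(lastSeen)) {
                current = new LinkedHashMap<>();
                payments.add(current);
            }
            current.put(column.field, column.value);
            lastSeen = column.paymentUuid;
        }
        return payments;
    }
}
```

This only works because columns within a row come back sorted by name, so all fields of one payment are adjacent.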

Code Example

A Bit Messy

Slide60

Code Example

CQL3

Need to define a Schema

Cassandra needs it to split up the row for us

Slide61

Code Example

Schema

    create table paymentinfo_cql (
      user text,
      paymentid timeuuid,
      name text,
      number text,
      pvtoken text,
      primary key (user, paymentid)
    );

Slide62

Code Example

Inserting Data

    insert into paymentinfo_cql (user, paymentid, name, number, pvtoken)
      values ('%1$s','%2$s','%3$s','%4$s','%5$s')

Slide63

Code Example

Reading Data

    select * from paymentinfo_cql where user='%s'

Slide64

Multi Datacentre Support

Cassandra RF=2 (availability), Solr RF=1 (offline search)

RFs set per keyspace and per logical datacentre

Slide65

Multi Datacentre Support

Both DCs participate in same ring

Cassandra walks clockwise as normal to fulfill RFs

Slide66

Performance Tuning Levers

Memory Mapped Files

SSTables memory mapped

Visible as high virtual memory consumption

Read fastest when working set fits in free RAM

Slide67

Performance Tuning Levers

Row Cache

Saves locating SSTables, seeking, reconciliation

Off-heap – IPC marshaling penalty

Whole row in memory

Good for small numbers of hot rows – Gaussian dist.

Slide68

Performance Tuning Levers

Key Cache

Saves seeking through SSTables

Beneficial for large SSTables - tiered compaction

On-heap

Slide69

Performance Tuning Levers

Cache hit-rates exposed over JMX

Slide70

Performance Tuning Levers

Take care using memory that might be stolen from the read path (VirtMem)

Slide71

DataStax Enterprise

Solr 4.0 Integration

Near-realtime indexing

Columns are available to Solr to index

Indexes maintained in original file format

Supports distributed search

Use Cassandra API or Solr API

Slide72

DataStax Enterprise

Hadoop Integration

DataStax implements the HDFS on Cassandra – CFS

Use H* or C* API

No ETL

Map operations are sent to replicas

Reduce back to the task owner

Slide73

Virtual Nodes

Problem #1: Adding New Nodes

Slide74

Virtual Nodes

Wish to add node

Ring already loaded

Minimise streaming caused by moves

Could put it in between 2 existing nodes – only helps a small range (this sucks)

Slide75

Virtual Nodes

Double size of ring

Minimise streaming caused by moves

Don’t want to have to buy 2 x servers each time (also sucks)

Slide76

Virtual Nodes

Choose to rebalance the ring

Load already warranted expansion

Now adding streaming load

Slide77

Virtual Nodes

Problem #2: Replacing Failed Nodes

Slide78

Virtual Nodes

Node fails

Remaining replica heats up

Slide79

Virtual Nodes

Bootstrap another

Now node 20 starts streaming

=> FIRE!

Slide80

Virtual Nodes

The Solution

Slide81

Virtual Nodes

Slice each node into 256 token ranges

Slide82

Virtual Nodes

Randomly distribute tokens to other nodes

Slide83

Virtual Nodes

Each colour represents a node

Each node owns an even, random distribution of the ring

Slide84

Virtual Nodes

Replacing a node

Can stream from every node

Slide85
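The effect of virtual nodes on replacement can be simulated in a few lines. The sketch below is a simplified model with hypothetical names: tokens are random `long` values, and a node's "streaming peers" are taken to be the nodes that own the range immediately clockwise of one of its tokens, which is an assumption standing in for real replica placement.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class VirtualNodesSketch {
    /** Gives every node tokensPerNode random tokens on one shared ring. */
    static TreeMap<Long, String> buildRing(List<String> nodes, int tokensPerNode, long seed) {
        Random random = new Random(seed);
        TreeMap<Long, String> ring = new TreeMap<>();
        for (String node : nodes) {
            int assigned = 0;
            while (assigned < tokensPerNode) {
                if (ring.putIfAbsent(random.nextLong(), node) == null) {
                    assigned++; // the token was free
                }
            }
        }
        return ring;
    }

    /** Distinct nodes sitting clockwise next to one of this node's ranges. */
    static Set<String> streamingPeers(TreeMap<Long, String> ring, String node) {
        Set<String> peers = new TreeSet<>();
        for (Map.Entry<Long, String> entry : ring.entrySet()) {
            if (!entry.getValue().equals(node)) {
                continue;
            }
            Map.Entry<Long, String> next = ring.higherEntry(entry.getKey());
            if (next == null) {
                next = ring.firstEntry(); // wrap around the ring
            }
            if (!next.getValue().equals(node)) {
                peers.add(next.getValue());
            }
        }
        return peers;
    }
}
```

With one token per node a failed node has a single clockwise neighbour to stream from, which is the hotspot described earlier; with 256 tokens per node on a six-node ring, the neighbours are spread across the other nodes, so the streaming load is shared.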

Nodetool & Opscenter

Do stuff with your deployment

    watch "nodetool ring"

Useful overview of the ring – tokens, health

Opscenter

Slide86

Aims

By the end of today you should know:

How Cassandra organises data

How to configure replicas

How to choose between consistency and availability

How to efficiently model data for both reads and writes

You need to consider Active-Active scenarios

Who to ask to help you & sign off on your data model

HINT: Ask Neil directly or email harch@expedia.com.

Slide87

Code Example

Questions

htraining.s3.amazonaws.com/cassandra-training.pptx