HDB++: High Availability with Cassandra
TANGO Meeting, 20 May 2015, Reynald Bourtembourg
Overview
- What is Cassandra (C*)?
- Who is using C*?
- CQL
- C* architecture
- Request Coordination
- Consistency
- Monitoring tool
- HDB++
What is Cassandra?
- Mythology: an excellent oracle, not believed
- A massively scalable open source NoSQL (Not Only SQL) database
- Created by Facebook
- Open source since 2008
- Apache License 2.0, compatible with GPLv3
What is Cassandra?
- Peer-to-peer architecture
- No Single Point of Failure
- Replication
- Continuous availability
- Multi data center support
- 100s to 1000s of nodes
- Java
- High write throughput
- Read efficiency
What is Cassandra?
Figure source: http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Who is using Cassandra?
Cassandra Query Language
CQL: Cassandra Query Language
- Very similar to SQL
- But with restrictions and limitations:
  - JOIN requests are forbidden
  - No subqueries
  - String comparisons are limited (when not using SOLR). Example: select * from my_table where mystring like '%tango%'
  - No OR operator
  - Can only apply a WHERE condition on an indexed column (or primary key)
Cassandra Query Language
- Collections (64K limitation): list, set, map
- TTL
- INSERT = UPDATE (UPSERT)
- Doc: http://www.datastax.com/documentation/cql/3.1/cql/cql_intro_c.html
- cqlsh
Cassandra Query Language
CREATE TABLE IF NOT EXISTS att_scalar_devdouble_ro (
    att_conf_id timeuuid,
    period text,
    data_time timestamp,
    data_time_us int,
    value_r double,
    quality int,
    error_desc text,
    PRIMARY KEY ((att_conf_id, period), data_time, data_time_us)
) WITH comment='Scalar DevDouble ReadOnly Values Table';
Cassandra Query Language
For the att_scalar_devdouble_ro table:
PRIMARY KEY ((att_conf_id, period), data_time, data_time_us)
- Partition key: (att_conf_id, period)
- Clustering columns: data_time, data_time_us
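As a rough illustration of what the partition key and the clustering columns mean for this table, the Python sketch below groups rows by (att_conf_id, period) and orders each group by (data_time, data_time_us). The sample rows, the string identifiers standing in for timeuuid values and the function name group_into_partitions are assumptions made for the example. It models the idea only, not Cassandra's storage engine.

```python
from collections import defaultdict

# Hypothetical sample rows:
# (att_conf_id, period, data_time, data_time_us, value_r)
rows = [
    ("id-1", "2015-05-20", 1432116002, 500, 1.5),
    ("id-1", "2015-05-20", 1432116001, 0, 1.2),
    ("id-2", "2015-05-20", 1432116001, 0, 7.0),
    ("id-1", "2015-05-21", 1432202400, 0, 1.9),
]


def group_into_partitions(rows):
    """Group rows by partition key, then sort by clustering columns."""
    partitions = defaultdict(list)
    for att_conf_id, period, data_time, data_time_us, value_r in rows:
        # Partition key: decides which partition the row belongs to.
        partitions[(att_conf_id, period)].append(
            (data_time, data_time_us, value_r)
        )
    for key in partitions:
        # Clustering columns: decide the order of rows inside a partition.
        partitions[key].sort(key=lambda row: (row[0], row[1]))
    return dict(partitions)


partitions = group_into_partitions(rows)
for key, partition_rows in partitions.items():
    print(key, partition_rows)
```

Rows sharing the same (att_conf_id, period) end up together and sorted by time, which is why a time-range query on one attribute and one period only has to touch a single partition in this model.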
Cassandra Architecture
Node: one Cassandra instance (Java process)
[Diagram: Nodes 1 to 8 on a token range from -2^63 to +2^63 - 1]
Cassandra Architecture
Partition: ordered and replicable unit of data on a node, identified by a token
The partitioner (based on the Murmur3 algorithm by default) distributes the data across the nodes.
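The mapping from a partition key to a token and then to a node can be sketched as follows. This is a simplified model under stated assumptions: the hash is a stand-in built from the first 8 bytes of MD5 (the standard library has no Murmur3), the eight nodes get evenly spaced tokens, and the names token_for, build_ring and owner are invented for the example.

```python
import bisect
import hashlib

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_for(partition_key):
    """Hash a partition key to a signed 64-bit token.

    Assumption: MD5 is used as a stand-in; Cassandra's default
    partitioner uses Murmur3 and would give different tokens.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_ring(node_names):
    """Give each node the upper bound of an evenly sized token range."""
    step = 2**64 // len(node_names)
    return [
        (MIN_TOKEN + (i + 1) * step - 1, name)
        for i, name in enumerate(node_names)
    ]


def owner(ring, token):
    """Return the node whose range contains the token."""
    upper_bounds = [bound for bound, _ in ring]
    # First node whose upper bound is >= token; wrap around if none.
    index = bisect.bisect_left(upper_bounds, token) % len(ring)
    return ring[index][1]


ring = build_ring(["Node %d" % i for i in range(1, 9)])
key = "att-conf-42:2015-05-20"  # hypothetical partition key
print(key, "->", token_for(key), "->", owner(ring, token_for(key)))
```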
Cassandra Architecture
- Rack: logical set of nodes
- Data Center: logical set of racks
- Cluster: full set of nodes which maps to a single complete token ring
[Diagram: a Cassandra cluster with Data Centers 1 and 2, Racks 1 to 4 and Nodes 1 to 8 on a token range from -2^63 to +2^63 - 1]
Request Coordination
Coordinator: the node chosen by the client to receive a particular read or write request to its cluster
[Diagram: in Data Center 1, the client sends a read/write request to one of Nodes 1 to 4, which acts as coordinator]
Request Coordination
- Any node can coordinate any request
- Each client request may be coordinated by a different node
- No Single Point of Failure
[Diagram: in Data Center 1, the coordinator sends an acknowledgement back to the client]
Request Coordination
- The Cassandra driver chooses the coordinator node
  - Round-robin pattern, token-aware pattern
- Client library to manage requests
- Many open source drivers for many programming languages: Java, Python, C++, C#, Node.js, PHP, Perl, Go, Clojure, Haskell, R (GNU S), Ruby, Scala, Erlang, ODBC, Rust
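The round-robin pattern can be sketched in a few lines of Python. This is a toy model, not the API of any real Cassandra driver: the class name RoundRobinChooser and its method are hypothetical, and the token-aware pattern (preferring a node that owns the requested partition) is not shown.

```python
import itertools


class RoundRobinChooser:
    """Hypothetical helper: hand out coordinator nodes in turn."""

    def __init__(self, nodes):
        # cycle() repeats the node list forever, in order.
        self._cycle = itertools.cycle(list(nodes))

    def next_coordinator(self):
        return next(self._cycle)


chooser = RoundRobinChooser(["Node 1", "Node 2", "Node 3", "Node 4"])
chosen = [chooser.next_coordinator() for _ in range(6)]
print(chosen)
```

Each request goes to the next node in the list, so in this model no single node coordinates all the traffic.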
Request Coordination
- The coordinator manages the replication process
- Replication Factor (RF): the number of nodes onto which a write is copied
- The write will occur on the nodes responsible for that partition
- 1 ≤ RF ≤ (# nodes in cluster)
- Every write is time-stamped
[Diagram: client and driver, coordinator, Nodes 1 to 4, RF=3]
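A minimal sketch of how RF turns one owner node into a set of replicas, assuming the simplest placement: the owner plus the next RF - 1 nodes around the ring. Rack-aware and data-center-aware placement is deliberately ignored, and the function name replicas_for is hypothetical.

```python
def replicas_for(ring_nodes, owner_index, rf):
    """Return the rf nodes that hold a partition, walking the ring.

    Assumption: replicas are the owner and its successors on the
    ring; racks and data centers are not taken into account.
    """
    if not 1 <= rf <= len(ring_nodes):
        raise ValueError("RF must satisfy 1 <= RF <= number of nodes")
    return [
        ring_nodes[(owner_index + i) % len(ring_nodes)]
        for i in range(rf)
    ]


nodes = ["Node 1", "Node 2", "Node 3", "Node 4"]
# Partition owned by Node 3 (index 2), RF=3 as on the slide.
print(replicas_for(nodes, 2, 3))
```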
Consistency
- The coordinator applies the Consistency Level (CL)
- Consistency Level (CL): number of nodes which must acknowledge a request
- Examples of CL: ONE, TWO, THREE, ANY, ALL, QUORUM (= RF/2 + 1, using integer division), EACH_QUORUM, LOCAL_QUORUM
- CL may vary for each request
- On success, the coordinator notifies the client (with the most recent partition data in case of a read request)
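The arithmetic behind these levels can be sketched as below. The function names are invented for the example, and only the levels whose count follows directly from RF are covered: ANY, EACH_QUORUM and LOCAL_QUORUM are left out because they depend on hints or on per-data-center replication settings.

```python
def quorum(rf):
    """QUORUM = RF/2 + 1 with integer division."""
    return rf // 2 + 1


def required_acks(consistency_level, rf):
    """Number of replica acknowledgements needed for a request."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if consistency_level in fixed:
        return fixed[consistency_level]
    if consistency_level == "QUORUM":
        return quorum(rf)
    if consistency_level == "ALL":
        return rf
    raise ValueError("level not covered by this sketch: " + consistency_level)


for rf in (1, 3, 5):
    print("RF=%d -> QUORUM needs %d acknowledgement(s)" % (rf, quorum(rf)))
```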
Consistency ONE - READ - Single DC
[Diagram: client and driver, coordinator, Nodes 1 to 6, RF=3. Legend: direct read request; digest read request (hash) + eventual read repair]
Consistency QUORUM - READ - Single DC
[Diagram: client and driver, coordinator, Nodes 1 to 6, RF=3. Legend: direct read request; digest read request (hash)]
- In case of inconsistency: the most recent data is returned
- Read repair if needed
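The "most recent data wins" rule can be sketched as follows. It is a simplification: the replica responses are compared directly by write timestamp, whereas the slides describe a direct read plus digest (hash) reads. The class ReplicaResponse, its fields and the function resolve_read are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaResponse:
    node: str
    value: float
    write_timestamp_us: int  # every write is time-stamped


def resolve_read(responses):
    """Return the newest value and the nodes that need read repair."""
    newest = max(responses, key=lambda r: r.write_timestamp_us)
    stale_nodes = [
        r.node
        for r in responses
        if r.write_timestamp_us < newest.write_timestamp_us
    ]
    return newest.value, stale_nodes


responses = [
    ReplicaResponse("Node 2", 1.0, 100),
    ReplicaResponse("Node 3", 2.0, 200),
]
value, to_repair = resolve_read(responses)
print("returned to client:", value, "| read repair on:", to_repair)
```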
Consistency ONE - WRITE - Single DC
[Diagram: client and driver, coordinator, Nodes 1 to 6, RF=3. Frames show the write request, the ACKs from the replicas and SUCCESS returned to the client]
- Hinted handoff mechanism: a hint is kept for a replica that missed the write, and the write request is sent to it later
- max_hint_window_in_ms property in the cassandra.yaml file
Consistency
If node downtime > max_hint_window_in_ms: anti-entropy node repair
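The decision described on this slide can be written as a small function. The function name, the returned strings and the example window of three hours are assumptions for illustration; the real value is whatever max_hint_window_in_ms is set to in cassandra.yaml.

```python
# Assumed example value: 3 hours, expressed in milliseconds.
EXAMPLE_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def recovery_action(downtime_ms, max_hint_window_ms):
    """Pick how a replica catches up after being down."""
    if downtime_ms > max_hint_window_ms:
        # Hints no longer cover the whole outage.
        return "anti-entropy repair"
    return "replay hints"


print(recovery_action(10 * 60 * 1000, EXAMPLE_MAX_HINT_WINDOW_MS))
print(recovery_action(5 * 60 * 60 * 1000, EXAMPLE_MAX_HINT_WINDOW_MS))
```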
Consistency QUORUM - WRITE - Single DC
[Diagram: client and driver, coordinator, Nodes 1 to 6, RF=3. Frames show three write requests with the ACKs from the replicas and the outcome returned to the client: SUCCESS, SUCCESS, then FAILURE]
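A sketch of how the outcome of a QUORUM write depends on which replicas are reachable. It assumes that every reachable replica acknowledges and that an unreachable one never does; timeouts and hints are ignored. The function name and the node sets are hypothetical and do not reproduce the exact scenarios of the diagrams.

```python
def quorum_write_outcome(replicas, down_nodes):
    """Return SUCCESS if a quorum of replicas can acknowledge."""
    needed = len(replicas) // 2 + 1  # QUORUM = RF/2 + 1
    acks = [node for node in replicas if node not in down_nodes]
    return "SUCCESS" if len(acks) >= needed else "FAILURE"


replicas = ["Node 1", "Node 2", "Node 3"]  # RF=3
print(quorum_write_outcome(replicas, set()))
print(quorum_write_outcome(replicas, {"Node 3"}))
print(quorum_write_outcome(replicas, {"Node 2", "Node 3"}))
```

With RF=3, the write still succeeds with one replica down, because two acknowledgements are enough; with two replicas down it fails.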
Monitoring tool: OpsCenter
http://cassandra2:8888
HDB++
[Diagram: several hdb++es-srv and hdb++cm-srv servers <<use>> libhdb++; libhdb++cassandra and libhdb++mysql <<implement>> libhdb++, with Cassandra and MySQL as back ends]
Conclusion: C* pros
- High availability
  - Software upgrade with no downtime
  - Hardware failure
- Linear scalability
  - Need more performance? => Add nodes
- Big community with industrial support
- Can use Apache Spark for analytics (distributed processing)
- List, Set, Map data types (tuples and user defined types soon)
- Tries not to let you do actions which do not perform well
- Backups = snapshot = hard links => very fast
- Difficult to lose data
- Good fit for time series data
Conclusion: C* cons
- Requires more total disk space and machines
- The sstable format can change from one version to another
  - No easy way to go back to a previous version once the sstables have been converted to a newer version
- Cannot rename keyspaces or tables easily (not foreseen in CQL)
- Difficult to modify existing partitions (the data needs to be duplicated at some point in the process)
- Different way of modelling
- Not designed for huge read requests
- Can be tricky to tune to avoid long GC pauses
- Maintenance:
  - Need to run nodetool repair regularly if some data are deleted, to avoid resurrections (CPU-intensive operation)
  - Can take quite some time to reclaim disk space after deletion in some specific cases
The End
Useful links
- http://cassandra.apache.org
- Planet Cassandra (http://planetcassandra.org)
- Datastax academy (https://academy.datastax.com)
- Cassandra Java Driver getting started (https://academy.datastax.com/demos/cassandra-java-driver-getting-started)
- Cassandra C++ Driver: https://github.com/datastax/cpp-driver
- Datastax documentation (http://www.datastax.com/docs)
- Users mailing list: user-subscribe@cassandra.apache.org
- #Cassandra channel on IRC (http://webchat.freenode.net/?channels=#Cassandra)
Cassandra FUTURE DEPLOYMENT (from: Cassandra HDB++ Implementation Status, 9th April 2015, Accelerator Control Unit)
DC Prod
- 1 partition/hour
- Keyspace prod RF:3 (write LOCAL_QUORUM)
- 7200 RPM disks
- Big CPU, 64 GB RAM
DC Analytics 1
- Keyspace prod RF:3 (read LOCAL_QUORUM)
- Keyspace analytics RF:3 (write LOCAL_QUORUM)
- SSD disks
- Big CPU, 128 GB RAM
DC Analytics 2
- Keyspace analytics RF:5 (read LOCAL_QUORUM)
- 7200 RPM disks
- Tiny CPU, 32 GB RAM
Cassandra's node-based architecture
Basic Write Path Concept
Basic READ Path Concept