Slide 1: Scale-out databases
Slide 2: What is it all about?
Databases that have to scale out – a single instance cannot handle the workload
An alternative to RDBMS
Should provide a comparable level of service
What is a database?
At least a semi-structured data model
Defined R/W interfaces
R/W changes supported
Changes are durable – every successful write should survive the failure of at least one machine
ACID?
Optimized data access paths (buffer caches, indexes, etc.)
Workload characteristics
High throughput
Big volume
Low latency access (reads and writes)
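To make the durability requirement concrete, here is a minimal Python sketch, assuming a simple replicated store: a write counts as successful only after at least two machines have stored it, so it survives the loss of any one of them. The classes `Replica` and `ReplicatedStore` and the `write_quorum` parameter are hypothetical illustrations, not the API of any candidate database, and failed writes are not rolled back.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for one storage machine."""
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)

    def store(self, key, value) -> bool:
        # A failed machine cannot acknowledge the write.
        if not self.alive:
            return False
        self.data[key] = value
        return True


class ReplicatedStore:
    """Reports success only if at least `write_quorum` replicas stored the value.

    A write that misses the quorum is NOT rolled back on the replicas that did
    store it; this sketch only illustrates the acknowledgement rule.
    """

    def __init__(self, replicas, write_quorum=2):
        if write_quorum < 2:
            raise ValueError("need >= 2 copies to survive one machine failure")
        self.replicas = replicas
        self.write_quorum = write_quorum

    def write(self, key, value) -> bool:
        acks = sum(r.store(key, value) for r in self.replicas)
        return acks >= self.write_quorum

    def read(self, key):
        # Return the value from the first live replica that holds the key.
        for r in self.replicas:
            if r.alive and key in r.data:
                return r.data[key]
        raise KeyError(key)
```

With three replicas, a successful write stays readable after any single machine is marked as failed, while a write attempted with two machines down is reported as unsuccessful.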
Slide 3: Current use cases
NXCALS
Already using HBase
Atlas EventIndex
All the data in HDFS, part of it in Oracle and HBase (competing solutions)
They expressed willingness to use Apache Kudu
WinCC OA
Running on Oracle, with a partial offline copy in HDFS (Parquet)
They expressed willingness to use Apache Kudu
Slide 4: Past use cases
Castor Cockpit (monitoring)
Was using HBase
The current state is unknown
OpenStack Ceilometer
Was using HBase, moved to something else due to unsatisfactory performance …
Slide 5: Potential new use cases
LSA (LHC Software Architecture) controls project
Using Oracle today
Would like to have something faster… willing to test HBase/Cassandra or anything we suggest
Critical for LHC operations (setting up beams)
Alice Namespace
Using their own MySQL installation
Moving towards ScyllaDB (C++ Cassandra implementation)
They would use a central service…
Scale-out time-series DBs
Apparently there is room for that
ITMonit would try it if such a thing existed
Others may come… depending on the success of the platform
Atlas Rucio, CMS Phedex …
Slide 6: The goal
We need to pick the one solution
There are many technical solutions that provide scale-out DBs/SQL
Challenges
Has to be production-ready
Has to last for at least the next 10 years
Scale-out without extra licenses
Bare metal vs VMs (vs containers?)
Replaceable back-ends?
Consolidate communities
Provide an official central service and support
Slide 7: Candidate back-ends (1)
HBase (KV DB on top of HDFS)
Already used at CERN by NXCALS and AEI
Requires HDFS, but can still run on VMs
Popular and used by many communities worldwide
Mature, with a lot of features
Has multiple open-source extensions (products on top):
RDBMS with ACID transactions (Phoenix)
Time series (OpenTSDB)
Graph DBs
…
Slide 8: Candidate back-ends (2)
Cassandra (KV store)
Similar to HBase, but runs on its own (no HDFS required)
No single-source-of-truth components (no master) – can potentially scale out further than HBase
In terms of required computing resources, it fits VM-based deployments quite well
Popular and used by many communities worldwide
Mature, with a lot of features
Limited expertise at CERN
ScyllaDB (C++ implementation of Cassandra)
A commercial company behind it offers some features only in the paid version – this can be a trap
No experience
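The "no single source of truth" point can be illustrated with consistent hashing, the idea behind Cassandra's token ring: every node can compute which nodes hold a given key, so no master has to be consulted. This is a simplified sketch under stated assumptions: MD5 from `hashlib` stands in for Cassandra's default Murmur3 partitioner, each node owns a single token (real clusters use many virtual nodes), and `TokenRing` and `replicas_for` are hypothetical names.

```python
import bisect
import hashlib


def token(value: str) -> int:
    # Stand-in hash: first 8 bytes of MD5 as an integer (not Cassandra's Murmur3).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class TokenRing:
    """Leaderless placement: any node can compute the replicas for any key."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        # Each node owns one position on the ring, sorted by token.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas_for(self, key: str):
        # Start at the first node clockwise from the key's token, wrapping around,
        # and take the next `replication_factor` distinct nodes.
        start = bisect.bisect_right(self.tokens, token(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]
```

Because placement is a pure function of the key and the ring membership, adding a node changes the placement of only a fraction of the keys, which is what lets this kind of cluster grow without a central coordinator.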
Slide 9: Candidate back-ends (3)
Kudu (table store)
Tested by AEI and WinCC OA with satisfactory results
Standalone: does not need HDFS, but works best with distributed frameworks such as Spark or Impala
Quite unique (compared to HBase and Cassandra) due to its columnar data organization and C++ implementation – this makes it well suited for high-throughput analytics
Young and not very popular (yet?)
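Why a columnar layout helps analytics can be shown with a toy example, assuming three hypothetical rows of (timestamp, device, value): an aggregate over one column only has to read that column in the column-oriented layout, while the row-oriented layout touches every full record. This illustrates the general principle only and does not model Kudu's actual on-disk format.

```python
# Hypothetical sample rows: (timestamp, device, value).
rows = [
    (1, "dev-a", 10.0),
    (2, "dev-b", 12.5),
    (3, "dev-a", 11.0),
]

# Row-oriented layout: each record is stored contiguously.
row_store = list(rows)

# Column-oriented layout: each column is stored contiguously.
column_store = {
    "timestamp": [r[0] for r in rows],
    "device": [r[1] for r in rows],
    "value": [r[2] for r in rows],
}


def avg_value_row_store(store):
    # Every full record is visited just to read one field.
    return sum(r[2] for r in store) / len(store)


def avg_value_column_store(store):
    # Only the "value" column is read; the other columns are never touched.
    col = store["value"]
    return sum(col) / len(col)
```

Both functions return the same average; in a real columnar store the difference shows up as less I/O per scan, and values of one type stored together also tend to compress better.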
Slide 10: Scale-out SQL instead of scale-out databases
SQL abstraction on top of an anonymous back-end?
The storage back-end can change, the interface will stay
For reads and writes
Possible implementations
Presto
Hive
SparkSQL with a shared context
Impala
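The idea of keeping the interface stable while the storage back-end changes can be sketched as an abstract interface with interchangeable implementations. This is a minimal Python sketch under assumptions: `StorageBackend`, `DictBackend`, `SQLiteBackend` and `record_setting` are hypothetical names, and a key-value interface stands in for a full SQL layer; it is not how Presto, Hive, SparkSQL or Impala are built.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class StorageBackend(ABC):
    """Hypothetical stable interface that applications program against."""

    @abstractmethod
    def write(self, key: str, value: str) -> None: ...

    @abstractmethod
    def read(self, key: str) -> Optional[str]: ...


class DictBackend(StorageBackend):
    """In-memory back-end, e.g. for tests."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


class SQLiteBackend(StorageBackend):
    """Back-end built on the standard-library sqlite3 module."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def write(self, key, value):
        # "INSERT OR REPLACE" is SQLite's upsert; the context manager commits.
        with self._conn:
            self._conn.execute("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value))

    def read(self, key):
        row = self._conn.execute("SELECT v FROM kv WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None


def record_setting(store: StorageBackend, name: str, value: str) -> Optional[str]:
    # Application code: identical whichever back-end is plugged in.
    store.write(name, value)
    return store.read(name)
```

Swapping `DictBackend()` for `SQLiteBackend()` requires no change to `record_setting`, which is the property the slide asks of the abstraction layer.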