Scale-out databases: What is it all about?
Uploaded On 2020-06-26

Presentation Transcript

Slide1

Scale-out databases

Slide2

What is it all about?

Databases that have to scale: a single instance cannot handle the workload

Alternative to RDBMS

Should provide a comparable level of service

What is a database?

At least a semi-structured data model

Defined R/W interfaces

R/W changes supported

Changes are durable: every successful write should survive the failure of at least one machine

ACID?

Optimized data access paths (buffer caches, indexes, etc.)

Workload characteristics

High throughput

Big volume

Low latency access (reads and writes)
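The durability requirement above (a successful write must survive the loss of at least one machine) is commonly met by acknowledging a write only once it is stored on more than one node. The Python below is a minimal in-memory sketch of that idea, not the mechanism of any particular back-end; `Replica`, `ReplicatedStore`, `min_copies` and `NotDurableError` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


class NotDurableError(Exception):
    """Raised when too few replicas stored a write (hypothetical error type)."""


@dataclass
class Replica:
    """One storage machine; `alive=False` simulates a failed machine."""
    name: str
    alive: bool = True
    data: dict = field(default_factory=dict)

    def store(self, key, value):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class ReplicatedStore:
    """Acknowledges a write only after at least `min_copies` replicas hold it."""

    def __init__(self, replicas, min_copies=2):
        self.replicas = replicas
        self.min_copies = min_copies

    def write(self, key, value):
        stored = 0
        for replica in self.replicas:
            try:
                replica.store(key, value)
                stored += 1
            except ConnectionError:
                continue  # a down replica simply does not count
        if stored < self.min_copies:
            # A real system would also repair or roll back the partial copies.
            raise NotDurableError(f"{key!r} stored on only {stored} replica(s)")
        return stored

    def read(self, key):
        # Any live replica holding the key can serve the read.
        for replica in self.replicas:
            if replica.alive and key in replica.data:
                return replica.data[key]
        raise KeyError(key)


nodes = [Replica("a"), Replica("b"), Replica("c")]
store = ReplicatedStore(nodes, min_copies=2)
written = store.write("run:1", "ok")  # acknowledged: stored on all 3 nodes
nodes[0].alive = False                # one machine fails...
value = store.read("run:1")           # ...and the write is still readable
```

With `min_copies=2`, any single machine can fail after the acknowledgement without losing the write; a write that cannot reach two machines is rejected instead of being silently acknowledged.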

Slide3

Current use cases

NXCALS

Already using HBase

Atlas EventIndex

All the data in HDFS, part of it also in Oracle and HBase (competing solutions)

They have expressed willingness to use Apache Kudu

WinCC OA

Running on Oracle, with a partial offline copy in HDFS (Parquet)

They have expressed willingness to use Apache Kudu

Slide4

Past use cases

Castor Cockpit - monitoring

They were using HBase

The current state is unknown

OpenStack Ceilometer

Was using HBase, but moved to something else due to unsatisfactory performance…

Slide5

Potential new use cases

LSA (LHC Software Architecture) controls project

Using Oracle today

Would like something faster… willing to test HBase/Cassandra or anything we suggest

Critical for LHC operations (setting up beams)

Alice Namespace

Using their own MySQL installation

Moving towards ScyllaDB (a C++ implementation of Cassandra)

They would use a central service…

Scale-out time series DBs

Apparently there is room for that

ITMonit would try it if such a thing existed

Others may come… depending on the success of the platform

Atlas Rucio, CMS PhEDEx, …

Slide6

The goal

We need to pick the one solution

There are many technical solutions to provide scale-out DBs/SQL

Challenges

Has to be production-ready

Has to last for at least the next 10 years

Scale out without extra licenses

Bare-metal vs VMs (vs Containers?)

Replaceable back-ends?

Consolidate communities

Provide an official central service and support

Slide7

Candidate back-ends (1)

HBase (KV DB on top of HDFS)

Already used at CERN by NXCALS and AEI

Requires HDFS, but can still run on VMs

Popular and used by many communities world-wide

Mature, with a lot of features

Has multiple open-source extensions (products built on top):

RDBMS with ACID transactions (Phoenix)

Time series (OpenTSDB)

Graph DBs
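To make "KV DB on top of HDFS" more concrete: HBase addresses each cell by row key, column family:qualifier and timestamp, keeps row keys sorted, and persists its data as files in HDFS. The toy Python model below imitates only that logical layout in memory (no HDFS, no regions, no servers); `ToyHBaseTable`, its methods and its `max_versions` default are hypothetical and are not the HBase client API.

```python
import bisect
import time


class ToyHBaseTable:
    """In-memory imitation of the HBase logical layout (not the HBase API):
    row key -> "family:qualifier" -> [(timestamp, value), ...], newest first.
    Row keys are kept sorted, so scanning a key range is cheap."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)
        self.max_versions = max_versions  # arbitrary limit for this sketch
        self.row_keys = []                # sorted row keys
        self.rows = {}                    # row key -> {column: versions}

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise ValueError(f"unknown column family {family!r}")
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((time.time_ns() if ts is None else ts, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)
        del versions[self.max_versions:]  # keep only the newest versions

    def get(self, row, column):
        """Latest version of a single cell, or None if absent."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start, stop):
        """Yield (row key, latest cells) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        for key in self.row_keys[lo:hi]:
            yield key, {col: cells[0][1] for col, cells in self.rows[key].items()}


# Row keys "<device>#<hour>" keep one device's readings adjacent and time-ordered.
t = ToyHBaseTable(families=["m"])
t.put("dev1#2020-06-26T10", "m:temp", 21.5, ts=1)
t.put("dev1#2020-06-26T10", "m:temp", 22.0, ts=2)  # newer version of the same cell
t.put("dev1#2020-06-26T11", "m:temp", 23.1, ts=1)
t.put("dev2#2020-06-26T10", "m:temp", 19.0, ts=1)
latest = t.get("dev1#2020-06-26T10", "m:temp")        # 22.0
dev1_keys = [k for k, _ in t.scan("dev1#", "dev1$")]  # only the two dev1 rows
```

The made-up row keys combine a series id with a time bucket, the kind of key design that time-series layers on HBase (OpenTSDB among them) build on: because rows are sorted, one series over a time window is a single contiguous range scan.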

Slide8

Candidate back-ends (2)

Cassandra (KV store)

Similar to HBase, but runs on its own (no HDFS required)

No single-source-of-truth (master) elements, so it can eventually scale out further than HBase

In terms of required computing resources, it fits VM-based deployments quite well

Popular and used by many communities world-wide

Mature, with a lot of features

Limited expertise at CERN

ScyllaDB (C++ implementation of Cassandra)

There is a commercial company behind it, offering some features only in the paid version; this can be a trap

No experience
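The "no single source of truth" property comes from how Cassandra places data: nodes own token ranges on a hash ring, and any node can compute which replicas hold a key without asking a master. The Python below is a simplified sketch of that consistent-hashing scheme, assuming MD5-derived tokens (Cassandra's default partitioner uses Murmur3) and a hypothetical `TokenRing` class with virtual nodes; it ignores racks, datacenters and Cassandra's replication strategies.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Position of a key on the ring (MD5 here; Cassandra defaults to Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class TokenRing:
    """Every node can compute a key's replicas from the ring alone,
    so no master or central metadata server is consulted."""

    def __init__(self, nodes, vnodes=16):
        # Each node owns several tokens ("virtual nodes") to spread load evenly.
        self.ring = sorted((token(f"{n}-{i}"), n) for n in nodes for i in range(vnodes))
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, key, rf=3):
        """First `rf` distinct nodes found walking clockwise from the key's token."""
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.ring)
        chosen = []
        for i in range(len(self.ring)):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == rf:
                    break
        return chosen


ring = TokenRing(["node1", "node2", "node3", "node4"])
owners = ring.replicas("sensor:42", rf=3)  # three distinct nodes

# Scaling out: adding node5 only moves keys whose range node5 takes over.
bigger = TokenRing(["node1", "node2", "node3", "node4", "node5"])
keys = [f"sensor:{i}" for i in range(1000)]
moved = [k for k in keys if ring.replicas(k, rf=1) != bigger.replicas(k, rf=1)]
```

Adding a node therefore moves only a fraction of the data (on average about a fifth of the keys here), and always onto the new node, which is what lets this design keep scaling out without a central coordinator.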

Slide9

Candidate back-ends (3)

Kudu (table store)

Tested by AEI and WinCC OA with satisfactory results

Standalone: does not need HDFS, but works best with distributed frameworks like Spark or Impala

Quite unique (compared to HBase and Cassandra) due to its columnar data organization and C++ implementation, which make it suitable for high-throughput analytics

Young and not very popular (yet?)
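Why columnar organization helps analytics can be shown with a toy comparison: in a row layout an aggregate over one column still touches every record, while in a columnar layout the values of a column sit together and can be read and encoded on their own. The Python below is an in-memory illustration with made-up data, not Kudu's on-disk format; dictionary encoding is shown because Kudu offers it as one of its per-column encodings.

```python
from array import array

# Made-up readings table: (timestamp, device, value).
rows = [(1, "dev1", 21.5), (2, "dev2", 19.0), (3, "dev1", 22.0), (4, "dev3", 18.5)]

# Row layout: records are stored whole, so summing `value` walks every record.
row_sum = sum(record[2] for record in rows)

# Columnar layout: each column is stored contiguously; the same aggregate
# reads only the `value` column (a compact array of doubles here).
columns = {
    "ts": array("q", (r[0] for r in rows)),
    "device": [r[1] for r in rows],
    "value": array("d", (r[2] for r in rows)),
}
col_sum = sum(columns["value"])

# Per-column encoding: a low-cardinality string column becomes a small
# dictionary plus one byte per row (dictionary encoding).
dictionary = sorted(set(columns["device"]))
codes = array("B", (dictionary.index(d) for d in columns["device"]))
# dictionary == ['dev1', 'dev2', 'dev3'], list(codes) == [0, 1, 0, 2]
```

Both layouts give the same answer; the difference is how much data has to be read and how well each column compresses, which is what matters at scan-heavy analytics volumes.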

Slide10

Scale-out SQL instead of scale-out databases

SQL abstraction on top of an anonymous back-end?

The storage back-end can change, while the interface stays the same

For reads and writes

Possible implementations

Presto

Hive

SparkSQL with shared context

Impala
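The idea of a stable SQL-like interface over a replaceable storage back-end can be sketched as a small contract that every back-end implements, with the query layer written only against that contract. This is a hypothetical Python illustration: `StorageBackend`, `QueryLayer` and the two toy back-ends are made up here. Engines such as Presto fill this role with connectors to real stores, but their actual APIs are far richer than this.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class StorageBackend(ABC):
    """Hypothetical contract that every back-end must fulfil."""

    @abstractmethod
    def insert(self, table: str, row: Row) -> None: ...

    @abstractmethod
    def scan(self, table: str) -> Iterable[Row]: ...


class RowBackend(StorageBackend):
    """Stores each table as a list of row dicts."""

    def __init__(self):
        self.tables: Dict[str, List[Row]] = {}

    def insert(self, table, row):
        self.tables.setdefault(table, []).append(dict(row))

    def scan(self, table):
        return iter(self.tables.get(table, []))


class ColumnBackend(StorageBackend):
    """Stores each table as one list per column (fixed schema per table)."""

    def __init__(self):
        self.tables: Dict[str, Dict[str, list]] = {}

    def insert(self, table, row):
        cols = self.tables.setdefault(table, {name: [] for name in row})
        if set(row) != set(cols):
            raise ValueError("row does not match table schema")
        for name, value in row.items():
            cols[name].append(value)

    def scan(self, table):
        cols = self.tables.get(table, {})
        n = len(next(iter(cols.values()), []))
        for i in range(n):
            yield {name: values[i] for name, values in cols.items()}


class QueryLayer:
    """The stable front end: callers only see insert()/select()."""

    def __init__(self, backend: StorageBackend):
        self.backend = backend

    def insert(self, table: str, row: Row) -> None:
        self.backend.insert(table, row)

    def select(self, table: str, columns: List[str],
               where: Callable[[Row], bool] = lambda r: True) -> List[Row]:
        return [{c: r[c] for c in columns} for r in self.backend.scan(table) if where(r)]


# The same client code runs unchanged against either back-end.
results = []
for backend in (RowBackend(), ColumnBackend()):
    db = QueryLayer(backend)
    db.insert("readings", {"device": "dev1", "value": 21.5})
    db.insert("readings", {"device": "dev2", "value": 19.0})
    results.append(db.select("readings", ["device"], where=lambda r: r["value"] > 20))
# Both back-ends return [{'device': 'dev1'}]
```

The design choice this illustrates is that clients depend only on `QueryLayer`, so swapping `RowBackend` for `ColumnBackend` (or, in the real setting, one storage engine for another) requires no change to the code that issues queries.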