DriveScale Use Cases April 24, 2017 - PowerPoint Presentation

Uploaded On 2018-09-20


Copyright 2017 DriveScale Inc. All Rights Reserved.



Presentation Transcript


DriveScale Use Cases

April 24, 2017

DriveScale

Software Defined Infrastructure for Hadoop and Big Data

Presentation Overview

Target Users

Use Cases / Deployment Scenarios

Reference Accounts

DriveScale Target Customers

Big Data Applications – Hadoop, Spark, Cassandra, NoSQL, etc.

On-Premises Applications – Private and Hybrid Public Cloud

Concerned about infrastructure costs & wasted spend

Application profiles are dynamic: infrastructure requirements are hard to predict and always changing

Approaching Power / Rack Space Limits

Want to share infrastructure across multiple applications

Want to buy storage and compute independently (storage- or compute-bound)

DriveScale

Target Customers – Common questions/statements

How can I buy compute separate from storage?

I just want to buy more compute, or

I just want to buy more storage

I want to virtualize Hadoop like AWS does…

How can I increase my utilization?

I want to share storage among server nodes with a NAS (e.g. Isilon)

I need to lower my infrastructure costs

Compared to status quo…

Compared to NAS…

Compared to Cloud…

Benefits of Software Defined Infrastructure to Big Data Operators

Benefit

Details

Lower Capital Costs

Reduced server costs (don’t have to buy disks)

Buy Less Storage (higher utilization)

Less Rack Space Needed (denser CPU and storage)

Lower Disk Cost (3.5” disks cost less than 2.5” disks)

Lower Operational Cost

Add storage without physical labor

Replace failed drives without labor

Less equipment = reduced power

Speed up Big Data Deployments

Faster Time to Value

Create new clusters and nodes in minutes instead of weeks

Integrated with Cloudera Director and Hortonworks Cloudbreak

Share resources among multiple applications and clusters

SDI Revolutionizes Big Data Storage

Storage types compared: DAS (Direct Attached Storage) | DriveScale SW Defined Infrastructure | Centralized storage (NAS or SAN)

Cost (5-10x): Buy disks instead of proprietary ‘appliances’, which cost 5x to 10x as much. Don’t waste money on storage features Hadoop doesn’t need (dedup, RAID, erasure coding, etc.).

Performance (1/2 - 1/4): Give Hadoop nodes direct access to rack-local disks, not shared or centralized file systems with limited I/O bandwidth.

Utilization (30-50%): Buy only the disks you need, pooled locally in the rack. Pooling improves storage utilization, and rebalancing disks across nodes in a cluster improves CPU utilization, putting more servers to work.

Adaptability, i.e. the ability to change node storage (none): Re-define your server and storage infrastructure as application needs change.

Scalability (anti-Hadoop): Give nodes direct access to their own disks. Don’t share file systems (“shared nothing”).

Ratings: Good | Fair | Poor

DriveScale vs Isilon Enterprise NAS

Category | Isilon | DriveScale
Price | $0.50 - $1.00 per GB | $0.10 - $0.15 per GB (1/5 the cost)
Storage performance | Peak bandwidth: 2.4 GB/sec per Isilon node | Peak bandwidth: 10 GB/sec per DSA
Hadoop/HDFS version compatibility | 1-1.5 year delay from latest Hadoop (Isilon FS must continuously be made compatible) | Always current (sits below the HDFS layer)
Storage scale | 288 compute nodes (144 Isilon nodes) | 4000+ compute nodes
Drive density | 6–9 drives per RU | 12–18 drives per RU
Network complexity | Two networks (Ethernet and InfiniBand) | One network (Ethernet)
Compute/storage ratio | 2:1 (2 compute nodes per Isilon node) | 16:1 (16 compute nodes per DSA)
Minimum config | Isilon nodes | 1 JBOD/DSA
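The ratio and price rows translate directly into hardware counts and per-petabyte costs. The following is a minimal sketch that applies the table's figures. The helper names are hypothetical, a "storage unit" means an Isilon node or a DSA (with its JBOD), and 1 PB is taken as 1,000,000 GB.

```python
import math

# Compute-to-storage ratios from the comparison table.
COMPUTE_PER_ISILON_NODE = 2
COMPUTE_PER_DSA = 16

# Price ranges from the table, USD per GB (low, high).
ISILON_USD_PER_GB = (0.50, 1.00)
DRIVESCALE_USD_PER_GB = (0.10, 0.15)


def storage_units_needed(compute_nodes: int, compute_per_unit: int) -> int:
    """Storage units (Isilon nodes or DSAs) needed to serve the compute nodes."""
    return math.ceil(compute_nodes / compute_per_unit)


def usd_per_pb(usd_per_gb: float) -> float:
    """Convert a per-GB price into a per-PB cost (1 PB = 1,000,000 GB)."""
    return usd_per_gb * 1_000_000


nodes = 288  # the Isilon storage-scale figure quoted in the table
print(f"{nodes} compute nodes: "
      f"{storage_units_needed(nodes, COMPUTE_PER_ISILON_NODE)} Isilon nodes or "
      f"{storage_units_needed(nodes, COMPUTE_PER_DSA)} DSAs")
for label, (low, high) in (("Isilon", ISILON_USD_PER_GB),
                           ("DriveScale", DRIVESCALE_USD_PER_GB)):
    print(f"{label}: ${usd_per_pb(low):,.0f} - ${usd_per_pb(high):,.0f} per PB")
```

Rounding up with `math.ceil` reflects that a partially used storage unit still has to be bought.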

Hadoop Storage Needs vs Various Storage Types

Hadoop storage needs (columns): Data Locality | Converged compute & storage | Replication | Extreme Read BW | Low Cost Commodity | Total

Storage type | Ratings | Examples
DAS & DriveScale SDI | ✔ | Dell, HP (non-Apollo), Cisco, SuperMicro, etc.
NAS - Enterprise | ✖ ✖ ✔ ✖ ✖ | Isilon, NetApp, Gluster
NAS - HPC | ✖ ✖ ✖ ✔ ✖ | Lustre, GPFS
SAN/Block - External | ✖ ✔ ✖ | ScaleIO, Ceph, Datera, Cinder, AWS EBS
SAN/Block - Hyperconverged | ✖ ✔ ✖ | Nutanix, ScaleIO, Robin
Object | ✖ ✔ ✖ ✔ | AWS S3, Scality, Swift, EMC ECS

Legend: ✔ Optimal | ✖ SubOptimal | Debatable

Benefits of Software Defined Infrastructure

$ Lower Server Costs - Example

Component | 2RU Server with DAS (Direct Attached Storage) | 1RU Diskless Server with DriveScale
System/CPU/MoBo/NICs | $3,065 | $2,063
RAM | $1,142 | $1,246
Disk | $12,832 | $172 (1TB for OS)
TOTAL | $17,039 | $3,481

Save $13,558 per server node (79.6% savings).

x 1,000 nodes = about $13.6M
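The totals and savings in this example can be checked with a few lines of arithmetic. This sketch uses only the slide's per-component figures (the dictionary keys are hypothetical labels). The exact saving is 79.57%, which rounds to 79.6%, and 1,000 nodes comes to $13,558,000.

```python
# Per-server cost figures from the slide (USD), split by component as shown there.
DAS_SERVER = {"system_cpu_mobo_nics": 3_065, "ram": 1_142, "disk": 12_832}
DISKLESS_SERVER = {"system_cpu_mobo_nics": 2_063, "ram": 1_246, "os_disk": 172}


def total(cost_breakdown: dict[str, int]) -> int:
    """Sum the component costs of one server configuration."""
    return sum(cost_breakdown.values())


def savings(baseline: int, alternative: int) -> tuple[int, float]:
    """Return the absolute saving and the saving as a percentage of the baseline."""
    saved = baseline - alternative
    return saved, 100 * saved / baseline


das_total = total(DAS_SERVER)
diskless_total = total(DISKLESS_SERVER)
per_node, pct = savings(das_total, diskless_total)
fleet_nodes = 1_000  # fleet size used on the slide

print(f"Savings per node: ${per_node:,} ({pct:.1f}%)")
print(f"Savings for {fleet_nodes:,} nodes: ${per_node * fleet_nodes:,}")
```

Running it prints `Savings per node: $13,558 (79.6%)` and `Savings for 1,000 nodes: $13,558,000`.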

Benefits of Software Defined Infrastructure

$ Lower Disk Costs - Example

With DAS: 2.5” drive, $802.27 per drive = $0.45/GB

With DriveScale: 3.5” drive in a JBOD, $417.62 per drive = $0.22/GB

Save $385 per drive (48% savings).

Save $192,000 per petabyte.
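The per-drive figures can be reproduced directly from the two prices. The slide does not say how the per-petabyte number was derived. One reading that reproduces it assumes 2 TB drives (500 per PB, matching the 3.5” 2.0TB drives in the per-rack examples) and applies the per-drive saving to each one. That is an assumption of this sketch, not a statement from the deck.

```python
# Drive prices from the slide (USD per drive).
PRICE_25_DAS = 802.27   # 2.5" drive, direct-attached
PRICE_35_JBOD = 417.62  # 3.5" drive in a JBOD

# Assumption (not stated on this slide): 2 TB drives, so 1 PB = 1,000 TB is 500 drives.
DRIVE_TB = 2
DRIVES_PER_PB = 1_000 // DRIVE_TB

saving_per_drive = PRICE_25_DAS - PRICE_35_JBOD
saving_pct = 100 * saving_per_drive / PRICE_25_DAS
saving_per_pb = saving_per_drive * DRIVES_PER_PB

print(f"Per drive: ${saving_per_drive:,.2f} ({saving_pct:.0f}%)")
print(f"Per PB ({DRIVES_PER_PB} drives): ${saving_per_pb:,.0f}")
```

Under that assumption, the saving is about $192,325 per PB. The slide rounds this to $192,000.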

Per Rack Cost; Commodity Server MFG with and without DriveScale (higher drive utilization)

withOUT DriveScale – Day 1 cost: $346,410
2 x 48 port 10GbE switches (ON-4940S)
20 x R730 2RU servers w/20 2.5” 1.2TB drives each

with DriveScale – Day 1 cost: $198,504 (only 2 JBODs)
2 x 48 port 10GbE switches (ON-4940S)
20 x R430 1RU servers
2 x JBOD w/60 3.5” 2.0TB drives each
2 x DriveScale Adaptor
240 terabytes storage; open rack space for more servers

43% savings
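This comparison trades raw capacity for utilization. The DriveScale rack starts with half the raw capacity (240 TB vs. 480 TB) and adds JBODs as needed. Here is a minimal sketch of the arithmetic using the slide's configurations. `RackOption` is a hypothetical helper, and raw capacity ignores HDFS replication.

```python
from dataclasses import dataclass


@dataclass
class RackOption:
    name: str
    day1_cost_usd: int
    drive_count: int
    drive_gb: int  # per-drive capacity in decimal GB

    @property
    def raw_tb(self) -> int:
        """Raw capacity in TB, before HDFS replication."""
        return self.drive_count * self.drive_gb // 1_000


# 20 R730s with 20 x 1.2TB drives each, versus 20 diskless R430s plus
# 2 JBODs with 60 x 2.0TB drives each (figures from the slide).
without_ds = RackOption("without DriveScale", 346_410, 20 * 20, 1_200)
with_ds = RackOption("with DriveScale", 198_504, 2 * 60, 2_000)

saving_pct = (100 * (without_ds.day1_cost_usd - with_ds.day1_cost_usd)
              / without_ds.day1_cost_usd)
for option in (without_ds, with_ds):
    print(f"{option.name}: ${option.day1_cost_usd:,}, {option.raw_tb} TB raw")
print(f"Day 1 savings: {saving_pct:.0f}%")
```

Integer GB arithmetic keeps the capacity figures exact; the exact saving is about 42.7%, which the slide rounds to 43%.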

Per Rack Cost; Commodity Server MFG with and without DriveScale (equal storage)

withOUT DriveScale – 4-year cost (server refresh): $692,820
2 x 48 port 10GbE switches (ON-4940S)
20 x R730 2RU servers w/20 2.5” 1.2TB drives each

with DriveScale – 4-year cost (server refresh): $452,208
2 x 48 port 10GbE switches (ON-4940S)
20 x R430 1RU servers
4 x JBOD w/60 3.5” 2.0TB drives each
4 x DriveScale Adaptor

35% savings
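With four JBODs, both racks hold the same raw capacity, so the 4-year figures compare equal storage. This sketch checks the capacity match and the savings percentage. The helper name is hypothetical.

The sketch also confirms one arithmetic observation: the without-DriveScale figure is exactly twice the $346,410 Day 1 rack cost. That is consistent with buying the DAS servers and their drives a second time at refresh, but the slide does not spell out the breakdown.

```python
def pct_saving(baseline: int, alternative: int) -> float:
    """Percentage saved relative to the baseline cost."""
    return 100 * (baseline - alternative) / baseline


# Raw capacity in TB (decimal), from the slide's configurations.
das_tb = 20 * 20 * 1_200 // 1_000   # 20 servers x 20 drives x 1.2 TB
jbod_tb = 4 * 60 * 2_000 // 1_000   # 4 JBODs x 60 drives x 2.0 TB

FOUR_YEAR_WITHOUT = 692_820  # includes a server refresh
FOUR_YEAR_WITH = 452_208
DAY1_WITHOUT = 346_410       # Day 1 rack cost without DriveScale

print(f"Raw capacity: {das_tb} TB vs {jbod_tb} TB")
print(f"4-year savings: {pct_saving(FOUR_YEAR_WITHOUT, FOUR_YEAR_WITH):.0f}%")
print(f"4-year DAS cost is twice Day 1: {FOUR_YEAR_WITHOUT == 2 * DAY1_WITHOUT}")
```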

DriveScale

Deployment Scenarios – Preserve Your Existing Investment

Greenfield
Start off on the right foot with DriveScale Software Defined Infrastructure.

Existing Hadoop – Need more storage on nodes (i.e. storage-bound)
Just add a JBOD and a DSA, and add disks to existing servers.

Existing Hadoop – Rebalance storage between nodes and clusters
Just add a JBOD and a DSA, move unused disks from DAS servers to the JBOD, and redistribute storage and nodes into new clusters.

Existing Hadoop – Need more compute nodes (i.e. compute-bound)
Add a compute server, JBOD, and DSA. You can then add more nodes to your cluster without buying more DAS.

Existing Hadoop – Moving from public to private or hybrid cloud
Did Cloudera Director or Hortonworks Cloudbreak deploy you to the cloud? Both are integrated with DriveScale, so you can deploy your private cloud the same way. You can even have a single cluster span your private and public clouds (i.e. hybrid).

DriveScale

Reference Customers


DriveScale

Customer Reference -

Clearsense

Technology company which transforms data from electronic medical records (EMRs), supporting source systems, & legacy databases to enable new revenue streams for healthcare

Real-time analytics solution ingests many kinds of data from existing EMRs & data warehouses, and puts them into one platform to serve up analytics

After 18 months in production at AWS, they were unhappy with responsiveness and costs, and decided to build a Private Cloud

They tout Hadoop responsiveness to their customers, and DriveScale supports that through infrastructure flexibility

DriveScale

Customer Reference -

Clearsense

“DriveScale helps us build a ‘future-proof’ infrastructure. As our customer needs increase, we can now respond more quickly and without having to make massive capital expenditures.”

“We spent a lot of time engineering our software environment around real-time needs, ensuring a productive, scalable and economically viable environment. Now, with DriveScale, we have the ability to do the same thing with hardware.”

Charles Boicey, Chief Innovation Officer at Clearsense

DriveScale

Customer Case Study:

AppNexus

Global technology company whose private cloud-based software platform enables and optimizes programmatic online advertising

Serves tens of billions of ad buys per day; revenue > $2B annually

2,200 nodes split between Hadoop & HBase

Their customers are asking for more vcores and memory, but not more storage – they wanted to scale compute independently of storage

Issues we solve:

Highly compute-bound environment – purchase compute-only nodes

Want to reduce the operational impact of refreshing their servers

Rack / Power constraints also an issue for them

Timothy Smith, SVP of Technical Operations at AppNexus

“DriveScale understood one of our core requirements, namely the desire to manage CPU and storage resources as separate pools.”

“Storage and server technology upgrades move on two different timetables. Without DriveScale, we are forced into storage refresh cycles that aren't strictly necessary, and are very cumbersome. Separating storage from compute allows us to upgrade or reallocate compute resources independent of storage.”

“DriveScale significantly decreases the operations workload, as a result increasing the velocity of delivering new products and features to our customer base. DriveScale will also help us reduce wasted resources trapped in siloed clusters and thus contribute directly to our bottom line.”

Cloudera Do’s and Don’ts of Building Private Infrastructure…

Avoid Isilon for large (>100 servers) Hadoop infrastructure

DriveScale behaves just like locally attached storage

DriveScale (DS) vs HPE BDRA

TCO for DriveScale is approximately 38% lower on HPE hardware costs alone

BDRA requires additional Cloudera/Hortonworks licenses for each of its data nodes

The Apollo 4200, which serves as the BDRA data node, has a single controller and is therefore a single point of failure (SPOF) for BDRA

- If a node fails, the data on all 28 of its drives must be re-replicated to the other storage nodes and drives. For example, with 6TB drives, 168 TB of data floods the network at once.

- While the data from those 28 drives is re-replicated, the network is flooded and ongoing jobs are delayed.

- Spare Apollo 4200 data nodes might be required, depending on the amount of data in each data node.

- With DS, there are 4 paths to each drive in the JBOD, through dual I/O controllers.

- With DS, the system can sustain the failure of 1 JBOD I/O controller, 1 switch, and 1 DSA.

BDRA requires 40Gb switches; DS uses 10Gb switches.

OS maintenance on data nodes: if an OS update is required, the entire node with its 28 drives must be put into maintenance mode, which also triggers replication of its data across the other available storage nodes

Firmware upgrades of storage nodes pose the same issue

Every read/write in a BDRA cluster goes over the network

One reason traditional Hadoop deployments stripe data across a large number of nodes with a few disks each is that ‘failure locality’ is more manageable. If a large amount of data is stored on a single node, the penalty for failure is tremendous, which is a problem for BDRA.
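The failure-locality argument comes down to how much data one node holds. This sketch reproduces the slide's 168 TB figure and adds an idealized transfer time.

The 40 Gb/s link is a hypothetical parameter chosen for illustration (it matches the BDRA switch speed above). The model also assumes full drives and ignores protocol overhead, disk limits, and the fact that HDFS spreads re-replication traffic across many nodes and links. The function names are hypothetical.

```python
def rereplication_tb(drives_per_node: int, tb_per_drive: float) -> float:
    """Data to re-replicate when one node is lost, assuming its drives are full."""
    return drives_per_node * tb_per_drive


def transfer_hours(terabytes: float, link_gbps: float) -> float:
    """Idealized time to move the data over one link at full line rate.

    Ignores protocol overhead, disk throughput limits, and parallel
    re-replication across many nodes.
    """
    bits = terabytes * 1e12 * 8
    return bits / (link_gbps * 1e9) / 3600


# Slide example: a 28-drive Apollo 4200 data node with 6 TB drives.
lost_tb = rereplication_tb(28, 6)
print(f"Data to re-replicate: {lost_tb:.0f} TB")

# Hypothetical: one 40 Gb/s link fully dedicated to recovery traffic.
print(f"At 40 Gb/s: about {transfer_hours(lost_tb, 40):.1f} hours")
```

The result (about 9.3 hours for a single saturated link) is only an illustration of scale. The point is that the recovery volume grows linearly with the number of drives behind one node.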


DriveScale

Summary

First Enterprise-Class Architecture designed for Scale Out Infrastructure

Eliminate Overprovisioning

Lower TCO

Improve Agility to workload changes

Integrated with Cloudera Director

DriveScale’s Core Value Propositions

Save Money

Lower Server and Storage CapEx

Reduce Time to Value of your Big Data initiative

Improve Utilization

3x better utilization of hardware resources

Modify Infrastructure on demand to respond to changing workloads

Multiple workloads on the same infrastructure

Simplify Everything

Commodity HW and fewer composable elements

Software-Controlled Infrastructure

One-Click Deployment

Questions and Answers

Frequently Asked Questions

Q1) Can I continue to buy servers and storage from my preferred vendors?

A1) Yes. DriveScale does not sell servers, storage, or JBODs, but all the big vendors do. The only hardware you need from DriveScale is the DSA (DriveScale Adaptor), which converts the JBOD SAS interface to Ethernet. A high-performance JBOD uses 4 DSAs (a full 1RU chassis).

Q2) Does DriveScale SDI disaggregation work for more than just Hadoop?

A2) Yes. It will work for any application. However, there may be performance implications for non-Big Data applications that use “small” blocks.

Q3) What is the minimum storage required in “diskless” compute servers?

A3) Any drive, as small as possible, just for the operating system. It could be a USB drive or small flash storage. All data is stored on JBOD disks.

Q4) Does DriveScale support SSDs in JBODs?

A4) Yes. The JBOD can hold many types of heterogeneous storage, as long as it is SAS-compatible: HDDs (high or low RPM, high or low capacity), SSDs, etc.

Q5) Is there a network impact from accessing disks over iSCSI/Ethernet vs. direct-attached SAS?

A5) Minimal to zero impact. HDFS storage access peaks while data is copied from one node to another. The sending node reads its disk (NIC RX port activity from the disk) and transfers the data to the other node (NIC TX port activity to the other node); the receiving node does the inverse. On both nodes, disk reads and writes add network traffic, but that traffic flows in the opposite direction of the bidirectional link from the internode traffic, so there is no performance impact. TestDFSIO performance testing results confirm this answer.
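The argument in A5 can be modeled as per-direction load on each node's full-duplex NIC. This is a simplified sketch under stated assumptions:

- a single copy stream;
- disk and peer traffic share one NIC;
- no protocol overhead.

`LinkLoad` and the copy functions are hypothetical names. The model illustrates the reasoning and does not substitute for the TestDFSIO measurements.

```python
from dataclasses import dataclass


@dataclass
class LinkLoad:
    """Per-direction traffic on one node's full-duplex NIC (units of the stream rate)."""
    rx: float = 0.0
    tx: float = 0.0

    def peak(self) -> float:
        """Load on the busiest direction of the link."""
        return max(self.rx, self.tx)


def copy_with_das(rate: float) -> tuple[LinkLoad, LinkLoad]:
    """Node A copies a block to node B; local SAS disks add no NIC traffic."""
    sender, receiver = LinkLoad(), LinkLoad()
    sender.tx += rate    # A -> B internode transfer leaves on A's TX
    receiver.rx += rate  # ...and arrives on B's RX
    return sender, receiver


def copy_with_remote_disks(rate: float) -> tuple[LinkLoad, LinkLoad]:
    """Same copy, but each node reaches its JBOD disks through the same NIC."""
    sender, receiver = copy_with_das(rate)
    sender.rx += rate    # A reads its remote disk: data arrives on A's RX
    receiver.tx += rate  # B writes to its remote disk: data leaves on B's TX
    return sender, receiver


das = copy_with_das(1.0)
remote = copy_with_remote_disks(1.0)
print("DAS:   ", das)
print("Remote:", remote)
print("Same peak per direction:",
      max(link.peak() for link in das) == max(link.peak() for link in remote))
```

In this model, remote disk traffic only fills the direction each NIC was not using for the internode copy. The busiest direction carries the same load either way. The model does not capture contention when many streams share a link.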

Thank You