Slide 1: DriveScale End User Sales Presentation
April 24, 2017
DriveScale: Software Defined Infrastructure for Hadoop and Big Data
Slide 2: Presentation Overview
- The Market Problem
- How DriveScale Solves the Problem
- SDI = Software Defined Infrastructure for Big Data
- Benefits of Software Defined Infrastructure
- Solution Overview & Components

©2017 DriveScale Inc. All Rights Reserved.
Slide 3: Example Meeting Agenda

Presenter | Role | Topic | Time (min)
John Doe | Customer – VP Infrastructure | Meeting Kickoff | 10
Jane X | VAR – Account Manager | |
Ryan Shorter | DriveScale – Director of Sales, Eastern US | DriveScale Exec Summary and Key Benefits | 20
Jeff Chesson | DriveScale – Director of Sales, Central US | |
Howard Doherty | DriveScale – VP Sales | |
Chris Munford | DriveScale – VP Field Operations | DriveScale Solution Overview | 20
Salah Chaou | DriveScale – Principal Solution Architect | |
All | | Q&A | 10
Slide 4: Executive Summary: DriveScale

DriveScale is Software Defined Infrastructure for Hadoop and Big Data (e.g. Spark, Cassandra, NoSQL, and other webscale apps). It enables the most efficient and agile infrastructure for private and hybrid clouds, bringing hyperscale Hadoop architecture to the enterprise.
- Economical – Save up to 60% vs. DAS, centralized NAS/SAN, or public cloud.
- Efficient – Get more compute and storage from your hardware investment; up to 3x improvement by pooling resources across clusters.
- Easy – Uses the same servers and drives; no changes to the software stack; managed by Cloudera Director and Hortonworks Cloudbreak; improved performance vs. alternatives – maintains data locality.
Slide 5: The Big Data Market Problem
Slide 6: The Big Data Market Problem: How do you deploy in a private or hybrid cloud efficiently?
Slide 7: How to Right-Size Your Big Data Infrastructure?
- How many clusters?
- How many server nodes in each cluster?
- For each server node: how much CPU/cores? Memory? Storage/drives?
- How can you change the cluster balance as workloads change?
Slide 8: The Problem: You Must Choose Infrastructure FIRST
Infrastructure decisions come before your applications are written (or deployed, scaled, or evolved) – and those decisions will determine the success (and cost) of your Big Data plans.
Slide 9: The Default Approach
Choose a "one size fits all" server node and buy many of them, at roughly $20,000 apiece.
2U servers: the minivans of computing – versatile, but not the best at anything.
Slide 10: But One Size Does NOT Fit All
Each type of cluster "wants" a different amount of disk per server: Hadoop Data Lake, Dev/Test, HBase, Kafka, Cassandra, ...
Fixed silos per cluster type lead to madness:
- No resource sharing
- No elasticity
- Too many server types / SKUs
Slide 11: You Want Choice AND Adaptability
[Chart: server types plotted by storage per rack unit (500GB to 8PB) against CPU/RAM per rack unit (1/8GB to 4/256GB), ranging from "moped" and "sportscar" through "minivan" to "cargo truck" and "cargo plane"]
Slide 12: The RIGHT Solution Is Software Defined Infrastructure
Dynamically change your infrastructure to match your application workflow needs:
- # of clusters
- # of server nodes per cluster
- # of CPUs per server
- amount of RAM per server
- # of disks per server
DriveScale SDI allows you to right-size your Big Data infrastructure. "You can be wrong and still be right."
Slide 13: DriveScale Software Defined Infrastructure – Solution Overview
Slide 14: Software Defined Infrastructure for Big Data
Requires two actions:
1) Server Disaggregation – separate servers into:
   - "compute" servers = diskless compute servers
   - JBODs (Just a Bunch of Drives) = "compute-less" storage servers
2) Logical Server Composition – join "compute" with "storage" to create "logical" servers and clusters:
   - over an Ethernet/IP top-of-rack switch
   - dynamically changed as needs change
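The two-step model above can be sketched in a few lines of code. This is an illustrative assumption of the composition idea, not DriveScale's actual API; all names here are hypothetical.

```python
# Hypothetical sketch: disaggregated pools of diskless compute servers
# and JBOD drives are joined into "logical" server nodes over the
# top-of-rack Ethernet fabric, and can be re-composed later.
from dataclasses import dataclass, field

@dataclass
class LogicalNode:
    server: str                                  # a diskless compute server
    drives: list = field(default_factory=list)   # drives mapped from JBODs

compute_pool = ["server-1", "server-2"]
drive_pool = [f"jbod1-slot{i}" for i in range(12)]   # one 12-drive JBOD

def compose(server, n_drives):
    """Bind n_drives from the rack-local JBOD pool to a compute server."""
    return LogicalNode(server, [drive_pool.pop() for _ in range(n_drives)])

node = compose(compute_pool[0], 4)   # e.g. a storage-light Spark node
assert len(node.drives) == 4 and len(drive_pool) == 8

# As workloads change, return the drives to the pool and re-compose.
drive_pool.extend(node.drives)
node.drives.clear()
```

The point of the sketch is that composition is a metadata operation over shared pools: no hardware moves when a node's drive count changes.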
Slide 15: Step #1: Disaggregate the Servers
1) Stop buying 2RU servers with storage (note: you can still use your existing servers).
2) Instead, start buying (a) high-density commodity compute servers and (b) JBODs (Just a Bunch of Drives) for storage.
You can now buy compute when you need compute, and storage when you need storage.
[Diagram: racks of 1RU and 2RU compute servers alongside 5RU, 8RU, and 12RU JBOD chassis]
Slide 16: Step #2: Compose Logical Server Nodes and Clusters
Physical compute servers and physical JBODs (Just a Bunch of Disks) are joined into "logical" server nodes and clusters, right-sized for Big Data / Hadoop.
[Diagram: Cluster 1 – Performance – Spark (Nodes 1-5) and Cluster 2 – Storage – Data Lake (Nodes 1-3) composed from the physical pools]
Slide 17: Adapt Over Time as Workloads Change
The same physical compute servers and JBODs can be re-composed as workloads change:
- Add or remove nodes
- Add storage to nodes
- Return storage to the pool
[Diagram: the logical clusters (Cluster 1 – Performance – Spark; Cluster 2 – Storage – Data Lake) being resized]
Slide 18: Shown Another Way
Create a Cluster X, create another Cluster Y, create a third Cluster Z, then expand Cluster X.
Slide 19: DriveScale Scales to 1000s of Nodes and Beyond
JBOD storage stays local to each rack, attached to compute through DriveScale Adapters.
[Diagram: racks of compute servers and JBOD storage hosting Cluster 1 ("Balanced"), Cluster 2 (storage focus), and Cluster 3 (compute focus), each rack with its own DriveScale Adapters]
Slide 20: You Just Built a Private Cloud
Your infrastructure is highly elastic, and all hardware resources are shared. DriveScale makes your private data center operate like a public cloud for Hadoop.
Slide 21: Hadoop Storage Options
(Ratings: Good / Fair / Poor)

Criterion | DAS (Direct Attached Storage) | DriveScale SW Defined Infrastructure | Centralized Storage (NAS or SAN) | Comments
Cost | Good | Good | Poor (5-10x) | Buy disks instead of proprietary "appliances", which cost 5x to 10x as much. Don't waste money on storage features Hadoop doesn't need (dedup, RAID, erasure encoding, etc.)
Performance | Good | Good | Poor (1/2 - 1/4) | Give Hadoop nodes direct access to rack-local disks, not shared or centralized file systems with limited I/O bandwidth
Utilization | Poor (30-50%) | Good | Fair | Buy only the disks you need, pooled local to the rack. This gives better storage utilization, and rebalancing disks across nodes in a cluster gives better CPU utilization, putting more servers to work
Adaptability (ability to change node storage) | Poor (none) | Good | Fair | Re-define your server and storage infrastructure as application needs change
Scalability | Good | Good | Poor ("anti-Hadoop") | Give nodes direct access to their own disks; don't share file systems ("nothing shared")
Slide 22: Benefits of Software Defined Infrastructure
Slide 23: DriveScale Target Customers
- Big Data applications – Hadoop, Spark, Cassandra, NoSQL, etc.
- On-premise applications – private and hybrid public cloud
- Concerned about infrastructure costs and wasted spend
- Application profiles are dynamic – infrastructure requirements are hard to predict and always changing
- Approaching power / rack space limits
- Want to share infrastructure across multiple applications
- Want to buy storage and compute independently (storage- or compute-bound)
Slide 24: DriveScale Target Customers – Common Questions and Statements
- "How can I buy compute separate from storage? I just want to buy more compute, or I just want to buy more storage."
- "I want to virtualize Hadoop like AWS does."
- "How can I increase my utilization?"
- "I want to share storage among server nodes with a NAS (e.g. Isilon)."
- "I need to lower my infrastructure costs – compared to the status quo, compared to NAS, compared to cloud."
Slide 25: Benefits of Software Defined Infrastructure to Hadoop Operators

1. Lower Capital Costs
   - Reduced server costs (no need to buy disks with every server)
   - Buy less storage (higher utilization)
   - Less rack space needed (denser CPU and storage)
   - Lower disk cost (3.5" disks cost less than 2.5")
2. Lower Operational Costs
   - Add storage without physical labor
   - Replace failed drives without labor
   - Less equipment = reduced power
3. Faster Big Data Deployments / Faster Time to Value
   - Create new clusters and nodes in minutes instead of weeks
   - Integrated with Cloudera Director and Hortonworks Cloudbreak
   - Share resources among multiple applications and clusters
Slide 26: Benefits of Software Defined Infrastructure – Lower Server Costs (Example)

Before: 2RU server with DAS (Direct Attached Storage)
- System/CPU/MoBo/NICs: $3,065
- RAM: $1,142
- Disk: $12,832
- TOTAL: $17,039

With DriveScale: 1RU diskless server
- System/CPU/MoBo/NICs: $2,063
- RAM: $1,246
- Disk (1TB for OS): $172
- TOTAL: $3,481

Save $13,558 per server node (79.5% savings). x 1000 nodes = $13M.
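The per-node figures above can be checked with simple arithmetic, using the list prices as given on the slide:

```python
# Per-node cost comparison, prices taken from the slide.
before = {"system_cpu_mobo_nics": 3065, "ram": 1142, "disk": 12832}
after = {"system_cpu_mobo_nics": 2063, "ram": 1246, "os_disk_1tb": 172}

before_total = sum(before.values())   # 2RU server with DAS
after_total = sum(after.values())     # 1RU diskless server
saving = before_total - after_total

assert before_total == 17039
assert after_total == 3481
assert saving == 13558                # ~79.6% of the DAS server cost
print(f"per-node saving: ${saving:,}; x1000 nodes = ${saving * 1000:,}")
```

Note the savings fraction computes to about 79.6%; the slide rounds it to 79.5%, and the 1000-node figure to $13M.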
Slide 27: Benefits of Software Defined Infrastructure – Lower Disk Costs (Example)

Before: 2.5" drive, DAS – $802.27 per drive = $0.45/GB
With DriveScale: 3.5" drive in JBOD – $417.62 per drive = $0.22/GB
Save $385 per drive (48% savings). Save $192,000 per petabyte.
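The per-drive saving above checks out arithmetically, again using the slide's prices:

```python
# Per-drive cost comparison, prices taken from the slide.
das_25in = 802.27    # 2.5" drive for DAS, $0.45/GB
jbod_35in = 417.62   # 3.5" drive in a JBOD, $0.22/GB

saving = das_25in - jbod_35in
assert round(saving) == 385                 # "save $385 per drive"
assert round(saving / das_25in, 2) == 0.48  # "48% savings"
```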
Slide 28: Per-Rack Cost – Commodity Server MFG With and Without DriveScale (Higher Drive Utilization)

Without DriveScale: $346,410 Day 1 cost
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R730 2RU servers w/ 20 x 2.5" 1.2TB drives each

With DriveScale: $198,504 Day 1 cost (only 2 JBODs) – 43% savings
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R430 1RU servers
- 2 x JBODs w/ 60 x 3.5" 2.0TB drives each (240 terabytes storage)
- 2 x DriveScale Adapters
- Open rack space for more servers
Slide 29: Per-Rack Cost – Commodity Server MFG With and Without DriveScale (Equal Storage)

Without DriveScale: $692,820 4-year cost (includes server refresh)
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R730 2RU servers w/ 20 x 2.5" 1.2TB drives each

With DriveScale: $452,208 4-year cost – 35% savings
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R430 1RU servers
- 4 x JBODs w/ 60 x 3.5" 2.0TB drives each
- 4 x DriveScale Adapters
Slide 30: DriveScale Deployment Scenarios – Preserve Your Existing Investment

- Greenfield: start off on the right foot with DriveScale Software Defined Infrastructure.
- Existing Hadoop, need more storage on nodes (i.e. storage-bound): just add a JBOD and a DSA, and add disks to existing servers.
- Existing Hadoop, rebalance storage between nodes and clusters: just add a JBOD and a DSA, pull unused disks from DAS servers into the JBOD, and redistribute storage and nodes into new clusters.
- Existing Hadoop, need more compute nodes (i.e. compute-bound): add a compute server, JBOD, and DSA, and add more nodes to your cluster without buying more DAS.
- Existing Hadoop, moving from public to private or hybrid cloud: Cloudera Director or Hortonworks Cloudbreak deployed you to the cloud? Both are integrated with DriveScale to deploy your private cloud the same way. You can even have a single cluster span your private and public clouds (i.e. hybrid).
Slide 31: DriveScale Solution Components
Slide 32: DriveScale Components Shown in a Typical Rack Deployment

Hardware:
- Top-of-rack switches: 64-128 port 10GbE (Cisco, Arista, HPE, Dell, Quanta, etc.)
- Out-of-band management switch: 1GbE
- Node server compute pool – rack servers: 2U, 1U, 1/2U, or 1/4U (Dell, HPE, Cisco, SuperMicro, Quanta, Foxconn, etc.)
- DriveScale Adapter: Ethernet-to-SAS bridge, 1 per JBOD
- Storage pool – JBOD (Just a Bunch of Drives): Dell, HPE, SuperMicro, Sanmina, Quanta, Foxconn, etc.

Software:
- DSC – DriveScale Central (cloud): 1 per customer; customer support, remote upgrade, remote licensing
- DMS – DriveScale Management System: 1-3 per customer; Linux RPM, runs on VMs; inventory, cluster config, node config
- DSN – DriveScale Server Node Agent: 1 on each node; inventory discovery
Slide 33: The DriveScale System
Highly automated infrastructure provisioning and management:
- DriveScale Adapter (DSA)
- DriveScale Management System (DMS)
- DriveScale Cloud Central (DSC)
- DriveScale Server Node (DSN), running the DriveScale Agent
Slide 34: DriveScale Software Management Architecture

The 4 principal components:
1. DriveScale Management Server (DMS)
   - Data repository consists of: inventory (DMSs, DS Adapters, switches, JBOD chassis, disks, server nodes) and configuration (node templates, cluster templates, configured clusters)
   - Typical deployment consists of 3 DMS systems
   - The DMS database is used as a message bus to communicate with the endpoints
2. DriveScale Adapter (DSA)
   - DSA agent discovery provides inventory for hardware
   - Creates mappings for server nodes to consume disks
3. DriveScale Server Node (DSN)
   - DS server agent provides inventory for server hardware
   - Consumes mapped disks via the DSA
4. DriveScale Central (DSC)
   - Cloud-based portal where DriveScale repos are stored for software distribution to subscribers
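To make the node-template and cluster-template idea concrete, here is a hypothetical sketch. The field names and structure are assumptions for illustration only, not DriveScale's actual schema:

```python
# Hypothetical node and cluster templates of the kind the DMS repository
# stores, plus a helper computing the drive mappings a cluster implies.
node_template = {
    "name": "spark-worker",                         # hypothetical name
    "compute": {"min_cores": 16, "min_ram_gb": 128},
    "storage": {"drives": 4, "drive_type": "3.5in-sas"},
}
cluster_template = {
    "name": "spark-perf",
    "node_template": "spark-worker",
    "nodes": 5,
}

def drives_needed(cluster, templates):
    """Total JBOD drives the DMS would map for this configured cluster."""
    t = templates[cluster["node_template"]]
    return cluster["nodes"] * t["storage"]["drives"]

total = drives_needed(cluster_template, {"spark-worker": node_template})
assert total == 20   # 5 nodes x 4 drives each
```

The design point being illustrated: templates describe intent, and the DMS (via the DSAs) turns that intent into concrete disk mappings.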
Slide 35: The DriveScale Adapter
- 2 x 10GbE interfaces per adapter
- 2 x 12Gb 4-lane SAS interfaces per adapter
- Enables SAS-connected drives to be mounted over Ethernet
- 4 DriveScale Ethernet-to-SAS adapters in a 1U chassis
- With 80Gb throughput, a single chassis can comfortably support simultaneous access to 80 drives with performance equivalent to Direct Attached Storage
- Dual redundant power supplies
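The 80Gb/80-drive claim above can be sanity-checked with back-of-the-envelope arithmetic (the per-drive comparison to a SAS HDD's sequential throughput is our interpretation, not a figure from the slide):

```python
# 4 adapters x 2 ports x 10GbE = 80 Gb/s per 1U chassis.
adapters, ports_per_adapter, gbe = 4, 2, 10
chassis_gbps = adapters * ports_per_adapter * gbe
assert chassis_gbps == 80

# Shared evenly across 80 drives, that is ~1 Gb/s (~125 MB/s) per
# drive, in the range of a SAS HDD's sequential throughput, which is
# why access stays comparable to direct attachment.
per_drive_gbps = chassis_gbps / 80
per_drive_mb_s = per_drive_gbps * 1000 / 8
assert per_drive_mb_s == 125.0
```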
Slide 36: PoC Minimal HW Requirements (With Resiliency)
- 1 x DSA chassis (includes 4 x DS Adapters)
- 1 x JBOD loaded with 60 drives
- 3 (minimum) x servers (data nodes) with 1 direct-attached drive each (4-12 drives will be remotely attached via DSA)
- 1 x server (name node) with 1 direct-attached drive
- 1 x server (can be a VM) for DMS and Ambari
- 2 x 10GbE switches for data flow
- 1 x management switch (1GbE)

DriveScale Proprietary Information © 2017