Slide1
DriveScale Use Cases
April 24, 2017
DriveScale
Software Defined Infrastructure for Hadoop and Big Data
Slide2
Presentation Overview
Target Users
Use Cases / Deployment Scenarios
Reference Accounts
©2017 DriveScale Inc. All Rights Reserved.
Slide3
DriveScale Target Customers
Big Data Applications – Hadoop, Spark, Cassandra, NoSQL, etc.
On-Premise Applications – Private and Hybrid Public Cloud
Concerned about infrastructure costs & wasted spend
Application profiles are dynamic – infrastructure requirements are hard to predict and always changing
Approaching Power / Rack Space Limits
Want to share infrastructure across multiple applications
Want to buy storage and compute independently (storage or compute bound)
Slide4
DriveScale Target Customers – Common questions/statements
How can I buy compute separate from storage?
- I just want to buy more compute, or
- I just want to buy more storage
I want to virtualize Hadoop like AWS does…
How can I increase my utilization?
I want to share storage among server nodes with a NAS (e.g. Isilon)
I need to lower my infrastructure costs
- Compared to status quo…
- Compared to NAS…
- Compared to Cloud…
Slide5
Benefits of Software Defined Infrastructure to Big Data Operators
1. Lower Capital Costs
- Reduced server costs (don’t have to buy disks)
- Buy less storage (higher utilization)
- Less rack space needed (denser CPU and storage)
- Lower disk cost (3.5” disks cost less than 2.5”)
2. Lower Operational Cost
- Add storage without physical labor
- Replace failed drives without labor
- Less equipment = reduced power
3. Speed up Big Data Deployments
- Faster Time to Value
- Create new clusters and nodes in minutes instead of weeks
- Integrated with Cloudera Director and Hortonworks Cloudbreak
- Share resources among multiple applications and clusters
Slide6
SDI Revolutionizes Big Data Storage
Storage types compared: DAS (Direct Attached Storage), DriveScale SW Defined Infrastructure, Centralized storage (NAS or SAN)
Cost (callout: 5-10x): Buy disks instead of proprietary ‘appliances’, which are 5x to 10x the cost. Don’t waste money on storage features Hadoop doesn’t need (dedup, RAID, erasure coding, etc.)
Performance (callout: 1/2 - 1/4): Give Hadoop nodes direct access to rack-local disks, not shared or centralized file systems with limited IO bandwidth
Utilization (callout: 30-50%): Buy only the disks you need, pooled local to the rack. This allows better storage utilization, and re-balancing disks across nodes in a cluster gives better CPU utilization, putting more servers to work.
Adaptability (ability to change node storage) (callout: none): Re-define your server and storage infrastructure as application needs change.
Scalability (callout: anti-Hadoop): Give nodes direct access to their own disks. Don’t share file systems (“nothing shared”).
Rating legend: 3 = Good, 2 = Fair, 1 = Poor
[Per-cell ratings for the three storage types were shown as icons and are not recoverable here.]
Slide7
DriveScale vs Isilon Enterprise NAS
Price
- Isilon: $0.50 - $1.00 per GB
- DriveScale: $0.10 - $0.15 per GB (callout: 1/5)
Storage Performance
- Isilon: Peak Bandwidth 2.4 GB/sec per Isilon Node
- DriveScale: Peak Bandwidth 10 GB/sec per DSA (callout: 4x)
Hadoop/HDFS version compatibility
- Isilon: 1-1.5 year delay from latest Hadoop (Isilon FS needs to continuously be made compatible)
- DriveScale: Always current (sits below HDFS layer)
Storage Scale
- Isilon: 288 Compute Nodes (144 Isilon Nodes)
- DriveScale: 4000+ Compute Nodes
Drive Density
- Isilon: 6 – 9 Drives per RU
- DriveScale: 12 – 18 Drives per RU
Network Complexity
- Isilon: Two Networks (Ethernet and InfiniBand)
- DriveScale: One Network (Ethernet)
Compute/Storage Ratio
- Isilon: 2:1 (2 Compute nodes per Isilon node)
- DriveScale: 16:1 (16 Compute nodes per DSA)
Minimum config
- Isilon: 3 Isilon Nodes
- DriveScale: 1 JBOD/DSA
Slide8
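The 1/5 and 4x callouts in the DriveScale vs Isilon comparison can be sanity-checked from the quoted ranges. The snippet below is a minimal sketch in Python; the variable names are illustrative, and the only inputs are the figures from the table.

```python
# Figures copied from the DriveScale vs Isilon table; names are illustrative.
isilon_usd_per_gb = (0.50, 1.00)      # low and high end of the quoted range
drivescale_usd_per_gb = (0.10, 0.15)  # low and high end of the quoted range
isilon_peak_gb_per_s = 2.4            # peak bandwidth per Isilon node
dsa_peak_gb_per_s = 10.0              # peak bandwidth per DSA

# Compare like with like: low end against low end, high end against high end.
price_ratio_low = drivescale_usd_per_gb[0] / isilon_usd_per_gb[0]
price_ratio_high = drivescale_usd_per_gb[1] / isilon_usd_per_gb[1]
bandwidth_ratio = dsa_peak_gb_per_s / isilon_peak_gb_per_s

print(f"price ratio: {price_ratio_high:.2f} to {price_ratio_low:.2f}")
print(f"bandwidth ratio: {bandwidth_ratio:.2f}x")
```

The price ratio falls between 0.15 and 0.20, in line with the 1/5 callout, and the bandwidth ratio comes out a little above 4. The bandwidth figures are per Isilon node and per DSA, so this compares building blocks rather than whole systems.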
Hadoop Storage Needs vs Various Storage Types
Hadoop storage needs, in column order: Data Locality / Converged compute & storage / Replication / Extreme Read BW / Low Cost Commodity
Legend: ✔ Optimal, ✖ Suboptimal, ? Debatable
DAS & DriveScale SDI: ✔ / ✔ / ✔ / ✔ / ✔ (examples: Dell, HP (non-Apollo), Cisco, SuperMicro, etc.)
NAS - Enterprise: ✖ / ✖ / ✔ / ✖ / ✖ (examples: Isilon, NetApp, Gluster)
NAS - HPC: ✖ / ✖ / ✖ / ✔ / ✖ (examples: Lustre, GPFS)
SAN/Block - External: ✖ / ✖ / ✔ / ✖ / ? (examples: ScaleIO, Ceph, Datera, Cinder, AWS EBS)
SAN/Block - Hyperconverged: ✖ / ✔ / ✔ / ✖ / ? (examples: Nutanix, ScaleIO, Robin)
Object: ✖ / ✖ / ✔ / ✖ / ✔ (examples: AWS S3, Scality, Swift, EMC ECS)
[The Total column had no recoverable values.]
Slide9
Benefits of Software Defined Infrastructure
$ Lower Server Costs - Example
With DAS: 2RU Server with DAS (Direct Attached Storage)
- System/CPU/MoBo/NICs: $3,065
- RAM: $1,142
- DISK: $12,832
- TOTAL: $17,039
With DriveScale: 1RU diskless Server
- System/CPU/MoBo/NICs: $2,063
- RAM: $1,246
- DISK (1TB for OS): $172
- TOTAL: $3,481
Save $13,558 per server node (79.5% savings).
x 1000 nodes = $13M
Slide10
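The totals and savings on the Lower Server Costs example follow directly from the component prices. As a minimal sketch (the dictionary and variable names are illustrative, the prices are the ones on the slide), the arithmetic is:

```python
# Component prices in USD, copied from the Lower Server Costs example.
das_server = {"system_cpu_mobo_nics": 3065, "ram": 1142, "disk": 12832}
diskless_server = {"system_cpu_mobo_nics": 2063, "ram": 1246, "os_disk_1tb": 172}

das_total = sum(das_server.values())
diskless_total = sum(diskless_server.values())

saving_per_node = das_total - diskless_total
saving_fraction = saving_per_node / das_total
fleet_saving = saving_per_node * 1000  # the slide's 1000-node example

print(f"DAS server total:      ${das_total:,}")
print(f"diskless server total: ${diskless_total:,}")
print(f"saving per node:       ${saving_per_node:,} ({saving_fraction:.1%})")
print(f"saving for 1000 nodes: ${fleet_saving:,}")
```

The per-node figures reproduce the slide's $17,039, $3,481 and $13,558. The computed percentage is about 79.6%, which the slide shows as 79.5%. The comparison covers the server only; JBOD, disk and DSA costs are treated separately on the per-rack slides.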
Benefits of Software Defined Infrastructure
$ Lower Disk Costs - Example
With DAS: 2.5” drive DAS
- $802.27 per 2.5” drive = $0.45/GB
With DriveScale: 3.5” drive in JBOD
- $417.62 per 3.5” drive = $0.22/GB
Save $385 per drive (48% savings).
Save $192,000 per Petabyte
Slide11
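The per-drive numbers on the Lower Disk Costs example can be checked the same way. The sketch below uses the two drive prices from the slide; the 2 TB-per-drive figure used for the per-petabyte estimate is an assumption (the slide does not state drive capacity), chosen because it gives a result close to the quoted $192,000.

```python
# Drive prices in USD, copied from the Lower Disk Costs example.
price_2_5_inch = 802.27
price_3_5_inch = 417.62

saving_per_drive = price_2_5_inch - price_3_5_inch
saving_fraction = saving_per_drive / price_2_5_inch

# ASSUMPTION: 2 TB per drive, i.e. 500 drives per petabyte (decimal units).
assumed_tb_per_drive = 2
drives_per_pb = 1000 // assumed_tb_per_drive
saving_per_pb = saving_per_drive * drives_per_pb

print(f"saving per drive:    ${saving_per_drive:.2f} ({saving_fraction:.0%})")
print(f"saving per petabyte: ${saving_per_pb:,.0f} (assuming {assumed_tb_per_drive} TB drives)")
```

The per-drive saving rounds to the slide's $385 and 48%. Under the 2 TB assumption the per-petabyte saving is within a few hundred dollars of the slide's figure; a different drive capacity would change it proportionally.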
Per Rack Cost; Commodity Server MFG with and without DriveScale (higher drive utilization)
Day 1 Cost
withOUT DriveScale: $346,410
- 2 x 48 port 10GbE switches (ON-4940S)
- 20 x R730 2RU Servers w/20 2.5” 1.2TB drives each
with DriveScale (only 2 JBODs): $198,504
- 2 x 48 port 10GbE switches (ON-4940S)
- 20 x R430 1RU Servers
- 2 x JBOD w/60 3.5” 2.0TB drives each
- 2 x DriveScale Adaptor
- 240 Terabytes Storage
- Open rack space for more servers
43% Savings
Slide12
Per Rack Cost; Commodity Server MFG with and without DriveScale (equal storage)
4-year Cost (server refresh)
withOUT DriveScale: $692,820
- 2 x 48 port 10GbE switches (ON-4940S)
- 20 x R730 2RU Servers w/20 2.5” 1.2TB drives each
with DriveScale: $452,208
- 2 x 48 port 10GbE switches (ON-4940S)
- 20 x R430 1RU Servers
- 4 x JBOD w/60 3.5” 2.0TB drives each
- 4 x DriveScale Adaptor
35% Savings
Slide13
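The savings percentages and raw capacities on the two Per Rack Cost slides can be reproduced from the listed configurations. This is a minimal sketch: the helper `saving_pct` and the variable names are illustrative, and capacities are raw terabytes before HDFS replication.

```python
def saving_pct(cost_without: float, cost_with: float) -> float:
    """Percentage saved relative to the configuration without DriveScale."""
    return 100 * (cost_without - cost_with) / cost_without

# Costs in USD, copied from the two per-rack slides.
day1_saving = saving_pct(346_410, 198_504)
four_year_saving = saving_pct(692_820, 452_208)

# Raw capacity per rack, from the listed drive counts and sizes.
das_tb = 20 * 20 * 1.2       # 20 servers x 20 drives x 1.2 TB
jbod_tb_day1 = 2 * 60 * 2.0  # 2 JBODs x 60 drives x 2.0 TB
jbod_tb_equal = 4 * 60 * 2.0 # 4 JBODs x 60 drives x 2.0 TB

print(f"day 1 saving:  {day1_saving:.1f}%")
print(f"4-year saving: {four_year_saving:.1f}%")
print(f"raw TB: DAS {das_tb:.0f}, 2 JBODs {jbod_tb_day1:.0f}, 4 JBODs {jbod_tb_equal:.0f}")
```

The Day 1 configuration buys half the raw capacity of the DAS rack (240 TB against 480 TB), which is what the "higher drive utilization" label refers to, while the 4-JBOD configuration matches the DAS rack's 480 TB. The 4-year cost without DriveScale is exactly twice the Day 1 cost, consistent with one full server refresh.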
DriveScale Deployment Scenarios
Preserve your existing investment
Greenfield
- Start off on the right foot with DriveScale Software Defined Infrastructure
Existing Hadoop – Need more storage on nodes (i.e. storage bound)
- Just add a JBOD and a DSA, and add disks to existing servers
Existing Hadoop – Rebalance Storage between Nodes and Clusters
- Just add a JBOD and a DSA, pull unused disks from DAS servers to the JBOD, and redistribute storage and nodes into new clusters.
Existing Hadoop – Need more compute nodes (i.e. compute bound)
- Add a compute server, JBOD, and DSA, and you can add more nodes to your cluster without buying more DAS.
Existing Hadoop – Moving from Public to Private or Hybrid Cloud
- Cloudera Director and Hortonworks Cloudbreak deployed you to the cloud? They are both integrated with DriveScale to deploy your private cloud the same way. You can even have a single cluster span your private and public clouds (i.e. hybrid).
Slide14
DriveScale Reference Customers
Slide17
DriveScale Customer Reference - Clearsense
Technology company that transforms data from electronic medical records (EMRs), supporting source systems, & legacy databases to enable new revenue streams for healthcare
Real-time analytics solution ingests many kinds of data from existing EMRs & data warehouses and puts them into one platform to serve up analytics
After 18 months in production at AWS, they were unhappy with responsiveness and costs, and decided to build a Private Cloud
Hadoop responsiveness is something that they tout to their customers, and this technology supports that via infrastructure flexibility
DriveScale Confidential Information © 2016
Slide18
DriveScale Customer Reference - Clearsense
“DriveScale helps us build a ‘future-proof’ infrastructure. As our customer needs increase, we can now respond more quickly and without having to make massive capital expenditures.”
“We spent a lot of time engineering our software environment around real-time needs, ensuring a productive, scalable and economically viable environment. Now, with DriveScale, we have the ability to do the same thing with hardware.”
Charles Boicey, Chief Innovation Officer at Clearsense
Slide19
DriveScale Customer Case Study: AppNexus
Global technology company whose private cloud-based software platform enables and optimizes programmatic online advertising
Serves tens of billions of ad buys per day; revenue > $2B annually
2200 nodes split between Hadoop & HBase
Their customers are asking for more vcores and memory, but not more storage – they wanted to scale compute independently of storage
Issues we solve:
- Highly compute-bound environment – purchase compute-only nodes
- Want to reduce the operational impact of refreshing their servers
- Rack / power constraints are also an issue for them
Slide20
Timothy Smith, SVP of Technical Operations at AppNexus
“DriveScale understood one of our core requirements, namely the desire to manage CPU and storage resources as separate pools.”
“Storage and server technology upgrades move on two different timetables. Without DriveScale, we are forced into storage refresh cycles that aren’t strictly necessary, and are very cumbersome. Separating storage from compute allows us to upgrade or reallocate compute resources independent of storage.”
“DriveScale significantly decreases the operations workload, as a result increasing the velocity of delivering new products and features to our customer base. DriveScale will also help us reduce wasted resources trapped in siloed clusters and thus contribute directly to our bottom line.”
Slide21
Cloudera Do’s and Don’ts of building private Infrastructure…
Avoid Isilon for large (>100 servers) Hadoop infrastructure
DriveScale behaves just like Locally Attached Storage
Slide22
DriveScale (DS) vs HPE BDRA
TCO for DriveScale is approximately 38% lower, counting HPE hardware costs alone
BDRA needs additional Cloudera/Hortonworks licenses for each of its data nodes
The Apollo 4200, which serves as the data node, has a single controller and hence becomes a single point of failure (SPOF) for BDRA
- If a node fails, all 28 of its drives need to be re-replicated onto the other storage nodes and drives. Ex: if the drives are 6TB each, the network is flooded with 168 TB of data at once.
- While the network is flooded with the data from the 28 drives, ongoing jobs are delayed by the replication traffic
- Spare Apollo 4200 data nodes might be required, depending on the amount of data in each data node.
- With DS there are 4 paths to each drive in the JBOD, with dual IO controllers.
- With DS the system can survive the failure of 1 JBOD IO controller, 1 switch and 1 DSA.
40Gb switches are required for BDRA; 10Gb switches suffice for DS.
OS maintenance on data nodes: if an OS update is required, the entire node with its 28 drives needs to be put into maintenance mode, which also triggers replication of data across the other available storage nodes
Firmware upgrades of storage nodes pose a similar issue
Every read/write in a BDRA cluster goes over the network
One of the reasons that traditional Hadoop deployments stripe data across a large number of nodes with a few disks each is that the ‘failure locality’ is more manageable. If a large amount of data is stored on a single node, the penalty for failure is tremendous, which is a problem for BDRA.
DriveScale Confidential Information © 2017
Slide23
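The 168 TB figure in the BDRA comparison is the product of the drive count and drive size, and a rough lower bound on how long that much re-replication occupies the network follows from it. The sketch below is illustrative only: it assumes every drive is full and that a single 40 Gb/s link carries all the traffic at line rate, neither of which is stated in the slides. In a real cluster re-replication is spread across many nodes and throttled, so actual times will differ.

```python
# Figures from the BDRA slide.
drives_per_data_node = 28
drive_capacity_tb = 6

# ASSUMPTION: every drive is full, so the whole node must be re-replicated.
data_to_rereplicate_tb = drives_per_data_node * drive_capacity_tb

# ASSUMPTION (hypothetical): one 40 Gb/s link, fully dedicated, no overhead.
link_gbit_per_s = 40
bits_to_move = data_to_rereplicate_tb * 1e12 * 8   # decimal TB to bits
seconds = bits_to_move / (link_gbit_per_s * 1e9)
hours = seconds / 3600

print(f"data to re-replicate: {data_to_rereplicate_tb} TB")
print(f"lower bound on one 40 Gb/s link: {hours:.1f} hours")
```

Even under these optimistic assumptions the transfer takes on the order of hours, which is the 'failure locality' concern the slide raises about concentrating many drives behind one node.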
Summary
Slide24
DriveScale Summary
First Enterprise-Class Architecture designed for Scale Out Infrastructure
Eliminate Overprovisioning
Lower TCO
Improve Agility to workload changes
Integrated with Cloudera Director
Slide25
DriveScale’s Core Value Propositions
Save Money
- Lower Server and Storage CapEx
- Reduce Time to Value of your Big Data initiative
Improve Utilization
- 3x better utilization of hardware resources
- Modify Infrastructure on demand to respond to changing workloads
- Multiple workloads on the same infrastructure
Simplify Everything
- Commodity HW and fewer composable elements
- Software controlled Infra
- One-click deployment
Slide26
Questions and Answers
Slide27
Frequently Asked Questions
Q1) Can I continue to buy servers and storage from my preferred vendors?
A1) Yes. DriveScale does not sell servers, storage, or JBODs, but all the big vendors do. The only HW you will need from DriveScale is the DSA (DriveScale Adaptor), which converts the JBOD SAS interface to Ethernet. Plan on 4 DSAs (a full 1RU chassis) per high-performance JBOD.
Q2) Does DriveScale SDI Disaggregation work for more than just Hadoop?
A2) Yes. It will work for any application. However, there may be performance implications for non-Big Data applications that use “small” blocks.
Q3) What is the minimum storage required in “diskless” compute servers?
A3) Any drive, as small as possible, just for the Operating System. It could be USB or small flash. All data is stored on JBOD disks.
Q4) Does DriveScale support SSD in JBODs?
A4) Yes. The JBOD can hold many types of heterogeneous storage, as long as they are SAS compatible: HDD, high or low RPM, high or low capacity, SSD, etc.
Q5) Is there a network impact of accessing disks over iSCSI/Ethernet vs direct attached SAS?
A5) Minimal to zero impact. HDFS storage access peaks during the copy of data from one node to another. This requires one node to read its disk (NIC RX port activity from disk) and transfer the data to another node (NIC TX port activity to other node); the receiving node does the inverse. In both nodes there is an increase in network traffic to read or write the disk, but this increased traffic is on the opposite side of the bidirectional link from the one used for internode traffic, so there is no performance impact. TestDFSIO performance testing results confirm this answer.
Slide28
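The argument in A5 can be made concrete with a toy model of a full-duplex NIC, where transmit and receive are independent directions. The sketch below is not a DriveScale tool; the `NicLoad` class and `copy_loads` function are hypothetical names, and the model ignores protocol overhead and any traffic other than a single node-to-node block copy.

```python
from dataclasses import dataclass


@dataclass
class NicLoad:
    """Traffic on one node's full-duplex NIC, in multiples of the copy rate."""
    tx: float = 0.0  # leaving the node
    rx: float = 0.0  # arriving at the node


def copy_loads(rate: float, disks_over_ethernet: bool):
    """Return (sender, receiver) NIC loads for one node-to-node block copy."""
    sender = NicLoad(tx=rate)    # sender pushes the block to its peer
    receiver = NicLoad(rx=rate)  # receiver accepts the block from its peer
    if disks_over_ethernet:
        sender.rx += rate    # sender first reads the block from its remote disk
        receiver.tx += rate  # receiver then writes the block to its remote disk
    return sender, receiver


for remote in (False, True):
    s, r = copy_loads(1.0, remote)
    busiest = max(s.tx, s.rx, r.tx, r.rx)
    print(f"disks over Ethernet={remote}: busiest direction carries {busiest}x the copy rate")
```

In both cases the busiest direction carries exactly one copy stream: moving the disks onto Ethernet fills the previously idle direction of each link rather than adding to the loaded one. Total bytes on the wire do double, so the model only supports the "no impact" claim as long as each direction has headroom for one stream.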
Thank You