DriveScale End User Sales Presentation


Uploaded by min-jolicoeur on 2019-11-08






Presentation Transcript

DriveScale End User Sales Presentation April 24, 2017 DriveScale Software Defined Infrastructure for Hadoop and Big Data

Presentation Overview

- Introduction to DriveScale / Corporate Facts
- The Market Problem
- How DriveScale Solves the Problem: SDI = Software Defined Infrastructure for Big Data
- Benefits of Software Defined Infrastructure
- Solution Overview & Architecture
- Reference Accounts
- Partners
- Competitors
- Summary
- FAQ

©2017 DriveScale Inc. All Rights Reserved.

Example Meeting Agenda

- Meeting Kickoff (10 min): John Doe (Customer, VP Infrastructure), Jane X (VAR, Account Manager)
- DriveScale Exec Summary and Key Benefits (20 min): Ryan Shorter (DriveScale, Director of Sales, Eastern US), Jeff Chesson (DriveScale, Director of Sales, Central US), Howard Doherty (DriveScale, VP Sales)
- DriveScale Solution Overview (20 min): Chris Munford (DriveScale, VP Field Operations), Salah Chaou (DriveScale, Principal Solution Architect)
- Q&A (10 min): All

Executive Summary: DriveScale

DriveScale is Software Defined Infrastructure for Hadoop and Big Data (e.g. Spark, Cassandra, NoSQL, and other webscale apps), enabling the most efficient and agile infrastructure for private and hybrid clouds.

- Economical: Save up to 60% vs DAS, centralized NAS/SAN, or public cloud. DriveScale brings hyperscale Hadoop architecture to the enterprise.
- Efficient: Get more compute and storage from your HW investment, with up to 3x improvement by pooling resources across clusters.
- Easy: Uses the same servers and drives; no changes to the SW stack; managed by Cloudera Director and Hortonworks Cloudbreak; improved performance vs alternatives (maintains data locality).

About DriveScale

Over the last 15 years, a new computing approach has emerged inside Internet-scale companies such as Google and Facebook to handle more information and data than ever before. DriveScale is in the business of productizing this modern computing architecture, called Software Defined Infrastructure (SDI), and delivering it to all enterprises. DriveScale SDI disrupts traditional ways of deploying compute and storage infrastructure by disaggregating and composing compute and storage through software, enabling a modern architecture that breaks through workload limitations without compromise. Our unique SDI solution is complementary to your current datacenter infrastructure and delivers efficiency and an easy experience in a cost-effective manner. You no longer need to move to a costly public cloud environment to enjoy efficiency and scale. We build on your IT investments to deliver lower costs, higher efficiency, and more flexibility without compromising security, performance, or reliability in a private cloud environment. Our solution is enterprise-ready today, and was built by renowned enterprise technologists from Sun, Cisco, and HPE.

DriveScale Corporate Facts

- Software Defined Infrastructure for Hadoop and Big Data
- Founded 2013, Sunnyvale, CA
- Founders: Satya Nishtala and Tom Lyon, architects of Sun workgroup servers and desktops and of Cisco UCS, and inventor of IP switching
- Investors: Pelion Venture Partners, Foxconn, Nautilus Venture Partners
- Product launched on May 19, 2016

Founders

- Gene Banman, CEO: CEO of ClearPower, Zero Motorcycles, NetContinuum; VP/GM Desktops, Sun Microsystems; President of Sun Japan
- Satya Nishtala, CTO: Fellow and System Architect at Nuova (Cisco UCS); DE and lead architect at Sun for UltraSPARC workstations, workgroup servers, and storage
- Tom Lyon, Chief Scientist: Founder of Nuova (Cisco UCS) and of Ipsilon (Nokia); employee #8 at Sun Microsystems (SunOS, SPARC, SunScreen)
- Duane Northcutt, VP Engineering: CTO of Technicolor, Trident, Silicon Image; VP Technology at Kealia; DE and inventor of SunRay at Sun Microsystems; PhD CS, CMU

Executive Team

- Howard Doherty, VP Sales: VP Sales at Arkeia Software (WD), Symphoniq, NetContinuum; Sales Manager at Netscape, Cisco
- S.K. Vinod, VP Product Management: Co-founder of Xsigo Systems; Product Management at Sun Microsystems; Xerox PARC imaging-system startup venture

Advisors

- Amr Awadallah, Founder/CTO, Cloudera
- James Gosling, creator of Java
- Scott McNealy, Founder and CEO, Sun Microsystems
- Ameet Patel, CTO, JP Morgan / Morgan Labs
- Phil Roussey, Co-founder and former EVP, Bell Micro

The Big Data Market Problem

The Big Data Market Problem: How to deploy in a private or hybrid cloud efficiently?

How to right-size your Big Data infrastructure?

- How many clusters?
- How many server nodes in each cluster?
- For each server node: how much CPU/cores? Memory? Storage/drives?
- How can you change the cluster balance as workloads change?

The Problem: You must choose infrastructure FIRST

These decisions will determine the success (and cost) of your Big Data plans before your applications are written (or deployed, or scaled, or evolved).

The default approach: choose a "one size fits all" server node and buy many of them. 2U servers are the minivans of computing: versatile, but not the best at anything, at $20,000 each.

But One Size Does _not_ Fit All

Each type of cluster "wants" a different amount of disk per server: Hadoop Data Lake, Dev/Test, HBase, Kafka, Cassandra, and so on.

Fixed silos per cluster type lead to madness:
- No resource sharing
- No elasticity
- Too many server types / SKUs

You want Choice AND Adaptability

[Chart: vehicles from moped to sportscar, minivan, cargo truck, and cargo plane plotted by CPU/RAM per rack unit (1/8GB to 4/256GB) against storage (500GB to 8PB)]

The RIGHT Solution is Software Defined Infrastructure

Dynamically change your infrastructure to match your application workflow needs:
- # of clusters
- # of server nodes per cluster
- # of CPUs per server
- amount of RAM per server
- # of disks per server

DriveScale SDI allows you to right-size your Big Data infrastructure. "You can be wrong and still be right."

DriveScale Software Defined Infrastructure Solution Overview

Software Defined Infrastructure for Big Data

Requires 2 actions:

1) Server disaggregation: separate servers into
   - "compute" servers = diskless compute servers
   - JBODs (just a bunch of drives) = "compute-less" storage servers

2) Logical server composition: join "compute" with "storage" to create "logical" servers and clusters over an Ethernet/IP top-of-rack switch, and dynamically change them as needs change.

Step #1: Disaggregate the Servers

1) Stop buying 2RU servers with storage (NOTE: you can still use your existing servers).
2) Instead, start to buy (a) high-density commodity compute servers and (b) JBODs (Just a Bunch of Drives) for drive storage.

You can now buy compute when you need compute, and storage when you need storage.

Step #2: Compose Logical Server Nodes and Clusters

[Diagram: physical JBODs (Just a Bunch of Disks) and physical compute servers composed into "logical" server nodes and clusters, right-sized for Big Data / Hadoop; e.g. Cluster 1 (performance, Spark) with nodes 1-5 and Cluster 2 (storage, data lake) with nodes 1-3]
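The composition step above can be sketched as a small pool-allocation model. This is illustrative only; the function and data structures are assumptions for the sketch, not DriveScale's actual API.

```python
# Illustrative sketch of logical server composition (assumed model, not
# DriveScale's API): diskless compute servers are paired with drives
# drawn from a shared JBOD pool to form logical nodes.

def compose_cluster(compute_pool, drive_pool, nodes, drives_per_node):
    """Build `nodes` logical servers, each consuming `drives_per_node` drives."""
    cluster = []
    for _ in range(nodes):
        node = {"compute": compute_pool.pop(), "drives": []}
        for _ in range(drives_per_node):
            node["drives"].append(drive_pool.pop())
        cluster.append(node)
    return cluster

compute_pool = [f"srv-{i}" for i in range(8)]   # 8 diskless compute servers
drive_pool = [f"disk-{i}" for i in range(60)]   # one 60-drive JBOD

# Cluster 1: performance (Spark) - more nodes, fewer drives each
spark = compose_cluster(compute_pool, drive_pool, nodes=5, drives_per_node=4)
# Cluster 2: storage (data lake) - fewer nodes, more drives each
lake = compose_cluster(compute_pool, drive_pool, nodes=3, drives_per_node=12)

# Unassigned drives stay in the pool for later rebalancing.
remaining = len(drive_pool)   # 60 - 5*4 - 3*12 = 4
```

Rebalancing (the next slide) is then just moving drive IDs between a node's list and the pool, with no physical recabling.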

Adapt over time as workloads change

[Same diagram, after rebalancing: add or remove nodes, return storage to the pool, add storage to nodes]

Shown another way: create a cluster X, create another cluster Y, create a third cluster Z, then expand cluster X.

DriveScale scales to 1000s of nodes and beyond

[Diagram: multiple racks, each with compute, JBOD storage, and DriveScale Adapters; JBOD storage stays local in each rack. Cluster 1 is balanced, Cluster 2 is storage-focused, Cluster 3 is compute-focused]

You just built a private cloud

- Your infrastructure is highly elastic
- All hardware resources are shared
- DriveScale makes your private data center operate like a public cloud for Hadoop

SDI Revolutionized Hadoop Storage (ratings: Good / Fair / Poor)

- Cost: DAS (Direct Attached Storage) Good; DriveScale SW Defined Infrastructure Good; centralized NAS/SAN Poor (5-10x). Buy disks instead of proprietary appliances, which cost 5x to 10x as much; don't waste money on storage features Hadoop doesn't need (dedup, RAID, erasure encoding, etc.).
- Performance: DAS Good; DriveScale Good; NAS/SAN Poor (1/2 to 1/4). Give Hadoop nodes direct access to rack-local disks, not shared or centralized file systems with limited IO bandwidth.
- Utilization: DAS Poor (30-50%); DriveScale Good; NAS/SAN Fair. Buy only the disks you need, pooled local to the rack. This allows better storage utilization, and rebalancing disks across nodes in a cluster gives better CPU utilization, putting more servers to work.
- Adaptability (ability to change node storage): DAS Poor (none); DriveScale Good; NAS/SAN Good. Redefine your server and storage infrastructure as application needs change.
- Scalability: DAS Good; DriveScale Good; NAS/SAN Poor (anti-Hadoop). Give nodes direct access to their own disks; don't share file systems ("nothing shared").

Benefits of Software Defined Infrastructure

DriveScale Target Customers

- Big Data applications: Hadoop, Spark, Cassandra, NoSQL, etc.
- On-premise applications: private and hybrid public cloud
- Concerned about infrastructure costs and wasted spend
- Application profiles are dynamic: infrastructure requirements are hard to predict and always changing
- Approaching power / rack space limits
- Want to share infrastructure across multiple applications
- Want to buy storage and compute independently (storage- or compute-bound)

DriveScale Target Customers: Common questions/statements

- Compared to status quo: "How can I buy compute separate from storage? I just want to buy more compute, or I just want to buy more storage."
- Compared to Cloud: "I want to virtualize Hadoop like AWS does."
- Compared to NAS: "How can I increase my utilization? I want to share storage among server nodes with a NAS (e.g. Isilon)."
- "I need to lower my infrastructure costs."

Benefits of Software Defined Infrastructure to Hadoop Operators

1. Lower capital costs
   - Reduced server costs (don't have to buy disks)
   - Buy less storage (higher utilization)
   - Less rack space needed (denser CPU and storage)
   - Lower disk cost (3.5" disks cost less than 2.5")
2. Lower operational cost
   - Add storage without physical labor
   - Replace failed drives without labor
   - Less equipment = reduced power
3. Speed up Big Data deployments / faster time to value
   - Create new clusters and nodes in minutes instead of weeks
   - Integrated with Cloudera Director and Hortonworks Cloudbreak
   - Share resources among multiple applications and clusters

Benefits of Software Defined Infrastructure: Lower Server Costs (Example)

Before: 2RU server with DAS (Direct Attached Storage)
- System/CPU/MoBo/NICs: $3,065
- RAM: $1,142
- Disk: $12,832
- Total: $17,039

With DriveScale: 1RU diskless server
- System/CPU/MoBo/NICs: $2,063
- RAM: $1,246
- Disk (1TB for OS): $172
- Total: $3,481

Save $13,558 per server node (79.5% savings). x 1000 nodes = $13M.
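The slide's per-node arithmetic can be checked directly; a quick sketch using only the dollar figures given above:

```python
# Per-node cost comparison, using the dollar figures from the slide.
before = 3065 + 1142 + 12832   # 2RU DAS server: system + RAM + disk
after = 2063 + 1246 + 172      # 1RU diskless server: system + RAM + 1TB OS disk
savings = before - after       # dollars saved per server node
fleet = savings * 1000         # across a 1000-node deployment (~$13.5M)
pct = 100 * savings / before   # ~79.6%; the slide rounds to 79.5%
```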

Benefits of Software Defined Infrastructure: Lower Disk Costs (Example)

Before: 2.5" drive in DAS at $802.27 per drive = $0.45/GB
With DriveScale: 3.5" drive in a JBOD at $417.62 per drive = $0.22/GB

Save $385 per drive (48% savings), and $192,000 per petabyte.
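The per-drive figures imply the savings percentage directly. A sketch with the slide's prices (the per-petabyte figure additionally depends on drive capacities, which the slide does not spell out):

```python
# Per-drive cost comparison, using the prices from the slide.
price_25in = 802.27   # 2.5" DAS drive, $0.45/GB per the slide
price_35in = 417.62   # 3.5" JBOD drive, $0.22/GB per the slide
per_drive_savings = price_25in - price_35in   # ~= $384.65, "save $385 per drive"
pct = 100 * per_drive_savings / price_25in    # ~= 48% savings
```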

Per-Rack Cost: Commodity Server MFG With and Without DriveScale (higher drive utilization)

Without DriveScale: $346,410
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R730 2RU servers with 20 x 2.5" 1.2TB drives each

With DriveScale: $198,504 Day 1 cost (only 2 JBODs), a 43% savings
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R430 1RU servers
- 2 x JBOD with 60 x 3.5" 2.0TB drives each (240 terabytes storage)
- 2 x DriveScale Adapters
- Open rack space for more servers

Per-Rack Cost: Commodity Server MFG With and Without DriveScale (equal storage)

Without DriveScale: $692,820 4-year cost (server refresh)
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R730 2RU servers with 20 x 2.5" 1.2TB drives each

With DriveScale: $452,208, a 35% savings
- 2 x 48-port 10GbE switches (ON-4940S)
- 20 x R430 1RU servers
- 4 x JBOD with 60 x 3.5" 2.0TB drives each
- 4 x DriveScale Adapters

DriveScale Deployment Scenarios: Preserve Your Existing Investment

- Greenfield: Start off on the right foot with DriveScale Software Defined Infrastructure.
- Existing Hadoop, need more storage on nodes (i.e. storage-bound): Just add a JBOD and a DSA, and add disks to existing servers.
- Existing Hadoop, rebalance storage between nodes and clusters: Just add a JBOD and a DSA, pull unused disks from DAS servers to the JBOD, and redistribute storage and nodes into new clusters.
- Existing Hadoop, need more compute nodes (i.e. compute-bound): Add a compute server, a JBOD, and a DSA, and you can add more nodes to your cluster without buying more DAS.
- Existing Hadoop, moving from public to private or hybrid cloud: Did Cloudera Director or Hortonworks Cloudbreak deploy you to the cloud? Both are integrated with DriveScale to deploy your private cloud the same way. You can even have a single cluster span your private and public clouds (i.e. hybrid).

DriveScale Solution Components

DriveScale Components Shown in a Typical Rack Deployment

Hardware:
- Top-of-rack switches: 64-128 port 10GbE (Cisco, Arista, HPE, Dell, Quanta, etc.)
- Out-of-band management switch: 1GbE
- Node server compute pool: rack servers, 2U, 1U, 1/2U, or 1/4U (Dell, HPE, Cisco, SuperMicro, Quanta, Foxconn, etc.)
- DriveScale Adapter: Ethernet-to-SAS bridge, 1 per JBOD
- Storage pool: JBOD (Just a Bunch of Drives) (Dell, HPE, SuperMicro, Sanmina, Quanta, Foxconn, etc.)

Cloud:
- DSC (DriveScale Central): 1 per customer; customer support, remote upgrade, remote licensing

Software:
- DMS (DriveScale Management System): 1-3 per customer; Linux RPM, runs on VMs; inventory, cluster config, node config
- DSN (DriveScale Server Node Agent): 1 on each node; inventory discovery

The DriveScale System

Highly automated infrastructure provisioning and management:
- DriveScale Adapter (DSA)
- DriveScale Management System (DMS)
- DriveScale Cloud Central (DSC)
- DriveScale Server Node (DSN), running the DriveScale Agent

DriveScale Software Management Architecture

The 4 principal components:

1. DriveScale Management Server (DMS)
   - Data repository consists of: inventory (DMSs, DS Adapters, switches, JBOD chassis, disks, server nodes) and configuration (node templates, cluster templates, configured clusters)
   - Typical deployment consists of 3 DMS systems
   - The DMS database is used as a message bus to communicate with the endpoints
2. DriveScale Adapter (DSA)
   - DSA agent discovery provides inventory for hardware
   - Creates mappings for server nodes to consume disks
3. DriveScale Server Node (DSN)
   - The DS server agent provides inventory for server hardware
   - Consumes mapped disks via the DSA
4. DriveScale Central (DSC)
   - Cloud-based portal where DriveScale repos are stored for software distribution to subscribers

The DriveScale Adapter

- 2 x 10GbE interfaces per adapter
- 2 x 12Gb 4-lane SAS interfaces per adapter
- Enables SAS-connected drives to be mounted over Ethernet
- 4 DriveScale Ethernet-to-SAS adapters in a 1U chassis
- With 80Gb throughput, a single chassis can comfortably support simultaneous access to 80 drives with performance equivalent to Direct Attached Storage
- Dual redundant power supplies
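The 80-drive claim is simple bandwidth arithmetic. A sketch, assuming the drives are HDDs with roughly 125 MB/s of sequential throughput each (an assumption; the slide does not state a drive model):

```python
# Back-of-envelope check of the chassis throughput claim.
adapters_per_chassis = 4
ports_per_adapter = 2                     # 2 x 10GbE per adapter
chassis_gbps = adapters_per_chassis * ports_per_adapter * 10   # 80 Gb/s total
drives = 80
per_drive_mb_s = chassis_gbps / drives * 1000 / 8   # 125 MB/s per drive,
# roughly a modern HDD's sequential rate, hence "equivalent to DAS"
```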

PoC Minimal HW Requirements (with resiliency)

- 1 x DSA chassis (includes 4 x DS Adapters)
- 1 x JBOD to be loaded with the 60 drives
- 3 (minimum) x servers (data nodes) with 1 direct-attached drive each (4-12 drives will be remotely attached via DSA)
- 1 x server (name node) with 1 direct-attached drive
- 1 x server (can be a VM) for DMS and Ambari
- 2 x 10GbE switches for data flow
- Management switch (1GbE)

DriveScale Proprietary Information © 2017

DriveScale Reference Customers


DriveScale Customer Reference: Clearsense

- Technology company which transforms data from electronic medical records (EMRs), supporting source systems, and legacy databases to enable new revenue streams for healthcare
- Real-time analytics solution ingests many kinds of data from existing EMRs and data warehouses, and puts them into one platform to serve up analytics
- After 18 months in production at AWS, they were unhappy with responsiveness and costs, and decided to build a private cloud
- Hadoop responsiveness is something they tout to their customers, and this technology supports that via infrastructure flexibility

DriveScale Confidential Information © 2016

DriveScale Customer Reference: Clearsense

"DriveScale helps us build a 'future-proof' infrastructure. As our customer needs increase, we can now respond more quickly and without having to make massive capital expenditures."

"We spent a lot of time engineering our software environment around real-time needs, ensuring a productive, scalable and economically viable environment. Now, with DriveScale, we have the ability to do the same thing with hardware."

Charles Boicey, Chief Innovation Officer at Clearsense

DriveScale Customer Case Study: AppNexus

- Global technology company whose private cloud-based software platform enables and optimizes programmatic online advertising
- Serves tens of billions of ad buys per day; revenue > $2B annually
- 2,200 nodes split between Hadoop and HBase
- Their customers are asking for more vcores and memory, but not more storage; they wanted to scale compute independent of storage

Issues we solve:
- Highly compute-bound environment: purchase compute-only nodes
- Want to reduce the operational impact of refreshing their servers
- Rack / power constraints are also an issue for them

Timothy Smith, SVP of Technical Operations at AppNexus

"DriveScale understood one of our core requirements, namely the desire to manage CPU and storage resources as separate pools."

"Storage and server technology upgrades move on two different timetables. Without DriveScale, we are forced into storage refresh cycles that aren't strictly necessary, and are very cumbersome. Separating storage from compute allows us to upgrade or reallocate compute resources independent of storage."

"DriveScale significantly decreases the operations workload, as a result increasing the velocity of delivering new products and features to our customer base. DriveScale will also help us reduce wasted resources trapped in siloed clusters and thus contribute directly to our bottom line."

DriveScale Partners

DriveScale Partners

[Logo slide: reseller and technology partners]

Competing Approaches

How Hadoop is different from other enterprise workloads

Hadoop architecture evolved out of the need to transact and analyze extremely large volumes of data at scale in institutions like Google, Yahoo, and Facebook. Traditional enterprise architecture solutions could not satisfy these needs:
- Inability to meet scalability requirements
- Included a redundant software stack which was unsuitable for these applications
- Expensive

Hadoop architecture is based on networked commodity servers with Direct-Attached Storage:
- Commodity storage is less than 1/10th the cost of shared enterprise storage
- Proven scalability to 1000s of nodes and 100s of PBs of storage
- The compute task is sent to the server where the data resides (data locality)
- Resiliency is managed by the software
- Hadoop includes YARN, which creates containers to run jobs (achieving a higher level of utilization than VMs)

Cost is DriveScale's advantage over other storage solutions

- When a disk drive is located in a JBOD, the customer pays only the price of the drive itself
- When the same disk drive is located in a storage appliance or server, the customer pays for the drive AND all the software and storage services that are built into the appliance
- The average cost of disk drive storage is $0.08/GB; this goes up to $0.50 to $1.00/GB when the same drive is in a storage appliance (EMC Isilon, Datera, Nutanix, etc.)
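Plugging the per-GB prices above into a quick sketch gives the implied appliance markup:

```python
# Implied markup of appliance storage over a bare JBOD drive,
# using the $/GB figures from the slide.
raw = 0.08                                   # $/GB, bare drive in a JBOD
appliance_low, appliance_high = 0.50, 1.00   # $/GB, drive in a storage appliance
markup_low = appliance_low / raw             # ~6.25x
markup_high = appliance_high / raw           # ~12.5x
```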


Hadoop Storage Needs vs Various Storage Types

Legend: ✔ = optimal, ✖ = suboptimal, ? = debatable

Storage type               | Data locality | Converged compute & storage | Replication | Extreme read BW | Low-cost commodity | Examples
DAS & DriveScale SDI       |      ✔        |              ✔              |      ✔      |        ✔        |         ✔          | Dell, HP (non-Apollo), Cisco, SuperMicro, etc.
NAS - Enterprise           |      ✖        |              ✖              |      ✔      |        ✖        |         ✖          | Isilon, NetApp, Gluster
NAS - HPC                  |      ✖        |              ✖              |      ✖      |        ✔        |         ✖          | Lustre, GPFS
SAN/Block - External       |      ✖        |              ✖              |      ✔      |        ✖        |         ?          | ScaleIO, Ceph, Datera, Cinder, AWS EBS
SAN/Block - Hyperconverged |      ✖        |              ✔              |      ✔      |        ✖        |         ?          | Nutanix, ScaleIO, Robin
Object                     |      ✖        |              ✖              |      ✔      |        ✖        |         ✔          | AWS S3, Scality, Swift, EMC ECS

DriveScale vs Isilon Enterprise NAS

- Price: Isilon $0.50 - $1.00 per GB; DriveScale $0.10 - $0.15 per GB (1/5 the cost)
- Storage performance: Isilon peak bandwidth 2.4 GB/sec per Isilon node; DriveScale 10 GB/sec per DSA (4x)
- Hadoop/HDFS version compatibility: Isilon trails the latest Hadoop by 1-1.5 years (the Isilon FS must continuously be made compatible); DriveScale is always current (it sits below the HDFS layer)
- Storage scale: Isilon 288 compute nodes (144 Isilon nodes); DriveScale 4000+ compute nodes
- Drive density: Isilon 6-9 drives per RU; DriveScale 12-18 drives per RU
- Network complexity: Isilon two networks (Ethernet and InfiniBand); DriveScale one network (Ethernet)
- Compute/storage ratio: Isilon 2:1 (2 compute nodes per Isilon node); DriveScale 16:1 (16 compute nodes per DSA)
- Minimum config: Isilon 3 Isilon nodes; DriveScale 1 JBOD/DSA

Cloudera's do's and don'ts of building private infrastructure:
- Avoid Isilon for large (>100 servers) Hadoop infrastructure
- DriveScale behaves just like locally attached storage

DriveScale (DS) vs HPE BDRA

- TCO for DriveScale is approximately 38% cheaper on HPE hardware costs alone
- BDRA requires additional Cloudera/Hortonworks licenses for each of the data nodes
- The Apollo 4200, which serves as the BDRA data node, has a single controller and hence becomes a single point of failure (SPOF) for BDRA:
  - If a node fails, all 28 drives need to be replicated on the other storage nodes and drives. For example, if the drives are 6TB each, the network is flooded with 168 TB of data at once.
  - While the data from those 28 drives is replicated, ongoing jobs are delayed and affected by the replication traffic.
  - Spare Apollo 4200 data nodes might be required, depending on the amount of data in each data node.
  - With DS, there are four paths to each drive in the JBOD via dual IO controllers.
  - With DS, the system can sustain the failure of 1 JBOD IO controller, 1 switch, and 1 DSA.
- 40Gb switches are required for BDRA; 10Gb switches suffice for DS.
- OS maintenance on data nodes: if an OS update is required, the entire node with 28 drives must be put into maintenance mode, which also triggers replication of data across the other available storage nodes.
- Firmware upgrades of storage nodes pose a similar issue.
- Every read/write in a BDRA cluster goes over the network.
- One reason traditional Hadoop deployments stripe data across a large number of nodes with a few disks each is that the "failure locality" is more manageable. If a large amount of data is stored on a single node, the penalty for failure is tremendous, which is a problem for BDRA.

DriveScale Confidential Information © 2017
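The replication burst in the SPOF argument above is straightforward to quantify with the slide's example figures:

```python
# Data re-replicated after a single BDRA data-node failure,
# using the example figures from the slide.
drives_per_node = 28
tb_per_drive = 6
burst_tb = drives_per_node * tb_per_drive   # 168 TB pushed over the network at once
```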

Summary

DriveScale Summary

- First enterprise-class architecture designed for scale-out infrastructure
- Eliminate overprovisioning
- Lower TCO
- Improve agility to workload changes
- Integrated with Cloudera Director

DriveScale's Core Value Propositions

Save Money
- Lower server and storage CapEx
- Reduce time to value of your Big Data initiative

Improve Utilization
- 3x better utilization of hardware resources
- Modify infrastructure on demand to respond to changing workloads
- Multiple workloads on the same infrastructure

Simplify Everything
- Commodity HW and fewer composable elements
- Software-controlled infrastructure
- One-click deployment

Questions and Answers

Frequently Asked Questions

Q1) Can I continue to buy servers and storage from my preferred vendors?
A1) Yes. DriveScale does not sell servers, storage, or JBODs, but all the big vendors do. The only HW you will need from DriveScale is the DSA (DriveScale Adapter), which converts the JBOD SAS interface to Ethernet: 4 DSAs (a full 1RU chassis) per high-performance JBOD.

Q2) Does DriveScale SDI disaggregation work for more than just Hadoop?
A2) Yes. It will work for any application. However, there may be performance implications for _non_ Big Data applications that use "small" blocks.

Q3) What is the minimum storage required in "diskless" compute servers?
A3) Any drive, as small as possible, just for the operating system; it could be USB or small flash. All data is stored on JBOD disks.

Q4) Does DriveScale support SSDs in JBODs?
A4) Yes. The JBOD can hold many types of heterogeneous storage, as long as it is SAS-compatible: HDD (high or low RPM, high or low capacity), SSD, etc.

Q5) Is there a network impact of accessing disks over iSCSI/Ethernet vs direct-attached SAS?
A5) Minimal to zero impact. HDFS storage access peaks during a copy of data from one node to another. This requires one node to read its disk (NIC RX port activity from the disk) and transfer the data to another node (NIC TX port activity to the other node); the receiving node does the inverse. In both nodes, there is an increase in network traffic to read or write the disk, but this increased traffic is on the opposite side of the bidirectional link used for internode traffic, so there is no performance impact. TestDFSIO performance testing results confirm this answer.

Thank You ©2017 DriveScale Inc. All Rights Reserved.