Hortonworks Eric Baldeschwieler – CEO - PowerPoint Presentation

Uploaded On 2018-12-10

Presentation Transcript

Slide1

Hortonworks

Eric Baldeschwieler – CEO
twitter: @jeric14 (@hortonworks)

© Hortonworks Inc. 2011

Architecting the Future of Big Data

June 29, 2011

Slide2

About Hortonworks

Mission: Revolutionize and commoditize the storage and processing of big data via open source
Vision: Half of the world’s data will be stored in Apache Hadoop within five years
Strategy: Grow the Apache Hadoop ecosystem by making Apache Hadoop easier to consume; profit by providing training, support and certification
An independent company
  Focused on making Apache Hadoop great
  Hold nothing back; Apache Hadoop will be complete

Slide3

Credentials

Technical: key architects and committers from Yahoo! Hadoop engineering team
  Highest concentration of Apache Hadoop committers
  Contributed >70% of the code in Hadoop, Pig and ZooKeeper
  Delivered every major/stable Apache Hadoop release since 0.1
  History of driving innovation across entire Apache Hadoop stack
  Experience managing world’s largest deployment
Business operations: team of highly successful open source veterans
  Led by Rob Bearden, former COO of SpringSource & JBoss
Investors: backed by Benchmark Capital and Yahoo!
  Benchmark was key investor in Red Hat, MySQL, SpringSource, Twitter & eBay

Slide4

Hortonworks and Yahoo!

Yahoo! is a development partner
  Leverage large Yahoo! development, testing & operations team
  More than 1,000 active & sophisticated users of Apache Hadoop
  Access to the Yahoo! grid for testing large workloads
  Only organization that has delivered a stable release of Apache Hadoop
  Yahoo! will continue to contribute Apache Hadoop code too!
Yahoo! is a customer
  Hortonworks provides level 3 support and training to Yahoo!
  Yahoo! deploys Apache Hadoop releases across its 42,000 grid
Yahoo! is an investor

Slide5

Current State of Adoption

Enterprise Adoption: early adopters
  Technology is hard to install, manage & use
  Technology lacks enterprise robustness
  Requires significant investment in technical staff or consulting
  Hard to find & hire experienced developer & operations talent
Vendor Ecosystem Adoption: early in vendor adoption lifecycle
  Hadoop is hard to integrate and extend
  Hard to find & hire experienced developer & operations talent
Technology & Knowledge Gaps Prevent Apache Hadoop from Reaching Full Potential
Customers are asking their vendors for help with Hadoop!
“We’re seeing Hadoop in all of our Fortune 2000 data accounts”

Slide6

Hortonworks Role & Opportunity

Vendor Ecosystem Adoption

Enterprise Adoption

Bridge the Gap!

Grow Market

Sell training and support via Partners

Fundamental shift in enterprise data architecture strategy

Apache Hadoop becomes standard for managing new types & scale of data

New applications & solutions will be created to leverage data in Apache Hadoop

Creates massive big data technology and services opportunity for ecosystem

Slide7

Hortonworks Objectives

Make Apache Hadoop projects easier to install, manage & use
  Regular sustaining releases
  Compiled code for each project (e.g. RPMs)
  Testing at scale
Make Apache Hadoop more robust
  Performance gains
  High availability
  Administration & monitoring
Make Apache Hadoop easier to integrate & extend
  Open APIs for extension & experimentation
All done within Apache Hadoop community
  Develop collaboratively with community
  Complete transparency
  All code contributed back to Apache
Anyone should be able to easily deploy the Hadoop projects directly from Apache

Slide8

Technology Roadmap

Phase 1 – Making Apache Hadoop Accessible
  Release the most stable version of Hadoop ever
  Release directly usable code via Apache (RPMs, .debs…)
  Frequent sustaining releases off of the stable branches
  2011
Phase 2 – Next Generation Apache Hadoop
  Address key product gaps (HBase support, HA, Management…)
  Enable community & partner innovation via modular architecture & open APIs
  Work with community to define integrated stack
  2012 (Alphas starting Oct 2011)

Slide9

Phase 2 - Next Generation Apache Hadoop

Core
  HDFS Federation
  Next Gen MapReduce
  New Write Pipeline (HBase support)
  HA (no SPOF) and Wire compatibility
Data - HCatalog 0.3
  Pig, Hive, MapReduce and Streaming as clients
  HDFS and HBase as storage systems
  Performance and storage improvements
Management & Ease of use
  All components fully tested and deployable as a stack
  Stack installation and centralized config management
  REST and GUI for user tasks

Slide10

Phase 2 – Core - MapReduce

Complete rewrite of the resource management layer
Performance and Scale improvements
  6,000+ nodes / 100,000 concurrent tasks
Supports better availability and fail-over
Supports new frameworks beyond MapReduce
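One way to picture a resource management layer that "supports new frameworks beyond MapReduce" is a cluster-wide allocator that hands out capacity without knowing what will run in it. The sketch below is a toy model under that assumption, not the Next Gen MapReduce design; the class `ResourceManager`, the notion of "slots", and the application ids are all hypothetical.

```python
from typing import Dict


class ResourceManager:
    """Toy cluster-wide resource layer that is unaware of any framework.

    Hypothetical illustration only: it tracks a pool of interchangeable
    task slots and grants them to whichever application asks.
    """

    def __init__(self, total_slots: int) -> None:
        self.free_slots = total_slots
        self.granted: Dict[str, int] = {}

    def request(self, app_id: str, slots: int) -> int:
        """Grant up to `slots` task slots to an application; return the grant."""
        grant = min(slots, self.free_slots)
        self.free_slots -= grant
        self.granted[app_id] = self.granted.get(app_id, 0) + grant
        return grant

    def release(self, app_id: str) -> None:
        """Return every slot held by an application to the shared pool."""
        self.free_slots += self.granted.pop(app_id, 0)


if __name__ == "__main__":
    rm = ResourceManager(total_slots=10)
    print(rm.request("mapreduce-job", 6))        # a MapReduce application
    print(rm.request("other-framework-job", 6))  # any other framework
    rm.release("mapreduce-job")
    print(rm.free_slots)
```

Because the allocator only deals in application ids and slot counts, nothing in it is specific to map or reduce tasks, which is the property the slide highlights.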

Slide11

Phase 2 – Core – HDFS Federation

Multiple independent Namenodes and Namespace Volumes in a cluster
  Scalability (6K nodes, 100K clients, 120PB disk), Workload isolation support
  Client side mount tables for Global Namespace
Block storage as a generic shared storage service
  DataNodes store blocks for all Namespace volumes (no partitioning)
  Non-HDFS namespaces (HBase, MR tmp and others) can share the same storage

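[Figure: HDFS Federation architecture. Namespace layer: Namenodes NN-1 ... NN-k ... NN-n serving namespaces NS1 ... NS k ... Foreign NS n. Block storage layer: Block Pools (Pool 1 ... Pool k ... Pool n) on Common Storage spread across Datanode 1, Datanode 2 ... Datanode m, with a Balancer.]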
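The "client side mount tables for Global Namespace" bullet can be illustrated with a small sketch. This is not HDFS code; it is a minimal Python model that assumes the client picks a Namenode by longest-prefix match on the path. The class `MountTable`, the mount points, and the `nn-*.example.com` addresses are hypothetical.

```python
from typing import Dict, Optional, Tuple


class MountTable:
    """Toy client-side mount table: maps path prefixes to Namenodes."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def add_mount(self, prefix: str, namenode: str) -> None:
        # Normalise so "/user/" and "/user" name the same mount point.
        self._mounts[prefix.rstrip("/") or "/"] = namenode

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (namenode, path relative to the mount point)."""
        best: Optional[str] = None
        for prefix in self._mounts:
            matches = (
                prefix == "/"
                or path == prefix
                or path.startswith(prefix + "/")
            )
            # Keep the longest matching prefix so nested mounts win.
            if matches and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        remainder = path if best == "/" else (path[len(best):] or "/")
        return self._mounts[best], remainder


if __name__ == "__main__":
    table = MountTable()
    table.add_mount("/user", "nn-1.example.com")
    table.add_mount("/data", "nn-2.example.com")
    table.add_mount("/data/tmp", "nn-3.example.com")
    print(table.resolve("/data/tmp/job42/part-0"))
```

The point of the sketch is that the global view lives entirely in the client: each Namenode stays independent, and adding a namespace means adding one entry to the table.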

Slide12

Phase 2 – Core – HDFS Write Pipeline

Limitations of HDFS write pipeline in 0.20
  Broken Flush, Sync, Append
  Node failures can cause data loss for slow writers
Hadoop.Next
  Flush, Sync, and Append support
  New replicas are added dynamically on failures

[Figure: write pipeline from Client through three DataNodes (DN); labels: Flush, Ack]
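The behaviour described for Hadoop.Next, where a flush is acknowledged by every replica and a failed node is replaced during the write, can be sketched as a toy simulation. This is an assumption-laden model, not the HDFS protocol: the classes `DataNode` and `WritePipeline`, the list of spare nodes, and the rule that a replacement first copies all previously acknowledged bytes are hypothetical simplifications, and spares are assumed to be healthy.

```python
from typing import List


class DataNode:
    """In-memory stand-in for a datanode holding one replica."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data = bytearray()

    def write(self, chunk: bytes) -> bool:
        """Store the chunk and acknowledge it; a dead node never acks."""
        if not self.alive:
            return False
        self.data.extend(chunk)
        return True


class WritePipeline:
    """Client-side view of a replication pipeline with spare nodes."""

    def __init__(self, nodes: List[DataNode], spares: List[DataNode]) -> None:
        self.nodes = list(nodes)
        self.spares = list(spares)
        self.acked = bytearray()  # bytes acknowledged by every replica

    def flush(self, chunk: bytes) -> int:
        """Send a chunk to each replica; return how many replicas hold it."""
        for i, node in enumerate(self.nodes):
            if node.write(chunk):
                continue
            # The node failed: swap in a spare, bring it up to date with
            # everything already acknowledged, then retry the chunk on it.
            if not self.spares:
                raise RuntimeError("no spare datanode available")
            replacement = self.spares.pop(0)
            replacement.data.extend(self.acked)
            replacement.write(chunk)
            self.nodes[i] = replacement
        self.acked.extend(chunk)
        return len(self.nodes)


if __name__ == "__main__":
    dns = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
    pipeline = WritePipeline(dns, spares=[DataNode("dn4")])
    pipeline.flush(b"abc")
    dns[1].alive = False          # simulate a node failure mid-write
    pipeline.flush(b"def")
    print([n.name for n in pipeline.nodes])
```

In this model the replica count stays constant across a failure, which is the contrast with the 0.20 limitation listed above, where a slow writer could lose data as nodes dropped out.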

Slide13

Phase 2 – Data – HCatalog

[Figure: HCatalog sits between the clients (MapReduce, Pig, Hive, Streaming) and the storage systems (HDFS, HBase); a legend marks components as Phase 1 or Phase 2.]

Shared schema and data model
Data can be shared between tool users
Data located by table rather than file
Clients independent of storage details
  format, compression, …
Only one adaptor for new formats
  not one per tool
Notifications when new data is available
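The ideas on this slide (data located by table, one adaptor per format, notifications on new data) can be sketched with a toy catalog. This is not the HCatalog API; `Catalog`, `register_format`, `publish`, `subscribe` and `read` are hypothetical names, and the "storage" is just an in-memory string per table.

```python
import csv
import io
import json
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Reader = Callable[[str], List[Row]]


def read_csv(raw: str) -> List[Row]:
    """Adaptor for CSV text with a header line."""
    return list(csv.DictReader(io.StringIO(raw)))


def read_json_lines(raw: str) -> List[Row]:
    """Adaptor for one JSON object per line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


class Catalog:
    """Toy table catalog shared by every client tool."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}
        self._tables: Dict[str, Tuple[str, str]] = {}
        self._listeners: List[Callable[[str], None]] = []

    def register_format(self, fmt: str, reader: Reader) -> None:
        # One adaptor per format, reused by all tools.
        self._readers[fmt] = reader

    def subscribe(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def publish(self, table: str, fmt: str, raw: str) -> None:
        """Record new data for a table and notify subscribers."""
        self._tables[table] = (fmt, raw)
        for listener in self._listeners:
            listener(table)

    def read(self, table: str) -> List[Row]:
        """Clients ask for a table by name and never see the format."""
        fmt, raw = self._tables[table]
        return self._readers[fmt](raw)


if __name__ == "__main__":
    catalog = Catalog()
    catalog.register_format("csv", read_csv)
    catalog.register_format("jsonl", read_json_lines)
    catalog.subscribe(lambda name: print("new data in", name))
    catalog.publish("clicks", "csv", "user,page\nalice,/home\n")
    print(catalog.read("clicks"))
```

A tool that calls `read("clicks")` keeps working if the table is later republished in another registered format, which is the sense in which clients are independent of storage details.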

Slide14

Hortonworks Value

Confidential Information

Slide15

Hortonworks Differentiation

Unmatched domain expertise
  Delivered every major release of Apache Hadoop to date
  Critical mass of committers
Community leadership role
  Setting direction for core projects
Yahoo! commitment and backing
  Access to 1,000+ Hadoop engineers, Yahoo! grid
Absolute dedication to Apache & open source
  Focused on making Apache Hadoop the standard
Focus on delivering significant value to technology vendors
  ISVs, OEMs, Systems Integrators and other service providers

Slide16

Thank You.
