Slide 1
Hortonworks
Eric Baldeschwieler – CEO
twitter: @jeric14 (@hortonworks)
© Hortonworks Inc. 2011
Architecting the Future of Big Data
June 29, 2011
Slide 2
About Hortonworks
Mission: Revolutionize and commoditize the storage and processing of big data via open source
Vision: Half of the world’s data will be stored in Apache Hadoop within five years
Strategy: Grow the Apache Hadoop ecosystem by making Apache Hadoop easier to consume; profit by providing training, support and certification
An independent company focused on making Apache Hadoop great
Hold nothing back; Apache Hadoop will be complete
Slide 3
Credentials
Technical: key architects and committers from the Yahoo! Hadoop engineering team
Highest concentration of Apache Hadoop committers
Contributed >70% of the code in Hadoop, Pig and ZooKeeper
Delivered every major/stable Apache Hadoop release since 0.1
History of driving innovation across the entire Apache Hadoop stack
Experience managing the world’s largest deployment
Business operations: team of highly successful open source veterans
Led by Rob Bearden, former COO of SpringSource & JBoss
Investors: backed by Benchmark Capital and Yahoo!
Benchmark was a key investor in Red Hat, MySQL, SpringSource, Twitter & eBay
Slide 4
Hortonworks and Yahoo!
Yahoo! is a development partner
Leverage the large Yahoo! development, testing & operations teams
More than 1,000 active & sophisticated users of Apache Hadoop
Access to the Yahoo! grid for testing large workloads
Only organization that has delivered a stable release of Apache Hadoop
Yahoo! will continue to contribute Apache Hadoop code too!
Yahoo! is a customer
Hortonworks provides level 3 support and training to Yahoo!
Yahoo! deploys Apache Hadoop releases across its 42,000-node grid
Yahoo! is an investor
Slide 5
Current State of Adoption
Enterprise Adoption:
Early adopters
Technology is hard to install, manage & use
Technology lacks enterprise robustness
Requires significant investment in technical staff or consulting
Hard to find & hire experienced developer & operations talent
Vendor Ecosystem Adoption:
Early in vendor adoption lifecycle
Hadoop is hard to integrate and extend
Hard to find & hire experienced developer & operations talent
Technology & Knowledge Gaps Prevent Apache Hadoop from Reaching Full Potential
Customers are asking their vendors for help with Hadoop!
“We’re seeing Hadoop in all of our Fortune 2000 data accounts”
Slide 6
Hortonworks Role & Opportunity
Vendor Ecosystem Adoption
Enterprise Adoption
Bridge the Gap!
Grow Market
Sell training and support via Partners
Fundamental shift in enterprise data architecture strategy:
Apache Hadoop becomes the standard for managing new types & scale of data
New applications & solutions will be created to leverage data in Apache Hadoop
Creates a massive big data technology and services opportunity for the ecosystem
Slide 7
Hortonworks Objectives
Make Apache Hadoop projects easier to install, manage & use
Regular sustaining releases
Compiled code for each project (e.g. RPMs)
Testing at scale
Make Apache Hadoop more robust
Performance gains
High availability
Administration & monitoring
Make Apache Hadoop easier to integrate & extend
Open APIs for extension & experimentation
All done within Apache Hadoop community
Develop collaboratively with community
Complete transparency
All code contributed back to Apache
Anyone should be able to easily deploy the Hadoop projects directly from Apache
Slide 8
Technology Roadmap
Phase 1 – Making Apache Hadoop Accessible
Release the most stable version of Hadoop ever
Release directly usable code via Apache (RPMs, .debs…)
Frequent sustaining releases off of the stable branches
2011
Phase 2 – Next Generation Apache Hadoop
Address key product gaps (HBase support, HA, management…)
Enable community & partner innovation via modular architecture & open APIs
Work with community to define integrated stack
2012 (Alphas starting Oct 2011)
Slide 9
Phase 2 - Next Generation Apache Hadoop
Core:
HDFS Federation
Next Gen MapReduce
New write pipeline (HBase support)
HA (no SPOF) and wire compatibility
Data – HCatalog 0.3:
Pig, Hive, MapReduce and Streaming as clients
HDFS and HBase as storage systems
Performance and storage improvements
Management & ease of use:
All components fully tested and deployable as a stack
Stack installation and centralized config management
REST and GUI for user tasks
Slide 10
Phase 2 – Core - MapReduce
Complete rewrite of the resource management layer
Performance and scale improvements: 6,000+ nodes / 100,000 concurrent tasks
Supports better availability and fail-over
Supports new frameworks beyond MapReduce
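The key idea behind the rewrite, decoupling cluster resource management from the MapReduce programming model so that other frameworks can share the cluster, can be illustrated with a toy sketch. All class and method names below are hypothetical and are not the actual Hadoop APIs:

```python
# Toy illustration: a framework-agnostic resource manager.
# Names are illustrative only, not Hadoop's real interfaces.
class ResourceManager:
    def __init__(self, slots):
        self.free = slots          # total concurrent task slots in the cluster

    def allocate(self, wanted):
        """Grant up to `wanted` slots; frameworks negotiate, they don't own the cluster."""
        granted = min(wanted, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        self.free += count


def run_mapreduce(rm, tasks):
    """A MapReduce-style framework is just one client of the resource manager."""
    done = 0
    while done < tasks:
        granted = rm.allocate(tasks - done)
        done += granted            # pretend each granted slot completes one task
        rm.release(granted)
    return done
```

Because frameworks negotiate slots from a shared manager, a graph-processing or streaming framework could be another `run_*` client of the same `ResourceManager` rather than a bolt-on to MapReduce.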
Slide 11
Phase 2 – Core – HDFS Federation
Multiple independent Namenodes and Namespace Volumes in a cluster
Scalability (6K nodes, 100K clients, 120PB disk), workload isolation support
Client-side mount tables for a global namespace
Block storage as a generic shared storage service
DataNodes store blocks for all Namespace Volumes – no partitioning
Non-HDFS namespaces (HBase, MR tmp and others) can share the same storage
[Diagram: Namenodes NN-1 … NN-k … NN-n each manage one namespace (NS1 … NS k, plus a foreign NS n) with a corresponding block pool (Pool 1 … Pool k … Pool n); Datanodes 1 … m form the common block storage layer shared by all pools, with a Balancer working across block pools. Namespace layer above, block storage layer below.]
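Client-side mount tables stitch the per-Namenode namespaces into one global view. A sketch of what such a client configuration looks like, assuming the ViewFs-style properties that shipped with federation; the mount-table name, hosts and paths are illustrative:

```xml
<!-- Illustrative client config: a global namespace "cluster" assembled
     from two federated Namenodes (authorities and paths are examples) -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://cluster/</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.cluster.link./user</name>
    <value>hdfs://nn1.example.com:8020/user</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.cluster.link./data</name>
    <value>hdfs://nn2.example.com:8020/data</value>
  </property>
</configuration>
```

With this in place a client path like `/user/alice` resolves through the mount table to the Namenode that owns that volume, without the application knowing the cluster is federated.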
Slide 12
Phase 2 – Core – HDFS Write Pipeline
Limitations of the HDFS write pipeline in 0.20:
Broken flush, sync, append
Node failures can cause data loss for slow writers
Hadoop.Next:
Flush, sync, and append support
New replicas are added dynamically on failures
[Diagram: client writes through a pipeline of three DataNodes; flush acks flow back to the client.]
Slide 13
Phase 2 – Data – HCatalog
[Diagram: Pig, Hive, MapReduce and Streaming clients access data through HCatalog, which sits over HDFS (Phase 1) and HBase (Phase 2) as storage systems.]
Shared schema and data model
Data can be shared between tool users
Data located by table rather than file
Clients independent of storage details (format, compression, …)
Only one adaptor for new formats, not one per tool
Notifications when new data is available
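Streaming, listed above as one of the HCatalog clients, lets any stdin/stdout executable act as a mapper or reducer. A minimal word-count sketch in Python, assuming the classic Streaming contract of tab-separated key/value lines with reducer input sorted by key; the script name and invocation in the comments are illustrative:

```python
# Hypothetical Hadoop Streaming word count. Hadoop would run something like:
#   hadoop jar hadoop-streaming.jar -mapper wc_map.py -reducer wc_reduce.py ...
# Streaming's contract: records are lines on stdin/stdout, key and value
# separated by a tab, and reducer input arrives sorted by key.

def map_words(lines):
    """Mapper: emit one 'word<TAB>1' line per token."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_counts(lines):
    """Reducer: sum counts per word; relies on key-sorted input."""
    current, total = None, 0
    for line in lines:
        word, count = line.rstrip("\n").split("\t")
        if word != current and current is not None:
            yield "%s\t%d" % (current, total)   # key changed: flush previous word
        if word != current:
            current, total = word, 0
        total += int(count)
    if current is not None:
        yield "%s\t%d" % (current, total)       # flush the final word
```

The point of the HCatalog work above is that a script like this could locate its input by table name rather than by raw HDFS file paths.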
Slide 14
Hortonworks Value
Confidential Information
Slide 15
Hortonworks Differentiation
Unmatched domain expertise
Delivered every major release of Apache Hadoop to date
Critical mass of committers
Community leadership role: setting direction for core projects
Yahoo! commitment and backing
Access to 1,000+ Hadoop engineers, Yahoo! grid
Absolute dedication to Apache & open source
Focused on making Apache Hadoop the standard
Focus on delivering significant value to technology vendors:
ISVs, OEMs, Systems Integrators and other service providers
Slide 16
Thank You.