Slide1
The Datacenter Needs an Operating System
Matei Zaharia, Benjamin Hindman, Andy Konwinski, Ali Ghodsi, Anthony Joseph, Randy Katz, Scott Shenker, Ion Stoica
Slide2
Background
Clusters of commodity servers have become a major computing platform in industry and academia
Driven by data volumes outpacing the processing capabilities of single machines
Democratized by cloud computing
Slide3
Background
Some have declared that “the datacenter is the new computer”
Claim: this new computer increasingly needs an operating system
Not necessarily a new host OS, but a common software layer that manages resources and provides shared services for the whole datacenter, like an OS does for one host
Slide4
Why Datacenters Need an OS
Growing number of applications
Parallel processing systems: MapReduce, Dryad, Pregel, Percolator, Dremel, MR Online
Storage systems: GFS, BigTable, Dynamo, SCADS
Web apps and supporting services
Growing number of users
200+ for Facebook’s Hadoop data warehouse, running near-interactive ad hoc queries
Slide5
What Operating Systems Provide
Resource sharing across applications & users
Data sharing between programs
Programming abstractions (e.g. threads, IPC)
Debugging facilities (e.g. ptrace, gdb)
Result: OSes enable a highly interoperable software ecosystem that we now take for granted
Slide6
An Analogy
Today, a scientist analyzing data on a single machine can pipe it through a variety of tools, write new tools that interface with these through standard APIs, and trace across the stack
In the future, the scientist should be able to fire up a cloud on EC2 and do the same thing:
Intermix a variety of apps & programming models
Write new parallel programs that talk to these
Get a unified interface for managing the cluster
Debug and trace across all these components
Slide7
Today’s Datacenter OS
Hadoop MapReduce as common execution and resource sharing platform
Hadoop InputFormat API for data sharing
Abstractions for productivity programmers, but not for system builders
Very challenging to debug across all the layers
Slide8
Tomorrow’s Datacenter OS
Resource sharing:
Lower-level interfaces for fine-grained sharing (Mesos is a first step in this direction)
Optimization for a variety of metrics (e.g. energy)
Integration with network scheduling mechanisms (e.g. Seawall [NSDI ’11], NOX, Orchestra)
Slide9
Tomorrow’s Datacenter OS
Data sharing:
Standard interfaces for cluster file systems, key-value stores, etc.
In-memory data sharing (e.g. Spark, DFS cache), and a unified system to manage this memory
Streaming data abstractions (analogous to pipes)
Lineage instead of replication for reliability (RDDs)
Slide10
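The "lineage instead of replication" bullet on the data sharing slide can be illustrated with a small sketch. This is not Spark's RDD API; the class `LineageDataset` and its methods are hypothetical names chosen for illustration, and the base data is assumed to live in reliable storage. Each derived dataset remembers its parent and the function that produced it, so a lost in-memory partition is recomputed from its parent rather than restored from a replica.

```python
class LineageDataset:
    """Toy partitioned dataset that remembers how it was derived (hypothetical API)."""

    def __init__(self, num_partitions, source=None, parent=None, fn=None):
        self.num_partitions = num_partitions
        self._source = source    # base partitions, assumed to be in reliable storage
        self._parent = parent    # lineage: the dataset this one was derived from
        self._fn = fn            # lineage: the per-element function that was applied
        self._cache = {}         # in-memory partitions, which may be lost
        self.compute_count = 0   # number of partitions computed so far

    @classmethod
    def from_partitions(cls, partitions):
        return cls(len(partitions), source=[list(p) for p in partitions])

    def map(self, fn):
        # Record the derivation only; nothing is computed yet.
        return LineageDataset(self.num_partitions, parent=self, fn=fn)

    def partition(self, i):
        if self._parent is None:
            return self._source[i]
        if i not in self._cache:
            # Missing or lost: rebuild this one partition from its parent.
            self._cache[i] = [self._fn(x) for x in self._parent.partition(i)]
            self.compute_count += 1
        return self._cache[i]

    def lose_partition(self, i):
        # Simulate a node failure that drops one cached partition.
        self._cache.pop(i, None)

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]


base = LineageDataset.from_partitions([[1, 2], [3, 4]])
squared = base.map(lambda x: x * x)
first = squared.collect()
squared.lose_partition(1)
recovered = squared.collect()
```

Only the lost partition is recomputed after the simulated failure, so no second copy of `squared` has to be kept in memory.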
Tomorrow’s Datacenter OS
Programming abstractions:
Tools that can be used to build the next MapReduce/BigTable in a week (e.g. BOOM)
Efficient implementations of communication primitives (e.g. shuffle, broadcast)
New distributed programming models
Slide11
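As a rough illustration of the shuffle primitive named on the programming abstractions slide, the sketch below groups key-value pairs from several map outputs into per-reducer buckets by hashing the key. The function names `partition_for` and `shuffle` are assumptions made for this example, and everything runs in one process. A real implementation would move data over the network, which is where the efficiency concerns arise.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_reducers):
    # Use a stable hash so every mapper sends a given key to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Group (key, value) pairs from all map outputs into per-reducer buckets."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            buckets[partition_for(key, num_reducers)][key].append(value)
    return [dict(bucket) for bucket in buckets]


map_outputs = [
    [("a", 1), ("b", 2)],   # output of mapper 0
    [("a", 3), ("c", 4)],   # output of mapper 1
]
reducer_inputs = shuffle(map_outputs, num_reducers=2)
```

Each key lands in exactly one bucket, with its values collected from every mapper.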
Tomorrow’s Datacenter OS
Debugging facilities:
Tracing and debugging tools that work across the cluster software stack (e.g. X-Trace, Dapper)
Replay debugging that takes advantage of limited languages / computational models
Unified monitoring infrastructure and APIs
Slide12
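Cross-stack tracing, as mentioned on the debugging facilities slide, generally relies on passing a small piece of trace metadata along with every request as it moves between layers. The sketch below shows that idea only; it is not the X-Trace or Dapper API. The names `TraceContext`, `record`, and the three layer functions are hypothetical, and the in-memory `events` list stands in for a real trace collector.

```python
import itertools
from dataclasses import dataclass
from typing import Optional

_span_ids = itertools.count(1)
events = []  # stand-in for a cluster-wide trace collector


@dataclass(frozen=True)
class TraceContext:
    trace_id: str
    span_id: int
    parent_id: Optional[int]

    def child(self):
        # A callee gets a new span that points back at its caller.
        return TraceContext(self.trace_id, next(_span_ids), self.span_id)


def record(ctx, layer, message):
    events.append((ctx.trace_id, ctx.span_id, ctx.parent_id, layer, message))


def storage_read(ctx, block):
    record(ctx, "storage", f"read {block}")


def run_task(ctx, task):
    record(ctx, "executor", f"run task {task}")
    storage_read(ctx.child(), f"block-{task}")


def submit_job(job_name):
    root = TraceContext(trace_id=job_name, span_id=next(_span_ids), parent_id=None)
    record(root, "scheduler", "submit")
    for task in range(2):
        run_task(root.child(), task)
    return root


root = submit_job("job-1")
```

Because every event carries the same `trace_id` and a pointer to its parent span, the events recorded by the scheduler, executor, and storage layers can be stitched back into one tree for the job.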
Putting it All Together
A successful datacenter OS might let users:
Build a Hadoop-like software stack in a week using the OS’s abstractions, while gaining other benefits (e.g. cross-stack replay debugging)
Share data efficiently between independently developed programming models and applications
Understand cluster behavior without having to log into individual nodes
Dynamically share the cluster with other users
Slide13
Conclusion
Datacenters need an OS-like software stack for the same reasons single computers did: manageability, efficiency & programmability
An OS is already emerging in an ad-hoc way
Researchers can help by taking a long-term approach towards these problems
Slide14
How Researchers can Help
Focus on paradigms, not performance
Industry is tackling performance but lacks the luxury of taking a long-term view of abstractions
Explore clean-slate approaches
Likelier to have impact here than in a “real” OS because datacenter software changes quickly!
Bring cluster computing to non-experts
Much harder and more rewarding than serving big users