Big Data: Movement, Crunching, and Sharing
Guy Almes, Academy for Advanced Telecommunications
13 February 2015
Overarching theme
Understanding the interplay among data movement, crunching, and sharing is key.
This is a persistent theme
mid-1980s: NSF launched two closely related programs
The NSF Supercomputer Centers brought HPC and the emergent computational science to the mainstream of NSF-funded research
The NSFnet program, needed to connect science users to those supercomputers, resulted in connecting all our research universities to the Internet
File transfer of huge (e.g., one megabyte!) files was a key issue
Thus, A&M connected to NSFnet in August 1987
An ongoing theme
The Internet “outgrew” the narrow mission of connecting universities to supercomputers
But, in its broad missions, it often neglects the big-data needs of university researchers
Thus, having spawned the commercial Internet in the early 1990s, the universities created Internet2 in 1996
Again, a dramatic improvement in our ability to move huge (e.g., one gigabyte) files
Note the Teragrid network as a false step
To the present
First, note A&M’s innovation in the ScienceDMZ, so that key data-intensive resources, e.g., the gridftp servers of the Brazos high-throughput cluster, have direct access to the wide area (LEARN, Internet2, ESnet, etc.)
Recently, that wide-area infrastructure has been upgraded to 100 Gb/s
We’ll look at these in turn
ScienceDMZ
You can achieve high-speed wide-area flows only if packet loss is very, very small and the MTU is not small (see the throughput sketch after this slide)
This fails if you try to extend these flows into the general-purpose campus LAN
Beginning in 2009, we designed the Data Intensive Network to place key resources adjacent to the wide-area network
This idea, called “ScienceDMZ” and popularized by ESnet, is now widely adopted across the country
If both source and destination of a high-speed wide-area flow are on ScienceDMZs, very good performance can be achieved
Example: gridftp servers supporting flows to/from the 240-TByte file system for the Brazos high-throughput cluster
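A back-of-the-envelope way to see why loss and MTU dominate is the Mathis et al. estimate of single-stream TCP throughput, roughly MSS / (RTT × sqrt(loss)). The Python sketch below is illustrative only: the RTT, MTU, and loss values are hypothetical, not measurements of any A&M or LEARN path.

```python
# Illustrative only: Mathis et al. estimate of single-stream TCP throughput,
#   throughput <= MSS / (RTT * sqrt(loss)).
# The RTT, MTU, and loss values are hypothetical, not measured A&M numbers.
import math

def mathis_throughput_gbps(mtu_bytes, rtt_seconds, loss_rate):
    """Rough upper bound on single-stream TCP throughput, in Gb/s."""
    mss = mtu_bytes - 40                    # payload left after TCP/IP headers
    bytes_per_sec = mss / (rtt_seconds * math.sqrt(loss_rate))
    return bytes_per_sec * 8 / 1e9

rtt = 0.05                                  # ~50 ms, a cross-country path
for mtu in (1500, 9000):                    # standard vs. jumbo frames
    for loss in (1e-3, 1e-5, 1e-7):         # lossy campus LAN vs. clean ScienceDMZ path
        gbps = mathis_throughput_gbps(mtu, rtt, loss)
        print(f"MTU {mtu:5d}, loss {loss:.0e}: {gbps:8.2f} Gb/s max")
```

Even with made-up numbers the pattern is the point: cutting loss by a few orders of magnitude and moving to jumbo frames takes the same flow from a few Mb/s to several Gb/s, which is exactly why the ScienceDMZ keeps these flows off the general-purpose LAN.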
100 Gb/s Upgrade
The Internet2 backbone is built around 100-Gb/s circuits (with up to 80 such circuits/lambdas per fiber)
With a combination of NSF and local funding, LEARN is evolving to 100-Gb/s:
Now: 100-Gb/s College Station to Houston
Now: 100-Gb/s Houston to Internet2 backbone at Greenspoint
Now: 100-Gb/s Austin to Dallas
Now: 100-Gb/s Dallas to Internet2 backbone at Kansas City
Future: 100-Gb/s College Station to Dallas, and Austin to San Antonio to Houston
This would then result in a consistent 100-Gb/s wide-area infrastructure
Sum of current good situation
ScienceDMZ and the emerging 100-Gb/s infrastructure permit very good end-to-end performance to resources on the ScienceDMZ
Software tools such as gridftp and Globus Online, and discipline-specific tools such as Phedex, permit wide-area flows in excess of 1 TByte/hour to be sustained from other high-end sites (see the quick bandwidth arithmetic after this slide)
Emerging “Advanced Layer-2 Services”, based on software-defined network techniques, may be very important
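As a sanity check on the 1 TByte/hour figure, the quick arithmetic below (plain Python; the only assumption is a decimal terabyte) converts that sustained rate into the average bandwidth it requires and compares it with the wide-area paths described above.

```python
# Quick check: what sustained bandwidth does 1 TByte/hour require?
# Assumes a decimal terabyte (1e12 bytes).
tbyte_bits = 1e12 * 8
seconds_per_hour = 3600

avg_gbps = tbyte_bits / seconds_per_hour / 1e9
print(f"1 TByte/hour ~= {avg_gbps:.2f} Gb/s sustained")        # ~2.22 Gb/s

for link_gbps in (10, 100):
    share = avg_gbps / link_gbps
    print(f"  ... about {share:.1%} of a {link_gbps}-Gb/s path")
```

So a 1 TByte/hour flow needs only a few percent of a 100-Gb/s path; the practical limit is usually the end systems and file systems, not the wide area.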
Crunching
Several key computing resources are already on A&M’s ScienceDMZ:
Parallel file system of Brazos
Similarly for Eos
Similarly for Ada, a new very large x86 cluster
Emerging: the Power7 cluster and (eventually?) the BlueGene cluster
Data-moving servers attached to the parallel file systems of these resources
And, using tools such as Globus Online, large data flows can be achieved to the computing resources of NSF/XSEDE and the DoE (a minimal sketch follows this slide)
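As one concrete illustration of driving such a flow programmatically, here is a minimal sketch using the Globus Python SDK (globus_sdk). The endpoint UUIDs, paths, label, and token are hypothetical placeholders, and the slide itself does not prescribe this interface; the Globus Online web interface or a gridftp client would do the same job.

```python
# Minimal sketch, assuming a valid Globus transfer token and hypothetical
# endpoint UUIDs and paths; not an official A&M or Brazos recipe.
import globus_sdk

TRANSFER_TOKEN = "..."                       # placeholder: obtained via Globus Auth
BRAZOS_ENDPOINT = "uuid-of-brazos-dtn"       # hypothetical endpoint IDs
XSEDE_ENDPOINT = "uuid-of-xsede-resource"

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(TRANSFER_TOKEN))

# Describe the transfer: a recursive, checksum-verified copy of one directory.
tdata = globus_sdk.TransferData(
    tc, BRAZOS_ENDPOINT, XSEDE_ENDPOINT,
    label="Brazos to XSEDE example", sync_level="checksum")
tdata.add_item("/scratch/user/dataset/", "/project/user/dataset/", recursive=True)

task = tc.submit_transfer(tdata)
print("submitted transfer task:", task["task_id"])
```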
Sharing
Things are more primitive here. One can only point to:
A few discipline-specific examples, e.g., the Phedex system of the Large Hadron Collider’s CMS collaboration
Some key tools:
InCommon / Shibboleth provide federated identity/authentication
Globus Online provides some support for controlled sharing (see the access-rule sketch after this slide)
But, generally, this situation does not match our need to share data among key scientific collaborations
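To make “some support for controlled sharing” concrete, the sketch below (an assumption about the Globus Python SDK interface, not something the slides specify) grants a collaborator’s Globus identity read-only access to one directory of a hypothetical shared endpoint.

```python
# Hypothetical sketch: grant a collaborator read-only access to one directory
# of a Globus shared endpoint. The token, endpoint UUID, identity UUID, and
# path are all placeholders.
import globus_sdk

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer("..."))   # placeholder token

SHARED_ENDPOINT = "uuid-of-shared-endpoint"
COLLABORATOR_IDENTITY = "uuid-of-collaborator-identity"   # e.g., an identity linked to an InCommon login

rule = {
    "DATA_TYPE": "access",
    "principal_type": "identity",      # a single federated identity (could also be a group)
    "principal": COLLABORATOR_IDENTITY,
    "path": "/published-results/",
    "permissions": "r",                # read-only
}
result = tc.add_endpoint_acl_rule(SHARED_ENDPOINT, rule)
print("created access rule:", result["access_id"])
```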
An important work in progress
Controlled high-performance sharing of data is key to effective scientific collaboration in the big-data world