Frank Würthwein, moderator
PRP End-to-End Technical Requirements From Science Applications
16 October 2015
What is the PRP? Ecosystem and networking
PRP is the answer to all research IT problems?
PRP is the answer to all networking problems?
PRP is 3 FTEs to change the world
Requirements we heard focus on much more than just networking.
The entire gamut of compute, storage, and networking issues was present in the wish lists.
And whatever we do must scale out faster than technology/$$, and be sustainable beyond the project.
Quotes paraphrased without attribution
Bringing home results produced at large compute centers is a long, tedious, and painful process.
The ability to use cached data and/or elastically scale out processing of local data is a game changer.
I want to process any data, anytime, anywhere
processing on my laptop and supercomputer without recompiling or human data
without me having to think about where to run.
a single interface, rather than a dozen different accounts and environments
but fully controlled by me where I run what – don’t give me layers of complicated middleware that are impenetrable for me.
Whatever tools you give me must require minimal maintenance on my part.
to make my data accessible for download & analysis by anybody at their home institution because I can’t support everybody to compute at my institution.
Want the same network connectivity to external resources as I have to internal resources:
Science irrespective of Geography!
No matter what network bandwidth you give me, I will want more, low latency streaming bandwidth from anywhere to anywhere.
Large mismatch between what’s needed and what can realistically be done with 3 FTEs?!?
Leverage, Leverage, Leverage
Partner with other projects as much as we can!
Pick low hanging fruit and
Connectivity within PRP
Most science drivers are ok with 10Gbps as long as it is consistent & reliable across all
within the PRP … and then some.
Some want to push the envelope all the way to 100Gbps and beyond, especially the
Connectivity beyond PRP
Many sciences want to connect at 10Gbps to XSEDE resources & US National labs
Feeding data to large compute resources is a widely shared requirement.
locations want to route across
Connectivity to international GCM archives at “PRP quality”
Connectivity to ALMA at “PRP quality”
Connectivity to AWS at “PRP quality”
Size of Data
While the full range of TB to PB was mentioned, most current needs seem to be in the O(10) – O(100) TB range.
Needs will scale by 10x or more within the lifetime of the PRP.
Needs likely to scale faster than TB/$$ growth.
Starting out with a single FIONA is ok, but scale-out into a distributed cluster of DTNs on campus will happen.
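To put the data sizes above next to the 10Gbps and 100Gbps link speeds discussed earlier, here is a rough back-of-the-envelope sketch. It assumes an idealized transfer where time is simply size divided by usable link rate. The function name `transfer_hours` and the 0.8 efficiency factor are illustrative assumptions, not PRP measurements.

```python
# Idealized transfer-time estimate: size / (link rate * efficiency).
# The 0.8 efficiency factor used below is an assumption for illustration,
# not a measured PRP number.

TB = 10**12  # bytes in a decimal terabyte


def transfer_hours(size_bytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_bytes over a link of link_gbps gigabits per second."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = size_bytes * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # 10 TB and 100 TB are the "current needs" range; 1000 TB is the 10x scale-up.
    for size_tb in (10, 100, 1000):
        for gbps in (10, 100):
            hours = transfer_hours(size_tb * TB, gbps, efficiency=0.8)
            print(f"{size_tb:>5} TB at {gbps:>3} Gbps: {hours:8.1f} h")
```

Even at full line rate, 100 TB over 10Gbps takes about 22 hours, and 1 PB takes more than 9 days, which is one way to read why some drivers push toward 100Gbps.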
A note to CIOs and the like
Science DMZ must reach the instruments on campus.
This is not just a centralized data center IT issue!
There will be a strong push on your campus to buy more DTN hardware over time, and locate them in places you did not expect.
Security & Privacy concerns are secondary to networking, storage, and compute issues in most cases except some types of biomedical data.
Probably not surprisingly, security is typically more a concern for resource providers than resource consumers. However, PRP needs to satisfy not just resource consumers!
Can we build trusted systems from the ground up?
Bring my data to a large compute resource for processing, and bring the output back to me when done.
Make my data available for others to download and process with their local compute resources.
Bring my friend’s data to me for processing with my local resources.
It’s probably true that nobody cares to manage these transfers, but would rather use caches that make transfer management superfluous.
Support incorporating all of the above into orchestrated pipelines and workflows
Support this all to happen in (quasi-)real time, within human attention spans, and in “batched” mode “overnight”.
How do we support science beyond the initial drivers?
Other Sciences at participating institutions?
Same sciences at other institutions?
Other Sciences at other institutions?