Slide1
Surviving in the Wild: Teaching and Training for the Parallel Future
Dick Brown
St. Olaf College
SPLASH Educators’ and Trainers’ Symposium
October 24, 2011
Slide2
Overview
Review of the need for parallelism
Strategies we need
What to teach
How to teach it
How to get it taught
Surviving in the wild of Parallelism
With interludes, to be announced…
Slide3
The need for parallelism
Question: Why do we need parallelism at the programming level?
Hint: It’s not “because it’s there” (… although desktop applications do tend to find ways to use ever greater power per dollar)
Note: I will use “parallel” as a generic term for concurrent, parallel, distributed, cloud, accelerator (e.g., GPGPU), etc.
Slide4
The need for parallelism
Answer: Scale
Cloud applications
[image] + [image] = 30,000,000,000,000,000
Slide6
The need for parallelism
Answer: Scale
Cloud applications
Scientific applications: Particle-level simulation of turbulence is exascale
Can’t achieve exascale performance without many cores (Berkeley “walls”), accelerators
Slide7
Challenges for industry
Technology:
Heterogeneous computing (CPU + accelerators)
Sophisticated “on the fly” runtime systems
“Wall” of memory hierarchy vs. on-chip access
Examples
AMD Fusion System Architecture: CPU+GPU
Intel MIC (Many Integrated Core): 50+ CPUs on a chip, as a cluster-like accelerator
Slide9
Challenges for industry
Technology:
Heterogeneous computing (CPU + accelerators)
Sophisticated “on the fly” runtime systems
“Wall” of memory hierarchy vs. on-chip access
Programming models:
Higher level; more “human-centric”
Scalable
Versatile
Slide10
Challenges for Education/Training
We want to prepare our students for what they’ll need, before the demand explodes, but
What are the enduring principles?
Technologies, (hence) tools change rapidly!
(Educators:) Change the curriculum???
Slide11
A wild ecosystem
Industry/Academia
Learning curve/Rapid change
Principles/Practices
Teaching/Research
New research discoveries in technology and programming models need to get into the curriculum yesterday
Slide15
A wild ecosystem
Industry/Academia
Learning curve/Rapid change
Principles/Practices
Teaching/Research
We need strategy!
And, it’s coming fast!
Took OOPSLA 20 years to become SPLASH…
We can’t wait that long
Slide16
Strategies to be found
What to teach
How to teach it
How to get it taught
ITiCSE 2010 working group, Strategies for Preparing Computer Science Students for the Multicore World
Slide17
What to teach
Parallel computing has a head start:
ACM/IEEE Curriculum ’91
3 required hours on parallel algorithms
3 required hours on distributed and parallel programming language constructs, with hands-on practice
Ada, Concurrent Pascal, Occam, or Parlog
(Was not universally embraced…)
Slide18
What to teach
Parallel computing has a head start:
ACM/IEEE Curriculum ’91
3 required hours on parallel algorithms
3 required hours on distributed and parallel programming language constructs, with hands-on practice
But, ten years later…
ACM/IEEE Curriculum ’01
0 required hours of parallel algorithms
No mention of programming language constructs
Replaced by: “net-centric computing,” etc.
Slide19
NSF/TCPP Curriculum Standards Initiative in Parallel and Distributed Computing – Core Topics for Undergraduates
Sushil K. Prasad, IEEE TCPP Chair, Georgia State University
Richard LeBlanc, Seattle University, ACM Education Council
Charles Weems, University of Massachusetts, Amherst
Alan Sussman, University of Maryland
Arnold Rosenberg, Northeastern and Colorado State University
Andrew Lumsdaine, Indiana University
Curriculum Initiative Website: linked through tcpp.computer.org
http://www.cs.gsu.edu/~tcpp/curriculum/index.php
Slide20
Who are we?
Chtchelkanova, Almadena - NSF
Dehne, Frank - Carleton University, Canada
Gouda, Mohamed - University of Texas, Austin, NSF
Gupta, Anshul - IBM T.J. Watson Research Center
JaJa, Joseph - University of Maryland
Kant, Krishna - NSF, Intel
La Salle, Anita - NSF
LeBlanc, Richard - Seattle University
Lumsdaine, Andrew - Indiana University
Padua, David - University of Illinois at Urbana-Champaign
Parashar, Manish - Rutgers, NSF
Prasad, Sushil - Georgia State University
Prasanna, Viktor - University of Southern California
Robert, Yves - INRIA, France
Rosenberg, Arnold - Colorado State University
Sahni, Sartaj - University of Florida
Shirazi, Behrooz - Washington State University
Sussman, Alan - University of Maryland
Weems, Chip - University of Massachusetts
Wu, Jie - Temple University
Slide21
Specifying Curriculum Recommendations – NSF/TCPP Approach
Identify topics in four existing areas: architecture, algorithms, programming, and cross-cutting topics
For each topic, recommend
Bloom level
“Hours” of coverage
Suggested learning outcome
Possible core course for coverage
Focus: First two years
Slide22
Bloom Levels
Use first three levels for recommended core topics
K = Know the term/recall definition (basic literacy)
C = Comprehend so as to paraphrase/illustrate
A = Apply it in some way (requires operational command)
N = Not in core (but may be useful in elective or advanced courses)
Slide23
Example
Parallel and Distributed Models and Complexity
Costs of computation
Algorithms Topics | Bloom# | Course | Learning Outcome
Algorithmic problems | | | The important thing here is to emphasize the parallel/distributed aspects of the topic
Communication | | |
broadcast | C/A | Data Struc/Algo | represents method of exchanging information - one-to-all broadcast (by recursive doubling)
multicast | K/C | Data Struc/Algo | Illustrate macro-communications on rings, 2D-grids and trees
scatter/gather | C/A | Data Struc/Algo |
gossip | N | Not in core |
Asynchrony | K | CS2 | asynchrony as exhibited on a distributed platform, existence of race conditions
Synchronization | K | CS2, Data Struc/Algo | aware of methods of controlling race condition
Sorting | C | CS2, Data Struc/Algo | parallel merge sort
Selection | K | CS2, Data Struc/Algo | min/max, know that selection can be accomplished by sorting
K: know term
C: paraphrase/illustrate
A: apply
Slide24
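The broadcast row in the Example table mentions a one-to-all broadcast by recursive doubling. As a rough illustration, the sketch below simulates which process sends to which in each round. The function name `recursive_doubling_broadcast` and the rank numbering are assumptions made for this sketch; no real messages are sent, the code only lists the (sender, receiver) pairs.

```python
def recursive_doubling_broadcast(num_procs, root=0):
    """Simulate a one-to-all broadcast; return a list of rounds.

    Each round is a list of (sender, receiver) rank pairs. Ranks are taken
    relative to the root, so relative ranks 0..informed-1 hold the message
    at the start of a round and each forwards it to one new process.
    """
    rounds = []
    informed = 1  # number of processes that already hold the message
    while informed < num_procs:
        pairs = []
        for rel in range(informed):
            partner = rel + informed  # the process this holder informs
            if partner < num_procs:
                sender = (rel + root) % num_procs
                receiver = (partner + root) % num_procs
                pairs.append((sender, receiver))
        rounds.append(pairs)
        informed *= 2  # the informed set doubles every round
    return rounds
```

Calling `recursive_doubling_broadcast(8)` yields three rounds with 1, 2, and 4 sends, which is where the logarithmic number of rounds comes from.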
Programming
Assume some conventional (sequential) programming experience
Key is to introduce parallel programming early to students
Four overall areas
Paradigms – By target machine model and by control statements
Notations – language/library constructs
Correctness – concurrency control
Performance – for different machine classes
Slide25
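The “Correctness – concurrency control” area of the Programming slide, together with the race-condition outcomes listed for Asynchrony and Synchronization, can be made concrete with a small shared-counter exercise. This is a minimal sketch, assuming Python threads and a hypothetical function name `count_with_lock`; it shows the lock-protected version, which always produces the expected total.

```python
import threading


def count_with_lock(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            # The read-modify-write below is the critical section. Without
            # the lock, two threads could read the same old value and one
            # of the two updates would be lost (a race condition).
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

A classroom variation is to remove the `with lock:` line and ask students why the total may then come out smaller than `num_threads * increments`.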
Parallel Programming Paradigms (Selections)
By target machine model
Shared memory (Bloom classification A)
Distributed memory (C)
Client/server (C)
Hybrid (K) – e.g., CUDA for CPU/GPU
By control statements
Task/thread spawning (A)
Parallel Loop (C)
Slide26
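Task/thread spawning from the paradigms list, applied to the parallel merge sort named in the Example table, might look like the sketch below. It is an illustration under assumptions: the names `merge` and `parallel_merge_sort` and the `depth` cutoff are hypothetical, and in standard CPython builds the global interpreter lock means the threads show the fork-join structure rather than a speedup.

```python
import threading


def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def parallel_merge_sort(items, depth=2):
    """Sort items; spawn a thread for the left half while depth > 0."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    if depth <= 0:
        # Below the cutoff, fall back to ordinary sequential recursion.
        return merge(parallel_merge_sort(items[:mid], 0),
                     parallel_merge_sort(items[mid:], 0))
    result = {}

    def sort_left():
        result["left"] = parallel_merge_sort(items[:mid], depth - 1)

    task = threading.Thread(target=sort_left)
    task.start()                                    # fork: left half in a new thread
    right = parallel_merge_sort(items[mid:], depth - 1)
    task.join()                                     # join: wait for the left half
    return merge(result["left"], right)
```

The `depth` parameter bounds the number of spawned threads (at most 2^depth - 1), which is one way to discuss task granularity with students.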
How to Read the Proposal
Oh no! Not another class to squeeze into our curriculum!
Slide27
How to Read the Proposal
Oh yes! Not another class to squeeze into your curriculum!
Draft curriculum released Dec 2010 (tcpp.computer.org)
Slide29
Enduring skills?
Since the tool set is subject to change at any time, how much investment in those skills?
Many parallel languages and features have come and gone
Need hands-on experience for effective learning.
Anything may suddenly emerge as important
Python as a prototyping language for HPC
Slide30
A candidate for addition
Patterns of parallel programming
Slide31
Patterns, a candidate for addition
Background
“Gang of Four” book, 1994
Doug Lea, Concurrent Programming in Java: Design Principles and Patterns, 1999
Tim Mattson, et al., Patterns for Parallel Programming, 2005
Kurt Keutzer and Berkeley ParLab, the Dwarves (Motifs)
Keutzer and Mattson, OPL (parlab.eecs.berkeley.edu/wiki/patterns)
Slide33
Patterns, a candidate for addition
Why patterns?
They capture reusable units of expert problem-solving strategy
Thus, they provide novices with a way to acquire expertise
Many are supported by tools
Loop parallel, Message passing, Map-reduce, …
Slide34
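Of the patterns named on the “Why patterns?” slide, Map-reduce is easy to show in a few lines. The word-count sketch below is an illustration only: the names `mapper`, `reducer`, and `map_reduce` are hypothetical, and a thread pool stands in for the cluster that a real map-reduce system would use.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def reducer(word, counts):
    """Reduce phase: combine all values emitted for one key."""
    return word, sum(counts)


def map_reduce(lines, workers=4):
    """Count words in lines using map, shuffle, and reduce phases."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(mapper, lines))  # one map task per line
    groups = defaultdict(list)                  # shuffle: group values by key
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One reduce task per distinct word.
        return dict(pool.map(lambda kv: reducer(*kv), groups.items()))
```

Students only need to write `mapper` and `reducer`; the surrounding driver is the part that a map-reduce framework would normally supply.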
How to teach it
Agree with NSF/TCPP Initiative, that parallelism should be taught early and often
Scratch team kept concurrent scripts, because users “not surprised that a sprite can do several things at once”
Lessons of Vishkin’s “Peanut Butter Sandwich” exercise
Slide35
CSinParallel project
Add parallelism early and often at all levels
Incremental, flexible approach via modules
Sharing within our community
Slide36
CSinParallel project
Modular Approach
Short units (1-3 days)
Identified learning objectives
Self-contained
Flexible for use in various courses and curricula
Make software/libraries more accessible
Parallel Platform Packages, Resources
Share, discuss and help as a community
http://csinparallel.org
Slide37
CSinParallel project
Some selected module topics
Introductory:
Map-Reduce computing for CS1 using WebMapReduce
Concurrent access to data structures in Java or C++
Multicore programming with Intel’s Manycore Testing Lab
Intermediate:
Introduction to parallel computing concepts
Concurrency strategies in programming languages
Parallel sorting algorithms
Slide38
Module: WebMapReduce
Slide39
Module: MTL with OpenMP
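Intel’s Manycore Testing Lab Module
// integral appears only in the reduction clause: OpenMP does not allow a
// variable in more than one data-sharing clause of the same directive
#pragma omp parallel for num_threads(threadct) \
    shared(a, n, h) private(i) \
    reduction(+: integral)
Slide40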
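The reduction clause in the OpenMP pragma of the MTL module gives each thread a private partial sum and adds the partial sums together at the end. The sketch below prototypes that idea in Python, under the assumption (suggested by the variable names `a`, `n`, `h`, and `integral`, but not stated on the slide) that the loop is a trapezoidal-rule integration. The function name `trapezoid_integral` and the parameters `f` and `b` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def trapezoid_integral(f, a, b, n, threadct=4):
    """Approximate the integral of f over [a, b] with n trapezoids."""
    h = (b - a) / n

    def partial_sum(worker):
        # Each worker accumulates a private sum over its own share of the
        # interior points i = 1 .. n-1, taken in strides of threadct.
        total = 0.0
        for i in range(1 + worker, n, threadct):
            total += f(a + i * h)
        return total

    with ThreadPoolExecutor(max_workers=threadct) as pool:
        partials = list(pool.map(partial_sum, range(threadct)))
    # Combining the private sums is the step that reduction(+: ...) performs.
    integral = (f(a) + f(b)) / 2.0 + sum(partials)
    return integral * h
```

Because no worker ever writes to a shared accumulator, there is no race on the running total, which is the point of using a reduction rather than a shared variable.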
CSinParallel
We seek collaborators and contributors
Slide41
Patterns Methodology
Keutzer and Mattson OPL not only provides a catalog of patterns, but also a software problem-solving methodology
Purposes:
Education
Communication
Design
Slide43
How to get it taught
Pressures on the professor
“Oh no! Not another course to squeeze…”
So, take an incremental spiral approach (agreeing with NSF/TCPP)
Small changes in curriculum in many places
Revisit challenging issues
Students come to think of parallelism as natural part of computation
Spiral approach is pedagogically effective
Slide44
How to get it taught
Incentives
Microgrants: small (e.g., $1500) amounts for contributing first steps in teaching parallelism
Intel Academic Community (intel.com/AcademicCommunity)
Educational Alliance for a Parallel Future (eapf.org)
Slide45
NSF/TCPP Initiative
Early Adopter Program
Slide46
How to obtain Early Adopter Status?
16 Early adopters chosen for Spring term 2011
17 Early adopters chosen for Fall term 2011
Next round of competition: Fall 2012; Deadline November 5, 2011
NSF/Intel funded Stipend/Honorarium
Which course(s), topics, evaluation plan?
Slide47
How to obtain Early Adopter Status?
Instructors for
core CS/CS courses such as CS1/2, Systems, Data Structures and Algorithms – department-wide adoption preferred
elective courses such as Algorithms, Architecture, Programming Languages, Software Engineering, etc.
introductory/advanced PDC course
dept chairs, dept curriculum committee members responsible
Slide48
How to get it taught
Other supports needed
Platform availability
Support community
Educational elements
Learning objectives, assessment tools, etc.
Slide49
Surviving in the wild ecosystem
Industry/Academia
Learning curve/Rapid change
Principles/Practices
Teaching/Research
Mine from the heritage of the past
Incremental approach
Spiral exposition
Pattern-based methods
Slide51
An example (go-lang.org)
Mine from the heritage of the past
Hoare’s CSP; CCS
Pi Calculus [teach/research]
Incremental approach
Not far from C [academic/industry]
Spiral exposition
Midway steps towards explicit threads, message passing [Learning curve/rapid change]
Pattern-based methods
Message passing, Fork-join, channel as Parallel Queue [Principles/Practice]
Slide52
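The pattern-based items on the Go example slide (Message passing, Fork-join, channel as Parallel Queue) can be approximated outside Go as well. The sketch below uses Python’s `queue.Queue` as a stand-in for a channel. This is an analogy under assumptions, not Go’s channel semantics, and the function name `squares_via_channel` and the sentinel `DONE` are hypothetical.

```python
import queue
import threading


def squares_via_channel(values, num_workers=3):
    """Square each value in worker threads that communicate only by queues."""
    tasks = queue.Queue()    # plays the role of a channel carrying work items
    results = queue.Queue()  # a second channel carrying answers back
    DONE = object()          # sentinel message telling a worker to stop

    def worker():
        while True:
            item = tasks.get()  # receive a message (blocks until one arrives)
            if item is DONE:
                break
            results.put((item, item * item))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()               # fork
    for v in values:
        tasks.put(v)            # send one message per work item
    for _ in workers:
        tasks.put(DONE)         # one stop message per worker
    for w in workers:
        w.join()                # join: all results are queued after this
    out = {}
    while not results.empty():
        key, square = results.get()
        out[key] = square
    return out
```

The workers share no variables with each other; all coordination happens through the two queues, which is the message-passing discipline the slide attributes to CSP.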
Questions?