Slide1
Chapel: User-Defined Distributions
Brad Chamberlain, Cray Inc.
CSEP 524
May 20, 2010
Slide2
Chapel Distributions
Distributions: “Recipes for parallel, distributed arrays”
help the compiler map from the computation’s global view…
…down to the fragmented, per-processor implementation
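[Figure: a global-view vector operation (an assignment built from =, + and α·) shown together with four per-processor fragments of the same operation, each with its own MEMORY]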
Distributions
Data Parallelism
Task Parallelism
Locality Control
Target Machine
Base Language
Slide3
Domain Distribution
Domains may be distributed across locales
var D: domain(2) dmapped Block(CompGrid, …) = …;
A distribution defines…
…ownership of the domain’s indices (and its arrays’ elements)
…default work ownership for operations on the domains/arrays
e.g., forall loops or promoted operations
…memory layout/representation of array elements/domain indices
…implementation of operations on its domains and arrays
e.g., accessors, iterators, communication patterns, …
[Figure: domain D and arrays A and B distributed over CompGrid, a grid of locales L0 through L7]
Slide4
Domain Distributions
Any domain type may be distributed
Distributions do not affect program semantics
only implementation details and therefore performance
[Figure: examples of distributed domains; the surviving labels are the string indices “steve”, “lee”, “sung”, “david”, “jacob”, “albert”, “brad”]
Slide6
Distributions: Goals & Research
Advanced users can write their own distributions
specified in Chapel using lower-level language features
Chapel will provide a standard library of distributions
written using the same user-defined distribution mechanism
(Draft paper describing user-defined distribution strategy available by request)
Slide7
A Simple Distribution: Block1D
Intent: block a 1D index space across a set of locales
[Figure: a 1D index line (…, -1, 0, 1, …, 8, …) blocked across locales L0, L1, and L2]
Use a bounding box to compute the blocking
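As a rough illustration of the bounding-box idea, the following Python sketch models how a Block1D-style map might assign each index to a locale. It is not Chapel's implementation: the names `block_starts` and `block1d_owner` are hypothetical, and the rounding rule for block boundaries is an assumption, chosen because it yields the 3/2/3 split of 1..8 over three locales that appears in the descriptor figures later in this deck. Indices outside the bounding box are clamped, so they fall to the first or last locale.

```python
from bisect import bisect_right


def block_starts(low, high, num_locales):
    """First index of each locale's block within the bounding box low..high."""
    n = high - low + 1
    # Assumed rule: round i*n/num_locales to the nearest integer (ties round up),
    # done in integer arithmetic to avoid floating point.
    return [low + (2 * i * n + num_locales) // (2 * num_locales)
            for i in range(num_locales)]


def block1d_owner(idx, low, high, num_locales):
    """Locale number (0-based) that owns idx; idx may lie outside the box."""
    clamped = min(max(idx, low), high)  # out-of-box indices go to the edge blocks
    return bisect_right(block_starts(low, high, num_locales), clamped) - 1


if __name__ == "__main__":
    for idx in range(-1, 11):
        print(idx, "-> L%d" % block1d_owner(idx, 1, 8, 3))
```

With a bounding box of 1..8 on three locales, indices up to 3 map to locale 0, indices 4 and 5 to locale 1, and indices 6 and above to locale 2.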
Slide8
Distributions vs. Domains
Q1: Why distinguish between distributions and domains?
Q2: Why do distributions map an index space rather than a fixed index set?
A: To permit several domains to share a single distribution
amortizes the overheads of storing a distribution
supports trivial domain/array alignment and compiler optimizations
const D : … dmapped B1 = [1..8],
      outerD : … dmapped B1 = [0..9],
      innerD: subdomain(D) = [2..7],
      slideD: subdomain(D) = [4..6];
[Figure: the four domains laid out across locales L0, L1, and L2]
Sharing a distribution supports trivial alignment of these domains
Slide9
Distributions vs. Domains
Q1: Why distinguish between distributions and domains?
Q2: Why do distributions map an index space rather than a fixed index set?
A: To permit several domains to share a single distribution
amortizes the overheads of storing a distribution
supports trivial domain/array alignment and compiler optimizations
const D : … dmapped B1 = [1..8],
      outerD: … dmapped B2 = [0..9],
      innerD: … dmapped B3 = [2..7],
      slideD: … dmapped B4 = [4..6];
[Figure: the four domains laid out across locales L0, L1, and L2]
When each domain is given its own distribution, the compiler cannot reason about alignment of indices.
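The alignment argument can be checked with a small Python model. This is only a sketch under stated assumptions: `block_owner` is a hypothetical helper with an assumed rounding rule, there are three locales, and each of B2, B3 and B4 is assumed to use its own domain's bounds as its bounding box (the slide does not show their arguments). Under one shared map, every domain agrees on the owner of a given index; under separate maps, the owners can differ.

```python
from bisect import bisect_right

NUM_LOCALES = 3
# Domain bounds taken from the declarations on the slides.
DOMAINS = {"D": (1, 8), "outerD": (0, 9), "innerD": (2, 7), "slideD": (4, 6)}


def block_owner(idx, bbox, num_locales=NUM_LOCALES):
    """Locale owning idx under a 1D block map over bbox (assumed rounding rule)."""
    low, high = bbox
    n = high - low + 1
    starts = [low + (2 * i * n + num_locales) // (2 * num_locales)
              for i in range(num_locales)]
    clamped = min(max(idx, low), high)
    return bisect_right(starts, clamped) - 1


def owners(idx, bbox_for):
    """Owner of idx in every domain that contains it, given a bbox per domain."""
    return {name: block_owner(idx, bbox_for(name))
            for name, (lo, hi) in DOMAINS.items() if lo <= idx <= hi}


def shared(name):
    return (1, 8)          # every domain uses B1's bounding box


def separate(name):
    return DOMAINS[name]   # assumption: each map is bounded by its own domain


if __name__ == "__main__":
    for idx in (4, 5, 6):
        print(idx, "shared:", owners(idx, shared), "separate:", owners(idx, separate))
```

In this model, index 4 belongs to locale 1 in D but to locale 0 in slideD once each domain has its own map, so an assignment between arrays over the two domains would need communication that the shared-map version avoids.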
Slide10
The Block Distribution
The Block Distribution maps the indices of a domain in a dense fashion across the target Locales according to the boundingBox argument
const Dist = new dmap(new Block(boundingBox=[1..4, 1..8]));
var Dom: domain(2) dmapped Dist = [1..4, 1..8];
[Figure: Dom distributed over locales L0 through L7]
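To make the dense blocking concrete, here is a minimal Python model of a 2D block map. It assumes the eight locales form a 2×4 grid numbered in row-major order, which the slide does not state, and `block_owner_2d` is a hypothetical name rather than part of Chapel. Because both dimensions divide evenly here, a simple floor formula is enough.

```python
def block_owner_2d(idx, bbox, grid):
    """Locale id owning idx under a block map.

    idx  = (row, col)
    bbox = ((row_low, row_high), (col_low, col_high))
    grid = (locale_rows, locale_cols), assumed row-major numbering
    """
    coords = []
    for i, (lo, hi), parts in zip(idx, bbox, grid):
        n = hi - lo + 1
        clamped = min(max(i, lo), hi)            # indices outside the box go to the edge
        coords.append((clamped - lo) * parts // n)
    row, col = coords
    return row * grid[1] + col


if __name__ == "__main__":
    bbox = ((1, 4), (1, 8))
    for r in range(1, 5):
        print([block_owner_2d((r, c), bbox, (2, 4)) for c in range(1, 9)])
```

Each locale ends up with a contiguous 2×2 tile of the 4×8 index set.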
Slide11
The Cyclic Distribution
The Cyclic Distribution maps the indices of a domain in a round-robin fashion across the target Locales according to the startIdx argument
const Dist = new dmap(new Cyclic(startIdx=(1,1)));
var Dom: domain(2) dmapped Dist = [1..4, 1..8];
[Figure: Dom distributed over locales L0 through L7]
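The round-robin mapping can be modeled the same way. This Python sketch again assumes a 2×4 row-major locale grid, which is not given on the slide, and `cyclic_owner_2d` is a hypothetical name. In each dimension, the index's offset from startIdx is taken modulo the number of locales in that dimension.

```python
def cyclic_owner_2d(idx, start_idx, grid):
    """Locale id owning idx under a cyclic (round-robin) map.

    grid = (locale_rows, locale_cols), assumed row-major numbering.
    Python's % always returns a non-negative result for a positive modulus,
    so indices below start_idx wrap around correctly.
    """
    row = (idx[0] - start_idx[0]) % grid[0]
    col = (idx[1] - start_idx[1]) % grid[1]
    return row * grid[1] + col


if __name__ == "__main__":
    for r in range(1, 5):
        print([cyclic_owner_2d((r, c), (1, 1), (2, 4)) for c in range(1, 9)])
```

Neighboring indices land on different locales, in contrast to the contiguous tiles of the block map.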
Slide12
Domain Maps: Distributions and Layouts
Domain Map: The general concept that indicates how to implement a domain and its arrays
Two flavors:
layout: a domain map targeting a single locale
row-major order, column-major order, Morton order
hierarchically tiled, linked data structure, etc.
compressed sparse row, open hashing w/ quadratic probing
distribution: a domain map targeting multiple locales
block, cyclic, block-cyclic
recursive bisection, hashing across locales, graph partitioning
…
Slide13
Domain Map Framework: Layouts
three descriptors:
distribution
Responsibility: How to generate new domains
domain
Responsibility: How to store, iterate over domain indices
array
Responsibility: How to store, access, iterate over array elements
Slide14
Domain Map Framework: Distributions
global descriptors (one global instance or replicated per locale)
distribution
Responsibility: How to generate new domains and map indices to locales
domain
Responsibility: How to store, iterate over domain indices
array
Responsibility: How to store, access, iterate over array elements
local descriptors (one instance per locale)
distribution
Responsibility: How to store a single locale’s portion of the index space
domain
Responsibility: How to store, iterate over local domain indices
array
Responsibility: How to store, access, iterate over local array elements
Slide15
Domain Map Framework: Distributions
global descriptors (one global instance or replicated per locale)
distribution descriptor
state/type: target locale set, distribution args
methods: map index to locale, …
domain descriptor
state/type: global index info, index type
methods: index iteration, allocate array, …
array descriptor
state/type: element type
methods: global value iteration, random access, …
local descriptors (one instance per locale)
local distribution descriptor
local domain descriptor
state/type: store local indices
methods: local index iteration, add new indices, …
local array descriptor
state/type: store local values
methods: local value iteration, local random access, …
Slide16
1D Block Distribution Classes
code:
const B1 = new dmap(new Block(bbox=[1..8]));
const D: domain(1) dmapped B1 = [1..8];
var A, B: [D] real;
global descriptors
distribution: boundingBox = 1..8, targetLocales = L0 L1 L2 (LocaleSpace = [0..2])
domain: whole = 1..8
local descriptors (one myElems per array, A and B)
L0: distribution myChunk = min(idxType)..3; domain myBlock = 1..3; array myElems = 1..3
L1: distribution myChunk = 4..5; domain myBlock = 4..5; array myElems = 4..5
L2: distribution myChunk = 6..max(idxType); domain myBlock = 6..8; array myElems = 6..8
Slide17
1D Block Distribution Classes
code:
const B1 = new dmap(new Block(bbox=[1..8]));
const sliceD: domain(1) dmapped B1 = [4..6];
var A2, B2: [sliceD] real;
global descriptors
distribution: boundingBox = 1..8, targetLocales = L0 L1 L2 (LocaleSpace = [0..2])
domain: whole = 4..6
local descriptors (one myElems per array, A2 and B2)
L0: distribution myChunk = min(idxType)..3; domain myBlock = 0..-1 (empty); array myElems = (empty)
L1: distribution myChunk = 4..5; domain myBlock = 4..5; array myElems = 4..5
L2: distribution myChunk = 6..max(idxType); domain myBlock = 6..6; array myElems = 6
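The descriptor values in the two figures follow from one relationship: each locale's myBlock is the intersection of its myChunk with the domain's whole index set. The Python sketch below models that relationship only. The names `my_chunks` and `my_block` are hypothetical, infinities stand in for min(idxType) and max(idxType), and the boundary rounding rule is an assumption picked to match the chunks shown in the figures, not taken from the Block implementation.

```python
import math


def my_chunks(bbox_low, bbox_high, num_locales):
    """Per-locale (low, high) chunk of the whole index space.

    The first and last chunks are unbounded, standing in for
    min(idxType) and max(idxType) in the figures.
    """
    n = bbox_high - bbox_low + 1
    # Assumed rule: block boundaries at round(i*n/num_locales), ties rounding up.
    starts = [bbox_low + (2 * i * n + num_locales) // (2 * num_locales)
              for i in range(num_locales)]
    chunks = []
    for i in range(num_locales):
        lo = -math.inf if i == 0 else starts[i]
        hi = math.inf if i == num_locales - 1 else starts[i + 1] - 1
        chunks.append((lo, hi))
    return chunks


def my_block(chunk, whole):
    """Intersection of a locale's chunk with the domain's index set, or None if empty."""
    lo = max(chunk[0], whole[0])
    hi = min(chunk[1], whole[1])
    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    chunks = my_chunks(1, 8, 3)
    for whole in ((1, 8), (4, 6)):
        print("whole =", whole, [my_block(c, whole) for c in chunks])
```

For whole = 1..8 the model yields blocks 1..3, 4..5 and 6..8. For whole = 4..6 it yields an empty block on L0, then 4..5 and 6..6, while the chunks themselves stay the same because they belong to the shared distribution.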