Slide1
Jialin Liu, Bradly Crysler, Yin Lu, Yong Chen
Oct. 07, 2013
Data-Intensive Scalable Computing Laboratory (DISCL)
Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Slide2
Outline
Introduction
Motivation
HiLa: High-Level I/O Aggregation
Evaluation
Conclusion and Future Work
Slide3
Introduction
Scientific simulations now generate a few terabytes (TB) of data in a single run, and data sizes are expected to reach petabytes (PB) in the near future.
GCRM: 100 million columns, 128 levels per column, 50 km
Accessing and analyzing the data reveals poor I/O performance due to the mismatch between logical access and physical layout.
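Slide4
Introduction
Scientific Datasets and Scientific I/O Libraries
Libraries: PnetCDF, HDF5, ADIOS
I/O stack (top to bottom): PnetCDF, MPI-IO, Parallel File Systems
Scientific I/O libraries allow users to specify input as array-based logical regions.
These logical regions do not match the physical layout of the data on storage (logical-physical mismatch).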
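To make the logical-physical mismatch concrete, here is a minimal sketch that lists the byte ranges one logical sub-array occupies in a file. It assumes a plain row-major (C-order) layout with no header. The function name `contiguous_runs` and its parameters are hypothetical and not part of PnetCDF, HDF5, or ADIOS.

```python
from itertools import product


def contiguous_runs(shape, start, length, itemsize=1):
    """Return (offset, nbytes) runs covered by a sub-array.

    Assumption: the array is stored row-major (C order) with no header,
    so only the last dimension is contiguous in the file.
    """
    # Byte stride of each dimension in a row-major layout.
    strides = [itemsize] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]

    runs = []
    # Iterate over every index of the outer dimensions; each one yields
    # a single contiguous run along the last dimension.
    outer = [range(s, s + n) for s, n in zip(start[:-1], length[:-1])]
    for idx in product(*outer):
        offset = sum(i * st for i, st in zip(idx, strides[:-1]))
        offset += start[-1] * strides[-1]
        runs.append((offset, length[-1] * itemsize))
    return runs


# A 2 x 3 region of a 4 x 6 array becomes two separate 3-byte runs.
runs = contiguous_runs(shape=(4, 6), start=(1, 2), length=(2, 3))
print(runs)
```

One logical request therefore turns into many small, non-contiguous physical accesses, and the number of runs grows with the product of the outer dimension lengths.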
Slide5
Motivation
I/O methods in scientific I/O libraries (PnetCDF, ADIOS, HDF5):
Independent I/O: process collaboration: No; call collaboration: No
Collective I/O: process collaboration: Yes; call collaboration: No
Nonblocking I/O: process collaboration: Yes; call collaboration: Yes
Slide6
Motivation
Contention on Storage Servers without Locality Awareness
Figure: calls 0, 1, ..., i each perform two-phase collective I/O through their own set of aggregators (ag00 to ag03, ag10 to ag13, ..., agi0 to agi3).
Slide7
Performance with Overlapping Calls
Conclusion: Overlapping should be removed
Slide8
Idea: High-level I/O Aggregation
Logical Input
Call 0: start{0,0,0}, length{100,200,200}
Call 1: start{10,20,100}, length{10,300,400}
Decomposition
start{0,0,0}, length{100,200,100}
start{0,0,100}, length{100,200,100}
start{10,20,100}, length{10,150,400}
start{10,170,100}, length{10,150,400}
Physical Layout (figure labels, in order): sub 0, sub 2, sub 0, sub 2, sub 1, sub 3, sub 1, sub 3
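The start/length values on this slide can be reproduced by cutting each call into two equal parts along one dimension. The sketch below does only that. The helper `split_request` is hypothetical, and the split axis is chosen by hand here to match the slide, whereas HiLa derives the decomposition from the physical layout.

```python
def split_request(start, length, axis, parts):
    """Cut one (start, length) request into equal parts along one axis.

    Assumption: length[axis] is divisible by parts. The axis is supplied
    by the caller; this sketch does not consult the physical layout.
    """
    if length[axis] % parts != 0:
        raise ValueError("length along axis must be divisible by parts")
    step = length[axis] // parts
    subs = []
    for k in range(parts):
        s = list(start)
        n = list(length)
        s[axis] = start[axis] + k * step  # shift the start of part k
        n[axis] = step                    # each part has equal extent
        subs.append((tuple(s), tuple(n)))
    return subs


# Call 0 is cut along the last dimension, Call 1 along the middle one.
call0 = split_request((0, 0, 0), (100, 200, 200), axis=2, parts=2)
call1 = split_request((10, 20, 100), (10, 300, 400), axis=1, parts=2)
for start, length in call0 + call1:
    print("start", start, "length", length)
```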
Slide9
Idea: High-level I/O Aggregation
Basic Idea
Identify the overlap among requests
Eliminate the overlap before doing I/O
Challenges
How to decompose the requests
How to aggregate the sub-arrays at a high level
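As an illustration of the basic idea, the sketch below finds the overlap between two array requests and removes it from one of them, so the shared region is read only once. It treats each request as a (start, length) box. The names `intersect`, `subtract`, and `volume` are hypothetical, and this is a generic box-subtraction technique, not the decomposition used by HiLa.

```python
def intersect(a, b):
    """Return the overlapping (start, length) box of a and b, or None."""
    start = tuple(max(sa, sb) for sa, sb in zip(a[0], b[0]))
    end = tuple(min(sa + la, sb + lb)
                for sa, la, sb, lb in zip(a[0], a[1], b[0], b[1]))
    if any(e <= s for s, e in zip(start, end)):
        return None
    return start, tuple(e - s for s, e in zip(start, end))


def subtract(a, b):
    """Return disjoint boxes covering the part of a that b does not cover."""
    ov = intersect(a, b)
    if ov is None:
        return [a]
    pieces = []
    start, length = list(a[0]), list(a[1])
    for d in range(len(start)):
        lo, hi = ov[0][d], ov[0][d] + ov[1][d]
        if start[d] < lo:                       # slab below the overlap
            n = list(length)
            n[d] = lo - start[d]
            pieces.append((tuple(start), tuple(n)))
        if hi < start[d] + length[d]:           # slab above the overlap
            s, n = list(start), list(length)
            s[d] = hi
            n[d] = start[d] + length[d] - hi
            pieces.append((tuple(s), tuple(n)))
        # Restrict dimension d to the overlap before moving on.
        start[d], length[d] = lo, hi - lo
    return pieces


def volume(box):
    """Number of elements in a (start, length) box."""
    v = 1
    for n in box[1]:
        v *= n
    return v


call0 = ((0, 0, 0), (100, 200, 200))
call1 = ((10, 20, 100), (10, 300, 400))
print(intersect(call0, call1))
print(subtract(call1, call0))
```

Applied to the two calls from the previous slide, the overlap is a 10 x 180 x 100 region, and Call 1 shrinks to two boxes that no longer touch Call 0.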
Slide10
HiLa: High Level I/O Aggregation
Way to figure out the physical layout
Sub-correlation Function
Sub-correlation Set
Lustre striping: stripe size: s; stripe count: l
Dataset: dimension: d; subset size: m
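The slide does not give the sub-correlation function itself, so the sketch below covers only the striping part. It assumes the usual round-robin (RAID-0 style) placement, where stripe k of a file lands on server k mod l. The function names are hypothetical, and server indices are relative to the file's own layout rather than global Lustre OST numbers.

```python
def server_of_offset(offset, s, l):
    """Server index holding a byte offset, for stripe size s and count l.

    Assumption: round-robin striping, stripe k is stored on server k % l.
    """
    return (offset // s) % l


def servers_of_range(offset, nbytes, s, l):
    """Sorted server indices touched by the byte range [offset, offset+nbytes)."""
    first = offset // s
    last = (offset + nbytes - 1) // s
    # After l consecutive stripes every server has been visited once.
    last = min(last, first + l - 1)
    return sorted({k % l for k in range(first, last + 1)})


MIB = 1 << 20
print(server_of_offset(5 * MIB, s=MIB, l=4))
print(servers_of_range(MIB // 2, MIB, s=MIB, l=4))
```

With such a mapping, two sub-arrays can be said to share locality when their byte ranges resolve to the same server indices.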
Slide11
HiLa Algorithm: Prior Step
Prior Step: calculate the sub-correlation set (one-time analysis)
Slide12
HiLa Algorithm: Decomposition
Main Steps: Request Decomposition and Aggregation
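The algorithm details are not reproduced in this transcript, so the following is only a simplified one-dimensional illustration of the two main steps. It assumes round-robin striping with stripe size `s` and stripe count `l`, and works on byte ranges rather than sub-arrays. The functions `decompose` and `aggregate` are hypothetical stand-ins, not the HiLa implementation.

```python
from collections import defaultdict


def decompose(offset, nbytes, s):
    """Split the byte range [offset, offset+nbytes) at stripe boundaries."""
    pieces = []
    end = offset + nbytes
    while offset < end:
        stop = min(end, (offset // s + 1) * s)  # next stripe boundary
        pieces.append((offset, stop - offset))
        offset = stop
    return pieces


def aggregate(requests, s, l):
    """Group pieces by server and merge overlapping pieces per server.

    Returns {server index: [(offset, nbytes), ...]} with no byte listed twice.
    """
    per_server = defaultdict(list)
    for offset, nbytes in requests:
        for off, n in decompose(offset, nbytes, s):
            per_server[(off // s) % l].append((off, off + n))

    merged = {}
    for server, intervals in per_server.items():
        intervals.sort()
        out = [list(intervals[0])]
        for lo, hi in intervals[1:]:
            if lo <= out[-1][1]:                 # overlaps or touches
                out[-1][1] = max(out[-1][1], hi)
            else:
                out.append([lo, hi])
        merged[server] = [(lo, hi - lo) for lo, hi in out]
    return merged


# Two overlapping calls: 450 bytes requested, 350 distinct bytes read.
plan = aggregate([(0, 250), (150, 200)], s=100, l=2)
print(plan)
```

Each server then receives one list of non-overlapping ranges, which is the effect the decomposition and aggregation steps aim for.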
Slide13
Improvement with HiLa
Performance Improved with HiLa
Slide14
Improvement with HiLa
FASM Improved with HiLa
Slide15
Conclusion and Future Work
Conclusion
The mismatch between logical access and physical layout can lead to poor performance.
We propose a locality-driven high-level aggregation approach (HiLa) that assists existing I/O methods by eliminating the overlap among sub-array requests.
Future Work
Apply to write operations
Integrate with file systems
Slide16
Locality-driven High-level I/O Aggregation for Processing Scientific Datasets
Thanks
Q&A
http://discl.cs.ttu.edu