Slide1
Improving long-term preservation of EOS data by independently mapping HDF4 data objects
Mike Folk, Ruth Aydt, Joe Lee, Binh-Minh Ribler, Kent Yang, Ruth Duerr, Christopher Lynnes
The 14th HDF and HDF-EOS Workshop
September 28-30, 2010
HDF/HDF-EOS Workshop XIV
Slide2
Mapping project team members
The HDF Group: Ruth Aydt, Peter Cao, Mike Folk, Joe Lee, Elena Pourmal, Tong Qi, Binh-Minh Ribler, Eunsoo Seo, Veer Singh, Muqun (Kent) Yang
NASA: Ruth Duerr (NSIDC), Chris Lynnes (GES-DISC)
Slide3
HDF4 files are complex
Slide4
How do HDF users avoid having to deal with all of that complexity?
Slide5
Through the HDF software libraries, either by using HDF APIs directly or by using HDF tools that depend on the HDF libraries.
But what about the future…
Slide6
Over the long term, there is a risk in depending solely on HDF software to access HDF-formatted data. It is possible that, in the distant future, the software may not be available.
Slide7
“If only we could read HDF data with an independent program that does not rely on the HDF API…
A possible approach [would be to create] a map of a data file, [and] utilities to find, assemble and write out SDSes and vdatas.”
“Leveraging HDF Utilities,” Christopher Lynnes, HDF Workshop X
Slide8
User’s view of the HDF4 SD model
Slide9
Mapping SDS to file offset/length
HDF4 file layout
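A minimal Python sketch of the idea on this slide: once a map records the byte offset and length of an SDS, plain file I/O is enough to pull the raw values out. The offset, length, data type (big-endian int16), and the stand-in file are all assumptions made for illustration; nothing here reads a real HDF4 file or follows the project's map format.

```python
import os
import struct
import tempfile


def read_raw(path, offset, length):
    """Return `length` bytes starting at byte `offset` of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    if len(data) != length:
        raise ValueError("map entry points past the end of the file")
    return data


def decode_int16_be(raw):
    """Interpret raw bytes as big-endian 16-bit signed integers."""
    count = len(raw) // 2
    return list(struct.unpack(">%dh" % count, raw))


# Stand-in file: 8 bytes of unrelated content, then four int16 values.
values = [10, -20, 30, -40]
payload = struct.pack(">4h", *values)
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 8 + payload)
    demo_path = tmp.name

# Hypothetical map entry for the array: where it starts and how long it is.
entry = {"offset": 8, "length": 8}
decoded = decode_int16_be(read_raw(demo_path, entry["offset"], entry["length"]))
os.remove(demo_path)
```

An array stored in several pieces would need one such read per offset/length pair, with the results concatenated in order.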
Slide10
Mapping with compressed chunks
HDF4 file layout
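When the data is chunked and compressed, a reader has to inflate each chunk and place it in the right part of the array. The sketch below assumes deflate compression (handled with Python's zlib module), a 4x4 array of big-endian int32 values, and 2x2 chunks. The chunk map keys ("row", "col", "offset", "length") are hypothetical names, and the byte string stands in for a file.

```python
import struct
import zlib

ROWS, COLS = 4, 4          # full array shape (illustrative)
CHUNK_R, CHUNK_C = 2, 2    # chunk shape (illustrative)


def build_demo_file():
    """Build a stand-in byte string plus a map of its compressed chunks."""
    blob = b"HDR!"  # unrelated leading bytes
    chunk_map = []
    for cr in range(0, ROWS, CHUNK_R):
        for cc in range(0, COLS, CHUNK_C):
            cells = [r * COLS + c
                     for r in range(cr, cr + CHUNK_R)
                     for c in range(cc, cc + CHUNK_C)]
            packed = zlib.compress(struct.pack(">4i", *cells))
            chunk_map.append({"row": cr, "col": cc,
                              "offset": len(blob), "length": len(packed)})
            blob += packed
    return blob, chunk_map


def assemble(blob, chunk_map):
    """Inflate each chunk and copy it to its position in the full array."""
    array = [[None] * COLS for _ in range(ROWS)]
    for entry in chunk_map:
        start = entry["offset"]
        raw = zlib.decompress(blob[start:start + entry["length"]])
        cells = struct.unpack(">%di" % (CHUNK_R * CHUNK_C), raw)
        for i, value in enumerate(cells):
            r = entry["row"] + i // CHUNK_C
            c = entry["col"] + i % CHUNK_C
            array[r][c] = value
    return array


blob, chunk_map = build_demo_file()
array = assemble(blob, chunk_map)
```

The compressed chunks have different lengths, which is why each one needs its own offset/length pair in the map.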
Slide11
Recap
Problem: The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software.
Solution: Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.
Slide12
HDF4 mapping workflow
[Workflow diagram]
HDF4 File -> hmap (linked with HDF4 library) -> HDF4 Mapping File (XML document)
HDF4 File + HDF4 Mapping File -> Reader program -> Object Data
Diagram labels: Groups, Data Objects, Structural and Application Metadata; Locations of Object Data
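The reader step of this workflow can be sketched in Python using only the standard library. The XML below is a made-up stand-in: the element and attribute names (Map, Array, byteStream, offset, nBytes) are assumptions chosen for illustration and should not be read as the project's schema. The in-memory byte stream likewise stands in for the binary file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical map document; the element and attribute names are
# placeholders, not the project's actual schema.
MAP_XML = """
<Map>
  <Array name="temperature">
    <byteStream offset="4" nBytes="3"/>
    <byteStream offset="10" nBytes="2"/>
  </Array>
</Map>
"""

# Stand-in for the binary file the map describes.
binary = io.BytesIO(b"....abc...de....")


def read_array_bytes(map_text, stream, name):
    """Concatenate the byte ranges listed for the named array."""
    root = ET.fromstring(map_text)
    for array in root.iter("Array"):
        if array.get("name") == name:
            parts = []
            for bs in array.iter("byteStream"):
                stream.seek(int(bs.get("offset")))
                parts.append(stream.read(int(bs.get("nBytes"))))
            return b"".join(parts)
    raise KeyError(name)


raw = read_array_bytes(MAP_XML, binary, "temperature")
```

The point of the sketch is that the reader side needs only an XML parser and seek/read calls, with no dependence on the HDF4 library.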
Slide13
Target User
Person 20+ years in the future
Interested in data stored in HDF4 file
Has HDF4 file and companion map file
Can “write a program”
May not have:
  HDF4 data model, format, documentation, or software
  Mapping schema, documentation, or software
Will have knowledge of:
  Basic XML
  Data representations used today
  Compression used by HDF4 (JPEG, Szip, etc.)
Slide14
Project Phases
Phase 1
  Categorize HDF4 data held by NASA.
  Build a prototype
    XML layout representation
    Tool to create XML map file for given HDF4 file
    Tools to read HDF4 data based solely on map files
Phase 2
  Build a robust version
  Deploy
Slide15
How many HDF4 products?
Data Center    HDF4 Products
ASF            0
GES-DISC       236
GHRC           54
ASDC           63
LP-DAAC        67
NSIDC          47
ORNL-DAAC      2
PO.DAAC        22
SDAC           0
MrDC           95
Total          586
Slide16
Data characteristics
Product Identification
  Product Name
  Data Level
  Archive Location
For HDF-EOS products
  HDF-EOS version
  For swath data
    Number of swaths
    Maximum number of dimensions
    Organized by time, space, both, or other
  Etc.
For SDS data
  Number of SDSs
  Max number of dimensions
  Did any SDS have attributes
  Was any SDS annotated
  Were dimension scales used
  Was compression used and if so what kind
  Was chunking used
For Vdata
  Number of Vdata structures
  Did any have attributes
  Did any fields have attributes
  Etc.
Product Characteristics Examined
Slide17
Phase 2 tasks
Investigate integration of mapping schema with existing standards
Determine HDF-EOS2 requirements
Redesign and expand the XML schema
Implement production-quality map writer
Develop demo map reader
Deploy tools at select NASA data centers
Slide18
Task A: Investigate integration of mapping schema with existing standards
Slide19
Investigate existing standards
Investigated: METS, PREMIS, ESML, NcML, and CSML
Concluded:
  Existing standards have different purposes than mapping schema
  None meet all needs of mapping project
  Develop new schema tailored to project goals
  Harmonize with PREMIS
  Leverage terminology and approaches from all
Slide20
Task B: Determine HDF-EOS2 requirements
Slide21
Categorize HDF-EOS2 data products
Created a data pool from NASA data centers
  GES DISC, NSIDC, LAADS, LP DAAC
  LaRC, PO.DAAC, GHRC, OBPG, LAADS
Detailed description of sample data
Reported options for adding HDF-EOS2 contents to the mapping file
Documents and reports at wiki: http://wiki.hdfgroup.org/MappingPhase2_TaskB
Slide22
Task C: Redesign Schema
Slide23
Design priorities
Mapping files
  Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files
  Have enough information to stand on their own
  Be as simple as possible
Mapping schema
  Describe the Mapping files
  Used for validation and documentation
  May not be available to target user
Slide24
Representation of HDF4 Objects
HDF4 User-Level Object    Mapping File XML Element
Attribute, Annotation     Attribute
Vgroup                    Group
Vdata                     Table
SDS                       Array
Dimension                 Dimension
Raster Image              Not yet done
Palette                   Not yet done
Slide25
Mapping File – Group & Table (fragment)
Represents HDF4 Objects and Relationships
Information needed to access and interpret raw data in HDF4 file
Select raw data values included to help user verify binary data handled properly
AMSR_E_L2_Land_V09_200501180027_D
Slide26
Status and Plans
Status
  Map file design stabilizing for most HDF4 objects
Plans
  Complete design for Raster Images and Palettes
  Continue to refine instructions and contents
  Finalize schema
Slide27
Task D: Implement Writer
Slide28
Map Writer Requirements
Retrieve information needed from HDF4 file
Write out corresponding XML file
Quality requirements
  Completeness – don’t miss any objects in file.
  Accuracy – don’t give wrong information.
Slide29
Writer Status and Plan
Status
  Covers most Vgroup/Vdata/SDS objects.
  Covers some GR/Annotation objects.
  Being tested with NASA data.
Plans: Increase coverage / accuracy / reliability.
Slide30
Task E: Implement demo reader
Slide31
Demo Reader Requirements
Multiplatform command line tool
Easy to use: clear arguments and output
Must validate that objects in the mapping file are actually in the HDF4 file
Developed in a well-supported high-level language (Python)
Well documented
Available as open source
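One way the validation requirement could be approached is sketched below in Python. It is an assumption about how such a check might look, not a description of the demo reader: it tests that a map entry's byte range lies inside the file and that spot-check values recorded with the entry match what is decoded. The entry keys ("offset", "length", "check_values") and the big-endian int16 data type are hypothetical.

```python
import struct


def validate_entry(file_bytes, entry):
    """Return a list of problems found for one hypothetical map entry."""
    problems = []
    start, length = entry["offset"], entry["length"]
    if start < 0 or start + length > len(file_bytes):
        problems.append("byte range falls outside the file")
        return problems
    raw = file_bytes[start:start + length]
    values = struct.unpack(">%dh" % (length // 2), raw)
    # Compare decoded values with the spot-check values kept in the map.
    for index, expected in entry["check_values"].items():
        if values[index] != expected:
            problems.append("value %d is %d, map says %d"
                            % (index, values[index], expected))
    return problems


# Stand-in file contents: 4 unrelated bytes, then three int16 values.
file_bytes = b"\x00" * 4 + struct.pack(">3h", 7, 8, 9)
good = {"offset": 4, "length": 6, "check_values": {0: 7, 2: 9}}
bad_range = {"offset": 8, "length": 6, "check_values": {}}
bad_value = {"offset": 4, "length": 6, "check_values": {1: 80}}

results = [validate_entry(file_bytes, e) for e in (good, bad_range, bad_value)]
```

Returning a list of problems, rather than raising on the first one, lets a command line tool report every mismatch in a single run.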
Slide32
Demo Reader Status
Status
  Only Vdata support provided so far
  Current source code available at https://sourceforge.net/projects/pyhdf
  Documentation at http://pyhdf.sourceforge.net/
Plans
  SDS and RIS support
Slide33
Task G: Deploy
Slide34
Deploy
Begin in Jan 2011, complete in April
Activities:
  GES DISC
    Incorporate into the existing archive ingest system
    Manage the retrofit into existing metadata files
  NSIDC
    Support implementation in NSIDC’s ECS system
  Other ESDCs
    Encouraged to join in
    But deployment to other centers expected subsequent to the project.
Slide35
Thank You!
Slide36
Acknowledgements
This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.
Slide37
Questions/comments?
Slide38
Slide39
Extra slides
Slide40
Mapping File – Array with Attribute (fragment)
Represents HDF4 Objects and Relationships
Information needed to access and interpret raw data in HDF4 file
Select raw data values included to help user verify binary data handled properly; “corners” + 5 random
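The "corners + 5 random" selection can be illustrated with a short Python sketch. The function name, the fixed seed, and the choice to allow random picks that coincide with corners are assumptions for illustration; the slides do not say how the map writer chooses its random positions.

```python
import itertools
import random


def verification_indices(shape, n_random=5, seed=0):
    """Return the corner indices of an array plus a few random ones."""
    corners = list(itertools.product(*[(0, n - 1) for n in shape]))
    # Drop duplicates that appear when a dimension has length 1.
    corners = sorted(set(corners))
    rng = random.Random(seed)
    extras = [tuple(rng.randrange(n) for n in shape) for _ in range(n_random)]
    return corners, extras


corners, extras = verification_indices((3, 4))
```

For a 3x4 array the corners are (0, 0), (0, 3), (2, 0), and (2, 3). A reader that decodes the array can compare its values at these positions with the values stored in the map.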
AIRS.2002.08.31.L3.RetStd_H001.v5.0.14.0.G07178195754
Slide41
Mapping of Array with complex storage
Select values included in map file for verification
Test file created for project
Compression
Chunks with Ghost Cells
Raw data in HDF4 file; first chunk’s data is not contiguous
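A small Python sketch of how a reader might handle ghost cells, assuming the term refers to the padding cells in edge chunks that fall outside the array when the chunk shape does not divide the array shape evenly. The 3x3 array, 2x2 chunks, and the filler value -1 are illustrative choices, and the chunks are built in memory rather than read from a file.

```python
ROWS, COLS = 3, 3          # logical array shape (illustrative)
CHUNK_R, CHUNK_C = 2, 2    # chunk shape; does not divide the array evenly
GHOST = -1                 # filler value used here for ghost cells


def make_chunks():
    """Build row-major chunks, padding cells outside the array with GHOST."""
    chunks = {}
    for cr in range(0, ROWS, CHUNK_R):
        for cc in range(0, COLS, CHUNK_C):
            cells = []
            for r in range(cr, cr + CHUNK_R):
                for c in range(cc, cc + CHUNK_C):
                    inside = r < ROWS and c < COLS
                    cells.append(r * COLS + c if inside else GHOST)
            chunks[(cr, cc)] = cells
    return chunks


def assemble(chunks):
    """Copy chunk cells into the array, skipping ghost cells."""
    array = [[None] * COLS for _ in range(ROWS)]
    for (cr, cc), cells in chunks.items():
        for i, value in enumerate(cells):
            r, c = cr + i // CHUNK_C, cc + i % CHUNK_C
            if r < ROWS and c < COLS:   # positions past the edge are ghosts
                array[r][c] = value
    return array


chunks = make_chunks()
array = assemble(chunks)
```

The reader decides which cells to keep from the array and chunk dimensions alone, so it does not need to know what filler value the writer used.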