Slide1

Improving long-term preservation of EOS data by independently mapping HDF4 data objects

Mike Folk, Ruth Aydt, Joe Lee, Binh-Minh Ribler, Kent Yang
Ruth Duerr, Christopher Lynnes
The 14th HDF and HDF-EOS Workshop
September 28-30, 2010

September 28-30, 2010

HDF/HDF-EOS Workshop XIV

Slide2

Mapping project team members

The HDF Group
  Ruth Aydt
  Peter Cao
  Mike Folk
  Joe Lee
  Elena Pourmal
  Tong Qi
  Binh-Minh Ribler
  Eunsoo Seo
  Veer Singh
  Muqun (Kent) Yang

NASA
  Ruth Duerr (NSIDC)
  Chris Lynnes (GES-DISC)

Slide3

HDF4 files are complex

Slide4

How do HDF users avoid having to deal with all of that complexity?

Slide5

Through the HDF software libraries, either by using HDF APIs directly, or by using HDF tools that depend on the HDF libraries.

But what about the future…

Slide6

Over the long term, there is a risk in depending solely on HDF software to access HDF-formatted data. It is possible that, in the distant future, the software may not be available.

Slide7

“If only we could read HDF data with an independent program that does not rely on the HDF API…

A possible approach [would be to create] a map of a data file, [and] utilities to find, assemble and write out SDSes and vdatas.”

“Leveraging HDF Utilities,” Christopher Lynnes, HDF Workshop X.

Slide8

User’s view of the HDF4 SD model

Slide9

Mapping SDS to file offset/length

HDF4 file layout
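
The idea behind this slide can be sketched in a few lines of Python: given an offset, a byte count, and an element type taken from a map, a reader seeks into the file and decodes the bytes without any HDF library. This is only an illustration under stated assumptions. The function names, the parameters, and the big-endian 32-bit integer layout are hypothetical, and the demo file is a stand-in rather than a real HDF4 file.

```python
import os
import struct
import tempfile


def read_array(path, offset, n_bytes, fmt_char="i", big_endian=True):
    """Read n_bytes starting at offset and decode them as a flat list.

    fmt_char is a struct format character (assumed here: "i" = 32-bit int).
    """
    item_size = struct.calcsize("=" + fmt_char)
    if n_bytes % item_size != 0:
        raise ValueError("byte count is not a multiple of the element size")
    with open(path, "rb") as f:
        f.seek(offset)
        raw = f.read(n_bytes)
    if len(raw) != n_bytes:
        raise ValueError("file ends before offset + length")
    count = n_bytes // item_size
    prefix = ">" if big_endian else "<"
    return list(struct.unpack(f"{prefix}{count}{fmt_char}", raw))


def make_demo_file():
    """Write a stand-in file: 10 bytes of unrelated content, then 4 integers."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * 10)
        f.write(struct.pack(">4i", 1, 2, 3, 4))
    return path


demo_path = make_demo_file()
print(read_array(demo_path, offset=10, n_bytes=16))
os.remove(demo_path)
```

Everything the reader needs (where the bytes start, how many there are, how to interpret them) is passed in explicitly, which is the kind of information a map would have to carry for each object.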

Slide10

Mapping with compressed chunks

HDF4 file layout
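
For a chunked, compressed array, a map would need one offset/length pair per chunk, plus the chunk's position in the array. The following is a rough sketch of how a reader might reassemble such an array. It assumes each chunk was compressed independently with zlib-compatible deflate, that the array is 2-D, and that elements are big-endian 32-bit integers. All names and the in-memory demo data are hypothetical.

```python
import struct
import zlib


def read_chunked_2d(data, shape, chunk_shape, chunks):
    """Rebuild a 2-D array of big-endian int32 values from compressed chunks.

    data:        the file contents as bytes (a real reader would seek instead)
    shape:       (rows, cols) of the full array
    chunk_shape: (rows, cols) of every chunk
    chunks:      list of (row0, col0, offset, n_bytes), one per chunk
    """
    rows, cols = shape
    crows, ccols = chunk_shape
    out = [[0] * cols for _ in range(rows)]
    for row0, col0, offset, n_bytes in chunks:
        raw = zlib.decompress(data[offset:offset + n_bytes])
        values = struct.unpack(f">{crows * ccols}i", raw)
        for r in range(crows):
            for c in range(ccols):
                out[row0 + r][col0 + c] = values[r * ccols + c]
    return out


def build_demo():
    """Create an in-memory stand-in holding a 2x4 array in two 2x2 chunks."""
    left = zlib.compress(struct.pack(">4i", 1, 2, 5, 6))
    right = zlib.compress(struct.pack(">4i", 3, 4, 7, 8))
    header = b"HDR!"  # unrelated bytes in front of the chunks
    data = header + left + right
    chunks = [
        (0, 0, len(header), len(left)),
        (0, 2, len(header) + len(left), len(right)),
    ]
    return data, chunks


demo_data, demo_chunks = build_demo()
print(read_chunked_2d(demo_data, (2, 4), (2, 2), demo_chunks))
```

The sketch assumes the array dimensions are exact multiples of the chunk dimensions; edge chunks that extend past the array would need extra handling.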

Slide11

Recap

Problem
The complex byte layout of HDF files makes long-term readability of HDF data dependent on long-term availability of HDF software.

Solution
Create a map of the layout of data objects in an HDF file, allowing a simple reader to be written to access the data.

Slide12

HDF4 mapping workflow

[Workflow diagram: HDF4 File; hmap, linked with HDF4 library; HDF4 Mapping File (XML document) containing Groups, Data Objects, Structural and Application Metadata, and Locations of Object Data; Reader program; Object Data]
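
The reader side of this workflow might look roughly like the sketch below: parse the XML map, look up where each object's bytes live, then read them from the binary file. The XML vocabulary here is invented for illustration and is not the project's schema. The element names Map, Array, and byteStream, the attributes name, offset, and nBytes, the 32-bit big-endian integer type, and the in-memory stand-in for the binary file are all assumptions.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical map fragment; the element and attribute names are illustrative
# and are not taken from the project's schema.
MAP_XML = """
<Map>
  <Array name="temperature" nElements="3">
    <byteStream offset="8" nBytes="12"/>
  </Array>
  <Array name="flags" nElements="2">
    <byteStream offset="20" nBytes="8"/>
  </Array>
</Map>
"""


def locate_arrays(map_text):
    """Return {array name: (offset, n_bytes)} from the map document."""
    root = ET.fromstring(map_text)
    found = {}
    for array in root.iter("Array"):
        stream = array.find("byteStream")
        found[array.get("name")] = (int(stream.get("offset")),
                                    int(stream.get("nBytes")))
    return found


def read_object(data, location):
    """Decode big-endian 32-bit integers (an assumed type) at the mapped location."""
    offset, n_bytes = location
    return list(struct.unpack(f">{n_bytes // 4}i", data[offset:offset + n_bytes]))


# Stand-in for the binary file: 8 bytes of other content, then the two arrays.
FILE_BYTES = b"\x00" * 8 + struct.pack(">3i", 271, 273, 280) + struct.pack(">2i", 0, 1)

locations = locate_arrays(MAP_XML)
for name, location in locations.items():
    print(name, read_object(FILE_BYTES, location))
```

Only the standard library is used, which fits the goal of a reader that does not depend on HDF software.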

Slide13

Target User

Person 20+ years in the future
Interested in data stored in HDF4 file
Has HDF4 file and companion map file
Can “write a program”
May not have:
  HDF4 data model, format, documentation, or software
  Mapping schema, documentation, or software
Will have knowledge of:
  Basic XML
  Data representations used today
  Compression used by HDF4 (JPEG, Szip, etc.)

Slide14

Project Phases

Phase 1
  Categorize HDF4 data held by NASA.
  Build a prototype
    XML layout representation
    Tool to create XML map file for given HDF4 file
    Tools to read HDF4 data based solely on map files
Phase 2
  Build a robust version
  Deploy

Slide15

How many HDF4 products?

Data Center    HDF4 Products
ASF                        0
GES-DISC                 236
GHRC                      54
ASDC                      63
LP-DAAC                   67
NSIDC                     47
ORNL-DAAC                  2
PO.DAAC                   22
SDAC                       0
MrDC                      95
Total                    586

Slide16

Data characteristics

Product Characteristics Examined

Product Identification
  Product Name
  Data Level
  Archive Location
For HDF-EOS products
  HDF-EOS version
  For swath data
    Number of swaths
    Maximum number of dimensions
    Organized by time, space, both, or other
  Etc.
For SDS data
  Number of SDSs
  Max number of dimensions
  Did any SDS have attributes
  Was any SDS annotated
  Were dimension scales used
  Was compression used and if so what kind
  Was chunking used
For Vdata
  Number of Vdata structures
  Did any have attributes
  Did any fields have attributes
Etc.

Slide17

Phase 2 tasks

Investigate integration of mapping schema with existing standards
Determine HDF-EOS2 requirements
Redesign and expand the XML schema
Implement production quality map writer
Develop demo map reader
Deploy tools at select NASA data centers

Slide18

Task A
Investigate integration of mapping schema with existing standards

Slide19

Investigate existing standards

Investigated: METS, PREMIS, ESML, NcML, and CSML
Concluded:
  Existing standards have different purposes than mapping schema
  None meet all needs of mapping project
  Develop new schema tailored to project goals
    Harmonize with PREMIS
    Leverage terminology and approaches from all

Slide20

Task B
Determine HDF-EOS2 requirements

Slide21

Categorize HDF-EOS2 data products

Created a data pool from NASA data centers
  GES DISC, NSIDC, LAADS, LP DAAC
  LaRC, PO.DAAC, GHRC, OBPG, LAADS
Detailed description of sample data
Reported options for adding HDF-EOS2 contents to the mapping file
Documents and reports at wiki: http://wiki.hdfgroup.org/MappingPhase2_TaskB

Slide22

Task C
Redesign Schema

Slide23

Design priorities

Mapping files
  Provide complete access to user-supplied content in NASA’s EOS binary HDF4 files
  Have enough information to stand on their own
  Be as simple as possible
Mapping schema
  Describe the Mapping files
  Used for validation and documentation
  May not be available to target user

Slide24

Representation of HDF4 Objects

HDF4 User-Level Object    Mapping File XML Element
Attribute, Annotation     Attribute
Vgroup                    Group
Vdata                     Table
SDS                       Array
Dimension                 Dimension
Raster Image              Not yet done
Palette                   Not yet done

Slide25

Mapping File – Group & Table

(fragment)

Represents HDF4 Objects and Relationships

Information needed to access and interpret raw data in HDF4 file

Select raw data values included to help user verify binary data handled properly

AMSR_E_L2_Land_V09_200501180027_D

Slide26

Status and Plans

Status
  Map file design stabilizing for most HDF4 objects
Plans
  Complete design for Raster Images and Palettes
  Continue to refine instructions and contents
  Finalize schema

Slide27

Task D
Implement Writer

Slide28

Map Writer Requirements

Retrieve information needed from HDF4 file
Write out corresponding XML file
Quality requirements
  Completeness – don’t miss any objects in file.
  Accuracy – don’t give wrong information.

Slide29

Writer Status and Plan

Status
  Covers most Vgroup/Vdata/SDS objects.
  Covers some GR/Annotation objects.
  Being tested with NASA data.
Plans
  Increase coverage / accuracy / reliability.

Slide30

Task E
Implement demo reader

Slide31

Demo Reader Requirements

Multiplatform command line tool
Easy to use: clear arguments and output
Must validate that objects in the mapping file are actually in the HDF4 file
Developed in a well-supported high level language (Python)
Well documented
Available as open source
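
One small piece of the validation requirement can be sketched as a command line check that every byte range claimed for an object actually fits inside the data file. This is not the project's reader. The argument format `NAME:OFFSET:LENGTH`, the function names, and the temporary demo file are hypothetical, and a range check is only a necessary condition, not proof that the object is really stored there.

```python
import argparse
import os
import tempfile


def find_bad_ranges(file_size, ranges):
    """Return the (name, offset, n_bytes) entries that do not fit in the file."""
    bad = []
    for name, offset, n_bytes in ranges:
        if offset < 0 or n_bytes < 0 or offset + n_bytes > file_size:
            bad.append((name, offset, n_bytes))
    return bad


def main(argv):
    """Check 'name:offset:length' triples against a file; return an exit status."""
    parser = argparse.ArgumentParser(
        description="Check that mapped byte ranges lie inside a binary file.")
    parser.add_argument("data_file", help="path to the binary data file")
    parser.add_argument("ranges", nargs="+", metavar="NAME:OFFSET:LENGTH",
                        help="byte range claimed for an object")
    args = parser.parse_args(argv)

    ranges = []
    for text in args.ranges:
        name, offset, n_bytes = text.rsplit(":", 2)
        ranges.append((name, int(offset), int(n_bytes)))

    bad = find_bad_ranges(os.path.getsize(args.data_file), ranges)
    for name, offset, n_bytes in bad:
        print(f"{name}: bytes {offset}..{offset + n_bytes} fall outside the file")
    return 1 if bad else 0


def demo():
    """Run the check on a 100-byte temporary file with one good and one bad range."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * 100)
    try:
        return main([path, "lat:0:40", "lon:80:40"])
    finally:
        os.remove(path)


print("exit status:", demo())
```

In a fuller reader the ranges would come from the mapping file rather than from the command line, and the check could be strengthened by comparing decoded values against the verification values stored in the map.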

Slide32

Demo Reader Status

Status
  Only Vdata support provided so far
  Current source code available at https://sourceforge.net/projects/pyhdf
  Documentation at http://pyhdf.sourceforge.net/
Plans
  SDS and RIS support

Slide33

Task G
Deploy

Slide34

Deploy

Begin in Jan 2011, complete in April
Activities:
  GES DISC
    Incorporate into the existing archive ingest system
    Manage the retrofit into existing metadata files
  NSIDC
    Support implementation in NSIDC’s ECS system
  Other ESDCs
    Encouraged to join in
    But deployment to other centers expected subsequent to the project.

Slide35

Thank You!

Slide36

Acknowledgements

This work was supported by cooperative agreement number NNX08AO77A from the National Aeronautics and Space Administration (NASA). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author[s] and do not necessarily reflect the views of the National Aeronautics and Space Administration.

Slide37

Questions/comments?

Slide38

Slide39

Extra slides

Slide40

Mapping File – Array with Attribute

(fragment)

Represents HDF4 Objects and Relationships

Information needed to access and interpret raw data in HDF4 file

Select raw data values included to help user verify binary data handled properly; “corners” + 5 random
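
The slide does not say how the verification positions are chosen beyond "corners" plus 5 random ones. One possible way to pick such positions for an N-dimensional array is sketched below. The function names, the fixed seed, and the rule that random picks must not repeat a corner are assumptions for illustration, not the project's method.

```python
import itertools
import random


def corner_indices(shape):
    """Return the index tuples of all corners of a non-empty N-dimensional array."""
    axes = [sorted({0, n - 1}) for n in shape]  # a length-1 axis has one end
    return list(itertools.product(*axes))


def verification_indices(shape, n_random=5, seed=0):
    """Corners plus up to n_random other positions, chosen reproducibly."""
    corners = corner_indices(shape)
    rng = random.Random(seed)
    chosen = set(corners)
    total = 1
    for n in shape:
        total *= n
    extra = []
    # Stop early if the array has no unused positions left.
    while len(extra) < n_random and len(chosen) < total:
        index = tuple(rng.randrange(n) for n in shape)
        if index not in chosen:
            chosen.add(index)
            extra.append(index)
    return corners + extra


print(corner_indices((3, 4)))
print(len(verification_indices((180, 360))))
```

A reader could decode the values at these positions and compare them with the values recorded in the map to confirm that byte order, data type, and layout were handled properly.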

AIRS.2002.08.31.L3.RetStd_H001.v5.0.14.0.G07178195754

Slide41

Mapping of Array with complex storage

[Annotated map fragment from a test file created for the project. Callouts: Compression; Chunks with Ghost Cells; Raw data in HDF4 file, where the first chunk’s data is not contiguous; Select values included in map file for verification]
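
The "ghost cells" callout can be illustrated with a small calculation, assuming the term refers to cells of an edge chunk that lie outside the array bounds when the array dimensions are not multiples of the chunk dimensions. The sketch below lists, for a 2-D array, each chunk's origin, how much of it holds real data, and how many ghost cells it carries, and shows how a reader might drop the ghost cells. The function names and the example sizes are hypothetical.

```python
def chunk_layout(shape, chunk_shape):
    """Yield (chunk_origin, valid_extent, ghost_count) for every chunk of a 2-D array."""
    rows, cols = shape
    crows, ccols = chunk_shape
    for row0 in range(0, rows, crows):
        for col0 in range(0, cols, ccols):
            valid_rows = min(crows, rows - row0)
            valid_cols = min(ccols, cols - col0)
            ghost = crows * ccols - valid_rows * valid_cols
            yield (row0, col0), (valid_rows, valid_cols), ghost


def copy_valid_cells(chunk_values, chunk_shape, valid_extent):
    """Drop ghost cells from a row-major chunk, keeping only in-bounds values."""
    _, ccols = chunk_shape
    valid_rows, valid_cols = valid_extent
    return [chunk_values[r * ccols:r * ccols + valid_cols]
            for r in range(valid_rows)]


# A 3x5 array stored in 2x2 chunks needs 2x3 chunks, some partly outside the array.
for origin, extent, ghost in chunk_layout((3, 5), (2, 2)):
    print(origin, extent, ghost)
```

Non-contiguous chunk data, the other callout, would additionally require the map to list several offset/length pieces per chunk, to be concatenated before decoding.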