
Authors: Wei Zhang - PowerPoint Presentation





Presentation Transcript

Slide1

Authors:

Wei Zhang, Texas Tech University
Suren Byna, Lawrence Berkeley National Laboratory
Houjun Tang, Lawrence Berkeley National Laboratory
Brody Williams, Texas Tech University
Yong Chen, Texas Tech University

Topic:

MIQS: Metadata Indexing and Querying Service for Self-Describing File Formats

The 31st International Conference for High Performance Computing, Networking, Storage, and Analysis (SC ’19, Denver, CO) 

Date:

November 19th, 2019

Slide2

Data Management in Scientific Applications

Scientific Applications
  Experiments
  Observations
Ever-increasing Data
  Size of the files
  Number of the files
  Variety of the files

Slide3

Self-describing Data Formats

Metadata is stored alongside the data objects
HDF5, netCDF, ADIOS-BP, ASDF, etc.

Slide4

Metadata Search over HDF5 File Collection

File System Hierarchy
  Directory
  HDF5 Files
+ Object Hierarchy
  Groups
  Datasets
= HDF5 File Collection Hierarchy

Find data objects with metadata attribute “brightness = 60”
Find data files with any data object on which attribute “temperature = 50”

Slide5

Metadata Search by Scanning Data Files

Drawbacks
  Time-consuming file scanning process
  Size of the data objects
  Number of the data files
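The scanning approach can be sketched in a few lines. This is a minimal illustration, not the authors' code: the nested dictionary `collection` is a hypothetical stand-in for a directory of HDF5 files (file path, then object path, then attributes), and the function names are made up for this example. The point it shows is that every query visits every file and every object, so the cost grows with both.

```python
# Hypothetical in-memory stand-in for an HDF5 file collection:
# file path -> object path -> attribute dictionary.
collection = {
    "/data/a.h5": {
        "/group1/image": {"brightness": 60, "temperature": 50},
        "/group1/mask": {"brightness": 10},
    },
    "/data/b.h5": {
        "/run/frame": {"temperature": 50},
    },
}


def scan_objects(collection, name, value):
    """Return (file path, object path) pairs whose attribute `name` equals `value`."""
    hits = []
    for file_path, objects in collection.items():    # visit every file
        for object_path, attrs in objects.items():   # visit every object in it
            if attrs.get(name) == value:
                hits.append((file_path, object_path))
    return hits


def scan_files(collection, name, value):
    """Return the files that contain at least one matching object."""
    return sorted({file_path for file_path, _ in scan_objects(collection, name, value)})
```

Calling `scan_objects(collection, "brightness", 60)` answers the first kind of query on Slide 4, and `scan_files(collection, "temperature", 50)` the second; both walk the full collection each time.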

Slide6

Metadata Search with Database

Violating self-contained data management principle
  Deployment Effort
  Data Model Adaption
  Maintenance Demand
  Storage Redundancy
  Performance Issues
  Poor Portability & Mobility

Name          Type         Data Source        DB System   DB Type
SPOT Suite    Tomography   HDF5               MongoDB     NoSQL
JAMO          Genomics     HDF5               MongoDB     NoSQL
BIMM          Biomedical   Biomedical Image   MySQL       RDBMS
EMPRESS 2.0   General      HDF5, netCDF       SQLite      RDBMS

Slide7

A New Era of Metadata Search

Self-contained Metadata Index
Minimal Complexity:
  Metadata schema
  Metadata search process
Portability & Mobility
Minimal Storage Requirement
Performance Gains
Direct Access to Metadata Index

Slide8

MIQS – Metadata Indexing and Querying Service

Overview
  In-memory Index
  Index Persistence
  Index Loading
  Metadata Search Service
Highlights
  Minimal Complexity
  Portability & Mobility
  Minimal Storage Requirement
  Performance Gains

Slide9

In-memory Index

Path Lists
  IDs instead of repetitive path strings
ART – Adaptive Radix Tree
  Save space for strings with common prefix
  Exact query
  Affix-based query
SBST – Self-balancing Search Tree
  Efficient exact query
  Possible range query
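A rough sketch of the three ideas on this slide, under stated assumptions: `PathList` and `SortedIndex` are hypothetical names, and a sorted Python list searched with `bisect` is swapped in for both the adaptive radix tree and the self-balancing search tree. It does not reproduce their memory layout, only the query types they support (exact, prefix, range).

```python
import bisect


class PathList:
    """Stores each path string once and hands out small integer IDs."""

    def __init__(self):
        self._paths = []
        self._ids = {}

    def intern(self, path):
        # Reuse the existing ID if the path was seen before.
        if path not in self._ids:
            self._ids[path] = len(self._paths)
            self._paths.append(path)
        return self._ids[path]

    def lookup(self, path_id):
        return self._paths[path_id]


class SortedIndex:
    """Sorted-key index; a simple stand-in for a search tree or radix tree."""

    def __init__(self):
        self._keys = []       # kept sorted
        self._postings = {}   # key -> list of items

    def add(self, key, item):
        if key not in self._postings:
            bisect.insort(self._keys, key)
            self._postings[key] = []
        self._postings[key].append(item)

    def exact(self, key):
        return list(self._postings.get(key, []))

    def range(self, low, high):
        """Items whose key k satisfies low <= k <= high."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return [item for k in self._keys[lo:hi] for item in self._postings[k]]

    def prefix(self, prefix):
        """Items whose string key starts with `prefix`."""
        out = []
        for k in self._keys[bisect.bisect_left(self._keys, prefix):]:
            if not k.startswith(prefix):
                break   # sorted order: no later key can match
            out.extend(self._postings[k])
        return out
```

Storing `(file ID, object ID)` tuples as the items keeps each posting small, which is the motivation the slide gives for IDs over repeated path strings.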

Slide10

Index Construction – Initial Indexing

Process r indexes a file only when file_counter % n == r
Recursive scan
  file_counter: number of files encountered
  n: number of processes
  r: process rank
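The modulo rule amounts to a round-robin split of the scanned files. The sketch below is an illustration only; it assumes every process encounters the files in the same order and that the counter starts at zero, neither of which is stated on the slide, and `files_for_rank` is a hypothetical name.

```python
def files_for_rank(files, n, r):
    """Files that process `r` of `n` would index, given a shared scan order."""
    selected = []
    # file_counter is the number of files encountered before the current one.
    for file_counter, path in enumerate(files):
        if file_counter % n == r:
            selected.append(path)
    return selected


files = [f"f{i}.h5" for i in range(7)]
shares = [files_for_rank(files, 3, r) for r in range(3)]
```

With 7 files and 3 processes, each file lands in exactly one share, and the shares differ in size by at most one file.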

Slide11

Index Construction – Compact Index File

Making mobility possible
  Small index files
Keeping compact layout
  Grouping value blocks in attribute block
  Grouping FOIPs in value block
  Use ID instead of path strings

Slide12

Index Construction – Index File Read/Write

Index Persistence
  Process r → index file r
Retrospective loading
  Process r → (n+r-s-1)%n (s < n-1)
Index Recovery
  Process r → (n+r-s-1)%n (s < n)
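The expression (n+r-s-1)%n is easier to read once evaluated. The sketch below assumes, as an interpretation not confirmed by the slide, that s is a step counter starting at 0 and that the result is the number of the index file process r reads at that step; `load_order` is a hypothetical helper.

```python
def load_order(n, r, steps):
    """Values of (n + r - s - 1) % n for s = 0 .. steps - 1."""
    return [(n + r - s - 1) % n for s in range(steps)]


n = 4
# Retrospective loading: s < n-1, so each process takes n-1 steps.
retrospective = {r: load_order(n, r, n - 1) for r in range(n)}
# Index recovery: s < n, so each process takes n steps.
recovery = {r: load_order(n, r, n) for r in range(n)}
```

Under that reading, with n = 4 process 0 visits files 3, 2, 1 during retrospective loading (every file except its own) and 3, 2, 1, 0 during recovery (every file, its own last). At any given step, the n processes touch n different files.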

Slide13

Serving Queries

Attribute name → ART
Numeric value → SBST
String value → ART
Retrieve list of FOIPs
  File paths → File Path List
  Object paths → Object Path List

Example: Brightness=50 and Author=“Oscar”
  Brightness → 50 → [(0,1)]
  Author → Oscar → [(0,1)]
  Result: ”/home/Oscar/data/test.hdf5”, ”/2019/05/2/B/pixel.fit”
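The lookup chain on this slide can be traced with plain dictionaries. This is a sketch under assumptions: dictionaries stand in for the ART and SBST, each FOIP is read as a (file ID, object ID) pair, and the object path at ID 0 is a made-up placeholder so that ID 1 maps to the path shown on the slide.

```python
# ID -> path lists (the list position is the ID).
file_paths = ["/home/Oscar/data/test.hdf5"]
object_paths = [
    "/2019/05/2/A/pixel.fit",   # hypothetical placeholder for object ID 0
    "/2019/05/2/B/pixel.fit",   # object ID 1, as on the slide
]

# attribute name -> attribute value -> list of (file ID, object ID) pairs
index = {
    "Brightness": {50: [(0, 1)]},
    "Author": {"Oscar": [(0, 1)]},
}


def query(index, file_paths, object_paths, name, value):
    """Resolve `name = value` to (file path, object path) pairs."""
    values = index.get(name, {})      # step 1: attribute name
    foips = values.get(value, [])     # step 2: attribute value
    # step 3: translate IDs back to path strings
    return [(file_paths[f], object_paths[o]) for f, o in foips]
```

Both `query(index, file_paths, object_paths, "Brightness", 50)` and the same call with `"Author", "Oscar"` resolve to the single file and object pair from the slide; an unknown name or value yields an empty list.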

Slide14

Evaluation – Platform & Control Experiment

MIQS Evaluation Platform
  Supercomputer: Edison
  CPU: 12 * Ivy Bridge @2.4GHz
  Memory: 64GB DDR3 1866
  Network: 23.7TB/s global bandwidth
  Lustre: 30PB @ 700GB peak I/O
MIQS vs. MongoDB
  NoSQL
  Flexible Data Schema
  State-of-the-art
MongoDB Evaluation Platform
  Host machine: Different from Edison
  CPU: 16 * Haswell @2.3GHz
  Memory: 128GB DDR4 2133
  Network: 56Gb/s bandwidth
  HDD: 6TB 7200rpm 6Gb/s SAS
  MongoDB Storage Engine: WiredTiger with data compression

Slide15

Evaluation – Dataset

100 HDF5 files from the Baryon Oscillation Spectroscopic Survey (BOSS)
145 GB
144 million attributes
1.5 million data objects

Slide16

Evaluation – Indexing Time

16 attributes in MongoDB
  5-9 min
16 attributes in MIQS
  50% Indexing Time Reduction (initial indexing)
  99% Indexing Time Reduction (index recovering)
You can also:
  Index all attributes in MIQS in 8-14 min.
  Recover index of all attributes within 2 min.

Slide17

Evaluation – Indexing Time (Break-down)

Scanning Time: roughly equal (MIQS vs. MongoDB)
MongoDB
  Inserting BSON (3 - 6 min)
MIQS 16 attributes
  In-memory index: 0.5 – 1 min
  Persistent index: negligible
MIQS all attributes
  In-memory index: 5 – 8 min
  Persistent index: negligible
MIQS index recovery
  16 attributes: < 40 s
  All attributes: < 2 min

Figures: MongoDB Indexing Time (16 attributes); MIQS Indexing Time (16 attributes); MIQS Indexing Time (all attributes); MIQS Index Recovery Time

Slide18

Evaluation – Query Performance

Latency
  MongoDB: 5 min at maximum scale
  MIQS: 0.29 ms at maximum scale
Throughput
  MongoDB: 319 kQPS at maximum scale
  MIQS: 363 billion QPS at maximum scale

Figures: Query Latency Comparison (16 attributes); Query Throughput Comparison (16 attributes)

Slide19

Evaluation – Memory Consumption

MongoDB
  Up to 4.2GB
MIQS
  16 attributes: up to 600 MB
  All attributes: up to 7.8GB
You can:
  Save space
  Index more attributes

Figures: MongoDB Memory Consumption (16 attributes); MIQS Memory Consumption (16 attributes); MIQS Memory Consumption (all attributes)

Slide20

Conclusion

Problems: no metadata indexing, or not self-contained.
MIQS – Self-contained Metadata Indexing and Querying Service
Benefits:
  Minimal Complexity
  Portability & Mobility
  Minimal Storage Requirements
  Performance Gains
Future Work
  Integrating compact index file layout into HDF5
  More types of queries
  Performance improvement
Embracing a new era of metadata search

Slide21

Follow Up

Paper: http://bit.ly/SC19-MIQS
ACM Digital Library:

Contact Us:
  DISCL @ TTU: https://discl.cs.ttu.edu
    X-Spirit.zhang@ttu.edu
    Brody.Williams@ttu.edu
    Yong.Chen@ttu.edu
  SDM Group @ LBNL: https://sdm.lbl.gov
    htang4@lbl.gov
    sbyna@lbl.gov

Acknowledgement:

Many thanks to the audience and to the paper reviewers who provided valuable comments.

This research is supported in part by the National Science Foundation under grants CNS-1338078, CNS-1362134, CCF-1409946, CCF-1718336, OAC-1835892, and CNS-1817094. This work is supported in part by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 (Project: EOD-HDF5: Experimental and Observational Data enhancements to HDF5; Program managers: Dr. Laura Biven and Dr. Lucy Nowell). This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility.
