DAP, ERDDAP, and Tabular (Sequence) Datasets

Uploaded On 2018-11-10

Presentation Transcript

Slide1

DAP, ERDDAP, and Tabular (Sequence) Datasets

Try it: http://coastwatch.pfeg.noaa.gov/erddap
Bob Simons <bob.simons@noaa.gov>
NOAA NMFS SWFSC ERD

[Diagram labels: OBIS, SOS, Custom, DAP, ERDDAP, ..., Database, ERDDAP, Files, Your Favorite Client Software]

Slide2

My Goals for this Presentation

Tell you more about ERDDAP.
Raise awareness and appreciation of tabular data.
Convince you that tabular datasets are best served as DAP sequences.
  And that serving them in DAP as 1D or 2D gridded datasets is a bad idea.
  (This has nothing to do with how they are stored.)
Bonus: 3 powerful ideas:
  Abstractions (capture the essence; hide the instance details)
  Representations (different file formats)
  Reusability (value is multiplied)

Slide3

1) ERDDAP

Slide4
Slide5
Slide6

ERDDAP Features

(Re)serves diverse local and remote datasets.
  Abstraction: thanks to DAP, the source differences are hidden.
Serves gridded and tabular datasets.
Offers a unified place to search for datasets.
  Full-text, category-based, or advanced.
Encourages improved metadata.
  So users can understand the dataset.
Offers a standard way to request data from any dataset.
  For humans: forms on web pages.
  For computers: DAP, WMS, (SOS) web services.
Offers a choice of response file formats.
  Different representations.
Standardizes time formats. (Here, different representations are trouble.)
  As Strings - ISO 8601:2004(E), e.g., 2014-07-01T20:00:00Z
  As numbers - seconds since 1970-01-01T00:00:00Z
Is reusable.

Slide7
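
As a rough illustration of the two standardized time representations listed under ERDDAP Features, the sketch below converts between an ISO 8601 string and seconds since 1970-01-01T00:00:00Z. It uses only the Python standard library; the function names are made up for this example and are not part of ERDDAP.

```python
from datetime import datetime, timezone


def iso_to_epoch_seconds(iso_string):
    """Convert a UTC ISO 8601 string ending in 'Z' to seconds since 1970-01-01T00:00:00Z."""
    # The literal 'Z' in the format string means only UTC timestamps are accepted.
    parsed = datetime.strptime(iso_string, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc).timestamp()


def epoch_seconds_to_iso(seconds):
    """Convert seconds since 1970-01-01T00:00:00Z back to a UTC ISO 8601 string."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


print(iso_to_epoch_seconds("2014-07-01T20:00:00Z"))
print(epoch_seconds_to_iso(1404172800))
```

Both functions assume whole seconds and UTC; fractional seconds or other time zone offsets would need extra handling.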

2) Tabular Data

Slide8

Tabular Datasets

Tabular data sources: databases, OBIS, SOS, CSV files, flat .nc files, CF DSG .nc files, ...
Geospatial
  CF Discrete Sampling Geometry (DSG) feature types:
    Point: whale sightings
    Profile: disposable CTD
    TimeSeries: moored buoy
    TimeSeriesProfile: CTD
    Trajectory: ship
    TrajectoryProfile: profiling glider
Non-Geospatial
  laboratory data, references, fish disease lists, ecosystem: what eats what, ...
Larry Ellison is rich because databases are reusable for numerous types of data.

Slide9

(ERD)DAP Data Requests: Gridded vs. Tabular Datasets

Gridded Datasets (DAP projection constraints)
  DAP: ?temperature[437][46:1:162][122:282]
  ERDDAP: ?temperature[(2014-07-01)][(22):(51)][(-145):(-105)]

Tabular Datasets (DAP selection constraints)
  DAP: ?s.id,s.owner,s.time,s.latitude,s.longitude,s.wtemp&s.id="sp031"&s.time>=1404172800
  ERDDAP: ?id,owner,time,latitude,longitude,wtemp&id="sp031"&time>=2014-07-01
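
A minimal sketch of how a client might assemble the ERDDAP tabular request above into a full URL. It assumes the usual ERDDAP tabledap layout (base URL, "tabledap", dataset ID, file type, then the query); the dataset ID "exampleBuoys" and the helper name are hypothetical, and nothing is actually fetched.

```python
from urllib.parse import quote


def tabledap_url(base_url, dataset_id, file_type, variables, constraints):
    """Assemble a tabledap request URL from column names and selection constraints.

    variables:   column names to return, e.g. ["id", "time"]
    constraints: strings such as 'id="sp031"' or "time>=2014-07-01"
    """
    query = ",".join(variables) + "".join("&" + c for c in constraints)
    # Percent-encode characters such as '"', '>' and '<'; leave the separators readable.
    encoded = quote(query, safe=",&=()")
    return f"{base_url}/tabledap/{dataset_id}.{file_type}?{encoded}"


url = tabledap_url(
    "http://coastwatch.pfeg.noaa.gov/erddap",
    "exampleBuoys",  # hypothetical dataset ID, not a real dataset
    "csv",
    ["id", "owner", "time", "latitude", "longitude", "wtemp"],
    ['id="sp031"', "time>=2014-07-01"],
)
print(url)
```

This simple encoding is only adequate when constraint values do not themselves contain "&" or "="; such values would need to be encoded individually before being joined.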

id     owner  type       time                  latitude  longitude  wtemp  atmp
46088  NDBC   3m Discus  1993-06-01T14:20:00Z  48.336    -123.159   16.4   18.0
46088  NDBC   3m Discus  1993-06-01T14:50:00Z  48.336    -123.159   16.5   18.2
...    ...    ...        ...                   ...       ...        ...    ...
SANF1  SFSU   C-MAN      1968-10-14T16:00:00Z  24.456    -81.877    15.8   14.9
SANF1  SFSU   C-MAN      1968-10-14T17:00:00Z  24.456    -81.877    15.8   14.8
...    ...    ...        ...                   ...       ...        ...    ...

Slide10

(ERD)DAP Sequence Requests vs. Database SQL Requests

(ERD)DAP: ?id,owner,type,time,latitude,longitude,wtemp&id="46088"&time>=2014-07-01
SQL: SELECT id,owner,type,time,latitude,longitude,wtemp FROM s WHERE id="46088" AND time>=2014-07-01

Pablo Picasso: "Good artists copy, great artists steal."

Slide11
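
To make the parallel between the sequence request and the SQL query on the "(ERD)DAP Sequence Requests vs. Database SQL Requests" slide concrete, here is a small sketch using Python's built-in sqlite3 module. The table name s comes from the slide; the stored rows other than the first are invented for illustration, and times are assumed to be stored as ISO 8601 strings so that string comparison matches chronological order. The slide's SQL writes its literals informally; a real database would need the date quoted, which the sketch sidesteps with parameter placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE s (id TEXT, owner TEXT, type TEXT, time TEXT, "
    "latitude REAL, longitude REAL, wtemp REAL)"
)
conn.executemany(
    "INSERT INTO s VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # First row is taken from the slides; the other two are made up for this example.
        ("46088", "NDBC", "3m Discus", "1993-06-01T14:20:00Z", 48.336, -123.159, 16.4),
        ("46088", "NDBC", "3m Discus", "2014-07-02T00:20:00Z", 48.336, -123.159, 11.0),
        ("41005", "NDBC", "6m Discus", "2014-07-02T00:20:00Z", 32.501, -79.099, 28.0),
    ],
)


def select_rows(connection, station_id, min_time):
    """SQL counterpart of ?id,owner,type,time,latitude,longitude,wtemp&id=...&time>=..."""
    sql = (
        "SELECT id,owner,type,time,latitude,longitude,wtemp "
        "FROM s WHERE id = ? AND time >= ?"
    )
    return connection.execute(sql, (station_id, min_time)).fetchall()


result = select_rows(conn, "46088", "2014-07-01")
for row in result:
    print(row)
```

The column list plays the role of the requested variables and the WHERE clause plays the role of the selection constraints.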

Related Tables vs. One Table

Join (Denormalized)

id     owner  type       latitude  longitude  time                  wtemp  atmp
46088  NDBC   3m Discus  48.336    -123.159   1993-06-01T14:20:00Z  16.4   18.0
46088  NDBC   3m Discus  48.336    -123.159   1993-06-01T14:50:00Z  16.5   18.2
...    ...    ...        ...       ...        ...                   ...    ...
NC312  NCSU   C-MAN      24.456    -81.877    1968-10-14T16:00:00Z  15.8   14.9
NC312  NCSU   C-MAN      24.456    -81.877    1968-10-14T17:00:00Z  15.8   14.8
...    ...    ...        ...       ...        ...                   ...    ...

Normalized

Buoy Table

id     owner  type       latitude  longitude
46088  NDBC   3m Discus  48.336    -123.159
41005  NDBC   6m Discus  32.501    -79.099
BP114  BP     3m Discus  36.905    -75.713
NC312  NCSU   C-MAN      24.456    -81.877
...    ...    ...        ...       ...

Observation Table

id     time                  wtemp  atmp
46088  1993-06-01T14:20:00Z  16.4   18.0
46088  1993-06-01T14:50:00Z  16.5   18.2
...    ...                   ...    ...
NC312  1968-10-14T16:00:00Z  15.8   14.9
NC312  1968-10-14T17:00:00Z  15.8   14.8
...    ...                   ...    ...

Slide12
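
The join on the "Related Tables vs. One Table" slide can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how ERDDAP does it internally; the values are taken from the slide and the function name is made up.

```python
# Normalized form: one row per buoy, plus one row per observation.
buoys = {
    "46088": {"owner": "NDBC", "type": "3m Discus", "latitude": 48.336, "longitude": -123.159},
    "NC312": {"owner": "NCSU", "type": "C-MAN", "latitude": 24.456, "longitude": -81.877},
}
observations = [
    {"id": "46088", "time": "1993-06-01T14:20:00Z", "wtemp": 16.4, "atmp": 18.0},
    {"id": "46088", "time": "1993-06-01T14:50:00Z", "wtemp": 16.5, "atmp": 18.2},
    {"id": "NC312", "time": "1968-10-14T16:00:00Z", "wtemp": 15.8, "atmp": 14.9},
]


def denormalize(buoy_table, observation_table):
    """Join each observation with its buoy's metadata, giving one flat table."""
    joined = []
    for obs in observation_table:
        buoy = buoy_table[obs["id"]]  # look up the matching buoy row by id
        joined.append(
            {"id": obs["id"], **buoy,
             "time": obs["time"], "wtemp": obs["wtemp"], "atmp": obs["atmp"]}
        )
    return joined


for row in denormalize(buoys, observations):
    print(row)
```

The buoy metadata is repeated on every output row, which is the cost of the single-table representation; the benefit is that every row is self-describing.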

Yeah, but why doesn't ERDDAP support nested sequences?

It does, but just internally.
ERDDAP (re)presents the dataset as a single table.
One table is an abstraction. It hides details.
  The average user understands a table.
One vs. many tables: just different representations.
This lets all tabular datasets have the same structure.
The result of a DAP or SQL query is always one table.
There are many file format representations of one table.

Slide13

3) Tabular datasets are best served as DAP sequences.
(Why DAP Sequences Rock!)
And that serving them in DAP as 1D or 2D gridded datasets is a bad idea.
(This has nothing to do with how they are stored.)

Slide14

Why Sequences Rock! Reason #1

If the data is coming from a relational database, OBIS, or SOS, the dataset can't be served as a gridded dataset.
  There are no index (row) numbers.
  It isn't easy/possible to know how many rows there are.
  The order of the rows may change at any time.
  New rows are added as new data arrives: frequently.

Slide15

Why Sequences Rock! Reason #2

Serving tabular data in DAP as 1D or 2D gridded datasets is a bad idea.
Logic:
  Men: mortal. Socrates: man. Socrates: mortal.
  Grids: handled well by DAP. Treat table as: grid. Treat table as grid: handled well?
Grid dimensions usually represent a physical continuum.
  DAP: ?temperature[408:437][46:1:162][122:282]
  ERDDAP: ?temperature[(2014-06-01):(2014-06-30)][(22):(51)][(-145):(-105)]
No arrangement of tabular dataset dimensions works well.
  2D [buoy][time]: buoy is not a continuum, time leads to wasted space
  1D [time]: fine, but then you need 1000 datasets (1 per buoy)
  1D [row]: aggregated, but row isn't a continuum.
In every case, it's hard to know which rows to request.
  The rows you want are scattered through the dataset, so you have to either download everything or make numerous requests.
Serving a DSG file directly: too many formats, too hard to query.

Slide16
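
The "wasted space" point about the 2D [buoy][time] arrangement in Reason #2 can be made concrete with a short calculation. The sketch below assumes the grid needs one cell for every combination of buoy and distinct timestamp; the function name is made up, and the four (buoy, time) pairs are taken from the example tables in the slides.

```python
def grid_fill_fraction(observations):
    """Fraction of [buoy][time] grid cells that would hold real data.

    observations: iterable of (buoy_id, time) pairs, one per table row.
    """
    rows = set(observations)
    buoys = {buoy for buoy, _ in rows}
    times = {time for _, time in rows}
    # A 2D grid needs a cell for every buoy at every distinct time.
    cells = len(buoys) * len(times)
    return len(rows) / cells


observations = [
    ("46088", "1993-06-01T14:20:00Z"),
    ("46088", "1993-06-01T14:50:00Z"),
    ("NC312", "1968-10-14T16:00:00Z"),
    ("NC312", "1968-10-14T17:00:00Z"),
]
print(grid_fill_fraction(observations))
```

With only two buoys that never report at the same instant, half of the grid is already empty; the fraction shrinks further as more buoys with unrelated sampling times are added.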

Why Sequences Rock! Reason #3

DAP sequence requests use the terminology of the dataset. (It's easy.)
  ?id,owner,type,latitude,longitude&distinct()
  ?id,type,latitude,longitude&owner="NDBC"&distinct()
  ?id&latitude>=22&latitude<=55&longitude>=-145&longitude<=-105&distinct()
  ?id&latitude>=22&latitude<=55&longitude>=-145&longitude<=-105&time>=2014-07-01&distinct()
  ?&latitude>=22&latitude<=55&longitude>=-145&longitude<=-105&time>=2014-07-01

index    id     owner  type       latitude  longitude  time                  wtemp  atmp
1        46088  NDBC   3m Discus  48.336    -123.159   1993-06-01T14:20:00Z  16.4   18.0
2        46088  NDBC   3m Discus  48.336    -123.159   1993-06-01T14:50:00Z  16.5   18.2
137522   BP114  BP     3m Discus  36.905    -75.713    2003-02-09T02:00:00Z  16.7   12.2
137523   BP114  BP     3m Discus  36.905    -75.713    2003-02-09T04:00:00Z  16.6   12.0
1732156  NC312  NCSU   C-MAN      24.456    -81.877    1968-10-14T16:00:00Z  15.8   14.9
1732157  NC312  NCSU   C-MAN      24.456    -81.877    1968-10-14T17:00:00Z  15.8   14.8
3282459  41005  NDBC   6m Discus  32.501    -79.090    1984-08-22T14:20:00Z  14.6   26.8
3282460  41005  NDBC   6m Discus  32.501    -79.090    1984-08-22T14:50:00Z  14.7   26.2

Making these requests with index numbers is a difficult (not for Roberto), multi-step, programming task. And it's inefficient.

Slide17

Why Sequences Rock! Reason #4

Because declarative languages (SQL, DAP selection constraints) let you describe what you want, not how to get it.
  ?id,owner,type,latitude,longitude&distinct()
  ?id,type,latitude,longitude&owner="NDBC"&distinct()
  ?id&latitude>=22&latitude<=55&longitude>=-145&longitude<=-105&distinct()
  ?id&latitude>=22&latitude<=55&longitude>=-145&longitude<=-105&time>=2014-07-01&distinct()
  ?&latitude>=22&latitude<=55&longitude>=-145&longitude<=-105&time>=2014-07-01
With imperative languages (C, Fortran, Java, Python), you must describe, step-by-step, how to solve the problem.
  1) Request all latitudes.
  2) Filter
  3) Request all longitudes.
  4) Multiple requests because data is scattered throughout the dataset.

Slide18
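
A sketch of what the imperative steps in Reason #4 look like when a table is served as a 1D [row] grid. It assumes the client has already downloaded every latitude and longitude, and it shows why scattered matches turn into several index-range requests. The row ordering below (buoys interleaved) and the function name are invented for illustration; the coordinates come from the slides.

```python
def matching_row_ranges(latitudes, longitudes, south, north, west, east):
    """Return inclusive (start, stop) row-index ranges that fall inside a bounding box."""
    # Filter by hand: keep the index of every row inside the box.
    hits = [
        i
        for i, (lat, lon) in enumerate(zip(latitudes, longitudes))
        if south <= lat <= north and west <= lon <= east
    ]
    # Merge adjacent indices; every resulting range is one more [start:stop] request.
    ranges = []
    for i in hits:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)
        else:
            ranges.append((i, i))
    return ranges


# Illustrative row order: observations from different buoys interleaved.
latitudes = [48.336, 24.456, 48.336, 36.905, 48.336]
longitudes = [-123.159, -81.877, -123.159, -75.713, -123.159]

print(matching_row_ranges(latitudes, longitudes, 22, 55, -145, -105))
```

Three separate ranges come back for five rows, so three follow-up requests would be needed. The declarative form expresses the same selection as a single constraint string: ?&latitude>=22&latitude<=55&longitude>=-145&longitude<=-105.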

Why Sequences Rock! Reason #5

Because the other options all suck.
Serving the datasets as grids doesn't work.
  You now understand why, right?
Serve the data files via FTP.
  Getting a chunk of data is all or nothing.
  Makes user deal with various file formats.
Custom forms and web services are too much work to make.
  Custom: 6+ months per dataset? Ongoing maintenance. No consistency!
  Reusable: 1 day, minimal maintenance, consistent!
Give trusted colleagues access to the database or the files.
  That's not making the data public!
Don't let anyone else use the data.
  This is actually the #1 method of fisheries data distribution.

Slide19

My Goals for this Presentation

Tell you more about ERDDAP.
Raise awareness and appreciation of tabular data.
Convince you that tabular datasets are best served as DAP sequences.
  And that serving them in DAP as 1D or 2D gridded datasets is a bad idea.
  (This has nothing to do with how they are stored.)
Bonus: 3 powerful ideas:
  Abstractions (capture the essence; hide the instance details)
  Representations (different file formats)
  Reusability (value is multiplied)

Slide20

Thank you!