Repositories, Workspaces, Web Services

Uploaded On 2020-10-22




Presentation Transcript

Slide1

Repositories, Workspaces, Web Services

- some ideas -

Peter Wittenburg

The Language Archive - Max Planck Institute

CLARIN Research Infrastructure

Nijmegen, The Netherlands

Slide2

scope of workshop

clear focus on technology and architecture issues for preservation and access

many other issues not in focus although relevant

IPR, license issues only partially

quality of data & metadata certification (RAC, DSA, etc.)

AAI

cost aspects

etc.

let's have interactive presentations

should be able to extract essentials

Slide3

Definitions?

Slide4

so simple

repository

Slide5

+ Metadata

- orange - 2010

- plum - 2010

- pear - 2010

- apple - 2010

repository

metadata registry

? ?

dangerous since physical paths may change etc.

Slide6

+ replication due to preservation

- orange - 2010

- plum - 2010

- pear - 2010

- apple - 2010

repository

metadata registry

repository

? ?

dangerous since metadata records can be re-used

metadata should be stable

transfer at physical level

Slide7

+ replication and PIDs

- orange - 2010

- plum - 2010

- pear - 2010

- apple - 2010

repository

metadata registry

repository

- PID4 - 2010

- PID3 - 2010

- PID2 - 2010

PID registry

- PID1 - URL1 - URL2

?

dangerous: another indirection layer

transfer at physical level

access possible

which rights?

same access rights
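The PID indirection on this slide can be illustrated with a minimal sketch. The `PidRegistry` class, its method names and the example URLs are hypothetical and not part of the original slides; the sketch only shows that a metadata record stores a PID while the registry maps that PID to one or more physical locations.

```python
from dataclasses import dataclass, field


@dataclass
class PidRegistry:
    """Hypothetical in-memory PID registry: one PID maps to several copy URLs."""
    _entries: dict[str, list[str]] = field(default_factory=dict)

    def register(self, pid: str, url: str) -> None:
        # Adding a replica never changes the PID stored in the metadata record.
        urls = self._entries.setdefault(pid, [])
        if url not in urls:
            urls.append(url)

    def resolve(self, pid: str) -> list[str]:
        # Return every known location; the caller chooses which copy to use.
        return list(self._entries.get(pid, []))


registry = PidRegistry()
registry.register("PID1", "https://repo-a.example.org/objects/apple")
registry.register("PID1", "https://mirror-x.example.org/objects/apple")

# The metadata record refers to the PID only, so it stays stable under replication.
metadata_record = {"title": "apple", "year": 2010, "pid": "PID1"}
locations = registry.resolve(metadata_record["pid"])
```

If a physical path changes, only the registry entry has to be updated; the metadata record is untouched. The price is the extra lookup the slide calls "another indirection layer".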

Slide8

what about collections

- orange - 2010

- plum - 2010

- pear - 2010

repository

metadata registry

repository

- PID4 - 2010

- PID3 - 2010

- PID2 - 2010

PID registry

- PID1 - URL1 - URL2

transfer at physical level

- collection - 2010

- apple - 2010

- PIDx - URL

PS: collections are dynamic
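One way to read "collections are dynamic" is sketched below. It assumes a collection is itself a metadata record with its own PID that lists member PIDs; the `Collection` class and its field names are illustrative only and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection record: its own PID plus a changeable member set."""
    pid: str
    title: str
    year: int
    member_pids: set[str] = field(default_factory=set)

    def add(self, member_pid: str) -> None:
        self.member_pids.add(member_pid)

    def remove(self, member_pid: str) -> None:
        # discard() is a no-op if the member is not present.
        self.member_pids.discard(member_pid)


collection = Collection(pid="PIDx", title="collection", year=2010)
for member in ("PID1", "PID2", "PID3"):
    collection.add(member)

# Membership changes over time; the collection PID and the member records do not.
collection.remove("PID2")
collection.add("PID4")
```

Because members are referenced by PID rather than by path, changing the membership does not require touching the member objects or their copies.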

Slide9

topic of high relevance

ESFRI Task Force on Repositories (report)

e-IRG/ESFRI Task Force on Data Management (report)

Blue Ribbon Task Force on Sustainable Digital Preservation and Access (report)

EC High Level Expert Group on Scientific Data (report)

ASIS&T Summit Phoenix on Research Data and Access (slides & summary)

T. Hey et al. The Fourth Paradigm: Data-Intensive Scientific Discovery (book)

Slide10

summarizing the challenges

how to

manage the data tsunami

maintain data visibility

preserve the data (just seen one solution)

protect the data integrity

ensure that we get the object we wanted

guarantee data authenticity (how to present)

maintain context and provenance information

protect privacy and rights in complex data world

maintain trust in data

federate repositories to (virtually) integrate data

achieve (partial) interoperability

exploit distributed data without copying
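For the integrity item ("ensure that we get the object we wanted"), a common technique is to store a checksum with the object's record and compare it after every transfer. The slides do not name a method; the sketch below assumes SHA-256 from the Python standard library, and the function names are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify_copy(data: bytes, expected_checksum: str) -> bool:
    """True if the retrieved bytes match the checksum recorded for the object."""
    return sha256_hex(data) == expected_checksum


original = b"recording of session 42"
recorded_checksum = sha256_hex(original)  # stored with the PID or metadata record

copy_ok = verify_copy(original, recorded_checksum)
copy_damaged = verify_copy(b"recording of session 43", recorded_checksum)
```

A checksum shows that a copy is bit-identical to what was registered. It does not establish authenticity, which the slide lists as a separate challenge.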

Slide11

speaking about metadata harvesting

Access Data

With Extraction and Analysis, Through Catalog

Direct to Partner Sites

View Information on Data

Through Catalog

Link to Data at Partner Site

Search Shared Catalog

Data Mirror

Metadata

Catalog

Harvester

Online Catalog

Online Analysis
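The harvesting picture (harvester, shared catalog, links to data at partner sites) might be sketched as follows. This is an in-memory illustration with made-up site names and record fields, not a specific harvesting protocol.

```python
def harvest(partner_sites: dict[str, list[dict]]) -> list[dict]:
    """Copy metadata records (not the data) from each partner into one catalog."""
    catalog = []
    for site, records in partner_sites.items():
        for record in records:
            # Keep a link back to the partner site that holds the data itself.
            catalog.append({**record, "site": site})
    return catalog


def search(catalog: list[dict], term: str) -> list[dict]:
    """Case-insensitive title search over the shared catalog."""
    return [r for r in catalog if term.lower() in r["title"].lower()]


partner_sites = {
    "repository-a": [{"title": "orange", "year": 2010, "pid": "PID1"}],
    "repository-b": [{"title": "plum", "year": 2010, "pid": "PID2"}],
}
catalog = harvest(partner_sites)
hits = search(catalog, "PLUM")
```

Only metadata moves into the catalog. A hit carries the site and PID needed to reach the data at the partner, where that partner's own access rules still apply.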

Slide12

speaking about architectures

Slide13

speaking about federations

Slide14

speaking about federations

Slide15

speaking about federations

Slide16

general configuration

repository A

- architecture

- rights domain

- access paths

- etc.

mirror repository X

- architecture

- rights domain

- access paths

- etc.

adapter(s)

adapter(s)

repository B

- architecture

- rights domain

- access paths

- etc.

adapters

repository C

- architecture

- rights domain

- access paths

- etc.

adapters

mirror repository Y

- architecture

- rights domain

- access paths

- etc.

adapters

mirror repository Z

- architecture

- rights domain

- access paths

- etc.

adapters

can be special

does not scale

Slide17

general configuration

repository A

- architecture

- rights domain

- access paths

- etc.

mirror repository X

- architecture

- rights domain

- access paths

- etc.

API

API

repository B

- architecture

- rights domain

- access paths

- etc.

API

repository C

- architecture

- rights domain

- access paths

- etc.

API

mirror repository Y

- architecture

- rights domain

- access paths

- etc.

API

mirror repository Z

- architecture

- rights domain

- access paths

- etc.

API

replication layer
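The step from pairwise adapters to one API per repository can be made concrete with a small sketch. The `RepositoryApi` protocol and its two methods are assumptions for illustration; the slides do not define the API. With direct adapters, every ordered pair of repositories needs its own adapter, while a shared API needs one implementation per repository.

```python
from typing import Protocol


class RepositoryApi(Protocol):
    """Hypothetical minimal API that every repository would expose."""
    def get(self, pid: str) -> bytes: ...
    def put(self, pid: str, data: bytes) -> None: ...


class InMemoryRepository:
    """Stand-in repository; real ones keep their own internal architecture."""
    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def get(self, pid: str) -> bytes:
        return self._objects[pid]

    def put(self, pid: str, data: bytes) -> None:
        self._objects[pid] = data


def replicate(pid: str, source: RepositoryApi, mirrors: list[RepositoryApi]) -> None:
    """Replication layer: uses only the shared API, never repository internals."""
    data = source.get(pid)
    for mirror in mirrors:
        mirror.put(pid, data)


def pairwise_adapters(n: int) -> int:
    """Adapters needed if each repository talks directly to every other one."""
    return n * (n - 1)


repo_a = InMemoryRepository("A")
mirror_x = InMemoryRepository("X")
repo_a.put("PID1", b"apple")
replicate("PID1", repo_a, [mirror_x])
```

For the six repositories drawn on the slide, `pairwise_adapters(6)` gives 30 directed adapters against 6 API implementations, which is the sense in which the adapter configuration "does not scale".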

Slide18

generic HLEG figure

Data generators

Users

Common Data Services

Community Support Services

Data Curation

User functionalities

Data capture & transfer

Virtual Research Environments

Data discovery & navigation

Workflow generation

Annotation, Interpretability

Safe & persistent storage

Identifiers, Authenticity, Workflow execution, Mining

Trust

Slide19

requirements for intermediate layer

needs to cope with large diversity of solutions and architectures

may only minimally interfere with local repository solutions

(too much has been invested along community traditions)

needs to respect rights domains and preserve access rights

needs to be transparent to proven utilization mechanisms

needs to operate at logical level (canonical collections)

needs to scale with number of (community) data centers

only one way to go:

separate functionality into independent components (data, metadata, PIDs, etc.)

specify proper interfaces (of course)

Slide20

requirements for layer

how to manage procedures/workflows in complex landscape

how to assess quality and correctness of all workflows

how to maintain provenance information

only one way to go

make use of an easy-to-interpret declarative language

establish proper "policy rules on all levels"

map these rules to robust and proven activities

separate declarative language from interpretation engine

iRODS is an attempt in this direction

respect to Reagan Moore and his team

at MPI, such a declarative language has been in use for some years to manage access rights for the million objects which need to be treated individually and which are part of collections
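Separating the declarative language from the interpretation engine can be illustrated with a toy example. It is neither the iRODS rule language nor the MPI access-rights language: the rule format, the group names and the first-match-wins, default-deny semantics are all assumptions chosen for the sketch.

```python
from fnmatch import fnmatchcase

# Declarative part: plain data, readable without knowing the engine.
# More specific rules come first because the first matching rule wins.
RULES = [
    {"target": "corpus/dutch/session42", "group": "public", "action": "read", "effect": "deny"},
    {"target": "corpus/dutch/*", "group": "public", "action": "read", "effect": "allow"},
    {"target": "corpus/*", "group": "archivists", "action": "write", "effect": "allow"},
]


def is_allowed(rules: list[dict], obj: str, groups: set[str], action: str) -> bool:
    """Interpretation engine: evaluate the rules for one object, user and action."""
    for rule in rules:
        if (
            fnmatchcase(obj, rule["target"])
            and rule["group"] in groups
            and rule["action"] == action
        ):
            return rule["effect"] == "allow"
    return False  # default deny when no rule matches


public_can_read = is_allowed(RULES, "corpus/dutch/session1", {"public"}, "read")
exception_applies = is_allowed(RULES, "corpus/dutch/session42", {"public"}, "read")
```

An individual object inside a collection receives its own treatment by adding one rule, without changing the engine. That is the property the slide asks for when objects "need to be treated individually" yet belong to collections.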

Slide21

Reagan's data environments

moving not bytes but collections

need to maintain integrity of collections (incl. relations)

collections are assembled for a certain purpose

collections have properties to ensure their purpose

policies ensure maintenance of properties

procedures implement policies

procedures result in state information

assessment step to validate state

purpose, properties, policies, procedures, state info
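The chain of purpose, properties, policies, procedures and state information can be walked through in a small sketch. The replica-count property and all names below are invented for illustration and are not taken from iRODS.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedCollection:
    purpose: str                      # why the collection was assembled
    required_replicas: int            # property that supports the purpose
    replicas: list[str] = field(default_factory=list)
    state: dict[str, int] = field(default_factory=dict)


def replicate_procedure(collection: ManagedCollection, sites: list[str]) -> None:
    """Procedure implementing the policy 'keep required_replicas copies'."""
    for site in sites:
        if len(collection.replicas) >= collection.required_replicas:
            break
        if site not in collection.replicas:
            collection.replicas.append(site)
    # Every procedure run leaves state information behind.
    collection.state["replica_count"] = len(collection.replicas)


def assess(collection: ManagedCollection) -> bool:
    """Assessment step: validate recorded state against the required property."""
    return collection.state.get("replica_count", 0) >= collection.required_replicas


corpus = ManagedCollection(purpose="long-term preservation", required_replicas=2)
before = assess(corpus)
replicate_procedure(corpus, ["site-x", "site-y", "site-z"])
after = assess(corpus)
```

The assessment reads only the recorded state, so it can run independently of the procedure that produced it.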

Slide22

program - 1st part

Larry Lannom (CNRI): about a digital object architecture

Alex Wade (MS): approach from MS

Malte Dreyer: thoughts about generic API

John Kennedy: heterogeneity of repositories in DEISA

Ken Galluppi: federating several repositories

Willem Elbers: federation tests with iRODS

Jean-Yves Nief: iRODS in professional use

Peter and Johannes: summary + discussion

Slide23

utilization challenge

utilization software must not be affected by replication

utilization software should also make use of copies

any replication solution needs to demonstrate this!

existing utilization software
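One way for utilization software to "make use of copies" without being affected by replication is to hide the choice of copy behind a single call. The sketch below assumes the caller already has the list of locations for an object; `fetch_from_any` and the fake transport are hypothetical.

```python
from typing import Callable, Iterable


def fetch_from_any(urls: Iterable[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each copy in order and return the first one that can be read."""
    last_error = None
    for url in urls:
        try:
            return fetch(url)
        except OSError as error:  # covers ConnectionError, TimeoutError, ...
            last_error = error
    raise LookupError("no copy reachable") from last_error


def fake_fetch(url: str) -> bytes:
    """Stand-in transport: the primary site is down, the mirror answers."""
    if "repo-a" in url:
        raise ConnectionError("primary site unreachable")
    return b"apple"


data = fetch_from_any(
    ["https://repo-a.example.org/apple", "https://mirror-x.example.org/apple"],
    fake_fetch,
)
```

The calling tool sees one function returning bytes, whichever copy answered. Access rights at the mirror still have to match those at the original, a point this sketch does not address.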

Slide24

work spaces and profiles

users want to

store data

protect data

share data

enrich data

change data

etc.

data is somewhere in this complex domain

users want transparent access

how to get this done?

profiles

attributes

quotas

etc

Slide25

processing chains - specification

data

metadata

registries

tool

metadata

registries

data

operation

data*

operation

workflow specification framework

this is very discipline specific - various possibilities

curation/annotation/enrichment/visualization pipelines, etc.
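The split between specifying a processing chain and executing it can be sketched as below. The tool registry and the two string operations are placeholders; real chains are discipline specific, as the slide notes.

```python
from typing import Callable

# Hypothetical tool registry: operation name -> callable.
TOOLS: dict[str, Callable[[str], str]] = {
    "strip": str.strip,
    "lowercase": str.lower,
}

# The specification is plain data: an ordered list of operation names.
workflow_spec = ["strip", "lowercase"]


def execute(spec: list[str], data: str) -> str:
    """Apply each named operation in order; each output is the next input."""
    for name in spec:
        operation = TOOLS[name]  # raises KeyError for an unregistered tool
        data = operation(data)
    return data


result = execute(workflow_spec, "  Hello Nijmegen ")
```

Because the specification holds only names, it can be stored, shared and checked against the registry before any data is touched, and the execution framework can be replaced without rewriting the chain.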

Slide26

processing chains - execution

workflow execution framework

Slide27

the challenges

large amounts of data are at mirroring repositories

let's execute operations on the mirroring sites

how to easily deploy operators

how to inform the execution environment about how to invoke them

how to let them act on the user's behalf etc

Slide28

program - 2nd part

SARA colleagues: workspace in NL

Morris Riedel (FZJ): workspace ideas

Johannes & John (RZG): operational aspects

Thomas & Erhard (U Tübingen): WebLicht example

Mike Papazoglou (U Tilburg): generic SOA aspects

Peter: wrap up and discussion

Slide29

thanks for the attention