Slide1
Repositories, Workspaces, Web Services
- some ideas -
Peter Wittenburg
The Language Archive - Max Planck Institute
CLARIN Research Infrastructure
Nijmegen, The Netherlands
Slide2
scope of workshop
clear focus on technology and architecture issues for preservation and access
many other issues not in focus although relevant
- IPR, license issues only partially
- quality of data & metadata certification (RAC, DSA, etc.)
- AAI
- cost aspects
- etc.
let's have interactive presentations
should be able to extract essentials
Slide3
Definitions?
Slide4
so simple
repository
Slide5
+ Metadata
[diagram: a repository and a metadata registry; entries orange, plum, pear, apple, each labelled 2010; the links between the two are marked "?"]
dangerous since physical paths may change etc.
Slide6
+ replication due to preservation
[diagram: a repository, a metadata registry and a second repository; entries orange, plum, pear, apple, each labelled 2010; the links are marked "?"]
dangerous since metadata records can be re-used
metadata should be stable
transfer at physical level
Slide7
+ replication and PIDs
[diagram: a repository, a metadata registry, a second repository and a PID registry; entries orange, plum, pear, apple, each labelled 2010; entries PID4, PID3, PID2, each labelled 2010; PID registry entry: PID1 - URL1 - URL 2; one link is marked "?"]
dangerous: another indirection layer
transfer at physical level
access possible
which rights?
same access rights
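The PID indirection on this slide can be sketched in a few lines. This is only an illustration under assumptions: the registry is an in-memory dictionary, the URLs are placeholders, and `resolve` and `is_reachable` are hypothetical names, not part of any particular PID system.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory PID registry: each PID maps to the URLs of all
# known copies (original repository first, replicas after it).
PID_REGISTRY: Dict[str, List[str]] = {
    "PID1": [
        "https://repo-a.example/objects/apple",
        "https://mirror-x.example/objects/apple",
    ],
}


def resolve(pid: str, is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable URL registered for pid, or None."""
    for url in PID_REGISTRY.get(pid, []):
        if is_reachable(url):
            return url
    return None


# The metadata record stores only the PID, so a changed physical path
# means updating one registry entry, not every record that cites it.
metadata_record = {"title": "apple", "year": 2010, "object": "PID1"}


def only_mirror_up(url: str) -> bool:
    """Simulate the original repository being offline."""
    return url.startswith("https://mirror-x.example")


print(resolve(metadata_record["object"], only_mirror_up))
```

The sketch leaves the slide's open question untouched: a resolved replica URL says nothing about whether the same access rights are enforced at the mirror.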
Slide8
what about collections
[diagram: a repository, a metadata registry, a second repository and a PID registry; entries orange, plum, pear, apple and a collection entry, each labelled 2010; entries PID4, PID3, PID2, each labelled 2010; PID registry entries: PID1 - URL1 - URL 2 and PIDx - URL]
transfer at physical level
PS: collections are dynamic
Slide9
topic of high relevance
ESFRI Task Force on Repositories (report)
e-IRG/ESFRI Task Force on Data Management (report)
Blue Ribbon Task Force on Sustainable Digital Preservation and Access (report)
EC High Level Expert Group on Scientific Data (report)
ASIS&T Summit Phoenix on Research Data and Access (slides & summary)
T. Hey et al. The Fourth Paradigm: Data-Intensive Scientific Discovery (book)
Slide10
summarizing the challenges
how to
- manage the data tsunami
- maintain data visibility
- preserve the data (just seen one solution)
- protect the data integrity
- ensure that we get the object we wanted
- guarantee data authenticity (how to present)
- maintain context and provenance information
- protect privacy and rights in complex data world
- maintain trust in data
- federate repositories to (virtually) integrate data
- achieve (partial) interoperability
- exploit distributed data without copying
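One common way to approach "protect the data integrity" and "ensure that we get the object we wanted" is to register a checksum with the object and verify every copy against it. The slides do not prescribe this; the following is a minimal sketch using SHA-256 from the Python standard library, with hypothetical function names and sample bytes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum that would be registered when the object is archived."""
    return hashlib.sha256(data).hexdigest()


def verify_copy(data: bytes, expected_checksum: str) -> bool:
    """True if a retrieved copy matches the registered checksum."""
    return sha256_hex(data) == expected_checksum


original = b"recording of session 42"       # placeholder content
registered = sha256_hex(original)           # stored next to the metadata

intact_copy = original
damaged_copy = original + b"\x00"           # one stray byte after transfer

print(verify_copy(intact_copy, registered))
print(verify_copy(damaged_copy, registered))
```

A checksum covers integrity only; authenticity, provenance and trust need further information that this sketch does not model.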
Slide11
speaking about metadata harvesting
[diagram labels: Access Data With Extraction and Analysis, Through Catalog / Direct to Partner Sites; View Information on Data Through Catalog; Link to Data at Partner Site; Search Shared Catalog; components: Data Mirror, Metadata Catalog, Harvester, Online Catalog, Online Analysis]
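The harvesting idea in this diagram, where a harvester collects metadata from partner sites into a shared catalog that is searched centrally while the data stays at the partner site, can be sketched as follows. Everything here is an assumption for illustration: the record fields, the site names and the functions `harvest` and `search` are hypothetical and do not stand for any particular harvesting protocol.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def harvest(providers: Dict[str, Iterable[Record]]) -> List[Record]:
    """Collect metadata records from all partner sites into one catalog.

    Only metadata is copied; each record keeps a link back to the data
    at the partner site.
    """
    catalog: List[Record] = []
    for site, records in providers.items():
        for record in records:
            catalog.append({**record, "site": site})
    return catalog


def search(catalog: List[Record], term: str) -> List[Record]:
    """Case-insensitive title search over the shared catalog."""
    term = term.lower()
    return [r for r in catalog if term in r["title"].lower()]


# Hypothetical partner sites exposing their metadata records.
providers = {
    "partner-a": [{"id": "a-1", "title": "Pear corpus",
                   "link": "https://partner-a.example/a-1"}],
    "partner-b": [{"id": "b-7", "title": "Plum recordings",
                   "link": "https://partner-b.example/b-7"}],
}

catalog = harvest(providers)
hits = search(catalog, "plum")
print(hits[0]["link"])
```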
Slide12
speaking about architectures
Slide13
speaking about federations
Slide14
speaking about federations
Slide15
speaking about federations
Slide16
general configuration
[diagram: repositories A, B and C and mirror repositories X, Y and Z, each with its own architecture, rights domain, access paths, etc.; every repository is connected through adapter(s)]
can be special
does not scale
Slide17
general configuration
[diagram: repositories A, B and C and mirror repositories X, Y and Z, each with its own architecture, rights domain, access paths, etc.; every repository exposes an API towards a common replication layer]
replication layer
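Why special adapters do not scale while one API per repository does can be made concrete with a small count. The slides give no numbers, so this rests on an assumption: in the adapter setup every pair of repositories needs its own adapter, whereas in the API setup each repository implements the common interface once.

```python
def pairwise_adapters(n: int) -> int:
    """Adapters needed if every pair of n repositories gets its own one."""
    return n * (n - 1) // 2


def api_implementations(n: int) -> int:
    """Implementations needed if each repository exposes one common API."""
    return n


# 6 corresponds to the six repositories drawn on the slide (A, B, C, X, Y, Z).
for n in (3, 6, 20):
    print(n, pairwise_adapters(n), api_implementations(n))
```

Under this assumption the six repositories on the slide need 15 adapters but only 6 API implementations, and the gap grows quadratically with the number of data centers.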
Slide18
generic HLEG figure
[diagram labels: Data generators; Users; Common Data Services; Community Support Services; Data Curation; User functionalities; Data capture & transfer; Virtual Research Environments; Data discovery & navigation; Workflow generation; Annotation, Interpretability; Safe & persistent storage; Identifiers, Authenticity, Workflow execution, Mining; Trust]
Slide19
requirements for intermediate layer
needs to cope with large diversity of solutions and architectures
may only minimally interfere with local repository solutions
(too much has been invested along community traditions)
needs to respect rights domains and preserve access rights
needs to be transparent to proven utilization mechanisms
needs to operate at logical level (canonical collections)
needs to scale with number of (community) data centers
only one way to go:
separate functionality into independent components (data, metadata, PIDs, etc.)
specify proper interfaces (of course)
Slide20
requirements for layer
how to manage procedures/workflows in complex landscape
how to assess quality and correctness of all workflows
how to maintain provenance information
only one way to go
make use of an easy-to-interpret declarative language
establish proper "policy rules on all levels"
map these rules to robust and proven activities
separate declarative language from interpretation engine
iRODS is an attempt in this direction
respect to Reagan Moore and his team
at MPI, such a declarative language has been in use for some years to manage access rights for the million objects which need to be treated individually and which are part of collections
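The separation of a declarative language from its interpretation engine can be illustrated with access rules kept as plain data and a small generic interpreter. This is a sketch under assumptions: the rule format, the paths and the function `is_allowed` are invented for illustration and are neither the iRODS rule language nor the language used at MPI.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Union

Rule = Dict[str, Union[str, bool]]

# Declarative part: rules are data, written per collection, not per object.
# The first matching rule wins; anything unmatched is denied.
RULES: List[Rule] = [
    {"collection": "corpus/private/*", "group": "owners",
     "action": "read", "allow": True},
    {"collection": "corpus/private/*", "group": "*",
     "action": "read", "allow": False},
    {"collection": "corpus/*", "group": "*",
     "action": "read", "allow": True},
]


def is_allowed(rules: List[Rule], path: str, group: str, action: str) -> bool:
    """Interpretation engine: knows nothing about specific collections."""
    for rule in rules:
        if (fnmatchcase(path, str(rule["collection"]))
                and fnmatchcase(group, str(rule["group"]))
                and rule["action"] == action):
            return bool(rule["allow"])
    return False  # default deny


print(is_allowed(RULES, "corpus/private/session1.wav", "owners", "read"))
print(is_allowed(RULES, "corpus/private/session1.wav", "guests", "read"))
print(is_allowed(RULES, "corpus/public/session2.wav", "guests", "read"))
```

Because the rules are data, they can be reviewed, versioned and checked independently of the engine that enforces them, which is the point of the separation named above.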
Slide21
Reagan's data environments
moving not bytes but collections
need to maintain integrity of collections (incl. relations)
collections are assembled for a certain purpose
collections have properties to ensure their purpose
policies ensure maintenance of properties
procedures implement policies
procedures result in state information
assessment step to validate state
purpose, properties, policies, procedures, state info
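The chain purpose, properties, policies, procedures, state info can be walked through with a toy collection. The sketch below is an assumption-laden illustration: the property chosen (a minimum number of copies), the class `Collection` and the functions `replicate` and `assess` are hypothetical and are not taken from the slides or from iRODS.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Collection:
    name: str
    purpose: str                      # why the collection was assembled
    required_copies: int              # property that serves the purpose
    copies: Dict[str, List[str]] = field(default_factory=dict)  # member -> sites
    state: Dict[str, int] = field(default_factory=dict)         # member -> copy count


def replicate(collection: Collection, sites: List[str]) -> None:
    """Procedure implementing the policy 'every member has enough copies'.

    It records state information (copy count per member) as it goes.
    """
    for member, held_at in collection.copies.items():
        for site in sites:
            if len(held_at) >= collection.required_copies:
                break
            if site not in held_at:
                held_at.append(site)
        collection.state[member] = len(held_at)


def assess(collection: Collection) -> bool:
    """Assessment step: validate the recorded state against the property."""
    return all(collection.state.get(member, 0) >= collection.required_copies
               for member in collection.copies)


fruit = Collection(
    name="fruit-2010",
    purpose="long-term preservation",
    required_copies=2,
    copies={"apple": ["repo-a"], "pear": ["repo-a", "mirror-x"]},
)

print(assess(fruit))                      # no state recorded yet
replicate(fruit, ["mirror-x", "mirror-y"])
print(assess(fruit))
```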
Slide22
program - 1st part
Larry Lannom (CNRI): about a digital object architecture
Alex Wade (MS): approach from MS
Malte Dreyer: thoughts about generic API
John Kennedy: heterogeneity of repositories in DEISA
Ken Galluppi: federating several repositories
Willem Elbers: federation tests with iRODS
Jean-Yves Nief: iRODS in professional use
Peter and Johannes: summary + discussion
Slide23
utilization challenge
utilization software must not be affected by replication
utilization software should also make use of copies
any replication solution needs to demonstrate this!
[diagram label: existing utilization software]
Slide24
work spaces and profiles
users want to
- store data
- protect data
- share data
- enrich data
- change data
- etc.
data is somewhere in this complex domain
users want transparent access
how to get this done?
- profiles
- attributes
- quotas
- etc.
Slide25
processing chains - specification
[diagram: data metadata registries and tool metadata registries feed a workflow specification framework; chain: data, operation, data*, operation]
workflow specification framework
this is very discipline specific - various possibilities
curation/annotation/enrichment/visualization pipelines, etc.
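The split between specifying a processing chain and executing it can be sketched with a tool registry and a chain given as a list of tool names. All names here are assumptions for illustration: `TOOL_REGISTRY`, `normalize`, `annotate` and `run_chain` are hypothetical, and the two operations are toy stand-ins for real curation or annotation tools.

```python
from typing import Callable, Dict, List

Operation = Callable[[str], str]


def normalize(text: str) -> str:
    """Toy curation step: collapse whitespace and lowercase."""
    return " ".join(text.split()).lower()


def annotate(text: str) -> str:
    """Toy annotation step: tag every token."""
    return " ".join(f"{token}/TOK" for token in text.split())


# Hypothetical tool registry: the specification refers to tools by name only.
TOOL_REGISTRY: Dict[str, Operation] = {
    "normalize": normalize,
    "annotate": annotate,
}


def run_chain(data: str, specification: List[str]) -> str:
    """Execute the specified operations in order: data -> data* -> ..."""
    for tool_name in specification:
        data = TOOL_REGISTRY[tool_name](data)
    return data


specification = ["normalize", "annotate"]   # the workflow specification
result = run_chain("  The  Pear ", specification)
print(result)
```

Keeping the specification as plain names means the same chain could, in principle, be handed to an execution framework running elsewhere, for example at a mirroring site.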
Slide26
processing chains - execution
workflow execution framework
Slide27
the challenges
large amounts of data are at mirroring repositories
let's execute operations on the mirroring sites
how to easily deploy operators
how to inform the execution environment about how to invoke them
how to let them act on the user's behalf etc.
Slide28
program - 2nd part
SARA colleagues: workspace in NL
Morris Riedel (FZJ): workspace ideas
Johannes & John (RZG): operational aspects
Thomas & Erhard (U Tübingen): WebLicht example
Mike Papazoglou (U Tilburg): generic SOA aspects
Peter: wrap up and discussion
Slide29
thanks for the attention