Slide1
EUDAT
AAI for a Collaborative Data Infrastructure
Johannes Reetz, EUDAT
VAMP workshop, Helsinki, 30 Sep 2013
- Challenges and Approaches -
Slide2
The CDI concept: Collaborative Data Infrastructure
Trust
Data Curation, Data Generators, Users, Common Data Services, Community Support Services
• User-focused functionality, data capture & transfer, VREs
• Data discovery & navigation, workflow creation, annotation, interpretability
• Persistent storage, identification, authenticity, workflow execution, mining
Slide3
• EPOS: European Plate Observatory System
• CLARIN: Common Language Resources and Technology Infrastructure
• ENES: Service for Climate Modelling in Europe
• LifeWatch: Biodiversity Data and Observatories
• VPH: The Virtual Physiological Human
• INCF: International Neuroinformatics
All share common challenges:
• Reference models and architectures
• Persistent data identifiers
• Metadata management
• Distributed data sources
• Data interoperability
Initially six research communities on board
Slide4
Communities and Data Centers
• Identifying basic requirements
• Identify commonalities, common data services
Slide5
What community users see … Today
Community data
Community Layer: community-specific authentication, authorization & single sign-on
Community portal, single credential type
Slide6
What community users see … Tomorrow
• Common metadata exploration
• Common data stage-in and stage-out services
• Data services for long-tail data, also from citizen scientists
• Common replication services with access to distributed storage
• Unified Authentication, Authorization & Single Sign-On
Other very useful data
Community data
Various community portals, different credential types
EUDAT portal, for non-affiliated users, many credential types
Slide7
From the analysis of the FIM document (v0.7, L. Florio et al., 2013):
1. User friendliness (high)
2. Browser & non-browser federated access (high)
3. Multiple technologies with translators, including dynamic issue of credentials (medium)
4. Bridging communities (medium)
5. Implementations based on open standards and sustainable with compatible licenses (high)
6. Different Levels of Assurance with provenance (high)
7. Authorisation under community and/or facility control (high)
8. Attributes must be able to cross national borders (high)
9. Well-defined, semantically harmonised attributes (medium)
10. Flexible and scalable IdP attribute release policy (medium)
EUDAT supports these requirements, but emphasizes #3, #4 and #9 (rated high by EUDAT).
Slide8
EUDAT Sites
• General data centres (replica storages)
• Community centres (repositories)
Slide9
Safe Replication Service
Robust, safe and highly available data replication service for small- and medium-sized repositories
• To guard against data loss in long-term archiving and preservation
• To optimize access for users from different regions
• To bring data strategically closer to systems for powerful compute-intensive analysis
[Figure: EUDAT CDI domain of registered data: community repository, three data center stores, PIDs, policy rules]
PIDs are used to keep track of location and can provide attributes
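The slide leaves the policy rules unspecified. As a purely illustrative sketch, the Python below shows what one such rule might look like: pick replica stores outside the repository's administrative zone (separate zones are required by the Safe Replication Service) and spread them across regions to optimize access. The `Store` type, the zone and region labels and `select_replica_stores` are hypothetical names, not part of any EUDAT software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Store:
    name: str
    zone: str    # administrative zone (who operates the storage)
    region: str  # geographic region, used to spread copies for user access


def select_replica_stores(repo: Store, candidates: list[Store], copies: int) -> list[Store]:
    """Hypothetical policy rule: choose `copies` stores outside the repository's
    administrative zone, preferring regions that do not hold a copy yet."""
    eligible = [s for s in candidates if s.zone != repo.zone]
    chosen: list[Store] = []
    covered = {repo.region}
    # First pass: at most one store per region not yet covered.
    for s in eligible:
        if len(chosen) == copies:
            break
        if s.region not in covered:
            chosen.append(s)
            covered.add(s.region)
    # Second pass: fill any remaining slots with other eligible stores.
    for s in eligible:
        if len(chosen) == copies:
            break
        if s not in chosen:
            chosen.append(s)
    if len(chosen) < copies:
        raise ValueError("not enough stores in other administrative zones")
    return chosen


repo = Store("clarin-repo", zone="CLARIN", region="north")
centres = [
    Store("dc-a", zone="DC-A", region="north"),
    Store("dc-b", zone="DC-B", region="south"),
    Store("dc-c", zone="CLARIN", region="west"),  # same zone as the repository: excluded
    Store("dc-d", zone="DC-D", region="central"),
]
print([s.name for s in select_replica_stores(repo, centres, copies=2)])  # ['dc-b', 'dc-d']
```

Preferring uncovered regions first is just one way to express "optimize access for users from different regions"; a real policy could equally weigh capacity, cost or proximity to HPC systems.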
Slide10
Use Case: CLARIN – Safe Replication
[Figure: EPIC PID registry]
Slide11
Safe Replication “islands”
[Figure: community centres (repositories) and general data centres (replica storages): VPH / VIP, diXa, INCF, EPOS / PP, WG7, CLARIN / Replix, ENES / CMIP5, IPCC-AR5, CLARIN / CUNI, EPOS / Orpheus, NeuGrid]
Slide12
Data Staging Service
• Support researchers in transferring large data collections from EUDAT storage to HPC facilities
• Reliable, efficient, and easy-to-use tools to manage data transfers
• Provide the means to ingest computational results into the repository via the EUDAT infrastructure
[Figure: EUDAT CDI domain of registered data (data center stores) and PRACE HPC systems]
Slide13
EUDAT Services (1)
Safe Replication Service
• Replicating data objects (DOs) from a repository to replica storages
• Repository & replica storage belong to separate administrative zones
• Registration of original DO and replica
PID / object identifier Service
• Create DO handles
• Manage / maintain DO handles
• Resolve DO handles
Data Staging Service
• Replication of data from the domain of registered data (stage-out)
• Replication of data objects into the domain of registered data (stage-in)
• Replication of non-registered data objects between scratch storages
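To make the stage-in/stage-out distinction concrete, here is a minimal Python sketch under stated assumptions: an in-memory dictionary stands in for the PID service, local directories stand in for repository and HPC scratch storage, and the handle prefix `12345` and all function names are hypothetical. Only stage-in creates a handle; a stage-out copy lands on scratch storage unregistered, just like copies between scratch storages.

```python
import itertools
import shutil
import tempfile
from pathlib import Path

# Hypothetical in-memory stand-in for the PID service: handle -> storage path.
registry: dict[str, Path] = {}
_next_id = itertools.count(1)


def create_handle(location: Path) -> str:
    """Register a data object and return its new handle (prefix '12345' is made up)."""
    handle = f"12345/do-{next(_next_id):04d}"
    registry[handle] = location
    return handle


def stage_in(src: Path, repository: Path) -> str:
    """Stage-in: copy data into the domain of registered data and register it."""
    dest = repository / src.name
    shutil.copy2(src, dest)
    return create_handle(dest)


def stage_out(handle: str, scratch: Path) -> Path:
    """Stage-out: copy a registered object to scratch storage; the copy gets no handle."""
    src = registry[handle]  # resolve the handle to its physical location
    dest = scratch / src.name
    shutil.copy2(src, dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    repo, scratch = root / "repository", root / "hpc-scratch"
    repo.mkdir()
    scratch.mkdir()
    result = root / "run42.nc"
    result.write_text("simulation output")
    h = stage_in(result, repo)      # a new handle is created
    copy = stage_out(h, scratch)    # no new handle is created
    print(h, copy.read_text(), len(registry))  # 12345/do-0001 simulation output 1
```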
Slide14
Service-specific actors/actions (1)
Safe Replication Service
• Repository data manager replicates
• Replica storage manager registers DOs
1) (Community) user accesses data via repository
2) User accesses data via replica storage
PID (Handle) Service
• Repository data manager: creates/manages primary object handle
• Replica storage manager: creates/manages secondary object handles
• Users and others resolve the location of the physical storage from the handles (PIDs)
Data Staging
• Users access and fetch data from either the repository or the replica storage
• Users ingest new data into the repository
[Figure: EUDAT CDI domain of registered data: community repository, three data center stores, PIDs, policy rules]
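The division of labour between the two managers can be sketched with a toy handle service in Python. This is an assumption-laden model, not the EPIC or Handle System API: the `HandleService` class, its record fields and the back-link from each secondary handle to its primary are illustrative choices. The point is that a user can resolve either handle and get every physical location of the object.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HandleRecord:
    url: str                                           # physical location the handle points to
    owner: str                                         # actor that creates and maintains the handle
    primary: Optional[str] = None                      # set on secondary handles only
    replicas: list[str] = field(default_factory=list)  # set on primary handles only


class HandleService:
    """Toy PID service; handle format and record fields are hypothetical."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self.records: dict[str, HandleRecord] = {}

    def _mint(self) -> str:
        # Sequential suffixes; a real service must guarantee global uniqueness.
        return f"{self.prefix}/{len(self.records) + 1:06d}"

    def create_primary(self, url: str, owner: str) -> str:
        handle = self._mint()
        self.records[handle] = HandleRecord(url=url, owner=owner)
        return handle

    def create_secondary(self, primary: str, url: str, owner: str) -> str:
        handle = self._mint()
        self.records[handle] = HandleRecord(url=url, owner=owner, primary=primary)
        self.records[primary].replicas.append(handle)  # keep the primary aware of its copy
        return handle

    def resolve(self, handle: str) -> list[str]:
        """Return all physical locations of the object, original first."""
        root = self.records[handle].primary or handle
        root_rec = self.records[root]
        return [root_rec.url] + [self.records[h].url for h in root_rec.replicas]


svc = HandleService(prefix="12345")
p = svc.create_primary("irods://repo.example.org/obj1", owner="repository data manager")
s = svc.create_secondary(p, "irods://dc1.example.org/obj1", owner="replica storage manager")
print(p, s)            # 12345/000001 12345/000002
print(svc.resolve(s))  # ['irods://repo.example.org/obj1', 'irods://dc1.example.org/obj1']
```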
Slide15
Simple Store for “long-tail” data and citizen scientists
• Allow registered users to upload “long-tail” data into the EUDAT store
• Enable sharing of objects and collections with other researchers
• Utilise other EUDAT services to provide reliability and data retention
• PIDs are assigned to uploaded DOs
[Figure: EUDAT CDI domain of registered data: Simple Store portal (simple upload, simple metadata, PID registration) and three data center stores]
Slide16
Definition of the data sets as objects for entitlement
• Find and define collections of scientific data, generated either by various communities or via EUDAT services (e.g. faceted search)
• Access those data collections through the references given in the metadata to the relevant data stores
[Figure: EUDAT CDI domain of registered data: community repositories and data center stores, metadata portal, Joint Metadata Service]
Slide17
EUDAT Services (2)
Simple Store Service
• Repository for registered data with metadata for sharing
• Digital objects are registered (handles are assigned)
• Fragmented user group: many communities & “citizen scientists” are contributing and retrieving data
EUDATbox Service
• Temporary shareable storage space for data, not necessarily registered
• User deposits data, not necessarily with metadata
• Not a homogeneous user group: many communities, “citizen scientists”
(Joint) Metadata Service
• Metadata from various repositories are harvested and collected
• Metadata exploration, faceted search: result sets define the data set for entitlement
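A small Python sketch of how faceted search over harvested metadata could yield a result set that is then frozen into a data set for entitlement. The record fields, the PIDs and the `define_entitlement` helper are hypothetical; a real metadata service would use its own schema and a search index rather than a list scan.

```python
from collections import Counter

# Hypothetical harvested metadata records (fields are illustrative, not a EUDAT schema).
HARVESTED = [
    {"pid": "12345/a1", "community": "CLARIN", "type": "corpus", "language": "fi"},
    {"pid": "12345/a2", "community": "CLARIN", "type": "lexicon", "language": "fi"},
    {"pid": "12345/a3", "community": "CLARIN", "type": "corpus", "language": "de"},
    {"pid": "67890/b1", "community": "ENES", "type": "model-output", "language": ""},
]


def facet_counts(records, facet):
    """Count the values of one facet across a result set (what a facet sidebar shows)."""
    return Counter(r[facet] for r in records if r.get(facet))


def faceted_search(records, **selected):
    """Keep the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def define_entitlement(name, result_set):
    """Freeze a result set into a named data set that access rights can refer to."""
    return {"name": name, "members": frozenset(r["pid"] for r in result_set)}


results = faceted_search(HARVESTED, community="CLARIN", type="corpus")
print(dict(facet_counts(results, "language")))  # {'fi': 1, 'de': 1}
entitlement = define_entitlement("clarin-corpora", results)
print(sorted(entitlement["members"]))  # ['12345/a1', '12345/a3']
```

Storing PIDs rather than URLs in the entitlement keeps it valid when objects are replicated or moved, since the PIDs track location.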
Slide18
Service-specific actors/actions (2)
Simple Store (Repository)
• Users deposit data and metadata
• Users search for and access data
• Repository storage manager (needs to create the handle service)
EUDATbox
• User deposits data
• User shares data by inviting other users
• User accesses data
(Joint) Metadata Service
• Manager harvests metadata from (many) repositories, also via the replica site
[Figure: EUDAT CDI domain of registered data: community repositories, data center stores, metadata portal]
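Sharing by invitation in the EUDATbox can be illustrated with a minimal Python model. `BoxItem`, its fields and the owner-only invitation rule are assumptions made for this sketch; the slide does not say who may invite whom.

```python
from dataclasses import dataclass, field


@dataclass
class BoxItem:
    """Hypothetical EUDATbox entry: deposited data, metadata optional, not registered."""
    owner: str
    data: bytes
    metadata: dict[str, str] = field(default_factory=dict)
    invited: set[str] = field(default_factory=set)

    def invite(self, by: str, user: str) -> None:
        # Assumption: only the depositing user may share the item.
        if by != self.owner:
            raise PermissionError("only the owner can invite other users")
        self.invited.add(user)

    def read(self, user: str) -> bytes:
        # The owner and invited users may access the data; everyone else is refused.
        if user != self.owner and user not in self.invited:
            raise PermissionError(f"{user} has not been invited")
        return self.data


item = BoxItem(owner="alice", data=b"field notes")  # deposited without metadata
item.invite(by="alice", user="bob")
print(item.read("bob"))  # b'field notes'
```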
Slide19
Attribute Provider
AuthZ is either community-managed, or ( ) attributes provided by the user’s home IdP are reused
• Unique user IDs, project-wise mapped to attribute-based access control information
[Figure: communities with Attribute Providers AtP 1, AtP 2, AtP 3; zoned identity credential conversion service producing consolidated credentials; different types of Identity Providers for AuthN: IdP A, IdP B, IdP C, IdP D (eID, Shibboleth, OpenID, X.509)]
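The mapping on this slide (many credential types consolidated to one unique user ID, which is then mapped project-wise to access control attributes) might look roughly like the Python sketch below. The identifiers, the lookup tables and both functions are hypothetical; in a real deployment the linking would be done by the credential conversion service and the attributes would come from the communities' Attribute Providers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Credential:
    idp_type: str  # e.g. "shib", "OpenID", "x.509", "eID"
    subject: str   # identifier as asserted by that IdP


# Hypothetical account linking: several external credentials -> one unique user ID.
LINKED_IDENTITIES = {
    Credential("shib", "jdoe@university.example"): "eudat-user-0042",
    Credential("OpenID", "https://openid.example/jdoe"): "eudat-user-0042",
    Credential("x.509", "CN=John Doe,O=Example Grid"): "eudat-user-0042",
}

# Hypothetical per-project Attribute Providers, keyed by the unique user ID.
ATTRIBUTE_PROVIDERS = {
    "CLARIN": {"eudat-user-0042": {"role": "data-manager"}},
    "ENES": {"eudat-user-0042": {"role": "reader"}},
}


def consolidate(cred: Credential) -> str:
    """Map any supported credential to the single unique user ID."""
    try:
        return LINKED_IDENTITIES[cred]
    except KeyError:
        raise PermissionError(f"unknown identity {cred.subject!r} from {cred.idp_type}") from None


def attributes_for(cred: Credential, project: str) -> dict[str, str]:
    """Project-wise ABAC attributes for the user behind `cred` (no role if none known)."""
    uid = consolidate(cred)
    return {"uid": uid, "project": project, **ATTRIBUTE_PROVIDERS.get(project, {}).get(uid, {})}


print(attributes_for(Credential("OpenID", "https://openid.example/jdoe"), "CLARIN"))
# {'uid': 'eudat-user-0042', 'project': 'CLARIN', 'role': 'data-manager'}
```

Keying attributes by the consolidated ID, not by the original credential, means a user who logs in via Shibboleth one day and OpenID the next still gets the same project rights.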
Slide21
EUDAT AAI-TF approach
ConSec: Contrail Security code
Slide22
The figure shows the high-level view: SAML is used for authentication (possibly translated from OpenID, not shown), OAuth 2.0 is used for delegation (internally, within the federation), and XACML is used for access control policies. Control (in the workflow sense) roughly goes from left to right and from top to bottom.

Internally, an X.509 certificate with authorisation attributes is generated; this certificate is also managed internally and is thus not usually exposed to (or accessible by) the user. Its purpose is threefold: (a) to allow access to non-HTTP services such as GridFTP and iRODS (i.e., outside the OAuth delegation workflow), (b) to allow fine-grained authorisation, and (c) to allow command-line access to services for expert users. In OAuth, the authorisation server remains the central hub where access is delegated. However, since EUDAT needs finer-grained access, the generated X.509 certificate also carries authorisation attributes (see below), which are checked against pre-defined access policies.

The system deployed and used by EUDAT was built by the Contrail project, so we are reusing the Contrail Security (ConSec) code and tools developed within this pilot project. This decision was based on an evaluation of options, in which ConSec promised most of the features required by the EUDAT communities. EUDAT is currently running a ConSec authentication infrastructure for integration at FZJ, but is not currently running an authorisation infrastructure.
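As a rough illustration of checking certificate attributes against pre-defined access policies, the Python below evaluates simplified XACML-style rules with first-applicable combining and falls back to Deny when no rule applies, as a deny-biased enforcement point would. It is not ConSec code and not an XACML engine; the rule format and the `evaluate` function are hypothetical, and extracting attributes from the X.509 certificate is skipped (they are passed in as a dictionary).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    effect: str              # "Permit" or "Deny"
    subject: dict[str, str]  # attribute values the certificate must carry
    resource_prefix: str     # resources the rule applies to
    actions: frozenset[str]  # actions the rule applies to


def evaluate(rules: list[Rule], attrs: dict[str, str], resource: str, action: str) -> str:
    """First-applicable combining: the first matching rule decides; no match means Deny."""
    for rule in rules:
        subject_ok = all(attrs.get(k) == v for k, v in rule.subject.items())
        if subject_ok and resource.startswith(rule.resource_prefix) and action in rule.actions:
            return rule.effect
    return "Deny"  # deny-biased fallback when no rule is applicable


# Hypothetical policy for one data centre store; order matters under first-applicable.
POLICY = [
    Rule("Deny", {"role": "guest"}, "irods://dc1.example.org/", frozenset({"write"})),
    Rule("Permit", {"community": "CLARIN"}, "irods://dc1.example.org/clarin/", frozenset({"read", "write"})),
    Rule("Permit", {}, "irods://dc1.example.org/public/", frozenset({"read"})),
]

# Attributes as they might be extracted from the internally generated certificate.
member = {"community": "CLARIN", "role": "member"}
guest = {"community": "CLARIN", "role": "guest"}

print(evaluate(POLICY, member, "irods://dc1.example.org/clarin/obj1", "write"))  # Permit
print(evaluate(POLICY, guest, "irods://dc1.example.org/clarin/obj1", "write"))   # Deny
print(evaluate(POLICY, member, "irods://dc1.example.org/enes/obj2", "read"))     # Deny
```

Because the guest rule comes first, a CLARIN guest is refused writes even though the community rule would otherwise permit them; reordering the rules changes the outcome, which is why XACML makes the combining algorithm explicit.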