Integration and validation of a data grid software

Uploaded On 2016-10-14





Integration and validation of a data grid software

N. Carenton-Madiec (1), S. Denvil (1), K. Berger (2), A. Cofino (3)

(1) CNRS/IPSL, (2) DKRZ, (3) UNICAN

How to improve quality of deliverables in a collaborative effort context?

The Earth System Grid Federation

The Earth System Grid Federation (ESGF) is a collaboration of groups, agencies and institutions around the world that are dedicated to the development and operation of a long-term system for the management, access and analysis of climate data produced by CMIP, CORDEX or PMIP in HPC centers. Some of the challenges that ESGF is committed to addressing include:

A System of Distributed Nodes

Node types: Data Node, Index Node, Identity Provider, Compute Node

[Diagram: six federated nodes accessed by a client]

A Graduated Set of Services

WEB PORTAL

DATA INDEXING & SEARCH

DATA PUBLISHING

DATA ACCESS

ACCESS CONTROL

ANALYSIS & VISUALIZATION

NODE REGISTRY

USER REGISTRATION

NODE MANAGER

An Integrated Software Stack

Compute Node: Live Access Server

Identity Provider: OpenID Identity Provider, Globus Simple CA, Globus MyProxy Server

Index Node: Apache Solr, ESGF Search, ESGF Web Portal

Data Node: Node Manager, Publisher, Postgres, THREDDS Data Server, Security Filters, Security Services, GridFTP

ESGF Test Suite – A single tool for multiple purposes

A Worldwide Infrastructure & Team

The developer community is spread around the world

The administrator community is spread around the world

A Collaborative Project

Developers are specialized in one module of the stack

Changes are made independently of each other

Need for Integration & Validation Tools

ESGF TEST FEDERATION

ESGF TEST SUITE

The ESGF Test Suite is designed to perform high-level tests on ESGF nodes from the user's perspective. Its scope is a single data node and its three peer services (identity provider, index and compute services). Parallelized runs of the test suite on each node give a status of the whole federation.
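The idea of running the same checks against every node in parallel and collecting a federation-wide status can be sketched as follows. This is a minimal illustration under stated assumptions, not the ESGF Test Suite's actual code: the hostnames, `check_node`, `federation_status` and `fake_probe` are hypothetical names, and a thread-backed pool from the standard `multiprocessing` package stands in for whatever parallelization the suite really uses.

```python
from multiprocessing.pool import ThreadPool


def check_node(hostname, probe):
    """Run one high-level check against a node.

    `probe` is any callable that raises an exception on failure.
    Returns a (hostname, ok, detail) tuple so a failure on one node
    never interrupts the checks on the others.
    """
    try:
        probe(hostname)
        return hostname, True, "ok"
    except Exception as exc:
        return hostname, False, str(exc)


def federation_status(hostnames, probe, workers=4):
    """Check all nodes in parallel and return {hostname: (ok, detail)}."""
    with ThreadPool(workers) as pool:
        results = pool.map(lambda host: check_node(host, probe), hostnames)
    return {host: (ok, detail) for host, ok, detail in results}


def fake_probe(hostname):
    """Stand-in for a real user-level check (hypothetical, no network)."""
    if hostname.startswith("down"):
        raise RuntimeError("service unreachable")


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    nodes = ["up1.example.org", "down1.example.org", "up2.example.org"]
    for host, (ok, detail) in sorted(federation_status(nodes, fake_probe).items()):
        print(host, "PASS" if ok else "FAIL", detail)
```

Returning a result tuple instead of letting exceptions propagate is a design choice made here so that one unreachable node still yields a complete status table for the rest of the federation.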

EGU 2014 – ESSI2.8 – Earth Science on Cloud, HPC and Grid

Technologies

Python Nose: A testing framework where every test is written as an independent function, class or module

Python Requests: HTTP for humans

Python MyProxyClient: Globus MyProxy support

Python Subprocess: Spawns system processes

Python Selenium: Automates browser actions

Python MultiProcessing: Parallelizes tasks
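As an illustration of how two of these pieces fit together, the sketch below writes a test as an independent function whose name starts with `test_`, the convention that Nose (and pytest) use for discovery, and uses `subprocess` to spawn a system process. The helper `run_command` and the command being run are assumptions made for this example; a real test would call an actual client tool instead of the Python interpreter.

```python
import subprocess
import sys


def run_command(args, timeout=30):
    """Spawn a system process and return (exit_code, stripped_stdout)."""
    completed = subprocess.run(
        args, capture_output=True, text=True, timeout=timeout
    )
    return completed.returncode, completed.stdout.strip()


def test_command_succeeds():
    # Stand-in command: the current Python interpreter printing a marker.
    # A real test would invoke an external client here (hypothetical).
    code, out = run_command([sys.executable, "-c", "print('ready')"])
    assert code == 0
    assert out == "ready"


def test_failure_is_reported():
    # A non-zero exit code must be visible to the test, not swallowed.
    code, _ = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
    assert code == 3
```

Because each test is a plain function with plain `assert` statements, a test runner can collect and run the tests independently, and one failing test does not prevent the others from running.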

Integration Tests

Non-regression Tests

Post-deployment Tests

Monitoring

For Developers

For Admins

Outlook

Additional tests run from the server side would bring lower-level sanity checks

A test suite is a requirement to set up a continuous integration system that facilitates deployment and improves stability

A continuous integration system is a requirement to set up a continuous deployment system that improves reactivity

The enormous scale of the data holdings, moving from petabytes to exabytes

Support for both model output and a wide variety of observational data

The distributed nature of the data archives, which are geographically distributed and autonomously operated

[Diagram labels: HPC Center #1, HPC Center #2, HPC Center #3, HPC Center #4, HPC Center #5]