Highly Available ESGF Services for the Copernicus Climate Data Store


Matt Pryor, Phil Kershaw, Alan Iwi (CEDA); Sebastien Gardoll (IPSL); Carsten Ebrecht (DKRZ); Luca Cinquini (NASA JPL/UCAR). ESGF Container Working Group. ESGF F2F, Washington D.C., December 2018.

Presentation Transcript

Slide1

Highly Available ESGF Services for the Copernicus Climate Data Store

Matt Pryor, Phil Kershaw, Alan Iwi (CEDA)

Sebastien Gardoll (IPSL)

Carsten Ebrecht (DKRZ)

Luca Cinquini (NASA JPL/UCAR)

ESGF Container Working Group

ESGF F2F, Washington D.C.

December 2018

Slide2

Contents

Context

What is the Copernicus Climate Data Store?

Requirements for data discovery and download services

Load-balanced Architecture

Overview

Challenges and Compromises

Containerised ESGF services

Motivation

Current state

Challenges and Solutions

Future work

Slide3

Context

Slide4

Context

What is the Copernicus Climate Data Store?

The Climate Data Store (CDS) is part of the Copernicus Climate Change Service (C3S)

C3S is operated by ECMWF on behalf of the European Union

It aims to provide key indicators of climate change drivers, supporting all sectors

The CDS provides a single, freely available interface to a range of climate-related observations and simulations

Wide range of data sources from many participating organisations

In-situ observations, models, reanalyses, satellite products

Slide5

Context

Requirements for data discovery and download services

CEDA, IPSL and DKRZ to provide quality-controlled subset of CMIP5 for use with the CDS using ESGF services

User-facing services (e.g. search and download) must be highly available at a single set of URLs

>= 98% uptime, ~7 days downtime per year

Publishing not subject to this restriction

No single site can meet this requirement

Geographically distributed, load-balanced service is required

Not the same as the traditional federated approach

Some inconsistency is accepted as a trade-off for high availability
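
The uptime figure above, and the case for spreading the service across several sites, can be checked with a little arithmetic. The sketch below is only an illustration: it assumes site failures are independent, and the 90% per-site availability is a made-up number, not a measurement from CEDA, IPSL or DKRZ.

```python
def downtime_days_per_year(availability: float) -> float:
    """Days of downtime per year permitted by an availability target."""
    return (1.0 - availability) * 365


def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of a service that is up whenever at least one site is up.

    Assumes site failures are independent, which is an idealisation.
    """
    return 1.0 - (1.0 - site_availability) ** sites


if __name__ == "__main__":
    budget = downtime_days_per_year(0.98)
    print(f"98% uptime allows about {budget:.1f} days of downtime per year")
    # Hypothetical per-site availability of 90%, for illustration only.
    for n in (1, 2, 3):
        print(f"{n} site(s): {combined_availability(0.90, n):.3f}")
```

Under these assumptions a single 90% site falls short of the 98% target, while two or three such sites behind one set of URLs exceed it.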

Slide6

Current Architecture

Slide7

[Figure legend: normal operation; publication; data replication]

Load-balanced Architecture

Overview

Separate master index node for publishing

Publishing does not have to be highly available

Replication to slaves is turned off during publishing

Data node and slave index node at each site

Data replication using Synda

DNS load-balancing across sites

Each DNS query returns an A record for an available site at random

Available sites determined by health check

Short time-to-live (TTL) means clients perform lookups regularly

No need for proxy server (which is a single point of failure)

Cloud-based DNS service (Amazon Route 53)
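
The DNS behaviour described above can be modelled in a few lines. This is a toy sketch, not the Route 53 API: the class `HealthCheckedDns`, the host name `esgf.example.org`, the 60-second TTL and the addresses (taken from the reserved documentation ranges) are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical site addresses from the reserved documentation ranges.
SITES: Dict[str, str] = {
    "ceda": "192.0.2.10",
    "dkrz": "198.51.100.10",
    "ipsl": "203.0.113.10",
}


@dataclass
class ARecord:
    name: str
    address: str
    ttl: int  # seconds for which a client may cache the answer


class HealthCheckedDns:
    """Toy model of DNS load-balancing driven by health checks."""

    def __init__(self, name: str, sites: Dict[str, str],
                 is_healthy: Callable[[str], bool], ttl: int = 60) -> None:
        self.name = name
        self.sites = sites
        self.is_healthy = is_healthy
        self.ttl = ttl

    def resolve(self) -> Optional[ARecord]:
        """Return an A record for a randomly chosen healthy site, if any."""
        healthy = [addr for site, addr in self.sites.items()
                   if self.is_healthy(site)]
        if not healthy:
            return None  # no site passed its health check
        return ARecord(self.name, random.choice(healthy), self.ttl)


if __name__ == "__main__":
    down = {"ceda"}  # pretend one site is failing its health check
    dns = HealthCheckedDns("esgf.example.org", SITES, lambda s: s not in down)
    for _ in range(3):
        print(dns.resolve())
```

Because the TTL is short, a client that respects it asks again soon and is steered away from a site that has started failing its health check.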

[Figure: architecture diagram. Labels: End User; DNS Service; Publisher; Master Index Node; CEDA (Slave Index Node, Data Node); DKRZ (Slave Index Node, Data Node, Synda); IPSL (Slave Index Node, Data Node, Synda)]

Slide8

Load-balanced Architecture

Challenges and Compromises

To maintain high availability when publishing, some consistency must be sacrificed

Data may be available for download via THREDDS at one site but not at others

Slave indexes may be inconsistent after publication to the master index

Data replication via Synda needs to target a specific data node

Requires modifying Solr records after initial publication

Non-deterministic catalog paths generated during publication

Patch from Alan Iwi (CEDA) uses DRS in path instead of an integer

DNS load-balancing is not perfect

Reliant on clients to respect TTL for correct behaviour

Reliant on third-party service (running a DNS server is difficult)

Sophisticated algorithms are a lot more expensive on cloud-based providers

Sophisticated health checks are also more expensive
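
The idea behind using the DRS in the catalog path instead of an integer can be sketched as follows. This is not the patch itself: the function `catalog_path`, the one-directory-per-facet layout and the example dataset ID are assumptions chosen to show why a path derived from the DRS is deterministic.

```python
from pathlib import PurePosixPath


def catalog_path(dataset_id: str, version: str) -> PurePosixPath:
    """Derive a catalog path from the DRS facets of a dataset ID.

    The layout (one directory per facet, version as the file name) is an
    illustrative assumption, not the layout used by the actual patch.
    """
    facets = dataset_id.split(".")
    if not all(facets):
        raise ValueError(f"empty facet in dataset ID: {dataset_id!r}")
    return PurePosixPath(*facets) / f"v{version}.xml"


if __name__ == "__main__":
    # Example CMIP5-style dataset ID, used here only for illustration.
    dataset = "cmip5.output1.MOHC.HadGEM2-ES.rcp85.mon.atmos.Amon.r1i1p1"
    print(catalog_path(dataset, "20120101"))
```

A path computed purely from the dataset ID and version comes out the same wherever and whenever the dataset is published, which a counter-based path cannot guarantee.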

Slide9

Containerised ESGF Services

Slide10

[Figure labels: Traditional installation; Shared Libraries; Hypervisor; Guest OS; Process Space]

Containerised ESGF Services

Motivation for Containers

Containers simplify installation

A container encapsulates an application and its dependencies as a single unit

No more dependency hell

Containers increase confidence

A container is packaged once and used multiple times

Same code in test and production

Containers increase portability

A container can run anywhere there is a Linux kernel

Containers encourage modularity

Each container runs a single application

Containers work together to provide an integrated system

Containers allow better usage of resources

Higher density than a VM per application

More isolation than processes on a shared host

[Figure labels: Server; Host Operating System; Virtualised installation; Containerised installation; Libraries; Application]

Slide11

Containerised ESGF Services

Motivation for Kubernetes

Containers excel when used with an orchestrator

Automated management of containerised applications across a cluster

Kubernetes is now the de facto standard

Resilience and scaling are core features of the platform

Zero downtime rolling upgrade

In-cluster service discovery and load-balancing

Storage abstraction
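
To make the rolling-upgrade point concrete, the sketch below builds a Kubernetes Deployment manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. It is not taken from the esgf-docker Helm charts: the application name, image reference, port and probe path are placeholders.

```python
import json


def rolling_deployment(name: str, image: str,
                       replicas: int = 3, port: int = 8080) -> dict:
    """Build a Deployment manifest that never drops below full capacity."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Start one new pod at a time and only remove an old pod
                # once its replacement reports ready.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": name,
                        "image": image,
                        "ports": [{"containerPort": port}],
                        # Readiness gate used by the rolling update.
                        "readinessProbe": {
                            "httpGet": {"path": "/", "port": port},
                        },
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder name and image, for illustration only.
    manifest = rolling_deployment("index-search",
                                  "registry.example.org/esgf/search:latest")
    print(json.dumps(manifest, indent=2))
```

With `maxUnavailable` set to 0, the number of ready replicas should not fall below the requested count during an upgrade, provided the readiness probe reflects whether the application can actually serve requests.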

Slide12

Containerised ESGF Services

Current State

https://github.com/ESGF/esgf-docker

All core ESGF services have been containerised

Currently no support for GridFTP/Globus, node manager or dashboard

MyProxy deprecated in favour of SLCS

Single-node deployment using Docker Compose working

Kubernetes deployment using Helm charts working

Each Tomcat and Django application is fully self-contained

SSL termination and client authentication using Nginx proxy

Container images built, tested and pushed by Jenkins for every commit to master and devel

Thanks to Sebastien Gardoll (IPSL)

Slide13

Containerised ESGF Services

Challenges and Solutions

Very different paradigm to traditional monolithic installer

Shared configuration files in traditional installer are difficult to untangle for each application

Initial implementation by Luca made large steps towards addressing this problem

Initial implementation closely followed traditional installer

Refactored to be more “cloud-native”

No need for process managers like supervisord

Use official base containers where possible

Reduce container bloat

Slide14

Containerised ESGF Services

Challenges and Solutions

ESGF applications with multiple responsibilities

ESGF applications could be refactored to better suit a micro-services architecture

Would allow better use of scaling features in Kubernetes

SSL client authentication

Kubernetes has no native support for SSL client authentication

Current solution requires proxy container for SSL handshake

Ideally, we would allow Kubernetes to handle ingress

Could replace SSL certificates for authentication with OAuth tokens
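
For readers unfamiliar with SSL client authentication, the requirement the proxy container enforces during the handshake can be expressed with Python's standard `ssl` module. This is a sketch of the concept only, not the Nginx configuration used in esgf-docker; the function name `client_auth_context` and its `ca_file` parameter are hypothetical.

```python
import ssl
from typing import Optional


def client_auth_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that insists on a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # Without a certificate signed by a trusted CA the handshake fails.
    context.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        # CAs trusted to sign client certificates.
        context.load_verify_locations(cafile=ca_file)
    return context


if __name__ == "__main__":
    ctx = client_auth_context()
    print(ctx.verify_mode)
```

A real server would also load its own certificate and key with `load_cert_chain` before wrapping a socket. The point of the sketch is that the check happens inside the TLS handshake, which is why a component that terminates TLS, here the proxy container, has to perform it.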

Slide15

Containerised ESGF Services

Future work

More flexible deployment

Work is currently underway to support partial deployments

Build Tomcat applications from source

Pre-built wars are included from ESGF distribution site at build time

Should build Tomcat applications from source at a particular version

Also useful for testing (e.g. build an image from a dev branch)

Implement more of the ESGF test suite for Docker build

Feature parity with traditional installer

Subject to specific deprecations

Automated publication using Kubernetes jobs
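
As a rough idea of what publication as a Kubernetes job could look like, the sketch below builds a Job manifest as a Python dictionary. Nothing here comes from the ESGF code base: the job name, image reference and command are placeholders, and the real publication step would replace the `echo` command.

```python
import json
from typing import List


def publication_job(name: str, image: str, command: List[str],
                    retries: int = 2) -> dict:
    """Build a run-to-completion Job manifest for one publication task."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Number of retries before the Job is marked as failed.
            "backoffLimit": retries,
            "template": {
                "spec": {
                    # Jobs run to completion, so pods are not restarted in place.
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": name,
                        "image": image,
                        "command": command,
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image and command, for illustration only.
    job = publication_job("publish-example",
                          "registry.example.org/esgf/publisher:latest",
                          ["echo", "publication step goes here"])
    print(json.dumps(job, indent=2))
```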

Slide16

Questions