Presentation Transcript

Slide1

New tape server software: status and plans
CASTOR face-to-face workshop
22-23 September 2014
Eric Cano, on behalf of the CERN IT-DSS group

Slide2

Overview

Features for first release
New tape server architecture
Control and reporting flows
Memory management and data flow
Error handling
Main process and sessions
Stuck sessions and recovery
Development methodologies and QA
What changes in practice?
What is still missing?
Logical Block Protection investigation
Release plans and potential new features

Slide3

Features for first release

Continuation of the push to replace legacy tape software
Started with the creation of the tape gateway and bridge
VMGR + VDQM will be next
Drop-in replacement
Tapeserverd consolidated in a single daemon
Replaces the previous stack: taped & satellites + rtcpd + tapebridged
Identical outside protocols (almost):
Stager / CLI client (readtp is unchanged)
VMGR/VDQM
tpstat / tpconfig
New labelling command (castor-tape-label)
Keep what works: one process per session (pid listed in tpstat, as before)
Better logs
Latency shadowing (no impact from a slow DB)
Empty mount protection
Result of a big team effort since the last meeting: E. Cano, S. Murray, V. Kotlyar, D. Kruse, D. Come

Slide4

New tape server architecture

Pipelined: based on FIFOs and threads/thread pools

Always fast to post to FIFO

Push data blocks, reports, requests for more work

Each FIFO output is served by one thread(pool)

Simple loop: pop, use/serve the data/request, repeat

All latencies are shadowed in the various threads

Keep the instruction pipeline non-empty with task prefetch

N-way parallel disk access (as before)

All reporting is asynchronous

Tape thread is the central element that we want to keep busy at full speed
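The FIFO-plus-consumer pattern above is the basic building block of the pipeline. As a rough illustration (not the actual tapeserverd classes; BlockingFifo and Item are made-up names), a thread-safe blocking FIFO in C++11 could look like this, with posting always cheap and each FIFO output served by a thread running the pop/serve loop:

```cpp
// Minimal sketch of the FIFO + consumer-thread building block (illustrative
// names, not the actual tapeserverd classes).
#include <condition_variable>
#include <mutex>
#include <queue>
#include <utility>

template <class Item>
class BlockingFifo {
public:
  // Posting is always fast: lock, enqueue, notify.
  void push(Item i) {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_queue.push(std::move(i));
    }
    m_cond.notify_one();
  }
  // The consumer thread (or pool worker) blocks here until something arrives,
  // then runs the simple loop: pop, use/serve the data/request, repeat.
  Item pop() {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cond.wait(lock, [this] { return !m_queue.empty(); });
    Item i = std::move(m_queue.front());
    m_queue.pop();
    return i;
  }
private:
  std::mutex m_mutex;
  std::condition_variable m_cond;
  std::queue<Item> m_queue;
};
```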

Slide5

Migration session overview (pipeline diagram)
Migration Mount Manager (main thread): instantiates the memory manager, task injector, report packer, and the disk and tape threads; gives the initial kick to the task injector; waits for completion
Memory manager (1 thread): provides free blocks and takes returned ones
Disk Read Thread Pool (n threads): task queue; pop, execute, delete
Disk Read Task: gets free blocks, reads data from disk, pushes the full data block to the data FIFO
Tape Write Single Thread (1 thread): task queue; pop, execute, delete; requests more work on threshold
Tape Write Task: pops a block from the data FIFO, writes it to tape, (flushes,) reports the result, returns the free block
Task Injector (1 thread): receives the "request for more"; gets more work from the tape gateway, creates and pushes tasks
Report Packer (1 thread): packs information and sends bulk reports on flush/end of session
Global Status Reporter (1 thread): packs information for tapeserverd
Client queue (1 thread)
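To make the diagram concrete, here is a hedged sketch of the "pop, execute, delete" loop run by the tape-write single thread, reusing the BlockingFifo sketch from the architecture section; TapeWriteTask and its execute() method are illustrative names, not the real interface:

```cpp
// Illustrative "pop, execute, delete" loop of the tape-write single thread.
// TapeWriteTask and execute() are assumed names; BlockingFifo is the sketch
// shown earlier.
#include <memory>

class TapeWriteTask {
public:
  virtual ~TapeWriteTask() {}
  // Pops a data block from the data FIFO, writes it to tape, optionally
  // flushes, reports the result and returns the free block to the manager.
  virtual bool execute() = 0;  // false signals the end-of-session marker
};

void tapeWriteThread(BlockingFifo<std::unique_ptr<TapeWriteTask>>& taskQueue) {
  for (;;) {
    std::unique_ptr<TapeWriteTask> task = taskQueue.pop();
    if (!task || !task->execute())
      break;                     // end of session
    // the task is deleted when the unique_ptr goes out of scope
  }
}
```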

Slide6

Recall session overview (pipeline diagram)
Recall Mount Manager (main thread): instantiates the memory manager, task injector, report packer, and the disk and tape threads; gives the initial kick to the task injector; waits for completion
Memory manager (no thread): passive pool of free blocks
Tape Read Single Thread (1 thread): task queue; pop, execute, delete
Tape Read Task: pulls free blocks, reads data from tape, pushes the full data block to the data FIFO
Disk Write Thread Pool (n threads): task queue; pop, execute, delete; requests more work on threshold
Disk Write Task: pops a block from the data FIFO, writes it to disk, reports the result, returns the free block
Task Injector (1 thread): receives the "request for more"; gets more work from the tape gateway, creates and pushes tasks
Report Packer (1 thread): packs individual file reports, flush reports and the end-of-session report; sends bulk reports on threshold/end of session
Global Status Reporter (1 thread): packs information for tapeserverd

Slide7

Control flow

Task injector

Initially called synchronously (empty mount detection)

Triggered by requests for more work (stored in a FIFO)

Gets more work from client

Creates and injects tasks

Tasks created, linked to each other (reader/writer couple) and injected to the tape and disk thread FIFOs

Disk thread pool

Pops disk tasks, executes them, deletes them and moves to the next

Tape thread

Same as disk, after initializing the session: mounting, tape identification, positioning for writing… and unmounting in the end
The reader thread(pool) requests more work
Based on task FIFO content thresholds
Always asks for n files or m bytes (whichever comes first, configurable)
Asks again when half of that is still available in the task FIFO
Asks again one last time when the task FIFO becomes empty (last call)
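The threshold policy described above could be expressed roughly as follows; this is a sketch with assumed names (InjectionPolicy, shouldRequestMore) and a simplified reading of the "half of that" rule, not the actual task injector logic:

```cpp
// Sketch of the "request more work" policy (assumed names; the real task
// injector logic differs in detail).
#include <cstdint>

struct InjectionPolicy {
  uint64_t maxFiles;   // ask for at most n files per request (configurable)
  uint64_t maxBytes;   // ...or m bytes, whichever comes first (configurable)

  // Called when the task FIFO level changes; returns true if the injector
  // should ask the client (tape gateway) for more work.
  bool shouldRequestMore(uint64_t filesQueued, uint64_t bytesQueued,
                         bool lastCallAlreadySent) const {
    if (filesQueued == 0 && bytesQueued == 0)
      return !lastCallAlreadySent;          // last call when the FIFO is empty
    return filesQueued <= maxFiles / 2 &&   // ask again when only half of the
           bytesQueued <= maxBytes / 2;     // requested batch is still queued
  }
};
```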

Slide8

Reporting flow

Reports to client (file related)

Posted to a FIFO

Packed and transmitted in a separate thread

Sent on flush for migrations
Sent on thresholds for recalls

End of session also follows this path

Reports to parent process (tape/drive related)

Posted to a FIFO

Transmitted asynchronously by a separate thread

Parent process keeps track of the session’s status and informs the VDQM and VMGR
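As an illustration of the file-report path (post to a FIFO, pack and send in bulk from a dedicated thread), here is a minimal sketch reusing the BlockingFifo above; ReportPacker and FileReport are assumed names, and only the recall threshold trigger is shown:

```cpp
// Sketch of the asynchronous file-report path (assumed names, recall
// threshold variant only; migration flush and end-of-session handling are
// omitted). BlockingFifo is the sketch shown earlier.
#include <cstddef>
#include <string>
#include <vector>

struct FileReport {
  std::string path;
  bool success;
  std::string errorMessage;  // empty on success
};

class ReportPacker {
public:
  explicit ReportPacker(size_t threshold) : m_threshold(threshold) {}

  // Called from tape/disk tasks: cheap, just posts to the FIFO.
  void post(const FileReport& r) { m_fifo.push(r); }

  // Run by the dedicated reporting thread.
  void run() {
    std::vector<FileReport> batch;
    for (;;) {                       // end-of-session handling omitted
      batch.push_back(m_fifo.pop());
      if (batch.size() >= m_threshold) {
        sendBulkReport(batch);       // one bulk message to the client
        batch.clear();
      }
    }
  }
private:
  void sendBulkReport(const std::vector<FileReport>&) { /* client call */ }
  BlockingFifo<FileReport> m_fifo;
  const size_t m_threshold;
};
```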

Slide9

Memory management and data flow

Same as before: circulate a fixed number of memory blocks (size and count configurable)
Errors can be piggy-backed on data blocks
The writer side always does the reporting, even for read errors
Central memory manager
Migration: actively pushes blocks for each tape write task
Disk read tasks pull blocks from there
Return the blocks with data in a second FIFO
Data gets written to tape by the tape write task
Recalls: passive container
Tape read task pulls memory blocks as needed
Pushes them to the disk write tasks (in FIFOs)
Disk write tasks push the data to the disk server
Memory blocks get recycled to the memory manager after writing to disk or tape
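A minimal sketch of the circulating block pool, under the assumption of the MemBlock/MemoryManager names below (not the real classes) and reusing the BlockingFifo sketch; readers pull free blocks, writers return them, and nothing is allocated after start-up:

```cpp
// Sketch of the fixed, circulating pool of memory blocks (assumed names;
// pool teardown omitted). Block size and count are configurable.
#include <cstddef>
#include <string>
#include <vector>

struct MemBlock {
  std::vector<char> data;   // fixed-size payload buffer
  size_t usedBytes = 0;
  bool failed = false;      // errors can be piggy-backed on the block
  std::string errorMessage;
  explicit MemBlock(size_t blockSize) : data(blockSize) {}
};

class MemoryManager {
public:
  MemoryManager(size_t blockCount, size_t blockSize) {
    for (size_t i = 0; i < blockCount; i++)
      m_freeBlocks.push(new MemBlock(blockSize));  // allocate once, recycle forever
  }
  MemBlock* getFreeBlock() { return m_freeBlocks.pop(); }  // readers pull from here
  void releaseBlock(MemBlock* b) {                         // writers return blocks
    b->usedBytes = 0;
    b->failed = false;
    b->errorMessage.clear();
    m_freeBlocks.push(b);
  }
private:
  BlockingFifo<MemBlock*> m_freeBlocks;
};
```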

Slide10

Error handling
Reporting

Errors get logged when they happen

If error happens in the reader, it gets propagated to the writer through the data path

The writer propagates the error to the client

Session behaviour on error

Recalls: carry on for the stager, halt on error for readtp
Absolute positioning by blockId (stager)
Relative positioning by fSeq (readtp)
Migrations: any error ends the session
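The "writer reports, even for read errors" rule can be illustrated with the MemBlock and ReportPacker sketches above; the function below is a hypothetical disk write task body for a recall, not the real code:

```cpp
// Hypothetical body of a disk write task in a recall, reusing the MemBlock
// and ReportPacker sketches above: the writer side reports, even when the
// error actually happened on the tape-read side.
#include <string>

void diskWriteTaskExecute(MemBlock* block, const std::string& dstPath,
                          ReportPacker& reportPacker) {
  FileReport report;
  report.path = dstPath;
  if (block->failed) {
    // The tape read task hit an error and piggy-backed it on the block:
    // the writer propagates it to the client.
    report.success = false;
    report.errorMessage = block->errorMessage;
  } else {
    // ... write block->data to the disk server here ...
    report.success = true;
  }
  reportPacker.post(report);
}
```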

Slide11

Main process and sessions

The session is forked by the parent process

Parent process keeps track of sessions and drive statuses in a drive catalogue

Answers VDQM requests

Filters input requests based on drive state

Manages the configuration files

The child session reports tape related status to the parent process

mounts, unmounts

amount of data transferred for the watchdog

The parent process informs the VMGR and VDQM on behalf of the child session

Client library completely rewritten

Forking is actually done by a utility sub-process (the forker)
No actual forking from the multithreaded parent process
Process inventory: 1 parent process + 1 fork helper process + N session processes (at most 1 per drive)
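A rough sketch of the fork-helper idea, under the assumption of a one-byte request protocol over a socketpair (the real forker protocol is richer): the helper is created while the parent is still single-threaded, so fork(2) is never called from the multithreaded parent.

```cpp
// Rough sketch of the fork-helper pattern: the helper is forked before any
// threads exist, and the multithreaded parent only sends it requests.
// The one-byte protocol and the function names are illustrative.
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int g_forkerSocket = -1;        // parent's end of the socketpair to the helper

void startForker() {            // called while the parent is still single-threaded
  int fds[2];
  socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
  if (fork() == 0) {            // child: the forker helper process
    close(fds[0]);
    char driveIndex;
    while (read(fds[1], &driveIndex, 1) == 1) {
      pid_t session = fork();   // safe: the helper has a single thread
      if (session == 0) {
        // run (or exec) the data-transfer session for that drive here
        _exit(0);
      }
      write(fds[1], &session, sizeof(session));  // report the session pid
    }
    _exit(0);
  }
  close(fds[1]);
  g_forkerSocket = fds[0];      // the multithreaded parent only talks over this
}

pid_t requestSession(char driveIndex) {   // called from the multithreaded parent
  pid_t sessionPid = -1;
  write(g_forkerSocket, &driveIndex, 1);
  read(g_forkerSocket, &sessionPid, sizeof(sessionPid));
  return sessionPid;
}
```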

Slide12

ZeroMQ+Protocol buffers

The parent/session process communication is a no-risk protocol
Both ends get released/deployed together
Can be changed at any time
Opportunity to experiment with new serialization methodologies
Need to replace umbrello
This gave good results
Protocol buffers provide robust serialization with little development effort
ZMQ handles many communication scenarios
Still in finalization (issues in the watchdog communication)
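A hedged sketch of how the two pieces fit together; the Heartbeat message class stands in for a protobuf-generated type and the endpoint is illustrative (the real tapeserverd messages differ), while the ZeroMQ C calls and protobuf's SerializeToString are standard APIs:

```cpp
// Hedged sketch: Heartbeat stands in for a protobuf-generated message; the
// IPC endpoint is illustrative, not the real tapeserverd one.
#include <cstdint>
#include <string>
#include <zmq.h>
// #include "messages.pb.h"   // would provide the generated Heartbeat class

void sendHeartbeat(void* zmqContext, uint64_t bytesMoved) {
  tapeserver::messages::Heartbeat msg;   // hypothetical generated message
  msg.set_bytesmoved(bytesMoved);

  std::string payload;
  msg.SerializeToString(&payload);       // protobuf handles the marshalling

  void* socket = zmq_socket(zmqContext, ZMQ_PUSH);
  zmq_connect(socket, "ipc:///var/run/tapeserverd.sock");  // illustrative endpoint
  zmq_send(socket, payload.data(), payload.size(), 0);     // ZMQ handles framing
  zmq_close(socket);
}
```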

Slide13

Stuck sessions and recovery

Stuck sessions do happen

RFIO problems suspected

Currently handled by a script

Log-file based: no data movement for a set time => kill

Problematic with unusually big files

Watchdog will get more internal data

Too much to be logged

If data stops flowing for a given time => kill

Clean-up process launched automatically when session killed

No clean-up after session failure

A non-stuck session failed to do its own clean-up => drive down
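The internal watchdog rule ("if data stops flowing for a given time => kill") could look roughly like this; SessionWatch and the function name are assumptions, not the actual watchdog:

```cpp
// Sketch of the internal watchdog rule (assumed names and check logic):
// the parent compares the byte counter reported by the session with the last
// value seen and kills the session if nothing moved for too long.
#include <cstdint>
#include <ctime>
#include <signal.h>
#include <sys/types.h>

struct SessionWatch {
  pid_t sessionPid;
  uint64_t lastBytesMoved = 0;
  std::time_t lastProgress = std::time(nullptr);
};

// Called periodically with the latest counter received from the child session.
void checkProgress(SessionWatch& w, uint64_t bytesMovedNow, std::time_t stuckTimeout) {
  const std::time_t now = std::time(nullptr);
  if (bytesMovedNow != w.lastBytesMoved) {
    w.lastBytesMoved = bytesMovedNow;
    w.lastProgress = now;               // data is flowing: reset the timer
  } else if (now - w.lastProgress > stuckTimeout) {
    kill(w.sessionPid, SIGKILL);        // stuck session => kill; clean-up follows
  }
}
```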

Slide14

Development methodologies and QA

Full C++, maintainable software

Object encapsulation for separately manageable units

Easy unit testing

Exception handling simplifies error reporting a lot

RAII (destructors) simplifies resource management

Cleaner drive specifics implementation through inheritance

Easy to add new models

Hardcoding-free SCSI and tape format layers

Naming conventions matching the SCSI documentation
String error reporting for all SCSI errors
Very similar approach for the AUL tape format
Unit testing
Allows running various scenarios systematically on RPM build
Migrations, recalls, good day, bad day, full tape
Using fake objects for the drive and client interface
Easier debugging when problems can be reproduced in a unit test context
Run tests standalone + through valgrind and helgrind
Automatic detection of memory leaks and race conditions
Completely brought into the CASTOR tree
Automated system testing would be a nice addition to this setup
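As a small example of the RAII style mentioned above (illustrative classes, not the real drive interface), a scoped mount guard guarantees the unload happens even when an exception interrupts the session:

```cpp
// Illustrative RAII guard (not the real drive interface): the destructor
// guarantees the tape is unloaded even if an exception ends the session early.
#include <string>

class DriveInterface {             // stand-in for the real drive abstraction
public:
  virtual void loadTape(const std::string& vid) = 0;
  virtual void unloadTape() = 0;
  virtual ~DriveInterface() {}
};

class ScopedTapeMount {
public:
  ScopedTapeMount(DriveInterface& drive, const std::string& vid)
      : m_drive(drive) {
    m_drive.loadTape(vid);         // acquire in the constructor...
  }
  ~ScopedTapeMount() {
    try { m_drive.unloadTape(); }  // ...release in the destructor, even on errors
    catch (...) {}                 // never let a destructor throw
  }
private:
  DriveInterface& m_drive;
};

// Usage: if positioning or writing throws, the tape still gets unloaded.
// void runSession(DriveInterface& drive) {
//   ScopedTapeMount mount(drive, "T12345");
//   /* position, transfer data, ... may throw */
// }
```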

Slide15

What changes in practice?

The new logs

Convergence with the rest of CASTOR logs

Single line at completion of the tape thread

Summarises the session for tape log

More detailed timings

Will make it easier to pinpoint performance bottlenecks

New log parsing required

Should be greatly simplified as all relevant information is on a single line

A single daemon
Configuration not radically changed

Slide16

What is still missing?

Support for Oracle libraries

The parent process’s watchdog for transfer sessions

Will move stuck-transfer detection from operators' scripts to internal (with better precision)

File transfer protocol switching

Add local file support (reliance on rfio removed)
Add Xroot support
Switched on by configuration instead of RFIO
Diskserver >= 2.1.14-15 required (for the stat call)
Add Ceph support
Disk-path-based switch, automatic
Fine tuning of logs for operations
Document the latest developments
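A sketch of the automatic, disk-path-based protocol switch; the enum and the URL prefixes are assumptions for illustration, not the actual configuration keys or schemes:

```cpp
// Sketch of an automatic, path-based protocol switch (enum and URL prefixes
// are illustrative assumptions).
#include <string>

enum class TransferProtocol { Local, Rfio, Xroot, Ceph };

TransferProtocol protocolForPath(const std::string& path) {
  // Decide from the path/URL prefix which data-access backend to use.
  if (path.compare(0, 7, "root://") == 0) return TransferProtocol::Xroot;
  if (path.compare(0, 7, "rfio://") == 0) return TransferProtocol::Rfio;
  if (path.compare(0, 7, "ceph://") == 0) return TransferProtocol::Ceph;
  return TransferProtocol::Local;          // plain local file path
}
```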

Slide17

Release and deployment

Data transfers are being validated now on IBM drives

Oracle drives will follow, with mount support

Some previously mentioned features missing

Target date for a tapeserverd-only 2.1.15 CASTOR release: end of November

Production deployment ~January

Compatible with current 2.1.14 stagers

2.1.14-15 on the disk server will be needed for using Xroot
2.1.14 is the end of the road for rtcpd/taped

Slide18

Logical block protection

Tests of the tape drive feature have been done by F. Nikolaidis, J. Leduc and K. Ha

Adds a 4 byte checksum to tape blocks

Protects the data block during the transfer from computer memory to tape drive

2 checksum algorithms in use today:

Reed-Solomon

CRC32-C

Reed-Solomon requires 2 threads to match drive throughput

CRC32-C can fit in a single thread

CRC32-C is available on most recent drives
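For reference, a plain software implementation of CRC32-C (Castagnoli polynomial) is short; this is a generic sketch of the checksum itself, not of how tapeserverd would integrate logical block protection, and production code would typically use the SSE4.2 CRC32 instruction instead:

```cpp
// Generic software CRC32-C (Castagnoli polynomial 0x1EDC6F41, reflected form
// 0x82F63B78). Call initCrc32cTable() once before crc32c(). The check value
// for the ASCII string "123456789" is 0xE3069283.
#include <cstddef>
#include <cstdint>

static uint32_t crc32cTable[256];

void initCrc32cTable() {
  for (uint32_t i = 0; i < 256; i++) {
    uint32_t crc = i;
    for (int j = 0; j < 8; j++)
      crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    crc32cTable[i] = crc;
  }
}

// 4-byte checksum over one tape block, byte at a time.
uint32_t crc32c(const void* data, size_t len) {
  const uint8_t* p = static_cast<const uint8_t*>(data);
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
    crc = crc32cTable[(crc ^ p[i]) & 0xFFu] ^ (crc >> 8);
  return crc ^ 0xFFFFFFFFu;
}
```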

Slide19

Next tape developments

Tapeserverd

Logical block protection integration

Support for pre-emption of session

VDQM/VMGR

Merge of the two in a single tape resource manager

Simplify interface

Asymmetric drive support

Improve scheduling (atomic tape-in-drive semantics for migrations)

Today, the chosen tape might not have compatible drives available, leading to migration delays

Remove need for manual synchronization

Consider pre-emptive scheduling
Max out the system with background tasks (repack, verify)
Interrupt and make space for user sessions when they come
Allow over-quota for users when free drives exist
Leading to 100% utilisation of the drives
Facilitates tape server upgrades
Integrate the authentication part for tape (from Cupv)

Slide20

Conclusion

Tape server stack has been re-written and consolidated

New features already provide improvements

Empty mount protection for both read and write

Full request and report latency shadowing

Better timing monitoring is already in place

Major clean-up will allow easier development and maintenance

More new features coming

Xroot/Ceph support
Logical block protection
Session pre-emption
End of the road for rtcpd/taped
Will be dropped from 2.1.15 as soon as we are happy with tapeserverd in production
More tape software consolidation around the corner: VDQM/VMGR