Some Design Notes Iteration - 2 - PowerPoint Presentation

rivernescafe
Uploaded On 2020-06-24


Presentation Transcript

Slide1

Some Design Notes Iteration - 2

Method - 1

Extractor main program
  Runs from an external VM
  Listens for RabbitMQ messages
  Starts a light database engine (SQLite). Additional overhead?
  Creates an HPC job per file and sends ACK
  Stores the job ID and file ID in a database table
  Asynchronously checks job status and, once a job completes with exit status 0, sends the ACK for the corresponding file ID. Can this be done?
  Another option is to acknowledge immediately and, if the job fails, resend the message. Can this be done?
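On the two "Can this be done?" questions: RabbitMQ does allow a consumer to hold a delivery unacknowledged and acknowledge it later, provided the ack goes out on the same channel that delivered the message; if that channel or connection closes first, the broker requeues the message. A minimal sketch of the bookkeeping follows, using Python's sqlite3 module. The table layout, the function names, and the `get_exit_status` callback are hypothetical; `channel` is assumed to be a pika channel, and only its `basic_ack` and `basic_nack` methods are used.

```python
import sqlite3


def open_job_db(path=":memory:"):
    """Create the (hypothetical) job-tracking table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               job_id       TEXT PRIMARY KEY,
               file_id      TEXT NOT NULL,
               delivery_tag INTEGER NOT NULL,
               status       TEXT NOT NULL DEFAULT 'SUBMITTED'
           )"""
    )
    conn.commit()
    return conn


def record_job(conn, job_id, file_id, delivery_tag):
    """Remember which RabbitMQ delivery belongs to which HPC job."""
    conn.execute(
        "INSERT INTO jobs (job_id, file_id, delivery_tag) VALUES (?, ?, ?)",
        (job_id, file_id, delivery_tag),
    )
    conn.commit()


def settle_finished_jobs(conn, channel, get_exit_status):
    """Ack deliveries whose job exited with 0; requeue the ones that failed.

    get_exit_status(job_id) is a hypothetical callback that returns None
    while the job is still queued or running, else its integer exit status.
    """
    pending = conn.execute(
        "SELECT job_id, delivery_tag FROM jobs WHERE status = 'SUBMITTED'"
    ).fetchall()
    for job_id, delivery_tag in pending:
        exit_status = get_exit_status(job_id)
        if exit_status is None:
            continue  # still queued or running; look again on the next pass
        if exit_status == 0:
            channel.basic_ack(delivery_tag=delivery_tag)
            new_status = "DONE"
        else:
            # Put the message back on the queue so it can be retried.
            channel.basic_nack(delivery_tag=delivery_tag, requeue=True)
            new_status = "FAILED"
        conn.execute(
            "UPDATE jobs SET status = ? WHERE job_id = ?", (new_status, job_id)
        )
    conn.commit()
```

Stored delivery tags are only meaningful while the delivering channel is open, so after a restart of the main extractor the rows are stale and the broker redelivers the messages anyway. The second option (ack immediately, republish on failure) avoids holding deliveries open but can lose a message if the extractor VM crashes between the ack and the republish.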

Extractor processing logic
  Resides in the HPC
  Is called by the extractor main program
  Downloads the file using Clowder APIs
  Processes the file and uploads previews and metadata back using the APIs
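The processing steps above can be sketched as one function that runs inside the HPC job. This is only an outline under assumptions: the four callables are hypothetical stand-ins for whatever wraps the Clowder APIs and the actual extraction code, and no specific Clowder endpoint or pyClowder function is implied.

```python
def run_extraction(file_id, download, extract, upload_metadata, upload_preview):
    """Download one file, process it, and push the results back.

    All four callables are hypothetical:
      download(file_id)                  -> local path of the downloaded file
      extract(local_path)                -> (metadata dict, list of preview paths)
      upload_metadata(file_id, metadata) -> None
      upload_preview(file_id, path)      -> None
    """
    local_path = download(file_id)
    metadata, preview_paths = extract(local_path)
    upload_metadata(file_id, metadata)
    for preview_path in preview_paths:
        upload_preview(file_id, preview_path)
    return metadata
```

Passing the transport functions in keeps the processing logic independent of where it runs, which matters here because the same logic may be called from the extractor VM or from a compute node.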

Method - 1A
  All of the above steps are included
  Staging in and staging out
  SFTP the files from the main extractor to the HPC file system
  Helps in optimizing HPC time usage
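Staging in could be done with the OpenSSH `sftp` client in batch mode (`sftp -b batchfile destination`), which needs non-interactive authentication such as SSH keys. The sketch below builds the batch file and the command line; the function names and the example login string are hypothetical, and `run` is injectable so the transfer can be exercised without a real login node.

```python
import os
import subprocess
import tempfile


def _quoted(path):
    """Wrap a path in double quotes so that spaces survive sftp's parser."""
    if '"' in path or "\n" in path:
        raise ValueError(f"unsupported character in path: {path!r}")
    return f'"{path}"'


def stage_in_batch(transfers):
    """Return sftp batch text with one 'put' line per (local, remote) pair."""
    return "".join(
        f"put {_quoted(local)} {_quoted(remote)}\n" for local, remote in transfers
    )


def sftp_command(batch_file, login):
    """Build the argv for a non-interactive sftp run."""
    return ["sftp", "-b", batch_file, login]


def stage_in(transfers, login, run=subprocess.run):
    """Write the batch file, run sftp against it, then remove the batch file."""
    with tempfile.NamedTemporaryFile("w", suffix=".sftp", delete=False) as fh:
        fh.write(stage_in_batch(transfers))
        batch_file = fh.name
    try:
        return run(sftp_command(batch_file, login), check=True)
    finally:
        os.unlink(batch_file)
```

Staging out would mirror this with `get` lines. Copying files before the job starts means the job's wall-time is spent on processing rather than on downloads, which is the optimization the slide refers to.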

Method - 2
  An elasticity control script listens to RabbitMQ messages
  Once the number of messages increases, it creates multiple instances of a special extractor
  This is more of a manual approach
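The core of such an elasticity control script is a rule that maps queue depth to a number of extractor instances. A minimal sketch follows; the thresholds and function names are hypothetical. The queue depth is read with a passive `queue_declare`, which in pika returns a frame whose `method.message_count` holds the number of ready messages and does not create the queue.

```python
import math


def desired_instances(queue_depth, messages_per_instance=10,
                      min_instances=1, max_instances=8):
    """Scale linearly with queue depth, clamped to [min_instances, max_instances]."""
    if messages_per_instance <= 0:
        raise ValueError("messages_per_instance must be positive")
    wanted = math.ceil(queue_depth / messages_per_instance)
    return max(min_instances, min(max_instances, wanted))


def queue_depth(channel, queue_name):
    """Ask the broker how many messages are ready on an existing queue.

    `channel` is assumed to be a pika channel.
    """
    frame = channel.queue_declare(queue=queue_name, passive=True)
    return frame.method.message_count
```

How the instances are actually started (new VMs, new HPC jobs, or new processes) is left open by the slide and is not shown here.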

Slide2

[Flowchart: pyClowder HPC Flowchart (Iteration 2)]

Legend: Synchronous steps; Asynchronous steps; Inside HPC Environment

Node labels:
  Start
  Read settings from Config file
  Connect to RabbitMQ
  Wait for RabbitMQ message
  Message received? (Yes / No)
  Use HPC? (Yes / No)
  Process File
  Upload Preview / Metadata
  Create PBS Script from HPC XML file and Config file
  SSH into Login node
  Run PBS script to submit Job to HPC queue
  Database: Store HPC job ID, file ID, and status
  Exit from Login node
  Wait in HPC Queue
  Job Picked up by HPC? (Yes / No)
  Process File
  Upload Preview / Metadata
  Get record
  Job Status? (Queued / Running; Completed with Exit Status 0; Failed)
  Update records
  Send ACK to RabbitMQ
  End
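The "Create PBS Script from HPC XML file and Config file" step could look roughly like the sketch below. The XML layout (`<hpc>`, `<job>`, `<command>` and their attributes) is invented for illustration, since the slides do not show the real schema. The `#PBS -N`, `-q`, and `-l` directives and `$PBS_O_WORKDIR` are standard PBS/Torque, but the `nodes=...:ppn=...` resource syntax varies between PBS flavours.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical HPC XML file; the real schema is not given in the slides.
EXAMPLE_XML = """<hpc>
  <job name="extractor" queue="batch" walltime="00:30:00" nodes="1" ppn="1"/>
  <command>python extractor_job.py</command>
</hpc>"""


def render_pbs(xml_text, file_id):
    """Turn the job settings into the text of a PBS submission script."""
    root = ET.fromstring(xml_text)
    job = root.find("job")
    command = root.findtext("command")
    if job is None or not command:
        raise ValueError("XML must contain <job> and <command> elements")
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job.get('name')}-{file_id}",
        f"#PBS -q {job.get('queue')}",
        f"#PBS -l walltime={job.get('walltime')}",
        f"#PBS -l nodes={job.get('nodes')}:ppn={job.get('ppn')}",
        "cd $PBS_O_WORKDIR",
        # The file ID is passed to the job as its only argument.
        f"{command.strip()} {shlex.quote(file_id)}",
    ]
    return "\n".join(lines) + "\n"
```

Generating one script per file matches the "HPC job per file" design, and keeps the wall-time and queue settings in configuration rather than in the extractor code.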

Slide3

[Diagram: pyClowder HPC Architecture Diagram (Iteration 2)]

Labels shown:
  Web Browser Client
  Clowder VM
  Clowder Web Application
  Data/Metadata (MongoDB) (appears twice)
  Extraction Bus (RabbitMQ)
  Extractor VM
  Main Extractor
  pyClowder
  HPC XML File
  Clowder APIs
  HPC Compute Nodes: GCN-51, GCN-65, GCN-40, each with a Job

Each Job box shows the same script excerpt:

  #!/usr/bin/env python
  import pika
  import sys
  import logging
  import json
  import traceback
  . . .
  def main():

Slide4

Some Design / Implementation Questions (from JIRA) Iteration - 1

Do the extractor program files need to be copied to the login node programmatically? If code compilation is needed, this might create additional overhead.
  Assume that the program is present in the HPC environment in compiled form.

Another option is to assume that the extractor will run from within the HPC environment, i.e. the code is already present in the HPC. Is this a safe assumption to make?
  Yes

What are the exceptions that need to be handled?
  Exception in the main extractor
  Exception in the extraction job in the HPC
  Job aborts due to reasons on the HPC side: requested wall-time or memory exceeded
  The VM from which the main extractor is running crashes?

What is expected of the user who sets up the HPC extractor? In other words, what shall be provided by pyClowder and what shall be done by the one who writes the extractor?
  Try to make this as generic as possible.

Can the extractor logic be put in a separate file? Otherwise, how will the HPC machine pick up the job file?
  Need to find a workaround. Need to keep the extractor structure unchanged.
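On the question of putting the extractor logic in a separate file: one possible workaround is for a generic job runner to load the user's extractor module by path at run time, so the extractor file itself stays unchanged. The sketch below uses the standard `importlib.util` functions; the module name `user_extractor` and the default function name `process_file` are assumptions, not names taken from pyClowder.

```python
import importlib.util


def load_extractor(path, function_name="process_file"):
    """Load a Python file by path and return one callable from it."""
    spec = importlib.util.spec_from_file_location("user_extractor", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load a Python module from {path!r}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code once
    func = getattr(module, function_name, None)
    if not callable(func):
        raise AttributeError(f"{path!r} does not define {function_name}()")
    return func
```

With this arrangement the HPC job only needs the path to the extractor file and the file ID; everything generic (messaging, job submission, status tracking) can stay in the shared runner.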

Slide5

[Flowchart: Flowchart (Iteration 1)]

Node labels:
  Start
  Get RabbitMQ message
  Use HPC? (Yes / No)
  Process File
  Extraction Successful? (Yes / No)
  SSH into Login node
  Transfer Extractor Program to Login Node via SFTP
  Submit Job to HPC queue
  Job Status? (Queued / Running; Completed with Exit Status 0; Failed)
  Send ACK to RabbitMQ
  End
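The "Submit Job to HPC queue" and "Job Status?" steps can be sketched with `ssh`, `qsub`, and `qstat -f`. This assumes key-based SSH access to the login node and a PBS/Torque-style scheduler whose `qstat -f` output contains `job_state = ...` and, once the job has finished, an `exit_status = ...` line (spelled `Exit_status` on some systems, so the match is case-insensitive). The function names and state groupings are illustrative.

```python
import re
import subprocess


def _ssh(login, remote_args, run=subprocess.run):
    """Run a command on the login node and return its standard output.

    ssh joins the arguments with spaces for the remote shell, so paths
    containing spaces are not handled here.
    """
    result = run(["ssh", login, *remote_args],
                 capture_output=True, text=True, check=True)
    return result.stdout


def submit_job(login, script_path, run=subprocess.run):
    """Submit a PBS script; qsub prints the new job identifier on stdout."""
    return _ssh(login, ["qsub", script_path], run).strip()


def classify(qstat_full_output):
    """Map `qstat -f` text onto the flowchart's three outcomes."""
    exit_match = re.search(r"^\s*exit_status\s*=\s*(-?\d+)",
                           qstat_full_output, re.MULTILINE | re.IGNORECASE)
    if exit_match:
        return "COMPLETED" if int(exit_match.group(1)) == 0 else "FAILED"
    state_match = re.search(r"^\s*job_state\s*=\s*(\S+)",
                            qstat_full_output, re.MULTILINE)
    if state_match and state_match.group(1) in ("Q", "H", "W", "R", "E"):
        return "QUEUED_OR_RUNNING"
    return "UNKNOWN"


def job_status(login, job_id, run=subprocess.run):
    """Fetch and classify the status of one job."""
    return classify(_ssh(login, ["qstat", "-f", job_id], run))
```

Whether a finished job is still visible to `qstat` depends on scheduler settings, so an "UNKNOWN" result needs its own handling rather than being treated as failure.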

Slide6

[Diagram: Architecture Diagram (Iteration 1)]

Labels shown:
  Web Browser Client
  Clowder VM
  Clowder Web Application
  Data/Metadata (MongoDB) (appears twice)
  Extraction Bus (RabbitMQ)
  Extractor VM
  Main Extractor
  pyClowder
  HPC XML File
  Clowder APIs
  HPC Compute Nodes: GCN-51, GCN-65, GCN-40, each with a Job

Each Job box shows the same script excerpt:

  #!/usr/bin/env python
  import pika
  import sys
  import logging
  import json
  import traceback
  . . .
  def main():