Slide 1
Some Design Notes Iteration - 2
Method - 1

Extractor main program
- Runs from an external VM
- Listens for RabbitMQ messages
- Starts a light database engine (SQLite). Additional overhead?
- Creates an HPC job per file and sends an ACK
- Stores the job ID and file ID in a database table
- Asynchronously checks for job status and, once a job completes with exit 0, sends an ACK for the corresponding file ID. Can this be done?
- Another option is to acknowledge immediately and, if the job fails, resend the message. Can this be done?
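The job-tracking table described above can be sketched with Python's built-in sqlite3 module. The table and column names here (hpc_jobs, job_id, file_id, status) are illustrative assumptions, not the actual pyClowder schema.

```python
import sqlite3

def open_tracking_db(path=":memory:"):
    """Create the job-tracking table if it does not exist (schema is illustrative)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hpc_jobs ("
        "  job_id  TEXT PRIMARY KEY,"
        "  file_id TEXT NOT NULL,"
        "  status  TEXT NOT NULL DEFAULT 'QUEUED')"
    )
    return conn

def record_job(conn, job_id, file_id):
    """Remember which HPC job is processing which Clowder file."""
    conn.execute("INSERT INTO hpc_jobs (job_id, file_id) VALUES (?, ?)",
                 (job_id, file_id))
    conn.commit()

def complete_job(conn, job_id, exit_status):
    """On exit status 0, mark the job done and return the file ID to ACK;
    otherwise mark it failed and return None."""
    status = "DONE" if exit_status == 0 else "FAILED"
    conn.execute("UPDATE hpc_jobs SET status = ? WHERE job_id = ?",
                 (status, job_id))
    conn.commit()
    row = conn.execute("SELECT file_id FROM hpc_jobs WHERE job_id = ?",
                       (job_id,)).fetchone()
    return row[0] if status == "DONE" else None
```

The asynchronous checker would poll the scheduler for each stored job ID, call complete_job, and send the RabbitMQ ACK for the returned file ID.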
Extractor processing logic
- Resides in the HPC
- Is called by the extractor main program
- Downloads the file using Clowder APIs
- Processes the file and uploads previews and metadata back using the APIs
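The HPC-side download/upload step boils down to two endpoints. A minimal sketch of the URL construction, assuming endpoint paths of the general shape used by the Clowder REST API; the exact paths should be checked against the deployed Clowder version:

```python
def file_download_url(base, file_id):
    """URL for fetching a file's bytes (endpoint path is an assumption)."""
    return "{}/api/files/{}/blob".format(base.rstrip("/"), file_id)

def metadata_upload_url(base, file_id):
    """URL for posting extracted metadata back (endpoint path is an assumption)."""
    return "{}/api/files/{}/metadata".format(base.rstrip("/"), file_id)
```

The extraction job would issue a GET against the first URL, run its processing, and POST JSON metadata (with the API key as a query parameter or header) to the second.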
Method - 1A
- All of the above steps are included
- Staging in and staging out: SFTP the files from the main extractor to the HPC file system
- Helps in optimizing HPC time usage
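The staging step can be sketched as a pure path-mapping helper that produces sftp batch commands; the scratch layout and the `.out` suffix for results are illustrative assumptions, not part of the design notes.

```python
import posixpath

def staging_plan(file_id, local_path, remote_scratch):
    """Map a local file to its HPC scratch location and back.

    Returns the remote path plus the sftp batch commands for
    stage-in (before the job) and stage-out (after the job)."""
    remote_path = posixpath.join(remote_scratch, file_id,
                                 posixpath.basename(local_path))
    stage_in = "put {} {}".format(local_path, remote_path)
    stage_out = "get {} {}".format(remote_path + ".out", local_path + ".out")
    return remote_path, [stage_in, stage_out]
```

Staging the input before submission means the HPC job spends its allocated wall-time on processing rather than on downloads, which is the optimization the slide refers to.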
Method - 2
- An elasticity control script listens to RabbitMQ messages
- Once the number of messages increases, it creates multiple instances of a special extractor
- This is more of a manual approach
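The scaling decision in the elasticity script can be sketched as a small policy function; the per-instance capacity and the instance cap are illustrative tuning knobs, not pyClowder settings.

```python
import math

def desired_instances(queue_depth, per_instance_capacity=10, max_instances=8):
    """How many extractor instances the elasticity script should keep running,
    given the current RabbitMQ queue depth."""
    if queue_depth <= 0:
        return 1  # always keep one instance listening
    return min(max_instances, math.ceil(queue_depth / per_instance_capacity))
```

The script would poll the queue depth (e.g. via the RabbitMQ management interface), compare the result of this function with the number of running instances, and start or stop extractors accordingly.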
Slide 2: pyClowder HPC Flowchart (Iteration 2)

[Flowchart; recoverable steps follow. Start; read settings from Config file; connect to RabbitMQ; wait for a RabbitMQ message (message received?). Use HPC? If no: process the file, upload preview/metadata, end. If yes: SSH into the login node; create a PBS script from the HPC XML file and Config file; run the PBS script to submit the job to the HPC queue; store the HPC job ID, file ID, and status in the database; once the job status is queued/running, send an ACK to RabbitMQ and exit from the login node. Asynchronous steps, inside the HPC environment: wait in the HPC queue until the job is picked up by HPC; process the file; upload preview/metadata; on completion with exit status 0, get the record and update records; otherwise the job is marked failed; end. A legend distinguishes synchronous steps, asynchronous steps, and steps inside the HPC environment.]
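The "Create PBS Script from HPC XML file and Config file" step in the flowchart can be sketched as a template fill. The directive values and the final submission via `qsub` are assumptions about a typical PBS/Torque setup, not the actual pyClowder implementation.

```python
def make_pbs_script(job_name, walltime, nodes, ppn, command):
    """Render a minimal PBS job script (directives follow common PBS/Torque usage)."""
    return "\n".join([
        "#!/bin/bash",
        "#PBS -N {}".format(job_name),
        "#PBS -l walltime={}".format(walltime),
        "#PBS -l nodes={}:ppn={}".format(nodes, ppn),
        "cd $PBS_O_WORKDIR",   # run from the submission directory
        command,
        "",
    ])
```

The main extractor would write this script to the login node, run `qsub` on it, and capture the job ID that qsub prints for the database record.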
Slide 3: pyClowder HPC Architecture Diagram (Iteration 2)

[Architecture diagram. Components: a Web Browser client; the Clowder VM running the Clowder Web Application with Data/Metadata in MongoDB; the Extraction Bus (RabbitMQ); the Extractor VM running the Main Extractor (pyClowder, driven by an HPC XML file); and HPC Compute Nodes (GCN-51, GCN-65, GCN-40), each running a Python extractor job (pika-based script) that communicates with Clowder through the Clowder APIs.]
Slide 4: Some Design / Implementation Questions (from JIRA), Iteration - 1

- Do the extractor program files need to be copied to the login node programmatically? If code compilation is needed, this might create additional overhead.
  - Assume that the program is present in the HPC environment in compiled form.
  - Another option is to assume that the extractor will run from within the HPC environment, i.e. the code is already present in the HPC. Is this a safe assumption to make? Yes.
- What are the exceptions that need to be handled?
  - Exception in the main extractor
  - Exception in the extraction job in the HPC
  - Job aborts due to reasons on the HPC side (requested wall-time or memory exceeded)
  - The VM from where the main extractor is running crashes?
- What is expected of the user who sets up the HPC extractor? Or: what shall be provided by pyClowder, and what shall be done by the one who writes the extractor?
  - Try to make this as generic as possible. Can the extractor logic be put in a separate file? Otherwise, how will the HPC machine pick up the job file? Need to find a workaround while keeping the extractor structure unchanged.
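The exception cases listed above suggest a small dispatch on the job outcome. The outcome strings and chosen actions here are illustrative, not the pyClowder implementation; in particular, the abort-reason labels would have to come from parsing the scheduler's status output.

```python
def handle_job_outcome(exit_status, aborted_reason=None):
    """Decide what the main extractor should do for each failure mode
    from the list above (action names are illustrative)."""
    if aborted_reason in ("walltime_exceeded", "memory_exceeded"):
        return "requeue"   # HPC-side abort: resend the RabbitMQ message
    if exit_status == 0:
        return "ack"       # success: ACK the corresponding file ID
    return "nack"          # extraction job raised: reject / dead-letter
```

An exception in the main extractor itself, or a crash of its VM, is not covered by this dispatch; those cases rely on the RabbitMQ message remaining unacknowledged so that the broker redelivers it.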
Slide 5: Flowchart (Iteration 1)

[Flowchart; recoverable steps follow. Start; get RabbitMQ message. Use HPC? If no: process the file (extraction successful? yes/no). If yes: SSH into the login node; transfer the extractor program to the login node via SFTP; submit the job to the HPC queue; job status? While queued/running, send an ACK to RabbitMQ; the job either completes with exit status 0 or fails; end.]
Slide 6: Architecture Diagram (Iteration 1)

[Architecture diagram, structurally the same as the Iteration 2 diagram on Slide 3: Web Browser client; Clowder VM (Clowder Web Application, Data/Metadata in MongoDB); Extraction Bus (RabbitMQ); Extractor VM running the Main Extractor (pyClowder, HPC XML file); HPC Compute Nodes (GCN-51, GCN-65, GCN-40) running Python extractor jobs against the Clowder APIs.]