
Meta-Task and Workflow Management in ProdSys II (DEfT)

Maxim Potekhin, BNL
ADC Development Meeting, February 4, 2013


ProdSys II (DEfT): Status and Summary of Progress

Progress since the last S&C week:

- Analyzed project requirements
- Developed an object model for ProdSys II (DEfT)
- Created database schemas for object persistence in RDBMS
- Investigated multiple existing solutions for workflow management
- Identified mature software components to be used in the project
- Identified a proper, standardized format for describing the workflow
- Developed a prototype application: DEFT v1.0

- Maintained documentation. The project is extensively documented in both the TWiki and a blog:

https://twiki.cern.ch/twiki/bin/viewauth/Atlas/ProdSys

http://prodsys.blogspot.com/


Overview of this presentation

- Why? Motivations for doing this work
- What? Requirements, scope and deliverables based on the motivations
- How? The choice of technologies and platforms to meet the requirements
- Where? What has been done so far, and the current prototype of the system
- Whereto? Directions for future development and integration, and the project timeline


Motivations (1)

We need to address the following:

- The concept of the Meta-Task: a group of logically related tasks needed to complete a specific physics project or sub-project. Absent from the original product (ProdSys I), it emerged from operational experience with PanDA and its workflow, and has effectively become the central object in high-level workflow management.
- Meta-Tasks are currently modeled in spreadsheets, which act as a surrogate database and GUI, require active maintenance, and limit the scalability of the overall Production System.
- Meta-Tasks must be properly modeled, introduced and integrated into the system to guarantee that it delivers adequate performance going forward.


Motivations (2)

…and also the following:

- Automation: we need the capability to process Meta-Tasks with minimal human intervention beyond task definition. Right now this is a labor-intensive, semi-manual process.
- But we also need the capability for operator intervention and Meta-Task recovery: there must be adequate tools for the operators and managers to direct Meta-Task processing, for example:
  - To be able to start certain steps before others are defined
  - To augment a Meta-Task at any stage of processing
  - To put a task on hold
  - To recover from failures in an optimal way
- Flexibility of job definition (e.g. making it dynamic, as opposed to static once the task is created): a number of advantages can be realized once jobs can be defined dynamically, based on the resources and other conditions present when the task moves into the execution stage. Currently, jobs are fairly static objects in the system.
- Maintainability: the code of the existing Production System was written "organically", to actively support emerging user requests, and it is starting to show its age.
- Scalability: there are certain issues with the interaction between ProdSys I and the database back-end. In general, given the dramatic rise in the number of tasks defined and executed, we must ensure ample headroom going forward.
- Ease of use: the end user (Physics Coordination) currently must specify a great amount of detail to produce a valid task request. We must automate and streamline task creation, so that cumbersome logic is handled within the application and the user interface becomes more transparent and friendly.


Requirements

The principal requirements have already been touched upon in the "Motivations" section, where we outlined the general new functionality that needs to be implemented. Let's put the requirements in a slightly different perspective:

- Front End: the capability to define and persist Meta-Task objects and submit them for processing in PanDA. This includes the following functionality:
  - Quick creation of Meta-Tasks based on pre-defined templates
  - Similar to template use: "cloning" of existing Meta-Tasks
  - As an option, the ability to handle more complex Meta-Task topologies than currently in use
- Automation: the capability to process Meta-Tasks without human intervention (with the exception of complex failure recovery)
- Control: operators must have the option to intervene in the execution of a Meta-Task, for example:
  - To be able to start certain individual tasks at will, before the whole Meta-Task is completely defined
  - To augment a live Meta-Task (for example by growing the size of the dataset)
  - To put a task on hold (for example to investigate a job failure)
  - To recover from failures in an optimal way (without losing "good" data and having to restart from scratch)
- Workload optimization: flexibility of job definition (such as size), using the highly dynamic PanDA brokerage to pick the resources best suited for the workload at hand, and vice versa
- Documentation and maintainability
- Scalability: demonstration, via stress testing, of headroom in system throughput
- Ease of use: implementation, to the maximum extent possible, of user-friendly interfaces, along with robust authorization and access control to enforce effective management of the workflow
- Monitoring


Overall Design

One principal design decision, made early on, was to create the new Production System, a.k.a. ProdSys II, as a tandem of two subsystems which play complementary roles and represent two different levels of managing the overall workflow:

- DEfT: Database Engine for Tasks. DEfT is responsible for formulating the Meta-Tasks and their underlying tasks. Meta-Task topologies can include chains of tasks, bags of tasks and other types of task groups, complete with all necessary parameters. DEfT keeps track of, and persists, the state of the Meta-Tasks under its management and of their constituent tasks. It provides the interface for Meta-Task definition, management and monitoring throughout the Meta-Task lifecycle.
- JEDI: Job Execution and Definition Interface. JEDI uses the task definitions formulated in DEfT to define and submit individual jobs to PanDA, keeps track of their progress, and handles re-tries of failed jobs, job redirection, etc. In addition, JEDI interfaces with data management services in order to properly aggregate and account for the data generated by individual jobs (i.e. general dataset management).

Note: we won't always capitalize DEfT and JEDI in this way, and shall use DEFT and Jedi wherever it's easier to type.


DEFT Equation

Each task in a Meta-Task can be represented as a set of parameters describing its state, i.e. as a vector. The role of DEFT is to apply rules to transform this vector:

$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_N \end{pmatrix} = D \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_N \end{pmatrix}$$
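As a toy illustration of this equation (all names here are hypothetical, not the actual DEFT rule set), the state vector x can be represented as a dict of task parameters, with DEFT's rules as functions that produce the transformed vector y:

# minimal sketch, with hypothetical names: the state vector x is a dict of
# task parameters, and DEFT applies rules to produce the transformed vector y

def apply_rules(state, rules):
    """Apply each transformation rule in turn to the task state vector."""
    for rule in rules:
        state = rule(state)
    return state

def submit_when_input_ready(state):
    """Example rule: promote a defined task once its input dataset is ready."""
    if state['status'] == 'defined' and state['input_dataset'] == 'ready':
        state = dict(state, status='submitted')
    return state

x = {'status': 'defined', 'input_dataset': 'ready', 'priority': 100}
y = apply_rules(x, [submit_when_input_ready])
print(y['status'])   # prints: submitted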

DEfT and JEDI working in tandem

[Diagram: a Front End / User Interface (the Meta-Task Editor) feeds DEfT. DEfT manages Meta-Task 1 through Meta-Task N, each composed of tasks (Task 1-1 … Task 1-4, Task 2-1 … Task 2-N, etc.). JEDI expands each task into jobs (Job 1-1-1 … Job 1-1-4, Job 2-1-1 … Job 2-1-N) and submits them to the PanDA sites.]

DEfT and JEDI as an assembly line

[Diagram: Meta-Task 1 moves along a time axis T. DEfT defines Task 1 at time t0; JEDI turns it into Jobs 1-1-1 … 1-1-4; DEfT advances the task state to t1; JEDI generates the next set of jobs; DEfT advances the state again at t2, and so on.]

Major Components and their Communication

Design Parameters

JEDI has been designated as the component most closely coupled with the PanDA service and brokerage itself, while DEfT is conceived as a more platform-neutral database engine for the bookkeeping of tasks (hence its name). It makes sense to leverage this factorization and take advantage of a component-based architecture, in order to reduce dependencies in the system and keep avenues for its evolution open. Based on that, DEfT will have a minimal level of dependency on PanDA specifics, while JEDI takes advantage of full integration with PanDA. Wherever possible, JEDI will have custody of the content stored in the database tables, and DEfT will largely use references to these data.

Communication

A few features of the DEfT/JEDI tandem design: in managing the workflow in DEfT, it is unnecessary (and in fact undesirable) to implement subprocesses and/or multi-threaded execution such as found in products like PaPy. That is because DEfT is NOT a processing engine, or even a resource provisioning engine, as is the case in some Python-based workflow engines.

It is a state machine that does not necessarily need to work in real time (or even near time); near-time state transitions and resource provisioning are done in PanDA.

There are a few ways to establish communication between DEfT and JEDI, and they are not mutually exclusive: for example, callbacks and database token updates may co-exist. If a message is missed for some reason, the information can be picked up during a periodic sweep of the database. In summary, DEfT and JEDI will work asynchronously. A minimal sketch of this idea is given below.

An interesting question is whether the database should be exposed through a typical access library API, or as a Web Service. This needs to be evaluated.
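The sketch below uses SQLite and invented table/function names purely for illustration (the real system would use Oracle and an agreed schema): DEfT records a state-change token, and a periodic JEDI sweep picks up anything a missed callback did not deliver.

# sketch only: hypothetical table and function names
import sqlite3

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE task_tokens (task_id INTEGER PRIMARY KEY, token TEXT)")

def deft_publish(task_id, token):
    """DEfT side: persist a state-change token (a callback may also be sent)."""
    db.execute("INSERT OR REPLACE INTO task_tokens VALUES (?, ?)", (task_id, token))
    db.commit()

def jedi_sweep():
    """JEDI side: a periodic sweep catches tokens even if a callback was missed."""
    return db.execute("SELECT task_id FROM task_tokens WHERE token = 'ready'").fetchall()

deft_publish(42, 'ready')
print(jedi_sweep())   # prints: [(42,)]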


Notes on Choice of Technologies

Reuse. Reuse. Reuse.

Workflow management is by no means a new subject or R&D topic. We must make an effort to reuse either a complete existing solution or, failing that, its major components, and do our best to avoid in-house development, in order to save on development and maintenance costs. Progress will be measured not by the number of lines of code we manage to write, but by the number of lines of code we managed to not write and still get the desired result.

We have completed an extensive analysis of many existing workflow management systems and packages; links and other information can be found in the ProdSys II TWiki. These include various BPEL engines (like Apache ODE), Pegasus, VisTrails, Soma, Weaver, PaPy, Pyphant, etc.

We must recognize that existing turnkey solutions for complete workflow management have any or all of the following problems:

- They are heavy-weight, with a steep learning curve
- They are strongly tied to a particular platform such as Condor
- They implement their own resource provisioning schemes, which is inefficient since the same is done in PanDA
- They are hard to integrate with the handling and transfer of data used in ATLAS
- With packages, there are also issues of performance, ongoing support and maintainability

Based on the above, an optimal solution would be to find a package or library that can be adopted in an application integrated with PanDA.

Note on the language platform

Without having a fundamental preference for any particular programming language, it makes sense to stick with a Python solution due to the readily available expertise at all levels and locations of the ATLAS community.


The Graph Model (1)

Why use the graph model for the Workflow?

- Looking at the available literature in both industry and academia, a graph is the most efficient and natural way to model workflows.
- While not explicit about it, the existing system of chained tasks does assume dependencies that are best described by a graph, as in a simplified example illustrating a chain of tasks.

The Graph Model (2)

- Handle more complex cases of workflow, if needed. We can start by implementing the "chain", "bag" and "bag of chains" topologies.
- The graph model liberates the designers of the workflow from the severe limitations of the current "spreadsheet" model: we can do not just what is immediately possible, but what is best for the delivery of physics results (we don't anticipate drastic changes soon, but it is something to prepare for).
- In the following, we will generally use the concept of a task as a group of jobs using one or more datasets as input and producing one or more datasets as output.


The Graph Model (3)

Crucial features of the DEFT Meta-Task Model:

- In accordance with operational practice in ATLAS, the dependencies between two adjacent tasks in a chain, which we model as nodes of a graph, are best conceptualized as datasets, which then become the edges of the graph.
- In order to handle the "bag" and other complex topologies, we introduce "pseudo-tasks" representing the entry and exit points. This is in fact an established technique in workflow handling. A minimal sketch of this model follows.
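This sketch assumes the NetworkX package (introduced later in this presentation); the task and dataset names are invented for illustration:

import networkx as nx

g = nx.DiGraph()
g.add_node('ENTRY', pseudo=True)    # pseudo-task: entry point of the Meta-Task
g.add_node('EXIT', pseudo=True)     # pseudo-task: exit point
for t in ('evgen', 'simul', 'recon'):
    g.add_node(t, pseudo=False)     # real tasks are the nodes of the graph

# datasets connecting adjacent tasks become the edges of the graph
g.add_edge('ENTRY', 'evgen')
g.add_edge('evgen', 'simul', dataset='mc.evgen.EVNT')
g.add_edge('simul', 'recon', dataset='mc.simul.HITS')
g.add_edge('recon', 'EXIT')

print(nx.is_directed_acyclic_graph(g))   # prints: True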

The Petri Net Model and the DEFT State Machine

Why do we find the Petri Net model useful?

- The Petri Net model adequately represents the transitions between the various states of the workflow graph nodes, based on the states of the incoming edges.
- The Petri graph should not be confused with the workflow graph: it represents "places" and "transitions".
- It allows the developer to conceptualize the DEFT logic as a state machine which traverses the workflow graph and toggles the state of tasks based on "tokens", which are:
  - the states of the datasets serving as inputs to the task
  - control inputs set by the human operator

A minimal sketch of such a state toggle follows.
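In this sketch (function and state names are illustrative, not the DEFT API), a task "fires" only when every incoming dataset edge carries a token, unless an operator control input holds it:

def toggle_task_state(task, input_datasets, hold=False):
    """Toggle a task's state from the tokens on its incoming edges."""
    if hold:                                               # operator control input
        task['state'] = 'on_hold'
    elif all(ds['state'] == 'done' for ds in input_datasets):
        task['state'] = 'runnable'                         # all tokens present: fire
    else:
        task['state'] = 'waiting'
    return task

task = {'name': 'recon', 'state': 'waiting'}
inputs = [{'name': 'mc.simul.HITS', 'state': 'done'}]
print(toggle_task_state(task, inputs)['state'])   # prints: runnable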


PyUtilib (1)

What are we looking for?

We are looking for a package that implements the workflow engine logic along the lines of the models described above, to form the core of DEfT.

What is PyUtilib?

- Having evaluated a few available packages, we consider PyUtilib (and in particular its Workflow component) the prime choice for the implementation of the state machine in DEfT.
- It is developed and actively maintained by personnel at Sandia National Laboratories. Link: https://software.sandia.gov/trac/pyutilib/wiki/Documentation/PyUtilibOverview
- It has been successfully used in a few projects done at Sandia.
- It is based on a component architecture and contains many useful tools beyond just workflow management.

What do we gain? We avoid writing tons of boilerplate code to handle complex dependencies in the workflow, and benefit from the core application code being simple (for specific reasons explained in the slides that follow).

Features of the Workflow package in PyUtilib:

- PyUtilib uses a component architecture to offer the developer a straightforward and intuitive way to express dependencies in a set of tasks by means of "connectors".
- A dependency amounts to a simple assignment operation.
- Traversal of dependencies is fully automatic and is not exposed to the developer, saving effort.
- Individual tasks in a workflow can themselves be workflows, lending almost any level of granularity to the system. If we wanted to implement individual PanDA job handling this way, within a task, we could; it is a matter of design decision (we are not proposing this at this point).


PyUtilib (2)

In the following diagram, each "Task" is a Python object, and its inputs and outputs are attributes of that object.

[Diagram: Task objects with declared input and output attributes, linked by connectors.]


PyUtilib (3)

# code example (for illustration purposes only)
import pyutilib.workflow

# define the Task class as appropriate
class TaskA(pyutilib.workflow.Task):

    def __init__(self, *args, **kwds):
        """Constructor."""
        pyutilib.workflow.Task.__init__(self, *args, **kwds)
        self.inputs.declare('x')
        self.inputs.declare('y')
        self.outputs.declare('z')

    def execute(self):
        """Compute the sum of the inputs."""
        self.z = self.x + self.y   # can be anything, of course

# application code: create a workflow containing a single TaskA instance
A = TaskA()
w = pyutilib.workflow.Workflow()
w.add(A)
print w(x=1, y=3)   # prints 4


PyUtilib (4)

# code example (for illustration purposes only)

# define another Task class
class TaskB(pyutilib.workflow.Task):

    def __init__(self, *args, **kwds):
        """Constructor."""
        pyutilib.workflow.Task.__init__(self, *args, **kwds)
        self.inputs.declare('X')
        self.outputs.declare('Z')

    def execute(self):
        """Compute the value of the input times 2."""
        self.Z = 2*self.X

# application code: establish a dependency between tasks A and B
A = TaskA()
B = TaskB()
A.inputs.x = B.outputs.Z   # connect B's output to A's input
w = pyutilib.workflow.Workflow()
w.add(A)
print w(X=1, y=3)   # prints 5


Meta-Task: the Language

How do we represent and document the objects created according to the Graph Model, in a human-readable format?

- The need to represent Meta-Tasks and their components in a way amenable to being read, edited and archived by humans was recognized early on in the project.
- A "Domain Specific Language" was mentioned as one of the development parameters. Do we need to build one from scratch? Probably not.

It's the model!

Since we consider the Graph Model the optimal way to represent the workflow in its various states, it is a reasonable approach to identify a natural way to represent the graph itself. This leads to the realization that there are already standard languages and schemas that do exactly that.


Meta-Task: GraphML and NetworkX

Choice of Schema

There is an obvious advantage in choosing a schema that is standardized, enjoys support, and already has parsers written to handle its specifics.

- GraphML is very simple and human-readable, and enjoys parser support in many existing visualization and analysis software products.
- From personal experience, JSON is less well suited for this application, as it is too terse and its data representation is unintuitive. It remains an option where needed, for example for AJAX monitoring applications and similar use cases.
- GraphML allows us to standardize the workflow description, visualization, editing, documentation and versioning with essentially zero effort.

NetworkX

"NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks."

- While its functionality is quite rich and allows all sorts of graph analysis and exploration, the minimal subset of methods is quite easy to learn and use immediately.
- It reads GraphML, JSON and other documents, and creates an in-memory model of the graph automatically.
- Likewise, it serializes graphs into a variety of formats, such as GraphML and JSON.
- Visualization can be implemented via a few supported Python packages which need to be installed separately, such as matplotlib. A minimal sketch of a GraphML round trip follows.
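In this sketch the graph content is illustrative; write_graphml and read_graphml are the standard NetworkX serialization calls:

import networkx as nx

g = nx.DiGraph()
g.add_edge('evgen', 'simul', dataset='mc.evgen.EVNT')   # illustrative content

nx.write_graphml(g, 'workflow.graphml')    # serialize the workflow to GraphML
g2 = nx.read_graphml('workflow.graphml')   # parse it back into an in-memory model
print(g2.edges(data=True))

# optional visualization, if matplotlib is installed:
# import matplotlib.pyplot as plt
# nx.draw(g2, with_labels=True)
# plt.show()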


NetworkX: examples of visualization

[Figures: example graph visualizations produced with NetworkX.]

Meta-Task: Persisting Graphs in RDBMS

How to persist graphs?

- With the current choice of format/language, we obviously already have the option to persist workflow objects as XML documents, optionally in XML databases (nice to have), or more optimally in specialized graph databases, with some development effort.
- At the same time, integration with PanDA/JEDI at this juncture presupposes the use of an RDBMS (practically speaking, Oracle).
- Persisting graphs in an RDBMS has been addressed before; we revisited existing approaches and chose the "Adjacency Table" approach as the most scalable and the easiest to implement. See the TWiki page at: https://twiki.cern.ch/twiki/bin/viewauth/Atlas/TaskModel#Graphs_in_RDBMS
- Two separate tables are kept, one for the nodes and one for the edges. The nodes have unique IDs; the edges contain pairs of IDs in separate columns. This allows for a straightforward representation of almost any DAG in an RDBMS, as in the sketch below.
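The sketch uses SQLite for brevity (the production schema lives in Oracle and its details are defined on the TWiki page above), so the table and column names here are illustrative:

import sqlite3

db = sqlite3.connect(':memory:')
db.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, state TEXT)")
db.execute("CREATE TABLE edges (src INTEGER, dst INTEGER, dataset TEXT)")

db.executemany("INSERT INTO nodes VALUES (?, ?, ?)",
               [(1, 'evgen', 'done'), (2, 'simul', 'running')])
db.execute("INSERT INTO edges VALUES (1, 2, 'mc.evgen.EVNT')")

# the DAG is reconstructed by joining the edge table against the node table
rows = db.execute("""SELECT a.name, e.dataset, b.name
                     FROM edges e
                     JOIN nodes a ON e.src = a.id
                     JOIN nodes b ON e.dst = b.id""")
for row in rows:
    print(row)   # prints: ('evgen', 'mc.evgen.EVNT', 'simul')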

Note on Graph Databases

- There are multiple advantages in matching the storage model to the model of the object being stored. In that regard, graph databases could be the ideal solution for future versions of DEFT and other components of the Production System.
- The power of graph databases has recently been harnessed by major organizations to create new, previously unattainable levels of capability.
- As already mentioned, we won't pursue this in the current design and implementation cycle, but it would be wise to maintain ongoing R&D in that direction, along with other noSQL technologies.


Bringing it all together

In DEFT, we combine the following elements:

- The Graph Model to represent the workflow, with tasks being the nodes of the graph and datasets being the edges
- The NetworkX package to manage the in-memory instances of the workflow graphs
- The GraphML language (an XML schema) as the standard way to represent graphs in human-readable form, which can be easily imported into NetworkX
- The Workflow package (a part of PyUtilib), which allows us to implement the rules governing state transitions of the tasks based on various conditions
- An array of possible visualization, graphing and editing solutions that can be used thanks to the standard GraphML format:
  - GePhi
  - NetworkX + matplotlib
  - Straightforward integration (via AJAX/JSON/XML) with advanced Javascript libraries and jQuery add-ons such as jsPlumb, WireIt, Raphael, etc.


DEFT Prototype (1)

Software prototype

- DEFT exists as a functioning, proof-of-integration prototype (a CLI utility)
- It integrates NetworkX, PyUtilib and the Oracle DB schemas
- It has the capability to import and export workflows in GraphML format, as well as to persist data in an RDBMS, and to access and modify data transparently across these containers

[Diagram: a GraphML template document is input to DEFT; DEFT reads and updates the Oracle DB and outputs a GraphML document; each cycle effects a change of the Meta-Task state.]

DEFT Prototype (2)

Software prototype

- Capability to support workflows described by a DAG of any complexity, not just "chains" and "bags"
- Straightforward cloning and copying of tasks
- Possibility to interactively edit workflows in visual editors

DEFT: example of a workflow template in GraphML

[Figure: a GraphML document describing a workflow template.]
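As an illustration of what such a template can look like, the sketch below generates a tiny template graph and emits its GraphML serialization with NetworkX (the node and edge attributes are invented; the actual DEFT template schema may differ):

import networkx as nx

g = nx.DiGraph(name='chain_template')
g.add_node('evgen', step='evgen', state='template')
g.add_node('simul', step='simul', state='template')
g.add_edge('evgen', 'simul', dataset='PLACEHOLDER.EVNT')

# emit the GraphML XML document, line by line
print('\n'.join(nx.generate_graphml(g)))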


Plans

DEFT/JEDI Integration

With the DEFT prototype operational, the project is at a stage where we need to actively work on JEDI integration; this is the next immediate step.

Important peripheral components

- We need to create a module for automated dataset name generation; right now this logic appears to be spread across various components of ProdSys I.

Task visualization, editing and monitoring

- Basic visualization tools are already available, such as GePhi and the matplotlib add-on to NetworkX (though its installation is cumbersome). Editing is available in GePhi, complete with a GUI, and of course GraphML files can also be edited in any text editor.
- For a more polished look and a more dynamic and better user experience, we could develop a browser-based frontend utilizing jsPlumb, WireIt, Raphael, etc., but we need to budget manpower for that, since the considerable power of these graphics systems comes with significant complexity of logic and API.


Backup slides: examples of workflow visualization and editing in GePhi

[Figures: three slides of screenshots showing workflow visualization and editing in GePhi.]

Backup slides: examples of Javascript tools to aid in building a Meta-Task GUI in DEFT

[Figure: examples of Javascript GUI-building tools.]