Workflow Management in Condor

Peter Couvares, Tevfik Kosar, Alain Roy, Jeff Weber, and Kent Wenger

University of Wisconsin-Madison, Computer Sciences Department
{pfc, roy, weber, wenger}
Louisiana State University, Department of Computer Science and Center for Computation & Technology

1.1 Introduction

The Condor Project began in 1988 and has evolved into a feature-rich batch system that targets high-throughput computing; that is, Condor focuses on providing reliable access to computing over long periods of time, instead of highly-tuned, high-performance computing for short periods of time or small numbers of applications.

Many Condor users have not only long-running jobs, but also complex sequences of jobs, or workflows, that they wish to run. In the late 1990s, we began development of DAGMan (or Directed Acyclic Graph Manager), which allows users to submit large workflows to Condor. As with Condor, the focus has been on reliability. DAGMan has a simple interface that allows many, but certainly not all, types of workflows to be expressed. We have found, through years of experience running production workflows with our users, that solving the so-called "simple" problems can be surprisingly complex. The first half of this paper provides a conceptual (and almost chronological) development of DAGMan to illustrate the complexities that Condor deals with.

In the past several years, Condor has expanded its focus from running jobs on local clusters of computers (or pools, in Condor terminology) to running jobs in distributed grid environments. Along with the additional complexities in running jobs came greater challenges in transferring data to and from the job execution sites. We have developed Stork, which, along with being a bird name, treats data placement with the same concern that Condor treats job execution.

With a combination of DAGMan, Condor, and Stork, users can create large, complex workflows that reliably "get the job done" in a grid environment. In the rest of this paper, we explore DAGMan and Stork (Condor has been covered in detail elsewhere).
1.2 DAGMan Design Principles

The goal of DAGMan is to automate the submission and management of complex workflows involving many jobs, with a focus on reliability and fault-tolerance in the face of a variety of errors. Workflow management includes not only job submission and monitoring, but also job preparation, cleanup, throttling, retry, and other actions necessary to ensure the good health of important workflows.

DAGMan attempts to overcome or work around as many errors as possible, and in the face of errors it cannot overcome, it endeavors to allow the user to resolve the problem manually and then resume the workflow from the point where it last left off. This can be thought of as a "checkpointing" of the workflow, just as some batch systems provide checkpointing of jobs.

Notably, the majority of DAGMan's features (and even some of its specific semantics) were not originally envisioned, but rather are the product of years of collaboration with active users. The experience gained from the needs and problems of "production" science applications has driven most DAGMan development over the past six years.

The fundamental design principles of DAGMan are as follows:

• DAGMan sits as a layer "above" the batch system in the software stack. DAGMan uses the batch system's standard API and logs to submit, query, and manipulate jobs, and does not directly interact with jobs independently.¹
• DAGMan reads the logs of the underlying batch system to follow the status of submitted jobs, rather than invoking interactive tools or service APIs. Reliance on simpler, file-based I/O allows DAGMan's own implementation to be simpler, more scalable and reliable across many platforms, and therefore more robust.
• DAGMan has no persistent state of its own: its runtime state is built entirely from its input files and from the information gleaned by reading logs provided by the batch system about the history of the jobs it has submitted.
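Because DAGMan keeps no persistent state of its own, restarting it amounts to replaying what the batch system has already logged. The Python sketch below illustrates only that idea. It assumes a deliberately simplified, hypothetical log with one `SUBMIT <node>` or `TERMINATE <node> <exit code>` record per line, keyed by node name; the real Condor job log has a different, richer format and identifies batch jobs rather than DAG nodes.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class NodeStatus:
    state: str = "waiting"            # waiting -> submitted -> done | failed
    exit_code: Optional[int] = None


def replay_log(lines: Iterable[str],
               node_names: Iterable[str]) -> Dict[str, NodeStatus]:
    """Rebuild every node's status from the event history alone.

    Hypothetical record format (NOT Condor's real log format):
        SUBMIT <node>
        TERMINATE <node> <exit code>
    """
    status = {name: NodeStatus() for name in node_names}
    for line in lines:
        fields = line.split()
        if not fields:
            continue                  # ignore blank lines
        event, node = fields[0], fields[1]
        if event == "SUBMIT":
            status[node].state = "submitted"
        elif event == "TERMINATE":
            code = int(fields[2])
            status[node].exit_code = code
            status[node].state = "done" if code == 0 else "failed"
    return status


log = [
    "SUBMIT N1",
    "SUBMIT N5",
    "TERMINATE N1 0",
    "SUBMIT N2",
    "TERMINATE N5 1",
]
status = replay_log(log, ["N1", "N2", "N3", "N4", "N5"])
for name, s in status.items():
    print(name, s.state, s.exit_code)
```

In this model, replaying the same log twice yields identical statuses, which is what makes a restart after a crash safe.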
¹ Note that DAGMan assumes the batch system guarantees that it will not "lose" jobs after they have been successfully submitted. Currently, if the job is lost by the batch system after being successfully submitted by DAGMan, DAGMan will wait indefinitely for the status of the job in the queue to change. An explicit query for the status of submitted jobs (as opposed to waiting for the batch system to record job status changes) may be necessary to address this. Also, if a job languishes in the queue forever, DAGMan is currently not able to "time out", remove the job, and mark it as failed. When removing jobs, detecting and responding to the failure of a remove operation (leaving a job "stuck" in the queue) is an interesting question.

1.3 DAGMan Details

1.3.1 DAGMan Basics

DAGMan allows users to express job dependencies as arbitrary directed acyclic graphs, or DAGs. In the simplest case, DAGMan can be used to ensure that two jobs execute sequentially: for example, that job B is not submitted until job A has completed successfully.

Like all graphs, a DAGMan DAG consists of nodes and arcs. Each node represents a single instance of a batch job to be executed, and each arc represents the execution order to be enforced between two nodes. Unlike more complex systems such as [21], arcs merely indicate the order in which the jobs must run.

If an arc points from node N_P to node N_C, we say that N_P is the parent of N_C and N_C is the child of N_P (see Figure 1.1). A parent node must complete successfully before any of its child nodes can be started. Note that each node can have any whole number of parents or children (including zero). DAGMan does not require DAGs to be fully connected.

Why does DAGMan require a directed acyclic graph instead of an arbitrary graph? The graph is directed in order to express the order in which jobs must run. Likewise, the graph is acyclic to ensure that DAGMan will not run indefinitely. In practice, we find that most workflows we encounter either do not require loops, or the loops can be unrolled into an acyclic graph.
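Acyclicity is what guarantees termination. As an illustration only (this is not DAGMan's code), Kahn's topological-sort algorithm both detects cycles and yields one order in which the nodes could run if executed one at a time. The example uses the five nodes and four arcs of the diamond DAG shown in Figure 1.2.

```python
from collections import deque
from typing import Dict, Iterable, List, Tuple


def topological_order(nodes: Iterable[str],
                      arcs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return one valid run order, or raise ValueError on a cycle.

    arcs are (parent, child) pairs. Kahn's algorithm repeatedly removes
    nodes with no remaining parents; any node never removed lies on,
    or downstream of, a cycle.
    """
    children: Dict[str, List[str]] = {n: [] for n in nodes}
    indegree: Dict[str, int] = {n: 0 for n in children}
    for parent, child in arcs:
        children[parent].append(child)
        indegree[child] += 1

    ready = deque(n for n, d in indegree.items() if d == 0)
    order: List[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; it is not a valid DAG")
    return order


diamond_arcs = [("N1", "N2"), ("N1", "N3"), ("N2", "N4"), ("N3", "N4")]
print(topological_order(["N1", "N2", "N3", "N4", "N5"], diamond_arcs))
# prints ['N1', 'N5', 'N2', 'N3', 'N4']
```

A loop such as `A -> B -> A` raises `ValueError`; unrolling it into distinct nodes (for example `A1 -> B1 -> A2`) makes the graph acceptable again.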
Fig. 1.1. The relationship between parents and children. N_P is the parent of N_C. N_L is lonely and has no parents or children.

DAGMan seeks to run as many jobs as possible in parallel, given the constraints of their parent/child relationships. For example, for the DAG in Figure 1.2, DAGMan will initially submit both N1 and N5 to Condor, allowing them to execute in parallel if there are sufficient computers available. After N1 completes successfully, DAGMan will submit both N2 and N3 to the batch system, allowing them to execute in parallel with each other, and with N5 if it has not completed already. When both N2 and N3 have finished successfully, DAGMan will submit N4. If N4 and N5 both complete successfully, the DAG will have completed, and DAGMan will exit successfully.

Earlier we defined a node's parent and child relationships. In describing DAGs, it can also be useful to define a node's sister as any node that shares the same set of parent nodes (including the empty set). Although sister relationships are not represented explicitly inside DAGMan, they are important because sister nodes always become "runnable" simultaneously, when their parents complete successfully. In Figure 1.2, N1 and N5 are sisters with no parents, and N2 and N3 are sisters that share N1 as a parent.

In practice, however, DAGMan submits individual jobs to the batch scheduler one at a time, and makes no guarantees about the precise order in which it will submit the jobs of nodes that are ready to run. In other words, N2 and N3 may be submitted to the batch system in any order. It is also important to remember that, once jobs are submitted, the batch system is free to run the jobs in its queue in any order it chooses: either of N2 and N3 may run after the other despite being submitted to the queue earlier. Additionally, the jobs may not run in parallel if there are insufficient compute resources for all parallel jobs.
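Sister relationships are implicit in the graph, but they are easy to derive from each node's parent set. The small Python sketch below is an illustration, not part of DAGMan. It groups the nodes of the diamond DAG in Figure 1.2 by the exact set of parents they share.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def sister_groups(parents: Dict[str, Set[str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group nodes that share exactly the same set of parents.

    Nodes in one group become runnable together, once every parent in the
    shared set has completed successfully (immediately, for the empty set).
    """
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for node, node_parents in parents.items():
        groups[frozenset(node_parents)].append(node)
    return dict(groups)


diamond = {
    "N1": set(), "N2": {"N1"}, "N3": {"N1"},
    "N4": {"N2", "N3"}, "N5": set(),
}
for shared, sisters in sister_groups(diamond).items():
    print(sorted(shared), sisters)
# [] ['N1', 'N5']
# ['N1'] ['N2', 'N3']
# ['N2', 'N3'] ['N4']
```

N4 forms a group of one: it has no sisters, because no other node has exactly {N2, N3} as its parents.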
Fig. 1.2. This "diamond" DAG illustrates parent and child links. N1 must complete successfully, then both N2 and N3 can execute in parallel. Only when both of them have finished successfully can N4 begin execution. N5 is a disconnected node and can execute in parallel with all of the other nodes.

While running, DAGMan keeps a list in memory of all jobs in the DAG, their parent/child relationships, and their current status. Given this information, DAGMan submits jobs to the batch system when appropriate, and continues until either the DAG is complete, or no more forward progress can be made due to failed jobs. In the latter case, DAGMan creates a list of failed jobs along with the reasons for their failure, and produces a rescue DAG file.

Fig. 1.3. The state transition diagram for each node of a DAG (states: Waiting for N parents, Submitted, Failed, Done (Successful)). See text for details.

A rescue DAG is a special DAG that represents the state of a previously partially-completed DAG such that the original DAG can be restarted where it left off, without repeating any successfully completed work. The rescue DAG is an exact copy of the original input DAG, except that all nodes that have previously completed successfully are marked as done. When DAGMan is restarted with a rescue DAG, it reconstructs the state of the previous DAG.

Internally, DAGMan keeps track of the current status of each node. Figure 1.3
shows the basic state diagram of a DAG node. When DAGMan starts, it marks each node as "waiting", and initializes a waiting count (N) for the node equal to its number of parents. In the case of a rescue DAG, DAGMan sets the waiting count equal to the number of parents that are not already marked as "done". A node's waiting count represents the number of its parents that have yet to complete successfully, and which are therefore preventing it from being submitted. Only when a node's waiting count reaches zero can DAGMan submit the job associated with the node. If the job is submitted successfully, DAGMan marks the node as "submitted". If the job submission fails for any reason, the node is marked as "failed".

When DAGMan detects that a job has left the batch system queue, it marks the node as "done" if the job exited successfully, or otherwise marks it "failed". Success is determined by the exit code of the program: if it is zero, then the job exited successfully; otherwise it failed. (But see the description of post-scripts later in Section 1.3.2 for a modification of this.) When a job is marked "done", the waiting count of each of its children is decremented by one. Any nodes whose waiting count reaches zero are submitted to the batch scheduler as described earlier.
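The waiting-count rules translate almost directly into code. The following Python sketch illustrates them under simplifying assumptions (no scripts, retries, or throttles, and a job "leaves the queue" whenever the caller says so); it is not DAGMan's implementation. The optional `already_done` argument models starting from a rescue DAG.

```python
from typing import Dict, Iterable, List, Set


class DagState:
    """Per-node bookkeeping as described in the text (illustrative sketch).

    Every node starts "waiting" with a waiting count equal to the number of
    its parents not already "done": for a fresh DAG that is simply its
    number of parents; for a rescue DAG, completed parents are skipped.
    """

    def __init__(self, parents: Dict[str, Set[str]],
                 already_done: Iterable[str] = ()) -> None:
        done = set(already_done)
        self.children: Dict[str, List[str]] = {n: [] for n in parents}
        for node, node_parents in parents.items():
            for p in node_parents:
                self.children[p].append(node)
        self.state: Dict[str, str] = {}
        self.waiting: Dict[str, int] = {}
        for node, node_parents in parents.items():
            if node in done:
                self.state[node] = "done"
            else:
                self.state[node] = "waiting"
                self.waiting[node] = len(set(node_parents) - done)

    def ready(self) -> List[str]:
        """Waiting nodes whose waiting count has reached zero."""
        return sorted(n for n, s in self.state.items()
                      if s == "waiting" and self.waiting[n] == 0)

    def submitted(self, node: str, ok: bool = True) -> None:
        """Record whether handing the node's job to the batch system worked."""
        self.state[node] = "submitted" if ok else "failed"

    def job_left_queue(self, node: str, exit_code: int) -> None:
        """Exit code 0 means success; success decrements each child's count."""
        if exit_code != 0:
            self.state[node] = "failed"
            return
        self.state[node] = "done"
        for child in self.children[node]:
            if child in self.waiting:
                self.waiting[child] -= 1


diamond = {"N1": set(), "N2": {"N1"}, "N3": {"N1"},
           "N4": {"N2", "N3"}, "N5": set()}
dag = DagState(diamond)
print(dag.ready())            # ['N1', 'N5']
for node in dag.ready():
    dag.submitted(node)
dag.job_left_queue("N1", 0)
print(dag.ready())            # ['N2', 'N3']

rescue = DagState(diamond, already_done={"N1", "N2"})
print(rescue.ready())         # ['N3', 'N5']
```

In the rescue case, N4 starts with a waiting count of 1 rather than 2, because its parent N2 is already done.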
Fig. 1.4. A state diagram for executing a single DAG node. Unlike Figure 1.3, this diagram adds the ability to run pre-scripts and post-scripts. The differences from Figure 1.3 are noted in bold.

1.3.2 DAGMan Complications

So far, the description of a DAGMan DAG is not very interesting: we execute jobs and maintain the order in which they must execute, while allowing parallelism when it is possible. Unfortunately, this is insufficient in real environments, which have many complications and sources of errors.

Complication: Setup, Cleanup, or Interpretation of a Node

The first complication occurs when using executables that are not easily modified to run in a distributed computing environment and therefore need a setup or cleanup step to occur before or after the job. For example, before a job is run, data may need to be staged from a tape archive or uncompressed. While this step could be placed in a separate DAG node, doing so may cause unnecessary overhead, because the DAG node will be submitted and scheduled as a separate job by the batch system instead of running immediately on the local computer.

DAGMan provides the ability to run a program before (a pre-script) or after (a post-script) a job is run. These programs should be lightweight, because they are executed on the machine from which the DAG was submitted, and a large DAG may execute many of these scripts. (But see the discussion of throttling later in this section for a way that DAGMan can deal with this.)

Running these scripts adds complexity to the state diagram in Figure 1.3. The changes needed to support scripts are shown in Figure 1.4. Once a job is allowed to run, it can optionally run a pre-script. After the job has run, it can optionally run a post-script.
[Figure 1.5: nodes N1, N2, and N3. Pre-script of N2: if result(N1) = success, do nothing; else rewrite N2 to an empty job. Pre-script of N3: if result(N1) = failure, do nothing; else rewrite N3 to an empty job.]

Fig. 1.5. An example conditional DAG.

Note that if the scripts fail, the node is considered to have failed, just as if the job itself had failed. There is one interesting case to note, which is not easily represented in Figure 1.4: if a node has a post-script, it will never directly go into the failed state, but will always run the post-script. In this way, the post-script can decide whether a job has really failed or not. It can do some analysis beyond DAGMan's own ability to decide whether a node should be considered to have succeeded or failed based on whether the exit code is zero. This significantly enhances the ability of DAGMan to work with existing code.

Some users have discovered an interesting way to use post-scripts. They create post-scripts that rewrite their child nodes' job description file to change how the job runs. This can be used for at least two purposes. First, it can create conditional DAGs by allowing a run-time decision that changes the DAG. For example, consider Figure 1.5. If N1 succeeds, then the pre-script for N3 will rewrite N3 to an empty job, perhaps running the /bin/true command.¹ In this way, only N2 will run after N1 succeeds. Similarly, if N1 fails, then only N3 will run. While a more generic facility for conditional DAGs may be desirable, it would add complexity, and simple conditional DAGs can already be created with this technique.

A second use for pre-scripts is to do last-minute planning. For example, when submitting jobs to Condor-G (which allows jobs to be submitted to remote grid sites instead of the local batch system), users can specify exactly which grid site they wish their jobs to run at. A pre-script can decide which grid site should be used and rewrite the job description accordingly, and the job will then run there.
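The success rules for a node with scripts can be condensed into one function. This Python sketch is illustrative only: jobs and scripts are modeled as callables that return exit codes, and handing the job's exit code to the post-script is an assumption made here for the example, not something the text specifies.

```python
from typing import Callable, Optional


def node_outcome(job: Callable[[], int],
                 pre: Optional[Callable[[], int]] = None,
                 post: Optional[Callable[[int], int]] = None) -> str:
    """Return "done" or "failed" for one attempt at running a DAG node.

    Each callable returns an exit code. Passing the job's exit code to the
    post-script is an assumption of this sketch.
    """
    if pre is not None and pre() != 0:
        return "failed"                  # a failed pre-script fails the node
    job_exit = job()
    if post is not None:
        # With a post-script, the node never fails directly: the
        # post-script always runs and its exit code has the final word.
        return "done" if post(job_exit) == 0 else "failed"
    return "done" if job_exit == 0 else "failed"   # plain exit-code rule


def tolerant_post(code: int) -> int:
    """Hypothetical post-script that also treats exit code 3 as success."""
    return 0 if code in (0, 3) else 1


print(node_outcome(lambda: 3))                      # failed
print(node_outcome(lambda: 3, post=tolerant_post))  # done
```

The same job exit code leads to different node outcomes depending on whether a post-script is attached, which is how post-scripts let DAGMan work with programs whose exit codes do not follow the zero-means-success convention.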
¹ In recent versions of Condor, the job can be edited to contain "noop_job = true", which leaves the executable name alone and immediately terminates the job successfully.

Complication: Throttling

All of the mechanisms described so far work very well. Unfortunately, the real world applies additional constraints. Imagine a DAG that can have one thousand jobs running simultaneously, each of which has a pre-script and a post-script. When DAGMan can submit the jobs, it will start up one thousand nearly simultaneous pre-scripts, then submit one thousand jobs nearly simultaneously. Running that many pre-scripts may cause an unacceptable load on the submission machine, and submitting that many jobs to the underlying batch submission system may also strain its capacity. For this reason, DAGMan can throttle the number of pre-scripts, jobs, or post-scripts that may run at any time. This results in another modification to our state diagram for running a single node, as shown in Figure 1.6.

Fig. 1.6. A state diagram for executing a single DAG node. In addition to the states in Figure 1.4, this diagram adds DAGMan's ability to throttle pre-scripts, jobs, and post-scripts. The differences from Figure 1.4 are noted in bold.

DAGMan can also throttle the number of jobs that it submits to the batch system. The number of submitted jobs (idle as well as running) might be significantly greater than the number of running jobs, so this throttle can prevent overloading the batch system. This is a good example of a surprising additional constraint: we did not realize that DAGs might be able to submit so many jobs that the number of idle jobs could overwhelm the batch system.
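Each of these throttles can be pictured as a counter with a waiting line. The Python sketch below is a hypothetical illustration of one such throttle, with one instance per kind of work (pre-scripts, running jobs, post-scripts, or submitted jobs). The first-in, first-out release order is an assumption of the sketch, not a description of DAGMan's policy.

```python
from collections import deque
from typing import Deque, List, Set


class Throttle:
    """Cap how many items of one kind are in flight at once.

    Items beyond the limit wait in FIFO order (an assumption of this
    sketch) and start as earlier items finish.
    """

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.active: Set[str] = set()
        self.pending: Deque[str] = deque()

    def request(self, item: str) -> List[str]:
        """Queue an item; return whatever can start right now."""
        self.pending.append(item)
        return self._drain()

    def finish(self, item: str) -> List[str]:
        """Mark an item complete; return queued items that now fit."""
        self.active.discard(item)
        return self._drain()

    def _drain(self) -> List[str]:
        started = []
        while self.pending and len(self.active) < self.limit:
            item = self.pending.popleft()
            self.active.add(item)
            started.append(item)
        return started


pre_scripts = Throttle(limit=2)
print(pre_scripts.request("pre-1"), pre_scripts.request("pre-2"),
      pre_scripts.request("pre-3"))
# ['pre-1'] ['pre-2'] []
print(pre_scripts.finish("pre-1"))   # ['pre-3']
```

With one such throttle per category, the thousand-job example above would start at most `limit` pre-scripts at once, leaving the rest queued until earlier ones finish.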
Complication: Unreliable applications or subsystems

Some applications are not robust: it is not uncommon to find a program that sometimes fails to run on the first attempt, but completes successfully if given another chance. Sometimes this is due to a program error, sometimes due to interactions with the environment, such as a flaky networked file system. Ideally, problems such as this would always be fixed before trying to run the program. Unfortunately, this is not always possible, perhaps because the program is closed-source or because of time constraints.

To cope with unreliable programs or environments, DAGMan provides the ability to retry a node if it fails. Users specify how many times the node should be retried before deciding that it has actually failed. When a node is retried, the node's pre-script is also run again. In some cases, a user wants to retry multiple times unless some catastrophic error occurs. DAGMan handles this with the "retry unless-exit" feature, which will retry a job unless it exits with a particular value. One place this might be useful is planning: imagine a pre-script that decides where a job should be run. Retry might be set to 10, to allow the job to be run at different sites, but if there is some catastrophic error, then the pre-script can exit with a specific value that indicates "do not retry". Adding the ability to retry the job results in one final change to our state diagram, as shown in Figure 1.7.

Fig. 1.7. The complete state diagram for executing a single DAG node. The single difference from Figure 1.6 is noted in bold.

1.3.3 Additional DAGMan Details

We will briefly mention several other interesting DAGMan features.

Running DAGMan robustly

What happens if the machine on which DAGMan is running crashes? Although DAGMan would no longer continue to submit jobs, existing jobs would continue running, but it would be nice if DAGMan could be restarted so that it could continue making forward progress. Ideally, DAGMan should handle as much as possible for the user, so we run DAGMan as a Condor job. This means that if the machine crashes, when it restarts Condor will restart the DAGMan process, which will recover the state of its execution from persistent log files and will resume operation. This sort of robustness is essential in allowing users to run large sets of jobs in a "hands-off" fashion.

Recursive DAGs

A DAG node can do anything, including submitting another DAG. This allows for the creation of DAGs with conditional branches in them. The DAG node can make a choice, then submit an independent DAG based on the result of that choice. This can allow very complex DAGs to be executed. Unfortunately, it also makes debugging a DAG harder. For an alternative to recursive DAGs, see Section 1.7.

1.3.4 Describing a DAG

It is the user's responsibility to provide DAGMan with a description of each job in the format of the underlying batch scheduler. For Condor, this means associating each node with a "submit file" describing the job to be executed. DAGMan ultimately uses this file to submit the job to the batch scheduler using the standard submission interface. Users describe a DAG by listing each node and the relationships between nodes. A sample DAG description is shown in Figure 1.8.

    Job N1 submit-n1
    Job N2 submit-n2
    Job N3 submit-n3
    Job N4 submit-n4
    Job N5 submit-n5
    Parent N1 Child N2 N3
    Parent N2 N3 Child N4
    Retry N1 5
    Script PRE N5 uncompress-data
    Script POST N5 uncompess-data

Fig. 1.8. How a user might describe the diamond DAG from Figure 1.2. In this description, node N1 can be retried 5 times, and none of the other nodes are retried if they fail. Node N5 has both a pre-script and a post-script.

1.3.5 DAGMan Experience

DAGMan has been used extensively with the Condor batch job scheduling system. In particular, we have used it for managing sets of jobs using BLAST for protein analysis, sets of jobs for simulation of events for high-energy physics, and many other uses. We have found that our implementation of DAGMan easily scales to large DAGs of around 1000 nodes without throttling and DAGs of around 100,000 nodes with throttling. We believe it could scale much further than that if necessary. Because DAGMan can manage DAGs of this scale, and because we find that the greatest bottleneck is in the capabilities of the underlying batch job submission system, we have not expended effort to optimize it to work with larger DAGs.

DAGMan has been used in a wide variety of production environments. We will provide two examples here.

Within the Condor Project, we have created a BLAST [5] analysis service for the Biological Magnetic Resonance Data Bank at the University of Wisconsin-Madison [6]. BLAST finds regions of local similarity between nucleotide or protein sequences. Local researchers do weekly queries against databases that are updated every week. Our service takes a list of sequences to query and creates a pair of DAGs to perform the queries, as illustrated in Figure 1.9. The first DAG performs the setup, creates a second DAG that does the queries (the number of nodes in this DAG varies, so it is created dynamically), and then assembles the results. These DAGs are used differently: the first DAG uses dependencies to order the jobs that are run, while the second DAG has completely independent nodes, and DAGMan is used for reliable execution and throttling. On average, the second DAG has approximately 1000 nodes, but we have experimented with as many as 200,000 nodes. This service has run on a weekly basis for more than two years with little human supervision.

The Virtual Data System (VDS) [9] builds on top of DAGMan and Condor-G. Users provide a description of what data is available and how the data can be transformed, then request the data they need. The VDS creates a DAG that fetches and transforms data as needed, while tracking the provenance of the data. As part of the DAG creation and execution, the VDS uses planning to decide which grid sites should perform the transformations. The VDS has
been used for a wide variety of applications including high-energy physics event simulation, galaxy cluster finding, and genome analysis.

Fig. 1.9. The pair of DAGs used to run BLAST jobs. The query DAG is created by the main DAG. See text for details.

1.4 Implementation Status

DAGMan has been freely distributed as part of the Condor software since 1999. It has been used for numerous large projects, and is stable. It is available for a wide variety of Unix platforms, and for Microsoft Windows.

1.5 Interaction with Condor

Condor is a high-throughput batch job scheduler. Because it has been covered in detail elsewhere ([17], [20]), we only briefly review it here.

Condor was originally designed to utilize CPU cycles on computers that would otherwise be idle, such as desktop computers that are unused but turned on overnight. However, Condor has expanded its reach and now works well with dedicated computers and grid systems. Condor's ability to interact with grid systems, called Condor-G [10], allows Condor to submit jobs to Globus [11] (versions 2, 3, and 4), NorduGrid, Oracle, LSF, PBS, and even remote Condor installations (referred to as Condor-C).

Condor and Condor-G emphasize reliability. If Condor crashes, it will continue running the jobs when it restarts. Condor can provide job checkpointing and migration to facilitate recovery when execution computers fail. Condor-G
provides elaborate recovery schemes to deal with network outages and remote grid site failures.

DAGMan is built to use Condor for job execution, and it can submit jobs to both the local batch system and remote grid systems with equal ease. We have created many workflows using DAGMan that execute in a grid environment.

1.6 Integration with Stork

1.6.1 An Introduction to Stork

Just as computation and network resources need to be carefully scheduled and managed, the scheduling of data placement activities all across distributed computing systems is crucial, because access to data is generally the main bottleneck for data-intensive applications. This is especially the case when most of the data is stored on tape storage systems, which slow down access to data even further due to the mechanical nature of these systems.

The current approach to solving this problem of data placement is either to do it manually or to employ simple scripts, which do not have any automation or fault tolerance capabilities. They cannot adapt to a dynamically changing distributed computing environment. They do not have the privileges of a job, they do not get scheduled, and they generally require baby-sitting throughout the process. Data placement activities must be first-class citizens in distributed computing environments, just like computational jobs. They need to be queued, scheduled, monitored, and even checkpointed. More importantly, it must be ensured that they complete successfully and without any need for human intervention.

Moreover, data placement jobs should be treated differently from computational jobs, since they have different semantics and different characteristics. For example, if the transfer of a large file fails, we may not simply want to restart the job and re-transfer the whole file. Rather, we may prefer transferring only the remaining part of the file. Similarly, if a transfer using one protocol fails, we may want to try other protocols supported by the source and destination hosts to perform the transfer. We may want to dynamically tune network parameters or decide the concurrency level for specific source, destination, and protocol triples. A traditional computational job scheduler does not handle these cases. For this reason, data placement jobs and computational jobs should be differentiated from each other, and each should be submitted to specialized schedulers that understand their semantics.

We have designed and implemented the first batch scheduler specialized for data placement: Stork [16]. This scheduler implements techniques specific to queuing, scheduling, and optimization of data placement jobs, and provides a level of abstraction between the user applications and the underlying data transfer and storage resources.

A production-level Stork is bundled with Condor releases. Additionally, research into new features is continuing in parallel.

1.6.2 Data Placement Job Types

Under Stork, data placement jobs are categorized into the following three types:

transfer: This job type is for transferring a complete or partial file from one physical location to another. This can include a get or put operation or a third-party transfer. Stork supports a variety of data transfer protocols, including: local file system, GridFTP, FTP, HTTP, NeST, SRB, SRM, and UniTree. Further, sites can create new transfer modules using the Stork modular API.

allocate: This job type is used for allocating storage space at the destination site, allocating network bandwidth, or establishing a light-path on the route from source to destination. Basically, it deals with all necessary resource allocations required before the placement of the data.

release: This job type is used for releasing the corresponding resource that was allocated earlier.

1.6.3 Flexible Job Representation

Stork uses the ClassAd [19] job description language to represent the data placement jobs. The ClassAd language provides a very flexible and extensible data model that can be used to represent arbitrary services and constraints. Below are three sample data placement (DaP) requests:

    [
      dap_type      = "allocate";
      dest_host     = "houdini.example.com";
      size          = "200MB";
      duration      = "60 minutes";
      allocation_id = 1;
    ]
    [
      dap_type = "transfer";
      src_url  = "file:////data/example.dat";
      dest_url = "gsiftp://houdini.example.com/data/example.dat";
    ]
    [
      dap_type      = "release";
      dest_host     = "houdini.example.com";
      allocation_id = 1;
    ]

The first request is to allocate 200 MB of disk space for 1 hour on the host houdini.example.com. The second request is to transfer a file from the local file system to the allocated space on that host, using GridFTP. The third request is to de-allocate the previously allocated space.

1.6.4 Fault Tolerance

Data placement applications must operate in an environment of faults. Data servers may be unavailable for many reasons. Remote and local networks may encounter outages or congestion. Any host on the local area network, including the host running Stork, may also fail. Stork is equipped to deal with a variety of data placement faults, which can be configured at both the system and the job level.

For transient environment faults, data placement jobs that fail can be retried after a small delay. The number of retries allowed is a system configuration setting.

For longer-term faults associated with a particular data server, Stork can also retry a failed transfer using a list of alternate data protocols. For example, if the host ghidorac.sdsc.edu, which serves files over SRB, is also running a GridFTP server accessed via the gsiftp protocol, a transfer job from that host to a NeST server could be augmented so that a failure of the primary SRB-to-NeST transfer is retried with a gsiftp-to-NeST transfer attempt:

    [
      dap_type      = "transfer";
      src_url       = "srb://ghidorac.sdsc.edu/home/kosart.condor/1.dat";
      dest_url      = "nest://turkey.cs.wisc.edu/1.dat";
      alt_protocols = "gsiftp-nest";
    ]

1.6.5 Interaction with DAGMan

Fig. 1.10. A simple DAG that includes Stork.

DAGMan has been extended to work well with Stork. In addition to computational jobs, data placement jobs can be specified, and DAGMan will submit them to Stork for execution. This allows for straightforward execution of workflows that include data transfer. A simple example of how Stork can be used with DAGMan appears in Figure 1.10. This DAG transfers data to a grid site, executes the job at that site, then transfers the output data back to the submission site. This DAG could easily be enhanced to allocate space before the data transfers, or to perform multiple data transfers.
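The combination of short-delay retries and protocol fallback can be sketched as follows. This Python fragment is a hypothetical model, not Stork's implementation. Here `transfer` stands in for a transfer module, and `attempts_per_protocol` and `delay_s` stand in for the system-level retry configuration. Each `alt_protocols` entry is read as a source-destination scheme pair, such as "gsiftp-nest" in the ClassAd above.

```python
import time
from typing import Callable, List, Sequence, Tuple


def swap_scheme(url: str, scheme: str) -> str:
    """Replace a URL's protocol, e.g. srb://host/f -> gsiftp://host/f."""
    _, rest = url.split("://", 1)
    return f"{scheme}://{rest}"


def place_data(src_url: str, dest_url: str,
               transfer: Callable[[str, str], bool],
               alt_protocols: Sequence[str] = (),
               attempts_per_protocol: int = 2,
               delay_s: float = 0.0) -> Tuple[str, str]:
    """Retry the primary protocols, then fall back to each alternate pair.

    transfer(src, dest) stands in for a transfer module and returns True
    on success. Each alt_protocols entry is a "src-dest" scheme pair.
    """
    candidates: List[Tuple[str, str]] = [(src_url, dest_url)]
    for pair in alt_protocols:
        src_scheme, dest_scheme = pair.split("-", 1)
        candidates.append((swap_scheme(src_url, src_scheme),
                           swap_scheme(dest_url, dest_scheme)))

    for src, dest in candidates:
        for _ in range(attempts_per_protocol):
            if transfer(src, dest):
                return src, dest
            time.sleep(delay_s)          # short pause before the next attempt
    raise RuntimeError("all protocols and retries exhausted")


def flaky_transfer(src: str, dest: str) -> bool:
    """Hypothetical stand-in: the SRB server is down, GridFTP works."""
    return src.startswith("gsiftp://")


print(place_data("srb://ghidorac.sdsc.edu/home/kosart.condor/1.dat",
                 "nest://turkey.cs.wisc.edu/1.dat",
                 flaky_transfer, alt_protocols=["gsiftp-nest"]))
```

Only after the primary SRB-to-NeST pair has used up its attempts does the sketch switch to the gsiftp-to-NeST pair, mirroring the distinction the text draws between transient faults (retry) and longer-term faults (change protocol).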
1.6.6 Interaction with Heterogeneous Resources

Stork acts like an I/O control system (IOCS) between the user applications and the underlying protocols and data storage servers. It provides complete modularity and extensibility. Users can add support for their favorite storage system, data transport protocol, or middleware very easily. This is a crucial feature in a system designed to work in a heterogeneous distributed environment. Users or applications cannot expect all storage systems to support the same interfaces to talk to each other, and we cannot expect all applications to talk to all the different storage systems, protocols, and middleware. There needs to be a negotiating system between them that can interact with those systems easily and even translate different protocols to each other. Stork has been developed to be capable of this. The modularity of Stork allows users to insert a plug-in to support any storage system, protocol, or middleware easily.

Stork supports several data transfer protocols, including:

• FTP [18]
• GridFTP [1]. Stork supports third-party data transfer with GridFTP, eliminating the need for local storage.
• HTTP [8]
• DiskRouter [14]

Stork supports several data storage systems, including:

• SRB [3]
• UniTree [7]
• NeST [4]

Stork maintains a library of pluggable "data placement" modules. These modules are executed by data placement job requests coming into Stork. They can perform inter-protocol translations either using a memory buffer or using third-party transfers whenever available. Inter-protocol translations are not yet supported between all systems or protocols. In order to transfer data between systems for which direct inter-protocol translation is not supported, two consecutive Stork jobs can be used instead. The first Stork job performs the transfer from the source storage system to the local disk cache of Stork, and the second Stork job performs the transfer from the local disk cache of Stork to the destination storage system.

1.6.7 Modular API

While the Stork server is a single process, the data transfers, allocations, etc., are performed by separate modules. The application program interface to the modules is simple enough for sites to write their own modules as needed. For example, each data transfer module is executed with the following argument list:

    src_url dest_url arguments...

Thus, to write a new module that transfers data from the foo protocol to the bar protocol, a new module is created with the name stork.transfer.foo-bar. Modules are executable programs and may be written as shell scripts. Further, module binding is performed at run time, enabling sites to create new modules without restarting the Stork server.

1.6.8 Performance Enhancements

Stork has seen several recent performance enhancements. The first is that multiple data placements may now be specified in a single submission file instead of multiple files. This optimization is significant when transferring many files to Stork, because it eliminates extra invocations of the submission command, which can be surprisingly time-consuming when transferring tens of thousands of files.

The second enhancement was the integration of the GridFTP client into the Stork server. When doing many simultaneous GridFTP file transfers, this saves considerable time and reduces the total number of processes in the system.

Finally, Stork is now able to execute an arbitrary program when the active job queue size falls below a configurable level. This is envisioned as a simple but high-performance alternative to managing very large data placement workflows with DAGMan, because it will allow Stork users to limit the rate at which they submit jobs so that Stork is not overwhelmed, while ensuring that Stork has sufficient work to do at any given time.

1.6.9 Implementation Status

Stork is available, with all features described so far, as part of the Condor distribution. It is available on Linux, and will be available for other platforms in the future. Users outside of the Condor Project are just beginning to use Stork in production, and we hope to have more in the near future.

1.6.10 Active Research

Research on Stork is active, and much of it can be found in [15]. Research includes:

• Extending data placement types to include interaction with a metadata catalog.
• Experimentation with scheduling techniques other than first-in, first-out. This includes not only traditional scheduling techniques such as shortest-job-first or multilevel queue priority scheduling, but also scheduling based on management of storage space, to ensure that storage space is not exceeded by the data transfers. In addition, scheduling of connection management is important when there are many simultaneous connections to a server.
• Run-time adaptation, which can be performed to tune the network parameters for a transfer to minimize transfer time. This is discussed in further detail in [13].
• Enabling Stork to detect problems such as servers that are unreliable in a variety of ways, and then to base scheduling decisions on this knowledge. More details are in [12].

1.7 Future Directions

There are several promising areas for future work with DAGMan and Stork.

For DAGMan, we would like to explore methods (probably utilizing ClassAds [19]) to allow different conditions for deciding when a node should execute. Today, a node executes when all of its parents have finished with an exit code of 0, but allowing more complex conditions would allow conditional execution (equivalent to if-then-else) and partial execution (a DAG that finishes when a certain percentage of nodes have completed). We would also like to support dynamic DAGs, which are DAGs that can change on-the-fly, based on user input. This has been frequently requested by users, and is particularly useful when DAGMan is used by a higher-level scheduling system that may change plans in reaction to current conditions.

For Stork, we are exploring ways to make it more scalable and more reliable. We are also investigating methods of using matchmaking, similar to that in Condor, to select which data transfers should be run and which sites they should transfer data to.

1.8 Conclusions

DAGMan is a reliable workflow management system. Although the workflows it supports are relatively simple, there are many complexities that were discovered as we used DAGMan through the years, such as the need for flexible methods for retrying and throttling. As a result of our experience, DAGMan has found favor with many users in production environments, and software has been created that relies on DAGMan for execution. Used with Condor, Condor-G, and Stork, DAGMan is a powerful tool for workflow execution in grid environments.

References

1. W. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke. Data management and transfer in high-performance computational grid environments. Parallel Computing, 28(5):749-771, May 2002.
2. A. Baker. The Secret Ways of Al Baker. The Miracle Factory, 2003.
3. C. Baru, R. Moore, A. Rajasekar, and M. Wan. The SDSC storage resource broker. In Proceedings of CASCON, 1998.
4. J. Bent, V. Venkataramani, N. LeRoy, A. Roy, J. Stanley, A. Arpaci-Dusseau, R. Arpaci-Dusseau, and M. Livny. NeST - a grid enabled storage appliance. In J. Weglarz, J. Nabrzyski, J. Schopf, and M. Stroinkski, editors, Grid Resource Management. Kluwer Academic Publishers, 2003.
5. Basic local alignment search tool (BLAST). http://www.ncbi.nlm.nih.gov/blast/, 2006.
6. Biological magnetic resonance data bank.
http://www.bmrb.wisc.edu/, 2006.
7. M. Butler, R. Pennington, and J. A. Terstriep. Mass storage at NCSA: SGI DMF and HP UniTree. In Proceedings of 40th Cray User Group Conference, 1998.
8. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext transfer protocol - HTTP/1.1. Internet RFC 2616, Jun 1999.
9. I. Foster, J. Voeckler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In 14th International Conference on Scientific and Statistical Database Management (SSDBM'02), 2002.
10. J. Frey, T. Tannenbaum, I. Foster, M. Livny, and S. Tuecke. Condor-G: A computation management agent for multi-institutional grids. Cluster Computing, 5:237-246, 2002.
11. The Globus Alliance. See web site at
12. G. Kola, T. Kosar, and M. Livny. A client-centric grid knowledgebase. In Proceedings of 2004 IEEE International Conference on Cluster Computing, pages 431-438, San Diego, CA, September 2004. IEEE.
13. G. Kola, T. Kosar, and M. Livny. Run-time adaptation of grid data-placement jobs. Parallel and Distributed Computing Practices, 2004.
14. G. Kola and M. Livny. Diskrouter: A flexible infrastructure for high performance large scale data transfers. Technical Report CS-TR-2003-1484, University of Wisconsin-Madison Computer Sciences Department, 2003.
15. T. Kosar. Data Placement in Widely Distributed Systems. PhD thesis, University of Wisconsin-Madison, 2005.
16. T. Kosar and M. Livny. Stork: Making data placement a first class citizen in the grid. In Parallel and Distributed Computing Practices, 2004.
17. M. Litzkow, M. Livny, and M. Mutka. Condor - a hunter of idle workstations. Proceedings of the 8th International Conference on Distributed Computing Systems, pages 104-111, June 1988.
18. J. Postel and J. Reynolds. File transfer protocol (FTP). Internet RFC 959, Oct 1985.
19. R. Raman, M. Livny, and M. Solomon. Matchmaking: Distributed resource management for high throughput computing. In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing (HPDC7), Chicago, IL, July 1998.
20. D. Thain, T. Tannenbaum, and M. Livny. Distributed computing in practice: the Condor experience. Concurrency - Practice and Experience, 17(2-4):323-356, 2005.
21. W. Woods. What's in a link: Foundations for semantic networks. In D. Bobrow and A. Collins, editors, Representation and Understanding: Studies in Cognitive Science. New York: Academic Press, 1975.