Workflow Management in Condor

Peter Couvares, Tevfik Kosar, Alain Roy, Jeff Weber, and Kent Wenger

University of Wisconsin-Madison, Computer Sciences Department
pfc, roy, weber, wenger
Louisiana State University, Department of Computer Science and Center for Computation & Technology

1.1 Introduction

The Condor Project began in 1988 and has evolved into a feature-rich batch system that targets high-throughput computing; that is, Condor focuses on providing reliable access to computing over long periods of time, instead of highly-tuned, high-performance computing for short periods of time or small numbers of applications.

Many Condor users have not only long-running jobs, but also complex sequences of jobs, or workflows, that they wish to run. In the late 1990s, we began development of DAGMan (Directed Acyclic Graph Manager), which allows users to submit large workflows to Condor. As with Condor, the focus has been on reliability. DAGMan has a simple interface that allows many, but certainly not all, types of workflows to be expressed. We have found, through years of experience running production workflows with our users, that solving the so-called "simple" problems can be surprisingly complex. The first half of this paper provides a conceptual (and almost chronological) development of DAGMan to illustrate the complexities that Condor deals with.

In the past several years, Condor has expanded its focus from running jobs on local clusters of computers (or pools, in Condor terminology) to running jobs in distributed grid environments. Along with the additional complexities in running jobs came greater challenges in transferring data to and from the job execution sites. We have developed Stork, which along with being a bird name, treats data placement with the same concern that Condor treats job execution.

With a combination of DAGMan, Condor, and Stork, users can create large, complex workflows that reliably "get the job done" in a grid environment. In the rest of this paper, we explore DAGMan and Stork (Condor has been covered in detail elsewhere).
1.2 DAGMan Design Principles

The goal of DAGMan is to automate the submission and management of complex workflows involving many jobs, with a focus on reliability and fault-tolerance in the face of a variety of errors. Workflow management includes not only job submission and monitoring, but job preparation, cleanup, throttling, retry, and other actions necessary to ensure the good health of important workflows. DAGMan attempts to overcome or work around as many errors as possible, and in the face of errors it cannot overcome, it endeavors to allow the user to resolve the problem manually and then resume the workflow from the point where it last left off. This can be thought of as a "checkpointing" of the workflow, just as some batch systems provide checkpointing of jobs.

Notably, the majority of DAGMan's features, and even some of its specific semantics, were not originally envisioned, but rather are the product of years of collaboration with active users. The experience gained from the needs and problems of "production" science applications has driven most DAGMan development over the past six years.

The fundamental design principles of DAGMan are as follows:

- DAGMan sits as a layer "above" the batch system in the software stack. DAGMan utilizes the batch system's standard API and logs to submit, query, and manipulate jobs, and does not directly interact with jobs independently. (See footnote 1.)
- DAGMan reads the logs of the underlying batch system to follow the status of submitted jobs, rather than invoking interactive tools or service APIs. Reliance on simpler, file-based I/O allows DAGMan's own implementation to be simpler, more scalable and reliable across many platforms, and therefore more robust.
- DAGMan has no persistent state of its own. Its runtime state is built entirely from its input files, and from the information gleaned by reading logs provided by the batch system about the history of the jobs it has submitted.
Footnote 1: Note that DAGMan assumes the batch system guarantees that it will not "lose" jobs after they have been successfully submitted. Currently, if the job is lost by the batch system after being successfully submitted by DAGMan, DAGMan will wait indefinitely for the status of the job in the queue to change. An explicit query for the status of submitted jobs (as opposed to waiting for the batch system to record job status changes) may be necessary to address this. Also, if a job languishes in the queue forever, DAGMan is currently not able to "time out" and remove the job and mark it as failed. When removing jobs, detecting and responding to the failure of a remove operation (leaving a job "stuck" in the queue) is an interesting question.

1.3 DAGMan Details

1.3.1 DAGMan Basics

DAGMan allows users to express job dependencies as arbitrary directed acyclic graphs, or DAGs. In the simplest case, DAGMan can be used to ensure that two jobs execute sequentially; for example, that job B is not submitted until job A has completed successfully.

Like all graphs, a DAGMan DAG consists of nodes and arcs. Each node represents a single instance of a batch job to be executed, and each arc represents the execution order to be enforced between two nodes. Unlike more complex systems such as [21], arcs merely indicate the order in which the jobs must run.

If an arc points from node NP to NC, we say that NP is the parent of NC and NC is the child of NP. (See Figure 1.1.) A parent node must complete successfully before any of its child nodes can be started. Note that each node can have any whole number of parents or children (including zero). DAGMan does not require DAGs to be fully connected.

Why does DAGMan require a directed acyclic graph instead of an arbitrary graph? The graph is directed in order to express the order that jobs must run. Likewise, the graph is acyclic to ensure that DAGMan will not run indefinitely. In practice, we find that most workflows we encounter either do not require loops, or the loops can be unrolled into an acyclic graph.
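To make the acyclicity requirement concrete, the following is a minimal sketch of how a set of parent/child arcs could be checked for cycles with Kahn's algorithm. It is an illustration only and is not taken from DAGMan; the function name `topological_order` and the `(parent, child)` tuple representation are assumptions.

```python
from collections import deque


def topological_order(nodes, arcs):
    """Return a valid execution order, or None if the arcs contain a cycle.

    nodes: iterable of node names; arcs: iterable of (parent, child) pairs.
    Hypothetical helper for illustration only; not part of DAGMan.
    """
    nodes = list(nodes)
    children = {n: [] for n in nodes}
    waiting = {n: 0 for n in nodes}  # parents not yet placed in the order
    for parent, child in arcs:
        children[parent].append(child)
        waiting[child] += 1

    ready = deque(n for n in nodes if waiting[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            waiting[child] -= 1
            if waiting[child] == 0:
                ready.append(child)

    # A node whose count never reaches zero sits on, or behind, a cycle.
    return order if len(order) == len(nodes) else None


if __name__ == "__main__":
    print(topological_order(["A", "B"], [("A", "B")]))
    print(topological_order(["A", "B"], [("A", "B"), ("B", "A")]))
```

If a cycle exists, no node on it can ever have all of its parents complete, which is why the sketch reports `None` rather than an order.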
[Figure 1.1: nodes NP, NC, and NL]
Fig. 1.1. The relationship between parents and children. NP is the parent of NC. NL is lonely and has no parents or children.

DAGMan seeks to run as many jobs as possible in parallel, given the constraints of their parent/child relationships. For example, for the DAG in Figure 1.2, DAGMan will initially submit both N1 and N5 to Condor, allowing them to execute in parallel if there are sufficient computers available. After N1 completes successfully, DAGMan will submit both N2 and N3 to the batch system, allowing them to execute in parallel with each other, and with N5 if it has not completed already. When both N2 and N3 have finished successfully, DAGMan will submit N4. If N4 and N5 both complete successfully, the DAG will have completed, and DAGMan will exit successfully.

Earlier we defined a node's parent and child relationships. In describing DAGs, it can also be useful to define a node's sister as any node which shares the same set of parent nodes (including the empty set). Although sister relationships are not represented explicitly inside DAGMan, they are important because sister nodes always become "runnable" simultaneously, when their parents complete successfully. In Figure 1.2, N1 and N5 are sisters with no parents, and N2 and N3 are sisters that share N1 as a parent.

In practice, however, DAGMan submits individual jobs to the batch scheduler one at a time, and makes no guarantees about the precise order that it will submit the jobs of nodes that are ready to run. In other words, sister nodes may be submitted to the batch system in any order. It is also important to remember that, once submitted, the batch system is free to run jobs in its queue in any order it chooses: a job may run after one of its sisters despite being submitted to the queue earlier. Additionally, the jobs may not be run in parallel if there are insufficient compute resources for all parallel jobs.
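The sister relationship defined above can be computed directly from the arcs. The sketch below groups nodes by their exact set of parents; the helper name `sister_groups` is hypothetical, and the arc list is an assumption read off the diamond DAG of Figure 1.2.

```python
from collections import defaultdict


def sister_groups(nodes, arcs):
    """Group nodes that share exactly the same set of parents.

    Illustrative helper only; DAGMan does not represent sisters explicitly.
    """
    parents = {n: set() for n in nodes}
    for parent, child in arcs:
        parents[child].add(parent)

    groups = defaultdict(list)
    for node in nodes:
        groups[frozenset(parents[node])].append(node)
    return dict(groups)


if __name__ == "__main__":
    # Assumed arcs for the diamond DAG: N1 -> N2, N1 -> N3, N2 -> N4, N3 -> N4.
    diamond = [("N1", "N2"), ("N1", "N3"), ("N2", "N4"), ("N3", "N4")]
    for shared_parents, sisters in sister_groups(
        ["N1", "N2", "N3", "N4", "N5"], diamond
    ).items():
        print(sorted(shared_parents), sisters)
```

Each group becomes runnable at the same moment, although the order in which its members are actually submitted or run is not guaranteed.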
[Figure 1.2: nodes N1, N2, N3, N4, and N5]
Fig. 1.2. This "diamond" DAG illustrates parent and child links. N1 must complete successfully, then both N2 and N3 can execute in parallel. Only when both of them have finished successfully can N4 begin execution. N5 is a disconnected node and can execute in parallel with all of the other nodes.

While running, DAGMan keeps a list in memory of all jobs in the DAG, their parent/child relationships, and their current status. Given this information, DAGMan submits jobs to the batch system when appropriate, and continues until either the DAG is complete, or no more forward progress can be made due to failed jobs. In the latter case, DAGMan creates a list of failed jobs along with the reasons for their failure, and produces a rescue DAG file.

[Figure 1.3: states Waiting for N parents, Submitted, Failed, Done (Successful)]
Fig. 1.3. The state transition diagram for each node of a DAG. See text for details.

A rescue DAG is a special DAG that represents the state of a previously partially-completed DAG such that the original DAG can be restarted where it left off, without repeating any successfully completed work. The rescue DAG is an exact copy of the original input DAG, except that all nodes that have previously completed successfully are marked as done. When DAGMan is restarted with a rescue DAG, it reconstructs the state of the previous DAG.

Internally, DAGMan keeps track of the current status of each node. Figure 1.3
shows the basic state diagram of a DAG node.

When DAGMan starts, it marks each node as "waiting", and initializes a waiting count (N) for the node equal to its number of parents. In the case of a rescue DAG, DAGMan sets the waiting count equal to the number of parents which are not already marked as "done".

A node's waiting count represents the number of its parents that have yet to complete successfully, and which are therefore preventing it from being submitted. Only when a node's waiting count reaches zero can DAGMan submit the job associated with the node. If the job is submitted successfully, DAGMan marks the node as "submitted". If the job submission fails for any reason, the node is marked as "failed".

When DAGMan detects that a job has left the batch system queue, it marks the node as "done" if the job exited successfully, or otherwise marks it "failed". Success is determined by the exit code of the program: if it is zero, then the job exited successfully, otherwise it failed. (But see the description of post-scripts later in Section 1.3.2 for a modification of this.)

When a job is marked "done", the waiting count of all its children is decremented by one. Any node whose waiting count reaches zero is submitted to the batch scheduler as described earlier.
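The waiting-count bookkeeping described above can be sketched as a small simulation. This is an illustration under simplifying assumptions, not DAGMan's code: jobs run synchronously through a caller-supplied `run_job` callable (a hypothetical stand-in for submitting to the batch system and reading its log), and the `done` argument plays the role of the nodes a rescue DAG marks as done.

```python
def run_dag(nodes, arcs, run_job, done=()):
    """Simulate scheduling with per-node waiting counts.

    run_job(name) returns an exit code (0 means success). `done` lists
    nodes already marked done, as a rescue DAG would. Illustrative only.
    """
    children = {n: [] for n in nodes}
    parents = {n: [] for n in nodes}
    for p, c in arcs:
        children[p].append(c)
        parents[c].append(p)

    status = {n: ("done" if n in done else "waiting") for n in nodes}
    # Waiting count: parents that have not yet completed successfully.
    waiting = {
        n: sum(1 for p in parents[n] if status[p] != "done") for n in nodes
    }

    ready = [n for n in nodes if status[n] == "waiting" and waiting[n] == 0]
    while ready:
        node = ready.pop(0)
        status[node] = "submitted"
        if run_job(node) == 0:
            status[node] = "done"
            for child in children[node]:
                waiting[child] -= 1
                if waiting[child] == 0 and status[child] == "waiting":
                    ready.append(child)
        else:
            status[node] = "failed"
    return status


if __name__ == "__main__":
    arcs = [("N1", "N2"), ("N1", "N3"), ("N2", "N4"), ("N3", "N4")]
    names = ["N1", "N2", "N3", "N4", "N5"]
    first = run_dag(names, arcs, lambda n: 1 if n == "N3" else 0)
    print(first)  # N3 fails, so N4 never leaves "waiting"
    finished = [n for n, s in first.items() if s == "done"]
    print(run_dag(names, arcs, lambda n: 0, done=finished))
```

In the second call only the nodes that did not finish in the first run are executed again, which mirrors the purpose of a rescue DAG.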
Fig. 1.4. A state diagram for executing a single DAG node. Unlike Figure 1.3, this diagram adds the ability to run pre-scripts and post-scripts. The differences from Figure 1.3 are noted in bold.

1.3.2 DAGMan Complications

So far, the description of a DAGMan DAG is not very interesting: we execute jobs and maintain the order in which they must execute, while allowing parallelism when it is possible. Unfortunately, this is insufficient in real environments, which have many complications and sources of errors.

Complication: Setup, Cleanup, or Interpretation of a Node

The first complication occurs when using executables that are not easily modified to run in a distributed computing environment and therefore need a setup or cleanup step to occur before or after the job. For example, before a job is run, data may need to be staged from a tape archive or uncompressed. While this step could be placed in a separate DAG node, this may cause unnecessary overhead because the DAG node will be submitted and scheduled as a separate job by the batch system, instead of running immediately on the local computer.

DAGMan provides the ability to run a program before (a pre-script) or after (a post-script) a job is run. These programs should be lightweight because they are executed on the machine from which the DAG was submitted and a large DAG may execute many of these scripts. (But see Section 1.3.2 for a way that DAGMan can deal with this.)

Running these scripts adds complexity to the state diagram in Figure 1.3. The changes needed to support scripts are shown in Figure 1.4. Once a job is allowed to run, it can optionally run a pre-script. After the job has run, it can optionally run a post-script.
[Figure 1.5: nodes N1, N2, and N3. Pre-script of N2: if result(N1) = success, do nothing; else rewrite N2 to an empty job. Pre-script of N3: if result(N1) = failure, do nothing; else rewrite N3 to an empty job.]
Fig. 1.5. An example conditional DAG.

Note that if the scripts fail, the node is considered to have failed, just as if the job itself had failed. There is one interesting case to note, which is not easily represented in Figure 1.4: if a node has a post-script, it will never directly go into the failed state, but will always run the post-script. In this way, the post-script can decide if a job has really failed or not. It can do some analysis beyond DAGMan's ability to decide if a node should be considered to have succeeded or failed based on whether the exit code is zero or not. This significantly enhances the ability of DAGMan to work with existing code.

Some users have discovered an interesting way to use post-scripts. They create post-scripts that rewrite their child nodes' job description file to change how the job runs. This can be used for at least two purposes. First, it can create conditional DAGs by allowing a run-time decision that changes the DAG. For example, consider Figure 1.5. If N1 succeeds, then the pre-script for N3 will rewrite N3 to an empty job, perhaps running the /bin/true command. (See footnote 1.) In this way, only N2 will run after N1 succeeds. Similarly, if N1 fails, then only N3 will run. While a more generic facility for conditional DAGs may be desirable, it would add complexity, and simple conditional DAGs can be created with scripts.

A second use for pre-scripts is to do last-minute planning. For example, when submitting jobs to Condor-G (which allows jobs to be submitted to remote grid sites instead of the local batch system), users can specify exactly which grid site they wish their jobs to run at. A pre-script can decide what grid site should be used, rewrite the job description, and the job will run there.
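The script semantics described in this section can be summarized in a short sketch. It is a simplified model rather than DAGMan's implementation: the pre-script, job, and post-script are hypothetical callables returning exit codes, and the post-script is handed the job's exit code. The text does not say whether a post-script runs after a failed pre-script, so the sketch assumes it does not.

```python
def run_node(job, pre=None, post=None):
    """Run one node: optional pre-script, the job, optional post-script.

    Every callable returns an exit code (0 = success); the post-script
    receives the job's exit code. All names are illustrative only.
    """
    if pre is not None and pre() != 0:
        # Assumption of this sketch: a failed pre-script fails the node
        # without running the job or the post-script.
        return "failed"

    job_exit = job()

    if post is not None:
        # With a post-script, the job's exit code alone never fails the
        # node; the post-script's own exit code decides the outcome.
        return "done" if post(job_exit) == 0 else "failed"

    return "done" if job_exit == 0 else "failed"


if __name__ == "__main__":
    print(run_node(lambda: 1))                      # no scripts: exit code decides
    print(run_node(lambda: 1, post=lambda rc: 0))   # post-script overrides failure
```

The second call shows why post-scripts help with existing code: a program that exits non-zero for harmless reasons can still be treated as successful.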
Footnote 1: In recent versions of Condor, the job can be edited to contain "noop_job = true", which leaves the executable name alone and immediately terminates the job successfully.

Complication: Throttling

All of the mechanisms described so far work very well. Unfortunately, the real world applies additional constraints. Imagine a DAG that can have one thousand jobs simultaneously running, and each of them has a pre-script and a post-script. When DAGMan can submit the jobs, it will start up one thousand nearly simultaneous pre-scripts, then submit one thousand jobs nearly simultaneously. Running that many pre-scripts may cause an unacceptable load on the submission machine, and submitting that many jobs to the underlying batch submission system may also strain its capacity.

For this reason, DAGMan can throttle the number of pre-scripts, jobs, or post-scripts that may run at any time. This results in another modification to our state diagram for running a single node, as shown in Figure 1.6.

Fig. 1.6. A state diagram for executing a single DAG node. In addition to the states in Figure 1.4, this diagram adds DAGMan's ability to throttle pre-scripts, jobs, and post-scripts. The differences from Figure 1.4 are noted in bold.

DAGMan can also throttle the number of jobs that it submits to the batch system. This number might be significantly greater than the number of running jobs, so this can prevent overloading the batch system. This is a good example of a surprising additional constraint: we did not realize that DAGs might be able to submit so many jobs that the number of idle jobs could overwhelm the batch system.
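One way to picture throttling is as a counter with a waiting line: work starts immediately while the number of active items is below a limit, and is deferred otherwise. The class below is a hypothetical sketch of that idea, not DAGMan's mechanism; one instance could be kept for each kind of work (pre-scripts, jobs, post-scripts).

```python
from collections import deque


class Throttle:
    """Cap how many items of one kind are active at once.

    Extra requests wait in first-in, first-out order. Hypothetical helper
    for illustration; not DAGMan's implementation.
    """

    def __init__(self, limit):
        self.limit = limit
        self.active = set()
        self.deferred = deque()

    def request(self, name):
        """Start `name` now if under the limit, else defer it.

        Returns True if the item was started immediately.
        """
        if len(self.active) < self.limit:
            self.active.add(name)
            return True
        self.deferred.append(name)
        return False

    def finish(self, name):
        """Mark `name` finished; start and return the next deferred item, if any."""
        self.active.discard(name)
        if self.deferred and len(self.active) < self.limit:
            nxt = self.deferred.popleft()
            self.active.add(nxt)
            return nxt
        return None


if __name__ == "__main__":
    jobs = Throttle(limit=2)
    print([jobs.request(n) for n in ("n1", "n2", "n3")])  # third one is deferred
    print(jobs.finish("n1"))                              # frees a slot for n3
```

With a thousand ready nodes and a limit of, say, fifty, at most fifty would be active at any moment while the rest wait their turn.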
Complication: Unreliable applications or subsystems

Some applications are not robust; it is not uncommon to find a program that sometimes fails to run on the first attempt, but completes successfully if given another chance. Sometimes it is due to a program error, sometimes due to interactions with the environment, such as a flaky networked file system. Ideally, problems such as this would always be fixed before trying to run the program. Unfortunately, this is not always possible, perhaps because the program is closed-source or because of time constraints.

To cope with unreliable programs or environments, DAGMan provides the ability to retry a node if it fails. Users specify how many times the node should be retried before deciding that it has actually failed. When a node is retried, the node's pre-script is also run again. In some cases, a user wants to retry multiple times unless some catastrophic error occurs. DAGMan handles this with the "retry unless-exit" feature, which will retry a job unless it exits with a particular value. One place this might be useful is planning: imagine a pre-script that decides where a job should be run. Retry might be set to 10, to allow the job to be run at ten different sites, but if there is some catastrophic error, then the pre-script can exit with a specific value that indicates "do not retry".

Adding the ability to retry the job results in one final change to our state diagram, as shown in Figure 1.7.
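The retry behavior, including the "retry unless-exit" variant, can be sketched as a loop around a single attempt. This is an illustrative model only: `attempt` is a hypothetical callable standing for one full pass through the node (pre-script, job, and post-script), and the sketch assumes that a retry count of k permits one initial attempt plus up to k retries.

```python
def run_with_retry(attempt, max_retries, unless_exit=None):
    """Run `attempt` until it succeeds, retries run out, or it exits
    with the `unless_exit` value.

    attempt: callable returning an exit code (0 = success).
    Returns (status, attempts_made). Illustrative sketch only.
    """
    attempts = 0
    while True:
        attempts += 1
        exit_code = attempt()
        if exit_code == 0:
            return "done", attempts
        if unless_exit is not None and exit_code == unless_exit:
            return "failed", attempts  # catastrophic error: do not retry
        if attempts > max_retries:
            return "failed", attempts  # initial attempt plus all retries used


if __name__ == "__main__":
    codes = iter([1, 1, 0])  # fails twice, then succeeds
    print(run_with_retry(lambda: next(codes), max_retries=5))
```

Separating "try again" from "give up now" lets a planning pre-script keep trying new sites for ordinary failures while still stopping at once on an error that no site could fix.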
1.3.3 Additional DAGMan Details

We will briefly mention several other interesting DAGMan features.

Running DAGMan robustly

What happens if the machine on which DAGMan is running crashes? Although DAGMan would no longer continue to submit jobs, existing jobs continue running, but it would be nice if DAGMan could be restarted so that it could continue making forward progress. Ideally, DAGMan should handle as much as possible for the user, so we run DAGMan as a Condor job. This means that if the machine crashes, when it restarts Condor will restart the DAGMan process, which will recover the state of its execution from persistent log files, and will resume operation. This sort of robustness is essential in allowing users to run large sets of jobs in a "hands-off" fashion.

Recursive DAGs

A DAG node can do anything, including submitting another DAG. This allows for the creation of DAGs with conditional branches in them. The DAG node can make a choice, then submit an independent DAG based on the result of that choice. This can allow for very complex DAGs to be executed. Unfortunately, it also makes debugging a DAG harder. For an alternative to recursive DAGs, see Section 1.7.

Fig. 1.7. The complete state diagram for executing a single DAG node. The single difference from Figure 1.6 is noted in bold.

1.3.4 Describing a DAG

It is the user's responsibility to provide DAGMan with a description of each job in the format of the underlying batch scheduler. For Condor, this means associating each node with a "submit file" describing the job to be executed. DAGMan ultimately uses this file to submit the job to the batch scheduler using the standard submission interface.

Users describe a DAG by listing each node and the relationships between nodes. A sample DAG description is shown in Figure 1.8.
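To show how little structure such a description needs, here is a minimal sketch of a reader for the four kinds of lines that appear in Figure 1.8 (Job, Parent ... Child, Retry, and Script). It is not DAGMan's parser and handles only this subset; the function name `parse_dag`, the dictionary layout, and the embedded sample text (modeled on Figure 1.8) are assumptions made for illustration.

```python
SAMPLE = """\
Job N1 submit-n1
Job N2 submit-n2
Job N3 submit-n3
Job N4 submit-n4
Job N5 submit-n5
Parent N1 Child N2 N3
Parent N2 N3 Child N4
Retry N1 5
Script PRE N5 uncompress-data
Script POST N5 uncompress-data
"""


def parse_dag(text):
    """Parse a small subset of a DAG description into plain dictionaries."""
    dag = {"jobs": {}, "arcs": [], "retries": {}, "scripts": {}}
    for raw in text.splitlines():
        words = raw.split()
        if not words:
            continue
        keyword = words[0].lower()
        if keyword == "job":
            dag["jobs"][words[1]] = words[2]  # node name -> submit file
        elif keyword == "parent":
            split = [w.lower() for w in words].index("child")
            parents, children = words[1:split], words[split + 1:]
            dag["arcs"] += [(p, c) for p in parents for c in children]
        elif keyword == "retry":
            dag["retries"][words[1]] = int(words[2])
        elif keyword == "script":
            kind, node = words[1].upper(), words[2]  # PRE or POST
            dag["scripts"][(node, kind)] = " ".join(words[3:])
        else:
            raise ValueError(f"unrecognized line: {raw!r}")
    return dag


if __name__ == "__main__":
    parsed = parse_dag(SAMPLE)
    print(parsed["arcs"])
    print(parsed["retries"])
```

Each Parent ... Child line expands into one arc per parent/child pair, so two lines are enough to express the whole diamond.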
    Job N1 submit-n1
    Job N2 submit-n2
    Job N3 submit-n3
    Job N4 submit-n4
    Job N5 submit-n5
    Parent N1 Child N2 N3
    Parent N2 N3 Child N4
    Retry N1 5
    Script PRE N5 uncompress-data
    Script POST N5 uncompress-data

Fig. 1.8. How a user might describe the diamond DAG from Figure 1.2. In this description, node N1 can be retried 5 times and all of the other nodes are not retried if they fail. Node N5 has both a pre-script and a post-script.

1.3.5 DAGMan Experience

DAGMan has been used extensively with the Condor batch job scheduling system. In particular, we have used it for managing sets of jobs using BLAST for protein analysis, sets of jobs for simulation of events for high-energy physics, and many other uses.

We have found that our implementation of DAGMan easily scales to large DAGs of around 1000 nodes without throttling and DAGs of around 100,000 nodes with throttling. We believe it could scale much further than that if necessary. Because DAGMan can manage DAGs of this scale, and because we find that the greatest bottleneck is in the capabilities of the underlying batch job submission system, we have not expended effort to optimize it to work with larger DAGs.

DAGMan has been used in a wide variety of production environments. We will provide two examples here.

Within the Condor Project, we have created a BLAST [5] analysis service for the Biological Magnetic Resonance Data Bank at the University of Wisconsin-Madison [6]. BLAST finds regions of local similarity between nucleotide or protein sequences. Local researchers do weekly queries against databases that are updated every week. Our service takes a list of sequences to query and creates a pair of DAGs to perform the queries, as illustrated in Figure 1.9. The first DAG performs the setup, creates a second DAG that does the queries (the number of nodes in this DAG varies, so it is dynamically created), and then assembles the results. These DAGs are used differently: the first DAG uses dependencies to order the jobs that are run, while the second DAG has completely independent nodes and DAGMan is used for reliable execution and throttling. On average, the second DAG has approximately 1000 nodes, but we have experimented with as many as 200,000 nodes. This service has run on a weekly basis for more than two years with little human supervision.

The Virtual Data System (VDS) [9] builds on top of DAGMan and Condor-G. Users provide a description of what data is available and how the data can be transformed, then request the data they need. The VDS creates a DAG that fetches and transforms data as needed, while tracking the provenance of the data. As part of the DAG creation and execution, the VDS uses planning to decide which grid sites should perform the transformations. The VDS has been used for a wide variety of applications including high-energy physics event simulation, galaxy cluster finding, and genome analysis.

Fig. 1.9. The pair of DAGs used to run BLAST jobs. The query DAG is created by the main DAG. See text for details.

1.4 Implementation Status

DAGMan has been freely distributed as part of the Condor software since 1999. It has been used for numerous large projects, and is stable. It is available for a wide variety of Unix platforms, and Microsoft Windows.

1.5 Interaction with Condor

Condor is a high-throughput batch job scheduler. Because it has been covered in detail elsewhere ([17], [20]), we only briefly review it here. Condor was originally designed to utilize CPU cycles on computers that would otherwise be idle, such as desktop computers that are unused but turned on overnight. However, Condor has expanded its reach and now works well with dedicated computers and grid systems. Condor's ability to interact with grid systems, called Condor-G [10], allows Condor to submit jobs to Globus [11] (versions 2, 3, and 4), NorduGrid, Oracle, LSF, PBS, and even remote Condor installations (referred to as Condor-C).

Condor and Condor-G emphasize reliability. If Condor crashes, it will continue running the jobs when it restarts. Condor can provide job checkpointing and migration to facilitate recovery when execution computers fail. Condor-G
provides elaborate recovery schemes to deal with network outages and remote grid site failures.

DAGMan is built to use Condor for job execution, and it can submit jobs to both the local batch system and remote grid systems with equal ease. We have created many workflows using DAGMan that execute in a grid environment.

1.6 Integration with Stork

1.6.1 An Introduction to Stork

Just as computation and network resources need to be carefully scheduled and managed, the scheduling of data placement activities across distributed computing systems is crucial, because access to data is generally the main bottleneck for data-intensive applications. This is especially the case when most of the data is stored on tape storage systems, which slows down access to data even further due to the mechanical nature of these systems.

The current approach to solving this problem of data placement is either doing it manually, or employing simple scripts, which do not have any automation or fault tolerance capabilities. They cannot adapt to a dynamically changing distributed computing environment. They do not have the privileges of a job, they do not get scheduled, and they generally require baby-sitting throughout the process.

Data placement activities must be first-class citizens in distributed computing environments, just like computational jobs. They need to be queued, scheduled, monitored, and even check-pointed. More importantly, it must be ensured that they complete successfully and without any need for human intervention.

Moreover, data placement jobs should be treated differently from computational jobs, since they have different semantics and different characteristics. For example, if the transfer of a large file fails, we may not simply want to restart the job and re-transfer the whole file. Rather, we may prefer transferring only the remaining part of the file. Similarly, if a transfer using one protocol fails, we may want to try other protocols supported by the source and destination hosts to perform the transfer. We may want to dynamically tune network parameters or decide the concurrency level for specific source, destination, and protocol triples. A traditional computational job scheduler does not handle these cases. For this purpose, data placement jobs and computational jobs should be differentiated from each other and each should be submitted to specialized schedulers that understand their semantics.

We have designed and implemented the first batch scheduler specialized for data placement: Stork [16]. This scheduler implements techniques specific to queuing, scheduling, and optimization of data placement jobs, and provides a level of abstraction between the user applications and the underlying data transfer and storage resources.

A production-level Stork is bundled with Condor releases. Additionally, research into new features is continuing in parallel.

1.6.2 Data Placement Job Types

Under Stork, data placement jobs are categorized into the following three types:

transfer: This job type is for transferring a complete or partial file from one physical location to another one. This can include a get or put operation or a third-party transfer. Stork supports a variety of data transfer protocols, including: local file system, GridFTP, FTP, HTTP, NeST, SRB, SRM, and UniTree. Further, sites can create new transfer modules using the Stork modular API.

allocate: This job type is used for allocating storage space at the destination site, allocating network bandwidth, or establishing a light-path on the route from source to destination. Basically, it deals with all necessary resource allocations pre-required for the placement of the data.

release: This job type is used for releasing the corresponding resource which was allocated before.

1.6.3 Flexible Job Representation

Stork uses the ClassAd [19] job description language to represent the data placement jobs. The ClassAd language provides a very flexible and extensible data model that can be used to represent arbitrary services and constraints.

Below are three sample data placement (DaP) requests:

    [
      dap_type = "allocate";
      dest_host = "houdini.example.com";
      size = "200MB";
      duration = "60 minutes";
      allocation_id = 1;
    ]
    [
      dap_type = "transfer";
      src_url = "file:////data/example.dat";
      dest_url = "gsiftp://houdini.example.com/data/example.dat";
    ]
    [
      dap_type = "release";
      dest_host = "houdini.example.com";
      allocation_id = 1;
    ]

The first request is to allocate 200 MB of disk space for 1 hour on a NeST server. The second request is to transfer a file from an SRB server to the allocated space on the NeST server. The third request is to de-allocate the previously allocated space.

1.6.4 Fault Tolerance

Data placement applications must operate in an environment of faults. Data servers may be unavailable for many reasons. Remote and local networks may encounter outages or congestion. [...] host on the local area network, including the host running Stork. Stork is equipped to deal with a variety of data placement faults, which can be configured at both the system and job level.

For transient environment faults, data placement jobs that fail can be retried after a small delay. The number of retries allowed is a system configuration setting.

For longer-term faults associated with a particular data server, Stork can also retry a failed transfer using a list of alternate data protocols. If, in the previous example, the host ghidorac.sdsc.edu is also running a GridFTP server, accessed via the gsiftp protocol, the corresponding transfer job could be augmented to retry a transfer failure from the primary SRB to NeST protocols with a transfer attempt from the gsiftp to NeST protocols:

    [
      dap_type = "transfer";
      src_url = "srb://ghidorac.sdsc.edu/home/kosart.condor/1.dat";
      dest_url = "nest://turkey.cs.wisc.edu/1.dat";
      alt_protocols = "gsiftp-nest";
    ]

1.6.5 Interaction with DAGMan
Fig. 1.10. A simple DAG that includes Stork.

DAGMan has been extended to work well with Stork. In addition to specifying computational jobs, data placement jobs can be specified, and DAGMan will submit them to Stork for execution. This allows for straightforward execution of workflows that include data transfer.

A simple example of how Stork can be used with DAGMan appears in Figure 1.10. This DAG transfers data to a grid site, executes the job at that site, then transfers the output data back to the submission site. This DAG could easily be enhanced to allow space allocation before the data transfers, or could have multiple data transfers.

1.6.6 Interaction with Heterogeneous Resources

Stork acts like an I/O control system (IOCS) between the user applications and the underlying protocols and data storage servers. It provides complete modularity and extensibility. Users can add support for their favorite storage system, data transport protocol, or middleware very easily. This is a crucial feature in a system designed to work in a heterogeneous distributed environment. Users or applications cannot expect all storage systems to support the same interfaces to talk to each other. And we cannot expect all applications to talk to all the different storage systems, protocols, and middleware. There needs to be a negotiating system between them which can interact with those systems easily and even translate different protocols to each other. Stork has been developed to be capable of this. The modularity of Stork allows users to insert a plug-in to support any storage system, protocol, or middleware easily.

Stork supports several data transfer protocols, including:

- FTP [18]
- GridFTP [1]. Stork supports third-party data transfer with GridFTP, eliminating the need for local storage.
- HTTP [8]
- DiskRouter [14]

Stork supports several data storage systems, including:

- SRB [3]
- UniTree [7]
- NeST [4]
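The plug-in idea can be pictured as a lookup keyed on the protocols of the source and destination. The sketch below is purely illustrative and does not reflect how Stork is implemented: the registry dictionary, the function name `select_module`, and the example URLs are all assumptions.

```python
from urllib.parse import urlsplit


def select_module(src_url, dest_url, modules):
    """Pick a transfer plug-in from the (source, destination) scheme pair.

    `modules` maps scheme pairs to callables. The registry layout is a
    hypothetical choice made for this sketch.
    """
    key = (urlsplit(src_url).scheme, urlsplit(dest_url).scheme)
    try:
        return modules[key]
    except KeyError:
        raise LookupError(
            f"no transfer module for {key[0]} -> {key[1]}"
        ) from None


if __name__ == "__main__":
    registry = {
        ("ftp", "gsiftp"): lambda src, dest: f"would copy {src} to {dest}",
    }
    module = select_module(
        "ftp://source.example.com/a.dat",
        "gsiftp://dest.example.com/a.dat",
        registry,
    )
    print(module("ftp://source.example.com/a.dat", "gsiftp://dest.example.com/a.dat"))
```

Adding support for a new protocol pair then amounts to registering one more entry, without touching the code that schedules transfers.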
Stork maintains a library of pluggable "data placement" modules. These modules get executed by data placement job requests coming into Stork. They can perform inter-protocol translations either using a memory buffer or third-party transfers whenever available. Inter-protocol translations are not supported between all systems or protocols yet.

In order to transfer data between systems for which direct inter-protocol translation is not supported, two consecutive Stork jobs can be used instead. The first Stork job performs the transfer from the source storage system to the local disk cache of Stork, and the second Stork job performs the transfer from the local disk cache of Stork to the destination storage system.

1.6.7 Modular API

While the Stork server is a single process, the data transfers, allocations, etc., are performed by separate modules. The application program interface to the modules is simple enough for sites to write their own modules as needed. For example, each data transfer module is executed with the following argument list:

    src_url dest_url arguments ...

Thus, to write a new module that transfers data from the foo protocol to the bar protocol, a new module is created with the name stork.transfer.foo-bar. Modules are executable programs and may be written as shell scripts. Further, module binding is performed at runtime, enabling sites to create new modules without restarting the Stork server.

1.6.8 Performance Enhancements

Stork has seen several recent performance enhancements. The first is that multiple data placements may now be specified in a single submission file instead of multiple files. This optimization is significant when transferring many files to Stork because it eliminates extra invocations of the submission command, which can be surprisingly time consuming when transferring tens of thousands of files.

The second enhancement was integration of the GridFTP client into the Stork server. When doing many simultaneous GridFTP file transfers, this saves considerable time and reduces the total number of processes in the system.

Finally, Stork is now able to execute an arbitrary program when the active job queue size falls below a configurable level. This is envisioned as a simple but high-performance alternative to managing very large data placement workflows with DAGMan, because it will allow Stork users to limit the rate at which they submit jobs so that Stork is not overwhelmed, while ensuring that Stork has sufficient work to do at any given time.

1.6.9 Implementation Status

Stork is available, with all features described so far, as part of the Condor distribution. It is available on Linux, and will be available for other platforms in the future. Users outside of the Condor Project are just beginning to use Stork in production, and we hope to have more in the near future.

1.6.10 Active Research

Research on Stork is active, and much of it can be found in [15]. Research includes:

- Extending data placement types to include interaction with a metadata catalog.
- Experimentation with scheduling techniques other than first-in, first-out. This includes not only traditional scheduling techniques such as shortest-job-first or multilevel queue priority scheduling, but also scheduling based on management of storage space to ensure that storage space is not exceeded by the data transfers. In addition, scheduling of connection management is important when there are many simultaneous connections to a server.
- Runtime adaptation can be performed to tune the network parameters for a transfer to minimize transfer time. This is discussed in further detail in [13].
- Research has been done to enable Stork to detect problems such as servers that are unreliable in a variety of ways, then base scheduling decisions on this knowledge. More details are in [12].

1.7 Future Directions

There are several promising areas for future work with DAGMan and Stork. For DAGMan, we would like to explore methods (probably utilizing ClassAds [19]) to allow different conditions for deciding when a node should execute. Today, a node executes when all of its parents have finished with an exit code of 0, but allowing more complex conditions would allow conditional execution (equivalent to if-then-else) and partial execution (a DAG that finishes when a certain percentage of nodes have completed).

We also would like to support dynamic DAGs, which are DAGs that can change on-the-fly, based on user input. This has been frequently requested by users, and is particularly useful when DAGMan is used by a higher-level scheduling system that may change plans in reaction to current conditions.

For Stork, we are exploring ways to make it more scalable and more reliable. We are also investigating methods to use matchmaking, similar to that in Condor, to select which data transfers should be run and which sites they should transfer data to.

1.8 Conclusions

DAGMan is a reliable workflow management system. Although the workflows it supports are relatively simple, there are many complexities that were discovered as we used DAGMan through the years, such as the need for flexible methods for retrying and throttling. As a result of our experience, DAGMan has found favor with many users in production environments, and software has been created that relies on DAGMan for execution. Used with Condor, Condor-G, and Stork, DAGMan is a powerful tool for workflow execution in grid environments.

References

1. W. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, and S. Tuecke. Data management and transfer in high-performance computational grid environments. Parallel Computing, 28(5):749-771, May 2002.
2. A. Baker. The Secret Ways of Al Baker. The Miracle Factory, 2003.
3. C. Baru, R. Moore, A. Rajasekar, and M. Wan. The SDSC storage resource broker. In Proceedings of CASCON, 1998.
4. J. Bent, V. Venkataramani, N. LeRoy, A. Roy, J. Stanley, A. Arpaci-Dusseau, R. Arpaci-Dusseau, and M. Livny. NeST - a grid enabled storage appliance. In J. Weglarz, J. Nabrzyski, J. Schopf, and M. Stroinkski, editors, Grid Resource Management. Kluwer Academic Publishers, 2003.
5. Basic local alignment search tool (BLAST). http://www.ncbi.nlm.nih.gov/blast/, 2006.
6. Biological magnetic resonance data bank.
http://www.bmrb.wisc.edu/, 2006.
7. M. Butler, R. Pennington, and J. A. Terstriep. Mass storage at NCSA: SGI DMF and HP UniTree. In Proceedings of 40th Cray User Group Conference, 1998.
8. R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext transfer protocol - HTTP/1.1. Internet RFC 2616, Jun 1999.
9. I. Foster, J. Voeckler, M. Wilde, and Y. Zhao. Chimera: A virtual data system for representing, querying, and automating data derivation. In 14th International Conference on Scientific and Statistical Database Management (SSDBM'02), 2002.
10. J. Frey, T. Tannenbaum, I. Foster, M. Livny, and S. Tuecke. Condor-G: A computation management agent for multi-institutional grids. Cluster Computing, 5:237-246, 2002.
11. The Globus Alliance. See web site at
12. G. Kola, T. Kosar, and M. Livny. A client-centric grid knowledgebase. In Proceedings of 2004 IEEE International Conference on Cluster Computing, pages 431-438, San Diego, CA, September 2004. IEEE.
13. G. Kola, T. Kosar, and M. Livny. Run-time adaptation of grid data-placement jobs. Parallel and Distributed Computing Practices, 2004.
14. G. Kola and M. Livny. Diskrouter: A flexible infrastructure for high performance large scale data transfers. Technical Report CS-TR-2003-1484, University of Wisconsin-Madison Computer Sciences Department, 2003.
15. T. Kosar. Data Placement in Widely Distributed Systems. PhD thesis, University of Wisconsin-Madison, 2005.
16. T. Kosar and M. Livny. Stork: Making data placement a first class citizen in the grid. In Parallel and Distributed Computing Practices, 2004.
17. M. Litzkow, M. Livny, and M. Mutka. Condor - a hunter of idle workstations. Proceedings of the 8th International Conference on Distributed Computing Systems, pages 104-111, June 1988.
18. J. Postel and J. Reynolds. File transfer protocol (FTP). Internet RFC 959, Oct 1985.
19. R. Raman, M. Livny, and M. Solomon. Matchmaking: Distributed resource management for high throughput computing. In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing (HPDC7), Chicago, IL, July 1998.
20. D. Thain, T. Tannenbaum, and M. Livny. Distributed computing in practice: the Condor experience. Concurrency - Practice and Experience, 17(2-4):323-356, 2005.
21. W. Woods. What's in a link: Foundations for semantic networks. In D. Bobrow and A. Collins, editors, Representation and Understanding: Studies in Cognitive Science. New York: Academic Press, 1975.