The VLDB Journal
DOI 10.1007/s00778-014-0357-y

REGULAR PAPER

The Stratosphere platform for big data analytics

Alexander Alexandrov · Rico Bergmann · Stephan Ewen · Johann-Christoph Freytag · Fabian Hueske · Arvid Heise · Odej Kao · Marcus Leich · Ulf Leser · Volker Markl · Felix Naumann · Mathias Peters · Astrid Rheinländer · Matthias J. Sax · Sebastian Schelter · Mareike Höger · Kostas Tzoumas · Daniel Warneke

© Springer-Verlag Berlin Heidelberg 2014

Abstract We present Stratosphere, an open-source software stack for parallel data analysis. Stratosphere brings together a unique set of features that allow the expressive, easy, and efficient programming of analytical applications at very large scale. Stratosphere's features include "in situ" data processing, a declarative query language, treatment of user-defined functions as first-class citizens, automatic program parallelization and optimization, support for iterative programs, and a scalable and efficient execution engine. Stratosphere covers a variety of "Big Data" use cases, such as data warehousing, information extraction and integration, data cleansing, graph analysis, and statistical analysis applications. We present the architecture design decisions, introduce Stratosphere through example queries, and then dive into the internal workings of the system's components that relate to extensibility, programming model, optimization, and query execution. We experimentally compare Stratosphere against popular open-source alternatives, and we conclude with a research outlook for the next years.

Keywords Big data · Parallel databases · Query processing · Query optimization · Data cleansing · Text mining · Graph processing · Distributed systems

Stratosphere is funded by the German Research Foundation (DFG)

A. Alexandrov · S. Ewen · F. Hueske · M. Höger · O. Kao · M. Leich · V. Markl · S. Schelter · K. Tzoumas (corresponding author)
Technische Universität Berlin, Berlin, Germany
e-mail: kostas.tzoumas@tu-berlin.de
A. Alexandrov, e-mail: alexander.alexandrov@tu-berlin.de
S. Ewen, e-mail: stephan.ewen@tu-berlin.de
F. Hueske, e-mail: fabian.hueske@tu-berlin.de
M. Höger, e-mail: mareike.hoger@tu-berlin.de
O. Kao, e-mail: odej.kao@tu-berlin.de
M. Leich, e-mail: marcus.leich@tu-berlin.de
V. Markl, e-mail: volker.markl@tu-berlin.de
S. Schelter, e-mail: sebastian.schelter@tu-berlin.de

R. Bergmann · J.-C. Freytag · U. Leser · M. Peters · A. Rheinländer · M. J. Sax
Humboldt-Universität zu Berlin, Berlin, Germany
e-mail: bergmann@informatik.hu-berlin.de
J.-C. Freytag, e-mail: freytag@informatik.hu-berlin.de
U. Leser, e-mail: leser@informatik.hu-berlin.de
M. Peters, e-mail: mathias.peters@informatik.hu-berlin.de
A. Rheinländer, e-mail: rheinlae@informatik.hu-berlin.de
M. J. Sax, e-mail: mjsax@informatik.hu-berlin.de

A. Heise · F. Naumann
Hasso Plattner Institute, Potsdam, Germany
e-mail: arvid.heise@hpi.uni-potsdam.de
F. Naumann, e-mail: felix.naumann@hpi.uni-potsdam.de

D. Warneke
International Computer Science Institute, Berkeley, CA, USA
e-mail: warneke@icsi.berkeley.edu

1 Introduction

We are in the midst of a "Big Data" revolution. The plunging cost of hardware and software for storing data, accelerated by cloud computing, has enabled the collection and storage of huge amounts of data. The analysis and exploration of these data sets enable data-driven analysis that has the potential to augment or even replace ad-hoc business decisions. For example, web companies track user behavior to optimize their business. Large scientific experiments and simulations collect huge amounts of data, and scientists analyze these to form or validate hypotheses.

Commercial RDBMS products cannot cope with the scale and heterogeneity of the collected data sets, and their programming and data model is not a good fit to the new analysis workflows. These reasons have led to a reconsideration of methods for managing data at scale, leading to new software artifacts developed by academia and industry. The "Big Data" software ecosystem includes distributed file systems [], parallel data analysis platforms [], data programming languages [], and more specialized tools for specific data domains [].

We present Stratosphere, a data analytics stack that enables the extraction, analysis, and integration of heterogeneous data sets, ranging from strictly structured relational data to unstructured text data and semi-structured data. The Stratosphere system can perform information extraction and integration, traditional data warehousing analysis, model training, and graph analysis using a single query processor, compiler, and optimizer.

Stratosphere brings together a unique set of features that we believe are an essential mix for supporting diverse analytical applications on "Big Data."

First, we believe data analysts are more productive when using declarative, high-level languages rather than low-level languages. Stratosphere includes such a query language. In addition, the system can serve as a suitable compilation platform for several other languages for different domains. By offering an extensible intermediate layer, and by exposing several layers of the system stack as programming models with an underlying optimizer, query languages can be compiled to Stratosphere with less effort (e.g., often without implementing a separate optimizer for the specific language), and this compilation can lead to better performance.

Second, Stratosphere enables "in situ" data analysis by connecting to external data sources, such as distributed file systems that often act as the "landing points" of heterogeneous data sources from various organizations. That way, an expensive data loading process is not needed; Stratosphere does not store data, but only converts it to optimized binary formats after the initial scans.

Third, Stratosphere uses a richer set of primitives than MapReduce, including primitives that allow the easy specification, automatic optimization, and efficient execution of joins. This makes the system a more attractive compilation platform for data warehousing languages and applications.

Fourth, Stratosphere treats user-defined functions (UDFs) as first-class citizens throughout the system's stack, including the system's optimizer. This widens the applicability and extensibility of the system to problem domains beyond traditional data warehousing queries, such as information extraction from textual data and information integration.

Fifth, Stratosphere includes a query optimizer that automatically parallelizes and optimizes data analysis programs. The programmer does not need to worry about writing parallel code or hand-picking a join order.

Sixth, Stratosphere includes support for iterative programs, that is, programs that make repeated passes over a data set, updating a model until they converge to a solution. This enables the specification, optimization, and execution of graph analytics and statistical applications inside the data processing engine. Such applications are integrated with data pre- and postprocessing within a single analytical pipeline, cross-optimized, and executed by the same system.

Finally, Stratosphere uses an execution engine that includes external memory query processing algorithms and natively supports arbitrarily long programs shaped as directed acyclic graphs. Stratosphere offers both pipeline (inter-operator) and data (intra-operator) parallelism.

Stratosphere is a layered system that offers several programming abstractions to a user. We discuss them in top-down order, higher-level abstractions being more declarative and amenable to automatic optimization. The Meteor query language [] offers a declarative abstraction for processing semi-structured data. The PACT programming model [] is analogous to, and in fact a generalization of, the MapReduce model; PACTs offer a moderately low-level programming abstraction consisting of a fixed set of parallelization primitives and schema-less data interpreted by user-defined functions written in Java. This level of abstraction is especially useful for implementing complex operators that do not "fit" in a query language. Finally, the Nephele programming abstraction [] allows a power user to specify custom parallelization schemes.

Over the last years, our research in Stratosphere has advanced the state of the art in data management in several aspects. We proposed a data programming model based on second-order functions to abstract parallelization [], a method that uses static code analysis of user-defined functions to achieve goals similar to database query optimization in a UDF-heavy environment [], abstractions to integrate iterative processing in a dataflow system with good performance [], an extensible query language and underlying operator model [], techniques to infer cloud topologies and detect bottlenecks in distributed execution [], as well as techniques to exploit dynamic resource allocation [] and evaluate compression schemes []. We have, finally, evaluated Stratosphere on a variety of analytical use cases [].

The contribution of this paper lies less in the individual research findings and more in placing these findings into a larger perspective. We, for the first time, present the architecture of the Stratosphere system as a whole and the interplay between various components that have appeared in individual publications. We discuss in detail the query optimization process in Stratosphere. In addition, we conduct an extensive experimental study against the open-source state of the art. Finally, we discuss lessons learned from building the system and offer our research outlook for the next years. Stratosphere is an open-source project available under the Apache license.

The rest of this paper describes the architecture, interfaces, applications, and internal workings of the system at its current stage, and highlights several research innovations that advance the state of the art in massively parallel data processing. Section 2 presents an overview of the Stratosphere system. Section 3 discusses Meteor, the Stratosphere query language, from an end-user perspective, exemplified via two use cases. Section 4 presents the underlying extensible Sopremo operator model. Section 5 presents the PACT programming model, the model of parallelization used in Stratosphere. Section 6 presents the optimization phases and techniques employed in Stratosphere. Section 7 discusses how programs are actually executed in Stratosphere by the Nephele distributed dataflow engine and Stratosphere's runtime operators. Section 8 experimentally compares Stratosphere with other open-source systems. Section 9 discusses ongoing work by the Stratosphere group. Finally, Sect. 10 discusses related work, and Sect. 11 concludes and offers a research outlook.

2 System architecture

The Stratosphere software stack consists of three layers, termed the Sopremo, PACT, and Nephele layers. Each layer is defined by its own programming model (the API that is used to program the layer directly or used by upper layers to interact with it) and a set of components that have certain responsibilities in the query processing pipeline. This section presents the overall Stratosphere architecture, briefly sketches the purpose and responsibilities of each layer, and highlights their interactions. In addition, the section establishes the terminology that is used in the rest of this paper.

The main motivation behind separating the Stratosphere system into three layers with different programming models is to provide users with a choice regarding the declarativity of their programs and to have different compilation targets when the "users" are language compilers. While the programming model of the highest layer, Sopremo, exhibits the highest degree of declarativity and is amenable to similar optimizations as in relational databases, the subjacent PACT and Nephele layers gradually trade declarativity for expressiveness. Through a series of compilation steps, Stratosphere can translate the higher-layer programs into lower-layer programs, thereby exploiting the richer semantics of the higher-level programming models for automatic optimizations in each compilation step.

Figure 1 sketches Stratosphere's architecture and illustrates the functionality each of the three layers provides. In the remainder of this section, we introduce each layer of the Stratosphere stack in top-down order.

Sopremo is the topmost layer of the Stratosphere stack. A Sopremo program consists of a set of logical operators connected in a directed acyclic graph (DAG), akin to a logical query plan in relational DBMSs. Programs for the Sopremo layer can be written in Meteor, an operator-oriented query language that uses a JSON-like data model to support the analysis of unstructured and semi-structured data. Meteor shares similar objectives with higher-level languages of other big data stacks, such as Pig [] or Jaql [] in the Hadoop ecosystem, but is distinguished by its extensibility and the semantically rich operator model Sopremo, which also lends its name to the layer. Through Sopremo, domain specialists can easily integrate application-specific functions by extending Sopremo's set of operators, enabling automatic optimization at compile time for different domains.

Once a Meteor script has been submitted to Stratosphere, the Sopremo layer first translates the script into an operator plan. Moreover, the compiler within the Sopremo layer can derive several properties of the plan, which can later be exploited for the physical optimization of the program in the subjacent PACT layer. The Meteor language is presented by means of examples in Sect. 3. Details about the Sopremo layer and the optimization process are described in Sects. 4 and 6.

The output of the Sopremo layer and, at the same time, input to the PACT layer of the Stratosphere system is a PACT program. PACT programs are based on the PACT programming model, an extension to the MapReduce programming model. Similar to MapReduce, the PACT programming model builds upon the idea of second-order functions, called PACTs. Each PACT provides a certain set of guarantees on what subsets of the input data will be processed together, and the first-order function is invoked at runtime for each of these subsets. That way, the first-order functions can be written (or generated from a Sopremo operator plan) independently of the concrete degree of parallelism or strategies for data shipping and reorganization. Apart from the Map and Reduce contracts, the PACT programming model also features additional contracts to support the efficient implementation of binary operators. Moreover, PACTs can be assembled to form arbitrarily complex DAGs, not just fixed pipelines of jobs as in MapReduce.

Footnote: PACT is a portmanteau for "parallelization contract."

Fig. 1 The Stratosphere software stack. Functionality is distributed into three layers characterized by their distinct APIs (programming models). Stratosphere connects to popular open-source software for resource management and data storage

The first-order (user-defined) functions in PACT programs can be written in Java by the user, and their semantics are hidden from the system. This is more expressive than writing programs in the Sopremo programming model, as the language is not restricted to a specific set of operators. However, PACT programs still exhibit a certain level of declarativity as they do not define how the specific guarantees of the used second-order functions will be enforced at runtime. In particular, PACT programs do not contain information on data repartitioning, data shipping, or grouping. In fact, for several PACT input contracts, there exist different strategies to fulfill the provided guarantees with different implications on the required effort for data reorganization. Choosing the cheapest of those data reorganization strategies is the responsibility of a special cost-based optimizer, contained in the PACT layer. Similar to classic database optimizers, it computes alternative execution plans and eventually chooses the most preferable one. To this end, the optimizer can rely on various information sources, such as samples of the input data, code annotations (possibly generated by the Sopremo layer), information from the cluster's resource manager, or runtime statistics from previous job executions. Details about the optimizer's cost model as well as the actual optimization process are discussed in Sect. 6.

The introduction of a distinct layer that accepts arbitrarily complex DAGs of second-order functions that wrap arbitrary user code, and the ability to optimize such programs toward different physical execution strategies, is one central aspect that differentiates the Stratosphere stack from other systems (e.g., Asterix []). The PACT layer separates the parallelization aspects from semantic aspects and provides a convenient intermediate programming model that sits between operators with known semantics and arbitrary parallel tasks.
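The cost-based choice among data reorganization strategies can be illustrated with a deliberately small comparison. The sketch below is not Stratosphere's optimizer: the class name, the two candidate strategies, and the cost formulas (network volume only) are assumptions made for illustration. Repartitioning is assumed to ship each input once, while broadcasting is assumed to ship the smaller input to every other node.

```java
import java.util.EnumMap;
import java.util.Map;

/** Toy cost-based choice between two hypothetical shipping strategies for a two-input operator. */
public class ShippingStrategyChooser {

    /** Hypothetical candidate strategies; the real plan space is larger. */
    public enum Strategy { REPARTITION_BOTH, BROADCAST_SMALLER }

    /** Assumed cost model: bytes sent over the network by each strategy. */
    public static Map<Strategy, Long> estimateCosts(long leftBytes, long rightBytes, int nodes) {
        Map<Strategy, Long> costs = new EnumMap<>(Strategy.class);
        // Each input is hash-partitioned and shipped once.
        costs.put(Strategy.REPARTITION_BOTH, leftBytes + rightBytes);
        // The smaller input is replicated to the other (nodes - 1) nodes.
        costs.put(Strategy.BROADCAST_SMALLER, Math.min(leftBytes, rightBytes) * (nodes - 1));
        return costs;
    }

    /** Enumerates the alternatives and returns the one with the lowest estimated cost. */
    public static Strategy cheapest(long leftBytes, long rightBytes, int nodes) {
        return estimateCosts(leftBytes, rightBytes, nodes).entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        // A tiny second input favors broadcasting; two large inputs favor repartitioning.
        System.out.println(cheapest(1_000_000_000L, 1_000_000L, 10));
        System.out.println(cheapest(1_000_000_000L, 900_000_000L, 10));
    }
}
```

The point of the sketch is only that the user function stays the same in both cases; what changes is the physical strategy selected from estimated input sizes.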
The output of the PACT compiler is a parallel dataflow program for Nephele, Stratosphere's parallel execution engine and the third layer of the Stratosphere stack. Similar to PACT programs, Nephele dataflow programs, also called Job Graphs, are specified as DAGs with the vertices representing the individual tasks and the edges modeling the dataflows between those. However, in contrast to PACT programs, Nephele Job Graphs contain a concrete execution strategy, chosen specifically for the given data sources and cluster environment. In particular, this execution strategy includes a suggested degree of parallelism for each vertex of the Job Graph, concrete instructions on data partitioning, as well as hints on the co-location of vertices at runtime.

In comparison with Meteor and the PACT programming model, a Nephele Job Graph exhibits the highest level of expressiveness, but at the expense of programming simplicity. Stratosphere users who choose to write their data analytics programs directly as Nephele Job Graphs are no longer bound to a set of second-order functions but can freely implement the behavior of each vertex. When compiling the Job Graph from a PACT program, the PACT layer exploits this flexibility and injects additional code for data preparation together with the user's first-order function into a Nephele vertex. This PACT data preparation code is then in charge of reorganizing the incoming data (i.e., sorting, grouping, joining the data) such that it obeys the properties expected by the user's encapsulated first-order function.

Nephele itself executes the received Job Graph on a set of worker nodes. It is responsible for allocating the required hardware resources to run the job from a resource manager, scheduling the job's individual tasks among them, monitoring their execution, managing the dataflows between the tasks, and recovering tasks in the event of execution failures. Moreover, Nephele provides a set of memory and I/O services that can be accessed by the submitted user tasks. At the moment, these services are primarily used by the PACT data preparation code mentioned above.

During the execution of a job, Nephele can collect various statistics on the runtime characteristics of each of its tasks, ranging from CPU and memory consumption to information on data distribution. The collected data are centrally stored inside Nephele's master node and can be accessed, for example, by the PACT compiler, to refine the physical optimization of subsequent executions of the same task. Further details on Nephele, especially on its scheduling, communication, and fault-tolerance strategies, are described in Sect. 7.

In order to increase the practical impact of our system, we take special care to make Stratosphere integrate well with existing, popular technologies. In particular, Stratosphere provides support for the popular Hadoop distributed file system and the cloud storage service Amazon S3, as well as for Eucalyptus. We plan to support multi-tenancy by integrating Stratosphere with resource management systems, such as Apache YARN. Moreover, Stratosphere can directly allocate hardware resources from infrastructure-as-a-service clouds, such as Amazon EC2.

3 Stratosphere by Meteor examples

Stratosphere has been designed to cover a wide variety of use cases, including the analysis of structured data (e.g., spreadsheets, relational data), semi-structured data (e.g., HTML websites), and unstructured, textual data. In this section, we present Meteor, one of Stratosphere's topmost programming interfaces, by walking through two example programs: a TPC-H data warehousing query and an application that includes operators of two domain-specific packages for information extraction and data cleansing.

Meteor organizes domain-specific operators in packages and treats the former as first-class citizens, allowing users to freely combine existing operators and extend the language and runtime functionality with new operators. A main advantage of this approach is that the operator's semantics can be accessed at compile time and can potentially be used for optimization. To process a variety of different data types, Meteor builds upon a semi-structured data model that extends JSON []. The language syntax is inspired by Jaql []; however, we simplified many language features in order to provide mechanisms for a seamless integration of new operators and to support n-ary input and output operators.

3.1 Structured data analysis

We introduce Meteor's language features using a modified version of TPC-H Query 15. The Meteor script that implements the query is shown in Listing 1. The script starts with reading the lineitem table from a file (line 1). It subsequently selects a three-month time interval (lines 3–5) and computes the total revenue for each supplier by grouping on suppkey (lines 7–12). Finally, the script joins the grouped records with the supplier table on the attribute suppkey (lines 14–16), assembles the output format (lines 17–23), and writes the result to a file (line 25).

Meteor statements can assign the result of an operator invocation to a variable, which either refers to a materialized data set or to a logical intermediate data set, i.e., the result of an operator. Variables start with a dollar sign ($) to easily distinguish data sets from operator definitions. For example, the variable $li in line 1 refers to a logical data set, the result of the read operator.

Each operator invocation starts with the unique name of the operator (underscored in all listings) and is typically followed by a list of inputs and a set of operator properties (displayed in italics), which are configured with a list of name/expression pairs. Consider the filter operator in lines 3–4: the operator has input $li and is configured with a property that specifies the selection condition. Property expressions are often only literals but may be as complex as the property expression of the join operator (e.g., lines 15–23), which specifies the schema of the resulting data set. Listing 2 summarizes the general Meteor syntax in extended Backus-Naur Form.

Listing 1 TPC-H Query 15 variant as Meteor script (only fragments of the 25-line listing survive)

    ... = read ... 'lineitem.json'
    ... = filter ... '1996-01-01' ... '1996-04-01'
    ... = group ... suppkey,
        total_revenue: sum($li[*].extPrice * (1 - ... discount))
    };
    $s = read ... 'supplier.json'
    ... = join ... == ...
    ... 'result.json'

Listing 2 Excerpt of Meteor's EBNF grammar

    script     ::= (statement ';')*
    statement  ::= variable '=' operator
    operator   ::= name+ inputs? properties? ';'
    inputs     ::= (variable 'in')? variable (',' inputs)?
    variable   ::= '$' name
    properties ::= property properties?
    property   ::= name+ expression
    expression ::= literal | array | object

The relational package of Meteor offers a wide variety of data transformation and matching operators on the JSON data model, such as filter, transform (which allows for arbitrary field modifications), pivot (nesting and unnesting), split (array de-normalization), group, join, union, set intersection, and difference. We refer the reader to Table 1 and reference [] for details.

3.2 Queries with domain-specific operators

Operators from different Meteor packages can be jointly used to build complex analytical queries. Suppose a research institute wants to find out, for a PR campaign, which of its past and present affiliated researchers have appeared in recent news articles. Given a set of employee records from the past five years and a corpus of news articles, we would like to find news articles that mention at least one former or present employee. Listing 3 displays the corresponding Meteor script.

Listing 3 Meteor query combining information extraction, data cleansing, and relational operators (only fragments of the listing survive)

    ... = read ... 'news.json'
    ... = annotate ... use algorithm 'morphAdorner'
    ... = annotate entities ... use algorithm 'regex' ... 'person' ...
    ... into {
        name: $person,
        articles: $articles
    };
    $persons = read ... 'person.json'
    ... remove duplicates ... 0.95 ... retain longest
    ... = join ... .*, ...
    ... 'result.json'

After importing the necessary Sopremo packages (lines 1–2), the news articles corpus is read from a file, and information extraction (IE) operators are applied (lines 4–14) to annotate sentence boundaries and the names of people mentioned in the articles. Subsequently, the data set is restructured with the pivot operator to group news articles by person names (lines 9–14). In lines 16–20, the employee records are read and duplicate records are removed using the remove duplicates operator, configured with a similarity measure, a threshold, and a conflict resolution function. The data sets are then joined on the person name (lines 21–27). The into clause specifies the output format, which contains all employee attributes and the URLs of news articles mentioning a certain person.
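The filter, group, and join steps of the TPC-H Query 15 variant in Sect. 3.1 can be sketched in plain Java to make the intended result concrete. This is an illustration of the query's semantics only, not of how Meteor or Stratosphere evaluates it. The class and field names (`shipDate`, `name`) and the half-open date interval are assumptions; `suppkey`, `extPrice`, and `discount` are the names that survive in Listing 1.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Plain-Java sketch of the filter -> group -> join pipeline of Listing 1. */
public class Query15Sketch {

    public static final class LineItem {
        final int suppkey; final LocalDate shipDate; final double extPrice; final double discount;
        public LineItem(int suppkey, LocalDate shipDate, double extPrice, double discount) {
            this.suppkey = suppkey; this.shipDate = shipDate;
            this.extPrice = extPrice; this.discount = discount;
        }
    }

    public static final class Supplier {
        final int suppkey; final String name;
        public Supplier(int suppkey, String name) { this.suppkey = suppkey; this.name = name; }
    }

    /** Returns supplier name -> total revenue for items shipped in [from, to). */
    public static Map<String, Double> totalRevenue(List<LineItem> items, List<Supplier> suppliers,
                                                   LocalDate from, LocalDate to) {
        // filter on the date interval, then group on suppkey and sum the discounted price
        Map<Integer, Double> revenueBySupplier = items.stream()
                .filter(li -> !li.shipDate.isBefore(from) && li.shipDate.isBefore(to))
                .collect(Collectors.groupingBy(li -> li.suppkey, TreeMap::new,
                        Collectors.summingDouble(li -> li.extPrice * (1 - li.discount))));

        // inner join of suppliers with the grouped records on suppkey
        Map<String, Double> result = new LinkedHashMap<>();
        for (Supplier s : suppliers) {
            Double total = revenueBySupplier.get(s.suppkey);
            if (total != null) {
                result.put(s.name, total);
            }
        }
        return result;
    }
}
```

Suppliers without line items in the interval do not appear in the result, which matches the inner-join reading of the script.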
Table 1 Overview of available Sopremo operators

Operator | Meteor keyword | Description
Filter | filter | Filters the input by only retaining those elements where the given predicate evaluates to true
Transform | transform | Transforms each element of the input according to a given expression
 | | Flattens or nests incoming records according to a given output schema
Join | join | Joins two or more input sets into one result set according to a join condition. A self-join can be realized by specifying the same data source as both inputs. Provides algorithms for anti-, equi-, natural-, left-outer-, right-outer-, full-outer-, semi-, and theta-joins
Group | group | Groups the elements of one or more inputs on a grouping key into one output, such that the result contains one item per group. Aggregate functions, such as count() or sum(), can be applied
Set/bag union | union, union all | Calculates the union of two or more input streams under set or bag semantics
Set difference | | Calculates the set-based difference of two or more input streams
Set intersection | | Calculates the set-based intersection of two or more input streams
Pivot | | Restructures the data around a pivot element, such that each unique pivot value results in exactly one record retaining all the information of the original records
Replace (All) | replace, replace all | Replaces atomic values with a defined replacement expression
 | | Sorts the input stream globally
Split | | Splits an array, an object, or a value into multiple tuples and provides a means to emit more than one output record
 | | Turns a bag of values into a set of values
Sentence annotation/splitting | annotate sentences, split sentences | Annotates sentence boundaries in the given input text and optionally splits the text into separate records holding one sentence each
Token annotation | annotate tokens | Annotates token boundaries in the given input sentence-wise. Requires sentence boundary annotation
Part-of-speech annotation | annotate pos | Annotates part-of-speech tags in the given input sentence-wise. Requires sentence and token boundary annotations
Parse tree annotation | annotate structure | Annotates the syntactic structure of the input sentence-wise. Requires sentence boundary annotations
Stopword annotation/removal | annotate stopwords, remove stopwords | Annotates stopwords occurring in the given input and optionally replaces stopword occurrences with a user-defined string
Ngram annotation/splitting | annotate ngrams, split ngrams | Annotates token or character ngrams with user-defined length n in the given input. Optionally, the input can be split into ngrams
Entity annotation/extraction | annotate entities, extract entities | Annotates entities in the given input and optionally extracts recognized entity occurrences. Supports general-purpose entities (e.g., persons, organizations, places, dates), biomedical entities (e.g., genes, drugs, species, diseases), and user-defined regular expressions and dictionaries. Requires sentence and token boundary annotations
Relation annotation/extraction | annotate relations, extract relations | Annotates relations sentence-wise in the given input and optionally extracts recognized relationships using co-occurrence- or pattern-based algorithms. Requires sentence, part-of-speech, and entity annotations
Merge records | | Merges existing annotations of records which share the same ID
Data scrubbing | | Enforces declaratively specified rules for (nested) attributes and filters invalid records
Entity mapping | map entities | Uses a set of schema mappings to restructure multiple data sources into multiple sinks. Usually used to adjust the data model of a new data source to a global data schema
Duplicate detection | detect duplicates | Efficiently finds fuzzy duplicates within a data set
Record linkage | link records | Efficiently finds fuzzy duplicates across multiple (clean) data sources
Data fusion | | Fuses duplicate representations to one consistent entry with declaratively specified rules
Duplicate removal | remove duplicates | Performs duplicate detection, subsequent fusion, and retains nonduplicates

Operator packages, top to bottom: relational operators, Information Extraction, Data Cleansing
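The "Duplicate removal" entry in Table 1 combines detection, fusion, and retention of nonduplicates. A minimal sketch of that combination follows, assuming a naive pairwise comparison and a "retain longest" fusion rule like the one named in Listing 3. The class, the method names, and the prefix-based similarity are hypothetical and are not part of the Sopremo library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

/** Naive duplicate removal: detect similar records, fuse each group, keep the rest unchanged. */
public class RemoveDuplicatesSketch {

    public static List<String> removeDuplicates(List<String> records,
                                                BiFunction<String, String, Double> similarity,
                                                double threshold) {
        List<String> result = new ArrayList<>();
        boolean[] consumed = new boolean[records.size()];
        for (int i = 0; i < records.size(); i++) {
            if (consumed[i]) {
                continue; // already fused into an earlier record
            }
            String fused = records.get(i);
            for (int j = i + 1; j < records.size(); j++) {
                // detection: pairwise comparison against record i
                if (!consumed[j] && similarity.apply(records.get(i), records.get(j)) > threshold) {
                    consumed[j] = true;
                    // fusion: the "retain longest" conflict resolution
                    if (records.get(j).length() > fused.length()) {
                        fused = records.get(j);
                    }
                }
            }
            result.add(fused); // nonduplicates pass through unchanged
        }
        return result;
    }

    /** Hypothetical similarity: shared prefix length divided by the longer string's length. */
    public static double prefixSimilarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        if (max == 0) {
            return 1.0;
        }
        int i = 0;
        while (i < a.length() && i < b.length() && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return (double) i / max;
    }
}
```

The quadratic comparison is the simplest possible detection step; a production operator would normally replace it with a blocking or sorted-neighborhood scheme.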
A.Alexandrovetal.4ExtensibilityinStratosphere’soperatormodelThepreviousexamplesinSect.haveshownhowMeteorcanbeusedtoperformstructureddataanalysis,extractinfor-mationfromtext,andcleansedata.Thisßexibilitystemsfromtheunderlying,semanticallyrichSopremooperatormodel.Alloperators(includingrelationalones)areorga-nizedinpackagesanddynamicallyloadedduringthepars-ingprocessofaMeteorscript.MeteorcanbeseenasatextualinterfaceforSopremo,andMeteorscriptsaretrans-latedone-to-oneintoSopremoplans.BesidesthetextualMeteorinterface,queryplansofSopremooperatorscouldalsobecomposedwithgraphicalinterfacesorotherqueryadepictstheSopremoplan(which,ingeneralcanbeadirectedacyclicgraph)generatedfromtheMeteorscriptofListing3bytheMeteorparser.EachoperatorinvocationintheMeteorscriptcorrespondstoaSopremooperatorintheplan.Meteorvariablesaretranslatedtoedgesintheplan,sig-nifyingthedataßowbetweenSopremooperators.SopremooperatorsareconÞguredwiththecorrespondingpropertiesintheMeteorscript(weomitthosefromtheÞgure).Intheremainderofthissection,webrießydiscussthenotionsofextensibilityandoperatorcompositioninSopremo.Amorein-depthdiscussionoftheconceptscanbefoundinrefer-ence[Toseamlesslyintegratedomain-speciÞcSopremopack-ages,thesemustsatisfysomeconstraints.Eachpackageanditsoperatorsmustbeself-containedinthreeways.First,oper-atorsareself-containedinthattheSopremoprogrammerpro-videsaparallelimplementationofnewoperatorsinadditiontotheirsemantics.AnoperatorcanbedeÞnedeitherasacom-positionofotheroperatorsorasanelementaryoperatorwithacorrespondingPACTprogram(thedirectlylowerprogram-minglayerofStratosphere)implementation.AsSopremodoesnotallowrecursivecompositions,alloperatorscanbereducedtoa(possiblylarge)setofinterconnectedelementaryoperators,whicharebackedbyPACTprograms.Second,operatorsexposetheirpropertiesthroughareßec-tiveAPI.Theproperties,suchastheconditionofajoin,aretransparentlymanagedandvalidatedbytheoperatoritself.Operatorsmayusetheirpropertiestochooseanappro-priateimplementation.Thus,noadditionalknowledgeout
-sideofthepackagesisrequiredtoproperlyconÞguretheThird,thepackagedevelopermayoptionallyproviderel-evantmetadatatoaidtheoptimizerinplantransformationandcostestimation.Fig.2SopremoplanandPACTprogramcorrespondingtotheMeteorqueryinListing3.SopremoplanPACTprogram (a)(b) Stratosphereplatformforbigdataanalytics Fig.3TheÒRemoveduplicatesÓoperatordeÞnedasacompositeoper-Tofacilitateextensibility,weintroducedtheconceptofoperatorcompositioninSopremo.Followingthegoodprac-ticesofmodularizationandinformationhidinginsoftwareengineering,developerscandeÞnecomplexoperatorsusingsimplerones.ThisenablescodereuseandallowscomplexoperatorstoimmediatelybeneÞtfrommoreefÞcientre-implementationsofsimpleroperators.Compositioncanalsoimproveoptimization.Transformationrulesthatcannotbeappliedtoacompositeoperatormightbevalidforitsbuild-ingblockoperators.showstheimplementationoftheduplicateremovaloperatorasacompositeoperator.Here,thedupli-catedetectionisperformedbyanotherSopremooperator.Toremovethefoundduplicaterecords,weneedtofusethedupli-cates(righthandside)andmergetheresultwithallnondu-plicaterecords(lefthandside).Theexampledemonstratesnestedcompositions.Althoughduplicatedetectionmaybena•velyimplementedasathetajoin,mosttimesitisacom-plexcompositionthatimplementsamulti-passsortedneigh-borhoodorotheradvancedalgorithms.ToillustratethecurrentanalyticalcapabilitiesoftheSopremolibrariesthatareshippedwiththesystem,Tablelistsselectedoperatorsandtheirfunctionality.SopremoplansarecompiledtoPACTprogramsbyapro-gramassembler.FigurebshowsatranslatedPACTprogramfortheSopremoplanofFig.a.ThePACTprogrammingmodelisdiscussedindetailinthenextsection.TheSopremotoPACTassemblertranslateseachSopremooperatorintooneormorePACToperators.BeforethePACTprogramisassembled,compositeSopremooperatorsarerecursivelydecomposedintotheirindividualoperatorsuntilonlyele-mentaryoperatorsremain.Theseelementaryoperatorscanbedirectlytranslatedintothesecond-orderfunctionsthatPACTprovides,suchasMapandReduce.Furthermore,theassemblerinfersfromallat
tributeÞeldsthatarereferencedinaMeteorscriptacompactdatarepresentationschemetoquicklyaccesstheseimportantÞeldsinthetree-structuredSopremovalues.ThePACTprogramisassembledbyinstan-tiatingthePACTimplementationsofallSopremooperatorsandconnectingtheirinputsandoutputs.ThepropertiesoftheSopremooperatorsareembeddedintothePACTprogrambyaddingthisinformationtotheconÞgurationoftherespectivePACToperators.5ModelforparallelprogrammingStratosphereprovidesanexplicitprogrammingmodel,calledthePACTprogrammingmodel,thatabstractsparalleliza-tion,hidingthecomplexityofwritingparallelcode.ThissectiondiscussesthedatamodelofthePACTmodel(Sect.),theindividualoperatorsandthecompositionofone-pass(acyclic)PACTprogramsfromoperators(Sect.andÞnallythecompositionofiterative(cyclic)PACTpro-grams(Sect.5.1DatamodelPACToperatorsoperateonaßatrecorddatamodel.Adataset,anintermediateresultproducedbyonePACTopera-torandconsumedbyanother,isanunorderedcollectionofrecords.Arecordisanorderedtupleofvalues,eachhavingawell-deÞneddatatype.Thesemanticsandinterpretationofthevaluesinarecord,includingtheirtypes,areopaquetotheparallelruntimeoperators;theyaremanipulatedsolelybytheUDFsthatprocessthem.Certainfunctionsrequiretoformgroupsofrecordsbyattributeequalityorbyothertypesofassociations.Forsuchoperations,asubsetoftherecordÕsÞeldsisdeÞnedasakeyThedeÞnitionofthekeymustincludethetypesofthevaluesintheseÞeldstoallowtheruntimeoperatorstoaccesstherel-evantÞelds(forsortingandpartitioning)fromtheotherwiseschema-freerecords.NestedSopremoJSONobjectsareconvertedtorecordsduringthecompilationofSopremoplanstoPACTprograms.JSONnodesthatactaskeysaretranslatedtoindividualrecordÞelds.5.2PACToperatorsandacyclicPACTprogramsAPACTisasecond-orderfunctionthattakesasargumentadatasetandaÞrst-orderuser-deÞnedfunction(UDF).APACToperatororsimplyoperatorconsistsofaPACTsecond-orderfunctionandaconcreteinstantiationoftheUDF.PACTsspecifyhowtheinputdataarepartitionedintoindependentsubsetscalledparallelizationunits(PUs).The A.Alexandrovetal. 
actual semantics of data manipulation is encapsulated in the user-defined functions (UDFs). The PACT programming model is declarative enough to abstract away parallelism, but does not directly model semantic information as the Sopremo layer does; this is encapsulated within the UDF logic and largely hidden from the system. While this may seem limiting, it enables the specification of a wider variety of data analysis programs (e.g., reduce functions that are not simple aggregates []).

Fig. 4 A PACT operator using a Reduce PACT

Figure 4 shows the structure of a PACT operator that uses the Reduce function as its PACT. The input data set is logically grouped using the key attribute ("color" in the figure, which corresponds to the first attribute). Each group of records with a certain key value forms a parallelization unit. The UDF is applied to each PU independently. By specifying the operator using the Reduce PACT, the programmer makes a "pact" with the system; all records that belong to the same PU will be processed together by a single invocation of the UDF. The logical division of a data set into PUs can be satisfied by several physical data-partitioning schemes. For example, in Fig. 4, PUs can be physically partitioned into two nodes as indicated by the thick horizontal dotted line: one PU resides in node 1, and the remaining PUs reside together in node 2.

The logical output of the PACT operator is the concatenation of the outputs of all UDF invocations. In the example, the UDF collapses records into a single record per group (e.g., computing an aggregate) and returns the key value together with the computed aggregate.

Currently, five second-order functions (shown in Fig. 5) are implemented in the system. In addition, we have developed two higher-order functions used for iterative processing (we discuss those later). The Map creates a PU from every record in the input. The Reduce function forms a PU with all records that have the same value for a user-defined key attribute. The Match, Cross, and CoGroup PACTs operate on two input data sets. The parallelization units of the Match function are all pairs of records that have the same key attribute value.
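The way Map, Reduce, and Match form parallelization units can be illustrated with a small sequential sketch. The Java below is not Stratosphere's API: the class PactSketch and its methods map, reduce, match, and sumByKey are hypothetical names chosen for illustration, records are modeled as int arrays whose field 0 is assumed to be the key, and everything runs in a single thread. It only shows how each second-order function builds its PUs, calls the UDF once per PU, and concatenates the UDF outputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Sequential sketch of three PACTs (hypothetical names, not the Stratosphere API). */
final class PactSketch {

    /** Map: every input record forms its own parallelization unit (PU). */
    static <I, O> List<O> map(List<I> input, Function<I, List<O>> udf) {
        List<O> out = new ArrayList<>();
        for (I record : input) {
            out.addAll(udf.apply(record)); // one UDF invocation per record
        }
        return out;
    }

    /** Reduce: all records that share a key value form one PU. */
    static <K, I, O> List<O> reduce(List<I> input, Function<I, K> keyOf,
                                    BiFunction<K, List<I>, List<O>> udf) {
        // Logical grouping; a parallel runtime could place each group on any node.
        Map<K, List<I>> units = new LinkedHashMap<>();
        for (I record : input) {
            units.computeIfAbsent(keyOf.apply(record), k -> new ArrayList<>()).add(record);
        }
        // The operator output is the concatenation of all UDF outputs.
        List<O> out = new ArrayList<>();
        for (Map.Entry<K, List<I>> unit : units.entrySet()) {
            out.addAll(udf.apply(unit.getKey(), unit.getValue()));
        }
        return out;
    }

    /** Match: every pair of records (one per input) with equal keys forms a PU. */
    static <K, L, R, O> List<O> match(List<L> left, List<R> right,
                                      Function<L, K> leftKey, Function<R, K> rightKey,
                                      BiFunction<L, R, O> udf) {
        List<O> out = new ArrayList<>();
        for (L l : left) {
            for (R r : right) {
                if (leftKey.apply(l).equals(rightKey.apply(r))) {
                    out.add(udf.apply(l, r)); // one UDF invocation per matching pair
                }
            }
        }
        return out;
    }

    /** Example Reduce UDF: sum field 1 for each key in field 0, one output record per group. */
    static List<int[]> sumByKey(List<int[]> records) {
        return PactSketch.<Integer, int[], int[]>reduce(
            records,
            r -> r[0],
            (key, group) -> {
                int sum = 0;
                for (int[] r : group) {
                    sum += r[1];
                }
                List<int[]> result = new ArrayList<>();
                result.add(new int[] {key, sum});
                return result;
            });
    }

    public static void main(String[] args) {
        List<int[]> input = Arrays.asList(new int[] {1, 10}, new int[] {2, 5}, new int[] {1, 7});
        for (int[] r : sumByKey(input)) {
            System.out.println(r[0] + " -> " + r[1]);
        }
    }
}
```

Because each PU is handled by an independent UDF invocation, a parallel runtime is free to distribute the groups built in reduce across nodes in any way that keeps every group together.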
Match therefore performs an inner equi-join and applies the UDF to each resulting record pair. The Cross function dictates that every record of the first input together with every record of the second input forms a PU, performing a Cartesian product. CoGroup generalizes Reduce to two dimensions; each PU contains the records of both input data sets with a given key. The source of records (left or right input) is available to the UDF programmer. Compared with Match (a record-at-a-time join), CoGroup joins whole groups of records at a time. As such, CoGroup subsumes Match with respect to expressiveness, but it has stricter conditions on how the PUs are formed and hence fewer degrees of freedom for parallelization. For a formal definition of the five PACTs, we refer the reader to reference [].

We follow the definitions from the original MapReduce paper [] but exclude execution-specific assumptions (such as the presence of sorted reduce inputs).

Listing 4 shows a possible PACT implementation code of the "Duplicate Removal" operator from Sect. (see also Table ). The Java class inherited from (CoGroupStub) indicates the type of PACT (CoGroup). User code is encapsulated in the cogroup method. The inputs (persons and possible duplicates) are grouped on the first field, SID. This is specified in the code that instantiates the operator (the last two statements of the listing). The UDF is called for each person together with its (zero or more) duplicates. If duplicates are found, they are merged to form a cleaned version of the person record.

    public class DupElim extends CoGroupStub {
      public void cogroup(Iterator<PactRecord> persons,
                          Iterator<PactRecord> duplicates,
                          Collector<PactRecord> out) {
        if (!duplicates.hasNext()) {
          // No duplicates
          out.collect(persons.next());
        } else {
          PactRecord cleanPerson = merge(persons.next(), duplicates);
          out.collect(cleanPerson);
        }
      }
    }

    int SID = 0;
    CoGroupContract dupElimContract =
        CoGroupContract.build(DupElim.class, PactLong.class, SID, SID);

Listing 4 An example of UDF code for a CoGroup operator

We defer the discussion of programs that make repeated passes over the input until the next section. For now, a PACT program is a directed acyclic graph with PACT operators, data sources, and data sinks as nodes. Operators with multiple successors forward the same data to each successor and thus behave similarly to, for example, common subexpressions in SQL. Figure 2b shows the composite Meteor/Sopremo example of Sect. transformed to a PACT program. The program has two data sources, Persons and News. For example, duplicate removal is implemented as CoGroup over the person's input and the output of the preceding Cross. The UDF is invoked on a list containing exactly one person and a list of possible duplicates. If the duplicate list is not empty, it is merged into one "clean" person record with the person list. The program has one data sink, Results, which writes every record delivered from the preceding Match operator into the underlying file system. PACT programs can be generated by the Sopremo compiler or specified by hand.

Fig. 5 (a-e) The five second-order functions (PACTs) currently implemented in Stratosphere. The parallelization units implied by the PACTs are enclosed in dotted boxes

5.3 Iterative PACT programs

Many data analysis tasks cannot be implemented as algorithms that make a single pass over the data. Rather, they are of iterative nature, repeating a certain computation to refine their solution until a convergence criterion is reached. The PACT programming model supports the expression of such programs through higher-order fixpoint operators [].

To achieve good performance without exposing explicit mutable state, PACT offers two different declarative fixpoint operators: one for Bulk Iterations and one for Incremental Iterations. Both are defined by means of a step function that is evaluated repeatedly over a data set called the partial or intermediate solution (see Fig. 6). The step function is an acyclic PACT program. One parallel application of the step function to all partitions of the partial solution is called a superstep [].

Bulk Iterations execute the PACT program that serves as the step function in each superstep, consuming the entire partial solution (the result of the previous superstep or the initial data set) and recomputing the next version of the partial solution, which will be consumed at the next iteration. The iteration stops when a user-defined termination criterion is met.

In Incremental Iterations, the user is asked to split the representation of the partial solution into two data sets: a solution set and a working set (both shown in Fig. 6b). At each superstep, an incremental iteration consumes only the working set and selectively modifies elements of the solution set, hence incrementally evolving the partial solution rather than fully recomputing it. Specifically, using both data sets, the step function computes the next workset and a delta set (also shown in Fig. 6b), which contains the items to be updated in the solution set. The new working set holds the data that drive the next superstep, while the solution set holds the actual state of the partial solution. Elements of the solution set (termed "cold") that are not contained in the delta set need not be updated. To facilitate the efficient merge between the current solution set and the delta set, each element in the solution set must be uniquely addressable by a key.

When applicable, incremental iterations typically lead to more efficient algorithms, because not every element in the intermediate solution needs to be examined in each superstep. The sparse computational dependencies present in many problems and data sets allow a superstep to focus on the "hot" parts of the intermediate solution and leave the "cold" parts untouched. Frequently, the majority of the intermediate solution cools down comparatively fast, and the later supersteps operate only on a small subset of the data. Note that the intermediate solution is implicitly forwarded to the next superstep, not requiring the algorithm to recreate it.

Programmers implement iterative algorithms by defining the step function as a regular acyclic PACT program that uses the partial solution as a data source and the next partial solution as a sink. This step function is then embedded into a fixpoint operator that takes an initial partial solution and invokes the step function repeatedly on the next version of the intermediate solution until a certain termination condition is reached. While bulk iterations require an explicit termination condition (either a convergence criterion or a fixed number of supersteps), incremental iterations terminate when they produce an empty working set.

Figure 6 shows a step function for the bulk and an incremental version of a graph algorithm. This generic example algorithm associates an ID with each vertex and propagates the ID from each vertex to its neighbors, where each neighbor adopts the ID if it is smaller than its own current ID. This algorithm eventually distributes IDs according to connected components, but is in a slightly modified version applicable to many other graph problems, such as shortest paths and maximum flow. In the bulk version, the step function joins the vertex state with the edges to create candidates for each vertex's neighbors (Match) and then selects the minimum ID from the candidates for each vertex (Reduce). The incremental version holds the candidate IDs in the workset and the vertex state as the solution set. In addition to the aforementioned operations, it joins the minimal candidate with the solution set and checks whether the selected ID is actually new. Only in that case, it returns a new value for the vertex, which goes into the delta set and into the Match that creates the workset for the next superstep. By selectively returning or not returning values from the join between the workset and solution set, the algorithm realizes the dynamic computation that excludes unchanged parts of the model from participating in the next superstep.

Fig. 6 An algorithm that finds the connected components of a graph as a bulk iteration and an incremental Stratosphere iteration. (a) Bulk iteration, (b) Incremental iteration

We refer the reader to reference [] for a complete treatment of iterations in Stratosphere. At the time of this writing, the iterations feature is in an experimental stage and has not been integrated with Sopremo and Meteor. Iterative programs are compiled to regular DAG-shaped Nephele Job Graphs that send upstream messages to coordinate superstep execution.

6 Optimization in Stratosphere

This section discusses Stratosphere's optimizer. The optimizer compiles PACT programs into Nephele Job Graphs. Sopremo plans are translated into PACT programs prior to optimization as discussed in Sect. The overall architecture of the optimizer is presented in Sect. 6.1. Sections 6.2 and 6.3 discuss the reordering of PACT operators and the generation of physical plans.

6.1 Optimizer overview

The Stratosphere optimizer builds on technology from parallel database systems, such as logical plan equivalence, cost models, and interesting property reasoning. However, there are also aspects that clearly distinguish it from prior work.

Many of the distinguishing features of Stratosphere's optimizer compared with conventional query optimizers originate from differences in program specification. Most relational database systems provide a declarative SQL interface. Queries specified in SQL are translated into expression trees of relational algebra. These expression trees are rewritten using transformation rules, which are based on commutativity and associativity properties of relational operators, and finally compiled into physical execution plans.

In contrast to a relational query, PACT programs are directed acyclic graphs (DAGs) of PACT operators. Since DAGs are more general than trees, traditional plan enumeration techniques need to be adapted. Operators differ as well; while relational operators have fully specified semantics, PACT operators are parallelizable second-order functions that encapsulate user-defined first-order functions. Due to the presence of arbitrary user code, the semantics of a PACT operator are not, in general, known by the optimizer. Therefore, plan rewriting rules as known from relational optimizers do not apply in the context of PACT programs. In addition, the lack of semantics hinders the computation of reliable size estimates for intermediate results, which are important for cost-based optimization. Finally, relational optimizers can leverage their knowledge of data schema. In contrast, PACT's data model is based on records of arbitrary types in order to support a wide variety of use cases. Data is only interpreted by user code and hence opaque to the optimizer.

Fig. 7 The different program transformation phases of the Stratosphere optimizer

Figure 7 shows the architecture stages of the Stratosphere optimizer, and Fig. 8 shows the different representations of a program as it passes through the different optimization stages. The optimizer compiles PACT programs into Nephele Job Graphs. Data processing tasks specified as Sopremo plans are translated into PACT programs prior to optimization and compilation. This process was described in Sect. The optimizer itself consists of four phases. Similar to many relational optimizers, the optimization process is separated into a logical rewriting and a physical optimization phase. The separation between logical and physical optimization is a result of the bottom-up historical evolution of the Stratosphere system (the PACT layer and physical optimization predate the
logical optimization and the Sopremo layer); we are currently designing an optimizer that unifies the optimization of Sopremo and PACT operators and chooses the order of operators and their physical execution strategies in a single pass. See Sect. for details.

Fig. 8 Plan transformation through the different phases of the optimizer. A submitted PACT program is modified by logical optimization and produces an equivalent program after operator reordering. Read and write sets are shown as records with green and red fields. Then, a physical plan with annotated local execution and data shipping strategies is obtained, and finally a Nephele Job Graph is emitted by the optimizer. Panels: Original PACT program, Modified PACT program, Physical plan, Nephele Job Graph (color figure online)

Prior to optimization, the optimizer transforms a PACT program into an internal representation. This representation is a DAG consisting of operator nodes that represent data sources, data sinks, PACT operators, and internal operations, such as "Combine" (if applicable for a Reduce) or "Temp" (materialization) operators. Internal operators do not change the semantics of a data flow; however, they can improve its execution, and they are sometimes required to prevent deadlocks (see Sect. for details).

In the next phase, the optimizer generates semantically equivalent plans by reordering operators. Rewriting rules as known from relational optimizers do not directly apply in our context as the optimizer is not aware of the semantics of the UDF operators. Section 6.2 presents the operator reordering techniques of the optimizer, which are based on the detection of attribute access conflicts, static code analysis, and switching of consecutive operators.

Physical optimization comes after operator reordering. The second-order function of an operator defines its logical parallelization. For a given second-order function, there can be multiple physical data shipping strategies (such as hash or range partitioning or broadcasting) that provide the parallelization requirements, as well as several local physical execution strategies (such as sort-based or hash-based techniques). Similar to database systems, interesting properties [] can be leveraged. The optimizer generates a physical execution plan bottom-up by choosing execution strategies and considering interesting properties. The optimization process is explained in detail in Sect. 6.3. Finally, the resulting execution plan is translated into a Nephele Job Graph and submitted for execution.

6.2 Operator reordering

In the first optimization phase, the Stratosphere optimizer reorders operators similarly to the logical optimization phase in relational optimizers []. However, as mentioned in Sect. 6.1, traditional transformation rules cannot be directly applied due to the unknown semantics of the UDFs inside of the PACT operators. Instead, we defined and proved two sufficient conditions to reorder two successive PACT operators without changing the program semantics. These conditions are based on the detection of conflicting attribute accesses and the preservation of group cardinalities. We use the notion of read and write field sets to hold the information of all fields that a UDF reads and writes. Thereby, a write access to a record may add or remove an attribute or modify the value of an existing attribute.

The first reordering condition compares the read and write sets of two successive PACT operators and checks for overlapping access patterns. For the condition to evaluate to true, only the read sets of the operators may intersect. Otherwise, the operators have conflicting read-write or write-write accesses. This reasoning is similar to conflict detection in optimistic concurrency control methods [] and compiler techniques to optimize loops. The second condition only applies to group-based PACT operators, such as Reduce and CoGroup. Since the semantics of a grouping operator might depend on the size of its input group, this condition ensures that input groups are preserved if the operators are reordered. We showed that our conditions are applicable for all combinations of the set of currently supported second-order functions (see reference []).

Figures 8a, b depict two semantically equivalent PACT programs for the TPC-H example query discussed in Sect. Assuming that the left plan was given as input to the optimizer, our reordering techniques allow switching the Reduce and Match operators, which yields the plan on the right-hand side. The Match and Reduce operators perform the join and grouped aggregation, respectively. This transformation is possible because both operators have nonconflicting read and write sets, which are indicated
in the figures by green and red-colored record fields above and below each operator. Match also fulfills the group-preservation condition. Since it is a primary-key foreign-key join on the grouping attribute, Match does not change the cardinality of the individual reduce groups. Therefore, this transformation is an invariant group transformation as known from relational query optimization [].

In order to evaluate the conditions, the optimizer requires read and write sets and bounds on the output cardinality of the operator UDFs. We employ static code analysis (SCA) techniques to automatically derive this information [41]. Our approach leverages our knowledge of and control over the API that the user uses to access record fields. Hence, we can safely identify all record accesses of a UDF by tracking the corresponding API calls, e.g., a call that reads field 1 from a record or one that writes a value to the record's field 2. The extraction algorithm uses control flow, Def-Use, and Use-Def data structures obtained from an SCA framework to trace the effects of record accesses through the UDF. Our approach guarantees correctness through conservatism. Field accesses can always be added to the corresponding read or write sets without loss of correctness (but with loss of optimization potential). Supersets of the actual read and write sets might only imply additional access conflicts; therefore, the optimizer might miss valid optimization choices but will never produce semantically incorrect transformations.

Based on the conditions to identify semantic-preserving operator reorderings, we designed a novel algorithm to enumerate all valid transformations of the input PACT program []. In contrast to relational optimizers, where plans are built by successively adding operators, our algorithm is based on recursive top-down descent and binary switches of successive operators. We enumerate operator orders only on programs where the data flow resembles a tree. For data flows that are DAGs but not trees, i.e., where some operators have multiple successors, the problem becomes similar to that of relational optimization with common subexpressions. As a simple solution, we split the data flow after each such operator, thereby decomposing it into a set of trees. These trees are then individually optimized and afterward recomposed to a DAG. A limitation of this method is that operators can never be moved across operators with multiple successors. Given the reordering conditions and the plan enumeration algorithm, the optimizer can emulate many transformations that are known from the relational domain, such as selection and projection pushdown, join-order enumeration, and invariant grouping transformations. However, nonrelational operators are also included in the optimization process. A full adaptation of the reordering algorithm to DAGs is part of our future work.

6.3 Physical optimization

After the logical rewriting component has picked an equivalent PACT program, the latter is further optimized to produce a physical execution plan with concrete implementations of data shipping and local operator execution strategies using estimated execution costs [].

Stratosphere's runtime supports several execution strategies known from parallel database systems. Among these strategies are repartition and broadcast data transfer strategies and local execution strategies, such as sort-based grouping and multiple join algorithms. In addition to execution strategies, the optimizer uses the concept of interesting properties []. Given a PACT operator, the optimizer keeps track of all physical data properties, such as sorting, grouping, and partitioning, that can improve the operator's execution []. Figure 8c shows a possible physical execution plan for the PACT program of Fig. 8b. Here, the Match operator benefits from the data of the left input already being partitioned on the Reduce key due to the preceding Reduce operator. This property is leveraged by locally forwarding the data between Reduce and Match and hence avoiding data transfer over the network. However, in contrast to the relational setting, it is not obvious whether an operator's UDF preserves or destroys physical data properties, i.e., a physical property can be destroyed by a UDF that modifies the corresponding record field. The optimizer uses the notion of constant record fields, which can be derived from a UDF's write set (see Sect. 6.2), to reason about interesting property preservation. In this plan, that attribute is held in record field 5. Since the Reduce operator in our example does not modify field 5 as indicated in Fig. 8b, it preserves the partitioning on that field.

The optimizer uses a cost-based approach to choose the best plan from multiple semantically equivalent plan alternatives. The cost model is based on estimated network I/O and disk I/O as these are the factors that dominate most jobs in large clusters (we are currently using CPU cost for some optimization decisions in a prototypical stage). Therefore, the size of intermediate results must be estimated. While this is a challenging task in relational database systems, it is even more difficult in the context of the PACT programming model due to its focus on UDFs. In the current state, the optimizer follows a pragmatic approach and relies on the specification of hints, such as UDF selectivity. These hints can be either set manually by a user, derived from upper layers, or in the future be retrieved from a planned metadata collection component (see Sect. ).

The algorithm to enumerate physical plans is based on a depth-first graph traversal starting at the sink nodes of the program. While descending toward the sources, the optimizer tracks interesting properties. These properties originate from the specified keys of the PACT operators and are traced as long as they are preserved by the UDFs. When the enumeration algorithm reaches a data source, it starts generating physical plan alternatives on its way back toward the sinks. For each subflow, it remembers the cheapest plan and all plan alternatives that provide interesting properties. Finally, the best plan is found after the algorithm has reached the data sinks.

To correctly enumerate plans for arbitrary DAG data flows, we analyze the program DAGs to identify where the data flow "branches" (i.e., operators with multiple outgoing edges) and which binary operators "join" these
branches back together. For these joining operators, the subplans rooted at the branching operator are treated like common subexpressions, and the plan candidates for that operator's inputs must have the same subplan for the common subexpression. Furthermore, data flows may deadlock if some, but not all, paths between a branching operator and the respective joining operators are fully pipelined. When we find such a situation, we place artificial pipeline breakers on the pipelined paths. This is done as part of the candidate plan enumeration and included in the cost assessment of subplans.

The compilation of iterative PACT programs features some specific optimization techniques []. Prior to physical plan enumeration, the optimizer classifies the edges of the data flow program as part of the dynamic or constant data path. In Fig. 6, constant data paths are indicated by thick dotted arrows (they consist of the scan operator of the graph structure in both plans). The dynamic data path comprises all edges that transfer varying data in each iteration. Edges that transfer the same data in each iteration belong to the constant data path. During plan enumeration, the costs of all operators
on the dynamic data path are weighted with a user-specified number of iterations (if this is unknown, we have found that a magic number equal to a few iterations is sufficient in most cases). Therefore, the optimizer favors plans that perform much work within the constant data path. Subsequent to plan enumeration, the optimizer places a "Cache" operator at the intersection of the constant and the dynamic data path. When executing the program, the data of the constant path are read from this Cache operator after it has been created during the first iteration. Note that the cached result may be stored in a variety of forms depending on the requirements of the dynamic data path, e.g., in sorted order or in a hash table data structure.

Further optimizations apply for incremental iterative programs where the solution set is updated with the delta set after each iteration. The elements of the solution set are identified by a key, and the set itself is stored partitioned and indexed by that key. An update of the solution set is realized by an equi-join or outer join with the delta set on the key. For this join, the partitioned index of the solution set can be exploited. Furthermore, the optimizer can decide that the delta set is not materialized, and instead, the solution set is immediately updated. However, this option only applies if it is guaranteed that elements from the solution set are accessed only once per iteration and that only local index partitions are updated.

7 Parallel data flow execution

After a program has been submitted to Stratosphere (either in the form of a Meteor query, a Sopremo plan, a PACT program, or a Nephele Job Graph), and after it has passed all the necessary compilation and transformation steps, it is submitted for execution to Stratosphere's distributed execution engine.

The Nephele execution engine implements a classic master/worker pattern (Fig. 9). The master (called Job Manager) coordinates the execution while the workers (called Task Managers) execute the tasks and exchange intermediate results among themselves. The Job Manager pushes work to the Task Managers and receives a number of control messages from them, such as task status changes, execution profiling data, and heartbeats for failure detection. To reduce latency, messages for tasks are bundled and pushed eagerly, rather than in periodic intervals.

Nephele was a cloud nymph in ancient Greek mythology. The name comes from the Greek word for "cloud." The name tips a hat to Dryad [] (a tree nymph) that influenced Nephele's design.

The execution of a program starts with the Nephele Job Manager receiving the program's Job Graph. The Job Graph is a compact description of the executable parallel data flow. Each vertex represents a unit of sequential code, which is one or more pipelined data flow operators and/or UDFs. The channels represent the passing of records between the operators and describe the pattern by which the parallel instances of a pair of vertices connect to each other. One such pattern is used to re-partition data, another for simple forward passing in a local pipeline. In order to track the status of the parallel vertex and channel instances individually, the Job Manager spans the Job Graph to the Execution Graph, as shown in Fig. 9. The Execution Graph contains a node for each parallel instance of a vertex, which we refer to as a task.

Fig. 9 Nephele's process model and the transformation of a Job Graph into an Execution Graph

7.1 Tasks, channels, and scheduling

Tasks go through a life cycle of several phases, starting with scheduling. Initially, all tasks are in the scheduling phase. When a task becomes active (deploying), the Job Manager selects a suitable computing instance (effectively a share of resources) to deploy the task upon. This instance may be requested from a cloud service or a resource manager of a local cluster. Having found a suitable instance, the Job Manager pushes to that instance a deployment description of the task, including optionally required libraries if those are not yet cached on the target instance. To reduce the number of deployment messages, a task is deployed together with all tasks that it communicates with through local pipelines. When deployed, each task spawns a thread for its code, consuming its input streams and producing output streams.

The Job Manager initially deploys all input tasks with their local pipelines. All other tasks are deployed on demand; when a task first tries to send data to another task via network, it will
request the target address of that task from the Job Manager. If that target task is still in the scheduling phase, the Job Manager will commence its deployment.

The channels through which tasks exchange data are typically pipelined through main memory or TCP streams, to reduce latency and I/O load. They may be materialized as files if a pipeline breaker is explicitly required. Besides transferring data buffers from the source to the target task, channels may also transfer custom events. An event is a metadata message that can be sent either in the same direction as the data flows (in order with the records) or in the opposite direction. Internally, Stratosphere uses those events among other things for channel startup and teardown, and to signal superstep transitions for iterative algorithms [].

In practice, the source and target tasks of several different network channels may be co-located on the same Task Manager. This is due to intra-node parallelism or different tasks co-partitioning their data. Instead of naively creating one TCP connection for each network channel, Stratosphere multiplexes network channels through TCP connections, such that every Task Manager has at most one physical TCP connection to each other Task Manager.

Because network bandwidth is often a scarce resource, reducing the number of transferred bytes is frequently a suitable means of increasing performance. Compressing the buffers with a general-purpose compression algorithm (zip, lzo, etc.) before shipping them trades extra CPU cycles for network bandwidth savings. Different compression algorithms have different tradeoffs, where more CPU-intensive compression algorithms typically yield higher compression rates and thus save more network bandwidth. The best compression algorithm for a certain program depends on the type of data shipped (e.g., text or media) and other program characteristics, for example, on how many spare CPU cycles it has before being CPU bound. Nephele supports various compression algorithms plus a mechanism that dynamically adjusts the compression algorithm to find the algorithm best suited for a certain channel [].

7.2 Fault tolerance

Fault-tolerance techniques allow systems to recover the program execution in the presence of failures. Details about the fault-tolerance mechanisms used by Stratosphere are described in a previous publication []; this section gives a brief overview of these techniques.

Stratosphere's fault tolerance is predicated on log-based rollback recovery: the system materializes intermediate task results and, in the presence of a failure, resets the affected tasks and replays their input data from the materialization points. If the input to a task has not been materialized, the predecessor tasks are re-executed as well, tracking back through the data flow graph to the latest materialized result (possibly the original input). The propagation of restarts is similar to rollback propagation in uncoordinated checkpoint environments [] and has been adopted, in one form or another, by many data flow engines that execute analytical queries: Hadoop [] (where the data flow is the simple fixed Map-Shuffle-Reduce pipeline), Dryad [], or Spark [].

The aforementioned systems all implement the blocking operator model, i.e., each operator produces its complete result before any downstream operator may start consuming the result. While this model often increases the execution latency, it simplifies the fault tolerance mechanism, as it ensures that a task consumes only intermediate results that are completely available. It prevents situations where a downstream task consumed a portion of its predecessor's output, but the remainder became unavailable due to a failure. In contrast, Stratosphere supports pipelined tasks and materializes checkpoints "to the side" without breaking the pipeline. During task execution, the system copies the buffers that contain the task's result data into a memory cache. If the cache is full, the buffers are gradually moved to disk. Once the last result of a task has been produced, a checkpoint is fully materialized as in the blocking model. However, in Stratosphere, subsequent tasks do not have to wait until the checkpoint has been written to disk to process data. In case of a task failure, any downstream tasks that have already consumed parts of this task's result data are restarted as well. If the failed task is known to produce deterministic results, we keep the downstream tasks running and they deduplicate incoming buffers using sequence numbers (similar to packet deduplication in TCP). If writing a checkpoint to disk fails but all result data have been forwarded, the system discards the checkpoint and continues processing. In case of a task failure, the system has to recover from an earlier checkpoint or the original input.

Since Stratosphere's runtime is generally pipelined, the system can decide which task results to materialize and which to stream. Some task results may be very large and force the checkpoint to disk. In many cases, these are not worth materializing, because reading them from disk is not significantly faster than recomputing them from a smaller previous checkpoint. The current prototype of our fault tolerance implementation employs a heuristic that we call ephemeral checkpointing to decide at runtime whether to create a checkpoint or not. We start materializing a task's output by keeping the result buffers. When running low on memory resources, we discard materialization points where certain conditions are met: (1) the task's produced data volume is large, as determined by the ratio of consumed input versus produced output buffers up to that point, and (2) the average processing time per buffer is below a certain threshold, indicating a fast task that can efficiently recompute its output (as opposed to a CPU-intensive UDF, for example). A preselection of interesting checkpointing positions can be done at optimization time
based on estimated result sizes. We present evaluation results of our ephemeral checkpoint technique in Sect.

7.3 Runtime operators

Next to UDF drivers, Stratosphere's runtime provides operators for external sorting, hybrid hash join, merge join, (block) nested loops join, grouping, co-grouping, as well as shipping strategies for hash partitioning, balanced range partitioning, and broadcasting. In principle, their implementation follows descriptions in the database literature (e.g., []). We modified the algorithms slightly to be suitable for a language without explicit memory control, as explained in the next paragraphs.

Stratosphere is implemented in Java. For a framework that is designed to execute to a large extent user-defined functions, Java is a good match, as it offers an easier programming abstraction than, for example, C or C++. At the same time, Java still allows a good level of control over the execution process and offers good performance, if used well. The implementation follows the requirements to implement operators for data-intensive processing in Java.

One central aspect is the handling of memory, because Java, in its core, does not give a programmer explicit control over memory via pointers. Instead, data items are typically represented as objects, to which references are passed around. A typical 64-bit JVM implementation adds to each object a header of 2 pointers (of which one is compressed, 12 bytes total) and pads the object to have a size that is a multiple of 8 bytes []. Consider the example of a tuple containing 4 fields (integers or floating point numbers with 4 bytes each), having a net memory requirement of 16 bytes. A generic object-oriented representation of that tuple would consequently require up to 64 bytes for the 4 objects representing the fields, plus 32 bytes for an array object holding the pointers to those objects. The memory overhead is hence more than 80 %. A single custom code-generated object for the record still consumes 32 bytes, an overhead of 50 %.

An additional downside of the classic object-oriented approach is the overhead of the automatic object deallocation through the garbage collector. Java's garbage collection works well for a massive creation and destruction of objects, if its memory pools are large with respect to the number of objects that are alive at a
certain point in time. That way, fewer garbage collections clear large amounts of objects in bulk. However, all memory dedicated to the efficiency of the garbage collector is not available to the system for sorting, hash tables, caching of intermediate results, or other forms of buffering.

Footnote: When referring to Java, we refer also to other languages built on top of Java and the JVM, for example, Groovy.

To overcome these problems, we designed the runtime to work on serialized data in large byte arrays, rather than on objects. The working memory for the runtime operators is a collection of byte arrays resembling memory pages (typically of size 32 KiBytes). Each record is a sequence of bytes, potentially spanning multiple memory pages. Records are referenced via byte offsets, which are used internally in a similar way as memory pointers. Whenever records need to be moved, for example, from a sort buffer into the buffers of a partitioner, the move operation corresponds to a simple byte copy operation. When certain fields of the record need to be accessed, such as in a UDF, the fields are lazily deserialized into objects. The runtime caches and reuses those objects as far as possible to reduce pressure on the garbage collector.

Toward the runtime operators, the records are described through serializers (record layout, length, copying) and comparators (comparisons, hashing). For sorting and hashing operators, every comparison still incurs an invocation of a virtual (non inline-able) function on the comparator, potentially interpreting a record header in the case of variable length data types. To reduce that overhead, we generate normalized keys and cache hash codes, as described by Graefe et al. []. This technique allows the operators to work to a large extent with byte-wise comparisons agnostic to specific record layouts.

An additional advantage of working with serialized data and a paged memory layout is that for many algorithms, the pages containing the records can be directly written to disk in case of memory pressure, yielding implementations that destage efficiently to secondary storage. The result of these implementation techniques is a memory efficient and robust behavior of the runtime operators, which is essential for data intensive applications.

One can naively map the DAG of operators from the execution plan described earlier to a JobGraph by making each operator its own vertex. However, recall that this way, each operator runs its own thread and the vertices communicate with each other using the stream model. If the number of threads per instance is matched to the number of CPU cores, this easily leads to under-utilization of the CPU when the operators' work is not balanced. With multiple operators per core, it incurs unnecessary context switches and synchronization at thread-to-thread handover points. For that reason, we put multiple operators into one vertex, if they form a local pipeline. An example would be a data source with a Map UDF, a sort operator, and a preaggregation.

We use a combination of push-chaining and pull-chaining. Pull-chaining corresponds to nesting iterators, and is typically referred to as the "Volcano Execution Model" []. However, certain UDFs produce multiple records per invocation, such as unnesting operations. To keep the programming abstraction simple, we do not force the programmer to write such UDFs in the form of an iterator, as that usually results in more complex code. In such cases, we chain the successor tasks using an abstraction called collectors, which implement a push function. This function is the symmetric counterpart to the iterators' pull function (typically called next()).

Footnote: Some language compilers can transform functions that return a sequence of values automatically into an iterator. Java, however, offers no such mechanism.

8 Experimental evaluation

We experimentally evaluate the current version of Stratosphere against other open-source systems for large-scale data processing. To that purpose, we conducted a series of experiments comparing Stratosphere against version 1.0.4 of the vanilla MapReduce engine that ships with Apache Hadoop [5], version 0.10.0 of Apache Hive [] (a declarative language and relational algebra runtime running on top of Hadoop's MapReduce), as well as version 0.2 of Apache Giraph [] (an open-source implementation of Pregel's vertex-centric graph computation model [] that uses a Hadoop map-only job for distributed scheduling). This section presents the obtained experimental results and highlights key aspects of the observed system behavior.

8.1 Experimental setup

We ran our experiments on a cluster of 26 machines, using a dedicated master and 25 slaves connected through a Cisco 2960S switch. Each slave node was equipped with two AMD Opteron 6128 CPUs (a total of 16 cores running at 2.0 GHz), 32 GB of RAM, and an Intel 82576 gigabit Ethernet adapter. All evaluated systems run in a Java Virtual Machine (JVM), making their runtimes and memory consumption easy to compare. We used 29 GB of operating memory per slave, leaving 3 GB for the operating system, distributed file system caches, and other JVM memory pools, such as native buffers for network I/O. Consequently, for each system under test, the cluster had a total amount of 400 hardware contexts and an aggregate Java heap of 725 GB.

For all systems, job input and output were stored in a common HDFS instance using plain ASCII format. Each datanode was configured to use four SATA drives for data storage, resulting in approximately 500 MB/s read and write speed per datanode, and total disk capacity of 80 TB for the entire HDFS. Each test was run in isolation, since both Stratosphere and the newer Hadoop versions (based on YARN) share no resources between queries and realize multi-tenancy through exclusive resource containers allocated to each job.

In all reported experiments, we range the number of slaves from 5 to 25. For both MapReduce and Stratosphere, the configured degree of parallelism was 8 parallel tasks per slave, yielding a total between 40 and 200 tasks, and full CPU utilization for two overlapping data-parallel phases. To reduce the effect of system noise and outliers (mostly through lags in the HDFS response time), we report the median execution time of three job executions.

8.2 TeraSort

To measure the efficiency of Hadoop's and Stratosphere's execution engines in an isolated way, we performed a simple experiment comparing the TeraSort job that ships as part of the example programs package with the two systems. Both TeraSort implementations are expressed trivially using a pair of identity map and reduce UDFs and a custom range-partitioning function for the shuffle phase. For our experiment, we generated TeraGen input data with a scaling factor ranging from 10 to 50 for the corresponding 5 to 25 slaves (a scaling factor of 1 corresponds to 10^9 bytes). In order to factor out the impact of file system access and isolate the sorting operator, we also executed a variant of the TeraSort job that generates input data on the fly and does not write out the result. The observed runtimes for both variants (Fig. 10a) indicate that the distributed sort operators of Stratosphere and Hadoop have similar performance and scale linearly with the degree of parallelism (DOP).

8.3 Word count

In our second experiment, we compared Stratosphere and Hadoop using a simple "word count" job that counts the word frequencies and is often used as a standard example for MapReduce programs. A standard optimization of the naive WordCount implementation that we implemented for both systems is to exploit the algebraic nature of the applied aggregate function and use a combiner UDF in order to reduce the volume of the shuffled data. As an input, we used synthetically generated text data with words sampled with skewed frequency from a dictionary with 100000 entries. The dictionary entries and their occurrence frequencies were obtained by analyzing the Gutenberg English language corpus []. As before, a scaling factor of 1 corresponds to 1 GB of plain text.

The results are presented in Fig. 10b. As in the previous experiment, both systems exhibit linear scale-out behavior. This is not surprising, given that TeraSort and WordCount have essentially the same second-order task structure consisting of a map, a reduce, and an intermediate data shuffle. In contrast to TeraSort, however, the WordCount job is approximately 20% faster in Stratosphere than in Hadoop.
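The combiner optimization used in the WordCount experiment can be illustrated with a minimal, single-process Python sketch. It is not Stratosphere or Hadoop code: the function names (map_words, combine, reduce_counts, word_count) and the toy input are assumptions made for illustration only. The sketch relies on the fact that counting is an algebraic aggregate, so partial sums computed per partition before the shuffle yield the same final result while fewer records cross the (simulated) shuffle.

```python
from collections import Counter, defaultdict


def map_words(line):
    """Map UDF (hypothetical name): emit one (word, 1) pair per token."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combiner UDF (hypothetical name): pre-aggregate counts within one partition."""
    partial = Counter()
    for word, count in pairs:
        partial[word] += count
    return list(partial.items())


def reduce_counts(shuffled):
    """Reduce UDF (hypothetical name): sum all partial counts per word."""
    totals = defaultdict(int)
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)


def word_count(partitions, use_combiner=True):
    """Run map, optional combine, and reduce; also return the number of shuffled records."""
    shuffled = []
    for lines in partitions:
        pairs = [pair for line in lines for pair in map_words(line)]
        shuffled.extend(combine(pairs) if use_combiner else pairs)
    return reduce_counts(shuffled), len(shuffled)


# Toy input: two partitions standing in for two parallel map tasks.
partitions = [["a b a", "b a"], ["c a", "c c"]]
counts_plain, shuffled_plain = word_count(partitions, use_combiner=False)
counts_comb, shuffled_comb = word_count(partitions, use_combiner=True)
print(counts_comb, shuffled_plain, shuffled_comb)
```

On this toy input, both variants produce the same word frequencies, while the combiner variant shuffles 4 records instead of 9. The saving grows with the number of repeated words per partition, which is why skewed word frequencies make the combiner particularly effective.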
Fig. 10 Scale-out experiments with DOP ranging from 40 to 200. (a) TeraSort, (b) Wordcount, (c) TPC-H Q3, (d) Triangle enumeration, (e) PACT and Nephele DAGs for Triangle Enumeration, (f) Connected Components

The reasons for Stratosphere's advantage on WordCount are twofold. First, Stratosphere's push-based shuffle enables better pipelined parallelism between the map and the shuffle operators. The combiner function is applied "batch-wise" to sorted parts of the mapper's output, which then are eagerly pushed downstream. In contrast, Hadoop always writes and sorts the mapper output to disk, applying the (optional) combiner UDF multiple times during this process, and serving the constructed sorted data partitions upon request. This approach produces less data when the cardinality of the grouping key is low, but comes at the price of fixed costs for a two-phase sort on the sender side.

The second reason for Stratosphere's superior performance is the use of a sort optimization that reduces the amount of type-specific comparisons at the cost of replicating a key prefix in a binary format that can be compared in a bit-wise fashion []. This optimization is especially useful for keys with complex data types.

We also note that a substantial amount of the processing time in the WordCount map phase goes into tokenizing the input text into separate words. An optimized string tokenizer implementation that works directly with Stratosphere's string types and a simplified code page translation (indicated with a dotted line on Fig. 10b) yields a 50% performance benefit compared with the version that uses a native JDK StringTokenizer. The same optimization can be done for the Hadoop implementation and is likely to result in a similar performance improvement.

8.4 Relational query

To evaluate the impact of Stratosphere's cost-based optimization and powerful dataflow runtime for the execution of more complex tasks, we ran a modified version of Query #3 from the TPC-H benchmark. We omitted the order by clauses, because the current Stratosphere version has no implementation of the required operators. We compare Stratosphere against Hive, a data warehousing solution that compiles and executes SQL-like queries as sequences of Hadoop MapReduce jobs.

    SELECT l_orderkey,
           SUM(l_extendedprice * (1 - l_discount)),
           o_orderdate, o_shippriority
    FROM customer c
    JOIN orders o ON (c_custkey = o_custkey)
    JOIN lineitem l ON (l_orderkey = o_orderkey)
    WHERE c_mktsegment = 'HOUSEHOLD'
      AND o_orderdate < '1995-03-15'
      AND l_shipdate > '1995-03-15'
    GROUP BY l_orderkey, o_orderdate, o_shippriority

The results on Fig. 10c illustrate the benefits of Stratosphere's more general approach for specification and optimization of complex data processing programs. Hive's optimizer is bound to the fixed MapReduce execution pipeline and has to split complex HiveQL expressions into multiple MapReduce jobs, which introduces unnecessary I/O overhead due to the fact that between each map and reduce phase all the data has to be spilled to disk. Stratosphere, on the other hand, can optimize and execute arbitrarily complex jobs as a whole, using the DAG-oriented program representation described earlier. For the evaluated query, the optimizer makes full use of this, selecting hash-based execution strategies for the two join operators (which are realized using a Match contract) such that the larger input is always pipelined. As the build sides of the two Match operators and the input of the reducer handling the "revenue" aggregate computation both fit into memory, no data are written to disk until the final output is written.

8.5 Triangle enumeration

Another example that illustrates the benefits of PACT compositionality is the algorithm that enumerates graph triangles, described by Cohen in [] as a prerequisite for deeper graph analysis like identification of dense subgraphs. The algorithm is formulated in [] as a sequence of two MapReduce jobs, where the first reducer is responsible for building all triads (pairs of connected edges), while the second reducer simulates a join between triads and edges to filter out all triads that cannot be closed by a matching edge.

In contrast to the cumbersome MapReduce realization presented above, Stratosphere's PACT model offers a native way to express triangle enumeration as a single, simple dataflow using a map, a reduce, and a match. Moreover, as with the TPC-H query, Stratosphere's optimizer compares the cost of alternative execution strategies and picks a plan that reduces that cost via pipelining of the data-heavy "triads" path, as indicated in Fig. 10e. Such an optimization can have substantial impact on the execution time, as the output of the "build triads" operator is asymptotically quadratic in the size of its input.

Fig. 10d shows the results of an experiment enumerating all triangles of a symmetric version of the Pokec social network graph []. The results highlight the benefit of the pipelined execution strategy compared with a naive Hadoop implementation that uses two MapReduce jobs and materializes the intermediate results after each step. For small degrees of parallelism, the difference is close to an order of magnitude. Due to the skewed vertex degree distribution, increasing the DOP does not have an impact for the pipelined Stratosphere version, whereas Hadoop benefits from the reduced read and write times for result materialization. However, this effect wears off for a DOP of 160 and can be dampened further through the use of a better scheduling policy that takes into account the data skew.

8.6 Connected components

In our next experiment, we used Giraph, an open-source implementation of Pregel [], as a reference point for the evaluation of Stratosphere's iterative dataflow constructs. For our experiments, we ran the connected components algorithm proposed by Kang et al. in [] on the crawled Twitter graph used by Cha et al. in [], using a DOP from 80 to 200.

Fig. 10f displays the observed execution times. For low node counts, Giraph currently offers twice the performance compared with Stratosphere. We attribute this mostly to a better tuned implementation, since both approaches essentially are based on the same execution model, the bulk synchronous iterations presented earlier. However, while the benefits of scaling out are clear in Stratosphere, increasing the DOP for Giraph yields hardly any performance gains. Further analysis of the execution time of each superstep in the two systems revealed that although the time required to execute the first supersteps drops for higher DOP values, the later supersteps actually take more time to finish (Fig. 11). For Giraph, this effect is stronger, and the total sum of all supersteps cancels out the improvement of the early stages. We attribute this behavior to inefficiencies in the implementation of Giraph's worker node communication model. We also note that the linear scale-out behavior in both systems is dampened by the skew in the vertex distribution across the graph nodes, and further scale-out will ultimately be prohibited as the data-partitioning approach used for parallel execution will not be able to handle the skew in the input data and the associated computations.

Fig. 11 Execution time per superstep of the Connected Components fixpoint algorithm. (a) CC Supersteps (Giraph), (b) CC Supersteps (Stratosphere)

8.7 Fault tolerance

In this section, we give preliminary results on the overhead and efficiency of the ephemeral checkpointing and recovery mechanism briefly described earlier. The numbers are based on the experiments of a previous publication []. We used the "Triangle Enumeration" program (including the preprocessing steps originally suggested by Cohen []) and a variant of the "Relational Query" (with only one join between the "Lineitem" and "Orders" tables). For each program, we measured the failure-free runtime both with and without checkpointing, as well as the total runtime including recovery after a failure. For both programs, the failure occurs in the join operator after roughly 50% of the failure-free runtime. For the relational query, the fault-tolerance mechanism checkpoints the results before the join (the filtered and projected tables), and for the triangle enumeration, it saves both the data before the candidate-creating reducer (which inflates the data volume) and after the data-reducing join operator. The experiments were run on 8 machines with 4 cores each and 16 GB RAM each, which is enough to keep checkpoints entirely in main memory (Table 2).

Table 2 Runtime (secs) of jobs in the fault tolerance experiments

Experiment                   Rel. query   Triangle enum.
Failure free (no checkp.)    1,026        646
Failure free (w/ checkp.)    1,039        678
Checkpointing overhead       1.2%         5%
Failure & recovery           1,131        747

We observe that the checkpointing itself adds very little overhead, because it simply keeps a copy of the buffers in memory. The mechanism of ephemeral checkpoints selectively materializes small intermediate results, keeping the required memory small compared with the size of data sets and certain intermediate results. For both jobs, the runtime with a failure is only moderately higher than the failure-free runtime. In the case of the relational query, a big part of the execution time is spent in scanning, filtering, and projecting the inputs. Because the result is checkpointed, these costly operations are not repeated as part of the recovery. In the case of the triangle enumeration program, the join operator repeats its work. It can, however, skip shipping roughly the first half of its result data to the sinks, thereby avoiding a good part of the result writing costs.

8.8 Conclusions

The results in this section indicate that Stratosphere offers comparable or better performance against alternative open-source systems that provide general-purpose (Hadoop, Hive) or domain-specific (Giraph) functionality. Most of the gain can be attributed to execution layer features (e.g., eager push communication, different PACT implementations) and the ability of Stratosphere's optimizer to consider these features during plan enumeration in a cost-based manner. In contrast, most of the other open-source systems make hard-coded decisions and fix some of the physical aspects in the used execution plans, which may cause inefficiencies depending on the size and the distributions of the input data.

9 Ongoing work

We are currently working on several research and development threads in the context of Stratosphere. In this section, we describe work that is currently ongoing, deferring a broader research outlook until Sect. 11.

A major direction of our work pertains to query optimization. First, we are currently working on consolidating and unifying the PACT and Sopremo layers of Stratosphere into a single operator model that includes operators with known semantics as well as user-defined functions. Our goal is to arrive at a one-pass optimizer that considers operator reordering and parallelization transformations in the same pass. The optimizer will fall back to static code analysis techniques (an extension of the current prototypical techniques) only when operator semantics are not known. The optimizer will provide pay-as-you-go implementation and optimization of new, domain-specific operators, enabling developers to rapidly add new operators, i.e., by implementing basic algorithms, which can be extended and improved over time. Second, we are designing a module that injects monitoring operators in the plan that collect runtime statistics and report to a metadata store to use during optimization. Third, we are working on refining the cost model of the optimizer toward supporting the goal of robustness, in particular, finding plans that are optimized for a variance in addition to an expected cost metric. Fourth, we are working toward adapting plans at runtime, as uncertainty about the intermediate result sizes and the characteristics of the execution environment (especially in the cloud setting) can quickly render static optimization decisions meaningless.

A second major direction is related to strengthening the system's fault-tolerant execution capabilities. We are working on an adaptive algorithm that selectively picks which intermediate results to materialize, taking into account failure probabilities, the execution graph structure, the size of the task results, and the cost of the tasks, measured and adapted at runtime.

Iterative jobs present novel opportunities for handling their fault-tolerant execution. We are currently investigating to which extent algorithmic compensation techniques can alleviate the need for checkpointing intermediary algorithm state to stable storage (e.g., by exploiting the robust nature of fixed point algorithms commonly used in data mining). By this, we intend to enable novel optimistic approaches to fault tolerance in distributed iterative data processing.

A third direction relates to the scalability and efficiency of the Nephele execution engine through connection multiplexing, application-level flow control, multicasting support, and selective pipeline breaking to avoid deadlocks or reduce stalls caused by head-of-the-line waiting effects. In addition, we are working on better memory management during data shuffling in very large clusters.

Finally, we are experimenting with porting additional high-level query or scripting languages on top of the Stratosphere query processor. We have, in initial stages, a prototype of Pig [], and a Scala dialect of the PACT programming model [].

10 Related work

10.1 End-to-end big data systems

There are currently a few systems under development in academia and industry that seek to advance the state of the art in distributed data management. The Hadoop ecosystem (including higher-level languages such as Hive, Pig, libraries such as Mahout and other tooling) is the most popular. Compared with Hadoop, Stratosphere offers more efficient execution and more sophisticated optimization due to the extended set of primitives. In addition, the PACT model encourages more modular code and component reuse []. The Hadoop ecosystem does not currently support DAG-structured plans or iterative jobs and is therefore very inefficient in use cases that require or benefit from those.

Asterix [] is a research effort by several campuses at the University of California. Like Stratosphere, Asterix offers a complete stack including a higher-level language AQL, a query optimizer, and a distributed runtime system []. While both Stratosphere and Asterix aim at bridging the gap between MapReduce and parallel DBMSs, they start at the opposite ends of the spectrum (and often meet in the middle). Asterix starts with a semi-structured data model and language, while Stratosphere follows a UDF-centric model. Asterix includes data storage based on LSM-trees, paying the price of a data loading phase for efficient execution, while Stratosphere connects to external data sources (e.g., HDFS), converting data to optimized binary representations only after the initial scans. Nevertheless, we have arrived at similar lessons with the Asterix team in several aspects pertaining to the development of distributed data management systems.

Scope [] is a system used by Microsoft Bing for several analytic tasks. Scope integrates with Microsoft's LINQ interface, allowing the specification of rich UDFs similar to Stratosphere. It features a sophisticated query optimizer [] and runtime system based on a rewrite of Dryad []. Scope is perhaps the system most similar to Stratosphere, perhaps aiming at scalability more than efficiency. To the best of our knowledge, Scope does not efficiently support incrementally iterative queries.

Footnote: At the time of writing, Scope is not offered as a product or service by Microsoft.

The Spark [] system from UC Berkeley is a distributed system that operates on memory-resident data. Spark provides functionality equivalent to Stratosphere's bulk iterations, but not incremental iterations. In addition, while Spark is a system that processes batches of data, Stratosphere features an execution engine that pipelines data. Pipelining is better suited to use cases that incrementally process data as found in many machine learning applications.

10.2 Query languages and models for parallel data

It is conceptually easy to parallelize the basic operators of relational algebra, and parallel analytical databases have existed for decades. The MapReduce paradigm widens the scope of parallel programs to include more generalized user-defined aggregation functions []. Our PACT model is a generalization of MapReduce that in addition enables the parallelization of generalized joins. MapReduce has been the compilation target of SQL subsets [], as well as other languages inspired by scripting languages [] or XQuery []. The Meteor language borrows its syntax from Jaql []. Stratosphere is a better compilation target for the aforementioned languages than MapReduce, as the system's optimizer can be reused, and a richer set of primitives is available as the compilation target language. While these query languages are based on the relational or a semi-structured model, other efforts [] aim to embed domain-specific languages in functional general programming languages. An integration of a functional language with Stratosphere constitutes a major part of our future work.

10.3 Query optimization

Query optimization is one of the most researched topics in the context of data management. While the physical optimization described earlier is closely related to traditional optimization of relational queries as, for example, in references [], the rewriting of dataflows consisting of user-defined operators has gained more interest recently. Similar to our approach, Microsoft's Scope compiler leverages information derived by static code analysis to reason about the preservation of interesting physical data properties []. Manimal [] applies static code analysis on Hadoop map and reduce UDFs. In contrast to our work, operators are not reordered, but map functions that include filter conditions are modified in order to access an index instead of performing a full scan of the input file. Stubby [] optimizes workflows consisting of multiple MapReduce jobs by merging consecutive map and reduce operators to reduce the overhead of running multiple MapReduce jobs. Stubby gains the information for its optimizations from manual code annotations.

10.4 Distributed dataflow execution

While the principles behind parallel databases have been known since the 80s [] and industrial-strength parallel databases have existed for as long [], the last decade brought a new wave of "massively parallel" relational query processors [] as well as truly scalable implementations of more restricted models, notably MapReduce []. Nephele is a scalable implementation of the DAG dataflow model, similar to Microsoft's Dryad []. A different approach is followed in the Asterix project, where the Hyracks engine [] executes physical operators (e.g., physical join implementations) rather than user-defined functions wrapped in glue code that parallelizes execution. All these projects aim to scale query processing to clusters of hundreds or even thousands of nodes and beyond.
10.5 Distributed iterative algorithms

Over the last years, a number of stand-alone graph processing systems or approaches to integrate iterative processing in dataflow engines have been proposed. Spark [] handles state as resilient distributed data sets and provides constructs for efficiently executing iterative dataflows that recompute the state as a whole. GraphLab [] is a specialized framework for parallel machine learning, where programs model a graph expressing the computational dependencies of the input. Programs are expressed as update functions on vertices, which can read neighboring vertices' state through a shared memory abstraction. Furthermore, GraphLab provides configurable consistency levels and asynchronous scheduling of the updates. Pregel [] is a graph processing adaptation of bulk synchronous parallel processing [65]. Programs directly model a graph, where vertices hold state and send messages to other vertices along the edges. By receiving messages, vertices update their state. Rex [] is a parallel shared-nothing query processing platform that provides programmable deltas for expressing incrementally iterative computations. Naiad [] unifies incrementally iterative computations with continuous data ingestion into a new technique called differential computation. In this approach, intermediate results from different iterations and ingestion periods are represented as partially ordered deltas, and final results are reconstructed lazily using computationally efficient combinations of their lineage deltas.

11 Conclusions and research outlook

We presented Stratosphere, a deep software stack for analyzing Big Data. Stratosphere features a high-level scripting language, Meteor, which focuses on providing extensibility. Using Meteor and the underlying Sopremo operator model, domain-specific experts can extend the system's functionality with new operators, in addition to operator packages for data warehousing, information extraction, and information integration already provided. Stratosphere features an intermediate UDF-centric programming model based on second-order functions and higher-order abstractions for iterative queries. These programs are optimized using a cost-based optimizer inspired by relational databases and adapted to a schema-less and UDF-heavy programming and data model. Finally, Nephele, Stratosphere's distributed execution engine, provides scalable execution, scheduling, network data transfers, and fault tolerance. Stratosphere occupies a sweet spot between MapReduce and relational databases. It offers declarative program specification; it covers a wide variety of data analysis tasks including iterative or recursive tasks; it operates directly on distributed file systems without requiring data loading; and it offers scalable execution on large clusters and in the cloud.

The lessons learned while building Stratosphere have opened several directions for research. First, we see a lot of potential in the design, compilation, and optimization of high-level declarative languages for various analytical domains, in particular machine learning. There, it is challenging to divide the labor between the language compiler, database optimizer, and runtime system, and define the right abstractions between these components. Second, we believe that distributed data management systems should be efficient in addition to scalable and thus should adapt their algorithms and architecture to the ever-evolving landscape of hardware, including multi-core processors, NUMA architectures, co-processors, such as GPUs and FPGAs, flash and phase-change memory, as well as data center networking.

Finally, we see our next major research thread revolving around use cases that move beyond batch data processing and require fast data ingestion and low-latency data analysis. Such systems, in order to provide rich functionality, must effectively manage mutable state, manifested in the form of stateful user-defined operators operating on data streams, or in the context of incremental iterative algorithms. Such state must be embedded in a declarative language via proper abstractions and efficiently managed by the system. In addition, workloads of (long-running) programs need to be optimized together.

Acknowledgments We would like to thank the Master students that worked on the Stratosphere project and implemented many components of the system: Thomas Bodner, Christoph Brücke, Erik Nijkamp, Max Heimel, Moritz Kaufmann, Aljoscha Krettek, Matthias Ringwald, Tommy Neubert, Fabian Tschirschnitz, Tobias Heintz, Erik Diessler, Thomas Stolltmann.

References

1. Ackermann, S., Jovanovic, V., Rompf, T., Odersky, M.: Jet: an embedded dsl for high performance big data processing. In: BigData Workshop at VLDB (2012)
2. Alexandrov, A., Ewen, S., Heimel, M., Hueske, F., Kao, O., Markl, V., Nijkamp, E., Warneke, D.: Mapreduce and pact - comparing data parallel programming models. In: BTW, pp. 25-44 (2011)
3. Alexandrov, A., Battré, D., Ewen, S., Heimel, M., Hueske, F., Kao, O., Markl, V., Nijkamp, E., Warneke, D.: Massively parallel data analysis with pacts on nephele. PVLDB (2), 1625-1628 (2010)
4. Apache Giraph. http://incubator.apache.org/giraph/
5. Apache Hadoop. http://hadoop.apache.org/
6. Apache Hive. http://sortbenchmark.org/
7. Aster Data. http://www.asterdata.com/
8. Battré, D., Ewen, S., Hueske, F., Kao, O., Markl, V., Warneke, D.: Nephele/pacts: a programming model and execution framework for web-scale analytical processing. In: SoCC, pp. 119-130 (2010)
9. Battré, D., Frejnik, N., Goel, S., Kao, O., Warneke, D.: Evaluation of network topology inference in opaque compute clouds through end-to-end measurements. In: IEEE CLOUD, pp. 17-24 (2011)
10. Battré, D., Frejnik, N., Goel, S., Kao, O., Warneke, D.: Inferring network topologies in infrastructure as a service cloud. In: CCGRID, pp. 604-605 (2011)
11. Battré, D., Hovestadt, M., Lohrmann, B., Stanik, A., Warneke, D.: Detecting bottlenecks in parallel dag-based data flow programs. In: MTAGS (2010)
12. Behm, A., Borkar, V.R., Carey, M.J., Grover, R., Li, C., Onose, N., Vernica, R., Deutsch, A., Papakonstantinou, Y., Tsotras, V.J.: Asterix: towards a scalable, semistructured data platform for evolving-world models. Distrib. Parallel Databases (3), 185-216
13. Beyer, K.S., Ercegovac, V., Gemulla, R., Balmin, A., Eltabakh, M.Y., Kanne, C.C., Özcan, F., Shekita, E.J.: Jaql: a scripting language for large scale semistructured data analysis. PVLDB 1272-1283 (2011)
14. Boden, C., Karnstedt, M., Fernandez, M., Markl, V.: Large-scale social media analytics on stratosphere. In: WWW (2013)
15. Borkar, V.R., Carey, M.J., Grover, R., Onose, N., Vernica, R.: Hyracks: a flexible and extensible foundation for data-intensive computing. In: ICDE, pp. 1151-1162 (2011)
16. Bruno, N., Agarwal, S., Kandula, S., Shi, B., Wu, M.C., Zhou, J.: Recurring job optimization in scope. In: SIGMOD Conference, pp. 805-806 (2012)
17. Cha, M., Haddadi, H., Benevenuto, F., Gummadi, P.K.: Measuring user influence in twitter: the million follower fallacy. In: ICWSM
18. Chafi, H., DeVito, Z., Moors, A., Rompf, T., Sujeeth, A.K., Hanrahan, P., Odersky, M., Olukotun, K.: Language virtualization for heterogeneous parallel computing. In: OOPSLA, pp. 835-847 (2010)
19. Chattopadhyay, B., Lin, L., Liu, W., Mittal, S., Aragonda, P., Lychagina, V., Kwon, Y., Wong, M.: Tenzing a sql implementation on the mapreduce framework. PVLDB (12), 1318-1327 (2011)
20. Chaudhuri, S., Shim, K.: Including group-by in query optimization. In: VLDB, pp. 354-366 (1994)
21. Cohen, J.: Graph twiddling in a mapreduce world. Comput. Sci. (4), 29-41 (2009)
22. Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: OSDI, pp. 137-150 (2004)
23. DeWitt, D.J., Gerber, R.H., Graefe, G., Heytens, M.L., Kumar, K.B., Muralikrishna, M.: Gamma - a high performance dataflow database machine. In: VLDB, pp. 228-237 (1986)
24. Elnozahy, E.N.M., Alvisi, L., Wang, Y.M., Johnson, D.B.: A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv. (3), 375-408 (2002)
25. Ewen, S., Schelter, S., Tzoumas, K., Warneke, D., Markl, V.: Iterative parallel data processing with stratosphere: an inside look. In: SIGMOD (2013)
26. Ewen, S., Tzoumas, K., Kaufmann, M., Markl, V.: Spinning fast iterative data flows. PVLDB (11), 1268-1279 (2012)
27. Fegaras, L., Li, C., Gupta, U.: An optimization framework for map-reduce queries. In: EDBT, pp. 26-37 (2012)
28. Fushimi, S., Kitsuregawa, M., Tanaka, H.: An overview of the system software of a parallel relational database machine grace. In: VLDB, pp. 209-219 (1986)
29. Ghemawat, S., Gobioff, H., Leung, S.T.: The google file system. In: SOSP, pp. 29-43 (2003)
30. Graefe, G., Bunker, R., Cooper, S.: Hash joins and hash teams in microsoft sql server. In: VLDB, pp. 86-97 (1998)
31. Graefe, G.: Implementing sorting in database systems. ACM Comput. Surv. (3), Article ID 10 (2006)
32. Graefe, G.: Parallel query execution algorithms. In: Encyclopedia of Database Systems, pp. 2030-2035 (2009)
33. Graefe, G.: Volcano - an extensible and parallel query evaluation system. IEEE Trans. Knowl. Data Eng. (1), 120-135 (1994)
34. Greenplum. http://www.greenplum.com/
35. Guo, Z., Fan, X., Chen, R., Zhang, J., Zhou, H., McDirmid, S., Liu, C., Lin, W., Zhou, J., Zhou, L.: Spotting code optimizations in data-parallel pipelines through periscope. In: OSDI, pp. 121-133
36. Harjung, J.J.: Reducing formal noise in pact programs. Master's thesis, Technische Universität Berlin, Faculty of EECS (2013)
37. Heise, A., Rheinländer, A., Leich, M., Leser, U., Naumann, F.: Meteor/sopremo: an extensible query language and operator model. In: BigData Workshop at VLDB (2012)
38. Heise, A., Naumann, F.: Integrating open government data with stratosphere for more transparency. Web Semant.: Sci. Serv. Agents World Wide Web, 45-56 (2012)
39. Höger, M., Kao, O., Richter, P., Warneke, D.: Ephemeral materialization points in stratosphere data management on the cloud. Adv. Parallel Comput., 163-181 (2013)
40. Hovestadt, M., Kao, O., Kliem, A., Warneke, D.: Evaluating adaptive compression to mitigate the effects of shared i/o in clouds. In: IPDPS Workshops, pp. 1042-1051 (2011)
41. Hueske, F., Krettek, A., Tzoumas, K.: Enabling operator reordering in data flow programs through static code analysis. CoRR abs/1301.4200 (2013)
42. Hueske, F., Peters, M., Krettek, A., Ringwald, M., Tzoumas, K., Markl, V., Freytag, J.C.: Peeking into the optimization of data flow programs with mapreduce-style udfs. In: ICDE (2013)
43. Hueske, F., Peters, M., Sax, M., Rheinländer, A., Bergmann, R., Krettek, A., Tzoumas, K.: Opening the black boxes in data flow optimization. PVLDB (11), 1256-1267 (2012)
44. Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: EuroSys, pp. 59-72 (2007)
45. Jahani, E., Cafarella, M.J., Ré, C.: Automatic optimization for mapreduce programs. PVLDB (6), 385-396 (2011)
46. Java HotSpot VM Whitepaper. http://www.oracle.com/technetwork/java/whitepaper-135217.html
47. JavaScript Object Notation. http://json.org/
48. Kalavri, V.: Integrating pig and stratosphere. Master's thesis, KTH, School of Information and Communication Technology (ICT)
49. Kang, U., Tsourakakis, C.E., Faloutsos, C.: Pegasus: a peta-scale graph mining system. In: ICDM, pp. 229-238 (2009)
50. Kung, H.T., Robinson, J.T.: On optimistic methods for concurrency control. ACM Trans. Database Syst. (2), 213-226 (1981)
51. Leich, M., Adamek, J., Schubotz, M., Heise, A., Rheinländer, A., Markl, V.: Applying stratosphere for big data analytics. In: BTW, pp. 507-510 (2013)
52. Lim, H., Herodotou, H., Babu, S.: Stubby: a transformation-based optimizer for mapreduce workflows. PVLDB (11), 1196-1207
53. Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB (8), 716-727 (2012)
54. Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: a system for large-scale graph processing. In: SIGMOD Conference, pp. 135-146 (2010)
55. McSherry, F., Murray, D., Isaacs, R., Isard, M.: Differential dataflow. In: CIDR (2013)
56. Mihaylov, S.R., Ives, Z.G., Guha, S.: Rex: recursive, delta-based data-centric computation. PVLDB (11), 1280-1291 (2012)
57. Olston, C., Reed, B., Srivastava, U., Kumar, R., Tomkins, A.: Pig latin: a not-so-foreign language for data processing. In: SIGMOD Conference, pp. 1099-1110 (2008)
58. Pike, R., Dorward, S., Griesemer, R., Quinlan, S.: Interpreting the data: parallel analysis with sawzall. Sci. Program. (4), 277-298
59. Project Gutenberg. http://www.gutenberg.org/
60. Selinger, P.G., Astrahan, M.M., Chamberlin, D.D., Lorie, R.A., Price, T.G.: Access path selection in a relational database management system. In: SIGMOD Conference, pp. 23-34 (1979)
61. Silva, Y.N., Larson, P.A., Zhou, J.: Exploiting common subexpressions for cloud query processing. In: ICDE, pp. 1337-1348 (2012)
62. Stanford Network Analysis Project. http://snap.stanford.edu/
63. Teradata. http://www.teradata.com/
64. Thusoo, A., Sarma, J.S., Jain, N., Shao, Z., Chakka, P., Anthony, S., Liu, H., Wyckoff, P., Murthy, R.: Hive - a warehousing solution over a map-reduce framework. PVLDB (2), 1626-1629 (2009)
65. Valiant, L.G.: A bridging model for parallel computation. Commun. ACM (8), 103-111 (1990)
66. Wang, Y.M., Fuchs, W.K.: Lazy checkpoint coordination for bounding rollback propagation. In: Reliable Distributed Systems, 1993. Proceedings., 12th Symposium on, pp. 78-85 (1993)
67. Warneke, D., Kao, O.: Nephele: efficient parallel data processing in the cloud. In: SC-MTAGS (2009)
68. Warneke, D., Kao, O.: Exploiting dynamic resource allocation for efficient parallel data processing in the cloud. IEEE Tr
ans.ParallelDistrib.Syst.(6),985Ð997(2011)69.Yu,Y.,Isard,M.,Fetterly,D.,Budiu,M.,Erlingsson,ò.,Gunda,P.K.,Currey,J.:Dryadlinq:asystemforgeneral-purposedis-tributeddata-parallelcomputingusingahigh-levellanguage.In:OSDI,pp.1Ð14(2008)70.Zaharia,M.,Chowdhury,M.,Das,T.,Dave,A.,Ma,J.,McCauley,M.,Franklin,M.J.,Shenker,S.,Stoica,I.:Resilientdistributeddatasets:afault-tolerantabstractionforin-memoryclustercom-puting.In:NSDI(2012)71.Zhang,J.,Zhou,H.,Chen,R.,Fan,X.,Guo,Z.,Lin,H.,Li,J.Y.,Lin,W.,Zhou,J.,Zhou,L.:Optimizingdatashufßingindata-parallelcomputationbyunderstandinguser-deÞnedfunctions.In:NSDI(2012)72.Zhou,J.,Bruno,N.,Lin,W.:Advancedpartitioningtechniquesformassivelydistributedcomputation.In:SIGMODConference,pp.13Ð24(2012)73.Zhou,J.,Larson,P..,Chaiken,R.:Incorporatingpartitioningandparallelplansintothescopeoptimizer.In:ICDE,pp.1060Ð107174.Zhou,J.,Bruno,N.,Wu,M.C.,Larson,P..,Chaiken,R.,Shakib,D.:Scope:paralleldatabasesmeetmapreduce.VLDBJ.611Ð636(2012)
