Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression

Ling Huang, Intel Labs Berkeley, ling.huang@intel.com
Jinzhu Jia, UC Berkeley, jzjia@stat.berkeley.edu
Bin Yu, UC Berkeley, binyu@stat.berkeley.edu
Byung-Gon Chun, Intel Labs Berkeley, byung-gon.chun@intel.com
Petros Maniatis, Intel Labs Berkeley, petros.maniatis@intel.com
Mayur Naik, Intel Labs Berkeley, mayur.naik@intel.com

Abstract

Predicting the execution time of computer programs is an important but challenging problem in the community of computer systems. Existing methods require experts to perform detailed analysis of program code in order to construct predictors or select important features. We recently developed a new system to automatically extract a large number of features from program execution on sample inputs, on which prediction models can be constructed without expert knowledge. In this paper we study the construction of predictive models for this problem. We propose the SPORE (Sparse POlynomial REgression) methodology to build accurate prediction models of program performance using feature data collected from program execution on sample inputs. Our two SPORE algorithms are able to build relationships between responses (e.g., the execution time of a computer program) and features, and select a few from hundreds of the retrieved features to construct an explicitly sparse and non-linear model to predict the response variable. The compact and explicitly polynomial form of the estimated model could reveal important insights into the computer program (e.g., features and their non-linear combinations that dominate the execution time), enabling a better understanding of the program's behavior. Our evaluation on three widely used computer programs shows that SPORE methods can give accurate prediction with relative error less than 7% by using a moderate number of training data samples. In addition, we compare SPORE algorithms to state-of-the-art sparse regression algorithms, and show that SPORE methods, motivated by real applications, outperform the other methods in terms of both interpretability and prediction accuracy.

1 Introduction

Computing systems today are ubiquitous, and range from the very small (e.g., iPods, cell phones, laptops) to the very large (servers, data centers, computational grids). At the heart of such systems are management components that decide how to schedule the execution of different programs over time (e.g., to ensure high system utilization or efficient energy use [11, 15]), how to allocate to each program resources such as memory, storage and networking (e.g., to ensure a long battery life or fair resource allocation), and how to weather anomalies (e.g., flash crowds or attacks [6, 17, 24]).

These management components typically must make guesses about how a program will perform under given hypothetical inputs, so as to decide how best to plan for the future. For example, consider a simple scenario in a data center with two computers, fast computer A and slow computer B, and a program waiting to run on a large file stored in computer B.
A scheduler is often faced with the decision of whether to run the program at B, potentially taking longer to execute, but avoiding any transmission costs for the file; or moving the file from B to A but potentially executing the program at A much faster. If the scheduler can predict accurately how long the program would take to execute on input f at computer A or B, he/she can make an optimal decision, returning results faster, possibly minimizing energy use, etc.

Despite all these opportunities and demands, uses of prediction have been at best unsophisticated in modern computer systems. Existing approaches either create analytical models for the programs based on simplistic assumptions [12], or treat the program as a black box and create a mapping function between certain properties of input data (e.g., file size) and output response [13]. The success of such methods is highly dependent on human experts who are able to select important predictors before a statistical modeling step can take place. Unfortunately, in practice experts may be hard to come by, because programs can get complex quickly beyond the capabilities of a single expert, or because they may be short-lived (e.g., applications from the iPhone app store) and unworthy of the attention of a highly paid expert. Even when an expert is available, program performance is often dependent not on externally visible features such as command-line parameters and input files, but on the internal semantics of the program (e.g., what lines of code are executed).

To address this problem (lack of experts and inherent semantics), we recently developed a new system [7] to automatically extract a large number of features from the intermediate execution steps of a program (e.g., internal variables, loops, and branches) on sample inputs; then prediction models can be built from those features without the need for a human expert.

In this paper, we propose two Sparse POlynomial REgression (SPORE) algorithms that use the automatically extracted features to predict a computer program's performance. They are variants of each other in the way they build the nonlinear terms into the model: SPORE-LASSO first selects a small number of features and then entertains a full nonlinear polynomial expansion of order less than a given degree; while SPORE-FoBa chooses adaptively a subset of the full expanded terms and hence allows possibly a higher order of polynomials. Our algorithms are in fact new general methods motivated by the computer performance prediction problem. They can learn a relationship between a response (e.g., the execution time of a computer program given an input) and the generated features, and select a few from hundreds of features to construct an explicit polynomial form to predict the response. The compact and explicit polynomial form reveals important insights in the program semantics (e.g., the internal program loop that affects program execution time the most). Our approach is general, flexible and automated, and can adapt the prediction models to specific programs, computer platforms, and even inputs.

We evaluate our algorithms experimentally on three popular computer programs from web search and image processing. We show that our SPORE algorithms can achieve accurate predictions with relative error less than 7% by using a small amount of training data for our application, and that our algorithms outperform existing state-of-the-art sparse regression algorithms in the literature in terms of interpretability and accuracy.

Related Work. In prior attempts to predict program execution time, Gupta et al. [13] use a variant of decision trees to predict execution time ranges for database queries. Ganapathi et al. [11] use KCCA to predict time and resource consumption for database queries using statistics on query texts and execution plans. To measure the empirical computational complexity of a program, Trendprof [12] constructs linear or power-law models that predict program execution counts. The drawbacks of such approaches include their need for expert knowledge about the program to identify good features, or their requirement for simple input-size to execution time correlations.
Seshia and Rakhlin [22, 23] propose a game-theoretic estimator of quantitative program properties, such as worst-case execution time, for embedded systems. These properties depend heavily on the target hardware environment in which the program is executed. Modeling the environment manually is tedious and error-prone. As a result, they formulate the problem as a game between their algorithm (player) and the program's environment (adversary), where the player seeks to accurately predict the property of interest while the adversary sets environment states and parameters.

Since expert resources are limited and costly, it is desirable to automatically extract features from program code. Then machine learning techniques can be used to select the most important features to build a model. In statistical machine learning, feature selection methods under linear regression models such as LASSO have been widely studied in the past decade. Feature selection with non-linear models has been studied much less, but has recently been attracting attention. The most notable are the SpAM work with theoretical and simulation results [20] and additive and generalized forward regression [18]. Empirical studies with data of these non-linear sparse methods are very few [21]. The drawback of applying the SpAM method to our execution time prediction problem is that SpAM outputs an additive model and cannot use the interaction information between features. But it is well known that features of computer programs interact to determine the execution time [12]. One non-parametric modification of SpAM to replace the additive model has been proposed [18]. However, the resulting non-parametric models are not easy to interpret and hence are not desirable for our execution time prediction problem. Instead, we propose the SPORE methodology and efficient algorithms to train a SPORE model. Our work provides a promising example of interpretable non-linear sparse regression models in solving real data problems.

2 Overview of Our System

Our focus in this paper is on algorithms for feature selection and model building. However, we first review the problem within which we apply these techniques, to provide context [7]. Our goal is to predict how a given program will perform (e.g., its execution time) on a particular input (e.g., input files and command-line parameters).

The system consists of four steps. First, the feature instrumentation step analyzes the source code and automatically instruments it to extract values of program features such as loop counts (how many times a particular loop has executed), branch counts (how many times each branch of a conditional has executed), and variable values (the k first values assigned to a numerical variable, for some small k such as 5). Second, the profiling step executes the instrumented program with sample input data to collect values for all created program features and the program's execution times. The time impact of the data collection is minimal. Third, the slicing step analyzes each automatically identified feature to determine the smallest subset of the actual program that can compute the value of that feature, i.e., the feature slice. This is the cost of obtaining the value of the feature; if the whole program must execute to compute the value, then the feature is expensive and not useful, since we can just measure execution time and we have no need for prediction, whereas if only a little of the program must execute, the feature is cheap and therefore possibly valuable in a predictive model. Finally, the modeling step uses the feature values collected during profiling along with the feature costs computed during slicing to build a predictive model on a small subset of generated features. To obtain a model consisting of low-cost features, we iterate over the modeling and slicing steps, evaluating the cost of selected features and rejecting expensive ones, until only low-cost features are selected to construct the prediction model. At run time, given a new input, the selected features are computed using the corresponding slices, and the model is used to predict execution time from the feature values.
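To make the interplay between the modeling and slicing steps concrete, here is a minimal sketch of that iteration. The helper names (`fit_sparse_model`, `slice_cost`, `max_cost`) are hypothetical placeholders, not part of the actual system; any sparse regression procedure, such as the SPORE algorithms of Section 3, could play the role of `fit_sparse_model`.

```python
import numpy as np
from typing import Callable, List, Set

def select_low_cost_model(X: np.ndarray,
                          y: np.ndarray,
                          slice_cost: Callable[[int], float],
                          fit_sparse_model: Callable[[np.ndarray, np.ndarray], Set[int]],
                          max_cost: float = 1.0) -> Set[int]:
    """Iterate the modeling and slicing steps: refit on the surviving features
    until every selected feature is cheap to compute via its slice.
    Illustrative sketch, not the authors' implementation."""
    candidates: List[int] = list(range(X.shape[1]))
    while True:
        # Fit a sparse model on the remaining candidate columns; the fitter
        # returns the set of selected column positions (local indices).
        selected_local = fit_sparse_model(X[:, candidates], y)
        selected = {candidates[j] for j in selected_local}
        expensive = {j for j in selected if slice_cost(j) > max_cost}
        if not expensive:
            return selected  # every selected feature is low-cost: done
        # Reject expensive features and refit on the rest.
        candidates = [j for j in candidates if j not in expensive]
```

The loop terminates because each iteration either returns or strictly shrinks the candidate set.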
The above description is minimal by necessity due to space constraints, and omits details on the rationale, such as why we chose the kinds of features we chose or how program slicing works. Though important, those details have no bearing on the results shown in this paper.

At present our system targets a fixed, overprovisioned computation environment without CPU job contention or network bandwidth fluctuations. We therefore assume that execution times observed during training will be consistent with system behavior on-line. Our approach can adapt to modest changes in the execution environment by retraining on different environments. In our future research, we plan to incorporate candidate features of both the hardware (e.g., configurations of CPU, memory, etc.) and software environment (e.g., OS, cache policy, etc.) for predictive model construction.

3 Sparse Polynomial Regression Model

Our basic premise for predictive program analysis is that a small but relevant set of features may explain the execution time well. In other words, we seek a compact model, an explicit-form function of a small number of features, that accurately estimates the execution time of the program.

To make the problem tractable, we constrain our models to the multivariate polynomial family, for at least three reasons. First, a good program is usually expected to have polynomial execution time in some (combination of) features. Second, a polynomial model up to a certain degree can approximate well many nonlinear models (due to Taylor expansion). Finally, a compact polynomial model can provide an easy-to-understand explanation of what determines the execution time of a program, providing program developers with intuitive feedback and a solid basis for analysis.

For each computer program, our feature instrumentation procedure outputs a data set with $n$ samples as tuples $\{(y_i, x_i)\}_{i=1}^n$, where $y_i \in \mathbb{R}$ denotes the $i$-th observation of execution time, and $x_i$ denotes the $i$-th observation of the vector of $p$ features. We now review some obvious alternative methods for modeling the relationship between $Y = [y_i]$ and $X = [x_i]$, point out their drawbacks, and then we proceed to our SPORE methodology.

3.1 Sparse Regression and Alternatives

Least squares regression is widely used for finding the best-fitting $f(x, \beta)$ to a given set of responses $y_i$ by minimizing the sum of the squares of the residuals [14]. Regression with subset selection finds, for each $k \in \{1, 2, \ldots, m\}$, the feature subset of size $k$ that gives the smallest residual sum of squares. However, it is a combinatorial optimization and is known to be NP-hard [14]. In recent years a number of efficient alternatives based on model regularization have been proposed. Among them, LASSO [25] finds the selected features with coefficients $\hat{\beta}$ given a tuning parameter $\lambda$ as follows:

$$\hat{\beta} = \arg\min_{\beta} \frac{1}{2}\|Y - X\beta\|_2^2 + \lambda \sum_j |\beta_j|. \qquad (1)$$

LASSO effectively enforces many $\beta_j$'s to be 0, and selects a small subset of features (indexed by non-zero $\beta_j$'s) to build the model, which is usually sparse and has better prediction accuracy than models created by ordinary least squares regression [14] when $p$ is large. The parameter $\lambda$ controls the complexity of the model: as $\lambda$ grows larger, fewer features are selected.

Being a convex optimization problem is an important advantage of the LASSO method, since several fast algorithms exist to solve the problem efficiently even with large-scale data sets [9, 10, 16, 19]. Furthermore, LASSO has convenient theoretical and empirical properties. Under suitable assumptions, it can recover the true underlying model [8, 25]. Unfortunately, when predictors are highly correlated, LASSO usually cannot select the true underlying model. The adaptive-LASSO [29], defined below in Equation (2), can overcome this problem:

$$\hat{\beta} = \arg\min_{\beta} \frac{1}{2}\|Y - X\beta\|_2^2 + \lambda \sum_j \frac{|\beta_j|}{w_j}, \qquad (2)$$

where $w_j$ can be any consistent estimate of $\beta_j$. Here we choose $w$ to be a ridge estimate of $\beta$:

$$w = (X^T X + 0.001 I)^{-1} X^T Y,$$

where $I$ is the identity matrix.
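For concreteness, Equations (1) and (2) can be solved with off-the-shelf tools. The sketch below uses scikit-learn's `Lasso` and implements the adaptive penalty by rescaling columns, a standard reduction of Equation (2) to a plain LASSO (substituting $\beta_j = w_j \gamma_j$). This is an illustrative sketch under stated assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Two-stage estimate of Equation (2): ridge weights, then a LASSO on
    rescaled columns (beta_j = w_j * gamma_j turns the weighted penalty
    into a plain L1 penalty on gamma)."""
    n, p = X.shape
    # Ridge estimate used as weights, as in the text: w = (X'X + 0.001 I)^{-1} X'Y.
    w = np.linalg.solve(X.T @ X + 0.001 * np.eye(p), X.T @ y)
    w = np.abs(w)  # assumption: the penalty uses the weight magnitudes |w_j|
    # scikit-learn minimizes (1/2n)||y - Xb||^2 + alpha*||b||_1,
    # so lambda in Equation (2) corresponds to alpha = lambda / n.
    fit = Lasso(alpha=lam / n, fit_intercept=False).fit(X * w, y)
    return fit.coef_ * w  # map back to the original parameterization
```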
Technically, LASSO can be easily extended to create nonlinear models (e.g., using polynomial basis functions up to degree $d$ of all $p$ features). However, this approach gives us $\binom{p+d}{d}$ terms, which is very large when $p$ is large (on the order of thousands) even for small $d$, making regression computationally expensive. We give two alternatives to fit the sparse polynomial regression model next.

3.2 SPORE Methodology and Two Algorithms

Our methodology captures non-linear effects of features, as well as non-linear interactions among features, by using polynomial basis functions over those features (we use "terms" to denote the polynomial basis functions subsequently). We expand the feature set $x = \{x_1, x_2, \ldots, x_k\}$, $k \le p$, to all the terms in the expansion of the degree-$d$ polynomial $(1 + x_1 + \ldots + x_k)^d$, and use the terms to construct a multivariate polynomial function $f(x, \beta)$ for the regression. We define $\mathrm{expan}(X, d)$ as the mapping from the original data matrix $X$ to a new matrix with the polynomial expansion terms up to degree $d$ as the columns. For example, using a degree-2 polynomial with feature set $x = \{x_1, x_2\}$, we expand out $(1 + x_1 + x_2)^2$ to get terms $1, x_1, x_2, x_1^2, x_1 x_2, x_2^2$, and use them as basis functions to construct the following function for regression:

$$\mathrm{expan}([x_1, x_2], 2) = [1,\ [x_1],\ [x_2],\ [x_1^2],\ [x_1 x_2],\ [x_2^2]],$$
$$f(x, \beta) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_1 x_2 + \beta_5 x_2^2.$$
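The $\mathrm{expan}(X, d)$ mapping is a standard full polynomial expansion; a minimal sketch, assuming scikit-learn's `PolynomialFeatures` (whose column ordering for this example happens to match the one above):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

def expan(X: np.ndarray, d: int) -> np.ndarray:
    """All terms of (1 + x_1 + ... + x_k)^d as columns: the constant,
    each feature, and every cross term up to total degree d."""
    return PolynomialFeatures(degree=d, include_bias=True).fit_transform(X)

X = np.array([[2.0, 3.0]])  # one sample with x1 = 2, x2 = 3
print(expan(X, 2))          # [[1. 2. 3. 4. 6. 9.]] = 1, x1, x2, x1^2, x1*x2, x2^2
```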
Complete expansion on all $p$ features is not necessary, because many of them have little contribution to the execution time. Motivated by this execution-time application, we propose a general methodology called SPORE, which is a sparse polynomial regression technique. Next, we develop two algorithms to fit our SPORE methodology.

3.2.1 SPORE-LASSO: A Two-Step Method

For a sparse polynomial model with only a few features, if we can pre-select a small number of features, applying the LASSO on the polynomial expansion of those pre-selected features will still be efficient, because we do not have too many polynomial terms. Here is the idea:

Step 1: Use the linear LASSO algorithm to select a small number of features, filtering out the (often many) features that hardly contribute to the execution time.

Step 2: Use the adaptive-LASSO method on the expanded polynomial terms of the selected features (from Step 1) to construct the sparse polynomial model.

Adaptive-LASSO is used in Step 2 because of the collinearity of the expanded polynomial features. Step 2 can be computed efficiently if we only choose a small number of features in Step 1. We present the resulting SPORE-LASSO algorithm in Algorithm 1 below.

Algorithm 1 SPORE-LASSO
Input: response $Y$, feature data $X$, maximum degree $d$, $\lambda_1$, $\lambda_2$
Output: feature index $S$, term index $S_t$, weights $\hat{\beta}$ for the $d$-degree polynomial basis
1: $\hat{\alpha} = \arg\min_{\alpha} \frac{1}{2}\|Y - X\alpha\|_2^2 + \lambda_1\|\alpha\|_1$
2: $S = \{j : \hat{\alpha}_j \ne 0\}$
3: $X_{\mathrm{new}} = \mathrm{expan}(X(S), d)$
4: $w = (X_{\mathrm{new}}^T X_{\mathrm{new}} + 0.001 I)^{-1} X_{\mathrm{new}}^T Y$
5: $\hat{\beta} = \arg\min_{\beta} \frac{1}{2}\|Y - X_{\mathrm{new}}\beta\|_2^2 + \lambda_2 \sum_j |\beta_j| / w_j$
6: $S_t = \{j : \hat{\beta}_j \ne 0\}$

$X(S)$ in Step 3 of Algorithm 1 is a sub-matrix of $X$ containing only the columns of $X$ indexed by $S$. For a new observation with feature vector $X = [x_1, x_2, \ldots, x_p]$, we first get the selected feature vector $X(S)$, then obtain the polynomial terms $X_{\mathrm{new}} = \mathrm{expan}(X(S), d)$, and finally we compute the prediction $\hat{Y} = X_{\mathrm{new}}\hat{\beta}$. Note that the prediction depends on the choice of $\lambda_1$, $\lambda_2$ and the maximum degree $d$. In this paper, we fix $d = 3$. $\lambda_1$ and $\lambda_2$ are chosen by minimizing the Akaike Information Criterion (AIC) on the LASSO solution paths. The AIC is defined as $n\log(\|Y - \hat{Y}\|_2^2) + 2s$, where $\hat{Y}$ is the fitted $Y$ and $s$ is the number of polynomial terms selected in the model. To be precise, for the linear LASSO step (Step 1 of Algorithm 1), a whole solution path over a range of $\lambda_1$ values can be obtained using the algorithm in [10]. On the solution path, for each fixed $\lambda_1$, we compute a solution path with varied $\lambda_2$ for Step 5 of Algorithm 1 to select the polynomial terms. For each $\lambda_2$, we calculate the AIC, and choose the $(\lambda_1, \lambda_2)$ with the smallest AIC.
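The following is a compact sketch of Algorithm 1 for one fixed pair $(\lambda_1, \lambda_2)$, together with the AIC criterion; the paper instead scans whole solution paths over $(\lambda_1, \lambda_2)$ and keeps the pair with the smallest AIC. Degenerate cases (e.g., Step 1 selecting no features) are not handled, and the scikit-learn penalty scaling (alpha = lambda / n) is an assumption carried over from the earlier sketch.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

def spore_lasso(X: np.ndarray, y: np.ndarray, d: int = 3,
                lam1: float = 0.1, lam2: float = 0.1):
    """Sketch of Algorithm 1 for one fixed (lambda_1, lambda_2)."""
    n = X.shape[0]
    # Steps 1-2: linear LASSO pre-selects the feature index set S.
    alpha = Lasso(alpha=lam1 / n, fit_intercept=False).fit(X, y).coef_
    S = np.flatnonzero(alpha)
    # Step 3: polynomial expansion of the selected columns, degree <= d.
    Xnew = PolynomialFeatures(degree=d).fit_transform(X[:, S])
    # Step 4: ridge weights for the adaptive penalty.
    w = np.abs(np.linalg.solve(Xnew.T @ Xnew + 0.001 * np.eye(Xnew.shape[1]),
                               Xnew.T @ y))
    # Steps 5-6: adaptive LASSO on the expanded terms, via column rescaling.
    gamma = Lasso(alpha=lam2 / n, fit_intercept=False).fit(Xnew * w, y).coef_
    beta = gamma * w
    return S, np.flatnonzero(beta), beta

def aic(y: np.ndarray, y_hat: np.ndarray, s: int) -> float:
    """AIC as defined in the text: n * log(||Y - Yhat||_2^2) + 2s."""
    return len(y) * np.log(np.sum((y - y_hat) ** 2)) + 2 * s
```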
One may wonder whether Step 1 incorrectly discards features required for building a good model in Step 2. We next show theoretically that this is not the case.

Let $S$ be a subset of $\{1, 2, \ldots, p\}$ and $S^c = \{1, 2, \ldots, p\} \setminus S$ its complement. Write the feature matrix $X$ as $X = [X(S), X(S^c)]$. Let the response be $Y = f(X(S)) + \epsilon$, where $f(\cdot)$ is any function and $\epsilon$ is additive noise. Let $n$ be the number of observations and $s$ the size of $S$. We assume that $X$ is deterministic, $p$ and $s$ are fixed, and the entries of $\epsilon$ are i.i.d. Gaussian with mean 0 and variance $\sigma^2$. Our results also hold for zero-mean sub-Gaussian noise with parameter $\sigma^2$. More general results regarding general scaling of $n$, $p$ and $s$ can also be obtained. Under the following conditions, we show that Step 1 of SPORE-LASSO, the linear LASSO, selects the relevant features even if the response $Y$ depends on the predictors $X(S)$ nonlinearly:

1. The columns ($X_j$, $j = 1, \ldots, p$) of $X$ are standardized: $\frac{1}{n}X_j^T X_j = 1$ for all $j$;
2. $\Lambda_{\min}\left(\frac{1}{n}X(S)^T X(S)\right) \ge c$ with a constant $c > 0$;
3. $\min_j \left|\left[(X(S)^T X(S))^{-1} X(S)^T f(X(S))\right]_j\right| \ge \alpha$ with a constant $\alpha > 0$;
4. $\left|X(S^c)^T\left[I - X(S)(X(S)^T X(S))^{-1}X(S)^T\right]f(X(S))\right| \le \frac{\eta\, c\, \alpha\, n}{2\sqrt{s+1}}$, for some $0 \le \eta < 1$;
5. $\left\|X(S^c)^T X(S)\,(X(S)^T X(S))^{-1}\right\|_\infty \le 1 - \eta$;

where $\Lambda_{\min}(\cdot)$ denotes the minimum eigenvalue of a matrix, $\|A\|_\infty$ is defined as $\max_i \sum_j |A_{ij}|$, and the inequalities are defined element-wise.

Theorem 3.1. Under the conditions above, with probability $\to 1$ as $n \to \infty$, there exists some $\lambda$ such that $\hat{\beta} = (\hat{\beta}_S, \hat{\beta}_{S^c})$ is the unique solution of the LASSO (Equation (1)), where $\hat{\beta}_j \ne 0$ for all $j \in S$ and $\hat{\beta}_{S^c} = 0$.

Remark. The first two conditions are trivial: Condition 1 can be obtained by rescaling, while Condition 2 assumes that the design matrix composed of the true predictors in the model is not singular. Condition 3 is a reasonable condition which means that the linear projection of the expected response onto the space spanned by the true predictors is not degenerate. Condition 4 is a little bit tricky; it says that the irrelevant predictors ($X(S^c)$) are not very correlated with the residuals of $E(Y)$ after its projection onto $X(S)$. Condition 5 is always needed when considering LASSO's model selection consistency [26, 28]. The proof of the theorem is included in the supplementary material.

3.2.2 Adaptive Forward-Backward: SPORE-FoBa

Using all of the polynomial expansion terms of a feature subset is not flexible. In this section, we propose the SPORE-FoBa algorithm, a more flexible algorithm using adaptive forward-backward searching over the polynomially expanded data: during search step $k$ with an active set $T^{(k)}$, we examine one new feature $X_j$, and consider a small candidate set which consists of the candidate feature $X_j$, its higher-order terms, and the (non-linear) interactions between previously selected features (indexed by $S$) and the candidate feature $X_j$ with total degree up to $d$, i.e., terms of the form

$$X_j^{d_1} \prod_{l \in S} X_l^{d_l}, \quad \text{with } d_1 \ge 1,\ d_l \ge 0,\ \text{and } d_1 + \sum_{l} d_l \le d. \qquad (3)$$

Algorithm 2 below is a short description of SPORE-FoBa, which uses linear FoBa [27] at Steps 5 and 6. The main idea of SPORE-FoBa is that a term from the candidate set is added into the model if and only if adding this term makes the residual sum of squares (RSS) decrease a lot. We scan all of the terms in the candidate set and choose the one which makes the RSS drop most. If the drop in the RSS is greater than a pre-specified value, we add that term to the active set, which contains the terms currently selected by the SPORE-FoBa algorithm. When considering deleting one term from the active set, we choose the one that makes the sum of residuals increase the least. If this increment is small enough, we delete that term from our current active set.
Algorithm 2 SPORE-FoBa
Input: response $Y$, feature columns $X_1, \ldots, X_p$, the maximum degree $d$
Output: polynomial terms and the weights
1: Let $T = \emptyset$
2: while true do
3:   for $j = 1, \ldots, p$ do
4:     Let $C$ be the candidate set that contains non-linear and interaction terms from Equation (3)
5:     Use linear FoBa to select terms from $C$ to form the new active set $T$.
6:     Use linear FoBa to delete terms from $T$ to form a new active set $T$.
7:   if no terms can be added or deleted then
8:     break
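A sketch of the candidate-set construction of Equation (3): given a new feature index j and previously selected feature indices S, enumerate all exponent assignments with $d_1 \ge 1$ and total degree at most $d$. Representing terms as {feature index: power} maps is our own illustration, not the paper's.

```python
from itertools import product
from typing import Dict, List

def candidate_terms(j: int, S: List[int], d: int) -> List[Dict[int, int]]:
    """Candidate set of Equation (3): exponent maps {feature: power} of the
    form X_j^{d1} * prod_{l in S} X_l^{dl}, with d1 >= 1 and total degree <= d.
    Assumes the new feature j is not already in S."""
    terms = []
    for d1 in range(1, d + 1):
        budget = d - d1  # remaining degree for previously selected features
        for powers in product(range(budget + 1), repeat=len(S)):
            if sum(powers) <= budget:
                term = {j: d1}
                term.update({l: e for l, e in zip(S, powers) if e > 0})
                terms.append(term)
    return terms

print(candidate_terms(j=1, S=[0], d=3))
# [{1: 1}, {1: 1, 0: 1}, {1: 1, 0: 2}, {1: 2}, {1: 2, 0: 1}, {1: 3}]
```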
[Figure 1: Prediction errors of our algorithms across the three data sets, varying training-set fractions. Panels (a) Lucene, (b) Find Maxima, (c) Segmentation plot prediction error versus percentage of training data for SPORE-LASSO and SPORE-FoBa.]

4 Evaluation Results

We now experimentally demonstrate that our algorithms are practical, give highly accurate predictors for real problems with small training-set sizes, compare favorably in accuracy to other state-of-the-art sparse-regression algorithms, and produce interpretable, intuitive models.

To evaluate our algorithms, we use as case studies three programs: the Lucene Search Engine [4], and two image processing algorithms, one for finding maxima and one for segmenting an image (both of which are implemented within the ImageJ image processing framework [3]). We chose all three programs according to two criteria. First and most importantly, we sought programs with high variability in the predicted measure (execution time), especially in the face of otherwise similar inputs (e.g., image files of roughly the same size for image processing). Second, we sought programs that implement reasonably complex functionality, for which an inexperienced observer would not be able to trivially identify the important features.

Our collected data sets are as follows. For Lucene, we used a variety of text input queries from two corpora: the works of Shakespeare and the King James Bible. We collected a data set with n = 3840 samples, each of which consists of an execution time and a total of p = 126 automatically generated features. The time values are in the range (0.88, 1.13) with standard deviation 0.19. For the Find Maxima program within the ImageJ framework, we collected n = 3045 samples (from an equal number of distinct, diverse images obtained from three vision corpora [1, 2, 5]), and a total of p = 182 features. The execution time values are in the range (0.09, 2.99) with standard deviation 0.24. Finally, from the Segmentation program within the same ImageJ framework on the same image set, we again collected n = 3045 samples, and a total of p = 816 features for each. The time values are in the range (0.21, 58.05) with standard deviation 3.05. In all the experiments, we fix degree d = 3 for polynomial expansion, and normalize each column of feature data into the range [0, 1].

Prediction Error. We first show that our algorithms predict accurately, even when training on a small number of samples, in both absolute and relative terms. The accuracy measure we use is the relative prediction error, defined as

$$\frac{1}{n_t}\sum_i \frac{|\hat{y}_i - y_i|}{y_i},$$

where $n_t$ is the size of the test data set, and the $\hat{y}_i$'s and $y_i$'s are the predicted and actual responses of the test data, respectively. We randomly split every data set into a training set and a test set for a given training-set fraction, train the algorithms and measure their prediction error on the test data. For each training fraction, we repeat the splitting, training and testing procedure 10 times and show the mean and standard deviation of the prediction error in Figure 1. We see that our algorithms have high prediction accuracy, even when training on only 10% or less of the data (roughly 300-400 samples). Specifically, both of our algorithms can achieve less than 7% prediction error on both the Lucene and Find Maxima data sets; on the Segmentation data set, SPORE-FoBa achieves less than 8% prediction error, and SPORE-LASSO achieves around 10% prediction error on average.

Comparisons to State-of-the-Art. We compare our algorithms to several existing sparse regression methods by examining their prediction errors at different sparsity levels (the number of features used in the model), and show our algorithms can clearly outperform LASSO, FoBa and recently proposed non-parametric greedy methods [18] (Figure 2). As a non-parametric greedy algorithm, we use Additive Forward Regression (AFR), because it is faster and often achieves better prediction accuracy than Generalized Forward Regression (GFR) algorithms. We use the Glmnet Matlab implementation of LASSO to obtain the LASSO solution path [10]. Since FoBa and SPORE-FoBa naturally produce a path by adding or deleting features (or terms), we record the prediction error at each step. When two steps have the same sparsity level, we report the smallest prediction error. To generate the solution path for SPORE-LASSO, we first use Glmnet to generate a solution path for linear LASSO; then at each sparsity level k, we perform full polynomial expansion with d = 3 on the selected k features, obtain a solution path on the expanded data, and choose the model with the smallest prediction error among all models computed from all active feature sets of size k. From the figure, we see that our SPORE algorithms have comparable performance, and both of them clearly achieve better prediction accuracy than LASSO, FoBa, and AFR. None of the existing methods can build models within 10% relative prediction error. We believe this is because the execution time of a computer program often depends on non-linear combinations of different features, which is usually not well handled by either linear methods or the additive non-parametric methods. Instead, both of our algorithms can select 2-3 high-quality features and build models with non-linear combinations of them to predict execution time with high accuracy.
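A minimal sketch of the evaluation protocol from the Prediction Error paragraph above: the relative error measure and the repeated random train/test splits. `fit` is a placeholder for any training procedure that returns an object with a `predict` method; it is not part of the paper's tooling.

```python
import numpy as np

def relative_error(y: np.ndarray, y_hat: np.ndarray) -> float:
    """Relative prediction error: (1/n_t) * sum_i |yhat_i - y_i| / y_i."""
    return float(np.mean(np.abs(y_hat - y) / y))

def evaluate(fit, X: np.ndarray, y: np.ndarray,
             frac: float = 0.1, reps: int = 10, seed: int = 0):
    """Repeated random train/test splits at a given training fraction;
    returns the mean and standard deviation of the relative error."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(reps):
        idx = rng.permutation(len(y))
        k = int(frac * len(y))
        model = fit(X[idx[:k]], y[idx[:k]])        # train on the fraction
        errs.append(relative_error(y[idx[k:]],      # test on the rest
                                   model.predict(X[idx[k:]])))
    return float(np.mean(errs)), float(np.std(errs))
```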
[Figure 2: Performance of the algorithms: relative prediction error versus sparsity level. Panels (a) Lucene, (b) Find Maxima, (c) Segmentation compare LASSO, FoBa, AFR, SPORE-LASSO, and SPORE-FoBa.]

Model Interpretability. To gain better understanding, we investigate the details of the model constructed by SPORE-FoBa for Find Maxima. Our conclusions are similar for the other case studies, but we omit them due to space. We see that with different training-set fractions and with different sparsity configurations, SPORE-FoBa can always select two high-quality features from hundreds of automatically generated ones. By consulting with experts on the Find Maxima program, we find that the two selected features correspond to the width (w) and height (h) of the region of interest in the image, which may in practice differ from the actual image width and height. Those are indeed the most important factors for determining the execution time of the particular algorithm used. For a 10% training-set fraction and $\epsilon = 0.01$, SPORE-FoBa obtained

$$f(w, h) = 0.1 + 0.22\,w + 0.23\,h + 1.93\,wh + 0.24\,wh^2,$$

which uses non-linear feature terms (e.g., $wh$, $wh^2$) to predict the execution time accurately (around 5.5% prediction error). Especially when Find Maxima is used as a component of a more complex image processing pipeline, this model would not be the obvious choice even for an expert. On the contrary, as observed in our experiments, neither the linear nor the additive sparse methods handle such nonlinear terms well, and they result in inferior prediction performance. A more detailed comparison across different methods is the subject of our ongoing work.
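For illustration, the reported Find Maxima model is straightforward to evaluate directly ($w$ and $h$ are the selected features, normalized to [0, 1] as in the experiments):

```python
def find_maxima_time(w: float, h: float) -> float:
    """Reported SPORE-FoBa model for Find Maxima:
    f(w, h) = 0.1 + 0.22w + 0.23h + 1.93wh + 0.24wh^2."""
    return 0.1 + 0.22 * w + 0.23 * h + 1.93 * w * h + 0.24 * w * h ** 2

print(find_maxima_time(0.5, 0.5))  # 0.8375, dominated by the interaction term wh
```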
5 Conclusion

In this paper, we proposed the SPORE (Sparse POlynomial REgression) methodology to model the relationship between the execution time of computer programs and features of the programs. We introduced two algorithms to learn a SPORE model, and showed that both algorithms can predict execution time with more than 93% accuracy for the applications we tested. For the three test cases, these results represent a significant improvement (a 40% or more reduction in prediction error) over other sparse modeling techniques in the literature when applied to this problem. Hence our work provides one convincing example of using sparse non-linear regression techniques to solve real problems. Moreover, SPORE is a general methodology that can be used to model computer program performance metrics other than execution time and to solve problems from other areas of science and engineering.

References

[1] Caltech 101 Object Categories. http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html.
[2] Event Dataset. http://vision.stanford.edu/lijiali/event_dataset/.
[3] ImageJ. http://rsbweb.nih.gov/ij/.
[4] Mahout. lucene.apache.org/mahout.
[5] Visual Object Classes Challenge 2008. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2008/.
[6] S. Chen, K. Joshi, M. A. Hiltunen, W. H. Sanders, and R. D. Schlichting. Link gradients: Predicting the impact of network latency on multitier applications. In INFOCOM, 2009.
[7] B.-G. Chun, L. Huang, S. Lee, P. Maniatis, and M. Naik. Mantis: Predicting system performance through program analysis and modeling. Technical Report, 2010. arXiv:1010.0019v1 [cs.PF].
[8] D. Donoho. For most large underdetermined systems of equations, the minimal l1-norm solution is the sparsest solution. Communications on Pure and Applied Mathematics, 59:797-829, 2006.
[9] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of Statistics, 32(2):407-499, 2004.
[10] J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 2010.
[11] A. Ganapathi, H. Kuno, U. Dayal, J. L. Wiener, A. Fox, M. Jordan, and D. Patterson. Predicting multiple metrics for queries: Better decisions enabled by machine learning. In ICDE, 2009.
[12] S. Goldsmith, A. Aiken, and D. Wilkerson. Measuring empirical computational complexity. In FSE, 2007.
[13] C. Gupta, A. Mehta, and U. Dayal. PQR: Predicting query execution times for autonomous workload management. In ICAC, 2008.
[14] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer, 2009.
[15] M. Isard, V. Prabhakaran, J. Currey, U. Wieder, K. Talwar, and A. Goldberg. Quincy: Fair scheduling for distributed computing clusters. In Proceedings of SOSP '09, 2009.
[16] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky. An interior-point method for large-scale l1-regularized least squares. IEEE Journal on Selected Topics in Signal Processing, 1(4):606-617, 2007.
[17] Z. Li, M. Zhang, Z. Zhu, Y. Chen, A. Greenberg, and Y.-M. Wang. WebProphet: Automating performance prediction for web services. In NSDI, 2010.
[18] H. Liu and X. Chen. Nonparametric greedy algorithms for the sparse learning problem. In NIPS 22, 2009.
[19] M. Osborne, B. Presnell, and B. Turlach. On the lasso and its dual. Journal of Computational and Graphical Statistics, 9(2):319-337, 2000.
[20] P. Ravikumar, J. Lafferty, H. Liu, and L. Wasserman. Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(5):1009-1030, 2009.
[21] P. Ravikumar, V. Vu, B. Yu, T. Naselaris, K. Kay, J. Gallant, and C. Berkeley. Nonparametric sparse hierarchical models describe V1 fMRI responses to natural images. Advances in Neural Information Processing Systems (NIPS), 21, 2008.
[22] S. A. Seshia and A. Rakhlin. Game-theoretic timing analysis. In Proceedings of the IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 575-582. IEEE Press, Nov. 2008.
[23] S. A. Seshia and A. Rakhlin. Quantitative analysis of systems using game-theoretic learning. ACM Transactions on Embedded Computing Systems (TECS), 2010. To appear.
[24] M. Tariq, A. Zeitoun, V. Valancius, N. Feamster, and M. Ammar. Answering what-if deployment and configuration questions with WISE. In ACM SIGCOMM, 2008.
[25] R. Tibshirani. Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc. B., 1996.
[26] M. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using l1-constrained quadratic programming (Lasso). IEEE Trans. Information Theory, 55:2183-2202, 2009.
[27] T. Zhang. Adaptive forward-backward greedy algorithm for sparse learning with linear models. Advances in Neural Information Processing Systems, 22, 2008.
[28] P. Zhao and B. Yu. On model selection consistency of Lasso. The Journal of Machine Learning Research, 7:2541-2563, 2006.
[29] H. Zou. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476):1418-1429, 2006.