PyGlove: Symbolic Programming for Automated Machine Learning

Daiyi Peng, Xuanyi Dong, Esteban Real, Mingxing Tan, Yifeng Lu, Hanxiao Liu, Gabriel Bender, Adam Kraft, Chen Liang, Quoc V. Le
Google Research, Brain Team
{daiyip, ereal, tanmingxing, yifenglu, hanxiaol, gbender, adamkraft, crazydonkey, qvl}@google.com, xuanyi.dxy@gmail.com

Abstract

Neural networks are sensitive to hyper-parameter and architecture choices. Automated Machine Learning (AutoML) is a promising paradigm for automating these choices. Current ML software libraries, however, are quite limited in handling the dynamic interactions among the components of AutoML. For example, efficient NAS algorithms, such as ENAS [1] and DARTS [2], typically require an implementation coupling between the search space and search algorithm, the two key components in AutoML. Furthermore, implementing a complex search flow, such as searching architectures within a loop of searching hardware configurations, is difficult. To summarize, changing the search space, search algorithm, or search flow in current ML libraries usually requires a significant change in the program logic. In this paper, we introduce a new way of programming AutoML based on symbolic programming. Under this paradigm, ML programs are mutable, thus can be manipulated easily by another program. As a result, AutoML can be reformulated as an automated process of symbolic manipulation. With this formulation, we decouple the triangle of the search algorithm, the search space and the child program. This decoupling makes it easy to change the search space and search algorithm (without and with weight sharing), as well as to add search capabilities to existing code and implement complex search flows. We then introduce PyGlove, a new Python library that implements this paradigm. Through case studies on ImageNet and NAS-Bench-101, we show that with PyGlove users can easily convert a static program into a search space, quickly iterate on the search spaces and search algorithms, and craft complex search flows to achieve better results.

1 Introduction

Neural networks are sensitive to architecture and hyper-parameter choices [3, 4]. For example, on the ImageNet dataset [5], we have observed a large increase in accuracy thanks to changes in architectures, hyper-parameters, and training algorithms, from the seminal work
of AlexNet [5] to recent state-of-the-art models such as EfficientNet [6]. However, as neural networks become increasingly complex, the potential number of architecture and hyper-parameter choices becomes numerous. Hand-crafting neural network architectures and selecting the right hyper-parameters is, therefore, increasingly difficult and often takes months of experimentation.

Automated Machine Learning (AutoML) is a promising paradigm for tackling this difficulty. In AutoML, selecting architectures and hyper-parameters is formulated as a search problem, where a search space is defined to represent all possible choices and a search algorithm is used to find the best choices.

*Work done as a research intern at Google.
34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

For hyper-parameter search, the search space would specify the range of values to try. For architecture search, the search space would specify the architectural configurations to try. The search space plays a critical role in the success of neural architecture search (NAS) [7, 8], and can be significantly different from one application to another [8–11]. In addition, there are also many different search algorithms, such as random search [12], Bayesian optimization [13], RL-based methods [1, 9, 14, 15], evolutionary methods [16], gradient-based methods [2, 10, 17] and neural predictors [18].

This proliferation of search spaces and search algorithms in AutoML makes it difficult to program with existing software libraries. In particular, a common problem of current libraries is that search spaces and search algorithms are tightly coupled, making it hard to modify the search space or search algorithm alone. A practical scenario that arises is the need to upgrade a search algorithm while keeping the rest of the infrastructure the same. For example, recent years have seen a transition from AutoML algorithms that train each model from scratch [8, 9] to those that employ weight-sharing to attain massive efficiency gains, such as ENAS and DARTS [1, 2, 14, 15, 19]. Yet, upgrading an existing search space by introducing weight-sharing requires significant changes to both the search algorithm and the model building logic, as we will see in Section 2.2. Such coupling between search spaces and search algorithms, and the resulting inflexibility,
impose a heavy burden on AutoML researchers and practitioners.

We believe that the main challenge lies in the programming paradigm mismatch between existing software libraries and AutoML. Most existing libraries are built on the premise of immutable programs, where a fixed program is used to process different data. On the contrary, AutoML requires programs (i.e., model architectures) to be mutable, as they must be dynamically modified by another program (i.e., the search algorithm) whose job is to explore the search space. Due to this mismatch, predefined interfaces for search spaces and search algorithms struggle to accommodate unanticipated interactions, making it difficult to try new AutoML approaches. Symbolic programming, which originated from LISP [20], provides a potential solution to this problem, by allowing a program to manipulate its own components as if they were plain data [21]. However, despite its long history, symbolic programming has not yet been widely explored in the ML community.

In this paper, we reformulate AutoML as an automated process of manipulating ML programs symbolically. Under this formulation, programs are mutable objects which can be cloned and modified after their creation. These mutable objects can express standard machine learning concepts, from a convolutional unit to a complex user-defined training procedure. As a result, all parts of an ML program are mutable. Moreover, through symbolic programming, programs can modify programs. Therefore the interactions between the child program, search space, and search algorithm are no longer static. We can mediate them or change them via meta-programs. For example, we can map the search space into an abstract view which is understood by the search algorithm, translating an architectural search space into a super-network that can be optimized by efficient NAS algorithms. Further, we propose PyGlove, a library that enables general symbolic programming in Python, as an implementation of our method tested on real-world AutoML scenarios. With PyGlove, Python classes and functions can be made mutable through brief Python annotations, which makes it much easier to write AutoML programs. PyGlove allows AutoML techniques to be easily dropped into preexisting ML pipelines, while also benefiting open-ended research which requires extreme flexibility.

To summarize, our contributions are the following:

- We reformulate AutoML under the symbolic programming paradigm, greatly simplifying the programming interface for AutoML by accommodating unanticipated interactions among the child programs, search spaces and search algorithms via a mutable object model.
- We introduce PyGlove, a general symbolic programming library for Python which implements our symbolic formulation of AutoML. With PyGlove, AutoML can be easily dropped into preexisting ML programs, with all program parts searchable, permitting rapid exploration on different dimensions of AutoML.
- Through case studies, we demonstrate the expressiveness of PyGlove in real-world search spaces. We demonstrate how PyGlove allows AutoML researchers and practitioners to change search spaces, search algorithms and search flows with only a few lines of code.

2 Symbolic Programming for AutoML

Many AutoML approaches (e.g., [2, 9, 22]) can be formulated as three interacting components: the child program, the search space, and the search algorithm. AutoML's goal is to discover a performant child program (e.g., a neural network architecture or a data augmentation policy) out of a large set
of possibilities defined by the search space. The search algorithm accomplishes the said goal by iteratively sampling child programs from the search space. Each sampled child program is then evaluated, resulting in a numeric measure of its quality. This measure is called the reward². The reward is then fed back to the search algorithm to improve future sampling of child programs.

In typical AutoML libraries [23–31], these three components are usually tightly coupled. The coupling between these components means that we cannot change the interactions between them unless non-trivial modifications are made. This limits the flexibility of the libraries. Some successful attempts have been made to break these couplings. For example, Vizier [26] decouples the search space and the search algorithm by using a dictionary as the search space contract between the child program and the search algorithm, resulting in modular black-box search algorithms. Another example is the NNI library [27], which tries to unify search algorithms with and without weight sharing by carefully designed APIs. This paper, however, solves the coupling problem in a different and more general way: with symbolic programming, programs are allowed to be modified by other programs. Therefore, instead of solving fixed couplings, we allow dynamic couplings through a mutable object model. In this section, we will explain our method and show how this makes AutoML programming more flexible.

2.1 AutoML as an Automated Symbolic Manipulation Process

AutoML can be interpreted as an automated process of searching for a child program from a search space to maximize a reward. We decompose this process into a sequence of symbolic operations. A (regular) child program (Figure 1-a) is symbolized into a symbolic child program (Figure 1-b), which can then be cloned and modified. The symbolic program is further hyperified into a search space (Figure 1-c) by replacing some of the fixed parts with to-be-determined specifications. During the search, the search space is materialized into different child programs (Figure 1-d) based on search algorithm decisions, or can be rewritten into a super-program (Figure 1-e) to apply complex search algorithms such as efficient NAS.
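The symbolize → hyperify → materialize sequence above can be sketched in a few lines of plain Python. This is an illustrative toy, not PyGlove's API: the `OneOf`, `hyperify` and `materialize` names below are hypothetical stand-ins, with a nested dict playing the role of a symbolic tree.

```python
import copy
import random

# A "child program" as a plain nested dict (a symbolic tree in miniature).
child = {"model": {"op": "Conv", "filters": 32},
         "optimizer": {"name": "Adam", "lr": 1e-3}}

class OneOf:
    """A to-be-determined specification over candidate values."""
    def __init__(self, candidates):
        self.candidates = candidates

def hyperify(tree, path, spec):
    """Return a copy of `tree` with the fixed value at `path` replaced by `spec`."""
    space = copy.deepcopy(tree)
    node = space
    for key in path[:-1]:
        node = node[key]
    node[path[-1]] = spec
    return space

def materialize(space, decide):
    """Resolve every OneOf node using the `decide` callback (the "algorithm")."""
    if isinstance(space, OneOf):
        return decide(space.candidates)
    if isinstance(space, dict):
        return {k: materialize(v, decide) for k, v in space.items()}
    return space

space = hyperify(child, ["model", "filters"], OneOf([16, 32, 64]))
program = materialize(space, lambda cands: random.choice(cands))
assert program["model"]["filters"] in (16, 32, 64)
assert child["model"]["filters"] == 32   # the original program is untouched
```

In a real system the `decide` callback would be driven by a search algorithm and the resulting program's reward fed back to it; here a random choice stands in for that loop.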
Figure 1: AutoML as an automated symbolic manipulation process.

An analogy to this process is to have a robot build a house with LEGO [32] bricks to meet a human being's taste: symbolizing a regular program is like converting molded plastic parts into LEGO bricks; hyperifying a symbolic program into a search space is like providing a blueprint of the house with variations. With the help of the search algorithm, the search space is materialized into different child programs whose rewards are fed back to the search algorithm to improve future sampling, like a robot trying different ways to build the house and gradually learning what humans prefer.

Figure 2: Symbolizing classes into mutable symbolic trees. Their hyper-parameters are like the studs of LEGO bricks, while their implementations are less interesting while we manipulate the trees.

Symbolization. A (regular) child program can be described as a complex object, which is a composition of its sub-objects. A symbolic child program is such a composition whose sub-objects are no longer tied together forever, but are detachable from each other, hence can be replaced by other sub-objects. The symbolic object can be hierarchical, forming a symbolic tree which can be manipulated or executed. A symbolic object is manipulated through its hyper-parameters, which are like the studs of a LEGO brick, interfacing connections with other bricks. However, symbolic objects, unlike LEGO bricks, can have internal states which are automatically recomputed upon modifications. For

²While we use RL concepts to illustrate the core idea of our method, as will be shown later, the proposed paradigm is applicable to other types of AutoML methods as well.

example, when we change the dataset of a trainer, the train steps will be recomputed from the number of examples in the dataset if the training is based on the number of epochs. With such a mutable object model, we no longer need to create objects from scratch repeatedly, or modify the producers upstream, but can clone existing objects and modify them into new ones. The symbolic tree representation puts an emphasis on manipulating the object definitions, while leaving the implementation details behind. Figure 2 illustrates the symbolization process.

2.2 Disentangling AutoML through Symbolic Programming

Figure 3: Hyperifying a child program into a search space by replacing fixed parts with to-be-determined specifications.
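The trainer example above, where derived state such as the number of train steps is recomputed whenever a symbol changes, can be sketched as follows. This is a plain-Python toy, not PyGlove's mutable object model; the `Dataset` and `Trainer` classes and the `rebind` method name are invented for illustration.

```python
class Dataset:
    def __init__(self, num_examples):
        self.num_examples = num_examples

class Trainer:
    """Derived state (train_steps) is recomputed whenever a symbol changes."""
    def __init__(self, dataset, epochs, batch_size):
        self._dataset, self._epochs, self._batch_size = dataset, epochs, batch_size
        self._recompute()

    def _recompute(self):
        # Epoch-based training: total steps follow from the dataset size.
        self.train_steps = self._epochs * self._dataset.num_examples // self._batch_size

    def rebind(self, **kwargs):
        for name, value in kwargs.items():
            setattr(self, "_" + name, value)
        self._recompute()          # internal state follows the symbolic change
        return self

trainer = Trainer(Dataset(50_000), epochs=10, batch_size=100)
assert trainer.train_steps == 5000          # 10 * 50000 // 100
trainer.rebind(dataset=Dataset(1_000))      # swap the dataset symbol...
assert trainer.train_steps == 100           # ...and the derived state updates
```

The point of the sketch is the invariant: callers mutate only the symbolic arguments, never the derived state, so a clone-and-modify workflow can never leave an object internally inconsistent.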
Figure 4: Materializing a (concrete) child program (d) from the search space (a) with an abstract child program (c) proposed by the search algorithm, which holds an abstract search space (b) as the algorithm's view of the (concrete) search space.

Disentangling search spaces from child programs. The search space can be disentangled from the child program in that 1) the classes and functions of the child program can be implemented without depending on any AutoML library (Appendix B.1.1), which applies to most preexisting ML projects whose programs were started without taking AutoML in mind; and 2) a child program can be manipulated into a search space without modifying its implementation. Figure 3 shows that a child program is turned into a search space by replacing a fixed Conv with a choice of Identity, MaxPool and Conv with searchable filter size. Meanwhile, it swaps a fixed Adam optimizer with a choice between the Adam and an RMSProp with a searchable learning rate.

Disentangling search spaces from search algorithms. Symbolic programming breaks the coupling between the search space and the search algorithm by preventing the algorithm from seeing the full search space specification. Instead, the algorithm only sees what it needs to see for the purposes of searching. We refer to the algorithm's view of the search space as the abstract search space. The full specification, in contrast, will be called the concrete search space (or just the "search space" outside this section). The distinction between the concrete and abstract search space is illustrated in Figure 4: the concrete search space acts as a boilerplate for producing concrete child programs, which holds all the program details (e.g., the fixed parts). However, the abstract search space only sees the parts that need decisions, along with their numeric ranges. Based on the abstract search space, an abstract child program is proposed, which can be static numeric values or variables. The static form is for obtaining a concrete child program, shown in Figure 4, while the variable form is used for making a super-program used in efficient NAS – the variables can be either discrete for RL-based use cases or real-valued vectors for gradient-based methods. Mediated by the abstract search space and the abstract child program, the search algorithm can be thoroughly decoupled from the child program. Figure 5 gives a more detailed illustration of Figure 4.

Figure 5: The path from a (concrete) search space to a (concrete) child program. The disentanglement between the search space and the search algorithm is achieved by (1) abstracting the search space, (2) proposing an abstract child program, and (3) materializing the abstract child program into a concrete one.
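The concrete/abstract split can be made tangible with a small sketch. This is plain Python rather than PyGlove's implementation; `OneOf`, `abstract` and `materialize` are invented stand-ins. The algorithm sees only a list of candidate counts (the abstract search space), proposes a list of indices (an abstract child program), and never touches the program details.

```python
import random

class OneOf:
    """A decision point over candidate values inside a concrete search space."""
    def __init__(self, candidates):
        self.candidates = candidates

def abstract(space):
    """The algorithm's view: only the decision points, as candidate counts."""
    points = []
    def walk(node):
        if isinstance(node, OneOf):
            points.append(len(node.candidates))
        elif isinstance(node, dict):
            for v in node.values():
                walk(v)
    walk(space)
    return points

def materialize(space, decisions):
    """Bind an abstract child program (a list of indices) back onto the tree."""
    it = iter(decisions)
    def walk(node):
        if isinstance(node, OneOf):
            return node.candidates[next(it)]
        if isinstance(node, dict):
            return {k: walk(v) for k, v in node.items()}
        return node
    return walk(space)

space = {"op": OneOf(["Identity", "MaxPool", "Conv"]), "lr": OneOf([1e-3, 1e-4])}
abstract_space = abstract(space)                      # [3, 2]: no details leak
decisions = [random.randrange(n) for n in abstract_space]
program = materialize(space, decisions)
```

Because the algorithm's contract is just "a list of integer ranges in, a list of indices out", swapping the search algorithm never requires touching the child program, which is the decoupling the section describes.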

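The variable form of an abstract child program mentioned above is what drives super-programs for efficient NAS. Below is a toy, plain-Python sketch of such a rewrite, not PyGlove's API: the `Switch` class and the uniform branch weights are invented for illustration. A meta-program maps every choice point into a soft mixture over its branches, in the spirit of DARTS-style super-networks, while leaving all other symbols untouched.

```python
class OneOf:
    """A decision point over candidate branches."""
    def __init__(self, candidates):
        self.candidates = candidates

class Switch:
    """A soft mixture over branches, standing in for a super-network node."""
    def __init__(self, branches, weights):
        self.branches, self.weights = branches, weights

    def __call__(self, x):
        # A real implementation would learn `weights` by gradient descent;
        # here they are fixed at uniform for illustration.
        return sum(w * f(x) for f, w in zip(self.branches, self.weights))

def rewrite(node):
    """Meta-program: turn every choice point into a Switch, recursing into dicts."""
    if isinstance(node, OneOf):
        n = len(node.candidates)
        return Switch(node.candidates, [1.0 / n] * n)
    if isinstance(node, dict):
        return {k: rewrite(v) for k, v in node.items()}
    return node

space = {"layer": OneOf([lambda x: x, lambda x: 2 * x]), "lr": 0.1}
supernet = rewrite(space)
assert supernet["layer"](4.0) == 6.0   # 0.5 * 4 + 0.5 * 8
assert supernet["lr"] == 0.1           # non-choice symbols pass through
```

Swapping in a different `rewrite` (say, one that samples a single branch per step) changes the search algorithm without changing the child program, which is the decoupling argued for in this section.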

Figure 6: Rewriting a search space (a) into a super-program (b) required by TuNAS.

Disentangling search algorithms from child programs. While many search algorithms can be implemented by rewriting symbolic objects, complex algorithms such as ENAS [1], DARTS [2] and TuNAS [15] can be decomposed into 1) a child-program-agnostic algorithm, plus 2) a meta-program (e.g., a Python function) which rewrites the search space into a representation required by the search algorithm. The meta-program only manipulates the symbols which are interesting to the search algorithm and ignores the rest. In this way, we can decouple the search algorithm from the child program. For example, the TuNAS [15] algorithm can be decomposed into 1) an implementation of REINFORCE [33] and 2) a rewrite function which transforms the architecture search space into a super-network, and replaces the regular trainer with a trainer that samples and trains the super-network, illustrated in Figure 6. If we want to switch the search algorithm to DARTS [2], we use a different rewrite function that generates a super-network with soft choices, and replaces the trainer with a super-network trainer that updates the choice weights based on the gradients.

2.3 Search space partitioning and complex search flows

Early work [19, 34, 35] shows that factorized search can help partition the computation for optimizing different parts of the program. Yet, complex search flows have been less explored, possibly due in part to their implementation complexity. The effort involved in partitioning a search space and coordinating the search algorithms is usually non-trivial. However, the symbolic tree representation makes search space partitioning a much easier task: with a partition function, we can divide those to-be-determined parts into different groups and optimize each group separately. As a result, each optimization process sees only a portion of the search space – a sub-space – and they work together to optimize the complete search space. Section 3.4 discusses common patterns of such collaboration and how we express complex search flows.

3 AutoML with PyGlove

In this section, we introduce PyGlove, a general symbolic programming library for Python, which also implements our method for AutoML. With examples, we demonstrate how a regular program is made symbolically programmable, then turned into search spaces, searched with different search algorithms and flows in a dozen lines of code.

Figure 7: A regular Python class made symbolically programmable via the symbolize decorator (left), whose object is a symbolic tree (middle), in which all nodes can be symbolically operated (right). For example, we can (i) retrieve all the Layer objects in the tree via query, (ii) clone the object and (iii) modify the copy by swapping all Conv layers with MaxPool layers of the same kernel size using rebind.

3.1 Symbolize a Python program

Table 1: The development cost of dropping PyGlove into existing projects on different ML frameworks. The source code of MNIST is included in Appendix B.5.

    Project                  Original lines of code    Modified lines of code
    PyTorch ResNet [36]      353                       15
    TensorFlow MNIST [37]    120                       24

In PyGlove, preexisting Python programs can be made symbolically programmable with a symbolize decorator. Besides classes, functions can be symbolized too, as discussed in Appendix B.1.2. To facilitate manipulation, PyGlove provides a wide range of symbolic operations. Among them, query, clone and rebind are of special importance as they are foundational to

other symbolic operations. Examples of these operations can be found in Appendix B.2. Figure 7 shows (1) a symbolic Python class, (2) an instance of the class as a symbolic tree, and (3) key symbolic operations which are applicable to a symbolic object. To convey the amount of work required to drop PyGlove into real-life projects, we show the number of lines of code in making a PyTorch [36] and a TensorFlow [37] project searchable in Table 1.

3.2 From a symbolic program to a search space

Figure 8: The child program from Figure 7-2 is turned into a search space.
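As a toy model of the query, clone and rebind operations described in Section 3.1 (plain Python, not PyGlove's API; the `Node` class and its `sym` dict are invented stand-ins for symbolic objects):

```python
import copy

class Node:
    """A minimal symbolic object: its keyword arguments form a tree."""
    def __init__(self, **kwargs):
        self.sym = dict(kwargs)

class Conv(Node): pass
class MaxPool(Node): pass
class Sequential(Node): pass

def children(node):
    for v in node.sym.values():
        if isinstance(v, Node):
            yield v
        elif isinstance(v, list):
            yield from (x for x in v if isinstance(x, Node))

def query(node, predicate):
    """Recursively collect every sub-object matching `predicate`."""
    out = [node] if predicate(node) else []
    for c in children(node):
        out += query(c, predicate)
    return out

def rebind(node, transform):
    """Replace each sub-object with `transform(sub-object)`, bottom-up."""
    for k, v in node.sym.items():
        if isinstance(v, Node):
            node.sym[k] = transform(rebind(v, transform))
        elif isinstance(v, list):
            node.sym[k] = [transform(rebind(x, transform)) if isinstance(x, Node) else x
                           for x in v]
    return node

model = Sequential(layers=[Conv(filters=32, kernel=3), MaxPool(kernel=2)])
convs = query(model, lambda n: isinstance(n, Conv))
assert len(convs) == 1
# Clone, then modify the copy: swap Conv -> MaxPool with the same kernel size.
swapped = rebind(copy.deepcopy(model),
                 lambda n: MaxPool(kernel=n.sym["kernel"]) if isinstance(n, Conv) else n)
assert all(isinstance(l, MaxPool) for l in swapped.sym["layers"])
assert isinstance(model.sym["layers"][0], Conv)   # the original is untouched
```

Cloning before rebinding mirrors the Figure 7 workflow: manipulation happens on copies of the symbolic tree, so the original program remains a reusable template.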
Figure 9: Expressing dependent hyper-parameters by introducing a higher-order symbolic Block class.

With a child program being a symbolic tree, any node in the tree can be replaced with a to-be-determined specification, which we call a hyper value (in correspondence to hyperify, a verb introduced in Section 2.1 for making search spaces). A search space is naturally represented as a symbolic tree with hyper values. In PyGlove, there are three classes of hyper values: 1) a continuous value declared by floatv; 2) a discrete value declared by intv; and 3) a categorical value declared by oneof, manyof or permutate. Table 2 summarizes the hyper value classes with their semantics. Figure 8 shows a search space that jointly optimizes a model and an optimizer. The model space is a number of blocks whose structure is a permutation of [Conv, BatchNormalization, ReLU] with searchable filter size.

Dependent hyper-parameters can be achieved by using higher-order symbolic objects. For example, if we want to search for the filters of a Conv which follows another Conv whose filters are twice the input filters, we can create a symbolic Block class, which takes only one filter size – the output filters of the first Conv – as its hyper-parameter. When it is called, it returns a sequence of 2 Conv units based on its filters, as shown in Figure 9. The filters of the block can be a hyper value at construction time, appearing as a node in the symbolic tree, but will be materialized when it is called.

3.3 Search algorithms

Without interacting with the child program and the search space directly, the search algorithm in PyGlove repeatedly 1) proposes an abstract child program based on the abstract search space and 2) receives measured qualities for the abstract child program to improve future proposals. PyGlove implements many search algorithms, including Random Search, PPO and Regularized Evolution.

Table 2: Hyper value classes and their semantics.

    Strategy       Hyper-parameter annotation        Search space semantics
    Continuous     floatv(min, max)                  A float value from R in [min, max]
    Discrete       intv(min, max)                    An int value from Z in [min, max]
    Categorical    oneof(candidates)                 Choose 1 out of N candidates
                   manyof(K, candidates, ...)        Choose K out of N candidates, with
                                                     optional constraints on the uniqueness
                                                     and order of the chosen candidates
                   permutate(candidates)             A special case of manyof which searches
                                                     for a permutation of all candidates
    Hierarchical   (a categorical hyper value        Conditional search space
                   containing child hyper values)

3.4 Expressing search flows

With a search space, a search algorithm, and an optional search space partition function, a search flow can be expressed as a for-loop, illustrated in Figure 10-left.

Figure 10: PyGlove expresses search as a for-loop (left):

    for trainer, feedback in sample(search_space=hyper_trainer,
                                    algorithm=PPO(), partition_fn=None):
        reward = trainer.train()
        feedback(reward)

Complex search flows can be expressed as compositions of for-loops (right):

    Search type    for-loop pattern
    Joint          for (x, f_x): ...
    Separate       for (x1, f_x1): ... , then for (x2, f_x2): ...
    Factorized     for (x1, f_x1): for (x2, f_x2): ...

Search space partitioning enables various ways of optimizing the divided sub-spaces, resulting in three basic search types: 1) optimize the sub-spaces jointly; 2) optimize the sub-spaces separately; or 3) factorize the optimization. Figure 10-right maps the three search types into different compositions of for-loops. Let's take the search space defined in Figure 8 as an example, which has a hyper-parameter sub-space (the hyper optimizer) and an architectural sub-space (the hyper model). For the two sub-spaces, we can 1) jointly optimize them without specifying a partition function, as is shown in Figure 10-left; 2) separately optimize them, by searching the hyper optimizer first with a fixed model, then using the best optimizer found to optimize the hyper model; or 3) factorize the optimization, by searching the hyper optimizer with a partition function in the outer loop. Each example in the outer loop is a trainer with a fixed optimizer and a hyper model; the latter will be optimized in the inner loop. The combination of these basic patterns can express very complex search flows, which will be further studied through our NAS-Bench-101 experiments discussed in Section 4.3.

3.5 Switching between search spaces
Figure 11: Manipulating the model in a trainer into a search space by relaxing the fixed filters of the Conv into a set of options.

Making changes to the search space is a daily routine for AutoML practitioners, who may move from one search space to another, or combine orthogonal search spaces into more complex ones. For example, we may start by searching for different operations at each layer, then try the idea of searching for different output filters (Figure 11), and eventually end up with searching for both. We showcase such search space exploration in Section 4.2.

3.6 Switching between search algorithms

The search algorithm is another dimension to experiment with. We can easily switch between search algorithms by passing a different algorithm to the sample function shown in Figure 10-1. When applying efficient NAS algorithms, the hyper_trainer will be rewritten into a trainer that samples and trains the super-network transformed from the architectural search space.

4 Case Study

Figure 12: Partial search space definitions for NAS-Bench-101 (top), NAS-FPN (middle) and TuNAS (bottom).

In this section, we demonstrate how, with PyGlove, users can define complex search spaces and explore new search spaces, search algorithms, and search flows with simplicity.

4.1 Expressing complex search spaces

The composition of hyper values can represent complex search spaces. We have reproduced popular NAS papers, including NAS-Bench-101 [38], MNASNet [8], NAS-FPN [39], ProxylessNAS [14], TuNAS [15], and NATS-Bench [40]. Here we use the search spaces from NAS-Bench-101, NAS-FPN, and TuNAS to demonstrate the expressiveness of PyGlove. In the NAS-Bench-101 search space (Figure 12-top), there are N different positions in the network and N(N−1)/2 edge positions that can be independently turned on or off. Each node independently selects one of K possible operations.
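As a quick sanity check on the size of such a space: with every edge independently on/off and every node's operation free, the raw count is 2^(N(N−1)/2) · K^N. The actual NAS-Bench-101 benchmark further prunes invalid and duplicate graphs, so this is only an upper bound; the 7-node, 3-operation setting below matches the benchmark's configuration.

```python
from math import comb

def raw_space_size(n_nodes, n_ops):
    # Every one of the N(N-1)/2 edge positions is independently on/off,
    # and every node independently picks one of K operations. The real
    # benchmark prunes invalid/duplicate graphs, so this is an upper bound.
    n_edges = comb(n_nodes, 2)
    return (2 ** n_edges) * (n_ops ** n_nodes)

assert comb(7, 2) == 21                  # 7 * 6 / 2 edge positions
size = raw_space_size(7, 3)              # 2**21 * 3**7 graphs before pruning
```

The exponential growth in both N and K is why the on/off edge choices and per-node operation choices are expressed as independent hyper values rather than enumerated explicitly.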

The NAS-FPN search space is a repeated FPN cell, each of whose nodes (Figure

10 12-middle)ag-gregatestwooutputsofpreviou
12-middle) aggregates two outputs of previous nodes. The aggregation is either sum or global attention. We use manyof with the constraints distinct and sorted to select input nodes without duplication. The TuNAS search space is a stack of blocks, each containing a number of residual layers (Figure 12-bottom) of inverted bottleneck units, whose filter size, kernel size and expansion factor will be tuned. To search the number of layers in a block, we put Zeros as a candidate in the Residual layer, so the residual layer may downgrade into an identity mapping.

4.2 Exploring search spaces and search algorithms

We use MobileNetV2 [41] as an example to demonstrate how to explore new search spaces and search algorithms. For a fair comparison, we first retrain the MobileNetV2 model on ImageNet to obtain a baseline. With our training setup, it achieves a validation accuracy of 73.1% (Table 3, row 1), compared with 72.0% in the original MobileNetV2 paper. Details about our experiment setup, search space definitions, and the code for creating search spaces can be found in Appendix C.1.

Search space exploration: Similar to previous AutoML works [8, 14], we explore 3 search spaces derived from MobileNetV2 that tune the hyper-parameters of the inverted bottleneck units [41]: (1) Search space S1 tunes the kernel size and expansion ratio. (2) Search space S2 tunes the output filters. (3) Search space S3 combines S1 and S2 to tune the kernel size, expansion ratio and output filters. From Table 3, we can see that with PyGlove we were able to convert MobileNetV2 into S1 with 23 lines of code (row 2) and S2 with 10 lines of code (row 5). From S1 and S2, we obtain S3 in just a single line of code (row 6), using rebind to chain the transform functions from S1 and S2.

Search algorithm exploration: On the search algorithm dimension, we start by exploring different search algorithms on S1 using black-box search algorithms (Random Search [12], Bayesian [26]) and then efficient NAS (TuNAS [15]). To make model sizes comparable, we constrain the search to 300M multiply-adds³ using TuNAS's absolute reward function [15]. To switch between these algorithms, we only had to change 1 line of code.

Table 3: Programming cost of switching between three search spaces and three AutoML algorithms based on PyGlove. Lines of code in red is the cost of creating new search spaces, while the lines of code in black is
the cost for switching algorithms. The unit cost for search and training is defined as the TPU hours to train a MobileNetV2 model on ImageNet for 360 epochs. The test accuracies and MAdds are based on 3 runs.

#  Search space   Search algorithm  Lines of code  Search cost  Train cost  Test accuracy    #MAdds
1  (static)       N/A               N/A            N/A          1           73.1±0.1         300M
2  (static)→S1    RS                +23            25           1           73.7±0.3 (↑0.6)  300±3M
3  S1             RS→Bayesian       +1             25           1           73.9±0.3 (↑0.8)  301±5M
4  S1             Bayesian→TuNAS    +1             1            1           74.2±0.1 (↑1.1)  301±5M
5  (static)→S2    TuNAS             +10            1            1           73.3±0.1 (↑0.2)  302±7M
6  S1,S2→S3       TuNAS             +1             2            1           73.8±0.1 (↑0.7)  302±6M

4.3 Exploring complex search flows on NAS-Bench-101

Figure 13: Mean and standard deviation of search performances with different search flows on NAS-Bench-101 (500 runs), using Regularized Evolution [16].

PyGlove can greatly reduce the engineering cost when exploring complex search flows. In this section, we explore various ways to optimize the NAS-Bench-101 search space. NAS-Bench-101 is a NAS benchmark where the goal is to find high-performing image classifiers in a search space of neural network architectures. This search space requires optimizing both the types of neural network layers used in the model (e.g., 3x3 Conv) and how the layers are connected.

We experiment with three search flows in this exploration: 1) we reproduce the original paper to establish a baseline, which uses the search space defined in Figure 12-top to jointly optimize the nodes and edges. 2) we try a factorized search, which optimizes the nodes in the outer loop and the edges in the inner loop: the reward for a node setting is computed as the average

³For RS and Bayesian, we use rejection sampling to ensure sampled architectures have around 300M MAdds.
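The factorized flow just described (nodes in the outer loop, edges in the inner loop, outer reward taken as the mean of the top-5 inner rewards) can be sketched as plain nested for-loops. The snippet below is a self-contained toy, not PyGlove's actual API: `sample`, `toy_reward`, and the `NODE_SPACE`/`EDGE_SPACE` definitions are stand-ins for NAS-Bench-101 candidate sampling and real model evaluation.

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-ins for the NAS-Bench-101 node and edge spaces:
# one op choice per node, one on/off bit per possible edge.
NODE_SPACE = [["conv3x3", "conv1x1", "maxpool"]] * 3
EDGE_SPACE = [[0, 1]] * 6

def toy_reward(nodes, edges):
    # Hypothetical reward; a real flow would train and evaluate the model.
    return sum(op == "conv3x3" for op in nodes) + 0.1 * sum(edges) + random.random()

def sample(space, n):
    """Stand-in for a sample() generator: yields (candidate, feedback) pairs."""
    for _ in range(n):
        candidate = [random.choice(choices) for choices in space]
        rewards = []           # a real feedback would drive the search algorithm
        yield candidate, rewards.append

# Factorized search flow: optimize nodes in the outer loop; for each node
# setting, search edges in the inner loop and report the mean of the top-5
# inner rewards as the outer reward.
best_nodes, best_outer_reward = None, float("-inf")
for nodes, outer_feedback in sample(NODE_SPACE, n=8):
    inner_rewards = []
    for edges, inner_feedback in sample(EDGE_SPACE, n=20):
        reward = toy_reward(nodes, edges)
        inner_feedback(reward)
        inner_rewards.append(reward)
    outer_reward = statistics.mean(sorted(inner_rewards)[-5:])  # top-5 average
    outer_feedback(outer_reward)
    if outer_reward > best_outer_reward:
        best_nodes, best_outer_reward = nodes, outer_reward

print(best_nodes, round(best_outer_reward, 3))
```

In a real implementation, `sample` would be driven by a search algorithm such as Regularized Evolution and `feedback` would propagate rewards back to it; here it merely records them.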
of the top-5 rewards from the architectures sampled in the inner loop. While its performance is not as good as the baseline under the same search budget, we suspect that under each fixed node setting, the edge space is not explored enough. 3) To alleviate this problem, we come up with a hybrid solution, which uses the first half of the budget to optimize the nodes as in search flow 2, while using the other half to optimize the edges, based on the best node setting found in the first phase. Interestingly, the search trajectory crosses over the baseline in the second phase, ending with a noticeable margin (Figure 13). We used Regularized Evolution [16] for all these searches, each with 500 runs. It takes only 15 lines of code to implement the factorized search and 26 lines of code to implement the hybrid search. Source code is included in Appendix C.2.

5 Related Work

Software frameworks have greatly influenced and fueled the advancement of machine learning. The need for computing gradients has made auto-gradient based frameworks [36, 37, 42–45] flourish. To support modular machine learning programs with the flexibility to modify them, frameworks were introduced with an emphasis on hyper-parameter management [46, 47]. The sensitivity of machine learning to hyper-parameters and model architecture has led to the advent of AutoML libraries [23–31]. Some (e.g., [23–25]) formulate AutoML as a problem of jointly optimizing architectures and hyper-parameters. Others (e.g., [26–28]) focus on providing interfaces for black-box optimization. In particular, Google's Vizier library [26] provides tools for optimizing a user-specified search space using black-box algorithms [12, 48], but makes the end user responsible for translating a point in the search space into a user program. DeepArchitect [29] proposes a language to create a search space as a program that connects user components. Keras-tuner [30] employs a different way to annotate a model into a search space, though this annotation is limited to a list of supported components. Optuna [49] embraces eager evaluation of tunable parameters, making it easy to declare a search space on the go (Appendix B.4). Meanwhile, efficient NAS algorithms [1, 2, 14] brought new challenges to AutoML frameworks, which require coupling between the controller and child program. AutoGluon [28] and NNI [27] partially solve this problem by building predefined modules that work in both general search mode and weight-sharing mode; however, supporting different efficient NAS algorithms is still non-trivial. Among the existing AutoML systems we are aware of, complex search flows are less explored. Compared to existing systems, PyGlove employs a mutable programming model to solve these problems, making AutoML easily accessible to preexisting ML programs. It also accommodates the dynamic interactions among the child programs, search spaces, search algorithms, and search flows to provide the flexibility needed for future AutoML research.

Symbolic programming, where a program manipulates symbolic representations, has a long history dating back to LISP [20]. The symbolic representation can be programs as in meta-programming, rules as in logic programming [50], and math expressions as in symbolic computation [51, 52]. In this work, we introduce the symbolic programming paradigm to AutoML by manipulating a symbolic tree-based representation that encodes the key elements of a machine learning program. Such program manipulation is also reminiscent of program synthesis [53–55], which searches for programs to solve different tasks like string and number manipulation [56–59], question answering [60, 61], and learning tasks [62, 63]. Our method also shares similarities with prior works in non-deterministic programming [64–66], which define non-deterministic operators like choice in the programming environment that can be connected to optimization algorithms. Last but not least, our work echoes the idea of building robust software systems that can cope with unanticipated requirements via advanced symbolic programming [67].

6 Conclusion

In this paper, we reformulate AutoML as an automated process of manipulating an ML program through symbolic programming. Under this formulation, the complex interactions between the child program, the search space, and the search algorithm are elegantly disentangled. Complex search flows can be expressed as compositions of for-loops, greatly simplifying
the programming interface of AutoML without sacrificing flexibility. This is achieved by resolving the conflict between AutoML's intrinsic requirement of modifying programs and the immutable-program premise of existing software libraries. We then introduce PyGlove, a general-purpose symbolic programming library for Python which implements our method and is tested on real-world AutoML scenarios. With PyGlove, AutoML can be easily dropped into preexisting ML programs, with all program parts searchable, permitting rapid exploration of different dimensions of AutoML.

Broader Impact

Symbolic programming / PyGlove makes AutoML more accessible to machine learning practitioners, which means manual trial-and-error of many categories can be replaced by machines. This can also greatly increase the productivity of AutoML research, at the cost of increasing demand for computation and, as a result, increasing CO2 emissions.

We see a big potential in symbolic programming / PyGlove in making machine learning researchers more productive. On a new ground of mutable programs, experiments can be reproduced more easily, modified with lower cost, and shared like data. A large variety of experiments can co-exist in a shared codebase, which makes combining and comparing different techniques more convenient.

Symbolic programming / PyGlove makes it much easier to develop search-based programs, which can be used in a broad spectrum of research and product areas. Some potential areas, such as medicine design, have a clear societal benefit, while other potential applications, such as video surveillance, could improve security while raising new privacy concerns.

Acknowledgments and Disclosure of Funding

We would like to thank Pieter-Jan Kindermans and David Dohan for their help in preparing the case study section of this paper; Jiquan Ngiam and Rishabh Singh for their feedback on early versions of the paper; Ruoming Pang, Vijay Vasudevan, Da Huang, Ming Cheng, Yanping Huang, Jie Yang, Jinsong Mu for their feedback at an early stage of PyGlove; Adams Yu, Daniel Park, Golnaz Ghiasi, Azade Nazi, Thang Luong, Barret Zoph, David So, Daniel De Freitas Adiwardana, Junyang Shen, Lav Rai, Guanhang Wu, Vishy Tirumalashetty, Pengchong Jin, Xianzhi Du, Yeqing Li, Xiaodan Song, Abhanshu Sharma, Cong Li, Mei Chen, Aleksandra Faust, Yingjie Miao, JD Co-Reyes, Kevin Wu, Yanqi Zhang, Berkin Akin, Amir Yazdanbakhsh, Shuyang Cheng, HyoukJoong Lee, Peisheng Li and Barbara Wang for being early adopters of PyGlove and their invaluable feedback.

Funding disclosure: This work was done as a part of the authors' full-time job in Google.

References

[1] Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In The International Conference on Machine Learning (ICML), pages 4092–4101, 2018.
[2] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations (ICLR), 2019.
[3] Gábor Melis, Chris Dyer, and Phil Blunsom. On the state of the art of evaluation in neural language models. In International Conference on Learning Representations (ICLR), 2018.
[4] Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678, 2016.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In The Conference on Neural Information Processing Systems (NeurIPS), pages 1097–1105, 2012.
[6] Mingxing Tan and Quoc Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In The International Conference on Machine Learning (ICML), pages 6105–6114, 2019.
[7] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V. Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8697–8710, 2018.
[8] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V. Le. MnasNet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2820–2828, 2019.
[9] Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations (ICLR), 2017.
[10] Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FBNet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 10734–10742, 2019.
[11] Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, et al. ChamNet: Towards efficient network design through platform-aware model adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11398–11407, 2019.
[12] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research (JMLR), 13(Feb):281–305, 2012.
[13] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In The Conference on Neural Information Processing Systems (NeurIPS), pages 2951–2959, 2012.
[14] Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations (ICLR), 2019.
[15] Gabriel Bender, Hanxiao Liu, Bo Chen, Grace Chu, Shuyang Cheng, Pieter-Jan Kindermans, and Quoc Le. Can weight sharing outperform random architecture search? An investigation with TuNAS. In The IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[16] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V. Le. Regularized evolution for image classifier architecture search. In AAAI Conference on Artificial Intelligence (AAAI), pages 4780–4789, 2019.
[17] Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. SNAS: Stochastic neural architecture search, 2020.
[18] Wei Wen, Hanxiao Liu, Hai Li, Yiran Chen, Gabriel Bender, and Pieter-Jan Kindermans. Neural predictor for neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[19] Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, and Quoc V. Le. AutoHAS: Efficient hyperparameter and architecture search. arXiv preprint arXiv:2006.03656, 2020.
[20] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part I. Communications of the ACM, 3(4):184–195, 1960.
[21] Symbolic programming. https://en.wikipedia.org/wiki/Symbolic_programming.
[22] Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V. Le, and Alexey Kurakin. Large-scale evolution of image classifiers. In The International Conference on Machine Learning (ICML), pages 2902–2911, 2017.
[23] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. Efficient and robust automated machine learning. In The Conference on Neural Information Processing Systems (NeurIPS), pages 2962–2970, 2015.
[24] Lars Kotthoff, Chris Thornton, Holger H. Hoos, Frank Hutter, and Kevin Leyton-Brown. Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA. The Journal of Machine Learning Research (JMLR), 18(1):826–830, 2017.
[25] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. Auto-sklearn: Efficient and robust automated machine learning. In Automated Machine Learning, pages 113–134. Springer, 2019.
[26] Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D. Sculley. Google Vizier: A service for black-box optimization. In The SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1487–1495, 2017.
[27] Neural network intelligence. https://github.com/microsoft/nni.
[28] Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. AutoGluon-Tabular: Robust and accurate AutoML for structured data. arXiv preprint arXiv:2003.06505, 2020.
[29] Renato Negrinho, Darshan Patil, Nghia Le, Daniel Ferreira, Matthew Gormley, and Geoffrey Gordon. Towards modular and programmable architecture search. In The Conference on Neural Information Processing Systems (NeurIPS), pages 13715–13725, 2019.
[30] Keras Tuner. https://github.com/keras-team/keras-tuner.
[31] Haifeng Jin, Qingquan Song, and Xia Hu. Auto-Keras: An efficient neural architecture search system. In The SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1946–1956, 2019.
[32] The Lego Group. Lego. https://en.wikipedia.org/wiki/Lego.
[33] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
[34] Jeshua Bratman, Satinder Singh, Jonathan Sorg, and Richard Lewis. Strong mitigation: Nesting search for good policies within search for good reward. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, pages 407–414. International Foundation for Autonomous Agents and Multiagent Systems, 2012.
[35] Arber Zela, Aaron Klein, Stefan Falkner, and Frank Hutter. Towards automated deep learning: Efficient joint neural architecture and hyperparameter search. CoRR, abs/1807.06906, 2018.
[36] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In The Conference on Neural Information Processing Systems (NeurIPS), pages 8024–8035, 2019.
[37] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learning. In The USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
[38] Chris Ying, Aaron Klein, Eric Christiansen, Esteban Real, Kevin Murphy, and Frank Hutter. NAS-Bench-101: Towards reproducible neural architecture search. In The International Conference on Machine Learning (ICML), pages 7105–7114, 2019.
[39] Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, and Quoc V. Le. NAS-FPN: Learning scalable feature pyramid architecture for object detection. CoRR, abs/1904.07392, 2019.
[40] Xuanyi Dong, Lu Liu, Katarzyna Musial, and Bogdan Gabrys. NATS-Bench: Benchmarking NAS algorithms for architecture topology and size. arXiv preprint arXiv:2009.00437, 2020.
[41] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. MobileNetV2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
[42] James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: A CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), pages 18–24, 2010.
[43] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In The ACM International Conference on Multimedia (ACM MM), pages 675–678, 2014.
[44] Seiya Tokui. Chainer: A powerful, flexible and intuitive framework of neural networks, 2018.
[45] Roy Frostig, Matthew Johnson, and Chris Leary. Compiling machine learning programs via high-level tracing. In Conference on Machine Learning and Systems (MLSys), 2018.
[46] Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, et al. Lingvo: A modular and scalable framework for sequence-to-sequence modeling. arXiv preprint arXiv:1902.08295, 2019.
[47] Gin-config. https://github.com/google/gin-config.
[48] Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M. Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
[49] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
[50] Alain Colmerauer and Philippe Roussel. The birth of Prolog. In History of Programming Languages—II, pages 331–367, 1996.
[51] Wolfram Research, Inc. Mathematica, Version 12.1. Champaign, IL, 2020.
[52] Bruno Buchberger et al. Symbolic computation (an editorial). J. Symbolic Comput., 1(1):1–6, 1985.
[53] Harold Abelson and Gerald Jay Sussman. Structure and Interpretation of Computer Programs. The MIT Press, 1996.
[54] Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. Synthesis of loop-free programs. ACM SIGPLAN Notices, 46(6):62–73, 2011.
[55] Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh. Program synthesis. Foundations and Trends in Programming Languages, 4(1-2):1–119, 2017.
[56] Oleksandr Polozov and Sumit Gulwani. FlashMeta: A framework for inductive program synthesis. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 107–126, 2015.
[57] Emilio Parisotto, Abdelrahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. Neuro-symbolic program synthesis. In International Conference on Learning Representations (ICLR), 2017.
[58] Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdelrahman Mohamed, and Pushmeet Kohli. RobustFill: Neural program learning under noisy I/O. In The International Conference on Machine Learning (ICML), pages 990–998, 2017.
[59] Matej Balog, Alexander L. Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to write programs. In International Conference on Learning Representations (ICLR), 2017.
[60] Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V. Le, and Ni Lao. Memory augmented policy optimization for program synthesis and semantic parsing. In The Conference on Neural Information Processing Systems (NeurIPS), pages 9994–10006, 2018.
[61] Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. Neural programmer: Inducing latent programs with gradient descent. In International Conference on Learning Representations (ICLR), 2016.
[62] Esteban Real, Chen Liang, David R. So, and Quoc V. Le. AutoML-Zero: Evolving machine learning algorithms from scratch. In The International Conference on Machine Learning (ICML), 2020.
[63] Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton, and Swarat Chaudhuri. HOUDINI: Lifelong learning as program synthesis. In The Conference on Neural Information Processing Systems (NeurIPS), pages 8687–8698, 2018.
[64] D. Andre and S. Russell. State abstraction in programmable reinforcement learning. In AAAI Conference on Artificial Intelligence (AAAI), 2002.
[65] Harald Søndergaard and Peter Sestoft. Non-determinism in functional languages. The Computer Journal, 35(5):514–523, 1992.
[66] Armando Solar-Lezama. The sketching approach to program synthesis. In Asian Symposium on Programming Languages and Systems, pages 4–13. Springer, 2009.
[67] G. Sussman. Building robust systems: An essay. Massachusetts Institute of Technology.
