PyGlove: Symbolic Programming for Automated Machine Learning

Daiyi Peng, Xuanyi Dong, Esteban Real, Mingxing Tan, Yifeng Lu, Hanxiao Liu, Gabriel Bender, Adam Kraft, Chen Liang, Quoc V. Le
Google Research, Brain Team
{daiyip, ereal, tanmingxing, yifenglu, hanxiaol, gbender, adamkraft, crazydonkey, qvl}@google.com
xuanyi.dxy@gmail.com

Abstract

Neural networks are sensitive to hyper-parameter and architecture choices. Automated Machine Learning (AutoML) is a promising paradigm for automating these choices. Current ML software libraries, however, are quite limited in handling the dynamic interactions among the components of AutoML. For example, efficient NAS algorithms, such as ENAS [1] and DARTS [2], typically require an implementation coupling between the search space and search algorithm, the two key components in AutoML. Furthermore, implementing a complex search flow, such as searching architectures within a loop of searching hardware configurations, is difficult. To summarize, changing the search space, search algorithm, or search flow in current ML libraries usually requires a significant change in the program logic. In this paper, we introduce a new way of programming AutoML based on symbolic programming. Under this paradigm, ML programs are mutable, thus can be manipulated easily by another program. As a result, AutoML can be reformulated as an automated process of symbolic manipulation. With this formulation, we decouple the triangle of the search algorithm, the search space and the child program. This decoupling makes it easy to change the search space and search algorithm (without and with weight sharing), as well as to add search capabilities to existing code and implement complex search flows. We then introduce PyGlove, a new Python library that implements this paradigm. Through case studies on ImageNet and NAS-Bench-101, we show that with PyGlove users can easily convert a static program into a search space, quickly iterate on the search spaces and search algorithms, and craft complex search flows to achieve better results.

1 Introduction

Neural networks are sensitive to architecture and hyper-parameter choices [3, 4]. For example, on the ImageNet dataset [5], we have observed a large increase in accuracy thanks to changes in architectures, hyper-parameters, and training algorithms, from the seminal work
of AlexNet [5] to recent state-of-the-art models such as EfficientNet [6]. However, as neural networks become increasingly complex, the potential number of architecture and hyper-parameter choices becomes numerous. Hand-crafting neural network architectures and selecting the right hyper-parameters is, therefore, increasingly difficult and often takes months of experimentation.

Automated Machine Learning (AutoML) is a promising paradigm for tackling this difficulty. In AutoML, selecting architectures and hyper-parameters is formulated as a search problem, where a search space is defined to represent all possible choices and a search algorithm is used to find the best choices. For hyper-parameter search, the search space would specify the range of values to try. For architecture search, the search space would specify the architectural configurations to try. The search space plays a critical role in the success of neural architecture search (NAS) [7, 8], and can be significantly different from one application to another [8–11]. In addition, there are also many different search algorithms, such as random search [12], Bayesian optimization [13], RL-based methods [1, 9, 14, 15], evolutionary methods [16], gradient-based methods [2, 10, 17] and neural predictors [18].

Footnote: Work done as a research intern at Google.
34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada.

This proliferation of search spaces and search algorithms in AutoML makes it difficult to program with existing software libraries. In particular, a common problem of current libraries is that search spaces and search algorithms are tightly coupled, making it hard to modify the search space or the search algorithm alone. A practical scenario that arises is the need to upgrade a search algorithm while keeping the rest of the infrastructure the same. For example, recent years have seen a transition from AutoML algorithms that train each model from scratch [8, 9] to those that employ weight-sharing to attain massive efficiency gains, such as ENAS and DARTS [1, 2, 14, 15, 19]. Yet, upgrading an existing search space by introducing weight-sharing requires significant changes to both the search algorithm and the model building logic, as we will see in Section 2.2. Such coupling between search spaces and search algorithms, and the resulting inflexibility,
impose a heavy burden on AutoML researchers and practitioners.

We believe that the main challenge lies in the programming paradigm mismatch between existing software libraries and AutoML. Most existing libraries are built on the premise of immutable programs, where a fixed program is used to process different data. On the contrary, AutoML requires programs (i.e. model architectures) to be mutable, as they must be dynamically modified by another program (i.e. the search algorithm) whose job is to explore the search space. Due to this mismatch, predefined interfaces for search spaces and search algorithms struggle to accommodate unanticipated interactions, making it difficult to try new AutoML approaches. Symbolic programming, which originated from LISP [20], provides a potential solution to this problem, by allowing a program to manipulate its own components as if they were plain data [21]. However, despite its long history, symbolic programming has not yet been widely explored in the ML community.

In this paper, we reformulate AutoML as an automated process of manipulating ML programs symbolically. Under this formulation, programs are mutable objects which can be cloned and modified after their creation. These mutable objects can express standard machine learning concepts, from a convolutional unit to a complex user-defined training procedure. As a result, all parts of an ML program are mutable. Moreover, through symbolic programming, programs can modify programs. Therefore the interactions between the child program, search space, and search algorithm are no longer static. We can mediate them or change them via meta-programs. For example, we can map the search space into an abstract view which is understood by the search algorithm, translating an architectural search space into a super-network that can be optimized by efficient NAS algorithms.

Further, we propose PyGlove, a library that enables general symbolic programming in Python, as an implementation of our method tested on real-world AutoML scenarios. With PyGlove, Python classes and functions can be made mutable through brief Python annotations, which makes it much easier to write AutoML programs. PyGlove allows AutoML techniques to be easily dropped into preexisting ML pipelines, while also benefiting open-ended research which requires extreme flexibility.

To summarize, our contributions are the following:

• We reformulate AutoML under the symbolic programming paradigm, greatly simplifying the programming interface for AutoML by accommodating unanticipated interactions among the child programs, search spaces and search algorithms via a mutable object model.
• We introduce PyGlove, a general symbolic programming library for Python which implements our symbolic formulation of AutoML. With PyGlove, AutoML can be easily dropped into preexisting ML programs, with all program parts searchable, permitting rapid exploration on different dimensions of AutoML.
• Through case studies, we demonstrate the expressiveness of PyGlove in real-world search spaces. We demonstrate how PyGlove allows AutoML researchers and practitioners to change search spaces, search algorithms and search flows with only a few lines of code.

2 Symbolic Programming for AutoML

Many AutoML approaches (e.g., [2, 9, 22]) can be formulated as three interacting components: the child program, the search space, and the search algorithm. AutoML's goal is to discover a performant child program (e.g., a neural network architecture or a data augmentation policy) out of a large set
of possibilities defined by the search space. The search algorithm accomplishes the said goal by iteratively sampling child programs from the search space. Each sampled child program is then evaluated, resulting in a numeric measure of its quality. This measure is called the reward (footnote 2). The reward is then fed back to the search algorithm to improve future sampling of child programs.

In typical AutoML libraries [23–31], these three components are usually tightly coupled. The coupling between these components means that we cannot change the interactions between them unless non-trivial modifications are made. This limits the flexibility of the libraries. Some successful attempts have been made to break these couplings. For example, Vizier [26] decouples the search space and the search algorithm by using a dictionary as the search space contract between the child program and the search algorithm, resulting in modular black-box search algorithms. Another example is the NNI library [27], which tries to unify search algorithms with and without weight sharing by carefully designed APIs. This paper, however, solves the coupling problem in a different and more general way: with symbolic programming, programs are allowed to be modified by other programs. Therefore, instead of solving fixed couplings, we allow dynamic couplings through a mutable object model. In this section, we will explain our method and show how this makes AutoML programming more flexible.

2.1 AutoML as an Automated Symbolic Manipulation Process

AutoML can be interpreted as an automated process of searching for a child program from a search space to maximize a reward. We decompose this process into a sequence of symbolic operations. A (regular) child program (Figure 1-a) is symbolized into a symbolic child program (Figure 1-b), which can be then cloned and modified. The symbolic program is further hyperified into a search space (Figure 1-c) by replacing some of the fixed parts with to-be-determined specifications. During the search, the search space is materialized into different child programs (Figure 1-d) based on search algorithm decisions, or can be rewritten into a super-program (Figure 1-e) to apply complex search algorithms such as efficient NAS.
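The symbolize, hyperify and materialize steps of Section 2.1 can be mimicked in a few lines of plain Python. The sketch below is not PyGlove and does not use its API: `OneOf`, `hyperify`, `materialize` and `all_children` are hypothetical stand-ins introduced here, and a flat dict plays the role of the symbolic child program.

```python
import copy
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class OneOf:
    """Hypothetical placeholder for a to-be-determined part: pick one candidate."""
    candidates: tuple


def hyperify(program, replacements):
    """Return a copy of `program` whose listed keys are replaced by placeholders."""
    space = copy.deepcopy(program)
    space.update(replacements)  # existing keys keep their original order
    return space


def materialize(space, decisions):
    """Fill each placeholder, in key order, with the candidate chosen by `decisions`."""
    picks = iter(decisions)
    return {
        key: value.candidates[next(picks)] if isinstance(value, OneOf) else value
        for key, value in space.items()
    }


def all_children(space):
    """Enumerate every concrete child program that the search space describes."""
    sizes = [len(v.candidates) for v in space.values() if isinstance(v, OneOf)]
    for decisions in itertools.product(*(range(n) for n in sizes)):
        yield materialize(space, decisions)


# A "symbolized" child program: its hyper-parameters are plain, mutable data.
child = {"layer": "Conv", "filters": 64, "optimizer": "Adam"}

# Hyperify: two fixed parts become to-be-determined specifications.
search_space = hyperify(child, {
    "layer": OneOf(("Identity", "MaxPool", "Conv")),
    "optimizer": OneOf(("Adam", "RMSProp")),
})

# Materialize: every combination of decisions yields one concrete child program.
children = list(all_children(search_space))
```

The original `child` is left untouched, which mirrors the clone-then-modify style the text describes; a real search algorithm would pick decisions one at a time rather than enumerate them all.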
Figure 1: AutoML as an automated symbolic manipulation process.

An analogy to this process is to have a robot build a house with LEGO [32] bricks to meet a human being's taste: symbolizing a regular program is like converting molded plastic parts into LEGO bricks; hyperifying a symbolic program into a search space is like providing a blueprint of the house with variations. With the help of the search algorithm, the search space is materialized into different child programs whose rewards are fed back to the search algorithm to improve future sampling, like a robot trying different ways to build the house and gradually learning what humans prefer.

Figure 2: Symbolizing classes into mutable symbolic trees. Their hyper-parameters are like the studs of LEGO bricks, while their implementations are less interesting while we manipulate the trees.

Symbolization. A (regular) child program can be described as a complex object, which is a composition of its sub-objects. A symbolic child program is such a composition whose sub-objects are no longer tied together forever, but are detachable from each other hence can be replaced by other sub-objects. The symbolic object can be hierarchical, forming a symbolic tree which can be manipulated or executed. A symbolic object is manipulated through its hyper-parameters, which are like the studs of a LEGO brick, interfacing connections with other bricks. However, symbolic objects, unlike LEGO bricks, can have internal states which are automatically recomputed upon modifications. For example, when we change the dataset of a trainer, the train steps will be recomputed from the number of examples in the dataset if the training is based on the number of epochs. With such a mutable object model, we no longer need to create objects from scratch repeatedly, or modify the producers upstream, but can clone existing objects and modify them into new ones. The symbolic tree representation puts an emphasis on manipulating the object definitions, while leaving the implementation details behind. Figure 2 illustrates the symbolization process.

Footnote 2: While we use RL concepts to illustrate the core idea of our method, as will be shown later, the proposed paradigm is applicable to other types of AutoML methods as well.

2.2 Disentangling AutoML through Symbolic Programming

Figure 3: Hyperifying a child program into a search space by replacing fixed parts with to-be-determined specifications.
Figure 4: Materializing a (concrete) child program (d) from the search space (a) with an abstract child program (c) proposed from the search algorithm, which holds an abstract search space (b) as the algorithm's view for the (concrete) search space.

Disentangling search spaces from child programs. The search space can be disentangled from the child program in that 1) the classes and functions of the child program can be implemented without depending on any AutoML library (Appendix B.1.1), which applies to most preexisting ML projects whose programs were started without AutoML in mind; 2) a child program can be manipulated into a search space without modifying its implementation. Figure 3 shows that a child program is turned into a search space by replacing a fixed Conv with a choice of Identity, MaxPool and Conv with searchable filter size. Meanwhile, it swaps a fixed Adam optimizer with a choice between Adam and an RMSProp with a searchable learning rate.

Disentangling search spaces from search algorithms. Symbolic programming breaks the coupling between the search space and the search algorithm by preventing the algorithm from seeing the full search space specification. Instead, the algorithm only sees what it needs to see for the purposes of searching. We refer to the algorithm's view of the search space as the abstract search space. The full specification, in contrast, will be called the concrete search space (or just the "search space" outside this section). The distinction between the concrete and abstract search space is illustrated in Figure 4: the concrete search space acts as a boilerplate for producing concrete child programs, which holds all the program details (e.g., the fixed parts). However, the abstract search space only sees the parts that need decisions, along with their numeric ranges. Based on the abstract search space, an abstract child program is proposed, which can be static numeric values or variables. The static form is for obtaining a concrete child program, shown in Figure 4, while the variable form is used for making a super-program used in efficient NAS; the variables can be either discrete for RL-based use cases or real-valued vectors for gradient-based methods. Mediated by the abstract search space and the abstract child program, the search algorithm can be thoroughly decoupled from the child program. Figure 5 gives a more detailed illustration of Figure 4.

Figure 5: The path from a (concrete) search space to a (concrete) child program. The disentanglement between the search space and the search algorithm is achieved by (1) abstracting the search space, (2) proposing an abstract child program, and (3) materializing the abstract child program into a concrete one.
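The three steps in the Figure 5 caption (abstract, propose, materialize) can be sketched with nested dicts. This is an illustrative toy under stated assumptions, not PyGlove's implementation: `OneOf`, `abstract`, `propose` and `materialize` are hypothetical names, and the "search algorithm" is plain random search that sees only the number of choices at each decision point.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class OneOf:
    """Hypothetical categorical placeholder inside a concrete search space."""
    candidates: tuple


def abstract(space, path=()):
    """Algorithm's view: a list of (path, number_of_choices) decision points."""
    points = []
    for key, value in space.items():
        if isinstance(value, OneOf):
            points.append((path + (key,), len(value.candidates)))
        elif isinstance(value, dict):
            points.extend(abstract(value, path + (key,)))
    return points


def propose(abstract_space, rng):
    """Abstract child program: one integer per decision point, nothing else."""
    return {path: rng.randrange(n) for path, n in abstract_space}


def materialize(space, decisions, path=()):
    """Combine the concrete space (fixed parts included) with the decisions."""
    child = {}
    for key, value in space.items():
        here = path + (key,)
        if isinstance(value, OneOf):
            child[key] = value.candidates[decisions[here]]
        elif isinstance(value, dict):
            child[key] = materialize(value, decisions, here)
        else:
            child[key] = value  # fixed part: copied through unchanged
    return child


concrete_space = {
    "model": {"op": OneOf(("Identity", "MaxPool", "Conv")), "repeats": 3},
    "optimizer": OneOf(("Adam", "RMSProp")),
}

abstract_space = abstract(concrete_space)                    # step (1)
abstract_child = propose(abstract_space, random.Random(0))   # step (2)
child = materialize(concrete_space, abstract_child)          # step (3)
```

Note that `propose` never receives `concrete_space`; swapping in a different algorithm only requires another function with the same abstract inputs and outputs.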

Figure 6: Rewriting a search space (a) into a super-program (b) required by TuNAS.

Disentangling search algorithms from child programs. While many search algorithms can be implemented by rewriting symbolic objects, complex algorithms such as ENAS [1], DARTS [2] and TuNAS [15] can be decomposed into 1) a child-program-agnostic algorithm, plus 2) a meta-program (e.g. a Python function) which rewrites the search space into a representation required by the search algorithm. The meta-program only manipulates the symbols which are interesting to the search algorithm and ignores the rest. In this way, we can decouple the search algorithm from the child program.

For example, the TuNAS [15] algorithm can be decomposed into 1) an implementation of REINFORCE [33] and 2) a rewrite function which transforms the architecture search space into a super-network, and replaces the regular trainer with a trainer that samples and trains the super-network, illustrated in Figure 6. If we want to switch the search algorithm to DARTS [2], we use a different rewrite function that generates a super-network with soft choices, and replace the trainer with a super-network trainer that updates the choice weights based on the gradients.

2.3 Search space partitioning and complex search flows

Early work [19, 34, 35] shows that factorized search can help partition the computation for optimizing different parts of the program. Yet, complex search flows have been less explored, possibly due in part to their implementation complexity. The effort involved in partitioning a search space and coordinating the search algorithms is usually non-trivial. However, the symbolic tree representation makes search space partitioning a much easier task: with a partition function, we can divide those to-be-determined parts into different groups and optimize each group separately. As a result, each optimization process sees only a portion of the search space (a sub-space), and they work together to optimize the complete search space. Section 3.4 discusses common patterns of such collaboration and how we express complex search flows.

3 AutoML with PyGlove

In this section, we introduce PyGlove, a general symbolic programming library on Python, which also implements our method for AutoML. With examples, we demonstrate how a regular program is made symbolically programmable, then turned into search spaces, searched with different search algorithms and flows in a dozen lines of code.

Figure 7: A regular Python class made symbolically programmable via the symbolize decorator (left), whose object is a symbolic tree (middle), in which all nodes can be symbolically operated (right). For example, we can (i) retrieve all the Layer objects in the tree via query, (ii) clone the object and (iii) modify the copy by swapping all Conv layers with MaxPool layers of the same kernel size using rebind.

3.1 Symbolize a Python program

Table 1: The development cost of dropping PyGlove into existing projects on different ML frameworks. The source code of MNIST is included in Appendix B.5.

Projects                 Original lines of code   Modified lines of code
PyTorch ResNet [36]      353                      15
TensorFlow MNIST [37]    120                      24

In PyGlove, preexisting Python programs can be made symbolically programmable with a symbolize decorator. Besides classes, functions can be symbolized too, as discussed in Appendix B.1.2. To facilitate manipulation, PyGlove provides a wide range of symbolic operations. Among them, query, clone and rebind are of special importance as they are foundational to

other symbolic operations. Examples of these operations can be found in Appendix B.2. Figure 7 shows (1) a symbolic Python class, (2) an instance of the class as a symbolic tree, and (3) key symbolic operations which are applicable to a symbolic object. To convey the amount of work required to drop PyGlove into real-life projects, we show the number of lines of code needed to make a PyTorch [36] project and a TensorFlow [37] project searchable in Table 1.

3.2 From a symbolic program to a search space

Figure 8: The child program from Figure 7-2 is turned into a search space.
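Section 3.1 singles out query, clone and rebind, and the Figure 7 caption describes using them to swap every Conv for a MaxPool of the same kernel size. The toy below imitates that behaviour in plain Python so the mechanics are visible. It is a hypothetical stand-in, not PyGlove's API: the `Symbolic` base class, its `hyper` dict and the method signatures are assumptions made for illustration.

```python
import copy


class Symbolic:
    """Toy symbolic object: constructor arguments are kept as editable fields."""

    def __init__(self, **hyper):
        self.hyper = dict(hyper)

    def __eq__(self, other):
        return type(self) is type(other) and self.hyper == other.hyper

    def __repr__(self):
        args = ", ".join(f"{k}={v!r}" for k, v in self.hyper.items())
        return f"{type(self).__name__}({args})"

    def clone(self):
        """Deep copy, so edits to the copy never touch the original tree."""
        return copy.deepcopy(self)

    def query(self, predicate):
        """Collect every node in the tree (self included) matching `predicate`."""
        found = [self] if predicate(self) else []
        for value in self.hyper.values():
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, Symbolic):
                    found.extend(item.query(predicate))
        return found

    def rebind(self, fn):
        """Replace each field value v under key k with fn(k, v), recursively."""
        for key, value in list(self.hyper.items()):
            if isinstance(value, list):
                self.hyper[key] = [self._visit(key, v, fn) for v in value]
            else:
                self.hyper[key] = self._visit(key, value, fn)
        return self

    @staticmethod
    def _visit(key, value, fn):
        value = fn(key, value)
        if isinstance(value, Symbolic):
            value.rebind(fn)  # descend into whatever node is now in place
        return value


class Layer(Symbolic): pass
class Conv(Layer): pass
class MaxPool(Layer): pass
class Sequential(Symbolic): pass
class Trainer(Symbolic): pass


trainer = Trainer(
    model=Sequential(layers=[Conv(filters=64, kernel=3), Conv(filters=128, kernel=5)]),
    optimizer="Adam",
)


def swap(key, value):
    """Swap each Conv for a MaxPool that keeps the same kernel size."""
    if isinstance(value, Conv):
        return MaxPool(kernel=value.hyper["kernel"])
    return value


layers = trainer.query(lambda node: isinstance(node, Layer))  # (i) query
modified = trainer.clone().rebind(swap)                       # (ii) clone, (iii) rebind
```

Because `rebind` takes an arbitrary function of key and value, the same mechanism could return a placeholder for a set of options instead of a concrete layer, which is how a program would become a search space in this toy model.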
Figure 9: Expressing dependent hyper-parameters by introducing a higher-order symbolic Block class.

With a child program being a symbolic tree, any node in the tree can be replaced with a to-be-determined specification, which we call a hyper value (in correspondence to hyperify, a verb introduced in Section 2.1 for making search spaces). A search space is naturally represented as a symbolic tree with hyper values. In PyGlove, there are three classes of hyper values: 1) a continuous value declared by floatv; 2) a discrete value declared by intv; and 3) a categorical value declared by oneof, manyof or permutate. Table 2 summarizes different hyper value classes with their semantics. Figure 8 shows a search space that jointly optimizes a model and an optimizer. The model space is a number of blocks whose structure is a sequence of permutation from [Conv, BatchNormalization, ReLU] with searchable filter size.

Dependent hyper-parameters can be achieved by using higher-order symbolic objects. For example, if we want to search for the filters of a Conv, which follows another Conv whose filters are twice the input filters, we can create a symbolic Block class, which takes only one filter size (the output filters of the first Conv) as its hyper-parameter. When it is called, it returns a sequence of 2 Conv units based on its filters, as shown in Figure 9. The filters of the block can be a hyper value at construction time, appearing as a node in the symbolic tree, but will be materialized when it is called.

3.3 Search algorithms

Without interacting with the child program and the search space directly, the search algorithm in PyGlove repeatedly 1) proposes an abstract child program based on the abstract search space and 2) receives measured qualities for the abstract child program to improve future proposals. PyGlove implements many search algorithms, including Random Search, PPO and Regularized Evolution.

Table 2: Hyper value classes and their semantics.

Strategy       Hyper-parameter annotation     Search space semantics
Continuous     floatv(min, max)               A float value from R[min, max]
Discrete       intv(min, max)                 An int value from Z[min, max]
Categorical    oneof(candidates)              Choose 1 out of N candidates
               manyof(K, candidates, ...)     Choose K out of N candidates with optional constraints on the uniqueness and order of chosen candidates
               permutate(candidates)          A special case of manyof which searches for a permutation of all candidates
Hierarchical   (when a categorical hyper value contains child hyper values)   Conditional search space

3.4 Expressing search flows

With a search space, a search algorithm, and an optional search space partition function, a search flow can be expressed as a for-loop, illustrated in Figure 10-left. Search space partitioning enables various ways of optimizing the divided sub-spaces, resulting in three basic search types: 1) optimize the sub-spaces jointly; 2) optimize the sub-spaces separately; or 3) factorize the optimization. Figure 10-right maps the three search types into different compositions of for-loops.

Search type    for-loop pattern
Joint          for (x, f_x): ...
Separate       for (x1, f_x1): ...   for (x2, f_x2): ...
Factorized     for (x1, f_x1): for (x2, f_x2): ...

Figure 10: PyGlove expresses search as a for-loop (left). Complex search flows can be expressed as compositions of for-loops (right).

Let's take the search space defined in Figure 8 as an example, which has a hyper-parameter sub-space (the hyper optimizer) and an architectural sub-space (the hyper model). Towards the two sub-spaces, we can 1) jointly optimize them without specifying a partition function, as is shown in Figure 10-left; 2) separately optimize them, by searching the hyper optimizer first with a fixed model, then using the best optimizer found to optimize the hyper model; or 3) factorize the optimization, by searching the hyper optimizer with a partition function in the outer loop. Each example in the loop is a trainer with a fixed optimizer and a hyper model; the latter will be optimized in the inner loop. The combination of these basic patterns can express very complex search flows, which will be further studied through our NAS-Bench-101 experiments discussed in Section 4.3.

3.5 Switching between search spaces
Figure 11: Manipulating the model in a trainer into a search space by relaxing the fixed filters of the Conv as a set of options.

Making changes to the search space is a daily routine for AutoML practitioners, who may move from one search space to another, or combine orthogonal search spaces into more complex ones. For example, we may start by searching for different operations at each layer, then try the idea of searching for different output filters (Figure 11), and eventually end up searching for both. We showcase such search space exploration in Section 4.2.

3.6 Switching between search algorithms

The search algorithm is another dimension to experiment with. We can easily switch between search algorithms by passing a different algorithm to the sample function shown in Figure 10-1. When applying efficient NAS algorithms, the hyper_trainer will be rewritten into a trainer that samples and trains the super-network transformed from the architectural search space.

4 Case Study

Figure 12: Partial search space definition for NAS-Bench-101 (top), NAS-FPN (middle) and TuNAS (bottom).

In this section, we demonstrate how, with PyGlove, users can define complex search spaces and explore new search spaces, search algorithms, and search flows with simplicity.

4.1 Expressing complex search spaces

The composition of hyper values can represent complex search spaces. We have reproduced popular NAS papers, including NAS-Bench-101 [38], MNASNet [8], NAS-FPN [39], ProxylessNAS [14], TuNAS [15], and NATS-Bench [40]. Here we use the search spaces from NAS-Bench-101, NAS-FPN, and TuNAS to demonstrate the expressiveness of PyGlove.

In the NAS-Bench-101 search space (Figure 12-top), there are N different positions in the network and $\binom{N}{2} = \frac{N(N-1)}{2}$ edge positions that can be independently turned on or off. Each node independently selects one of K possible operations.
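To make the counting in the NAS-Bench-101 description concrete, the short calculation below multiplies the independent choices it lists: one on/off decision per edge position and one of K operations per node. The values N = 7 and K = 3 are illustrative inputs chosen here, not figures taken from the text, and the result is a raw count of combinations that ignores any validity rules or duplicate architectures a benchmark may exclude.

```python
from math import comb


def edge_positions(num_nodes):
    """Number of node pairs, i.e. N choose 2 = N(N-1)/2."""
    return comb(num_nodes, 2)


def raw_space_size(num_nodes, num_ops):
    """Independent choices multiplied: 2 per edge position, K per node."""
    return 2 ** edge_positions(num_nodes) * num_ops ** num_nodes


n, k = 7, 3  # illustrative values only
edges = edge_positions(n)
size = raw_space_size(n, k)
print(f"N={n}, K={k}: {edges} edge positions, {size:,} raw combinations")
```

Even for these small inputs the raw count runs into the billions, which is why the search space is declared compositionally rather than enumerated.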
RQHRI �@ NHUQHO %DWFK1RUPDOL]DWLRQ 5H/8 @ QXPBEORFNV LQWY RSWLPL]HU RQHRI � $GDPH 5063URS IORDWY HH @ IRU WUDLQHUIHHGEDFN LQ VDPSOH VHDUFKBVSDFH K\SHUBWUDLQHU DOJRULWKP 332 SDUWLWLRQBIQ 1RQH UHZDUG WUDLQHUWUDLQ IHHGEDFN UHZDUG GHI UHOD[BILOWHUVNYSDUHQW LI LVLQVWDQFHSDUHQW&RQY LI N ´ILOWHUVµ UHWXUQ RQHRI �YYY\r@ UHWXUQ Y K\SHUBWUDLQHU WUDLQHU FORQH UHELQG UHOD[BILOWHUV #V\PEROL]H GHIEORFN ILOWHUV �UHWXUQ6HTXHQWLDO &RQY ILOWHUV &RQY ILOWHUV\r @ GHI VZDSNY&RQY LI LVLQVWDQFH Y&RQY UHWXUQ'HQVHYILOWHUV UHWXUQ Y SULQW WUDLQHU TXHU\ ODPEGD Y LVLQVWDQFH Y/D\HU WUDLQHU FORQH UHELQG VZDS 7UDLQHU PRGHO 5HV1HW/LNH EORFN 6HTXHQWLDO SHUPXWDWH � &RQY ILOWHUV RQHRI �@ NHUQHO %DWFK1RUPDOL]DWLRQ 5H/8 @ QXPBEORFNV LQWY RSWLPL]HU RQHRI � $GDPH 5063URS IORDWY HH @ #V\PEROL]H FODVV 7UDLQHU REMHFW GHI BBLQLWBB VHOI PRGHORSWLPL]HU GHI WUDLQ VHOI UHWXUQWUDLQHUBLPSO VHOI RSWLPL]HU VHOI PRGHO 7UDLQHU PRGHO 5HV1HW/LNH �EORFN 6HTXHQWLDO ,GHQWLW\ 'HQVH &RQY@ QXPBEORFNV RSWLPL]HU $GDPH 6\PEROLFFODVV 6\PEROLFWUHHRIREMHFW 6\PEROLFRSHUDWLRQV IRU WUDLQHUIHHGEDFN LQ VDPSOH VHDUFKBVSDFH K\SHUBWUDLQHU DOJRULWKP 332 SDUWLWLRQBIQ 1RQH UHZDUG WUDLQHUWUDLQ IHHGEDFN UHZDUG GHI UHOD[BILOWHUVNYSDUHQW LI LVLQVWDQFH SDUHQW&RQY LI N ILOWHUV UHWXUQ RQHRI �YYY\r@ UHWXUQ Y K\SHUBWUDLQHU WUDLQHU FORQH UHELQG UHOD[BILOWHUV #V\PEROL]H FODVV %ORFN REMHFW GHI BBLQLWBB VHOI ILOWHUV VHOI ILOWHUV ILOWHUV GHI BBFDOOBB VHOI UHWXUQ �6HTXHQWLDO &RQYVHOIILOWHUV &RQYVHOIILOWHUV\r@ 1$6%HQFK 0RGHO6SHF �QRGHV RQHRI UDQJH.@\r1 �HGJHV RQHRI �@@\r1\r1 1$6)31 )SQ1RGH W\SH RQHRI � VXP DWWHQWLRQ @ OHYHO LQSXWBRIIVHWV PDQ\RI UDQJH180B35(B12'(6 GLVWLQFW 7UXH VRUWHG 7UXH 7X1$6 5HVLGXDO RQHRI � ,QYHUWHG%RWWOHQHFN ILOWHUV RQHRI �@ NHUQHO RQHRI �@ H[SDQVLRQ RQHRI �@ =HUR@ TheNAS-FPNsearchspaceisarepeatedFPNcell,eachofwhosenodes(Figure

10 12-middle)ag-gregatestwooutputsofpreviou
12-middle)ag-gregatestwooutputsofpreviousnodes.Theaggregationiseithersumorglobalattention.Weusemanyofwiththeconstraintsdistinctandsortedtoselectinputnodeswithoutduplication.TheTuNASsearchspaceisastackofblocks,eachcontaininganumberofresiduallayers(Figure12-bottom)ofinvertedbottleneckunits,whoseltersize,kernelsizeandexpansionfactorwillbetuned.Tosearchthenumberoflayersinablock,weputZerosasacandidateintheResiduallayersotheresiduallayermaydowngradeintoanidentitymapping.4.2ExploringsearchspacesandsearchalgorithmsWeuseMobileNetV2[41]asanexampletodemonstratehowtoexplorenewsearchspacesandsearchalgorithms.Forafaircomparison,werstretraintheMobileNetV2modelonImageNettoobtainabaseline.Withourtrainingsetup,itachievesavalidationaccuracyof73.1%(Table3,row1)comparedwith72.0%intheoriginalMobileNetV2paper.Detailsaboutourexperimentsetup,searchspacedenitions,andthecodeforcreatingsearchspacescanbefoundinAppendixC.1.Searchspaceexploration:SimilartopreviousAutoMLworks[8,14],weexplore3searchspacesderivedfromMobileNetV2thattunethehyper-parametersoftheinvertedbottleneckunits[41]:(1)SearchspaceS1tunesthekernelsizeandexpansionratio.(2)SearchspaceS2tunestheoutputlters(3)SearchspaceS3combinesS1andS2totunethekernelsize,expansionratioandoutputlters.FromTable3,wecanseethatwithPyGlovewewereabletoconvertMobileNetV2intoS1with23linesofcode(row2)andS2with10linesofcode(row5).FromS1andS2,weobtainS3injustasinglelineofcode(row6)usingrebindwithchainingthetransformfunctionsfromS1andS2.Searchalgorithmexploration:Onthesearchalgorithmdimension,westartbyexploringdifferentsearchalgorithmsonS1usingblack-boxsearchalgorithms(RandomSearch[12],Bayesian[26])andthenefcientNAS(TuNAS[15]).Tomakemodelsizescomparable,weconstrainthesearchto300Mmultiply-adds3usingTuNAS'sabsoluterewardfunction[15].Toswitchbetweenthesealgorithms,weonlyhadtochange1lineofcode.Table3:ProgrammingcostofswitchingbetweenthreesearchspacesandthreeAutoMLalgorithmsbasedonPyGlove.Linesofcodeinredisthecostincreatingnewsearchspaces,whilethelinesofcodeinblackis
the cost for switching algorithms. The unit cost for search and training is defined as the TPU hours to train a MobileNetV2 model on ImageNet for 360 epochs. The test accuracies and MAdds are based on 3 runs.

# | Search space | Search algorithm | Lines of code | Search cost | Train cost | Test accuracy | # of MAdds
1 | (static) | N/A | N/A | N/A | 1 | 73.1 ± 0.1 | 300M
2 | (static) → S1 | RS | +23 | 25 | 1 | 73.7 ± 0.3 (↑0.6) | 300 ± 3M
3 | S1 | RS → Bayesian | +1 | 25 | 1 | 73.9 ± 0.3 (↑0.8) | 301 ± 5M
4 | S1 | Bayesian → TuNAS | +1 | 1 | 1 | 74.2 ± 0.1 (↑1.1) | 301 ± 5M
5 | (static) → S2 | TuNAS | +10 | 1 | 1 | 73.3 ± 0.1 (↑0.2) | 302 ± 7M
6 | S1, S2 → S3 | TuNAS | +1 | 2 | 1 | 73.8 ± 0.1 (↑0.7) | 302 ± 6M

³ For RS and Bayesian, we use rejection sampling to ensure sampled architectures have around 300M MAdds.

4.3 Exploring complex search flows on NAS-Bench-101

Figure 13: Mean and standard deviation of search performances with different search flows on NAS-Bench-101 (500 runs), using Regularized Evolution [16].

PyGlove can greatly reduce the engineering cost when exploring complex search flows. In this section, we explore various ways to optimize the NAS-Bench-101 search space. NAS-Bench-101 is a NAS benchmark where the goal is to find high-performing image classifiers in a search space of neural network architectures. This search space requires optimizing both the types of neural network layers used in the model (e.g., 3x3 Conv) and how the layers are connected. We experiment with three search flows in this exploration: 1) we reproduce the original paper to establish a baseline, which uses the search space defined in Figure 12-top to jointly optimize the nodes and edges. 2) we try a factorized search, which optimizes the nodes in the outer loop and the edges in the inner loop – the reward for a node setting is computed as the average
of top 5 rewards from the architectures sampled in the inner loop. Its performance is not as good as the baseline under the same search budget; we suspect that under each fixed node setting, the edge space is not explored enough. 3) To alleviate this problem, we come up with a hybrid solution, which uses the first half of the budget to optimize the nodes as in search flow 2, while using the other half to optimize the edges, based on the best node setting found in the first phase. Interestingly, the search trajectory crosses over the baseline in the second phase and ends with a noticeable margin (Figure 13). We used Regularized Evolution [16] for all these searches, each with 500 runs. It takes only 15 lines of code to implement the factorized search and 26 lines of code to implement the hybrid search. Source code is included in Appendix C.2.

5 Related Work

Software frameworks have greatly influenced and fueled the advancement of machine learning. The need for computing gradients has made auto-gradient based frameworks [36, 37, 42–45] flourish. To support modular machine learning programs with the flexibility to modify them, frameworks were introduced with an emphasis on hyper-parameter management [46, 47]. The sensitivity of machine learning to hyper-parameters and model architecture has led to the advent of AutoML libraries [23–31]. Some (e.g., [23–25]) formulate AutoML as a problem of jointly optimizing architectures and hyper-parameters. Others (e.g., [26–28]) focus on providing interfaces for black-box optimization. In particular, Google's Vizier library [26] provides tools for optimizing a user-specified search space using black-box algorithms [12, 48], but makes the end user responsible for translating a point in the search space into a user program. DeepArchitect [29] proposes a language to create a search space as a program that connects user components. Keras-tuner [30] employs a different way to annotate a model into a search space, though this annotation is limited to a list of supported components. Optuna [49] embraces eager evaluation of tunable parameters, making it easy to declare a search space on the go (Appendix B.4). Meanwhile, efficient NAS algorithms [1, 2, 14], which require coupling between the controller and child program, brought new challenges to AutoML frameworks. AutoGluon [28] and NNI [27] partially solve this problem by building predefined modules that work in both general search mode and weight-sharing mode; however, supporting different efficient NAS algorithms is still non-trivial. Among the existing AutoML systems we are aware of, complex search flows are less explored. Compared to existing systems, PyGlove employs a mutable programming model to solve these problems, making AutoML easily accessible to preexisting ML programs. It also accommodates the dynamic interactions among the child programs, search spaces, search algorithms, and search flows to provide the flexibility needed for future AutoML research.

Symbolic programming, where a program manipulates symbolic representations, has a long history dating back to LISP [20]. The symbolic representation can be programs as in meta-programming, rules as in logic programming [50] and math expressions as in symbolic computation [51, 52]. In this work, we introduce the symbolic programming paradigm to AutoML by manipulating a symbolic tree-based representation that encodes the key elements of a machine learning program. Such program manipulation is also reminiscent of program synthesis [53–55], which searches for programs to solve different tasks like string and number manipulation [56–59], question answering [60, 61], and learning tasks [62, 63]. Our method also shares similarities with prior works in non-deterministic programming [64–66], which define non-deterministic operators like choice in the programming environment that can be connected to optimization algorithms. Last but not least, our work echoes the idea of building robust software systems that can cope with unanticipated requirements via advanced symbolic programming [67].

6 Conclusion

In this paper, we reformulate AutoML as an automated process of manipulating an ML program through symbolic programming. Under this formulation, the complex interactions between the child program, the search space, and the search algorithm are elegantly disentangled. Complex search flows can be expressed as compositions of for-loops, greatly simplifying
the programming interface of AutoML without sacrificing flexibility. This is achieved by resolving the conflict between AutoML's intrinsic requirement in modifying programs and the immutable-program premise of existing software libraries. We then introduce PyGlove, a general-purpose symbolic programming library for Python which implements our method and is tested on real-world AutoML scenarios. With PyGlove, AutoML can be easily dropped into preexisting ML programs, with all program parts searchable, permitting rapid exploration of different dimensions of AutoML.

Broader Impact

Symbolic programming/PyGlove makes AutoML more accessible to machine learning practitioners, which means manual trial-and-error of many categories can be replaced by machines. This can also greatly increase the productivity of AutoML research, at the cost of increasing demand for computation, and, as a result, increasing CO2 emissions.

We see a big potential in symbolic programming/PyGlove in making machine learning researchers more productive. On a new ground of mutable programs, experiments can be reproduced more easily, modified with lower cost, and shared like data. A large variety of experiments can co-exist in a shared code base that makes combining and comparing different techniques more convenient.

Symbolic programming/PyGlove makes it much easier to develop search-based programs which can be used in a broad spectrum of research and product areas. Some potential areas, such as medicine design, have a clear societal benefit, while other potential applications, such as video surveillance, could improve security while raising new privacy concerns.

Acknowledgments and Disclosure of Funding

We would like to thank Pieter-Jan Kindermans and David Dohan for their help in preparing the case study section of this paper; Jiquan Ngiam, Rishabh Singh for their feedback to the early versions of the paper; Ruoming Pang, Vijay Vasudevan, Da Huang, Ming Cheng, Yanping Huang, Jie Yang, Jinsong Mu for their feedback at early stage of PyGlove; Adams Yu, Daniel Park, Golnaz Ghiasi, Azade Nazi, Thang Luong, Barret Zoph, David So, Daniel De Freitas Adiwardana, Junyang Shen, Lav Rai, Guanhang Wu, Vishy Tirumalashetty, Pengchong Jin, Xianzhi Du, Yeqing Li, Xiaodan Song, Abhanshu Sharma, Cong Li, Mei Chen, Aleksandra Faust, Yingjie Miao, JD Co-Reyes, Kevin Wu, Yanqi Zhang, Berkin Akin, Amir Yazdanbakhsh, Shuyang Cheng, HyoukJoong Lee, Peisheng Li and Barbara Wang for being early adopters of PyGlove and their invaluable feedback.

Funding disclosure: This work was done as a part of the authors' full-time job in Google.

References

[1] Hieu Pham, Melody Y Guan, Barret Zoph, Quoc V Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. In The International Conference on Machine Learning (ICML), pages 4092–4101, 2018.
[2] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search. In International Conference on Learning Representations (ICLR), 2019.
[3] Gábor Melis, Chris Dyer, and Phil Blunsom. On the state of the art of evaluation in neural language models. In International Conference on Learning Representations (ICLR), 2018.
[4] Alfredo Canziani, Adam Paszke, and Eugenio Culurciello. An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678, 2016.
[5] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In The Conference on Neural Information Processing Systems (NeurIPS), pages 1097–1105, 2012.
[6] Mingxing Tan and Quoc Le. EfficientNet: Rethinking model scaling for convolutional neural networks. In The International Conference on Machine Learning (ICML), pages 6105–6114, 2019.
[7] Barret Zoph, Vijay Vasudevan, Jonathon Shlens, and Quoc V Le. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 8697–8710, 2018.
[8] Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, and Quoc V Le. MNASNet: Platform-aware neural architecture search for mobile. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2820–2828, 2019.
[9] Barret Zoph and Quoc V Le. Neural architecture search with reinforcement learning. In International Conference on Learning Representations (ICLR), 2017.
[10] Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, and Kurt Keutzer. FbNet: Hardware-aware efficient convnet design via differentiable neural architecture search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 10734–10742, 2019.
[11] Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, et al. ChamNet: Towards efficient network design through platform-aware model adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 11398–11407, 2019.
[12] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research (JMLR), 13(Feb):281–305, 2012.
[13] Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical bayesian optimization of machine learning algorithms. In The Conference on Neural Information Processing Systems (NeurIPS), pages 2951–2959, 2012.
[14] Han Cai, Ligeng Zhu, and Song Han. ProxylessNAS: Direct neural architecture search on target task and hardware. In International Conference on Learning Representations (ICLR), 2019.
[15] Gabriel Bender, Hanxiao Liu, Bo Chen, Grace Chu, Shuyang Cheng, Pieter-Jan Kindermans, and Quoc Le. Can weight sharing outperform random architecture search? an investigation with TuNAS. In The IEEE Conference on Computer Vision and Pattern Recognition, 2020.
[16] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized evolution for image classifier architecture search. In AAAI Conference on Artificial Intelligence (AAAI), pages 4780–4789, 2019.
[17] Sirui Xie, Hehui Zheng, Chunxiao Liu, and Liang Lin. Snas: Stochastic neural architecture search, 2020.
[18] Wei Wen, Hanxiao Liu, Hai Li, Yiran Chen, Gabriel Bender, and Pieter-Jan Kindermans. Neural predictor for neural architecture search. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
[19] Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, and Quoc V Le. AutoHAS: Efficient hyperparameter and architecture search. arXiv preprint arXiv:2006.03656, 2020.
[20] John McCarthy. Recursive functions of symbolic expressions and their computation by machine, part i. Communications of the ACM, 3(4):184–195, 1960.
[21] Symbolic programming. https://en.wikipedia.org/wiki/Symbolic_programming.
[22] Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc V Le, and Alexey Kurakin. Large-scale evolution of image classifiers. In The International Conference on Machine Learning (ICML), pages 2902–2911, 2017.
[23] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. Efficient and robust automated machine learning. In The Conference on Neural Information Processing Systems (NeurIPS), pages 2962–2970, 2015.
[24] Lars Kotthoff, Chris Thornton, Holger H Hoos, Frank Hutter, and Kevin Leyton-Brown. Auto-weka 2.0: Automatic model selection and hyperparameter optimization in weka. The Journal of Machine Learning Research (JMLR), 18(1):826–830, 2017.
[25] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. Auto-sklearn: efficient and robust automated machine learning. In Automated Machine Learning, pages 113–134. Springer, 2019.
[26] Daniel Golovin, Benjamin Solnik, Subhodeep Moitra, Greg Kochanski, John Karro, and D Sculley. Google vizier: A service for black-box optimization. In The SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1487–1495, 2017.
[27] Neural network intelligence. https://github.com/microsoft/nni.
[28] Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. Autogluon-tabular: Robust and accurate automl for structured data. arXiv preprint arXiv:2003.06505, 2020.
[29] Renato Negrinho, Darshan Patil, Nghia Le, Daniel Ferreira, Matthew Gormley, and Geoffrey Gordon. Towards modular and programmable architecture search. In The Conference on Neural Information Processing Systems (NeurIPS), pages 13715–13725, 2019.
[30] Keras tuner. https://github.com/keras-team/keras-tuner.
[31] Haifeng Jin, Qingquan Song, and Xia Hu. Auto-keras: An efficient neural architecture search system. In The SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1946–1956, 2019.
[32] The Lego Group. Lego. https://en.wikipedia.org/wiki/Lego.
[33] Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
[34] Jeshua Bratman, Satinder Singh, Jonathan Sorg, and Richard Lewis. Strong mitigation: Nesting search for good policies within search for good reward. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, pages 407–414. International Foundation for Autonomous Agents and Multiagent Systems, 2012.
[35] Arber Zela, Aaron Klein, Stefan Falkner, and Frank Hutter. Towards automated deep learning: Efficient joint neural architecture and hyperparameter search. CoRR, abs/1807.06906, 2018.
[36] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. In The Conference on Neural Information Processing Systems (NeurIPS), pages 8024–8035, 2019.
[37] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. Tensorflow: A system for large-scale machine learning. In The USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 265–283, 2016.
[38] Chris Ying, Aaron Klein, Eric Christiansen, Esteban Real, Kevin Murphy, and Frank Hutter. Nas-bench-101: Towards reproducible neural architecture search. In The International Conference on Machine Learning (ICML), pages 7105–7114, 2019.
[39] Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, and Quoc V. Le. NAS-FPN: learning scalable feature pyramid architecture for object detection. CoRR, abs/1904.07392, 2019.
[40] Xuanyi Dong, Lu Liu, Katarzyna Musial, and Bogdan Gabrys. NATS-Bench: Benchmarking nas algorithms for architecture topology and size. arXiv preprint arXiv:2009.00437, 2020.
[41] Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, and Liang-Chieh Chen. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4510–4520, 2018.
[42] James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: a cpu and gpu math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), pages 18–24, 2010.
[43] Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. In The ACM International Conference on Multimedia (ACM MM), pages 675–678, 2014.
[44] Seiya Tokui. Chainer: A powerful, flexible and intuitive framework of neural networks, 2018.
[45] Roy Frostig, Matthew Johnson, and Chris Leary. Compiling machine learning programs via high-level tracing. In Conference on Machine Learning and Systems (MLSys), 2018.
[46] Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, et al. Lingvo: a modular and scalable framework for sequence-to-sequence modeling. arXiv preprint arXiv:1902.08295, 2019.
[47] Gin-config. https://github.com/google/gin-config.
[48] Max Jaderberg, Valentin Dalibard, Simon Osindero, Wojciech M Czarnecki, Jeff Donahue, Ali Razavi, Oriol Vinyals, Tim Green, Iain Dunning, Karen Simonyan, et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, 2017.
[49] Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2019.
[50] Alain Colmerauer and Philippe Roussel. The birth of prolog. In History of programming languages II, pages 331–367, 1996.
[51] Wolfram Research, Inc. Mathematica, Version 12.1. Champaign, IL, 2020.
[52] Bruno Buchberger et al. Symbolic computation (an editorial). J. Symbolic Comput, 1(1):1–6, 1985.
[53] Harold Abelson and Gerald Jay Sussman. Structure and interpretation of computer programs. The MIT Press, 1996.
[54] Sumit Gulwani, Susmit Jha, Ashish Tiwari, and Ramarathnam Venkatesan. Synthesis of loop-free programs. ACM SIGPLAN Notices, 46(6):62–73, 2011.
[55] Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh. Program synthesis. Foundations and Trends in Programming Languages, 4(1-2):1–119, 2017.
[56] Oleksandr Polozov and Sumit Gulwani. Flashmeta: a framework for inductive program synthesis. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 107–126, 2015.
[57] Emilio Parisotto, Abdelrahman Mohamed, Rishabh Singh, Lihong Li, Dengyong Zhou, and Pushmeet Kohli. Neuro-symbolic program synthesis. In International Conference on Learning Representations (ICLR), 2017.
[58] Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdelrahman Mohamed, and Pushmeet Kohli. Robustfill: Neural program learning under noisy i/o. In The International Conference on Machine Learning (ICML), pages 990–998, 2017.
[59] Matej Balog, Alexander L Gaunt, Marc Brockschmidt, Sebastian Nowozin, and Daniel Tarlow. DeepCoder: Learning to write programs. In International Conference on Learning Representations (ICLR), 2017.
[60] Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V. Le, and Ni Lao. Memory augmented policy optimization for program synthesis and semantic parsing. In The Conference on Neural Information Processing Systems (NeurIPS), pages 9994–10006, 2018.
[61] Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever. Neural programmer: Inducing latent programs with gradient descent. In International Conference on Learning Representations (ICLR), 2016.
[62] Esteban Real, Chen Liang, David R So, and Quoc V Le. Automl-zero: Evolving machine learning algorithms from scratch. In The International Conference on Machine Learning (ICML), 2020.
[63] Lazar Valkov, Dipak Chaudhari, Akash Srivastava, Charles Sutton, and Swarat Chaudhuri. Houdini: Lifelong learning as program synthesis. In The Conference on Neural Information Processing Systems (NeurIPS), pages 8687–8698, 2018.
[64] D. Andre and S. Russell. State abstraction in programmable reinforcement learning. In AAAI Conference on Artificial Intelligence (AAAI), 2002.
[65] Harald Søndergaard and Peter Sestoft. Non-determinism in functional languages. The Computer Journal, 35(5):514–523, 1992.
[66] Armando Solar-Lezama. The sketching approach to program synthesis. In Asian Symposium on Programming Languages and Systems, pages 4–13. Springer, 2009.
[67] G. Sussman. Building robust systems an essay. In Massachusetts Institute of Technolog
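
Illustrative sketch for Section 4.3. The factorized search flow described there (node settings in the outer loop, edge settings in the inner loop, with a node setting scored by the average of the top 5 inner-loop rewards) can be outlined in plain Python as below. This is only a minimal sketch under stated assumptions: it does not use the PyGlove API, it substitutes uniform random sampling for Regularized Evolution, and the toy search space, toy_reward, and all function names are hypothetical stand-ins rather than anything taken from the paper or from NAS-Bench-101.

```python
import random
from statistics import mean

# Hypothetical toy search space (NOT the NAS-Bench-101 space):
# each of NUM_NODES nodes picks one op; each of NUM_EDGES edges is on or off.
OPS = ["conv3x3", "conv1x1", "maxpool3x3"]
NUM_NODES = 3
NUM_EDGES = 3


def toy_reward(nodes, edges):
    """Illustrative stand-in for a benchmark lookup of model accuracy."""
    op_score = sum(1.0 if op == "conv3x3" else 0.5 for op in nodes)
    edge_score = 0.2 * sum(edges)
    return op_score + edge_score


def sample_nodes(rng):
    """Sample one op per node (random sampling replaces Regularized Evolution)."""
    return tuple(rng.choice(OPS) for _ in range(NUM_NODES))


def sample_edges(rng):
    """Sample an on/off flag per edge."""
    return tuple(rng.randint(0, 1) for _ in range(NUM_EDGES))


def top_k_mean(rewards, k=5):
    """Average of the k largest rewards (all of them if fewer than k)."""
    return mean(sorted(rewards, reverse=True)[:k])


def factorized_search(outer_trials, inner_trials, seed=0):
    """Return the best node setting and its top-5-average reward."""
    rng = random.Random(seed)
    best_nodes, best_score = None, float("-inf")
    for _ in range(outer_trials):  # outer loop: node settings
        nodes = sample_nodes(rng)
        inner_rewards = [  # inner loop: edge settings for fixed nodes
            toy_reward(nodes, sample_edges(rng)) for _ in range(inner_trials)
        ]
        score = top_k_mean(inner_rewards, k=5)
        if score > best_score:
            best_nodes, best_score = nodes, score
    return best_nodes, best_score
```

Under the same assumptions, the hybrid flow would call factorized_search with half of the trial budget and then spend the remaining half sampling edges only for the returned node setting.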
