Learning to Rank with Extremely Randomized Trees

Pierre Geurts                                          geurts@ulg.ac.be
Gilles Louppe                                          g.louppe@ulg.ac.be
Department of Electrical Engineering and Computer Science & GIGA-R
University of Liège, Institut Montefiore, Sart Tilman B28, B-4000 Liège, Belgium

Editors: Olivier Chapelle, Yi Chang, Tie-Yan Liu

Abstract

In this paper […]
Table 1: Description of the challenge datasets

                          set1                            set2
               Train      Val       Test       Train     Val       Test
  Nb queries   19,944     2,994     6,983      1,266     1,266     3,798
  Nb URLs      473,134    71,083    165,660    34,815    34,881    103,174

[…] and the goal was to leverage the first dataset to improve the performance on the second dataset. Our approach to this challenge is primarily based on the use of ensembles of randomized regression trees, in particular Extremely Randomized Trees (Geurts et al., 2006). Although we experimented with several output encodings, we basically addressed this problem as a standard regression problem, trying to predict the relevance score as a function of the input features. On the first track of the challenge, our submission was ranked at the 10th position. Although this result was obtained with a heterogeneous (and not very elegant) ensemble of 16,000 randomized trees produced by combining Bagging, Random Forests, and Extremely Randomized Trees, our post-challenge experiments show that using Extremely Randomized Trees alone, with default parameter settings and only 1,000 trees, yields exactly the same rank. On the second track, our submission was ranked at the 4th position. We experimented with various simple transfer learning methods, but unfortunately none of these really outperformed a model learned only from the second dataset. Interestingly, post-challenge experiments show however that some of the transfer approaches we tested would actually have improved performance with respect to a model learned only from the second dataset if this dataset had been smaller.

The rest of this paper is organized as follows. Section 2 describes the two datasets and the rules of the challenge. In Section 3, we briefly review standard regression trees and ensemble methods, and then present several simple adaptations of these methods for the learning-to-rank problem and for transfer learning. Our experiments on both tracks are then described in Section 4, which ends with an analysis of the computational requirements of our algorithms. We conclude and discuss future work directions in Section 5.

2. Learning to rank challenge: data and protocol

Contestants of the LTR challenge were provided with two distinct datasets collected at Yahoo! from real-world web search ranking data of two different countries. Both datasets (respectively labeled set1 and set2) consist of query-URL pairs represented as feature vectors: 415 features are common to the two sets, 104 are specific to set1, and 181 are specific to set2. All of them are numerical features normalized between 0 and 1. For each query-URL pair, a numerical relevance score also indicates how well the URL matches the corresponding query, 0 being irrelevant and 4 being perfectly relevant. The queries, the URLs, and the semantics of the features have not been revealed. Both datasets were randomly split into three subsets, training, validation, and test (see Table 1 for their respective sizes), and relevance scores were given for the training subsets only. In the first track of the challenge, contestants were asked to train a ranking function on […]
[…] been successful in the context of ranking (Li et al., 2007; Wu et al., 2008), we will focus in this paper on randomization methods. Randomization methods produce different models from a single initial learning sample by introducing random perturbations into the learning procedure. For example, Bagging (Breiman, 1996) builds each tree of the ensemble from a random bootstrap copy of the original data. Another popular tree-based randomization method is the Random Forests algorithm proposed by Breiman (2001), which combines the bootstrap sampling idea of Bagging with a further randomization of the input features that are used as candidates to split an interior node of the tree. Instead of looking for the best split among all features, the algorithm first selects, at each node, a subset of K features at random and then determines the best test over these features to effectively split the node. This method is very effective and has found many successful applications in various fields.

Extremely Randomized Trees. Most experiments in this paper will be carried out with a particular randomization method called Extremely Randomized Trees (ET), proposed by Geurts et al. (2006). This method is similar to the Random Forests algorithm in the sense that it is based on selecting, at each node, a random subset of K features to decide on the split. Unlike in the Random Forests method, however, each tree is built from the complete learning sample (no bootstrap copying) and, most importantly, for each of the features (randomly selected at each interior node) a discretization threshold (cut-point) is selected at random to define a split, instead of choosing the best cut-point based on the local sample (as in Tree Bagging or in the Random Forests method). As a consequence, when K is fixed to one, the resulting tree structure is actually selected independently of the output labels of the training set. In practice, the algorithm only depends on a single main parameter, K. Good default values of K have been found empirically to be K = √p for classification problems and K = p for regression problems, where p is the number of input features (Geurts et al., 2006). Experiments in (Geurts et al., 2006) show that this method is most of the time competitive with Random Forests in terms of accuracy, and sometimes superior. Because it removes the need for the optimization of the discretization thresholds, it also has a clear advantage in terms of computing times and ease of implementation. We refer the reader to (Geurts et al., 2006) for a more formal description of the algorithm and a detailed discussion of its main features.

Summary. The main strength of tree-based randomization methods is that they are (almost) parameter-free while still being able to learn non-linear models, which makes them good all-purpose, off-the-shelf methods (Hastie et al., 2001). Their computing times are also very competitive and, through feature importance measures, they can provide some insight into the problem at hand (Hastie et al., 2001). On the downside, their accuracy is sometimes not at the level of the state of the art on some specific classes of problems, and their storage requirements might be prohibitive in some applications.

3.2. Output encodings

In most of our experiments, we have adopted a simple pointwise regression approach for solving the ranking problem with ensembles of randomized trees. Each query-URL pair […]

Table 2: Typical standard errors for ERR and NDCG on both tracks

               set1 validation       set1 test
               ERR      NDCG         ERR      NDCG
               0.006    0.004        0.004    0.0025

               set2 validation       set2 test
               ERR      NDCG         ERR      NDCG
               0.008    0.005        0.005    0.003

Table 3: Comparison of different tree-based ensemble methods (with 500 trees; in bold: best values in each column)

                        set1 validation       set1 test
  Method                ERR      NDCG         ERR      NDCG
  Random                0.2779   0.5809       0.2830   0.5783
  ET, K=√p              0.4533   0.7841       0.4595   0.7896
  ET, K=p               0.4564   0.7907       0.4620   0.7948
  RF, K=√p              0.4552   0.7884       0.4611   0.7923
  RF, K=p (Bagging)     0.4553   0.7866       0.4629   0.7934
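To make the pointwise setup concrete, here is a minimal sketch of the Table 3 protocol, assuming scikit-learn as a stand-in for the C/MATLAB implementation actually used for the reported results (see Section 4.3), and random placeholder arrays in place of the challenge data; max_features plays the role of the parameter K.

# Minimal sketch of the Table 3 protocol.  Assumptions: scikit-learn stands in
# for the C/MATLAB code actually used; X/y are random placeholders for set1
# (519 features = 415 common + 104 specific); max_features plays the role of K.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rng = np.random.RandomState(0)
X_train = rng.rand(2000, 519)
y_train = rng.randint(0, 5, 2000).astype(float)    # graded relevance in {0,...,4}
X_val = rng.rand(500, 519)

p = X_train.shape[1]
models = {
    "ET, K=sqrt(p)": ExtraTreesRegressor(n_estimators=100, max_features="sqrt"),
    "ET, K=p": ExtraTreesRegressor(n_estimators=100, max_features=p),
    "RF, K=sqrt(p)": RandomForestRegressor(n_estimators=100, max_features="sqrt"),
    "RF, K=p (Bagging)": RandomForestRegressor(n_estimators=100, max_features=p),
}
for name, model in models.items():
    model.fit(X_train, y_train)            # 100 trees here; 500 in Table 3
    scores = model.predict(X_val)          # URLs are then ranked per query by
                                           # decreasing predicted score

Note that scikit-learn's ExtraTreesRegressor uses the full learning sample by default (no bootstrap), as in the ET method described above, while RandomForestRegressor with max_features=p reduces to Bagging, matching the last row of Table 3.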
As Table 3 shows, the two methods are not very sensitive to the value of K, although in both cases the best result is obtained with K = p. Both methods are very close in terms of performance, with a slight advantage to ET K=p on the validation sample and a slight advantage to RF K=p (which is equivalent to Bagging in this case) on the test sample. These differences are however not statistically significant.

Effect of the ensemble size. Table 4 shows the effect of the number of trees, with ET and K=p, on ERR and NDCG on the validation and test samples. The last column gives the rank that each entry would have obtained at the challenge. As expected, both the ERR and the NDCG monotonically increase as the size of the ensemble increases. ERR however saturates at 1000 trees. Although slight, the improvement from 500 to 1000 trees is statistically significant (p-value = 0.0009) and it makes the method climb from the 17th rank to the 10th rank.

Output encoding. The five different output encoding methods are compared in Table 5, in each case with ET, K=p, and 500 trees. Regr-proba is the best approach on the validation set but it is worse than Regr on the test set. On the test set, only Clas-moc is better than the straightforward regression encoding, and the difference is significant (p-value = 0.03). However, because it builds four binary classifiers, this variant uses four times more trees than the other methods and, as shown in Table 4, the regression approach reaches the same test ERR with 1000 trees or more. Unlike what was found in Li et al. (2007), the two classification encodings are thus not interesting in our context.

Table 7: Comparison of various approaches for transfer learning (with ET, K=p, 1000 trees; in bold: best values in each column)

                                                  set2 validation       set2 test
  Method                                          ERR      NDCG         ERR      NDCG
  Random                                          0.2567   0.5299       0.2597   0.5251
  1  M(set2; FC+F2)                               0.4514   0.7859       0.4611   0.7810
  2  M(set1; FC)                                  0.4353   0.7441       0.4472   0.7409
  3  M(α set1 + (1−α) set2; FC+F1+F2)             0.4514   0.7859       0.4611   0.7810
  4  α M(set1; FC) + (1−α) M(set2; FC+F2)         0.4514   0.7859       0.4611   0.7810
  5  M(set2; FC+F2+M(set1; FC))                   0.4497   0.7853       0.4625   0.7830
  6  M(set2 − M(set1; FC); FC+F2)                 0.4487   0.7811       0.4594   0.7784

4.2. Track 2: transfer learning

For our experiments on the second track, we focused on the pointwise regression approach with the ET algorithm and investigated several ways to combine the two datasets.

Comparison of transfer learning approaches. All methods discussed in Section 3.3 are compared in Table 7, in all cases with ET, K=p, and 1000 trees. For methods 3 and 4, the value of α was optimized on the validation set. For Method 3, α was selected in {0.0, 0.05, 0.125, 0.25, 0.5, 1.0}. For Method 4, which does not require relearning a model for each new value of α, we screened all values of α in [0, 1] with a step size of 0.01. On the validation set, no method is able to improve with respect to the use of the set2 data alone. The model learned from set1 is clearly inferior, but it is nevertheless quite good compared to the random model. For both methods 3 and 4, the optimal value of α is actually 0, meaning that these two models are equivalent to the model learned from set2.¹ Method 5, which introduces the prediction of the model learned on set1 among the features, shows the best performance overall on the test set. However, the improvement over the base model is only slight and actually not statistically significant.

Experiments with less set2 data. One potential reason for these disappointing results could be that, given the size of the set2 learning sample, the limit of what can be obtained with our model family on this problem is already reached; as a consequence, the set1 data does not bring any useful additional information. To check whether the transfer methods could still bring some improvement in the presence of less set2 data, we repeated the same experiment, but this time reducing the size of the set2 learning sample by 95% (i.e., by randomly sampling 5% of the queries). Average results over five random samplings are reported in Table 8, with all transfer methods. With only 5% of the set2 data (about 65 queries), the model learned on set2 data is now comparable with the model learned on set1 data. Transfer methods 3, 4, and 5 improve with respect to both the model learned from […]
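To spell out transfer methods 4 and 5 of Table 7, here is a hedged sketch under the same scikit-learn stand-in as above; the arrays are random placeholders for the two datasets, and err_of() is a hypothetical helper standing in for the per-query validation ERR that was actually optimized.

# Hedged sketch of transfer methods 4 and 5 (Table 7).  Assumptions: random
# placeholder arrays (set2 has 415 common + 181 specific = 596 features);
# err_of() is a hypothetical placeholder for the validation ERR criterion.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.RandomState(0)
common = np.arange(415)                               # indices of the FC features
X1, y1 = rng.rand(1000, 415), rng.randint(0, 5, 1000).astype(float)  # set1, FC only
X2, y2 = rng.rand(300, 596), rng.randint(0, 5, 300).astype(float)    # set2, FC+F2
X2_val, y2_val = rng.rand(100, 596), rng.randint(0, 5, 100).astype(float)

m1 = ExtraTreesRegressor(n_estimators=100, max_features=X1.shape[1]).fit(X1, y1)
m2 = ExtraTreesRegressor(n_estimators=100, max_features=X2.shape[1]).fit(X2, y2)

def err_of(scores):
    # placeholder criterion; in the real protocol this is the validation ERR
    return float(np.corrcoef(scores, y2_val)[0, 1])

# Method 4: alpha * M(set1; FC) + (1 - alpha) * M(set2; FC+F2).  No model has
# to be relearned for a new alpha, so [0, 1] is screened with step size 0.01.
pred1, pred2 = m1.predict(X2_val[:, common]), m2.predict(X2_val)
best_alpha = max(np.arange(0.0, 1.01, 0.01),
                 key=lambda a: err_of(a * pred1 + (1.0 - a) * pred2))

# Method 5: M(set2; FC+F2+M(set1; FC)) -- m1's prediction is appended to the
# set2 inputs as one extra feature before training.
X2_aug = np.hstack([X2, m1.predict(X2[:, common])[:, np.newaxis]])
m5 = ExtraTreesRegressor(n_estimators=100, max_features=X2_aug.shape[1]).fit(X2_aug, y2)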
1. Note that the value of α that optimizes ERR on the test set is 0.22. It corresponds to an ERR of 0.4622, which is slightly better than the ERR of the model learned on set2 alone.

Table 10: Computing times and storage requirements of different methods

  Method                 Learning¹   Testing²   Nb nodes³   Storage⁴   ERR set1 test⁵
  ET, K=p                296 s       492 ms     280 K       1.3 MB     0.4620
  RF, K=p                618 s       469 ms     140 K       663 kB     0.4629
  ET, K=√p               20 s        543 ms     400 K       1.8 MB     0.4595
  RF, K=√p               28 s        507 ms     205 K       959 kB     0.4552
  ET, K=p, nmin=50       268 s       336 ms     47 K        215 kB     0.4626
  ET, K=p, nmin=100      254 s       260 ms     25 K        115 kB     0.4613

  ¹ Computing time in seconds for learning one tree on the set1 learning sample.
  ² Computing time in milliseconds for testing one tree on the validation and test samples of set1.
  ³ Total number of nodes in one tree built on the set1 learning sample.
  ⁴ Storage requirement for one tree.
  ⁵ ERR obtained with 500 trees on the set1 test sample.

4.3. Computational considerations

To conclude our experiments, we report in this section on the computational requirements of our algorithms. Table 10 shows training and testing times, as well as storage requirements, of different tree-based methods, all computed on set1.³ In addition to the methods discussed above, we also analyse in this table the effect of the stop-splitting parameter nmin, which reduces the complexity of the trees. These statistics have been obtained with unoptimized code intended only for research purposes⁴ and should thus be taken with some caution. ET is twice as fast as RF for K = p and 30% faster for K = √p. The methods differ greatly in terms of model size (from 25K nodes to 400K nodes), but prediction times, which are mainly related to tree depth, only vary by 50% from the slowest to the fastest method. Given that all methods are very close in terms of accuracy, retrospectively, ET with K=p and nmin=100 seems to be the best method, as it produces much smaller models with no apparent degradation in terms of ERR score. If one can sacrifice some accuracy to reduce computing times, then ET with K=√p is also an interesting tradeoff: 500 models are built in less than 3 hours (on a single core) and its ERR score is only 2% lower than the ERR score of the best performing method of the challenge. Of course, better tradeoffs for a given application could be achieved by adopting other parameter settings (e.g., fewer trees in the ensemble or other values of K or nmin). Given that the trees in the ensemble are built independently of each other, the parallelization of the algorithm is also trivial.
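For completeness, the ERR measure of Chapelle et al. (2009) used throughout the experiments can be sketched as follows; this is a minimal re-implementation of the published definition, not the official evaluation script, assuming the documents returned for one query are listed in the order induced by the predicted scores.

# Minimal sketch of the ERR metric (Chapelle et al., 2009) for a single query.
# `grades` are the relevance labels, in {0,...,4} for the challenge, of the
# returned documents listed by decreasing predicted score; g_max = 4 here.
import numpy as np

def expected_reciprocal_rank(grades, g_max=4):
    R = (2.0 ** np.asarray(grades, dtype=float) - 1.0) / 2.0 ** g_max
    err, p_not_stopped = 0.0, 1.0
    for r, R_r in enumerate(R, start=1):
        err += p_not_stopped * R_r / r    # user stops at rank r with prob. R_r
        p_not_stopped *= 1.0 - R_r        # ... having been unsatisfied before r
    return err

print(expected_reciprocal_rank([4, 2, 0, 1]))   # approximately 0.9442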
3. The software has been implemented in C with a MATLAB (The MathWorks, Inc.) interface. Computing times exclude loading times and assume that all data are stored in main memory. They have been computed on a Mac Pro with a quad-core Intel Xeon Nehalem at 2.66 GHz and 16 GB of memory. Storage requirements have been derived from binary mat-files generated with MATLAB, assuming that all numbers (i.e., tested attributes, node indices, and discretization thresholds) are stored in single precision.

4. The software can be downloaded from http://www.montefiore.ulg.ac.be/~geurts/Software.html.

References

L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.

L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International, California, 1984.

O. Chapelle and Y. Chang. Yahoo! learning to rank challenge overview. JMLR Workshop and Conference Proceedings, 14:1–24, 2011.

O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09), pages 621–630. ACM, 2009.

K. Chen, R. Lu, C. K. Wong, G. Sun, L. Heck, and B. Tseng. Trada: tree based ranking function adaptation. In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08), pages 1143–1152, New York, NY, USA, 2008. ACM.

J. Gao, Q. Wu, C. Burges, K. Svore, Y. Su, N. Khan, S. Shah, and H. Zhou. Model adaptation via model interpolation and boosting for web search ranking. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP '09), Volume 2, pages 505–513, Morristown, NJ, USA, 2009. Association for Computational Linguistics.

P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, NY, USA, 2001.

K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4):422–446, 2002.

P. Li, C. Burges, and Q. Wu. Learning to rank using classification and gradient boosting. In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS), 2007.

S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, October 2010.

Q. Wu, C. J. C. Burges, K. M. Svore, and J. Gao. Ranking, boosting, and model adaptation. Technical Report MSR-TR-2008-109, Microsoft Research, 2008.

J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09), pages 2061–2064, New York, NY, USA, 2009. ACM.