JMLR Workshop and Conference Proceedings: Yahoo! Learning to Rank Challenge

Learning to Rank with Extremely Randomized Trees

Pierre Geurts (geurts@ulg.ac.be) and Gilles Louppe (g.louppe@ulg.ac.be)
Department of Electrical Engineering and Computer Science (GIGA-R), University of Liège, Institut Montefiore, Sart Tilman B28, B-4000 Liège, Belgium

Editors: Olivier Chapelle, Yi Chang, Tie-Yan Liu

Abstract: In this pa […]

Table 1: Description of the challenge datasets

                     set 1                          set 2
              Train      Val       Test       Train     Val      Test
  Nb queries  19,944     2,994     6,983      1,266     1,266    3,798
  Nb URLs     473,134    71,083    165,660    34,815    34,881   103,174

[…] and the goal was to leverage the first dataset to improve the performance on the second dataset.

Our approach to this challenge is primarily based on the use of ensembles of randomized regression trees, in particular Extremely Randomized Trees (Geurts et al., 2006). Although we experimented with several output encodings, we basically addressed this problem as a standard regression problem, trying to predict the relevance score as a function of the input features. On the first track of the challenge, our submission was ranked at the 10th position. Although this result was obtained with a heterogeneous (and not very elegant) ensemble of 16,000 randomized trees produced by combining Bagging, Random Forests and Extremely Randomized Trees, our post-challenge experiments show that using Extremely Randomized Trees alone with default parameter settings and only 1,000 trees yields exactly the same rank. On the second track, our submission was ranked at the 4th position. We experimented with various simple transfer learning methods, but unfortunately none of these really outperformed a model learned only from the second dataset. Interestingly, post-challenge experiments show however that some of the transfer approaches we tested would actually have improved performance over a model learned only from the second dataset if that dataset had been smaller.

The rest of this paper is organized as follows. Section 2 describes the two datasets and the rules of the challenge. In Section 3, we briefly review standard regression trees and ensemble methods and then present several simple adaptations of these methods for the learning to rank problem and for transfer learning. Our experiments on both tracks are then described in Section 4, which ends with an analysis of the computational requirements of our algorithms. We conclude and discuss future work directions in Section 5.

2. Learning to rank challenge: data and protocol

Contestants of the LTR challenge were provided with two distinct datasets collected at Yahoo! from real-world web search ranking data of two different countries. Both datasets (respectively labeled as set 1 and set 2) consist of query-URL pairs represented as feature vectors. 415 features are common to the two sets, 104 are specific to set 1 and 181 are specific to set 2. All of them are numerical features normalized between 0 and 1. For each query-URL pair, a numerical relevance score also indicates how well the URL matches the corresponding query, 0 being irrelevant and 4 being perfectly relevant. The queries, URLs and feature semantics have not been revealed. Both datasets were randomly split into 3 subsets, training, validation and test (see Table 1 for their respective sizes), and relevance scores were given for the training subsets only. In the first track of the challenge, contestants were asked to train a ranking function on […]

[…] been successful in the context of ranking (Li et al., 2007; Wu et al., 2008), we will focus in this paper on randomization methods. Randomization methods produce different models from a single initial learning sample by introducing random perturbations into the learning procedure. For example, Bagging (Breiman, 1996) builds each tree of the ensemble from a random bootstrap copy of the original data. Another popular tree-based randomization method is the Random Forests algorithm proposed by Breiman (2001), which combines the bootstrap sampling idea of Bagging with a further randomization of the input features that are used as candidates to split an interior node of the tree. Instead of looking for the best split among all features, the algorithm first selects, at each node, a subset of K features at random and then determines the best test over these features to effectively split the node. This method is very effective and has found many successful applications in various fields.
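For concreteness, the two randomization schemes just described can be reproduced with off-the-shelf tools. The short sketch below uses scikit-learn regressors as a stand-in for the authors' own implementation; the synthetic X and y arrays and the choice of K are assumptions made purely for illustration.

    # Illustrative sketch only: Bagging and Random Forests as pointwise regressors.
    import numpy as np
    from sklearn.ensemble import BaggingRegressor, RandomForestRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(1000, 20)                        # hypothetical query-URL feature vectors in [0, 1]
    y = rng.randint(0, 5, 1000).astype(float)     # hypothetical relevance grades 0..4

    # Bagging: each tree (the default base learner) is grown on a bootstrap copy of the data,
    # and every feature is a candidate at every split.
    bagging = BaggingRegressor(n_estimators=500, random_state=0).fit(X, y)

    # Random Forests: bootstrap copies plus, at each node, a random subset of K candidate features.
    K = 5                                          # illustrative value of K < p
    forest = RandomForestRegressor(n_estimators=500, max_features=K,
                                   random_state=0).fit(X, y)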
Extremely Randomized Trees. Most experiments in this paper will be carried out with a particular randomization method called Extremely Randomized Trees (ET) proposed by Geurts et al. (2006). This method is similar to the Random Forests algorithm in the sense that it is based on selecting at each node a random subset of K features to decide on the split. Unlike in the Random Forests method, each tree is built from the complete learning sample (no bootstrap copying) and, most importantly, for each of the features (randomly selected at each interior node) a discretization threshold (cut-point) is selected at random to define a split, instead of choosing the best cut-point based on the local sample (as in Tree Bagging or in the Random Forests method). As a consequence, when K is fixed to one, the resulting tree structure is actually selected independently of the output labels of the training set. In practice, the algorithm only depends on a single main parameter, K. Good default values of K have been found empirically to be K = sqrt(p) for classification problems and K = p for regression problems, where p is the number of input features (Geurts et al., 2006). Experiments in (Geurts et al., 2006) show that this method is most of the time competitive with Random Forests in terms of accuracy, and sometimes superior. Because it removes the need for the optimization of the discretization thresholds, it also has a clear advantage in terms of computing times and ease of implementation. We refer the reader to (Geurts et al., 2006) for a more formal description of the algorithm and a detailed discussion of its main features.

Summary. The main strength of tree-based randomization methods is that they are (almost) parameter-free while still able to learn non-linear models, which makes them good all-purpose, off-the-shelf methods (Hastie et al., 2001). Their computing times are also very competitive and, through feature importance measures, they can provide some insight about the problem at hand (Hastie et al., 2001). On the downside, their accuracy is sometimes not at the level of the state of the art on some specific classes of problems, and their storage requirements might be prohibitive in some applications.

3.2. Output encodings

In most of our experiments, we have adopted a simple pointwise regression approach for solving the ranking problem with ensembles of randomized trees. Each query-URL pair […]

Table 2: Typical standard errors for ERR and NDCG on both tracks

             set 1 validation      set 1 test
             ERR      NDCG         ERR      NDCG
             0.006    0.004        0.004    0.0025

             set 2 validation      set 2 test
             ERR      NDCG         ERR      NDCG
             0.008    0.005        0.005    0.003

Table 3: Comparison of different tree-based ensemble methods (with 500 trees; in bold: best values in each column)

                        set 1 validation     set 1 test
  Method                ERR      NDCG        ERR      NDCG
  Random                0.2779   0.5809      0.2830   0.5783
  ET, K=sqrt(p)         0.4533   0.7841      0.4595   0.7896
  ET, K=p               0.4564   0.7907      0.4620   0.7948
  RF, K=sqrt(p)         0.4552   0.7884      0.4611   0.7923
  RF, K=p (Bagging)     0.4553   0.7866      0.4629   0.7934

The two methods are not very sensitive to the value of K, although in both cases the best result is obtained with K = p. Both methods are very close in terms of performance, with a slight advantage to ET K=p on the validation sample and a slight advantage to RF K=p (which is equivalent to Bagging in this case) on the test sample. These differences are however not statistically significant.

Effect of the ensemble size. Table 4 shows the effect of the number of trees with ET and K = p on ERR and NDCG on the validation and test samples. The last column gives the rank that each entry would have obtained at the challenge. As expected, both the ERR and the NDCG monotonically increase as the size of the ensemble increases. ERR however saturates at 1000 trees. Although slight, the improvement from 500 to 1000 trees is statistically significant (p-value = 0.0009) and it makes the method climb from the 17th rank to the 10th rank.
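As a rough sketch of the pointwise regression setup evaluated above, the snippet below trains an Extremely Randomized Trees regressor with K = p (scikit-learn's ExtraTreesRegressor, used here as a stand-in for the authors' C/MATLAB code) and scores one query with ERR following the graded-relevance definition of Chapelle et al. (2009); the synthetic data and array names are assumptions made for illustration.

    # Illustrative sketch: pointwise regression with Extremely Randomized Trees
    # (K = p corresponds to max_features=1.0) and a per-query ERR computation.
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    def err(grades):
        # Expected Reciprocal Rank for one query, documents already sorted by
        # predicted score; grades are integers in 0..4 (Chapelle et al., 2009).
        stop_prob = (2.0 ** np.asarray(grades, dtype=float) - 1.0) / 2.0 ** 4
        p_not_stopped, score = 1.0, 0.0
        for rank, r in enumerate(stop_prob, start=1):
            score += p_not_stopped * r / rank
            p_not_stopped *= (1.0 - r)
        return score

    rng = np.random.RandomState(0)
    X_train, y_train = rng.rand(2000, 20), rng.randint(0, 5, 2000).astype(float)
    X_query, y_query = rng.rand(30, 20), rng.randint(0, 5, 30)   # one hypothetical query

    # ET with K = p: all features are split candidates, cut-points are drawn at random,
    # and each tree uses the full learning sample (no bootstrap).
    model = ExtraTreesRegressor(n_estimators=1000, max_features=1.0, random_state=0)
    model.fit(X_train, y_train)

    order = np.argsort(-model.predict(X_query))       # rank URLs by predicted relevance
    print("ERR for this query:", err(y_query[order]))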
Output encoding. The five different output encoding methods are compared in Table 5, in each case with ET, K = p and 500 trees. Regr-proba is the best approach on the validation set but it is less good than Regr on the test set. On the test set, only Clas-moc is better than the straightforward regression encoding, and the difference is significant (p-value = 0.03). However, because it builds four binary classifiers, this variant uses four times more trees than the other methods and, as shown in Table 4, the regression approach reaches the same test ERR with 1000 trees or more. Unlike what was found in Li et al. (2007), the two classification encodings are thus not interesting in our context.

[…]

Table 7: Comparison of various approaches for transfer learning (with ET, K=p, 1000 trees; in bold: best values in each column)

                                               set 2 validation    set 2 test
  Method                                       ERR      NDCG       ERR      NDCG
  Random                                       0.2567   0.5299     0.2597   0.5251
  1  M(set2; FC+F2)                            0.4514   0.7859     0.4611   0.7810
  2  M(set1; FC)                               0.4353   0.7441     0.4472   0.7409
  3  M(α set1 + (1−α) set2; FC+F1+F2)          0.4514   0.7859     0.4611   0.7810
  4  α M(set1; FC) + (1−α) M(set2; FC+F2)      0.4514   0.7859     0.4611   0.7810
  5  M(set2; FC+F2+M(set1; FC))                0.4497   0.7853     0.4625   0.7830
  6  M(set2 − M(set1; FC); FC+F2)              0.4487   0.7811     0.4594   0.7784

4.2. Track 2: transfer learning

For our experiments on the second track, we focused on the pointwise regression approach with the ET algorithm and investigated several ways to combine the two datasets.

Comparison of transfer learning approaches. All methods discussed in Section 3.3 are compared in Table 7, in all cases with ET, K = p, and 1000 trees. For methods 3 and 4, the value of α was optimized on the validation set. For Method 3, α was selected in {0.0, 0.05, 0.125, 0.25, 0.5, 1.0}. For Method 4, which does not require relearning a model for each new α value, we screened all values of α in [0, 1] with a step size of 0.01. On the validation set, no method is able to improve with respect to the use of the set 2 data. The model learned from set 1 is clearly inferior, but it is nevertheless quite good compared to the random model. For both methods 3 and 4, the optimal value of α is actually 0, meaning that these two models are equivalent to the model learned from set 2 [1]. Method 5, which introduces the prediction of the model learned on set 1 among the features, shows the best performance overall on the test set. However, the improvement over the base model is only slight and actually not statistically significant.

Experiments with less set 2 data. One potential reason for these disappointing results could be that, given the size of the set 2 learning sample, the limit of what can be obtained with our model family on this problem is already reached. As a consequence, set 1 data is not bringing any useful additional information. To check whether the transfer methods could still bring some improvement in the presence of less set 2 data, we repeated the same experiment but this time reducing the size of the set 2 learning sample by 95% (i.e., by randomly sampling 5% of the queries). Average results over five random samplings are reported in Table 8, with all transfer methods. With only 5% of the set 2 data (about 65 queries), the model learned on set 2 data is now comparable with the model learned on set 1 data. Transfer methods 3, 4, and 5 improve with respect to both the model learned from […]

[1] Note that the value of α that optimizes ERR on the test set is 0.22. It corresponds to an ERR of 0.4622, which is slightly better than the ERR of the model learned on set 2 alone.
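Transfer methods 4 (interpolating the predictions of the set 1 and set 2 models) and 5 (feeding the set 1 model's prediction to the set 2 model as an extra input feature) are simple enough to sketch. The snippet below is an illustrative reconstruction, assuming scikit-learn stands in for the authors' implementation and that the first 415 columns of the set 2 matrices are the common features FC; array names, sizes and the value of alpha are made up for the example.

    # Illustrative sketch of transfer methods 4 and 5 from Table 7.
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    rng = np.random.RandomState(0)
    X1, y1 = rng.rand(500, 415), rng.randint(0, 5, 500).astype(float)        # set 1: common features FC
    X2, y2 = rng.rand(200, 415 + 181), rng.randint(0, 5, 200).astype(float)  # set 2: FC + F2
    X2_new = rng.rand(50, 415 + 181)                                         # unseen set 2 pairs

    def et(n_trees=100):   # the paper uses 1000 trees; fewer here to keep the sketch light
        return ExtraTreesRegressor(n_estimators=n_trees, max_features=1.0, random_state=0)

    m_set1 = et().fit(X1, y1)                 # M(set1; FC)
    m_set2 = et().fit(X2, y2)                 # M(set2; FC+F2)

    # Method 4: interpolate the two models' predictions with a weight alpha
    # (alpha is tuned on the validation set in the paper; 0.1 is a hypothetical value).
    alpha = 0.1
    pred4 = alpha * m_set1.predict(X2_new[:, :415]) + (1 - alpha) * m_set2.predict(X2_new)

    # Method 5: append the set 1 model's prediction as an additional input feature.
    X2_aug = np.hstack([X2, m_set1.predict(X2[:, :415])[:, None]])
    m5 = et().fit(X2_aug, y2)
    pred5 = m5.predict(np.hstack([X2_new, m_set1.predict(X2_new[:, :415])[:, None]]))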
Table 10: Computing times and storage requirements of different methods

  Method                Learning(1)  Testing(2)  Nb nodes(3)  Storage(4)  ERR set1 test(5)
  ET, K=p               296 s        492 ms      280K         1.3 MB      0.4620
  RF, K=p               618 s        469 ms      140K         663 kB      0.4629
  ET, K=sqrt(p)         20 s         543 ms      400K         1.8 MB      0.4595
  RF, K=sqrt(p)         28 s         507 ms      205K         959 kB      0.4552
  ET, K=p, nmin=50      268 s        336 ms      47K          215 kB      0.4626
  ET, K=p, nmin=100     254 s        260 ms      25K          115 kB      0.4613

  (1) Computing time in seconds for learning one tree on the set 1 learning sample.
  (2) Computing time in milliseconds for testing one tree on the validation and test samples of set 1.
  (3) Total number of nodes in one tree built on the set 1 learning sample.
  (4) Storage requirement for one tree.
  (5) ERR obtained with 500 trees on the set 1 test sample.

4.3. Computational considerations

To conclude our experiments, we report in this section on the computational requirements of our algorithms. Table 10 shows training and testing times, as well as storage requirements, of different tree-based methods, all computed on set 1 [3]. In addition to the methods discussed above, we also analysed in this table the effect of the stop-splitting parameter nmin that reduces the complexity of the trees. These statistics have been obtained with unoptimized code intended only for research purposes [4] and should thus be taken with some caution.

ET is twice as fast as RF for K = p and 30% faster for K = sqrt(p). Methods differ greatly in terms of the size of the models (from 25K nodes to 400K nodes), but prediction times, which are mainly related to the tree depth, only vary by 50% from the slowest to the fastest method. Given that all methods are very close in terms of accuracy, retrospectively, ET with K = p and nmin = 100 seems to be the best method, as it produces much smaller models with no apparent degradation in terms of ERR score. If one can sacrifice some accuracy to reduce computing times, then ET with K = sqrt(p) is also an interesting tradeoff: 500 models are built in less than 3 hours (on a single core) and its ERR score is only 2% lower than the ERR score of the best performing method of the challenge. Note that, of course, better tradeoffs for a given application could be achieved by adopting other parameter settings (e.g., fewer trees in the ensemble or other values of K or nmin). Given that trees in the ensemble are built independently of each other, the parallelization of the algorithm is trivial.

[3] The software has been implemented in C with a MATLAB (The MathWorks, Inc.) interface. Computing times exclude loading times and assume that all data are stored in main memory. They have been computed on a Mac Pro Quad-core Intel Xeon Nehalem 2.66 GHz with 16 GB of memory. Storage requirements have been derived from binary mat-files generated with MATLAB, assuming that all numbers (i.e., tested attributes, node indices and discretization thresholds) are stored in single precision.
[4] Software can be downloaded from http://www.montefiore.ulg.ac.be/~geurts/Software.html.
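The nmin stop-splitting parameter and the per-tree parallelism mentioned above have close analogues in common libraries. A minimal sketch, assuming scikit-learn's min_samples_split and n_jobs play roughly the roles of nmin and of building the independent trees on several cores (this is not the authors' C/MATLAB software, and the data is synthetic):

    # Illustrative sketch: limiting tree complexity and growing trees in parallel.
    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    rng = np.random.RandomState(0)
    X = rng.rand(2000, 519)                          # hypothetical set 1-like feature matrix
    y = rng.randint(0, 5, 2000).astype(float)        # hypothetical relevance grades

    model = ExtraTreesRegressor(
        n_estimators=500,
        max_features=1.0,        # K = p
        min_samples_split=100,   # plays roughly the role of nmin = 100 (stop splitting small nodes)
        n_jobs=-1,               # the trees are independent, so they can be built on all cores
        random_state=0,
    ).fit(X, y)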
References

L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996.

L. Breiman. Random forests. Machine Learning, 45:5-32, 2001.

L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth International, California, 1984.

O. Chapelle and Y. Chang. Yahoo! learning to rank challenge overview. JMLR Workshop and Conference Proceedings, 14:1-24, 2011.

O. Chapelle, D. Metzler, Y. Zhang, and P. Grinspan. Expected reciprocal rank for graded relevance. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 621-630. ACM, 2009.

K. Chen, R. Lu, C. K. Wong, G. Sun, L. Heck, and B. Tseng. Trada: tree based ranking function adaptation. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM '08, pages 1143-1152, New York, NY, USA, 2008. ACM.

J. Gao, Q. Wu, C. Burges, K. Svore, Y. Su, N. Khan, S. Shah, and H. Zhou. Model adaptation via model interpolation and boosting for web search ranking. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, EMNLP '09, pages 505-513, Morristown, NJ, USA, 2009. Association for Computational Linguistics.

P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3-42, 2006.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer New York Inc., New York, NY, USA, 2001.

K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4):446, 2002.

P. Li, C. Burges, and Q. Wu. Learning to rank using classification and gradient boosting. In Proceedings of the International Conference on Advances in Neural Information Processing Systems (NIPS), 2007.

S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, October 2010. ISSN 1041-4347.

Q. Wu, C. J. C. Burges, K. M. Svore, and J. Gao. Ranking, boosting, and model adaptation. Technical Report MSR-TR-2008-109, 2008.

J. Ye, J.-H. Chow, J. Chen, and Z. Zheng. Stochastic gradient boosted distributed decision trees. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM '09, pages 2061-2064, New York, NY, USA, 2009. ACM.