Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem

formulations might not reflect real user satisfaction. For example, clicks are affected by presentation bias: users tend to click on higher-ranked results regardless of relevance (Joachims et al., 2007). Any objective based on absolute measures must use careful calibration. In contrast, the interleaving method proposed by Radlinski et al. (2008b) offers a reliable mechanism for deriving relative preferences between retrieval functions.

3. The Dueling Bandits Problem

We define a new on-line optimization problem, called the Dueling Bandits Problem, where the only actions are comparisons (or duels) between two points within a space $W$ (e.g., a parameterized space of retrieval functions in a search engine). We consider the case where $W$ contains the origin, is compact, convex, and contained in a $d$-dimensional ball of radius $R$.¹ Any single comparison between two points $w$ and $w'$ (e.g., individual retrieval functions) is determined independently of all other comparisons with probability

$$P(w \succ w') = \frac{1}{2} + \epsilon(w, w'), \qquad (1)$$

where $\epsilon(w, w') \in [-1/2, 1/2]$. In the search example, $P(w \succ w')$ refers to the fraction of users who prefer the results produced by $w$ over those of $w'$. One can regard $\epsilon(w, w')$ as the distinguishability between $w$ and $w'$. Algorithms learn only by observing comparison results (e.g., from interleaving (Radlinski et al., 2008b)).

We quantify the performance of an on-line algorithm using the following regret formulation:

$$\Delta_T = \sum_{t=1}^{T} \epsilon(w^*, w_t) + \epsilon(w^*, w'_t), \qquad (2)$$

where $w_t$ and $w'_t$ are the two points selected at time $t$, and $w^*$ is the best point, known only in hindsight. Note that the algorithm is allowed to select two identical points, so selecting $w_t = w'_t = w^*$ accumulates no additional regret. In the search example, regret corresponds to the fraction of users who would prefer the best retrieval function $w^*$ over the selected ones $w_t$ and $w'_t$. A good algorithm should achieve regret sublinear in $T$, which implies decreasing average regret.

3.1. Modeling Assumptions

We further assume the existence of a differentiable, strictly concave value (or utility) function $v : W \to \mathbb{R}$. This function reflects the intrinsic quality of each point in $W$, and is never directly observed.

¹An alternative setting is the K-armed bandit case, where $|W| = K$ (Yue et al., 2009).
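To make (1) and (2) concrete, a small simulation can draw duel outcomes and track regret as follows. The utility $v$ and the particular distinguishability function below are toy assumptions for illustration only; they are not part of the problem definition:

```python
import numpy as np

rng = np.random.default_rng(1)

def v(w):
    """Assumed strictly concave utility with maximizer w* = 0 (toy choice)."""
    return -np.dot(w, w)

def epsilon(w, w2):
    """Toy distinguishability in [-1/2, 1/2], increasing in the utility gap."""
    return np.tanh(v(w) - v(w2)) / 2.0

def duel(w, w2):
    """One comparison: returns True iff w beats w2; P(w > w2) = 1/2 + epsilon(w, w2)."""
    return rng.random() < 0.5 + epsilon(w, w2)

w_star = np.zeros(2)  # best point, known only in hindsight
regret = 0.0
for t in range(100):
    w_t = rng.normal(size=2)   # candidate pair chosen by some algorithm
    w2_t = rng.normal(size=2)
    outcome = duel(w_t, w2_t)  # the only feedback a learner ever observes
    regret += epsilon(w_star, w_t) + epsilon(w_star, w2_t)  # formulation (2)
```

Each regret increment is non-negative because $v(w^*)$ is maximal, so $\Delta_T$ grows linearly unless the algorithm learns to select points close to $w^*$.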
Algorithm 1 Dueling Bandit Gradient Descent (DBGD)

Input: $\gamma$, $\delta$, $w_1$
for query $q_t$ ($t = 1 \ldots T$) do
    Sample unit vector $u_t$ uniformly.
    $w'_t \leftarrow P_W(w_t + \delta u_t)$   // projected back into $W$
    Compare $w_t$ and $w'_t$
    if $w'_t$ wins then
        $w_{t+1} \leftarrow P_W(w_t + \gamma u_t)$   // also projected
    else
        $w_{t+1} \leftarrow w_t$
    end if
end for

Since $v$ is strictly concave, there exists a unique maximum $v(w^*)$. Probabilistic comparisons are made using a link function $\sigma : \mathbb{R} \to [0, 1]$, and are defined as

$$P(w \succ w') = \sigma(v(w) - v(w')).$$

Thus $\epsilon(w, w') = \sigma(v(w) - v(w')) - 1/2$. Link functions behave like cumulative distribution functions (monotonically increasing, with $\lim_{x \to -\infty} \sigma(x) = 0$ and $\lim_{x \to \infty} \sigma(x) = 1$). We consider only link functions that are rotation-symmetric ($\sigma(x) = 1 - \sigma(-x)$) and have a single inflection point at $\sigma(0) = 1/2$. This implies that $\sigma(x)$ is convex for $x \leq 0$ and concave for $x \geq 0$. One common link function is the logistic function $\sigma_L(x) = 1/(1 + \exp(-x))$.

We finally make two smoothness assumptions. First, $\sigma$ is $L_\sigma$-Lipschitz and $v$ is $L_v$-Lipschitz; that is, $|\sigma(a) - \sigma(b)| \leq L_\sigma \|a - b\|$. Thus $\epsilon(\cdot, \cdot)$ is $L$-Lipschitz in both arguments, where $L = L_\sigma L_v$. We further assume that $L_\sigma$ and $L_v$ are the smallest such constants. Second, $\sigma$ is second-order $L_2$-Lipschitz, that is, $|\sigma'(a) - \sigma'(b)| \leq L_2 \|a - b\|$. These relatively mild assumptions provide sufficient structure for showing sublinear regret.

4. Algorithm & Analysis

Our algorithm, Dueling Bandit Gradient Descent (DBGD), is described in Algorithm 1. DBGD maintains a candidate $w_t$ and compares it with a neighboring point $w'_t$ along a random direction $u_t$. If $w'_t$ wins the comparison, then an update is taken along $u_t$ and then projected back into $W$ (denoted by $P_W$). DBGD requires two parameters, which can be interpreted as the exploration ($\delta$) and exploitation ($\gamma$) step sizes. The latter is required by all gradient descent algorithms. Since DBGD probes for descent directions randomly, this introduces a gradient estimation error that depends on $\delta$ (discussed in Section 4.2). We will show in Theorem 2 that, for suitable $\delta$ and $\gamma$, DBGD achieves sublinear regret in $T$:

$$E[\Delta_T] \leq 2\lambda_T T^{3/4} \sqrt{26RdL}.$$
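Algorithm 1 can be sketched in a few lines. The comparison oracle, value function, and parameter choices below are illustrative assumptions (the paper leaves these to the deployment setting), and $W$ is taken to be a Euclidean ball so that $P_W$ reduces to a simple rescaling:

```python
import numpy as np

def project_ball(w, radius):
    """P_W: project w back into the feasible ball W of the given radius."""
    n = np.linalg.norm(w)
    return w if n <= radius else w * (radius / n)

def dbgd(compare, d, T, gamma, delta, radius, rng):
    """Dueling Bandit Gradient Descent sketch.

    compare(w, w_probe) returns True iff the probe point wins the duel.
    """
    w = np.zeros(d)  # w_1 = 0 (the origin lies in W)
    for _ in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                       # uniform random unit vector u_t
        w_probe = project_ball(w + delta * u, radius)
        if compare(w, w_probe):                      # duel w_t against w'_t
            w = project_ball(w + gamma * u, radius)  # step toward the winner
    return w

# Illustrative comparison oracle (assumed): logistic link over a concave utility.
rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0, 0.5])
v = lambda w: -np.sum((w - w_star) ** 2)
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))

def compare(w, w_probe):
    # P(w'_t beats w_t) = sigma(v(w'_t) - v(w_t))
    return rng.random() < sigma(v(w_probe) - v(w))

w_final = dbgd(compare, d=3, T=20000, gamma=0.05, delta=0.5, radius=10.0, rng=rng)
```

With these (assumed) step sizes, the iterate drifts toward the utility maximizer while only ever observing binary duel outcomes.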
Lemma 2. Fix $\delta \geq 0$; over random unit vectors $u$,

$$E_u[c_t(P_W(w + \delta u))\, u] = \frac{\delta}{d} \nabla \hat{c}_t(w),$$

where $d$ is the dimensionality of $x$. (Proof analogous to Lemma 2.1 of Flaxman et al., 2005.)

Combining Lemma 1 and Lemma 2 implies that DBGD is implicitly performing gradient descent over

$$\hat{\epsilon}_t(w) = E_{x \in B}[\epsilon_t(P_W(w + \delta x))]. \qquad (7)$$

Note that $|\hat{\epsilon}_t(w) - \epsilon_t(w)| \leq \delta L$, and that $\hat{\epsilon}_t$ is parameterized by $\delta$ (suppressed for brevity). Hence, good regret bounds defined on $\hat{\epsilon}_t$ imply good bounds defined on $\epsilon_t$, with $\delta$ controlling the difference.

One concern is that $\hat{\epsilon}_t$ might not be convex at $w_t$. Observation 1 showed that $\epsilon_t$ is convex at $w_t$, and thus satisfies $\epsilon_t(w_t) - \epsilon_t(w^*) \leq \nabla \epsilon_t(w_t) \cdot (w_t - w^*)$. We now show that $\hat{\epsilon}_t(w_t)$ is "almost convex" in a specific way.

Theorem 1. For $\lambda$ defined as

$$\lambda = \frac{L_\sigma}{L_\sigma - \delta L_v L_2}, \qquad (8)$$

and $\delta \in \left(0, \frac{L_\sigma}{L_v L_2}\right)$, then

$$\hat{\epsilon}_t(w_t) - \hat{\epsilon}_t(w^*) \leq \lambda \nabla \hat{\epsilon}_t(w_t) \cdot (w_t - w^*) + (3 + \lambda)\delta L.$$

Proof. First define $w_{t,x} \equiv P_W(w_t + \delta x)$, and also $\epsilon_{t,x}(w) \equiv \epsilon(w_{t,x}, w)$. We rewrite $\hat{\epsilon}_t(w_t) - \hat{\epsilon}_t(w^*)$ as

$$= E_{x \in B}[\epsilon_t(P_W(w_t + \delta x)) - \epsilon_t(P_W(w^* + \delta x))]$$
$$\leq E_{x \in B}[\epsilon_{t,x}(w_{t,x}) - \epsilon_{t,x}(w^*)] + 3\delta L \qquad (9)$$
$$\leq E_{x \in B}[\nabla \epsilon_{t,x}(w_{t,x}) \cdot (w_{t,x} - w^*)] + 3\delta L, \qquad (10)$$

where (9) follows from $\epsilon$ being $L$-Lipschitz, and (10) follows from $w_{t,x}$ and $w^*$ both being in the convex region of $\epsilon_{t,x}$.

Now define $\sigma_t(y) \equiv \sigma(v(w_t) - y)$, and $\sigma_{t,x}(y) \equiv \sigma(v(w_{t,x}) - y)$. We can see that

$$\nabla \epsilon_t(w_{t,x}) = \sigma_t'(v(w_{t,x})) \nabla v(w_{t,x}),$$

and similarly

$$\nabla \epsilon_{t,x}(w_{t,x}) = \sigma_{t,x}'(v(w_{t,x})) \nabla v(w_{t,x}).$$

We can then write (10) as

$$= E_x\left[\sigma_{t,x}'(v(w_{t,x})) \nabla v(w_{t,x}) \cdot (w_{t,x} - w^*)\right] + 3\delta L. \qquad (11)$$

We know that both $\sigma_{t,x}'(y) \leq 0$ and $\sigma_t'(y) \leq 0$, and $\sigma_{t,x}'(v(w_{t,x})) = -L_\sigma$, since that is the inflection point. Thus

$$-L_\sigma \leq \sigma_t'(v(w_{t,x})) \leq -L_\sigma + \delta L_v L_2,$$

which follows from $\sigma$ being second-order $L_2$-Lipschitz. Since $\epsilon_{t,x}(w_{t,x}) - \epsilon_{t,x}(w^*) \geq 0$, the term inside the expectation in (11) is also non-negative. Using our definition of $\lambda$ (8), we can write (11) as

$$\leq \lambda E_x\left[\sigma_t'(v(w_{t,x})) \nabla v(w_{t,x}) \cdot (w_{t,x} - w^*)\right] + 3\delta L$$
$$= \lambda E_x\left[\nabla \epsilon_t(w_{t,x}) \cdot (w_{t,x} - w^*)\right] + 3\delta L$$
$$= \lambda E_x\left[\nabla \epsilon_t(w_{t,x}) \cdot (w_{t,x} - w_t + w_t - w^*)\right] + 3\delta L$$
$$\leq \lambda E_x\left[\nabla \epsilon_t(w_{t,x}) \cdot (w_t - w^*)\right] + (3 + \lambda)\delta L \qquad (12)$$
$$= \lambda \nabla \hat{\epsilon}_t(w_t) \cdot (w_t - w^*) + (3 + \lambda)\delta L,$$

where (12) follows from observing that

$$E_x[\nabla \epsilon_t(w_{t,x}) \cdot (w_{t,x} - w_t)] \leq E_x[\|\nabla \epsilon_t(w_{t,x})\|]\, \delta \leq \delta L.$$

4.3. Regret Bound for DBGD

Thus far, we have focused on proving properties regarding the relative loss functions $\epsilon_t$ and $\hat{\epsilon}_t$. We can easily bound our regret formulation (2) using $\epsilon_t$.

Lemma 3. Fix $\delta > 0$. Expected regret is bounded by

$$E[\Delta_T] \leq -2 E\left[\sum_{t=1}^{T} \epsilon_t(w^*)\right] + \delta L T.$$

Proof. We can write expected regret as

$$E[\Delta_T] \leq 2 E\left[\sum_{t=1}^{T} \epsilon(w^*, w_t)\right] + \delta L T = -2 E\left[\sum_{t=1}^{T} \epsilon_t(w^*)\right] + \delta L T,$$

by noting that $|\epsilon(w^*, w'_t) - \epsilon(w^*, w_t)| \leq \delta L$, and also that $\epsilon_t(w^*) = -\epsilon(w^*, w_t)$.
We now analyze the regret behavior of the smoothed loss functions $\hat{\epsilon}_t$. Lemma 4 provides a useful intermediate result. Note that the regret formulation analyzed in Lemma 4 is different from (2).

Lemma 4. Fix $\delta \in \left(0, \frac{L_\sigma}{L_v L_2}\right)$, and define $\lambda$ as in (8). Assume a sequence of smoothed relative loss functions $\hat{\epsilon}_1, \ldots, \hat{\epsilon}_T$ ($\hat{\epsilon}_{t+1}$ depending on $w_t$) and $w_1, \ldots, w_T \in W$ defined by $w_1 = 0$ and $w_{t+1} = P_W(w_t - \eta g_t)$, where $\eta > 0$ and $g_1, \ldots, g_T$ are vector-valued random variables with (a) $E[g_t \mid w_t] = \nabla \hat{\epsilon}_t$, (b) $\|g_t\| \leq G$, and (c) $W \subseteq R\mathcal{B}$. Then for $\eta = \frac{R}{G\sqrt{T}}$,

$$E\left[\sum_{t=1}^{T} \hat{\epsilon}_t(w_t) - \hat{\epsilon}_t(w^*)\right] \leq \lambda R G \sqrt{T} + (3 + \lambda)\delta L T. \qquad (13)$$

(Adapted from Lemma 3.1 in Flaxman et al., 2005.)

Proof. Theorem 1 implies the LHS of (13) to be at most

$$\lambda E\left[\sum_{t=1}^{T} \nabla \hat{\epsilon}_t(w_t) \cdot (w_t - w^*)\right] + (3 + \lambda)\delta L T.$$

Table 1. Average regret of DBGD with synthetic functions.

$L_\epsilon$ Factor:  0.6     0.8     1       2       3
P1:                   0.465   0.398   0.334   0.303   0.415
P2:                   0.803   0.767   0.760   0.780   0.807
P3:                   0.687   0.628   0.604   0.637   0.663
P4:                   0.500   0.378   0.325   0.304   0.418
P5:                   0.710   0.663   0.674   0.798   0.887

4.4. Practical Considerations

Choosing $\delta$ to achieve the regret bound stated in Theorem 2 requires knowledge of $\epsilon_t$ (i.e., $L$), which is typically not known in practical settings. The regret bound is, however, robust to the choice of $\delta$, so sublinear regret is achievable using many choices for $\delta$, as we will verify empirically.

In the analysis, $w_1 = 0$ was chosen to minimize its distance to any other point in $W$. In certain settings we might choose $w_1 \neq 0$, in which case our analysis still follows with slightly worse constants.

5. Experiments

5.1. Synthetic Value Functions

We first experimented using synthetic value functions, which allows us to test the robustness of DBGD to different choices of $\delta$. Since $L$ is unknown, we introduced a free parameter $L_\epsilon$ and used

$$\delta = \frac{T^{-1/4}}{L_\epsilon} \sqrt{0.4 R d}.$$

We tested on five settings, P1 to P5. Each setting optimizes over a 50-dimensional ball of radius 10, and uses the logistic transfer function with different value functions that explore a range of curvatures (which affects the Lipschitz constant) and symmetries:

$$v_1(w) = -w^T w$$
$$v_2(w) = -|w|$$
$$v_3(w) = -\sum_{i:\mathrm{odd}} w(i)^2 - \sum_{i:\mathrm{even}} w(i)$$
$$v_4(w) = -\sum_i \left[\exp(w(i)) + \exp(-w(i))\right]$$
$$v_5(w) = v_3(w) - \sum_{i:\, i\%3=1} e^{[w(i)]_+} - \sum_{i:\, i\%3=2} e^{[-w(i)]_+}$$

The initial point is $w_1 = \vec{1}\sqrt{5/d}$.
Table 1 shows the regret over the interesting range of $L_\epsilon$ values. Performance degrades gracefully beyond this range. Note that the regret of a random point is about 1, since most points in $W$ have much lower value than $v(w^*)$.

We also compared against Bandit Gradient Descent (BGD) (Flaxman et al., 2005). Like DBGD, BGD explores in random directions at each iteration. However, BGD assumes access to $P(w_t \succ w)$, whereas DBGD only observes random outcomes. Thus BGD assumes strictly more information.⁴ We evaluated two versions: BGD1 using $P(w_t \succ w)$, and BGD2 using $\epsilon_t(w) = P(w_t \succ w) - 1/2$. We expect BGD2 to perform best, since the sign of $\epsilon_t(w)$ reveals significant information regarding the true gradient.

Figure 2. Average regret for $L_\epsilon = 1$.

Figure 2 shows the average regret for problems P1 and P5 with $L_\epsilon = 1$. We observe that the behaviors of DBGD and BGD are very similar for both. Interestingly, DBGD outperforms BGD1 on P5 despite having less information. We also observe this trend for P2 and P3, noting that all three problems have significant linear components.

5.2. Web Search Dataset

For a more realistic simulation environment, we leveraged a real Web Search dataset (courtesy of Chris Burges at Microsoft Research). The idea is to simulate users issuing queries by sampling from queries in the dataset. For each query, the competing retrieval functions produce rankings, after which the "user" randomly prefers one ranking over the other; we used a value function based on NDCG@10 (defined below) to determine the comparison outcome probabilities. We stress that our usage of the dataset is very different from supervised learning settings. In particular, (extensions of) our algorithm might be applied to experiments involving real users, where very little is known about each user's internal value function. We leverage this dataset as a reasonable first step for simulating user behavior in an on-line learning setting.

The training, validation and test sets each consist of 1000 queries. We only simulated on the training set, although we measured performance on the other sets to check for, e.g., generalization power. There are about 50 documents per query, and documents are labeled by 5 levels of relevance, from 0 (Bad) to 4 (Perfect). The compatibility between a document/query pair is represented using 367 features.
⁴Our analysis yields matching upper bounds on expected regret for all three methods, though it can be shown that the BGD gradient estimates have lower variance.

Table 2. Average (upper) and final (lower) NDCG@10 on the Web Search training set (sampling 100 queries/iteration).

$\delta$ \ $\gamma$:  0.001   0.005   0.01    0.05    0.1
0.5:                  0.524   0.570   0.580   0.569   0.557
0.8:                  0.533   0.575   0.582   0.576   0.566
1:                    0.537   0.575   0.584   0.577   0.568
3:                    0.529   0.565   0.573   0.575   0.571
0.5:                  0.559   0.591   0.592   0.569   0.565
0.8:                  0.564   0.593   0.593   0.574   0.559
1:                    0.568   0.592   0.595   0.582   0.570
3:                    0.557   0.581   0.582   0.577   0.576

A standard retrieval function computes a score for each document based on these features, with the final ranking resulting from sorting by the scores. For simplicity, we considered only linear functions $w$, so that the score for document $x$ is $w^T x$. Since only the direction of $w$ matters, we are thus optimizing over a 367-dimensional unit sphere.

Our value function is based on Normalized Discounted Cumulative Gain (NDCG), a common measure for evaluating rankings (Donmez et al., 2009). For query $q$, NDCG@K of a ranking for the documents of $q$ is

$$\frac{1}{N_K^{(q)}} \sum_{k=1}^{K} \frac{2^{r_k} - 1}{\log(k+1)},$$

where $r_k$ is the relevance level of the $k$th-ranked document, and $N_K^{(q)}$ is a normalization factor⁵ such that the best ranking achieves NDCG@K = 1. For our experiments, we used the logistic function and $10 \times$ NDCG@10 to make probabilistic comparisons.

We note a few properties of this setup, some going beyond the assumptions in Section 3.1; this allows us to further examine the generality of DBGD. First, the value function is now random (dependent on the query). Second, our feasible space $W$ is the unit sphere, which is not convex, although it is a well-behaved manifold. Third, we assume a homogeneous user group (i.e., all users have the same value function, NDCG@10). Fourth, rankings vary discontinuously w.r.t. document scores, and NDCG@10 is thus a discontinuous value function. We addressed this issue by comparing multiple queries (i.e., delaying multiple iterations) before an update decision, and also by using larger choices of $\delta$ and $\gamma$.
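The NDCG@K formula above, and its use with the logistic link, can be sketched as follows. This is a minimal illustration with our own helper names; the logarithm base is left to `math.log` since it cancels in the normalized ratio:

```python
import math

def dcg_at_k(rels, k):
    """DCG@K = sum over the top-K ranks of (2^r - 1) / log(k' + 1), ranks from 1."""
    return sum((2 ** r - 1) / math.log(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    """NDCG@K: DCG of the given ranking over N_K^(q), the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def pref_probability(rels_a, rels_b):
    """P('user' prefers ranking a): logistic link on the 10 * NDCG@10 difference."""
    gap = 10 * (ndcg_at_k(rels_a) - ndcg_at_k(rels_b))
    return 1.0 / (1.0 + math.exp(-gap))

# Relevance labels (0 = Bad .. 4 = Perfect) of the top documents under two rankings.
ideal_order = [4, 3, 2, 1, 0]
reversed_order = [0, 1, 2, 3, 4]
```

An ideal ordering attains NDCG@10 = 1, and the simulated user prefers it to the reversed ordering with probability above 1/2.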
Lastly, even smoothed versions of NDCG have local optima (Donmez et al., 2009), making it difficult to find $w^*$ (which is required for computing regret). We thus used NDCG@10 to measure performance.

We tested DBGD for $T = 10^7$ and a range of $\delta$ and $\gamma$ values.

⁵Note that $N_K^{(q)}$ will be different for different queries.

Figure 3. NDCG@10 on Web Search training set.

Table 2 shows the average (across all iterations) and final training NDCG@10 when comparing 100 queries per update. Performance peaks at $(\delta, \gamma) = (1, 0.01)$ and degrades smoothly. We found similar results when varying the number of queries compared per update. Figure 3 depicts per-iteration NDCG@10 for the best models when sampling 1, 10 and 100 queries. Making multiple comparisons per update has no impact on performance (the best parameters are typically smaller when sampling fewer queries). Sampling multiple queries is very realistic, since a search system might be constrained to, e.g., making daily updates to its ranking function. Performance on the validation and test sets closely follows training set performance, so we omit their results; this implies that our method is not overfitting.

For completeness, we compared our best DBGD models with a ranking SVM, which optimizes over pairwise document preferences and is a standard baseline in supervised learning-to-rank settings. More sophisticated methods (e.g., Chakrabarti et al., 2008; Donmez et al., 2009) can further improve performance. Table 3 shows that DBGD approaches ranking SVM performance despite making fundamentally different assumptions (e.g., ranking SVMs have access to very specific document-level information). We caution against over-optimizing here, and advocate instead for developing more realistic experimental settings.

6. Conclusion

We have presented an on-line learning framework based on pairwise comparisons, which fits naturally with recent work on deriving reliable pairwise judgments. Our proposed algorithm, DBGD, achieves sublinear regret. As evidenced by our simulations based on web data, DBGD can be applied much more generally than suggested by our theoretical analysis. Hence, it calls for more sophisticated formulations that account for properties such as heterogeneous user behavior, query-dependent value functions, and the discontinuity of rankings.
Table 3. Comparing ranking SVM vs. final DBGD models using average NDCG@10 and per-query win/tie/loss counts.

Model:     SVM     Sample1      Sample5      Sample10     Sample25     Sample50     Sample100
NDCG@10:   0.612   0.596        0.593        0.589        0.593        0.596        0.595
W/T/L:     --      490/121/389  489/121/390  504/118/378  489/118/393  472/119/409  490/116/394

Another interesting direction is adaptively choosing $\delta$ and $\gamma$ for any-time regret analyses. Our framework is extendable in many ways, such as integrating pairwise document preferences (Joachims et al., 2007; Carterette et al., 2008) and diversity (Yue & Joachims, 2008; Radlinski et al., 2008a). Progress in this area can lead to cost-effective systems for a variety of application domains such as personalized search, enterprise search, and also small interest groups.

Acknowledgements

This work was funded under NSF Award IIS-0713483, NSF CAREER Award 0237381, and a gift from Yahoo! Research. The first author is also partly funded by a Microsoft Research Graduate Fellowship and a Yahoo! Key Technical Challenges Grant. The authors also thank Robert Kleinberg, Josef Broder and the anonymous reviewers for their helpful comments.

References

Agichtein, E., Brill, E., & Dumais, S. (2006). Improving Web Search Ranking by Incorporating User Behavior Information. ACM Conference on Information Retrieval (SIGIR) (pp. 19-26).

Carterette, B., Bennett, P., Chickering, D. M., & Dumais, S. (2008). Here or There: Preference Judgments for Relevance. European Conference on Information Retrieval (ECIR) (pp. 16-27).

Carterette, B., & Jones, R. (2007). Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks. Neural Information Processing Systems (NIPS) (pp. 217-224).

Chakrabarti, S., Khanna, R., Sawant, U., & Battacharyya, C. (2008). Structured Learning for Non-Smooth Ranking Losses. ACM Conference on Knowledge Discovery and Data Mining (KDD) (pp. 88-96).

Donmez, P., Svore, K., & Burges, C. (2009). On the Local Optimality of LambdaRank. ACM Conference on Information Retrieval (SIGIR).

Dupret, G., & Piwowarski, B. (2008). A User Browsing Model to Predict Search Engine Click Data from Past Observations. ACM Conference on Information Retrieval (SIGIR) (pp. 331-338).
Flaxman, A., Kalai, A., & McMahan, H. B. (2005). Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient. ACM-SIAM Symposium on Discrete Algorithms (SODA) (pp. 385-394).

Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., & Gay, G. (2007). Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search. ACM Transactions on Information Systems (TOIS), 25, 7:1-26.

Kleinberg, R. (2004). Nearly Tight Bounds for the Continuum-Armed Bandit Problem. Neural Information Processing Systems (NIPS) (pp. 697-704).

Langford, J., & Zhang, T. (2007). The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits. Neural Information Processing Systems (NIPS) (pp. 817-824).

Pandey, S., Agarwal, D., Chakrabarti, D., & Josifovski, V. (2007). Bandits for Taxonomies: A Model-based Approach. SIAM Conference on Data Mining (SDM) (pp. 216-227).

Radlinski, F., Kleinberg, R., & Joachims, T. (2008a). Learning Diverse Rankings with Multi-Armed Bandits. International Conference on Machine Learning (ICML) (pp. 784-791).

Radlinski, F., Kurup, M., & Joachims, T. (2008b). How Does Clickthrough Data Reflect Retrieval Quality? ACM Conference on Information and Knowledge Management (CIKM) (pp. 43-52).

Yue, Y., Broder, J., Kleinberg, R., & Joachims, T. (2009). The K-armed Dueling Bandits Problem. Conference on Learning Theory (COLT).

Yue, Y., & Joachims, T. (2008). Predicting Diverse Subsets Using Structural SVMs. International Conference on Machine Learning (ICML) (pp. 1224-1231).

Zinkevich, M. (2003). Online Convex Programming and Generalized Infinitesimal Gradient Ascent. International Conference on Machine Learning (ICML) (pp. 928-936).