Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem


formulations might not reflect real user satisfaction. For example, clicks are affected by presentation bias: users tend to click on higher results regardless of relevance (Joachims et al., 2007). Any objective based on absolute measures must use careful calibration. In contrast, the interleaving method proposed by Radlinski et al. (2008b) offers a reliable mechanism for deriving relative preferences between retrieval functions.

3. The Dueling Bandits Problem

We define a new on-line optimization problem, called the Dueling Bandits Problem, where the only actions are comparisons (or duels) between two points within a space $W$ (e.g., a parameterized space of retrieval functions in a search engine). We consider the case where $W$ contains the origin, is compact, convex, and contained in a $d$-dimensional ball of radius $R$.¹ Any single comparison between two points $w$ and $w'$ (e.g., individual retrieval functions) is determined independently of all other comparisons with probability

$P(w \succ w') = \frac{1}{2} + \epsilon(w, w')$,   (1)

where $\epsilon(w, w') \in [-1/2, 1/2]$. In the search example, $P(w \succ w')$ refers to the fraction of users who prefer the results produced by $w$ over those of $w'$. One can regard $\epsilon(w, w')$ as the distinguishability between $w$ and $w'$. Algorithms learn only via observing comparison results (e.g., from interleaving (Radlinski et al., 2008b)).

We quantify the performance of an on-line algorithm using the following regret formulation:

$\Delta_T = \sum_{t=1}^{T} \epsilon(w^*, w_t) + \epsilon(w^*, w'_t)$,   (2)

where $w_t$ and $w'_t$ are the two points selected at time $t$, and $w^*$ is the best point known only in hindsight. Note that the algorithm is allowed to select two identical points, so selecting $w_t = w'_t = w^*$ accumulates no additional regret. In the search example, regret corresponds to the fraction of users who would prefer the best retrieval function $w^*$ over the selected ones $w_t$ and $w'_t$. A good algorithm should achieve sublinear regret in $T$, which implies decreasing average regret.

3.1. Modeling Assumptions

We further assume the existence of a differentiable, strictly concave value (or utility) function $v : W \to \mathbb{R}$. This function reflects the intrinsic quality of each point in $W$, and is never directly observed. Since $v$ is strictly

---
¹ An alternative setting is the K-armed bandit case where $|W| = K$ (Yue et al., 2009).
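The probabilistic comparison in (1) is straightforward to simulate, which is useful for sanity-checking algorithms in this setting. A minimal sketch (the quadratic value function and logistic link below are illustrative choices anticipating Section 3.1, not fixed by the problem definition):

```python
import math
import random

def logistic(x):
    # Logistic link function: sigma_L(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def make_duel(v, link=logistic, rng=random):
    # Build a comparison oracle for value function v:
    # P(w beats w') = link(v(w) - v(w')) = 1/2 + eps(w, w')
    def duel(w, w_prime):
        return rng.random() < link(v(w) - v(w_prime))
    return duel

# Illustrative 1-d value function v(w) = -w^2 (strictly concave, maximum at 0)
v = lambda w: -w * w
# Distinguishability eps(w, w') = link(v(w) - v(w')) - 1/2
eps = lambda w, wp: logistic(v(w) - v(wp)) - 0.5
duel = make_duel(v)
```

Note that the rotation symmetry of the link function shows up here as $\epsilon(w, w') = -\epsilon(w', w)$.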
Algorithm 1: Dueling Bandit Gradient Descent
 1: Input: $\gamma$, $\delta$, $w_1$
 2: for query $q_t$ ($t = 1 \ldots T$) do
 3:   Sample unit vector $u_t$ uniformly.
 4:   $w'_t \leftarrow P_W(w_t + \delta u_t)$   // projected back into $W$
 5:   Compare $w_t$ and $w'_t$
 6:   if $w'_t$ wins then
 7:     $w_{t+1} \leftarrow P_W(w_t + \gamma u_t)$   // also projected
 8:   else
 9:     $w_{t+1} \leftarrow w_t$
10:   end if
11: end for

concave, there exists a unique maximum $v(w^*)$. Probabilistic comparisons are made using a link function $\sigma : \mathbb{R} \to [0, 1]$, and are defined as

$P(w \succ w') = \sigma(v(w) - v(w'))$.

Thus $\epsilon(w, w') = \sigma(v(w) - v(w')) - 1/2$. Link functions behave like cumulative distribution functions (monotonic increasing, $\sigma(-\infty) = 0$, and $\sigma(\infty) = 1$). We consider only link functions which are rotation-symmetric ($\sigma(x) = 1 - \sigma(-x)$) and have a single inflection point at $\sigma(0) = 1/2$. This implies that $\sigma(x)$ is convex for $x \le 0$ and concave for $x \ge 0$. One common link function is the logistic function $\sigma_L(x) = 1/(1 + \exp(-x))$.

We finally make two smoothness assumptions. First, $\sigma$ is $L_\sigma$-Lipschitz, and $v$ is $L_v$-Lipschitz. That is, $|\sigma(a) - \sigma(b)| \le L_\sigma \|a - b\|$. Thus $\epsilon(\cdot, \cdot)$ is $L$-Lipschitz in both arguments, where $L = L_\sigma L_v$. We further assume that $L_\sigma$ and $L_v$ are the least possible. Second, $\sigma$ is second order $L_2$-Lipschitz, that is, $|\sigma'(a) - \sigma'(b)| \le L_2 \|a - b\|$. These relatively mild assumptions provide sufficient structure for showing sublinear regret.

4. Algorithm & Analysis

Our algorithm, Dueling Bandit Gradient Descent (DBGD), is described in Algorithm 1. DBGD maintains a candidate $w_t$ and compares it with a neighboring point $w'_t$ along a random direction $u_t$. If $w'_t$ wins the comparison, then an update is taken along $u_t$, and then projected back into $W$ (denoted by $P_W$). DBGD requires two parameters which can be interpreted as the exploration ($\delta$) and exploitation ($\gamma$) step sizes. The latter is required for all gradient descent algorithms. Since DBGD probes for descent directions randomly, this introduces a gradient estimation error that depends on $\delta$ (discussed in Section 4.2). We will show in Theorem 2 that, for suitable $\delta$ and $\gamma$, DBGD achieves sublinear regret in $T$,

$E[\Delta_T] \le 2\lambda_T T^{3/4} \sqrt{26 R d L}$.

Lemma 2. Fix $\delta > 0$; over random unit vectors $u$,

$E_u[\epsilon_t(P_W(w + \delta u))\, u] = \frac{\delta}{d} \nabla \hat\epsilon_t(w)$,

where $d$ is the dimensionality of $x$. (Proof analogous to Lemma 2.1 of Flaxman et al., 2005.)

Combining Lemma 1 and Lemma 2 implies that DBGD is implicitly performing gradient descent over

$\hat\epsilon_t(w) = E_{x \in B}[\epsilon_t(P_W(w + \delta x))]$.   (7)

Note that $|\hat\epsilon_t(w) - \epsilon_t(w)| \le \delta L$, and that $\hat\epsilon_t$ is parameterized by $\delta$ (suppressed for brevity). Hence, good regret bounds defined on $\hat\epsilon_t$ imply good bounds defined on $\epsilon_t$, with $\delta$ controlling the difference.

One concern is that $\hat\epsilon_t$ might not be convex at $w_t$. Observation 1 showed that $\epsilon_t$ is convex at $w_t$, and thus satisfies $\epsilon_t(w_t) - \epsilon_t(w^*) \le \nabla\epsilon_t(w_t) \cdot (w_t - w^*)$. We now show that $\hat\epsilon_t(w_t)$ is "almost convex" in a specific way.

Theorem 1. For $\mu$ defined as

$\mu = \frac{L_\sigma}{L_\sigma - \delta L_v L_2}$,   (8)

and $\delta \in \left(0, \frac{L_\sigma}{L_v L_2}\right)$, then

$\hat\epsilon_t(w_t) - \hat\epsilon_t(w^*) \le \mu \nabla\hat\epsilon_t(w_t) \cdot (w_t - w^*) + (3 + \mu)\delta L$.

Proof. First define $w_{t,x} \triangleq P_W(w_t + \delta x)$, and also $\epsilon_{t,x}(w) \triangleq \epsilon(w_{t,x}, w)$. We rewrite $\hat\epsilon_t(w_t) - \hat\epsilon_t(w^*)$ as

$= E_{x \in B}[\epsilon_t(P_W(w_t + \delta x)) - \epsilon_t(P_W(w^* + \delta x))]$
$\le E_{x \in B}[\epsilon_{t,x}(w_{t,x}) - \epsilon_{t,x}(w^*)] + 3\delta L$   (9)
$\le E_{x \in B}[\nabla\epsilon_{t,x}(w_{t,x}) \cdot (w_{t,x} - w^*)] + 3\delta L$,   (10)

where (9) follows from $\epsilon$ being $L$-Lipschitz, and (10) follows from $w_{t,x}$ and $w^*$ both being in the convex region of $\epsilon_{t,x}$.

Now define $\ell_t(y) \triangleq \sigma(v(w_t) - y)$, and $\ell_{t,x}(y) \triangleq \sigma(v(w_{t,x}) - y)$. We can see that

$\nabla\epsilon_t(w_{t,x}) = \ell'_t(v(w_{t,x}))\, \nabla v(w_{t,x})$,

and similarly

$\nabla\epsilon_{t,x}(w_{t,x}) = \ell'_{t,x}(v(w_{t,x}))\, \nabla v(w_{t,x})$.

We can then write (10) as

$= E_x[\ell'_{t,x}(v(w_{t,x}))\, \nabla v(w_{t,x}) \cdot (w_{t,x} - w^*)] + 3\delta L$.   (11)

We know that both $\ell'_{t,x}(y) \le 0$ and $\ell'_t(y) \le 0$, and $\ell'_{t,x}(v(w_{t,x})) = -L_\sigma$, since that is the inflection point. Thus

$-L_\sigma \le \ell'_t(v(w_{t,x})) \le -L_\sigma + L_v L_2 \delta$,

which follows from $\sigma$ being second order $L_2$-Lipschitz. Since $\epsilon_{t,x}(w_{t,x}) - \epsilon_{t,x}(w^*) \ge 0$, the term inside the expectation in (11) is also non-negative. Using our definition of $\mu$ (8), we can write (11) as

$\le \mu\, E_x[\ell'_t(v(w_{t,x}))\, \nabla v(w_{t,x}) \cdot (w_{t,x} - w^*)] + 3\delta L$
$= \mu\, E_x[\nabla\epsilon_t(w_{t,x}) \cdot (w_{t,x} - w^*)] + 3\delta L$
$= \mu\, E_x[\nabla\epsilon_t(w_{t,x}) \cdot (w_{t,x} - w_t + w_t - w^*)] + 3\delta L$
$\le \mu\, E_x[\nabla\epsilon_t(w_{t,x}) \cdot (w_t - w^*)] + (3 + \mu)\delta L$   (12)
$= \mu\, \nabla\hat\epsilon_t(w_t) \cdot (w_t - w^*) + (3 + \mu)\delta L$,

where (12) follows from observing that

$E_x[\nabla\epsilon_t(w_{t,x}) \cdot (w_{t,x} - w_t)] \le \delta\, E_x[\|\nabla\epsilon_t(w_{t,x})\|] \le \delta L$.

4.3. Regret Bound for DBGD

Thus far, we have focused on proving properties regarding the relative loss functions $\epsilon_t$ and $\hat\epsilon_t$. We can easily bound our regret formulation (2) using $\epsilon_t$.

Lemma 3. Fix $\delta > 0$. Expected regret is bounded by

$E[\Delta_T] \le -2\, E\!\left[\sum_{t=1}^{T} \epsilon_t(w^*)\right] + \delta L T$.

Proof. We can write expected regret as

$E[\Delta_T] \le 2\, E\!\left[\sum_{t=1}^{T} \epsilon(w^*, w_t)\right] + \delta L T = -2\, E\!\left[\sum_{t=1}^{T} \epsilon_t(w^*)\right] + \delta L T$,

by noting that $|\epsilon(w^*, w'_t) - \epsilon(w^*, w_t)| \le \delta L$, and also that $\epsilon_t(w^*) = -\epsilon(w^*, w_t)$.
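Algorithm 1 is short enough to sketch directly in code. The version below is our illustration, not the authors' implementation; it assumes $W$ is the Euclidean ball of radius $R$ centered at the origin (so $P_W$ is a simple rescaling) and a black-box comparison oracle reporting whether $w'_t$ won the duel:

```python
import numpy as np

def project_ball(w, R):
    # P_W: Euclidean projection onto the ball of radius R at the origin.
    n = np.linalg.norm(w)
    return w if n <= R else (R / n) * w

def dbgd(duel, d, T, gamma, delta, R=1.0, seed=0):
    # Dueling Bandit Gradient Descent (Algorithm 1).
    # duel(w, w_prime) -> True iff the exploratory point w_prime wins.
    # gamma: exploitation step size; delta: exploration step size.
    rng = np.random.default_rng(seed)
    w = np.zeros(d)                        # w_1 = 0, as in the analysis
    for _ in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)             # u_t: uniform random unit vector
        w_prime = project_ball(w + delta * u, R)   # w'_t, projected into W
        if duel(w, w_prime):               # if w'_t wins the comparison ...
            w = project_ball(w + gamma * u, R)     # ... take a step along u_t
    return w
```

With a deterministic oracle ($w'_t$ wins iff it has higher value) and a concave quadratic value function, the iterate drifts toward the maximizer, mirroring the simulations of Section 5.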
We now analyze the regret behavior of the smoothed loss functions $\hat\epsilon_t$. Lemma 4 provides a useful intermediate result. Note that the regret formulation analyzed in Lemma 4 is different from (2).

Lemma 4. Fix $\delta \in \left(0, \frac{L_\sigma}{L_v L_2}\right)$, and define $\mu$ as in (8). Assume a sequence of smoothed relative loss functions $\hat\epsilon_1, \ldots, \hat\epsilon_T$ ($\hat\epsilon_{t+1}$ depending on $w_t$) and $w_1, \ldots, w_T \in W$ defined by $w_1 = 0$ and $w_{t+1} = P_W(w_t - \eta g_t)$, where $\eta > 0$ and $g_1, \ldots, g_T$ are vector-valued random variables with (a) $E[g_t \mid w_t] = \nabla\hat\epsilon_t$, (b) $\|g_t\| \le G$, and (c) $W \subseteq R B$. Then for $\eta = \frac{R}{G\sqrt{T}}$,

$E\!\left[\sum_{t=1}^{T} \hat\epsilon_t(w_t) - \hat\epsilon_t(w^*)\right] \le \mu R G \sqrt{T} + (3 + \mu)\delta L T$.   (13)

(Adapted from Lemma 3.1 in Flaxman et al., 2005.)

Proof. Theorem 1 implies the LHS of (13) to be

Table 1. Average regret of DBGD with synthetic functions.

        $\lambda_L$ factor
        0.6     0.8     1       2       3
 P1     0.465   0.398   0.334   0.303   0.415
 P2     0.803   0.767   0.760   0.780   0.807
 P3     0.687   0.628   0.604   0.637   0.663
 P4     0.500   0.378   0.325   0.304   0.418
 P5     0.710   0.663   0.674   0.798   0.887

4.4. Practical Considerations

Choosing $\delta$ to achieve the regret bound stated in Theorem 2 requires knowledge of $\epsilon_t$ (i.e., $L$), which is typically not known in practical settings. The regret bound is indeed robust to the choice of $\delta$. So sublinear regret is achievable using many choices for $\delta$, as we will verify empirically.

In the analysis, $w_1 = 0$ was chosen to minimize its distance to any other point in $W$. In certain settings, we might choose $w_1 \ne 0$, in which case our analysis still follows with slightly worse constants.

5. Experiments

5.1. Synthetic Value Functions

We first experimented using synthetic value functions, which allows us to test the robustness of DBGD to different choices of $\delta$. Since $L$ is unknown, we introduced a free parameter $\lambda_L$ and used

$\delta = T^{-1/4} \lambda_L \sqrt{0.4 R d}$.

We tested on five settings P1 to P5. Each setting optimizes over a 50-dimensional ball of radius 10, and uses the logistic transfer function with different value functions that explore a range of curvatures (which affects the Lipschitz constant) and symmetries:

$v_1(w) = -w^T w$
$v_2(w) = -\|w\|$
$v_3(w) = -\sum_{i\,\mathrm{odd}} w(i)^2 - \sum_{i\,\mathrm{even}} |w(i)|$
$v_4(w) = -\sum_i \left[\exp(w(i)) + \exp(-w(i))\right]$
$v_5(w) = v_3(w) - \sum_{i:\, i\%3=1} e^{[w(i)]_+} - \sum_{i:\, i\%3=2} e^{[-w(i)]_+}$

The initial point is $w_1 = \vec{1}\sqrt{5/d}$. Table 1 shows the regret over the interesting range of $\lambda_L$ values. Performance degrades gracefully beyond this range. Note that the regret of a random point is about 1, since most points in $W$ have much lower value than $v(w^*)$.

We also compared against Bandit Gradient Descent (BGD) (Flaxman et al., 2005). Like DBGD, BGD explores in random directions at each iteration. However, BGD assumes access to $P(w_t \succ w^*)$, whereas DBGD only observes random outcomes. Thus BGD assumes strictly more information.⁴ We evaluated two versions: BGD1 using $P(w_t \succ w^*)$, and BGD2 using $\epsilon_t(w^*) = P(w_t \succ w^*) - 1/2$. We expect BGD2 to perform best since the sign of $\epsilon_t(w^*)$ reveals significant information regarding the true gradient.

Figure 2. Average regret for $\lambda_L = 1$.

Figure 2 shows the average regret for problems P1 and P5 with $\lambda_L = 1$. We observe the behaviors of DBGD and BGD being very similar for both. Interestingly, DBGD outperforms BGD1 on P5 despite having less information. We also observe this trend for P2 and P3, noting that all three problems have significant linear components.

5.2. Web Search Dataset

For a more realistic simulation environment, we leveraged a real Web Search dataset (courtesy of Chris Burges at Microsoft Research). The idea is to simulate users issuing queries by sampling from queries in the dataset. For each query, the competing retrieval functions will produce rankings, after which the "user" will randomly prefer one ranking over the other; we used a value function based on NDCG@10 (defined below) to determine the comparison outcome probabilities. We stress that our usage of the dataset is very different from supervised learning settings. In particular, (extensions of) our algorithm might be applied to experiments involving real users where very little is known about each user's internal value function. We leverage this dataset as a reasonable first step for simulating user behavior in an on-line learning setting.

The training, validation and test sets each consist of 1000 queries. We only simulated on the training set, although we measured performance on the other sets to check for, e.g., generalization power. There are about 50 documents per query, and documents are labeled by 5 levels of relevance, from 0 (Bad) to 4 (Perfect). The compatibility between a document/query pair is
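For reference, the five synthetic value functions transcribe directly into code. A sketch (NumPy, with the paper's 1-based coordinate index $i$; the readings of $v_2$ as the Euclidean norm and of the $e^{[\cdot]_+}$ terms in $v_5$, with $[x]_+ = \max(x, 0)$, are our best reconstruction of the garbled notation):

```python
import numpy as np

def v1(w):
    return -np.dot(w, w)                       # -w^T w

def v2(w):
    return -np.linalg.norm(w)                  # -||w|| (our reading of -|w|)

def v3(w):
    i = np.arange(1, len(w) + 1)               # 1-based coordinate indices
    return -np.sum(w[i % 2 == 1] ** 2) - np.sum(np.abs(w[i % 2 == 0]))

def v4(w):
    return -np.sum(np.exp(w) + np.exp(-w))

def v5(w):
    i = np.arange(1, len(w) + 1)
    pos = np.maximum(w, 0.0)                   # [w(i)]_+
    neg = np.maximum(-w, 0.0)                  # [-w(i)]_+
    return v3(w) - np.sum(np.exp(pos[i % 3 == 1])) - np.sum(np.exp(neg[i % 3 == 2]))
```

Note that $v_2$, $v_3$, and $v_5$ are not differentiable everywhere, which is consistent with the paper's point that these settings probe a range of curvatures beyond the strict assumptions of Section 3.1.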
---
⁴ Our analysis yields matching upper bounds on expected regret for all three methods, though it can be shown that the BGD gradient estimates have lower variance.

Table 2. Average (upper) and final (lower) NDCG@10 on Web Search training set (sampling 100 queries/iteration).

 $\delta \backslash \gamma$   0.001   0.005   0.01    0.05    0.1
 0.5      0.524   0.570   0.580   0.569   0.557
 0.8      0.533   0.575   0.582   0.576   0.566
 1        0.537   0.575   0.584   0.577   0.568
 3        0.529   0.565   0.573   0.575   0.571
 0.5      0.559   0.591   0.592   0.569   0.565
 0.8      0.564   0.593   0.593   0.574   0.559
 1        0.568   0.592   0.595   0.582   0.570
 3        0.557   0.581   0.582   0.577   0.576

represented using 367 features. A standard retrieval function computes a score for each document based on these features, with the final ranking resulting from sorting by the scores. For simplicity, we considered only linear functions $w$, so that the score for document $x$ is $w^T x$. Since only the direction of $w$ matters, we are thus optimizing over a 367-dimensional unit sphere.

Our value function is based on Normalized Discounted Cumulative Gain (NDCG), which is a common measure for evaluating rankings (Donmez et al., 2009). For query $q$, NDCG@K of a ranking for documents of $q$ is

$\frac{1}{N^{(q)}_K} \sum_{k=1}^{K} \frac{2^{r_k} - 1}{\log(k+1)}$,

where $r_k$ is the relevance level of the $k$th ranked document, and $N^{(q)}_K$ is a normalization factor⁵ such that the best ranking achieves NDCG@K = 1. For our experiments, we used the logistic function and $10 \times$ NDCG@10 to make probabilistic comparisons.

We note a few properties of this setup, some going beyond the assumptions in Section 3.1. This allows us to further examine the generality of DBGD. First, the value function is now random (dependent on the query). Second, our feasible space $W$ is the unit sphere and not convex, although it is a well-behaved manifold. Third, we assume a homogeneous user group (i.e., all users have the same value function, NDCG@10). Fourth, rankings vary discontinuously w.r.t. document scores, and NDCG@10 is thus a discontinuous value function. We addressed this issue by comparing multiple queries (i.e., delaying multiple iterations) before an update decision, and also by using larger choices of $\delta$ and $\gamma$. Lastly, even smoothed versions of NDCG have local optima (Donmez et al., 2009), making it difficult to
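The NDCG@K formula above translates directly into code. A sketch (the log base cancels between the DCG and the normalizer $N^{(q)}_K$, so natural log is used; the helper names are ours):

```python
import math

def dcg_at_k(rels, k):
    # sum_{j=1}^{K} (2^{r_j} - 1) / log(j + 1), over the top-k relevance labels
    return sum((2 ** r - 1) / math.log(j + 1) for j, r in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    # NDCG@K = DCG@K / N(q)_K, where N(q)_K is the DCG of the best possible
    # ranking (relevance labels sorted in decreasing order).
    ideal = sorted(rels, reverse=True)
    n = dcg_at_k(ideal, k)
    return dcg_at_k(rels, k) / n if n > 0 else 0.0
```

For example, a ranking already sorted by decreasing relevance scores exactly 1, and swapping documents near the top of the list costs more than swapping near the bottom, which is the discount behavior the $\log(k+1)$ term encodes.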
find $w^*$ (which is required for computing regret). We thus used NDCG@10 to measure performance.

We tested DBGD for $T = 10^7$ and a range of $\delta$ and $\gamma$ values. Table 2 shows the average (across all iterations) and final training NDCG@10 when comparing 100 queries per update. Performance peaks at $(\delta, \gamma) = (1, 0.01)$ and degrades smoothly. We found similar results when varying the number of queries compared per update.

---
⁵ Note that $N^{(q)}_K$ will be different for different queries.

Figure 3. NDCG@10 on Web Search training set.

Figure 3 depicts per-iteration NDCG@10 for the best models when sampling 1, 10 and 100 queries. Making multiple comparisons per update has no impact on performance (the best parameters are typically smaller when sampling fewer queries). Sampling multiple queries is very realistic, since a search system might be constrained to, e.g., making daily updates to their ranking function. Performance on the validation and test sets closely follows training set performance (so we omit their results). This implies that our method is not overfitting.

For completeness, we compared our best DBGD models with a ranking SVM, which optimizes over pairwise document preferences and is a standard baseline in supervised learning-to-rank settings. More sophisticated methods (e.g., Chakrabarti et al., 2008; Donmez et al., 2009) can further improve performance. Table 3 shows that DBGD approaches ranking SVM performance despite making fundamentally different assumptions (e.g., ranking SVMs have access to very specific document-level information). We caution against over-optimizing here, and advocate instead for developing more realistic experimental settings.

Table 3. Comparing Ranking SVM vs. final DBGD model using average NDCG@10 and per-query win/tie/loss counts.

 Model     SVM     Sample1        Sample5        Sample10       Sample25       Sample50       Sample100
 NDCG@10   0.612   0.596          0.593          0.589          0.593          0.596          0.595
 W/T/L     --      490/121/389    489/121/390    504/118/378    489/118/393    472/119/409    490/116/394

6. Conclusion

We have presented an on-line learning framework based on pairwise comparisons, which naturally fits with recent work on deriving reliable pairwise judgments. Our proposed algorithm, DBGD, achieves sublinear regret. As evidenced by our simulations based on web data, DBGD can be applied much more generally than suggested by our theoretical analysis. Hence, it begs for more sophisticated formulations which account for properties such as heterogeneous user behavior, query-dependent value functions, and the discontinuity of rankings. Another interesting direction is adaptively choosing $\delta$ and $\gamma$ for any-time regret analyses. Our framework is extendable in many ways, such as integrating pairwise document preferences (Joachims et al., 2007; Carterette et al., 2008), and diversity (Yue & Joachims, 2008; Radlinski et al., 2008a). Progress in this area can lead to cost-effective systems for a variety of application domains such as personalized search, enterprise search, and also small interest groups.

Acknowledgements

The work was funded under NSF Award IIS-0713483, NSF CAREER Award 0237381, and a gift from Yahoo! Research. The first author is also partly funded by a Microsoft Research Graduate Fellowship and a Yahoo! Key Technical Challenges Grant. The authors also thank Robert Kleinberg, Josef Broder and the anonymous reviewers for their helpful comments.

References

Agichtein, E., Brill, E., & Dumais, S. (2006). Improving Web Search Ranking by Incorporating User Behavior Information. ACM Conference on Information Retrieval (SIGIR) (pp. 19-26).

Carterette, B., Bennett, P., Chickering, D. M., & Dumais, S. (2008). Here or There: Preference Judgments for Relevance. European Conference on Information Retrieval (ECIR) (pp. 16-27).

Carterette, B., & Jones, R. (2007). Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks. Neural Information Processing Systems (NIPS) (pp. 217-224).

Chakrabarti, S., Khanna, R., Sawant, U., & Battacharyya, C. (2008). Structured Learning for Non-Smooth Ranking Losses. ACM Conference on Knowledge Discovery and Data Mining (KDD) (pp. 88-96).

Donmez, P., Svore, K., & Burges, C. (2009). On the Local Optimality of LambdaRank. ACM Conference on Information Retrieval (SIGIR).

Dupret, G., & Piwowarski, B. (2008). A User Browsing Model to Predict Search Engine Click Data from Past Observations. ACM Conference on Information Retrieval (SIGIR) (pp. 331-338).

Flaxman, A., Kalai, A., & McMahan, H. B. (2005). Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient. ACM-SIAM Symposium on Discrete Algorithms (SODA) (pp. 385-394).

Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., & Gay, G. (2007). Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search. ACM Transactions on Information Systems (TOIS), 25, 7:1-26.

Kleinberg, R. (2004). Nearly Tight Bounds for the Continuum-Armed Bandit Problem. Neural Information Processing Systems (NIPS) (pp. 697-704).

Langford, J., & Zhang, T. (2007). The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits. Neural Information Processing Systems (NIPS) (pp. 817-824).

Pandey, S., Agarwal, D., Chakrabarti, D., & Josifovski, V. (2007). Bandits for Taxonomies: A Model-based Approach. SIAM Conference on Data Mining (SDM) (pp. 216-227).

Radlinski, F., Kleinberg, R., & Joachims, T. (2008a). Learning Diverse Rankings with Multi-Armed Bandits. International Conference on Machine Learning (ICML) (pp. 784-791).

Radlinski, F., Kurup, M., & Joachims, T. (2008b). How Does Clickthrough Data Reflect Retrieval Quality? ACM Conference on Information and Knowledge Management (CIKM) (pp. 43-52).

Yue, Y., Broder, J., Kleinberg, R., & Joachims, T. (2009). The K-armed Dueling Bandits Problem. Conference on Learning Theory (COLT).

Yue, Y., & Joachims, T. (2008). Predicting Diverse Subsets Using Structural SVMs. International Conference on Machine Learning (ICML) (pp. 1224-1231).

Zinkevich, M. (2003). Online Convex Programming and Generalized Infinitesimal Gradient Ascent. International Conference on Machine Learning (ICML) (pp. 928-936).
