Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem

formulations might not reflect real user satisfaction. For example, clicks are affected by presentation bias: users tend to click on higher-ranked results regardless of relevance (Joachims et al., 2007). Any objective based on absolute measures must use careful calibration. In contrast, the interleaving method proposed by Radlinski et al. (2008b) offers a reliable mechanism for deriving relative preferences between retrieval functions.

3. The Dueling Bandits Problem

We define a new on-line optimization problem, called the Dueling Bandits Problem, where the only actions are comparisons (or duels) between two points within a space $W$ (e.g., a parameterized space of retrieval functions in a search engine). We consider the case where $W$ contains the origin, is compact, convex, and contained in a $d$-dimensional ball of radius $R$.¹ Any single comparison between two points $w$ and $w'$ (e.g., individual retrieval functions) is determined independently of all other comparisons with probability

$$P(w \succ w') = \frac{1}{2} + \epsilon(w, w'), \qquad (1)$$

where $\epsilon(w, w') \in [-1/2, 1/2]$. In the search example, $P(w \succ w')$ refers to the fraction of users who prefer the results produced by $w$ over those of $w'$. One can regard $\epsilon(w, w')$ as the distinguishability between $w$ and $w'$. Algorithms learn only by observing comparison results (e.g., from interleaving (Radlinski et al., 2008b)).

We quantify the performance of an on-line algorithm using the following regret formulation:

$$\Delta_T = \sum_{t=1}^{T} \epsilon(w^*, w_t) + \epsilon(w^*, w'_t), \qquad (2)$$

where $w_t$ and $w'_t$ are the two points selected at time $t$, and $w^*$ is the best point, known only in hindsight. Note that the algorithm is allowed to select two identical points, so selecting $w_t = w'_t = w^*$ accumulates no additional regret. In the search example, regret corresponds to the fraction of users who would prefer the best retrieval function $w^*$ over the selected ones $w_t$ and $w'_t$. A good algorithm should achieve regret sublinear in $T$, which implies decreasing average regret.

3.1. Modeling Assumptions

We further assume the existence of a differentiable, strictly concave value (or utility) function $v : W \to \mathbb{R}$. This function reflects the intrinsic quality of each point in $W$, and is never directly observed.

¹An alternative setting is the K-armed bandit case, where $|W| = K$ (Yue et al., 2009).
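To make (1) and (2) concrete, a small simulation can draw duel outcomes and track regret as follows. The utility $v$ and the particular distinguishability function below are toy assumptions for illustration only; they are not part of the problem definition:

```python
import numpy as np

rng = np.random.default_rng(1)

def v(w):
    """Assumed strictly concave utility with maximizer w* = 0 (toy choice)."""
    return -np.dot(w, w)

def epsilon(w, w2):
    """Toy distinguishability in [-1/2, 1/2], increasing in the utility gap."""
    return np.tanh(v(w) - v(w2)) / 2.0

def duel(w, w2):
    """One comparison: returns True iff w beats w2; P(w > w2) = 1/2 + epsilon(w, w2)."""
    return rng.random() < 0.5 + epsilon(w, w2)

w_star = np.zeros(2)  # best point, known only in hindsight
regret = 0.0
for t in range(100):
    w_t = rng.normal(size=2)   # candidate pair chosen by some algorithm
    w2_t = rng.normal(size=2)
    outcome = duel(w_t, w2_t)  # the only feedback a learner ever observes
    regret += epsilon(w_star, w_t) + epsilon(w_star, w2_t)  # formulation (2)
```

Each regret increment is non-negative because $v(w^*)$ is maximal, so $\Delta_T$ grows linearly unless the algorithm learns to select points close to $w^*$.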
Algorithm 1 Dueling Bandit Gradient Descent (DBGD)

Input: $\gamma$, $\delta$, $w_1$
for query $q_t$ ($t = 1 \ldots T$) do
    Sample unit vector $u_t$ uniformly.
    $w'_t \leftarrow P_W(w_t + \delta u_t)$   // projected back into $W$
    Compare $w_t$ and $w'_t$
    if $w'_t$ wins then
        $w_{t+1} \leftarrow P_W(w_t + \gamma u_t)$   // also projected
    else
        $w_{t+1} \leftarrow w_t$
    end if
end for

Since $v$ is strictly concave, there exists a unique maximum $v(w^*)$. Probabilistic comparisons are made using a link function $\sigma : \mathbb{R} \to [0, 1]$, and are defined as

$$P(w \succ w') = \sigma(v(w) - v(w')).$$

Thus $\epsilon(w, w') = \sigma(v(w) - v(w')) - 1/2$. Link functions behave like cumulative distribution functions (monotonically increasing, with $\lim_{x \to -\infty} \sigma(x) = 0$ and $\lim_{x \to \infty} \sigma(x) = 1$). We consider only link functions that are rotation-symmetric ($\sigma(x) = 1 - \sigma(-x)$) and have a single inflection point at $\sigma(0) = 1/2$. This implies that $\sigma(x)$ is convex for $x \leq 0$ and concave for $x \geq 0$. One common link function is the logistic function $\sigma_L(x) = 1/(1 + \exp(-x))$.

We finally make two smoothness assumptions. First, $\sigma$ is $L_\sigma$-Lipschitz and $v$ is $L_v$-Lipschitz; that is, $|\sigma(a) - \sigma(b)| \leq L_\sigma \|a - b\|$. Thus $\epsilon(\cdot, \cdot)$ is $L$-Lipschitz in both arguments, where $L = L_\sigma L_v$. We further assume that $L_\sigma$ and $L_v$ are the smallest such constants. Second, $\sigma$ is second-order $L_2$-Lipschitz, that is, $|\sigma'(a) - \sigma'(b)| \leq L_2 \|a - b\|$. These relatively mild assumptions provide sufficient structure for showing sublinear regret.

4. Algorithm & Analysis

Our algorithm, Dueling Bandit Gradient Descent (DBGD), is described in Algorithm 1. DBGD maintains a candidate $w_t$ and compares it with a neighboring point $w'_t$ along a random direction $u_t$. If $w'_t$ wins the comparison, then an update is taken along $u_t$ and then projected back into $W$ (denoted by $P_W$). DBGD requires two parameters, which can be interpreted as the exploration ($\delta$) and exploitation ($\gamma$) step sizes. The latter is required by all gradient descent algorithms. Since DBGD probes for descent directions randomly, this introduces a gradient estimation error that depends on $\delta$ (discussed in Section 4.2). We will show in Theorem 2 that, for suitable $\delta$ and $\gamma$, DBGD achieves sublinear regret in $T$:

$$E[\Delta_T] \leq 2\lambda_T T^{3/4} \sqrt{26RdL}.$$
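Algorithm 1 can be sketched in a few lines. The comparison oracle, value function, and parameter choices below are illustrative assumptions (the paper leaves these to the deployment setting), and $W$ is taken to be a Euclidean ball so that $P_W$ reduces to a simple rescaling:

```python
import numpy as np

def project_ball(w, radius):
    """P_W: project w back into the feasible ball W of the given radius."""
    n = np.linalg.norm(w)
    return w if n <= radius else w * (radius / n)

def dbgd(compare, d, T, gamma, delta, radius, rng):
    """Dueling Bandit Gradient Descent sketch.

    compare(w, w_probe) returns True iff the probe point wins the duel.
    """
    w = np.zeros(d)  # w_1 = 0 (the origin lies in W)
    for _ in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                       # uniform random unit vector u_t
        w_probe = project_ball(w + delta * u, radius)
        if compare(w, w_probe):                      # duel w_t against w'_t
            w = project_ball(w + gamma * u, radius)  # step toward the winner
    return w

# Illustrative comparison oracle (assumed): logistic link over a concave utility.
rng = np.random.default_rng(0)
w_star = np.array([1.0, -2.0, 0.5])
v = lambda w: -np.sum((w - w_star) ** 2)
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))

def compare(w, w_probe):
    # P(w'_t beats w_t) = sigma(v(w'_t) - v(w_t))
    return rng.random() < sigma(v(w_probe) - v(w))

w_final = dbgd(compare, d=3, T=20000, gamma=0.05, delta=0.5, radius=10.0, rng=rng)
```

With these (assumed) step sizes, the iterate drifts toward the utility maximizer while only ever observing binary duel outcomes.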
Lemma 2. Fix $\delta \geq 0$; over random unit vectors $u$,

$$E_u[c_t(P_W(w + \delta u))\, u] = \frac{\delta}{d} \nabla \hat{c}_t(w),$$

where $d$ is the dimensionality of $x$. (Proof analogous to Lemma 2.1 of Flaxman et al., 2005.)

Combining Lemma 1 and Lemma 2 implies that DBGD is implicitly performing gradient descent over

$$\hat{\epsilon}_t(w) = E_{x \in B}[\epsilon_t(P_W(w + \delta x))]. \qquad (7)$$

Note that $|\hat{\epsilon}_t(w) - \epsilon_t(w)| \leq \delta L$, and that $\hat{\epsilon}_t$ is parameterized by $\delta$ (suppressed for brevity). Hence, good regret bounds defined on $\hat{\epsilon}_t$ imply good bounds defined on $\epsilon_t$, with $\delta$ controlling the difference.

One concern is that $\hat{\epsilon}_t$ might not be convex at $w_t$. Observation 1 showed that $\epsilon_t$ is convex at $w_t$, and thus satisfies $\epsilon_t(w_t) - \epsilon_t(w^*) \leq \nabla \epsilon_t(w_t) \cdot (w_t - w^*)$. We now show that $\hat{\epsilon}_t(w_t)$ is "almost convex" in a specific way.

Theorem 1. For $\lambda$ defined as

$$\lambda = \frac{L_\sigma}{L_\sigma - \delta L_v L_2}, \qquad (8)$$

and $\delta \in \left(0, \frac{L_\sigma}{L_v L_2}\right)$, then

$$\hat{\epsilon}_t(w_t) - \hat{\epsilon}_t(w^*) \leq \lambda \nabla \hat{\epsilon}_t(w_t) \cdot (w_t - w^*) + (3 + \lambda)\delta L.$$

Proof. First define $w_{t,x} \equiv P_W(w_t + \delta x)$, and also $\epsilon_{t,x}(w) \equiv \epsilon(w_{t,x}, w)$. We rewrite $\hat{\epsilon}_t(w_t) - \hat{\epsilon}_t(w^*)$ as

$$= E_{x \in B}[\epsilon_t(P_W(w_t + \delta x)) - \epsilon_t(P_W(w^* + \delta x))]$$
$$\leq E_{x \in B}[\epsilon_{t,x}(w_{t,x}) - \epsilon_{t,x}(w^*)] + 3\delta L \qquad (9)$$
$$\leq E_{x \in B}[\nabla \epsilon_{t,x}(w_{t,x}) \cdot (w_{t,x} - w^*)] + 3\delta L, \qquad (10)$$

where (9) follows from $\epsilon$ being $L$-Lipschitz, and (10) follows from $w_{t,x}$ and $w^*$ both being in the convex region of $\epsilon_{t,x}$.

Now define $\sigma_t(y) \equiv \sigma(v(w_t) - y)$, and $\sigma_{t,x}(y) \equiv \sigma(v(w_{t,x}) - y)$. We can see that

$$\nabla \epsilon_t(w_{t,x}) = \sigma_t'(v(w_{t,x})) \nabla v(w_{t,x}),$$

and similarly

$$\nabla \epsilon_{t,x}(w_{t,x}) = \sigma_{t,x}'(v(w_{t,x})) \nabla v(w_{t,x}).$$

We can then write (10) as

$$= E_x\left[\sigma_{t,x}'(v(w_{t,x})) \nabla v(w_{t,x}) \cdot (w_{t,x} - w^*)\right] + 3\delta L. \qquad (11)$$

We know that both $\sigma_{t,x}'(y) \leq 0$ and $\sigma_t'(y) \leq 0$, and $\sigma_{t,x}'(v(w_{t,x})) = -L_\sigma$, since that is the inflection point. Thus

$$-L_\sigma \leq \sigma_t'(v(w_{t,x})) \leq -L_\sigma + \delta L_v L_2,$$

which follows from $\sigma$ being second-order $L_2$-Lipschitz. Since $\epsilon_{t,x}(w_{t,x}) - \epsilon_{t,x}(w^*) \geq 0$, the term inside the expectation in (11) is also non-negative. Using our definition of $\lambda$ (8), we can write (11) as

$$\leq \lambda E_x\left[\sigma_t'(v(w_{t,x})) \nabla v(w_{t,x}) \cdot (w_{t,x} - w^*)\right] + 3\delta L$$
$$= \lambda E_x\left[\nabla \epsilon_t(w_{t,x}) \cdot (w_{t,x} - w^*)\right] + 3\delta L$$
$$= \lambda E_x\left[\nabla \epsilon_t(w_{t,x}) \cdot (w_{t,x} - w_t + w_t - w^*)\right] + 3\delta L$$
$$\leq \lambda E_x\left[\nabla \epsilon_t(w_{t,x}) \cdot (w_t - w^*)\right] + (3 + \lambda)\delta L \qquad (12)$$
$$= \lambda \nabla \hat{\epsilon}_t(w_t) \cdot (w_t - w^*) + (3 + \lambda)\delta L,$$

where (12) follows from observing that

$$E_x[\nabla \epsilon_t(w_{t,x}) \cdot (w_{t,x} - w_t)] \leq E_x[\|\nabla \epsilon_t(w_{t,x})\|]\, \delta \leq \delta L.$$

4.3. Regret Bound for DBGD

Thus far, we have focused on proving properties regarding the relative loss functions $\epsilon_t$ and $\hat{\epsilon}_t$. We can easily bound our regret formulation (2) using $\epsilon_t$.

Lemma 3. Fix $\delta > 0$. Expected regret is bounded by

$$E[\Delta_T] \leq -2 E\left[\sum_{t=1}^{T} \epsilon_t(w^*)\right] + \delta L T.$$

Proof. We can write expected regret as

$$E[\Delta_T] \leq 2 E\left[\sum_{t=1}^{T} \epsilon(w^*, w_t)\right] + \delta L T = -2 E\left[\sum_{t=1}^{T} \epsilon_t(w^*)\right] + \delta L T,$$

by noting that $|\epsilon(w^*, w'_t) - \epsilon(w^*, w_t)| \leq \delta L$, and also that $\epsilon_t(w^*) = -\epsilon(w^*, w_t)$.
We now analyze the regret behavior of the smoothed loss functions $\hat{\epsilon}_t$. Lemma 4 provides a useful intermediate result. Note that the regret formulation analyzed in Lemma 4 is different from (2).

Lemma 4. Fix $\delta \in \left(0, \frac{L_\sigma}{L_v L_2}\right)$, and define $\lambda$ as in (8). Assume a sequence of smoothed relative loss functions $\hat{\epsilon}_1, \ldots, \hat{\epsilon}_T$ ($\hat{\epsilon}_{t+1}$ depending on $w_t$) and $w_1, \ldots, w_T \in W$ defined by $w_1 = 0$ and $w_{t+1} = P_W(w_t - \eta g_t)$, where $\eta > 0$ and $g_1, \ldots, g_T$ are vector-valued random variables with (a) $E[g_t \mid w_t] = \nabla \hat{\epsilon}_t$, (b) $\|g_t\| \leq G$, and (c) $W \subseteq R\mathcal{B}$. Then for $\eta = \frac{R}{G\sqrt{T}}$,

$$E\left[\sum_{t=1}^{T} \hat{\epsilon}_t(w_t) - \hat{\epsilon}_t(w^*)\right] \leq \lambda R G \sqrt{T} + (3 + \lambda)\delta L T. \qquad (13)$$

(Adapted from Lemma 3.1 in Flaxman et al., 2005.)

Proof. Theorem 1 implies the LHS of (13) to be at most

$$\lambda E\left[\sum_{t=1}^{T} \nabla \hat{\epsilon}_t(w_t) \cdot (w_t - w^*)\right] + (3 + \lambda)\delta L T.$$

Table 1. Average regret of DBGD with synthetic functions.

$L_\epsilon$ Factor:  0.6     0.8     1       2       3
P1:                   0.465   0.398   0.334   0.303   0.415
P2:                   0.803   0.767   0.760   0.780   0.807
P3:                   0.687   0.628   0.604   0.637   0.663
P4:                   0.500   0.378   0.325   0.304   0.418
P5:                   0.710   0.663   0.674   0.798   0.887

4.4. Practical Considerations

Choosing $\delta$ to achieve the regret bound stated in Theorem 2 requires knowledge of $\epsilon_t$ (i.e., $L$), which is typically not known in practical settings. The regret bound is, however, robust to the choice of $\delta$, so sublinear regret is achievable using many choices for $\delta$, as we will verify empirically.

In the analysis, $w_1 = 0$ was chosen to minimize its distance to any other point in $W$. In certain settings we might choose $w_1 \neq 0$, in which case our analysis still follows with slightly worse constants.

5. Experiments

5.1. Synthetic Value Functions

We first experimented using synthetic value functions, which allows us to test the robustness of DBGD to different choices of $\delta$. Since $L$ is unknown, we introduced a free parameter $L_\epsilon$ and used

$$\delta = \frac{T^{-1/4}}{L_\epsilon} \sqrt{0.4 R d}.$$

We tested on five settings, P1 to P5. Each setting optimizes over a 50-dimensional ball of radius 10, and uses the logistic transfer function with different value functions that explore a range of curvatures (which affects the Lipschitz constant) and symmetries:

$$v_1(w) = -w^T w$$
$$v_2(w) = -|w|$$
$$v_3(w) = -\sum_{i:\mathrm{odd}} w(i)^2 - \sum_{i:\mathrm{even}} w(i)$$
$$v_4(w) = -\sum_i \left[\exp(w(i)) + \exp(-w(i))\right]$$
$$v_5(w) = v_3(w) - \sum_{i:\, i\%3=1} e^{[w(i)]_+} - \sum_{i:\, i\%3=2} e^{[-w(i)]_+}$$

The initial point is $w_1 = \vec{1}\sqrt{5/d}$.
Table 1 shows the regret over the interesting range of $L_\epsilon$ values. Performance degrades gracefully beyond this range. Note that the regret of a random point is about 1, since most points in $W$ have much lower value than $v(w^*)$.

We also compared against Bandit Gradient Descent (BGD) (Flaxman et al., 2005). Like DBGD, BGD explores in random directions at each iteration. However, BGD assumes access to $P(w_t \succ w)$, whereas DBGD only observes random outcomes. Thus BGD assumes strictly more information.⁴ We evaluated two versions: BGD1 using $P(w_t \succ w)$, and BGD2 using $\epsilon_t(w) = P(w_t \succ w) - 1/2$. We expect BGD2 to perform best, since the sign of $\epsilon_t(w)$ reveals significant information regarding the true gradient.

Figure 2. Average regret for $L_\epsilon = 1$.

Figure 2 shows the average regret for problems P1 and P5 with $L_\epsilon = 1$. We observe that the behaviors of DBGD and BGD are very similar for both. Interestingly, DBGD outperforms BGD1 on P5 despite having less information. We also observe this trend for P2 and P3, noting that all three problems have significant linear components.

5.2. Web Search Dataset

For a more realistic simulation environment, we leveraged a real Web Search dataset (courtesy of Chris Burges at Microsoft Research). The idea is to simulate users issuing queries by sampling from queries in the dataset. For each query, the competing retrieval functions produce rankings, after which the "user" randomly prefers one ranking over the other; we used a value function based on NDCG@10 (defined below) to determine the comparison outcome probabilities. We stress that our usage of the dataset is very different from supervised learning settings. In particular, (extensions of) our algorithm might be applied to experiments involving real users, where very little is known about each user's internal value function. We leverage this dataset as a reasonable first step for simulating user behavior in an on-line learning setting.

The training, validation and test sets each consist of 1000 queries. We only simulated on the training set, although we measured performance on the other sets to check for, e.g., generalization power. There are about 50 documents per query, and documents are labeled by 5 levels of relevance, from 0 (Bad) to 4 (Perfect). The compatibility between a document/query pair is represented using 367 features.
⁴Our analysis yields matching upper bounds on expected regret for all three methods, though it can be shown that the BGD gradient estimates have lower variance.

Table 2. Average (upper) and final (lower) NDCG@10 on the Web Search training set (sampling 100 queries/iteration).

$\delta$ \ $\gamma$:  0.001   0.005   0.01    0.05    0.1
0.5:                  0.524   0.570   0.580   0.569   0.557
0.8:                  0.533   0.575   0.582   0.576   0.566
1:                    0.537   0.575   0.584   0.577   0.568
3:                    0.529   0.565   0.573   0.575   0.571
0.5:                  0.559   0.591   0.592   0.569   0.565
0.8:                  0.564   0.593   0.593   0.574   0.559
1:                    0.568   0.592   0.595   0.582   0.570
3:                    0.557   0.581   0.582   0.577   0.576

A standard retrieval function computes a score for each document based on these features, with the final ranking resulting from sorting by the scores. For simplicity, we considered only linear functions $w$, so that the score for document $x$ is $w^T x$. Since only the direction of $w$ matters, we are thus optimizing over a 367-dimensional unit sphere.

Our value function is based on Normalized Discounted Cumulative Gain (NDCG), a common measure for evaluating rankings (Donmez et al., 2009). For query $q$, NDCG@K of a ranking for the documents of $q$ is

$$\frac{1}{N_K^{(q)}} \sum_{k=1}^{K} \frac{2^{r_k} - 1}{\log(k+1)},$$

where $r_k$ is the relevance level of the $k$th-ranked document, and $N_K^{(q)}$ is a normalization factor⁵ such that the best ranking achieves NDCG@K = 1. For our experiments, we used the logistic function and $10 \times$ NDCG@10 to make probabilistic comparisons.

We note a few properties of this setup, some going beyond the assumptions in Section 3.1; this allows us to further examine the generality of DBGD. First, the value function is now random (dependent on the query). Second, our feasible space $W$ is the unit sphere, which is not convex, although it is a well-behaved manifold. Third, we assume a homogeneous user group (i.e., all users have the same value function, NDCG@10). Fourth, rankings vary discontinuously w.r.t. document scores, and NDCG@10 is thus a discontinuous value function. We addressed this issue by comparing multiple queries (i.e., delaying multiple iterations) before an update decision, and also by using larger choices of $\delta$ and $\gamma$.
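The NDCG@K formula above, and its use with the logistic link, can be sketched as follows. This is a minimal illustration with our own helper names; the logarithm base is left to `math.log` since it cancels in the normalized ratio:

```python
import math

def dcg_at_k(rels, k):
    """DCG@K = sum over the top-K ranks of (2^r - 1) / log(k' + 1), ranks from 1."""
    return sum((2 ** r - 1) / math.log(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    """NDCG@K: DCG of the given ranking over N_K^(q), the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def pref_probability(rels_a, rels_b):
    """P('user' prefers ranking a): logistic link on the 10 * NDCG@10 difference."""
    gap = 10 * (ndcg_at_k(rels_a) - ndcg_at_k(rels_b))
    return 1.0 / (1.0 + math.exp(-gap))

# Relevance labels (0 = Bad .. 4 = Perfect) of the top documents under two rankings.
ideal_order = [4, 3, 2, 1, 0]
reversed_order = [0, 1, 2, 3, 4]
```

An ideal ordering attains NDCG@10 = 1, and the simulated user prefers it to the reversed ordering with probability above 1/2.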
Lastly, even smoothed versions of NDCG have local optima (Donmez et al., 2009), making it difficult to find $w^*$ (which is required for computing regret). We thus used NDCG@10 to measure performance.

We tested DBGD for $T = 10^7$ and a range of $\delta$ and $\gamma$ values.

⁵Note that $N_K^{(q)}$ will be different for different queries.

Figure 3. NDCG@10 on Web Search training set.

Table 2 shows the average (across all iterations) and final training NDCG@10 when comparing 100 queries per update. Performance peaks at $(\delta, \gamma) = (1, 0.01)$ and degrades smoothly. We found similar results when varying the number of queries compared per update. Figure 3 depicts per-iteration NDCG@10 for the best models when sampling 1, 10 and 100 queries. Making multiple comparisons per update has no impact on performance (the best parameters are typically smaller when sampling fewer queries). Sampling multiple queries is very realistic, since a search system might be constrained to, e.g., making daily updates to its ranking function. Performance on the validation and test sets closely follows training set performance, so we omit their results; this implies that our method is not overfitting.

For completeness, we compared our best DBGD models with a ranking SVM, which optimizes over pairwise document preferences and is a standard baseline in supervised learning-to-rank settings. More sophisticated methods (e.g., Chakrabarti et al., 2008; Donmez et al., 2009) can further improve performance. Table 3 shows that DBGD approaches ranking SVM performance despite making fundamentally different assumptions (e.g., ranking SVMs have access to very specific document-level information). We caution against over-optimizing here, and advocate instead for developing more realistic experimental settings.

6. Conclusion

We have presented an on-line learning framework based on pairwise comparisons, which fits naturally with recent work on deriving reliable pairwise judgments. Our proposed algorithm, DBGD, achieves sublinear regret. As evidenced by our simulations based on web data, DBGD can be applied much more generally than suggested by our theoretical analysis. Hence, it calls for more sophisticated formulations that account for properties such as heterogeneous user behavior, query-dependent value functions, and the discontinuity of rankings.
Table 3. Comparing ranking SVM vs. final DBGD models using average NDCG@10 and per-query win/tie/loss counts.

Model:     SVM     Sample1      Sample5      Sample10     Sample25     Sample50     Sample100
NDCG@10:   0.612   0.596        0.593        0.589        0.593        0.596        0.595
W/T/L:     --      490/121/389  489/121/390  504/118/378  489/118/393  472/119/409  490/116/394

Another interesting direction is adaptively choosing $\delta$ and $\gamma$ for any-time regret analyses. Our framework is extendable in many ways, such as integrating pairwise document preferences (Joachims et al., 2007; Carterette et al., 2008) and diversity (Yue & Joachims, 2008; Radlinski et al., 2008a). Progress in this area can lead to cost-effective systems for a variety of application domains such as personalized search, enterprise search, and also small interest groups.

Acknowledgements

This work was funded under NSF Award IIS-0713483, NSF CAREER Award 0237381, and a gift from Yahoo! Research. The first author is also partly funded by a Microsoft Research Graduate Fellowship and a Yahoo! Key Technical Challenges Grant. The authors also thank Robert Kleinberg, Josef Broder and the anonymous reviewers for their helpful comments.

References

Agichtein, E., Brill, E., & Dumais, S. (2006). Improving Web Search Ranking by Incorporating User Behavior Information. ACM Conference on Information Retrieval (SIGIR) (pp. 19-26).

Carterette, B., Bennett, P., Chickering, D. M., & Dumais, S. (2008). Here or There: Preference Judgments for Relevance. European Conference on Information Retrieval (ECIR) (pp. 16-27).

Carterette, B., & Jones, R. (2007). Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks. Neural Information Processing Systems (NIPS) (pp. 217-224).

Chakrabarti, S., Khanna, R., Sawant, U., & Battacharyya, C. (2008). Structured Learning for Non-Smooth Ranking Losses. ACM Conference on Knowledge Discovery and Data Mining (KDD) (pp. 88-96).

Donmez, P., Svore, K., & Burges, C. (2009). On the Local Optimality of LambdaRank. ACM Conference on Information Retrieval (SIGIR).

Dupret, G., & Piwowarski, B. (2008). A User Browsing Model to Predict Search Engine Click Data from Past Observations. ACM Conference on Information Retrieval (SIGIR) (pp. 331-338).
Flaxman, A., Kalai, A., & McMahan, H. B. (2005). Online Convex Optimization in the Bandit Setting: Gradient Descent Without a Gradient. ACM-SIAM Symposium on Discrete Algorithms (SODA) (pp. 385-394).

Joachims, T., Granka, L., Pan, B., Hembrooke, H., Radlinski, F., & Gay, G. (2007). Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search. ACM Transactions on Information Systems (TOIS), 25, 7:1-26.

Kleinberg, R. (2004). Nearly Tight Bounds for the Continuum-Armed Bandit Problem. Neural Information Processing Systems (NIPS) (pp. 697-704).

Langford, J., & Zhang, T. (2007). The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits. Neural Information Processing Systems (NIPS) (pp. 817-824).

Pandey, S., Agarwal, D., Chakrabarti, D., & Josifovski, V. (2007). Bandits for Taxonomies: A Model-based Approach. SIAM Conference on Data Mining (SDM) (pp. 216-227).

Radlinski, F., Kleinberg, R., & Joachims, T. (2008a). Learning Diverse Rankings with Multi-Armed Bandits. International Conference on Machine Learning (ICML) (pp. 784-791).

Radlinski, F., Kurup, M., & Joachims, T. (2008b). How Does Clickthrough Data Reflect Retrieval Quality? ACM Conference on Information and Knowledge Management (CIKM) (pp. 43-52).

Yue, Y., Broder, J., Kleinberg, R., & Joachims, T. (2009). The K-armed Dueling Bandits Problem. Conference on Learning Theory (COLT).

Yue, Y., & Joachims, T. (2008). Predicting Diverse Subsets Using Structural SVMs. International Conference on Machine Learning (ICML) (pp. 1224-1231).

Zinkevich, M. (2003). Online Convex Programming and Generalized Infinitesimal Gradient Ascent. International Conference on Machine Learning (ICML) (pp. 928-936).