Optimizing Search Engines using Clickthrough Data
Thorsten Joachims, Cornell Univ


ABSTRACT
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with ...


 1. Kernel Machines
    http://svm.first.gmd.de/
 2. Support Vector Machine
    http://jbolivar.freeservers.com/
 3. SVM-Light Support Vector Machine
    http://ais.gmd.de/~thorsten/svm_light/
 4. An Introduction to Support Vector Machines
    http://www.support-vector.net/
 5. Support Vector Machine and Kernel Methods References
    http://svm.research.bell-labs.com/SVMrefs.html
 6. Archives of SUPPORT-VECTOR-MACHINES@JISCMAIL.AC.UK
    http://www.jiscmail.ac.uk/lists/SUPPORT-VECTOR-MACHINES.html
 7. Lucent Technologies: SVM demo applet
    http://svm.research.bell-labs.com/SVT/SVMsvt.html
 8. Royal Holloway Support Vector Machine
    http://svm.dcs.rhbnc.ac.uk/
 9. Support Vector Machine - The Software
    http://www.support-vector.net/software.html
10. Lagrangian Support Vector Machine Home Page
    http://www.cs.wisc.edu/dmi/lsvm

Figure 1: Ranking presented for the query "support vector machine". Marked in bold are the links the user clicked on.

[...] of the search engine. In particular, compared to explicit user feedback, it does not add any overhead for the user. The query q and the returned ranking r can easily be recorded whenever the resulting ranking is displayed to the user. For recording the clicks, a simple proxy system can keep a logfile. For the experiments in this paper, the following system was used.

Each query is assigned a unique ID which is stored in the query-log along with the query words and the presented ranking. The links on the results-page presented to the user do not lead directly to the suggested document, but point to a proxy server. These links encode the query-ID and the URL of the suggested document. When the user clicks on the link, the proxy-server records the URL and the query-ID in the click-log. The proxy then uses the HTTP Location command to forward the user to the target URL. This process can be made transparent to the user and does not influence system performance.

This shows that clickthrough data can be recorded easily and at little cost. Let's now address the key question of how it can be analyzed in a principled and efficient way.

2.2 What Kind of Information does Clickthrough Data Convey?

There are strong dependencies between the three parts of (q, r, c). The presented ranking r depends on the query q as determined by the retrieval function implemented in the search engine. Furthermore, the set c of clicked-on links depends on both the query q and the presented ranking r. First, a user is more likely to click on a link if it is relevant to q [16]. While this dependency is desirable and interesting for analysis, the dependency of the clicks on the presented ranking r muddies the water. In particular, a user is less likely to click on a link low in the ranking, independent of how relevant it is. In the extreme, the probability that the user clicks on a link at rank 10,000 is virtually zero even if it is the document most relevant to the query. No user will scroll down the ranking far enough to observe this link. Therefore, in order to get interpretable and meaningful results from clickthrough data, it is necessary to consider and model the dependencies of c on q and r appropriately.

retrieval function    bxx            tfc            hand-tuned
avg. clickrank        6.26 ± 1.14    6.18 ± 1.33    6.04 ± 0.92

Table 1: Average clickrank for three retrieval functions ("bxx", "tfc" [23], and a "hand-tuned" strategy that uses different weights according to HTML tags) implemented in LASER. Rows correspond to the retrieval method used by LASER at query time; columns hold values from subsequent evaluation with other methods. Figures reported are means and two standard errors. The data for this table is taken from [5].

Before defining such a model, let's first consider an interpretation of clickthrough data that is not appropriate. A click on a particular link cannot be seen as an absolute relevance judgment. Consider the empirical data in Table 1. The data is taken from [5] and was recorded for the search engine LASER covering the WWW of the CMU School of Computer Science. The table shows the average rank of the clicks per query (e.g. 3.67 in the example in Figure 1). Each table cell contains the average clickrank for three retrieval strategies averaged over 1400 queries. The average clickrank is almost equal for all methods. However, according to subjective judgments, the three retrieval functions are substantially different in their ranking quality. The lack of difference in the observed average clickrank can be explained as follows. Since users typically scan only the first l (e.g. l ≈ 10 [24]) links of the ranking, clicking on a link cannot be interpreted as a relevance judgment on an absolute scale. Maybe a document ranked much lower in the list was much more relevant, but the user never saw it. It appears that users click on the (relatively) most promising links in the top l, independent of their absolute relevance. How can these relative preference judgments be captured

[...]

is maximal. Note that (6) is (proportional to) a risk functional [25] with $-\tau$ as the loss function. While the goal of learning is now defined, the question remains whether it is possible to design learning methods that optimize (6).

4. AN SVM ALGORITHM FOR LEARNING OF RANKING FUNCTIONS

Most work on machine learning in information retrieval does not consider the formulation above, but simplifies the task to a binary classification problem with the two classes "relevant" and "non-relevant". Such a simplification has several drawbacks. For example, due to a strong majority of "non-relevant" documents, a learner will typically achieve the maximum predictive classification accuracy if it always responds "non-relevant", independent of where the relevant documents are ranked. But even more importantly, Section 2.2 showed that such absolute relevance judgments cannot be extracted from clickthrough data, so that they are simply not available. Therefore, the following algorithm directly addresses (6), taking an empirical risk minimization approach [25]. Given an independently and identically distributed training sample S of size n containing queries q with their target rankings $r^*$

$$(q_1, r_1^*),\ (q_2, r_2^*),\ \dots,\ (q_n, r_n^*), \tag{7}$$

the learner L will select a ranking function f from a family of ranking functions F that maximizes the empirical $\tau$

$$\tau_S(f) = \frac{1}{n} \sum_{i=1}^{n} \tau\big(r_{f(q_i)}, r_i^*\big) \tag{8}$$

on the training sample. Note that this setup is analogous to e.g. classification by minimizing training error, just that the target is not a class label, but a binary ordering relation.

4.1 The Ranking SVM Algorithm

Is it possible to design an algorithm and a family of ranking functions F so that (a) finding the function $f \in F$ maximizing (8) is efficient, and (b) this function generalizes well beyond the training data? Consider the class of linear ranking functions

$$(d_i, d_j) \in f_{\vec{w}}(q) \iff \vec{w}\,\Phi(q, d_i) > \vec{w}\,\Phi(q, d_j). \tag{9}$$

$\vec{w}$ is a weight vector that is adjusted by learning. $\Phi(q, d)$ is a mapping onto features that describe the match between query q and document d like in the description-oriented retrieval approach of Fuhr et al. [10][11]. Such features are, for example, the number of words that query and document share, the number of words they share inside certain HTML tags (e.g. TITLE, H1, H2, ...), or the page-rank of d [22] (see also Section 5.2). Figure 2 illustrates how the weight vector $\vec{w}$ determines the ordering of four points in a two-dimensional example. For any weight vector $\vec{w}$, the points are ordered by their projection onto $\vec{w}$ (or, equivalently, by their signed distance to a hyperplane with normal vector $\vec{w}$). This means that for $\vec{w}_1$ the points are ordered (1, 2, 3, 4), while $\vec{w}_2$ implies the ordering (2, 3, 1, 4).

Figure 2: Example of how two weight vectors $\vec{w}_1$ and $\vec{w}_2$ rank four points.

Instead of maximizing (8) directly, it is equivalent to minimize the number Q of discordant pairs in Equation (2). For the class of linear ranking functions (9), this is equivalent to finding the weight vector so that the maximum number of the following inequalities is fulfilled.

$$\forall (d_i, d_j) \in r_1^*:\ \vec{w}\,\Phi(q_1, d_i) > \vec{w}\,\Phi(q_1, d_j) \tag{10}$$
$$\dots$$
$$\forall (d_i, d_j) \in r_n^*:\ \vec{w}\,\Phi(q_n, d_i) > \vec{w}\,\Phi(q_n, d_j) \tag{11}$$

Unfortunately, a direct generalization of the result in [13] shows that this problem is NP-hard. However, just like in classification SVMs [7], it is possible to approximate the solution by introducing (non-negative) slack variables $\xi_{i,j,k}$ and minimizing the upper bound $\sum \xi_{i,j,k}$. Adding SVM regularization for margin maximization to the objective leads to the following optimization problem, which is similar to the ordinal regression approach in [12].

Optimization Problem 1. (Ranking SVM)

minimize:
$$V(\vec{w}, \vec{\xi}) = \frac{1}{2}\,\vec{w} \cdot \vec{w} + C \sum \xi_{i,j,k} \tag{12}$$
subject to:
$$\forall (d_i, d_j) \in r_1^*:\ \vec{w}\,\Phi(q_1, d_i) \ge \vec{w}\,\Phi(q_1, d_j) + 1 - \xi_{i,j,1}$$
$$\dots \tag{13}$$
$$\forall (d_i, d_j) \in r_n^*:\ \vec{w}\,\Phi(q_n, d_i) \ge \vec{w}\,\Phi(q_n, d_j) + 1 - \xi_{i,j,n}$$
$$\forall i\ \forall j\ \forall k:\ \xi_{i,j,k} \ge 0 \tag{14}$$

C is a parameter that allows trading off margin size against training error. Geometrically, the margin is the distance between the closest two projections within all target rankings. This is illustrated in Figure 2.

Optimization Problem 1 is convex and has no local optima. By rearranging the constraints (13) as

$$\vec{w}\,\big(\Phi(q_k, d_i) - \Phi(q_k, d_j)\big) \ge 1 - \xi_{i,j,k}, \tag{15}$$

it becomes apparent that the optimization problem is equivalent to that of a classification SVM on pairwise difference vectors $\Phi(q_k, d_i) - \Phi(q_k, d_j)$. Due to this similarity, it can be solved using decomposition algorithms similar to those used for SVM classification. In the following, an adaptation of the SVMlight algorithm [14] is used for training¹. It can be shown that the learned retrieval function $f_{\vec{w}^*}$ can always be represented as a linear combination of the feature

[...]

¹ Available at http://svmlight.joachims.org

Ranking A:
 1. Kernel Machines
    http://svm.first.gmd.de/
 2. SVM-Light Support Vector Machine
    http://ais.gmd.de/~thorsten/svm_light/
 3. Support Vector Machine and Kernel ... References
    http://svm.....com/SVMrefs.html
 4. Lucent Technologies: SVM demo applet
    http://svm.....com/SVT/SVMsvt.html
 5. Royal Holloway Support Vector Machine
    http://svm.dcs.rhbnc.ac.uk/
 6. Support Vector Machine - The Software
    http://www.support-vector.net/software.html
 7. Support Vector Machine - Tutorial
    http://www.support-vector.net/tutorial.html
 8. Support Vector Machine
    http://jbolivar.freeservers.com/

Ranking B:
 1. Kernel Machines
    http://svm.first.gmd.de/
 2. Support Vector Machine
    http://jbolivar.freeservers.com/
 3. An Introduction to Support Vector Machines
    http://www.support-vector.net/
 4. Archives of SUPPORT-VECTOR-MACHINES ...
    http://www.jiscmail.ac.uk/lists/SUPPORT...
 5. SVM-Light Support Vector Machine
    http://ais.gmd.de/~thorsten/svm_light/
 6. Support Vector Machine - The Software
    http://www.support-vector.net/software.html
 7. Lagrangian Support Vector Machine Home Page
    http://www.cs.wisc.edu/dmi/lsvm
 8. A Support ... - Bennett, Blue (ResearchIndex)
    http://citeseer.../bennett97support.html

Combined Results:
 1. Kernel Machines
    http://svm.first.gmd.de/
 2. Support Vector Machine
    http://jbolivar.freeservers.com/
 3. SVM-Light Support Vector Machine
    http://ais.gmd.de/~thorsten/svm_light/
 4. An Introduction to Support Vector Machines
    http://www.support-vector.net/
 5. Support Vector Machine and Kernel Methods References
    http://svm.research.bell-labs.com/SVMrefs.html
 6. Archives of SUPPORT-VECTOR-MACHINES@JISCMAIL.AC.UK
    http://www.jiscmail.ac.uk/lists/SUPPORT-VECTOR-MACHINES.html
 7. Lucent Technologies: SVM demo applet
    http://svm.research.bell-labs.com/SVT/SVMsvt.html
 8. Royal Holloway Support Vector Machine
    http://svm.dcs.rhbnc.ac.uk/
 9. Support Vector Machine - The Software
    http://www.support-vector.net/software.html
10. Lagrangian Support Vector Machine Home Page
    http://www.cs.wisc.edu/dmi/lsvm

Figure 3: Example for query "support vector machine". The two upper boxes show the rankings returned by retrieval functions A and B. The lower box contains the combined ranking presented to the user. The links the user clicked on are marked in bold.

[...] and B are presented in the same way) it is proven and empirically verified in [16] that the conclusions drawn from this method lead to the same result as an evaluation with explicit manual relevance judgments for large s.

5.2 Offline Experiment

This experiment verifies that the Ranking SVM can indeed learn regularities using partial feedback from clickthrough data. To generate a first training set, I used the Striver search engine for all of my own queries during October, 2001. Striver displayed the results of Google and MSNSearch using the combination method from the previous section. All clickthrough triplets were recorded. This resulted in 112 queries with a non-empty set of clicks. This data provides the basis for the following offline experiment.

To learn a retrieval function using the Ranking SVM, it is necessary to design a suitable feature mapping $\Phi(q, d)$ describing the match between a query q and a document d. The following features are used in the experiment. However, this set of features is likely to be far from optimal. While the attributes reflect some of my intuition about what could be important for learning a good ranking, I included only those features that were easy to implement. Furthermore, I did not do any feature selection or similar tuning, so that an appropriate design of features promises much room for improvement. The implemented features are the following:

1. Rank in other search engines (38 features total):
   rank_X: 100 minus rank in X ∈ {Google, MSN-Search, Altavista, Hotbot, Excite} divided by 100 (minimum 0)
   top1_X: ranked #1 in X ∈ {Google, MSNSearch, Altavista, Hotbot, Excite} (binary {0, 1})
   top10_X: ranked in top 10 in X ∈ {Google, MSN-Search, Altavista, Hotbot, Excite} (binary {0, 1})
   top50_X: ranked in top 50 in X ∈ {Google, MSN-Search, Altavista, Hotbot, Excite} (binary {0, 1})
   top1count_X: ranked #1 in X of the 5 search engines
   top10count_X: ranked in top 10 in X of the 5 search engines
   top50count_X: ranked in top 50 in X of the 5 search engines

2. Query/Content Match (3 features total):
   query_url_cosine: cosine between URL-words and query (range [0, 1])
   query_abstract_cosine: cosine between title-words and query (range [0, 1])
   domain_name_in_query: query contains domain-name from URL (binary {0, 1})

3. Popularity-Attributes (20,000 features total):
   url_length: length of URL in characters divided by 30
   country_X: country code X of URL (binary attribute {0, 1} for each country code)
   [...]

weight   feature
 0.60    query_abstract_cosine
 0.48    top10_google
 0.24    query_url_cosine
 0.24    top1count_1
 0.24    top10_msnsearch
 0.22    host_citeseer
 0.21    domain_nec
 0.19    top10count_3
 0.17    top1_google
 0.17    country_de
 ...
 0.16    abstract_contains_home
 0.16    top1_hotbot
 ...
 0.14    domain_name_in_query
 ...
-0.13    domain_tu-bs
-0.15    country_fi
-0.16    top50count_4
-0.17    url_length
-0.32    top10count_0
-0.38    top1count_0

Table 3: Features with largest and smallest weights as learned from the training data in the online experiment.

5.4 Analysis of the Learned Function

The previous result shows that the learned function improves retrieval. But what does the learned function look like? Is it reasonable and intuitive? Since the Ranking SVM learns a linear function, one can analyze the function by studying the learned weights. Table 3 displays the weights of some features, in particular, those with the highest absolute weights. Roughly speaking, a high positive (negative) weight indicates that documents with these features should be higher (lower) in the ranking.

The weights in Table 3 are reasonable for this group of users. Since many queries were for scientific material, it appears natural that URLs from the domain "citeseer" (and the alias "nec") received positive weight. The most influential weights are for the cosine match between query and abstract, whether the URL is in the top 10 from Google, and for the cosine match between query and the words in the URL. A document receives large negative weights if it is not ranked top 1 by any search engine, if it is not in the top 10 of any search engine (note that the second implies the first), and if the URL is long. All these weights are reasonable and make sense intuitively.

6. DISCUSSION AND RELATED WORK

The experimental results show that the Ranking SVM can successfully learn an improved retrieval function from clickthrough data. Without any explicit feedback or manual parameter tuning, it has automatically adapted to the particular preferences of a group of 20 users. This improvement is not only a verification that the Ranking SVM can learn using partial ranking feedback, but also an argument for personalizing retrieval functions. Unlike conventional search engines that have to "fit" their retrieval function to large and therefore heterogeneous groups of users due to the cost of manual tuning, machine learning techniques can improve retrieval substantially by tailoring the retrieval function to small and homogeneous groups (or even individuals) without prohibitive costs.

While previous work on learning retrieval functions exists (e.g. [10]), most methods require explicit relevance judgments. Most closely related is the approach of Bartell et al. [2]. They present a mixture-of-experts algorithm for linearly combining ranking experts by maximizing a different rank correlation criterion. However, in their setup they rely on explicit relevance judgments. A similar algorithm for combining rankings was proposed by Cohen et al. [6]. They show empirically and theoretically that their algorithm finds a combination that performs close to the best of the basic experts. The boosting algorithm of Freund et al. [9] is an approach to combining many weak ranking rules into a strong ranking function. While they also (approximately) minimize the number of inversions, they do not explicitly consider a distribution over queries and target rankings. However, their algorithm can probably be adapted to the setting considered in this paper. Algorithmically most closely related is the SVM approach to ordinal regression by Herbrich et al. [12]. But, again, they consider a different sampling model. In ordinal regression all objects interact and they are ranked on the same scale. For the ranking problem in information retrieval, rankings need to be consistent only within a query, but not between queries. This makes the ranking problem less constrained. For example, in the ranking problem two documents $d_i$ and $d_j$ can end up at very different ranks for two different queries $q_k$ and $q_l$ even if they have exactly the same feature vector (i.e. $\Phi(q_k, d_i) = \Phi(q_l, d_j)$). An elegant perceptron-like algorithm for ordinal regression was recently proposed by Crammer and Singer [8]. An interesting question is whether such an online algorithm can also be used to solve the optimization problem connected to the Ranking SVM.

Some attempts have been made to use implicit feedback by observing clicking behavior in retrieval systems [5] and browsing assistants [17][20]. However, the semantics of the learning process and its results are unclear, as demonstrated in Section 2.2. The commercial search engine "DirectHit" makes use of clickthrough data. The precise mechanism, however, is unpublished. While for a different problem, an interesting use of clickthrough data was proposed in [3]. They use clickthrough data for identifying related queries and URLs.

What are the computational demands of training the Ranking SVM on clickthrough data? Since SVMlight [15] solves the dual optimization problem, it depends only on inner products between feature vectors $\Phi(q, d)$. If these feature vectors are sparse as above, SVMlight can handle millions of features efficiently. Most influential on the training time is the number of constraints in Optimization Problem 2. However, when using clickthrough data, the number of constraints scales only linearly with the number of queries, if the number of clicks per query is upper bounded. In other applications, SVMlight has already shown that it can solve problems with several millions of constraints using a regular desktop computer. However, scaling to the order of magnitude found in major search engines is an interesting open problem.
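The reduction in Section 4.1, where constraint (15) turns ranking into classification of pairwise difference vectors, can be illustrated with a small sketch. This is not the SVMlight decomposition algorithm used in the paper: it minimizes the same objective (12), with the slack variables written as hinge losses, by plain full-batch subgradient descent. The function names, the toy feature vectors, the step size, and the number of steps are all assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_differences(queries):
    """Build the difference vectors Phi(q, d_i) - Phi(q, d_j) of Eq. (15).

    `queries` is a list of (features, prefs) tuples. `features` maps a
    document id to its feature vector for that query; `prefs` lists pairs
    (better, worse) taken from the target ranking of the same query.
    Pairs are only formed within a query, never across queries.
    """
    diffs = []
    for features, prefs in queries:
        for better, worse in prefs:
            diffs.append([a - b for a, b in zip(features[better], features[worse])])
    return diffs


def train_ranking_svm(diffs, C=1.0, steps=500, lr=0.01):
    """Minimize 1/2 w.w + C * sum(max(0, 1 - w.x)) over difference vectors x.

    Hypothetical stand-in for a real SVM solver: full-batch subgradient
    descent with a constant step size.
    """
    w = [0.0] * len(diffs[0])
    for _ in range(steps):
        grad = list(w)  # gradient of the regularizer 1/2 w.w
        for x in diffs:
            if dot(w, x) < 1.0:  # margin violated: hinge subgradient is -C * x
                for k, xk in enumerate(x):
                    grad[k] -= C * xk
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def rank(w, features):
    """Order document ids by decreasing score w . Phi(q, d), as in Eq. (9)."""
    return sorted(features, key=lambda d: dot(w, features[d]), reverse=True)


# Toy data (made up): two queries, two features per document.
queries = [
    ({"a": [0.9, 0.3], "b": [0.5, 0.6], "c": [0.1, 0.1]}, [("a", "b"), ("b", "c")]),
    ({"d": [0.7, 0.9], "f": [0.4, 0.5], "e": [0.2, 0.1]}, [("d", "f"), ("f", "e")]),
]
w = train_ranking_svm(pairwise_differences(queries))
for features, _ in queries:
    print(rank(w, features))
```

Because preferences are only collected within a query, the number of difference vectors grows with the number of queries times the number of pairs per query, which is the linear scaling discussed above.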
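[15] T. Joachims. Learning to Classify Text Using Support Vector Machines - Methods, Theory, and Algorithms. Kluwer, 2002.
[16] T. Joachims. Unbiased evaluation of retrieval quality using clickthrough data. Technical report, Cornell University, Department of Computer Science, 2002. http://www.joachims.org.
[17] T. Joachims, D. Freitag, and T. Mitchell. WebWatcher: a tour guide for the world wide web. In Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), volume 1, pages 770-777. Morgan Kaufmann, 1997.
[18] J. Kemeny and L. Snell. Mathematical Models in the Social Sciences. Ginn & Co, 1962.
[19] M. Kendall. Rank Correlation Methods. Hafner, 1955.
[20] H. Lieberman. Letizia: An agent that assists Web browsing. In Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence (IJCAI '95), Montreal, Canada, 1995. Morgan Kaufmann.
[21] A. Mood, F. Graybill, and D. Boes. Introduction to the Theory of Statistics. McGraw-Hill, 3rd edition, 1974.
[22] L. Page and S. Brin. Pagerank, an eigenvector based ranking approach for hypertext. In 21st Annual ACM/SIGIR International Conference on Research and Development in Information Retrieval, 1998.
[23] G. Salton and C. Buckley. Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5):513-523, 1988.
[24] C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large altavista query log. Technical Report SRC 1998-014, Digital Systems Research Center, 1998.
[25] V. Vapnik. Statistical Learning Theory. Wiley, Chichester, GB, 1998.
[26] Y. Yao. Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2):133-145, 1995.

APPENDIX

Theorem 1. Let $r_{rel}$ be the ranking placing all relevant documents ahead of all non-relevant documents and let $r_{sys}$ be the learned ranking. If Q is the number of discordant pairs between $r_{rel}$ and $r_{sys}$, then the average precision is at least

$$\mathrm{AvgPrec}(r_{sys}, r_{rel}) \ge \frac{1}{R} \left[ Q + \binom{R+1}{2} \right]^{-1} \left( \sum_{i=1}^{R} \sqrt{i} \right)^{2}$$

if there are R relevant documents.

Proof. If $p_1, \dots, p_R$ are the ranks of the relevant documents in $r_{sys}$ sorted in increasing order, then Average Precision can be computed as

$$\mathrm{AvgPrec}(r_{sys}, r_{rel}) = \frac{1}{R} \sum_{i=1}^{R} \frac{i}{p_i} \tag{24}$$

What is the minimum value of $\mathrm{AvgPrec}(r_{sys}, r_{rel})$, given that the number of discordant pairs is fixed? The i-th relevant document is preceded by $p_i - i$ non-relevant documents, each forming one discordant pair with it, so $Q = \sum_i (p_i - i)$. The sum of the ranks $p_1 + \dots + p_R$ is therefore related to the number of discordant pairs Q as follows.

$$p_1 + \dots + p_R = Q + \binom{R+1}{2} \tag{25}$$

It is now possible to write the lower bound as the following integer optimization problem. It computes the worst possible Average Precision for a fixed value of Q.

minimize:
$$P(p_1, \dots, p_R) = \frac{1}{R} \sum_{i=1}^{R} \frac{i}{p_i} \tag{26}$$
subject to:
$$p_1 + \dots + p_R = Q + \binom{R+1}{2} \tag{27}$$
$$1 \le p_1 < \dots < p_R \tag{28}$$
$$p_1, \dots, p_R \ \text{integer} \tag{29}$$

Relaxing the problem by removing the last two sets of constraints can only decrease the minimum, so that the solution without the constraints is still a lower bound. The remaining problem is convex and can be solved using Lagrange multipliers. The Lagrangian is

$$L(p_1, \dots, p_R, \beta) = \frac{1}{R} \sum_{i=1}^{R} \frac{i}{p_i} + \beta \left[ \sum_{i=1}^{R} p_i - Q - \binom{R+1}{2} \right]. \tag{30}$$

At the minimum of the optimization problem, the Lagrangian is known to have partial derivatives equal to zero. Starting with the partial derivatives for the $p_i$

$$\frac{\partial L(p_1, \dots, p_R, \beta)}{\partial p_i} = -i\, R^{-1} p_i^{-2} + \beta := 0, \tag{31}$$

solving for $p_i$ (which gives $p_i = \sqrt{i / (R \beta)}$), and substituting back into the Lagrangian leads to

$$L(p_1, \dots, p_R, \beta) = 2 R^{-\frac{1}{2}} \beta^{\frac{1}{2}} \sum_{i=1}^{R} i^{\frac{1}{2}} - \beta \left[ Q + \binom{R+1}{2} \right]. \tag{32}$$

Now taking the derivative with respect to $\beta$

$$\frac{\partial L(p_1, \dots, p_R, \beta)}{\partial \beta} = R^{-\frac{1}{2}} \beta^{-\frac{1}{2}} \sum_{i=1}^{R} i^{\frac{1}{2}} - Q - \binom{R+1}{2} := 0, \tag{33}$$

solving for $\beta$ (which gives $\beta^{\frac{1}{2}} = R^{-\frac{1}{2}} \sum_{i=1}^{R} i^{\frac{1}{2}} \big/ \left[ Q + \binom{R+1}{2} \right]$), and again substituting into the Lagrangian leads to the desired solution $\frac{1}{R} \left[ Q + \binom{R+1}{2} \right]^{-1} \left( \sum_{i=1}^{R} \sqrt{i} \right)^{2}$.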
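As a numeric sanity check of Theorem 1, the following sketch computes Average Precision from Eq. (24), the number of discordant pairs from Eq. (25), and the lower bound from the theorem statement for a given set of ranks. The function names and the example ranks are illustrative assumptions and do not come from the paper.

```python
from math import comb, sqrt


def average_precision(ranks):
    """Average Precision from the 1-based ranks of the relevant documents, Eq. (24)."""
    ordered = sorted(ranks)
    return sum((i + 1) / p for i, p in enumerate(ordered)) / len(ordered)


def discordant_pairs(ranks):
    """Number of discordant pairs Q, obtained by rearranging Eq. (25)."""
    return sum(ranks) - comb(len(ranks) + 1, 2)


def lower_bound(q, r):
    """Right-hand side of Theorem 1 for Q = q discordant pairs and R = r relevant documents."""
    s = sum(sqrt(i) for i in range(1, r + 1))
    return s * s / (r * (q + comb(r + 1, 2)))


# Example with made-up ranks: three relevant documents at ranks 2, 5 and 9.
ranks = [2, 5, 9]
q = discordant_pairs(ranks)
print("Q =", q)
print("AvgPrec =", average_precision(ranks))
print("bound   =", lower_bound(q, len(ranks)))
```

For a single relevant document (R = 1) the bound is tight, since both sides reduce to 1 / p_1. For larger R the relaxed optimum p_i ∝ √i is generally not an integer ranking, so the bound is usually strict.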