Preference Learning with Gaussian Processes

Wei Chu (chuwei@gatsby.ucl.ac.uk)
Zoubin Ghahramani (zoubin@gatsby.ucl.ac.uk)
Gatsby Computational Neuroscience Unit, University College London, London, WC1N 3AR, UK

Abstract

In this paper, we propose a probabilistic kernel approach to preference learning based on Gaussian processes. A new likelihood function is proposed to capture the preference relations in the Bayesian framework. The generalized formulation is also applicable to learning label preferences.

1. Introduction

[...] posed for preference learning. The problem size of this approach remains linear with the number of training instances, rather than growing quadratically. This approach provides a general Bayesian framework for model adaptation and probabilistic prediction. The results of numerical experiments, compared against those of the constraint classification approach of Har-Peled et al. (2002), verify the usefulness of our algorithm.

The paper is organized as follows. In section 2 we describe the probabilistic approach for preference learning over instances in detail. In section 3 we generalize this framework to learn label preferences. In section 4 we empirically study the performance of our algorithm on three learning tasks, and we conclude in section 5.

2. Learning Instance Preference

Consider a set of $n$ distinct instances $\{x_i\}_{i=1}^{n}$ and a set of $m$ observed pairwise preference relations on the instances,

$\mathcal{D} = \{ v_k \succ u_k : k = 1, \ldots, m \}$    (1)

where $v_k \succ u_k$ means the instance $v_k$ is preferred to $u_k$. For example, the pair $(v_k, u_k)$ could be two options provided by the automated agent for routing (Fiechter & Rogers, 2000), while the user may decide to take the route $v_k$ rather than the route $u_k$ by his/her own judgement.

2.1. Bayesian Framework

The main idea is to assume that there is an unobservable latent function value $f(x_i)$ associated with each training sample $x_i$, and that the function values preserve the preference relations observed in the dataset. We impose a Gaussian process prior on these latent function values, and employ an appropriate likelihood function to learn from the pairwise preferences between samples. The Bayesian framework is described in more detail in the following.

2.1.1. Prior Probability

The latent function values $\{f(x_i)\}$ are assumed to be a realization of random variables in a zero-mean Gaussian process (Williams & Rasmussen, 1996). The Gaussian process can then be fully specified by the covariance matrix. The covariance between the latent functions corresponding to the inputs $x_i$ and $x_j$ can be defined by any Mercer kernel function (Schölkopf & Smola, 2002). A simple example is the Gaussian kernel, defined as

$\mathcal{K}(x_i, x_j) = \exp\left( -\frac{\kappa}{2} \sum_{d=1}^{D} (x_i^d - x_j^d)^2 \right)$    (2)

where $\kappa > 0$ and $x_i^d$ denotes the $d$-th element of $x_i$. Thus the prior probability of these latent function values is a multivariate Gaussian

$P(f) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} f^{\top} \Sigma^{-1} f \right)$    (3)

where $f = [f(x_1), f(x_2), \ldots, f(x_n)]^{\top}$ and $\Sigma$ is the covariance matrix whose $ij$-th element is the covariance function $\mathcal{K}(x_i, x_j)$ defined as in (2).

2.1.2. Likelihood

A new likelihood function is proposed to capture the preference relations in (1), which is defined as follows for the ideally noise-free case:

$P_{\mathrm{ideal}}(v_k \succ u_k \mid f(v_k), f(u_k)) = \begin{cases} 1 & \text{if } f(v_k) \geq f(u_k) \\ 0 & \text{otherwise} \end{cases}$    (4)

This requires that the latent function values of the instances be consistent with their preference relations. To allow some tolerance to noise in the inputs or in the preference relations, we can assume that the latent functions are contaminated with Gaussian noise.[1] The Gaussian noise has zero mean and unknown variance $\sigma^2$; $\mathcal{N}(x; \mu, \sigma^2)$ denotes a Gaussian random variable with mean $\mu$ and variance $\sigma^2$. The likelihood function (4) then becomes

$P(v_k \succ u_k \mid f(v_k), f(u_k)) = \Phi(z_k)$    (5)

where $z_k = \dfrac{f(v_k) - f(u_k)}{\sqrt{2}\,\sigma}$ and $\Phi(z) = \int_{-\infty}^{z} \mathcal{N}(t; 0, 1)\, dt$.

In optimization-based approaches to machine learning, the quantity $-\ln P(v_k \succ u_k \mid f(v_k), f(u_k))$ is usually referred to as the loss function, i.e. $-\ln \Phi(z_k)$. The derivatives of the loss function with respect to $f(x_i)$ are needed in Bayesian methods. The first and second order derivatives of the loss function can be written as

$\frac{\partial\, (-\ln \Phi(z_k))}{\partial f(x_i)} = -\frac{s_k(x_i)}{\sqrt{2}\,\sigma}\, \frac{\mathcal{N}(z_k)}{\Phi(z_k)}$    (6)

$\frac{\partial^2\, (-\ln \Phi(z_k))}{\partial f(x_i)\, \partial f(x_j)} = \frac{s_k(x_i)\, s_k(x_j)}{2\sigma^2} \left( \frac{\mathcal{N}^2(z_k)}{\Phi^2(z_k)} + \frac{z_k\, \mathcal{N}(z_k)}{\Phi(z_k)} \right)$    (7)

where $s_k(x_i)$ is an indicator function which is $+1$ if $x_i = v_k$, $-1$ if $x_i = u_k$, and $0$ otherwise. The likelihood is the joint probability of observing the preference relations given the latent function values, which can be evaluated as a product of the likelihood function (5), i.e.

$P(\mathcal{D} \mid f) = \prod_{k=1}^{m} P(v_k \succ u_k \mid f(v_k), f(u_k))$    (8)

[1] In principle, any distribution rather than a Gaussian can be assumed for the noise on the latent functions.
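To make the likelihood concrete, the following is a minimal Python sketch of (5), (6) and (8); it is our own illustration, not the authors' ANSI C implementation, and the function names are assumptions (NumPy and SciPy are assumed available):

```python
import numpy as np
from scipy.stats import norm

def preference_log_likelihood(f, pairs, sigma):
    """Log of the joint likelihood (8): sum of ln Phi(z_k) over observed pairs.

    f     : (n,) latent function values at the n distinct instances
    pairs : list of (v, u) index pairs, meaning instance v is preferred to u
    sigma : noise level of the contaminated latent functions
    """
    z = np.array([(f[v] - f[u]) / (np.sqrt(2.0) * sigma) for v, u in pairs])
    return norm.logcdf(z).sum()

def loss_gradient(f, pairs, sigma):
    """Gradient of the total loss sum_k -ln Phi(z_k), using (6)."""
    grad = np.zeros_like(f)
    for v, u in pairs:
        z = (f[v] - f[u]) / (np.sqrt(2.0) * sigma)
        # ratio N(z)/Phi(z), evaluated in log space for numerical stability
        ratio = np.exp(norm.logpdf(z) - norm.logcdf(z))
        g = ratio / (np.sqrt(2.0) * sigma)
        grad[v] -= g   # s_k(v) = +1
        grad[u] += g   # s_k(u) = -1
    return grad
```

Evaluating the ratio $\mathcal{N}(z_k)/\Phi(z_k)$ via `logpdf - logcdf` avoids underflow when $z_k$ is large and negative.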
2.1.3. Posterior Probability

Based on Bayes' theorem, the posterior probability can then be written as

$P(f \mid \mathcal{D}) = \frac{P(f)\, P(\mathcal{D} \mid f)}{P(\mathcal{D})}$    (9)

where the prior probability $P(f)$ is defined as in (3), the likelihood function is defined as in (5), and $P(\mathcal{D})$ is the normalization factor. The Bayesian framework described above is conditional on the model parameters, including the kernel parameters $\kappa$ in the covariance function (2) that control the kernel shape, and the noise level $\sigma$ in the likelihood function (5). These parameters can be collected into $\theta$, the hyperparameter vector. The normalization factor $P(\mathcal{D})$, more exactly $P(\mathcal{D} \mid \theta)$, is known as the evidence for the hyperparameters. In the next section, we discuss techniques for hyperparameter learning.

2.2. Model Selection

In a full Bayesian treatment, the hyperparameters $\theta$ must be integrated over the $\theta$-space for prediction. Monte Carlo methods (e.g. Neal, 1996) can be adopted here to approximate the integral effectively; however, these might be computationally prohibitive in practice. Alternatively, we consider model selection by determining an optimal setting for $\theta$. The optimal values of the hyperparameters can be simply inferred by maximizing the evidence, i.e. $\theta^{*} = \arg\max_{\theta} P(\mathcal{D} \mid \theta)$. A popular idea for computing the evidence is to approximate the posterior distribution $P(f \mid \mathcal{D})$ as a Gaussian, so that the evidence can be calculated by an explicit formula. In this section, we apply the Laplace approximation (MacKay, 1994) in evidence evaluation. The evidence can be calculated analytically after applying the Laplace approximation at the maximum a posteriori (MAP) estimate, and gradient-based optimization methods can then be employed to implement model adaptation by maximizing the evidence.

2.2.1. Maximum A Posteriori Estimate

The MAP estimate of the latent function values refers to the mode of the posterior distribution, i.e. $f_{\mathrm{MAP}} = \arg\max_{f} P(f \mid \mathcal{D})$, which is equivalent to the minimizer of the following functional:

$S(f) = -\sum_{k=1}^{m} \ln \Phi(z_k) + \frac{1}{2} f^{\top} \Sigma^{-1} f$    (10)

Lemma 1. The minimization of the functional $S(f)$ defined as in (10) is a convex programming problem.

Proof. The Hessian matrix of $S(f)$ can be written as $\Sigma^{-1} + \Lambda$, where $\Lambda$ is an $n \times n$ matrix whose $ij$-th entry is $\sum_{k} \frac{\partial^2 (-\ln \Phi(z_k))}{\partial f(x_i)\, \partial f(x_j)}$. From Mercer's theorem (Schölkopf & Smola, 2002), the covariance matrix $\Sigma$ is positive semidefinite.[2] The matrix $\Lambda$ can be shown to be positive semidefinite too, as follows. Let $v$ denote a column vector $[v_1, \ldots, v_n]^{\top}$, and assume the pair in the $k$-th preference relation is associated with the $i$-th and $j$-th samples. By exploiting the property of the second order derivative (7), each relation contributes a term $h_k (v_i - v_j)^2$ to $v^{\top} \Lambda v$, where $h_k = \frac{1}{2\sigma^2}\left( \frac{\mathcal{N}^2(z_k)}{\Phi^2(z_k)} + \frac{z_k\, \mathcal{N}(z_k)}{\Phi(z_k)} \right) \geq 0$. So $v^{\top} \Lambda v \geq 0$ holds for any $v$. Therefore the Hessian matrix is a positive semidefinite matrix. This proves the lemma.

The Newton-Raphson formula can be used to find the solution for simple cases. As $\frac{\partial S(f)}{\partial f} = 0$ at the MAP estimate, we have

$f_{\mathrm{MAP}} = \Sigma\, \frac{\partial}{\partial f} \sum_{k=1}^{m} \ln \Phi(z_k) \Big|_{f = f_{\mathrm{MAP}}}$    (11)

2.2.2. Evidence Approximation

The Laplace approximation of $S(f)$ refers to carrying out the Taylor expansion at the MAP point and retaining the terms up to second order (MacKay, 1994). This is equivalent to approximating the posterior distribution $P(f \mid \mathcal{D})$ as a Gaussian distribution centered at $f_{\mathrm{MAP}}$ with covariance matrix $(\Sigma^{-1} + \Lambda_{\mathrm{MAP}})^{-1}$, where $\Lambda_{\mathrm{MAP}}$ denotes the matrix $\Lambda$ at the MAP estimate. The evidence can then be computed as an explicit expression, i.e.

$P(\mathcal{D} \mid \theta) \approx P(\mathcal{D} \mid f_{\mathrm{MAP}})\, \left| I + \Sigma \Lambda_{\mathrm{MAP}} \right|^{-\frac{1}{2}}$    (12)

where $I$ is an $n \times n$ identity matrix. The quantity (12) is a convenient yardstick for model selection.

[2] Practically, we can insert a "jitter" term on the diagonal entries of the matrix $\Sigma$ to make it positive definite.
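The following sketch puts (10)-(12) together: a Newton-Raphson search for $f_{\mathrm{MAP}}$ with Hessian $\Sigma^{-1} + \Lambda$, followed by the Laplace-approximate log evidence. It is a plain re-derivation under the paper's assumptions, not the released code; the dense linear algebra and the stopping rule are our own choices:

```python
import numpy as np
from scipy.stats import norm

def find_map(Sigma, pairs, sigma, n_iter=50, tol=1e-8):
    """Newton-Raphson minimization of S(f) in (10); returns f_MAP and Lambda."""
    n = Sigma.shape[0]
    f = np.zeros(n)
    Sigma_inv = np.linalg.inv(Sigma)      # assumes jittered, positive definite Sigma
    for _ in range(n_iter):
        grad = Sigma_inv @ f              # gradient of the prior term in (10)
        Lam = np.zeros((n, n))            # Hessian of the loss term, from (7)
        for v, u in pairs:
            z = (f[v] - f[u]) / (np.sqrt(2.0) * sigma)
            r = np.exp(norm.logpdf(z) - norm.logcdf(z))   # N(z)/Phi(z)
            grad[v] -= r / (np.sqrt(2.0) * sigma)         # loss gradient, eq. (6)
            grad[u] += r / (np.sqrt(2.0) * sigma)
            h = (r * r + z * r) / (2.0 * sigma ** 2)      # second derivative, eq. (7)
            Lam[v, v] += h; Lam[u, u] += h
            Lam[v, u] -= h; Lam[u, v] -= h
        step = np.linalg.solve(Sigma_inv + Lam, grad)     # Newton step
        f = f - step
        if np.abs(step).max() < tol:
            break
    return f, Lam

def log_evidence(Sigma, pairs, sigma, f_map, Lam):
    """Laplace-approximate log evidence: the logarithm of (12)."""
    z = np.array([(f_map[v] - f_map[u]) / (np.sqrt(2.0) * sigma) for v, u in pairs])
    _, logdet = np.linalg.slogdet(np.eye(Sigma.shape[0]) + Sigma @ Lam)
    return norm.logcdf(z).sum() - 0.5 * logdet
```

Since $S(f)$ is convex (Lemma 1), the Newton iteration converges to the global minimizer; `log_evidence` is then the objective maximized over $\theta$ in the next subsection.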
2.2.3. Gradient Descent

Grid search can be used to find the optimal hyperparameter values, but such an approach is very expensive when a large number of hyperparameters are involved. For example, automatic relevance determination (ARD) parameters[3] could be embedded into the covariance function (2) as a means of feature selection. The ARD Gaussian kernel can be defined as

$\mathcal{K}(x_i, x_j) = \exp\left( -\frac{1}{2} \sum_{d=1}^{D} \kappa_d (x_i^d - x_j^d)^2 \right)$    (13)

where $\kappa_d \geq 0$ is the ARD parameter for the $d$-th feature, which controls the contribution of this feature to the modelling. The number of hyperparameters increases to $D + 1$ in this case.

Gradient-based optimization methods are regarded as suitable tools to determine the values of these hyperparameters, as the gradients of the logarithm of the evidence (12) with respect to the hyperparameters $\theta$ can be derived analytically. We usually collect $\{\ln \kappa_d\}$ and $\ln \sigma$ as the set of variables to tune; this definition of tunable variables is helpful to convert the constrained optimization problem into an unconstrained one. The gradients of $\ln P(\mathcal{D} \mid \theta)$ with respect to these variables can be derived as follows:

$\frac{\partial \ln P(\mathcal{D} \mid \theta)}{\partial \ln \kappa} = \kappa \left( \frac{1}{2} f_{\mathrm{MAP}}^{\top} \Sigma^{-1} \frac{\partial \Sigma}{\partial \kappa} \Sigma^{-1} f_{\mathrm{MAP}} - \frac{1}{2} \operatorname{tr}\left[ (I + \Sigma \Lambda_{\mathrm{MAP}})^{-1} \Lambda_{\mathrm{MAP}} \frac{\partial \Sigma}{\partial \kappa} \right] \right)$    (14)

$\frac{\partial \ln P(\mathcal{D} \mid \theta)}{\partial \ln \sigma} = \sigma \left( \sum_{k=1}^{m} \frac{\partial \ln \Phi(z_k)}{\partial \sigma} \Big|_{f_{\mathrm{MAP}}} - \frac{1}{2} \operatorname{tr}\left[ (I + \Sigma \Lambda_{\mathrm{MAP}})^{-1} \Sigma \frac{\partial \Lambda}{\partial \sigma} \Big|_{f_{\mathrm{MAP}}} \right] \right)$    (15)

Gradient-descent methods can then be employed to search for the maximizer of the log evidence.

2.3. Prediction

Now let us take a test pair $(r, s)$ on which the preference relation is unknown. The zero-mean latent variables $f_t = [f(r), f(s)]^{\top}$ have correlations with the zero-mean random variables of the training samples.[4] The correlations are defined by the covariance function in (2), so that we have a prior joint multivariate Gaussian distribution over $[f^{\top}, f_t^{\top}]^{\top}$ with covariance matrix

$\begin{bmatrix} \Sigma & k_t \\ k_t^{\top} & \Sigma_{tt} \end{bmatrix}$

where $k_t = [k_r, k_s]$ with $k_r = [\mathcal{K}(r, x_1), \ldots, \mathcal{K}(r, x_n)]^{\top}$ and $k_s = [\mathcal{K}(s, x_1), \ldots, \mathcal{K}(s, x_n)]^{\top}$, and $\Sigma_{tt}$ is the $2 \times 2$ matrix with entries $\mathcal{K}(r, r)$, $\mathcal{K}(r, s)$, $\mathcal{K}(s, r)$ and $\mathcal{K}(s, s)$. So the conditional distribution $P(f_t \mid f)$ is a Gaussian too. The predictive distribution of $f_t$ can be computed as an integral over the $f$-space, which can be written as

$P(f_t \mid \mathcal{D}) = \int P(f_t \mid f)\, P(f \mid \mathcal{D})\, df$    (16)

The posterior distribution $P(f \mid \mathcal{D})$ can be approximated as a Gaussian by the Laplace approximation. The predictive distribution (16) can finally be simplified as a Gaussian $\mathcal{N}(f_t; \mu_t, \Sigma_t)$ with mean

$\mu_t = k_t^{\top} \Sigma^{-1} f_{\mathrm{MAP}}$    (17)

and covariance

$\Sigma_t = \Sigma_{tt} - k_t^{\top} \left( \Sigma + \Lambda_{\mathrm{MAP}}^{-1} \right)^{-1} k_t$    (18)

The predictive preference $P(r \succ s \mid \mathcal{D})$ can then be evaluated by the integral

$P(r \succ s \mid \mathcal{D}) = \int \Phi\!\left( \frac{f(r) - f(s)}{\sqrt{2}\,\sigma} \right) P(f_t \mid \mathcal{D})\, df_t = \Phi\!\left( \frac{\mu_r - \mu_s}{\sqrt{2\sigma^2 + \sigma_{rr} + \sigma_{ss} - 2\sigma_{rs}}} \right)$    (19)

where $\mu_r$ and $\mu_s$ are the two elements of $\mu_t$, and $\sigma_{rr}$, $\sigma_{ss}$ and $\sigma_{rs}$ are the entries of $\Sigma_t$.

[3] The techniques of automatic relevance determination were originally proposed by MacKay (1994) and Neal (1996) in the context of Bayesian neural networks, as a hierarchical prior over the weights.
[4] The latent variables $f(r)$ and $f(s)$ are assumed to be distinct from the latent function values of the training samples.
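A sketch of the prediction step (17)-(19), assuming `f_map` and `Lam` come from a MAP routine such as the one above; the kernel is passed as a callable, and the jitter added before inverting $\Lambda$ is our own safeguard for instances that appear in no preference pair:

```python
import numpy as np
from scipy.stats import norm

def predict_preference(X, x_r, x_s, f_map, Lam, Sigma, sigma, kern):
    """Predictive probability P(r > s | D) for a test pair (r, s)."""
    k_r = np.array([kern(x_r, x) for x in X])   # correlations with training inputs
    k_s = np.array([kern(x_s, x) for x in X])
    K_t = np.vstack([k_r, k_s])                 # (2, n), i.e. k_t transposed
    mu = K_t @ np.linalg.solve(Sigma, f_map)    # predictive means, eq. (17)
    prior_tt = np.array([[kern(x_r, x_r), kern(x_r, x_s)],
                         [kern(x_s, x_r), kern(x_s, x_s)]])
    n = len(f_map)
    Lam_inv = np.linalg.inv(Lam + 1e-8 * np.eye(n))                 # jitter: our choice
    cov = prior_tt - K_t @ np.linalg.solve(Sigma + Lam_inv, K_t.T)  # eq. (18)
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    # the Gaussian integral of Phi collapses (19) to a single normal cdf
    return norm.cdf((mu[0] - mu[1]) / np.sqrt(2.0 * sigma ** 2 + var_diff))
```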
2.4. Discussion

The evidence evaluation (12) involves solving a convex programming problem (10) and then computing the determinant of an $n \times n$ matrix, which costs CPU time at $O(n^3)$, where $n$ is the number of distinct instances in the preference pairs for training; this is potentially much smaller than the number of preference relations $m$. Active learning can be applied to learn on very large datasets efficiently (Brinker, 2004). The fast training algorithm for Gaussian processes (Csató & Opper, 2002) can also be adapted to the setting of preference learning for speedup. Lawrence et al. (2002) proposed a greedy selection criterion, rooted in information-theoretic principles, for sparse representation of Gaussian processes. In the Bayesian framework we have described, the expected informativeness of a new pairwise preference relation can be measured as the change in entropy of the posterior distribution of the latent functions caused by the inclusion of this preference. A promising approach to active learning is to select from the data pool the sample with the highest expected information gain. This is a direction for future work.

3. Learning Label Preference

Preference relations can be defined over the labels of instances instead of over the instances themselves. In this case, each instance is associated with a predefined set of labels, and the preference relations over the label set are defined. This learning task is also known as label ranking. The preferences of each training sample can be presented in the form of a directed graph, known as a preference graph (Dekel et al., 2004; Aiolli & Sperduti, 2004), where the labels are the graph vertices. The preference graph can be decomposed into a set of pairwise preference relations over the label set for each sample. In Figure 1, we present three popular examples as an illustration.

Figure 1. Graphs of label preferences, where an edge from node $a$ to node $b$ indicates that label $a$ is preferred to label $b$. (a) Standard multiclass classification, where 3 is the correct label. (b) The case of ordinal regression, where 3 is the correct ordinal scale. (c) A multi-layer graph in hierarchical multiclass settings that specifies three levels of labels.

Suppose that we are provided with a training dataset $\{x_i, E_i\}_{i=1}^{n}$, where $x_i$ is a sample for training and $E_i = \{(c_k^i, e_k^i)\}_{k=1}^{g_i}$ is the set of directed edges in the preference graph of $x_i$; here $c_k^i$ is the initial label vertex of the $k$-th edge, $e_k^i$ is the terminal label, and $g_i$ is the number of edges. Each sample can have a different preference graph over the labels.

The Bayesian framework for instance preferences can be generalized to learn label preferences, similarly to Gaussian processes for multiclass classification (Williams & Barber, 1998). We introduce $L$ distinct Gaussian processes, one for each predefined label, and the label preferences of the samples are preserved by the latent function values in these Gaussian processes via the likelihood function (5). The prior probability of these latent functions becomes a product of multivariate Gaussians, i.e.

$P(\{f_a\}_{a=1}^{L}) = \prod_{a=1}^{L} \frac{1}{(2\pi)^{n/2} |\Sigma_a|^{1/2}} \exp\left( -\frac{1}{2} f_a^{\top} \Sigma_a^{-1} f_a \right)$    (20)

where $f_a = [f_a(x_1), f_a(x_2), \ldots, f_a(x_n)]^{\top}$, $L$ is the number of labels, and $\Sigma_a$ is the covariance matrix defined by the kernel function as in (2). The observed edges require the corresponding function values to satisfy $f_{c_k^i}(x_i) > f_{e_k^i}(x_i)$. Using the likelihood function for pairwise preferences (5), the likelihood of observing these preference graphs can be computed as

$P(\mathcal{D} \mid \{f_a\}) = \prod_{i=1}^{n} \prod_{k=1}^{g_i} \Phi(z_k^i)$    (21)

where $z_k^i = \dfrac{f_{c_k^i}(x_i) - f_{e_k^i}(x_i)}{\sqrt{2}\,\sigma}$. The posterior probability can then be written as

$P(\{f_a\} \mid \mathcal{D}) = \frac{P(\{f_a\})\, P(\mathcal{D} \mid \{f_a\})}{P(\mathcal{D} \mid \theta)}$    (22)

where $P(\mathcal{D} \mid \theta)$ is the model evidence. Approximate Bayesian methods can be applied to infer the optimal hyperparameters $\theta$; we applied the Laplace approximation again to evaluate the evidence. The MAP estimate is equivalent to the solution of the following optimization problem:

$\min_{\{f_a\}} \; -\sum_{i=1}^{n} \sum_{k=1}^{g_i} \ln \Phi(z_k^i) + \frac{1}{2} \sum_{a=1}^{L} f_a^{\top} \Sigma_a^{-1} f_a$    (23)

Like (10), this is also a convex programming problem. At the MAP estimate we have $f_a^{\mathrm{MAP}} = \Sigma_a \frac{\partial}{\partial f_a} \sum_{i,k} \ln \Phi(z_k^i)$, analogously to (11). The evidence $P(\mathcal{D})$, more exactly $P(\mathcal{D} \mid \theta)$, with the Laplace approximation (MacKay, 1994), can accordingly be approximated as

$P(\mathcal{D} \mid \theta) \approx P(\mathcal{D} \mid \{f_a^{\mathrm{MAP}}\})\, \left| I + \tilde{\Sigma} \Lambda_{\mathrm{MAP}} \right|^{-\frac{1}{2}}$    (24)

where $I$ is an identity matrix, $\Lambda_{\mathrm{MAP}}$ is the matrix of second order derivatives of the loss at the MAP estimate, and $\tilde{\Sigma}$ is a block-diagonal matrix with blocks $\Sigma_a$. The gradients with respect to $\theta$ can be derived as in (14)-(15) accordingly. The optimal hyperparameters can be discovered by a gradient-descent optimization package.

During prediction, the test case $x_t$ is associated with $L$ latent functions $f_a(x_t)$ for the predefined labels respectively. The correlations between $f_a(x_t)$ and the training variables $f_a$ are defined by the kernel function as in (2), i.e. $k_a = [\mathcal{K}_a(x_t, x_1), \mathcal{K}_a(x_t, x_2), \ldots, \mathcal{K}_a(x_t, x_n)]^{\top}$.[5] The mean of the predictive distribution $P(f_a(x_t) \mid \mathcal{D})$ can be approximated as

$\langle f_a(x_t) \rangle = k_a^{\top} \Sigma_a^{-1} f_a^{\mathrm{MAP}}$    (25)

where $f_a^{\mathrm{MAP}}$ is the MAP estimate from (23). The label preference can then be decided by

$\arg \operatorname{sort}_{a} \, \langle f_a(x_t) \rangle$    (26)

This label preference learning method can be applied to tackle ordinal regression, using the preference graph in Figure 1(b). Such an approach is different from our previous work on ordinal regression (Chu & Ghahramani, 2004). Both methods use a Gaussian process prior and implement ordering information by inequalities. However, the above approach needs $L$ processes, while the approach in Chu and Ghahramani (2004) uses only a single Gaussian process. For ordinal regression problems, the approach in Chu and Ghahramani (2004) seems more natural.

[5] In the current work, we simply constrained all the covariance functions to use the same kernel parameters.
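A sketch of the prediction rule (25)-(26) with one MAP latent vector per label; the helper names are hypothetical, and since the paper constrains all covariance functions to share the same kernel parameters, the $\Sigma_a$ may all be equal in practice:

```python
import numpy as np

def predict_label_ranking(X, x_t, f_maps, Sigmas, kern):
    """Rank the L labels for a test case x_t by the predictive means (25)-(26).

    f_maps : list of L MAP latent vectors, one per label
    Sigmas : list of L covariance matrices over the training inputs
    """
    k_t = np.array([kern(x_t, x) for x in X])
    means = np.array([k_t @ np.linalg.solve(S, f) for f, S in zip(f_maps, Sigmas)])
    order = np.argsort(-means)     # labels from most to least preferred
    return order, means
```

For multiclass classification (Section 4.2), the predicted class is simply `order[0]`, i.e. $\arg\max_a \langle f_a(x_t) \rangle$.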
4. Numerical Experiments

In the implementation of our Gaussian process algorithm for preference learning, gradient-descent methods have been employed to maximize the approximated evidence for model adaptation. We started from initial values of the hyperparameters and inferred the optimal ones.[6] We also implemented the constraint classification method of Har-Peled et al. (2002) using support vector machines for comparison purposes (CC-SVM). 5-fold cross validation was used to determine the optimal values of the model parameters (the Gaussian kernel parameter and the regularization factor $C$) involved in the CC-SVM formulation, and the test error was obtained using the optimal model parameters for each formulation. The initial search was done on a 7×7 coarse grid linearly spaced by 1.0 in the $(\log_{10}\kappa, \log_{10}C)$ space, followed by a fine search on a 9×9 uniform grid linearly spaced by 0.2 in that space.

[6] The source code, written in ANSI C, can be found at http://www.gatsby.ucl.ac.uk/chuwei/plgp.htm. In the numerical experiments, the initial values of the hyperparameters were chosen as simple defaults depending on the input dimension. We suggest trying more starting points in practice and then choosing the best model by the evidence.
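The coarse-to-fine grid search described above can be sketched as follows. This is a hypothetical re-implementation using scikit-learn's `SVC` as a stand-in classifier on already-transformed constraint-classification data; the paper's own CC-SVM code and the exact grid region are not shown, and `gamma` plays the role of the kernel width parameter:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def coarse_to_fine_search(X, y, center=(0.0, 0.0)):
    """Two-stage search in (log10 gamma, log10 C): a 7x7 coarse grid spaced
    by 1.0, then a 9x9 fine grid spaced by 0.2 around the coarse optimum,
    scored by 5-fold cross validation."""
    def best_on_grid(c_g, c_C, half, step):
        offsets = np.arange(-half, half + 1) * step
        best, best_score = None, -np.inf
        for lg in c_g + offsets:
            for lC in c_C + offsets:
                clf = SVC(kernel='rbf', gamma=10.0 ** lg, C=10.0 ** lC)
                score = cross_val_score(clf, X, y, cv=5).mean()
                if score > best_score:
                    best, best_score = (lg, lC), score
        return best
    coarse = best_on_grid(center[0], center[1], 3, 1.0)   # 7 points per axis
    return best_on_grid(coarse[0], coarse[1], 4, 0.2)     # 9 points per axis
```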
We begin this section by comparing the generalization performance of our algorithm against the CC-SVM approach on five datasets of instance preferences. Then we empirically study the scaling properties of the two algorithms on an information retrieval dataset. We also apply our algorithm to several classification and label ranking tasks to verify its usefulness.

4.1. Instance Preference

We first compared the performance of our algorithm against the CC-SVM approach on the task of learning instance preferences. We collected five benchmark datasets that were used for metric regression problems.[7] The target values were used to decide the preference relations between pairs of instances. For each dataset, we randomly selected a number of training pairs, as specified in Table 1, and 20000 pairs for testing. The selection was repeated 20 times independently. The Gaussian kernel (2) was used for both CC-SVM and our algorithm. In the CC-SVM algorithm, each preference relation is transformed into a pair of new samples with labels +1 and -1 respectively. We report the test results in Table 1, along with the results of our algorithm using the ARD Gaussian kernel (13). The GP algorithm gives significantly better test results than the CC-SVM approach on three of the five datasets. The ARD kernel yields better performance on Boston Housing and comparable results on the other datasets.

Table 1. Test results on the five datasets for preference learning. "Error Rate" is the percentage of incorrect preference predictions averaged over 20 trials, along with the standard deviation. "m" is the number of training pairs and "d" is the input dimension. "CC-SVM" and "GP" denote the CC-SVM and our algorithm using the Gaussian kernel; "GP-ARD" denotes our algorithm using the ARD Gaussian kernel. Bold face indicates the lowest error rate; the symbols mark the cases where CC-SVM is significantly worse than GP, decided by a p-value threshold of 0.01 in the Wilcoxon rank sum test.

                               Error Rate (%)
  Dataset           m     d    CC-SVM    GP    GP-ARD
  Pyrimidines      100    27      -       -       -
  Triazines        300    60      -       -       -
  MachineCpu       500     6      -       -       -
  BostonHousing    700    13      -       -       -
  Abalone         1000     8      -       -       -

Hersh et al. (1994) generated the OHSUMED dataset for information retrieval, in which the relevance of documents with respect to given textual queries was assessed by human experts using three rank scales: definitely relevant, possibly relevant, or not relevant. In our experiment to study the scaling properties, we used the results of "Query 3" in OHSUMED, which contain 201 references taken from the whole database (99 definitely relevant, 59 possibly relevant, and 43 irrelevant). The bag-of-words representation was used to translate these documents into vectors of "term frequency" (TF) components scaled by "inverse document frequency" (IDF). We used the "Rainbow" software released by McCallum (1996) to scan the title and abstract of these documents to compute the TFIDF vectors. In the preprocessing, we skipped the terms in the "stoplist"[8] and restricted ourselves to terms that appear in at least 3 of the 201 documents. So each document is represented by its TFIDF vector with 576 distinct elements. To account for different document lengths, we normalized the length of each document vector to unity.

The preference relation of a pair of documents can be determined by their rank scales. We randomly selected subsets of pairs of documents (having different rank scales), of sizes up to 1000 pairs, for training, and then tested on the remaining pairs. At each size, the random selection was carried out 20 times. The linear kernel was used for both the CC-SVM and our algorithm. The test results of the two algorithms are presented in the left graph of Figure 2; their performances are very competitive on this application. In the right graph of Figure 2, the circles present the CPU time consumed to solve (10) and evaluate the evidence (12) once in our algorithm, while the crosses present the CPU time for solving the quadratic programming problem once in the CC-SVM approach.

Figure 2. The left graph presents the test error rates on preference relations of the OHSUMED dataset at different training data sizes; the crosses indicate the average values over the 20 trials and the vertical lines indicate the standard deviations. The right graph presents the CPU time in seconds consumed by the two algorithms.

We observed that the computational cost of the CC-SVM approach depends on the number of preference pairs in training, with a scaling exponent of about 2.2, whereas the overhead of our algorithm is almost independent of the number of training preference pairs. As we discussed in Section 2.4, the complexity of our algorithm depends mainly on the number of distinct instances involved in the training data. Since the number of pairwise preferences for training is usually much larger than the number of instances, this computational advantage is one of the merits of our algorithm over CC-SVM-like algorithms.

[7] These regression datasets are available at http://www.liacc.up.pt/~ltorgo/Regression/DataSets.html.
[8] The "stoplist" is the SMART system's list of 524 common words, like "the" and "of".
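The pair-generation and normalization steps of this experiment are straightforward to sketch (our own illustration; the numeric rank scales and the function names are assumptions):

```python
import numpy as np

def sample_preference_pairs(scales, n_pairs, seed=0):
    """Sample (v, u) pairs of documents with different rank scales;
    the document with the higher scale is the preferred one."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(scales))
    pairs = []
    while len(pairs) < n_pairs:
        v, u = rng.choice(idx, size=2, replace=False)
        if scales[v] == scales[u]:
            continue                      # only pairs with different scales
        pairs.append((v, u) if scales[v] > scales[u] else (u, v))
    return pairs

def unit_normalize(tfidf):
    """Scale each document's TFIDF vector to unit length."""
    norms = np.linalg.norm(tfidf, axis=1, keepdims=True)
    return tfidf / np.maximum(norms, 1e-12)
```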
4.2. Classification

Next, we selected five benchmark datasets for multiclass classification used by Wu et al. (2004) and applied both the CC-SVM and our algorithm to these tasks.[9] All the datasets contain 300 training samples and 500 test samples. The partitions were repeated 20 times for each dataset. The number of classes and the number of features of the five datasets are recorded in Table 2, denoted by $L$ and $d$ respectively. In our algorithm, the preference graph of each training sample contains $L - 1$ edges, as depicted in Figure 1(a), and the predictive class can be determined by $\arg\max_a \langle f_a(x_t) \rangle$, where $\langle f_a(x_t) \rangle$ is defined as in (25). In the CC-SVM algorithm, each training sample is transformed into $2(L-1)$ new samples that represent the $L - 1$ pairwise preferences (Har-Peled et al., 2002). The Gaussian kernel (2) was used for both the CC-SVM and our algorithm. We report the test results in Table 2, along with the results of SVM with pairwise coupling cited from Table 2 of Wu et al. (2004). Our GP approach is very competitive with pairwise-coupling SVM and the CC-SVM algorithm on class label prediction, and significantly better than the CC-SVM algorithm in preference prediction on two of the five datasets.

Table 2. Test results on the five datasets for standard multiclass classification. "L" is the number of classes and "d" denotes the number of input features. "Label Error Rate" denotes the percentage of incorrect predictions on class labels averaged over 20 trials. "Pref Error Rate" denotes the percentage of incorrect predictions on preference relations averaged over 20 trials, along with the standard deviation. "PW" denotes the results of SVM with pairwise coupling cited from Wu et al. (2004); "CC-SVM" and "GP" denote the CC-SVM and our algorithm using the Gaussian kernel. Bold face indicates the lowest error rate; the symbols mark the cases where CC-SVM is significantly worse than GP, decided by a p-value threshold of 0.001 in the Wilcoxon rank sum test.

               Label Error Rate (%)      Pref Error Rate (%)
  Dataset      PW    CC-SVM    GP        CC-SVM    GP
  -            -     10.85     10.67     -         -
  Waveform     -     16.23     16.76     -         -
  Satimage     -     14.23     15.21     4.84      -
  -            -     13.30     12.13     3.20      -
  -            -     -         -         -         -

4.3. Label Ranking

To test on the label ranking tasks, we used the decision-theoretic setting related to expected utility theory described by Fürnkranz and Hüllermeier (2003). An agent attempts to take one action from a set of alternative actions, with the purpose of maximizing the expected utility under uncertainty about the world states. The expected utility of an action $a$ is given by $E[U(a)] = \sum_{s} P(s)\, U(s, a)$, where $P(s)$ is the probability of state $s$ and $U(s, a) \in [0, 1]$ is the utility of taking action $a$ in state $s$. In our experiment, the set of samples, corresponding to the set of probability vectors over the world states, were randomly generated according to a uniform distribution. We fixed the number of world states/features at 10 and the number of samples at 50, but varied the number of actions/labels from 2 to 10. The utility matrix was generated at random by drawing independently and uniformly distributed entries from [0, 1]. At each label size, we independently repeated this procedure 20 times. The two algorithms employed the linear kernel to learn the underlying utility matrix. In our algorithm, the preference graph of each training sample contains the edges encoding the full ranking of the labels, and the label preference for test samples was decided by (26). In the CC-SVM algorithm, each training sample was transformed into a set of new samples with augmented features (Har-Peled et al., 2002). The preference test error rates and the averaged Spearman rank correlations are presented in Figure 3.

Figure 3. The left graph presents the preference test error rates of the two algorithms on the label ranking tasks with different numbers of labels, while the right graph presents the averaged rank correlation coefficients. The middle crosses indicate the average values over the 20 trials and the vertical lines indicate the standard deviations.

The rank correlation coefficient for each test case is defined as

$1 - \frac{6 \sum_{a=1}^{L} (p_a - \hat{p}_a)^2}{L (L^2 - 1)}$

where $p_a$ is the true rank of label $a$ and $\hat{p}_a$ is its predictive rank.

[9] These classification datasets are maintained at www.csie.ntu.edu.tw/~cjlin/papers/svmprob/data.
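The synthetic data generation and the Spearman coefficient are easy to reproduce; the following is a sketch under the stated setting (10 states, 50 samples, utilities uniform on [0, 1]), with our own function names:

```python
import numpy as np

def make_label_ranking_data(n_samples=50, n_states=10, n_labels=5, seed=0):
    """Samples are state-probability vectors; the true label ranking of each
    sample follows from the expected utilities E[U(a)] = sum_s P(s) U(s, a)."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_samples, n_states))
    # normalize to probability vectors; a simple stand-in (an exact
    # uniform-simplex draw would use e.g. Dirichlet(1, ..., 1))
    P /= P.sum(axis=1, keepdims=True)
    U = rng.random((n_states, n_labels))     # utility matrix, entries in [0, 1]
    EU = P @ U                               # expected utility of each action
    order = np.argsort(-EU, axis=1)          # actions from best to worst
    ranks = np.argsort(order, axis=1)        # rank position of each label
    return P, U, ranks

def spearman(ranks_true, ranks_pred):
    """Spearman coefficient 1 - 6*sum(d^2)/(L(L^2-1)) between two rankings,
    given the rank position of each label in each ranking."""
    L = len(ranks_true)
    d = np.asarray(ranks_true) - np.asarray(ranks_pred)
    return 1.0 - 6.0 * np.sum(d ** 2) / (L * (L ** 2 - 1))
```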
On this application, our GP algorithm is clearly superior to the CC-SVM approach in generalization capacity, especially when the number of labels becomes large. The potential reason for this observation might be that learning within the augmented input space, whose dimension grows with the number of labels, becomes much harder.

5. Conclusions

In this paper we proposed a nonparametric Bayesian approach to preference learning over instances or labels. The formulation for learning label preferences is also applicable to many multiclass learning tasks. In both formulations, the problem size remains linear with the number of distinct samples in the training preference pairs. The existing fast algorithms for Gaussian processes can be adapted to tackle large datasets. Experimental results on benchmark datasets show that the generalization performance of our algorithm is competitive, and often better than the constraint classification approach with support vector machines.

Acknowledgments

This work was supported by the National Institutes of Health and its National Institute of General Medical Sciences division under Grant Number 1P01GM63208.

References

Aiolli, F., & Sperduti, A. (2004). Learning preferences for multiclass problems. Advances in Neural Information Processing Systems 17.

Bahamonde, A., Bayón, G. F., Díez, J., Quevedo, J. R., Luaces, O., del Coz, J. J., Alonso, J., & Goyache, F. (2004). Feature subset selection for learning preferences: A case study. Proceedings of the 21st International Conference on Machine Learning (pp. 49-56).

Brinker, K. (2004). Active learning of label ranking functions. Proceedings of the 21st International Conference on Machine Learning (pp. 129-136).

Chu, W., & Ghahramani, Z. (2004). Gaussian processes for ordinal regression (Technical Report). Gatsby Computational Neuroscience Unit, University College London. http://www.gatsby.ucl.ac.uk/chuwei/paper/gpor.pdf.

Csató, L., & Opper, M. (2002). Sparse online Gaussian processes. Neural Computation, 14, 641-668.

Dekel, O., Keshet, J., & Singer, Y. (2004). Log-linear models for label ranking. Proceedings of the 21st International Conference on Machine Learning (pp. 209-216).

Doyle, J. (2004). Prospects for preferences. Computational Intelligence, 20, 111-136.

Fiechter, C.-N., & Rogers, S. (2000). Learning subjective functions with large margins. Proc. 17th International Conf. on Machine Learning (pp. 287-294).

Fürnkranz, J., & Hüllermeier, E. (2003). Pairwise preference learning and ranking. Proceedings of the 14th European Conference on Machine Learning (pp. 145-156).

Fürnkranz, J., & Hüllermeier, E. (2005). Preference learning. Künstliche Intelligenz. In press.

Har-Peled, S., Roth, D., & Zimak, D. (2002). Constraint classification: A new approach to multiclass classification and ranking. Advances in Neural Information Processing Systems 15.

Herbrich, R., Graepel, T., Bollmann-Sdorra, P., & Obermayer, K. (1998). Learning preference relations for information retrieval. Proc. of Workshop Text Categorization and Machine Learning, ICML (pp. 80-84).

Hersh, W., Buckley, C., Leone, T., & Hickam, D. (1994). OHSUMED: An interactive retrieval evaluation and new large test collection for research. Proceedings of the 17th Annual ACM SIGIR Conference (pp. 192-201).

Lawrence, N. D., Seeger, M., & Herbrich, R. (2002). Fast sparse Gaussian process methods: The informative vector machine. Advances in Neural Information Processing Systems 15 (pp. 609-616).

MacKay, D. J. C. (1994). Bayesian methods for backpropagation networks. Models of Neural Networks III, 211-254.

McCallum, A. K. (1996). Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/~mccallum/bow.

Neal, R. M. (1996). Bayesian learning for neural networks. Lecture Notes in Statistics. Springer.

Schölkopf, B., & Smola, A. J. (2002). Learning with kernels. Cambridge, MA: The MIT Press.

Williams, C. K. I., & Barber, D. (1998). Bayesian classification with Gaussian processes. IEEE Trans. on Pattern Analysis and Machine Intelligence, 20, 1342-1351.

Williams, C. K. I., & Rasmussen, C. E. (1996). Gaussian processes for regression. Advances in Neural Information Processing Systems (pp. 598-604). MIT Press.
Wu, T.-F., Lin, C.-J., & Weng, R. C. (2004). Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5, 975-1005.
