Tighter Variational Representations of f-Divergences

Uploaded On 2016-03-16

Presentation Transcript

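[...] 1986) for details. For functions $f : X \to \mathbb{R}$ defined over a Banach space $X$, the (Fenchel or convex) dual $f^\star : X^\star \to \mathbb{R}$ of a function is defined over the dual space $X^\star$ by

\[ f^\star(x^\star) := \sup_{x \in X} \langle x^\star, x \rangle - f(x), \]

where $\langle \cdot, \cdot \rangle$ is the dual pairing of $X$ and $X^\star$. In finite dimensional spaces such as $\mathbb{R}^d$ the dual space is also $\mathbb{R}^d$ and $\langle \cdot, \cdot \rangle$ is the usual inner product. The bidual $f^{\star\star} : X \to \mathbb{R}$ of $f$ is just the dual of $f^\star$ (restricted to $X$), that is,

\[ f^{\star\star}(x) = \sup_{x^\star \in X^\star} \langle x, x^\star \rangle - f^\star(x^\star). \]

For convex, lower semi-continuous (l.s.c.) functions $f$ the bidual is the identity transformation, that is, $f^{\star\star} = f$. As seen below, this fact forms the basis of many variational representations of operators defined by convex functions. [1]

We make use of a few specific, one-dimensional instances of duals. Specifically, when $f(t) = |t - 1|$ we have $f^\star(t^\star) = t^\star$ for $t^\star \in [-1, 1]$; and when $f(t) = -\ln t$ for $t > 0$ we have $f^\star(t^\star) = -1 - \ln(-t^\star)$ for $t^\star < 0$. Further details and properties of convex duals can be found in texts on convex analysis, e.g., (Hiriart-Urruty & Lemaréchal, 2001).

One obvious but important property we make use of is that the restriction of a supremum leads to smaller optima. Specifically, if $X' \subseteq X$ then $\sup_{x \in X'} \phi(x) \le \sup_{x \in X} \phi(x)$. When applied to convex duals, this means that if $R \subseteq X^\star$ is a restriction of the dual space then

\[ f(x) \ge f^R(x) := \sup_{x^\star \in R} \langle x^\star, x \rangle - f^\star(x^\star) \quad \text{for all } x \in X. \tag{1} \]

[1] For finite dimensional spaces, this is, in some sense, the only dual that can be used for the kind of variational representations we are interested in (see (Artstein-Avidan & Milman, 2009) for details).

1.2. Variational Approximations of f-divergences

An f-divergence is defined via a convex function $f : [0, \infty) \to \mathbb{R}$ satisfying $f(1) = 0$. Given such a function, the f-divergence from a finite measure $P$ to a distribution $Q$ defined on a common space $X$ is defined [2] as

\[ I_f(P, Q) := \mathbb{E}_Q\left[ f\left( \frac{dP}{dQ} \right) \right] = \int_X f\left( \frac{dP}{dQ}(x) \right) dQ(x) \]

if $P \ll Q$ and $+\infty$ otherwise. We will refer to the definition above as the general (or unrestricted) f-divergence, in contrast to the restricted f-divergence that is only defined when $P$ and $Q$ are both probability distributions. When necessary, the restricted f-divergence will be distinguished by a superscript R: $I^R_f(P, Q)$. We emphasise this distinction to later show how a tighter variational representation can be obtained from explicitly taking into account the restriction.

[2] The choice of order of the arguments $P$ and $Q$ is arbitrary and other authors, notably (Nguyen et al., 2010), define f-divergence in terms of $dQ/dP$.

Several common divergences are members of this class: the variational divergence is obtained by choosing $f(t) = |t - 1|$, the Hellinger divergence via $f(t) = (\sqrt{t} - 1)^2$, and the KL divergence via $f(t) = t \ln t$ (see, e.g., (Reid & Williamson, 2011)). For technical reasons we also require $f$ to be lower semi-continuous. All f-divergences discussed above and used in practice satisfy this condition.

As in (Altun & Smola, 2006; Barbu & Precupanu, 1986; Broniatowski & Keziou, 2009), we now wish to consider f-divergences as acting over spaces of functions. Given a measure $\mu$ over $X$ (with some $\sigma$-algebra), the norms

\[ \|g\|_1 := \int_X |g| \, d\mu \quad \text{and} \quad \|g\|_\infty := \inf\{ K \ge 0 : |g(x)| \le K \text{ for } \mu\text{-almost all } x \} \]

can be used to define the space of absolutely integrable functions $L^1(\mu) := \{ g : X \to \mathbb{R} : \|g\|_1 < \infty \}$ and its dual space $L^\infty(\mu) := \{ g : X \to \mathbb{R} : \|g\|_\infty < \infty \}$ of functions with bounded essential supremum. The space of probability densities w.r.t. $\mu$ will be denoted $\Delta(\mu) := \{ g \in L^1(\mu) : g \ge 0, \|g\|_1 = 1 \}$. On finite domains $X = \{x_1, \ldots, x_n\}$, the space of densities will be denoted $\Delta_n$.

A general f-divergence can be seen as acting on $L^1(Q)$ by defining $I_{f,Q}(r) := \mathbb{E}_Q[f(r)]$ for all $r \in L^1(Q)$. The restricted f-divergence is then just $I^R_{f,Q}(r) := I_{f,Q}(r)$ when $r \in \Delta(Q)$ and $+\infty$ otherwise.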
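As a rough numerical illustration of the definitions above, the following Python sketch evaluates three f-divergences on a finite domain, checks the closed-form dual of $f(t) = -\ln t$ against a grid search, and shows that restricting the dual variable can only lower the supremum in the bidual, as in (1). The distributions `p` and `q`, the grids, and all function names are illustrative assumptions; none of them come from the paper.

```python
import math

def f_divergence(p, q, f):
    """I_f(P, Q) = sum_x q(x) * f(p(x) / q(x)) on a finite domain."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if q_i == 0.0:
            if p_i > 0.0:
                return math.inf  # P is not absolutely continuous w.r.t. Q
            continue
        total += q_i * f(p_i / q_i)
    return total

def f_variational(t):
    return abs(t - 1.0)

def f_hellinger(t):
    return (math.sqrt(t) - 1.0) ** 2

def f_kl(t):
    return t * math.log(t) if t > 0.0 else 0.0  # convention: 0 ln 0 = 0

def f_neg_log(t):
    return -math.log(t)

def numeric_dual(f, t_star, grid):
    """Approximate f*(t*) = sup_t (t* t - f(t)) by a maximum over a grid."""
    return max(t_star * t - f(t) for t in grid)

def dual_neg_log(t_star):
    """Closed-form dual of f(t) = -ln t, valid for t* < 0."""
    return -1.0 - math.log(-t_star)

def bidual_on(t, dual, dual_grid):
    """sup over a (possibly restricted) set of dual points of t* t - f*(t*)."""
    return max(s * t - dual(s) for s in dual_grid)

# Hypothetical example distributions on a three-point domain.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
variational = f_divergence(p, q, f_variational)
hellinger = f_divergence(p, q, f_hellinger)
kl = f_divergence(p, q, f_kl)

# Grid check of the dual of -ln t at t* = -0.5.
t_grid = [k / 1000.0 for k in range(1, 20001)]  # t in (0, 20]
dual_numeric = numeric_dual(f_neg_log, -0.5, t_grid)
dual_closed = dual_neg_log(-0.5)

# Restricting the dual variable lowers the supremum, as in (1).
full_grid = [-(k / 1000.0) for k in range(100, 2001)]       # t* in [-2, -0.1]
restricted_grid = [-(k / 1000.0) for k in range(100, 501)]  # t* in [-0.5, -0.1]
bidual_full = bidual_on(1.0, dual_neg_log, full_grid)
bidual_restricted = bidual_on(1.0, dual_neg_log, restricted_grid)
```

Here `bidual_full` recovers $f(1) = 0$ for $f(t) = -\ln t$ up to grid error, while `bidual_restricted` is strictly smaller because the maximising dual point $t^\star = -1$ lies outside the restricted grid.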
Figure 1. Illustration of Theorem 1. The dashed line and solid line represent our new expression and the expression used by (Nguyen et al., 2010) respectively as they vary over $L^\infty(Q)$. While both expressions have the same supremum, everywhere else ours is closer to the supremum.

As shown below, these functions are convex and lower semi-continuous and therefore admit dual representations. As explored in (Nguyen et al., 2010; Altun & Smola, 2006) and in section 2 below, variational representations such as these readily admit approximation [...]

[...] into account the regulariser $\Omega$, i.e.,

\[ \mathcal{E}(P_n, Q_n) := \sup_{h \in H} \left\{ \mathbb{E}_{P_n}[h] - \left( I^R_{f,Q_n} \right)^\star(h) - \Omega(h) \right\}. \tag{6} \]

The following result gives an explicit dual optimisation program for computing $\mathcal{E}$ in the RKHS $H$.

Theorem 2. Let $H$ be an RKHS of functions over $X$ with associated feature map $\Phi$. Then the estimator $\mathcal{E}(P_n, Q_n)$ satisfies, for all $P_n$ and $Q_n$,

\[ \mathcal{E}(P_n, Q_n) = \min_{\alpha \in \Delta_n} \left\{ \frac{1}{n} \sum_{i=1}^n f(n \alpha_i) + \Omega^\star\left( \frac{1}{n} \sum_{i=1}^n \Phi(x_i) - \frac{1}{n} \sum_{i=1}^n n \alpha_i \Phi(y_i) \right) \right\} \tag{7} \]

where the minimisation is over the n-simplex $\Delta_n$.

Proof. The proof techniques here are based on those in (Nguyen et al., 2007). Since $H$ is an RKHS, we can represent each function $h \in H$ by $h(x) = \langle w, \Phi(x) \rangle$ for $x \in X$, where $\Phi$ is the feature map corresponding to $K$. In this case, the estimator in (6) is given by

\[ \sup_w \left\{ \frac{1}{n} \sum_{i=1}^n \langle w, \Phi(x_i) \rangle - \left( I^R_{f,Q_n} \right)^\star(\langle w, \Phi(\cdot) \rangle) - \Omega(w) \right\}. \]

Letting $\rho(w) := -\frac{1}{n} \sum_{i=1}^n \langle w, \Phi(x_i) \rangle$ and $\varphi(w) := \left( I^R_{f,Q_n} \right)^\star(\langle w, \Phi(\cdot) \rangle)$ and substituting into the above expression gives

\[ \sup_w \left\{ \langle w, 0 \rangle - (\rho(w) + \varphi(w) + \Omega(w)) \right\} = (\rho + \varphi + \Omega)^\star(0) \]

by the definition of a dual. By the infimal convolution theorem (Rockafellar, 1997) we therefore have

\[ \mathcal{E}(P_n, Q_n) = \min_{s, r} \left\{ \rho^\star(s) + \varphi^\star(r) + \Omega^\star(-s - r) \right\}. \tag{8} \]

Now, since $\rho$ is linear in $w$ its dual is simply

\[ \rho^\star(s) = \begin{cases} 0, & \text{if } s = -\frac{1}{n} \sum_{i=1}^n \Phi(x_i) \\ +\infty, & \text{otherwise.} \end{cases} \]

To compute the dual of $\varphi$ we observe that $\left( I^R_{f,Q_n} \right)^\star(h)$ only depends on the values of $h$ at $y_1, \ldots, y_n$, so we can write

\begin{align*}
\varphi^\star(r) &= \sup_w \left\{ \langle w, r \rangle - \left( I^R_{f,Q_n} \right)^\star(\langle w, \Phi(\cdot) \rangle) \right\} \\
&= \sup_{w, \alpha, h} \left\{ \langle w, r \rangle - \left( I^R_{f,Q_n} \right)^\star(h) - \sum_{i=1}^n \alpha(y_i) \left( \langle w, \Phi(y_i) \rangle - h(y_i) \right) \right\} \\
&= \sup_{w, \alpha, h} \left\{ \langle w, r \rangle - \left( I^R_{f,Q_n} \right)^\star(h) - \left\langle w, \sum_{i=1}^n \alpha(y_i) \Phi(y_i) \right\rangle + \frac{1}{n} \sum_{i=1}^n n \alpha(y_i) h(y_i) \right\}
\end{align*}

by the introduction of Lagrange multipliers $\alpha(y_i)$ for the constraints $h(y_i) = \langle w, \Phi(y_i) \rangle$. Noting that $\frac{1}{n} \sum_{i=1}^n n \alpha(y_i) h(y_i) = \mathbb{E}_{Q_n}[n \alpha h]$ and writing $\alpha_i := \alpha(y_i)$, we get

\begin{align*}
\varphi^\star(r) &= \sup_{w, \alpha, h} \left\{ \mathbb{E}_{Q_n}[n \alpha h] - \left( I^R_{f,Q_n} \right)^\star(h) + \left\langle w, r - \sum_{i=1}^n \alpha_i \Phi(y_i) \right\rangle \right\} \\
&= \sup_\alpha \left\{ \sup_h \left\{ \mathbb{E}_{Q_n}[n \alpha h] - \left( I^R_{f,Q_n} \right)^\star(h) \right\} + \sup_w \left\langle w, r - \sum_{i=1}^n \alpha_i \Phi(y_i) \right\rangle \right\} \\
&= \sup_\alpha \left\{ I^R_{f,Q_n}(n \alpha) : \sum_{i=1}^n \alpha_i \Phi(y_i) = r \right\}
\end{align*}

since the first inner supremum is the bidual of $I^R_{f,Q_n}$ and the second supplies the constraint: it is $0$ when $r = \sum_{i=1}^n \alpha_i \Phi(y_i)$ and $+\infty$ otherwise. Furthermore, since the $\Phi(y_i)$ are linearly independent, each $r$ uniquely determines $\alpha$ at $y_1, \ldots, y_n$, so $\varphi^\star(r) = I^R_{f,Q_n}(n \alpha)$. Substituting $\rho^\star$, $\varphi^\star$, and the corresponding constraints on $s$ and $r$ back into the minimisation (8), and noting that $I^R_{f,Q_n}$ is $+\infty$ for $n \alpha \notin \Delta(Q_n) \simeq \Delta_n$, gives the required result:

\[ \mathcal{E}(P_n, Q_n) = \min_{\alpha \in \Delta_n} \left\{ \frac{1}{n} \sum_{i=1}^n f(n \alpha_i) + \Omega^\star\left( \frac{1}{n} \sum_{i=1}^n \Phi(x_i) - \frac{1}{n} \sum_{i=1}^n (n \alpha_i) \Phi(y_i) \right) \right\}. \]

We note that $\mathcal{E}(P_n, Q_n)$ is not a direct estimate of $I_f(P, Q)$ due to the inclusion of the regularisation term $\Omega(h)$. However, $\varphi^\star(r) = I^R_{f,Q_n}(n \alpha) = \frac{1}{n} \sum_{i=1}^n f(n \alpha_i)$ can be used as an empirical estimate of $I_f(P, Q)$ once the values of $\alpha_i$ are obtained, and $\hat{r} = n \alpha$ can be seen [...]

[...]

• Wang et al (Wang et al., 2009): This estimator is based on nearest neighbour estimates of the two densities and does not make use of a variational representation.
• Kanamori et al (Kanamori et al., 2009): This is a least-squares estimator for the density ratio, bypassing individual density estimations. Once the density ratio is estimated, it can be directly plugged into the f-divergence formulae. We also experimented with another density ratio estimation method (Sugiyama et al., 2008), with very similar results.
• García et al (García-García et al., 2011): This estimator uses nearest neighbour misclassification rates and a reformulation of f-divergences in terms of risks.

3.1. Method

Both our estimator based on (9) and the M2 estimator of NWJ were implemented using the nonlinear convex optimisation routine from the Python package Cvxopt. The implementation of the Wang et al. (Wang et al., 2009) estimator (henceforth WKV) was based on the cKDTree nearest neighbour routine from the SciPy library. The Kanamori et al [4] (uLSIF) and García et al [5] ((f, l)) algorithms were implemented using code provided by the respective authors [6]. The method for choosing the parameters $\lambda_n$ and $\sigma$ for the NWJ estimator is not specified in (Nguyen et al., 2010). For both NWJ and our estimator, we therefore set $\lambda_n = \frac{1}{n}$ (as discussed above) and set $\sigma$ to the sample variance over $X_n \cup Y_n$ to ensure invariance with respect to rescaling of the data.

[4] http://sugiyama-www.cs.titech.ac.jp/~sugi/software/uLSIF
[5] http://www.tsc.uc3m.es/~dggarcia/code.html
[6] All code has been submitted as supplementary material.

In every experiment, the distributions $P$ and $Q$ were set to beta distributions $B(\alpha, \beta)$ for some choice of parameters $\alpha, \beta > 0$. Beta distributions were chosen as they cover a wide variety of shapes and have a KL divergence with the following analytic form

\[ KL(B(\alpha_1, \beta_1), B(\alpha_2, \beta_2)) = \ln \frac{B(\alpha_2, \beta_2)}{B(\alpha_1, \beta_1)} - d_\alpha \psi(\alpha_1) - d_\beta \psi(\beta_1) + (d_\alpha + d_\beta) \psi(\alpha_1 + \beta_1) \]

where $d_\alpha = \alpha_2 - \alpha_1$, $d_\beta = \beta_2 - \beta_1$, and $\psi(v) = \Gamma'(v) / \Gamma(v)$ is the digamma function.

For each choice of $P$ and $Q$, a 1-dimensional and a 10-dimensional experiment was performed. In the 1-d experiment, samples of $n = 100$ values, $X_{100}$ and $Y_{100}$, were each drawn i.i.d. from $P$ and $Q$ respectively. In the 10-d experiment, each $x \in X_{100} \subset \mathbb{R}^{10}$ and $y \in Y_{100} \subset \mathbb{R}^{10}$ was drawn i.i.d. from the respective product distributions $P \times \prod_{i=1}^{9} N(0, 0.01)$ and $Q \times \prod_{i=1}^{9} N(0, 0.01)$. This gives samples from two distributions embedded in a 10-dimensional space where all but one of the dimensions is zero mean Gaussian noise. The KL divergences for the 10-d product distributions for each choice of $P$ and $Q$ are the same as for the 1-d case, that is, $KL(P, Q)$. The specific $P$ and $Q$ in the experiments were chosen to give a range of different KL divergence values and to explore a few different pairings of distributional shapes.

3.2. Results

Table 1 summarises the application of all five estimators over 250 runs of the 1-d (odd rows) and 10-d experiments (even rows) for various choices of $P$ and $Q$ shown in the first column. The pairs of rows are ordered by increasing value of the true KL divergence (shown in the second column), which is the same for both rows of a pair. The table lists the divergence estimates averaged over the different runs as well as the empirical Mean Squared Error (MSE). The bold values for the MSE correspond to the lowest amongst the different estimators. Where the MSE of our estimator or that of NWJ is strictly lower than the other, we have italicised the MSE. The last three columns are in grey as they are not the main point of the experiments.

3.3. Discussion

For the most part our estimator performs better in terms of MSE than that of NWJ. When the true divergence is large, the difference is especially pronounced. This is unsurprising as our estimator is based on a tighter bound for the divergence, as Theorem 1 shows. For small divergences the difference is smaller since, roughly speaking, $0 \le \text{NWJ} \le \text{Ours} \le KL(P, Q)$. Thus, for small divergences the estimators must necessarily return similar values.

In contrast, the nearest neighbour-based methods (WKV and (f, l)) behave very differently to the variational estimators. In general, their bias is significantly lower than that of both variational methods when the real divergence is large. This is a natural conclusion, since the variational methods presented here are intrinsically lower bounds of the real divergence. Finally, we note that uLSIF does not perform as well as the other methods. This is to be expected as uLSIF is primarily designed for density ratio estimation while the rest of the methods are derived specifically for divergence estimation.

[...] f-divergence estimation (hypothesis testing and statistical inference). We also plan to find general conditions under which consistency of our family of estimators holds. The work of (Nguyen et al., 2010) has already paved the way for this investigation. Failing that, the very general consistency results of (Altun & Smola, 2006) for single sample divergence estimation may also be amenable to the analysis of our estimator. The performance of our estimator on distributions on low dimensional manifolds suggests that it would be worth testing on domains involving audio or images. It would also be interesting to apply our method for estimating divergences other than KL. For instance we could study $\alpha$-divergence estimation as in (Poczos & Schneider, 2011).

References

Ali, S. M. and Silvey, S. D. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society. Series B (Methodological), 28(1):131-142, 1966.

Altun, Y. and Smola, A. Unifying divergence minimization and statistical inference via convex duality. Learning Theory, pp. 139-153, 2006.

Artstein-Avidan, S. and Milman, V. The concept of duality in convex analysis, and the characterization of the Legendre transform. Annals of Mathematics, 169:661-674, 2009.

Banerjee, A. On Bayesian bounds. In Proceedings of the 23rd International Conference on Machine Learning, pp. 81-88. ACM Press New York, NY, USA, 2006.

Barbu, V. and Precupanu, T. Convexity and optimization in Banach spaces, volume 10. Springer, 1986.

Broniatowski, Michel and Keziou, Amor. Parametric estimation and tests through divergences and the duality technique. Journal of Multivariate Analysis, 100(1):16-36, 2009.

Csiszár, I. Information-type measures of difference of probability distributions and indirect observations. Studia Scientiarum Mathematicarum Hungarica, 2:299-318, 1967.

Dembo, A. and Zeitouni, O. Large deviations techniques and applications, volume 38. Springer Verlag, 2009.

Donsker, M. D. and Varadhan, S. R. S. Asymptotic evaluation of certain Markov process expectations for large time. IV. Communications on Pure and Applied Mathematics, 36(2):183-212, 1983.

García-García, D., von Luxburg, U., and Santos-Rodríguez, R. Risk-based generalizations of f-divergences. In Proceedings of the International Conference on Machine Learning, 2011.

Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., and Smola, A. J. A Kernel Method for the Two-Sample-Problem. Journal of Machine Learning Research, 2008.

Hiriart-Urruty, Jean-Baptiste and Lemaréchal, Claude. Fundamentals of Convex Analysis. Springer, Berlin, 2001.

Kanamori, T., Hido, S., and Sugiyama, M. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10:1391-1445, 2009.

Kanamori, T., Suzuki, T., and Sugiyama, M. f-divergence estimation and two-sample homogeneity test under semiparametric density-ratio models. Information Theory, IEEE Transactions on (To appear), 2011.

Nguyen, X., Wainwright, M. J., and Jordan, M. I. Estimating divergence functionals and the likelihood ratio by convex risk minimization. Technical report, Department of Statistics, UC Berkeley, 2007.

Nguyen, X., Wainwright, M. J., and Jordan, M. I. On surrogate loss functions and f-divergences. Annals of Statistics, 37:876-904, 2009.

Nguyen, X. L., Wainwright, M. J., and Jordan, M. I. Estimating divergence functionals and the likelihood ratio by convex risk minimization. Information Theory, IEEE Transactions on, 56(11):5847-5861, 2010.

Poczos, Barnabas and Schneider, Jeff. On the estimation of alpha-divergences. In AISTATS 2011, 2011.

Reid, Mark D. and Williamson, Robert C. Information, divergence and risk for binary experiments. Journal of Machine Learning Research, 12:731-817, March 2011.

Rockafellar, R. T. Convex analysis, volume 28. Princeton Univ Pr, 1997.

Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H., von Bünau, P., and Kawanabe, M. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699-746, 2008.

Wang, Q., Kulkarni, S. R., and Verdú, S. Divergence estimation for multidimensional densities via k-nearest-neighbor distances. Information Theory, IEEE Transactions on, 55(5):2392-2405, 2009.