Learning Monotonic Transformations for Classification

Andrew G. Howard
Department of Computer Science
Columbia University
New York, NY 10027
ahoward@cs.columbia.edu

Tony Jebara
Department of Computer Science
Columbia University
New York, NY 10027
jebara@cs.columbia.edu

Abstract

A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated.

1 Introduction

Many fields have developed heuristic methods for preprocessing data to improve performance. This often takes the form of applying a monotonic transformation prior to using a classification algorithm. For example, when the bag of words representation is used in document classification, it is common to take the square root of the term frequency [6, 5]. Monotonic transforms are also used when classifying image histograms. In [3], transformations of the form $x^a$ where $0 \le a \le 1$ are demonstrated to improve performance. When classifying genes from various microarray experiments it is common to take the logarithm of the gene expression ratio [2]. Monotonic transformations can also capture crucial properties of the data such as threshold and saturation effects.

In this paper, we propose to simultaneously learn a hyperplane classifier and a monotonic transformation. The solution produced by our algorithm is a piecewise linear monotonic function and a maximum margin hyperplane classifier similar to a support vector machine (SVM) [4]. By allowing for a richer class of transforms learned at training time (as opposed to a rule of thumb applied during preprocessing), we improve classification accuracy. The learned transform is specifically tuned to the classification task. The main contributions of this paper include a novel framework for estimating a monotonic transformation and a hyperplane classifier simultaneously at training time, an efficient method for finding a locally optimal solution to the problem, and a convex relaxation to find a globally optimal approximate solution. The paper is organized as follows. In section 2, we present our formulation for learning a piecewise linear monotonic function and a hyperplane. We show how to learn this combined model through an iterative coordinate ascent optimization using interleaved quadratic and linear programs to find a local minimum. In section 3, we derive a convex relaxation based on Lasserre's method [8]. In section 4, synthetic experiments as well as document and image classification problems demonstrate the diverse utility of our method. We conclude with a discussion and future work.

[Figure 1: Monotonic transform applied to each dimension followed by a hyperplane classifier.]

2 Learning Monotonic Transformations

For an unknown distribution $P(\vec{x}, y)$ over inputs $\vec{x} \in \mathbb{R}^d$ and labels $y \in \{-1, 1\}$, we assume that there is an unknown nuisance monotonic transformation $\Phi(x)$ and an unknown hyperplane parameterized by $\vec{w}$ and $b$ such that predicting with $f(\vec{x}) = \mathrm{sign}(\vec{w}^T \Phi(\vec{x}) + b)$ yields a low expected test error $R = \int \frac{1}{2} |y - f(\vec{x})| \, dP(\vec{x}, y)$. We would like to recover $\Phi(\vec{x})$, $\vec{w}$, $b$ from a labeled training set $S = \{(\vec{x}_1, y_1), \ldots, (\vec{x}_N, y_N)\}$ which is sampled i.i.d. from $P(\vec{x}, y)$. The transformation acts elementwise as can be seen in Figure 1. We propose to learn both a maximum margin hyperplane and the unknown transform $\Phi(x)$ simultaneously.

In our formulation, $\Phi(x)$ is a piecewise linear function that we parameterize with a set of $K$ knots $\{z_1, \ldots, z_K\}$ and associated positive weights $\{m_1, \ldots, m_K\}$ where $z_j \in \mathbb{R}$ and $m_j \in \mathbb{R}_+$. The transformation can be written as $\Phi(x) = \sum_{j=1}^{K} m_j \phi_j(x)$ where the $\phi_j(x)$ are truncated ramp functions acting on vectors and matrices elementwise as follows:

$$\phi_j(x) = \begin{cases} 0 & x \le z_j \\ \frac{x - z_j}{z_{j+1} - z_j} & z_j \le x \le z_{j+1} \\ 1 & z_{j+1} \le x \end{cases} \qquad (1)$$

This is a less common way to parameterize piecewise linear functions. The positivity constraints enforce monotonicity on $\Phi(x)$ for all $x$. A more common method is to parameterize the function value $\Phi(z)$ at each knot $z$ and apply order constraints between subsequent knots to enforce monotonicity. Values in between knots are found through linear interpolation. This is the method used in isotonic regression [10], but in practice, these are equivalent formulations. Using truncated ramp functions is preferable for numerous reasons. They can be easily precomputed and are sparse. Once precomputed, most calculations can be done via sparse matrix multiplications. The positivity constraints on the weights $\vec{m}$ will also yield a simpler formulation than order constraints and interpolation, which becomes important in subsequent relaxation steps. Figure 2a shows the truncated ramp function associated with knot $z_1$. Figure 2b shows a conic combination of truncated ramps that builds a piecewise linear monotonic function.

[Figure 2: Building blocks for piecewise linear functions. a) Truncated ramp function $\phi_1(x)$. b) $\Phi(x) = \sum_{j=1}^{5} m_j \phi_j(x)$.]
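To make equation (1) concrete, here is a minimal NumPy sketch of the truncated ramp functions and the resulting transform $\Phi(x) = \sum_j m_j \phi_j(x)$, with knots placed at empirical quantiles as the paper prescribes before training. The function names, and the use of $K+1$ quantile points so that each ramp $\phi_j$ has both endpoints $z_j$ and $z_{j+1}$, are our own conventions, not the authors' code.

```python
import numpy as np

def truncated_ramp(x, z_lo, z_hi):
    """Truncated ramp phi_j from equation (1): 0 below z_lo, linear
    between z_lo and z_hi, saturating at 1 above z_hi."""
    return np.clip((x - z_lo) / (z_hi - z_lo), 0.0, 1.0)

def quantile_knots(X, K):
    """K+1 knot locations at empirical quantiles, so that the knots
    are evenly spaced in the data."""
    return np.quantile(X, np.linspace(0.0, 1.0, K + 1))

def monotonic_transform(X, knots, m):
    """Phi(X) = sum_j m_j * phi_j(X), applied elementwise; nonnegative
    weights m guarantee monotonicity (Figure 2b)."""
    return sum(m[j] * truncated_ramp(X, knots[j], knots[j + 1])
               for j in range(len(m)))

# Toy usage: a conic combination of K = 5 ramps, as in Figure 2b.
rng = np.random.default_rng(0)
X = rng.random((200, 3))                   # toy data in [0, 1]
knots = quantile_knots(X, K=5)
m = np.array([0.4, 0.1, 0.2, 0.1, 0.2])    # positive weights, sum <= 1
Phi_X = monotonic_transform(X, knots, m)
```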
Combining this with the support vector machine formulation leads us to the following learning problem:

$$\min_{\vec{w}, \vec{\xi}, b, \vec{m}} \; \|\vec{w}\|_2^2 + C \sum_{i=1}^{N} \xi_i \qquad (2)$$

$$\text{subject to} \quad y_i \Big( \Big\langle \vec{w}, \sum_{j=1}^{K} m_j \phi_j(\vec{x}_i) \Big\rangle + b \Big) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad m_j \ge 0, \quad \sum_j m_j \le 1 \quad \forall i, j$$

where $\vec{\xi}$ are the standard SVM slack variables, and $\vec{w}$ and $b$ are the maximum margin solution for the training set that has been transformed via $\Phi(x)$ with learned weights $\vec{m}$. Before training, the knot locations are chosen at the empirical quantiles so that they are evenly spaced in the data.

This problem is nonconvex due to the quadratic term involving $\vec{w}$ and $\vec{m}$ in the classification constraints. Although it is difficult to find a globally optimal solution, the structure of the problem suggests a simple method for finding a locally optimal solution. We can divide the problem into two convex subproblems. This amounts to solving a support vector machine for $\vec{w}$ and $b$ with a fixed $\Phi(x)$ and alternately solving for $\Phi(x)$ as a linear program with the SVM solution fixed. In both subproblems, we optimize over $\vec{\xi}$ as it is part of the hinge loss. This yields an efficient convergent optimization method. However, this method can get stuck in local minima. In practice, we initialize it with a linear $\Phi(x)$ and iterate from there. Alternative initializations do not yield much help. This leads us to look for a method to efficiently find global solutions.
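The two convex subproblems alternate as follows: with $\vec{m}$ fixed, problem (2) is a standard linear SVM on the transformed data; with $(\vec{w}, b)$ fixed, $\|\vec{w}\|_2^2$ is constant and the margin $y_i(\sum_j m_j \langle \vec{w}, \phi_j(\vec{x}_i)\rangle + b)$ is linear in $\vec{m}$, so minimizing the total slack over $(\vec{m}, \vec{\xi})$ is a linear program. Below is a minimal sketch of that loop, assuming scikit-learn's SVC and SciPy's linprog as stand-in solvers; the paper does not prescribe particular solvers, and all names here are ours.

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

def ramp_features(X, knots):
    """phi_j(X) for j = 1..K stacked into shape (K, N, D); each ramp
    is applied elementwise as in equation (1)."""
    return np.stack([np.clip((X - knots[j]) / (knots[j + 1] - knots[j]), 0, 1)
                     for j in range(len(knots) - 1)])

def fit_monotone_svm(X, y, knots, C=1.0, iters=10):
    """Alternating QP/LP optimization from section 2 (locally optimal).
    X is (N, D); y must be in {-1, +1}."""
    ramps = ramp_features(X, knots)              # (K, N, D)
    K, N, _ = ramps.shape
    m = np.full(K, 1.0 / K)                      # linear Phi as initialization
    for _ in range(iters):
        # QP step: with m fixed, train a standard linear SVM on Phi(X).
        Xt = np.tensordot(m, ramps, axes=1)      # Phi(X), shape (N, D)
        svm = SVC(kernel="linear", C=C).fit(Xt, y)
        w, b = svm.coef_.ravel(), svm.intercept_[0]
        # LP step: with (w, b) fixed, minimize sum of slacks over (m, xi);
        # the ||w||^2 term is constant here, so the objective is linear.
        A = ramps @ w                            # (K, N): <w, phi_j(x_i)>
        c = np.concatenate([np.zeros(K), np.ones(N)])    # objective: sum_i xi_i
        A_ub = np.vstack([
            np.hstack([-(A * y).T, -np.eye(N)]),             # margin constraints
            np.hstack([np.ones((1, K)), np.zeros((1, N))])]) # sum_j m_j <= 1
        b_ub = np.concatenate([y * b - 1.0, [1.0]])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(0, None)] * (K + N))          # m >= 0, xi >= 0
        m = res.x[:K]                            # updated transform weights
    return m, w, b
```

Each step solves its convex subproblem exactly, so the objective cannot increase and the loop converges, but only to a local optimum, which is what motivates the relaxation of section 3.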
3 Convex Relaxation

When faced with a nonconvex quadratic problem, an increasingly popular technique is to relax it into a convex one. Lasserre [8] proposed a sequence of convex relaxations for these types of nonconvex quadratic programs. This method replaces all quadratic terms in the original optimization problem with entries in a matrix. In its simplest form this matrix corresponds to the outer product of the original variables with rank one and semidefinite constraints. The relaxation comes from dropping the rank one constraint on the outer product matrix. Lasserre proposed more elaborate relaxations using higher order moments of the variables. However, we mainly use the first moment relaxation along with a few of the second order moment constraints that do not require any additional variables beyond the outer product matrix.

A convex relaxation could be derived directly from the primal formulation of our problem. Both $\vec{w}$ and $\vec{m}$ would be relaxed as they interact in the nonconvex quadratic terms. Unfortunately, this yields a semidefinite constraint that scales with both the number of knots and the dimensionality of the data. This is troublesome because we wish to work with high dimensional data such as a bag of words representation for text. However, if we first find the dual formulation for $\vec{w}$, $b$, and $\vec{\xi}$, we only have to relax $\vec{m}$, which yields both a tighter relaxation and a less computationally intensive problem. Finding the dual leaves us with the following min-max saddle point problem that will be subsequently relaxed and transformed into a semidefinite program:

$$\min_{\vec{m}} \max_{\vec{\alpha}} \; 2\vec{\alpha}^T \vec{1} - \vec{\alpha}^T \Big( Y \Big( \sum_{i,j} m_i m_j \, \phi_i(X)^T \phi_j(X) \Big) Y \Big) \vec{\alpha} \qquad (3)$$

$$\text{subject to} \quad 0 \le \alpha_i \le C, \quad \vec{\alpha}^T \vec{y} = 0, \quad m_j \ge 0, \quad \sum_j m_j \le 1 \quad \forall i, j$$

where $\vec{1}$ is a vector of ones, $\vec{y}$ is a vector of the labels, $Y = \mathrm{diag}(\vec{y})$ is a matrix with the labels on its diagonal and zeros elsewhere, and $X$ is a matrix with $\vec{x}_i$ in the $i$th column. We introduce the relaxation via the substitution $M = \bar{m}\bar{m}^T$ and constraint $M \succeq 0$, where $\bar{m}$ is constructed by concatenating 1 with $\vec{m}$. We can then transform the relaxed min-max problem into a semidefinite program similar to the multiple kernel learning framework [7] by finding the dual with respect to $\vec{\alpha}$ and using the Schur complement lemma to generate a linear matrix inequality [1]:

$$\min_{M, t, \lambda, \vec{\nu}, \vec{\delta}} \; t \qquad (4)$$

$$\text{subject to} \quad \begin{pmatrix} Y \big( \sum_{i,j} M_{i,j} \, \phi_i(X)^T \phi_j(X) \big) Y & \vec{1} + \vec{\nu} - \vec{\delta} + \lambda \vec{y} \\ (\vec{1} + \vec{\nu} - \vec{\delta} + \lambda \vec{y})^T & t - 2C \vec{\delta}^T \vec{1} \end{pmatrix} \succeq 0$$

$$M \succeq 0, \quad M \ge 0, \quad M \bar{1} \le \vec{0}, \quad M_{0,0} = 1, \quad \vec{\nu} \ge \vec{0}, \quad \vec{\delta} \ge \vec{0}$$

where $\vec{0}$ is a vector of zeros and $\bar{1}$ is a vector with $-1$ in the first dimension and ones in the rest. The variables $\lambda$, $\vec{\nu}$, $\vec{\delta}$ arise from the dual transformation. This relaxation is exact if $M$ is a rank one matrix.

The above can be seen as a generalization of the multiple kernel learning framework. Instead of learning a kernel from a combination of kernels, we are learning a combination of inner products of different functions applied to our data. In our case, these are truncated ramp functions. The terms $\phi_i(X)^T \phi_j(X)$ are not Mercer kernels except when $i = j$. This more general combination requires the stricter constraint that the mixing weights $M$ form a positive semidefinite matrix, a constraint which is introduced via the relaxation. This is a sufficient condition for the resulting matrix $\sum_{i,j} M_{i,j} \phi_i(X)^T \phi_j(X)$ to also be positive semidefinite. When using this relaxation, we can recover the monotonic transform by using the first column (row) as the mixing weights, $\vec{m}$, of the truncated ramp functions. In practice, however, we use the learned kernel in our predictions: $k(\vec{x}, \vec{x}') = \sum_{i,j} M_{i,j} \phi_i(\vec{x})^T \phi_j(\vec{x}')$.
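Once the relaxation yields a mixing matrix $M$, predictions only need the learned kernel above. Here is a small sketch of the Gram-matrix computation, assuming $M$ is the $K \times K$ block of ramp-ramp mixing weights recovered from the solved semidefinite program (our indexing convention, not the authors' code):

```python
import numpy as np

def ramp_features(X, knots):
    """phi_j(X) for each knot interval, stacked to shape (K, N, D)."""
    return np.stack([np.clip((X - knots[j]) / (knots[j + 1] - knots[j]), 0, 1)
                     for j in range(len(knots) - 1)])

def learned_kernel(X1, X2, M, knots):
    """Gram matrix of k(x, x') = sum_{i,j} M_ij phi_i(x)^T phi_j(x')."""
    P1 = ramp_features(X1, knots)   # (K, N1, D)
    P2 = ramp_features(X2, knots)   # (K, N2, D)
    G = np.zeros((X1.shape[0], X2.shape[0]))
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            G += M[i, j] * (P1[i] @ P2[j].T)   # phi_i(x)^T phi_j(x') term
    return G
```

A Gram matrix computed this way could be passed to any SVM that accepts precomputed kernels (e.g. scikit-learn's SVC with kernel="precomputed"). Because $M$ is constrained to be positive semidefinite, the result is a valid kernel matrix even though the individual cross terms with $i \ne j$ are not Mercer kernels.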
4 Experiments

4.1 Synthetic Experiment

In this experiment we will demonstrate our method's ability to recover a monotonic transformation from data. We sampled data near a linear decision boundary and generated labels based on this boundary. We then applied a strictly monotonic function to this sampled data. The training set is made up of the transformed points and the original labels. A linear algorithm will have difficulty because the mapped data is not linearly separable. However, if we could recover the inverse monotonic function, then a linear decision boundary would perform well.

Figure 3a shows the original data and decision boundary. Figure 3b shows the data and hyperplane transformed with a normalized logarithm. Figure 3c depicts a quadratic transform. 600 data points were sampled and then transformed: 200 were used for training, 200 for cross validation and 200 for testing. We compared our locally optimal method (LMono), our convex relaxation (CMono) and a linear SVM (Linear). The linear SVM struggled on all of the transformed data while the other methods performed well, as reported in Figure 4. The learned transforms for LMono are plotted in Figure 3(d-f). The solid blue line is the mean over 10 experiments, and the dashed blue is the standard deviation. The black line is the true target function. The learned functions for CMono are in Figure 3(g-i). Both algorithms performed quite well on the task of classification and recover nearly the exact monotonic transform. The local method outperformed the relaxation slightly because this was an easy problem with few local minima.

[Figure 3: a) Original data. b) Data transformed by a logarithm. c) Data transformed by a quadratic function. d-f) The transformation functions learned using the nonconvex algorithm. g-i) The transformation functions learned using the convex algorithm.]

           linear    exponential   square root   total
  Linear   0.0005    0.0375        0.0685        0.0355
  LMono    0.0020    0.0005        0.0020        0.0015
  CMono    0.0025    0.0075        0.0025        0.0042

Figure 4: Testing error rates for the synthetic experiments.
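The synthetic setup is straightforward to reproduce. A minimal sketch follows, using a normalized logarithm as the strictly monotonic warp; the exact sampling distribution and the normalization constant are not specified in the paper, so those details are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample 600 points and label them by a linear boundary, then warp the
# inputs with a strictly monotonic function. A linear SVM on the warped
# data struggles; recovering the inverse warp makes it linear again.
X = rng.random((600, 2))
y = np.where(X @ np.array([1.0, -1.0]) > 0, 1, -1)

def normalized_log(x, c=10.0):
    """Strictly monotonic warp onto [0, 1]; the constant c is assumed."""
    return np.log1p(c * x) / np.log1p(c)

X_warped = normalized_log(X)   # the training inputs the learner sees
# 200/200/200 split for train / cross-validation / test, as in the paper.
X_tr, X_cv, X_te = X_warped[:200], X_warped[200:400], X_warped[400:]
y_tr, y_cv, y_te = y[:200], y[200:400], y[400:]
```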
4.2 Document Classification

In this experiment we used the four universities WebKB dataset. The data is made up of web pages from four universities plus an additional larger set from miscellaneous universities. These web pages are then categorized. We will be working with the largest four categories: student, faculty, course, and project. The task is to solve all six pairwise classification problems. In [6, 5], preprocessing the data with a square root was demonstrated to yield good results. We compare our nonconvex method (LMono) and our convex relaxation (CMono) to a linear SVM with and without the square root, with TFIDF features, and also a kernelized SVM with both the polynomial kernel and the RBF kernel. We follow the setup of [6] by training on three universities and the miscellaneous university set and testing on web pages from the fourth university. We repeated this fourfold experiment five times. For each fold, we use a subset of 200 points for training, 200 to cross validate the parameter settings, and all of the fourth university's points for testing.

Our two methods outperform the competition on average, as reported in Figure 5. The convex relaxation chooses a step function nearly every time. This outputs a 1 if a word is in the training vector and 0 if it is absent. The nonconvex greedy algorithm does not end up recovering this solution as reliably and seems to get stuck in local minima. This leads to slightly worse performance than the convex version.

           1vs2     1vs3     1vs4     2vs3     2vs4     3vs4     total
  Linear   0.0509   0.0879   0.1381   0.0653   0.1755   0.0941   0.1025
  TFIDF    0.0428   0.0891   0.1623   0.0486   0.1910   0.1096   0.1059
  Sqrt     0.0363   0.0667   0.0996   0.0456   0.1153   0.0674   0.0711
  Poly     0.0499   0.0861   0.1389   0.0599   0.1750   0.0950   0.1009
  RBF      0.0514   0.0836   0.1356   0.0641   0.1755   0.0981   0.1024
  LMono    0.0338   0.0739   0.0854   0.0511   0.1060   0.0602   0.0683
  CMono    0.0322   0.0776   0.0812   0.0501   0.0973   0.0584   0.0657

Figure 5: Testing error rates for WebKB.

4.3 Image Histogram Classification

In this experiment, we used the Corel image dataset. In [3], it was shown that monotonic transforms of the form $x^a$ for $0 \le a \le 1$ worked well. The Corel image dataset is made up of various categories, each containing 100 images. We chose four categories of animals: 1) eagles, 2) elephants, 3) horses, and 4) tigers. Images were transformed into RGB histograms following the binning strategy of [3, 5]. We ran a series of six pairwise experiments where the data was randomly split into 80 percent training, 10 percent cross validation, and 10 percent testing. These six experiments were repeated 10 times. We compared our two methods to a linear support vector machine, as well as an SVM with RBF and polynomial kernels. We also compared to the set of transforms $x^a$ for $0 \le a \le 1$ where we cross validated over $a \in \{0, .125, .25, .5, .625, .75, .875, 1\}$. This set includes linear ($a = 1$) at one end, a binary threshold ($a = 0$) at the other (choosing $0^0 = 0$), and the square root transform in the middle.

The convex relaxation performed best or tied for best on 4 out of 6 of the experiments and was the best overall, as reported in Figure 6. The nonconvex version also performed well but ended up with a lower accuracy than the cross validated family of $x^a$ transforms. The key to this dataset is that most of the data is very close to zero due to few pixels being in a given bin. Cross validation over $x^a$ most often chose low nonzero $a$ values. Our method had many knots in these extremely low values because that was where the data support was. Plots of our learned functions on these small values can be found in Figure 7(a-f). Solid blue is the mean for the nonconvex algorithm and dashed blue is the standard deviation. Similarly, the convex relaxation is in red.

           1vs2   1vs3   1vs4   2vs3   2vs4   3vs4   total
  Linear   0.08   0.10   0.28   0.11   0.14   0.26   0.1617
  Sqrt     0.03   0.05   0.09   0.12   0.08   0.20   0.0950
  Poly     0.07   0.10   0.28   0.11   0.15   0.23   0.1567
  RBF      0.06   0.08   0.22   0.10   0.13   0.23   0.1367
  x^a      0.08   0.04   0.03   0.03   0.09   0.06   0.0550
  LMono    0.05   0.06   0.04   0.05   0.13   0.05   0.0633
  CMono    0.04   0.03   0.03   0.04   0.06   0.05   0.0417

Figure 6: Testing error rates on Corel dataset.

[Figure 7: The learned transformation functions for 6 Corel problems.]
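For reference, the cross validated $x^a$ baseline above reduces to a small grid search over the eight listed values. A minimal sketch, assuming histogram features and a linear SVM downstream (our code, not the authors'):

```python
import numpy as np
from sklearn.svm import SVC

def power_transform(X, a):
    """x^a applied elementwise, with the paper's convention 0^0 = 0."""
    out = np.power(X, a)
    if a == 0:
        out = (X > 0).astype(float)   # binary threshold at the a = 0 end
    return out

def cv_power_baseline(X_tr, y_tr, X_cv, y_cv, C=1.0):
    """Pick a from the paper's grid by cross-validation accuracy."""
    grid = [0, .125, .25, .5, .625, .75, .875, 1]
    scores = {a: SVC(kernel="linear", C=C)
                 .fit(power_transform(X_tr, a), y_tr)
                 .score(power_transform(X_cv, a), y_cv)
              for a in grid}
    return max(scores, key=scores.get)
```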
4.4 Gender Classification

In this experiment we try to differentiate between images of males and females. We have 1755 labelled images from the FERET dataset processed as in [9]. Each processed image is a 21 by 12 pixel 256 color grayscale image that is rasterized to form training vectors. There are 1044 male images and 711 female images. We randomly split the data into 80 percent training, 10 percent cross validation, and 10 percent testing. We then compare a linear SVM to our two methods on 5 random splits of the data. The learned monotonic functions from LMono and CMono are similar to a sigmoid function, which indicates that useful saturation and threshold effects were uncovered by our methods. Figure 8a shows examples of training images before and after they have been transformed by our learned function. Figure 8b summarizes the results. Our learned transformation outperforms the linear SVM, with the convex relaxation performing best.

  Algorithm   Error
  Linear      .0909
  LMono       .0818
  CMono       .0648

Figure 8: a) Original and transformed gender images. b) Error rates for gender classification.

5 Discussion

A data driven framework was presented for jointly learning monotonic transformations of input data and a discriminative linear classifier. The joint optimization improves classification accuracy and produces interesting transformations that otherwise would require a priori domain knowledge. Two implementations were discussed. The first is a fast greedy algorithm for finding a locally optimal solution. Subsequently, a semidefinite relaxation of the original problem was presented which does not suffer from local minima. The greedy algorithm has similar scaling properties as a support vector machine yet has local minima to contend with. The semidefinite relaxation is more computationally intensive yet ensures a reliable global solution. Nevertheless, both implementations were helpful in synthetic and real experiments including text and image classification and improved over standard support vector machine tools.
A natural next step is to explore faster (convex) algorithms that take advantage of the specific structure of the problem. These faster algorithms will help us explore extensions such as learning transformations across multiple tasks. We also hope to explore applications to other domains such as gene expression data, to refine the current logarithmic transforms necessary to compensate for well-known saturation effects in expression level measurements. We are also interested in looking at fMRI and audio data where monotonic transformations are useful.

6 Acknowledgements

This work was supported in part by NSF Award IIS-0347499 and ONR Award N000140710507.

References

[1] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[2] M. Brown, W. Grundy, D. Lin, N. Cristianini, C. Sugnet, M. Ares Jr., and D. Haussler. Support vector machine classification of microarray gene expression data, 1999.
[3] O. Chapelle, P. Haffner, and V. N. Vapnik. Support vector machines for histogram-based classification. IEEE Transactions on Neural Networks, 10:1055-1064, 1999.
[4] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[5] M. Hein and O. Bousquet. Hilbertian metrics and positive definite kernels on probability measures. In Proceedings of Artificial Intelligence and Statistics, 2005.
[6] T. Jebara, R. Kondor, and A. Howard. Probability product kernels. Journal of Machine Learning Research, 5:819-844, 2004.
[7] G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27-72, 2004.
[8] J. B. Lasserre. Convergent LMI relaxations for nonconvex quadratic programs. In Proceedings of the 39th IEEE Conference on Decision and Control, 2000.
[9] B. Moghaddam and M. H. Yang. Sex with support vector machines. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing 13, pages 960-966. MIT Press, 2000.
[10] T. Robertson, F. T. Wright, and R. L. Dykstra. Order Restricted Statistical Inference. Wiley, 1988.