Learning Monotonic Transformations for Classification

Andrew G. Howard
Department of Computer Science
Columbia University
New York, NY 10027
ahoward@cs.columbia.edu

Tony Jebara
Department of Computer Science
Columbia University
New York, NY 10027
jebara@cs.columbia.edu

Abstract

A discriminative method is proposed for learning monotonic transformations of the training data while jointly estimating a large-margin classifier. In many domains such as document classification, image histogram classification and gene microarray experiments, fixed monotonic transformations can be useful as a preprocessing step. However, most classifiers only explore these transformations through manual trial and error or via prior domain knowledge. The proposed method learns monotonic transformations automatically while training a large-margin classifier without any prior knowledge of the domain. A monotonic piecewise linear function is learned which transforms data for subsequent processing by a linear hyperplane classifier. Two algorithmic implementations of the method are formalized. The first solves a convergent alternating sequence of quadratic and linear programs until it obtains a locally optimal solution. An improved algorithm is then derived using a convex semidefinite relaxation that overcomes initialization issues in the greedy optimization problem. The effectiveness of these learned transformations on synthetic problems, text data and image data is demonstrated.

1 Introduction

Many fields have developed heuristic methods for preprocessing data to improve performance. This often takes the form of applying a monotonic transformation prior to using a classification algorithm. For example, when the bag of words representation is used in document classification, it is common to take the square root of the term frequency [6, 5]. Monotonic transforms are also used when classifying image histograms. In [3], transformations of the form x^a where 0 ≤ a ≤ 1 are demonstrated to improve performance. When classifying genes from various microarray experiments it is common to take the logarithm of the gene expression ratio [2]. Monotonic transformations can also capture crucial properties of the data such as threshold and saturation effects.

In this paper, we propose to simultaneously learn a hyperplane classifier and a monotonic transformation. The solution produced by our algorithm is a piecewise linear monotonic function and a maximum margin hyperplane classifier similar to a support vector machine (SVM) [4]. By allowing for a richer class of transforms learned at training time (as opposed to a rule of thumb applied during preprocessing), we improve classification accuracy. The learned transform is specifically tuned to the classification task. The main contributions of this paper include: a novel framework for estimating a monotonic transformation and a hyperplane classifier simultaneously at training time, an efficient method for finding a locally optimal solution to the problem, and a convex relaxation to find a globally optimal approximate solution.

Figure 1: Monotonic transform applied to each dimension, followed by a hyperplane classifier.

The paper is organized as follows. In section 2, we present our formulation for learning a piecewise linear monotonic function and a hyperplane. We show how to learn this combined model through an iterative coordinate ascent optimization using interleaved quadratic and linear programs to find a local minimum. In section 3, we derive a convex relaxation based on Lasserre's method [8]. In section 4, synthetic experiments as well as document and image classification problems demonstrate the diverse utility of our method. We conclude with a discussion and future work.

2 Learning Monotonic Transformations

For an unknown distribution P(x, y) over inputs x ∈ R^d and labels y ∈ {-1, 1}, we assume that there is an unknown nuisance monotonic transformation Φ(x) and an unknown hyperplane parameterized by w and b such that predicting with f(x) = sign(w^T Φ(x) + b) yields a low expected test error R = ∫ (1/2) |y - f(x)| dP(x, y). We would like to recover Φ(x), w, b from a labeled training set S = {(x_1, y_1), ..., (x_N, y_N)} which is sampled i.i.d. from P(x, y). The transformation acts elementwise as can be seen in Figure 1. We propose to learn both a maximum margin hyperplane and the unknown transform Φ(x) simultaneously.

In our formulation, Φ(x) is a piecewise linear function that we parameterize with a set of K knots {z_1, ..., z_K} and associated positive weights {m_1, ..., m_K} where z_j ∈ R and m_j ∈ R+. The transformation can be written as Φ(x) = Σ_{j=1}^K m_j φ_j(x), where the φ_j(x) are truncated ramp functions acting on vectors and matrices elementwise as follows:

    φ_j(x) = { 0                            x ≤ z_j
             { (x - z_j) / (z_{j+1} - z_j)  z_j ≤ x ≤ z_{j+1}        (1)
             { 1                            z_{j+1} ≤ x

This is a less common way to parameterize piecewise linear functions. The positivity constraints enforce monotonicity on Φ(x) for all x. A more common method is to parameterize the function value Φ(z) at each knot z and apply order constraints between subsequent knots to enforce monotonicity. Values in between knots are found through linear interpolation. This is the method used in isotonic regression [10], and in practice these are equivalent formulations. Using truncated ramp functions is preferable for numerous reasons. They can be easily precomputed and are sparse. Once precomputed, most calculations can be done via sparse matrix multiplications. The positivity constraints on the weights m will also yield a simpler formulation than order constraints and interpolation, which becomes important in subsequent relaxation steps. Figure 2a shows the truncated ramp function associated with knot z_1. Figure 2b shows a conic combination of truncated ramps that builds a piecewise linear monotonic function. Combining this with the support vector machine formulation leads us to the following learning problem:
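As a concrete aside, the truncated-ramp parameterization of Eq. (1) is straightforward to implement. The NumPy sketch below is illustrative only (not the authors' code); it assumes a knot array holding K+1 sorted values so that ramp j rises from z[j] to z[j+1], and checks that a conic combination of ramps is monotonic.

```python
import numpy as np

def truncated_ramp(x, z, j):
    """phi_j(x) of Eq. (1): 0 below z[j], linear on [z[j], z[j+1]], 1 above."""
    return np.clip((x - z[j]) / (z[j + 1] - z[j]), 0.0, 1.0)

def monotone_transform(x, z, m):
    """Phi(x) = sum_j m_j * phi_j(x); nonnegative m_j make Phi nondecreasing."""
    return sum(m[j] * truncated_ramp(x, z, j) for j in range(len(m)))

# Evenly spaced knots on [0, 1]; the paper places them at empirical quantiles.
z = np.linspace(0.0, 1.0, 6)              # 6 knot values -> K = 5 ramps
m = np.array([0.5, 1.0, 0.2, 0.8, 0.3])   # arbitrary nonnegative weights
x = np.linspace(-0.5, 1.5, 201)
y = monotone_transform(x, z, m)
assert np.all(np.diff(y) >= -1e-12)       # Phi is nondecreasing everywhere
```

Note that each precomputed φ_j(X) is mostly 0s and 1s, which is the sparsity the text above exploits for fast sparse matrix arithmetic.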
a) Truncated ramp function φ_1(x).  b) Φ(x) = Σ_{j=1}^5 m_j φ_j(x).
Figure 2: Building blocks for piecewise linear functions.

    min_{w, ξ, b, m}  ||w||_2^2 + C Σ_{i=1}^N ξ_i                     (2)
    subject to  y_i ( ⟨w, Σ_{j=1}^K m_j φ_j(x_i)⟩ + b ) ≥ 1 - ξ_i   ∀i
                ξ_i ≥ 0,  m_j ≥ 0,  Σ_j m_j ≤ 1                      ∀i, j

where ξ are the standard SVM slack variables, and w and b are the maximum margin solution for the training set that has been transformed via Φ(x) with learned weights m. Before training, the knot locations are chosen at the empirical quantiles so that they are evenly spaced in the data. This problem is nonconvex due to the quadratic term involving w and m in the classification constraints. Although it is difficult to find a globally optimal solution, the structure of the problem suggests a simple method for finding a locally optimal solution. We can divide the problem into two convex subproblems. This amounts to solving a support vector machine for w and b with a fixed Φ(x), and alternately solving for Φ(x) as a linear program with the SVM solution fixed. In both subproblems, we optimize over ξ as it is part of the hinge loss. This yields an efficient convergent optimization method. However, this method can get stuck in local minima. In practice, we initialize it with a linear Φ(x) and iterate from there. Alternative initializations do not yield much help. This leads us to look for a method to efficiently find global solutions.

3 Convex Relaxation

When faced with a nonconvex quadratic problem, an increasingly popular technique is to relax it into a convex one. Lasserre [8] proposed a sequence of convex relaxations for these types of nonconvex quadratic programs. This method replaces all quadratic terms in the original optimization problem with entries in a matrix. In its simplest form this matrix corresponds to the outer product of the original variables with rank one and semidefinite constraints. The relaxation comes from dropping the rank one constraint on the outer product matrix. Lasserre proposed more elaborate relaxations using higher order moments of the variables. However, we mainly use the first moment relaxation along with a few of the second order moment constraints that do not require any additional variables beyond the outer product matrix.

A convex relaxation could be derived directly from the primal formulation of our problem. Both w and m would be relaxed as they interact in the nonconvex quadratic terms. Unfortunately, this yields a semidefinite constraint that scales with both the number of knots and the dimensionality of the data. This is troublesome because we wish to work with high dimensional data such as a bag of words representation for text. However, if we first find the dual formulation for w, b, and ξ, we only have to relax m, which yields both a tighter relaxation and a less computationally intensive problem. Finding the dual leaves us with the following min-max saddle point problem that will be subsequently relaxed and transformed into a semidefinite program:

    min_m max_α  2 α^T 1 - α^T ( Y ( Σ_{i,j} m_i m_j φ_i(X)^T φ_j(X) ) Y ) α     (3)
    subject to  0 ≤ α_i ≤ C,  α^T y = 0,  m_j ≥ 0,  Σ_j m_j ≤ 1                 ∀i, j

where 1 is a vector of ones, y is a vector of the labels, Y = diag(y) is a matrix with the labels on its diagonal and zeros elsewhere, and X is a matrix with x_i in the ith column. We introduce the relaxation via the substitution M = m̄ m̄^T and constraint M ⪰ 0, where m̄ is constructed by concatenating 1 with m. We can then transform the relaxed min-max problem into a semidefinite program similar to the multiple kernel learning framework [7] by finding the dual with respect to α and using the Schur complement lemma to generate a linear matrix inequality [1]:

    min_{M, t, λ, ν, δ}  t                                                       (4)
    subject to
        [ Y ( Σ_{i,j} M_{i,j} φ_i(X)^T φ_j(X) ) Y    1 + ν - δ + λ y ]
        [ (1 + ν - δ + λ y)^T                         t - 2C δ^T 1    ]  ⪰  0
        M ⪰ 0,  M ≥ 0,  M 1̄ ≤ 0,  M_{0,0} = 1,  ν ≥ 0,  δ ≥ 0

where 0 is a vector of zeros and 1̄ is a vector with -1 in the first dimension and ones in the rest. The variables λ, ν, δ arise from the dual transformation. This relaxation is exact if M is a rank one matrix.

The above can be seen as a generalization of the multiple kernel learning framework. Instead of learning a kernel from a combination of kernels, we are learning a combination of inner products of different functions applied to our data. In our case, these are truncated ramp functions. The terms φ_i(X)^T φ_j(X) are not Mercer kernels except when i = j. This more general combination requires the stricter constraint that the mixing weights M form a positive semidefinite matrix, a constraint which is introduced via the relaxation. This is a sufficient condition for the resulting matrix Σ_{i,j} M_{i,j} φ_i(X)^T φ_j(X) to also be positive semidefinite. When using this relaxation, we can recover the monotonic transform by using the first column (row) as the mixing weights, m, of the truncated ramp functions. In practice, however, we use the learned kernel in our predictions: k(x, x′) = Σ_{i,j} M_{i,j} φ_i(x)^T φ_j(x′).

4 Experiments

4.1 Synthetic Experiment

In this experiment we demonstrate our method's ability to recover a monotonic transformation from data. We sampled data near a linear decision boundary and generated labels based on this boundary. We then applied a strictly monotonic function to this sampled data. The training set is made up of the transformed points and the original labels. A linear algorithm will have difficulty because the mapped data is not linearly separable. However,
Figure 3: a) Original data. b) Data transformed by a logarithm. c) Data transformed by a quadratic function. d-f) The transformation functions learned using the nonconvex algorithm. g-i) The transformation functions learned using the convex algorithm.

if we could recover the inverse monotonic function, then a linear decision boundary would perform well.

Figure 3a shows the original data and decision boundary. Figure 3b shows the data and hyperplane transformed with a normalized logarithm. Figure 3c depicts a quadratic transform. 600 data points were sampled and then transformed: 200 were used for training, 200 for cross validation and 200 for testing. We compared our locally optimal method (LMono), our convex relaxation (CMono) and a linear SVM (Linear). The linear SVM struggled on all of the transformed data while the other methods performed well, as reported in Figure 4. The learned transforms for LMono are plotted in Figure 3(d-f). The solid blue line is the mean over 10 experiments, and the dashed blue is the standard deviation. The black line is the true target function. The learned functions for CMono are in Figure 3(g-i). Both algorithms performed quite well on the task of classification and recover nearly the exact monotonic transform. The local method outperformed the relaxation slightly because this was an easy problem with few local minima.

             linear   exponential   square root   total
    Linear   0.0005   0.0375        0.0685        0.0355
    LMono    0.0020   0.0005        0.0020        0.0015
    CMono    0.0025   0.0075        0.0025        0.0042

Figure 4: Testing error rates for the synthetic experiments.

4.2 Document Classification

In this experiment we used the four universities WebKB dataset. The data is made up of web pages from four universities plus an additional larger set from miscellaneous universities.
             1vs2     1vs3     1vs4     2vs3     2vs4     3vs4     total
    Linear   0.0509   0.0879   0.1381   0.0653   0.1755   0.0941   0.1025
    TFIDF    0.0428   0.0891   0.1623   0.0486   0.1910   0.1096   0.1059
    Sqrt     0.0363   0.0667   0.0996   0.0456   0.1153   0.0674   0.0711
    Poly     0.0499   0.0861   0.1389   0.0599   0.1750   0.0950   0.1009
    RBF      0.0514   0.0836   0.1356   0.0641   0.1755   0.0981   0.1024
    LMono    0.0338   0.0739   0.0854   0.0511   0.1060   0.0602   0.0683
    CMono    0.0322   0.0776   0.0812   0.0501   0.0973   0.0584   0.0657

Figure 5: Testing error rates for WebKB.

These web pages are then categorized. We work with the largest four categories: student, faculty, course, and project. The task is to solve all six pairwise classification problems. In [6, 5], preprocessing the data with a square root was demonstrated to yield good results. We compare our nonconvex method (LMono) and our convex relaxation (CMono) to a linear SVM with and without the square root, with TFIDF features, and also a kernelized SVM with both the polynomial kernel and the RBF kernel. We follow the setup of [6] by training on three universities and the miscellaneous university set and testing on web pages from the fourth university. We repeated this four fold experiment five times. For each fold, we use a subset of 200 points for training, 200 to cross validate the parameter settings, and all of the fourth university's points for testing. Our two methods outperform the competition on average, as reported in Figure 5. The convex relaxation chooses a step function nearly every time. This outputs a 1 if a word is in the training vector and 0 if it is absent. The nonconvex greedy algorithm does not end up recovering this solution as reliably and seems to get stuck in local minima. This leads to slightly worse performance than the convex version.

4.3 Image Histogram Classification

In this experiment, we used the Corel image dataset. In [3], it was shown that monotonic transforms of the form x^a for 0 ≤ a ≤ 1 worked well. The Corel image dataset is made up of various categories, each containing 100 images. We chose four categories of animals: 1) eagles, 2) elephants, 3) horses, and 4) tigers. Images were transformed into RGB histograms following the binning strategy of [3, 5]. We ran a series of six pairwise experiments where the data was randomly split into 80 percent training, 10 percent cross validation, and 10 percent testing. These six experiments were repeated 10 times. We compared our two methods to a linear support vector machine, as well as an SVM with RBF and polynomial kernels. We also compared to the set of transforms x^a for 0 ≤ a ≤ 1, where we cross validated over a ∈ {0, .125, .25, .5, .625, .75, .875, 1}. This set includes linear (a = 1) at one end, a binary threshold (a = 0) at the other (choosing 0^0 = 0), and the square root transform in the middle. The convex relaxation performed best or tied for best on 4 out of 6 of the experiments and was the best overall, as reported in Figure 6. The nonconvex version also performed well but ended up with a lower accuracy than the cross validated family of x^a transforms. The key to this dataset is that most of the data is very close to zero due to few pixels being in a given bin. Cross validation over x^a most often chose low nonzero a values. Our method had many knots in these extremely low values because that was where the data support was. Plots of our learned functions on these small values can be found in Figure 7(a-f). Solid blue is the mean for the nonconvex algorithm and dashed blue is the standard deviation. Similarly, the convex relaxation is in red.

             1vs2   1vs3   1vs4   2vs3   2vs4   3vs4   total
    Linear   0.08   0.10   0.28   0.11   0.14   0.26   0.1617
    Sqrt     0.03   0.05   0.09   0.12   0.08   0.20   0.0950
    Poly     0.07   0.10   0.28   0.11   0.15   0.23   0.1567
    RBF      0.06   0.08   0.22   0.10   0.13   0.23   0.1367
    x^a      0.08   0.04   0.03   0.03   0.09   0.06   0.0550
    LMono    0.05   0.06   0.04   0.05   0.13   0.05   0.0633
    CMono    0.04   0.03   0.03   0.04   0.06   0.05   0.0417

Figure 6: Testing error rates on Corel dataset.
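For comparison with the x^a baseline described above, selecting the exponent a by validation error takes only a few lines. The sketch below is illustrative rather than the authors' setup: it substitutes a simple nearest-centroid classifier for the SVM used in the experiments (the function names and the toy classifier are our own), and applies the convention 0^0 = 0 noted in the text.

```python
import numpy as np

def power_transform(X, a):
    """x -> x**a elementwise, with the convention 0**0 = 0."""
    if a == 0.0:
        return (X > 0).astype(float)   # binary-threshold limit of x**a
    return np.power(X, a)

def centroid_error(Xtr, ytr, Xva, yva):
    """Validation error of a nearest-centroid stand-in classifier."""
    c_pos = Xtr[ytr == 1].mean(axis=0)
    c_neg = Xtr[ytr == -1].mean(axis=0)
    d = (np.linalg.norm(Xva - c_pos, axis=1)
         - np.linalg.norm(Xva - c_neg, axis=1))
    pred = np.where(d < 0, 1, -1)
    return np.mean(pred != yva)

def select_a(Xtr, ytr, Xva, yva,
             grid=(0, .125, .25, .5, .625, .75, .875, 1)):
    """Cross validate the exponent a over the grid used in Section 4.3."""
    errs = [centroid_error(power_transform(Xtr, a), ytr,
                           power_transform(Xva, a), yva) for a in grid]
    return grid[int(np.argmin(errs))]
```

`select_a` returns the exponent with the lowest validation error; ties go to the earlier grid entry.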
Figure 7: The learned transformation functions for 6 Corel problems.

4.4 Gender Classification

In this experiment we try to differentiate between images of males and females. We have 1755 labelled images from the FERET dataset processed as in [9]. Each processed image is a 21 by 12 pixel 256 color grayscale image that is rasterized to form training vectors. There are 1044 male images and 711 female images. We randomly split the data into 80 percent training, 10 percent cross validation, and 10 percent testing. We then compare a linear SVM to our two methods on 5 random splits of the data. The learned monotonic functions from LMono and CMono are similar to a sigmoid function, which indicates that useful saturation and threshold effects were uncovered by our methods. Figure 8a shows examples of training images before and after they have been transformed by our learned function. Figure 8b summarizes the results. Our learned transformation outperforms the linear SVM, with the convex relaxation performing best.

5 Discussion

A data driven framework was presented for jointly learning monotonic transformations of input data and a discriminative linear classifier. The joint optimization improves classification accuracy and produces interesting transformations that otherwise would require a priori domain knowledge. Two implementations were discussed. The first is a fast greedy algorithm for finding a locally optimal solution. Subsequently, a semidefinite relaxation of the original problem was presented which does not suffer from local minima. The greedy algorithm has similar scaling properties as a support vector machine yet has local minima to contend with. The semidefinite relaxation is more computationally intensive yet ensures a reliable global solution. Nevertheless, both implementations were helpful in synthetic and real experiments including text and image classification and improved over standard support vector machine tools.
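The greedy alternation discussed above can be made concrete with a short sketch. This is a simplified stand-in, not the authors' implementation: the exact QP and LP subproblem solves of Section 2 are replaced here by projected subgradient steps on the joint hinge-loss objective, and knots are assumed evenly spaced on [0, 1].

```python
import numpy as np

def ramps(X, z):
    """Stack the truncated ramps phi_j(X) for all K ramps: shape (K, N, D)."""
    K = len(z) - 1
    return np.stack([np.clip((X - z[j]) / (z[j + 1] - z[j]), 0.0, 1.0)
                     for j in range(K)])

def fit_monotone_svm(X, y, z, C=10.0, lr=0.05, iters=1000):
    """Alternate a (w, b) step with m fixed and an m step with (w, b) fixed.

    Simplified stand-in for the QP/LP alternation: both convex subproblems
    are attacked with projected subgradient steps instead of exact solves."""
    P = ramps(X, z)                        # (K, N, D) precomputed features
    K, N, D = P.shape
    m = np.ones(K) / K                     # uniform weights: a linear Phi
    w, b = np.zeros(D), 0.0
    for _ in range(iters):
        # --- (w, b) step: SVM subgradient with the transform Phi fixed ---
        F = np.tensordot(m, P, axes=1)     # (N, D) transformed data
        viol = y * (F @ w + b) < 1         # margin violators
        w -= lr * (w - (C / N) * (y[viol, None] * F[viol]).sum(axis=0))
        b += lr * (C / N) * y[viol].sum()
        # --- m step: hinge subgradient with (w, b) fixed, then project ---
        A = np.einsum('knd,d->nk', P, w)   # A[i, j] = <w, phi_j(x_i)>
        viol = y * (A @ m + b) < 1
        m += lr * (C / N) * (y[viol, None] * A[viol]).sum(axis=0)
        m = np.clip(m, 0.0, None)          # m_j >= 0 (monotonicity)
        if m.sum() > 1:                    # sum_j m_j <= 1
            m /= m.sum()
    return w, b, m
```

On toy data that is separable after a monotone map, a few hundred iterations of this loop recover a feasible m (m_j ≥ 0, Σ_j m_j ≤ 1) and a good training fit; the SDP relaxation of Section 3 exists precisely to avoid the local minima such greedy schemes can fall into.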
    Algorithm   Error
    Linear      .0909
    LMono       .0818
    CMono       .0648

Figure 8: a) Original and transformed gender images. b) Error rates for gender classification.

A natural next step is to explore faster (convex) algorithms that take advantage of the specific structure of the problem. These faster algorithms will help us explore extensions such as learning transformations across multiple tasks. We also hope to explore applications to other domains such as gene expression data, to refine the current logarithmic transforms necessary to compensate for well-known saturation effects in expression level measurements. We are also interested in looking at fMRI and audio data, where monotonic transformations are useful.

6 Acknowledgements

This work was supported in part by NSF Award IIS-0347499 and ONR Award N000140710507.

References

[1] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[2] M. Brown, W. Grundy, D. Lin, N. Cristianini, C. Sugnet, M. Ares Jr., and D. Haussler. Support vector machine classification of microarray gene expression data, 1999.
[3] O. Chapelle, P. Haffner, and V. N. Vapnik. Support vector machines for histogram-based classification. IEEE Transactions on Neural Networks, 10:1055-1064, 1999.
[4] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[5] M. Hein and O. Bousquet. Hilbertian metrics and positive definite kernels on probability measures. In Proceedings of Artificial Intelligence and Statistics, 2005.
[6] T. Jebara, R. Kondor, and A. Howard. Probability product kernels. Journal of Machine Learning Research, 5:819-844, 2004.
[7] G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27-72, 2004.
[8] J. B. Lasserre. Convergent LMI relaxations for nonconvex quadratic programs. In Proceedings of the 39th IEEE Conference on Decision and Control, 2000.
[9] B. Moghaddam and M. H. Yang. Sex with support vector machines. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing 13, pages 960-966. MIT Press, 2000.
[10] T. Robertson, F. T. Wright, and R. L. Dykstra. Order Restricted Statistical Inference. Wiley, 1988.