
MILEAGE: Multiple Instance LEArning with Global Embedding

Dan Zhang (1)  danzhang2008@gmail.com
Jingrui He (2)  jingrui.he@gmail.com
Luo Si (3)  lsi@cs.purdue.edu
Richard D. Lawrence (4)  ricklawr@us.ibm.com

(1) Facebook Incorporation, Menlo Park, CA 94025
(2) Computer Science Department, Stevens Institute of Technology, Hoboken, NJ 07030
(3) Computer Science Department, Purdue University, West Lafayette, IN 47907
(4) IBM T.J. Watson Research Center, Yorktown Heights, NY 10562

Abstract

Multiple Instance Learning (MIL) generally represents each example as a collection of instances, so that the features of local objects can be better captured, whereas traditional methods typically extract one global feature vector for each example as an integral part. However, there is limited research on which of the two learning scenarios performs better. This paper proposes a novel framework, Multiple Instance LEArning with Global Embedding (MILEAGE), in which the global feature vectors used by traditional learning methods are integrated into the MIL setting. Within the proposed framework, a large margin method is formulated that adaptively tunes the weights on the two different kinds of feature representations (i.e., global and multiple instance) for each example, and trains the classifier simultaneously. An extensive set of experiments is conducted to demonstrate the advantages of the proposed method.

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

1. Introduction

Traditional learning methods usually consider each example as one non-separable entity and represent the whole content of the example by one feature vector. However, the semantic meanings of an example sometimes vary among its constituent parts. Multiple Instance Learning (MIL) has been proposed to deal with problems whose output information is only known for bags of items/instances, as opposed to for each example. More precisely, in a MIL setting, each example/bag is divided into several different parts/instances. Labels are assigned to bags rather than to individual instances. A bag is labeled as positive if it contains at least one positive instance; otherwise it is labeled as negative. In this paper, for each example, the feature vector extracted in the same way as for traditional non-MIL methods (i.e., treating each example as an integral entity) is referred to as the global representation of this example, while its local representation is the set of instances extracted for the parts of this example, as in MIL. To some extent, the global representation of each example can also be considered as its bag-level features. Numerous methods have been developed for MIL classification (Andrews et al., 2003; Dietterich et al., 1998; Kim & De la Torre, 2010) and its variants, such as outlier detection (Wu et al., 2010), online learning (Babenko et al., 2011), and ranking (Hu et al., 2008). These methods have been widely employed in areas such as text mining (Andrews et al., 2003) and localized content-based image retrieval (LCBIR) (Rahmani & Goldman, 2006). Most previous MIL methods focus on improving classification performance under the local representation. However, few of them investigate whether the local representation is always better than the global one. This problem has posed a big challenge for researchers deciding what kind of algorithm should be used when facing real-world applications. In (Ray & Craven, 2005), the authors compared the performance of traditional and MIL methods. However, their work is still based on the local representation, and adapts the traditional learning methods to the local representation.
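The MIL bag-labeling rule and the two feature representations described above can be sketched as follows. The mean-of-instances global vector is one common illustrative choice, not something the paper prescribes, and the function names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive iff it contains
    at least one positive instance."""
    return 1 if any(l == 1 for l in instance_labels) else -1

def global_representation(bag):
    """One simple global (bag-level) feature vector: the average of
    the instance vectors (an illustrative choice)."""
    return np.mean(bag, axis=0)

bag = rng.normal(size=(5, 3))            # 5 instances, 3 features each
labels = [-1, -1, 1, -1, -1]             # one positive instance
print(bag_label(labels))                 # 1
print(global_representation(bag).shape)  # (3,)
```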
Although rarely studied, it is intuitive that the true positive rate in positive bags could significantly affect the performance of local and global representations. This is because, if the true positive rate in a positive bag is low, its global representation will be dominated by the irrelevant parts of this example, while methods based on the local representation can pick out the true positive instances for training. On the contrary, if an example has few irrelevant parts, the global representation tends to be more informative than the local one, since methods based on local representations normally focus on some local parts of each example. This intuition is also verified empirically by the experiments conducted in Section 4.1. When incorporating this intuition into real applications, the major challenge is how to learn, for each training example, whether the local representation is better or whether the global one tends to prevail.

To solve this challenge, a novel research framework, Multiple Instance LEArning with Global Embedding (MILEAGE), is proposed. MILEAGE leverages the benefits of both local and global representations, such that in general it can achieve better performance than both MIL and traditional learning methods. From another perspective, local and global feature representations can be treated as two information sources, each carrying some auxiliary information to improve classification performance, which is similar to the basic motivation of multi-view learning methods (Joachims et al., 2001). To solve the proposed framework, a novel method is designed that adaptively tunes the importance of the two different representations. It is based on the intuition that the local representation tends to perform better when the positive ratio is small. An iterative method is employed to solve the derived optimization problem.
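A tiny numeric illustration of this intuition (constructed here, not taken from the paper's experiments): with one relevant instance among ten, averaging the instances dilutes the positive signal tenfold, while the max-over-instances score of MIL preserves it.

```python
import numpy as np

w = np.array([1.0, 0.0])           # a classifier aligned with the signal
positive = np.array([5.0, 0.0])    # the single relevant instance
irrelevant = np.zeros((9, 2))      # nine irrelevant instances
bag = np.vstack([positive, irrelevant])

local_score = max(w @ x for x in bag)    # max_j w^T B_ij
global_score = w @ bag.mean(axis=0)      # w^T (mean instance)
print(local_score, global_score)         # 5.0 vs 0.5
```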
To accelerate optimization, inspired by (Fuduli et al., 2004), we adapt the bundle method to solve the resulting non-convex non-smooth problem by explicitly considering the convex regularization and the non-convex loss terms. Discussions and theoretical analysis are provided for important properties such as the convergence rate and the generalization error rate of the proposed method. Experiments on image and text datasets, as well as a novel application, insider threat detection, demonstrate the advantages of the proposed method.

2. Methodology

2.1. Problem Statement and Notation

Suppose a set of examples $D = \{(\bar{B}_i, B_i, Y_i), i = 1, \ldots, n\}$ is given, where $\bar{B}_i \in \mathbb{R}^{d \times 1}$ denotes the global representation of the $i$-th example and $Y_i \in \{-1, 1\}$ is its binary label. Along with the global feature representation, for each example its local feature representations, i.e., instances for the different parts of this example, are also available (the notions of global and local representations are defined in Section 1). The instances in the $i$-th bag are denoted $B_i = \{B_{i1}, B_{i2}, \ldots, B_{in_i}\} \in \mathbb{R}^{d \times n_i}$ (footnote 1), where $n_i$ is the number of instances in the $i$-th bag. Throughout the paper, the subscript $j$ means $j = 1, \ldots, n_i$. Given an unlabeled example $\bar{B}_u$ and its associated local representations $B_u$, the objective of Multiple Instance LEArning with Global Embedding (MILEAGE) is to design a function $f: (\bar{B}_u, B_u) \to \mathbb{R}$ such that the classification of this unlabeled example is accurate. If $f(\bar{B}_u, B_u) > 0$, this example is classified as positive, and otherwise as negative.

2.2. Method

For each bag, a weight variable is introduced to balance the importance of the two representations. The weight is decided both by the prior knowledge from the positive ratio of each bag and by the fitness of the data. Without loss of generality, given a specific example $\bar{B}_i$ and its associated instances $B_i$, the classifier takes the following form:

$f(\bar{B}_i, B_i) = \theta_i \max_j w^T B_{ij} + (1 - \theta_i)\, w^T \bar{B}_i$,  (1)

where $1 \geq \theta_i \geq 0$ is the convex combination coefficient for the $i$-th example, $w \in \mathbb{R}^{d \times 1}$ is the linear classifier, and we assume that the bias has already been absorbed into the feature vectors. $\max_j w^T B_{ij}$ is the output from the local representation of the $i$-th example (footnote 2).
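Eq. (1) combines the best-instance score with the global score through a per-bag weight; a minimal sketch (the function name is ours, the formula is the paper's):

```python
import numpy as np

def mileage_score(w, bag, bag_global, theta):
    """Eq. (1): f = theta * max_j w^T B_ij + (1 - theta) * w^T Bbar_i."""
    return theta * max(w @ x for x in bag) + (1.0 - theta) * (w @ bag_global)

w = np.array([1.0, -1.0])
bag = np.array([[2.0, 0.0], [0.0, 1.0]])    # two instances
bag_global = bag.mean(axis=0)               # an illustrative global vector
print(mileage_score(w, bag, bag_global, 0.5))   # 1.25
```

With `theta = 1` the score reduces to the pure MIL max-instance rule; with `theta = 0` it reduces to the traditional global classifier.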
Correspondingly, $w^T \bar{B}_i$ is the output from its global representation, and $f(\bar{B}_i, B_i)$ balances these two outputs through the weight $\theta_i$. From a Bayesian perspective, given a dataset, the logarithm of the posterior distribution of $w$ and $\theta$ can be written as follows:

$\log P(w, \theta \mid D) \propto \log P(D \mid w, \theta)\, P(w) \prod_{i=1}^{n} P(\theta_i)$,  (2)

where $\theta = [\theta_1, \ldots, \theta_n]$. Here we assume that the examples are generated i.i.d. $P(w)$ follows the Gaussian distribution $N(0, I)$, and $P(\theta_i)$ follows the Beta distribution $\mathrm{Beta}(\gamma e^{-\beta r_i}, \gamma e^{-\beta(1-r_i)})$, where $\beta$ and $\gamma$ are the hyper-parameters and partially control the mean and skewness of the distribution. $r_i \in [0, 1]$ is the prior knowledge of the positive ratio of the $i$-th bag, and can be obtained in various ways; for example, $r_i$ can simply be set to $0.5$ if no prior knowledge is available. In practice, a preliminary classifier can be trained beforehand by applying SVM to $\{(\bar{B}_i, Y_i), i = 1, \ldots, n\}$; $r_i$ can then be estimated by applying this classifier to the instances in each bag. It is clear that $E(\theta_i) = e^{-\beta r_i} / (e^{-\beta r_i} + e^{-\beta(1-r_i)})$.

Given $w$ and $\theta$, the probability of generating the dataset $D$ can be described through the hinge loss as $P(D \mid w, \theta) \propto \prod_{i=1}^{n} e^{-C \max\{0,\, 1 - Y_i(\theta_i \max_j w^T B_{ij} + (1-\theta_i) w^T \bar{B}_i)\}}$, where $C$ is a parameter. Then, maximizing Eq. (2) is equivalent to solving the following problem:

$\min_{w, \xi, \theta_i \geq 0} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \left[ (\gamma e^{-\beta r_i} - 1)\log\theta_i + (\gamma e^{-\beta(1-r_i)} - 1)\log(1-\theta_i) \right]$  (3)

s.t. $\forall i \in \{1, \ldots, n\}: \; Y_i(\theta_i \max_j w^T B_{ij} + (1-\theta_i) w^T \bar{B}_i) \geq 1 - \xi_i$.

This formulation is non-convex and cannot be solved directly, so an iterative method is employed. In particular, at the $k$-th iteration, given $w^{(k-1)}$, the weights $\theta_1, \ldots, \theta_n$ can be updated by:

$\min_{\theta} \; C \sum_{i=1}^{n} \max\{0,\, 1 - Y_i(\theta_i w^{(k-1)T} B_{i j_i^{(k-1)}} + (1-\theta_i) w^{(k-1)T} \bar{B}_i)\} - \sum_{i=1}^{n} \left[ (\gamma e^{-\beta r_i} - 1)\log\theta_i + (\gamma e^{-\beta(1-r_i)} - 1)\log(1-\theta_i) \right]$,  (4)

where $j_i^{(k-1)} = \arg\max_j w^{(k-1)T} B_{ij}$. The convexity of this objective function cannot be determined, since the signs of $(\gamma e^{-\beta r_i} - 1)$ and $(\gamma e^{-\beta(1-r_i)} - 1)$ are not clear.

Footnote 1: We assume that the local and global representations share the same feature space. The proposed formulation can, however, be extended to the case when their feature spaces are different.
Footnote 2: The output of each example in MIL is normally decided by the instance that appears to be the most positive under a classifier $w$ (Andrews et al., 2003).
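Problem (4) decouples across bags: each $\theta_i$ appears only in its own hinge and log terms, so a one-dimensional search over $(0, 1)$ per bag is a simple stand-in for the adapted sub-gradient method mentioned in the text. This is an illustrative sketch; the function and argument names are ours.

```python
import numpy as np

def update_theta_i(C, y, s_local, s_global, a, b, grid=None):
    """Grid-search stand-in for the theta-update of problem (4), for a
    single bag.
    s_local  = w^T B_{i j_i}, the best-instance score under the current w
    s_global = w^T Bbar_i, the global-representation score
    a = gamma * exp(-beta * r_i), b = gamma * exp(-beta * (1 - r_i)):
    the Beta-prior parameters."""
    if grid is None:
        grid = np.linspace(1e-4, 1.0 - 1e-4, 999)   # keeps the logs finite
    hinge = np.maximum(0.0, 1.0 - y * (grid * s_local + (1.0 - grid) * s_global))
    barrier = -((a - 1.0) * np.log(grid) + (b - 1.0) * np.log(1.0 - grid))
    return grid[np.argmin(C * hinge + barrier)]
```

With a symmetric prior (a = b) and an inactive hinge, the minimizer is 0.5; increasing `a` (i.e., a smaller positive ratio $r_i$) pushes the weight toward the local representation, matching the paper's intuition.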
But some methods, such as an adapted sub-gradient method, can still be used to find an optimal or locally optimal solution efficiently. Given $\theta$ from the previous step, $w^{(k)}$ can then be optimized by:

$\min_{w} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \max\{0,\, 1 - Y_i(\theta_i \max_j w^T B_{ij} + (1-\theta_i) w^T \bar{B}_i)\}$.  (5)

This is still a non-convex non-smooth optimization problem, but its form is much less complicated than that of problem (3). It can be solved in various ways, such as the constrained concave-convex procedure (CCCP) (Yuille & Rangarajan, 2003). However, the computational cost of solving this problem is non-trivial. In several recent works, the bundle method has shown superior efficiency and effectiveness over state-of-the-art methods (Joachims, 2006; Joachims et al., 2009; Smola et al., 2007; Teo et al., 2010). One major drawback of this method, however, is that it can only be employed for convex optimization problems. In (Fuduli et al., 2004; Hare & Sagastizabal, 2010; Noll, 2012), several heuristics are employed to handle this issue for the bundle method. In this paper, inspired by (Fuduli et al., 2004), we adapt the bundle method to the proposed optimization problem in the next section. Based on these updating schemes, problems (4) and (5) are solved alternately until convergence.

Note that the proposed formulation is inductive in the classifier $w$ but transductive in $\theta_i$. So, if we only need to predict the unlabeled instances in the unlabeled set, we can directly apply the learned classifier. If the prediction is made at the bag level, for an unlabeled example $(\bar{B}_u, B_u)$, $j = 1, \ldots, n_u$, its hidden variable $\theta_u$ can be estimated as $\theta_u = E(\theta_u \mid \bar{B}_u, B_u) = e^{-\beta r_u} / (e^{-\beta r_u} + e^{-\beta(1-r_u)})$, where $r_u$ is the positive instance ratio within this bag, estimated from the learned classifier $w$. Then $f(\bar{B}_u, B_u) = \theta_u \max_j w^T B_{uj} + (1-\theta_u) w^T \bar{B}_u$: if $f(\bar{B}_u, B_u) > 0$, the example is labeled as positive, and otherwise as negative.

2.3. Bundle Method for Non-Convex Non-Smooth Optimization

The traditional bundle method looks for a set of cutting planes that can serve as lower bounds of the original convex objective function. For non-convex optimization problems, however, these cutting planes no longer serve as lower bounds of the objective function, as shown in Fig. 1.
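The failure of lower-bounding cutting planes in the non-convex case is easy to check numerically. A toy 1-D example of ours (not the paper's): tangent planes of a convex function lie below it everywhere, while for a non-convex function they do not.

```python
import numpy as np

def cutting_plane(f, fprime, w0):
    """Return the first-order (tangent) model of f at w0."""
    return lambda w: f(w0) + fprime(w0) * (w - w0)

def convex(w): return w * w
def convex_prime(w): return 2.0 * w

def nonconvex(w): return -abs(w)            # concave, with a kink at 0
def nonconvex_prime(w): return -1.0 if w >= 0 else 1.0

ws = np.linspace(-3, 3, 61)
plane = cutting_plane(convex, convex_prime, 1.0)
plane2 = cutting_plane(nonconvex, nonconvex_prime, 1.0)
print(all(plane(w) <= convex(w) + 1e-12 for w in ws))      # True
print(all(plane2(w) <= nonconvex(w) + 1e-12 for w in ws))  # False
```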
Some research works consider shifting the affine pieces downwards (Noll, 2012; Schramm & Zowe, 1992). However, the amount of shifting appears arbitrary (Fuduli et al., 2004). In this section, the bundle method, which is based on first-order approximation, is adapted to solve problem (5). In particular, the intended objective function can be cast into the following framework:

$\min_{w} F(w) = \Omega(w) + R_{emp}(w)$,  (6)

where $\Omega(w)$ is a non-negative convex differentiable regularizer and $R_{emp}(w)$ is a non-convex non-smooth loss function. In problem (5), $\Omega(w) = \frac{1}{2}\|w\|^2$ and $R_{emp}(w) = C \sum_{i=1}^{n} \max\{0,\, 1 - Y_i(\theta_i \max_j w^T B_{ij} + (1-\theta_i) w^T \bar{B}_i)\}$. The method handles this non-convex non-smooth problem in an iterative way and exhibits both convex and non-convex behavior relative to the current point in the iterative procedure. More precisely, at the $t$-th iteration of the bundle method, it maintains two sets of cutting planes, $I_+ \triangleq \{j \mid \alpha_j^{(t)} \geq 0\}$ and $I_- \triangleq \{j \mid \alpha_j^{(t)} < 0\}$, where $j = 1, \ldots, t-1$ and

$\alpha_j^{(t)} \triangleq R_{emp}(w^{(t-1)}) - R_{emp}(w^{(j)}) - g_j^T (w^{(t-1)} - w^{(j)})$.  (7)

Here, $g_j \in \partial_w R_{emp}(w^{(j)})$ (footnote 3).

Figure 1. Approximation of $R_{emp}(w)$ at $w_4$. The cutting planes from other points either over- or underestimate the value at and in the vicinity of $w_4$, and the sign of $\alpha_i$ will not change in the vicinity of $w_4$. Based on this locality characteristic, we adapt the bundle method in Section 2.3.

Footnote 3: For simplicity, we abuse the superscript; in this section the superscript $t$ denotes the $t$-th iteration of the bundle method.
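The linearization errors of Eq. (7) and the split of the bundle into $I_+$ and $I_-$ can be computed as follows (a sketch with helper names and signatures of our choosing; `R` is the empirical loss and `grad` returns one of its subgradients):

```python
import numpy as np

def linearization_errors(R, grad, ws, t):
    """Compute alpha_j^(t) = R(w^(t-1)) - R(w^(j)) - g_j^T (w^(t-1) - w^(j))
    for j = 1..t-1 (Eq. (7)), and partition indices into I_plus / I_minus."""
    w_cur = ws[t - 1]
    alphas = {}
    for j in range(t - 1):
        g = grad(ws[j])
        alphas[j] = R(w_cur) - R(ws[j]) - g @ (w_cur - ws[j])
    I_plus = [j for j, a in alphas.items() if a >= 0]
    I_minus = [j for j, a in alphas.items() if a < 0]
    return alphas, I_plus, I_minus
```

For a convex loss all linearization errors are non-negative, so $I_-$ stays empty; negative errors only appear when non-convexity is encountered.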
1 .Therefore,duringeachitera-tion,thenewoptimalpointshouldtradeo minimizing+(w)andproximitykww(t1)kwiththeconstraint+(w)(w)asfollows:minw;P(w;\r(t))=\r(t)(+\n(w))+1 2kww(t1)k2(9)s:t:gTj(ww(t1)) (t)j;j2I+;gTj(ww(t1)) (t)j;j2I;where\r(t)isthenon-negativeproximitycontrolpa-rameterforthet-thiterationthatbalancestheobjec-tivefunctionvalueandtheproximityoftheupdatedpoint.Thisproblemcanbesolvedecientlythroughitsdualform,sincebothofthesetsI+andIaresmall.Supposew(t)=argminwP(w;\r(t)).Ifnotcomputationallyexpensive,alinesearchcanbeper-formedbetweenw(t)andw(t1)onF(w)suchthatabettersolutioncanbefound.IftheoptimalsolutioncanresultinadrasticdecreaseintheobjectivefunctionF(w),itiscalledaseriousstepandtheoptimalsolutionforwwillbeupdated.Otherwise,itisconsideredasanullstep,theoptimalsolutionforthepreviousstepiskept,andtheprox-imityparameterwillshrinkforabettersolution.Ifkw(t)w(t1)kislessthanaprede nedthreshold,theproximityparameterwillalsoshrinktodoamorethoroughsearchwithinthatregion.Theclassicbundlemethodusuallycheckswhetherthedi erencebetweentheobjectivefunctionvalueandthecuttingplanefunctionvalueislessthanathresh-old.Ifso,theiterationterminates.Here,thisstrat-egycannotbeusedbecausethecuttingplanesofthenon-convexfunctioncannotbeconsideredasthelowerboundsfortheoriginalobjectivefunctionanymore.Intheproposedmethod,duringeachiteration,twostoppingcriteriawillbechecked.The rststoppingcriteriaistocheckwhether\r(t)issmallerthanaspec-i edthreshold1.Thisisbecausealthoughwehopethatthenewupdatedpointshouldfallwithinasmallregionofw(t1),if\r(t)becomestoosmall,w(t)isun-likelytodeviatetoomuchfromw(t1),andthere-sultswillnotbemeaningful.Anextremeexampleisif\r(t)=0,thenw(t)=w(t1).Thesecondstoppingcriteriaistocheckwhether02@F(w(t)),i.e.,whetherw(t)canbeconsideredasastationarypointforF(w).Inpractice,wecheckwhetherkok=F(w(t)),whereo=mino2convfgjjj2J+gko+@\n(w(t))kandJ+=fi2I+j 
In particular,

$o^* = G\lambda^* + \partial\Omega(w^{(t)})$,  (10)

where $G$ is the matrix whose columns are the subgradients $g_j$ for $j \in J_+$, and $\lambda^*$ can be obtained by solving $\lambda^* = \arg\min_{\lambda} \lambda^T G^T G \lambda + 2(\partial\Omega(w^{(t)}))^T G \lambda$ s.t. $\lambda^T \mathbf{1} = 1, \lambda \geq 0$.

3. Discussions and Theoretical Analysis

The proposed bundle method is summarized in Table 1. The major advantage of the proposed method over (Fuduli et al., 2004) is that it better exploits the structure of the objective function by treating the convex and non-convex parts separately. It therefore eliminates the unnecessary first-order approximation of the convex part. In this way, the cutting-plane approximation of the whole objective function is theoretically more accurate than the one used in (Fuduli et al., 2004).

In (Bergeron et al., 2012), the authors directly applied (Fuduli et al., 2004) to MIL. However, there are several major differences between these two papers. 1. (Bergeron et al., 2012) only focuses on traditional MIL, and cannot be used to solve MILEAGE. 2. By directly employing (Fuduli et al., 2004), (Bergeron et al., 2012) does not treat the convex and non-convex parts separately either, and therefore its first-order approximation is less accurate than the one used in this paper.

Input: 1. The objective function: $\Omega(w) + R_{emp}(w)$. 2. Parameters: descent coefficient $m = 0.1$, initial proximity control parameter $\gamma^{(1)} = 1$, deviation parameters $\epsilon_1 = 0.01$, $\epsilon_2 = 0.1$ and $\delta = 0.01$, decay coefficient $\eta = 0.9$, gradient precision $\epsilon = 0.01$.
Output: $w^*$.
1. Initialize $w^{(1)}$, $t = 1$.
repeat:
2. $t = t + 1$.
3. Get $(w^{(t)}, \xi^{(t)})$ by solving the dual of problem (9).
4. If $F(w^{(t)}) \leq F(w^{(t-1)}) + m(\xi^{(t)} + \Omega(w^{(t)}) - \Omega(w^{(t-1)}))$ and $\|w^{(t)} - w^{(t-1)}\| \geq \delta$
5.   $\gamma^{(t)} = \gamma^{(t-1)}$
6. else
7.   $\gamma^{(t)} = \eta\, \gamma^{(t-1)}$
8. If $\gamma^{(t)} \leq \epsilon_1$, then exit.
9. If $F(w^{(t)}) > F(w^{(t-1)}) + m(\xi^{(t)} + \Omega(w^{(t)}) - \Omega(w^{(t-1)}))$, then $w^{(t)} = w^{(t-1)}$.
10. end if
11. $I_+ = \emptyset$, $I_- = \emptyset$.
12. for $j = 1 \ldots t$
13. Evaluate $\alpha_j^{(t)}$ according to Eq. (7); if $\alpha_j^{(t)} \geq 0$, then $I_+ = I_+ \cup \{j\}$; if $\alpha_j^{(t)} < 0$, then $I_- = I_- \cup \{j\}$.
14. end for
15. Compute $o^*$ according to Eq. (10). If $\|o^*\| \leq \epsilon F(w^{(t)})$, then exit.
until the algorithm terminates
16. $w^* = w^{(t)}$.
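Steps 4-10 of Table 1, the serious/null-step decision and proximity-parameter decay, can be sketched as a single helper (parameter defaults follow Table 1; the function name and calling convention are ours):

```python
def proximity_update(F_new, F_old, xi, om_new, om_old, step_norm,
                     gamma, m=0.1, eta=0.9, delta=0.01):
    """Serious/null-step control of Table 1 (steps 4-10), as a sketch.
    Keep gamma when the objective decreased sufficiently AND the step
    moved at least delta; otherwise decay gamma by eta. A step without
    sufficient decrease is a null step (the caller rolls w back)."""
    sufficient = F_new <= F_old + m * (xi + om_new - om_old)
    new_gamma = gamma if (sufficient and step_norm >= delta) else eta * gamma
    return sufficient, new_gamma
```

The caller then checks `new_gamma <= eps1` (step 8) and the stationarity gap (step 15) as the two stopping criteria.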
Table 1. The proposed bundle method for non-convex non-smooth optimization.

In (Do & Artieres, 2009), a non-convex formulation for hidden Markov models is also solved by adapting the bundle method to the non-convex case and treating the convex and non-convex parts separately. The adapted method is reasonable in that it tunes the cutting plane at each iteration according to a comparison with the previous "optimal" cutting plane. However, even with this tuning, the obtained cutting plane is still not able to serve as a lower bound of the objective function. In contrast, the proposed method does not focus on looking for a lower bound, but on some important local properties around each point.

Furthermore, based on the proposed bundle method, some important properties are analyzed in Theorem 1 and Theorem 2.

Theorem 1: Suppose $D = \max_t \Omega(w^{(t)})$ and $R = \max_j \|g_j\|$; then $-\frac{\gamma_0^2}{2} R^2 \leq P(w^{(t)}; \gamma^{(t)}) \leq \gamma_0 D$. In solving problem (5), $-\frac{\gamma_0^2 C^2}{2} \max\{\max_{i,j} \|B_{ij}\|^2, \max_i \|\bar{B}_i\|^2\} \leq P(w^{(t)}; \gamma^{(t)}) \leq \gamma_0 D$.
Proof: Please refer to the Supplemental Materials.

Theorem 2: The bundle method terminates after at most $\log\frac{\epsilon_1}{\gamma_0} / \log(\eta) + \frac{2E\gamma_0}{m\epsilon^2}$ steps, given that $\min R_{emp}(w) + \Omega(w)$ is upper bounded by $E$. In solving problem (5), the algorithm terminates after at most $\log\frac{\epsilon_1}{\gamma_0} / \log(\eta) + \frac{2nC\gamma_0}{m\epsilon^2}$ steps.
Proof: Please refer to the Supplemental Materials.

Suppose the class of classifiers satisfies $\|w\| \leq B$ and is obtained from the iterative updates. Since the proposed method can easily be extended to the kernel case, $F_B$ is defined as $\{f \mid f: (\bar{B}_i, B_i) \to \theta_i \max_j w^T \phi(B_{ij}) + (1-\theta_i) w^T \phi(\bar{B}_i),\; \|w\| \leq B\}$, where $\phi$ is a nonlinear map with kernel function $K(\cdot, \cdot)$. The generalization error bound can be derived from the following theorems:

Theorem 3: The empirical Rademacher complexity of the functional space $F_B$ on $D = \{(\bar{B}_i, B_i, Y_i), i = 1, \ldots, n\}$ is upper bounded by $\frac{2B}{n} \max_{\varphi_{ij} \geq 0, \varphi_i^T \mathbf{1} = 1} \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \theta_i^2 \varphi_{ij}^2 K(B_{ij}, B_{ij})} + \frac{2B}{n} \sqrt{\sum_{i=1}^{n} (1-\theta_i)^2 K(\bar{B}_i, \bar{B}_i)}$.
Proof: Please refer to the Supplemental Materials.

Theorem 4: Fix $\delta \in (0, 1)$. Then, with probability at least $1 - \delta$, every $f \in F_B$ satisfies $P(y \neq \mathrm{sign}(f(\bar{B}_i, B_i))) \leq \frac{1}{n} \sum_{i=1}^{n} \max\{0,\, 1 - Y_i(\theta_i \max_j w^T B_{ij} + (1-\theta_i) w^T \bar{B}_i)\} + \frac{2B}{n} \max_{\varphi_{ij} \geq 0, \varphi_i^T \mathbf{1} = 1} \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \theta_i^2 \varphi_{ij}^2 K(B_{ij}, B_{ij})} + \frac{2B}{n} \sqrt{\sum_{i=1}^{n} (1-\theta_i)^2 K(\bar{B}_i, \bar{B}_i)} + 3\sqrt{\frac{\ln(2/\delta)}{2n}}$.
Proof: It can be proved by applying Theorem 3 to Theorem 4.9 in (Shawe-Taylor & Cristianini, 2004).
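For a linear kernel, the Theorem 3 bound is directly computable: the inner maximization of a non-negatively weighted sum over each probability simplex is attained at a vertex, so each bag contributes its largest $\theta_i^2 K(B_{ij}, B_{ij})$. An illustrative computation of ours, not code from the paper:

```python
import numpy as np

def rademacher_bound(bags, globals_, theta, B=1.0):
    """Upper bound of Theorem 3 with a linear kernel K(x, x) = x^T x.
    Each bag's simplex maximum is attained at a vertex, i.e. at its
    instance with the largest squared norm."""
    n = len(bags)
    local = sum(th ** 2 * max(x @ x for x in Bi)
                for Bi, th in zip(bags, theta))
    glob = sum((1.0 - th) ** 2 * (bi @ bi)
               for bi, th in zip(globals_, theta))
    return (2.0 * B / n) * np.sqrt(local) + (2.0 * B / n) * np.sqrt(glob)
```

Evaluating the bound at $\theta_i = 1$ (pure local) and $\theta_i = 0$ (pure global) recovers the two single-representation complexities that Theorem 5 compares against.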
From Theorem 3 and Theorem 4, it can be seen that the derived Rademacher complexity and generalization error bound are related to both the local and the global feature representations. Theorem 5 states the case in which the Rademacher complexity can be improved compared with both the local and the global feature representations.

Theorem 5: Suppose $\theta_i \leq \max\{\frac{C_2}{C_1} a,\; 1 - \frac{C_2}{C_1}(1-a)\}$, $i = 1, \ldots, n$, for $a \in [0, 1]$, where $C_1 = \frac{2B}{n} \max_{\varphi_{ij} \geq 0, \varphi_i^T \mathbf{1} = 1} \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \varphi_{ij}^2 K(B_{ij}, B_{ij})}$ and $C_2 = \frac{2B}{n} \sqrt{\sum_{i=1}^{n} K(\bar{B}_i, \bar{B}_i)}$; then $\frac{2B}{n} \max_{\varphi_{ij} \geq 0, \varphi_i^T \mathbf{1} = 1} \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n_i} \theta_i^2 \varphi_{ij}^2 K(B_{ij}, B_{ij})} +$