MILEAGE: Multiple Instance LEArning with Global Embedding

Dan Zhang^1 (danzhang2008@gmail.com), Jingrui He^2 (jingrui.he@gmail.com), Luo Si^3 (lsi@cs.purdue.edu), Richard D. Lawrence^4 (ricklawr@us.ibm.com)

^1 Facebook Incorporation, Menlo Park, CA 94025
^2 Computer Science Department, Stevens Institute of Technology, Hoboken, NJ 07030
^3 Computer Science Department, Purdue University, West Lafayette, IN 47907
^4 IBM T.J. Watson Research Center, Yorktown Heights, NY 10562

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

Abstract

Multiple Instance Learning (MIL) generally represents each example as a collection of instances, so that the features of local objects can be better captured, whereas traditional methods typically extract a global feature vector for each example as an integral whole. However, there is limited research investigating which of the two learning scenarios performs better. This paper proposes a novel framework, Multiple Instance LEArning with Global Embedding (MILEAGE), in which the global feature vectors used by traditional learning methods are integrated into the MIL setting. Within the proposed framework, a large margin method is formulated that adaptively tunes the weights on the two different kinds of feature representations (i.e., global and multiple instance) for each example and trains the classifier simultaneously. An extensive set of experiments is conducted to demonstrate the advantages of the proposed method.

1. Introduction

Traditional learning methods usually consider each example as one non-separable entity and represent the whole content of the example by one feature vector. However, the semantic meanings of an example sometimes vary among its constituent parts. Multiple Instance Learning (MIL) has been proposed to deal with problems where the output information is only known for bags of items/instances, as opposed to for each example. More precisely, in a MIL setting, each example/bag is divided into several different parts/instances. Labels are assigned to bags rather than to individual instances. A bag is labeled as positive if it contains at least one positive instance; otherwise it is labeled as negative. In this paper, for each example, the feature vector extracted in the same way as for traditional non-MIL methods (i.e., treating the example as an integral entity) is referred to as the global representation of this example, while its local representation is the set of instances extracted from the parts of this example, as in MIL. To some extent, the global representation of an example can also be considered as its bag-level features. Numerous methods have been developed for MIL classification (Andrews et al., 2003; Dietterich et al., 1998; Kim & De la Torre, 2010) and its variants, such as outlier detection (Wu et al., 2010), online learning (Babenko et al., 2011), ranking (Hu et al., 2008), etc. These methods have been widely employed in areas such as text mining (Andrews et al., 2003) and localized content based image retrieval (LCBIR) (Rahmani & Goldman, 2006).

Most previous MIL methods focus on improving classification performance under the local representation. However, few of them investigate whether the local representation is always better than the global one. This has posed a big challenge for researchers deciding what kind of algorithm should be used in real-world applications. In (Ray & Craven, 2005), the authors compared the performance of traditional and MIL methods. However, their work is still based on the local representation, and adapts the traditional learning methods to the local representation. Although rarely studied, it is intuitive that the true positive rates in positive bags could affect the performance of local and global representations significantly. This is because if the true positive rate in a positive bag is low, its global representation will be dominated by the irrelevant parts of the example, while methods based on the local representation can still pick out the true positive instances for training. On the contrary, if an example has few irrelevant parts, the global representation tends to be more informative than the local one, since methods based on local representations normally focus on some local parts of each example. This intuition is also verified empirically by the experiments conducted in Section 4.1. When incorporating this intuition into real applications, the major challenge is to learn, for each training example, whether the local representation is better or the global one tends to prevail.

To address this challenge, a novel research framework, Multiple Instance LEArning with Global Embedding (MILEAGE), is proposed. MILEAGE leverages the benefits of both local and global representations such that, in general, it can achieve better performance than both MIL and traditional learning methods. From another perspective, the local and global feature representations can be treated as two information sources, each carrying some auxiliary information to improve classification performance, which is similar to the basic motivation of multi-view learning methods (Joachims et al., 2001).
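The positive-ratio intuition discussed above can be checked with a small simulation. The snippet below is purely illustrative and not part of the paper's method: it assumes, for the sake of the example, that a bag's global representation is the average of its instance vectors, fixes a scoring direction w, and compares the MIL-style max-instance (local) score against the averaged (global) score as the positive ratio varies.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([1.0, 0.0])           # fixed scoring direction (hypothetical)
pos_mean = np.array([2.0, 0.0])    # relevant (positive) part of a bag
neg_mean = np.array([-2.0, 0.0])   # irrelevant/background parts

def make_bag(pos_ratio, n=20):
    """Synthetic bag: a pos_ratio fraction of instances come from the positive cluster."""
    k = max(1, int(round(pos_ratio * n)))
    pos = rng.normal(pos_mean, 1.0, size=(k, 2))
    neg = rng.normal(neg_mean, 1.0, size=(n - k, 2))
    return np.vstack([pos, neg])

results = {}
for ratio in (0.05, 0.9):
    scores = [make_bag(ratio) @ w for _ in range(200)]
    local = np.mean([s.max() for s in scores])    # MIL: most positive instance
    glob = np.mean([s.mean() for s in scores])    # global: bag average (assumption)
    results[ratio] = (local, glob)
    print(f"positive ratio {ratio}: local score {local:.2f}, global score {glob:.2f}")
```

At a low positive ratio the global score is dragged negative by the background instances while the max-instance score stays positive; at a high positive ratio both are positive and the global score becomes a reliable, lower-variance summary, which is exactly the trade-off MILEAGE tries to learn per example.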
To solve the proposed framework, a novel method is designed that adaptively tunes the importance of the two different representations. It is based on the intuition that the local representation tends to perform better when the positive ratio is small. An iterative method is employed to solve the derived optimization problem. To accelerate the optimization, inspired by (Fuduli et al., 2004), we adapt the bundle method to solve the resulting non-convex non-smooth problem by explicitly considering the convex regularization and the non-convex loss terms. Discussions and theoretical analysis are provided on important properties such as the convergence rate and the generalization error rate of the proposed method. Experiments on image and text datasets, and on a novel application, Insider Threat Detection, demonstrate the advantages of the proposed method.

2. Methodology

2.1. Problem Statement and Notation

Suppose a set of examples D = {(B_i, B̄_i, Y_i); i = 1,...,n} is given, where B̄_i ∈ R^{d×1} denotes the global representation of the i-th example and Y_i ∈ {1, −1} is its binary label. Along with the global feature representation, for each example its local feature representations, i.e., the instances for the different parts of this example, are also available (the notions of global and local representations are defined in Section 1). The instances in the i-th bag are denoted B_i = {B_{i1}, B_{i2},...,B_{in_i}} ∈ R^{d×n_i} ^1, where n_i is the number of instances in the i-th bag. Throughout the paper, the subscript j ranges over j = 1,...,n_i. Given an unlabeled example B̄_u and its associated local representations B_u, the objective of Multiple Instance LEArning with Global Embedding (MILEAGE) is to design a function f : (B_u, B̄_u) → R such that the classification of this unlabeled example is accurate. If f(B_u, B̄_u) ≥ 0, this example is classified as positive, and otherwise as negative.

2.2. Method

For each bag, a weight variable is introduced to balance the importance of the two representations. The weight is decided both by the prior knowledge from the positive ratio of each bag and by the fitness of the data. Without loss of generality, given a specific example B̄_i and its associated instances B_i, the classifier takes the following form:

    f(B_i, B̄_i) = α_i max_j w^T B_{ij} + (1 − α_i) w^T B̄_i,    (1)

where 1 ≥ α_i ≥ 0 is the convex combination coefficient for the i-th example, w ∈ R^{d×1} is the linear classifier, and we assume that the bias has already been absorbed into the feature vectors. max_j w^T B_{ij} is the output from the local representation of the i-th example^2, whereas w^T B̄_i is the output from its global representation. f(B_i, B̄_i) balances these two outputs through the weight α_i. From a Bayesian perspective, given a dataset, the logarithm of the posterior distribution of w and α can be written as follows:

    log P(w, α | D) ∝ log P(D | w, α) P(w) ∏_{i=1}^n P(α_i),    (2)

where α = [α_1,...,α_n]. Here, we assume that the examples are generated i.i.d. P(w) follows the Gaussian distribution N(0, I). P(α_i) follows the Beta distribution Beta(γe^{−λr_i}, γe^{−λ(1−r_i)}), where γ and λ are the hyper-parameters and partially control the mean and skewness of the distribution. r_i ∈ [0, 1] is the prior knowledge on the positive ratio of the i-th bag, and can be obtained in various ways. For example, r_i can simply be set to 0.5 if no prior knowledge is available. In practice, a preliminary classifier can be trained beforehand by using SVM on {(B̄_i, Y_i); i = 1,...,n}.

^1 We assume that the local and global representations share the same feature space. However, the proposed formulation can be extended to the case where their feature spaces are different.
^2 The output of each example in MIL is normally decided by the instance that appears to be most positive under a classifier w (Andrews et al., 2003).
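As a concrete reading of Eqs. (1)-(2), the sketch below (a minimal numpy illustration, not the paper's implementation) computes the prior mean of α_i under the Beta prior and the combined score of Eq. (1). The sign convention in the exponents is the one reconstructed above, chosen so that a small positive ratio r_i yields a large weight on the local, max-instance term, matching the stated intuition; the default λ is an arbitrary choice for this example.

```python
import numpy as np

def expected_alpha(r, lam=1.0):
    """Prior mean of alpha under Beta(gamma*exp(-lam*r), gamma*exp(-lam*(1-r))).

    gamma cancels in the mean: E[alpha] = e^{-lam*r} / (e^{-lam*r} + e^{-lam*(1-r)}).
    Small r (few positive instances) -> alpha above 0.5, favoring the local term.
    """
    return np.exp(-lam * r) / (np.exp(-lam * r) + np.exp(-lam * (1 - r)))

def mileage_score(w, instances, global_rep, alpha):
    """Eq. (1): alpha * max_j w^T B_ij + (1 - alpha) * w^T Bbar_i."""
    local = max(w @ x for x in instances)   # most positive instance
    return alpha * local + (1 - alpha) * (w @ global_rep)
```

For example, `expected_alpha(0.5)` is exactly 0.5, so with an uninformative prior the two representations start out equally weighted.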
Then, r_i can be estimated by applying this classifier to the instances in each bag. It is clear that E(α_i) = e^{−λr_i}/(e^{−λr_i} + e^{−λ(1−r_i)}). Given w and α, the probability of generating the dataset D can be described through the hinge loss as P(D | w, α) ∝ ∏_{i=1}^n e^{−C max{0, 1 − Y_i(α_i max_j w^T B_{ij} + (1−α_i) w^T B̄_i)}}, where C is a parameter. Then, maximizing Eq. (2) is equivalent to solving the following problem:

    min_{w, α, ξ_i ≥ 0}  (1/2)||w||² + C Σ_{i=1}^n ξ_i − Σ_{i=1}^n [(γe^{−λr_i} − 1) log α_i + (γe^{−λ(1−r_i)} − 1) log(1 − α_i)]    (3)
    s.t. ∀i ∈ {1,...,n}:  Y_i(α_i max_j w^T B_{ij} + (1 − α_i) w^T B̄_i) ≥ 1 − ξ_i.

This formulation is non-convex and cannot be solved directly. An iterative method is employed to solve this problem. In particular, at the k-th iteration, given w^{(k−1)}, the weights α_1,...,α_n can be updated by:

    min_α  C Σ_{i=1}^n max{0, 1 − Y_i(α_i w^{(k−1)T} B_{i j_i^{(k−1)}} + (1 − α_i) w^{(k−1)T} B̄_i)} − Σ_{i=1}^n [(γe^{−λr_i} − 1) log α_i + (γe^{−λ(1−r_i)} − 1) log(1 − α_i)],    (4)

where j_i^{(k−1)} = argmax_j w^{(k−1)T} B_{ij}. The convexity of this objective function cannot be determined, since the signs of (γe^{−λr_i} − 1) and (γe^{−λ(1−r_i)} − 1) are not clear. But some methods, such as the adapted subgradient method, can still be used to find an optimal or locally optimal solution efficiently. Given α from the previous step, w^{(k)} can be optimized by:

    min_w  (1/2)||w||² + C Σ_{i=1}^n max{0, 1 − Y_i(α_i max_j w^T B_{ij} + (1 − α_i) w^T B̄_i)}.    (5)

This is still a non-convex non-smooth optimization problem, but its form is much less complicated than that of problem (3). It can be solved in various ways, such as by the constrained concave-convex procedure (CCCP) (Yuille & Rangarajan, 2003). However, the computational cost of solving this problem is non-trivial. In several recent works, the bundle method has shown superior performance in both efficiency and effectiveness over state-of-the-art methods (Joachims, 2006; Joachims et al., 2009; Smola et al., 2007; Teo et al., 2010). However, one major drawback of this method is that it can only be employed to solve convex optimization problems. In (Fuduli et al., 2004; Hare & Sagastizabal, 2010; Noll, 2012), several heuristics are employed to handle this issue for the bundle method. In this paper, inspired by (Fuduli et al., 2004), we adapt the bundle method to solve the proposed optimization problem in the next section. Based on these updating schemes, problems (4) and (5) are solved iteratively until convergence.

It is clear that the proposed formulation is inductive in the classifier w but transductive in α_i. So, if we only need to predict the unlabeled instances in the unlabeled set, we can directly apply the learned classifier. If the prediction is made at the bag level for an unlabeled example (B_u, B̄_u) with instances B_{uj}, j = 1,...,n_u, its hidden variable α_u can be estimated as α_u = E(α_u | B_u, B̄_u) = e^{−λr_u}/(e^{−λr_u} + e^{−λ(1−r_u)}), where r_u is the positive instance ratio within this bag, estimated from the learned classifier w. Then f(B_u, B̄_u) = α_u max_j w^T B_{uj} + (1 − α_u) w^T B̄_u. If f(B_u, B̄_u) ≥ 0, the example is labeled as positive; otherwise it is labeled as negative.

2.3. Bundle Method for Non-Convex Non-Smooth Optimization

The traditional bundle method looks for a set of cutting planes that serve as lower bounds of the original convex objective function. For non-convex optimization problems, however, these cutting planes can no longer serve as lower bounds of the objective function, as shown in Fig. 1. Some research works consider shifting the affine pieces downwards (Noll, 2012; Schramm & Zowe, 1992). However, the amount of the shifting appears arbitrary (Fuduli et al., 2004).
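Before turning to the adapted bundle machinery, note that the α-subproblem (4) of Section 2.2 decomposes over bags and each α_i is a scalar, so its per-bag term can be minimized by a simple one-dimensional search. The sketch below is an illustrative stand-in for the adapted subgradient method mentioned above; the parameter values (C, gamma, lam) and the grid resolution are choices of this example, not of the paper.

```python
import numpy as np

def update_alpha(local_out, global_out, y, r, C=1.0, gamma=4.0, lam=1.0, grid=2000):
    """Minimize one bag's term of problem (4) over alpha in (0, 1).

    local_out  = w^T B_{i j_i}  (output of the best instance under the current w)
    global_out = w^T Bbar_i     (output of the global representation)
    The term is hinge loss minus the Beta log-prior; it is scalar in alpha, so a
    dense grid search is a simple substitute for subgradient steps.
    """
    a = np.linspace(1e-6, 1 - 1e-6, grid)
    hinge = np.maximum(0.0, 1.0 - y * (a * local_out + (1.0 - a) * global_out))
    log_prior = ((gamma * np.exp(-lam * r) - 1.0) * np.log(a)
                 + (gamma * np.exp(-lam * (1.0 - r)) - 1.0) * np.log(1.0 - a))
    return a[np.argmin(C * hinge - log_prior)]
```

With a low prior positive ratio and a correct local output, the minimizer lands above 0.5 (the bag leans on its local representation); swapping the roles pushes it below 0.5, which is the adaptive behavior the formulation is designed to produce.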
In this section, the bundle method, which is based on first-order approximation, is adapted to solve problem (5). In particular, the intended objective function can be cast in the following framework:

    min_w F(w) = Ω(w) + R_emp(w),    (6)

where Ω(w) is a non-negative convex differentiable regularizer and R_emp(w) is a non-convex non-smooth loss function. In problem (5), Ω(w) = (1/2)||w||² and R_emp(w) = C Σ_{i=1}^n max{0, 1 − Y_i(α_i max_j w^T B_{ij} + (1 − α_i) w^T B̄_i)}. The method handles this non-convex non-smooth problem in an iterative way and exhibits both convex and non-convex behavior relative to the current point in the iterative procedure. More precisely, at the t-th iteration of the bundle method, it maintains two sets of cutting planes, i.e., I₊ ≜ {j | β_j^{(t)} ≥ 0} and I₋ ≜ {j | β_j^{(t)} < 0}, where j = 1,...,t−1 and

    β_j^{(t)} ≜ R_emp(w^{(t−1)}) − R_emp(w^{(j)}) − g_j^T (w^{(t−1)} − w^{(j)}).    (7)

Here, g_j ∈ ∂_w R_emp(w^{(j)})^3.

Figure 1. Approximation of R_emp(w) at w_4. The cutting planes from other points either over- or under-estimate the value at and in the vicinity of w_4, and the sign of β_j will not change in the vicinity of w_4. Based on this locality characteristic, we adapt the bundle method in Section 2.3.

Then, the following two sets of affine functions are defined:

    Δ₊(w) ≜ max_{j∈I₊} g_j^T (w − w^{(t−1)}) − β_j^{(t)},   Δ₋(w) ≜ min_{j∈I₋} g_j^T (w − w^{(t−1)}) − β_j^{(t)}.    (8)

It is clear that Δ₊(w) is an approximation of R_emp(w) − R_emp(w^{(t−1)}), while Δ₋(w) is its locally pessimistic estimation. These approximations are only locally valid around the local minimum point. The meanings of β_j^{(t)} and the locality property are illustrated in Fig. 1. Therefore, during each iteration, the new optimal point should trade off minimizing Δ₊(w) and the proximity ||w − w^{(t−1)}||, subject to the constraint Δ₊(w) ≤ Δ₋(w), as follows:

    min_{w,Δ} P(w, γ^{(t)}) = γ^{(t)} (Δ + Ω(w)) + (1/2) ||w − w^{(t−1)}||²    (9)
    s.t. g_j^T (w − w^{(t−1)}) − β_j^{(t)} ≤ Δ,  j ∈ I₊;
         g_j^T (w − w^{(t−1)}) − β_j^{(t)} ≥ Δ,  j ∈ I₋,

where γ^{(t)} is the non-negative proximity control parameter for the t-th iteration that balances the objective function value and the proximity of the updated point. This problem can be solved efficiently through its dual form, since both sets I₊ and I₋ are small. Suppose w^{(t)} = argmin_w P(w, γ^{(t)}). If not computationally expensive, a line search can be performed between w^{(t)} and w^{(t−1)} on F(w) so that a better solution can be found.

If the optimal solution results in a drastic decrease of the objective function F(w), the step is called a serious step and the optimal solution for w is updated. Otherwise, it is considered a null step: the optimal solution from the previous step is kept, and the proximity parameter shrinks for a better solution. If ||w^{(t)} − w^{(t−1)}|| is less than a predefined threshold, the proximity parameter will also shrink, to do a more thorough search within that region. The classic bundle method usually checks whether the difference between the objective function value and the cutting plane function value is less than a threshold; if so, the iteration terminates. Here, this strategy cannot be used, because the cutting planes of a non-convex function can no longer be considered lower bounds of the original objective function. In the proposed method, two stopping criteria are checked during each iteration. The first stopping criterion checks whether γ^{(t)} is smaller than a specified threshold ε₁. This is because, although we hope that the newly updated point falls within a small region around w^{(t−1)}, if γ^{(t)} becomes too small, w^{(t)} is unlikely to deviate much from w^{(t−1)}, and the results will not be meaningful. An extreme example: if γ^{(t)} = 0, then w^{(t)} = w^{(t−1)}.

^3 For simplification, we abuse the superscript. Please note that in this section the superscript t denotes the t-th iteration of the bundle method.
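The plane bookkeeping of Eqs. (7)-(8) can be sketched numerically. The fragment below is an illustrative numpy rendering, not the authors' code: it computes the linearization errors β_j of the stored planes at the current point, splits them into I₊ and I₋, and evaluates the convex local model Δ₊.

```python
import numpy as np

def linearization_errors(w_cur, f_cur, points, values, grads):
    """beta_j from Eq. (7): gap between R_emp at the current point and plane j.

    points[j], values[j], grads[j] hold w^{(j)}, R_emp(w^{(j)}) and a
    subgradient g_j collected at earlier iterations; f_cur = R_emp(w_cur).
    """
    return np.array([f_cur - v - g @ (w_cur - p)
                     for p, v, g in zip(points, values, grads)])

def partition_bundle(beta):
    """Split plane indices into I+ (beta >= 0) and I- (beta < 0)."""
    idx = np.arange(len(beta))
    return idx[beta >= 0], idx[beta < 0]

def delta_plus(w, w_cur, beta, grads, I_plus):
    """Eq. (8): convex local model of R_emp(w) - R_emp(w_cur) built from I+."""
    return max(grads[j] @ (w - w_cur) - beta[j] for j in I_plus)
```

On a convex loss every β_j is non-negative (all planes are lower bounds, so I₋ is empty); it is precisely the non-convexity of R_emp that populates I₋ and motivates the constraint Δ₊(w) ≤ Δ₋(w) in problem (9).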
The second stopping criterion checks whether 0 ∈ ∂F(w^{(t)}), i.e., whether w^{(t)} can be considered a stationary point of F(w). In practice, we check whether ||o|| ≤ ε F(w^{(t)}), where o = argmin_{o ∈ conv{g_j | j ∈ J₊}} ||o + ∂Ω(w^{(t)})|| and J₊ = {i ∈ I₊ | β_i^{(t)} ≤ ε₂}. In particular,

    o = Gθ + ∂Ω(w^{(t)}),    (10)

where G is a matrix whose columns are the subgradients g_j with j ∈ J₊, and θ can be obtained by solving θ = argmin_θ θ^T G^T G θ + 2(∂Ω(w^{(t)}))^T G θ, s.t. θ^T 1 = 1, θ ≥ 0.

3. Discussions and Theoretical Analysis

The proposed bundle method is summarized in Table 1. It is clear that the major advantage of the proposed method over (Fuduli et al., 2004) is that it better exploits the structure of the objective function by treating the convex and non-convex parts separately. It thereby eliminates the unnecessary first-order approximation of the convex part. In this way, the cutting plane approximation of the whole objective function is theoretically more accurate than the one used in (Fuduli et al., 2004). In (Bergeron et al., 2012), the authors directly applied (Fuduli et al., 2004) to MIL. However, there are several major differences between these two papers. 1. (Bergeron et al., 2012) focuses only on traditional MIL and cannot be used to solve MILEAGE. 2. By directly employing (Fuduli et al., 2004), (Bergeron et al., 2012) does not treat the convex and non-convex parts separately either, and therefore its first-order approximation is less accurate than the one used in this paper.

Input:
1. The objective function: Ω(w) + R_emp(w).
2. Parameters: descent coefficient m = 0.1, initial proximity control parameter γ^{(1)} = 1, deviation parameters ε₁ = 0.01, ε₂ = 0.1 and ε₃ = 0.01, decay coefficient κ = 0.9, gradient precision ε = 0.01.
Output: w.

1. Initialize w^{(1)}, t = 1.
repeat:
2.   t = t + 1.
3.   Get (w^{(t)}, Δ^{(t)}) by solving the dual of problem (9).
4.   If F(w^{(t)}) ≤ F(w^{(t−1)}) + m(Δ^{(t)} + Ω(w^{(t)}) − Ω(w^{(t−1)})) and ||w^{(t)} − w^{(t−1)}|| ≥ ε₃
5.     γ^{(t)} = γ^{(t−1)}
6.   else
7.     γ^{(t)} = κ γ^{(t−1)}
8.   If γ^{(t)} ≤ ε₁, then exit.
9.   If F(w^{(t)}) > F(w^{(t−1)}) + m(Δ^{(t)} + Ω(w^{(t)}) − Ω(w^{(t−1)})), then w^{(t)} = w^{(t−1)}.
10.  end if
11.  I₊ = ∅, I₋ = ∅
12.  for j = 1...t
13.    Evaluate β_j^{(t)} according to Eq. (7); if β_j^{(t)} ≥ 0, then I₊ = I₊ ∪ {j}; if β_j^{(t)} < 0, then I₋ = I₋ ∪ {j}.
14.  end for
15.  Compute o according to Eq. (10). If ||o|| ≤ ε F(w^{(t)}), then exit.
until the algorithm terminates
16. w = w^{(t)}.
Table 1. The proposed bundle method for non-convex non-smooth optimization.

In (Do & Artieres, 2009), a non-convex formulation for hidden Markov models is also solved by adapting the bundle method to the non-convex case and treating the convex and non-convex parts separately. The adapted method is reasonable: it tunes the cutting plane at each iteration according to a comparison with the previous "optimal" cutting plane. However, even with this tuning, the obtained cutting plane is still not able to serve as a lower bound of the objective function. In contrast, the proposed method does not focus on looking for a lower bound, but on some important local properties around each point. Furthermore, based on the proposed bundle method, some important properties are analyzed in Theorem 1 and Theorem 2.

Theorem 1: Suppose D = max_t Ω(w^{(t)}) and R = max_j ||g_j||; then −(γ₀² R²)/2 ≤ P(w^{(t)}, γ^{(t)}) ≤ γ₀ D. In solving problem (5), −(γ₀² C²/2) max{max_{i,j} ||B_{ij}||², max_i ||B̄_i||²} ≤ P(w^{(t)}, γ^{(t)}) ≤ γ₀ D.
Proof: Please refer to the Supplemental Materials.

Theorem 2: The bundle method terminates after at most log(ε₁/γ₀)/log(κ) + 2Eγ₀/(mε²) steps, given that min_w R_emp(w) + Ω(w) is upper bounded by E. In solving problem (5), the algorithm terminates after at most log(ε₁/γ₀)/log(κ) + 2nCγ₀/(mε²) steps.
Proof: Please refer to the Supplemental Materials.

Suppose the class of classifiers satisfies ||w|| ≤ B and the classifiers are obtained from the iterative updates. Since the proposed method can easily be extended to the kernel case, F_B is defined as {f | f : (B_i, B̄_i) → α_i max_j w^T φ(B_{ij}) + (1 − α_i) w^T φ(B̄_i), ||w|| ≤ B}, where φ is a nonlinear map with kernel function K(·,·). The generalization error bound can be derived from the following theorems:

Theorem 3: The empirical Rademacher complexity of the function space F_B on D = {(B_i, B̄_i, Y_i); i = 1,...,n} is upper bounded by:

    (2B/n) max_{φ_ij ≥ 0, φ_i^T 1 = 1} √( Σ_{i=1}^n Σ_{j=1}^{n_i} α_i² φ_ij² K(B_{ij}, B_{ij}) ) + (2B/n) √( Σ_{i=1}^n (1 − α_i)² K(B̄_i, B̄_i) ).

Proof: Please refer to the Supplemental Materials.

Theorem 4: Fix δ ∈ (0, 1). Then, with probability at least 1 − δ, every f ∈ F_B satisfies:

    P(y ≠ sign(f(B_i, B̄_i))) ≤ (1/n) Σ_{i=1}^n max{0, 1 − Y_i(α_i max_j w^T B_{ij} + (1 − α_i) w^T B̄_i)} + (2B/n) max_{φ_ij ≥ 0, φ_i^T 1 = 1} √( Σ_{i=1}^n Σ_{j=1}^{n_i} α_i² φ_ij² K(B_{ij}, B_{ij}) ) + (2B/n) √( Σ_{i=1}^n (1 − α_i)² K(B̄_i, B̄_i) ) + 3 √( ln(2/δ) / (2n) ).

Proof: It can be proved by applying Theorem 3 to Theorem 4.9 in (Shawe-Taylor & Cristianini, 2004). From Theorem 3 and Theorem 4, it can be seen that the derived Rademacher complexity and generalization error bound are related to both the local and the global feature representations. Theorem 5 states the case in which the Rademacher complexity can be improved, compared with using either the local or the global feature representation alone.

Theorem 5: Suppose α_i ≤ max{ (C₂/C₁) a, 1 − (C₂/C₁)(1 − a) }, i = 1,...,n, a ∈ [0, 1], where C₁ = (2B/n) max_{φ_ij ≥ 0, φ_i^T 1 = 1} √( Σ_{i=1}^n Σ_{j=1}^{n_i} φ_ij² K(B_{ij}, B_{ij}) ) and C₂ = (2B/n) √( Σ_{i=1}^n K(B̄_i, B̄_i) ); then (2B/n) max_{φ_ij ≥ 0, φ_i^T 1 = 1} √( Σ_{i=1}^n Σ_{j=1}^{n_i} α_i² φ_ij² K(B_{ij}, B_{ij}) ) +