Learning From Weakly Supervised Data by The Expectation Loss SVM (e-SVM) Algorithm

Jun Zhu, Department of Statistics, University of California, Los Angeles. jzh@ucla.edu
Junhua Mao, Department of Statistics, University of California, Los Angeles. mjhustc@ucla.edu
Alan Yuille, Department of Statistics, University of California, Los Angeles. yuille@stat.ucla.edu

Abstract

In many situations we have some measurement of confidence on "positiveness" for a binary label. The "positiveness" is a continuous value whose range is a bounded interval. It quantifies the affiliation of each training datum to the positive class. We propose a novel learning algorithm called expectation loss SVM (e-SVM) for problems where only the "positiveness", instead of a binary label, of each training sample is available. Our e-SVM algorithm can also be readily extended to learn segment classifiers under weak supervision, where the exact positiveness value of each training example is unobserved. In experiments, we show that the e-SVM algorithm can effectively address the segment proposal classification task under both strong supervision (e.g., pixel-level annotations are available) and weak supervision (e.g., only bounding-box annotations are available), and outperforms the alternative approaches. We further validate this method on two major tasks of computer vision: semantic segmentation and object detection. Our method achieves state-of-the-art object detection performance on the PASCAL VOC 2007 dataset.

1 Introduction

Recent work in computer vision relies heavily on manually labeled datasets to achieve satisfactory performance. However, detailed hand-labeling of datasets is expensive and impractical for large datasets such as ImageNet [6]. It is better to have learning algorithms that can work with data that has only been weakly labeled, for example by putting a bounding box around an object instead of segmenting it or parsing it into parts.

In this paper we present a learning algorithm called expectation loss SVM (e-SVM). It requires a method that can generate a set of proposals for the true label (e.g., the exact silhouette of the object). But this set of proposals may be very large, each proposal may be only partially correct (the correctness can be quantified by a continuous value between 0 and 1 called "positiveness"), and several proposals may be required to obtain the correct label. In the training stage, our algorithm can deal with the strongly supervised case where the positiveness of the proposals is observed, and it easily extends to the weakly supervised case by treating the positiveness as a latent variable. In the testing stage, it predicts a label for each proposal and provides a confidence score.

There are some alternative approaches to this problem, such as Support Vector Classification (SVC) and Support Vector Regression (SVR). For the SVC algorithm, because this is not a standard binary classification problem, one might need to binarize the positiveness using ad-hoc heuristics to determine a threshold, which degrades its performance [18]. To address this problem, previous works usually used SVR [4, 18] to train class confidence prediction models in semantic segmentation. However, it is also not a standard regression problem, since the value of positiveness belongs to a bounded interval [0, 1]. We compare our e-SVM to these two related methods on the segment proposal confidence prediction problem. The positiveness of each segment proposal is set as the Intersection over Union (IoU) overlap ratio between the proposal and the pixel-level instance ground truth. We test our algorithm under two types of scenarios with different annotations: pixel-level annotations (positiveness is observed) and bounding-box annotations (positiveness is unobserved). Experiments show that our model outperforms SVC and SVR in both scenarios. Figure 1 illustrates the framework of our algorithm.

Figure 1: The illustration of our algorithm. In the training process, the e-SVM model can handle two types of annotations: pixel-level (strong supervision) and bounding-box (weak supervision) annotations. For pixel-level annotations, we set the positiveness of each proposal as its IoU overlap ratio with the ground truth and train classifiers using the basic e-SVM. For bounding-box annotations, we treat the positiveness as a latent variable and use the latent e-SVM to train classifiers. In the testing process, the e-SVM gives each segment proposal a class label and a confidence score. (Best viewed in color)

We further validate our approach on two fundamental computer vision tasks: (i) semantic segmentation and (ii) object detection.

Firstly, we consider semantic segmentation. There has recently been impressive progress at this task using rich appearance cues. Segments are extracted from images [1, 3, 4, 12], appearance cues are computed for each segment [5, 21, 25], and classifiers are trained using ground truth pixel labeling [18]. Methods of this type are almost always among the winners of the PASCAL VOC segmentation challenge [5]. But all these methods rely on datasets which have been hand-labeled at the pixel level. For this application we generate the segment proposals using CPMC segments [4]. The positiveness of each proposal is set as the Intersection over Union (IoU) overlap ratio. We show that appearance cues learned by e-SVM, using either bounding-box annotations or pixel-level annotations, are more effective than those learned with SVC and SVR on the PASCAL VOC 2011 [9] segmentation dataset. Our algorithm is also flexible enough to utilize additional bounding-box annotations to further improve the results.

Secondly, we address object detection by exploiting the effectiveness of segmentation cues and coupling them to existing object detection methods. For this application, the data is only weakly labeled because the ground truth for object detection is typically specified by bounding boxes (e.g., PASCAL VOC [8, 9] and ImageNet [6]), which means that pixel-level ground truth is not available. We use either CPMC or super-pixels as methods for producing segment proposals. IoU is again used to represent the positiveness of the proposals. We test our approach on the PASCAL dataset using, as our base detector, the Regions with CNN features (RCNN) method [14] (currently state of the art on PASCAL, outperforming previous works by a large margin). This method first uses the selective search method [24] to extract candidate bounding boxes. For each candidate bounding box, it extracts features by deep networks [16] learned on the ImageNet dataset and fine-tuned on PASCAL. We couple our appearance cues to this system by simply concatenating our spatial confidence map features, based on the trained e-SVM classifiers, with the deep learning features, and then training a linear SVM. We show that this simple approach yields an average improvement of 1.5 percent on per-class average precision (AP).

We note that our approach is general. It can use any segment proposal detector, any image features, and any classifier. When applied to object detection it could use any base detector, and we could couple the appearance cues with the base detector in many different ways (we choose the simplest). In addition, it can handle other classification problems where only the "positiveness" of the samples, instead of binary labels, is available.

2 Related work on weakly supervised learning and weighted SVMs

We have introduced some of the most relevant recent works on semantic segmentation and object detection above. In this section, we briefly review related work on weakly supervised learning methods for segment classification, and discuss the connection to instance-weighted SVMs in the literature.

The problem settings of most previous works generally assume that only a set of accompanying words of an image, or a set of image-level labels, is available, which differs from the problem setting in this paper. Multiple Instance Learning (MIL) [7, 2] was adopted to solve these problems [20, 22]. MIL handles cases where at least one positive instance is present in a positive bag and only the labels of a set of bags are available. Vezhnevets et al. [26] proposed a Multi-Image Model (MIM) to solve this problem and showed that the MIL approach in [22] is a special case of MIM. Later, [26] developed MIM into a generalized MIM and used it as their segmentation model. Recently, Liu et al. [19] presented a weakly-supervised dual clustering approach to handle this task.

Our weakly supervised problem setting lies between these settings and the strong supervision case (i.e., full pixel-level annotations are available). It is also very important and useful because bounding-box annotations of large-scale image datasets are already available (e.g., ImageNet [6]), while pixel-level annotations of large datasets are still hard to obtain. This weakly supervised problem cannot be solved by MIL. We cannot assume that at least one "completely" positive instance (i.e., a CPMC segment proposal) is present in a positive bag (i.e., a ground truth instance), since most of the proposals will contain both foreground and background pixels. We will show how our e-SVM and its latent extension address this problem in the next sections.

In the machine learning literature, the weighted SVM (WSVM) methods [23, 27, ?] also use an instance-dependent weight on the cost of each example, and can improve the robustness of model estimation [23], alleviate the effect of outliers [27], leverage privileged information [17], or deal with unbalanced classification problems. The difference between our e-SVM and WSVMs mainly lies in that e-SVM weights labels instead of data points, which leads to each example contributing to the costs of both the positive and negative labels. Although the loss function of the e-SVM model is different from those of WSVMs, it can be effortlessly solved by any standard SVM solver (e.g., LibLinear [10]) like those used in WSVMs. This is an advantage because it does not require a specific solver for the implementation of our e-SVM.

3 The expectation loss SVM model

In this section, we first describe the basic formulation of our expectation loss SVM model (e-SVM) in Section 3.1, for the case when the positiveness of each segment proposal is observed. Then, in Section 3.2, a latent e-SVM model is introduced to handle the weak supervision situation where the positiveness of each segment proposal is unobserved.

3.1 The basic e-SVM model

We are given a set of training images D. Using some segmentation method (we adopt CPMC [4] in this work), we can generate a set of foreground segment proposals {S_1, S_2, ..., S_N} from these images. For each segment S_i, we extract a feature x_i, x_i ∈ R^d. Suppose the pixel-wise annotations are available for all the ground truth instances in D. For each object class, we can calculate the IoU ratio u_i (u_i ∈ [0, 1]) between each segment S_i and the ground truth instance labeling, and set the positiveness of S_i as u_i (although positiveness can be some function of the IoU ratio, for simplicity we just set it as the IoU and use u_i to represent the positiveness in the following paragraphs). Because many foreground segments overlap only partially with the ground truth instances (i.e., 0 < u_i < 1), this is not a standard binary classification problem for training. Of course, we can define a threshold b and treat all segments with u_i ≥ b as positive examples and all segments with u_i < b as negative examples. In this way, the problem is transformed into a Support Vector Classification (SVC) problem. But it needs some heuristics to determine b, and its performance is only partially satisfactory [18].

To address this issue, we propose our expectation loss SVM model as an extension of the classical SVC models. In this model, we treat the label Y_i of each segment as an unobserved random variable, Y_i ∈ {-1, +1}. Given x_i, we assume that Y_i follows a Bernoulli distribution. The probability of Y_i = 1 given x_i (i.e., the success probability of the Bernoulli distribution) is denoted as p_i. We assume that p_i is a function of the positiveness u_i, i.e., p_i = g(u_i). In the experiments, we simply set p_i = u_i. Similar to the traditional linear SVC problem, we adopt a linear prediction function: F(x_i) = w^T x_i + b. For simplicity, we denote [w; b] as w, [x_i; 1] as x_i, and F(x_i) = w^T x_i in the remaining part of the paper. The loss function of our e-SVM is the expectation over the random variables Y_i:

    L(w) = \frac{\lambda_w}{2} w^T w + \frac{1}{N} \sum_{i=1}^{N} E_{Y_i}\left[\max(0, 1 - Y_i w^T x_i)\right]
         = \frac{\lambda_w}{2} w^T w + \frac{1}{N} \sum_{i=1}^{N} \left[ l_i^+ \Pr(Y_i = +1 \mid x_i) + l_i^- \Pr(Y_i = -1 \mid x_i) \right]
         = \frac{\lambda_w}{2} w^T w + \frac{1}{N} \sum_{i=1}^{N} \left\{ l_i^+ g(u_i) + l_i^- [1 - g(u_i)] \right\}    (1)

where l_i^+ = \max(0, 1 - w^T x_i) and l_i^- = \max(0, 1 + w^T x_i). Given the pixel-wise ground truth annotations, g(u_i) is known. From Equation 1, we can see that the model is equivalent to "weighting" each sample with a function of its positiveness. A standard linear SVM solver is used to minimize the loss L(w). In the experiments, we show that the performance of our e-SVM is much better than SVC and slightly better than Support Vector Regression (SVR) on the segment classification task.

3.2 The latent e-SVM model

One of the advantages of our e-SVM model is that we can easily extend it to the situation where only bounding-box annotations are available (this type of labeling is of most interest in this paper). Under this weakly supervised setting, we cannot obtain the exact value of the positiveness (IoU) u_i for each segment. Instead, u_i is treated as a latent variable, determined by minimizing the following loss function:

    L(w, u) = \frac{\lambda_w}{2} w^T w + \frac{1}{N} \sum_{i=1}^{N} \left\{ l_i^+ g(u_i) + l_i^- [1 - g(u_i)] \right\} + \lambda_R R(u)    (2)

where u denotes {u_i}_{i=1,...,N} and R(u) is a regularization term for u. The loss function in Equation 1 is a special case of that in Equation 2, obtained by setting u constant and λ_R equal to 0.

When u is fixed, L(w, u) is a standard linear SVM loss, which is convex with respect to w. When w is fixed, L(w, u) is also a convex function if R(u) is convex with respect to u. The IoU between a segment S_i and the ground truth bounding boxes, denoted as u_i^{bb}, can serve as an initialization for u_i. We can iteratively fix one of u and w and solve for the other, alternating between the two convex optimization problems until convergence. Pseudo-code for the optimization is shown in Algorithm 1.

    Algorithm 1: The optimization for training latent e-SVM
    Initialization:
      1: u^{(cur)} <- u^{bb}
    Process:
      2: repeat
      3:   w^{(new)} <- argmin_w L(w, u^{(cur)})
      4:   u^{(new)} <- argmin_u L(w^{(new)}, u)
      5:   u^{(cur)} <- u^{(new)}
      6: until converged

If we do not add any regularization term on u (i.e., set λ_R = 0), u will become 0 or 1 in the optimization step in line 4 of Algorithm 1, because the loss function becomes linear with respect to u when w is fixed. This makes the model similar to a latent SVM and can cause the algorithm to get stuck in local minima, as shown in the experiments. The regularization term prevents this situation, under the assumption that the true value of u should be around u^{bb}.

There are many possible designs for the regularization term R(u). In practice, we use the following one, based on the cross entropy between two Bernoulli distributions with success probabilities u_i^{bb} and u_i respectively:

    R(u) = -\frac{1}{N} \sum_{i=1}^{N} \left[ u_i^{bb} \log(u_i) + (1 - u_i^{bb}) \log(1 - u_i) \right]
         = \frac{1}{N} \sum_{i=1}^{N} D_{KL}\left[\mathrm{Bern}(u_i^{bb}) \,\|\, \mathrm{Bern}(u_i)\right] + C    (3)

where C is a constant with respect to u, and D_{KL}(·) represents the KL divergence between two Bernoulli distributions. This regularization term is convex with respect to u and achieves its minimum when u = u^{bb}. It is a strong regularization term, since its value increases very fast when u ≠ u^{bb}.

4 Visual Tasks

4.1 Semantic segmentation

We can easily apply our e-SVM model to the semantic segmentation task with the framework proposed by Carreira et al. [5]. Firstly, CPMC segment proposals [4] are generated and second-order pooling features [5] are extracted from each segment. Then we train the segment classifiers using either e-SVM or latent e-SVM, according to whether the ground truth pixel-level annotations are available. In the testing stage, the CPMC segments are sorted by the confidence scores output by the trained classifiers, and the top ones are selected to produce the predicted semantic label map.

4.2 Object detection

For the task of object detection, we can only acquire bounding-box annotations instead of pixel-level labeling. Therefore, it is natural to apply our latent e-SVM in this task to provide complementary information for current object detection systems. In the state-of-the-art object detection systems [11, 13, 24, 14], window candidates for foreground objects are extracted from images and confidence scores are predicted on them. Window candidates are extracted either by sliding window approaches (used in, e.g., the deformable part-based model [11, 13]) or, most recently, by the Selective Search method [24] (used in, e.g., the Region Convolutional Neural Networks [14]). This method reduces the number of window candidates compared to the traditional sliding window approach.
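As Equation 1 shows, the e-SVM training used throughout this section is an instance of a weighted hinge loss in which every example contributes to the costs of both labels, with weights g(u_i) and 1 - g(u_i); equivalently, each sample can be duplicated with labels +1 and -1 and the resulting instance-weighted problem handed to any standard SVM solver. The following NumPy sketch minimizes Equation 1 directly by subgradient descent with g(u) = u; it is a minimal illustration, not the authors' implementation, and the function names, hyperparameters, and toy data are our own.

```python
import numpy as np

def esvm_loss(w, X, u, lam=0.01):
    """e-SVM objective (Equation 1): expected hinge loss under
    Pr(Y_i = +1 | x_i) = g(u_i) = u_i, plus L2 regularization."""
    s = X @ w
    l_pos = np.maximum(0.0, 1.0 - s)   # hinge loss if the label were +1
    l_neg = np.maximum(0.0, 1.0 + s)   # hinge loss if the label were -1
    return 0.5 * lam * w @ w + np.mean(u * l_pos + (1.0 - u) * l_neg)

def train_esvm(X, u, lam=0.01, lr=0.1, iters=500):
    """Subgradient descent on the e-SVM loss. Equivalently, each sample
    could be duplicated with labels +1 and -1 and instance weights u_i
    and 1 - u_i, and given to a standard solver such as LIBLINEAR."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        s = X @ w
        grad = lam * w
        grad -= X.T @ (u * (s < 1.0)) / n           # subgradient of the l_i^+ terms
        grad += X.T @ ((1.0 - u) * (s > -1.0)) / n  # subgradient of the l_i^- terms
        w -= lr * grad
    return w

# Toy usage: positiveness u correlates with the first feature.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(200, 2)), np.ones((200, 1))])  # last column = bias
u = 1.0 / (1.0 + np.exp(-3.0 * X[:, 0]))                       # "IoU-like" scores in (0, 1)
w = train_esvm(X, u)
assert esvm_loss(w, X, u) < esvm_loss(np.zeros(3), X, u)       # training reduced the loss
```

The latent e-SVM of Section 3.2 would alternate this w-step with a u-step that minimizes Equation 2 over each u_i in (0, 1) under the regularizer of Equation 3, following Algorithm 1.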
Figure 2: The illustration of our spatial confidence map features for window candidates based on e-SVM. The confidence scores of the segments are mapped to pixels to generate a pixel-level confidence map. We divide a window candidate into m × m spatial bins and pool the confidence scores of the pixels in each bin, which leads to an (m × m)-dimensional feature. (Panels: (a) original image and e-SVM classifiers; (b) confidence map obtained by mapping segment confidences to pixels; (c) features obtained by pooling in each bin.)

It is not easy to directly incorporate the confidence scores of the segments into these window-candidate-based object detection systems. The difficulty lies in two aspects. First, only some of the segments are totally inside a window candidate or totally outside it; it is hard to calculate the contribution of the confidence score of a segment that only partially overlaps with a window candidate. Second, the window candidates (even the ground truth bounding boxes) will contain some background regions. Some regions (e.g., the regions near the boundary of the window candidates) have a higher probability of being background than the regions in the center, and treating them equally harms the accuracy of the whole detection system.

To solve these issues, we propose a new spatial confidence map feature. Given an image and a set of window candidates, we first calculate the confidence scores of all the segments in the image using the learned e-SVM models. The confidence score for a segment S is denoted as CfdScore(S). For each pixel, the confidence score is set as the maximum confidence score over all the segments that contain this pixel: CfdScore(p) = max_{S : p ∈ S} CfdScore(S). In this way, we can handle the difficulty of partial overlap between segments and candidate windows. For the second difficulty, we divide each candidate window into M = m × m spatial bins and pool the confidence scores of the pixels in each bin. Because the classifiers are trained with the one-vs-all scheme, our spatial confidence map feature is class-specific. This leads to an (M × K)-dimensional feature for each candidate window, where K is the total number of object classes. After that, we encode it by additive kernels approximation mapping [25] and obtain the final feature representation of the candidate windows. The feature generating process is illustrated in Figure 2. In the testing stage, we concatenate this segment feature with the features from other object detection systems.

5 Experiments

In this section, we first evaluate the performance of the e-SVM method on segment proposal classification, using two new evaluation criteria for this task. After that, we apply our method to two essential tasks in computer vision: semantic segmentation and object detection. For the semantic segmentation task, we test the proposed e-SVM and latent e-SVM in two different scenarios (i.e., with pixel-level ground truth label annotations and with only bounding-box object annotations). For the object detection task, we combine our confidence map feature with a state-of-the-art object detection system, and show that our method obtains a non-trivial improvement in detection performance.

5.1 Performance evaluation of e-SVM

We use the PASCAL VOC 2011 [9] segmentation dataset in this experiment. It is a subset of the whole PASCAL 2011 dataset, with 1112 images in the training set and 1111 images in the validation set, and 20 foreground object classes in total. We use the official training set and validation set for training and testing respectively. Similar to [5], we extract 150 CPMC [4] segment proposals for each image and compute the second-order pooling features on each segment. Besides, we use the same sequential pasting scheme [5] as the inference algorithm in testing.

5.1.1 Evaluation criteria

In the literature [5], the supervised learning framework of segment-based prediction models either regressed the overlap value or converted it to a binary classification problem via a threshold value, and evaluated the performance by some task-specific criterion (i.e., the pixel-wise accuracy used for semantic segmentation). In this paper, we adopt direct performance evaluation criteria for the segment-wise target class prediction task, which are consistent with the learning problem itself and not biased toward particular tasks. Unfortunately, we have not found any prior work on this sort of direct performance evaluation, and we thus introduce two new evaluation criteria for this purpose. We briefly describe them as follows:

Threshold Average Precision Curve (TAPC). Although the ground-truth target value (i.e., the overlap rate of segment and bounding box) is a real value in the range [0, 1], we can transform the original prediction problem into a series of binary problems, each obtained by thresholding the original ground truth overlap rate. We calculate the Precision-Recall curve and the AP for each of these binary classification problems, and compute the mean AP over the different threshold values as a performance measurement for the segment-based class confidence prediction problem.

Normalized Discounted Cumulative Gain (NDCG) [15]. Considering that a higher confidence value is expected to be predicted for a segment with a higher overlap rate, this prediction problem can be treated as a ranking problem. We therefore use the Normalized Discounted Cumulative Gain (NDCG), a common performance measurement for ranking problems, as another evaluation criterion in this paper.

Figure 3: Performance evaluation and comparison to SVC and SVR. (a) Using pixel-level annotations; (b) using bounding-box annotations. The values underlying panel (a) are:

    Method    TAPC    NDCG
    e-SVM     36.69   0.8750
    SVR       35.23   0.8652
    SVC-0.0   22.48   0.8153
    SVC-0.2   33.96   0.8672
    SVC-0.4   35.62   0.8656
    SVC-0.6   32.57   0.8485
    SVC-0.8   26.73   0.8244

5.1.2 Comparisons to SVC and SVR

Based on the TAPC and NDCG criteria introduced above, we evaluate the performance of our e-SVM model on the PASCAL VOC 2011 segmentation dataset, and compare the results to the two common methods in the literature (i.e., SVC and SVR). Note that we test SVC's performance with a variety of binary classification problems, each trained using a different threshold value (e.g., 0, 0.2, 0.4, 0.6 and 0.8, as shown in Figure 3). In Figure 3 (a) and (b), we show the experimental results for the model/classifier trained with clean pixel-wise object class labels and with weakly-labeled bounding-box annotations, respectively. In both cases, our method obtains consistently superior performance to the SVC model for all the different threshold values. Besides, the TAPC and NDCG of our method are higher than those of SVR, a popular regression model for continuously valued target variables based on the max-margin principle.

5.2 Results of semantic segmentation

For the semantic segmentation task, we test our e-SVM model on the PASCAL VOC 2011 segmentation dataset, using the training set for training and the validation set for testing. We evaluate the performance under two different data annotation settings, i.e., training with pixel-wise semantic class label maps and with object bounding-box annotations. The accuracies for these two settings are 36.8% and 27.7% respectively, which are comparable to the results of the state-of-the-art segment confidence prediction model (i.e., SVR) [5] used in semantic segmentation.

5.3 Results of object detection

As mentioned in Section 4.2, one of the natural applications of our e-SVM method is the object detection task. Most recently, Girshick et al. [14] presented the Regions with CNN features method (RCNN), using a Convolutional Neural Network pre-trained on the ImageNet dataset [6] and fine-tuned on the PASCAL VOC datasets. They achieved a significant improvement over the previous state-of-the-art algorithms (e.g., the Deformable Part-based Model (DPM) [11]) and pushed the detection performance to a very high level (the average AP is 58.5 with bounding box regression on PASCAL VOC 2007). A question arises: can we further improve their performance? The answer is yes. In our method, we first learn the latent e-SVM models based on the object bounding-box annotations and calculate the spatial confidence map features as in Section 4.2. Then we simply concatenate them with the RCNN features to train object classifiers on candidate windows. We use the PASCAL VOC 2007 dataset in this experiment.

Table 1: Detection results on PASCAL VOC 2007. "bb" denotes the result after applying bounding box regression. Gain is the improvement in AP of our system over RCNN under the same setting (both with bounding box regression or both without). The better result in each comparison is shown in bold.

              plane  bike   bird   boat   bottle bus    car    cat    chair  cow
    RCNN      64.1   69.2   50.4   41.2   33.2   62.8   70.5   61.8   32.4   58.4
    Ours      63.7   70.2   51.9   42.5   33.4   63.2   71.3   62.0   34.7   58.7
    Gain      -0.4   1.0    1.5    1.3    0.2    0.4    0.8    0.2    2.3    0.2
    RCNN(bb)  68.1   72.8   56.8   43.0   36.8   66.3   74.2   67.6   34.4   63.5
    Ours(bb)  70.4   74.2   59.1   44.7   38.0   67.2   74.6   69.0   36.7   64.3
    Gain(bb)  2.3    1.4    2.3    1.6    1.2    1.0    0.3    1.3    2.3    0.8

              table  dog    horse  motor. person plant  sheep  sofa   train  tv     Average
    RCNN      45.8   55.8   61.0   66.8   53.9   30.9   53.3   49.2   56.9   64.1   54.1
    Ours      47.8   57.9   61.2   67.5   54.9   34.5   55.8   51.0   58.4   65.0   55.3
    Gain      2.0    2.1    0.3    0.8    1.0    3.7    2.5    1.8    1.6    0.9    1.2
    RCNN(bb)  54.5   61.2   69.1   68.6   58.7   33.4   62.9   51.1   62.5   64.8   58.5
    Ours(bb)  56.4   62.9   69.3   69.9   59.6   35.6   64.6   53.2   64.3   65.5   60.0
    Gain(bb)  1.9    1.8    0.2    1.4    0.9    2.2    1.7    2.1    1.8    0.7    1.5

As shown in Table 1, our method improves the average AP by 1.2 before applying bounding box regression. For some categories on which the original RCNN does not perform well, such as potted plant, the gain in AP is up to 3.65. After applying bounding box regression for both RCNN and our algorithm, the gain in performance is 1.5 on average. In the experiment, we set m = 5 and adopt average pooling of the pixel-level confidence scores within each spatial bin. We also modified the bounding box regression method used in [14] by augmenting the fifth-layer features with additive kernels approximation methods [25], which leads to slightly improved performance. In summary, we achieve an average AP of 60.0, which is 1.5 higher than the best previously known result on this dataset (the original RCNN with bounding box regression). Please note that we only use the annotations in PASCAL VOC 2007 to train the e-SVM classifiers and have not considered context. The results are expected to improve further if the data in ImageNet is used.

6 Conclusion

We present a novel learning algorithm called e-SVM that handles the situation in which the labels of the training data are continuous values whose range is a bounded interval. It can be applied to the segment proposal classification task and can easily be extended to learn segment classifiers under weak supervision (e.g., when only bounding box annotations are available). We apply this method to two major tasks of computer vision (i.e., semantic segmentation and object detection), and obtain state-of-the-art object detection performance on the PASCAL VOC 2007 dataset. We believe that, with the ever growing size of datasets, it is increasingly important to learn segment classifiers under weak supervision to reduce the amount of labeling required. In future work, we will consider using the bounding box annotations from large datasets, such as ImageNet, to further improve semantic segmentation performance on PASCAL VOC.

Acknowledgements. We gratefully acknowledge funding support from the National Science Foundation (NSF) with award CCF-1317376, and from the National Institutes of Health (NIH) Grant 5R01EY022247-03. We also thank the NVIDIA Corporation for providing GPUs for our experiments.

References

[1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk. SLIC superpixels compared to state-of-the-art superpixel methods. TPAMI, 34(11):2274–2282, 2012.
[2] S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In Advances in Neural Information Processing Systems 15, pages 561–568. MIT Press, 2003.
[3] P. Arbelaez, B. Hariharan, C. Gu, S. Gupta, and J. Malik. Semantic segmentation using regions and parts. In CVPR, 2012.
[4] J. Carreira and C. Sminchisescu. CPMC: Automatic object segmentation using constrained parametric min-cuts. TPAMI, 34(7):1312–1328, 2012.
[5] J. Carreira, R. Caseiro, J. Batista, and C. Sminchisescu. Semantic segmentation with second-order pooling. In ECCV, pages 430–443, 2012.
[6] J. Deng, A. Berg, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results. http://www.image-net.org/challenges/LSVRC/2012/index.
[7] T. G. Dietterich, R. H. Lathrop, and T. Lozano-Pérez. Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell., 89(1-2):31–71, Jan. 1997.
[8] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results. http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html.
[9] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2011 (VOC2011) Results. http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index.html.
[10] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. JMLR, 9:1871–1874, 2008.
[11] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. TPAMI, 32(9):1627–1645, 2010.
[12] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient graph-based image segmentation. IJCV, 59(2):167–181, Sept. 2004.
[13] S. Fidler, R. Mottaghi, A. L. Yuille, and R. Urtasun. Bottom-up segmentation for top-down detection. In CVPR, pages 3294–3301, 2013.
[14] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[15] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques. TOIS, 20(4):422–446, 2002.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1106–1114, 2012.
[17] M. Lapin, M. Hein, and B. Schiele. Learning using privileged information: SVM+ and weighted SVM. Neural Networks, 53:95–108, 2014.
[18] F. Li, J. Carreira, and C. Sminchisescu. Object recognition as ranking holistic figure-ground hypotheses. In CVPR, pages 1712–1719, 2010.
[19] Y. Liu, J. Liu, Z. Li, J. Tang, and H. Lu. Weakly-supervised dual clustering for image semantic segmentation. In CVPR, pages 2075–2082. IEEE, 2013.
[20] A. Müller and S. Behnke. Multi-instance methods for partially supervised image segmentation. In PSL, pages 110–119, 2012.
[21] X. Ren, L. Bo, and D. Fox. RGB-(D) scene labeling: Features and algorithms. In CVPR, June 2012.
[22] J. Shotton, M. Johnson, and R. Cipolla. Semantic texton forests for image categorization and segmentation. In CVPR, pages 1–8, 2008.
[23] J. Suykens, J. De Brabanter, L. Lukas, and J. Vandewalle. Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing, 48:85–105, 2002.
[24] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. IJCV, 104(2):154–171, 2013.
[25] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. TPAMI, 34(3):480–492, 2012.
[26] A. Vezhnevets, V. Ferrari, and J. M. Buhmann. Weakly supervised structured output learning for semantic segmentation. In CVPR, pages 845–852, 2012.
[27] X. Yang, Q. Song, and A. Cao. Weighted support vector machine for data classification. In IJCNN, 2005.