
Improved Deep Metric Learning with Multi-class N-pair Loss Objective

Kihyuk Sohn
NEC Laboratories America, Inc.
ksohn@nec-labs.com

Abstract

Figure 1: Deep metric learning with (left) triplet loss and (right) (N+1)-tuplet loss. Embedding vectors f of deep networks are trained to satisfy the constraints of each loss. Triplet loss pulls one positive example while pushing one negative example at a time. On the other hand, (N+1)-tuplet loss pushes N-1 negative examples all at once, based on their similarity to the input example.

Since the proposed loss already considers comparison to N-1 negative examples in its training objective, negative data mining is not necessary when learning from small- or medium-scale datasets in terms of the number of output classes. For datasets with a large number of output classes, we propose a hard negative "class" mining scheme, which greedily adds examples to form a batch from a class that violates the constraint with the previously selected classes in the batch. In experiments, we demonstrate the superiority of our proposed N-pair-mc loss over the triplet loss as well as other competing metric learning objectives on visual recognition, verification, and retrieval tasks. Specifically, we report much improved recognition and verification performance on our fine-grained car and flower recognition datasets. In comparison to the softmax loss, the N-pair-mc loss is as competitive for recognition but significantly better for verification. Moreover, we demonstrate substantial improvement in image clustering and retrieval tasks on the Online product [21], Car-196 [12], and CUB-200 [25] datasets, as well as in face verification and identification accuracy on the LFW database [8].

2 Preliminary: Distance Metric Learning

Let x ∈ X be an input datum and y ∈ {1, ..., L} be its output label. We use x^+ and x^- to denote positive and negative examples of x, meaning that x and x^+ are from the same class and x^- is from a different class than x. The kernel f(·): X → R^K takes x and generates an embedding vector f(x). We often omit x from f(x) for simplicity, while f inherits all superscripts and subscripts.

Contrastive loss [3, 7] takes pairs of examples as input and trains a network to predict whether two inputs are from the same class or not. Specifically, the loss is written as follows:

$$\mathcal{L}^{m}_{\mathrm{cont}}(x_i, x_j; f) = \mathbf{1}\{y_i = y_j\}\,\|f_i - f_j\|_2^2 + \mathbf{1}\{y_i \neq y_j\}\,\max\bigl(0,\; m - \|f_i - f_j\|_2^2\bigr) \qquad (1)$$

where m is a margin parameter imposing that the distance between examples from different classes be larger than m. Triplet loss [27, 2, 19] shares a similar spirit with contrastive loss, but is composed of triplets, each consisting of a query, a positive example (to the query), and a negative example:

$$\mathcal{L}^{m}_{\mathrm{tri}}(x, x^+, x^-; f) = \max\bigl(0,\; \|f - f^+\|_2^2 - \|f - f^-\|_2^2 + m\bigr) \qquad (2)$$

Compared to contrastive loss, triplet loss only requires the difference of (dis-)similarities between the positive and negative examples to the query point to be larger than a margin m. Despite their wide use, both loss functions are known to suffer from slow convergence, and they often require expensive data sampling methods to provide nontrivial pairs or triplets to accelerate training [2, 19, 17, 4].
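For concreteness, here is a minimal NumPy sketch of the two baseline losses in Equations (1) and (2), written for a single pair or triplet of precomputed embedding vectors. The function names and the default margin are placeholders, and gradients and batching are omitted.

```python
import numpy as np

def contrastive_loss(f_i, f_j, same_class, m=1.0):
    """Equation (1): pull same-class pairs together; push different-class
    pairs until their squared distance exceeds the margin m."""
    d2 = np.sum((f_i - f_j) ** 2)      # squared Euclidean distance
    return d2 if same_class else max(0.0, m - d2)

def triplet_loss(f, f_pos, f_neg, m=1.0):
    """Equation (2): the positive must be closer to the query than the
    negative by at least the margin m (in squared distance)."""
    d_pos = np.sum((f - f_pos) ** 2)
    d_neg = np.sum((f - f_neg) ** 2)
    return max(0.0, d_pos - d_neg + m)
```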

3 Deep Metric Learning with Multiple Negative Examples

The fundamental philosophy behind triplet loss is the following: for an input (query) example, we desire to shorten the distances between its embedding vector and those of positive examples while enlarging the distances to those of negative examples. However, during one update, the triplet loss only compares an example with one negative example, ignoring negative examples from the rest of the classes. As a consequence, the embedding vector for an example is only guaranteed to be far from the selected negative class, but not necessarily from the others. Thus we can end up only differentiating an example from a limited selection of negative classes while still maintaining a small distance from many other classes. In practice, the hope is that, after looping over sufficiently many randomly sampled triplets, the final distance metric will be balanced correctly; but individual updates can still be unstable and convergence can be slow. Specifically, towards the end of training, most randomly selected negative examples can no longer yield a non-zero triplet loss.

An evident way to improve the vanilla triplet loss is to select a negative example that violates the triplet constraint. However, hard negative data mining can be expensive with a large number of output classes for deep metric learning. We seek an alternative: a loss function that recruits multiple negatives for each update, as illustrated by Figure 1. In this case, an input example is compared against negative examples from multiple classes, and it needs to be distinguishable from all of them at the same time. Ideally, we would like the loss function to incorporate examples across every class all at once, but this is usually not attainable for large-scale deep metric learning due to the memory bottleneck of the neural-network-based embedding. Motivated by this thought process, we propose a novel, computationally feasible loss function, illustrated by Figure 2, which approximates our ideal loss by pushing N examples simultaneously.

3.1 Learning to identify from multiple negative examples

We formalize our proposed method, which is optimized to identify a positive example from multiple negative examples. Consider an (N+1)-tuplet of training examples {x, x^+, x_1, ..., x_{N-1}}: x^+ is a positive example to x and {x_i}_{i=1}^{N-1} are negative. The (N+1)-tuplet loss is defined as follows:

$$\mathcal{L}\bigl(\{x, x^+, \{x_i\}_{i=1}^{N-1}\}; f\bigr) = \log\Bigl(1 + \sum_{i=1}^{N-1} \exp\bigl(f^\top f_i - f^\top f^+\bigr)\Bigr) \qquad (3)$$

where f(·) is an embedding kernel defined by a deep neural network. Recall that it is desirable for the tuplet loss to involve negative examples across all classes, but this is impractical when the number of output classes L is large; even if we restrict the number of negative examples per class to one, it is still too heavy-lifting to perform standard optimization, such as stochastic gradient descent (SGD), with a mini-batch size as large as L.

When N = 2, the corresponding (2+1)-tuplet loss highly resembles the triplet loss, as there is only one negative example for each pair of input and positive examples:

$$\mathcal{L}_{(2+1)\text{-tuplet}}\bigl(\{x, x^+, x_1\}; f\bigr) = \log\bigl(1 + \exp(f^\top f_1 - f^\top f^+)\bigr) \qquad (4)$$

$$\mathcal{L}_{\mathrm{triplet}}\bigl(\{x, x^+, x_1\}; f\bigr) = \max\bigl(0,\; f^\top f_1 - f^\top f^+\bigr) \qquad (5)$$

Indeed, under mild assumptions, we can show that an embedding f minimizes L_(2+1)-tuplet if and only if it minimizes L_triplet, i.e., the two loss functions are equivalent. (Footnote 1: We assume f to have unit norm in Equation (5) to avoid degeneracy.)

When N > 2, we further argue the advantages of the (N+1)-tuplet loss over the triplet loss. We compare the (N+1)-tuplet loss with the triplet loss in terms of partition function estimation of an ideal (L+1)-tuplet loss, where an (L+1)-tuplet loss coupled with a single example per negative class can be written as follows:

$$\log\Bigl(1 + \sum_{i=1}^{L-1} \exp\bigl(f^\top f_i - f^\top f^+\bigr)\Bigr) = -\log \frac{\exp(f^\top f^+)}{\exp(f^\top f^+) + \sum_{i=1}^{L-1} \exp(f^\top f_i)} \qquad (6)$$

Equation (6) is similar to the multi-class logistic loss (i.e., softmax loss) formulation when we view f as a feature vector, f^+ and the f_i's as weight vectors, and the denominator on the right-hand side of Equation (6) as a partition function of the likelihood P(y = y^+). We observe that the partition function corresponding to the (N+1)-tuplet approximates that of the (L+1)-tuplet, and the larger the value of N, the more accurate the approximation. Therefore, it naturally follows that the (N+1)-tuplet loss is a better approximation than the triplet loss to an ideal (L+1)-tuplet loss.
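A small sketch of Equation (3), assuming the embeddings have already been computed as NumPy arrays; the array names are illustrative, not from the paper.

```python
import numpy as np

def tuplet_loss(f, f_pos, f_negs):
    """Equation (3): identify the positive among N-1 negatives.

    f:      (K,)    query embedding
    f_pos:  (K,)    positive embedding (same class as the query)
    f_negs: (N-1, K) negative embeddings, one per negative class
    """
    logits = f_negs @ f - f @ f_pos     # f^T f_i - f^T f^+ per negative
    return float(np.log1p(np.exp(logits).sum()))
```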

3.2 N-pair loss for efficient deep metric learning

Suppose we directly apply the (N+1)-tuplet loss to the deep metric learning framework. When the batch size of SGD is M, there are M×(N+1) examples to be passed through f at one update. Since the number of examples to evaluate for each batch grows quadratically in M and N, it again becomes impractical to scale the training to very deep convolutional networks. We now introduce an effective batch construction that avoids this excessive computational burden.

Let {(x_1, x_1^+), ..., (x_N, x_N^+)} be N pairs of examples from N different classes, i.e., y_i ≠ y_j for all i ≠ j. We build N tuplets, denoted {S_i}_{i=1}^{N}, from the N pairs, where S_i = {x_i, x_1^+, x_2^+, ..., x_N^+}. Here, x_i is the query for S_i, x_i^+ is the positive example, and the x_j^+ with j ≠ i are the negative examples.

Figure 2: Triplet loss, (N+1)-tuplet loss, and multi-class N-pair loss with training batch construction: (a) triplet loss; (b) (N+1)-tuplet loss; (c) N-pair-mc loss. Assuming each pair belongs to a different class, the N-pair batch construction in (c) leverages all 2N embedding vectors to build N distinct (N+1)-tuplets with {f_i}_{i=1}^{N} as their queries; thereafter, we congregate these N distinct tuplets to form the N-pair-mc loss. For a batch consisting of N distinct queries, triplet loss requires 3N passes to evaluate the necessary embedding vectors, (N+1)-tuplet loss requires (N+1)×N passes, and our N-pair-mc loss only requires 2N.

Figure 2(c) illustrates this batch construction process. The corresponding (N+1)-tuplet loss, which we refer to as the multi-class N-pair loss (N-pair-mc), can be formulated as follows:

$$\mathcal{L}_{\text{N-pair-mc}}\bigl(\{(x_i, x_i^+)\}_{i=1}^{N}; f\bigr) = \frac{1}{N} \sum_{i=1}^{N} \log\Bigl(1 + \sum_{j \neq i} \exp\bigl(f_i^\top f_j^+ - f_i^\top f_i^+\bigr)\Bigr) \qquad (7)$$

(Footnote 2: We also consider the symmetric loss to Equation (7) that swaps f and f^+ to maximize the efficacy.)

The mathematical formulation of our N-pair loss shares a similar spirit with other existing methods, such as neighbourhood components analysis (NCA) [6] and the triplet loss with lifted structure [21]. (Footnote 3: Our N-pair batch construction can be seen as a special case of lifted structure [21] where the batch includes only positive pairs that are from disjoint classes. Besides, the loss function in [21] is based on the max-margin formulation, whereas we optimize the log probability of the identification loss directly.) Nevertheless, our batch construction is designed to achieve the utmost potential of such an (N+1)-tuplet loss when using deep CNNs as the embedding kernel on large-scale datasets, both in terms of training data and the number of output classes. Therefore, the proposed N-pair-mc loss is a novel framework consisting of two indispensable components: the (N+1)-tuplet loss, as the building-block loss function, and the N-pair construction, as the key to enabling highly scalable training. Later, in Section 4.4, we empirically show the advantage of our N-pair-mc loss framework in comparison to other variations of mini-batch construction methods.

Finally, we note that the tuplet batch construction is not specific to the (N+1)-tuplet loss. We call the set of loss functions using the tuplet construction method an N-pair loss. For example, when integrated into the standard triplet loss, we obtain the following one-vs-one N-pair loss (N-pair-ovo):

$$\mathcal{L}_{\text{N-pair-ovo}}\bigl(\{(x_i, x_i^+)\}_{i=1}^{N}; f\bigr) = \frac{1}{N} \sum_{i=1}^{N} \sum_{j \neq i} \log\bigl(1 + \exp\bigl(f_i^\top f_j^+ - f_i^\top f_i^+\bigr)\bigr) \qquad (8)$$
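The batch construction makes Equation (7) computable from a single N×N matrix of inner products between the N query embeddings and the N positive embeddings. A NumPy sketch under that reading (array names are illustrative; the symmetric variant of footnote 2 is omitted):

```python
import numpy as np

def n_pair_mc_loss(F, F_pos):
    """Equation (7): multi-class N-pair loss over a batch of N pairs.

    F:     (N, K) query embeddings f_i, one per class.
    F_pos: (N, K) positive embeddings f_i^+, in the same class order.
    Only these 2N embeddings are needed to form all N (N+1)-tuplets.
    """
    logits = F @ F_pos.T                        # logits[i, j] = f_i^T f_j^+
    diffs = logits - np.diag(logits)[:, None]   # f_i^T f_j^+ - f_i^T f_i^+
    np.fill_diagonal(diffs, -np.inf)            # drop j == i (exp(-inf) = 0)
    return float(np.mean(np.log1p(np.exp(diffs).sum(axis=1))))
```

Since the loss depends on the embeddings only through this inner-product matrix, a framework with automatic differentiation would need just the 2N forward passes noted in the Figure 2 caption.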

3.2.1 Hard negative class mining

Hard negative data mining is considered an essential component of many triplet-based distance metric learning algorithms [19, 17, 4] to improve the convergence speed or the final discriminative performance. When the number of output classes is not too large, it may be unnecessary for the N-pair loss, since examples from most of the negative classes are already considered jointly. When we train on a dataset with a large number of output classes, however, the N-pair loss can benefit from carefully selected impostor examples.

Evaluating deep embedding vectors for multiple examples from a large number of classes is computationally demanding. Moreover, for the N-pair loss, one theoretically needs N classes that are negative to one another, which substantially adds to the challenge of hard negative search. To overcome this difficulty, we propose negative "class" mining, as opposed to negative "instance" mining, which greedily selects negative classes in a relatively efficient manner. More specifically, the negative class mining for the N-pair loss can be executed as follows (a sketch of the procedure is given after this list):

1. Evaluate embedding vectors: choose a large number C of output classes at random; for each class, randomly pass a few (one or two) examples to extract their embedding vectors.
2. Select negative classes: select one class at random from the C classes of step 1. Next, greedily add the new class that violates the triplet constraint the most w.r.t. the selected classes, until we reach N classes. When a tie appears, we randomly pick one of the tied classes [28].
3. Finalize the N-pair: draw two examples from each class selected in step 2.
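A sketch of this greedy selection, under some simplifying assumptions: the "violates the triplet constraint the most" criterion is approximated by plain embedding similarity to the already-selected classes, ties are broken by index order rather than randomly as in step 2, and `class_embs` is a hypothetical mapping from class id to a representative embedding from step 1.

```python
import numpy as np

def mine_negative_classes(class_embs, N):
    """Greedy negative class selection (step 2 above).

    class_embs: hypothetical dict mapping class id -> one representative
                embedding computed in step 1.
    Returns a list of N mutually hard negative class ids.
    """
    ids = list(class_embs)
    E = np.stack([class_embs[c] for c in ids])     # (C, K)
    sims = E @ E.T                                 # inner-product similarity
    chosen = [np.random.randint(len(ids))]         # one random seed class
    while len(chosen) < N:
        score = sims[chosen].max(axis=0)           # hardness w.r.t. chosen set
        score[chosen] = -np.inf                    # never re-pick a class
        chosen.append(int(np.argmax(score)))       # ties: first index wins
    return [ids[c] for c in chosen]
```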

3.2.2 L2-norm regularization of embedding vectors

The numerical value of f^⊤f^+ can be influenced not only by the direction of f^+ but also by its norm, even though the classification decision should be determined merely by the direction. Normalization could be a solution to avoid this, but it is too stringent for our loss formulations, since it bounds the value of |f^⊤f^+| to be less than 1 and makes the optimization difficult. Instead, we regularize the L2 norm of the embedding vectors to be small.

4 Experimental Results

We assess the impact of our proposed N-pair loss functions, namely the multi-class N-pair loss (N-pair-mc) and the one-vs-one N-pair loss (N-pair-ovo), on several generic and fine-grained visual recognition and verification tasks. As a baseline, we also evaluate the performance of triplet loss with negative data mining (triplet-nm). (Footnote 4: Throughout the experiments, negative data mining refers to negative class mining for both the triplet and N-pair losses, instead of negative instance mining.) In our experiments, we draw a pair of examples from each of two different classes and then form two triplets, each with one of the positive examples as query, the other one as positive, and (any) one of the negative examples as negative. Thus, a batch of 2N training examples can produce N = (2N/4)×2 triplets, which is more efficient than the formulation in Equation (2), where we need 3N examples to form N triplets. We adopt the smooth upper bound of the triplet loss in Equation (4), instead of the large-margin formulation [27], in all our experiments, to be consistent with the N-pair-mc losses.

We use Adam [11] for mini-batch stochastic gradient descent, with data augmentation, namely horizontal flips and random crops. For evaluation, we extract a feature vector and compute the cosine similarity for verification. When more than one feature vector is extracted, via horizontal flips or from multiple crops, we use the cosine similarity averaged over all possible combinations between the feature vectors of the two examples. For all experiments except face verification, we use ImageNet-pretrained GoogLeNet [23] for network initialization (Footnote 5: https://github.com/BVLC/caffe/tree/master/models/bvlc_googlenet); for face verification, we use the same network architecture as CasiaNet [31] but trained from scratch, without the last fully-connected layer for softmax classification. Our implementation is based on Caffe [10].

4.1 Fine-grained visual object recognition and verification

We evaluate deep metric learning algorithms on fine-grained object recognition and verification tasks. Specifically, we consider car and flower recognition problems on the following databases:

• Car-333 [29] is composed of 164,863 images of cars from 333 model categories collected from the internet. Following the experimental protocol of [29], we split the dataset into 157,023 images for training and 7,840 for testing.
• Flower-610 contains 61,771 images of flowers from 610 different flower species; of all images collected, 58,721 are used for training and 3,050 for testing.

We train networks for 40k iterations with 144 examples per batch. This corresponds to 72 pairs per batch for the N-pair losses. We perform 5-fold cross-validation on the training set and report the average performance on the test set. We evaluate both recognition and verification accuracy. Specifically, we consider a verification setting with varying numbers of negative examples from different classes, and count a trial as a success only when the positive example is closer to the query example than every negative example. Since the recognition task is involved, we also evaluate the performance of deep networks trained with the softmax loss.
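Under one reading of this protocol, a verification trial with one positive and several negatives succeeds exactly when the positive has the highest cosine similarity to the query. A small sketch (names are illustrative, not from the paper):

```python
import numpy as np

def verification_success(q, pos, negs):
    """One verification trial: success iff the positive is more
    cosine-similar to the query q than every negative.

    q, pos: (K,) embeddings; negs: (num_neg, K) embeddings.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return cos(q, pos) > max(cos(q, n) for n in negs)
```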

The summary results are given in Table 1.

Table 1: Mean recognition and verification accuracy, with standard error, on the test sets of the Car-333 and Flower-610 datasets. The recognition accuracy of all models is evaluated using a kNN classifier; for models with a softmax classifier, we also report recognition accuracy using the softmax classifier (†). The verification accuracy (VRF) is evaluated at different numbers of negative examples.

Car-333            triplet      triplet-nm   72-pair-ovo  72-pair-mc   softmax
  Recognition      70.24±0.38   83.22±0.09   86.84±0.13   88.37±0.05   89.21±0.16 (88.69±0.20†)
  VRF (neg=1)      96.78±0.04   97.39±0.07   98.09±0.07   97.92±0.06   96.19±0.07
  VRF (neg=71)     48.96±0.35   65.14±0.24   73.05±0.25   76.02±0.30   55.36±0.30
Flower-610
  Recognition      71.55±0.26   82.85±0.22   84.10±0.42   85.57±0.25   84.38±0.28 (84.59±0.21†)
  VRF (neg=1)      98.73±0.03   99.15±0.03   99.32±0.03   99.50±0.02   98.72±0.04
  VRF (neg=71)     73.04±0.13   83.13±0.15   87.42±0.18   88.63±0.14   78.44±0.33

We observe consistent improvement of the 72-pair loss models over the triplet loss models. Although negative data mining brings substantial improvement to the baseline models, the performance is not as competitive as that of the 72-pair loss models. Moreover, the 72-pair loss models are trained without negative data mining, and thus should be more effective for the deep metric learning framework. Between the N-pair loss models, the multi-class loss (72-pair-mc) shows better performance than the one-vs-one loss (72-pair-ovo). As discussed in Section 3.1, the superior performance of the multi-class formulation is expected, since the N-pair-ovo loss is decoupled, in the sense that the individual losses are generated for each negative example independently. Compared to the softmax loss, the recognition performance of the 72-pair-mc loss models is competitive, slightly worse on Car-333 but better on Flower-610. However, the performance of the softmax loss model breaks down severely on the verification task. We argue that the representation of a model trained with a classification loss is not optimal for verification tasks. For example, examples near the classification decision boundary can still be classified correctly, but are prone to be missed for verification when there are examples from a different class near the boundary.

4.2 Distance metric learning for unseen object recognition

Distance metric learning allows learning a metric that generalizes to unseen categories. We highlight this aspect of deep metric learning on several visual object recognition benchmarks. Following the experimental protocol of [21], we evaluate on the following three datasets:

• Stanford Online Products [21] is composed of 120,053 images from 22,634 online product categories, and is partitioned into 59,551 images of 11,318 categories for training and 60,502 images of 11,316 categories for testing.
• Stanford Car-196 [12] is composed of 16,185 images of cars from 196 model categories. The first 98 model categories are used for training and the rest for testing.
• Caltech-UCSD Birds (CUB-200) [25] is composed of 11,788 images of birds from 200 different species. Similarly, we use the first 100 categories for training.

Unlike in Section 4.1, the object categories of the train and test sets are disjoint. This makes the problem more challenging, since deep networks can easily overfit to the categories in the train set, and generalization of the distance metric to unseen object categories can be difficult.

We closely follow the experimental settings of [21]. For example, we initialize the network using ImageNet-pretrained GoogLeNet and train for 20k iterations, using the same network architecture (e.g., 64-dimensional embeddings for the Car-196 and CUB-200 datasets and 512-dimensional embeddings for the Online product dataset) and the same number of examples (e.g., 120) per batch. Besides, we use Adam for stochastic optimization, and other hyperparameters, such as the learning rate, are tuned accordingly via 5-fold cross-validation on the train set. We report the performance for both clustering and retrieval tasks in Table 2, using the F1 and normalized mutual information (NMI) [16] scores for clustering and the recall@K [9] score for retrieval.

We observe a similar trend as in Section 4.1. The triplet loss model performs the worst among all losses considered. Negative data mining can help the model escape from a local optimum, but the N-pair loss models outperform it even without the additional computational cost of negative data mining. The performance of the N-pair loss further improves when combined with the proposed negative data mining. Overall, we improve by 9.6% on F1 score, 1.99% on NMI score, and 14.41% on recall@1 score on the Online product dataset compared to the baseline triplet loss models. Lastly, our model outperforms the triplet loss with lifted structure [21], which demonstrates the effectiveness of the proposed N-pair batch construction.

Table 2: F1, NMI, and recall@K scores on the test sets of the Online product [21], Car-196 [12], and CUB-200 [25] datasets. F1 and NMI scores are averaged over 10 different random seeds for k-means clustering, but standard errors are omitted due to space limits. The best performing model and those with overlapping standard errors are bold-faced.

Online product   triplet  triplet-nm  lifted [21]  60-pair-ovo  60-pair-ovo-nm  60-pair-mc  60-pair-mc-nm
  F1             19.59    24.27       25.6         23.13        25.31           26.53       28.19
  NMI            86.11    87.23       87.5         86.98        87.45           87.77       88.10
  K=1            53.32    62.39       61.8         60.71        63.85           65.25       67.73
  K=10           72.75    79.69       79.9         78.74        81.22           82.15       83.76
  K=100          87.66    91.10       91.1         91.03        91.89           92.60       92.98
  K=1000         96.43    97.25       97.3         97.50        97.51           97.92       97.81

Car-196          triplet  triplet-nm  60-pair-ovo  60-pair-mc
  F1             24.73    27.86       33.52        33.55
  NMI            58.25    59.94       63.87        63.95
  K=1            53.84    61.62       69.52        71.12
  K=2            66.02    73.48       78.76        79.74
  K=4            75.91    81.88       85.80        86.48
  K=8            84.18    87.81       90.94        91.60

CUB-200          triplet  triplet-nm  60-pair-ovo  60-pair-mc
  F1             21.88    24.37       25.21        27.24
  NMI            55.83    57.87       58.55        60.39
  K=1            43.30    46.47       48.73        50.96
  K=2            55.84    58.58       60.48        63.34
  K=4            67.30    71.03       72.08        74.29
  K=8            77.48    80.17       81.62        83.22
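As a reference point for the retrieval numbers in Table 2 above, here is one common reading of recall@K [9]: a query scores a hit if any of its K nearest neighbors shares its class label. This is a sketch, not the paper's evaluation code.

```python
import numpy as np

def recall_at_k(embs, labels, k):
    """recall@K over a test set: a query counts as a hit if any of its
    K nearest neighbors (cosine similarity, self excluded) shares its label.

    embs: (n, K) embeddings; labels: (n,) integer class labels.
    """
    X = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = X @ X.T
    np.fill_diagonal(sims, -np.inf)                # exclude self-match
    hits = sum(
        np.any(labels[np.argsort(-sims[i])[:k]] == labels[i])
        for i in range(len(X))
    )
    return hits / len(X)
```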

Table 3: Mean verification accuracy (VRF) with standard error, rank-1 accuracy of closed-set identification, and the DIR@FAR=1% rate of open-set identification [1] on the LFW dataset. The number of examples per batch is fixed to 384 for all models except the 320-pair-mc model.

               triplet      triplet-nm   192-pair-ovo  192-pair-mc  320-pair-mc
  VRF          95.88±0.30   96.68±0.30   96.92±0.24    98.27±0.19   98.33±0.17
  Rank-1       55.14        60.93        66.21         88.58        90.17
  DIR@FAR=1%   25.96        34.60        34.14         66.51        71.76

4.3 Face verification and identification

Finally, we apply our deep metric learning algorithms to face verification and identification: the problem of determining whether two face images are of the same identity (verification), and the problem of identifying the face image of the same identity from a gallery with many negative examples (identification). We train our networks on the WebFace database [31], which is composed of 494,414 images from 10,575 identities, and evaluate the quality of the embedding networks trained with different metric learning objectives on the Labeled Faces in the Wild (LFW) [8] database. We follow the network architecture of [31]. All networks are trained for 240k iterations, while the learning rate is decreased from 0.0003 to 0.0001 and 0.00003 at 160k and 200k iterations, respectively.

We report the performance of face verification. The summary results are provided in Table 3. The triplet loss model shows 95.88% verification accuracy, but its performance breaks down on the identification tasks. Although negative data mining helps, the improvement is limited. Compared to these, the N-pair-mc loss model improves the performance by a significant margin. Furthermore, we observe additional improvement by increasing N to 320, obtaining 98.33% for verification, 90.17% for closed-set, and 71.76% for open-set identification accuracy. It is worth noting that, although it shows better performance than the baseline triplet loss models, the N-pair-ovo loss model performs much worse than the N-pair-mc loss on this problem.

Interestingly, the N-pair-mc loss model also outperforms the model trained with the combined contrastive and softmax loss, whose verification accuracy is reported as 96.13% [31]. Since this model is trained on the same dataset using the same network architecture, this clearly demonstrates the effectiveness of our proposed metric learning objectives on face recognition tasks. Nevertheless, other works have reported higher accuracy for face verification. For example, [19] demonstrated 99.63% test-set verification accuracy on the LFW database using a triplet network trained with hundreds of millions of examples, and [22] reported 98.97% by training multiple deep neural networks on different facial keypoint regions with a combined contrastive and softmax loss. Since our contribution is complementary to the scale of the training data and the network architecture, it is expected to bring further improvement by replacing those existing training objectives with our proposal.

Figure 3: Training curves of the triplet, 192-pair-ovo, and 192-pair-mc loss models on the WebFace database: (a) triplet and 192-pair loss; (b) triplet and 192-way classification accuracy.

Finally, we provide the training curves in Figure 3. Since the difference in triplet loss between the models is relatively small, we also measure the 192-pair loss (and accuracy) of the three models at every 5k iterations. We observe significantly faster training progress with the 192-pair-mc loss than with the triplet loss; it takes only 15k iterations to reach the loss that the triplet loss model attains at convergence (240k iterations).

4.4 Analysis of tuplet construction methods

In this section, we highlight the importance of the proposed tuplet construction strategy using N pairs of examples by conducting control experiments with different numbers of distinguishable classes per batch while fixing the total number of examples per batch. For example, if we are to use N/2 different classes per batch rather than N different classes, we select 4 examples from each class instead of a pair. Since the N-pair loss is not defined to handle multiple positive examples, we follow the definition of NCA in these experiments, as follows:

$$\mathcal{L} = \frac{1}{2N} \sum_{i} -\log \frac{\sum_{j \neq i:\, y_j = y_i} \exp\bigl(f_i^\top f_j\bigr)}{\sum_{j \neq i} \exp\bigl(f_i^\top f_j\bigr)} \qquad (9)$$

We repeat the experiments of Sections 4.2 and 4.3 and provide the summary results in Table 4. We observe a certain degree of performance drop as we decrease the number of classes. Nevertheless, all of these results are substantially better than those of the triplet loss, confirming the importance of training with multiple negative classes and suggesting to train with as many negative classes as possible.

Table 4: F1, NMI, and recall@1 scores on the Online product, Car-196, and CUB-200 datasets, and verification and rank-1 accuracy on the LFW database. For a model name N×M, N refers to the number of different classes in each batch and M to the number of positive examples per class.

Online product   60×2         30×4
  F1             26.53        25.01
  NMI            87.77        87.40
  K=1            65.25        63.58
Car-196          60×2         30×4         10×12
  F1             33.55        31.92        29.87
  NMI            63.87        62.94        61.84
  K=1            71.12        69.30        65.49
CUB-200          60×2         30×4         10×12
  F1             27.24        27.54        26.66
  NMI            60.39        60.43        59.37
  K=1            50.96        50.91        49.65
LFW              192×2        96×4         64×6         32×12
  VRF            98.27±0.19   98.25±0.25   97.98±0.22   97.57±0.33
  Rank-1         88.58        87.53        83.96        79.61
  DIR@FAR=1%     66.51        66.22        64.38        56.46

5 Conclusion

Triplet loss has been widely used for deep metric learning, even though its convergence is somewhat unsatisfactory. We present a scalable novel objective, the multi-class N-pair loss, for deep metric learning, which significantly improves upon the triplet loss by pushing away multiple negative examples jointly at each update. We demonstrate the effectiveness of the N-pair-mc loss on fine-grained visual recognition and verification, as well as on visual object clustering and retrieval.

Acknowledgments

We express our sincere thanks to Wenling Shang for her support in many parts of this work, from algorithm development to paper writing. We also thank Junhyuk Oh and Paul Vernaza for helpful discussions.

References

[1] L. Best-Rowden, H. Han, C. Otto, B. F. Klare, and A. K. Jain. Unconstrained face recognition: Identifying a person of interest from a media collection. IEEE Transactions on Information Forensics and Security, 9(12):2144-2157, 2014.
[2] G. Chechik, V. Sharma, U. Shalit, and S. Bengio. Large scale online learning of image similarity through ranking. Journal of Machine Learning Research, 11:1109-1135, 2010.
[3] S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR, 2005.
[4] Y. Cui, F. Zhou, Y. Lin, and S. Belongie. Fine-grained categorization and dataset bootstrapping using deep metric learning with humans in the loop. In CVPR, 2016.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, PP(99):1-1, 2015.
[6] J. Goldberger, G. E. Hinton, S. T. Roweis, and R. Salakhutdinov. Neighbourhood components analysis. In NIPS, 2004.
[7] R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, 2006.
[8] G. B. Huang, M. Narayana, and E. Learned-Miller. Towards unconstrained face recognition. In CVPR Workshop, 2008.
[9] H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1):117-128, 2011.
[10] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[11] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[12] J. Krause, M. Stark, J. Deng, and L. Fei-Fei. 3D object representations for fine-grained categorization. In ICCV Workshop, 2013.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[14] J. Liu, Y. Deng, T. Bai, and C. Huang. Targeting ultimate accuracy: Face recognition via deep embedding. CoRR, abs/1506.07310, 2015.
[15] D. G. Lowe. Similarity metric learning for a variable-kernel classifier. Neural Computation, 7(1):72-85, 1995.
[16] C. D. Manning, P. Raghavan, H. Schütze, et al. Introduction to Information Retrieval, volume 1. Cambridge University Press, Cambridge, 2008.
[17] M. Norouzi, D. J. Fleet, and R. R. Salakhutdinov. Hamming distance metric learning. In NIPS, 2012.
[18] O. M. Parkhi, A. Vedaldi, and A. Zisserman. Deep face recognition. In BMVC, 2015.
[19] F. Schroff, D. Kalenichenko, and J. Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.
[20] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[21] H. O. Song, Y. Xiang, S. Jegelka, and S. Savarese. Deep metric learning via lifted structured feature embedding. In CVPR, 2016.
[22] Y. Sun, Y. Chen, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NIPS, 2014.
[23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[24] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, 2014.
[25] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology, 2011.
[26] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, and Y. Wu. Learning fine-grained image similarity with deep ranking. In CVPR, 2014.
[27] K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2005.
[28] J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling up to large vocabulary image annotation. In IJCAI, volume 11, pages 2764-2770, 2011.
[29] S. Xie, T. Yang, X. Wang, and Y. Lin. Hyper-class augmented and regularized deep learning for fine-grained image classification. In CVPR, 2015.
[30] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. 2003.
[31] D. Yi, Z. Lei, S. Liao, and S. Z. Li. Learning face representation from scratch. CoRR, abs/1411.7923, 2014.
[32] X. Zhang, F. Zhou, Y. Lin, and S. Zhang. Embedding label structures for fine-grained feature representation. In CVPR, 2016.
