$F_\beta$-measure, and, ii) providing a unified evaluation to both binary and non-binary m


calandra-battersby
Uploaded On 2016-06-06



Presentation Transcript

$F_\beta$-measure, and, ii) providing a unified evaluation to both binary and non-binary maps.

Our third contribution is proposing four meta-measures to analyze the performance of evaluation measures (Section 5). Much like a measure is used to evaluate an algorithm, a meta-measure is used to evaluate a measure [22]. For instance, one of our meta-measures verifies that the ranking of foreground maps by an evaluation measure agrees with the preferences of applications that use these foreground maps (e.g. image retrieval, object detection and segmentation). Using these meta-measures we compare the evaluation measures, and show that our measure outperforms all others.

2. Current Evaluation Measures

We discern between binary maps, which consist of values of either 0 or 1, and non-binary maps, which consist of values in the range [0, 1]. These values represent the probability that a pixel belongs to the foreground [8].

Evaluation of binary maps: All common measures for evaluating a binary map are based on a subset of the following four basic quantities: true-positive (TP), true-negative (TN), false-positive (FP) and false-negative (FN). These quantities are used to assess different qualities of the binary map. The most common qualities are Hit-rate & False-alarm, and Precision & Recall:

$$\text{Hit-rate} = \text{Recall} = \frac{TP}{TP + FN} \tag{1}$$

$$\text{False-alarm} = \frac{FP}{TN + FP} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{3}$$

These qualities are typically combined into a single score. One common score is the $F_\beta$-measure:

$$F_\beta\text{-measure} = \frac{(1 + \beta^2)\,\text{Precision} \cdot \text{Recall}}{\beta^2\,\text{Precision} + \text{Recall}} = \frac{(1 + \beta^2)\,TP}{(1 + \beta^2)\,TP + \beta^2\,FN + FP} \tag{4}$$

where $\beta$ is a parameter that controls the preference between complete-detection and over-detection (typically $\beta = 1$). A second common score is the PASCAL measure:

$$\text{PASCAL} = \frac{TP}{TP + FN + FP}. \tag{5}$$

Evaluation of non-binary maps: Non-binary maps are compared against a binary ground-truth as well. The two most common evaluation measures are AUC and AP. Both measures are computed by first thresholding the non-binary map into multiple binary maps. In the case of the AUC, the binary maps are then compared against the ground-truth map using the Hit-rate & False-alarm measures. Each of the comparisons is marked on a Hit-rate & False-alarm graph. A curve is then interpolated between the marked points. The final AUC score is the area under the curve. The AP score is computed in a similar fashion. A curve is interpolated from the Precision and Recall values of the binary maps. The interpolated precision value at each recall level, $r$, is computed as the maximum precision measured at higher recall levels [11]: $p(r) = \max_{\tilde{r}:\, \tilde{r} \ge r} p(\tilde{r})$. The AP score is computed by averaging the precision values at evenly spaced recall levels.

Figure 2. Interpolation flaw. Foreground map (a), which is identical to the ground-truth, is better than foreground map (b). (a1-a3), which are the only possible binary maps thresholded from (a), are used to generate the green curves. (b1-b5), which are the only possible binary maps thresholded from (b), are used to generate the blue curves. The curves of (a) and (b) are identical. Therefore, both AUC and AP cannot discern between (a) and (b), and rank them both as perfect.

3. Limitations of Current Measures

While the current evaluation measures often perform well, they possess several limitations that hinder their performance. In what follows, we present three assumptions that are the cause for these limitations. We begin by discussing an assumption of AUC & AP (non-binary) and then present two additional assumptions that apply to all four measures (non-binary and binary).

Interpolation flaw: Both AUC and AP assume that the interpolated curve (between binary maps) is a valid tool for evaluating non-binary maps. Figure 2 demonstrates why this assumption is inaccurate. (a) and (b) present two foreground maps to be evaluated. (a) is identical to the ground-truth, so it should be scored as much better than (b). Surprisingly, both maps obtain a perfect score by both AUC and AP. To understand why this happens, note that for Figure 2(a), there are only three possible unique binary maps that can be extracted (by setting thresholds) and plotted on the graph. In contrast, for Figure 2(b), there are five possible unique binary maps that can be extracted and plotted. In both cases, however, the resulting interpolated curves are identical. Since both AUC and AP rely solely on the interpolated curve, ignoring the distribution of points along the curves, they deem (b) as perfect as (a).

Figure 3. Interpolation flaw. These AUC curves are generated for the cyan and the red foreground maps (FG maps). Since the score relies solely on the interpolated curve and not on the location of the points used to create it, it incorrectly ranks the red map as better.

A more realistic example is presented in Figure 3, which presents the AUC curves of the cyan and the red maps of Figure 1. These maps are the results of state-of-the-art saliency detection algorithms [9, 13]. Intuitively, the cyan map is better than the red, since it is much less fuzzy. Furthermore, when using these maps as priors in three different applications (image retrieval, object detection and segmentation; see Section 5), the cyan map produced better results. However, both AUC and AP rank the red map as better. This is because they ignore the location of the points in the graph. Both are blind to the fact that many of the binary maps obtained from the cyan map have both high Hit-rate and low False-alarm (the region of good detection; see Figure 3). It is important to note that the difference in point distribution between the cyan and red curves would not change regardless of the chosen thresholding intervals.

The interpolation flaw applies solely to the measures of non-binary maps. We next describe two more flaws that apply to the evaluation of both binary and non-binary maps.

Dependency flaw: Current measures assume that the pixels are independent of each other. Figure 4 demonstrates why this assumption may be wrong. Both Figures 4(a) and 4(b) have identical TP, TN, FP and FN values. Hence, they get the exact same score by all current evaluation measures. However, the false-negatives in Figure 4(a) are concentrated, thus a whole piece of the foreground is not detected at all. Conversely, in Figure 4(b) the false-negatives are sparsely scattered among the true-positives, hence, the entire object is sampled. For most applications, the maps in Figures 4(a) and 4(b) are not of the same quality and should not receive the same score.

Figure 4. Dependency flaw. [Panels: Ground-truth, (a) FG map, (b) FG map.] (a-b) are two binary maps with the same TP, TN, FP and FN values. Current measures consider each pixel as independent. Hence, they ignore the fact that the false-negatives in (b) are sparsely spread within true-positive detections, thus offering a good sampling of the foreground region.

Figure 5. Dependency flaw. [Panels: Image, Ground-truth; ranks (1st, 2nd) assigned by $F_\beta$, PASCAL, Ours and Apps.] Based on three applications ("Apps", Section 5), the detection offered by the cyan map is superior to that of the red. However, both $F_\beta$-measure and PASCAL rank the red map as higher. By incorporating pixel dependency, our measure correctly ranks the cyan map as higher.

Figure 5 illustrates another case of the dependency flaw, this time on a real-world example. The cyan map contains false-negatives that are mostly in regions of true-positive detections, thus offering a good sampling of the foreground region. Conversely, while the red map has more true-positive detections, it also has numerous false-positive detections. When using these maps as priors in three different applications ("Apps", Section 5) the cyan map produced the best results. Yet, both PASCAL and the $F_\beta$-measure rank the red map higher than the cyan map.

Equal-importance flaw: The last assumption made by all the current measures is that all erroneous detections have [...]

Figure 8. Application Ranking: To rank foreground maps according to an application, we compare the output achieved when using the ground-truth to the output when using the foreground map. The closer the foreground is to the ground-truth, the closer its application output should be to the ground-truth output.

Figure 9. Meta-measure 1, results [panels: Non-binary, Binary]: The ranking correlation of an evaluation measure to that given by the image retrieval application. The results presented are $1 - \rho$ ($\rho$ denoting Spearman's Rho measure). The lower the score, the better an evaluation measure is at predicting the preference of the application. Our measure offers improvement over the other measures.

[...]

4. The ranking of an evaluation measure should not be sensitive to inaccuracies in the manually marked boundaries in the ground-truth maps.

All of our meta-measures were examined on the ASD dataset [1], which consists of 1000 natural images with binary ground-truth maps (similar results were found on the SOD dataset [20]). Binary and non-binary foreground maps were generated for each image using five state-of-the-art algorithms for salient object detection [9, 10, 12, 13, 19] (binary maps are obtained by thresholding the non-binary maps).

5.1. Meta-Measure 1: Application Ranking

Our first meta-measure examines the ranking correlation of the evaluation measure to that of an application that uses foreground maps. We assume that the ground-truth map is the optimal prior for the application (upper path in Figure 8). Then, given a foreground map, we compare the application's output (lower path in Figure 8) to the ground-truth output. The more similar a foreground map is to the ground-truth map, the closer its application's output should be to the ground-truth output. The ranking of the foreground maps is determined by the similarity of their output to that obtained when using the ground-truth. Finally, the first meta-measure compares the ranking by each evaluation measure (AP, AUC, PASCAL, $F_\beta$-measure and ours) to the ranking by the application.

We examined three applications: image retrieval, object detection and segmentation. Similar results were found in all three applications. For lack of space, Appendix A discusses the realization of only one application: content-based image retrieval. The realization of the other applications was performed similarly.

We performed this experiment using the results of five state-of-the-art algorithms [9, 10, 12, 13, 19]. The results on the 1000 images of the ASD dataset [1] are shown in Figure 9. The score $1 - \rho$, where $\rho$ is Spearman's Rho measure [5], was used to assess the ranking accuracy of the measures. A score of 0 is given to evaluation measures that ranked the detection algorithms identically to the application. A score of 2 is given to measures that ranked the foreground maps in completely reversed order. In the case of non-binary maps, we can see a great improvement over the previously used AUC and AP measures. Some improvement is also achieved for binary maps, when compared to PASCAL and $F_\beta$-measure. Figures 5 and 7 illustrate several examples of how our measure better predicts the preference of these applications.

5.2. Meta-Measure 2: State-of-the-art vs. Generic

The property on which we base our second meta-measure is that an evaluation measure should prefer a result obtained by a state-of-the-art method over a map created without taking into account the content of the image. We use a centered Gaussian and a centered circle as generic maps that do not consider the content of the image. Two examples are provided in Figure 10, one non-binary and the other binary. We expect the evaluation measure to score the result obtained by the state-of-the-art algorithm in Figure 10(c) higher than the generic Gaussian or circle maps in Figure 10(d). Yet, currently used measures prefer the generic results. Conversely, our measure correctly ranks the state-of-the-art result higher.

We examined the number of times a generic map scored higher than the mean score obtained by the five state-of-the-art algorithms [9, 10, 12, 13, 19]. (The mean score provides robustness to cases in which a specific algorithm produces a poor result.) Figure 11 summarizes the results: the lower the score, the better the measure is. Our measure outperforms the current measures for both non-binary and binary maps. This is thanks to our consideration of the neighborhoods of detections and their location.

Figure 14. Meta-Measure 4 [panels: (a) Image, (b) GT, (c) GT, (d) Foreground, (e) Foreground]: The ranking of an evaluation measure should not be sensitive to inaccuracies in the manually marked boundaries in the ground-truth maps. While ground-truth maps (b) & (c) differ slightly, both AUC and AP switched the ranking order of the two foreground maps (d) & (e), depending on the ground-truth used. Our measure consistently ranked (d) higher than (e). Best viewed on screen.

Figure 15. Meta-measure 4, results [panel: Non-binary]: The ranking consistency of an evaluation measure under small annotation inaccuracies. The results presented are $1 - \rho$, with $\rho$ being Spearman's Rho measure. The lower the score, the better.

[...] importance. We further suggested an evaluation measure that amends these assumptions. Our measure is based on two key ideas. The first is extending the basic quantities (TP, TN, FP and FN) to non-binary values. The second is weighting errors according to their location and their neighborhood. Based on these, our measure can be defined as a weighted $F^w_\beta$-measure. An additional benefit of our measure is offering a unified solution to the evaluation of non-binary and binary maps. The advantages of our measure were shown via four different meta-measures, both qualitatively and quantitatively.

Acknowledgments: This research was funded (in part) by the Intel Collaborative Research Institute for Computational Intelligence (ICRI-CI), Minerva, the Ollendorff Foundation, and the Israel Science Foundation under Grant 1179/11.

References

[1] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk. Frequency-tuned salient region detection. In CVPR, 2009.
[2] S. Alpert, M. Galun, R. Basri, and A. Brandt. Image segmentation by probabilistic bottom-up aggregation and cue integration. In CVPR, pages 1–8, June 2007.
[3] P. Arbeláez, B. Hariharan, C. Gu, S. Gupta, L. Bourdev, and J. Malik. Semantic segmentation using regions and parts. In CVPR, pages 3378–3385, 2012.
[4] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. PAMI, 33(5), 2011.
[5] D. J. Best and D. E. Roberts. Algorithm AS 89: The upper tail probabilities of Spearman's rho. Journal of the Royal Statistical Society, 24(3):377–379, 1975.
[6] A. Blake, C. Rother, M. Brown, P. Pérez, and P. Torr. Interactive image segmentation using an adaptive GMMRF model. In ECCV, pages 428–441, 2004.
[7] A. Borji and L. Itti. Exploiting local and global patch rarities for saliency detection. In CVPR, pages 478–485, 2012.
[8] A. Borji, D. Sihite, and L. Itti. Salient object detection: A benchmark. In ECCV, pages 414–429, 2012.
[9] K. Chang, T. Liu, H. Chen, and S. Lai. Fusing generic objectness and visual saliency for salient object detection. In ICCV, pages 914–921, 2011.
[10] M.-M. Cheng, G.-X. Zhang, N. J. Mitra, X. Huang, and S.-M. Hu. Global contrast based salient region detection. In CVPR, pages 409–416, 2011.
[11] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal Visual Object Classes (VOC) Challenge. IJCV, 88(2):303–338, 2010.
[12] S. Goferman, L. Zelnik-Manor, and A. Tal. Context-aware saliency detection. In CVPR, 2010.
[13] H. Jiang, J. Wang, Z. Yuan, T. Liu, N. Zheng, and S. Li. Automatic salient object segmentation based on context and shape prior. In BMVC, volume 3, page 7, 2012.
[14] A. Joulin, F. Bach, and J. Ponce. Multi-class cosegmentation. In CVPR, pages 542–549, 2012.
[15] T. Judd, F. Durand, and A. Torralba. A benchmark of computational models of saliency to predict human fixations. Technical report, MIT, 2012.
[16] M. S. Lew, N. Sebe, C. Djeraba, and R. Jain. Content-based multimedia information retrieval: State of the art and challenges. ACM Transactions on Multimedia Computing, Communications, and Applications, 2:1–19, 2006.
[17] T. Liu, Z. Yuan, J. Sun, J. Wang, N. Zheng, X. Tang, and H. Shum. Learning to detect a salient object. PAMI, pages 1–8, 2010.
[18] M. Lux. Content based image retrieval with LIRE. In ACM International Conference on Multimedia, 2011.
[19] R. Margolin, A. Tal, and L. Zelnik-Manor. What makes a patch distinct? In CVPR, pages 1139–1146, 2013.
[20] V. Movahedi and J. Elder. Design and perceptual validation of performance measures for salient object segmentation. In CVPRW, pages 49–56, 2010.
[21] F. Perazzi, P. Krahenbuhl, Y. Pritch, and A. Hornung. Saliency filters: Contrast based filtering for salient region detection. In CVPR, pages 733–740, 2012.
[22] J. Pont-Tuset and F. Marqués. Measures and meta-measures for the supervised evaluation of image segmentation. In CVPR, pages 2131–2138, 2013.
[23] X. Shen and Y. Wu. A unified approach to salient object detection via low rank matrix recovery. In CVPR.
[24] Y. Wei, F. Wen, W. Zhu, and J. Sun. Geodesic saliency using background priors. In ECCV, pages 29–42, 2012.

A. Meta-Measure 1: Application Realization

Content-based image retrieval finds, for a given query image, the most similar images in a dataset [16]. The similarity is determined by various features such as color-histograms, histograms of oriented gradients (HOG), and Gabor responses. We used LIRE [18], a publicly available image retrieval system with 12 different features, weighted according to the foreground maps. For each image we used LIRE to retrieve an ordered list of the 100 most similar images. The ground-truth output is the ordered list returned when using the ground-truth map. The comparison between the ground-truth output and that of a foreground map is performed using Spearman's Rho measure.
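
The Spearman-based ranking comparison used here and in Section 5.1 can be illustrated with a minimal Python sketch. It assumes the two rankings contain no ties, so the closed form rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies; the function name `one_minus_spearman_rho` is hypothetical and is not taken from the paper or from LIRE.

```python
def one_minus_spearman_rho(rank_a, rank_b):
    """Return 1 - rho for two rankings of the same n items.

    Assumption: both inputs are tie-free rank lists (e.g. 1..n), so the
    closed-form Spearman formula is valid.
    """
    n = len(rank_a)
    if n != len(rank_b) or n < 2:
        raise ValueError("rankings must have equal length of at least 2")
    # Sum of squared rank differences between the two rankings.
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    rho = 1.0 - 6.0 * d2 / (n * (n * n - 1))
    return 1.0 - rho


# Identical rankings give 0; a fully reversed ranking gives 2.
same = one_minus_spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
reversed_order = one_minus_spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1])
```

These two extremes match the interpretation given in Section 5.1, where 0 denotes agreement with the application's ranking and 2 denotes a completely reversed order.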
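
The binary-map scores of Section 2 (Equations 4 and 5) and the dependency flaw of Section 3 can also be sketched in a few lines of Python. This is an illustrative sketch only: maps are flattened to lists of 0/1 values, the helper names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention that the paper does not specify.

```python
def confusion_counts(pred, gt):
    """Count (TP, TN, FP, FN) for two equal-length sequences of 0/1 values."""
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    tn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 0)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    return tp, tn, fp, fn


def f_beta(tp, fp, fn, beta=1.0):
    """F_beta written in terms of TP, FN and FP, as in Equation (4)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0  # assumed convention


def pascal(tp, fp, fn):
    """PASCAL measure, as in Equation (5)."""
    denom = tp + fn + fp
    return tp / denom if denom else 0.0  # assumed convention


# Toy 1-D "maps": the misses in map_a are concentrated at one end,
# while the misses in map_b are scattered among the detections.
gt = [1, 1, 1, 1, 1, 1, 0, 0]
map_a = [0, 0, 1, 1, 1, 1, 0, 0]
map_b = [1, 0, 1, 1, 0, 1, 0, 0]

counts_a = confusion_counts(map_a, gt)
counts_b = confusion_counts(map_b, gt)
```

Both toy maps yield the same four counts, so `f_beta` and `pascal` cannot tell them apart, which is the behavior the dependency flaw describes.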
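
The interpolation flaw of Section 3 can be reproduced on a toy example. The sketch below makes several assumptions that are not taken from the paper: maps are flat lists, thresholds are taken at every distinct map value, the point (0, 0) stands for the all-background map, and the curve is interpolated linearly (trapezoidal rule). The function names are hypothetical.

```python
def roc_points(fg_map, gt):
    """Return (false-alarm, hit-rate) points obtained by thresholding fg_map.

    Assumes gt contains at least one foreground and one background pixel.
    """
    pos = sum(gt)
    neg = len(gt) - pos
    points = [(0.0, 0.0)]  # all-background binary map
    for t in sorted(set(fg_map), reverse=True):
        pred = [1 if v >= t else 0 for v in fg_map]
        tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
        fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the linearly interpolated curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


gt = [1, 1, 1, 0, 0, 0]
map_a = [1, 1, 1, 0, 0, 0]              # identical to the ground-truth
map_b = [1.0, 1.0, 0.6, 0.3, 0.0, 0.0]  # fuzzy, but foreground still ranks first

points_a = roc_points(map_a, gt)
points_b = roc_points(map_b, gt)
auc_a = auc(points_a)
auc_b = auc(points_b)
```

In this toy case `map_a` produces three points and `map_b` five, yet both areas come out as 1 (up to floating-point rounding), because the score depends only on the interpolated curve and not on how the points are distributed along it.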