Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
Philipp Krähenbühl, Computer Science Department, Stanford University
structure that tiles the feature space with simplices arranged along d + 1 axes [1]. The permutohedral lattice exploits the separability of unit-variance Gaussian kernels. Thus we need to apply a whitening transform f̃ = U^T f to the feature space in order to use it. The whitening transformation is found using the Cholesky decomposition of Λ^(m) into U U^T. In the transformed space, the high-dimensional convolution can be separated into a sequence of one-dimensional convolutions along the axes of the lattice. The resulting approximate message passing procedure is highly efficient even with a fully sequential implementation that does not make use of parallelism or the streaming capabilities of graphics hardware, which can provide further acceleration if desired.
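To make the whitening step concrete, here is a minimal sketch (not the authors' implementation) assuming NumPy, a feature matrix `features` of shape (N, d), and a kernel precision matrix `Lambda`; it transforms the features so that the anisotropic Gaussian kernel becomes a unit-variance Gaussian, which is what the permutohedral lattice expects:

```python
import numpy as np

def whiten_features(features, Lambda):
    """Whiten an (N, d) feature matrix for a kernel exp(-0.5 (f_i-f_j)^T Lambda (f_i-f_j)).

    After whitening, ||f~_i - f~_j||^2 == (f_i - f_j)^T Lambda (f_i - f_j),
    so a standard unit-variance Gaussian filter over f~ reproduces the kernel.
    """
    U = np.linalg.cholesky(Lambda)   # Lambda = U @ U.T, U lower-triangular
    return features @ U              # row i holds (U^T f_i)^T
```

Any valid Cholesky factor works here; the only requirement is that squared Euclidean distances in the transformed space equal the Mahalanobis distances defined by Λ^(m).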
4 Learning

We learn the parameters of the model by piecewise training. First, the boosted unary classifiers are trained using the JointBoost algorithm [21], using the features described in Section 5. Next we learn the appearance kernel parameters w^(1), θ_α and θ_β for the Potts model. w^(1) can be found efficiently by a combination of expectation maximization and high-dimensional filtering. Unfortunately, the kernel widths θ_α and θ_β cannot be computed effectively with this approach, since their gradient involves a sum of non-Gaussian kernels, which are not amenable to the same acceleration techniques. We found it to be more efficient to use grid search on a holdout validation set for all three kernel parameters w^(1), θ_α and θ_β. The smoothness kernel parameters w^(2) and θ_γ do not significantly affect classification accuracy, but yield a small visual improvement. We found w^(2) = θ_γ = 1 to work well in practice.
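A minimal sketch of this grid search, assuming a caller-supplied `validation_accuracy` callback that runs mean field inference with the given kernel parameters and scores the result on the holdout set; the parameter grids shown are illustrative placeholders, not the values used in the paper:

```python
import itertools

def grid_search(validation_accuracy,
                w1_grid=(5, 10, 15),
                theta_alpha_grid=(30, 50, 80),
                theta_beta_grid=(3, 10, 20)):
    """Exhaustively evaluate every (w1, theta_alpha, theta_beta) combination
    on the holdout validation set and return the best-scoring triple."""
    best_score, best_params = -1.0, None
    for w1, ta, tb in itertools.product(w1_grid, theta_alpha_grid, theta_beta_grid):
        score = validation_accuracy(w1=w1, theta_alpha=ta, theta_beta=tb)
        if score > best_score:
            best_score, best_params = score, (w1, ta, tb)
    return best_params
```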
The compatibility parameters μ(a, b) = μ(b, a) are learned using L-BFGS to maximize the log-likelihood ℓ(μ : I, T) of the model for a validation set of images I with corresponding ground truth labelings T. L-BFGS requires the computation of the gradient of ℓ, which is intractable to estimate exactly, since it requires computing the gradient of the partition function Z. Instead, we use the mean field approximation described in Section 3 to estimate the gradient of Z. This leads to a simple approximation of the gradient for each training image:

\[
\frac{\partial}{\partial \mu(a,b)}\,\ell(\mu : I^{(n)}, T^{(n)})
\;\approx\;
-\sum_i T_i^{(n)}(a) \sum_{j \neq i} k(f_i, f_j)\, T_j^{(n)}(b)
\;+\;
\sum_i Q_i(a) \sum_{j \neq i} k(f_i, f_j)\, Q_j(b),
\tag{6}
\]

where (I^(n), T^(n)) is a single training image with its ground truth labeling and T^(n)(a) is a binary image in which the i-th pixel T_i^(n)(a) has value 1 if the ground truth label at the i-th pixel of T^(n) is a, and 0 otherwise. A detailed derivation of Equation 6 is given in the supplementary material. The sums ∑_{j≠i} k(f_i, f_j) T_j^(n)(b) and ∑_{j≠i} k(f_i, f_j) Q_j(b) are both computationally expensive to evaluate directly. As in Section 3.2, we use high-dimensional filtering to compute both sums efficiently. The runtime of the final learning algorithm is linear in the number of variables N.
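For concreteness, a minimal sketch of Equation 6 for a single image, with the pairwise sums written out as an explicit O(N^2) kernel matrix; the paper instead evaluates these sums with the same high-dimensional filtering used for inference. Array names and shapes are illustrative assumptions:

```python
import numpy as np

def compatibility_gradient(features, T, Q, a, b):
    """Approximate d ell / d mu(a, b) for one training image (Equation 6).

    features : (N, d) whitened features, so k(f_i, f_j) = exp(-0.5 ||f_i - f_j||^2)
    T        : (N, L) one-hot ground truth labeling
    Q        : (N, L) mean field marginals
    """
    # Dense kernel matrix with the diagonal (j == i) removed; O(N^2) memory,
    # used here only to keep the sketch readable.
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    K = np.exp(-0.5 * sq_dists)
    np.fill_diagonal(K, 0.0)
    # - sum_i T_i(a) sum_{j!=i} k(f_i,f_j) T_j(b) + sum_i Q_i(a) sum_{j!=i} k(f_i,f_j) Q_j(b)
    return -T[:, a] @ (K @ T[:, b]) + Q[:, a] @ (K @ Q[:, b])
```

Replacing the two matrix-vector products K @ T[:, b] and K @ Q[:, b] with a Gaussian filtering pass is what brings the cost down to linear in N.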
5 Implementation

The unary potentials used in our implementation are derived from TextonBoost [19, 13]. We use the 17-dimensional filter bank suggested by Shotton et al. [19], and follow Ladický et al. [13] by adding color, histogram of oriented gradients (HOG), and pixel location features. Our evaluation on the MSRC-21 dataset uses this extended version of TextonBoost for the unary potentials. For the VOC 2010 dataset we include the response of bounding box object detectors [4] for each object class as 20 additional features. This increases the performance of the unary classifiers on the VOC 2010 from 13% to 22%. We gain an additional 5% by training a logistic regression classifier on the responses of the boosted classifier.

For efficient high-dimensional filtering, we use a publicly available implementation of the permutohedral lattice [1]. We found a downsampling rate of one standard deviation to work best for all our experiments. Sampling-based filtering algorithms underestimate the edge strength k(f_i, f_j) for very similar feature points. Proper normalization can cancel out most of this error. The permutohedral lattice allows for two types of normalizations. A global normalization by the average kernel strength …
                       Runtime   Standard ground truth     Accurate ground truth
                                 Global      Average       Global        Average
Unary classifiers         -      84.0        76.6          83.2 ± 1.5    80.6 ± 2.3
Grid CRF                 1 s     84.6        77.2          84.8 ± 1.5    82.4 ± 1.8
Robust P^n CRF          30 s     84.9        77.5          86.5 ± 1.0    83.1 ± 1.5
Fully connected CRF    0.2 s     86.0        78.3          88.2 ± 0.7    84.7 ± 0.7
Figure 3: Qualitative and quantitative results on the MSRC-21 dataset.

…rameters. The unary potentials were learned on a separate training set that did not include the 94 accurately annotated images.

We also adopt the methodology proposed by Kohli et al. [9] for evaluating segmentation accuracy around boundaries. Specifically, we count the relative number of misclassified pixels within a narrow band (trimap) surrounding actual object boundaries, obtained from the accurate ground truth images. As shown in Figure 4, our algorithm outperforms previous work across all trimap widths.

PASCAL VOC 2010. Due to the lack of a publicly available ground truth labeling for the test set in the PASCAL VOC 2010, we use the training and validation data for all our experiments. We randomly partitioned the images into 3 groups: 40% training, 15% validation, and 45% test set. Segmentation accuracy was measured using the standard VOC measure [3]. The unary potentials were learned on the training set and yielded an average classification accuracy of 27.6%. The parameters for the Potts potentials in the fully connected CRF model were learned on the validation set. The …
Figure 4: Segmentation accuracy around object boundaries. (a) Visualization of the trimap measure: trimaps of different widths. (b) Percent of misclassified pixels within trimaps of different widths.
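To make the trimap measure above concrete, here is a hedged sketch (not the evaluation code used in the paper) that counts misclassified pixels within a band of a given width around ground-truth object boundaries; it assumes NumPy and SciPy and integer label maps `gt` and `pred` of shape (H, W):

```python
import numpy as np
from scipy import ndimage

def trimap_error(gt, pred, width):
    """Fraction of misclassified pixels within `width` pixels of a ground-truth boundary."""
    # Mark boundary pixels: locations where the ground-truth label changes
    # between 4-connected neighbors.
    boundary = np.zeros_like(gt, dtype=bool)
    boundary[:-1, :] |= gt[:-1, :] != gt[1:, :]
    boundary[:, :-1] |= gt[:, :-1] != gt[:, 1:]
    # Distance of every pixel to the nearest boundary pixel; the trimap is the
    # band of pixels within the requested distance.
    dist = ndimage.distance_transform_edt(~boundary)
    band = dist <= width
    return np.mean(gt[band] != pred[band])
```

Sweeping `width` over a range of values and plotting the resulting error is what produces a curve like the one summarized in Figure 4(b).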
References

[1] A. Adams, J. Baek, and M. A. Davis. Fast high-dimensional filtering using the permutohedral lattice. Computer Graphics Forum, 29(2), 2010.
[2] A. Adams, N. Gelfand, J. Dolson, and M. Levoy. Gaussian kd-trees for fast high-dimensional filtering. ACM Transactions on Graphics, 28(3), 2009.
[3] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes (VOC) challenge. IJCV, 88(2), 2010.
[4] P. F. Felzenszwalb, R. B. Girshick, and D. A. McAllester. Cascade object detection with deformable part models. In Proc. CVPR, 2010.
[5] B. Fulkerson, A. Vedaldi, and S. Soatto. Class segmentation and object localization with superpixel neighborhoods. In Proc. ICCV, 2009.
[6] C. Galleguillos, A. Rabinovich, and S. Belongie. Object categorization using co-occurrence, location and appearance. In Proc. CVPR, 2008.
[7] S. Gould, J. Rodgers, D. Cohen, G. Elidan, and D. Koller. Multi-class segmentation with relative location prior. IJCV, 80(3), 2008.
[8] X. He, R. S. Zemel, and M. A. Carreira-Perpinan. Multiscale conditional random fields for image labeling. In Proc. CVPR, 2004.
[9] P. Kohli, L. Ladický, and P. H. S. Torr. Robust higher order potentials for enforcing label consistency. IJCV, 82(3), 2009.
[10] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[11] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? PAMI, 26(2), 2004.
[12] S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification. In Proc. ICCV, 2005.
[13] L. Ladický, C. Russell, P. Kohli, and P. H. S. Torr. Associative hierarchical CRFs for object class image segmentation. In Proc. ICCV, 2009.
[14] L. Ladický, C. Russell, P. Kohli, and P. H. S. Torr. Graph cut based inference with co-occurrence statistics. In Proc. ECCV, 2010.
[15] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. ICML, 2001.
[16] S. Paris and F. Durand. A fast approximation of the bilateral filter using a signal processing approach. IJCV, 81(1), 2009.
[17] N. Payet and S. Todorovic. (RF)^2 - Random forest random field. In Proc. NIPS, 2010.
[18] A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In Proc. ICCV, 2007.
[19] J. Shotton, J. M. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context. IJCV, 81(1), 2009.
[20] S. W. Smith. The Scientist and Engineer's Guide to Digital Signal Processing. California Technical Publishing, 1997.
[21] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing visual features for multiclass and multiview object detection. PAMI, 29(5), 2007.
[22] T. Toyoda and O. Hasegawa. Random field model for integration of local information and global information. PAMI, 30, 2008.
[23] J. J. Verbeek and B. Triggs. Scene segmentation with CRFs learned from partially labeled images. In Proc. NIPS, 2007.