Here, $P_n$ is the 2D projection of the 3D shape $S_n$ with white noise $N_n$ and the rigid transformation given by the orthographic projection matrix $R_n$, scale $c_n$ and 2D translation $T_n$. The shape is parameterized as a factored Gaussian with a mean shape $\bar{S}$, $m$ basis vectors $[V_1, V_2, \ldots, V_m] = V$ and latent deformation parameters $z_n$. Our key modification is constraint (2), where $C^{mask}_n$ denotes the Chamfer distance field of the $n$th instance's binary mask and says that all keypoints $p_{k,n}$ of instance $n$ should lie inside its binary mask. We observed that this results in more accurate viewpoints as well as more meaningful shape bases learnt from the data.

Learning. The likelihood of the above model is maximized using the EM algorithm. Missing data (occluded keypoints) is dealt with by filling in the values using the forward equations after the E-step. The algorithm computes shape parameters $\{\bar{S}, V\}$, rigid body transformations $\{c_n, R_n, T_n\}$ as well as the deformation parameters $\{z_n\}$ for each training instance $n$. In practice, we augment the data using horizontally mirrored images to exploit bilateral symmetry in the object classes considered. We also precompute the Chamfer distance fields for the whole set to speed up computation. As shown in Figure 3, NRSfM allows us to reliably predict viewpoint while being robust to intraclass variations.
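To make the forward model concrete, the sketch below deforms the mean shape with the basis vectors, applies a scaled orthographic projection plus a 2D translation, and then checks the mask constraint for every projected keypoint. It assumes the usual linear form $P_n = c_n R_n (\bar{S} + \sum_j z_{n,j} V_j) + T_n$ with the noise term omitted. The function names, the plain-list data layout and the direct mask lookup (standing in for a zero test on a precomputed Chamfer distance field) are illustrative assumptions, not the authors' implementation.

```python
def deform_shape(mean_shape, bases, z):
    """Return S = mean_shape + sum_j z[j] * bases[j] for K keypoints in 3D."""
    return [
        tuple(mean_shape[k][d] + sum(z[j] * bases[j][k][d] for j in range(len(z)))
              for d in range(3))
        for k in range(len(mean_shape))
    ]


def project(shape, c, R, T):
    """Scaled orthographic projection p = c * R * s + T, with R a 2x3 matrix."""
    return [
        tuple(c * sum(R[row][d] * s[d] for d in range(3)) + T[row] for row in range(2))
        for s in shape
    ]


def keypoints_inside_mask(points, mask):
    """Check that every projected keypoint (x, y) lands on a foreground pixel."""
    height, width = len(mask), len(mask[0])
    for x, y in points:
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width and mask[row][col]):
            return False
    return True


# Hypothetical toy instance: two keypoints and a single deformation basis.
mean_shape = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
bases = [[(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]]   # basis 0 shifts both keypoints along y
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]         # frontal orthographic camera
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
points = project(deform_shape(mean_shape, bases, z=[1.0]), c=1.0, R=R, T=(1.0, 1.0))
```

In this toy setup the deformed keypoints project onto the foreground row of the mask, whereas the undeformed mean shape (z = 0) projects one row above it and violates the constraint.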
Figure 3: NRSfM viewpoint estimation: Estimated viewpoints visualized using a 3D car wireframe.

2.2. 3D Basis Shape Model Learning

Equipped with camera projection parameters and keypoint correspondences (lifted to 3D by NRSfM) on the whole training set, we proceed to build deformable 3D shape models from object silhouettes within a class. 3D shape reconstruction from multiple silhouettes projected from a single object in calibrated settings has been widely studied. Two prominent approaches are visual hulls [24] and variational methods derived from snakes, e.g. [14, 30], which deform a surface mesh iteratively until convergence. Some interesting recent papers have extended variational approaches to handle categories [12, 13] but typically require some form of 3D annotations to bootstrap models. A recently proposed visual-hull based approach [36] requires only 2D annotations, as we do, for class-based reconstruction and it was successfully demonstrated on PASCAL VOC, but it does not serve our purposes as it makes strong assumptions about the accuracy of the segmentation and will in fact fill entirely any segmentation with a voxel layer.

Shape Model Formulation. We model our category shapes as deformable point clouds, one for each subcategory of the class. The underlying intuition is the following: some types of shape variation may be well explained by a parametric model, e.g. a Toyota sedan and a Lexus sedan, but it is unreasonable to expect them to model the variations between sailboats and cruise liners. Such models typically require knowledge of object parts, their spatial arrangements etc. [22] and involve complicated formulations that are difficult to optimize. We instead train separate linear shape models for different subcategories of a class. As in the NRSfM model, we use a linear combination of bases to model these deformations. Note that we learn such models from silhouettes and this is what enables us to learn deformable models without relying on point correspondences between scanned 3D exemplars [8]. Our shape model $M = (\bar{S}, V)$ comprises a mean shape
$\bar{S}$ and deformation bases $V = \{V_1, \ldots, V_K\}$ learnt from a training set $T: \{(O_i, P_i)\}_{i=1}^{N}$, where $O_i$ is the instance silhouette and $P_i$ is the projection function from world to image coordinates. Note that the $P_i$ we obtain using NRSfM corresponds to orthographic projection but our algorithm could handle perspective projection as well.

Energy Formulation. We formulate our objective function primarily based on image silhouettes. For example, the shape for an instance should always project within its silhouette and should agree with the keypoints (lifted to 3D by NRSfM). We capture these by defining corresponding energy terms as follows (here $P(S)$ corresponds to the 2D projection of shape $S$, $C^{mask}$ refers to the Chamfer distance field of the binary mask of silhouette $O$, and $\Delta_k(p; Q)$ is defined as the squared average distance of point $p$ to its $k$ nearest neighbors in set $Q$).

Silhouette Consistency. Silhouette consistency simply enforces the predicted shape for an instance to project inside its silhouette. This can be achieved by penalizing the points projected outside the instance mask by their distance from the silhouette. In our notation it can be written as follows:

$$E_s(S, O, P) = \sum_{C^{mask}(p) > 0} \Delta_1(p; O) \qquad (3)$$

Silhouette Coverage. Using silhouette consistency alone
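The silhouette consistency term of equation (3) can be sketched directly: each projected point that falls outside the mask is charged its squared distance to the nearest silhouette pixel, and points inside the mask contribute nothing. The brute-force nearest-pixel search, the function name and the representation of the silhouette as the set of foreground pixels are assumptions made for illustration. A practical implementation would read these distances from a precomputed distance field instead.

```python
def silhouette_consistency(points, mask):
    """Sum of squared distances to the mask over points projected outside it.

    Assumes the mask has at least one foreground pixel.
    """
    foreground = [(x, y) for y, row in enumerate(mask)
                  for x, value in enumerate(row) if value]
    height, width = len(mask), len(mask[0])
    energy = 0.0
    for px, py in points:
        col, row = int(round(px)), int(round(py))
        inside = 0 <= row < height and 0 <= col < width and bool(mask[row][col])
        if inside:
            continue  # distance field is zero here, so no penalty
        # Squared distance to the single nearest silhouette pixel (k = 1).
        energy += min((px - x) ** 2 + (py - y) ** 2 for x, y in foreground)
    return energy


# Hypothetical toy mask with one foreground row, and three projected points.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
points = [(2.0, 2.0), (2.0, 0.0), (4.0, 2.0)]
energy = silhouette_consistency(points, mask)
```

Here the first point lies inside the mask, the second is two pixels above it and the third is one pixel to its right, so only the last two are penalized.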
Figure 4: Mean shapes learnt for rigid classes in PASCAL VOC obtained using our basis shape formulation. Color encodes depth when viewed frontally.

accordingly. Finally, the mean shape is rotated as per the predicted viewpoint and translated to the center of the predicted bounding box.

Shape Inference. After initialization, we solve for the deformation weights (initialized to 0) as well as all the camera projection parameters (scale, translation and rotation) by optimizing equation (9) for fixed $\bar{S}, V$. Note that since we do not have access to annotated keypoint locations at test time, the 'Keypoint Consistency' energy $E_{kp}$ is ignored during the optimization.

Bottom-up Shape Refinement. The above optimization results in a top-down 3D reconstruction based on the category-level models, inferred object silhouette, viewpoint and our shape priors. We propose an additional processing step to recover high frequency shape information by adapting the intrinsic images algorithm of Barron and Malik [5, 4], SIRFS, which exploits statistical regularities between shapes, reflectance and illumination. Formally, SIRFS is formulated as the following optimization problem:

$$\underset{Z, L}{\text{minimize}} \quad g(I - S(Z, L)) + f(Z) + h(L)$$

where $R = I - S(Z, L)$ is a log-reflectance image, $Z$ is a depth map and $L$ is a spherical-harmonic model of illumination. $S(Z, L)$ is a rendering engine which produces a log shading image with the illumination $L$. $g$, $f$ and $h$ are the loss functions corresponding to reflectance, shape and illumination respectively. We incorporate our current coarse estimate of shape into SIRFS through an additional loss term:

$$f_o(Z, Z') = \sum_i \left( (Z_i - Z'_i)^2 + \epsilon^2 \right)^{\gamma_o}$$
where $Z'$ is the initial coarse shape and $\epsilon$ a parameter added to make the loss differentiable everywhere. We obtain $Z'$ for an object by rendering a depth map of our fitted 3D shape model, which guides the optimization of this highly non-convex cost function. The outputs from this bottom-up refinement are reflectance, shape and illumination maps, of which we retain the shape.

Implementation Details. The gradients involved in our optimization for shape and projection parameters are extremely efficient to compute. We use approximate nearest neighbors computed using a k-d tree to implement the 'Silhouette Coverage' gradients and leverage Chamfer distance fields for obtaining 'Silhouette Consistency' gradients. Our overall computation takes only about 2 sec to reconstruct a novel instance using a single CPU core. Our training pipeline is equally efficient, taking only a few minutes to learn a shape model for a given object category.

4. Experiments

Experiments were performed to assess two things: 1) how expressive our learned 3D models are, by evaluating how well they matched the underlying 3D shapes of the training data, and 2) their sensitivity when fit to images using noisy automatic segmentations and pose predictions.

Datasets. For all our experiments, we consider images from the challenging PASCAL VOC 2012 dataset [15], which contains objects from the 10 rigid object categories (as listed in Table 1). We use the publicly available ground truth class-specific keypoints [9] and object segmentations [19]. Since ground truth 3D shapes are unavailable for PASCAL VOC and most other detection datasets, we evaluated the expressiveness of our learned 3D models on the next best thing we managed to obtain: the PASCAL3D+ dataset [39], which has up to 10 3D CAD models for the rigid categories in PASCAL VOC. PASCAL3D+ provides between 4 different models for tvmonitor and train and 10 for car and chair. The different meshes primarily distinguish between subcategories but may also be redundant (e.g., there are more than 3 meshes for sedans in car). We obtain our subcategory labels on the training data by merging some of these cases, which also helps us in tackling data sparsity for some subcategories. The subset of PASCAL we considered after filtering occluded instances, which we do not tackle in this paper, had between 70 images for sofa and 500 images for classes aeroplanes and cars. We will make all our image sets available along with our implementation.

Metrics. We quantify the quality of our 3D models by comparing against the PASCAL3D+ models using two metrics: 1) the Hausdorff distance normalized by the 3D bounding box size of the ground truth model [3] and 2) a depth map error to evaluate the quality of the reconstructed visible object surface, measured as the mean absolute distance between reconstructed and ground truth depth:

$$\text{Z-MAE}(\hat{Z}, Z) = \frac{1}{n \cdot \gamma} \min_{\beta} \sum_{x,y} \left| \hat{Z}_{x,y} - Z_{x,y} - \beta \right| \qquad (10)$$

where $\hat{Z}$ and $Z$ represent predicted and ground truth depth maps respectively. Analytically, $\beta$ can be computed as the median of $\hat{Z} - Z$ and $\gamma$ is a normalization factor to account for absolute object size, for which we use the bounding box diagonal. Note that our depth map error is translation and scale invariant.

Metric Method                 aero   bike   boat    bus    car  chair  mbike   sofa  train     tv   mean
Mesh   KP+Mask                5.00   6.27   9.94   6.22   5.18   5.20   4.98   6.58  12.60   9.64   7.16
Mesh   Carvi [36]             5.07   6.03   8.80   8.76   4.38   5.74   4.86   6.49  17.52   8.37   7.60
Mesh   Puffball [35]          9.73  10.39  11.68  15.40  11.77   8.58   8.99   8.62  23.68   9.45  11.83
Depth  KP+Mask                9.25   7.87  12.36  11.77   7.22   7.51   8.97   9.70  30.91   6.84  11.24
Depth  Carvi [36]             9.39   7.24  11.43  18.42   6.86   7.39   8.06  12.21  29.57   5.75  11.63
Depth  SIRFS [4]             12.98  12.31  16.03  29.21  21.58  15.53  16.30  18.08  38.54  21.36  20.19

Table 1: Studying the expressiveness of our learnt 3D models: comparison between our method and [36, 35] using ground truth keypoints and masks on PASCAL VOC. Note that [36] operates with ground truth annotations and reconstructs an image corpus and our method is used here on the same task for a fair comparison. Please see text for more details.

Metric Method                 aero   bike   boat    bus    car  chair  mbike   sofa  train     tv   mean
Mesh   KP+Mask                5.13   6.46  10.46   5.89   5.07   5.34   5.15  15.07  12.16  11.69   8.24
Mesh   KP+SDS                 4.96   6.58  10.58   4.67   4.97   5.40   5.21  15.08  12.78  12.18   8.24
Mesh   PP+SDS                 6.58  14.02  14.43   6.65   7.96   7.47   7.57  15.21  15.23  13.24  10.84
Mesh   Puffball [35] (SDS)    9.68  10.23  11.80  15.95  12.42   8.28   9.45   9.60  23.38   9.26  12.00
Depth  KP+Mask                9.02   7.26  13.51  12.10   8.04   8.02  10.00  23.05  25.57   7.48  12.41
Depth  KP+SDS                 9.07   7.98  13.57   9.90   7.98   7.96   9.99  22.57  23.59   7.64  12.03
Depth  PP+SDS                10.94  11.64  12.26  15.95  13.17  10.06  12.55  21.19  36.37   8.98  15.31
Depth  SIRFS [4]             11.80  11.83  15.98  29.15  21.64  15.58  16.91  19.64  37.58  23.01  20.31

Table 2: Ablation study for our method assuming/relaxing various annotations at test time on objects in PASCAL VOC. As can be seen, our method degrades gracefully with relaxed annotations. Note that these experiments are in a train/test setting and numbers will differ from Table 1. Please see text for more details.

4.1. Expressiveness of Learned 3D Models

We learn and fit our 3D models on the same whole dataset (no train/test split), following the setup of Vicente et al. [36]. Table 1 compares our reconstructions on PASCAL VOC with those of this recently proposed method, which is specialized for this task (e.g. it is not designed for fitting to noisy data), as well as to a state of the art class-agnostic shape inflation method that also reconstructs from a single silhouette. We demonstrate competitive performance on both benchmarks, with our models showing greater robustness to perspective foreshortening effects on trains and buses. Category-agnostic methods Puffball [35] and SIRFS [4] consistently perform worse on the benchmark by themselves. Certain classes like boat and tvmonitor are especially hard because of large intraclass variance and data sparsity respectively.

4.2. Sensitivity Analysis

In order to analyze sensitivity of our models to noisy inputs, we reconstructed held-out test instances using our models given just ground truth bounding boxes. We compare various versions of our method using ground truth (Mask) / imperfect segmentations (SDS) and keypoints (KP) / our pose predictor (PP) for viewpoint estimation respectively. For pose prediction, we use the CNN-based system of [34] and augment it to predict subtypes at test time. This is achieved by training the system as described in [34] with additional subcategory labels obtained from PASCAL3D+ as described above. To obtain an approximate segmentation from the bounding box, we use the refinement stage of the state-of-the-art joint detection and segmentation system proposed in [20]. Here, we use a train/test setting where our models are trained on only a subset of the data and used to reconstruct the held out data from bounding boxes. Table 2 shows that our results degrade gracefully from the fully annotated to the fully automatic setting. Our method is robust to some
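The depth metric of equation (10) is straightforward to compute once the minimizing offset is known. The sketch below assumes depth maps flattened to lists over the visible pixels and uses hypothetical names (`z_mae`, `bbox_diagonal`). It takes the median of the per-pixel differences as the offset, which is the standard minimizer of a sum of absolute deviations, and normalizes by the pixel count and the bounding box diagonal.

```python
from statistics import median


def z_mae(pred_depth, gt_depth, bbox_diagonal):
    """Mean absolute depth error after removing the best constant depth offset."""
    diffs = [p - g for p, g in zip(pred_depth, gt_depth)]
    offset = median(diffs)  # minimizes sum(|d - offset|) over all constant offsets
    total = sum(abs(d - offset) for d in diffs)
    return total / (len(diffs) * bbox_diagonal)


# Hypothetical three-pixel depth maps.
pred = [1.0, 2.0, 4.0]
gt = [0.0, 0.0, 0.0]
error = z_mae(pred, gt, bbox_diagonal=2.0)
shifted = z_mae([p + 10.0 for p in pred], gt, bbox_diagonal=2.0)
```

Adding a constant to every predicted depth leaves the error unchanged, which is the translation invariance noted in the text.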
Figure 5: Fully automatic reconstructions on detected instances (0.5 IoU with ground truth) using our models on rigid categories in PASCAL VOC. We show our instance segmentation input, the inferred shape overlaid on the image, a 2.5D depth map (after the bottom-up refinement stage), the mesh in the image viewpoint and two other views. It can be seen that our method produces plausible reconstructions, which is a remarkable achievement given just a single image and noisy instance segmentations. Color encodes depth in the image co-ordinate frame (blue is closer). More results can be found at http://goo.gl/lmALxQ.

[14] C. H. Esteban and F. Schmitt. Silhouette and stereo fusion for 3d object modeling. Comput. Vis. Image Underst., 96(3):367-392, Dec. 2004.
[15] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.
[16] R. Garg, A. Roussos, and L. Agapito. Dense variational reconstruction of non-rigid surfaces from monocular video. In CVPR, June 2013.
[17] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[18] A. Gupta, A. A. Efros, and M. Hebert. Blocks world revisited: Image understanding using qualitative geometry and mechanics. In Computer Vision - ECCV 2010, pages 482-496. Springer, 2010.
[19] B. Hariharan, P. Arbeláez, L. Bourdev, S. Maji, and J. Malik. Semantic contours from inverse detectors. In ICCV, 2011.
[20] B. Hariharan, P. Arbeláez, R. Girshick, and J. Malik. Simultaneous detection and segmentation. In European Conference on Computer Vision (ECCV), 2014.
[21] M. Hejrati and D. Ramanan. Analyzing 3d objects in cluttered images. In NIPS, pages 602-610, 2012.
[22] E. Kalogerakis, S. Chaudhuri, D. Koller, and V. Koltun. A Probabilistic Model of Component-Based Shape Synthesis. ACM Transactions on Graphics, 31(4), 2012.
[23] I. Kemelmacher-Shlizerman. Internet based morphable model. In International Conference on Computer Vision (ICCV), 2011.
[24] A. Laurentini. The visual hull concept for silhouette-based image understanding. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 16(2):150-162, Feb 1994.
[25] J. J. Lim, H. Pirsiavash, and A. Torralba. Parsing ikea objects: Fine pose estimation. In ICCV, 2013.
[26] C. Nandakumar, A. Torralba, and J. Malik. How little do we need for 3-d shape perception? Perception-London, 40(3):257, 2011.
[27] R. Nevatia and T. O. Binford. Description and recognition of curved objects. Artificial Intelligence, 8(1):77-98, 1977.
[28] M. Prasad, A. Fitzgibbon, A. Zisserman, and L. Van Gool. Finding nemo: Deformable object class modelling using curve matching. In CVPR, 2010.
[29] L. G. Roberts. Machine Perception of Three-Dimensional Solids. PhD thesis, Massachusetts Institute of Technology, 1963.
[30] Y. Sahillioğlu and Y. Yemez. A surface deformation framework for 3d shape recovery. In Multimedia Content Representation, Classification and Security, volume 4105 of Lecture Notes in Computer Science, pages 570-577. Springer Berlin Heidelberg, 2006.
[31] S. Satkin, M. Rashid, J. Lin, and M. Hebert. 3dnn: 3d nearest neighbor. International Journal of Computer Vision, pages 1-29, 2014.
[32] S. Suwajanakorn, I. Kemelmacher-Shlizerman, and S. Seitz. Total moving face reconstruction. In D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, editors, Computer Vision - ECCV 2014, volume 8692 of Lecture Notes in Computer Science, pages 796-812. Springer International Publishing, 2014.
[33] L. Torresani, A. Hertzmann, and C. Bregler. Non-rigid structure-from-motion: Estimating shape and motion with hierarchical priors. TPAMI, 2008.
[34] S. Tulsiani and J. Malik. Viewpoints and keypoints. In CVPR, 2015.
[35] N. R. Twarog, M. F. Tappen, and E. H. Adelson. Playing with puffball: simple scale-invariant inflation for use in vision and graphics. In ACM Symp. on Applied Perception, 2012.
[36] S. Vicente, J. Carreira, L. Agapito, and J. Batista. Reconstructing pascal voc. In CVPR, 2014.
[37] S. Vicente and L. de Agapito. Balloon shapes: Reconstructing and deforming objects with volume from images. In 3DV, pages 223-230. IEEE, 2013.
[38] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 3d shapenets: A deep representation for volumetric shape modeling. In CVPR, 2015.
[39] Y. Xiang, R. Mottaghi, and S. Savarese. Beyond pascal: A benchmark for 3d object detection in the wild. In WACV, 2014.
[40] J. Xiao, B. Russell, and A. Torralba. Localizing 3d cuboids in single-view images. In Advances in Neural Information Processing Systems, pages 746-754, 2012.
[41] S. Zhu, L. Zhang, and B. Smith. Model evolution: An incremental approach to non-rigid structure from motion. In CVPR, 2010.
[42] M. Z. Zia, M. Stark, B. Schiele, and K. Schindler. Detailed 3d representations for object recognition and modeling. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 35(11):2608-2623, 2013.