Efros and Martial Hebert, Robotics Institute, Carnegie Mellon University

Abstract: Since most current scene understanding approaches operate either on the 2D image or using a surface-based representation, they do not allow reasoning about the physical co...
Blocks World Revisited
Gupta et al.

Fig. 1. Example output of our automatic scene understanding system. The 3D parse graph summarizes the inferred object properties (physical boundaries, geometric type, and mechanical properties) and relationships between objects within the scene. See more examples on project webpage.

impossible or highly unlikely. Second, even if successful, these pop-up models (also known as "billboards" in graphics) lack the physical substance of a true 3D representation. Like in a Potemkin village, there is nothing behind the pretty facades!

This paper argues that a more physical representation of the scene, where objects have volume and mass, can provide crucial high-level constraints to help construct a globally-consistent model of the scene, as well as allow for powerful ways of understanding and interpreting the underlying image. These new constraints come in the form of geometric relationships between 3D volumes as well as laws of statics governing the behavior of forces and torques. Our main insight is that the problem can be framed qualitatively, without requiring a metric reconstruction of the 3D scene structure (which is, of course, impossible from a single image). Figure 1 shows a real output from our fully-automatic system.

The paper's main contributions are: (a) a novel qualitative scene representation based on volumes (blocks) drawn from a small library; (b) the use of 3D geometry and mechanical constraints for reasoning about scene structure; (c) an iterative Interpretation-by-Synthesis framework that, starting from the empty ground plane, progressively "builds up" a consistent and coherent interpretation of the image; (d) a top-down segmentation adjustment procedure where partial scene interpretations guide the creation of new segment proposals.

Related Work: The idea that the basic physical and geometric constraints of our world (so-called laws of nature) play a crucial role in visual perception goes back at least to Helmholtz and his argument for "unconscious inference". In computer vision, this theme can be traced back to the very beginnings of our discipline, with Larry Roberts arguing in 1965 that "the perception of solid objects is a process which can be based on the properties of three-dimensional transformations and the laws of nature" [17]. Roberts' famous Blocks World was a daring early attempt at producing a complete scene understanding system for a closed artificial world of textureless polyhedral shapes by using a generic library of polyhedral block components. At the same time, researchers in robotics also realized the importance of physical stability of block assemblies since many block configurations, while geometrically possible, were not physically stable. They showed how to generate plans for the manipulation steps required to go from an initial configuration to a target configuration such that at any stage of assembly the blocks world remained stable [1]. Finally, the MIT Copy Demo [21] combined the two efforts, demonstrating a robot that could visually observe a blocks world configuration and then recreate it from a pile of unordered blocks (recently [2] gave a more sophisticated reinterpretation of this idea, but still in a highly constrained environment).

Unfortunately, hopes that the insights learned from the blocks world would carry over into the real world did not materialize as it became apparent that algorithms were too dependent on its very restrictive assumptions (perfect boundary detection, textureless surfaces, etc). While the idea of using 3D geometric primitives for understanding real scenes carried on into the work on generalized cylinders and resulted in some impressive demos in the 1980s (e.g., ACRONYM [16]), it eventually gave way to the currently dominant appearance-based, semantic labeling methods, e.g., [19,5]. Of these, the most ambitious is the effort of S.C. Zhu and colleagues [23] who use a hand-crafted stochastic grammar over a highly detailed dataset of labelled objects and parts to hierarchically parse an image. While they show impressive results for a few specific scene types (e.g., kitchens, corridors) the approach is yet to be demonstrated on more general data.

Most related to our work is a recent series of methods that attempt to model geometric scene structure from a single image: inferring qualitative geometry of surfaces [8,18], finding ground/vertical "fold" lines [3], grouping lines into surfaces [22,13], estimating occlusion boundaries [10], and combining geometric and semantic information [9,14,4]. However, these approaches do not model the global interactions between the geometric entities within the scene, and attempts to incorporate them at the 2D image labeling level [15,12] have been only partially successful. While single volumes have been used to model simple building interiors [6] and objects (such as bed) [7], these approaches do not model geometric or mechanical inter-volume relationships. And while modeling physical constraints has been used in the context of dynamic object relationships in video [20], we are not aware of any work using them to analyze static images.

Fig. 6. Our approach for evaluating block hypothesis and estimating the associated cost of placing a block.

estimation algorithm for superpixels. For example, if the block is associated with "front-right" view class and the superpixel is on the right of the folding edge, then $P(g_s \mid G_i, f_i, s)$ would be the probability of the superpixel being labeled right-facing by the surface layout estimation algorithm. For estimating the contact points likelihood term, we use the constraints of perspective projection. Given the block geometry and the folding edge, we fit straight lines $l_g$ and $l_s$ to the ground and sky contact points, respectively, and we verify if their slopes are in agreement with the surface geometry: for a frontal surface, $l_g$ and $l_s$ should be horizontal, and for left- and right-facing surfaces $l_g$ and $l_s$ should intersect on the horizon line.

3.5 Estimating Physical Stability

Our stability measure (Figure 6d) consists of three terms. (1) Internal Stability: We prefer blocks with low potential energies, that is, blocks which have heavier bottom and lighter top. This is useful for rejecting segmentations which merge two segments with different densities, such as the lighter
object below the heavier object shown in Figure 4(c). For computing internal stability, we rotate the block by a small angle, $\delta\theta$, (clockwise and anti-clockwise) around the center of each face; and compute the change in potential energy of the block as:

\[ \Delta P_i = \sum_{c \in \{light,\, medium,\, heavy\}} \; \sum_{s \in S_i} p(d_s = c)\, m_c\, \delta h_s, \qquad (4) \]

where $p(d_s = c)$ is the probability of assigning density class $c$ to superpixel $s$, $\delta h_s$ is the change in height due to the rotation and $m_c$ is a constant representing the density class. The change in potential energy is a function of three constants.

Fig. 7. (a) Computation of torques around contact lines. (b) Extracted depth constraints are based on convexity and support relationships among blocks.

Using constraints such as $\rho_{hm} = \frac{m_{heavy}}{m_{medium}} > 1$ and $\rho_{lm} = \frac{m_{light}}{m_{medium}} < 1$, we compute the expected value of $\Delta P_i$ with respect to the ratio of densities ($\rho_{hm}$ and $\rho_{lm}$). The prior on ratio of densities for the objects can be derived using density and the frequency of occurrence of different materials in our training images.

(2) Stability: We compute the likelihood of a block being stable given the density configuration and support relations. For this, we first compute the contact points of the block with the supporting block and then compute the torque due to gravitational force exerted by each superpixel and the resultant contact force around the contact line (Figure 7a). This again leads to torque as a function of three constants and we use similar qualitative analysis to compute the stability.

(3) Constraints from Block Strength: We also derive constraints on support attributes based on the densities of the two blocks possibly interacting with each other. If the density of the supporting block is less than the density of the supported block, we then assume that the two blocks are not in physical contact and the block below occludes the contact of the block above with the ground.

3.6 Extracting Depth Constraints

The depth ordering constraints (Figure 6(e)) are used to guide the next step of refining the segmentation by splitting and merging regions. Computing depth ordering requires estimating pairwise depth constraints on blocks and then using them to form global depth ordering. The rules for inferring depth constraints are shown in Figure 7(b). These pairwise constraints are then used to generate a global partial depth ordering via a simple constraint satisfaction approach.

3.7 Creating Split and Merge Proposals

This final step involving changes to the segmentation (Figure 6f) is crucial because it avoids the pitfalls of previous systems which assumed a fixed, initial segmentation (or even multiple segmentations) and were unable to recover from incorrect or incomplete groupings. For example, no segmentation algorithm can group two regions separated by an occluding object because such a merge would require reasoning about depth ordering. It is precisely this type of reasoning that the depth ordering estimation of Section 3.6 enables. We include segmentation in the interpretation loop and use the current interpretation of the scene to generate more segments that can be utilized as blocks in the blocks world.

Using estimated depth relationships and block view classes we create merge proposals where two or more non-adjacent segments are combined if they share a block as neighbor which is estimated to be in front of them in the current viewpoint. In that case, the shared block is interpreted as an occluder which fragmented the background block into pieces which the merge proposal attempts to reconnect. We also create additional merge proposals by combining two or more neighboring segments. Split proposals divide a block into two or more blocks if the inferred properties of the block are not in agreement with confident individual cues. For example, if the surface layout algorithm estimates a surface as frontal with high confidence and our inferred geometry is not frontal, then the block is divided to create two or more blocks that agree with the surface layout. The split and merge proposals are then evaluated by a cost function whose terms are based on the confidence in the estimated geometry and physical stability of the new block(s) compared to previous block(s). In our experiments, approximately 11% of the blocks are created using the resegmentation procedure.

4 Experimental Results

Since there has been so little done in the area of qualitative volumetric scene understanding, there are no established datasets, evaluation methodologies, or even much in terms of relevant previous work to compare against. Therefore, we will present our evaluation in two parts: 1) qualitatively, by showing a few representative scene parse results in the paper, and a wide variety of results on the project webpage¹; 2) quantitatively, by evaluating individual components of our system and, when available, comparing against the relevant previous work.

Dataset: We use the dataset and methodology of Hoiem et al. [9] for comparison. This dataset consists of 300 images of outdoor scenes with ground truth surface orientation labeled for all images, but occlusion boundaries are only labelled for 100 images. The first 50 (of the 100) are used for training the surface segmentation [8] and occlusion reasoning [10] of our segmenter. The remaining 250 images are used to evaluate our blocks world approach. The surface classifiers are trained and tested using five-fold cross-validation just like in [9].

Qualitative: Figure 8 shows two examples of complete interpretation automatically generated by the system and a 3D toy blocks world generated in VRML. In the top example, the building is occluded by a tree in the image and therefore none of the previous approaches can combine the two faces of the building to produce a single building region. In a pop-up based representation, the placement of the left face is unconstrained due to the contact with ground not being visible. However, in our approach volumetric constraints aid the reasoning process and combine the two faces to produce a block occluded by the tree. The bottom example shows how statics can help in selecting the best blocks and improve block-view estimation. Reasoning about mechanical constraints rejects the segment corresponding to the whole building (due to unbalanced torque). For the selected convex block, the cues from ground and sky contact points aid in proper geometric classification of the block. Figure 9 shows a few other qualitative examples with the overlaid block and estimated surface orientations.

Quantitative: We evaluate various components of our system separately. It is not possible to quantitatively compare the performance of the entire system

¹ http://www.cs.cmu.edu/abhinavg/blocksworld