
International Journal of Computer Vision 43(1), 29-44, 2001
2001 Kluwer Academic Publishers. Manufactured in The Netherlands.

Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons

THOMAS LEUNG AND JITENDRA MALIK
Computer Science Division, University of California at Berkeley, Berkeley, CA 94720-1776, USA

Figure 1. Some natural 3D textures from the Columbia-Utrecht database (Dana et al., 1999). Left to right: Aluminum Foil, Rabbit Fur, Painted Spheres. These textures illustrate the effects caused by the 3D nature of the material.

Figure 2. The same patch of the material Crumpled Paper imaged under three different lighting and viewing conditions. The aspect ratio of the figure is determined by the slant of the surface. Even though the three images are corresponding patches from the same material, the appearances are drastically different.

problems: two regions will have the same brightness under one illumination; while the shadowed region will be darker in another. These two problem cases are illustrated in Fig. 3.

The complexity of the relationship between the image intensity values, the viewing/lighting settings, and the properties of 3D textures has led to recent interest in building explicit models for 3D textures (Chantler, 1994; Chantler and McGunnigle, 1995; Dana and Nayar, 1998; Dana and Nayar, 1999b; Dana et al., 1999; Koenderink and van Doorn, 1996; Koenderink et al., 1999; Leung and Malik, 1997; van Ginneken et al., 1998). From these analytical models, such as Gaussian distributed height variation or cylindrical models, low-order statistical quantities, e.g. brightness distribution or correlation length, are derived. However, these models are rather simple and they lack the expressiveness to solve the general problems of natural material representation, recognition, and synthesis under varying lighting and viewing conditions.

The main idea of this paper is the following: at the local scale, there are only a small number of perceptually distinguishable micro-structures on the surface. For example, the local surface relief might correspond to ridges, grooves, bumps, hollows, etc. These could occur at a continuum of orientations and heights, but perceptually we can only distinguish them up to an equivalence class. Similarly, reflectance
variations fall into prototypes like stripes, spots, etc. Of course one can have the product of these two sources of variation.

Our goal is to build a small, finite vocabulary of micro-structures, which we call 3D textons. This term is by analogy to 2D textons, the putative units of preattentive human texture perception proposed by Julesz nearly 20 years ago. Julesz's textons (Julesz, 1981), namely orientation elements, crossings and terminators, fell into disuse as they did not have a precise definition for gray level images. In this paper, we re-invent the concept and operationalize it in terms of learned co-occurrences of outputs of linear oriented Gaussian derivative filters. In the case of 3D textons, we look at the concatenation of filter response vectors corresponding to different lighting and viewing directions.

Figure 3. (a) A ridge: under one illumination, the ridge appears as a light-dark transition, while it appears as a dark-light transition under another. (b) Shadows: under one illumination, two regions have the same brightness, while under another, the brightness is different.

Once we have built such a universal vocabulary of 3D textons, the surface of any material such as marble, concrete, leather, or rug can be represented as a spatial arrangement (perhaps stochastic) of symbols from this vocabulary. Only a small number of views are needed for this. Suppose we have learned these representations for some materials, and then we are presented with a single image of a patch from one of these materials under a novel illumination or viewpoint; the objective is to recognize which one. We have developed a recognition algorithm using a Markov Chain Monte Carlo (MCMC) sampling method.

The structure of this paper is as follows. In Section 2, we show an operationalization of finding 2D textons from images. We analyze images of different viewing and lighting conditions together and extend the notion of textons to 3D textons in Section 3. The algorithm for computing a 3D texton vocabulary is given in Section 4. How a material is represented in terms of the learned textons is discussed in Section 5. We contrast our 3D texton model with principal component analysis in Section 6. The problem of 3D texture recognition is presented in Section 7. Results are shown for classifying materials under novel viewing and lighting conditions. In Section 8, we present an application of the 3D texton vocabulary to predict the appearance of textures under novel viewing and lighting conditions. We conclude in Section 9. Some of the results presented here were published in Leung and Malik (1999).

2. 2D Textons

We will characterize a texture by its responses to a set of orientation and spatial-frequency selective linear filters (a filter bank). This approach has proved to be useful for segmentation (Fogel and Sagi, 1989; Malik and Perona, 1990), recognition (Puzicha et al., 1997; Rubner and Tomasi, 1999), as well as synthesis (de Bonet and Viola, 1998; Heeger and Bergen, 1995; Zhu et al., 1998).

Figure 4. The filter bank used in our analysis. Total of 48 filters: 36 oriented filters, with 6 orientations, 3 scales, and 2 phases, 8 center-surround derivative filters and 4 low-pass Gaussian filters.

The representation of textures using filter responses is extremely versatile; however, one might say that it is overly redundant (each pixel is represented by N_fil filter responses, where N_fil is usually around 50). Moreover, it should be noted that we are characterizing textures, entities with some spatially repeating properties by definition. Therefore, we do not expect the filter responses to be totally different at each pixel over the texture. Thus, there should be several distinct filter response vectors and all others are noisy variations of them.

This intuition leads to our proposal of clustering the filter responses into a small set of prototype response vectors. We call these prototypes textons. Algorithmically, each texture is analyzed using the filter bank shown in Fig. 4. There are a total of 48 filters (36 elongated filters at 6 orientations, 3 scales, and 2 phases, 8 center-surround difference of Gaussian filters, and 4 low-pass Gaussian filters). Each pixel is now transformed to an N_fil = 48 dimensional vector. These vectors are clustered using a vector quantization algorithm, in particular K-means (Ball and Hall, 1967; Duda and Hart, 1973; Gersho and Gray, 1992; MacQueen, 1967; Ripley, 1996; Sebestyen, 1962). The criterion for this algorithm is to find K centers such that, after assigning each data vector to the nearest center, the sum of the squared distances from the centers is minimized. K-means is a greedy algorithm which iteratively performs the following two operations: (1) assign data vectors to the nearest of the K centers; (2) update each of the K centers to the mean of the data vectors assigned to it. These two steps are continued until the algorithm converges and a local minimum of the criterion is achieved. These centers are the textons. The associated filter response vectors, each of dimension N_fil, are called appearance vectors.

What the textons encode can be visualized by reconstructing local image patches from the appearance vectors. These local image patches can be considered as filters detecting particular image structures. The reconstruction task is trivial for orthogonal or self-inverting filter banks (Burt and Adelson, 1983; Heeger and Bergen, 1995; Vaidyanathan, 1993). For non-orthogonal and non-self-inverting filter banks, reconstructing the image from filter responses can be set up as a least-squares problem. First construct the filter matrix F as follows: arrange each filter as a row vector and concatenate them to form a matrix. In this representation, convolution with the filter bank is equivalent to multiplying each local image patch with F. Each of the patches can be obtained by multiplying the appearance vector with the pseudo-inverse of F (Jones and Malik, 1992). This is illustrated in Fig. 5. The original image is shown in (a). The textons (K-means cluster centers) are reconstructed and shown in (b). Notice that they correspond to the dominant local structures in the image. We can quantize the filter responses at each pixel of the image to the K textons. To reconstruct the whole image from this array of filter responses, we first independently reconstruct the intensity at each pixel from the quantized filter responses. Then, we iterate to make sure that the filter responses of the reconstructed image agree with the raw filter responses. The result of this process for Fig. 5(a) is shown in (c). The close resemblance between (a) and (c) suggests that the quantization does not introduce much error perceptually and that the reconstruction algorithm is doing a good job.

In the next section, we will extend the texton theory to 3D textures, i.e. textures with significant local surface relief. For more discussion of 2D textons, the readers
are referred to Malik et al. (1999), where we applied the idea of textons to the problem of image segmentation.

Figure 5. Illustration of K-means clustering and reconstruction from filter responses with K = 20. (a) Original image. (b) The K-means centers reconstructed as local filters. These centers correspond to the dominant features in the image: bars and edges at various orientations and phases. (c) Reconstruction of the quantized image. Close resemblance between (a) and (c) suggests that quantization does not introduce much error perceptually.

3. 3D Textons

For painted textures with Lambertian material, characterizing one image is equivalent to characterizing all the images under all lighting and viewing directions. However, for 3D textures, this is not the case. The effects of masking, shadowing, specularity, and mutual illumination will make the appearance of the texture look drastically different according to the lighting and viewing directions (Fig. 2). The presence of albedo variations on a lot of natural textures only makes the problem more difficult.

Let us first consider what the problems are if we try to characterize a 3D texture with only 1 image using the K-means clustering algorithm on filter outputs described in Section 2. Suppose the image of the texture consists of thin dark-light bars arising from 3 causes: (1) albedo change; (2) shadows; and (3) a deep groove. Despite the different underlying causes, all these events produce the same appearance in this particular lighting and viewing setting. Quite naturally, the K-means algorithm will cluster them together. What this means is that pixels with the same label will look different under different lighting and viewing conditions: (1) the albedo change varies according to the cosine of the lighting angle (assuming a Lambertian surface); (2) the location of the shadow boundary changes according to the direction of the light; and (3) the deep groove remains the same for a wide range of lighting and viewing conditions (Haddon and Forsyth, 1998; Koenderink and van Doorn, 1980). Thus, we will pay a significant price for quantizing these events to the same texton.

To characterize 3D textures, many images at different lighting and viewing directions will be needed. Let the number of images be N_vl (with N_vl = 20 in our experiments). The argument is that if any two local texture structures are equivalent under N_vl different lighting and viewing conditions, we can safely assume that the two structures will look the same under all lighting and viewing conditions. Notice that work in the literature has attempted to show that 3-6 images will be able to completely characterize a structure in all lighting and viewing conditions (Belhumeur and Kriegman, 1998; Shashua, 1997). These results are not applicable because of the very restrictive assumptions they made: a Lambertian surface model and the absence of occlusion, shadows, mutual illumination, and specularity. Indeed, deviations from these assumptions are the defining properties of most, if not all, natural 3D textures.

What this means is that the co-occurrence of filter responses across different lighting and viewing conditions specifies the local geometric and photometric properties of the surface. If we concatenate the filter responses of the N_vl images together and cluster these N_fil N_vl dimensional data vectors, the resulting textons will encode the appearances of dominant features in the image across lighting and viewing conditions.

Let us understand what these textons correspond to. Consider the following two geometric features: a groove and a ridge. In one image, they may look the same; however, at many lighting and viewing angles, their appearances are going to differ considerably. With the filter response vectors from all the images, we can tell the difference between these two features. In other words, each of the K-means centers encodes geometric features such as ridges at particular orientations, bumps of certain sizes,
grooves of some width, etc. Similarly, the K-means centers will also encode albedo change vs. geometric 3D features, as well as reflectance properties (e.g. shiny vs. dull).

The appearances of different features and different materials at various lighting and viewing angles are captured by the filter responses. Thus, we call these K-means centers 3D textons, and the corresponding N_fil N_vl dimensional filter response vectors the appearance vectors. A schematic diagram illustrating the steps of filtering, concatenating filter responses, and K-means clustering is shown in Fig. 6.

Figure 6. Each image at different lighting and viewing directions is filtered using the filter bank. The response vectors are concatenated together to form data vectors of length N_fil N_vl. These data vectors are clustered using the K-means algorithm. The resulting centers are the 3D textons and the associated filter response vectors are called the appearance vectors.

4. Constructing the Vocabulary of 3D Textons

Our goal in this paper is to use images from a set of training materials to learn a vocabulary which can characterize all natural materials. This is a realistic goal because, as we have noted, the textons in the vocabulary are going to encode the appearances of local geometric and photometric features, e.g. grooves, ridges, bumps, reflectance boundaries, etc. All natural materials are made up of these features. In this section, we will describe the exact steps taken to construct this universal 3D texton vocabulary.

All the images used in this paper are taken from the Columbia-Utrecht dataset (Dana et al., 1999). There are 60 different materials, each with 205 images at different viewing and lighting angles. 20 materials are taken randomly as the training set. For each material, 20 images of different lighting and viewing directions are used to build the texton vocabulary. The 20 images for each material are registered using the standard area-based sum-of-square-differences (SSD) algorithm.

To compute the universal vocabulary, the following steps are taken:

1. For each of the 20 training materials, the filter bank is applied to each of the N_vl = 20 images under different viewing and lighting conditions. The response vectors at every pixel are concatenated together to form an N_fil N_vl dimensional vector.

2. For each of the 20 materials individually, the K-means clustering algorithm is applied to the data vectors. The number of centers, denoted by K, is 400. The K-means algorithm finds a local minimum of the following sum-of-square distance error:

\[
\mathrm{Err} = \sum_{i=1}^{N} \sum_{j=1}^{K} q_{ij}\, \lVert x_i - c_j \rVert^2,
\qquad
q_{ij} =
\begin{cases}
1 & \text{if } \lVert x_i - c_j \rVert^2 < \lVert x_i - c_k \rVert^2 \text{ for all } k \neq j \\
0 & \text{otherwise}
\end{cases}
\]

   N denotes the number of pixels; x_i is the concatenated filter response vector of the i-th pixel and c_j is the appearance vector for the j-th center. The K-means algorithm is initialized by random samples from all the data vectors.

3. The centers for all the materials are merged together to produce a universal alphabet of size 8000 (20 materials with 400 centers each).

4. The codebook is pruned down to 100 by merging centers too close together or getting rid of those centers with too few data assigned to them.

5. The K-means algorithm is applied again on samples from all the images to achieve a local minimum.

Steps 2 to 4 can be viewed as finding an initialization for the final K-means step in 5.

The learned vocabulary should possess two very important properties:

1. Expressiveness: the vocabulary learned should be able to characterize each of the materials well to allow for the discrimination between them.

2. Generalization: it should generalize well beyond the training materials. In other words, it should be as expressive for novel materials as for training materials.

An evaluation of these two properties is shown in Fig. 7. For each material, the filter responses from a frontal-parallel image are quantized into the 3D texton vocabulary: the filter responses at each pixel are replaced by the appearance vector of the 3D texton labeled at the pixel. An image is reconstructed from the quantized filter response vectors using the algorithm described in Section 2. The SSD error between the reconstructed image and the original image is plotted in the figure. The errors from vocabularies of different sizes are also plotted for comparison. The upper diagram is the SSD error for the training materials. The lower diagram is the one for novel materials. Notice two points: (1) There is no significant difference in average reconstruction error between training materials and novel materials. In other words, our texton vocabulary is encoding generic features, rather than material-specific properties. This is an indication of good generalization. (2) The SSD errors are small for almost all materials. The 3D texton vocabulary is doing a very good job encoding the properties of the materials. This reconfirms our intuition that textures are made of a small set of features. Moreover, the differences
between reconstruction errors from vocabularies of different sizes are not significant. In all the texture recognition results in this paper, the same texton vocabulary of size 100 is used.

Of course, comparing reconstruction error is not the best way to evaluate the vocabulary. The real test is to use the vocabulary for the recognition and synthesis of natural materials, which we will show in Sections 7 and 8.

In our studies here, only 20 (N_vl = 20) different viewing and lighting directions are used. 20 images form a very sparse sampling of the viewing and illumination spheres. When more images are available, we should take advantage of them. However, this does not mean that we need to run the clustering algorithm on a formidably large dimensional space. We argue that 20 images are enough to make sure that each 3D texton represents different local geometric/photometric structures. Therefore, to enlarge the appearance vector of each texton, we can simply append to the vectors the average of filter responses at pixels with the corresponding label.

5. Representing Visual Appearance of Materials using 3D Textons

Once we have built such a vocabulary of 3D textons, we can acquire a model for each material to be classified. Using all the images (under different viewing and lighting conditions) available for each material, each point on the surface is assigned one of the 100 texton labels by finding the minimum distance between the texton appearance vectors and the filter responses at the point. The surface of any material such as marble, concrete, leather, or rug can now be represented as a spatial arrangement of symbols from this vocabulary. For the problem of material recognition, we ignore the precise spatial relationship of the symbols and use a histogram representation for each material. Sample histograms for 4 materials are shown in Fig. 8. Notice that these histograms are very different from each other, thus allowing good discrimination.

The chi-square significance test is used to provide a measure of the similarity between two histograms (h_1 and h_2):

\[
\chi^2(h_1, h_2) = \frac{1}{2} \sum_{n=1}^{\#\mathrm{bins}} \frac{\bigl(h_1(n) - h_2(n)\bigr)^2}{h_1(n) + h_2(n)}
\]

The significance for a certain chi-square distance is given by the chi-square probability function P(\chi^2 | \nu). P(\chi^2 | \nu) is the probability that two histograms from the same model will have a distance larger than \chi^2 by
chance; and is given by the incomplete gamma function (Press et al., 1988):

\[
P(\chi^2 \mid \nu) = Q\!\left(\frac{\nu}{2}, \frac{\chi^2}{2}\right)
= \frac{1}{\Gamma(\nu/2)} \int_{\chi^2/2}^{\infty} e^{-t}\, t^{\nu/2 - 1}\, dt
\tag{2}
\]

where \Gamma is the gamma function and \nu is the number of degrees of freedom.

Figure 7. SSD reconstruction error for different materials. Top: the 20 training materials used to create the texton vocabulary. Bottom: 20 novel materials. Several vocabularies of different sizes, including 200, are created. Notice two points about the texton vocabulary: (1) there is no significant difference in average reconstruction error between training and novel materials (good generalization); (2) SSD errors are small for almost all materials (high descriptive power).

6. Textons versus Principal Component Analysis

The texton representation can be considered as a form of data compression. It is not the only way of compressing data. Principal component analysis (PCA) is one of the most common ones. PCA is in fact a very popular approach for object and texture recognition (Belhumeur and Kriegman, 1998; Dana and Nayar, 1999a; Georghiades et al., 1998; Murase and Nayar, 1995; Sirovitch and Kirby, 1987; Turk and Pentland, 1991). In PCA approaches, each image is represented as an element of the pixel space R^N (N is the number of pixels). Object models are represented by a collection of example training images. In other words, each object model is a subset of R^N. Classification is done using nearest neighbor, or other more sophisticated classifiers, like support vector machines.

Figure 8. Top to bottom: the histograms of labels for the materials Rough Plastic, Plaster-a, and Terrycloth, respectively. These histograms are used as the material representation for the task of texture recognition. The histograms are very different from each other, thus allowing good discrimination.

The main problem of applying PCA to represent 3D texture is that PCA is intrinsically linear. The major effects caused by the surface relief of natural materials (shadows, occlusion, specularities, mutual illumination, etc.) are non-linear properties. Moreover, PCA is applicable only if the surface reflectance is Lambertian. However, most, if not all, materials are highly non-Lambertian. Because of all these, we argue that the texton representation, which is not based on any linearity assumption, is more appropriate.

7. Texture Recognition

In this section, we will demonstrate algorithms and results on texture recognition.

7.1. 3D Texture Recognition from Multiple Viewpoint/Lighting Images

We first investigate 3D texture recognition when multiple images of each sample are given. Every time we get a sample of the material, 20 images of different lighting and viewing directions are provided. From these images, a texton labeling is computed. Then the sample is classified to be the material with the smallest chi-square distance between the sample histogram and the model histogram.

In this experiment, 20 training materials are used to construct the texton vocabulary of size 100. 40 different materials are to be classified. The models are obtained from random 100 x 100 patches from the images. For each material, 3 novel samples of size 100 x 100 are to be classified. The overall recognition rate is 95.6%.

Another way to demonstrate the result is to use the similarity matrix in Fig. 9. Each element (i, j) in the matrix is given by the chi-square probability function (Eq. (2)) that samples of material i will be classified as material j. Here, we only show the probability for 14 materials because of space limitations. As shown in the figure, materials such as Rough Plastic are likely to be classified correctly, while Plaster-a and Plaster-b are likely to be mistaken for each other. This is reasonable because the two different types of plaster indeed look very similar, as shown by the images at the bottom of the figure.

Figure 9. Similarity matrix for 14 materials. Each entry (i, j) is given by the chi-square probability function (Eq. (2)) that samples of material i will be classified as material j. As shown in this figure, materials such as Rough Plastic are likely to be classified correctly, while Plaster-a and Plaster-b are likely to be mistaken for each other. Sample images from these four materials are shown as well.

Receiver Operation Characteristics (ROC) curves are also good indications of the performance. The ROC
curve is a plot of the probability of detection versus the probability of false alarms. It is parametrized by a detection threshold. In our case, it is a threshold \tau on the chi-square distance. For any incoming sample, we declare that it is the same as material M if the chi-square distance between their histograms is smaller than \tau. If the sample is indeed material M, we have a detection; otherwise, it is a false alarm. Figure 10 shows the ROC curve for our recognition problem. The top-left corner represents perfect recognition. Our algorithm performs very well. The recognition performance achieves a 97% detection rate with only a 3% false alarm rate.

Figure 10. Receiver operation characteristics (ROC) curve for a very simple texture recognition problem. The top-left corner represents perfect recognition performance. The diagonal line refers to chance. The performance of our algorithm is very good. The recognition achieves a 97% detection rate with only a 3% false alarm rate.

The requirement that any material is presented with multiple images at different lighting and viewing conditions may seem unreasonable. However, if the material is on a curved surface, it is essentially equivalent to having multiple images of the same material illuminated and viewed differently.

7.2. 3D Texture Recognition from a Single Image

Let us now consider the much more difficult problem of 3D texture recognition: for each material, the histogram model is built from 4 different light/view conditions; and for each sample to be classified, we only have a single image under known illumination condition and viewing geometry. This problem is very similar to the problem formulation of object recognition: given a number of instances of the object, try to recognize it under all poses and illumination. However, in the context of texture recognition, this problem is rarely studied.

Given the illumination and viewing conditions for the novel image, we know to which portion of the appearance vector the filter outputs of the incoming image are to be compared. However, a problem arises from the fact that given only 1 image, finding the texton label for each pixel is very difficult. As noted before, in just one single viewing and lighting condition, physically different features may have the same appearance. Thus, texton assignment to the pixels is ambiguous. Simply committing to the label with the smallest distance can result in a texton histogram that has no resemblance to that of the target material.

The intuition of our approach is the following: if the texton labeling of the incoming image is known, the material identity can be assigned to the model with the minimum chi-square distance between the incoming texton histogram and the histogram of the model material. On the other hand, if the material identity is known, a texton labeling of the image can be estimated by matching the histogram of the labeling to that of the material. We solve this chicken-and-egg problem using a Markov chain Monte Carlo (MCMC) algorithm. First, each pixel is allowed n possible texton labelings. The MCMC algorithm will try to find the best labeling given the possibilities and the material.

An MCMC algorithm with Metropolis sampling for finding the texton labeling is shown below. For each material M and the corresponding model histogram h_M, do:

1. Randomly assign a label to each pixel among the n possibilities. Call this assignment the initial state x;
2. Compute the probability P(x) of the current state using Eq. (2) with h_M as the model histogram;
3. Obtain a tentative new state x' by randomly changing m labels of the current state;
4. Compute P(x') using Eq. (2);
5. Compute \alpha = P(x') / P(x);
6. If \alpha \geq 1, the new state is accepted; otherwise, accept the new state with probability \alpha;
7. Go to step 2 until the states converge to a stable distribution.
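The Metropolis steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy data, the fixed iteration count (used in place of a convergence test), and the choice of the number of histogram bins as the degrees of freedom \nu are all hypothetical. The state probability follows the chi-square distance and Eq. (2) as written in this paper, with the incomplete gamma function evaluated by a simple series.

```python
import math
import random

def chi_square_distance(h1, h2):
    """0.5 * sum((h1 - h2)^2 / (h1 + h2)); bins empty in both are skipped."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def upper_incomplete_gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x), via the series for 1 - Q."""
    if x <= 0.0:
        return 1.0
    term = 1.0 / a
    series = term
    n = a
    for _ in range(500):
        n += 1.0
        term *= x / n
        series += term
        if abs(term) < abs(series) * 1e-15:
            break
    p = series * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - p))

def histogram(labels, k):
    counts = [0] * k
    for lab in labels:
        counts[lab] += 1
    return counts

def state_probability(labels, model_hist, k):
    # Assumption: degrees of freedom nu = number of bins k.
    chi2 = chi_square_distance(histogram(labels, k), model_hist)
    return upper_incomplete_gamma_q(0.5 * k, 0.5 * chi2)

def metropolis_labeling(candidates, model_hist, k, n_iter=2000, frac=0.05, rng=None):
    """candidates[i] lists the texton labels allowed at pixel i."""
    rng = rng or random.Random(0)
    state = [rng.choice(c) for c in candidates]             # step 1
    p_cur = state_probability(state, model_hist, k)         # step 2
    best_p = p_cur
    n_change = max(1, int(frac * len(state)))
    for _ in range(n_iter):                                 # fixed budget, not a convergence test
        proposal = list(state)
        for i in rng.sample(range(len(state)), n_change):   # step 3
            proposal[i] = rng.choice(candidates[i])
        p_new = state_probability(proposal, model_hist, k)  # step 4
        # Steps 5-6: accept if alpha >= 1, else with probability alpha = p_new / p_cur.
        if p_new >= p_cur or rng.random() * p_cur < p_new:
            state, p_cur = proposal, p_new
            best_p = max(best_p, p_cur)
    return state, best_p

# Toy data: 200 pixels, 4 textons, each pixel has its true label plus one distractor.
rng = random.Random(1)
K = 4
true_labels = [rng.randrange(K) for _ in range(200)]
model_hist = histogram(true_labels, K)
candidates = [sorted({t, rng.randrange(K)}) for t in true_labels]
labels, best_p = metropolis_labeling(candidates, model_hist, K, rng=rng)
print("best state probability:", best_p)
```

Running the sampler once per candidate material and comparing the resulting `best_p` values mirrors the classification rule described in the text; the acceptance test is written as a multiplication so that a current probability of zero does not cause a division error.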
Figure 11. The decay of the chi-square distance between the histogram of the state and the histogram of a model material. Solid line: correct material. Dashed line: wrong material. The decay of the distance is much faster and the minimum much smaller for the correct material.

What the MCMC algorithm does is to draw samples from the distribution P(x | M), where x is in the space of possible labelings and P is given by the chi-square probability function in Eq. (2). Once the states settle in a stable distribution, we can compute the probability that the incoming image sample is drawn from material M by computing \max_x P(x | M).

MCMC algorithms have been applied to computer vision for a long time, most notably in the paper by Geman and Geman (1984), where the problem of image restoration is studied. For details about variations in MCMC algorithms, convergence properties, and methods to speed up convergence, please consult Gilks et al. (1996).

In our experiments, each pixel is allowed to have 5 possible labels, chosen from the closest 5 textons. In other words, the number of candidate labels per pixel is n = 5. For each iteration, we are allowed to change the labels of 5% of the pixels (in step 3). Figure 11 shows typical behavior of the MCMC algorithm. The solid line is the decay of the chi-square distance between the histogram of the state and that of the model when material M is the correct material, while the dashed line is that for a wrong material.

Figure 12. Texture recognition under novel lighting and viewing conditions. The 5 different curves represent 5 randomly chosen novel viewing and lighting directions for the samples to be classified. Each curve is the average performance for 40 materials. The model histogram for each material is obtained using images from 4 different view/light settings. The performance of our algorithm is excellent: 87% detection rate with 13% false alarm.

The recognition performance is shown in the ROC curves in Fig. 12. (These ROC curves are obtained the same way as the one in Section 7.1.) The 5 different
curves correspond to 5 randomly chosen novel viewing and lighting directions for the samples to be classified. The curves show the average performance for 40 materials. The model histogram for each material is obtained using images from 4 different view/light settings. The top-left corner of the plot stands for perfect performance. Given the difficulty of the task, the performance of our algorithm is very good. The algorithm achieves an 87% detection rate with a 13% false alarm rate. One interesting comparison to make will be to contrast the performance of our algorithm with that of a human.

Figure 13. Texture synthesis of training materials used to create the 3D texton vocabulary. The materials are Plaster-a for the first two rows and another material for the last two. First column: texture mapping; middle column: ground truth; last column: synthesized results. Texture mapping produces images that look flat, while our algorithm correctly captures the highlights, shadows, and occlusions.

8. Novel View/Light Prediction

The universal 3D texton vocabulary can also be used to predict the appearance of materials at novel viewing and lighting conditions. This application is of primary interest in computer graphics.
Figure 14. Predicting appearance of novel materials at various lighting and viewing conditions. The materials are Plaster-a for the first two rows and Crumpled Paper for the last two. First column: traditional texture mapping; middle column: ground truth; last column: results using texton vocabulary. Our algorithm correctly captures the highlights, shadows and occlusions while traditional texture mapping produces images that look flat.

Suppose we are given a small number of images of a novel texture taken at different illumination and viewing directions. We compute the filter responses of the images and concatenate them to form a data vector at each pixel (N_fil entries per image). These data vectors can be labeled to one of the K elements in the texton vocabulary by matching to the corresponding sections of the N_fil N_vl dimensional appearance vectors. In other words, each pixel in the input texture is labelled to one of the K 3D textons. Recall that the appearance vectors of the 3D textons encode precisely how each texton changes its appearance when illumination or viewing directions are changed. Therefore, we can predict exactly how the image is transformed under a novel illumination direction or viewing geometry.

Results for novel view/light prediction are shown in Figs. 13 and 14. In these examples, 4 images of the material under different light/view arrangements are given. We then predict the appearance of the material at other lighting and viewing conditions using the texton
vocabulary. The novel lighting and viewing configurations are up to 30° away from the 4 example conditions. The results shown in Fig. 13 are for training materials (those used to compute the texton vocabulary). Figure 14 shows the results for novel materials. The first columns show images obtained using traditional texture mapping from a frontal parallel image; middle columns show the ground truth and the third columns display our results. Because traditional texture mapping assumes the surface is painted and Lambertian, it produces images that look flat. Our method, on the other hand, correctly captures the 3D nature of the surface: highlights, shadows, and occlusions.

9. Discussion

In this paper, we have presented a framework for representing textures made up of both reflectance and surface normal variations. The basic idea is to build a universal texton vocabulary that describes generic local features of texture surfaces. Using the texton vocabulary and an MCMC algorithm, we have demonstrated excellent results for recognizing 3D textures from a single image under any lighting and viewing directions. We also demonstrated how our model can be used to predict the appearance of natural materials under novel illumination conditions and viewing geometry.

The current work can be combined with a texture synthesis algorithm to generate new samples of materials under all viewing and illumination conditions. The algorithm proposed in Efros and Leung (1999) is particularly promising. The basic idea of Efros and Leung (1999) is to synthesize texture by growing pixels. The pixel value to be grown is obtained by sampling from an example texture image. In our 3D texton model, we can grow an array of textons instead of pixel values. Given the array of textons, we can synthesize an image at any illumination and viewing condition. This is done by picking a section of the appearance vectors of the 3D textons and reconstructing an image from them.

Acknowledgment

The authors would like to thank the Berkeley vision group, especially Serge Belongie, Chad Carson, Alyosha Efros, David Forsyth, Jianbo Shi, and Yair Weiss for useful discussions. This research was supported by (ARO) DAAH04-96-1-0341, the Digital Library Grant IRI-9411334, and a Berkeley Fellowship to TL.

Notes

1. More images if the material is anisotropic.
2. We recognize that the SSD error is by no means perceptually correct, but it is a convenient way of comparing two images.
3. Error is large for aluminum, which is very specular.
4. Recognition rate is 95.0% for training materials (those used to create the texton vocabulary) and 96.3% for novel materials. There is no significant difference between the performance for the training materials and that of the novel materials in all our experiments. Therefore, we will report only the overall recognition performance. The main reason for this indifference in performance is that the texton vocabulary attains good generalization, and thus is encoding generic local features, rather than retaining material-specific information.
5. This is equivalent to making 40 2-class decisions. For example, if the threshold on the chi-square distance is too large, we will have 1 detection and 39 false positives. On the other hand, if it is too small, we will have 0 detections and 0 false positives.
6. However, most object recognition algorithms require a large number of training examples.
7. A cooling schedule can definitely be employed here. At the start, more sites are allowed to change to speed up the exploration of the space. When the distribution is close to convergence, fewer sites are allowed to change, so that the distribution can settle.
8. In these results, to achieve best quality and to reduce quantization error, a texton vocabulary of size 2000 is computed.

References

Ball, G. and Hall, D. 1967. A clustering technique for summarizing multi-variate data. Behavioral Science, 12:153.
Belhumeur, P. and Kriegman, D. 1998. What is the set of images of an object under all possible illumination conditions? International Journal of Computer Vision, 28(3):245.
Burt, P. and Adelson, E. 1983. The Laplacian pyramid as a compact image code. IEEE Transactions on Communications, 31(4):532.
Chantler, M. 1994. Towards illuminant invariant texture classification. In Proc. IEE Coll. on Texture Classification: Theory and
Chantler, M. and McGunnigle, G. 1995. Compensation of illuminant tilt variation for texture classification. In Proceedings Fifth International Conference on Image Processing and its Applications, pp. 767.
Chellappa, R. and Chatterjee, S. 1985. Classification of textures using Gaussian Markov random fields. IEEE Transactions on Acoustics, Speech, Signal Processing, 33(4):959.
Cross, G. and Jain, A. 1983. Markov random field texture models. IEEE Transactions on Pattern Analysis and Machine Intelligence
Dana, K. and Nayar, S. 1998. Histogram model for 3D textures. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 618.
Dana, K. and Nayar, S. 1999a. 3D textured surface modelling. In Proceedings Workshop on the Integration of Appearance and Geometric Methods in Object Recognition, pp. 46.
Dana, K. and Nayar, S. 1999b. Correlation model for 3D texture. In Proceedings IEEE 7th International Conference on Computer Vision, Vol. 2. Corfu, Greece, pp. 1061.
Dana, K., van Ginneken, B., Nayar, S., and Koenderink, J. 1999. Reflectance and texture of real-world surfaces. ACM Transactions on Graphics, 18(1):1.
de Bonet, J. and Viola, P. 1998. Texture recognition using a non-parametric multi-scale statistical model. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 641.
Duda, R. and Hart, P. 1973. Pattern Classification and Scene Analysis, John Wiley & Sons: New York, N.Y.
Efros, A. and Leung, T. 1999. Texture synthesis by non-parametric sampling. In Proceedings IEEE 7th International Conference on Computer Vision, Vol. 2. Corfu, Greece, pp. 1033.
Fogel, I. and Sagi, D. 1989. Gabor filters as texture discriminator. Biological Cybernetics, 61:103.
Geman, S. and Geman, D. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721.
Georghiades, A., Kriegman, D., and Belhumeur, P. 1998. Illumination cones for recognition under variable lighting: Faces. In Proceedings IEEE Conference on Computer Vision and Pattern Recognition, Santa Barbara, CA, pp. 52.
Gersho, A. and Gray, R. 1992. Vector Quantization and Signal Compression, Kluwer Academic Publishers: Boston, MA.
Gilks, W., Richardson, S., and Spiegelhalter, D. 1996. Markov Chain Monte Carlo in Practice, Chapman and Hall.
Haddon, J. and Forsyth, D. 1998. Shading primitives: Finding folds and shallow grooves. In Proceedings IEEE 6th International Conference on Computer Vision, Bombay, India, pp. 236.
Heeger, D. and Bergen, J. 1995. Pyramid-based texture analysis/synthesis. In Computer Graphics (SIGGRAPH 95 Proceedings), Los Angeles, CA, pp. 229.
Jain, A. and Farrokhnia, F. 1991. Unsupervised texture segmentation using Gabor filters. Pattern Recognition, 24:
1167Jones,D.andMalik,J.1992.Computationalframeworktodeter-miningstereocorrespondencefromasetoflinearspatialImageandVisionComputing,10(10):699Julesz,B.1981.Textons,theelementsoftextureperception,andtheirNature,290(5802):91Koenderink,J.andvanDoorn,A.1980.Photometricinvariantsre-latedtosolidshape.OpticaActa,27(7):981Koenderink,J.andvanDoorn,A.1996.Illuminancetextureduetosurfacemesostructure.JournaloftheOpticalSocietyAmericaAKoenderink,J.,vanDoorn,A.Dana,K.andNayar,S.1999.Bidirec-tionalreectiondistributionfunctionofthoroughlypittedsurfaces.InternationalJournalofComputerVision,31(2/3):129Leung,T.andMalik,J.1997.Onperpendiculartextureor:Whydoweseemoreowersinthedistance?.InProceedingsIEEEConferenceonComputerVisionandPatternRecognition,SanJuan,PuertoRico,pp.807Leung,T.andMalik,J.1999.Recognizingsurfacesusingthreedi-mensionaltextons.InProc.IEEEInternationalConferenceonComputerVision,Corfu,Greece.MacQueen,J.1967.Somemethodsforclassicationandanalysisofmultivariateobservations.InProc.FifthBerkeleySymposiumonMath.Stat.andProb.,Vol.I.pp.281Malik,J.,Belongie,S.,Shi,J.,andLeung,T.1999.Textons,con-toursandregions:Cueintegrationinimagesegmentation.InPro-ceedingsIEEE7thInternationalConferenceonComputerVisionCorfu,Greece,pp.918Malik,J.andPerona,P.1990.Preattentivetexturediscriminationwithearlyvisionmechanisms.JournaloftheOpticalSocietyofAmericaA,7(5):923Mao,J.andJain,A.1992.Textureclassicationandsegmentationus-ingmultiresolutionsimultaneousautoregressivemodels.PatternRecognition,25(2):173Murase,H.andNayar,S.1995.Visuallearningandrecognitionof3-Dobjectsfromappearance.InternationalJournalonComputerVision,14(1):5Press,W.,Flannery,B.,Teukolsky,S.,andVetterling,W.1988.NumericalRecipesinC,CambridgeUniversityPress.Puzicha,J.,Hofmann,T.,andBuhmann,J.1997.Non-parametricsimilaritymeasuresforunsupervisedtexturesegmentationandimageretrieval.InProceedingsIEEEConferenceonComputerVisionandPatternRecognition,SanJuan,PuertoRico,pp.267Ripley,B.1996.PatternRecognitionandNeuralNetworksCambridgeUniversityP
ress.Rubner,Y.andTomasi,C.1999.Texture-basedimageretrievalwith-outsegmentation.InProceedingsIEEE7thInternationalConfer-enceonComputerVision,Vol.2.Corfu,Greece,pp.1018Sebestyen,G.1962.Patternrecognitionbyanadaptiveprocessofsamplesetconstruction.IRETrans.Info.Theory,8:S82Shashua,A.1997.Onphotometricissuesin3Dvisualrecogni-tionfromasingle2Dimage.InternationalJournalonComputerVision,21(1/2).Sirovitch,L.andKirby,M.1987.Low-dimensionalprocedureforthecharacterizationofhumanfaces.JournaloftheOpticalSocietyofAmericaA,2:519Turk,M.andPentland,A.1991.Eigenfacesforrecognition.JournalofCognitiveNeuroscience,3(1):71Vaidyanathan,P.1993.MultirateSystemsandFilterBanks,Prentice-Hall:EnglewoodCliffs,N.J.vanGinneken,B.,Stavridi,M.,andKoenderink,J.1998.Diffuseandspecularreectancefromroughsurfaces.AppliedOpticsYuan,J.andRao,S.1993.SpectralestimationforrandomwithapplicationstoMarkovmodelingandtextureclassiMarkovRandomFields:TheoryandApplication,R.ChellappaandA.Jain(Eds.).AcademicPress.Zhu,S.,Wu,Y.,andMumford,D.1998.Filters,randomeldsandmaximumentropy(FRAME):TowardsauniedtheoryfortextureInternationalJournalofComputerVision,27(2):107
