
Text Detection and Character Recognition in Scene Images with Unsupervised Feature Learning

Adam Coates, Blake Carpenter, Carl Case, Sanjeev Satheesh, Bipin Suresh, Tao Wang, David J. Wu, Andrew Y. Ng
Computer Science Department, Stanford University, 353 Serra Mall, Stanford, CA 94305, USA
{acoates, blakec, cbcase, ssanjeev, bipins, twangcat, dwu4, ang}@cs.stanford.edu

Abstract—Reading text from photographs is a challenging problem that has received a si[...]










predicated on cleverly engineered systems specific to the new task. For text detection, for instance, solutions have ranged from simple off-the-shelf classifiers trained on hand-coded features [10] to multi-stage pipelines combining many different algorithms [11], [5]. Common features include edge features, texture descriptors, and shape contexts [1]. Meanwhile, various flavors of probabilistic model have also been applied [4], [12], [13], folding many forms of prior knowledge into the detection and recognition system. On the other hand, some systems with highly flexible learning schemes attempt to learn all necessary information from labeled data with minimal prior knowledge. For instance, multi-layered neural network architectures have been applied to character recognition and are competitive with other leading methods [14]. This mirrors the success of such approaches in more traditional document and handwritten text recognition systems [15]. Indeed, the method used in our system is related to convolutional neural networks. The primary difference is that the training method used here is unsupervised, and uses a much more scalable training algorithm that can rapidly train many features.

Feature learning methods in general are currently the focus of much research, particularly applied to computer vision problems. As a result, a wide variety of algorithms are now available to learn features from unlabeled data [16], [17], [18], [19], [20]. Many results obtained with feature learning systems have also shown that higher performance in recognition tasks could be achieved through larger scale representations, such as could be generated by a scalable feature learning system. For instance, Van Gemert et al. [21] showed that performance can grow with larger numbers of low-level features, and Li et al. [22] have provided evidence of a similar phenomenon for high-level features like objects and parts. In this work, we focus on training low-level features, but more sophisticated feature learning methods are capable of learning higher level constructs that might be even more effective [23], [7], [17], [6].

III. LEARNING ARCHITECTURE

We now describe the architecture used to learn the feature representations and train the classifiers used for our detection and character recognition systems. The basic setup is closely related to
a convolutional neural network [15], but due to its training method can be used to rapidly construct extremely large sets of features with minimal tuning. Our system proceeds in several stages:

1) Apply an unsupervised feature learning algorithm to a set of image patches harvested from the training data to learn a bank of image features.
2) Evaluate the features convolutionally over the training images. Reduce the number of features using spatial pooling [15].
3) Train a linear classifier for either text detection or character recognition.

We will now describe each of these stages in more detail.

A. Feature learning

The key component of our system is the application of an unsupervised learning algorithm to generate the features used for classification. Many choices of unsupervised learning algorithm are available for this purpose, such as auto-encoders [19], RBMs [16], and sparse coding [24]. Here, however, we use a variant of K-means clustering that has been shown to yield results comparable to other methods while also being much simpler and faster. Like many feature learning schemes, our system works by applying a common recipe:

1) Collect a set of small image patches, x~(i), from training data. In our case, we use 8x8 grayscale¹ patches, so x~(i) ∈ R^64.
2) Apply simple statistical pre-processing (e.g., whitening) to the input patches to yield a new dataset x(i).
3) Run an unsupervised learning algorithm on the x(i) to build a mapping from input patches to a feature vector, z(i) = f(x(i)).

The particular system we employ is similar to the one presented in [8]. First, given a set of training images, we extract a set of m 8-by-8 pixel patches to yield vectors of pixels x~(i) ∈ R^64, i ∈ {1, ..., m}. Each vector is brightness and contrast normalized.² We then whiten the x~(i) using ZCA³ whitening [25] to yield x(i). Given this whitened bank of input vectors, we are now ready to learn a set of features that can be evaluated on such patches.

For the unsupervised learning stage, we use a variant of K-means clustering. K-means can be modified so that it yields a dictionary D ∈ R^{64×d} of normalized basis vectors. Specifically, instead of learning "centroids" based on Euclidean distance, we learn a set of normalized vectors D(j), j ∈ {1, ..., d} to form the columns of D, using inner products as the similarity metric. That is, we solve

    min_{D, s(i)}  Σ_i ||D s(i) − x(i)||²        (1)
    s.t.  ||s(i)||_1 = ||s(i)||_∞,  ∀i           (2)
          ||D(j)||_2 = 1,  ∀j                    (3)

where x(i) are the input examples and s(i) are the corresponding "one hot" encodings⁴ of the examples. Like K-means, the optimization is done by alternating minimization over D and the s(i). Here, the optimal solution for s(i) given

¹ All of our experiments use grayscale images, though the methods here are equally applicable to color patches.
² We subtract out the mean and divide by the standard deviation of all the pixel values.
³ ZCA whitening is like PCA whitening, except that it rotates the data back to the same axes as the original input.
⁴ The constraint ||s(i)||_1 = ||s(i)||_∞ means that s(i) may have only 1 non-zero value, though its magnitude is unconstrained.
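The preprocessing steps and the modified K-means objective above can be sketched in NumPy. This is a minimal illustration, not the authors' code: the function names, the eigenvalue regularizer `eps`, and the initialization scheme are our assumptions, and the dictionary update (project, then renormalize columns) is one common way to realize the alternating minimization under constraints (2) and (3).

```python
import numpy as np

def preprocess(patches, eps=0.1):
    """Brightness/contrast normalization (footnote 2) followed by ZCA
    whitening (footnote 3). `eps` regularizes small eigenvalues; its
    value is an assumption, not taken from the paper."""
    X = patches.astype(float)
    X = X - X.mean(axis=1, keepdims=True)          # subtract per-patch mean
    X = X / (X.std(axis=1, keepdims=True) + 1e-8)  # divide by per-patch std
    cov = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    # ZCA: PCA whitening rotated back to the original pixel axes.
    W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
    return X @ W

def learn_dictionary(X, d=96, iters=10, seed=0):
    """K-means variant: learn D with unit-norm columns (constraint 3)
    by alternating with "one hot" codes s(i) (constraint 2), using
    inner products as the similarity metric. X holds one patch per row."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    D = rng.standard_normal((p, d))
    D /= np.linalg.norm(D, axis=0)                 # unit-norm columns
    for _ in range(iters):
        # Optimal one-hot code: pick the most similar column; the single
        # non-zero entry takes the (unconstrained) projection value.
        sims = X @ D                               # (n, d) inner products
        assign = np.argmax(np.abs(sims), axis=1)
        S = np.zeros((n, d))
        S[np.arange(n), assign] = sims[np.arange(n), assign]
        # Dictionary update: accumulate assigned patches, renormalize.
        D = X.T @ S
        norms = np.linalg.norm(D, axis=0)
        norms[norms == 0] = 1.0                    # guard empty clusters
        D /= norms
    return D
```

Because each s(i) has a single non-zero entry, the update degenerates to a "gain-shape" form of K-means: each column of D becomes the (normalized) weighted sum of the patches assigned to it, which is what makes the procedure as cheap and scalable as ordinary K-means.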
Table I
TEST RECOGNITION ACCURACY ON ICDAR 2003 CHARACTER SETS. (DATASET-CLASSES)

Algorithm                      | Test-62 | Sample-62 | Sample-36
Neumann and Matas, 2010 [28]   | 67.0%   | -         | -
Yokobayashi et al., 2006 [2]   | -       | 81.4%     | -
Saidane and Garcia, 2007 [14]  | -       | -         | 84.5%
This paper                     | 81.7%   | 81.4%     | 85.5%

from 62 classes (10 digits, 26 upper- and 26 lower-case letters). The average classification accuracy on the ICDAR test set for increasing numbers of features is plotted in Figure 5. Again, we see that accuracy climbs as a function of the number of features. Note that the accuracy for the largest system (1500 features) is the highest, at 81.7% for the 62-way classification problem. This is comparable or superior to other (purpose-built) systems tested on the same problem. For instance, the system in [2] achieves 81.4% on the smaller ICDAR "sample" set, where we, too, achieve 81.4%. The authors of [14], employing a supervised convolutional network, achieve 84.5% on this dataset when it is collapsed to a 36-way problem (removing case sensitivity). In that scenario, our system achieves 85.5% with 1500 features. These results are summarized in comparison to other work in Table I.

V. CONCLUSION

In this paper we have produced a text detection and recognition system based on a scalable feature learning algorithm and applied it to images of text in natural scenes. We demonstrated that with larger banks of features we are able to achieve increasing accuracy, with top performance comparable to other systems, similar to results observed in other areas of computer vision and machine learning. Thus, while much research has focused on developing by hand the models and features used in scene-text applications, our results point out that it may be possible to achieve high performance using a more automated and scalable solution. With more scalable and sophisticated feature learning algorithms currently being developed by machine learning researchers, it is possible that the approaches pursued here might achieve performance well beyond what is possible through other methods that rely heavily on hand-coded prior knowledge.

ACKNOWLEDGMENT

Adam Coates is supported by a Stanford Graduate Fellowship.

REFERENCES

[1]
T. E. de Campos, B. R. Babu, and M. Varma, "Character recognition in natural images," in Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, February 2009.

[2] M. Yokobayashi and T. Wakahara, "Binarization and recognition of degraded characters using a maximum separability axis in color space and GAT correlation," in International Conference on Pattern Recognition, vol. 2, 2006, pp. 885–888.

[3] J. J. Weinman, "Typographical features for scene text recognition," in Proc. IAPR International Conference on Pattern Recognition, Aug. 2010, pp. 3987–3990.

[4] J. Weinman, E. Learned-Miller, and A. R. Hanson, "Scene text recognition using similarity and a lexicon with sparse belief propagation," in Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 10, 2009.

[5] Y. Pan, X. Hou, and C. Liu, "Text localization in natural scene images based on conditional random field," in International Conference on Document Analysis and Recognition, 2009.

[6] J. Yang, K. Yu, Y. Gong, and T. S. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Computer Vision and Pattern Recognition, 2009.

[7] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in International Conference on Machine Learning, 2009.

[8] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in International Conference on Artificial Intelligence and Statistics, 2011.

[9] M. Ranzato, Y. Boureau, and Y. LeCun, "Sparse feature learning for deep belief networks," in Neural Information Processing Systems, 2007.

[10] X. Chen and A. Yuille, "Detecting and reading text in natural scenes," in Computer Vision and Pattern Recognition, vol. 2, 2004.

[11] Y. Pan, X. Hou, and C. Liu, "A robust system to detect and localize texts in natural scene images," in International Workshop on Document Analysis Systems, 2008.

[12] J. J. Weinman, E. Learned-Miller, and A. R. Hanson, "A discriminative semi-markov model for robust scene text recognition," in Proc. IAPR International Conference on Pattern Recognition, Dec. 2008.
[13] X. Fan and G. Fan, "Graphical models for joint segmentation and recognition of license plate characters," IEEE Signal Processing Letters, vol. 16, no. 1, 2009.

[14] Z. Saidane and C. Garcia, "Automatic scene text recognition using a convolutional neural network," in Workshop on Camera-Based Document Analysis and Recognition, 2007.

[15] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, pp. 541–551, 1989.

[16] G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.