Text Detection and Character Recognition…
…Wu, Andrew Y. Ng
Computer Science Department, Stanford University, 353 Serra Mall, Stanford, CA 94305, USA

Abstract — Reading text from photographs is a challenging problem that has received a si
predicated on cleverly engineered systems specific to the new task. For text detection, for instance, solutions have ranged from simple off-the-shelf classifiers trained on hand-coded features [10] to multi-stage pipelines combining many different algorithms [11], [5]. Common features include edge features, texture descriptors, and shape contexts [1]. Meanwhile, various flavors of probabilistic model have also been applied [4], [12], [13], folding many forms of prior knowledge into the detection and recognition system.

On the other hand, some systems with highly flexible learning schemes attempt to learn all necessary information from labeled data with minimal prior knowledge. For instance, multi-layered neural network architectures have been applied to character recognition and are competitive with other leading methods [14]. This mirrors the success of such approaches in more traditional document and hand-written text recognition systems [15]. Indeed, the method used in our system is related to convolutional neural networks. The primary difference is that the training method used here is unsupervised, and uses a much more scalable training algorithm that can rapidly train many features.

Feature learning methods in general are currently the focus of much research, particularly applied to computer vision problems. As a result, a wide variety of algorithms are now available to learn features from unlabeled data [16], [17], [18], [19], [20]. Many results obtained with feature learning systems have also shown that higher performance in recognition tasks could be achieved through larger scale representations, such as could be generated by a scalable feature learning system. For instance, Van Gemert et al. [21] showed that performance can grow with larger numbers of low-level features, and Li et al. [22] have provided evidence of a similar phenomenon for high-level features like objects and parts. In this work, we focus on training low-level features, but more sophisticated feature learning methods are capable of learning higher level constructs that might be even more effective [23], [7], [17], [6].

III. LEARNING ARCHITECTURE

We now describe the architecture used to learn the feature representations and train the classifiers used for our detection and character recognition systems. The basic setup is closely related to
a convolutional neural network [15], but due to its training method can be used to rapidly construct extremely large sets of features with minimal tuning. Our system proceeds in several stages:

1) Apply an unsupervised feature learning algorithm to a set of image patches harvested from the training data to learn a bank of image features.
2) Evaluate the features convolutionally over the training images. Reduce the number of features using spatial pooling [15].
3) Train a linear classifier for either text detection or character recognition.

We will now describe each of these stages in more detail.

A. Feature learning

The key component of our system is the application of an unsupervised learning algorithm to generate the features used for classification. Many choices of unsupervised learning algorithm are available for this purpose, such as auto-encoders [19], RBMs [16], and sparse coding [24]. Here, however, we use a variant of K-means clustering that has been shown to yield results comparable to other methods while also being much simpler and faster. Like many feature learning schemes, our system works by applying a common recipe:

1) Collect a set of small image patches, x̃(i), from training data. In our case, we use 8x8 grayscale¹ patches, so x̃(i) ∈ R^64.
2) Apply simple statistical pre-processing (e.g., whitening) to the patches of the input to yield a new dataset x(i).
3) Run an unsupervised learning algorithm on the x(i) to build a mapping from input patches to a feature vector, z(i) = f(x(i)).

The particular system we employ is similar to the one presented in [8]. First, given a set of training images, we extract a set of m 8-by-8 pixel patches to yield vectors of pixels x̃(i) ∈ R^64, i ∈ {1, …, m}. Each vector is brightness and contrast normalized.² We then whiten the x̃(i) using ZCA³ whitening [25] to yield x(i). Given this whitened bank of input vectors, we are now ready to learn a set of features that can be evaluated on such patches.

For the unsupervised learning stage, we use a variant of K-means clustering. K-means can be modified so that it yields a dictionary D ∈ R^(64×d) of normalized basis vectors. Specifically, instead of learning centroids based on Euclidean distance, we learn a set of normalized vectors D(j), j ∈ {1, …, d}, to form the columns of D, using inner products as the similarity metric. That is, we solve

    min_{D,s} Σ_i ‖D s(i) − x(i)‖²        (1)
    s.t. ‖s(i)‖₁ = ‖s(i)‖_∞, ∀i          (2)
         ‖D(j)‖₂ = 1, ∀j                 (3)

where the x(i) are the input examples and the s(i) are the corresponding one-hot encodings⁴ of the examples. Like K-means, the optimization is done by alternating minimization over D and the s(i). Here, the optimal solution for s(i) given

¹ All of our experiments use grayscale images, though the methods here are equally applicable to color patches.
² We subtract out the mean and divide by the standard deviation of all the pixel values.
³ ZCA whitening is like PCA whitening, except that it rotates the data back to the same axes as the original input.
⁴ The constraint ‖s(i)‖₁ = ‖s(i)‖_∞ means that s(i) may have only 1 non-zero value, though its magnitude is unconstrained.
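The feature-learning stage above (per-patch normalization, ZCA whitening, and the one-hot K-means variant of Eqs. 1-3) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the epsilon regularizers and the damped dictionary update (adding the old D so unused columns stay alive) are assumed implementation choices.

```python
import numpy as np

def normalize_patches(X, eps=10.0):
    # Brightness/contrast normalization: subtract each patch's mean pixel
    # value and divide by its standard deviation (eps avoids divide-by-zero).
    X = X - X.mean(axis=1, keepdims=True)
    return X / np.sqrt(X.var(axis=1, keepdims=True) + eps)

def zca_whiten(X, eps=0.1):
    # ZCA whitening: PCA-whiten the data, then rotate back to the original
    # axes, so whitened patches still look like image patches.
    C = np.cov(X, rowvar=False)
    U, S, _ = np.linalg.svd(C)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return X @ W, W

def spherical_kmeans(X, d=64, iters=10, rng=None):
    # Learn a dictionary D (64 x d) of unit-norm columns by alternating
    # minimization of ||D s(i) - x(i)||^2 where each code s(i) has a single
    # non-zero entry (the ||s||_1 = ||s||_inf constraint), whose magnitude
    # is the inner product with the best-matching column.
    rng = np.random.default_rng(rng)
    D = rng.standard_normal((X.shape[1], d))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(iters):
        # Code step: pick, per patch, the column with the largest
        # absolute inner product; all other code entries stay zero.
        P = X @ D                        # m x d inner products
        idx = np.abs(P).argmax(axis=1)   # best column per patch
        S = np.zeros_like(P)
        rows = np.arange(len(X))
        S[rows, idx] = P[rows, idx]
        # Dictionary step: least-squares-style update, then renormalize
        # columns; "+ D" (an assumed damping trick) keeps empty clusters
        # from collapsing to zero norm.
        D = X.T @ S + D
        D /= np.linalg.norm(D, axis=0, keepdims=True)
    return D
```

Typical usage is to sample patches from the training images, normalize and whiten them, and then train the dictionary, e.g. `D = spherical_kmeans(zca_whiten(normalize_patches(patches))[0], d=1500)`.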
TABLE I
TEST RECOGNITION ACCURACY ON ICDAR 2003 CHARACTER SETS. (DATASET-CLASSES)

Algorithm                      | Test-62 | Sample-62 | Sample-36
Neumann and Matas, 2010 [28]   | 67.0%   | -         | -
Yokobayashi et al., 2006 [2]   | -       | 81.4%     | -
Saidane and Garcia, 2007 [14]  | -       | -         | 84.5%
This paper                     | 81.7%   | 81.4%     | 85.5%

from 62 classes (10 digits, 26 upper- and 26 lower-case letters). The average classification accuracy on the ICDAR test set for increasing numbers of features is plotted in Figure 5. Again, we see that accuracy climbs as a function of the number of features. Note that the accuracy for the largest system (1500 features) is the highest, at 81.7% for the 62-way classification problem. This is comparable or superior to other (purpose-built) systems tested on the same problem. For instance, the system in [2] achieves 81.4% on the smaller ICDAR sample set, where we, too, achieve 81.4%. The authors of [14], employing a supervised convolutional network, achieve 84.5% on this dataset when it is collapsed to a 36-way problem (removing case sensitivity). In that scenario, our system achieves 85.5% with 1500 features. These results are summarized in comparison to other work in Table I.

V. CONCLUSION

In this paper we have produced a text detection and recognition system based on a scalable feature learning algorithm and applied it to images of text in natural scenes. We demonstrated that with larger banks of features we are able to achieve increasing accuracy, with top performance comparable to other systems, similar to results observed in other areas of computer vision and machine learning. Thus, while much research has focused on developing by hand the models and features used in scene-text applications, our results point out that it may be possible to achieve high performance using a more automated and scalable solution. With more scalable and sophisticated feature learning algorithms currently being developed by machine learning researchers, it is possible that the approaches pursued here might achieve performance well beyond what is possible through other methods that rely heavily on hand-coded prior knowledge.

ACKNOWLEDGMENT

Adam Coates is supported by a Stanford Graduate Fellowship.

REFERENCES

[1]
T. E. de Campos, B. R. Babu, and M. Varma, "Character recognition in natural images," in Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, February 2009.
[2] M. Yokobayashi and T. Wakahara, "Binarization and recognition of degraded characters using a maximum separability axis in color space and GAT correlation," in International Conference on Pattern Recognition, vol. 2, 2006, pp. 885-888.
[3] J. J. Weinman, "Typographical features for scene text recognition," in Proc. IAPR International Conference on Pattern Recognition, Aug. 2010, pp. 3987-3990.
[4] J. Weinman, E. Learned-Miller, and A. R. Hanson, "Scene text recognition using similarity and a lexicon with sparse belief propagation," in Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 10, 2009.
[5] Y. Pan, X. Hou, and C. Liu, "Text localization in natural scene images based on conditional random field," in International Conference on Document Analysis and Recognition, 2009.
[6] J. Yang, K. Yu, Y. Gong, and T. S. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Computer Vision and Pattern Recognition, 2009.
[7] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in International Conference on Machine Learning, 2009.
[8] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in International Conference on Artificial Intelligence and Statistics, 2011.
[9] M. Ranzato, Y. Boureau, and Y. LeCun, "Sparse feature learning for deep belief networks," in Neural Information Processing Systems, 2007.
[10] X. Chen and A. Yuille, "Detecting and reading text in natural scenes," in Computer Vision and Pattern Recognition, vol. 2, 2004.
[11] Y. Pan, X. Hou, and C. Liu, "A robust system to detect and localize texts in natural scene images," in International Workshop on Document Analysis Systems, 2008.
[12] J. J. Weinman, E. Learned-Miller, and A. R. Hanson, "A discriminative semi-Markov model for robust scene text recognition," in Proc. IAPR International Conference on Pattern Recognition, Dec. 2008.
[13] X. Fan and G. Fan, "Graphical models for joint segmentation and recognition of license plate characters," IEEE Signal Processing Letters, vol. 16, no. 1, 2009.
[14] Z. Saidane and C. Garcia, "Automatic scene text recognition using a convolutional neural network," in Workshop on Camera-Based Document Analysis and Recognition, 2007.
[15] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, pp. 541-551, 1989.
[16] G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.