Text Detection and Character Recognition…
…Wu, Andrew Y. Ng
Computer Science Department, Stanford University, 353 Serra Mall, Stanford, CA 94305, USA

Abstract — Reading text from photographs is a challenging problem that has received a si
predicated on cleverly engineered systems specific to the new task. For text detection, for instance, solutions have ranged from simple off-the-shelf classifiers trained on hand-coded features [10] to multi-stage pipelines combining many different algorithms [11], [5]. Common features include edge features, texture descriptors, and shape contexts [1]. Meanwhile, various flavors of probabilistic model have also been applied [4], [12], [13], folding many forms of prior knowledge into the detection and recognition system.

On the other hand, some systems with highly flexible learning schemes attempt to learn all necessary information from labeled data with minimal prior knowledge. For instance, multi-layered neural network architectures have been applied to character recognition and are competitive with other leading methods [14]. This mirrors the success of such approaches in more traditional document and hand-written text recognition systems [15]. Indeed, the method used in our system is related to convolutional neural networks. The primary difference is that the training method used here is unsupervised, and uses a much more scalable training algorithm that can rapidly train many features.

Feature learning methods in general are currently the focus of much research, particularly applied to computer vision problems. As a result, a wide variety of algorithms are now available to learn features from unlabeled data [16], [17], [18], [19], [20]. Many results obtained with feature learning systems have also shown that higher performance in recognition tasks could be achieved through larger scale representations, such as could be generated by a scalable feature learning system. For instance, Van Gemert et al. [21] showed that performance can grow with larger numbers of low-level features, and Li et al. [22] have provided evidence of a similar phenomenon for high-level features like objects and parts. In this work, we focus on training low-level features, but more sophisticated feature learning methods are capable of learning higher level constructs that might be even more effective [23], [7], [17], [6].

III. LEARNING ARCHITECTURE

We now describe the architecture used to learn the feature representations and train the classifiers used for our detection and character recognition systems. The basic setup is closely related to
a convolutional neural network [15], but due to its training method can be used to rapidly construct extremely large sets of features with minimal tuning. Our system proceeds in several stages:

1) Apply an unsupervised feature learning algorithm to a set of image patches harvested from the training data to learn a bank of image features.
2) Evaluate the features convolutionally over the training images. Reduce the number of features using spatial pooling [15].
3) Train a linear classifier for either text detection or character recognition.

We will now describe each of these stages in more detail.

A. Feature learning

The key component of our system is the application of an unsupervised learning algorithm to generate the features used for classification. Many choices of unsupervised learning algorithm are available for this purpose, such as auto-encoders [19], RBMs [16], and sparse coding [24]. Here, however, we use a variant of K-means clustering that has been shown to yield results comparable to other methods while also being much simpler and faster. Like many feature learning schemes, our system works by applying a common recipe:

1) Collect a set of small image patches, x̃(i), from training data. In our case, we use 8x8 grayscale¹ patches, so x̃(i) ∈ R^64.
2) Apply simple statistical pre-processing (e.g., whitening) to the patches of the input to yield a new dataset x(i).
3) Run an unsupervised learning algorithm on the x(i) to build a mapping from input patches to a feature vector, z(i) = f(x(i)).

The particular system we employ is similar to the one presented in [8]. First, given a set of training images, we extract a set of m 8-by-8 pixel patches to yield vectors of pixels x̃(i) ∈ R^64, i ∈ {1, …, m}. Each vector is brightness and contrast normalized.² We then whiten the x̃(i) using ZCA³ whitening [25] to yield x(i). Given this whitened bank of input vectors, we are now ready to learn a set of features that can be evaluated on such patches.

For the unsupervised learning stage, we use a variant of K-means clustering. K-means can be modified so that it yields a dictionary D ∈ R^(64×d) of normalized basis vectors. Specifically, instead of learning centroids based on Euclidean distance, we learn a set of normalized vectors D(j), j ∈ {1, …, d}, to form the columns of D, using inner products as the similarity metric. That is, we solve

    min_{D,s} Σ_i ‖D s(i) − x(i)‖²        (1)
    s.t. ‖s(i)‖₁ = ‖s(i)‖_∞, ∀i          (2)
         ‖D(j)‖₂ = 1, ∀j                 (3)

where the x(i) are the input examples and the s(i) are the corresponding one-hot encodings⁴ of the examples. Like K-means, the optimization is done by alternating minimization over D and the s(i). Here, the optimal solution for s(i) given

¹ All of our experiments use grayscale images, though the methods here are equally applicable to color patches.
² We subtract out the mean and divide by the standard deviation of all the pixel values.
³ ZCA whitening is like PCA whitening, except that it rotates the data back to the same axes as the original input.
⁴ The constraint ‖s(i)‖₁ = ‖s(i)‖_∞ means that s(i) may have only 1 non-zero value, though its magnitude is unconstrained.
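The feature-learning stage above (per-patch normalization, ZCA whitening, and the one-hot K-means variant of Eqs. 1-3) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' code: the epsilon regularizers and the damped dictionary update (adding the old D so unused columns stay alive) are assumed implementation choices.

```python
import numpy as np

def normalize_patches(X, eps=10.0):
    # Brightness/contrast normalization: subtract each patch's mean pixel
    # value and divide by its standard deviation (eps avoids divide-by-zero).
    X = X - X.mean(axis=1, keepdims=True)
    return X / np.sqrt(X.var(axis=1, keepdims=True) + eps)

def zca_whiten(X, eps=0.1):
    # ZCA whitening: PCA-whiten the data, then rotate back to the original
    # axes, so whitened patches still look like image patches.
    C = np.cov(X, rowvar=False)
    U, S, _ = np.linalg.svd(C)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return X @ W, W

def spherical_kmeans(X, d=64, iters=10, rng=None):
    # Learn a dictionary D (64 x d) of unit-norm columns by alternating
    # minimization of ||D s(i) - x(i)||^2 where each code s(i) has a single
    # non-zero entry (the ||s||_1 = ||s||_inf constraint), whose magnitude
    # is the inner product with the best-matching column.
    rng = np.random.default_rng(rng)
    D = rng.standard_normal((X.shape[1], d))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(iters):
        # Code step: pick, per patch, the column with the largest
        # absolute inner product; all other code entries stay zero.
        P = X @ D                        # m x d inner products
        idx = np.abs(P).argmax(axis=1)   # best column per patch
        S = np.zeros_like(P)
        rows = np.arange(len(X))
        S[rows, idx] = P[rows, idx]
        # Dictionary step: least-squares-style update, then renormalize
        # columns; "+ D" (an assumed damping trick) keeps empty clusters
        # from collapsing to zero norm.
        D = X.T @ S + D
        D /= np.linalg.norm(D, axis=0, keepdims=True)
    return D
```

Typical usage is to sample patches from the training images, normalize and whiten them, and then train the dictionary, e.g. `D = spherical_kmeans(zca_whiten(normalize_patches(patches))[0], d=1500)`.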
TABLE I
TEST RECOGNITION ACCURACY ON ICDAR 2003 CHARACTER SETS. (DATASET-CLASSES)

Algorithm                      | Test-62 | Sample-62 | Sample-36
Neumann and Matas, 2010 [28]   | 67.0%   | -         | -
Yokobayashi et al., 2006 [2]   | -       | 81.4%     | -
Saidane and Garcia, 2007 [14]  | -       | -         | 84.5%
This paper                     | 81.7%   | 81.4%     | 85.5%

from 62 classes (10 digits, 26 upper- and 26 lower-case letters). The average classification accuracy on the ICDAR test set for increasing numbers of features is plotted in Figure 5. Again, we see that accuracy climbs as a function of the number of features. Note that the accuracy for the largest system (1500 features) is the highest, at 81.7% for the 62-way classification problem. This is comparable or superior to other (purpose-built) systems tested on the same problem. For instance, the system in [2] achieves 81.4% on the smaller ICDAR sample set, where we, too, achieve 81.4%. The authors of [14], employing a supervised convolutional network, achieve 84.5% on this dataset when it is collapsed to a 36-way problem (removing case sensitivity). In that scenario, our system achieves 85.5% with 1500 features. These results are summarized in comparison to other work in Table I.

V. CONCLUSION

In this paper we have produced a text detection and recognition system based on a scalable feature learning algorithm and applied it to images of text in natural scenes. We demonstrated that with larger banks of features we are able to achieve increasing accuracy, with top performance comparable to other systems, similar to results observed in other areas of computer vision and machine learning. Thus, while much research has focused on developing by hand the models and features used in scene-text applications, our results point out that it may be possible to achieve high performance using a more automated and scalable solution. With more scalable and sophisticated feature learning algorithms currently being developed by machine learning researchers, it is possible that the approaches pursued here might achieve performance well beyond what is possible through other methods that rely heavily on hand-coded prior knowledge.

ACKNOWLEDGMENT

Adam Coates is supported by a Stanford Graduate Fellowship.

REFERENCES

[1]
T. E. de Campos, B. R. Babu, and M. Varma, "Character recognition in natural images," in Proceedings of the International Conference on Computer Vision Theory and Applications, Lisbon, Portugal, February 2009.
[2] M. Yokobayashi and T. Wakahara, "Binarization and recognition of degraded characters using a maximum separability axis in color space and GAT correlation," in International Conference on Pattern Recognition, vol. 2, 2006, pp. 885-888.
[3] J. J. Weinman, "Typographical features for scene text recognition," in Proc. IAPR International Conference on Pattern Recognition, Aug. 2010, pp. 3987-3990.
[4] J. Weinman, E. Learned-Miller, and A. R. Hanson, "Scene text recognition using similarity and a lexicon with sparse belief propagation," in Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 10, 2009.
[5] Y. Pan, X. Hou, and C. Liu, "Text localization in natural scene images based on conditional random field," in International Conference on Document Analysis and Recognition, 2009.
[6] J. Yang, K. Yu, Y. Gong, and T. S. Huang, "Linear spatial pyramid matching using sparse coding for image classification," in Computer Vision and Pattern Recognition, 2009.
[7] H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in International Conference on Machine Learning, 2009.
[8] A. Coates, H. Lee, and A. Y. Ng, "An analysis of single-layer networks in unsupervised feature learning," in International Conference on Artificial Intelligence and Statistics, 2011.
[9] M. Ranzato, Y. Boureau, and Y. LeCun, "Sparse feature learning for deep belief networks," in Neural Information Processing Systems, 2007.
[10] X. Chen and A. Yuille, "Detecting and reading text in natural scenes," in Computer Vision and Pattern Recognition, vol. 2, 2004.
[11] Y. Pan, X. Hou, and C. Liu, "A robust system to detect and localize texts in natural scene images," in International Workshop on Document Analysis Systems, 2008.
[12] J. J. Weinman, E. Learned-Miller, and A. R. Hanson, "A discriminative semi-Markov model for robust scene text recognition," in Proc. IAPR International Conference on Pattern Recognition, Dec. 2008.
[13] X. Fan and G. Fan, "Graphical models for joint segmentation and recognition of license plate characters," IEEE Signal Processing Letters, vol. 16, no. 1, 2009.
[14] Z. Saidane and C. Garcia, "Automatic scene text recognition using a convolutional neural network," in Workshop on Camera-Based Document Analysis and Recognition, 2007.
[15] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, pp. 541-551, 1989.
[16] G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.