
Fast and Accurate Digit Classification

Subhransu Maji and Jitendra Malik
Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2009-159
http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-159.html

November 25, 2009

Copyright © 2009, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Abstract

We explore the use of certain image features, blockwise histograms of local orientations, used in many current object recognition algorithms, for the task of handwritten digit recognition. Existing approaches find that polynomial kernel SVMs trained on raw pixels achieve state of the art performance. However, such kernel SVM approaches are impractical as they have a huge complexity at runtime. We demonstrate that with improved features a low complexity classifier, in particular an additive-kernel SVM, can achieve state of the art performance. Our approach achieves an error of 0.79% on the MNIST dataset and 3.4% error on the USPS dataset, while running at speeds comparable to the fastest algorithms on these datasets, which are based on multilayer neural networks, and while being significantly faster and easier to train.

1  Introduction

Handwritten digit recognition has been a fertile ground for exploring several learning techniques, ranging from automatically learning feature representations [22, 14], learning classifiers invariant to distortions [7], and matching and alignment based distances [2], to learning multilayered representations of data [16], ever since datasets like MNIST (http://yann.lecun.com/exdb/mnist/) and USPS [13] were introduced. Several techniques that work well on these datasets rely on automatically selected features or intermediate representations learned from the data represented by raw pixel values. The computer vision community, on the other hand, has over several years constructed representations that are robust to image changes like contrast and small distortions in rotation, translation and scale. These features often consist of computing image gradients and constructing local histograms of these over orientations. In this paper we show that these features can be adapted for digit recognition. By a careful choice of the feature parameters one can obtain competitive results on the MNIST and the USPS datasets. Robust features mean that one can obtain better accuracies for a given number of training examples even with simple classifiers like a linear or additive kernel SVM. Our proposed pipeline is quite straightforward to implement and takes relatively little time to both train and test such models. We believe that the simplicity and the competitiveness of this approach make it ideal for computer vision researchers as a baseline implementation for a variety of tasks.
2  Previous Work

Handwritten digit recognition has received considerable attention from the machine learning community. Several approaches achieve competitive performance for this task, including those based on multilayer neural networks, support vector machines and nearest neighbor methods. LeCun et al. [16] contains an excellent survey of various approaches on these datasets.

One can look at the approaches along the axes of feature rich and learning rich. Feature rich approaches try to compute a good distance between examples and often use nearest neighbors for classification. A simple L2 distance between raw pixels [24] is at the lower end of this axis. Tangent distance, which computes an approximate manifold distance based on a local linear approximation of the manifold, falls in the middle [23], while distances based on alignment of local features, for example the approach of [2], which used shape-context features and thin plate splines, are among the most complex distances between examples. Learning rich approaches rely on the learning machinery with a relatively simple feature representation, and a significant fraction of the literature on digit classification is devoted to this. Many of these methods start from raw pixels and learn classifiers either by learning intermediate representations using neural networks [16, 20, 22] or by projecting onto a higher dimension implicitly and finding a separating hyperplane using kernel SVMs [4, 7]. These approaches work quite well on this dataset. However, there are very few approaches which are both feature rich and learning rich. An example is the work of Hao Zhang [25], who proposed the SVM-kNN method for learning a local SVM model based on a kernel computed from pairwise shape context distances, which is both feature and learning rich. However, they do not present results on the full dataset, which makes it difficult to compare with their results. Our work is in this spirit: we use good feature representations and combine them with a discriminative learning framework, in particular support vector machines.

Another set of useful dimensions along which to compare algorithms are training and test time. Algorithms like k-NN have zero training time but are expensive at runtime (at least in the naive version). Neural networks have the opposite problem, requiring huge amounts of training data and time to learn good models, but their feedforward nature makes them extremely efficient at runtime. Somewhere in the middle are support vector machines, which use the latest developments in convex optimization theory to train classifiers, while at test time their complexity is only a fraction of a brute force k-NN model (for a nonlinear kernel SVM) as the number of support vectors tends to be a small fraction of the training data. Linear SVMs are extremely efficient [9] for both training and testing, but they often require carefully designed features for competitive performance. We use the best of both worlds, combining fast training algorithms from the SVM literature with good features from the vision literature.

We begin with experiments on the MNIST dataset in Section 3. We start with features based on raw pixels in Section 3.1 and reproduce the results reported in the literature. We then show that switching to gradient based features improves the performance of the system significantly in Section 3.2. For completeness we also report results on the USPS dataset in Section 4. We summarize our results and present our conclusions in Section 5.

3  MNIST Dataset Experiments

Our first experiments are on the MNIST dataset introduced by Yann LeCun and Corinna Cortes. The dataset contains 60,000 examples of digits 0-9 for training and 10,000 examples for testing. Our features are based on spatial pyramids over responses in various channels computed from the image. The idea is to construct features by adding the responses within blocks of increasing sizes. The channel could be the raw pixel values or the response in a particular orientation direction. Variants of these features, like the pyramid match kernel [12], the spatial pyramid match kernel [15] and histograms of oriented gradients [6], have led to impressive results on the Caltech-101 [10] and Pascal VOC [1] datasets as well as for pedestrian detection. We experiment with several choices of the feature, like the orientation filter type and scale, pooling choices for computing blocked histograms, and ways to sample blocks on the images. In general the performance using orientation channels is significantly better than raw pixels on typical images, but we present experiments using raw pixels for comparison as much of the previously published work uses raw pixels as features.
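To make the block-pooling construction concrete, here is a minimal NumPy sketch (not code from the report; the function name, cell sizes and overlap rule are illustrative placeholders) that sums a single channel over overlapping square blocks of several sizes and concatenates the sums into one feature vector.

```python
import numpy as np

def pooled_blocks(channel, cell_sizes=(14, 7, 4)):
    """Sum a 2-D channel (e.g. pixel intensities or one orientation
    response) over square blocks of several sizes, with blocks
    overlapping by half the cell size, and concatenate the sums."""
    h, w = channel.shape
    features = []
    for c in cell_sizes:
        step = max(c // 2, 1)               # half-cell overlap
        for y in range(0, h - c + 1, step):
            for x in range(0, w - c + 1, step):
                features.append(channel[y:y + c, x:x + c].sum())
    return np.asarray(features)

# Toy usage on a random 28x28 "image"; a real pipeline would feed in
# raw pixel values or per-orientation gradient energy channels.
feat = pooled_blocks(np.random.rand(28, 28))
print(feat.shape)
```

The same routine applies whether the channel holds raw pixel values (Section 3.1) or the gradient energy in one orientation (Section 3.2).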
3.1  Raw Pixel Features

The input image is a 28x28 grayscale image with each pixel value in {0, ..., 255}. Various kinds of preprocessing have been proposed in the literature to improve accuracy. Two of the most popular are:

Deskewing - the image is aligned so that the principal component is along the Y-axis. This operation vertically aligns all the 1s in the dataset.

Ink normalization - the image is normalized so that the L2-norm of the pixel values is 1.

In our experiments we perform ink normalization, as we found that it improves performance, particularly with a linear kernel, while we do not perform the deskewing operation.

The simplest feature is to use the 28x28 pixels as a feature vector and train a classifier based on a kernel SVM. We train one-vs-all classifiers using LIBSVM [5], one for each digit. A test example is assigned the class with the highest posterior probability, which is estimated from the margin of the test example. Table 1 shows the performance of various kernels using raw pixels, trained on the first 1,000 training examples and tested on all 10,000 test examples. We report error rates using the full training set of 60,000 only for the best performing kernels, as it takes quite a while to train SVMs using nonlinear kernels. Whenever available we quote numbers from the literature. Both the linear and the intersection kernel SVM perform significantly worse than a degree 5 polynomial kernel or an rbf kernel SVM. The kernels are defined as follows:

    k_lin(x, y)  = x . y                          (1)
    k_int(x, y)  = sum_i min(x_i, y_i)            (2)
    k_poly(x, y) = (x . y + 1)^5                  (3)
    k_rbf(x, y)  = exp(-γ ||x - y||^2)            (4)

    Complexity   Kernel         Raw pixels   Pyramid of raw pixels
    O(1)         linear         15.38        14.84
    O(1)         intersection   13.29         9.02
    O(#SV)       poly            7.41         7.71
    O(#SV)       rbf             8.10         6.57

    Table 1: Error rates (%) on the MNIST dataset using raw pixels and a pyramid of raw pixels. Only the first 1,000 examples were used for training.

We also investigated hierarchical features where the image is overlaid with a grid of a given cell size and the pixels within each cell are added up. This is the same as downsampling the image and using the raw pixels of the downsampled image as features. In our experiments we choose grid sizes of 1, 2, 3, 4, 5, 6, 7 and 14. A grid size of 1 corresponds to the original image, and 14 to the coarsest image, where the original 28x28 image is downsampled to 2x2 pixels. We found that overlapping grids offset by half the cell size improve the performance over nonoverlapping grids. All the features are assigned equal weights a priori. Table 1 also shows the accuracy of a linear and an intersection kernel SVM trained on these features. The linear SVM does not gain any power from the hierarchy, as the higher levels of the hierarchy are just linear combinations (sums) of the features in the lower levels. This is reflected in a very small increase in accuracy over the baseline linear SVM on raw pixels, while the feature dimension increases more than 4x. The intersection kernel SVM, being nonlinear, does exhibit improved accuracy, but is still worse than a polynomial or rbf kernel SVM.
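The kernels in equations (1)-(4) are straightforward to reproduce. The sketch below is a minimal illustration (not the authors' code; the bandwidth γ, the polynomial degree and the toy data are arbitrary choices for demonstration) of the kernel functions applied to ink-normalized feature vectors, for example to form a precomputed kernel matrix for LIBSVM-style one-vs-all training.

```python
import numpy as np

def ink_normalize(x):
    """Scale a feature vector to unit L2 norm (the ink normalization above)."""
    return x / (np.linalg.norm(x) + 1e-12)

def k_linear(x, y):       return np.dot(x, y)                        # eq. (1)
def k_intersection(x, y): return np.minimum(x, y).sum()              # eq. (2)
def k_poly(x, y, d=5):    return (np.dot(x, y) + 1.0) ** d           # eq. (3)
def k_rbf(x, y, gamma=0.02):                                         # eq. (4)
    return np.exp(-gamma * np.sum((x - y) ** 2))   # gamma chosen arbitrarily here

# Example: a precomputed intersection-kernel matrix for a tiny set of digits.
X = np.array([ink_normalize(v) for v in np.random.rand(5, 784)])
K = np.array([[k_intersection(a, b) for b in X] for a in X])
print(K.shape)   # (5, 5); one such matrix per one-vs-all classifier
```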
3.2  Gradient Histogram Features

We experiment with features constructed using histograms of oriented gradients, which have become popular in the vision literature for representing objects [2, 6, 11, 15, 17] and scenes [21]. Each pixel in the image is assigned an orientation and magnitude based on the local gradient, and histograms are constructed by aggregating the pixel responses within cells of various sizes. We construct histograms with cell sizes 14x14, 7x7 and 4x4 with an overlap of half the cell size. The histograms at each level are multiplied by weights 1, 2 and 4 and concatenated together to form a single histogram, which is then used to train kernel SVMs. This is very similar to spatial pyramid matching [15] when used with the intersection kernel (we differ in the overlapping grids). The various choices for the descriptor are as follows (a small sketch of the computation follows the list):

1. Oriented derivative filter. The input grayscale image is convolved with filters which respond to horizontal and vertical gradients, from which the magnitude and orientation are computed. Let r_h(p) and r_v(p) be the responses in the horizontal and vertical directions at a pixel p; then the magnitude m(p) and the angle a(p) of the pixel are given by:

    m(p) = sqrt( r_h(p)^2 + r_v(p)^2 )               (5)
    a(p) = atan2( r_h(p), r_v(p) ) in [0, 360)       (6)

We experiment with tap filters, Sobel filters and oriented Gaussian derivative (OGF) filters.

2. Signed vs. unsigned. The orientation can be signed (0-360 degrees) or unsigned (0-180 degrees). The signed gradient distinguishes between black-to-white and white-to-black transitions, which might be useful for digits.

3. Number of orientation bins. The orientation at each pixel is binned into a discrete set of orientations by linear interpolation between bin centers to avoid aliasing.
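The following sketch (not code from the report; the tap filter, cell sizes and bin count are illustrative choices rather than the exact settings above) walks through steps 1-3: gradient responses from a simple tap filter, per-pixel magnitude and orientation as in equations (5) and (6), soft assignment of each pixel's magnitude to the two nearest orientation bins, and pooling over overlapping cells.

```python
import numpy as np

def orientation_histograms(img, cell=7, n_bins=12, signed=True):
    """Histograms of oriented gradients over overlapping square cells.

    img: 2-D array (a grayscale digit). Gradients use the simple [-1, 0, 1]
    tap filter; Sobel or Gaussian derivative filters are drop-in replacements.
    Each pixel votes with its magnitude, split between the two nearest
    orientation bins by linear interpolation to avoid aliasing.
    """
    img = img.astype(float)
    rh = np.zeros_like(img); rv = np.zeros_like(img)
    rh[:, 1:-1] = img[:, 2:] - img[:, :-2]            # horizontal response
    rv[1:-1, :] = img[2:, :] - img[:-2, :]            # vertical response
    mag = np.sqrt(rh ** 2 + rv ** 2)                  # eq. (5)
    ang = np.degrees(np.arctan2(rh, rv)) % 360.0      # eq. (6)
    span = 360.0 if signed else 180.0
    if not signed:
        ang = ang % 180.0
    pos = ang / span * n_bins                         # fractional bin index
    b_lo = np.floor(pos).astype(int) % n_bins
    b_hi = (b_lo + 1) % n_bins
    frac = pos - np.floor(pos)

    h, w = img.shape
    step = max(cell // 2, 1)                          # half-cell overlap
    feats = []
    for y in range(0, h - cell + 1, step):
        for x in range(0, w - cell + 1, step):
            hist = np.zeros(n_bins)
            m = mag[y:y + cell, x:x + cell].ravel()
            lo = b_lo[y:y + cell, x:x + cell].ravel()
            hi = b_hi[y:y + cell, x:x + cell].ravel()
            f = frac[y:y + cell, x:x + cell].ravel()
            np.add.at(hist, lo, m * (1.0 - f))        # soft binning
            np.add.at(hist, hi, m * f)
            feats.append(hist)
    return np.concatenate(feats)

# Toy usage: concatenate the three pyramid levels with weights 1, 2 and 4.
img = np.random.rand(28, 28)
feat = np.concatenate([w * orientation_histograms(img, cell=c)
                       for c, w in zip((14, 7, 4), (1, 2, 4))])
print(feat.shape)
```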
Table 3 shows the performance of the classifier for various choices of the descriptor. There is a significant reduction in the error rates compared to raw pixel features. We obtain an error of 2.64% using just 1,000 training examples with the intersection kernel SVM. The performance of the linear kernel is also quite good at 4.54%, which is significantly better than both the polynomial and rbf kernel SVMs trained on the raw pixel features. As expected, the signed gradients perform better than the unsigned gradients (2.64% vs. 2.97%). Among the oriented derivative filters for the signed gradients, the Gaussian filters perform best. We found that σ = 2 with 12 orientation bins gave the lowest error rate, as seen in Table 2.

    σ                1      2      3
    Error rate (%)   2.74   2.64   2.67

    Table 2: Effect of the bandwidth (σ) of the oriented Gaussian derivative filters, using 12 orientation bins, signed responses and 1,000 training examples.

    Signed responses (test error, %)
    # orientations   feat. dim   Tap    Sobel   OGF (σ=2)
    8                1148        2.97   2.93    2.85
    9                1629        2.75   2.77    2.79
    12               2172        2.71   2.68    2.64
    16               2896        2.74   2.83    2.66

    Unsigned responses (test error, %)
    # orientations   feat. dim   Tap    Sobel   OGF (σ=2)
    4                 724        4.06   4.39    4.37
    6                1086        3.53   3.58    3.70
    8                1148        3.31   3.35    3.33
    12               2172        2.97   3.08    3.21

    Table 3: Error rates on the MNIST dataset using a pyramid of histograms of oriented gradients. Only the first 1,000 examples were used for training.

We trained an intersection kernel SVM on the features obtained using 12 bins, signed gradients and the three choices of filters on the entire training set, and obtained an error rate of 0.79% using the oriented Gaussian derivative filters, 0.83% using the Sobel filter and 0.86% using the tap filter. Table 4 shows the number of misclassifications for each digit.

    Kernel         Gradient     Per-digit errors (digits 0-9)      Error rate
    intersection   Tap          1  5  7  4  8 10 13 11  9 18       0.86%
    intersection   Sobel        1  5  9  4  4 10 14 10 13 13       0.83%
    intersection   OGF (σ=2)    0  4  8  7  7  8 10  8 14 13       0.79%
    poly, d=5      OGF (σ=2)    1  4  6  3  5  5  8  8  7  9       0.56%

    Table 4: Errors on MNIST (10,000 test examples). The best error rate using the intersection kernel SVM is 0.79%, obtained with OGF filters. Training a polynomial kernel SVM on the same features gives an error rate of 0.56%, which matches the previous best result using SVMs (the VSV2 method of [7]). However, the polynomial kernel SVM is at least three orders of magnitude slower than the intersection kernel SVM during classification.

These numbers are quite close to the state of the art using SVMs. For example, the best number reported using SVMs is 0.56%, obtained with a degree 9 polynomial kernel on the raw pixel features using the VSV2 method [7]. However, the authors perform deskewing and jittering on the training examples to improve the performance. This leads to significantly slower training, as well as an average of about 16,000 support vectors per class, leading to very slow test times. The best performance with deskewing but no jittering using an SVM is 1.0% using degree 5 polynomial kernels and 1.1% using rbf kernels [16]. We outperform both of these, and at the same time, by using additive kernels, we avoid the runtime and overhead of storing and comparing a test example with all the support vectors [18]. This makes the intersection kernel SVM at least three orders of magnitude faster than the VSV2 method. Training a degree 5 polynomial kernel SVM on the same features improves the performance even further to 0.56% error, at the cost of increased runtime. This is still faster than VSV2, as we have only about 1,200 support vectors on average and our features are only 2.77x larger than the raw pixel features used in VSV2. Burges et al. [4] proposed reduced set methods to cut the number of support vectors to a fraction of the original at a slight loss in performance; the best number reported using that technique is 1.1%.

Figure 1 shows the performance on the test data using the oriented energy based features for various training set sizes. We keep the learning parameter fixed at C = 10 in LIBSVM for all runs of both the linear and the intersection kernel SVM.

    [Figure 1: plot of error rate vs. number of training examples (10^2 to 10^5) for gradient features with intersection and linear kernels, and for raw pixels with polynomial and rbf kernels.]

    Figure 1: Comparison of kernel SVMs for various training set sizes using pyramid features. On the full training set (60,000 examples), the error rates using the gradient features are 0.79% with the intersection kernel and 1.44% with the linear kernel SVM; using raw pixels they are 1.41% with the rbf kernel and 1.34% with the polynomial kernel. The gradient features with linear and intersection kernels perform significantly better than raw pixels with rbf and polynomial kernels when the number of training examples is small, suggesting that the gradient features capture the invariances in the digits quite well. We did not train the polynomial and rbf kernel SVMs on the gradient features as both the training and test times were very high.

4  USPS Dataset Experiments

For completeness we also present experiments on the USPS dataset. This dataset contains 7,291 training examples and 2,007 test examples of digits 0-9 and is considered quite hard, with a reported human error rate of 2.5%. We ran experiments using the following feature settings: oriented Gaussian derivative filters with σ = 1 (as the images are 16x16 pixels), block sizes of 16x16, 8x8 and 4x4, and 12 orientation bins. Table 5 shows the error rates of various methods on this dataset. For comparison we also include raw pixel accuracies using linear and intersection kernel SVMs. Once again these features with the intersection kernel perform close to the state of the art, and using rbf kernels we outperform the state of the art among methods which use the same training data as ours.

    Feature      Classifier                 Error rate
    Raw pixels   SVM (linear)               11.3%
    Raw pixels   SVM (intersection)          8.7%
    Raw pixels   SVM (poly, d=3) [7]         4.0%
    Raw pixels   VSV (poly, d=3) [7]         3.2%
    PHOG         SVM (linear)                3.4%
    PHOG         SVM (intersection)          3.4%
    PHOG         SVM (poly, d=5)             3.2%
    PHOG         SVM (rbf, γ=0.1)            2.7%
    Raw pixels   Tangent distance [23]*      2.6%
    Raw pixels   Boosted neural nets [8]*    2.6%
                 Human error rate [3]        2.5%

    Table 5: Summary of various results on the USPS dataset. Both the linear and the intersection kernel SVMs outperform the best existing SVM result on raw pixels, which stands at 4.0%. The VSV method, which jitters the support vectors to create additional training examples and retrains an SVM, improves this to 3.2%. Using polynomial and rbf kernel SVMs on the PHOG features reduces the error rate further to 3.2% and 2.7% respectively. Results marked with * use a different training set which has been enhanced by adding machine-printed characters. Note that our numbers are the best on the unmodified version of the dataset.
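For quick reference, the feature settings reported for the two datasets differ only in a few parameters. A hypothetical configuration summary (the key names are ours, and the entries marked as assumed are not stated explicitly in the report) could look like this:

```python
# Hypothetical summary of the reported feature settings (key names are ours).
FEATURE_SETTINGS = {
    "MNIST": {"image_size": 28, "filter": "oriented Gaussian derivative",
              "sigma": 2, "cell_sizes": (14, 7, 4), "level_weights": (1, 2, 4),
              "n_orientation_bins": 12, "signed_gradients": True},
    "USPS":  {"image_size": 16, "filter": "oriented Gaussian derivative",
              "sigma": 1, "cell_sizes": (16, 8, 4),
              "level_weights": (1, 2, 4),    # assumed; not stated for USPS
              "n_orientation_bins": 12,
              "signed_gradients": True},     # assumed; not stated for USPS
}
```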
    [Figure 2 in the original report shows image crops of all 79 MNIST test examples misclassified by the intersection kernel SVM with pyramid HOG features, each annotated with its true and predicted label and its index in the test set.]

    Figure 2: All 79 misclassifications on the MNIST dataset using pyramid HOG features with the intersection kernel SVM. X -> Y on the top right corner of each example denotes that X is misclassified as Y. The number in the bottom left corner is the index of the example in the test set.
5  Conclusions

There are several interesting aspects of our approach. We discuss each of them briefly.

Learning rate. The oriented histogram based features significantly outperform raw pixel features when the number of training examples is small. In fact, the intersection kernel SVM achieves with just 4,000 training examples performance similar to that of the raw pixel based features with 60,000 examples, as seen in Figure 1. This shows that the oriented gradient histogram features capture the invariance in the digits quite well.

Number of support vectors. The number of support vectors of the full classifier is much smaller for the histogram based features than for polynomial kernels: on average 1,304 support vectors compared to 3,242 for the polynomial kernel using 10,000 training examples. This suggests that our features make the learning easier, i.e., the data is much more separable. This is reflected in the good performance of a linear SVM on the histogram features, 2.64% compared to 15.38% using a linear SVM on the raw pixels.

Classification complexity. Both the linear and the intersection kernel SVMs are fast for classification, i.e., the runtime is independent of the number of support vectors. The feature computation step is quite fast, as it involves convolution with separable filters followed by computation of block histograms. All of this can be done in time linear in the number of pixels using integral histograms. In the end we have a 2,172 dimensional feature vector, and classification using a linear SVM requires 2,172 multiplications per class, while the intersection kernel SVM requires about 5 times as many using the piecewise linear approximation to the classification function [18]. The estimated number of multiply-add operations required by the linear SVM is about 40K, while the intersection kernel requires about 125K operations; note that this includes the time to compute the features. This is significantly less than the roughly 14 million operations required by a polynomial kernel SVM. The reduced set methods [4] require approximately 650K operations, while neural network methods like LeNet-5 (0.9% error) require 350K and the boosted LeNet-4 (0.7% error) 450K operations (these numbers are taken from [4]). For a small cost of computing features we are able to achieve competitive performance while at the same time being faster.
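As an illustration of why intersection kernel classification runs in time independent of the number of support vectors, the sketch below (not the authors' code; a simplified version of the lookup-table idea in [18], with hypothetical names and toy data) tabulates the per-dimension functions that make up the decision value and evaluates them by piecewise linear interpolation at test time.

```python
import numpy as np

def build_tables(sv, alpha_y, n_pts=30):
    """Precompute per-dimension lookup tables for an intersection-kernel SVM.

    sv:      (n_sv, d) support vectors
    alpha_y: (n_sv,)   signed dual coefficients alpha_i * y_i
    The decision value f(x) = sum_i h_i(x_i) + b, where
    h_i(s) = sum_l alpha_y[l] * min(s, sv[l, i]); each h_i is tabulated
    exactly on a fixed grid so test-time cost no longer depends on n_sv.
    """
    d = sv.shape[1]
    pts = np.linspace(0.0, sv.max(), n_pts)          # shared sample grid
    tables = np.empty((d, n_pts))
    for i in range(d):
        tables[i] = (alpha_y[:, None]
                     * np.minimum(pts[None, :], sv[:, i:i + 1])).sum(axis=0)
    return pts, tables

def decision_value(x, pts, tables, b=0.0):
    """Approximate f(x) by piecewise-linear interpolation of each h_i."""
    return sum(np.interp(x[i], pts, tables[i]) for i in range(len(x))) + b

# Toy usage with random "support vectors" and coefficients.
rng = np.random.default_rng(0)
sv, ay = rng.random((50, 10)), rng.normal(size=50)
pts, tab = build_tables(sv, ay)
x = rng.random(10)
exact = (ay * np.minimum(x, sv).sum(axis=1)).sum()   # brute-force decision value
print(exact, decision_value(x, pts, tab))            # approximation is close
```

With a few tens of sample points per dimension, the lookup costs a small constant per feature dimension, consistent with the modest constant-factor overhead over a linear SVM quoted above.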
Training time. One significant advantage of kernel SVMs over neural nets is the relative ease and speed of training. Our intersection/linear SVM classifiers have just one hyperparameter, C, which trades off the regularization and misclassification penalties. We set it to 10 for all digits and found that the performance was fairly robust to the value of C in that range. With fast linear SVM training algorithms like LIBLINEAR [9], one can train these classifiers in a few minutes total. For the intersection kernel we train using LIBSVM, which uses the sequential minimal optimization (SMO) algorithm; this takes about 4 hours on average per class. However, one may try variants of stochastic gradient descent algorithms (e.g. [19]) to train approximate additive classifiers even faster.

Thus, using variants of histograms of oriented gradients features and the intersection kernel SVM, we obtain an approach which is the best in terms of all three criteria: accuracy, computation time at training and computation time at testing.

References

[1] Pascal visual object challenge. http://pascallin.ecs.soton.ac.uk/challenges/VOC.
[2] S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(4):509-522, April 2002.
[3] J. Bromley and E. Sackinger. Neural-network and k-nearest-neighbor classifiers. Technical Report 11359-910819-16TM, AT&T, 1991.
[4] C. J. Burges and B. Schölkopf. Improving the accuracy and speed of support vector machines. In Advances in Neural Information Processing Systems 9, pages 375-381. MIT Press, 1997.
[5] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In CVPR, 2005.
[7] D. Decoste and B. Schölkopf. Training invariant support vector machines. Machine Learning, 46(1-3):161-190, 2002.
[8] H. Drucker, R. Schapire, and P. Simard. Boosting performance in neural networks. 7:705-719, 1993.
[9] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9:1871-1874, 2008.
[10] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In CVPR, 2004.
[11] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In Proc. Computer Vision and Pattern Recognition, 2008.
[12] K. Grauman and T. Darrell. The pyramid match kernel: Discriminative classification with sets of image features. In Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV '05), pages 1458-1465, Washington, DC, USA, 2005. IEEE Computer Society.
[13] J. Hull. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):550-554, 1994.
[14] K. Kavukcuoglu, M. Ranzato, R. Fergus, and Y. LeCun. Learning invariant features through topographic filter maps. In Proc. International Conference on Computer Vision and Pattern Recognition (CVPR '09). IEEE, 2009.
[15] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 2169-2178, 2006.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, November 1998.
[17] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision (ICCV '99), Volume 2, page 1150, Washington, DC, USA, 1999. IEEE Computer Society.
[18] S. Maji, A. Berg, and J. Malik. Classification using intersection kernel support vector machines is efficient. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), pages 1-8, June 2008.
[19] S. Maji and A. C. Berg. Max-margin additive classifiers for detection. In Proc. International Conference on Computer Vision, 2009.
[20] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun. Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems (NIPS 2006). MIT Press, 2006.
[21] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145-175, 2001.
[22] M. Ranzato, F. Huang, Y. Boureau, and Y. LeCun. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In Proc. Computer Vision and Pattern Recognition Conference (CVPR '07). IEEE Press, 2007.
[23] P. Simard, Y. LeCun, and J. S. Denker. Efficient pattern recognition using a new transformation distance. In Advances in Neural Information Processing Systems 5, pages 50-58, San Francisco, CA, USA, 1993. Morgan Kaufmann Publishers Inc.
[24] K. Wilder. http://oldmill.uchicago.edu/~wilder/Mnist/.
[25] H. Zhang. Adapting Learning Techniques for Visual Recognition. PhD thesis, EECS Department, University of California, Berkeley, May 2007.