Using Very Deep Autoencoders for Content-Based Image Retrieval

Alex Krizhevsky and Geoffrey Hinton
University of Toronto, Department of Computer Science
6 King's College Road, Toronto, M5S 3H5, Canada

Abstract: We show how to learn many layers of features on color images, and we use these features to initialize deep autoencoders. We then use...



...feature extraction process to allow a DBN to learn really good short codes. As a result, there has been no proper evaluation of binary codes produced by deep learning for image retrieval. In [9], the authors introduced a new and very fast spectral method for generating binary codes from high-dimensional data and showed that these spectral codes are, in some cases, more useful for image retrieval than binary codes generated by autoencoders trained on the GIST descriptors. We demonstrate that spectral codes do not work as well as the codes produced by DBN-initialized autoencoders trained on the raw pixels.

2 How the codes are learned

DBNs are multilayer, stochastic generative models that are created by learning a stack of Restricted Boltzmann Machines (RBMs), each of which is trained by using the hidden activities of the previous RBM as its training data. Each time a new RBM is added to the stack, the new DBN has a better variational lower bound on the log probability of the data than the previous DBN, provided the new RBM is learned in the appropriate way [3].

We train on 1.6 million 32x32 color images that have been preprocessed by subtracting from each pixel its mean value over all images and then dividing by the standard deviation of all pixels over all images. The first RBM in the stack has 8192 binary hidden units and 3072 linear visible units with unit-variance Gaussian noise. All the remaining RBMs have N binary hidden units and 2N binary visible units. Details of how to train an RBM can be found in [1]. We use the standard contrastive divergence learning procedure, which has four steps:

1. For each data-vector, v, in a mini-batch, stochastically pick a binary state vector, h, for the hidden units:

   p(h_j = 1 | v) = σ(b_j + Σ_{i ∈ vis} v_i w_ij)   (1)

   where b_j is the bias, w_ij is a weight, and σ(x) = (1 + exp(−x))^{−1}.

2. Stochastically reconstruct each visible vector as v′, using the first equation below for binary visible units and the second for linear visible units, where N(μ, V) is a Gaussian:

   p(v′_i = 1 | h) = σ(b_i + Σ_{j ∈ hid} h_j w_ij),  or  v′_i = N(b_i + Σ_{j ∈ hid} h_j w_ij, 1)   (2)

3. Recompute the hidden states as h′ using Eq. 1 with v′ instead of v.

4. Update the weights using Δw_ij ∝ ⟨v_i h_j⟩ − ⟨v′_i h′_j⟩, where the angle brackets denote averages over the mini-batch.
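The four steps above can be sketched as a single CD-1 update for a binary-binary RBM. This is a minimal NumPy illustration, not the authors' code; the function and variable names are our own, and the Gaussian-visible variant for the first RBM is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v, W, b_vis, b_hid, lr=0.01):
    """One contrastive-divergence (CD-1) update for a binary-binary RBM.
    v: (batch, n_vis) data mini-batch; W: (n_vis, n_hid) weights."""
    # Step 1: stochastic binary hidden states given the data (Eq. 1).
    p_h = sigmoid(b_hid + v @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)
    # Step 2: stochastic reconstruction of the visible units (Eq. 2, binary case).
    p_v = sigmoid(b_vis + h @ W.T)
    v_recon = (rng.random(p_v.shape) < p_v).astype(float)
    # Step 3: recompute hidden states from the reconstruction (Eq. 1 with v').
    p_h_recon = sigmoid(b_hid + v_recon @ W)
    h_recon = (rng.random(p_h_recon.shape) < p_h_recon).astype(float)
    # Step 4: update from the difference of pairwise statistics,
    # averaged over the mini-batch.
    batch = v.shape[0]
    dW = (v.T @ h - v_recon.T @ h_recon) / batch
    return W + lr * dW
```

In practice the mean-field probabilities are often used in place of the sampled states in step 4 to reduce the variance of the update; the sampled version follows the step-by-step description above more literally.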
[Figure 2 panels: 256-bit deep | 256-bit spectral | Euclidean distance]

Figure 2: Retrieval results from the 1.6 million tiny images dataset using a full linear search with 256-bit deep codes, 256-bit spectral codes, and Euclidean distance. The top-left image in each block is the query image. The remaining images are the closest retrieved matches in scan-line order. The dataset contains some near-duplicate images.

...class as the query image, averaged over 5,000 queries. We used exactly the same autoencoders, but used query images from the CIFAR-10 dataset [4], which is a carefully labeled subset of the 80 million tiny images, containing 60,000 images split equally between the ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each image in CIFAR-10 has been selected to contain one dominant object of the appropriate class, and only 3% of the CIFAR-10 images are in the set of 1.6 million training images.

3.1 Retrieval results

Qualitatively, a full linear search of 1.6 million images using 256-bit deep codes produces better results than using Euclidean distance in pixel space and is about 1000 times faster. 256-bit spectral codes are much worse (see Figure 2). Pruning the search by restricting it to images whose 28-bit deep code differs by 5 bits or less from the query image code only very slightly degrades the performance of the 256-bit deep codes. Quantitatively, the ordering of the methods is the same, with 28-bit deep codes performing about as well as 256-bit spectral codes (see Figure 3). The best performance is achieved by a more elaborate method, described below, that creates a candidate list by using many searches with many different 28-bit codes, each of which corresponds to a transformed version of the query image.

4 Multiple semantic hashing

Semantic hashing retrieves objects in a time that is independent of the size of the database, and an obvious question is whether this extreme speed can be traded for more accuracy by somehow using many different 28-bit coding schemes and combining their results. We now describe one way of doing this.
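The pruned search described above can be sketched as a hash-table lookup: index every image by its 28-bit code, then probe all codes within Hamming distance 5 of the query's code. This is an illustrative sketch under our own naming, not the paper's implementation; the paper would then re-rank the resulting candidate list with the 256-bit codes.

```python
from collections import defaultdict
from itertools import combinations

def build_index(codes):
    """Map each n-bit integer code to the list of image ids carrying it."""
    index = defaultdict(list)
    for image_id, code in enumerate(codes):
        index[code].append(image_id)
    return index

def candidates(index, query_code, n_bits=28, radius=5):
    """Collect ids of all images whose code is within `radius` bits of the
    query code, by enumerating every bit-flip pattern of size 0..radius.
    For n_bits=28, radius=5 this is sum_k C(28, k), about 122,000 probes,
    independent of the database size."""
    ids = []
    for k in range(radius + 1):
        for flips in combinations(range(n_bits), k):
            probe = query_code
            for bit in flips:
                probe ^= 1 << bit  # flip this bit of the query code
            ids.extend(index.get(probe, []))
    return ids
```

Because the probe count depends only on the code length and the radius, the lookup cost stays constant as the database grows, which is the property that makes semantic hashing so fast.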
[7] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: a large database for non-parametric object and scene recognition. IEEE PAMI, 30(11):1958-1970, November 2008.
[8] A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, 2008.
[9] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Proceedings of Neural Information Processing Systems, 2008.