Using Very Deep Autoencoders for Content-Based Image Retrieval

Alex Krizhevsky and Geoffrey Hinton
University of Toronto, Department of Computer Science, 6 King's College Road, Toronto, M5S 3H5, Canada

Abstract: We show how to learn many layers of features on color images, and we use these features to initialize deep autoencoders. We then use ...

Presentation Transcript

... feature extraction process to allow a DBN to learn really good short codes. As a result, there has been no proper evaluation of binary codes produced by deep learning for image retrieval. In [9], the authors introduced a new and very fast spectral method for generating binary codes from high-dimensional data and showed that these spectral codes are, in some cases, more useful for image retrieval than binary codes generated by autoencoders trained on the GIST descriptors. We demonstrate that spectral codes do not work as well as the codes produced by DBN-initialized autoencoders trained on the raw pixels.

2 How the codes are learned

DBNs are multilayer, stochastic generative models that are created by learning a stack of Restricted Boltzmann Machines (RBMs), each of which is trained by using the hidden activities of the previous RBM as its training data. Each time a new RBM is added to the stack, the new DBN has a better variational lower bound on the log probability of the data than the previous DBN, provided the new RBM is learned in the appropriate way [3].

We train on 1.6 million 32×32 color images that have been preprocessed by subtracting from each pixel its mean value over all images and then dividing by the standard deviation of all pixels over all images. The first RBM in the stack has 8192 binary hidden units and 3072 linear visible units with unit-variance Gaussian noise. All the remaining RBMs have N binary hidden units and 2N binary visible units. Details of how to train an RBM can be found in [1]. We use the standard contrastive divergence learning procedure, which has four steps:

1. For each data-vector v in a mini-batch, stochastically pick a binary state vector h for the hidden units:

   p(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i \in \text{vis}} v_i w_{ij}\Big)    (1)

   where b_j is the bias, w_{ij} is a weight, and \sigma(x) = (1 + \exp(-x))^{-1}.

2. Stochastically reconstruct each visible vector as v' using the first equation below for binary visible units and the second for linear visible units, where \mathcal{N}(\mu, V) is a Gaussian:

   p(v'_i = 1 \mid h) = \sigma\Big(b_i + \sum_{j \in \text{hid}} h_j w_{ij}\Big), \quad \text{or} \quad v'_i = \mathcal{N}\Big(b_i + \sum_{j \in \text{hid}} h_j w_{ij},\, 1\Big)    (2)

3. Recompute the hidden states as h' using Eq. 1 with v' instead of v.

4. Update the weights using \Delta w_{ij} \propto \langle v_i h_j \rangle - \langle v'_i h'_j \rangle, where the angle brackets denote averages over the mini-batch.
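As a concrete illustration of the four steps above, here is a minimal NumPy sketch of one CD-1 update for the first RBM in the stack (3072 linear visible units with unit-variance Gaussian noise, 8192 binary hidden units). The mini-batch size, learning rate, weight initialization, and the use of sampled hidden states in step 3 are assumptions made for illustration, and bias updates are omitted for brevity.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v, W, b_vis, b_hid, lr, rng):
    """One CD-1 weight update on a mini-batch v of shape (batch_size, n_vis)."""
    # Step 1: stochastically pick binary hidden states from p(h_j = 1 | v), Eq. (1).
    p_h = sigmoid(b_hid + v @ W)
    h = (rng.random(p_h.shape) < p_h).astype(W.dtype)

    # Step 2: reconstruct the visible vector. For these linear visible units the
    # reconstruction is the top-down input plus unit-variance Gaussian noise, Eq. (2).
    v_recon = b_vis + h @ W.T + rng.standard_normal(v.shape).astype(W.dtype)

    # Step 3: recompute the hidden states h' from v' using Eq. (1).
    p_h_recon = sigmoid(b_hid + v_recon @ W)
    h_recon = (rng.random(p_h_recon.shape) < p_h_recon).astype(W.dtype)

    # Step 4: Delta w_ij is proportional to <v_i h_j> - <v'_i h'_j>,
    # averaged over the mini-batch.
    grad = (v.T @ h - v_recon.T @ h_recon) / v.shape[0]
    return W + lr * grad

# Example: one update on a random stand-in for a mini-batch of 128 whitened images.
rng = np.random.default_rng(0)
n_vis, n_hid = 3072, 8192   # 32x32x3 pixels, first-layer hidden units
W = (0.01 * rng.standard_normal((n_vis, n_hid))).astype(np.float32)
b_vis = np.zeros(n_vis, dtype=np.float32)
b_hid = np.zeros(n_hid, dtype=np.float32)
v_batch = rng.standard_normal((128, n_vis)).astype(np.float32)
W = cd1_update(v_batch, W, b_vis, b_hid, lr=1e-3, rng=rng)

The higher RBMs in the stack have binary visible units, so they would use the logistic form of Eq. (2) in step 2 instead of the Gaussian reconstruction.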
Figure 2: Retrieval results from the 1.6 million tiny images dataset using a full linear search with 256-bit deep codes, 256-bit spectral codes, and Euclidean distance. The top-left image in each block is the query image. The remaining images are the closest retrieved matches in scan-line order. The dataset contains some near-duplicate images.

... class as the query image, averaged over 5,000 queries. We used exactly the same autoencoders, but used query images from the CIFAR-10 dataset [4], which is a carefully labeled subset of the 80 million tiny images, containing 60,000 images split equally between the ten classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. Each image in CIFAR-10 has been selected to contain one dominant object of the appropriate class, and only 3% of the CIFAR-10 images are in the set of 1.6 million training images.

3.1 Retrieval results

Qualitatively, a full linear search of 1.6 million images using 256-bit deep codes produces better results than using Euclidean distance in pixel space and is about 1000 times faster. 256-bit spectral codes are much worse (see figure 2). Pruning the search by restricting it to images whose 28-bit deep code differs by 5 bits or less from the query image code only very slightly degrades the performance of the 256-bit deep codes. Quantitatively, the ordering of the methods is the same, with 28-bit deep codes performing about as well as 256-bit spectral codes (see figure 3). The best performance is achieved by a more elaborate method, described below, that creates a candidate list by using many searches with many different 28-bit codes, each of which corresponds to a transformed version of the query image.

4 Multiple semantic hashing

Semantic hashing retrieves objects in a time that is independent of the size of the database, and an obvious question is whether this extreme speed can be traded for more accuracy by somehow using many different 28-bit coding schemes and combining their results. We now describe one way of doing this.

References

[7] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: a large database for non-parametric object and scene recognition. IEEE PAMI, 30(11):1958-1970, November 2008.
[8] A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition, 2008.
[9] Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Proceedings of Neural Information Processing Systems, 2008.
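For concreteness, here is a small sketch of the pruned search described in section 3.1: the candidate list is restricted to database images whose 28-bit deep code differs from the query's 28-bit code by at most 5 bits, and the surviving candidates are ranked by Hamming distance between 256-bit deep codes. Storing codes as Python integers, the function names, and the exact ranking step are assumptions made for illustration; in true semantic hashing the 28-bit code would be used as an address into a hash table, enumerating all addresses within Hamming radius 5, rather than scanned linearly as done here to keep the sketch short.

import numpy as np

def hamming(a, b):
    """Hamming distance between two binary codes stored as Python integers."""
    return bin(a ^ b).count("1")

def pruned_search(query_28, query_256, db_28, db_256, radius=5, top_k=16):
    """Return indices of the closest matches after pruning with 28-bit codes."""
    # Prune: keep only images whose 28-bit code is within `radius` bits of the query.
    candidates = [i for i, code in enumerate(db_28) if hamming(query_28, code) <= radius]
    # Rank: compare the 256-bit deep codes only for the surviving candidates.
    candidates.sort(key=lambda i: hamming(query_256, db_256[i]))
    return candidates[:top_k]

# Example with random codes standing in for codes produced by the autoencoders.
rng = np.random.default_rng(0)
n_images = 100_000
db_28 = [int(x) for x in rng.integers(0, 2**28, size=n_images)]
db_256 = [int.from_bytes(rng.bytes(32), "big") for _ in range(n_images)]
print(pruned_search(db_28[0], db_256[0], db_28, db_256)[:5])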