Zero-Shot Learning with Semantic Output Codes
Mark Palatucci, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213
Dean Pomerleau, Intel Labs Pittsburgh, Pittsburgh, PA 15213 (dean.a.pomerleau@intel.com)
Geoffrey Hinton, Computer Science Department, University of Toronto, Toronto, Ontario M5S 3G4, Canada (hinton@cs.toronto.edu)
Tom M. Mitchell, Machine Learning Department, Carnegie Mellon University


…recognizers in conjunction with a large knowledge base representing words as combinations of phonemes. To apply a similar approach to neural activity decoding, we must discover how to infer the component parts of a word's meaning from neural activity. While there is no clear consensus as to how the brain encodes semantic information (Plaut, 2002), there are several proposed representations that might serve as a knowledge base of neural activity, thus enabling a neural decoder to recognize a large set of possible words, even when those words are omitted from a training set.

The general question this paper asks is: Given a semantic encoding of a large set of concept classes, can we build a classifier to recognize classes that were omitted from the training set? We provide a formal framework for addressing this question and a concrete example for the task of neural activity decoding. We show it is possible to build a classifier that can recognize words a person is thinking about, even without training examples for those particular words.

1.1 Related Work

The problem of zero-shot learning has received little attention in the machine learning community. Some work by Larochelle et al. (2008) on zero-data learning has shown the ability to predict novel classes of digits that were omitted from a training set. In computer vision, techniques for sharing features across object classes have been investigated (Torralba & Murphy, 2007; Bart & Ullman, 2005), but relatively little work has focused on recognizing entirely novel classes, with the exceptions of Lampert et al. (2009) predicting visual properties of new objects and Farhadi et al. (2009) using visual property predictions for object recognition. In the neural imaging community, Kay et al. (2008) has shown the ability to decode (from visual cortex activity) which novel visual scenes a person is viewing from a large set of possible images, but without recognizing the image content per se.

The work most similar to our own is Mitchell (2008). They use semantic features derived from corpus statistics to generate a neural activity pattern for any noun in English. In our work, by contrast, we focus on word decoding, where given a novel neural image, we wish to predict the word from a large set of possible words. We also consider semantic features that are derived from human labeling in addition to corpus statistics. Further, we introduce a formalism for a zero-shot learner and provide theoretical guarantees on its ability to recognize novel classes omitted from a training set.

2 Classification with Semantic Knowledge

In this section we formalize the notion of a zero-shot learner that uses semantic knowledge to extrapolate to novel classes. While a zero-shot learner could take many forms, we present one such model that utilizes an intermediate set of features derived from a semantic knowledge base. Intuitively, our goal is to treat each class not as simply an atomic label, but instead to represent it using a vector of semantic features characterizing a large number of possible classes. Our models will learn the relationship between input data and the semantic features. They will use this learned relationship in a two-step prediction procedure to recover the class label for novel input data: given new input data, the models will predict a set of semantic features corresponding to that input, and then find the class in the knowledge base that best matches that set of predicted features (a code sketch of this procedure follows Definition 1 below). Significantly, this procedure will work even for input data from a novel class, if that class is included in the semantic knowledge base (i.e. even if no input space representation is available for the class, but a feature encoding of it exists in the semantic knowledge base).

2.1 Formalism

Definition 1. Semantic Feature Space
A semantic feature space of p dimensions is a metric space in which each of the p dimensions encodes the value of a semantic property. These properties may be categorical in nature or may contain real-valued data.
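To make the two-step procedure concrete, here is a minimal Python sketch (ours, not the paper's): stage one, S(·), is approximated by one least-squares linear predictor per binary semantic feature, and stage two, L(·), is a 1-nearest-neighbor lookup under Hamming distance. The class name and the 0.5 thresholding rule are illustrative assumptions.

    import numpy as np

    class SemanticOutputCodeClassifier:
        """Two-stage zero-shot classifier: S(.) then L(.) (illustrative sketch)."""

        def __init__(self, knowledge_base):
            # knowledge_base: dict mapping every class label (including novel
            # ones) to its p-dimensional binary semantic encoding.
            self.labels = list(knowledge_base)
            self.codes = np.array([knowledge_base[l] for l in self.labels])

        def fit(self, X, Y):
            # Stage one: learn the map from raw inputs X (n x d) to semantic
            # features Y (n x p); plain least squares stands in for the
            # paper's collection of linear classifiers.
            self.W, *_ = np.linalg.lstsq(X, Y, rcond=None)
            return self

        def predict(self, x):
            # Stage two: threshold the predicted features to binary values,
            # then return the class whose encoding is nearest in Hamming
            # distance, even if that class never appeared in training.
            q = (x @ self.W > 0.5).astype(int)
            dists = np.abs(self.codes - q).sum(axis=1)
            return self.labels[int(np.argmin(dists))]

Because L(·) searches the full knowledge base rather than only the training labels, predict can return a class for which no raw-input examples exist; this is precisely the zero-shot behavior formalized in Definition 1 and analyzed below.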
In answering this question, our goal is to obtain a PAC-style bound: we want to know how much error can be tolerated in the prediction of the semantic properties while still recovering the novel class with high probability. We will then use this error bound to obtain a bound on the number of examples necessary to achieve that level of error in the first stage of the classifier. The idea is that if the first stage S(·) of the classifier can predict the semantic properties well, then the second stage L(·) will have a good chance of recovering the correct label for instances from novel classes.

As a first step towards a general theory of zero-shot learning, we will consider one instantiation of a semantic output code classifier. We will assume that semantic features are binary labels, the first stage S(·) is a collection of PAC-learnable linear classifiers (one classifier per feature), and the second stage L(·) is a 1-nearest neighbor classifier using the Hamming distance metric. By making these assumptions, we can leverage existing PAC theory for linear classifiers as well as theory for approximate nearest neighbor search. Much of our nearest-neighbor analysis parallels the work of Ciaccia and Patella (2000).

We first want to bound the amount of error we can tolerate given a prediction of semantic features. To find this bound, we define F to be the distribution in semantic feature space of points from the knowledge base K. Clearly, points (classes) in semantic space may not be equidistant from each other. A point might be far from others, which would allow more room for error in the prediction of semantic features for this point while maintaining the ability to recover its unique identity (label). Conversely, a point close to others in semantic space will have a lower tolerance of error. In short, the tolerance for error is relative to a particular point in relation to other points in semantic space.

We next define a prediction q to be the output of the S(·) map applied to some raw-input example x \in X^d. Let d(q, q') be the distance between the prediction q and some other point q' representing a class in the semantic space. We define the relative distribution R_q for a point q as the probability that the distance from q to q' is less than some distance z:

    R_q(z) = P(d(q, q') \leq z)

This empirical distribution depends on F and is just the fraction of sampled points from F that are less than some distance z away from q. Using this distribution, we can also define a distribution on the distance to the nearest neighbor of q, denoted \eta_q:

    G_q(z) = P(\eta_q \leq z)

which is given in Ciaccia and Patella (2000) as:

    G_q(z) = 1 - (1 - R_q(z))^n

where n is the number of actual points drawn from the distribution F. Now suppose that we define \tau_q to be the distance that a prediction q for raw-input example x is from the true semantic encoding of the class to which x belongs. Intuitively, the class we infer for input x is going to be the point closest to the prediction q, so we want a small probability \epsilon that the distance \tau_q to the true class is larger than the distance between q and its nearest neighbor, since that would mean there is a spurious neighbor closer to q in semantic space than the point representing q's true class:

    P(\tau_q \geq \eta_q) \leq \epsilon

Rearranging, we can put this in terms of the distribution G_q and then solve for \tau_q:

    P(\eta_q \leq \tau_q) \leq \epsilon
    G_q(\tau_q) \leq \epsilon

If G_q(·) were invertible, we could immediately recover the value of \tau_q for a desired \epsilon. For some distributions, G_q(·) may not be a one-to-one function, so there may not be an inverse. But G_q(·) will never decrease since it is a cumulative distribution function. We will therefore define a function G_q^{-1} such that:

    G_q^{-1}(\epsilon) = \mathrm{argmax}_{\tau_q} [ G_q(\tau_q) \leq \epsilon ]

So, using nearest neighbor for L(·), if \tau_q \leq G_q^{-1}(\epsilon), then we will recover the correct class with at least 1 - \epsilon probability. To ensure that we achieve this error bound, we need to make sure the total error of S(·) is less than G_q^{-1}(\epsilon), which we define as \tau_q^{max}. We assume in this analysis that we have p binary semantic features and a Hamming distance metric, so \tau_q^{max} defines the total…
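As a concrete illustration of these definitions (our own sketch, not from the paper), the function below estimates an empirical R_q and G_q from the n class encodings of a knowledge base and returns \tau_q^{max} = G_q^{-1}(\epsilon), the largest Hamming-distance error the nearest-neighbor stage can tolerate for a given prediction q:

    import numpy as np

    def max_tolerable_error(q, codes, eps):
        # codes: (n, p) binary class encodings sampled from F, as an ndarray;
        # q: predicted p-dimensional binary feature vector; eps: desired bound.
        n, p = codes.shape
        dists = np.abs(codes - q).sum(axis=1)            # Hamming distance to each class
        zs = np.arange(p + 1)                            # candidate distances z
        R = np.array([(dists <= z).mean() for z in zs])  # empirical R_q(z)
        G = 1.0 - (1.0 - R) ** n                         # G_q(z) = 1 - (1 - R_q(z))^n
        ok = zs[G <= eps]                                # all z with G_q(z) <= eps
        return int(ok.max()) if ok.size else None        # G_q^{-1}(eps), if any z qualifies

Per the bound above, if the first stage mispredicts no more than this many of the p binary features, the encoding nearest to q is the true class with probability at least 1 - \epsilon.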
In the language of the semantic output code classifier, this dataset represents the collection D of raw-input space examples.

We also collected two semantic knowledge bases for these 60 words. In the first semantic knowledge base, corpus5000, each word is represented as a co-occurrence vector with the 5,000 most frequent words from the Google Trillion-Word Corpus[2]. The second semantic knowledge base, human218, was created using the Mechanical Turk human computation service from Amazon.com. There were 218 semantic features collected for the 60 words, and the questions were selected to reflect psychological conjectures about neural activity encoding. For example, the questions related to size, shape, surface properties, and typical usage. Example questions include "is it manmade?" and "can you hold it?". Users of the Mechanical Turk service answered these questions for each word on a scale of 1 to 5 (definitely not to definitely yes).

4.2 Model

In our experiments, we use multiple output linear regression to learn the S(·) map of the semantic output code classifier. Let X \in \mathbb{R}^{N \times d} be a training set of fMRI examples where each row is the image for a particular word and d is the number of dimensions of the fMRI image. During training, we use the voxel-stability criterion described in Mitchell (2008), which does not use the class labels, to reduce d from about 20,000 voxels to 500. Let Y \in \mathbb{R}^{N \times p} be a matrix of semantic features for those words (obtained from the knowledge base K), where p is the number of semantic features for each word (e.g. 218 for the human218 knowledge base). We learn a matrix of weights \hat{W} \in \mathbb{R}^{d \times p}. In this model, each output is treated independently, so we can solve all of them quickly in one matrix operation (even with thousands of semantic features):

    \hat{W} = (X^T X + \lambda I)^{-1} X^T Y    (3)

where I is the identity matrix and \lambda is a regularization parameter chosen automatically using the cross-validation scoring function (Hastie et al., 2001)[3]. Given a novel fMRI image x, we can obtain a prediction \hat{f} of the semantic features for this image by multiplying the image by the weights:

    \hat{f} = x \hat{W}

For the second stage of the semantic output code classifier, L(·), we simply use a 1-nearest neighbor classifier. In other words, L(\hat{f}) will take the prediction of features and return the closest point in a given knowledge base according to the Euclidean distance (L2) metric.

4.3 Experiments

Using the model and datasets described above, we now pose and answer three important questions.

1. Can we build a classifier to discriminate between two classes, where neither class appeared in the training set?

To answer this question, we performed a leave-two-out cross-validation. Specifically, we trained the model in Equation 3 to learn the mapping between 58 fMRI images and the semantic features for their respective words. For the first held-out image, we applied the learned weight matrix to obtain a prediction of the semantic features, and then we used a 1-nearest neighbor classifier to compare the vector of predictions to the true semantic encodings of the two held-out words. The label was chosen by selecting the word with the encoding closest to the prediction for the fMRI image. We then performed the same test using the second held-out image. Thus, for each iteration of the cross-validation, two separate comparisons were made. This process was repeated for all \binom{60}{2} = 1,770 possible leave-two-out combinations, leading to 3,540 total comparisons. A sketch of this protocol appears below.

Table 1 shows the results for two different semantic feature encodings. We see that the human218 semantic features significantly outperformed the corpus5000 features, with mean accuracies over the nine participants of 80.9% and 69.7% respectively. But for both feature sets, we see that it is possible to discriminate between two novel classes for each of the nine participants.

[2] Vectors are normalized to unit length and do not include 100 stop words like "a", "the", "is".
[3] We compute the cross-validation score for each task and choose the parameter that minimizes the average loss across all output tasks.
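The following minimal Python sketch (ours; the function names, and the fixed regularization value standing in for the paper's automatically chosen \lambda, are illustrative) implements Equation 3 and the leave-two-out protocol:

    import numpy as np

    def fit_ridge(X, Y, lam):
        # W_hat = (X^T X + lam * I)^{-1} X^T Y, solved without an explicit inverse.
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

    def decode(x, W, codes, labels):
        # Stage two: nearest encoding under Euclidean (L2) distance.
        f_hat = x @ W
        dists = np.linalg.norm(codes - f_hat, axis=1)
        return labels[int(np.argmin(dists))]

    def leave_two_out(X, Y, labels, lam=1.0):
        # X: (60, 500) fMRI images; Y: (60, p) semantic encodings, as ndarrays.
        correct, total = 0, 0
        N = len(X)
        for i in range(N):
            for j in range(i + 1, N):
                train = [k for k in range(N) if k not in (i, j)]
                W = fit_ridge(X[train], Y[train], lam)
                # Each held-out image is compared only against the two
                # held-out words' true encodings, as in the paper.
                pair_codes, pair_labels = Y[[i, j]], [labels[i], labels[j]]
                for held in (i, j):
                    total += 1
                    correct += decode(X[held], W, pair_codes, pair_labels) == labels[held]
        return correct / total  # fraction of the 3,540 comparisons decided correctly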
Figure 2: The mean and median rank accuracies across nine participants for two different semantic feature sets. Both the original 60 fMRI words and a set of 940 nouns were considered.

Table 2: The top five predicted words for a novel fMRI image taken for the word in each column header (all fMRI images taken from participant P1). The number in parentheses is the rank of the correct word selected from 941 concrete nouns in English.

    Bear (1)   Foot (1)   Screwdriver (1)   Train (1)    Truck (2)   Celery (5)   House (6)     Pants (21)
    bear       foot       screwdriver      train        jeep        beet         supermarket   clothing
    fox        feet       pin              jet          truck       artichoke    hotel         vest
    wolf       ankle      nail             jail         minivan     grape        theater       t-shirt
    yak        knee       wrench           factory      bus         cabbage      school        clothes
    gorilla    face       dagger           bus          sedan       celery       factory       panties

…word from the set of 941 more than 10% of the time. The chance accuracy of predicting a word correctly is only 0.1%, meaning we would expect less than one correct prediction across all 540 presentations.

As Figure 2 shows, the median rank accuracies are often significantly higher than the mean rank accuracies. Using the human218 features on the noun940 words, the median rank accuracy is above 90% for each participant, while the mean is typically about 10% lower. This is due to the fact that several words are consistently predicted poorly. The prediction of words in the categories animals, body parts, foods, tools, and vehicles typically performs well, while the words in the categories furniture, man-made items, and insects often perform poorly. Even when the correct word is not the closest match, the words that best match the predicted features are often very similar to the held-out word. Table 2 shows the top five predicted words for eight different held-out fMRI images for participant P1 (i.e. the 5 closest words in the set of 941 to the predicted vector of semantic features).

5 Conclusion

We presented a formalism for a zero-shot learning algorithm known as the semantic output code classifier. This classifier can predict novel classes that were omitted from a training set by leveraging a semantic knowledge base that encodes features common to both the novel classes and the training set. We also proved the first formal guarantee that shows conditions under which this classifier will predict novel classes.

We demonstrated this semantic output code classifier on the task of neural decoding using semantic knowledge bases derived from both human labeling and corpus statistics. We showed this classifier can predict the word a person is thinking about from a recorded fMRI image of that person's neural activity with accuracy much higher than chance, even when training examples for that particular word were omitted from the training set and the classifier was forced to pick the word from among nearly 1,000 alternatives. We have shown that training images of brain activity are not required for every word we would like a classifier to recognize. These results significantly advance the state-of-the-art in neural decoding and are a promising step towards a large-vocabulary brain-computer interface.
References

Bart, E., & Ullman, S. (2005). Cross-generalization: learning novel classes from a single example by feature replacement. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 1, 672–679.
Ciaccia, P., & Patella, M. (2000). PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces. International Conference on Data Engineering, 244.
Dietterich, T. G., & Bakiri, G. (1995). Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research.
Farhadi, A., Endres, I., Hoiem, D., & Forsyth, D. (2009). Describing objects by their attributes. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
Hastie, T., Tibshirani, R., & Friedman, J. H. (2001). The elements of statistical learning. Springer.
Kay, K. N., Naselaris, T., Prenger, R. J., & Gallant, J. L. (2008). Identifying natural images from human brain activity. Nature, 452, 352–355.
Lampert, C. H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR).
Larochelle, H., Erhan, D., & Bengio, Y. (2008). Zero-data learning of new tasks. AAAI Conference on Artificial Intelligence.
Mitchell, T., et al. (2008). Predicting human brain activity associated with the meanings of nouns. Science, 320, 1191–1195.
Mitchell, T. M. (1997). Machine learning. New York: McGraw-Hill.
Mitchell, T. M., Hutchinson, R., Niculescu, R. S., Pereira, F., Wang, X., Just, M., & Newman, S. (2004). Learning to decode cognitive states from brain images. Machine Learning, 57, 145–175.
Plaut, D. C. (2002). Graded modality-specific specialization in semantics: A computational account of optic aphasia. Cognitive Neuropsychology, 19, 603–639.
Snodgrass, J., & Vanderwart, M. (1980). A standardized set of 260 pictures: Norms for name agreement, image agreement, familiarity and visual complexity. Journal of Experimental Psychology: Human Learning and Memory, 174–215.
Torralba, A., & Murphy, K. P. (2007). Sharing visual features for multiclass and multiview object detection. IEEE Trans. Pattern Anal. Mach. Intell., 29, 854–869.
van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579–2605.
Waibel, A. (1989). Modular construction of time-delay neural networks for speech recognition. Neural Computation, 1, 39–46.
Wilson, M. (1988). The MRC psycholinguistic database: Machine readable dictionary, version 2. Behavioral Research Methods, 6–11.