Abstract

Recent work has shown that neural embedded word representations capture many relational similarities, which can be recovered by means of vector arithmetic in the embedded space. We show that Mikolov et al.'s method of first adding and subtracting word vectors, and then searching for a word similar to the result, is equivalent to searching for a word that maximizes a linear combination of three pairwise word similarities.
(Mikolov et al., 2013c; Zhila et al., 2013). It was later demonstrated that relational similarities can be recovered in a similar fashion also from embeddings trained with different architectures (Mikolov et al., 2013a; Mikolov et al., 2013b).

This fascinating result raises a question: to what extent are the relational semantic properties a result of the embedding process? Experiments in (Mikolov et al., 2013c) show that the RNN-based embeddings are superior to other dense representations, but how crucial is it for a representation to be dense and low-dimensional at all?

An alternative approach to representing words as vectors is the distributional similarity representation, or bag of contexts. In this representation, each word is associated with a very high-dimensional but sparse vector capturing the contexts in which the word occurs. We call such vector representations explicit, as each dimension directly corresponds to a particular context. These explicit vector-space representations have been extensively studied in the NLP literature (see (Turney and Pantel, 2010; Baroni and Lenci, 2010) and the references therein), and are known to exhibit a large extent of attributional similarity (Pereira et al., 1993; Lin, 1998; Lin and Pantel, 2001; Sahlgren, 2006; Kotlerman et al., 2010).

In this study, we show that, similarly to the neural embedding space, the explicit vector space also encodes a vast amount of relational similarity which can be recovered in a similar fashion, suggesting the explicit vector space representation as a competitive baseline for further work on neural embeddings. Moreover, this result implies that the neural embedding process is not discovering novel patterns, but rather is doing a remarkable job at preserving the patterns inherent in the word-context co-occurrence matrix.

A key insight of this work is that the vector arithmetic method can be decomposed into a linear combination of three pairwise similarities (Section 3). While mathematically equivalent, we find that thinking about the method in terms of the decomposed formulation is much less puzzling, and provides a better intuition on why we would expect the method to perform well on the analogy recovery task. Furthermore, the decomposed form leads us to suggest a modified optimization objective (Section 6), which outperforms the state-of-the-art at recovering relational similarities under both representations.

2 Explicit Vector Space Representation

We adopt the traditional word representation used in the distributional similarity literature (Turney and Pantel, 2010). Each word is associated with a sparse vector capturing the contexts in which it occurs. We call this representation explicit, as each dimension corresponds to a particular context.

For a vocabulary V and a set of contexts C, the result is a |V| × |C| sparse matrix S in which S_ij corresponds to the strength of the association between word i and context j. The association strength between a word w ∈ V and a context c ∈ C can take many forms. We chose to use the popular positive pointwise mutual information (PPMI) metric:

$$S_{ij} = \mathrm{PPMI}(w_i, c_j)$$

$$\mathrm{PPMI}(w, c) = \begin{cases} 0 & \mathrm{PMI}(w, c) < 0 \\ \mathrm{PMI}(w, c) & \text{otherwise} \end{cases}$$

$$\mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)} = \log \frac{\mathrm{freq}(w, c) \cdot |\mathrm{corpus}|}{\mathrm{freq}(w) \cdot \mathrm{freq}(c)}$$
where |corpus| is the number of items in the corpus, freq(w, c) is the number of times word w appeared in context c in the corpus, and freq(w), freq(c) are the corpus frequencies of the word and the context respectively.

The use of PMI in distributional similarity models was introduced by Church and Hanks (1990) and widely adopted (Dagan et al., 1994; Turney, 2001). The PPMI variant dates back to at least (Niwa and Nitta, 1994), and was demonstrated to perform very well in Bullinaria and Levy (2007).

In this work, we take the linear contexts in which words appear. We consider each word surrounding the target word w in a window of 2 to each side as a context, distinguishing between different sequential positions. For example, in the sentence a b c d e the contexts of the word c are a−2, b−1, d+1 and e+2. Each vector's dimension is thus |C| ≈ 4|V|. Empirically, the number of non-zero dimensions for vocabulary items in our corpus ranges between 3 (for some rare tokens) and 474,234 (for the word "and"), with a mean of 1595 and a median of 415.

Another popular choice of context is the syntactic relations the word participates in (Lin, 1998; Padó and Lapata, 2007; Levy and Goldberg, 2014). In this paper, we chose the sequential context as it is compatible with the information available to the state-of-the-art neural embedding method we are comparing against.

4 Experimental Setup

Both representations were derived from English Wikipedia, from which we removed all words that appeared less than 100 times in the corpus. The filtered vocabulary contained 189,533 terms.[2]

The explicit vector representations were created as described in Section 2. The neural embeddings were created using the word2vec software[3] accompanying (Mikolov et al., 2013b). We embedded the vocabulary into a 600 dimensional space, using the state-of-the-art skip-gram architecture, the negative-training approach with 15 negative samples (NEG-15), and sub-sampling of frequent words with a parameter of 10^−5. The parameter settings follow (Mikolov et al., 2013b).

4.1 Evaluation Conditions

We evaluate the different word representations using the three datasets used in previous work. Two of them (MSR and GOOGLE) contain analogy questions, while the third (SEMEVAL) requires ranking of candidate word pairs according to their relational similarity to a set of supplied word pairs.

Open Vocabulary. The open vocabulary datasets (MSR and GOOGLE) present questions of the form "a is to a* as b is to b*", where b* is hidden and must be guessed from the entire vocabulary. Performance on these datasets is measured by micro-averaged accuracy.

The MSR dataset[4] (Mikolov et al., 2013c) contains 8000 analogy questions. The relations portrayed by these questions are morpho-syntactic, and can be categorized according to parts of speech: adjectives, nouns and verbs. Adjective relations include comparative and superlative (good is to best as smart is to smartest). Noun relations include single and plural, possessive and non-possessive (dog is to dog's as cat is to cat's). Verb relations are tense modifications (work is to worked as accept is to accepted).

The GOOGLE dataset[5] (Mikolov et al., 2013a) contains 19544 questions. It covers 14 relation types, 7 of which are semantic in nature and 7 are morpho-syntactic (enumerated in Section 8). The dataset was created by manually constructing example word-pairs of each relation, and providing all the pairs of word-pairs (within each relation type) as analogy questions.
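To make the construction of Section 2 concrete, here is a minimal Python sketch (the function names and the dict-of-pairs matrix encoding are ours, for illustration only): positional contexts are extracted from a window of 2 around each word, and association strengths are scored with PPMI. Corpus and marginal counts are estimated from the collected (word, context) pairs, a simplifying assumption.

```python
import math
from collections import Counter

def positional_contexts(tokens, window=2):
    """Yield (word, context) pairs, each context being a neighboring
    token tagged with its sequential position: in "a b c d e" the
    contexts of c are a-2, b-1, d+1 and e+2."""
    for i, word in enumerate(tokens):
        for off in range(-window, window + 1):
            j = i + off
            if off != 0 and 0 <= j < len(tokens):
                yield word, f"{tokens[j]}{off:+d}"

def build_ppmi(sentences):
    """Return the sparse PPMI matrix S as a {(word, context): score} dict."""
    pair_freq, word_freq, ctx_freq = Counter(), Counter(), Counter()
    for sent in sentences:
        for w, c in positional_contexts(sent):
            pair_freq[(w, c)] += 1
            word_freq[w] += 1
            ctx_freq[c] += 1
    total = sum(pair_freq.values())   # stands in for |corpus|
    S = {}
    for (w, c), f in pair_freq.items():
        pmi = math.log(f * total / (word_freq[w] * ctx_freq[c]))
        if pmi > 0:                   # PPMI: keep positive associations only
            S[(w, c)] = pmi
    return S
```

For the embedding side, the paper used the word2vec C tool. As a rough, hypothetical equivalent, the reported hyperparameters map onto gensim's Word2Vec reimplementation roughly as follows; results will not match the original tool exactly, and the embedding window size is assumed here to match Section 2's window of 2.

```python
from gensim.models import Word2Vec

corpus = [["the", "quick", "brown", "fox", "jumps"]] * 100  # toy stand-in

model = Word2Vec(
    sentences=corpus,
    vector_size=600,   # 600-dimensional embedding space
    sg=1,              # skip-gram architecture
    negative=15,       # negative sampling, NEG-15
    sample=1e-5,       # sub-sampling of frequent words
    window=2,          # assumed to match Section 2's window
    min_count=1,       # the paper filters words below 100 occurrences
)
```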
[2] Initial experiments with different window-sizes and cut-offs showed similar trends.
[3] http://code.google.com/p/word2vec
[4] research.microsoft.com/en-us/projects/rnn/
[5] code.google.com/p/word2vec/source/browse/trunk/questions-words.txt

Out-of-vocabulary words[6] were removed from both test sets.

Closed Vocabulary. The SEMEVAL dataset contains the collection of 79 semantic relations that appeared in SemEval 2012 Task 2: Measuring Relation Similarity (Jurgens et al., 2012). Each relation is exemplified by a few (usually 3) characteristic word-pairs. Given a set of several dozen target word pairs, which supposedly have the same relation, the task is to rank the target pairs according to the degree to which this relation holds. This can be cast as an analogy question in the following manner. For example, take the Recipient:Instrument relation with the prototypical word pairs king:crown and police:badge. To measure the degree to which a target word pair wife:ring has the same relation, we form the two analogy questions "king is to crown as wife is to ring" and "police is to badge as wife is to ring". We calculate the score of each analogy, and average the results. Note that as opposed to the first two test sets, this one does not require searching the entire vocabulary for the most suitable word in the corpus, but rather ranking a list of existing word pairs. Following previous work, performance on SEMEVAL was measured using accuracy, macro-averaged across all the relations.

5 Preliminary Results

Our first experiment uses 3COSADD (method (3) in Section 3) to measure the prevalence of linguistic regularities within each representation.

Representation    MSR       GOOGLE    SEMEVAL
Embedding         53.98%    62.70%    38.49%
Explicit          29.04%    45.05%    38.54%

Table 1: Performance of 3COSADD on different tasks with the explicit and neural embedding representations.

The results in Table 1 show that a large amount of relational similarities can be recovered with both representations. In fact, both representations achieve the same accuracy on the SEMEVAL task. However, there is a large performance gap in favor of the neural embedding in the open-vocabulary MSR and GOOGLE tasks. Next, we run the same experiment with PAIRDIRECTION (method (2) in Section 3).

[6] i.e. words that appeared in English Wikipedia less than 100 times. This removed 882 instances from the MSR dataset and 286 instances from GOOGLE.

7 Main Results

We repeated the experiments, this time using the 3COSMUL method. Table 3 presents the results, showing that the multiplicative objective recovers more relational similarities in both representations. The improvements achieved in the explicit representation are especially dramatic, with an absolute increase of over 20% correctly identified relations in the MSR and GOOGLE datasets.
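To make the two objectives concrete, here is a minimal NumPy sketch over a row-normalized vector matrix (dense, for simplicity; function names and the index-based interface are ours). The ε default and the shift of cosines into [0, 1] follow our reading of the multiplicative objective proposed in Section 6, which is not reproduced in this excerpt, so treat the exact formulation as an assumption.

```python
import numpy as np

def normalize_rows(W):
    """Row-normalize a vector matrix so that dot products are cosines."""
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def analogy_3cosadd(Wn, a, a_star, b):
    """Answer "a is to a* as b is to ?" by
    argmax_x  cos(x, b) - cos(x, a) + cos(x, a*)."""
    scores = Wn @ (Wn[b] - Wn[a] + Wn[a_star])
    scores[[a, a_star, b]] = -np.inf   # never return a question word
    return int(np.argmax(scores))

def analogy_3cosmul(Wn, a, a_star, b, eps=0.001):
    """Multiplicative variant:
    argmax_x  cos(x, b) * cos(x, a*) / (cos(x, a) + eps),
    with cosines shifted from [-1, 1] into [0, 1] so that products
    and ratios stay positive."""
    sim = lambda i: (Wn @ Wn[i] + 1.0) / 2.0
    scores = sim(b) * sim(a_star) / (sim(a) + eps)
    scores[[a, a_star, b]] = -np.inf
    return int(np.argmax(scores))

# Usage: with vocab = {"king": 0, "queen": 1, "man": 2, ...} and
# Wn = normalize_rows(W), "king is to queen as man is to ?" becomes
# analogy_3cosmul(Wn, vocab["king"], vocab["queen"], vocab["man"])
```

For the open-vocabulary datasets the argmax ranges over the entire vocabulary (excluding the question words); for SEMEVAL-style ranking one would instead score only the supplied candidate pairs and average over the prototypical pairs, as described in Section 4.1.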
Objective    Representation    MSR       GOOGLE
3COSADD      Embedding         53.98%    62.70%
             Explicit          29.04%    45.05%
3COSMUL      Embedding         59.09%    66.72%
             Explicit          56.83%    68.24%

Table 3: Comparison of 3COSADD and 3COSMUL.

3COSMUL outperforms the state-of-the-art (3COSADD) on these two datasets. Moreover, the results illustrate that a comparable amount of relational similarities can be recovered with both representations. This suggests that the linguistic regularities apparent in neural embeddings are not a consequence of the embedding process, but rather are well preserved by it.

On SEMEVAL, 3COSMUL performed on par with 3COSADD, recovering a similar amount of analogies with both explicit and neural representations (38.37% and 38.67%, respectively).

8 Error Analysis

With 3COSMUL, both the explicit vectors and the neural embeddings recover similar amounts of analogies, but are these the same patterns, or perhaps different types of relational similarities?

8.1 Agreement between Representations

Considering the open-vocabulary tasks (MSR and GOOGLE), we count the number of times both representations guessed correctly, both guessed incorrectly, and when one representation leads to the right answer while the other does not (Table 4). While there is a large amount of agreement between the representations, there is also a non-negligible amount of cases in which they complement each other. If we were to run in an oracle setup, in which an answer is considered correct if it is correct in either representation, we would have achieved an accuracy of 71.9% on the MSR dataset and 77.8% on GOOGLE.

          Both Correct    Both Wrong    Embedding Correct    Explicit Correct
MSR       43.97%          28.06%        15.12%               12.85%
GOOGLE    57.12%          22.17%        9.59%                11.12%
ALL       53.58%          23.76%        11.08%               11.59%

Table 4: Agreement between the representations on open-vocabulary tasks.

Relation                       Embedding    Explicit
GOOGLE
capital-common-countries       90.51%       99.41%
capital-world                  77.61%       92.73%
city-in-state                  56.95%       64.69%
currency                       14.55%       10.53%
family (gender inflections)    76.48%       60.08%
gram1-adjective-to-adverb      24.29%       14.01%
gram2-opposite                 37.07%       28.94%
gram3-comparative              86.11%       77.85%
gram4-superlative              56.72%       63.45%
gram5-present-participle       63.35%       65.06%
gram6-nationality-adjective    89.37%       90.56%
gram7-past-tense               65.83%       48.85%
gram8-plural (nouns)           72.15%       76.05%
gram9-plural-verbs             71.15%       55.75%
MSR
adjectives                     45.88%       56.46%
nouns                          56.96%       63.07%
verbs                          69.90%       52.97%

Table 5: Breakdown of relational similarities in each representation by relation type, using 3COSMUL.

8.2 Breakdown by Relation Type

Table 5 presents the amount of analogies discovered in each representation, broken down by relation type. Some trends emerge: the explicit representation is superior in some of the more semantic tasks, especially the geography-related ones, as well as in superlatives and nouns. The neural embedding, however, has the upper hand on most verb inflections, comparatives, and family (gender) relations. Some relations (currency, adjectives-to-adverbs, opposites) pose a challenge to both representations, though they are somewhat better handled by the embedded representations. Finally, the nationality-adjectives and present-participles are handled equally well by both representations.

8.3 Default-Behavior Errors

The most common error pattern under both representations is that of a default behavior, in which one central representative word is provided as an answer to many questions of the same type. For example, the word Fresno is returned 82 times as an incorrect answer in the city-in-state relation in the embedded representation, and the word daughter is returned 47 times as an incorrect answer in the family relation in the explicit representation.
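Table 7 below lists, for several aspects, the top features recovered by pointwise multiplication of the explicit vectors of two words sharing that aspect. A minimal sketch of the operation over dict-based sparse vectors (the helper name is ours, for illustration):

```python
def top_aspect_features(v1, v2, k=7):
    """Pointwise-multiply two sparse explicit vectors, given as
    {feature: weight} dicts (e.g. single-word rows of the PPMI matrix
    sketched earlier), and return the k highest-scoring shared features:
    the contexts common to both words, which characterize the shared
    aspect."""
    shared = set(v1) & set(v2)
    scores = {f: v1[f] * v2[f] for f in shared}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Because PPMI gives rare but indicative contexts high weight, the top entries of such products are often rare names, an effect discussed after the table.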
Aspect         Examples              Top Features
Female         woman, queen          estrid+1  ketevan+1  adeliza+1  nzinga+1  gunnhild+1  impregnate−2  hippolyta+1
Royalty        queen, king           savang+1  uncrowned−1  pmare+1  sisowath+1  nzinga+1  tupou+1  uvea+2  majesty−1
Currency       yen, ruble            devalue−2  banknote+1  denominated+1  billion−1  banknotes+1  pegged+2  coin+1
Country        germany, australia    emigrates−2  1943-45+2  pentathletes−2  emigrated−2  emigrate−2  hong-kong−1
Capital        berlin, canberra      hotshots−1  embassy−2  1925-26+2  consulate-general+2  meetups−2  nunciature−2
Superlative    sweetest, tallest     freshest+2  asia's−1  cleveland's−2  smartest+1  world's−1  city's−1  america's−1
Height         taller, tallest       regnans−2  skyscraper+1  skyscrapers+1  6'4+2  windsor's−1  smokestacks+1  burj+2

Table 7: The top features of each aspect, recovered by pointwise multiplication of words that share that aspect. The result of pointwise multiplication is an aspect vector in which the features common to both words, characterizing the relation, receive the highest scores. The feature scores (not shown) correspond to the weight the feature contributes to the cosine similarity between the vectors. The superscript marks the position of the feature relative to the target word.

… features in the intersection. Table 7 presents the top (most influential) features of each aspect. Many of these features are names of people or places, which appear rarely in our corpus (e.g. Adeliza, a historical queen, and Nzinga, a royal family) but are nonetheless highly indicative of the shared concept. The prevalence of rare words stems from PMI, which gives them more weight, and from the fact that words like woman and queen are closely related (a queen is a woman), and thus have many features in common. Ordering the shared features of woman and queen by prevalence reveals female pronouns (she, her) and a long list of common feminine names, reflecting the expected aspect shared by woman and queen. Word pairs that share more specific aspects, such as capital cities or countries, show features that are characteristic of their shared aspect (e.g. capital cities have embassies and meetups, while immigration is associated with countries). It is also interesting to observe how the relatively syntactic superlativity aspect is captured with many regional possessives (america's, asia's, world's).

10 Related Work

Relational similarity (and answering analogy questions) was previously tackled using explicit representations. Previous approaches use task-specific information, by either relying on a (word-pair, connectives) matrix rather than the standard (word, context) matrix (Turney and Littman, 2005; Turney, 2006), or by treating analogy detection as a supervised learning task (Baroni and Lenci, 2009; Jurgens et al., 2012; Turney, 2013). In contrast, the vector arithmetic approach followed here is unsupervised, and works on a generic single-word representation. Even though the training process is oblivious to the task of analogy detection, the resulting representation is able to detect analogies quite accurately. Turney (2012) assumes a similar setting but with two types of word similarities, and combines them with products and ratios (similar to 3COSMUL) to recover a variety of semantic relations, including analogies.

Arithmetic combination of explicit word vectors is extensively studied in the context of compositional semantics (Mitchell and Lapata, 2010), where a phrase composed of two or more words is represented by a single vector, computed as a function of its component word vectors. Blacoe and Lapata (2012) compare different arithmetic functions across multiple representations (including embeddings) on a range of compositionality benchmarks. To the best of our knowledge, such methods of word vector arithmetic have not been explored for recovering relational similarities in explicit representations.

11 Discussion

Mikolov et al. showed how an unsupervised neural network can represent words in a space that naturally encodes relational similarities in the form of vector offsets. This study shows that finding analogies through vector arithmetic is actually a form of balancing word similarities, and that, contrary to the recent findings of Baroni et al. (2014), under certain conditions traditional word similarities induced by explicit representations can perform just as well as neural embeddings on this task.

Learning to represent words is a fascinating and important challenge with implications to most current NLP efforts, and neural embeddings in particular are a promis-
ing research direction. We believe that to improve these representations, we should understand how they work, and we hope that the methods and insights provided in this work will help to deepen our grasp of current and future investigations of word representations.