Linguistic Regularities in Sparse and Explicit Word Representations
Omer Levy and Yoav Goldberg


[…]goldberg@gmail.com

Abstract: Recent work has shown that neural-embedded word representations capture many relational similarities, which can be recovered by means of vector arithmetic in the embedded space. We show that Mikolov et al.'s method of first […]


[…] (Mikolov et al., 2013c; Zhila et al., 2013). It was later demonstrated that relational similarities can be recovered in a similar fashion also from embeddings trained with different architectures (Mikolov et al., 2013a; Mikolov et al., 2013b).

This fascinating result raises a question: to what extent are the relational semantic properties a result of the embedding process? Experiments in (Mikolov et al., 2013c) show that the RNN-based embeddings are superior to other dense representations, but how crucial is it for a representation to be dense and low-dimensional at all?

An alternative approach to representing words as vectors is the distributional similarity representation, or bag of contexts. In this representation, each word is associated with a very high-dimensional but sparse vector capturing the contexts in which the word occurs. We call such vector representations explicit, as each dimension directly corresponds to a particular context. These explicit vector-space representations have been extensively studied in the NLP literature (see (Turney and Pantel, 2010; Baroni and Lenci, 2010) and the references therein), and are known to exhibit a large extent of attributional similarity (Pereira et al., 1993; Lin, 1998; Lin and Pantel, 2001; Sahlgren, 2006; Kotlerman et al., 2010).

In this study, we show that, similarly to the neural embedding space, the explicit vector space also encodes a vast amount of relational similarity which can be recovered in a similar fashion, suggesting the explicit vector-space representation as a competitive baseline for further work on neural embeddings. Moreover, this result implies that the neural embedding process is not discovering novel patterns, but rather is doing a remarkable job at preserving the patterns inherent in the word-context co-occurrence matrix.

A key insight of this work is that the vector arithmetic method can be decomposed into a linear combination of three pairwise similarities (Section 3). While mathematically equivalent, we find that thinking about the method in terms of the decomposed formulation is much less puzzling, and provides a better intuition on why we would expect the method to perform well on the analogy recovery task. Furthermore, the decomposed form leads us to suggest a modified optimization objective (Section 6), which outperforms the state-of-the-art at recovering relational similarities under both representations.

2 Explicit Vector Space Representation

We adopt the traditional word representation used in the distributional similarity literature (Turney and Pantel, 2010). Each word is associated with a sparse vector capturing the contexts in which it occurs. We call this representation explicit, as each dimension corresponds to a particular context.

For a vocabulary V and a set of contexts C, the result is a |V| x |C| sparse matrix S in which S_{ij} corresponds to the strength of the association between word i and context j. The association strength between a word w \in V and a context c \in C can take many forms. We chose to use the popular positive pointwise mutual information (PPMI) metric:

S_{ij} = \mathrm{PPMI}(w_i, c_j)

\mathrm{PPMI}(w, c) =
\begin{cases}
0 & \mathrm{PMI}(w, c) < 0 \\
\mathrm{PMI}(w, c) & \text{otherwise}
\end{cases}

\mathrm{PMI}(w, c) = \log \frac{P(w, c)}{P(w)\,P(c)} = \log \frac{\mathrm{freq}(w, c) \cdot |\mathit{corpus}|}{\mathrm{freq}(w) \cdot \mathrm{freq}(c)}

where |corpus| is the number of items in the corpus, freq(w, c) is the number of times word w appeared in context c in the corpus, and freq(w), freq(c) are the corpus frequencies of the word and the context, respectively.

The use of PMI in distributional similarity models was introduced by Church and Hanks (1990) and widely adopted (Dagan et al., 1994; Turney, 2001). The PPMI variant dates back to at least (Niwa and Nitta, 1994), and was demonstrated to perform very well in Bullinaria and Levy (2007).

In this work, we take the linear contexts in which words appear. We consider each word surrounding the target word w in a window of 2 to each side as a context, distinguishing between different sequential positions. For example, in the sentence "a b c d e" the contexts of the word c are a-2, b-1, d+1 and e+2. Each vector's dimension is thus |C| = 4|V|. Empirically, the number of non-zero dimensions for vocabulary items in our corpus ranges between 3 (for some rare tokens) and 474,234 (for the word "and"), with a mean of 1595 and a median of 415.

Another popular choice of context is the syntactic relations the word participates in (Lin, 1998; Padó and Lapata, 2007; Levy and Goldberg, 2014). In this paper, we chose the sequential context as it is compatible with the information available to the state-of-the-art neural embedding method we are comparing against.
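To make the construction above concrete, here is a minimal sketch (not the authors' code). The function name, the dict-of-dicts sparse layout, and the toy corpus are illustrative assumptions, and the marginal counts are taken over (word, context) pairs, a common simplification of the freq(w) and freq(c) terms above:

```python
from collections import Counter
from math import log

def build_ppmi(sentences, window=2):
    """Build a sparse PPMI word-context matrix with positional contexts.

    Contexts are (word, relative position) pairs, e.g. ("dog", +1),
    matching the window-2 positional contexts described in Section 2.
    """
    word_freq, ctx_freq, pair_freq = Counter(), Counter(), Counter()
    total = 0
    for sent in sentences:
        for i, w in enumerate(sent):
            for d in range(-window, window + 1):
                j = i + d
                if d == 0 or j < 0 or j >= len(sent):
                    continue
                c = (sent[j], d)              # positional context, e.g. ("d", +1)
                word_freq[w] += 1
                ctx_freq[c] += 1
                pair_freq[(w, c)] += 1
                total += 1

    # Sparse PPMI matrix as a dict-of-dicts: S[w][c] = max(0, PMI(w, c)).
    S = {}
    for (w, c), f in pair_freq.items():
        pmi = log(f * total / (word_freq[w] * ctx_freq[c]))
        if pmi > 0:
            S.setdefault(w, {})[c] = pmi
    return S

# Toy usage: the contexts of "c" in "a b c d e" are a-2, b-1, d+1, e+2.
S = build_ppmi([["a", "b", "c", "d", "e"]])
print(sorted(S["c"].keys()))
```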
[…] than 100 times in the corpus. The filtered vocabulary contained 189,533 terms.[2]

The explicit vector representations were created as described in Section 2. The neural embeddings were created using the word2vec software[3] accompanying (Mikolov et al., 2013b). We embedded the vocabulary into a 600-dimensional space, using the state-of-the-art skip-gram architecture, the negative-sampling training approach with 15 negative samples (NEG-15), and sub-sampling of frequent words with a parameter of 10^-5. The parameter settings follow (Mikolov et al., 2013b).
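For reference, these hyperparameters map onto a standard word2vec configuration. The sketch below uses gensim's Word2Vec reimplementation rather than the original word2vec tool used in the paper; the window size and the min_count cutoff are assumptions not stated in this paragraph (the 100-occurrence cutoff appears above as the vocabulary filter):

```python
from gensim.models import Word2Vec

def train_embeddings(sentences):
    """Train skip-gram embeddings with the hyperparameters quoted above."""
    return Word2Vec(
        sentences=sentences,
        vector_size=600,   # 600-dimensional embedding space
        sg=1,              # skip-gram architecture
        negative=15,       # NEG-15: negative sampling with 15 samples
        sample=1e-5,       # sub-sampling of frequent words at 10^-5
        window=2,          # assumption: window size is not stated in this excerpt
        min_count=100,     # assumption: mirrors the 100-occurrence vocabulary cutoff
    )
```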
4.1 Evaluation Conditions

We evaluate the different word representations using the three datasets used in previous work. Two of them (MSR and GOOGLE) contain analogy questions, while the third (SEMEVAL) requires ranking of candidate word pairs according to their relational similarity to a set of supplied word pairs.

Open Vocabulary. The open-vocabulary datasets (MSR and GOOGLE) present questions of the form "a is to a* as b is to b*", where b* is hidden and must be guessed from the entire vocabulary. Performance on these datasets is measured by micro-averaged accuracy.

The MSR dataset[4] (Mikolov et al., 2013c) contains 8000 analogy questions. The relations portrayed by these questions are morpho-syntactic, and can be categorized according to parts of speech: adjectives, nouns and verbs. Adjective relations include comparative and superlative (good is to best as smart is to smartest). Noun relations include single and plural, possessive and non-possessive (dog is to dog's as cat is to cat's). Verb relations are tense modifications (work is to worked as accept is to accepted).

The GOOGLE dataset[5] (Mikolov et al., 2013a) contains 19,544 questions. It covers 14 relation types, 7 of which are semantic in nature and 7 are morpho-syntactic (enumerated in Section 8). The dataset was created by manually constructing example word-pairs of each relation, and providing all the pairs of word-pairs (within each relation type) as analogy questions.

[2] Initial experiments with different window-sizes and cut-offs showed similar trends.
[3] http://code.google.com/p/word2vec
[4] research.microsoft.com/en-us/projects/rnn/
[5] code.google.com/p/word2vec/source/browse/trunk/questions-words.txt

Out-of-vocabulary words[6] were removed from both test sets.

Closed Vocabulary. The SEMEVAL dataset contains the collection of 79 semantic relations that appeared in SemEval 2012 Task 2: Measuring Relation Similarity (Jurgens et al., 2012). Each relation is exemplified by a few (usually 3) characteristic word-pairs. Given a set of several dozen target word pairs, which supposedly have the same relation, the task is to rank the target pairs according to the degree in which this relation holds. This can be cast as an analogy question in the following manner: for example, take the Recipient:Instrument relation with the prototypical word pairs king:crown and police:badge. To measure the degree that a target word pair wife:ring has the same relation, we form the two analogy questions "king is to crown as wife is to ring" and "police is to badge as wife is to ring". We calculate the score of each analogy, and average the results. Note that as opposed to the first two test sets, this one does not require searching the entire vocabulary for the most suitable word in the corpus, but rather to rank a list of existing word pairs. Following previous work, performance on SEMEVAL was measured using accuracy, macro-averaged across all the relations.

5 Preliminary Results

Our first experiment uses 3COSADD (method (3) in Section 3) to measure the prevalence of linguistic regularities within each representation.

Representation   MSR      GOOGLE   SEMEVAL
Embedding        53.98%   62.70%   38.49%
Explicit         29.04%   45.05%   38.54%

Table 1: Performance of 3COSADD on different tasks with the explicit and neural embedding representations.

The results in Table 1 show that a large amount of relational similarities can be recovered with both representations. In fact, both representations achieve the same accuracy on the SEMEVAL task. However, there is a large performance gap in favor of the neural embedding in the open-vocabulary MSR and GOOGLE tasks. Next, we run the same experiment with PAIRDIRECTION (method (2) in Section 3).

[6] i.e. words that appeared in English Wikipedia less than 100 times. This removed 882 instances from the MSR dataset and 286 instances from GOOGLE.
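Section 3, where these scoring objectives are defined, is not reproduced in this transcript. For orientation, the sketch below shows the standard formulations of PAIRDIRECTION, 3COSADD, and the multiplicative 3COSMUL used in Section 7, for an analogy "a is to a* as b is to b*". The candidate list, the epsilon smoothing term, and the shift of cosines into [0, 1] are assumptions of this sketch rather than details quoted from the transcript:

```python
import numpy as np

def cos(u, v):
    """Cosine similarity between two dense vectors."""
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def pair_direction(a, a_star, b, candidates):
    """PAIRDIRECTION: pick b* maximizing cos(b* - b, a* - a)."""
    scores = [cos(x - b, a_star - a) for x in candidates]
    return int(np.argmax(scores))

def three_cos_add(a, a_star, b, candidates):
    """3COSADD: pick b* maximizing cos(b*, a*) - cos(b*, a) + cos(b*, b)."""
    scores = [cos(x, a_star) - cos(x, a) + cos(x, b) for x in candidates]
    return int(np.argmax(scores))

def three_cos_mul(a, a_star, b, candidates, eps=1e-3):
    """3COSMUL: pick b* maximizing cos(b*, a*) * cos(b*, b) / (cos(b*, a) + eps).

    Cosines are rescaled to [0, 1] so the product and ratio stay well behaved
    when similarities are negative (an assumption of this sketch).
    """
    s = lambda u, v: (cos(u, v) + 1) / 2
    scores = [s(x, a_star) * s(x, b) / (s(x, a) + eps) for x in candidates]
    return int(np.argmax(scores))
```

In the open-vocabulary setting described above, the candidates would range over the entire vocabulary (typically excluding the question words themselves), and accuracy is micro-averaged over questions.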
7 Main Results

We repeated the experiments, this time using the 3COSMUL method. Table 3 presents the results, showing that the multiplicative objective recovers more relational similarities in both representations. The improvements achieved in the explicit representation are especially dramatic, with an absolute increase of over 20% correctly identified relations in the MSR and GOOGLE datasets.

Objective   Representation   MSR      GOOGLE
3COSADD     Embedding        53.98%   62.70%
            Explicit         29.04%   45.05%
3COSMUL     Embedding        59.09%   66.72%
            Explicit         56.83%   68.24%

Table 3: Comparison of 3COSADD and 3COSMUL.

3COSMUL outperforms the state-of-the-art (3COSADD) on these two datasets. Moreover, the results illustrate that a comparable amount of relational similarities can be recovered with both representations. This suggests that the linguistic regularities apparent in neural embeddings are not a consequence of the embedding process, but rather are well preserved by it. On SEMEVAL, 3COSMUL performed on par with 3COSADD, recovering a similar amount of analogies with both explicit and neural representations (38.37% and 38.67%, respectively).

8 Error Analysis

With 3COSMUL, both the explicit vectors and the neural embeddings recover similar amounts of analogies, but are these the same patterns, or perhaps different types of relational similarities?

8.1 Agreement between Representations

Considering the open-vocabulary tasks (MSR and GOOGLE), we count the number of times both representations guessed correctly, both guessed incorrectly, and when one representation leads to the right answer while the other does not (Table 4). While there is a large amount of agreement between the representations, there is also a non-negligible amount of cases in which they complement each other. If we were to run in an oracle setup, in which an answer is considered correct if it is correct in either representation, we would have achieved an accuracy of 71.9% on the MSR dataset and 77.8% on GOOGLE.

          Both      Both     Embedding   Explicit
          Correct   Wrong    Correct     Correct
MSR       43.97%    28.06%   15.12%      12.85%
GOOGLE    57.12%    22.17%   9.59%       11.12%
ALL       53.58%    23.76%   11.08%      11.59%

Table 4: Agreement between the representations on open-vocabulary tasks.
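The oracle figure above is simply the complement of the both-wrong rate. A minimal sketch of that bookkeeping (the boolean input lists are assumed; this is not the authors' evaluation code):

```python
def agreement_stats(correct_emb, correct_exp):
    """Tally agreement between two representations on the same question set.

    correct_emb / correct_exp: parallel lists of booleans, one per analogy question.
    Returns rates for both-correct, both-wrong, one-sided correctness, and the
    oracle accuracy (correct in either representation).
    """
    n = len(correct_emb)
    both = sum(e and x for e, x in zip(correct_emb, correct_exp))
    neither = sum((not e) and (not x) for e, x in zip(correct_emb, correct_exp))
    only_emb = sum(e and not x for e, x in zip(correct_emb, correct_exp))
    only_exp = sum(x and not e for e, x in zip(correct_emb, correct_exp))
    return {
        "both correct": both / n,
        "both wrong": neither / n,
        "embedding only": only_emb / n,
        "explicit only": only_exp / n,
        "oracle accuracy": 1 - neither / n,
    }
```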
8.2 Breakdown by Relation Type

Table 5 presents the amount of analogies discovered in each representation, broken down by relation type. Some trends emerge: the explicit representation is superior in some of the more semantic tasks, especially geography-related ones, as well as the ones involving superlatives and nouns. The neural embedding, however, has the upper hand on most verb inflections, comparatives, and family (gender) relations. Some relations (currency, adjectives-to-adverbs, opposites) pose a challenge to both representations, though they are somewhat better handled by the embedded representations. Finally, the nationality-adjectives and present-participles are equally handled by both representations.

Relation                        Embedding   Explicit
GOOGLE
capital-common-countries        90.51%      99.41%
capital-world                   77.61%      92.73%
city-in-state                   56.95%      64.69%
currency                        14.55%      10.53%
family (gender inflections)     76.48%      60.08%
gram1-adjective-to-adverb       24.29%      14.01%
gram2-opposite                  37.07%      28.94%
gram3-comparative               86.11%      77.85%
gram4-superlative               56.72%      63.45%
gram5-present-participle        63.35%      65.06%
gram6-nationality-adjective     89.37%      90.56%
gram7-past-tense                65.83%      48.85%
gram8-plural (nouns)            72.15%      76.05%
gram9-plural-verbs              71.15%      55.75%
MSR
adjectives                      45.88%      56.46%
nouns                           56.96%      63.07%
verbs                           69.90%      52.97%

Table 5: Breakdown of relational similarities in each representation by relation type, using 3COSMUL.

8.3 Default-Behavior Errors

The most common error pattern under both representations is that of a "default behavior", in which one central representative word is provided as an answer to many questions of the same type. For example, the word "Fresno" is returned 82 times as an incorrect answer in the city-in-state relation in the embedded representation, and the word "daughter" is returned 47 times as an incorrect answer in the family relation in the explicit representation.

Aspect        Examples             Top Features
Female        woman, queen         estrid+1  ketevan+1  adeliza+1  nzinga+1  gunnhild+1  impregnate-2  hippolyta+1
Royalty       queen, king          savang+1  uncrowned-1  pmare+1  sisowath+1  nzinga+1  tupou+1  uvea+2  majesty-1
Currency      yen, ruble           devalue-2  banknote+1  denominated+1  billion-1  banknotes+1  pegged+2  coin+1
Country       germany, australia   emigrates-2  1943-45+2  pentathletes-2  emigrated-2  emigrate-2  hong-kong-1
Capital       berlin, canberra     hotshots-1  embassy-2  1925-26+2  consulate-general+2  meetups-2  nunciature-2
Superlative   sweetest, tallest    freshest+2  asia's-1  cleveland's-2  smartest+1  world's-1  city's-1  america's-1
Height        taller, tallest      regnans-2  skyscraper+1  skyscrapers+1  6'4+2  windsor's-1  smokestacks+1  burj+2

Table 7: The top features of each aspect, recovered by pointwise multiplication of words that share that aspect. The result of pointwise multiplication is an "aspect vector" in which the features common to both words, characterizing the relation, receive the highest scores. The feature scores (not shown) correspond to the weight the feature contributes to the cosine similarity between the vectors. The +k/-k suffix marks the position of the feature relative to the target word.

[…] features in the intersection. Table 7 presents the top (most influential) features of each aspect. Many of these features are names of people or places, which appear rarely in our corpus (e.g. Adeliza, a historical queen, and Nzinga, a royal family) but are nonetheless highly indicative of the shared concept. The prevalence of rare words stems from PMI, which gives them more weight, and from the fact that words like woman and queen are closely related (a queen is a woman), and thus have many features in common. Ordering the features of the aspect vector of woman and queen by prevalence reveals female pronouns ("she", "her") and a long list of common feminine names, reflecting the expected aspect shared by woman and queen. Word pairs that share more specific aspects, such as capital cities or countries, show features that are characteristic of their shared aspect (e.g. capital cities have embassies and meetups, while immigration is associated with countries). It is also interesting to observe how the relatively syntactic "superlativity" aspect is captured with many regional possessives ("america's", "asia's", "world's").
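The aspect vectors of Table 7 are straightforward to compute from the sparse explicit vectors. The sketch below reuses the dict-of-dicts PPMI layout from the earlier sketch and scores shared features by the raw product of their PPMI weights; matching the caption's cosine-contribution scores exactly would additionally require normalizing each vector, a step this sketch leaves out:

```python
def aspect_vector(vec_a, vec_b):
    """Pointwise multiplication of two sparse explicit vectors.

    Only features present in both vectors survive; each surviving feature's
    score is the product of its two PPMI weights.
    """
    shared = set(vec_a) & set(vec_b)
    return {c: vec_a[c] * vec_b[c] for c in shared}

def top_features(vec_a, vec_b, k=7):
    """Return the k highest-scoring shared features, as in Table 7."""
    aspect = aspect_vector(vec_a, vec_b)
    return sorted(aspect, key=aspect.get, reverse=True)[:k]

# Hypothetical usage with rows of the PPMI matrix S built earlier:
# print(top_features(S["woman"], S["queen"]))
```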
10 Related Work

Relational similarity (and answering analogy questions) was previously tackled using explicit representations. Previous approaches use task-specific information, by either relying on a (word-pair, connectives) matrix rather than the standard (word, context) matrix (Turney and Littman, 2005; Turney, 2006), or by treating analogy detection as a supervised learning task (Baroni and Lenci, 2009; Jurgens et al., 2012; Turney, 2013). In contrast, the vector arithmetic approach followed here is unsupervised, and works on a generic single-word representation. Even though the training process is oblivious to the task of analogy detection, the resulting representation is able to detect them quite accurately. Turney (2012) assumes a similar setting but with two types of word similarities, and combines them with products and ratios (similar to 3COSMUL) to recover a variety of semantic relations, including analogies.

Arithmetic combination of explicit word vectors is extensively studied in the context of compositional semantics (Mitchell and Lapata, 2010), where a phrase composed of two or more words is represented by a single vector, computed by a function of its component word vectors. Blacoe and Lapata (2012) compare different arithmetic functions across multiple representations (including embeddings) on a range of compositionality benchmarks. To the best of our knowledge, such methods of word vector arithmetic have not been explored for recovering relational similarities in explicit representations.

11 Discussion

Mikolov et al. showed how an unsupervised neural network can represent words in a space that "naturally" encodes relational similarities in the form of vector offsets. This study shows that finding analogies through vector arithmetic is actually a form of balancing word similarities, and that, contrary to the recent findings of Baroni et al. (2014), under certain conditions traditional word similarities induced by explicit representations can perform just as well as neural embeddings on this task.

Learning to represent words is a fascinating and important challenge with implications to most current NLP efforts, and neural embeddings in particular are a promising research direction. We believe that to improve these representations we should understand how they work, and hope that the methods and insights provided in this work will help to deepen our grasp of current and future investigations of word representations.