The Role of Syntax in Vector Space Models of Compositional Semantics

Karl Moritz Hermann and Phil Blunsom
{karl.moritz.hermann, phil.blunsom}@cs.ox.ac.uk
Abstract

Modelling the compositional process by which the meaning of an utterance arises from the meaning of its parts is a fundamental task of Natural Language Processing. In this paper we draw upon recent advances […]


[…] questions: Can recursive vector space models be reconciled with a more formal notion of compositionality; and is there a role for syntax in guiding semantics in these types of models? CCAEs make use of CCG combinators and types by conditioning each composition function on its equivalent step in a CCG proof. In terms of learning complexity and space requirements, our models strike a balance between simpler greedy approaches (Socher et al., 2011b) and the larger recursive vector-matrix models (Socher et al., 2012b). We show that this combination of state of the art machine learning and an advanced linguistic formalism translates into concise models with competitive performance on a variety of tasks. In both sentiment and compound similarity experiments we show that our CCAE models match or better comparable recursive autoencoder models.¹

¹ A C++ implementation of our models is available at http://www.karlmoritz.com/

2 Background

There exist a number of formal approaches to language that provide mechanisms for compositionality. Generative Grammars (Jackendoff, 1972) treat semantics, and thus compositionality, essentially as an extension of syntax, with the generative (syntactic) process yielding a structure that can be interpreted semantically. By contrast, Montague grammar achieves greater separation between the semantic and the syntactic by using lambda calculus to express meaning. However, this greater separation between surface form and meaning comes at a price in the form of reduced computability. While this is beyond the scope of this paper, see e.g. Kracht (2008) for a detailed analysis of compositionality in these formalisms.

2.1 Combinatory Categorial Grammar

In this paper we focus on CCG, a linguistically expressive yet computationally efficient grammar formalism. It uses a constituency-based structure with complex syntactic types (categories) from which sentences can be deduced using a small number of combinators. CCG relies on combinatory logic (as opposed to lambda calculus) to build its expressions. For a detailed introduction and analysis vis-à-vis other grammar formalisms see e.g. Steedman and Baldridge (2011).

CCG has been described as having a transparent surface between the syntactic and the semantic. It is this property which makes it attractive for our purposes of providing a conditioning structure for semantic operators. A second benefit of the formalism is that it is designed with computational efficiency in mind. While one could debate the relative merits of various linguistic formalisms, the existence of mature tools and resources, such as the CCGBank (Hockenmaier and Steedman, 2007), the Groningen Meaning Bank (Basile et al., 2012) and the C&C Tools (Curran et al., 2007), is another big advantage for CCG.

CCG's transparent surface stems from its categorial property: each point in a derivation corresponds directly to an interpretable category. These categories (or types) associated with each term in a CCG govern how this term can be combined with other terms in a larger structure, implicitly making them semantically expressive. For instance in Figure 1, the word likes has type (S[dcl]\NP)/NP, which means that it first looks for a type NP on its right-hand side. Subsequently the expression likes tigers (as type S[dcl]\NP) requires a second NP on its left. The final type of the phrase, S[dcl], indicates a sentence and hence a complete CCG proof. Thus at each point in a CCG parse we can deduce the possible next steps in the derivation by considering the available types and combinatory rules.

[Figure 1: CCG derivation for "Tina likes tigers" with forward (>) and backward (<) application: Tina := N → NP, likes := (S[dcl]\NP)/NP, tigers := N → NP; forward application yields likes tigers := S[dcl]\NP, and backward application yields S[dcl].]
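To make the type-driven combination concrete, the following is a minimal sketch of forward and backward application over toy category objects. The encoding (Cat, Slash, the helper functions) is an illustrative assumption for this transcript, not the paper's C++ implementation or a standard CCG library:

```python
# Toy CCG categories and the two application combinators from Figure 1.
# All names here are hypothetical; this is a sketch, not the authors' code.
from dataclasses import dataclass

@dataclass(frozen=True)
class Cat:
    name: str                      # atomic category, e.g. "NP" or "S[dcl]"
    def __str__(self):
        return self.name

@dataclass(frozen=True)
class Slash:
    result: object                 # category produced after combination
    arg: object                    # category this functor looks for
    dir: str                       # "/" looks rightward, "\\" looks leftward
    def __str__(self):
        return f"({self.result}{self.dir}{self.arg})"

def forward_apply(x, y):
    """X/Y  Y  =>  X   (functor finds its argument on the right)."""
    if isinstance(x, Slash) and x.dir == "/" and x.arg == y:
        return x.result
    return None

def backward_apply(y, x):
    """Y  X\\Y  =>  X   (functor finds its argument on the left)."""
    if isinstance(x, Slash) and x.dir == "\\" and x.arg == y:
        return x.result
    return None

NP, S = Cat("NP"), Cat("S[dcl]")
likes = Slash(Slash(S, NP, "\\"), NP, "/")    # (S[dcl]\NP)/NP

vp = forward_apply(likes, NP)      # likes tigers  ->  S[dcl]\NP
sent = backward_apply(NP, vp)      # Tina [likes tigers]  ->  S[dcl]
print(vp, sent)                    # (S[dcl]\NP) S[dcl]
```

At each step only the categories and a combinator are consulted, which is exactly the information the CCAE models condition on.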
2.2 Vector Space Models of Semantics

Vector-based approaches for semantic tasks have become increasingly popular in recent years. Distributional representations encode an expression by its environment, assuming the context-dependent nature of meaning, according to which one "shall know a word by the company it keeps" (Firth, 1957). Effectively this is usually achieved by considering the co-occurrence with other words in large corpora or the syntactic roles a word performs. Distributional representations are frequently used to encode single words as vectors. Such representations […]

Method                              MPQA    SP
Voting with two lexica              81.7    63.1
MV-RNN (Socher et al., 2012b)       -       79.0
RAE (rand) (Socher et al., 2011b)   85.7    76.8
TCRF (Nakagawa et al., 2010)        86.1    77.3
RAE (init) (Socher et al., 2011b)   86.4    77.7
NB (Wang and Manning, 2012)         86.7    79.4
CCAE-A                              86.3    77.8
CCAE-B                              87.1    77.1
CCAE-C                              87.1    77.3
CCAE-D                              87.2    76.7

Table 3: Accuracy of sentiment classification on the sentiment polarity (SP) and MPQA datasets. For NB we only display the best result among a larger group of models analysed in that paper.

This issue of sparsity is exacerbated in the more complex CCAE models, where the training points are spread across different CCG types and rules. While the initialization of the word vectors with previously learned embeddings (as was previously shown by Socher et al. (2011b)) helps the models, all other model variables such as composition weights and biases are still initialised randomly and thus highly dependent on the amount of training data available.

Experiment 2: Pretraining. Due to our analysis of the results of the initial experiment, we ran a second series of experiments on the SP corpus. We follow Scheible and Schütze (2013) for this second series of experiments, which are carried out on a random 90/10 training-testing split, with some data reserved for development.

Instead of initialising the model with external word embeddings, we first train it on a large amount of data with the aim of overcoming the sparsity issues encountered in the previous experiment. Learning is thus divided into two steps: the first, unsupervised training phase uses the British National Corpus together with the SP corpus. In this phase only the reconstruction signal is used to learn word embeddings and transformation matrices. Subsequently, in the second phase, only the SP corpus is used, this time with both the reconstruction and the label error.

By learning word embeddings and composition matrices on more data, the model is likely to generalise better. Particularly for the more complex models, where the composition functions are conditioned on various CCG parameters, this should help to overcome issues of sparsity.
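As a schematic illustration of this regime, the sketch below pairs a per-combinator composition matrix (in the spirit of CCAE-B, which conditions on the combinatory operators) with the two objectives: reconstruction only in phase one, reconstruction plus label error in phase two. The tanh composition, sigmoid label scorer, dimensions and all names are assumptions for illustration, not the paper's exact equations:

```python
# Sketch of combinator-conditioned composition and the two-phase objective.
# Hypothetical names throughout; the real models differ in detail.
import numpy as np

rng = np.random.default_rng(1)
dim = 50

combinators = ["fwd_app", "bwd_app", "fwd_comp", "bwd_comp"]
W = {c: rng.normal(scale=0.01, size=(dim, 2 * dim)) for c in combinators}
U = {c: rng.normal(scale=0.01, size=(2 * dim, dim)) for c in combinators}
b = {c: np.zeros(dim) for c in combinators}

def compose(left, right, rule):
    """Parent vector for one derivation step, conditioned on its combinator."""
    child = np.concatenate([left, right])
    parent = np.tanh(W[rule] @ child + b[rule])
    recon = U[rule] @ parent                    # reconstruct the children
    recon_err = np.sum((recon - child) ** 2)    # reconstruction signal
    return parent, recon_err

def loss(parent, recon_err, label=None, v=None):
    """Phase 1: reconstruction only. Phase 2: add a label (sentiment) error."""
    if label is None:
        return recon_err                        # unsupervised pretraining
    score = 1 / (1 + np.exp(-(v @ parent)))     # sigmoid sentiment score
    label_err = -(label * np.log(score) + (1 - label) * np.log(1 - score))
    return recon_err + label_err                # supervised fine-tuning

# e.g. one step of "likes tigers" under forward application:
p, e = compose(rng.normal(size=dim), rng.normal(size=dim), "fwd_app")
v = rng.normal(size=dim)                        # hypothetical scoring vector
print(loss(p, e))                               # phase 1 objective
print(loss(p, e, label=1.0, v=v))               # phase 2 objective
```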
Model      Regular   Pretraining
CCAE-A     77.8      79.5
CCAE-B     76.9      79.8
CCAE-C     77.1      81.0
CCAE-D     76.9      79.7

Table 4: Effect of pretraining on model performance on the SP dataset. Results are reported on a random subsection of the SP corpus; thus numbers for the regular training method differ slightly from those in Table 3.

If we consider the results of the pre-trained experiments in Table 4, this seems to be the case. In fact, the trend of the previous results has been reversed, with the more complex models now performing best, whereas in the previous experiments the simpler models performed better. Using the Turian embeddings instead of random initialisation did not improve results in this setup.

5.2 Compound Similarity

In a second experiment we use the dataset from Mitchell and Lapata (2010), which contains similarity judgements for adjective-noun, noun-noun and verb-object pairs.⁷ All compound pairs have been ranked for semantic similarity by a number of human annotators. The task is thus to rank these pairs of word pairs by their semantic similarity. For instance, the two compounds vast amount and large quantity are given a high similarity score by the human judges, while northern region and early age are assigned no similarity at all.

⁷ http://homepages.inf.ed.ac.uk/mlap/resources/index.html

We train our models as fully unsupervised autoencoders on the British National Corpus for this task. We assume fixed parse trees for all of the compounds (Figure 6), and use these to compute compound level vectors for all word pairs. We subsequently use the cosine distance between each compound pair as our similarity measure. We use Spearman's rank correlation coefficient (ρ) for evaluation; hence there is no need to rescale our scores (-1.0 – 1.0) to the original scale (1.0 – 7.0).

Blacoe and Lapata (2012) have an extensive comparison of the performance of various vector-based models on this dataset, to which we compare our model in Table 5. The CCAE models outperform […]

Expression                                   Most Similar
convey the message of peace                  safeguard peace and security
keep alight the flame of                     keep up the hope
has a reason to repent                       has no right
a significant and successful strike          a much better position
it is reassuring to believe                  it is a positive development
expressed their satisfaction and support     expressed their admiration and surprise
is a story of success                        is a staunch supporter
are lining up to condemn                     are going to voice their concerns
more sanctions should be imposed             charges being leveled
could fray the bilateral goodwill            could cause serious damage

Table 6: Phrases from the MPQA corpus and their semantically closest match according to CCAE-D.
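Both the similarity ranking described in Section 5.2 and the nearest-match retrieval behind Table 6 reduce to cosine similarity between composed vectors. The following is a minimal sketch of that evaluation, with random stand-in vectors in place of real model output (the data and names are synthetic assumptions):

```python
# Sketch of the Section 5.2 evaluation: cosine similarity between composed
# compound vectors, scored against human judgements with Spearman's rho.
import numpy as np
from scipy.stats import spearmanr

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
dim = 50
# stand-in composed vectors for each compound in a judged pair
pairs = [(rng.normal(size=dim), rng.normal(size=dim)) for _ in range(108)]
human_scores = rng.uniform(1.0, 7.0, size=len(pairs))  # annotators' 1-7 scale

model_scores = [cosine(u, v) for u, v in pairs]
rho, _ = spearmanr(model_scores, human_scores)
# Rank correlation is scale-invariant, so the raw cosine range (-1..1)
# never needs rescaling to the annotators' 1-7 scale.
print(f"Spearman rho = {rho:.3f}")
```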
Complexity. […]

Model      Size       Learning
MV-RNN     O(nw²)     O(l)
RAE        O(nw)      O(l²)
CCAE-*     O(nw)      O(l)

Table 7: Comparison of models. n is dictionary size, w is embedding width, l is sentence length. We can assume l ≪ n ≫ w. Additional factors such as CCG rules and types are treated as small constants for the purposes of this analysis.

[…] worse in comparison with the simpler ones. We were able to overcome this issue by using additional training data. Beyond this, it would also be interesting to investigate the relationships between different types and to derive functions to incorporate this into the learning procedure. For instance, model learning could be adjusted to enforce some mirroring effects between the weight matrices of forward and backward application, or to support similarities between those of forward application and composition.

CCG-Vector Interface. Exactly how the information contained in a CCG derivation is best applied to a vector space model of compositionality is another issue for future research. Our investigation of this matter by exploring different model setups has proved somewhat inconclusive. While CCAE-D incorporated the deepest conditioning on the CCG structure, it did not decisively outperform the simpler CCAE-B, which just conditioned on the combinatory operators. Issues of sparsity, as shown in our experiments on pretraining, have a significant influence, which requires further study.

7 Conclusion

In this paper we have brought a more formal notion of semantic compositionality to vector space models based on recursive autoencoders. This was achieved through the use of the CCG formalism to provide a conditioning structure for the matrix-vector products that define the RAE.

We have explored a number of models, each of which conditions the compositional operations on different aspects of the CCG derivation. Our experimental findings indicate a clear advantage for a deeper integration of syntax over models that use only the bracketing structure of the parse tree.

The most effective way to condition the compositional operators on the syntax remains unclear. Once the issue of sparsity had been addressed, the complex models outperformed the simpler ones. Among the complex models, however, we could not establish significant or consistent differences to convincingly argue for a particular approach.

While the connections between formal linguistics and vector space approaches to NLP may not be immediately obvious, we believe that there is a case for the continued investigation of ways to best combine these two schools of thought. This paper represents one step towards the reconciliation of traditional formal approaches to compositional semantics with modern machine learning.

Acknowledgements

We thank the anonymous reviewers for their feedback and Richard Socher for providing additional insight into his models. Karl Moritz would further like to thank Sebastian Riedel for hosting him at UCL while this paper was written. This work has been supported by the EPSRC.
References

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011b. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of EMNLP, pages 151–161.

Richard Socher, Brody Huval, Bharath Bhat, Christopher D. Manning, and Andrew Y. Ng. 2012a. Convolutional-recursive deep learning for 3D object classification. In Advances in Neural Information Processing Systems 25.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012b. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of EMNLP-CoNLL, pages 1201–1211.

Mark Steedman and Jason Baldridge. 2011. Combinatory Categorial Grammar, pages 181–224. Wiley-Blackwell.

Anna Szabolcsi. 1989. Bound variables in syntax: Are there any? In Renate Bartsch, Johan van Benthem, and Peter van Emde Boas, editors, Semantics and Contextual Expression, pages 295–318. Foris, Dordrecht.

Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of ACL, pages 384–394.

Sida Wang and Christopher D. Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of ACL, pages 90–94.

Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of EMNLP-HLT, HLT '05, pages 347–354.