The Role of Syntax in Vector Space Model…

{moritzhermann, philblunsom}@cs.ox.ac.uk

Abstract

Modelling the compositional process by which the meaning of an utterance arises from the meaning of its parts is a fundamental task of Natural Language Processing. In this paper we draw upon recent advances …
…questions: Can recursive vector space models be reconciled with a more formal notion of compositionality; and is there a role for syntax in guiding semantics in these types of models? CCAEs make use of CCG combinators and types by conditioning each composition function on its equivalent step in a CCG proof. In terms of learning complexity and space requirements, our models strike a balance between simpler greedy approaches (Socher et al., 2011b) and the larger recursive vector-matrix models (Socher et al., 2012b).

We show that this combination of state of the art machine learning and an advanced linguistic formalism translates into concise models with competitive performance on a variety of tasks. In both sentiment and compound similarity experiments we show that our CCAE models match or better comparable recursive autoencoder models.¹

¹ A C++ implementation of our models is available at http://www.karlmoritz.com/

2 Background

There exist a number of formal approaches to language that provide mechanisms for compositionality. Generative Grammars (Jackendoff, 1972) treat semantics, and thus compositionality, essentially as an extension of syntax, with the generative (syntactic) process yielding a structure that can be interpreted semantically. By contrast, Montague grammar achieves greater separation between the semantic and the syntactic by using lambda calculus to express meaning. However, this greater separation between surface form and meaning comes at a price in the form of reduced computability. While this is beyond the scope of this paper, see e.g. Kracht (2008) for a detailed analysis of compositionality in these formalisms.

2.1 Combinatory Categorial Grammar

In this paper we focus on CCG, a linguistically expressive yet computationally efficient grammar formalism. It uses a constituency-based structure with complex syntactic types (categories) from which sentences can be deduced using a small number of combinators. CCG relies on combinatory logic (as opposed to lambda calculus) to build its expressions. For a detailed introduction and analysis vis-à-vis other grammar formalisms see e.g. Steedman and Baldridge (2011).

CCG has been described as having a transparent surface between the syntactic and the semantic. It is this property which makes it attractive for our purposes of providing a conditioning structure for semantic operators. A second benefit of the formalism is that it is designed with computational efficiency in mind. While one could debate the relative merits of various linguistic formalisms, the existence of mature tools and resources, such as the CCGBank (Hockenmaier and Steedman, 2007), the Groningen Meaning Bank (Basile et al., 2012) and the C&C Tools (Curran et al., 2007), is another big advantage for CCG.

    Tina    likes             tigers
    N       (S[dcl]\NP)/NP    N
    NP                        NP
            ------------------- >
            S[dcl]\NP
    --------------------------- <
    S[dcl]

Figure 1: CCG derivation for Tina likes tigers with forward (>) and backward (<) application.

CCG's transparent surface stems from its categorial property: each point in a derivation corresponds directly to an interpretable category. These categories (or types) associated with each term in a CCG govern how this term can be combined with other terms in a larger structure, implicitly making them semantically expressive.

For instance in Figure 1, the word likes has type (S[dcl]\NP)/NP, which means that it first looks for a type NP to its right hand side. Subsequently the expression likes tigers (as type S[dcl]\NP) requires a second NP on its left. The final type of the phrase, S[dcl], indicates a sentence and hence a complete CCG proof. Thus at each point in a CCG parse we can deduce the possible next steps in the derivation by considering the available types and combinatory rules.

2.2 Vector Space Models of Semantics

Vector-based approaches for semantic tasks have become increasingly popular in recent years. Distributional representations encode an expression by its environment, assuming the context-dependent nature of meaning, according to which one shall know a word by the company it keeps (Firth, 1957). Effectively this is usually achieved by considering the co-occurrence with other words in large corpora or the syntactic roles a word performs.

Distributional representations are frequently used to encode single words as vectors. Such rep-…

Method                             MPQA   SP
Voting with two lexica             81.7   63.1
MV-RNN (Socher et al., 2012b)      –      79.0
RAE (rand) (Socher et al., 2011b)  85.7   76.8
TCRF (Nakagawa et al., 2010)       86.1   77.3
RAE (init) (Socher et al., 2011b)  86.4   77.7
NB (Wang and Manning, 2012)        86.7   79.4
CCAE-A                             86.3   77.8
CCAE-B                             87.1   77.1
CCAE-C                             87.1   77.3
CCAE-D                             87.2   76.7

Table 3: Accuracy of sentiment classification on the sentiment polarity (SP) and MPQA datasets. For NB we only display the best result among a larger group of models analysed in that paper.
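As an illustration of how the application combinators of Section 2.1 drive a derivation, the following toy sketch (our own simplification, not the paper's C++ implementation) encodes CCG categories as nested tuples and derives Tina likes tigers:

```python
# Toy CCG categories: atomic types are strings; complex types are tuples
# ("/", result, argument) for X/Y and ("\\", result, argument) for X\Y.
# Lexical N categories are collapsed straight to NP for brevity.

def forward(left, right):
    """Forward application (>): X/Y combined with Y yields X."""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    return None

def backward(left, right):
    """Backward application (<): Y combined with X\\Y yields X."""
    if isinstance(right, tuple) and right[0] == "\\" and right[2] == left:
        return right[1]
    return None

NP = "NP"
likes = ("/", ("\\", "S[dcl]", NP), NP)   # (S[dcl]\NP)/NP

vp = forward(likes, NP)        # "likes tigers"  ->  S[dcl]\NP
sentence = backward(NP, vp)    # "Tina" + vp     ->  S[dcl]
print(vp, sentence)            # ('\\', 'S[dcl]', 'NP') S[dcl]
```

At each step only the categories determine which combinator may fire; this is exactly the property the CCAE models exploit as a conditioning structure.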
This issue of sparsity is exacerbated in the more complex CCAE models, where the training points are spread across different CCG types and rules. While the initialization of the word vectors with previously learned embeddings (an approach previously shown to help by Socher et al. (2011b)) benefits the models, all other model variables such as composition weights and biases are still initialised randomly and are thus highly dependent on the amount of training data available.

Experiment 2: Pretraining. Following our analysis of the results of the initial experiment, we ran a second series of experiments on the SP corpus. We follow Scheible and Schütze (2013) for this second series of experiments, which are carried out on a random 90/10 training-testing split, with some data reserved for development.

Instead of initialising the model with external word embeddings, we first train it on a large amount of data with the aim of overcoming the sparsity issues encountered in the previous experiment. Learning is thus divided into two steps. The first, unsupervised training phase uses the British National Corpus together with the SP corpus; in this phase only the reconstruction signal is used to learn word embeddings and transformation matrices. Subsequently, in the second phase, only the SP corpus is used, this time with both the reconstruction and the label error.

By learning word embeddings and composition matrices on more data, the model is likely to generalise better. Particularly for the more complex models, where the composition functions are conditioned on various CCG parameters, this should help to overcome issues of sparsity.
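Schematically, the two phases differ only in the training signal. The sketch below uses a minimal single-matrix autoencoder as a stand-in for the CCAE models; the names, dimensions and squared-error forms are invented for illustration and assume numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                            # toy embedding width
W_c = rng.normal(scale=0.1, size=(d, 2 * d))     # composition weights
W_r = rng.normal(scale=0.1, size=(2 * d, d))     # reconstruction weights
w_l = rng.normal(scale=0.1, size=d)              # label scorer (phase 2 only)

def compose(a, b):
    """Compose two child vectors into a parent vector."""
    return np.tanh(W_c @ np.concatenate([a, b]))

def recon_error(a, b):
    """Reconstruction signal: rebuild the children from the parent."""
    p = compose(a, b)
    return float(np.sum((W_r @ p - np.concatenate([a, b])) ** 2))

def label_error(a, b, y):
    """Label signal: squared error of a sigmoid sentiment score."""
    p = compose(a, b)
    return float((1.0 / (1.0 + np.exp(-w_l @ p)) - y) ** 2)

def objective(pairs, labels=None):
    """Phase 1: reconstruction only (BNC + SP data).
    Phase 2: reconstruction plus label error (SP data only)."""
    total = sum(recon_error(a, b) for a, b in pairs)
    if labels is not None:
        total += sum(label_error(a, b, y)
                     for (a, b), y in zip(pairs, labels))
    return total

pairs = [(rng.normal(size=d), rng.normal(size=d)) for _ in range(3)]
print(objective(pairs))              # phase-1 objective
print(objective(pairs, [1, 0, 1]))   # phase-2 objective
```

An actual training run would minimise these objectives by backpropagation through the parse tree, first on the large unlabelled corpus and then on the labelled SP data.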
Model    Regular training    Pretraining
CCAE-A   77.8                79.5
CCAE-B   76.9                79.8
CCAE-C   77.1                81.0
CCAE-D   76.9                79.7

Table 4: Effect of pretraining on model performance on the SP dataset. Results are reported on a random subsection of the SP corpus; thus numbers for the regular training method differ slightly from those in Table 3.

If we consider the results of the pre-trained experiments in Table 4, this seems to be the case. In fact, the trend of the previous results has been reversed, with the more complex models now performing best, whereas in the previous experiments the simpler models performed better. Using the Turian embeddings instead of random initialisation did not improve results in this setup.

5.2 Compound Similarity

In a second experiment we use the dataset from Mitchell and Lapata (2010), which contains similarity judgements for adjective-noun, noun-noun and verb-object pairs.⁷ All compound pairs have been ranked for semantic similarity by a number of human annotators. The task is thus to rank these pairs of word pairs by their semantic similarity. For instance, the two compounds vast amount and large quantity are given a high similarity score by the human judges, while northern region and early age are assigned no similarity at all.

We train our models as fully unsupervised autoencoders on the British National Corpus for this task. We assume fixed parse trees for all of the compounds (Figure 6), and use these to compute compound level vectors for all word pairs. We subsequently use the cosine distance between each compound pair as our similarity measure. We use Spearman's rank correlation coefficient (ρ) for evaluation; hence there is no need to rescale our scores (−1.0 to 1.0) to the original scale (1.0 to 7.0).

Blacoe and Lapata (2012) provide an extensive comparison of the performance of various vector-based models on this dataset, to which we compare our model in Table 5. The CCAE models outper-…

⁷ http://homepages.inf.ed.ac.uk/mlap/resources/index.html

Expression                                  Most Similar
convey the message of peace                 safeguard peace and security
keep alight the flame of                    keep up the hope
has a reason to repent                      has no right
a significant and successful strike         a much better position
it is reassuring to believe                 it is a positive development
expressed their satisfaction and support    expressed their admiration and surprise
is a story of success                       is a staunch supporter
are lining up to condemn                    are going to voice their concerns
more sanctions should be imposed            charges being leveled
could fray the bilateral goodwill           could cause serious damage

Table 6: Phrases from the MPQA corpus and their semantically closest match according to CCAE-D.
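The evaluation protocol of Section 5.2, cosine similarity between composed vectors compared against human judgements with Spearman's ρ, can be sketched as follows (the vectors and judgement values here are invented for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) *
                  math.sqrt(sum(x * x for x in v)))

def ranks(xs):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(a, b):
    """Spearman's rho: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)

# Hypothetical composed compound vectors and invented human scores:
model_sims = [cosine([0.9, 0.1], [0.8, 0.2]),   # e.g. vast amount / large quantity
              cosine([0.9, 0.1], [0.1, 0.9])]   # e.g. northern region / early age
human = [6.3, 1.2]                              # judgements on the 1.0-7.0 scale
print(spearman(model_sims, human))              # 1.0: the rankings agree
```

Because ρ depends only on ranks, the cosine scores never need to be rescaled to the 1.0-7.0 judgement scale, as the paper notes.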
Model     Size      Learning
MV-RNN    O(nw²)    O(l)
RAE       O(nw)     O(l²)
CCAE-*    O(nw)     O(l)

Table 7: Complexity comparison of models. n is dictionary size, w embedding width, l is sentence length. We can assume l ≪ n, w. Additional factors such as CCG rules and types are treated as small constants for the purposes of this analysis.

…worse in comparison with the simpler ones. We were able to overcome this issue by using additional training data. Beyond this, it would also be interesting to investigate the relationships between different types and to derive functions to incorporate this into the learning procedure. For instance, model learning could be adjusted to enforce some mirroring effects between the weight matrices of forward and backward application, or to support similarities between those of forward application and composition.

CCG-Vector Interface. Exactly how the information contained in a CCG derivation is best applied to a vector space model of compositionality is another issue for future research. Our investigation of this matter by exploring different model setups has proved somewhat inconclusive. While CCAE-D incorporated the deepest conditioning on the CCG structure, it did not decisively outperform the simpler CCAE-B, which just conditioned on the combinatory operators. Issues of sparsity, as shown in our experiments on pretraining, have a significant influence, which requires further study.

7 Conclusion

In this paper we have brought a more formal notion of semantic compositionality to vector space models based on recursive autoencoders. This was achieved through the use of the CCG formalism to provide a conditioning structure for the matrix vector products that define the RAE.

We have explored a number of models, each of which conditions the compositional operations on different aspects of the CCG derivation. Our experimental findings indicate a clear advantage for a deeper integration of syntax over models that use only the bracketing structure of the parse tree.

The most effective way to condition the compositional operators on the syntax remains unclear.
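To make the notion of conditioning concrete, here is a minimal sketch loosely in the spirit of CCAE-B: composition selects its weight matrix according to the combinatory rule that licensed the merge. The dimensions, names and initialisation are invented, and this is not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4  # toy embedding width

# One composition matrix per combinatory rule, instead of the single
# shared matrix of a plain recursive autoencoder.
rules = ["fa", "ba"]   # forward application (>), backward application (<)
W = {r: rng.normal(scale=0.1, size=(d, 2 * d)) for r in rules}
b = {r: np.zeros(d) for r in rules}

def compose(left, right, rule):
    """Compose two child vectors, with weights chosen by the CCG rule used."""
    return np.tanh(W[rule] @ np.concatenate([left, right]) + b[rule])

# likes + tigers combine by forward application; Tina + (likes tigers)
# by backward application, mirroring the derivation in Figure 1.
tina, likes, tigers = (rng.normal(size=d) for _ in range(3))
vp = compose(likes, tigers, "fa")
sentence = compose(tina, vp, "ba")
print(sentence.shape)   # (4,)
```

Deeper variants in this style could key the weights on categories or full proof steps rather than on the rule alone, which is the design axis the CCAE-A through CCAE-D models explore.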
Once the issue of sparsity had been addressed, the complex models outperformed the simpler ones. Among the complex models, however, we could not establish significant or consistent differences to convincingly argue for a particular approach.

While the connections between formal linguistics and vector space approaches to NLP may not be immediately obvious, we believe that there is a case for the continued investigation of ways to best combine these two schools of thought. This paper represents one step towards the reconciliation of traditional formal approaches to compositional semantics with modern machine learning.

Acknowledgements

We thank the anonymous reviewers for their feedback and Richard Socher for providing additional insight into his models. Karl Moritz would further like to thank Sebastian Riedel for hosting him at UCL while this paper was written. This work has been supported by the EPSRC.

References

Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011b. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of EMNLP, pages 151–161.

Richard Socher, Brody Huval, Bharath Bhat, Christopher D. Manning, and Andrew Y. Ng. 2012a. Convolutional-recursive deep learning for 3D object classification. In Advances in Neural Information Processing Systems 25.

Richard Socher, Brody Huval, Christopher D. Manning, and Andrew Y. Ng. 2012b. Semantic compositionality through recursive matrix-vector spaces. In Proceedings of EMNLP-CoNLL, pages 1201–1211.

Mark Steedman and Jason Baldridge. 2011. Combinatory Categorial Grammar, pages 181–224. Wiley-Blackwell.

Anna Szabolcsi. 1989. Bound variables in syntax: Are there any? In Renate Bartsch, Johan van Benthem, and Peter van Emde Boas, editors, Semantics and Contextual Expression, pages 295–318. Foris, Dordrecht.

Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of ACL, pages 384–394.

Sida Wang and Christopher D. Manning. 2012. Baselines and bigrams: simple, good sentiment and topic classification. In Proceedings of ACL, pages 90–94.

Janyce Wiebe, Theresa Wilson, and Claire Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2-3):165–210.

Theresa Wilson, Janyce Wiebe, and Paul Hoffmann. 2005. Recognizing contextual polarity in phrase-level sentiment analysis. In Proceedings of EMNLP-HLT, HLT '05, pages 347–354.