Accurate Unlexicalized Parsing

Dan Klein
Computer Science Department
Stanford University
Stanford, CA 94305-9040
klein@cs.stanford.edu

Christopher D. Manning
Computer Science Department
Stanford University
Stanford, CA 94305-9040
manning@cs.stanford.edu

Abstract

We demonstrate that an unlexicalized PCFG can parse much more accurately than previously shown, by making use of simple, linguistically motivated state splits, which break down false independence assumptions latent in a vanilla treebank grammar. Indeed, its performance of 86.36% (LP/LR F1) is better than that of early lexicalized PCFG models, and surprisingly close to the current state-of-the-art. This result has potential uses beyond establishing a strong lower bound on the maximum possible accuracy of unlexicalized models: an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.

In the early 1990s, as probabilistic methods swept NLP, parsing work revived the investigation of probabilistic context-free grammars (PCFGs) (Booth and Thomson, 1973; Baker, 1979). However, early results on the utility of PCFGs for parse disambiguation and language modeling were somewhat disappointing. A conviction arose that lexicalized PCFGs (where head words annotate phrasal nodes) were the key tool for high performance PCFG parsing. This approach was congruent with the great success of word n-gram models in speech recognition, and drew strength from a broader interest in lexicalized grammars, as well as demonstrations that lexical dependencies were a key tool for resolving ambiguities such as PP attachments (Ford et al., 1982; Hindle and Rooth, 1993). In the following decade, great success in terms of parse disambiguation and even language modeling was achieved by various lexicalized PCFG models (Magerman, 1995; Charniak, 1997; Collins, 1999; Charniak, 2000; Charniak, 2001).

However, several results have brought into question how large a role lexicalization plays in such parsers. Johnson (1998) showed that the performance of an unlexicalized PCFG over the Penn treebank could be improved enormously simply by annotating each node by its parent category.
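Parent annotation in the sense of Johnson (1998) can be illustrated with a short sketch. The code below is not from the paper: the tuple-based tree encoding, the function names, and the example sentence are all assumptions made for illustration, and preterminal tags are deliberately left unannotated.

```python
def is_preterminal(tree):
    # Assumed encoding: a tree is a tuple (label, child, ...); a preterminal
    # is a tag node whose only child is a word, stored as a plain string.
    return len(tree) == 2 and isinstance(tree[1], str)


def parent_annotate(tree, parent=None):
    """Return a copy of `tree` in which every phrasal label X becomes X^P,
    where P is the (unannotated) label of its parent."""
    if is_preterminal(tree):
        return tree  # tags are left alone in this sketch
    label, children = tree[0], tree[1:]
    new_label = label if parent is None else f"{label}^{parent}"
    # Pass down the original label, so only one level of history is recorded.
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)


# Hypothetical example tree, not taken from the treebank.
example = ("ROOT",
           ("S",
            ("NP", ("PRP", "She")),
            ("VP", ("VBD", "saw"), ("NP", ("NNS", "stocks")))))

annotated = parent_annotate(example)
```

In the annotated tree the subject NP is labelled NP^S and the object NP is labelled NP^VP, so a grammar read off such trees can assign the two positions different expansion probabilities.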
The Penn treebank covering PCFG is a poor tool for parsing because the context-freedom assumptions it embodies are far too strong, and weakening them in this way makes the model much better. More recently, Gildea (2001) discusses how taking the bilexical probabilities out of a good current lexicalized PCFG parser hurts performance hardly at all: by at most 0.5% for test text from the same domain as the training data, and not at all for test text from a different domain.[1] But it is precisely these bilexical dependencies that backed the intuition that lexicalized PCFGs should be very successful, for example in Hindle and Rooth's demonstration from PP attachment. We take this as a reflection of the fundamental sparseness of the lexical dependency information available in the Penn Treebank. As a speech person would say, one million words of training data just isn't enough. Even for topics central to the treebank's Wall Street Journal text, such as stocks, many very plausible dependencies occur only once, for example stocks stabilized, while many others occur not at all, for example stocks skyrocketed.[2]

The best-performing lexicalized PCFGs have increasingly made use of subcategorization[3] of the categories appearing in the Penn treebank. Charniak (2000) shows the value his parser gains from parent-annotation of nodes, suggesting that this information is at least partly complementary to information derivable from lexicalization, and Collins (1999) uses a range of linguistically motivated and carefully hand-engineered subcategorizations to break down wrong context-freedom assumptions of the naive Penn treebank covering PCFG, such as differentiating "base NPs" from noun phrases with phrasal modifiers, and distinguishing sentences with empty subjects from those where there is an overt subject NP. While he gives incomplete experimental results as to their efficacy, we can assume that these features were incorporated because of beneficial effects on parsing that were complementary to lexicalization.

[1] There are minor differences, but all the current best-known lexicalized PCFGs employ both monolexical statistics, which describe the phrasal categories of arguments and adjuncts that appear around a head lexical item, and bilexical statistics, or dependencies, which describe the likelihood of a head word taking as a dependent a phrase headed by a certain other word.

[2] This observation motivates various class- or similarity-based approaches to combating sparseness, and this remains a promising avenue of work, but success in this area has proven somewhat elusive, and, at any rate, current lexicalized PCFGs do simply use exact word matches if available, and interpolate with syntactic category-based estimates when they are not.

[3] In this paper we use the term subcategorization in the original general sense of Chomsky (1965), for where a syntactic category is divided into several subcategories, for example dividing verb phrases into finite and non-finite verb phrases, rather than in the modern restricted usage where the term refers only to the syntactic argument frames of predicators.

In this paper, we show that the parsing performance that can be achieved by an unlexicalized PCFG is far higher than has previously been demonstrated, and is, indeed, much higher than community wisdom has thought possible. We describe several simple, linguistically motivated annotations which do much to close the gap between a vanilla PCFG and state-of-the-art lexicalized models. Specifically, we construct an unlexicalized PCFG which outperforms the lexicalized PCFGs of Magerman (1995) and Collins (1996) (though not more recent models, such as Charniak (1997) or Collins (1999)).

One benefit of this result is a much-strengthened lower bound on the capacity of an unlexicalized PCFG. To the extent that no such strong baseline has been provided, the community has tended to greatly overestimate the beneficial effect of lexicalization in probabilistic parsing, rather than looking critically at where lexicalized probabilities are both needed to make the right decision and available in the training data. Secondly, this result affirms the value of linguistic analysis for feature discovery. The result has other uses and advantages: an unlexicalized PCFG is easier to interpret, reason about, and improve than the more complex lexicalized models. The grammar representation is much more compact, no longer requiring large structures that store lexicalized probabilities. The parsing algorithms have lower asymptotic complexity[4] and have much smaller grammar constants. An unlexicalized PCFG parser is much simpler to build and optimize, including both standard code optimization techniques and the investigation of methods for search space pruning (Caraballo and Charniak, 1998; Charniak et al., 1998).

[4] O(n^3) vs. O(n^5) for a naive implementation, or vs. O(n^4) if using the clever approach of Eisner and Satta (1999).

It is not our goal to argue against the use of lexicalized probabilities in high-performance probabilistic parsing. It has been comprehensively demonstrated that lexical dependencies are useful in resolving major classes of sentence ambiguities, and a parser should make use of such information where possible. We focus here on using unlexicalized, structural context because we feel that this information has been underexploited and underappreciated. We see this investigation as only one part of the foundation for state-of-the-art parsing which employs both lexical and structural conditioning.

1 Experimental Setup

To facilitate comparison with previous work, we trained our models on sections 2–21 of the WSJ section of the Penn treebank. We used the first 20 files (393 sentences) of section 22 as a development set (devset). This set is small enough that there is noticeable variance in individual results, but it allowed rapid search for good features via continually reparsing the devset in a partially manual hill-climb. All of section 23 was used as a test set for the final model. For each model, input trees were annotated or transformed in some way, as in Johnson (1998). Given a set of transformed trees, we viewed the local trees as grammar rewrite rules in the standard way, and used (unsmoothed) maximum-likelihood estimates for rule probabilities.[5] To parse the grammar, we used a simple array-based Java implementation of a generalized CKY parser, which, for our final best model, was able to exhaustively parse all sentences in section 23 in 1GB of memory, taking approximately 3 sec for average length sentences.[6]

[5] The tagging probabilities were smoothed to accommodate unknown words. The quantity P(tag|word) was estimated as follows: words were split into one of several categories wordclass, based on capitalization, suffix, digit, and other character features. For each of these categories, we took the maximum-likelihood estimate of P(tag|wordclass). This distribution was used as a prior against which observed taggings, if any, were taken, giving

\[ P(\mathrm{tag} \mid \mathrm{word}) = \frac{c(\mathrm{tag}, \mathrm{word}) + \kappa \, P(\mathrm{tag} \mid \mathrm{wordclass})}{c(\mathrm{word}) + \kappa} \]

This was then inverted to give P(word|tag). The quality of this tagging model impacts all numbers; for example the raw treebank grammar's devset F1 is 72.62 with it and 72.09 without it.

[6] The parser is available for download as open source at: http://nlp.stanford.edu/downloads/lex-parser.shtml

(VP
  (⟨VP:[VBZ]...PP⟩
    (⟨VP:[VBZ]...NP⟩
      (⟨VP:[VBZ]⟩ VBZ)
      NP)
    PP))

Figure 1: The v = 1, h = 1 markovization of VP → VBZ NP PP.

2 Vertical and Horizontal Markovization

The traditional starting point for unlexicalized parsing is the raw n-ary treebank grammar read from training trees (after removing functional tags and null elements). This basic grammar is imperfect in two well-known ways. First, the category symbols are too coarse to adequately render the expansions independent of the contexts. For example, subject NP expansions are very different from object NP expansions: a subject NP is 8.7 times more likely than an object NP to expand as just a pronoun. Having separate symbols for subject and object NPs allows this variation to be captured and used to improve parse scoring. One way of capturing this kind of external context is to use parent annotation, as presented in Johnson (1998). For example, NPs with S parents (like subjects) will be marked NP^S, while NPs with VP parents (like objects) will be NP^VP.

The second basic deficiency is that many rule types have been seen only once (and therefore have their probabilities overestimated), and many rules which occur in test sentences will never have been seen in training (and therefore have their probabilities underestimated; see Collins (1999) for analysis). Note that in parsing with the unsplit grammar, not having seen a rule doesn't mean one gets a parse failure, but rather a possibly very weird parse (Charniak, 1996). One successful method of combating sparsity is to markovize the rules (Collins, 1999). In particular, we follow that work in markovizing out from the head child, despite the grammar being unlexicalized, because this seems the best way to capture the traditional linguistic insight that phrases are organized around a head (Radford, 1988).

Both parent annotation (adding context) and RHS markovization (removing it) can be seen as two instances of the same idea. In parsing, every node has a vertical history, including the node itself, parent, grandparent, and so on. A reasonable assumption is that only the past v vertical ancestors matter to the current expansion. Similarly, only the previous h horizontal ancestors matter (we assume that the head child always matters). It is a historical accident that the default notion of a treebank PCFG grammar takes v = 1 (only the current node matters vertically) and h = ∞ (rule right hand sides do not decompose at all). On this view, it is unsurprising that increasing v and decreasing h have historically helped.

                            Horizontal Markov Order
Vertical Order              h = 0     h = 1     h ≤ 2     h = 2     h = ∞
v = 1   No annotation       71.27     72.5      73.46     72.96     72.62
                            (854)     (3119)    (3863)    (6207)    (9657)
v ≤ 2   Sel. Parents        74.75     77.42     77.77     77.50     76.91
                            (2285)    (6564)    (7619)    (11398)   (14247)
v = 2   All Parents         74.68     77.42     77.81     77.50     76.81
                            (2984)    (7312)    (8367)    (12132)   (14666)
v ≤ 3   Sel. GParents       76.50     78.59     79.07     78.97     78.54
                            (4943)    (12374)   (13627)   (19545)   (20123)
v = 3   All GParents        76.74     79.18     79.74     79.07     78.72
                            (7797)    (15740)   (16994)   (22886)   (22002)

Figure 2: Markovizations: F1 and grammar size.

As an example, consider the case of v = 1, h = 1. If we start with the rule VP → VBZ NP PP, it will be broken into several stages, each a binary or unary rule, which conceptually represent a head-outward generation of the right hand side, as shown in figure 1. The bottom layer will be a unary over the head declaring the goal: ⟨VP:[VBZ]⟩ → VBZ. The square brackets indicate that the VBZ is the head, while the angle brackets ⟨X⟩ indicate that the symbol ⟨X⟩ is an intermediate symbol (equivalently, an active or incomplete state). The next layer up will generate the first rightward sibling of the head child: ⟨VP:[VBZ]...NP⟩ → ⟨VP:[VBZ]⟩ NP. Next, the PP is generated: ⟨VP:[VBZ]...PP⟩ → ⟨VP:[VBZ]...NP⟩ PP. We would then branch off left siblings if there were any.[7] Finally, we have another unary to finish the VP. Note that while it is convenient to think of this as a head-outward process, these are just PCFG rewrites, and so the actual scores attached to each rule will correspond to a downward generation order.

Figure 2 presents a grid of horizontal and vertical markovizations of the grammar. The raw treebank grammar corresponds to v = 1, h = ∞ (the upper right corner), while the parent annotation in (Johnson, 1998) corresponds to v = 2, h = ∞, and the second-order model in Collins (1999) is broadly a smoothed version of v = 2, h = 2. In addition to exact nth-order models, we tried variable-history models similar in intent to those described in Ron et al. (1994). For variable horizontal histories, we did not split intermediate states below 10 occurrences of a symbol. For example, if the symbol ⟨VP:[VBZ]...PP PP⟩ were too rare, we would collapse it to ⟨VP:[VBZ]...PP⟩. For vertical histories, we used a cutoff which included both frequency and mutual information between the history and the expansions (this was not appropriate for the horizontal case because MI is unreliable at such low counts).

[7] In our system, the last few right children carry over as preceding context for the left children, distinct from common practice. We found this wrapped horizon to be beneficial, and it also unifies the infinite order model with the unmarkovized raw rules.

                                  Cumulative          Indiv.
Annotation               Size     F1       ΔF1        ΔF1
Baseline (v ≤ 2, h ≤ 2)  7619     77.77    –          –
UNARY-INTERNAL           8065     78.32    0.55       0.55
UNARY-DT                 8066     78.48    0.71       0.17
UNARY-RB                 8069     78.86    1.09       0.43
TAG-PA                   8520     80.62    2.85       2.52
SPLIT-IN                 8541     81.19    3.42       2.12
SPLIT-AUX                9034     81.66    3.89       0.57
SPLIT-CC                 9190     81.69    3.92       0.12
SPLIT-%                  9255     81.81    4.04       0.15
TMP-NP                   9594     82.25    4.48       1.07
GAPPED-S                 9741     82.28    4.51       0.17
POSS-NP                  9820     83.06    5.29       0.28
SPLIT-VP                 10499    85.72    7.95       1.36
BASE-NP                  11660    86.04    8.27       0.73
DOMINATES-V              14097    86.91    9.14       1.42
RIGHT-REC-NP             15276    87.04    9.27       1.94

Figure 3: Size and devset performance of the cumulatively annotated models, starting with the markovized baseline. The right two columns show the change in F1 from the baseline for each annotation introduced, both cumulatively and for each single annotation applied to the baseline in isolation.

Figure 2 shows parsing accuracies as well as the number of symbols in each markovization. These symbol counts include all the intermediate states which represent partially completed constituents. The general trend is that, in the absence of further annotation, more vertical annotation is better, even exhaustive grandparent annotation. This is not true for horizontal markovization, where the variable-order second-order model was superior. The
best entry, v = 3, h ≤ 2, has an F1 of 79.74, already a substantial improvement over the baseline.

In the remaining sections, we discuss other annotations which increasingly split the symbol space. Since we expressly do not smooth the grammar, not all splits are guaranteed to be beneficial, and not all sets of useful splits are guaranteed to co-exist well. In particular, while v = 3, h ≤ 2 markovization is good on its own, it has a large number of states and does not tolerate further splitting well. Therefore, we base all further exploration on the v ≤ 2, h ≤ 2 grammar. Although it does not necessarily jump out of the grid at first glance, this point represents the best compromise between a compact grammar and useful markov histories.

(ROOT
  (S^ROOT
    (NP^S (NN Revenue))
    (VP^S (VBD was)
      (NP^VP (QP ($ $) (CD 444.9) (CD million)))
      (, ,)
      (S^VP
        (VP^S (VBG including)
          (NP^VP
            (NP^NP (JJ net) (NN interest))
            (, ,)
            (CONJP (RB down) (RB slightly) (IN from))
            (NP^NP (QP ($ $) (CD 450.7) (CD million)))))))
    (. .)))

Figure 4: An error which can be resolved with the UNARY-INTERNAL annotation (incorrect baseline parse shown).

3 External vs. Internal Annotation

The two major previous annotation strategies, parent annotation and head lexicalization, can be seen as instances of external and internal annotation, respectively. Parent annotation lets us indicate an important feature of the external environment of a node which influences the internal expansion of that node. On the other hand, lexicalization is a (radical) method of marking a distinctive aspect of the otherwise hidden internal contents of a node which influence the external distribution. Both kinds of annotation can be useful. To identify split states, we add suffixes of the form -X to mark internal content features, and ^X to mark external features.

To illustrate the difference, consider unary productions. In the raw grammar, there are many unaries, and once any major category is constructed over a span, most others become constructible as well using unary chains (see Klein and Manning (2001) for discussion). Such chains are rare in real treebank trees: unary rewrites only appear in very specific contexts, for example S complements of verbs where the S has an empty, controlled subject. Figure 4 shows an erroneous output of the parser, using the baseline markovized grammar. Intuitively, there are several reasons this parse should be ruled out, but one is that the lower S slot, which is intended primarily for S complements of communication verbs, is not a unary rewrite position (such complements usually have subjects). It would therefore be natural to annotate the trees so as to confine unary productions to the contexts in which they are actually appropriate. We tried two annotations. First, UNARY-INTERNAL marks (with a -U) any nonterminal node which has only one child. In isolation, this resulted in an absolute gain of 0.55% (see figure 3). The same sentence, parsed using only the baseline and UNARY-INTERNAL, is parsed correctly, because the VP rewrite in the incorrect parse ends with an S^VP-U with very low probability.[8]

Alternately, UNARY-EXTERNAL marked nodes which had no siblings with ^U. It was similar to UNARY-INTERNAL in solo benefit (0.01% worse), but provided far less marginal benefit on top of other later features (none at all on top of UNARY-INTERNAL for our top models), and was discarded.[9]

One restricted place where external unary annotation was very useful, however, was at the preterminal level, where internal annotation was meaningless. One distributionally salient tag conflation in the Penn treebank is the identification of demonstratives (that, those) and regular determiners (the, a). Splitting DT tags based on whether they were only children (UNARY-DT) captured this distinction. The same external unary annotation was even more effective when applied to adverbs (UNARY-RB), distinguishing, for example, as well from also. Beyond these cases, unary tag marking was detrimental. The F1 after UNARY-INTERNAL, UNARY-DT, and UNARY-RB was 78.86%.

[8] Note that when we show such trees, we generally only show one annotation on top of the baseline at a time. Moreover, we do not explicitly show the binarization implicit by the horizontal markovization.

[9] These two are not equivalent even given infinite data.

4 Tag Splitting

The idea that part-of-speech tags are not fine-grained enough to abstract away from specific-word behaviour is a cornerstone of lexicalization. The UNARY-DT annotation, for example, showed that the determiners which occur alone are usefully distinguished from those which occur with other nominal material. This marks the DT nodes with a single bit about their immediate external context: whether there are sisters. Given the success of parent annotation for nonterminals, it makes sense to parent annotate tags, as well (TAG-PA). In fact, as figure 3 shows, exhaustively marking all preterminals with their parent category was the most effective single annotation we tried. Why should this be useful? Most tags have a canonical category. For example, NNS tags occur under NP nodes (only 234 of 70855 do not, mostly mistakes). However, when a tag somewhat regularly occurs in a non-canonical position, its distribution is usually distinct. For example, the most common adverbs directly under ADVP are also (1599) and now (544). Under VP, they are n't (3779) and not (922). Under NP, only (215) and just (132), and so on. TAG-PA brought F1 up substantially, to 80.62%.

(a) (VP^S (TO to)
      (VP^VP (VB see)
        (PP^VP (IN if)
          (NP^PP (NN advertising) (NNS works)))))

(b) (VP^S (TO^VP to)
      (VP^VP (VB^VP see)
        (SBAR^VP (IN^SBAR if)
          (S^SBAR
            (NP^S (NN^NP advertising))
            (VP^S (VBZ^VP works))))))

Figure 5: An error resolved with the TAG-PA annotation (of the IN tag): (a) the incorrect baseline parse and (b) the correct TAG-PA parse. SPLIT-IN also resolves this error.

In addition to the adverb case, the Penn tag set conflates various grammatical distinctions that are commonly made in traditional and generative grammar, and from which a parser could hope to get useful information. For example, subordinating conjunctions (while, as, if), complementizers (that, for), and prepositions (of, in, from) all get the tag IN. Many of these distinctions are captured by TAG-PA (subordinating conjunctions occur under S and prepositions under PP), but some are not (both subordinating conjunctions and complementizers appear under SBAR). Also, there are exclusively noun-modifying prepositions (of), predominantly verb-modifying ones (as), and so on. The annotation SPLIT-IN does a linguistically motivated 6-way split of the IN tag, and brought the total to 81.19%.

Figure 5 shows an example error in the baseline which is equally well fixed by either TAG-PA or SPLIT-IN. In this case, the more common nominal use of works is preferred unless the IN tag is annotated to allow if to prefer S complements.

We also got value from three other annotations which subcategorized tags for specific lexemes. First we split off auxiliary verbs with the SPLIT-AUX annotation, which appends ^BE to all forms of be and ^HAVE to all forms of have.[10] More minorly, SPLIT-CC marked conjunction tags to indicate whether or not they were the strings [Bb]ut or &, each of which have distinctly different distributions from other conjunctions. Finally, we gave the percent sign (%) its own tag, in line with the dollar sign ($) already having its own. Together these three annotations brought the F1 to 81.81%.

[10] This is an extended uniform version of the partial auxiliary annotation of Charniak (1997), wherein all auxiliaries are marked as AUX and a -G is added to gerund auxiliaries and gerund VPs.

5 What is an Unlexicalized Grammar?

Around this point, we must address exactly what we mean by an unlexicalized PCFG. To the extent that we go about subcategorizing POS categories, many of them might come to represent a single word. One might thus feel that the approach of this paper is to walk down a slippery slope, and that we are merely arguing degrees. However, we believe that there is a fundamental qualitative distinction, grounded in linguistic practice, between what we see as permitted in an unlexicalized PCFG as against what one finds and hopes to exploit in lexicalized PCFGs. The division rests on the traditional distinction between function words (or closed-class words) and content words (or open class or lexical words). It is standard practice in linguistics, dating back decades, to annotate phrasal nodes with important function-word distinctions, for example to have a CP[for] or a PP[to], whereas content words are not part of grammatical structure, and one would not have special rules or constraints for an NP[stocks], for example. We follow this approach in our model: various closed classes are subcategorized to better represent important distinctions, and important features commonly expressed by function words are annotated onto phrasal nodes (such as whether a VP is finite, or a participle, or an infinitive clause). However, no use is made of lexical class words, to provide either monolexical or bilexical probabilities.[11]

At any rate, we have kept ourselves honest by estimating our models exclusively by maximum likelihood estimation over our subcategorized grammar, without any form of interpolation or shrinkage to unsubcategorized categories (although we do markovize rules, as explained above). This effectively means that the subcategories that we break off must themselves be very frequent in the language. In such a framework, if we try to annotate categories with any detailed lexical information, many sentences either entirely fail to parse, or have only extremely weird parses. The resulting battle against sparsity means that we can only afford to make a few distinctions which have major distributional impact. Even with the individual-lexeme annotations in this section, the grammar still has only 9255 states compared to the 7619 of the baseline model.

[11] It should be noted that we started with four tags in the Penn treebank tagset that rewrite as a single word: EX (there), WP$ (whose), # (the pound sign), and TO, and some others such as WP, POS, and some of the punctuation tags, which rewrite as barely more. To the extent that we subcategorize tags, there will be more such cases, but many of them already exist in other tag sets. For instance, many tag sets, such as the Brown and CLAWS (c5) tagsets, give separate sets of tags to each form of the verbal auxiliaries be, do, and have, most of which rewrite as only a single word (and any corresponding contractions).

(a) (VP^S (TO to)
      (VP^VP (VB appear)
        (NP^VP
          (NP^NP (CD three) (NNS times))
          (PP^NP (IN on)
            (NP^PP (NNP CNN) (JJ last) (NN night))))))

(b) (VP^S (TO to)
      (VP^VP (VB appear)
        (NP^VP
          (NP^NP (CD three) (NNS times))
          (PP^NP (IN on)
            (NP^PP (NNP CNN))))
        (NP-TMP^VP (JJ last) (NN^TMP night))))

Figure 6: An error resolved with the TMP-NP annotation: (a) the incorrect baseline parse and (b) the correct TMP-NP parse.

6 Annotations Already in the Treebank

At this point, one might wonder as to the wisdom of stripping off all treebank functional tags, only to heuristically add other such markings back in to the grammar. By and large, the treebank out-of-the-package tags, such as PP-LOC or ADVP-TMP, have negative utility. Recall that the raw treebank grammar, with no annotation or markovization, had an F1 of 72.62% on our development set. With the functional annotation left in, this drops to 71.49%. The h ≤ 2, v ≤ 2 markovization baseline of 77.77% dropped even further, all the way to 72.87%, when these annotations were included.

Nonetheless, some distinctions present in the raw treebank trees were valuable. For example, an NP with an S parent could be either a temporal NP or a subject. For the annotation TMP-NP, we retained the original -TMP tags on NPs, and, furthermore, propagated the tag down to the tag of the head of the NP. This is illustrated in figure 6, which also shows an example of its utility, clarifying that CNN last night is not a plausible compound and facilitating the otherwise unusual high attachment of the smaller NP. TMP-NP brought the cumulative F1 to 82.25%. Note that this technique of pushing the functional tags down to preterminals might be useful more generally; for example, locative PPs expand roughly the same way as all other PPs (usually as IN NP), but they do tend to have different prepositions below IN.

(a) (ROOT
      (S^ROOT (`` ``)
        (NP^S (DT This))
        (VP^S (VBZ is)
          (VP^VP (VB panic)
            (NP^VP (NN buying))))
        (. !) ('' '')))

(b) (ROOT
      (S^ROOT (`` ``)
        (NP^S (DT This))
        (VP^S-VBF (VBZ is)
          (NP^VP (NN panic) (NN buying)))
        (. !) ('' '')))

Figure 7: An error resolved with the SPLIT-VP annotation: (a) the incorrect baseline parse and (b) the correct SPLIT-VP parse.

A second kind of information in the original trees is the presence of empty elements. Following Collins (1999), the annotation GAPPED-S marks S nodes which have an empty subject (i.e., raising and control constructions). This brought F1 to 82.28%.

7 Head Annotation

The notion that the head word of a constituent can affect its behavior is a useful one. However, often the head tag is as good (or better) an indicator of how a constituent will behave.[12] We found several head annotations to be particularly effective. First, possessive NPs have a very different distribution than other NPs: in particular, NP → NP α
rules are only used in the treebank when the leftmost child is possessive (as opposed to other imaginable uses like for New York lawyers, which is left flat). To address this, POSS-NP marked all possessive NPs. This brought the total F1 to 83.06%. Second, the VP symbol is very overloaded in the Penn treebank, most severely in that there is no distinction between finite and infinitival VPs. An example of the damage this conflation can do is given in figure 7, where one needs to capture the fact that present-tense verbs do not generally take bare infinitive VP complements. To allow the finite/non-finite distinction, and other verb type distinctions, SPLIT-VP annotated all VP nodes with their head tag, merging all finite forms to a single tag VBF. In particular, this also accomplished Charniak's gerund-VP marking. This was extremely useful, bringing the cumulative F1 to 85.72%, 2.66% absolute improvement (more than its solo improvement over the baseline).

[12] This is part of the explanation of why (Charniak, 2000) finds that early generation of head tags as in (Collins, 1999) is so beneficial. The rest of the benefit is presumably in the availability of the tags for smoothing purposes.

8 Distance

Error analysis at this point suggested that many remaining errors were attachment level and conjunction scope. While these kinds of errors are undoubtedly profitable targets for lexical preference, most attachment mistakes were overly high attachments, indicating that the overall right-branching tendency of English was not being captured. Indeed, this tendency is a difficult trend to capture in a PCFG because often the high and low attachments involve the very same rules. Even if not, attachment height is not modeled by a PCFG unless it is somehow explicitly encoded into category labels. More complex parsing models have indirectly overcome this by modeling distance (rather than height).

Linear distance is difficult to encode in a PCFG: marking nodes with the size of their yields massively multiplies the state space.[13] Therefore, we wish to find indirect indicators that distinguish high attachments from low ones. In the case of two PPs following a NP, with the question of whether the second PP is a second modifier of the leftmost NP or should attach lower, inside the first PP, the important distinction is usually that the lower site is a non-recursive base NP. Collins (1999) captures this notion by introducing the notion of a base NP, in which any NP which dominates only preterminals is marked with a -B. Further, if an NP-B does not have a non-base NP parent, it is given one with a unary production. This was helpful, but substantially less effective than marking base NPs without introducing the unary, whose presence actually erased a useful internal indicator: base NPs are more frequent in subject position than object position, for example. In isolation, the Collins method actually hurt the baseline (absolute cost to F1 of 0.37%), while skipping the unary insertion added an absolute 0.73% to the baseline, and brought the cumulative F1 to 86.04%.

[13] The inability to encode distance naturally in a naive PCFG is somewhat ironic. In the heart of any PCFG parser, the fundamental table entry or chart item is a label over a span, for example an NP from position 0 to position 5. The concrete use of a grammar rule is to take two adjacent span-marked labels and combine them (for example NP[0,5] and VP[5,12] into S[0,12]). Yet, only the labels are used to score the combination.

In the case of attachment of a PP to an NP either above or inside a relative clause, the high NP is distinct from the low one in that the already modified one contains a verb (and the low one may be a base NP as well). This is a partial explanation of the utility of verbal distance in Collins (1999). To
capture this, DOMINATES-V marks all nodes which dominate any verbal node (V*, MD) with a -V. This brought the cumulative F1 to 86.91%. We also tried marking nodes which dominated prepositions and/or conjunctions, but these features did not help the cumulative hill-climb.

The final distance/depth feature we used was an explicit attempt to model depth, rather than use distance and linear intervention as a proxy. With RIGHT-REC-NP, we marked all NPs which contained another NP on their right periphery (i.e., as a right-most descendant). This captured some further attachment trends, and brought us to a final development F1 of 87.04%.

9 Final Results

We took the final model and used it to parse section 23 of the treebank. Figure 8 shows the results. The test set F1 is 86.32% for ≤ 40 words, already higher than early lexicalized models, though of course lower than the state-of-the-art parsers.

Length ≤ 40         LP      LR      F1      Exact   CB      0 CB
Magerman (1995)     84.9    84.6                    1.26    56.6
Collins (1996)      86.3    85.8                    1.14    59.9
this paper          86.9    85.7    86.3    30.9    1.10    60.3
Charniak (1997)     87.4    87.5                    1.00    62.1
Collins (1999)      88.7    88.6                    0.90    67.1

Length ≤ 100        LP      LR      F1      Exact   CB      0 CB
this paper          86.3    85.1    85.7    28.8    1.31    57.2

Figure 8: Results of the final model on the test set (section 23).

10 Conclusion

The advantages of unlexicalized grammars are clear enough: easy to estimate, easy to parse with, and time- and space-efficient. However, the dismal performance of basic unannotated unlexicalized grammars has generally rendered those advantages irrelevant. Here, we have shown that, surprisingly, the maximum-likelihood estimate of a compact unlexicalized PCFG can parse on par with early lexicalized parsers. We do not want to argue that lexical selection is not a worthwhile component of a state-of-the-art parser (certain attachments, at least, require it), though perhaps its necessity has been overstated. Rather, we have shown ways to improve parsing, some easier than lexicalization, and others of which are orthogonal to it, and could presumably be used to benefit lexicalized parsers as well.

Acknowledgements

This paper is based on work supported in part by the National Science Foundation under Grant No. IIS-0085896, and in part by an IBM Faculty Partnership Award to the second author.

References

James K. Baker. 1979. Trainable grammars for speech recognition. In D. H. Klatt and J. J. Wolf, editors, Speech Communication Papers for the 97th Meeting of the Acoustical Society of America, pages 547–550.

Taylor L. Booth and Richard A. Thomson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22:442–450.

Sharon A. Caraballo and Eugene Charniak. 1998. New figures of merit for best-first probabilistic chart parsing. Computational Linguistics, 24:275–298.

Eugene Charniak, Sharon Goldwater, and Mark Johnson. 1998. Edge-based best-first chart parsing. In Proceedings of the Sixth Workshop on Very Large Corpora, pages 127–133.

Eugene Charniak. 1996. Tree-bank grammars. In Proc. of the 13th National Conference on Artificial Intelligence, pages 1031–1036.

Eugene Charniak. 1997. Statistical parsing with a context-free grammar and word statistics. In Proceedings of the 14th National Conference on Artificial Intelligence, pages 598–603.

Eugene Charniak. 2000. A maximum-entropy-inspired parser. In NAACL 1, pages 132–139.

Eugene Charniak. 2001. Immediate-head parsing for language models. In ACL 39.

Noam Chomsky. 1965. Aspects of the Theory of Syntax. MIT Press, Cambridge, MA.

Michael John Collins. 1996. A new statistical parser based on bigram lexical dependencies. In ACL 34, pages 184–191.

M. Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, Univ. of Pennsylvania.

Jason Eisner and Giorgio Satta. 1999. Efficient parsing for bilexical context-free grammars and head-automaton grammars. In ACL 37, pages 457–464.

Marilyn Ford, Joan Bresnan, and Ronald M. Kaplan. 1982. A competence-based theory of syntactic closure. In Joan Bresnan, editor, The Mental Representation of Grammatical Relations, pages 727–796. MIT Press, Cambridge, MA.

Daniel Gildea. 2001. Corpus variation and parser performance. In 2001 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Donald Hindle and Mats Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics, 19(1):103–120.

Mark Johnson. 1998. PCFG models of linguistic tree representations. Computational Linguistics, 24:613–632.

Dan Klein and Christopher D. Manning. 2001. Parsing with treebank grammars: Empirical bounds, theoretical models, and the structure of the Penn treebank. In ACL 39/EACL 10.

David M. Magerman. 1995. Statistical decision-tree models for parsing. In ACL 33, pages 276–283.

Andrew Radford. 1988. Transformational Grammar. Cambridge University Press, Cambridge.

Dana Ron, Yoram Singer, and Naftali Tishby. 1994. The power of amnesia. Advances in Neural Information Processing Systems, volume 6, pages 176–183. Morgan Kaufmann.
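Illustrative sketch: head-outward markovization

The head-outward decomposition described in Section 2 can be made concrete with a small sketch. This is not the authors' Java implementation: the function name, the argument layout, and the restriction to siblings on the right of the head are assumptions made for illustration, and the left-sibling handling and variable-history collapsing mentioned in the paper are omitted.

```python
def markovize_right(parent, head, right_siblings, h=1):
    """Binarize `parent -> head right_siblings...` head-outward, keeping only
    the last `h` generated siblings in each intermediate symbol name."""
    def state(generated):
        # Intermediate symbols are written ⟨parent:[head]...recent siblings⟩.
        recent = generated[-h:] if h > 0 else []
        suffix = "..." + " ".join(recent) if recent else ""
        return f"⟨{parent}:[{head}]{suffix}⟩"

    rules = [(state([]), (head,))]                  # unary over the head
    generated = []
    for sibling in right_siblings:
        previous = state(generated)
        generated.append(sibling)
        rules.append((state(generated), (previous, sibling)))
    rules.append((parent, (state(generated),)))     # unary that finishes the parent
    return rules


# The v = 1, h = 1 example of Figure 1: VP -> VBZ NP PP with head VBZ.
rules = markovize_right("VP", "VBZ", ["NP", "PP"], h=1)
for lhs, rhs in rules:
    print(lhs, "->", " ".join(rhs))
```

With h = 1 the intermediate symbol after generating PP no longer records that an NP came first, which is how markovization pools statistics across rules that share a recent horizontal history.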