



The MULI Project: Annotation and Analysis of Information Structure in German and English

Stefan Baumann, Caren Brinckmann, Silvia Hansen-Schirra, Geert-Jan Kruijff, Ivana Kruijff-Korbayová, Stella Neumann, Erich Steiner, Elke Teich, Hans Uszkoreit

Saarland University, Saarbrücken, Germany

Abstract

The goal of the MULI (MUltiLingual Information structure) project is to empirically analyse information structure in German and English newspaper texts. In contrast to other projects in which information structure is annotated and investigated (e.g. in the Prague Dependency Treebank, which mirrors the basic information about the topic-focus articulation of the sentence), we do [...]

[...] language-specific realisations of these features. This is particularly the case for the expletive es in German and its English equivalent there-insertion. The unit under investigation on the syntactic level is the clause, i.e. prior to the analysis the corpus was segmented into clauses.

2.3. Discourse

Information structure (IS) theories describe the phenomena at hand at a surface level, at a semantic level, or at both levels simultaneously, i.e., an expression belongs to some IS partition, in virtue of some information status of the corresponding discourse entity. For the investigation of IS at the semantic level, we need more information about the character of the discourse entities introduced by linguistic expressions. We therefore annotate expressions with their discourse referents and their following properties: Type (intensional or extensional object, property, eventuality or textuality) and more fine-grained Semantic Sort; referential properties of Delimitation (unique, existential, variable, non-denotational use (Hlavsa, 1975)) and Quantification (uncountable, unspecific non-singular, specific non-singular or specific singular); the Form of an expression (although it does not necessarily belong to this level, but there are correlations with the other features); Information Status (new, unused, inferable, evoked) (Prince, 1981). Coding information status is motivated by the fact that IS theories often employ some notion of information status as one dimension of the partitioning on its own, or as the basis for deriving a higher level of partitioning. We use Prince's familiarity taxonomy, which clearly addresses the status of discourse entities as such, not other referential properties.

Besides the properties of individual discourse referents, we annotate anaphoric links between expressions. We distinguish between coreference and bridging, where there exists an associative relationship between the referents of the anaphor and the antecedent, such as set-containment, part-whole composition, property-attribution, possession, causality or lexical-argument-filling. The relation between anaphoricity and IS is not a straightforward one, and needs further investigation, enabled by an annotation like ours.

Our annotation scheme follows the Text Encoding Initiative recommendations (http://www.tei-c.org/) and the Discourse Resource Initiative guidelines (Carletta et al., 1997). In line with these standards, we define what expressions are markables, what attributes they have and what links can hold between them. At the discourse level, markables are “nominal-like” (Passoneau, 1996) linguistic expressions that introduce or access discourse entities (i.e., discourse referents in the sense used in DRT and alike). We build on and extend the reference annotation schemes for MUC-6 and MUC-7 (MUC Coreference Specification), DRAMA (Passoneau, 1996), the MATE project ((Poesio et al., 1999); http://mate.mip.ou.dk), the DRI guidelines (Carletta et al., 1997), (Poesio and Vieira, 1998) and (Müller and Strube, 2001). The corpus has been annotated by two annotators (one of the developers and one only instructed by the annotation guidelines), using the MMAX annotation tool (http://www.eml.villa-bosch.de/english/Research/NLP/Downloads).

2.4. Prosody

In spoken language, prosody (intonation, phrasing, stress, rhythm) is often used to realise the information structure of a text, e.g. the pragmatic structure (focus/background) or the degree of cognitive activation of individual discourse referents or propositions (given/new). Accent placement and phrasing are the primary means to mark information structural concepts, but pitch range, rhythm, and speech rate also play an important role.

In order to carry out the prosodic annotation, we recorded one German and one English native speaker reading aloud the texts of the MULI corpus.¹ Since individual speaking preferences may vary from speaker to speaker, our results are not generalisable, reflecting the experimental character of the study. The recordings were digitised and annotated on six different levels using the EMU Speech Database System ((Cassidy and Harrington, 2001); http://emu.sourceforge.net/): (1) word boundaries and pauses, (2) punctuation of the written texts, (3) position and type of pitch accents and boundary tones, (4) position and strength of phrase breaks, (5) rhythmic phenomena, including non-canonical word stress, (6) comments. The annotation of level 3 and 4 follows the conventions of ToBI (Tones and Break Indices (Beckmann and Hirschberg, 1994)) for English and GToBI ((Grice et al., in press); http://www.coli.uni-sb.de/phonetik/projects/Tobi/gtobi.html) for German. They can be regarded as standards for describing the intonation of these languages within the framework of autosegmental-metrical phonology, in which pitch contours are decomposed into high and low tonal targets (symbolised by H and L). Diacritics are listed in Table 1, the tonal and break index inventories are summarised in Table 2.
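The decomposition of pitch-accent and boundary-tone labels into tonal targets and diacritics can be illustrated with a small sketch. It is not part of the MULI tooling or of EMU; the function name `decompose`, the ASCII stand-ins `-` and `^` for the printed dash and circumflex, and the decision to reject anything outside the six diacritics (including the parenthesised force accents of GToBI) are assumptions made only for this illustration.

```python
import re

# Diacritic meanings as given in the (G)ToBI diacritics table of this paper.
# Using ASCII "-" and "^" for the printed dash and circumflex is an
# assumption of this sketch.
DIACRITICS = {
    "*": "target on the accented syllable",
    "+": "target before or after the accented syllable",
    "-": "boundary tone of an intermediate phrase (ip)",
    "%": "boundary tone of an intonation phrase (IP)",
    "!": "downstep of an H tone",
    "^": "upstep of an H tone",
}
TONES = {"H": "high tonal target", "L": "low tonal target"}

# One symbol at a time: a tone letter or one of the six diacritics.
SYMBOL = re.compile(r"[HL*+\-%!^]")


def decompose(label):
    """Return (symbol, meaning) pairs for a label such as 'L+H*'."""
    # Normalise the typographic variants used in print to ASCII.
    ascii_label = label.replace("–", "-").replace("ˆ", "^")
    symbols = SYMBOL.findall(ascii_label)
    # Reject labels containing anything this sketch does not know about.
    if "".join(symbols) != ascii_label:
        raise ValueError(f"unexpected symbol in label: {label!r}")
    return [(s, TONES.get(s) or DIACRITICS[s]) for s in symbols]


if __name__ == "__main__":
    for label in ("L+H*", "H+!H*", "L–H%"):
        print(label, decompose(label))
```

Read this way, L+H* is a low target followed by a high target on the accented syllable, and H+!H* is a high target followed by a downstepped high target on the accented syllable.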
  *   target on the accented syllable
  +   target before or after the accented syllable
  –   boundary tone of an intermediate phrase (ip)
  %   boundary tone of an intonation phrase (IP)
  !   downstep of an H tone
  ˆ   upstep of an H tone

Table 1: (G)ToBI diacritics

                   ToBI                   GToBI
  pitch accents    H*, L*, L+H*,          H*, L*, L+H*,
                   L*+H, H+!H*            L*+H, H+!H*, H+L*
  force accents    –                      H(*), L(*)
  boundary tones   L–, H–, L–L%,          L–, H–, L–%,
                   H–L%, H–H%,            H–%, H–ˆH%,
                   L–H%, %H               L–H%, %H
  break indices    0, 1, 2, 3, 4          2r, 2t, 3, 4

Table 2: (G)ToBI inventories of tones and break indices

¹ Since prosodic annotation is very time-consuming, we had to concentrate on one language. Thus, we analysed all German texts and restricted ourselves to some English examples.

3. Example

We illustrate the different levels of annotation and analysis with an example sequence taken from our English corpus (Figure 1). We consider the syntactic annotation a suitable starting point for the analysis. Where relevant features are detected, we compare the annotation to other levels.

(1) In the 1987 crash, remember, the market was shaken by a Danny Rostenkowski proposal to tax takeovers out of existence. (2) Even more important, in our view, was the Treasury's threat to thrash the dollar. (3) The Treasury is doing the same thing today; (4) thankfully, the dollar is not under 1987-style pressure.

Figure 1: Example sequence from the English corpus

The example sequence was segmented into four clauses. Of all four clauses, three show noncanonical word orders. In (1), the temporal adjunct is fronted, followed by the predicate remember (in imperative mood). Similarly, in (4), an adjunct (marking stance) is fronted. In (2), subject complement and adjunct (again marking stance) are fronted. Additionally, (1) contains a passive construction bringing the patient in subject position.

The discourse entity (DE) introduced in the fronted temporal phrase the 1987 crash in (1) is extensional, abstract, unique, specific singular, and has the information status of unused (also indicated by remember). The DE introduced in the unmarked subject position is extensional, abstract, unique, specific singular, but has the status of inferable: the market can be seen as a bridging anaphor to the crash, by means of an argument filling (crash of the market). The DEs introduced by the sentence-final expressions in (1) and (2) are also extensional, abstract, unique, specific singular, and both have the information status of new.² What appears sentence-final in (1) and (2) are thus two negative things that happened during the 1987 crash. The evaluation-ascribing adjective phrase in (2) is not annotated as a DE. The DEs in the unmarked subject positions in (3) and (4) both have the information status of textually evoked, as both expressions are coreferential anaphors to parts of the Treasury's threat to thrash the dollar. While the DE referred to by the Treasury is an extensional, office, unique, specific singular, that of the dollar is intensional, abstract, unique, uncountable. The expression the same thing in (3) is anaphoric to the Treasury's threat... in (2), but it introduces a new DE of the same type; its information status is that of inferable. Finally, the DE introduced in the sentence-final expression 1987-style pressure in (4) is intensional, abstract, existential, uncountable, and also has the information status of inferable; it is however hard to code it as a bridging anaphor, because it is not clear what relation it would have to what antecedent: if anything, then a Danny Rostenkowski proposal... in (1) (according to one of the annotators).
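The discourse-level attributes discussed for this example can be pictured as one record per markable. The sketch below is purely illustrative and does not reproduce the MMAX file format or the project's actual attribute names: the class `Markable`, its field names and the helper `by_status` are hypothetical, and only four of the markables from the example are encoded, with the values stated in the text.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Markable:
    """One annotated expression; field names are hypothetical."""
    text: str
    clause: int
    de_type: str          # e.g. "extensional" or "intensional"
    sort: str             # finer-grained semantic sort
    delimitation: str     # unique, existential, variable, ...
    quantification: str   # uncountable, specific singular, ...
    status: str           # new, unused, inferable, evoked
    link: Optional[str] = None        # "coreference" or "bridging"
    antecedent: Optional[str] = None  # text of the linked expression


# Values taken from the discussion of clauses (1), (3) and (4).
MARKABLES = [
    Markable("the 1987 crash", 1, "extensional", "abstract",
             "unique", "specific singular", "unused"),
    Markable("the market", 1, "extensional", "abstract",
             "unique", "specific singular", "inferable",
             link="bridging", antecedent="the 1987 crash"),
    Markable("The Treasury", 3, "extensional", "office",
             "unique", "specific singular", "evoked",
             link="coreference",
             antecedent="part of: the Treasury's threat to thrash the dollar"),
    Markable("the dollar", 4, "intensional", "abstract",
             "unique", "uncountable", "evoked",
             link="coreference",
             antecedent="part of: the Treasury's threat to thrash the dollar"),
]


def by_status(markables):
    """Group markable texts by their information status."""
    groups = defaultdict(list)
    for m in markables:
        groups[m.status].append(m.text)
    return dict(groups)


if __name__ == "__main__":
    print(by_status(MARKABLES))
    print([m.text for m in MARKABLES if m.link == "bridging"])
```

Keeping the link type and antecedent on the anaphor is one possible design; a scheme that must represent several links per expression would more plausibly store links as separate records.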
² We assume a layman reader. For an economy expert, these entities may have the status of unused.

The prosodic analysis shows that the fronted phrase in (2) is not only syntactically but also prosodically prominent (cf. Figure 2): Two peak accents on even and more highlight these words (with the more pronounced accent on more expressing a contrast), whereas the word important is deaccented, since the concept of 'importance' is inferable from the context. Furthermore, the adjective construction forms a phrase of its own, delimited by an intonation phrase boundary, which is in turn signalled by a falling-rising contour plus a short pause. The following parenthesis in our view also constitutes a single intonation phrase. Here again, our is assigned a contrastive accent, while view is unaccented due to givenness.

All remaining content words of the clause receive accents. However, the most 'newsworthy' word, threat, is the only one marked by a rising pitch accent (L+H*), indicating its higher degree of importance for the speaker. This interpretation is further supported by the insertion of a phrase break directly after this word. Finally, the high-downstepped nuclear accent (H+!H*) on dollar marks this item as being accessible by speaker and hearer (cf. (Pierrehumbert and Hirschberg, 1990)). This means it can neither count as brand new (which normally requires an H* peak accent), nor as immediately given, since it is not deaccented (as is the case with the word important above).

4. Conclusions

First experiences with our multilingual multi-layer annotation lead to conclusions with respect to how to automate the annotation process using statistical methods and learning procedures. On the grammatical level, the syntactic annotation of the Tiger Corpus and the Penn Treebank, for example, can be used to determine passive constructions. In connection with the discourse annotation, for instance, the existing part-of-speech tags can be used in order to identify pronominal co-reference.

We are also working on robust methods for identifying information structure, following the annotation scheme (which focuses on the information status of individual markables) as well as investigating the idea of informativity zoning, i.e. the division of clauses/sentences into parts that are more or less informative in the given context. Furthermore, conclusions can be drawn from the co-occurrence of particular categories on different levels of annotation in MULI, indicating how these different levels are deployed in order to mark information structure.

However, our initial investigation also reveals where additional annotation would be needed. For instance, the text example discussed above constitutes a concession scheme, which we cannot identify without annotating discourse/rhetorical relations.

Using our findings as a tertium comparationis, theory-dependent information structure annotation and thus existing theories on information structure can be compared to and validated against our theory-neutral approach and vice versa. Finally, using our findings on the co-occurrence of the annotated categories, it is possible to compare how different languages use different grammatical, discursive and prosodic means to structure information.

Figure 2: Prosodic annotation of example sentence (2) in EMU

5. References

Becker, Markus and Anette Frank, 2002. A stochastic topological parser of German. In Proceedings of COLING 2002. Taipei, Taiwan.

Beckmann, Mary E. and Julia Hirschberg, 1994. The ToBI annotation conventions. Ms. and accompanying speech materials, Ohio State University.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad, and Edward Finegan, 1999. The Longman Grammar of Spoken and Written English. Harlow: Longman.

Brants, Sabine, Stefanie Dipper, Peter Eisenberg, Silvia Hansen, Esther König, Wolfgang Lezius, Christian Rohrer, George Smith, and Hans Uszkoreit, to appear. TIGER: Linguistic interpretation of a German corpus. Journal of Language and Computation (JLAC), Special Issue.

Carletta, Jean, Nils Dahlbäck, Norbert Reithinger, and Marylin A. Walker, 1997. Standards for dialogue coding in natural language processing. Report on the Dagstuhl seminar, Discourse Resource Initiative.

Cassidy, Steve and Jonathan Harrington, 2001. Multi-level annotation in the EMU speech database management system. Speech Communication, 33(1-2):61–78.

Eisenberg, Peter, 1994. Grundriss der deutschen Grammatik, 3. Aufl. Stuttgart, Weimar: Metzler.

Grice, Martine, Stefan Baumann, and Ralf Benzmüller, in press. German intonation in autosegmental-metrical phonology. In Sun-Ah Jun (ed.), Prosodic Typology: Through Intonational Phonology and Transcription. OUP.

Hlavsa, Zdenek, 1975. Denotace objektu a její prostredky v soucasné cestine [Denotation of objects and its means in contemporary Czech], volume 10 of Studie a práce lingvistické [Linguistic studies and works]. Academia.

Marcus, Mitchell, Grace Kim, Mary Ann Marcinkiewicz, Robert MacIntyre, Ann Bies, Mark Ferguson, Karen Katz, and Britta Schasberger, 1994. The Penn Treebank: Annotating predicate argument structure. In Proceedings of the Human Language Technology Workshop. San Francisco, Morgan Kaufmann.

Müller, Christoph and Michael Strube, 2001. Annotating anaphoric and bridging relations with MMAX. In Proceedings of the 2nd SIGdial Workshop on Discourse and Dialogue. Aalborg, Denmark.

Passoneau, Rebecca, 1996. Instructions for applying discourse reference annotation for multiple applications (DRAMA). Draft.

Pierrehumbert, Janet and Julia Hirschberg, 1990. The meaning of intonational contours in the interpretation of discourse. In P. R. Cohen, J. Morgan, and M. E. Pollack (eds.), Intentions in Communication. MIT Press, pages 271–311.

Poesio, Massimo, Florence Bruneseaux, Sarah Davies, and Laurent Romary, 1999. The MATE meta-scheme for coreference in dialogue in multiple languages. In Marylin Walker (ed.), Proceedings of the workshop on "Towards Standards and Tools for Discourse Tagging" at the 37th Annual Meeting of the Association for Computational Linguistics (ACL). University of Maryland.

Poesio, Massimo and Renata Vieira, 1998. A corpus-based investigation of definite description use. Computational Linguistics, 24(2):183–216.

Prince, Ellen, 1981. Toward a taxonomy of given-new information. In Peter Cole (ed.), Radical Pragmatics. Academic Press, pages 223–256.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartik, 1985. A comprehensive grammar of the English language. London: Longman.

Weinrich, Harald, 1993. Textgrammatik der deutschen Sprache. Mannheim u.a.: Dudenverlag.
