Pragmatic Annotation & Analysis in DART

Uploaded On 2023-09-22

Martin Weisser, School of English & Education, Guangdong University of Foreign Studies (weissermar@gmail.com, martinweisser.org)

Presentation Transcript

1. Pragmatic Annotation & Analysis in DART
Martin Weisser
School of English & Education
Guangdong University of Foreign Studies
weissermar@gmail.com
martinweisser.org

2. Outline
- Getting DART
- Design Background
- DART Annotation Scheme
- Basic Automated Annotation
- Speech-Act Analysis
- N-Gram Analysis
- Creating & Editing Resources

3. Getting DART
- go to http://martinweisser.org/ling_soft.html#DART
- download & run installer (currently 64-bit Windows only)

4. Design Background (1)
- 1997–1998: Expert Advisory Group on Language Engineering Standards (EAGLES) WP4
  - guidelines for the representation and annotation of dialogue
- 2001–2002: SPAAC (A Speech-Act Annotated Corpus of Dialogues) Project
  - annotation of some 1,200 task-oriented dialogue files (Trainline + BT)
  - need to annotate and post-edit corpus efficiently and consistently on multiple levels → SPAACy

5. Design Background (2)
- colour coding helps to identify syntactic patterns
- post-processing constrained through fixed options
- resources loaded automatically

6. Design Background (3)
- flaws in SPAACy
  - monolithic, i.e. no separation of ‘linguistic intelligence’ & output display
  - hard to improve linguistic analysis
  - processing & editing of single files only
  - other interface issues, e.g. too many buttons, etc.
- development of DART
  - modularisation
  - strict separation of processing and linguistic analysis routines
  - enhanced options for analysis and creation of resources

7. DART Annotation Scheme (1) – Basic Input Format
- optional stylesheet reference
- text with optional punctuation ‘tags’ or embedded comments
- basic skeleton can be created via ‘File→New’ (Ctrl + n)

8. DART Annotation Scheme (2) – Output Format
- syntactic category
- mode = semantico-pragmatic markers/‘IFIDs’
- topic = semantic info
- (surface) polarity
- speech act(s)
- speech act generally inferred from combination of syntax + mode

9. Basic Automated Annotation
- input files workspace
- output files workspace
- to load single file, press Ctrl + a (for whole directory, Ctrl + d)
- single file loaded; to pre-edit, click hyperlink; to annotate pragmatically, press Ctrl + a
- debugging output; ignore if annotation completes successfully
- single file processed; to post-edit, click hyperlink

10. Speech-Act Analysis
- generate frequency list of syntactic category + speech act(s) from ‘Analysis→Speech act stats’
- click hyperlinked speech act (combination) to prime concordancer
- investigate results
- if necessary, correct speech act tag by clicking the hyperlink to the file and editing it

11. N-Gram Analysis
- useful for determining formulaic expressions for modes or topic patterns (or in general)
- predefined options for uni- to tri-grams
- optionally also freely definable n-grams
- frequency lists display absolute & relative frequencies
- hyperlink again primes concordancer
  - for all n>1 with interpolated optional fillers
- because mixed-case data is accommodated, the ‘case insensitive’ flag is sometimes required
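
The idea of an n-gram frequency list with absolute and relative frequencies can be illustrated outside DART with a minimal Python sketch. This is not DART's implementation: the function name `ngram_frequencies`, its `case_insensitive` parameter, and the whitespace tokenisation are assumptions made purely for illustration.

```python
from collections import Counter


def ngram_frequencies(tokens, n, case_insensitive=False):
    """Return {ngram: (absolute_count, relative_frequency)} for n-grams of length n.

    Hypothetical helper for illustration only; DART's own routines may differ.
    """
    if case_insensitive:
        # Fold case so that 'Thank you' and 'thank you' are counted together.
        tokens = [t.lower() for t in tokens]
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    counts = Counter(grams)
    return {gram: (count, count / total) for gram, count in counts.items()}


# Mixed-case toy data: without case folding the two 'thank you' bigrams differ.
tokens = "Thank you very much thank you".split()
sensitive = ngram_frequencies(tokens, 2)
insensitive = ngram_frequencies(tokens, 2, case_insensitive=True)
```

With the toy data, the case-sensitive list counts "Thank you" and "thank you" as separate bigrams, while the case-insensitive one merges them, which is the situation the ‘case insensitive’ flag addresses.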

12. Creating & Editing Resources (1)
- mostly done via ‘Edit resources’ menu…
- … apart from creating new files
- to create new corpus
  - choose ‘Edit configuration’
  - click ‘Add corpus entry’
  - fill in corpus, lexicon, and topic file name (usually identical, apart from extension)
  - click ‘Save configuration’
- new resources created
  - data folder for corpus
    - three subfolders: ‘info’, ‘notes’, and ‘stats’
  - dummy lexicon & topics files (in relevant program folders)

13. Creating & Editing Resources (2)
- existing resources can be edited…
  - generally via relevant entry in the ‘Edit resources’ menu
  - lexica & topic files via hyperlinks in configuration editor
- safest to edit only dialogue, lexica & topic files…
- … unless you really know what you’re doing
- lexica can also be ‘synthesised’ from corpus data

14. Creating & Editing Resources (3) – Lexica
- very simple format
  - word (base form) + space + tag + optional comment (preceded by #)
- special DART tagset
  - allows for lexical polysemy
  - uppercase tag name = unambiguous
  - lowercase tag name = predominantly tag X
- tooltips on tag buttons provide explanations while editing
- synthesising lexicon works by
  - creating word list from corpus
  - ‘subtracting’ items from general lexicon
  - suggesting possible candidates after morphological analysis
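
The lexicon line format and the ‘subtraction’ step described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the sample tags `N` and `n`, the single-token word restriction, and the lowercasing of corpus tokens are all hypothetical, and the morphological analysis DART performs is not modelled.

```python
def parse_lexicon_line(line):
    """Split 'word tag # comment' into (word, tag, comment); None if not an entry.

    Assumes single-token words; the comment part is optional.
    """
    entry, _, comment = line.partition("#")
    parts = entry.split()
    if len(parts) != 2:
        return None  # blank line, comment-only line, or unexpected shape
    word, tag = parts
    return word, tag, comment.strip() or None


def is_unambiguous(tag):
    """Uppercase tag name = unambiguous; lowercase = predominantly that tag."""
    return tag.isupper()


def candidate_words(corpus_tokens, general_lexicon):
    """Build a word list from the corpus and 'subtract' known lexicon items.

    Lowercasing is an assumption; no morphological analysis is attempted here.
    """
    word_list = {token.lower() for token in corpus_tokens}
    return sorted(word_list - set(general_lexicon))


# 'N' and 'n' are made-up tag names, not taken from the DART tagset.
entry = parse_lexicon_line("ticket N # hypothetical example entry")
candidates = candidate_words(["Ticket", "to", "Euston"], {"ticket", "to"})
```

Words left over after subtraction are only candidates; in DART they would still need a tag assigned before being added to a corpus lexicon.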

15. Creating & Editing Resources (4) – Topic Files
- syntax more complex than for lexica
- combination of topic labels, space, double colon, space, associated (representative) patterns
- patterns expressed as regexes
- individual sub-patterns separated by 3 underscores
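
A topic file line of the shape described above could be read roughly as follows. This is a sketch under assumptions, not DART's parser: the function names, the sample label `time`, the sample patterns, and the use of Python's `re` dialect are all hypothetical, and DART's own regex flavour and matching flags may differ.

```python
import re


def parse_topic_line(line):
    """Split 'label :: pattern1___pattern2' into (label, [compiled regexes])."""
    label, separator, patterns = line.strip().partition(" :: ")
    if not separator:
        raise ValueError("expected 'label :: patterns'")
    # Sub-patterns are separated by three underscores.
    return label, [re.compile(p) for p in patterns.split("___")]


def matching_topics(text, topic_lines):
    """Return the labels whose patterns match anywhere in the text."""
    labels = []
    for line in topic_lines:
        label, regexes = parse_topic_line(line)
        if any(regex.search(text) for regex in regexes):
            labels.append(label)
    return labels


# Made-up topic line for illustration; not taken from a real DART topic file.
sample = r"time :: \b(?:morning|afternoon)\b___\bo'clock\b"
label, regexes = parse_topic_line(sample)
found = matching_topics("tomorrow afternoon please", [sample])
```

One consequence of the three-underscore separator is that a sub-pattern cannot itself contain three consecutive underscores without being split.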