Lexico-grammar: From simple counts to complex models


Uploaded by walsh on 2024-03-13


Presentation Transcript

1. Lexico-grammar: From simple counts to complex models
Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.

2. “If these words are not essential to the meaning of your sentence, use ‘which’ and separate the words with a comma” (Microsoft 2010).

3. Think about and discuss
What is a grammatical rule?
Do you think grammatical rules apply in all cases?

4. Where to start?

5. Lexico-grammar: Research design
Linguistic feature design
[Figure labels: linguistic/outcome variable; explanatory variables/predictors]

6. Lexico-grammatical frame
Opportunity of use (obligatory place).
Place where variation happens in text.
E.g. NOUN + which/that + relative clause

7. Lexico-grammatical frame (cont.)
Research question | Outcome variable options | Lexico-grammatical frame
When do we use the passive construction? | ACTIVE, PASSIVE | All verb forms that can be used in the passive, i.e. transitive verbs.
In what contexts do we use which and in what contexts that in relative clauses? | which, that | All relative clauses.
When do speakers use that deletion? E.g. I think Ø this is good. | that, Ø [no relativizer] | All clauses where that occurs or is deleted.
What is the difference between various modal expressions of strong obligation? | must, have to, need to | All contexts in which strong deontic modals occur.

8. Lexico-grammatical vs. ‘ambient’ variables
It's about time that was done [BNC, file: KBB].
Well, you know, it you see, time were, I don't know I suppose, I don't know but I never seemed to be afraid... [BNC, file: HDK].

9. Cross-tabulation and mosaic plot
Relativizer by presence of separator (row percentages in brackets):
Relativizer | Separator (, or –) | No separator | Total
which       | 1,396 (63%)        | 804 (37%)    | 2,200
that        | 191 (3%)           | 7,281 (97%)  | 7,472
Total       | 1,587              | 8,085        | 9,672

10. Cross-tabulation and mosaic plot (cont.)
[Slide repeats the cross-tabulation of relativizer by presence of separator from slide 9.]
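The row percentages in the cross-tabulation can be recomputed from the raw counts. The following is a minimal Python sketch using only the counts shown on slides 9 and 10; the names `counts` and `row_percentages` are illustrative and do not come from the slides.

```python
# Counts from the cross-tabulation (relativizer by presence of separator).
counts = {
    "which": {"separator": 1396, "no separator": 804},
    "that": {"separator": 191, "no separator": 7281},
}


def row_percentages(table):
    """Express every cell as a percentage of its row total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


percentages = row_percentages(counts)
for row, cells in percentages.items():
    print(row, {col: round(p) for col, p in cells.items()})
# which {'separator': 63, 'no separator': 37}
# that {'separator': 3, 'no separator': 97}
```

Row percentages answer the question "given this relativizer, how often is there a separator?", which is why they are the natural choice when the rows hold the linguistic variable.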

11. Percentages and chi-squared test
Percentages in a cross-tabulation table
Chi-squared
Assumptions:
Independence of observations.
Expected frequencies greater than 5 (in contingency tables larger than 2 × 2, at least 80% of expected frequencies greater than 5).
Alternative tests: log-likelihood test (also known as likelihood ratio test or G test) or the Fisher exact test.
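As an illustration of the test and its assumptions, here is a minimal Python sketch that computes expected frequencies, the Pearson chi-squared statistic and the log-likelihood (G) statistic for the which/that cross-tabulation. It uses the standard textbook formulas rather than anything printed on the slide, applies no continuity correction, and leaves out p-values to stay within the standard library. All function names are illustrative.

```python
import math

# Observed counts: rows = which, that; columns = separator, no separator.
observed = [
    [1396, 804],
    [191, 7281],
]


def expected_frequencies(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(table, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


def log_likelihood(table, expected):
    """G statistic: 2 * sum of O * ln(O / E); empty cells contribute zero."""
    return 2 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


expected = expected_frequencies(observed)
# Assumption check for a 2 x 2 table: every expected frequency above 5.
assumption_met = all(e > 5 for row in expected for e in row)
chi2 = chi_squared(observed, expected)
g = log_likelihood(observed, expected)
df = (len(observed) - 1) * (len(observed[0]) - 1)  # degrees of freedom

print(f"expected > 5 in all cells: {assumption_met}")
print(f"chi-squared = {chi2:.1f}, G = {g:.1f}, df = {df}")
```

The independence assumption cannot be checked from the table itself; it depends on how the relative clauses were sampled from the corpus.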

12. Complex model: logistic regression
Variety  | Separator (, or –) | Clause          | Syntax  | that | which | Total
American | NO                 | Non-restrictive | Object  |    3 |     1 |     4
American | NO                 | Non-restrictive | Subject |   11 |     2 |    13
American | NO                 | Restrictive     | Object  |   18 |     2 |    20
American | NO                 | Restrictive     | Subject |  126 |     5 |   131
American | YES                | Non-restrictive | Object  |    0 |     6 |     6
American | YES                | Non-restrictive | Subject |    0 |    20 |    20
American | YES                | Restrictive     | Object  |    0 |     0 |     0
American | YES                | Restrictive     | Subject |    1 |     0 |     1
British  | NO                 | Non-restrictive | Object  |    3 |     2 |     5
British  | NO                 | Non-restrictive | Subject |    3 |     8 |    11
British  | NO                 | Restrictive     | Object  |   14 |     8 |    22
British  | NO                 | Restrictive     | Subject |   76 |    15 |    91
British  | YES                | Non-restrictive | Object  |    0 |     4 |     4
British  | YES                | Non-restrictive | Subject |    0 |    31 |    31
British  | YES                | Restrictive     | Object  |    1 |     0 |     1
British  | YES                | Restrictive     | Subject |    0 |     0 |     0
Total    |                    |                 |         |  256 |   104 |   360
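A multi-way table like this one can be collapsed into a simple two-way cross-tabulation for any single predictor, which shows what the simple counts hide when several predictors act together. The sketch below assumes the counts as read from the slide table (360 relative clauses in total); `ROWS`, `PREDICTORS` and `collapse` are illustrative names.

```python
from collections import defaultdict

# (variety, separator, clause, syntax, count of "that", count of "which")
ROWS = [
    ("American", "NO", "Non-restrictive", "Object", 3, 1),
    ("American", "NO", "Non-restrictive", "Subject", 11, 2),
    ("American", "NO", "Restrictive", "Object", 18, 2),
    ("American", "NO", "Restrictive", "Subject", 126, 5),
    ("American", "YES", "Non-restrictive", "Object", 0, 6),
    ("American", "YES", "Non-restrictive", "Subject", 0, 20),
    ("American", "YES", "Restrictive", "Object", 0, 0),
    ("American", "YES", "Restrictive", "Subject", 1, 0),
    ("British", "NO", "Non-restrictive", "Object", 3, 2),
    ("British", "NO", "Non-restrictive", "Subject", 3, 8),
    ("British", "NO", "Restrictive", "Object", 14, 8),
    ("British", "NO", "Restrictive", "Subject", 76, 15),
    ("British", "YES", "Non-restrictive", "Object", 0, 4),
    ("British", "YES", "Non-restrictive", "Subject", 0, 31),
    ("British", "YES", "Restrictive", "Object", 1, 0),
    ("British", "YES", "Restrictive", "Subject", 0, 0),
]

# Position of each predictor inside a row tuple.
PREDICTORS = {"Variety": 0, "Separator": 1, "Clause": 2, "Syntax": 3}


def collapse(rows, predictor):
    """Sum the that/which counts over all other predictors."""
    index = PREDICTORS[predictor]
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        totals[row[index]][0] += row[4]  # that
        totals[row[index]][1] += row[5]  # which
    return dict(totals)


for name in PREDICTORS:
    print(name, collapse(ROWS, name))
```

Each collapsed table can be fed to a chi-squared test on its own, but only a multivariable model such as logistic regression estimates the effect of one predictor while holding the others constant.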

13. Logistic regression
Outcome: nominal, (ordinal)
Predictors: nominal, ordinal, scale

14. Logistic regression: Dataset
[Figure labels: linguistic/outcome variable; explanatory variables/predictors. Column labels: scale, nominal, nominal, nominal, nominal, nominal]

15. Logistic regression: output
Predictor         | Estimate (log odds) | Standard Error | Z value (Wald) | p-value | Estimate (odds) | 95% CI lower | 95% CI upper
(Intercept)       | -3.354              | 0.563          | -5.958         | 0.000   | 0.035           | 0.011        | 0.099
VarietyB_BR       | 1.667               | 0.397          | 4.195          | 0.000   | 5.296           | 2.511        | 12.080
SeparatorB_YES    | 3.985               | 0.825          | 4.832          | 0.000   | 53.795          | 12.876       | 376.448
ClauseB_Non_restr | 2.046               | 0.446          | 4.588          | 0.000   | 7.733           | 3.235        | 18.812
SyntaxB_Subject   | -0.614              | 0.421          | -1.460         | 0.144   | 0.541           | 0.240        | 1.260
Length            | 0.079               | 0.029          | 2.739          | 0.006   | 1.083           | 1.023        | 1.147
Model 1 with predictor variables (‘Variety’, ‘Separator’, ‘Clause type’, ‘Syntax’ and ‘Length’) is significant (LL: 222.31; p < .0001) and has outstanding classification properties (C-index: 0.91).
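The link between the log-odds column and the odds column, and between the estimates and a predicted probability, can be sketched in a few lines of Python. Two points here are assumptions rather than statements from the slide: that the model predicts which (as opposed to that), and that the reference levels are American, no separator, restrictive and object, as the `B_` level names suggest. The `Length` value of 10 is hypothetical, since the slide does not state the unit.

```python
import math

# Estimates (log odds) copied from the output table.
coefficients = {
    "(Intercept)": -3.354,
    "VarietyB_BR": 1.667,
    "SeparatorB_YES": 3.985,
    "ClauseB_Non_restr": 2.046,
    "SyntaxB_Subject": -0.614,
    "Length": 0.079,
}

# Exponentiating a log-odds estimate gives the "Estimate (odds)" column.
odds = {name: math.exp(b) for name, b in coefficients.items()}


def predicted_probability(british, separator, non_restrictive, subject, length):
    """Probability of the modelled outcome (assumed: which) for one clause.

    The first four arguments are 0/1 dummies; 0 is the assumed reference level.
    """
    log_odds = (
        coefficients["(Intercept)"]
        + coefficients["VarietyB_BR"] * british
        + coefficients["SeparatorB_YES"] * separator
        + coefficients["ClauseB_Non_restr"] * non_restrictive
        + coefficients["SyntaxB_Subject"] * subject
        + coefficients["Length"] * length
    )
    return 1 / (1 + math.exp(-log_odds))  # inverse logit


# Two hypothetical clauses, both with an illustrative length of 10.
p_all_reference = predicted_probability(0, 0, 0, 0, 10)
p_all_marked = predicted_probability(1, 1, 1, 1, 10)

for name, value in odds.items():
    print(f"{name}: odds = {value:.3f}")
print(f"reference levels: p = {p_all_reference:.3f}")
print(f"British, separator, non-restrictive, subject: p = {p_all_marked:.3f}")
```

Small differences in the third decimal place from the printed odds are expected, because the slide reports log odds rounded to three decimals.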

16. “If these words are not essential to the meaning of your sentence, use ‘which’ and separate the words with a comma” (Microsoft 2010).
Was the computer right after all? If the suggestion by the computer were to be taken as a categorical rule, the answer is certainly ‘no’. There is a combination of multiple factors that favour or disfavour the use of which (and that) and these factors have to be interpreted as probabilities (or odds, to be precise), not certainty.

17. Things to remember
When analysing lexico-grammatical variation, we need to pay attention to individual linguistic contexts and define a lexico-grammatical frame.
Cross-tabulation can be used for a simple analysis of categorical variables. In addition to frequencies, cross-tab tables can also include percentages based on row totals (most useful for investigation of lexico-grammar), column totals and the grand total.
The data in cross-tab tables can be effectively visualized using mosaic plots.
We can test the statistical significance of the relationship between variables in a two-way cross-tab table (i.e. a table with one linguistic and one explanatory variable) using the chi-squared test. The effect sizes reported are Cramér’s V (overall effect) and probability or odds ratios (individual effects).
Logistic regression is a sophisticated multivariable method for analysing the effect of different predictors (both categorical and scale) on a categorical (typically binary) outcome variable.
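The effect sizes named in the summary can be illustrated with the which/that cross-tabulation from the earlier slides. This is a minimal Python sketch using standard definitions that the slides themselves do not spell out: Cramér's V as the square root of chi-squared divided by N times (k - 1), the probability ratio as a ratio of two row proportions, and the odds ratio as a ratio of two odds. Variable names are illustrative.

```python
import math

# Cells of the 2 x 2 cross-tabulation (relativizer by separator).
which_sep, which_no_sep = 1396, 804
that_sep, that_no_sep = 191, 7281
n = which_sep + which_no_sep + that_sep + that_no_sep

# Pearson chi-squared via the 2 x 2 shortcut: N(ad - bc)^2 / product of marginals.
row_which = which_sep + which_no_sep
row_that = that_sep + that_no_sep
col_sep = which_sep + that_sep
col_no_sep = which_no_sep + that_no_sep
chi2 = (
    n
    * (which_sep * that_no_sep - which_no_sep * that_sep) ** 2
    / (row_which * row_that * col_sep * col_no_sep)
)

# Overall effect: Cramér's V, where k is the smaller table dimension (2 here).
k = 2
cramers_v = math.sqrt(chi2 / (n * (k - 1)))

# Individual effects: how much more likely a separator is with which than with that.
probability_ratio = (which_sep / row_which) / (that_sep / row_that)
odds_ratio = (which_sep / which_no_sep) / (that_sep / that_no_sep)

print(f"Cramér's V = {cramers_v:.2f}")
print(f"probability ratio = {probability_ratio:.1f}")
print(f"odds ratio = {odds_ratio:.1f}")
```

The odds ratio is larger than the probability ratio because odds grow much faster than probabilities as the proportion approaches 1, so the two measures should not be compared directly.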