1. Lexico-grammar: From simple counts to complex models. Brezina, V. (2018). Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.
2. “If these words are not essential to the meaning of your sentence, use ‘which’ and separate the words with a comma” (Microsoft 2010).
3. Think about and discuss: What is a grammatical rule? Do you think grammatical rules apply in all cases?
4. Where to start?
5. Lexico-grammar: Research design. Linguistic feature design: a linguistic/outcome variable and explanatory variables/predictors.
6. Lexico-grammatical frame: the opportunity of use (obligatory place), i.e. the place where variation happens in text. E.g. NOUN + which/that + relative clause.
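As a rough illustration of how a lexico-grammatical frame can be turned into a search, the sketch below scans part-of-speech-tagged tokens for NOUN + (optional separator) + which/that. The sample sentences, the tag names and the function `find_frame_candidates` are all hypothetical and not taken from the book. A surface pattern like this over-generates (for example, it also catches demonstrative or complementizer that), so the hits are only candidates that would still need manual checking.

```python
# Hypothetical mini-sample of (token, POS tag) pairs; the tag names are assumptions.
TAGGED = [
    ("the", "DET"), ("book", "NOUN"), ("that", "PRON"), ("I", "PRON"),
    ("read", "VERB"), (".", "PUNCT"),
    ("the", "DET"), ("house", "NOUN"), (",", "PUNCT"), ("which", "PRON"),
    ("is", "VERB"), ("old", "ADJ"), (".", "PUNCT"),
]

RELATIVIZERS = {"which", "that"}
SEPARATORS = {",", "–"}  # comma or dash, as in the separator variable


def find_frame_candidates(tagged):
    """Return (noun, relativizer, has_separator) for each candidate frame."""
    hits = []
    for i, (word, tag) in enumerate(tagged):
        if tag != "NOUN":
            continue
        j = i + 1
        # An optional separator may stand between the noun and the relativizer.
        has_separator = j < len(tagged) and tagged[j][0] in SEPARATORS
        if has_separator:
            j += 1
        if j < len(tagged) and tagged[j][0].lower() in RELATIVIZERS:
            hits.append((word, tagged[j][0].lower(), has_separator))
    return hits


candidates = find_frame_candidates(TAGGED)
for noun, relativizer, has_separator in candidates:
    print(noun, relativizer, "separator" if has_separator else "no separator")
```

Recording the separator alongside each hit gives exactly the two variables (relativizer and presence of separator) that a cross-tabulation needs.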
7. Lexico-grammatical frame (cont.)

| Research question | Outcome variable options | Lexico-grammatical frame |
|---|---|---|
| When do we use the passive construction? | ACTIVE, PASSIVE | All verb forms that can be used in passive, i.e. transitive verbs. |
| In what contexts do we use which and in what contexts that in relative clauses? | which, that | All relative clauses. |
| When do speakers use that deletion? E.g. I think Ø this is good. | that, Ø [no relativizer] | All clauses where that occurs or is deleted. |
| What is the difference between various modal expressions of strong obligation? | must, have to, need to | All contexts in which strong deontic modals occur. |
8. Lexico-grammatical vs. ‘ambient’ variables. It's about time that was done [BNC, file: KBB]. Well, you know, it you see, time were, I don't know I suppose, I don't know but I never seemed to be afraid... [BNC, file: HDK].
9. Cross-tabulation and mosaic plot

Rows: relativizer. Columns: presence of separator. Percentages are based on row totals.

| Relativizer | Separator (, or –) | No separator | Total |
|---|---|---|---|
| which | 1,396 (63%) | 804 (37%) | 2,200 |
| that | 191 (3%) | 7,281 (97%) | 7,472 |
| Total | 1,587 | 8,085 | 9,672 |
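The row percentages in the cross-tabulation can be reproduced from the raw counts. The sketch below is a minimal illustration using only the frequencies shown on the slide; the dictionary layout and the helper name `row_percentages` are assumptions made for the example.

```python
# Observed frequencies from the cross-tabulation (relativizer x separator).
counts = {
    "which": {"separator": 1396, "no separator": 804},
    "that": {"separator": 191, "no separator": 7281},
}


def row_percentages(table):
    """Express each cell as a rounded percentage of its row total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: round(100 * n / total) for col, n in cells.items()}
    return result


row_pct = row_percentages(counts)
for row, cells in row_pct.items():
    print(row, cells)
```

Row percentages answer the question "given this relativizer, how often is there a separator?", which is why they are the natural choice when the rows hold the linguistic variable.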
10. Cross-tabulation and mosaic plot (cont.)
11. Percentages and chi-squared test. Percentages in a cross-tabulation table. Chi-squared.
Assumptions:
- Independence of observations.
- Expected frequencies greater than 5 (in contingency tables larger than 2 × 2, at least 80% of expected frequencies greater than 5).
Alternative tests: log-likelihood test (also known as likelihood ratio test or G test) or the Fisher exact test.
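To make the test concrete, here is a minimal sketch that computes expected frequencies, the chi-squared statistic and the log-likelihood (G) statistic for the which/that by separator cross-tabulation (1,396 / 804 / 191 / 7,281). It uses only the Python standard library; the function names are illustrative, and the p-value shortcut via `math.erfc` is valid only for a 2 × 2 table (one degree of freedom). It is not the book's own implementation.

```python
import math

# Rows: which, that. Columns: separator, no separator.
observed = [[1396, 804], [191, 7281]]


def expected_frequencies(obs):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_squared(obs, expected_table):
    """Pearson chi-squared: sum of (O - E)^2 / E over all cells."""
    return sum(
        (o - e) ** 2 / e
        for row_o, row_e in zip(obs, expected_table)
        for o, e in zip(row_o, row_e)
    )


def log_likelihood(obs, expected_table):
    """G statistic: 2 * sum of O * ln(O / E); empty cells contribute 0."""
    return 2 * sum(
        o * math.log(o / e)
        for row_o, row_e in zip(obs, expected_table)
        for o, e in zip(row_o, row_e)
        if o > 0
    )


expected = expected_frequencies(observed)
chi2 = chi_squared(observed, expected)
g2 = log_likelihood(observed, expected)

# For df = 1, the chi-squared upper-tail probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Check the expected-frequency assumption for a 2 x 2 table.
assumption_ok = all(e > 5 for row in expected for e in row)

print(round(chi2, 1), round(g2, 1), p_value, assumption_ok)
```

The independence assumption cannot be checked from the table itself; it depends on how the observations were sampled from the corpus.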
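12. Complex model: logistic regression

| Variety | Separator (, or –) | Clause | Syntax | that | which | Total |
|---|---|---|---|---|---|---|
| American | NO | Non-restrictive | Object | 3 | 1 | 4 |
| American | NO | Non-restrictive | Subject | 11 | 2 | 13 |
| American | NO | Restrictive | Object | 18 | 2 | 20 |
| American | NO | Restrictive | Subject | 126 | 5 | 131 |
| American | YES | Non-restrictive | Object | 0 | 6 | 6 |
| American | YES | Non-restrictive | Subject | 0 | 20 | 20 |
| American | YES | Restrictive | Object | 0 | 0 | 0 |
| American | YES | Restrictive | Subject | 1 | 0 | 1 |
| British | NO | Non-restrictive | Object | 3 | 2 | 5 |
| British | NO | Non-restrictive | Subject | 3 | 8 | 11 |
| British | NO | Restrictive | Object | 14 | 8 | 22 |
| British | NO | Restrictive | Subject | 76 | 15 | 91 |
| British | YES | Non-restrictive | Object | 0 | 4 | 4 |
| British | YES | Non-restrictive | Subject | 0 | 31 | 31 |
| British | YES | Restrictive | Object | 1 | 0 | 1 |
| British | YES | Restrictive | Subject | 0 | 0 | 0 |
| Total | | | | 256 | 104 | 360 |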
13. Logistic regression. Outcome: nominal, (ordinal). Predictors: nominal, ordinal, scale.
14. Logistic regression: Dataset. Columns: the linguistic/outcome variable and the explanatory variables/predictors. Variable types labelled on the slide: one scale, five nominal.
15. Logistic regression: output

| | Estimate (log odds) | Standard Error | Z value (Wald) | p-value | Estimate (odds) | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|
| (Intercept) | -3.354 | 0.563 | -5.958 | 0.000 | 0.035 | 0.011 | 0.099 |
| VarietyB_BR | 1.667 | 0.397 | 4.195 | 0.000 | 5.296 | 2.511 | 12.080 |
| SeparatorB_YES | 3.985 | 0.825 | 4.832 | 0.000 | 53.795 | 12.876 | 376.448 |
| ClauseB_Non_restr | 2.046 | 0.446 | 4.588 | 0.000 | 7.733 | 3.235 | 18.812 |
| SyntaxB_Subject | -0.614 | 0.421 | -1.460 | 0.144 | 0.541 | 0.240 | 1.260 |
| Length | 0.079 | 0.029 | 2.739 | 0.006 | 1.083 | 1.023 | 1.147 |

Model 1 with predictor variables (‘Variety’, ‘Separator’, ‘Clause type’, ‘Syntax’ and ‘Length’) is significant (LL: 222.31; p < .0001) and has outstanding classification properties (C-index: 0.91).
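The relationship between the two estimate columns can be checked directly: the odds are the exponentiated log odds. The sketch below also shows how the estimates combine into a predicted probability for one context. Two things here are assumptions rather than facts from the slides: that the outcome is coded so the model estimates the odds of which over that (this is suggested by the positive estimates for separator and non-restrictive clauses, but not stated), and the example Length value of 10, whose unit the slides do not give.

```python
import math

# Log-odds estimates copied from the output table.
COEFFICIENTS = {
    "(Intercept)": -3.354,
    "VarietyB_BR": 1.667,
    "SeparatorB_YES": 3.985,
    "ClauseB_Non_restr": 2.046,
    "SyntaxB_Subject": -0.614,
    "Length": 0.079,
}

# Odds = exp(log odds); tiny mismatches with the table come from rounding.
odds_table = {name: round(math.exp(b), 3) for name, b in COEFFICIENTS.items()}


def predicted_probability(british, separator, non_restrictive, subject, length):
    """Probability of the modelled outcome (ASSUMED to be 'which').

    The first four arguments are 0/1 indicators; length is the scale predictor.
    """
    logit = (
        COEFFICIENTS["(Intercept)"]
        + COEFFICIENTS["VarietyB_BR"] * british
        + COEFFICIENTS["SeparatorB_YES"] * separator
        + COEFFICIENTS["ClauseB_Non_restr"] * non_restrictive
        + COEFFICIENTS["SyntaxB_Subject"] * subject
        + COEFFICIENTS["Length"] * length
    )
    return 1 / (1 + math.exp(-logit))


# Hypothetical contexts (length = 10 is an arbitrary illustrative value).
p_low = predicted_probability(british=0, separator=0, non_restrictive=0,
                              subject=0, length=10)
p_high = predicted_probability(british=1, separator=1, non_restrictive=1,
                               subject=1, length=10)

print(odds_table)
print(round(p_low, 3), round(p_high, 3))
```

Because the effects add up on the log-odds scale, no single predictor decides the outcome; each one shifts the probability up or down relative to the baseline context.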
16. “If these words are not essential to the meaning of your sentence, use ‘which’ and separate the words with a comma” (Microsoft 2010). Was the computer right after all? If the computer's suggestion is taken as a categorical rule, the answer is certainly ‘no’. Multiple factors combine to favour or disfavour the use of which (and that), and these factors have to be interpreted as probabilities (or odds, to be precise), not certainties.
17. Things to remember
- When analysing lexico-grammatical variation we need to pay attention to individual linguistic contexts and define a lexico-grammatical frame.
- Cross-tabulation can be used for a simple analysis of categorical variables. In addition to frequencies, cross-tab tables can also include percentages based on row totals (most useful for investigation of lexico-grammar), column totals and the grand total.
- The data in cross-tab tables can be effectively visualized using mosaic plots.
- We can test the statistical significance of the relationship between variables in a two-way cross-tab table (i.e. a table with one linguistic and one explanatory variable) using the chi-squared test. The effect sizes reported are Cramér's V (overall effect) and probability or odds ratios (individual effects).
- Logistic regression is a sophisticated multivariable method for analysing the effect of different predictors (both categorical and scale) on a categorical (typically binary) outcome variable.
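The effect sizes named above can be sketched for the which/that by separator cross-tabulation (1,396 / 804 / 191 / 7,281). This is a minimal illustration, not the book's procedure: the variable names are invented for the example, the Cramér's V shortcut applies only to a 2 × 2 table, and the direction of the ratios (which relative to that) is a choice made here.

```python
import math

# 2 x 2 cross-tabulation: rows = which, that; columns = separator, no separator.
a, b = 1396, 804    # which: separator, no separator
c, d = 191, 7281    # that:  separator, no separator

# Cramér's V (overall effect). For a 2 x 2 table it reduces to
# |ad - bc| / sqrt(product of the four marginal totals).
cramers_v = abs(a * d - b * c) / math.sqrt(
    (a + b) * (c + d) * (a + c) * (b + d)
)

# Probability ratio: P(separator | which) / P(separator | that).
probability_ratio = (a / (a + b)) / (c / (c + d))

# Odds ratio: odds of a separator with which / odds of a separator with that.
odds_ratio = (a / b) / (c / d)

print(round(cramers_v, 3), round(probability_ratio, 2), round(odds_ratio, 2))
```

Cramér's V summarises the strength of the association in the whole table, while the two ratios describe how much more likely a separator is with one relativizer than with the other.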