Mar-23 H.S. 1 Stata: Linear - PowerPoint Presentation

Uploaded On 2023-08-23

Presentation Transcript

1. Stata: Linear Regression
   Hein Stigum (H.S.), Mar-23
   Presentation, data and programs at: Stata Course - Institutt for helse og samfunn (uio.no)

2. Introduction: DAG

3. DAG: Gestational age and Birthweight
   DAG nodes: E = gest age, D = birth weight, C1 = sex, C2 = education
   Birth weight analysis
   - Continuous outcome
   - Plots of birth weight by gestational age
   - Compare means
   - Linear regression

4. Agenda
   - Purpose
   - Workflow
   - Syntax
   - Testing assumptions
   - Influence

5. Background

6. Regression idea

7. Model, measure and assumptions
   Model (standard): equation on the slide, with terms labeled exposure and confounder
   Association measure: b1 = change in y for one unit increase in x1
   Assumptions for the standard model (can relax the assumptions)
   - Independent residuals: discuss
   - No interactions: test
   - Linear effects: test
   - Constant residual variance: plot/test
   Influence of outliers: plot
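The model equation itself did not survive in the transcript. A sketch of the standard two-covariate linear model that matches the slide's labels is given here; the symbols $x_2$ and $\varepsilon$ are assumptions, chosen to agree with the b0, b1, b2 notation used on the interaction slide.

```latex
% Standard linear model (assumed form): x_1 = exposure, x_2 = confounder
y = b_0 + b_1 x_1 + b_2 x_2 + \varepsilon
% Association measure: holding x_2 fixed, a one-unit increase in x_1
% changes the expected outcome by b_1
E[y \mid x_1 + 1, x_2] - E[y \mid x_1, x_2] = b_1
```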

8. Purpose of regression
   Estimation (DAGs, bias, precision)
   - Estimate effect of exposure on outcome, adjusted for other covariates
   - Example: estimate the effect of smoking on lung cancer
   Prediction (predictive power, model fit, R2)
   - Predict outcome by exposures
   - Example: estimate a model (air pollution and distance from roads), then predict air pollution in a new dataset using distance from roads

9. Outcome distributions by exposure
   [Figure panels, each labeled with a suggested analysis: linear regression; cutoff, then logistic regression; linear regression, or log-transform then linear regression]

10. Workflow
   DAG (E = gest age, D = birth weight, C1 = sex, C2 = education)
   - Confounders: education, so adjust
   - Risk factors: sex, so include
   Scatter- and density plots
   Bivariate analysis
   Regression
   - Model estimation
   - Test of assumptions: independent residuals, no interactions, linear effects, constant error variance
   - Influence: influence of outliers

11. Syntax: “3 Linear Regression.do”, section “Analysis”

12. Density and scatter plots
   Scatter of birth weight by gestational age
   - Look for deviations from linearity and outliers
   Distribution of birth weight for low/high gestational age
   - Look for shift in mean, shift in shape
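The slide shows the plots but not the commands. A minimal sketch of both plots follows, using simulated stand-in data because the course dataset is not part of this transcript. The simulated coefficients, the seed, and the cut at 38 weeks for gest2 are assumptions for illustration only; the variable names bw, gest, sex and gest2 follow the slides.

```stata
* Simulated stand-in data (all numbers are made up for illustration)
clear
set seed 12345
set obs 500
generate gest  = min(42, max(28, round(rnormal(39, 2.5))))   // weeks
generate sex   = runiform() < 0.5                            // 0/1 indicator
generate bw    = 3500 + 160*(gest - 40) - 120*sex + rnormal(0, 450)
generate gest2 = gest >= 38                                  // assumed cut point

* Scatter with a lowess smoother: look for non-linearity and outliers
twoway (scatter bw gest) (lowess bw gest)

* Densities for low and high gestational age: look for shift in mean or shape
twoway (kdensity bw if gest2 == 0) (kdensity bw if gest2 == 1), ///
    legend(order(1 "gest < 38" 2 "gest >= 38"))
```

The lowess curve is an addition to the plain scatter; it is one common way to make deviations from a straight line visible.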

13. Bi-variate
   Weight by sex (continuous by binary)
      ttest bw, by(sex)        t-test
   Weight by education (continuous by categorical-3)
      anova bw educ            one way anova
   Weight by gest. age (continuous by continuous)
      regress bw gest          regression
      ttest bw, by(gest2)      cut in 2, t-test

14. Bi-variate result
   [Results table: birth weight in g]

15. Syntax for linear regression
   Estimation
      regress y x1 x2              linear regression
      regress y c.age i.sex        continuous age, categorical sex
      regress y c.age##i.sex       main + interaction
   Compare models
      estimates store m1           save model
      estimates table m1 m2        compare coefficients
      estimates stats m1 m2        compare model fit
   Post estimation
      predict res, residuals       predict residuals in new “res”

16. Factor (categorical) variables
   Variable
      educ = 1, 2, 3 for Low, Medium and High education
   Built in
      i.educ              use educ=1 as base (reference)
      ib3.educ            use educ=3 as base (reference)
      help fvvarlist      help for factor variables
   Manual “dummies” (note: margins and contrast require i.var notation)
      * educ=1 as base, make dummies for 2 and 3
      generate Medium =(educ==2) if educ<.
      generate High =(educ==3) if educ<.

17. Continuous variables
   Variable: gestational age ranging from 28 to 42 weeks (mode=40)
   Built in
      c.gest              default except for interactions
   Advice: do not categorize continuous variables in a final analysis! (Royston, Altman et al. 2006)
   - Loss of power
   - Increased measurement error
   - Spurious interaction
   - Whether exposure, confounder (or outcome)
   - Need methods for non-linear effects (polynomials, splines)

18. Syntax: “Regression analysis”

19. Model 1: outcome + exposure
      regress bw gest          crude model
      estimates store m1       store model results

20. Model 2 and 3: Add covariates
      regress bw gest i.educ sex       add covariates
      estimates table m1 m2 m3         compare coefs
   Estimate association: m1 is biased, m2 = m3
   Is m3 more precise? m2: se(gest) = 4.3; m3: se(gest) = 4.2
   (Robinson and Jewell 1991; VanderWeele 2008)
   Conclusion: m1 is biased, m2 and m3 are unbiased, but m3 is more precise
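The slide shows only one regress command, so the exact covariate sets of m2 and m3 are not spelled out. The sketch below assumes, from the workflow DAG, that m2 adjusts for the confounder educ only and that m3 also includes the risk factor sex. It assumes a dataset with bw, gest, educ and sex is in memory.

```stata
* Assumed model sequence; variable names follow the slides
regress bw gest                  // m1: crude
estimates store m1

regress bw gest i.educ           // m2: adjust for the confounder (assumed)
estimates store m2

regress bw gest i.educ sex       // m3: also include the risk factor sex
estimates store m3

* Show only the exposure coefficient with its standard error
estimates table m1 m2 m3, keep(gest) b(%8.1f) se(%8.1f)
```

Restricting the table to gest keeps the comparison on the exposure coefficient and its precision, which is the quantity the slide discusses.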

21. Influence: Measures of influence
   (would normally handle assumptions first)

22. Influence idea (different data)
   [Figure annotation: delta beta*se = -6.8]

23. Measures of influence
   Measure change in:
   - Coefficients (beta)
   - Delta beta (scaled by se(coeff))
   Procedure: remove obs 1, see change; remove obs 2, see change; and so on
   Result: one delta-beta per observation (per covariate; here the exposure)
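The delta-beta described above can be written out as follows. This is a sketch in assumed notation ($\hat\beta_{(i)}$ for the coefficient estimated without observation $i$); software differs slightly in which residual variance enters the standard error, so treat the denominator as schematic.

```latex
% Delta-beta for observation i: change in the coefficient when
% observation i is removed, scaled by the coefficient's standard error
\Delta\beta_i = \frac{\hat\beta - \hat\beta_{(i)}}{\operatorname{se}(\hat\beta)}
% Back on the original scale, the change in the coefficient is
\hat\beta - \hat\beta_{(i)} = \Delta\beta_i \cdot \operatorname{se}(\hat\beta)
% Example with the values quoted later in the slides:
% \Delta\beta_i = 2 and se = 4.2 give 2 \times 4.2 = 8.4 \approx 8 grams per week
```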

24. Syntax: “Influence of outliers”

25. Delta-beta for gestational age
      dfbeta gest               create delta-beta (new variable _dfbeta_1)
      scatter _dfbeta_1 id      plot vs id-variable
   Note: delta-beta is variable specific
   If obs nr 370 is removed, beta will change by 2 se’s = 2*4.2 ≈ 8 g

26. Removing outliers
      regress bw gest i.educ sex if id!=370
      est store drop1
      regress bw gest i.educ sex if id!=370 & id!=62
      est store drop2
      est table full drop1 drop2, b(%8.0f)
   (full is assumed to be the model on all observations, stored earlier with est store full)
   Conclusion: Outlier 370 had a large effect. Outlier 62 had a small effect on the “se”.

27. Assumptions

28. Assumptions of the standard model
   - Independent residuals: discuss
   - No interactions: test in model
   - Linear effects: add splines
   - Constant residual variance: plot, test
   Dependent residuals? When will the birth weight of one child depend on the birth weight of another? Siblings, twins.

29. If violations of assumptions
   - Dependent residuals: use the option vce(cluster var), or mixed models
   - Interactions: add interaction term
   - Non-linear effects: add polynomial or spline
   - Non-constant variance: use robust variance estimation
        regress y x, robust

30. Interaction (only linear effects)

31. Interaction
   Interaction: combined effect of two variables
   Example
      y = b0 + b1x + b2sex                 effect of x does not depend on sex
      y = b0 + b1x + b2sex + b3x∙sex       effect of x depends on sex (interaction)
   Test: interaction if b3≠0
   Scale
   - Linear models: additive
   - Logistic, Poisson, Cox: multiplicative
   Interaction is scale dependent: no interaction on the additive scale implies interaction on other scales
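To see why b3 carries the interaction, the second model can be differentiated with respect to x. The derivation uses only the slide's own symbols; coding sex as 0/1 is an assumption taken from the margins commands elsewhere in the deck.

```latex
% Interaction model from the slide
y = b_0 + b_1 x + b_2\,\mathrm{sex} + b_3\, x \cdot \mathrm{sex}
% Effect of a one-unit increase in x
\frac{\partial y}{\partial x} = b_1 + b_3\,\mathrm{sex}
% so the effect of x is b_1 when sex = 0 and b_1 + b_3 when sex = 1;
% the two effects coincide exactly when b_3 = 0
```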

32. Syntax: “Interaction”

33. Interaction (only linear effects)
   Add interaction terms
      regress bw c.gest##i.sex i.educ      main + gest-sex interaction
   Show results
      margins, dydx(gest) at(sex=0)        effect of gest for boys
      margins, dydx(gest) at(sex=1)        effect of gest for girls
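The margins commands display the sex-specific slopes; the interaction term itself can also be tested directly. A short sketch, assuming bw, gest, sex (coded 0/1) and educ are in memory:

```stata
* Fit the interaction model from the slide
regress bw c.gest##i.sex i.educ

* Wald test of the interaction coefficient (b3 = 0)
testparm c.gest#i.sex

* Slope of gest when sex = 1, computed as b1 + b3 with a confidence interval
lincom gest + 1.sex#c.gest
```

The lincom result should agree with margins, dydx(gest) at(sex=1), since both report the same linear combination of coefficients.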

34. Non-linear effects

35. Smoothers in regressions
   Polynomials (global): x, x^2, x^3
   Splines (local, with knots c1, c2): linear (estimates), cubic (only plots)
   Fractional polynomials (2 of 8): x^-2, x^-1, x^-0.5, log(x), x^0.5, x, x^2, x^3
   [Figures: y plotted against x for each smoother]
   (Govindarajulu, Malloy et al. 2009; Binder, Sauerbrei et al. 2013; Kahan, Rushton et al. 2016)

36. Syntax: “Linear effect”, using polynomials

37. 3rd-order polynomial
      regress bw c.gest##c.gest##c.gest i.educ sex       regression with polynomial
   Plot
      margins, at(gest=(27(1)42) sex=0.5 educ=2)         predicted bw by gest
      marginsplot                                        plot
   Conclusion: Good fit. Clear deviations from linearity.

38. Summing up: non-linear effects
   Capture non-linearities in a continuous variable
   - Categorize: lose precision
   - Polynomials or splines
   Continuous exposure
   - Replace by 3rd-order polynomial: good fit
   Continuous confounder
   - Keep linear (unless non-linear in both exposure and outcome effect)

39. Syntax: “Linear effect”, using splines

40. Cubic spline
      mkspline c=gest, cubic nknots(4)     make spline with 4 knots (c1, c2, c3)
      regress bw c1 c2 c3 i.educ sex       regression with spline
      est store cs                         store estimates as cs
   Plot
      gen igest=round(gest)                integer values of gest
      margins, over(igest)                 predicted bw by gest
      marginsplot                          plot
   Test
      est stats linear cs                  AIC
   Result: better fit

41. Cubic spline with given knots
      mkspline c=gest, cubic knots(30 32 38 40)
      regress bw c1 c2 c3 i.educ sex       regression with spline
   Result: better fit at low gest

42. Linear spline
      mkspline l1 32 l2 38 l3=gest         linear spline with knots at 32 and 38
      regress bw l1 l2 l3 i.educ sex       regression with spline
      est store ls                         store estimates as ls
   Plot (as before)
   Test (as before)
   [Result annotations: best fit; second-best fit]
   Different from categorical gest!

43. Summing up: non-linear effects
   Capture non-linearities in a continuous variable
   - Categorize: lose precision
   - Fractional polynomials or splines are better
   Continuous exposure
   - Replace by cubic spline: good fit, only plot
   - Replace by linear spline: good fit, estimates
   Continuous confounder
   - Keep linear (unless non-linear in both exposure and outcome effect)

44. Constant residual variance

45. Test constant residual variance
   Plot
      rvfplot                  plot residual versus predicted (fitted)
   Test
      estat hettest
   Or, compare se’s with and without “robust”
   Result: some heteroscedasticity
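The comparison of standard errors with and without robust variance can be sketched as below. The stored-model names ols and rob are hypothetical, and the covariate set is assumed to match the adjusted model used elsewhere in the deck.

```stata
* Ordinary (model-based) standard errors, then the plot and test from the slide
regress bw gest i.educ sex
estimates store ols
rvfplot                          // residuals versus fitted values
estat hettest                    // Breusch-Pagan / Cook-Weisberg test

* Same model with robust (sandwich) standard errors
regress bw gest i.educ sex, vce(robust)
estimates store rob

* Coefficients are identical; only the standard errors can differ
estimates table ols rob, b(%8.1f) se(%8.1f)
```

estat hettest is run after the non-robust fit because it is not available after estimation with vce(robust).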

46. Syntax: “Constant residual variance”

47. Final model
   Linear spline model with robust variance estimation:
      regress bw l1 l2 l3 i.educ sex, robust
      est store lsr
   [Table: estimates and se, with and without robust variance]
   Prefer the robust model
   Conclusion:
   - At 27-32 weeks the birth weight increases by 104 g per week
   - At 32-38 weeks the birth weight increases by 345 g per week
   - At 38-42 weeks the birth weight increases by 35 g per week

48. Correct model for effect of education
   Interpret other covariate effects from the model?
   - Exposure gest: educ is a confounder, adjust
   - Exposure educ: gest is a mediator, do not adjust
   [DAGs over gest, bw and educ; table comparing the gest and educ coefficients in the final model and in a separate educ model]
   (Westreich and Greenland 2013)
   Conclusion: Effect of education is misleading in the final model. Need a separate model for each covariate.

49. Help
   Linear regression
      help regress                       syntax and options
      help regress postestimation        dfbeta, estat hettest, rvfplot, predict, margins
      help factor variables              factor variables and interactions

50. Summing up 1: Model fitting
   Build model
      regress bw gest                  crude model
      est store m1                     store
      regress bw gest i.educ sex       full model
      est store m2
      est table m1 m2                  compare coefficients

51. Summing up 2: Assumptions
   Independent residuals: discuss
   No interaction
      regress bw c.gest##i.sex i.educ      test interaction
      margins, dydx(gest) at(sex=0)        gest for boys
   Linear effects
      mkspline g1 38 g2=gest               linear spline
      regress bw g1 g2 i.sex i.educ        estimate splines
   Constant residual variance
      rvfplot                              residual versus fitted
      regress …, robust                    robust variance

52. Summing up 3: Influence of outliers
   Influence
      dfbeta gest                          delta-beta
      scatter _dfbeta_1 id                 plot versus id

53. References
   Binder H, Sauerbrei W, Royston P. 2013. Comparison between splines and fractional polynomials for multivariable model building with continuous covariates: A simulation study with continuous response. Stat Med 32:2262-2277.
   Daniel R, Zhang J, Farewell D. 2020. Making apples from oranges: Comparing noncollapsible effect estimators and their standard errors after adjustment for different covariate sets. Biom J.
   Govindarajulu US, Malloy EJ, Ganguli B, Spiegelman D, Eisen EA. 2009. The comparison of alternative smoothing methods for fitting non-linear exposure-response relationships with cox models in a simulation study. Int J Biostat 5.
   Kahan BC, Rushton H, Morris TP, Daniel RM. 2016. A comparison of methods to adjust for continuous covariates in the analysis of randomised trials. BMC medical research methodology 16.
   Robinson LD, Jewell NP. 1991. Some surprising results about covariate adjustment in logistic-regression models. Int Stat Rev 59:227-240.
   Royston P, Altman DG, Sauerbrei W. 2006. Dichotomizing continuous predictors in multiple regression: A bad idea. Stat Med 25:127-141.
   VanderWeele TJ, Hernan MA, Robins JM. 2008. Causal directed acyclic graphs and the direction of unmeasured confounding bias. Epidemiology 19:720-728.
   Westreich D, Greenland S. 2013. The table 2 fallacy: Presenting and interpreting confounder and modifier coefficients. AJE 177:292-298.
   Xing C, Xing GA. 2010. Adjusting for covariates in logistic regression models. Genet Epidemiol 34:937-937.

54. Extra material

55. Test deviation from linearity
      regress y x1 x2              linear term
      estimates store lin
      regress y f(x1) x2           smoother term, f(x1) = poly or spline
      estimates store smo
      lrtest lin smo               LR-test (nested models)
      estimates stats lin smo      AIC
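A concrete instance of this comparison, sketched for the birth weight example with a 3rd-order polynomial standing in for f(x1). It assumes bw, gest, educ and sex are in memory; the model names lin and smo follow the slide.

```stata
* Linear term for gest
regress bw gest i.educ sex
estimates store lin

* Smoother term: 3rd-order polynomial in gest (nests the linear model)
regress bw c.gest##c.gest##c.gest i.educ sex
estimates store smo

* Likelihood-ratio test of the added polynomial terms
lrtest lin smo

* AIC and BIC side by side; this also works for non-nested models
estimates stats lin smo
```

The likelihood-ratio test applies here because the linear model is a special case of the polynomial one. For a spline versus a polynomial, which are not nested, AIC is the relevant comparison.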

56. Table 1
   Outcome: Birth weight
   Exposure: Gestational age
   Covariates:

57. Table 2
   Do not show coefficients from cofactors, they may be misleading (Westreich and Greenland 2013)