Lecture 5
Agenda
Basic Contingency Table Analysis
R x C Contingency Tables
Pearson Chi-square test of association
Stratified 2x2 tables
Cochran-Mantel-Haenszel test of association
Breslow-Day test of interaction
Simple Logistic Regression
Modeling dichotomous outcomes
Odds Ratios and Logistic Regression
Western Collaborative Group Study (WCGS)
Large-scale prospective cohort study designed to examine risk factors for cardiovascular disease.
Main outcome is coronary heart disease (chd69)
1 = yes, 0 = no
Primary risk factor is personality type (dibpat)
1 = type A, 0 = type B
Other risk factors collected as well:
Blood pressure, cholesterol, smoking, age, arcus senilis
Contingency Table Analysis
Historical precursor to logistic regression models.
Which variables are associated with coronary heart disease?
Examples:
Arcus senilis (1 = present, 0 = not present)
Cholesterol (1 = < 200, 2 = 200 to 240, 3 = > 240)
Smoking (1 = 0 cigarettes/day, 2 = 1-10, 3 = 11-20, 4 = > 20)
Chi-square Test of Independence
Test of association between categorical variables based on the Pearson chi-square statistic:
$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, compared with a chi-square distribution with $(r-1)(c-1)$ degrees of freedom
r = # of rows (levels of variable 1)
c = # of columns (levels of variable 2)
Compares the observed cell count ($O_{ij}$) with the cell count expected if the variables were independent ($E_{ij}$ = row i total × column j total / n).
H0: variables are independent ↔ no association
H1: variables are not independent ↔ association
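The statistic above can be computed directly from a table of counts. The sketch below is illustrative only: the function names `chi2_sf` and `pearson_chi_square` are hypothetical, and the p-value uses the closed-form chi-square tail for integer degrees of freedom rather than any statistics package. As input it uses the CHD yes/no counts by cholesterol group that appear in the 3 x 2 example later in this lecture.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction sum
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))                   # 2 * (1 - Phi(sqrt(x)))
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)     # phi(sqrt(x))
    term, total = root, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)
        total += term
    return tail + 2 * density * total

def pearson_chi_square(table):
    """Return (X2, df, p-value) for an r x c table of observed counts."""
    r, c = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_tot)
    stat = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n    # E_ij under independence
            stat += (table[i][j] - expected) ** 2 / expected
    df = (r - 1) * (c - 1)
    return stat, df, chi2_sf(stat, df)

table = [[31, 800], [84, 1121], [142, 964]]   # CHD yes/no for cholgrp 1, 2, 3
stat, df, p = pearson_chi_square(table)
print(f"X2 = {stat:.2f} on {df} df, p = {p:.2g}")
```

For a 2 x 2 table the same function gives the 1 d.f. test; PROC FREQ reports this Pearson statistic with the chisq option.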
Basic 2 x 2 Table
Is arcus senilis associated with CHD?
H0: arcus senilis and CHD are independent
H1: arcus senilis and CHD are dependent
Chi-square test with 1 d.f.
Reject H0: conclude that CHD and arcus senilis are not independent.
Example: 2 x 2 Table
Odds ratio interpretation
The odds of CHD are 1.63 times higher for those with arcus senilis present compared with those without it.
Relative risk interpretation
The risk (probability) of CHD is 1.56 times higher for those with arcus senilis present compared with those without it.
Use the relrisk option in proc freq to obtain these estimates for this table.
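As a sketch of the arithmetic behind these two estimates, assume the cells are laid out as a = arcus and CHD, b = arcus and no CHD, c = no arcus and CHD, d = no arcus and no CHD. The counts below are hypothetical (the WCGS arcus table is not reproduced here), and the function names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio: odds of outcome in exposed (a/b) over unexposed (c/d)."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """Ratio of outcome risks: a/(a+b) in exposed vs. c/(c+d) in unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts, not the WCGS data
a, b, c, d = 30, 70, 20, 80
print(round(odds_ratio(a, b, c, d), 3))     # 1.714
print(round(relative_risk(a, b, c, d), 3))  # 1.5
```

When the risk ratio exceeds 1, the odds ratio is always further from 1 than the relative risk, which is consistent with the slide's OR (1.63) being larger than its RR (1.56).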
Stratified 2x2 Table
Arcus senilis is a condition associated with fatty deposits in the eye.
It is also caused by widening of the eye vessels with age, which makes it easier for fat to deposit.
It may therefore be of interest to control for cholesterol and age.
Stratification allows us to divide a 2x2 table with respect to a third (and possibly a fourth) variable.
Stratified 2x2 Table
Below we stratify by cholesterol group:
chol < 200, 200 <= chol < 240, chol > 240
The Cochran-Mantel-Haenszel (CMH) method allows us to test the association and compute the OR and RR adjusted for the stratifying variable (cholesterol group). Use proc freq with the cmh option to calculate the adjusted statistics.
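The adjusted OR and RR reported by the cmh option are Mantel-Haenszel weighted combinations of the stratum-specific tables. A minimal sketch of the standard Mantel-Haenszel estimators, assuming each stratum is given as (a, b, c, d) with a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases; the function name and the strata are hypothetical.

```python
def mantel_haenszel(strata):
    """Mantel-Haenszel common odds ratio and relative risk across 2x2 strata."""
    or_num = or_den = rr_num = rr_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        or_num += a * d / n          # OR: sum(a*d/n) / sum(b*c/n)
        or_den += b * c / n
        rr_num += a * (c + d) / n    # RR: sum(a*(c+d)/n) / sum(c*(a+b)/n)
        rr_den += c * (a + b) / n
    return or_num / or_den, rr_num / rr_den

# Three hypothetical strata (e.g. three cholesterol groups), not the WCGS data
common_or, common_rr = mantel_haenszel([(12, 88, 8, 92), (20, 80, 14, 86), (35, 65, 25, 75)])
print(round(common_or, 3), round(common_rr, 3))
```

If every stratum has the same OR, the Mantel-Haenszel estimate returns that common value exactly.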
Stratified 2x2 Table
To stratify by cholesterol group, add it to the tables statement ahead of the row and column variables (stratum*row*column).
The adjusted test of association appears in the Cochran-Mantel-Haenszel Statistics table.
H0: zero association between arcus and CHD after adjusting for cholesterol group
H1: nonzero association between arcus and CHD after adjusting for cholesterol group
Stratified 2x2 Table
Since p = .004 < .05, reject H0 and conclude there is enough evidence that arcus and CHD are associated after adjusting for cholesterol group.
All three CMH statistics above (nonzero correlation, row mean scores differ, general association) are identical for stratified 2x2 tables.
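For stratified 2x2 tables the CMH statistic sums, across strata, the observed minus expected count in the (exposed, case) cell, squares that sum, and divides by the summed hypergeometric variances. The sketch below assumes the same (a, b, c, d) stratum layout as before, omits any continuity correction, and uses hypothetical strata; the function name is illustrative.

```python
import math

def cmh_test(strata):
    """CMH chi-square statistic (1 df, no continuity correction) and its p-value."""
    diff = var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n                              # observed minus expected a
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))     # hypergeometric variance of a
    stat = diff ** 2 / var
    p = math.erfc(math.sqrt(stat / 2))    # upper tail of a chi-square with 1 df
    return stat, p

stat, p = cmh_test([(12, 88, 8, 92), (20, 80, 14, 86), (35, 65, 25, 75)])  # hypothetical strata
print(round(stat, 3), round(p, 4))
```

With a single stratum this reduces to (n - 1)/n times the Pearson chi-square for that table.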
Stratified 2x2 Table
The adjusted odds ratio and relative risk suggest elevated risk of CHD with presence of arcus.
The effect is a little smaller than the unadjusted effect.
After adjusting for cholesterol group, the odds of CHD are 1.47 (1.13, 1.93) times higher for arcus vs. no arcus.
After adjusting for cholesterol group, the risk of CHD is 1.42 (1.12, 1.80) times higher for arcus vs. no arcus.
Stratified 2x2 Table
To conduct the previous analysis, we must assume that the OR (or RR) is the same in all strata.
Breslow-Day test of interaction between stratum and group
H0: $OR_1 = OR_2 = OR_3$ (the odds ratio is the same in every cholesterol stratum)
H1: the stratum odds ratios are not all equal
Breslow-Day test p-value = .8288 > .05. Fail to reject H0: there is no evidence that the odds ratio differs across strata, so the common-OR assumption appears reasonable.
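A sketch of the Breslow-Day homogeneity statistic, under stated assumptions: strata are (a, b, c, d) tuples as before with all cells positive, the common OR is the Mantel-Haenszel estimate, and Tarone's adjustment is omitted. For each stratum, the expected exposed-case count is the value that reproduces the common OR while keeping that stratum's margins fixed, found by solving a quadratic. The function name and the strata are hypothetical.

```python
import math

def breslow_day(strata):
    """Breslow-Day homogeneity statistic (no Tarone adjustment) and its df (K - 1)."""
    # Mantel-Haenszel common odds ratio
    psi = (sum(a * d / (a + b + c + d) for a, b, c, d in strata)
           / sum(b * c / (a + b + c + d) for a, b, c, d in strata))
    stat = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        n1, m1 = a + b, a + c                  # exposed-row total, case-column total
        lo, hi = max(0, n1 + m1 - n), min(n1, m1)
        if abs(psi - 1.0) < 1e-12:
            ea = n1 * m1 / n                   # expected a when the common OR is 1
        else:
            # Expected a solves a*(n - n1 - m1 + a) = psi*(n1 - a)*(m1 - a)
            qa = 1.0 - psi
            qb = (n - n1 - m1) + psi * (n1 + m1)
            qc = -psi * n1 * m1
            disc = math.sqrt(qb * qb - 4 * qa * qc)
            roots = ((-qb + disc) / (2 * qa), (-qb - disc) / (2 * qa))
            ea = next(r for r in roots if lo - 1e-9 <= r <= hi + 1e-9)
        eb, ec, ed = n1 - ea, m1 - ea, n - n1 - m1 + ea
        var = 1.0 / (1 / ea + 1 / eb + 1 / ec + 1 / ed)
        stat += (a - ea) ** 2 / var
    return stat, len(strata) - 1

# Three hypothetical strata (not the WCGS data)
stat, df = breslow_day([(12, 88, 8, 92), (20, 80, 14, 86), (35, 65, 25, 75)])
p = math.exp(-stat / 2)    # chi-square upper tail; this closed form holds for df = 2 only
print(round(stat, 3), df, round(p, 3))
```

With three cholesterol strata the statistic is compared with a chi-square on 2 df; a large p-value, as on the slide, means no evidence against a common OR.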
Stratified 2x2 Table: Flowchart
Breslow-Day test of interaction
If p ≥ .05: interpret the common OR (or RR) and report the CMH test of association.
If p < .05: interpret the OR (or RR) separately within each stratum and report the test of association for each stratum.
More General Contingency Tables
Cholgrp = 1, 2, or 3 based on categorization of a subject's cholesterol level.
Is cholesterol group associated with CHD?
H0: cholesterol group and CHD are independent
H1: cholesterol group and CHD are dependent
Example: 3 x 2 table
Reject H0: cholesterol group is not independent of coronary heart disease.
Summarize this by calculating odds ratios relative to a reference group.
OR 2 vs. 1 = (84/1121)/(31/800) = 1.93
The odds of CHD are about twice as high for cholgrp = 2 vs. 1.
OR 3 vs. 1 = (142/964)/(31/800) = 3.80
The odds of CHD are 3.8 times higher for cholgrp = 3 vs. 1.
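The two odds ratios above follow one pattern: the odds of CHD in each group divided by the odds in the reference group. A short sketch using the counts shown on this slide (CHD cases, non-cases); the function name is illustrative.

```python
# CHD counts by cholesterol group from the 3 x 2 example: group -> (CHD cases, non-cases)
counts = {1: (31, 800), 2: (84, 1121), 3: (142, 964)}

def odds_ratios_vs_reference(counts, ref):
    """Odds of CHD in each non-reference group divided by the odds in the reference group."""
    ref_cases, ref_noncases = counts[ref]
    ref_odds = ref_cases / ref_noncases
    return {g: (cases / noncases) / ref_odds
            for g, (cases, noncases) in counts.items() if g != ref}

ors = odds_ratios_vs_reference(counts, ref=1)
for group, value in ors.items():
    print(f"OR {group} vs. 1 = {value:.2f}")   # 1.93 for group 2, 3.80 for group 3
```

With k groups this yields k - 1 odds ratios, the same structure that indicator coding produces in logistic regression.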
Example: 3 x 4 table
Is cholesterol group associated with CHD type?
H0: cholesterol group and CHD type are independent
H1: cholesterol group and CHD type are dependent
Contingency Table Analysis
Contingency table analysis is useful for descriptive purposes.
Some limitations:
As dimensions increase, it becomes harder to summarize the direction of association.
Cannot estimate the association of categorical and continuous variables (continuous variables must be categorized).
Multivariable modeling beyond one or two stratifying variables is cumbersome.
Logistic Regression
Linear regression describes how the mean of a continuous outcome is affected by independent predictors.
The mean is directly related to the covariates.
A dichotomous outcome (Y) takes on the value 0 or 1 with probabilities 1-p and p, respectively.
Logistic Regression
The mean of a dichotomous outcome is equal to the probability of a “positive” outcome, p = P(Y=1):
$\mu = 1 \cdot p + 0 \cdot (1-p) = p$
Linear regression cannot be used directly:
$0 \le p \le 1$, but linear regression may estimate p > 1 or p < 0, since nothing constrains its predictions.
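Logistic Regression
Instead of directly modeling p, we model the log-odds of a positive outcome:
$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$
Parameters can be interpreted in terms of odds ratios.
Re-expressing the previous equation in terms of p, we can see that 0 < p < 1 is guaranteed:
$p = \frac{\exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}{1 + \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}$
$\beta_j$ is the adjusted log-odds ratio comparing unit differences in $x_j$ (other covariates held fixed), so $e^{\beta_j}$ is the corresponding odds ratio.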
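To see why $e^{\beta_j}$ is an odds ratio, the sketch below maps two linear predictors that differ by one unit of x back to probabilities and compares their odds. The coefficients are hypothetical and chosen only for illustration; the inverse logit is written as $1/(1 + e^{-\eta})$, which equals the expression above.

```python
import math

def inverse_logit(eta):
    """Map a log-odds value to a probability; mathematically always in (0, 1).

    For moderate eta only: math.exp overflows for eta below about -709, and for
    large |eta| the float result can round to exactly 0 or 1.
    """
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients for illustration only
beta0, beta1 = -2.0, 0.5
p_at_x = inverse_logit(beta0 + beta1 * 3.0)         # x = 3
p_at_x_plus_1 = inverse_logit(beta0 + beta1 * 4.0)  # x = 4
print(odds(p_at_x_plus_1) / odds(p_at_x), math.exp(beta1))
```

The two printed numbers agree up to floating-point rounding, whatever starting value of x is used.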
Example: Arcus Senilis vs. CHD
Using logistic regression, define:
$X_i = 1$ if subject i has arcus senilis present, $X_i = 0$ if subject i does not have arcus senilis present
The model then defines the log-odds of CHD as:
$\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 X_i$
Logistic Regression
OR = $e^{.4918}$ = 1.635
Interpretation:
The odds of CHD are 1.63 times higher among subjects with arcus compared with those without arcus.
This is a 63% increase in the odds of CHD among subjects with arcus.
Statistically significant (p = .0002).
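With a single binary predictor, the logistic regression OR equals the cross-product OR from the 2 x 2 table, which is why 1.635 here matches the earlier 2 x 2 analysis. A minimal sketch, assuming hypothetical counts: it fits $\beta_0, \beta_1$ by plain Newton-Raphson (no step-halving or convergence check) and compares $e^{\beta_1}$ with ad/(bc). This is a teaching sketch, not how proc logistic is implemented.

```python
import math

def fit_logistic_binary(x, y, iters=25):
    """Fit log(p/(1-p)) = b0 + b1*x by Newton-Raphson maximum likelihood."""
    b0 = b1 = 0.0
    for _ in range(iters):
        # Score (gradient) and observed information matrix entries
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ score, for a 2x2 matrix
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

a, b, c, d = 30, 70, 20, 80            # hypothetical counts: X=1 (a cases, b non-cases), X=0 (c, d)
x = [1] * (a + b) + [0] * (c + d)
y = [1] * a + [0] * b + [1] * c + [0] * d
b0, b1 = fit_logistic_binary(x, y)
print(round(math.exp(b1), 4), round(a * d / (b * c), 4))
```

The intercept also has a table interpretation: $\beta_0$ is the log-odds of CHD in the X = 0 group, log(c/d).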
Logistic Regression
Important considerations in proc logistic:
Descending: defines the numerator of the odds to be P(Y=1), so Odds = p/(1-p). Without it, proc logistic models the lowest-ordered outcome value (here P(Y=0)).
Param=ref: uses indicator variables for categorical independent variables.
Ref=last sets the last alphanumeric category as the reference group.
Ref=first sets the first alphanumeric category as the reference group.
Logistic Regression
Polychotomous predictors
Pick a baseline (reference) group.
Set up a series of indicators for all other groups.
With k groups, this gives k-1 odds ratios comparing each group to the baseline group.
Continuous predictors
Compute the odds at two levels that differ by 1 unit; the odds ratio is then the exponentiated coefficient for the predictor.
The odds ratio comparing c-unit differences in the predictor is $OR^c$, where OR is the 1-unit odds ratio.
Confidence intervals for c-unit OR comparisons are the 1-unit endpoints raised to the c-th power.
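Reference-cell coding for a polychotomous predictor can be sketched as follows. The function name is hypothetical; it sorts the raw category values (which matches alphanumeric order for simple codes like 1, 2, 3 but not necessarily for string codes such as "10" vs. "2") and mimics the idea of ref=first / ref=last, not SAS's exact rules.

```python
def reference_indicators(values, ref="last"):
    """Reference-cell (0/1 indicator) coding for a categorical variable.

    Categories are sorted; ref="first" or ref="last" picks the reference
    category, and one indicator column is built for each of the other k-1.
    """
    levels = sorted(set(values))
    reference = levels[0] if ref == "first" else levels[-1]
    others = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in others] for v in values]
    return others, rows

levels, design = reference_indicators([1, 2, 3, 2, 1], ref="first")
print(levels)   # [2, 3]
print(design)   # [[0, 0], [1, 0], [0, 1], [1, 0], [0, 0]]
```

Each indicator's exponentiated coefficient is then the OR for that group versus the reference group.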
Example: Cholesterol Group vs. CHD
Cholgrp 2 vs. 1: OR = 1.933 = exp(.6592)
The odds of CHD for cholesterol between 200 and 240 are approximately twice the odds of CHD for cholesterol < 200 (statistically significant, p = .0022).
Cholgrp 3 vs. 1: OR = 3.80 = exp(1.3351)
The odds of CHD for cholesterol > 240 are 3.8 times the odds of CHD for cholesterol < 200 (statistically significant, p < .0001).
Example: Cholesterol vs. CHD
1-unit OR = exp(.0124) = 1.0125
The odds of CHD are about 1.25% higher per unit increase in total cholesterol (statistically significant, p < .0001).
30-unit OR = exp(30 × .0124) = $1.0125^{30}$ ≈ 1.45
The odds of CHD are 45% higher per 30-unit increase in total cholesterol (statistically significant, p < .0001).
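The c-unit rule can be checked numerically with the slide's coefficient (.0124). The 1-unit confidence interval used below is hypothetical, since the slide does not report one; it only illustrates raising each endpoint to the c-th power. The function name is illustrative.

```python
import math

def c_unit_or(beta, c, ci=None):
    """OR for a c-unit difference, exp(c*beta) = (1-unit OR)**c.

    If a 1-unit confidence interval (lower, upper) is given, each endpoint
    is raised to the c-th power.
    """
    or_c = math.exp(c * beta)
    if ci is None:
        return or_c, None
    lower, upper = ci
    return or_c, (lower ** c, upper ** c)

# Coefficient from the cholesterol example; the 1-unit CI (1.009, 1.016) is hypothetical
or30, ci30 = c_unit_or(0.0124, 30, ci=(1.009, 1.016))
print(round(or30, 2), tuple(round(v, 2) for v in ci30))
```

Raising the endpoints works because the interval for $c\beta$ is c times the interval for $\beta$, and exponentiation preserves order.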
Multivariable Logistic Regression
Basic principles of covariate adjustment and effect modification from linear regression carry over to logistic regression.
Use additional covariates to:
Control for potential confounders and other covariates
Build a stronger predictive model for the outcome
Describe effect modification (interaction)
Example: Adjust for Categorical Confounder
Suppose we now want to adjust for cholesterol group.
Arcus OR = 1.475 = exp(.3888)
The odds of CHD among subjects with arcus senilis present are approximately 50% higher than among subjects without it, adjusting for cholesterol group (statistically significant, p = .0042).
Example: Adjust for Categorical Confounder
For cholesterol group, we must first look at the type III p-value to determine overall significance.
The overall joint effect of cholesterol group is significant (p < .0001).
OR cholgrp 2 vs. 1 = 1.884 = exp(.6332), significant (p = .0033)
OR cholgrp 3 vs. 1 = 3.563 = exp(1.2706), significant (p < .0001)
Example: Adjust for Categorical Confounder
Is cholesterol group a potential confounder?
Adjusted estimate = .3888
Unadjusted estimate = .4918
% change = (.3888 - .4918)/.3888 = -26.5%, i.e., the arcus coefficient changes by 26.5% in magnitude.
This suggests it is important to adjust for cholesterol group.
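The percent-change calculation used for these confounding checks can be written as a one-line function. This sketch follows the slide's convention of dividing by the adjusted estimate; the function name is illustrative. Dividing by the unadjusted estimate instead, another convention in use, would give a smaller figure here (about -20.9%).

```python
def percent_change(adjusted, unadjusted):
    """Percent change in a coefficient, using the adjusted estimate as the base."""
    return 100.0 * (adjusted - unadjusted) / adjusted

print(round(percent_change(0.3888, 0.4918), 1))   # -26.5 (cholesterol group)
print(round(percent_change(0.3658, 0.4918), 1))   # -34.4 (continuous cholesterol)
```

The same function reproduces the multiple-confounder comparison, (.1699 - .3888)/.1699.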
Example: Adjust for Continuous Confounder
Adjusting for cholesterol as a continuous covariate is an alternative option.
This requires the assumption that the log-odds is linearly associated with cholesterol.
The odds of CHD among subjects with arcus senilis present are approximately 44% higher than among subjects without it, adjusting for cholesterol level. The effect is significant (p = .0076).
Example: Adjust for Continuous Confounder
Is cholesterol a potential confounder?
Adjusted estimate = .3658
Unadjusted estimate = .4918
% change = (.3658 - .4918)/.3658 = -34.4%, a 34.4% change in magnitude.
This suggests it is important to adjust for cholesterol.
Example: Adjust for Multiple Confounders
Now suppose we wish to adjust for cholesterol, age, and smoking status (1 = ncigs > 0, 0 = ncigs = 0).
Example: Adjust for Multiple Confounders
Arcus: % change = (.1699 - .3888)/.1699 = -128.8%, a 128.8% change in magnitude.
It is important to adjust for these covariates.