Analyzing and visualizing interactions in SAS 9.4
Andy Lin
IDRE Statistical Consulting
Background
Regression models effects of IVs on DVs.
E.g. does amount of time exercising predict weight loss?
Can also model the effect of an IV modified by another IV, the moderating variable (MV).
E.g. is the effect of exercise time on weight loss modified by the type of exercise?
Effect modification = interaction
Background
Interactions are products of IVs, typically entered into the regression along with the IVs themselves.
All we get out of the regression is a coefficient, which is not enough to understand the interaction:
What are the conditional effects?
What are the simple effects and slopes?
What are the conditional interactions?
Purpose of seminar
Demonstrate methods to estimate, test, and graph effects within an interaction.
Specifically, we will use PROC PLM to:
Calculate and test simple effects
Compare simple effects
Graph simple effects
Main effects vs interaction models
Main effects models:
IV effects are constrained to be the same across levels of all other IVs in the model.
The main effect of height is constrained to be the same across sexes (an average of the male and female height effects).

weight = β0 + βs·SEX + βh·HEIGHT
Main effects vs interaction models
Interaction models:
Allow the effect of an IV to vary with levels of another IV.
The interaction is formed as the product of the 2 IVs.
Now the effect of height may vary between sexes, and the effect of sex may vary at different heights.

weight = β0 + βs·SEX + βh·HEIGHT + βsh·SEX·HEIGHT
Simple effects and slopes
From this equation we can derive sex-specific regression equations:
Males (sex=0)
Females (sex=1)
Simple effects and slopes
Each sex has its own height effect:
Males (sex=0)
Females (sex=1)
These are the simple slopes of height within each group.
The interaction coefficient is the difference in simple slopes.
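The equations on these two slides were images that did not survive extraction; under the interaction model weight = β0 + βs·SEX + βh·HEIGHT + βsh·SEX·HEIGHT, the sex-specific equations presumably work out to:

```latex
% Males (SEX = 0):
\text{weight} = \beta_0 + \beta_h\,\text{HEIGHT}
% Females (SEX = 1):
\text{weight} = (\beta_0 + \beta_s) + (\beta_h + \beta_{sh})\,\text{HEIGHT}
```

So the simple slope of height is βh for males and βh + βsh for females; their difference is the interaction coefficient βsh.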
PROC PLM
We use PROC PLM for most of our analyses.
PROC PLM performs post-estimation analyses and graphing.
It uses an "item store" as input, which contains model information (coefficients and covariance matrices).
The item store is created in other procs, including GLM, GENMOD, LOGISTIC, PHREG, MIXED, GLIMMIX, and more.
PROC PLM
Important PROC PLM statement used in this seminar:
ESTIMATE statement
Forms linear combinations of coefficients and tests them against 0.
Very flexible: linear combinations can be means, effects, contrasts, etc.
We use it to estimate and compare simple slopes.
Syntax is a bit more difficult.
PROC PLM
Important PROC PLM statements used in this seminar:
SLICE statement
Specifically analyzes simple effects. Very simple syntax.
LSMESTIMATE statement
Compares estimated marginal means, i.e. calculates simple effects. More versatile than SLICE.
PROC PLM
LSMEANS statement
Estimates marginal means and can calculate differences between them.
EFFECTPLOT statement
Plots predicted values of the outcome across the range of values of 1 or more predictors.
Can visualize interactions. Many types of plots.
Why PROC PLM?
Many of these statements are found in the regression procs themselves, so why use PROC PLM?
We do not have to rerun the model as we run code for the interaction analysis.
These statements sometimes have more functionality in PROC PLM.
Dataset used in seminar
Study of average weekly weight loss achieved by 900 subjects in 3 exercise programs.
Important variables:
loss - continuous, normal outcome - average weekly weight loss
hours - continuous predictor - average weekly hours of exercise
effort - continuous predictor - average weekly rating of exertion when exercising, ranging from 0 to 50
Dataset used in the seminar
Important variables, cont.:
prog - 3-category predictor - which exercise program the subject followed, 1=jogging, 2=swimming, 3=reading (control)
female - binary predictor - gender, 0=male, 1=female
satisfied - binary outcome - subject's overall satisfaction with weight loss due to participation in the exercise program, 0=unsatisfied, 1=satisfied
Continuous-by-continuous: the model
We first model the interaction of 2 continuous IVs.
The effect of a continuous IV on the outcome is called a slope, which expresses the change in the outcome per unit increase in the IV.
With the interaction of 2 continuous variables, the slope of each IV is allowed to vary with the other IV: these are simple slopes.
Continuous-by-continuous: the model
Let us look at a model where Y is predicted by continuous X, continuous Z, and their interaction:

Y = β0 + βx·X + βz·Z + βxz·X·Z

Be careful when interpreting βx and βz: they are simple effects (when the interacting variable = 0), not main effects.
Continuous-by-continuous: the model
The coefficient βxz is interpreted as the change in the simple slope of X per unit increase in Z.
Equation for the simple slope of X:
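The formula itself was an image that did not survive extraction; for the model Y = β0 + βx·X + βz·Z + βxz·X·Z it is presumably:

```latex
\frac{\partial Y}{\partial X} = \beta_x + \beta_{xz} Z
```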
Continuous-by-continuous: example model
We regress loss on hours, effort, and their interaction.
Is the effect of hours modified by the effort that the subject exerts?
And the converse: is the effect of effort modified by hours?
Continuous-by-continuous: example model

proc glm data=exercise;
  model loss = hours|effort / solution;
  store contcont;
run;

The "|" requests main effects and interactions.
solution requests the table of regression coefficients.
store contcont creates an item store of the model for PROC PLM.
Continuous-by-continuous: example model
The interaction is significant.
Remember that the hours and effort terms are simple slopes.
Continuous-by-continuous: calculating simple slopes
The ESTIMATE statement is used to form linear combinations of regression coefficients, including simple slopes (and effects). Very flexible.
Understanding the regression equation is very helpful when coding ESTIMATE statements.
Estimate statement syntax
estimate 'label' coefficient values / e;
E.g. to estimate expected loss when hours=2 and effort=30:

proc plm restore=contcont;
  estimate 'pred loss, hours=2, effort=30' intercept 1 hours 2 effort 30 hours*effort 60 / e;
run;

The regression coefficients are multiplied by their values and summed to form the estimate, which is tested against 0.
We see that the values are correct, and we get a test against 0 (not interesting here).
Continuous-by-continuous: calculating simple slopes
Let's revisit the formula for the simple slope of X moderated by Z.
In the ESTIMATE statement, we will put a 1 after βx and the value of z after βxz.
In our model, X = hours and Z = effort.
Continuous-by-continuous: calculating simple slopes
At what values of effort do we evaluate the simple slopes of hours? Two common choices:
Substantively important values (education=12yrs, BMI=18, temperature=98.6, etc.)
Data-driven values (mean, mean+sd, mean-sd)
There are no a priori important values of effort, so we choose (mean-sd, mean, mean+sd) = (24.52, 29.66, 34.8).
Continuous-by-continuous: calculating simple slopes

proc plm restore=contcont;
  estimate
    'hours, effort=mean-sd' hours 1 hours*effort 24.52,
    'hours, effort=mean'    hours 1 hours*effort 29.66,
    'hours, effort=mean+sd' hours 1 hours*effort 34.8
    / e;
run;
Continuous-by-continuous: calculating simple slopes
We might be interested in whether those simple slopes are different, but we don't need to test it. Why?
If the moderator is continuous and the interaction is significant, then the simple slopes will always be different.
We demonstrate a difference test to show this.
Continuous-by-continuous: calculating simple slopes
To get the difference between simple slopes, take the difference between values across coefficients in the ESTIMATE statement:

  hours 1 hours*effort 29.66
- hours 1 hours*effort 24.52
= hours 0 hours*effort 5.14
Continuous-by-continuous: calculating simple slopes
Coefficients with 0 values can be omitted:

proc plm restore=contcont;
  estimate 'diff slopes, mean+sd - mean' hours*effort 5.14;
run;

Same t-value and p-value as the interaction coefficient.
Continuous-by-continuous: graphing simple slopes
We use the EFFECTPLOT statement in PROC PLM to plot the predicted outcome across a range of values of the predictors.
We will plot across the range of 2 predictors to depict an interaction.
Simple slopes as contour plots

proc plm source=contcont;
  effectplot contour (x=hours y=effort);
run;
Simple slopes as contour plots
Contour plots are uncommon, but it is nice that both continuous variables are represented continuously.
Simple slopes of hours are horizontal lines across the graph.
The more the color changes along the line, the steeper the slope.
Simple slopes as a fit plot

proc plm source=contcont;
  effectplot fit (x=hours) / at(effort=24.52 29.66 34.8);
run;

Effort will not be represented continuously, so we must specify the values we want.
A separate graph will be plotted for each value of effort.
Simple slopes as a fit plot
More easily understood. But why not all 3 on one graph?
Creating a custom graph through scoring
We can make the graph ourselves by getting predicted loss values across a range of hours at the 3 selected effort values (24.52, 29.66, 34.8):
Create a dataset of hours and effort values at which to predict the outcome loss.
Use the SCORE statement in PROC PLM to predict the outcome and its 95% confidence interval.
Use the scored dataset in PROC SGPLOT to create the plot.
Creating a custom graph through scoring

data scoredata;
  do effort = 24.52, 29.66, 34.8;
    do hours = 0 to 4 by 0.1;
      output;
    end;
  end;
run;

proc plm source=contcont;
  score data=scoredata out=plotdata predicted=pred lclm=lower uclm=upper;
run;

proc sgplot data=plotdata;
  band x=hours upper=upper lower=lower / group=effort transparency=0.5;
  series x=hours y=pred / group=effort;
  yaxis label="predicted loss";
run;
Creating a custom graph through scoring
Purty!
Quadratic effect: the model
A special case of a continuous-by-continuous interaction: the interaction of an IV with itself.
Allows the (linear) effect of the IV to vary depending on the level of the IV itself.
Models a curvilinear relationship between DV and IV.
Quadratic effect: the model
The regression equation with linear and quadratic effects of continuous predictor X:

Y = β0 + βx·X + βxx·X²

βx is still interpreted as the slope of X when X=0.
The interpretation of βxx is slightly different: it represents ½ the change in the slope of X when X increases by 1 unit.
Quadratic effect: the model
To get the formula for the simple slope of X, we must use the partial derivative.
Here we see that the slope of X changes by 2βxx per unit increase in X.
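The derivative on this slide was lost in extraction; for the quadratic model Y = β0 + βx·X + βxx·X² it is presumably:

```latex
\frac{\partial Y}{\partial X} = \beta_x + 2\beta_{xx} X
```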
Quadratic effect: example model
We regress loss on the linear and quadratic effect of hours:

proc glm data=exercise order=internal;
  model loss = hours|hours / solution;
  store quad;
run;
Quadratic effect: example model
The quadratic effect is significant.
The negative sign indicates that the slope becomes more negative as hours increases (inverted U-shaped curve): diminishing returns on increasing hours.
Quadratic effect: calculating simple slopes
We construct ESTIMATE statements for simple slopes in the same way as before.
BUT, we must be careful to multiply the value after the quadratic effect by 2: we put a 1 after βx and the value of 2·x after βxx.
There are no a priori important values of hours, so we choose mean-sd=1.5, mean=2, and mean+sd=2.5.
Quadratic effect: calculating simple slopes

proc plm restore=quad;
  estimate
    'hours, hours=mean-sd (1.5)' hours 1 hours*hours 3,
    'hours, hours=mean (2)'      hours 1 hours*hours 4,
    'hours, hours=mean+sd (2.5)' hours 1 hours*hours 5
    / e;
run;

Slopes decrease as hours increase, eventually becoming non-significant.
Quadratic effect: comparing simple slopes
We do not need to compare: significance is always the same as the interaction coefficient.
Quadratic effect: graphing the quadratic effect
The "fit" type of EFFECTPLOT is made for plotting the outcome vs a single continuous predictor:

proc plm restore=quad;
  effectplot fit (x=hours);
run;
Quadratic effect: graphing the quadratic effect
Diminishing returns are apparent.
Too many hours of exercise may lead to weight gain.
Continuous-by-categorical: the model
We can also estimate the simple slopes in a continuous-by-categorical interaction: the slope of the continuous variable within each category of the categorical variable.
We could also look at the simple effects of the categorical variable across levels of the continuous variable.
First, how do categorical variables enter regression models?
Categorical predictors and dummy variables
A categorical predictor with k categories can be represented by k dummy variables.
Each dummy codes for membership in a category, where 0=non-membership and 1=membership.
However, typically only k-1 dummies are entered into the regression model.
Each dummy is a linear combination of all the other dummies (collinearity), and the regression model cannot estimate a coefficient for a collinear predictor.
Categorical predictors and dummy variables
The omitted category is known as the reference category.
All effects of a categorical variable in the regression table are comparisons with the reference group.
SAS by default will use the last category as the reference.
Categorical predictors and dummy variables
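The coding table on this slide was an image lost to extraction; for the 3-category prog variable it presumably looks like:

prog         d1   d2   d3
1 jogging     1    0    0
2 swimming    0    1    0
3 reading     0    0    1

With reading (the last category) as the reference, only d1 and d2 enter the model.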
Interaction of dummy variables and a continuous variable
To interact the dummy variables with a continuous predictor, multiply each one by the continuous variable.
Any interaction involving an omitted dummy will be omitted as well.
Continuous-by-categorical: the model
Here is the regression equation for a continuous variable, X, interacted with a 3-category categorical predictor, M:

Y = β0 + βx·X + βm1·M1 + βm2·M2 + βxm1·X·M1 + βxm2·X·M2

βx is the simple slope of X for M=3.
βm1 and βm2 are the simple effects of M when X=0.
βxm1 and βxm2 represent differences in the slope of X when M=1 and M=2, and differences in the simple effects of M per unit change in X.
Continuous-by-categorical: the model
Formulas for simple slopes:
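The formulas themselves were images lost to extraction; for continuous X interacted with 3-category M (reference category M=3), the simple slopes are presumably:

```latex
\text{slope of } X \mid M=1 : \beta_x + \beta_{xm1} \\
\text{slope of } X \mid M=2 : \beta_x + \beta_{xm2} \\
\text{slope of } X \mid M=3 : \beta_x
```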
Continuous-by-categorical: example model
We regress loss on hours, prog (3-category), and their interaction:

proc glm data=exercise order=internal;
  class prog;
  model loss = hours|prog / solution;
  store catcont;
run;

Put prog on the CLASS statement to declare it categorical.
Use order=internal to order prog by numeric value rather than by formats.
Continuous-by-categorical: example model
Notice the 0 coefficients for the reference groups.
The interaction is significant overall.
Continuous-by-categorical: calculating simple slopes
Here are the formulas for our simple slopes again.
SAS will accept the first two formulas for estimates of the simple slopes in ESTIMATE statements.
But the ESTIMATE statement for the slope of X when M=3 REQUIRES the inclusion of the coefficient for the interaction of X and (M=3), even though it is constrained to 0.
We don't normally need to calculate the slope in the reference group, nor compare it to other slopes, so this is not usually a huge problem.
Continuous-by-categorical: calculating simple slopes

proc plm restore=catcont;
  estimate
    'hours slope, prog=1 jogging'  hours 1 hours*prog 1 0 0,
    'hours slope, prog=2 swimming' hours 1 hours*prog 0 1 0,
    'hours slope, prog=3 reading'  hours 1 hours*prog 0 0 1
    / e;
run;

Notice the inclusion of the zero coefficients in the estimate of the slope when M=3.
Continuous-by-categorical: calculating simple slopes
Increasing hours increases weight loss in the jogging and swimming programs, and lessens loss in the reading program.
Notice that the last estimate appears in the regression table as the hours coefficient.
Potential pitfall
When calculating a simple slope or effect, do not omit interaction coefficients; otherwise SAS will average over those coefficients.
Let's pretend we forgot to include the 0 interaction coefficient in the estimate of the hours slope when M=3:

proc plm restore=catcont;
  estimate 'hours slope, prog=3 reading (wrong)' hours 1 / e;
run;
Potential pitfall
The e option gives us the estimate coefficients.
SAS applied values of 0.333 to all 3 interaction coefficients, averaging their effects.
Continuous-by-categorical: calculating simple slopes
We again take differences in values across coefficients to test differences in simple slopes:

  hours 1 hours*prog 1 0 0
- hours 1 hours*prog 0 1 0
= hours 0 hours*prog 1 -1 0
Continuous-by-categorical: calculating simple slopes

proc plm restore=catcont;
  estimate
    'diff slopes, prog=1 vs prog=2' hours*prog -1 1 0,
    'diff slopes, prog=1 vs prog=3' hours*prog -1 0 1,
    'diff slopes, prog=2 vs prog=3' hours*prog 0 -1 1
    / e;
run;

Slopes in prog=1 and prog=2 do not differ.
The other 2 comparisons are regression coefficients.
Continuous-by-categorical: graphing slopes
The slicefit type of EFFECTPLOT plots the outcome against a continuous predictor on the x-axis, with separate lines by a second predictor (typically categorical, but it can be continuous):

proc plm source=catcont;
  effectplot slicefit (x=hours sliceby=prog) / clm;
run;

The option clm adds confidence limits.
Continuous-by-categorical: graphing slopes
Easy to see the direction of the effects, and that the slopes in jogging and swimming do not differ.
Categorical-by-categorical: the model
The interaction of a categorical variable X with 2 categories and M with 3 produces 6 interaction dummies.
Any interaction dummy formed from an omitted dummy will be omitted as well: 4 of the 6 will be omitted because of collinearity.
Categorical-by-categorical: the model
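The tables on these slides were images lost to extraction; for 2-category X (reference X=1) and 3-category M (reference M=3), the six interaction dummies are presumably:

X0*M1   kept
X0*M2   kept
X0*M3   omitted (involves M3)
X1*M1   omitted (involves X1)
X1*M2   omitted (involves X1)
X1*M3   omitted (involves X1 and M3)

Any product involving an omitted dummy is itself omitted, leaving only X0*M1 and X0*M2 in the model.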
Categorical-by-categorical: the model
Regression equation modeling the interaction of X and M:

Y = β0 + βx·X0 + βm1·M1 + βm2·M2 + βx0m1·X0·M1 + βx0m2·X0·M2

βx is the simple effect of X (X=0 vs X=1) for M=3.
βm1 and βm2 are the simple effects of M when X=1.
βx0m1 and βx0m2 represent differences in the effects of X when M=1 and M=2, or differences in the effects of M when X=0.
Think of simple effects as differences in expected means
Simple effects represent differences between the mean outcome of 2 groups that belong to different categories on one predictor.
For instance, the simple effect of X when M=1 is the difference between the mean outcome when X=0, M=1 and the mean outcome when X=1, M=1.
Simple effects expressed as differences in means
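The table on this slide was an image lost to extraction; writing μ(x, m) for the mean outcome in cell (X=x, M=m), the simple effects are presumably:

```latex
\text{effect of } X \text{ at } M=m:\; \mu_{0,m} - \mu_{1,m} \\
\text{effect of } M\,(1 \text{ vs } 3) \text{ at } X=x:\; \mu_{x,1} - \mu_{x,3}
```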
Categorical-by-categorical: example model

proc glm data=exercise order=internal;
  class female prog;
  model loss = female|prog / solution;
  store catcat;
run;
Categorical-by-categorical: example model
The interaction is overall significant.
Lots of omitted coefficients.
Categorical-by-categorical: estimating simple effects with the slice statement
The SLICE statement is designed for simple effect estimation. Syntax:

slice interaction_effect / sliceby= diff

interaction_effect is the interaction to be decomposed.
sliceby= specifies the variable at whose distinct levels the simple effects of the other variable will be estimated.
diff produces numerical estimates of the simple effects, instead of just a test of significance (the default).
Categorical-by-categorical: estimating simple effects with the slice statement

proc plm restore=catcat;
  slice female*prog / sliceby=prog   diff adj=bon plots=none nof e means;
  slice female*prog / sliceby=female diff adj=bon plots=none nof e means;
run;

Estimates both sets of simple effects.
Bonferroni adjustment due to multiple comparisons (adj=bon).
No plotting (hard to interpret and slow).
We suppress the somewhat redundant F-test with nof.
The means of each cell will be output with means.
Categorical-by-categorical: estimating simple effects with the slice statement
All simple effects are significant except males vs females in the reading program.
So the genders differ in the other 2 programs, and the programs differ within each gender.
Estimating simple effects with the lsmestimate statement
The LSMESTIMATE statement combines the LSMEANS and ESTIMATE statements.
Used to estimate linear combinations of estimated (marginal) means, computed as if from a balanced population.
Simple effects can be estimated through linear combinations of marginal means.
Estimating simple effects with the lsmestimate statement
Syntax:

lsmestimate effect [value, level_x level_m]...

effect is an effect made up of only categorical predictors.
value is the value to apply to the mean in the linear combination.
level_x and level_m are the ORDINAL levels of the categorical predictors defining the target mean.
For X=0 and X=1, specify 1 for X=0 and 2 for X=1.
Estimating simple effects with the lsmestimate statement

proc plm restore=catcat;
  lsmestimate female*prog
    'male-female, prog = jogging(1)'  [1, 1 1] [-1, 2 1],
    'male-female, prog = swimming(2)' [1, 1 2] [-1, 2 2],
    'male-female, prog = reading(3)'  [1, 1 3] [-1, 2 3],
    'jogging-reading, female = male(0)'    [1, 1 1] [-1, 1 3],
    'jogging-reading, female = female(1)'  [1, 2 1] [-1, 2 3],
    'swimming-reading, female = male(0)'   [1, 1 2] [-1, 1 3],
    'swimming-reading, female = female(1)' [1, 2 2] [-1, 2 3],
    'jogging-swimming, female = male(0)'   [1, 1 1] [-1, 1 2],
    'jogging-swimming, female = female(1)' [1, 2 1] [-1, 2 2]
    / e adj=bon;
run;
Estimating simple effects with the lsmestimate statement
Same estimates as the slice statement.
Comparing simple effects with the lsmestimate statement
Only the LSMESTIMATE statement, not the SLICE statement, can compare simple effects.
To compare, place 2 simple effects on the same row and reverse the values for 1:

  [1, 1 1] [-1, 2 1]
- [1, 1 2] [-1, 2 2]
= [1, 1 1] [-1, 2 1] [-1, 1 2] [1, 2 2]
Comparing simple effects with the lsmestimate statement

proc plm restore=catcat;
  lsmestimate prog*female
    'diff m-f, jog-swim'  [1, 1 1] [-1, 2 1] [-1, 1 2] [1, 2 2],
    'diff m-f, jog-read'  [1, 1 1] [-1, 2 1] [-1, 1 3] [1, 2 3],
    'diff m-f, swim-read' [1, 1 2] [-1, 2 2] [-1, 1 3] [1, 2 3],
    'diff jog-read, m - f'  [1, 1 1] [-1, 1 3] [-1, 2 1] [1, 2 3],
    'diff swim-read, m - f' [1, 1 2] [-1, 1 3] [-1, 2 2] [1, 2 3],
    'diff jog-swim, m - f'  [1, 1 1] [-1, 1 2] [-1, 2 1] [1, 2 2]
    / e adj=bon;
run;
Comparing simple effects with the lsmestimate statement
All differences are significant, although only one of them is something we didn't already know.
Categorical-by-categorical: graphing simple effects
The interaction type of EFFECTPLOT is used to plot the outcome vs two categorical predictors.
The connect option is used to connect the points:

proc plm restore=catcat;
  effectplot interaction (x=female sliceby=prog) / clm connect;
  effectplot interaction (x=prog sliceby=female) / clm connect;
run;
Graph of simple gender effects
No effect of gender in the reading program.
Graph of simple program effects
The effect of program seems stronger for females.
3-way interactions: categorical-by-categorical-by-continuous
The interaction of 3 predictors can be decomposed in many more ways than the interaction of 2.
Imagine we interact 2-category X with 3-category M and continuous Z.
How can we decompose this interaction?
3-way interactions, categorical-by-categorical-by-continuous: the model
We can estimate the conditional interaction of X and Z across levels of M:
Do X and Z interact at each level of M?
Are the X and Z interactions different across levels of M?
We can further decompose the conditional interactions of X and Z:
What are the simple slopes of Z across X, and the simple effects of X across Z?
3-way interactions, categorical-by-categorical-by-continuous: the model
We could then look at the interaction of X and M across levels of Z:
Do X and M interact at various values of Z? Are these interactions different?
Within each conditional interaction of X and M, what are the simple effects of X and M?
We can also look at the interaction of M and Z across X.
3-way interactions, categorical-by-categorical-by-continuous: the model
The regression equation can be intimidating.
Single-variable coefficients are still simple effects and slopes (but now for 2 reference levels each).
2-way interaction coefficients are conditional interactions (at the reference level of the 3rd variable).
3-way interactions: example model
We regress loss on female (2-category), prog (3-category), and hours (continuous):

proc glm data=exercise order=internal;
  class female prog;
  model loss = female|prog|hours / solution;
  store catcatcon;
run;
3-way interactions: example model
The 3-way interaction is significant.
3-way interactions: example model
Not very easy to interpret!
3-way interactions: simple slope focused analysis
Imagine our focus is estimating which groups benefit the most from increasing the weekly number of hours of exercise. This analysis is focused on the simple slopes of hours.
We approach this section by addressing questions the researcher might ask, starting with the lowest level and building up.
What are the simple slopes of Z across levels of X and M?
There are a total of 6 groups made up by X and M, and we can estimate the slope of hours in each.
We use ESTIMATE statements again:
Place a 1 after the coefficient for the slope variable by itself (e.g. hours).
Place a 1 after each 2-way interaction coefficient involving the slope variable and either of the 2 factor groups (e.g. hours*(female=0) and hours*(prog=1)).
Place a 1 after the 3-way interaction coefficient involving the slope variable and both of the factor groups (e.g. hours*(female=0, prog=1)).
3-way interaction: estimating simple slopes using the estimate statement

proc plm restore=catcatcon;
  estimate
    'hours slope, male prog=jogging'    hours 1 hours*female 1 0 hours*prog 1 0 0 hours*female*prog 1 0 0 0 0 0,
    'hours slope, male prog=swimming'   hours 1 hours*female 1 0 hours*prog 0 1 0 hours*female*prog 0 1 0 0 0 0,
    'hours slope, male prog=reading'    hours 1 hours*female 1 0 hours*prog 0 0 1 hours*female*prog 0 0 1 0 0 0,
    'hours slope, female prog=jogging'  hours 1 hours*female 0 1 hours*prog 1 0 0 hours*female*prog 0 0 0 1 0 0,
    'hours slope, female prog=swimming' hours 1 hours*female 0 1 hours*prog 0 1 0 hours*female*prog 0 0 0 0 1 0,
    'hours slope, female prog=reading'  hours 1 hours*female 0 1 hours*prog 0 0 1 hours*female*prog 0 0 0 0 0 1
    / e adj=bon;
run;
3-way interaction: estimating simple slopes using the estimate statement
Increasing the number of weekly hours of exercise significantly increases weight loss in all groups except those in the reading program, where it decreases weight loss (not significantly for females in the reading program after Bonferroni adjustment).
Are the (X*Z) conditional interactions significant?
We can now compare the simple slopes of hours between genders within each program.
This is a test of whether hours and gender interact within each program.
As always, we test differences in effects by subtracting values across coefficients.
Are the (X*Z) conditional interactions significant?

proc plm restore=catcatcon;
  estimate
    'diff hours slope, male-female prog=1' hours*female 1 -1 hours*female*prog 1 0 0 -1 0 0,
    'diff hours slope, male-female prog=2' hours*female 1 -1 hours*female*prog 0 1 0 0 -1 0,
    'diff hours slope, male-female prog=3' hours*female 1 -1 hours*female*prog 0 0 1 0 0 -1
    / e adj=bon;
run;

Males and females benefit differently from increasing the number of hours in the jogging and reading programs.
One of these interactions appears in the regression table. Which one?
Are the (X*Z) conditional interactions different?
We can test whether the conditional interactions are different from one another: does the way males and females benefit differently from increasing hours of exercise VARY between programs?
Take differences between conditional interactions.
Notice that only the 3-way interaction coefficient is left.
Are the (X*Z) conditional interactions different?

proc plm restore=catcatcon;
  estimate
    'diff diff hours slope, male-female prog=1-prog=2' hours*female*prog 1 -1 0 -1 1 0,
    'diff diff hours slope, male-female prog=1-prog=3' hours*female*prog 1 0 -1 -1 0 1,
    'diff diff hours slope, male-female prog=2-prog=3' hours*female*prog 0 1 -1 0 -1 1
    / e;
run;

All of the comparisons are significant. The differential benefit from increasing exercise hours between genders differs between all 3 programs.
3-way interaction: graphing simple slopes
We now need to partition our graphs by a third variable.
We can use the plotby= option to plot separate graphs across levels of a variable:

proc plm restore=catcatcon;
  effectplot slicefit (x=hours sliceby=female plotby=prog) / clm;
run;
3-way interaction: graphing simple slopes
Easy to see the slopes, the differences between slopes, and the interactions.
3-way interaction: simple effects focused analysis
Imagine instead we are more interested in gender differences across programs and at different hours of weekly exercise.
Similar questions can be posed.
What are the simple effects of X across M and Z?
We use LSMESTIMATE statements to estimate the simple effects of female at each level of prog, at the mean, mean-sd, and mean+sd of hours.
The at option allows us to specify hours.
For this question we could use either SLICE or LSMESTIMATE.
Estimating the simple effects of X across M and Z using lsmestimate

proc plm restore=catcatcon;
  lsmestimate female*prog
    'male-female, prog=jogging(1) hours=1.51'  [1, 1 1] [-1, 2 1],
    'male-female, prog=swimming(2) hours=1.51' [1, 1 2] [-1, 2 2],
    'male-female, prog=reading(3) hours=1.51'  [1, 1 3] [-1, 2 3]
    / e adj=bon at hours=1.51;
  lsmestimate female*prog
    'male-female, prog=jogging(1) hours=2'  [1, 1 1] [-1, 2 1],
    'male-female, prog=swimming(2) hours=2' [1, 1 2] [-1, 2 2],
    'male-female, prog=reading(3) hours=2'  [1, 1 3] [-1, 2 3]
    / e adj=bon at hours=2;
  lsmestimate female*prog
    'male-female, prog=jogging(1) hours=2.5'  [1, 1 1] [-1, 2 1],
    'male-female, prog=swimming(2) hours=2.5' [1, 1 2] [-1, 2 2],
    'male-female, prog=reading(3) hours=2.5'  [1, 1 3] [-1, 2 3]
    / e adj=bon at hours=2.5;
run;
Estimating the simple effects of X across M and Z using lsmestimate
Are the conditional interactions significant?
The overall test of each conditional interaction of female and program (at a fixed number of hours) involves tests of 2 coefficients (which are differences in simple effects), so it must be tested with a joint F-test.
The joint option on LSMESTIMATE performs a joint F-test.
Are the conditional interactions significant?

proc plm restore=catcatcon;
  lsmestimate female*prog
    'diff male-female, prog=1 - prog=2, hours=1.51' [1, 1 1] [-1, 2 1] [-1, 1 2] [1, 2 2],
    'diff male-female, prog=1 - prog=3, hours=1.51' [1, 1 1] [-1, 2 1] [-1, 1 3] [1, 2 3],
    'diff male-female, prog=2 - prog=3, hours=1.51' [1, 1 2] [-1, 2 2] [-1, 1 3] [1, 2 3]
    / e at hours=1.51 joint;
  lsmestimate female*prog
    'diff male-female, prog=1 - prog=2, hours=2' [1, 1 1] [-1, 2 1] [-1, 1 2] [1, 2 2],
    'diff male-female, prog=1 - prog=3, hours=2' [1, 1 1] [-1, 2 1] [-1, 1 3] [1, 2 3],
    'diff male-female, prog=2 - prog=3, hours=2' [1, 1 2] [-1, 2 2] [-1, 1 3] [1, 2 3]
    / e at hours=2 joint;
  lsmestimate female*prog
    'diff male-female, prog=1 - prog=2, hours=2.5' [1, 1 1] [-1, 2 1] [-1, 1 2] [1, 2 2],
    'diff male-female, prog=1 - prog=3, hours=2.5' [1, 1 1] [-1, 2 1] [-1, 1 3] [1, 2 3],
    'diff male-female, prog=2 - prog=3, hours=2.5' [1, 1 2] [-1, 2 2] [-1, 1 3] [1, 2 3]
    / e at hours=2.5 joint;
run;
Are the conditional interactions significant?
Female and prog significantly interact at hours = 1.51, 2, and 2.5.
3-way interaction: graphing simple effects
We add the at() option to an interaction plot to get a separate panel for each value of hours:

proc plm restore=catcatcon;
  effectplot interaction (x=female sliceby=prog) / at(hours = 1.51 2 2.5) clm connect;
run;
3-way interaction: graphing simple effects
The interaction is more pronounced at lower numbers of hours.
Logistic Regression
Binary (0/1) outcome, often defined as success and failure. Models how predictors affect the probability of the outcome. The probability, p, is transformed to a logit in logistic regression.Slide115
Logit transformation
The logit transforms a probability to the log-odds metric: logit(p) = log(p/(1-p)). The logit can take on any value (instead of being restricted to 0 through 1).Slide116
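As a quick numeric illustration (a Python sketch, not part of the SAS seminar), the logit and its inverse:

```python
import math

def logit(p):
    """Probability in (0, 1) -> log-odds (any real number)."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Log-odds (any real number) -> probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

assert logit(0.5) == 0.0                         # even odds -> log-odds of 0
assert abs(inv_logit(0.0) - 0.5) < 1e-12         # and back again
assert abs(inv_logit(logit(0.3)) - 0.3) < 1e-9   # round trip recovers p
```

Probabilities near 0 or 1 map to large negative or positive log-odds, which is why the linear predictor is unrestricted.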
Logistic regression
The logit of p (not p itself) is modeled as having a linear relationship with the predictors.Slide117
Non-linear relationship between p and predictors
Imagine a simple logit model where we estimate the log odds of p when X=0 and X=1:
  logit(p | X=0) = b0
  logit(p | X=1) = b0 + b1
The difference between the log-odds estimates is:
  (b0 + b1) - b0 = b1 = log(odds at X=1) - log(odds at X=0)
Remembering our logarithmic identity, log(a) - log(b) = log(a/b), and the definition of odds, p/(1-p):Slide118
Non-linear relationship between p and predictors
We substitute and get:
  b1 = log(odds at X=1 / odds at X=0)
Which we then exponentiate:
  exp(b1) = odds at X=1 / odds at X=0Slide119
Odds ratios
Exponentiated logistic regression coefficients are interpreted as odds ratios (ORs): by what factor is the odds changed per unit increase in the predictor? Or, what is the percent change in the odds per unit increase in the predictor? Odds ratios are constant across the range of the predictor; differences in probabilities are not. But ORs can be misleading without knowing the underlying probabilities.Slide120
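A short Python sketch (illustrative only, with hypothetical coefficients) shows both facts at once: in a logistic model the per-unit odds ratio is exp(b1) at every point on the predictor's range, while the per-unit change in probability varies:

```python
import math

# A hypothetical logistic model: logit(p) = b0 + b1*x
b0, b1 = -3.0, 1.1

def p(x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds(x):
    return p(x) / (1.0 - p(x))

# The odds ratio per unit increase in x is exp(b1) everywhere...
ors = [odds(x + 1) / odds(x) for x in (0, 2, 4)]
assert all(abs(r - math.exp(b1)) < 1e-9 for r in ors)

# ...but the corresponding changes in probability are not constant:
diffs = [p(x + 1) - p(x) for x in (0, 2, 4)]
assert abs(diffs[0] - diffs[1]) > 0.1  # clearly different step sizes
```

This is why the seminar pairs odds ratios with predicted probabilities later on.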
Logistic regression, categorical-by-continuous interaction: example model
We model how the odds (probability) of satisfaction is predicted by hours of exercise, program and their interaction. We can create an item store in proc logistic for proc plm.Slide121
Logistic regression, categorical-by-continuous interaction: example model
proc logistic data = exercise descending;
  class prog / param=glm order=internal;
  model satisfied = prog|hours / expb;
  store logit;
run;
descending tells SAS to model the probability of 1 instead of 0, the default. param=glm ensures we use dummy coding (rather than effect coding, the default). expb exponentiates the regression coefficients, although not all are interpreted as odds ratios.Slide122
Logistic regression, categorical-by-continuous interaction: example model
The interaction is significant.Slide123
Logistic regression cat-by-cont, calculating and graphing simple ORs
The simple slope of hours in each program yields an odds ratio when exponentiated. We use the oddsratio statement within proc logistic to estimate these simple odds ratios. A nice odds ratio plot is produced by default.Slide124
Logistic regression cat-by-cont, calculating and graphing simple ORs
proc logistic data = exercise descending;
  class prog / param=glm order=internal;
  model satisfied = prog|hours / expb;
  oddsratio hours / at(prog=all);
  store logit;
run;
The at(prog=all) option requests that the oddsratio for hours be calculated at each level of prog.Slide125
Logistic regression cat-by-cont, calculating and graphing simple ORs
Increasing weekly hours of exercise increases the odds of satisfaction in the jogging and swimming groups.Slide126
Simple odds ratios can be compared in estimate statements
This code produces the simple odds ratios in an estimate statement:
proc plm restore=logit;
  estimate 'hours OR, prog=1' hours 1 hours*prog 1 0 0,
           'hours OR, prog=2' hours 1 hours*prog 0 1 0,
           'hours OR, prog=3' hours 1 hours*prog 0 0 1 / e exp cl;
run;
This code compares them:
proc plm restore=logit;
  estimate 'ratio hours OR, prog=1/prog=2' hours*prog 1 -1 0,
           'ratio hours OR, prog=1/prog=3' hours*prog 1 0 -1,
           'ratio hours OR, prog=2/prog=3' hours*prog 0 1 -1 / e exp cl;
run;Slide127
Simple odds ratios can be compared in estimate statements
The exponentiated difference between simple slopes (the exponentiated interaction coefficient) yields a ratio of odds ratios:
ORjog / ORswim = ratio of ORs
4.109/5.079 = .809Slide128
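A one-line check of that arithmetic (Python, using the simple ORs quoted above; the jog/swim labels follow the slide):

```python
import math

or_jog, or_swim = 4.109, 5.079

# The ratio of ORs...
ratio = or_jog / or_swim

# ...equals the exponentiated difference of the two simple slopes
# (log odds ratios), i.e. the exponentiated interaction coefficient:
same = math.exp(math.log(or_jog) - math.log(or_swim))

assert abs(ratio - 0.809) < 0.001
assert abs(ratio - same) < 1e-9
```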
Predicted probabilities
Odds ratios summarize the effect of a predictor in 1 number, but can be misleading because we don't know the underlying probabilities. E.g. the OR comparing p=.001 and p=.003 is about the same as the OR comparing p=.25 and p=.5. It is a good idea to get a sense of the probabilities of the outcome across groups.Slide129
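Checking that example numerically (a Python sketch; the probabilities are the ones quoted on the slide):

```python
def odds(p):
    return p / (1 - p)

# Rare outcome: p moves from .001 to .003
or_rare = odds(0.003) / odds(0.001)

# Common outcome: p moves from .25 to .5
or_common = odds(0.5) / odds(0.25)

# Both odds ratios are about 3, even though a .002 change in probability
# and a .25 change in probability are very different in practical terms.
assert abs(or_rare - 3.006) < 0.01
assert abs(or_common - 3.0) < 1e-9
```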
The lsmeans statement for predicted probabilities
The lsmeans statement is used to estimate marginal means. The ilink option transforms estimates back to the original response metric (here, probabilities). The at option allows specification of continuous covariate values at which the means are estimated.Slide130
The lsmeans statement for predicted probabilities
proc plm source = logit;
  lsmeans prog / at hours=1.51 ilink plots=none;
  lsmeans prog / at hours=2 ilink plots=none;
  lsmeans prog / at hours=2.5 ilink plots=none;
run;Slide131
The lsmeans statement for predicted probabilities
Predicted probabilities appear in the "Mean" column.Slide132
Graphs of predicted probabilities
The effectplot statement by default plots the outcome in its original metric. We can get an idea of the simple effects and simple slopes in the probability metric with 2 effectplot statements:
proc plm restore=logit;
  effectplot interaction (x=prog) / at(hours = 1.51 2 2.5) clm;
  effectplot slicefit (x=hours sliceby=prog) / clm;
run;Slide133
Graphs of predicted probabilitiesSlide134
Graphs of predicted probabilitiesSlide135
Concluding guidelines
Guidelines for using the estimate statement to estimate simple slopes:
- Always put a 1 after the coefficient for the slope variable
- If interacted with a continuous IV (not a quadratic), put the value of the continuous IV after the interaction coefficient
- If interacted with a categorical IV, put a 1 after the relevant interaction dummy
- If involved in a 3-way interaction, make sure to include:
  - the slope coefficient alone
  - both 2-way coefficients involving the slope and either interactor
  - the 3-way coefficient involving all interactors
  - Follow the second rule above if an interaction involves a continuous IV (unless both interactors are continuous, in which case apply the product of the 2 continuous interactors)
  - Follow the third rule if the interaction involves only dummy variables
- To estimate differences, subtract values across coefficients
- Use "e" to check values and coefficients
- Use "joint" to perform a joint F-test
- Use adj= to correct for multiple comparisons
- Use exp to exponentiate estimates (for logistic and other non-linear models)Slide136
Concluding guidelines
Guidelines for using the lsmestimate statement to estimate simple effects:
- Think of simple effects as differences between means
- Assign one mean the value 1 and the other -1
- Remember to use ordinal values for categorical predictors, not the actual numeric values
- To compare simple effects, put two effects on the same row and reverse the values for one of them
- Use joint for joint F-tests
- Use adj= for multiple comparisonsSlide137
It’s over!
Thank you for attending!