Slide 1: Overview of Robust Methods Analysis
Jinxia Ma
November 7, 2013
Slide 2: Contents
What are robust methods?
Why robust methods?
How to conduct a robust methods analysis
Applying robust analysis to your data
Slide 3: What are “robust methods”?
Robust statistics are statistics that perform well for data drawn from a wide range of probability distributions, especially distributions that are not normal.
Outliers
Departures from parametric distributions
Slide 4: Why robust methods?
What is the problem with standard methodologies?
Example: Linear regression assumptions
Linearity
Independence of errors
Errors are normally distributed
Homoscedasticity
Example: Comparing groups (ANOVA F-test)
Errors are independent and normally distributed with a common variance
Slide 5: Why robust methods?
Example: Detecting differences among groups
Problem 1: Heavy-tailed distributions
Figure 1: Despite the obvious similarity between the standard normal and contaminated normal distributions, the standard normal has variance 1 and the contaminated normal has variance 10.9.
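The variance quoted in Figure 1 can be checked numerically. The sketch below assumes the contaminated normal is the common mixture that draws from N(0, 1) with probability 0.9 and from N(0, 10^2) with probability 0.1. The slide does not state the mixture, but this choice reproduces the quoted variance of 10.9. The function name `rcnorm` and its arguments are illustrative.

```r
# Contaminated normal: with probability eps draw from N(0, k^2),
# otherwise from N(0, 1). eps = 0.1 and k = 10 are assumed values.
rcnorm <- function(n, eps = 0.1, k = 10) {
  from_wide <- runif(n) < eps            # TRUE where the wide component is used
  rnorm(n) * ifelse(from_wide, k, 1)     # rescale only the contaminated draws
}

# Exact variance of the mixture (both components have mean 0).
var_exact <- (1 - 0.1) * 1^2 + 0.1 * 10^2

# Simulation check: the sample variance should land near var_exact.
set.seed(1)
x <- rcnorm(1e6)
var_sim <- var(x)
```

Only 10% of the observations come from the wide component, which is why the two density curves look so alike even though the variances differ by a factor of about eleven.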
Slide 6: Why robust methods?
Example: Detecting differences among groups
Problem 1: Heavy-tailed distributions
Figure 2: Left panel, power = 0.96. Right panel, power = 0.28 (n = 25 per group, Student’s t-test).
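The loss of power in Figure 2 can be explored with a small simulation. This is a sketch under assumptions the slide does not state: a true difference of 1 between the group means, a two-sided test at the 0.05 level, and the same 0.9/0.1 mixture of N(0, 1) and N(0, 10^2) for the contaminated normal. The estimated powers will therefore not match the figure exactly; the point is the size of the drop.

```r
# Assumed contaminated normal: N(0, 1) with probability 0.9,
# N(0, 10^2) with probability 0.1.
rcnorm <- function(n, eps = 0.1, k = 10) {
  rnorm(n) * ifelse(runif(n) < eps, k, 1)
}

# Estimate the power of Student's t-test by simulation.
# rdist: function(n) returning n draws; shift: assumed true difference.
power_t <- function(rdist, n = 25, shift = 1, nsim = 2000, alpha = 0.05) {
  rejections <- replicate(nsim, {
    x <- rdist(n)
    y <- rdist(n) + shift
    t.test(x, y, var.equal = TRUE)$p.value < alpha   # classic pooled-variance test
  })
  mean(rejections)                                   # proportion of rejections
}

set.seed(1)
power_normal <- power_t(rnorm)
power_contaminated <- power_t(rcnorm)
```

Comparing `power_normal` with `power_contaminated` shows how a small proportion of extreme values inflates the standard error and masks a real difference.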
Slide 7: Why robust methods?
Example: Detecting differences among groups
Problem 1: Heavy-tailed distributions
Figure 3: Left panel, a bivariate normal distribution with correlation .8. Middle panel, a bivariate normal distribution with correlation .2. Right panel, one marginal distribution is normal but the other is a contaminated normal; correlation .2.
Slide 8: Why robust methods?
Example: Detecting differences among groups
Problem 2: Assuming normality via the central limit theorem
Figure 4: The distribution of Student’s T, n=25, when sampling from a (standard) lognormal distribution. The dashed line is the distribution under normality.
For the real Student’s T: P(T ≤ -2.086) = P(T ≥ 2.086) = .025, E(T) = 0.
For the “lognormal T”: P(T ≤ -2.086) = .12, P(T ≥ 2.086) = .001, E(T) = -.54.
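The tail probabilities above can be approximated by simulation. The sketch below draws samples of size 25 from a standard lognormal distribution, computes the one-sample T statistic against the true mean exp(0.5), and records how often T falls beyond ±2.086. The number of replications and the variable names are illustrative choices, and the estimates will vary somewhat from the values quoted on the slide.

```r
set.seed(1)
n <- 25                      # sample size from the figure caption
nsim <- 20000                # number of simulated samples (arbitrary choice)
mu <- exp(0.5)               # true mean of the standard lognormal

# One-sample T statistic for each simulated sample.
t_stats <- replicate(nsim, {
  x <- rlnorm(n)
  (mean(x) - mu) / (sd(x) / sqrt(n))
})

p_lower <- mean(t_stats <= -2.086)   # about .025 if the data were normal
p_upper <- mean(t_stats >= 2.086)    # about .025 if the data were normal
mean_t  <- mean(t_stats)             # 0 if the data were normal
```

The lower tail is far heavier than the nominal level and the upper tail far lighter, so a sample size of 25 is not enough for the central limit theorem to rescue Student's T here.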
Slide 9: Why robust methods?
Example: Detecting differences among groups
Problem 3: Heteroscedasticity
The third fundamental insight is that violating the usual homoscedasticity assumption (i.e., that all groups have a common variance) is much more serious than once thought. Both relatively poor power and inaccurate confidence intervals can result.
Slide 10: How to test/compare robust methods?
Example: Comparing dependent groups with missing values: an approach based on a robust method
1: Simulation
2: Bootstrap
Slide 11: How to test/compare robust methods?
Example: Comparing dependent groups with missing values: an approach based on a robust method
1: Simulation
g-and-h distribution
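Let Z be a random variable generated from a standard normal distribution. Then
$$W = \begin{cases} \dfrac{\exp(gZ) - 1}{g}\,\exp\!\left(hZ^2/2\right), & g > 0 \\ Z\,\exp\!\left(hZ^2/2\right), & g = 0 \end{cases}$$
has a g-and-h distribution.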
Slide 12: How to test/compare robust methods?
Example: Comparing dependent groups with missing values: an approach based on a robust method
1: Simulation
g-and-h distribution
g = h = 0: standard normal.
g > 0: skewed; the larger the value of g, the more skewed.
h > 0: heavy-tailed; the larger the value of h, the heavier the tails.
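A g-and-h generator takes only a few lines. The sketch below assumes the standard form of the transformation, W = ((exp(gZ) - 1)/g) exp(hZ^2/2) for g > 0 and W = Z exp(hZ^2/2) for g = 0. The function names `ghtrans` and `rgh` and the moment-based helpers are illustrative and are not taken from the slides.

```r
# Transform standard normal draws z into g-and-h variates.
# g controls skewness, h controls tail heaviness.
ghtrans <- function(z, g = 0, h = 0) {
  core <- if (g == 0) z else (exp(g * z) - 1) / g
  core * exp(h * z^2 / 2)
}

# Draw n observations from a g-and-h distribution.
rgh <- function(n, g = 0, h = 0) ghtrans(rnorm(n), g, h)

set.seed(1)
z <- rnorm(1e5)
w_normal <- ghtrans(z, g = 0, h = 0)     # g = h = 0 leaves z unchanged
w_skewed <- ghtrans(z, g = 0.5, h = 0)   # skewed
w_heavy  <- ghtrans(z, g = 0, h = 0.2)   # symmetric, heavy-tailed

# Simple moment-based summaries for comparing the three samples.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
kurtosis <- function(x) mean((x - mean(x))^4) / sd(x)^4
```

Applying the same `z` to each (g, h) pair makes it easy to see how the two parameters reshape an otherwise normal sample, for example by comparing `skewness(w_skewed)` and `kurtosis(w_heavy)` with the values for `z`.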
Slide 13: How to test/compare robust methods?
1: Simulation
g-and-h distribution
Slide 14: How to test/compare robust methods?
2: Bootstrap (B = 2000)
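As a rough illustration of a bootstrap with B = 2000 resamples, the sketch below builds a percentile bootstrap confidence interval for the 20% trimmed mean of paired difference scores. It is a simplified stand-in rather than the method from the slides: it assumes complete pairs (no missing values), and the data, the function name `boot_trimmed_ci`, and the 20% trimming level are hypothetical.

```r
# Percentile bootstrap CI for the trimmed mean of difference scores d.
boot_trimmed_ci <- function(d, B = 2000, tr = 0.2, alpha = 0.05) {
  # Resample the differences with replacement and recompute the trimmed mean.
  boot_est <- replicate(B, mean(sample(d, replace = TRUE), trim = tr))
  # The CI is the middle (1 - alpha) portion of the bootstrap estimates.
  quantile(boot_est, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
before <- rnorm(30, mean = 10)
after  <- before + rnorm(30, mean = 1)   # hypothetical paired data, true shift 1
d <- after - before
ci <- boot_trimmed_ci(d)
```

If the interval `ci` excludes 0, the two dependent groups are judged to differ in terms of the trimmed mean of the differences.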
Slide 15: Robust solutions
Alternate Measures of Location
One way of dealing with outliers is to replace the mean with alternative measures of location
Median
Trimmed mean
Winsorized mean
M-estimator
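The four measures listed above can be compared on a small sample with one outlier. This is a sketch in base R: the data are made up, `winsorize` and `huber_m` are illustrative helper names, and the M-estimator shown is a basic Huber-type estimator (tuning constant k = 1.5, scale fixed at the MAD), which is only one of several possible choices.

```r
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 50)   # hypothetical data with one outlier

# Pull the g smallest and g largest values in to the nearest retained value.
winsorize <- function(x, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)
  pmin(pmax(x, xs[g + 1]), xs[n - g])
}

# Huber-type M-estimator of location via iteratively reweighted means.
# Assumes mad(x) > 0.
huber_m <- function(x, k = 1.5, tol = 1e-8, max_iter = 100) {
  s <- mad(x)
  mu <- median(x)
  for (i in seq_len(max_iter)) {
    r <- (x - mu) / s
    w <- pmin(1, k / abs(r))             # full weight near mu, less far away
    mu_new <- sum(w * x) / sum(w)
    if (abs(mu_new - mu) < tol) break
    mu <- mu_new
  }
  mu_new
}

loc_mean    <- mean(x)                   # pulled upward by the outlier
loc_median  <- median(x)
loc_trimmed <- mean(x, trim = 0.2)       # 20% trimmed mean
loc_wins    <- mean(winsorize(x, 0.2))   # 20% Winsorized mean
loc_huber   <- huber_m(x)
```

The ordinary mean is dragged toward the outlying value of 50, while the other four estimates stay close to the bulk of the data.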
Slide 16: Robust solutions
Transformations
A simple way of dealing with skewness is to transform the data.
Logarithms
Simple transformations do not deal effectively with outliers
The resulting distributions can remain highly skewed
Slide 17: Robust solutions
Nonparametric regression
Sometimes called smoothers.
Imagine that in a regression situation the goal is to estimate the mean of Y, given that X = 6, based on n pairs of observations. The strategy is to focus on the observed X values close to 6 and use the corresponding Y values to estimate the mean of Y. Typically, smoothers give more weight to Y values for which the corresponding X values are close to 6. For pairs of points for which the X value is far from 6, the corresponding Y values are ignored.
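The idea can be sketched as a kernel-weighted local mean. The function `local_mean`, the tricube weights, the span of 2, and the simulated data are all illustrative assumptions; the slides do not say which smoother was used.

```r
# Kernel-weighted local mean of y at x0. Points within `span` of x0 get
# tricube weights that shrink with distance; points farther away get weight 0.
local_mean <- function(x, y, x0, span = 2) {
  u <- abs(x - x0) / span
  w <- ifelse(u < 1, (1 - u^3)^3, 0)
  sum(w * y) / sum(w)
}

set.seed(1)
x <- runif(200, 0, 12)
y <- 2 * x + rnorm(200)          # hypothetical data with E(Y | X = 6) = 12
est_at_6 <- local_mean(x, y, x0 = 6)

# Changing Y where X is far from 6 leaves the estimate untouched.
y_far <- y
y_far[x > 10] <- 1000
est_far <- local_mean(x, y_far, x0 = 6)
```

Base R's `lowess()` and `loess()` refine the same idea by fitting a weighted regression in each neighborhood instead of taking a weighted mean.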
Slide 18: Robust solutions
Robust measures of association
Use some analog of Pearson’s correlation that removes or downweights outliers.
Fit a regression line and measure the strength of the association based on this fit.
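One analog of the first kind is the Winsorized correlation: Winsorize each variable separately, then compute Pearson's correlation on the Winsorized values. The sketch below uses 20% Winsorizing; the helper names and the simulated data are illustrative, and no significance test is included.

```r
# Pull the g smallest and g largest values in to the nearest retained value.
winsorize <- function(x, tr = 0.2) {
  n <- length(x)
  g <- floor(tr * n)
  xs <- sort(x)
  pmin(pmax(x, xs[g + 1]), xs[n - g])
}

# Winsorized correlation: Pearson's r on the marginally Winsorized data.
wincor_sketch <- function(x, y, tr = 0.2) {
  cor(winsorize(x, tr), winsorize(y, tr))
}

set.seed(1)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)         # hypothetical data with a positive association
x_out <- c(x, 8)                 # add a single outlying pair that runs
y_out <- c(y, -8)                # against the overall trend

r_pearson <- cor(x_out, y_out)
r_wins    <- wincor_sketch(x_out, y_out)
```

A single outlying pair is enough to pull Pearson's r well below the Winsorized correlation, which is far less affected.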
Slide 19: Practical Illustration of Robust Methods
Analysis of a lifestyle intervention for older adults
N=364
This trial compared a six-month lifestyle intervention to a no-treatment control condition.
Outcome variables: (a) eight indices of health-related quality of life; (b) depression; (c) life satisfaction.
Preliminary analysis based on boxplots revealed outliers in all outcome variables.
Slide 20: Practical Illustration of Robust Methods
Analysis of a lifestyle intervention for older adults
Figure 5: The median regression line for predicting physical function based on the number of session hours (R function: qsmcobs).
r = .178 (p = .001). However, the association appears to be non-linear.
Slide 21: Practical Illustration of Robust Methods
Analysis of a lifestyle intervention for older adults
Figure 6: The median regression line for predicting physical composite based on the number of session hours (R function: qsmcobs).
For 0 to 5 hours, r=-.071 (p=.257).
For 5 hours or more, r=.25 (p=.045).
Slide 22: Practical Illustration of Robust Methods
Analysis of a lifestyle intervention for older adults
Table: Measures of association between hours of treatment and the variables listed in column 1 (n = 364).
r_w* = 20% Winsorized correlation
Slide 23: Practical Illustration of Robust Methods
Analysis of a lifestyle intervention for older adults
Table 2: P-values when comparing ethnic matched group patients to a non-matched group.
Welch’s test: dealing with heteroscedasticity
Yuen’s test: based on trimmed means
No single method is always best.
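Both tests can be tried in base R. Welch's test is what `t.test()` runs by default (`var.equal = FALSE`). Yuen's test is not in base R, so the sketch below implements the usual formulas directly, using trimmed means and Winsorized variances. The function name `yuen_sketch`, the 20% trimming level, and the simulated groups are illustrative assumptions.

```r
# Yuen's test for two independent groups, based on trimmed means and
# Winsorized variances (tr = proportion trimmed from each tail).
yuen_sketch <- function(x, y, tr = 0.2) {
  one_group <- function(v) {
    n <- length(v)
    g <- floor(tr * n)                          # observations trimmed per tail
    h <- n - 2 * g                              # observations left after trimming
    vs <- sort(v)
    vw <- pmin(pmax(v, vs[g + 1]), vs[n - g])   # Winsorized sample
    list(tm = mean(v, trim = tr), h = h,
         d = (n - 1) * var(vw) / (h * (h - 1))) # squared standard error
  }
  a <- one_group(x)
  b <- one_group(y)
  stat <- (a$tm - b$tm) / sqrt(a$d + b$d)
  df <- (a$d + b$d)^2 / (a$d^2 / (a$h - 1) + b$d^2 / (b$h - 1))
  list(statistic = stat, df = df, p.value = 2 * pt(-abs(stat), df))
}

set.seed(1)
x <- rnorm(30, mean = 0, sd = 1)
y <- rnorm(30, mean = 1, sd = 4)      # hypothetical groups with unequal variances

welch <- t.test(x, y)                 # Welch's heteroscedastic t-test
yuen  <- yuen_sketch(x, y)            # Yuen's test with 20% trimming
yuen0 <- yuen_sketch(x, y, tr = 0)    # no trimming: reduces to Welch's test
```

With `tr = 0` the trimmed means become ordinary means and the Winsorized variances become ordinary variances, so Yuen's statistic, degrees of freedom, and p-value coincide with Welch's. That makes a convenient sanity check on the implementation.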
Slide 24: Software
R: www.r-project.org
www.rcf.usc.edu/~rwilcox
Example: comparing two groups
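> x1 <- read.table(file=" ")        # data for group 1
> x2 <- read.table(file=" ")        # data for group 2
> x <- list(x1, x2)                 # collect the groups in a list
> lincon(x, tr=0.2, alpha=0.05)     # 20% trimming, 0.05 level
lincon is a heteroscedastic test of d linear contrasts using trimmed means.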
Slide 25: No single method is always best.
Slide 26: Thank you!