BMS 617
Lecture 7 – Non-normality and outliers
Normally distributed data
Many of the statistical tests we will study rely on the assumption that the data were sampled from a normal distribution
How reasonable is this assumption?
The normal distribution is an ideal distribution that likely never exists in reality
It includes arbitrarily large values and arbitrarily small (negative) values
However, simulations show that most tests that rely on the assumption of normality are robust to deviations from the normal distribution
Marshall University School of Medicine
The ideal normal distribution
Image shows data sampled from a theoretical normal distribution
Uses a very large sample size
Close approximation to the theoretical distribution
Samples from a normal distribution
Tests for normality
It is possible to perform tests to see if the sample data are consistent with the assumption that they were sampled from a normal distribution
Unfortunately, this is not what we really want to know…
We would really like to know whether the distribution is close enough to normal for the test we use to be useful
Tests for normality
A test for normality is a statistical test for which the null hypothesis is
The data were sampled from a normal distribution
Common normality tests include:
The D’Agostino-Pearson omnibus K2 normality test
The Shapiro-Wilk test
The Kolmogorov-Smirnov test
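All three tests are available in standard statistical software. As a minimal sketch (assuming Python with scipy, and using simulated data rather than any course data set), they can be run as follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=50)  # simulated, approximately normal sample

# D'Agostino-Pearson omnibus K2 test (combines skewness and kurtosis)
k2, p_dp = stats.normaltest(x)

# Shapiro-Wilk test
w, p_sw = stats.shapiro(x)

# Kolmogorov-Smirnov test against a standard normal, after standardizing.
# Note: estimating the mean and SD from the data makes this p-value approximate.
z = (x - x.mean()) / x.std(ddof=1)
d, p_ks = stats.kstest(z, "norm")

print(p_dp, p_sw, p_ks)
```

For data actually drawn from a normal distribution, all three p-values will usually (but not always) be large.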
D’Agostino-Pearson omnibus K2 normality test
The D’Agostino-Pearson omnibus K2 normality test works by computing two values for the data set:
The skewness, which measures how far the data are from being symmetric
The kurtosis, which measures how sharply peaked the data are
The test then combines these to a single value that describes how far from normal the data appear to lie
Computes a p-value for this combined value
Problem with normality tests
If the p-value for a normality test is small, the interpretation is:
If the data were sampled from an ideal normal distribution, it is unlikely the sample would be this skewed and/or kurtotic
If the p-value for a normality test is large, then the data are not inconsistent with being sampled from a normal distribution
However…
If the sample size is large, it is possible to get a small p-value even for small deviations from the normal distribution
Data are likely sampled from a distribution that is close to, but not exactly, normal
If the sample size is small, it is possible to get a large p-value even if the underlying distribution is far from normal
Data do not provide sufficient evidence to reject the null hypothesis…
It is useful to examine the values for skewness and kurtosis as well as the p-value
Skewness and kurtosis
Interpreting skewness and kurtosis
The real question we would like to answer is:
How much skewness and kurtosis are acceptable?
This is difficult to answer…
In general, interpret a skewness between -0.5 and 0.5 as approximately symmetric
A skewness between -1.0 and -0.5, or between 0.5 and 1.0, is moderately skewed
A skewness less than -1.0 or greater than 1.0 is highly skewed
For kurtosis, values between -2 and 2 are generally accepted as being “within limits”
Values outside this range are evidence that the distribution is far from normal
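These guidelines can be checked directly. A sketch (assuming scipy, whose `skew` and `kurtosis` functions use the Fisher definitions, so both are 0 for an ideal normal distribution; the samples are simulated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_sample = rng.normal(size=1000)
skewed_sample = rng.lognormal(size=1000)   # strongly right-skewed

skew_n = stats.skew(normal_sample)      # expect roughly within (-0.5, 0.5)
kurt_n = stats.kurtosis(normal_sample)  # excess kurtosis, expect near 0
skew_s = stats.skew(skewed_sample)      # expect well above 1.0 ("highly skewed")

print(skew_n, kurt_n, skew_s)
```

Note that some software reports "raw" kurtosis (3 for a normal distribution) rather than excess kurtosis, so check which definition your package uses before applying the -2 to 2 rule of thumb.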
What to do if the data fail a test for normality
If the data fail a test for normality, the following options are available
Can the data be transformed to data that come from a normal distribution?
For example, if the data are positively skewed, transforming to logarithms may give normally distributed data
Are there a small number of outliers causing the data to fail the normality test?
The next section discusses outliers
Is the departure from normality small? I.e., are the skewness and kurtosis “small”? If so, your statistical tests may still be accurate enough
Use a test that does not assume a normal distribution (a non-parametric test)
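The log-transformation option can be illustrated with a sketch (assuming scipy; the data are simulated from a lognormal distribution, i.e. positively skewed data whose logarithms are exactly normal):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.lognormal(mean=2.0, sigma=0.8, size=60)  # positively skewed sample

_, p_raw = stats.normaltest(x)          # typically small: raw data look non-normal
_, p_log = stats.normaltest(np.log(x))  # log-transformed data are consistent with normality

print(p_raw, p_log)
```

The log transform removes (most of) the skewness here because the simulated data are lognormal; for real data, re-run the normality checks after transforming rather than assuming the transform worked.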
Non-parametric tests
The most common statistical tests assume the data are sampled from a normal distribution
t-tests, ANOVA, Pearson correlation, etc.
Some other tests do not make this assumption
The Mann-Whitney test, Kruskal-Wallis test, Spearman correlation, etc.
However, these tests have lower statistical power (sometimes much lower) than their parametric equivalents when the data are normally distributed
Choosing non-parametric tests
When running a series of similar experiments, all data should be analyzed the same way
Use normality tests to choose the statistical test for all experiments together
Following “common practice” is acceptable…
Ideally, run one experiment just to determine whether the data look like they come from a normal distribution
For small data sets:
A test for normality does not tell you much
You are not likely to get a small p-value anyway
Violations of the normality assumption are more egregious
Non-parametric tests have very low statistical power
The Mann-Whitney Test
The Mann-Whitney test is the non-parametric equivalent of the unpaired t-test
Use it when you want to compare a variable between two groups but have reason to believe the data are not sampled from a normally distributed population
How the Mann-Whitney Test works
The Mann-Whitney test works as follows:
Compute the rank of every value, regardless of which group it comes from
The smallest value has a rank of 1, the next smallest has a rank of 2, etc.
Choose one group: for each data point in that group, count the number of data points in the other group which are smaller
Sum these values, and call the sum U1
Similarly compute U2, or use the fact that U1 + U2 = n1n2
Let U = min(U1, U2)
The distribution of U under the null hypothesis is known, so software can compute a p-value
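The counting procedure above can be carried out directly and checked against a library implementation. A sketch (assuming scipy; the two groups are hypothetical illustrative values):

```python
import numpy as np
from scipy import stats

group1 = np.array([1.2, 3.4, 2.2, 5.1, 4.0])
group2 = np.array([2.5, 6.3, 4.8, 7.7, 5.9])

# For each value in group1, count how many group2 values are smaller
U1 = int(sum((group2 < g).sum() for g in group1))
U2 = int(sum((group1 < g).sum() for g in group2))
assert U1 + U2 == len(group1) * len(group2)  # holds when there are no ties
U = min(U1, U2)

# scipy reports the U statistic for the first sample and a two-sided p-value
res = stats.mannwhitneyu(group1, group2, alternative="two-sided")
print(U1, U2, U, res.statistic, res.pvalue)
```

With ties in the data, tied values share an average rank and the counting step credits one half for each tie; software handles this automatically.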
Pros and cons of non-parametric tests
Pros of non-parametric tests:
Since non-parametric tests do not rely on the assumption of normally distributed populations, they can be used when that assumption fails, or cannot be verified
Cons of non-parametric tests:
If the data really do come from normally distributed populations, non-parametric tests are less powerful than their parametric counterparts
i.e. they will give higher p-values
For small sample sizes, they are much less powerful:
Mann-Whitney p-values are always greater than 0.05 if the sample size is 7 or fewer
Non-parametric tests typically do not compute confidence intervals
Confidence intervals can sometimes be computed, but often require additional assumptions
Non-parametric tests are not related to regression models
They cannot be extended to account for confounding variables using multiple regression techniques
Choosing between parametric and non-parametric tests
The choice between parametric and non-parametric tests is not straightforward
A common, but invalid, approach is to use normality tests to automate the choice
The choice is most important for small data sets, for which normality tests are of limited use
Using the data set to determine the statistical analysis will underestimate p-values
If data fail normality tests, a transformation may be appropriate
The most “honest” approach is to perform an independent experiment with a large sample to test for normality, and then design the experiment in hand based on the results
This is almost always impractical
For well-used experimental designs, an almost-equivalent approach is to follow customary procedure
This essentially assumes such a check has been carried out in some way already
How much difference does it make?
The central limit theorem ensures that parametric tests work well with non-normal distributions if the sample is large enough
How large is large enough?
It depends on the distribution!
For most distributions, sample sizes in the range of dozens will remove any issues with normality
You will still increase your statistical power by using a transformation if appropriate
Conversely, if the data really come from a normally distributed population and you choose a non-parametric test, you will lose statistical power
For large samples, however, the difference is minimal
Small samples present problems:
Non-parametric tests have very little power for small samples
Parametric tests can give misleading results for small samples if the population data are non-normal
Tests for normality are not helpful for small samples
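The central limit theorem effect can be seen in a small simulation (a sketch assuming scipy; the exponential distribution is used as a clearly non-normal example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Raw exponential data: theoretical skewness is 2
raw = rng.exponential(scale=1.0, size=10_000)

# 10,000 means of samples of size 40 from the same distribution
means = rng.exponential(scale=1.0, size=(10_000, 40)).mean(axis=1)

skew_raw = stats.skew(raw)      # strongly right-skewed
skew_means = stats.skew(means)  # close to symmetric (theory: 2 / sqrt(40), about 0.32)

print(skew_raw, skew_means)
```

The sample means are far less skewed than the raw data even at n = 40, which is why parametric tests on means tolerate moderate non-normality at these sample sizes.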
Conclusions
The bottom-line conclusion is that large samples are better than small samples
In general, the larger the better
Of course, it can be prohibitively time-consuming and/or expensive to analyze large samples
If your experimental design is going to use a small sample, you need to be able to justify that the data come from a normally distributed population
If this is a common experimental design that is conventionally analyzed this way, that may be good enough
For a new methodology, you should really perform an independent experiment with a large sample to test for normality first
Use the results of this to guide the data analysis for future experiments
Computationally-intensive non-parametric methods
The non-parametric methods we examined worked by analyzing the ranks of the data
Another class of non-parametric tests is the class of computationally-intensive methods
There are two subclasses:
Permutation or randomization tests:
Simulate the null distribution by repeatedly randomly reassigning group labels
Compare the “real” data to the generated null distribution
Bootstrapping techniques:
Effectively generate many samples from the population by resampling from the original sample
Look at the distribution of summary data from the generated samples
These techniques still require a reasonable sample size to begin with
Big enough to generate enough distinct permutations or bootstrap samples
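Both subclasses can be sketched in a few lines (assuming numpy; the two groups are hypothetical illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)
a = np.array([4.1, 5.2, 6.3, 5.8, 4.9, 6.1])
b = np.array([3.2, 4.0, 3.8, 4.5, 3.6, 4.2])

# Permutation test: shuffle group labels, recompute the difference in means
observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])
n_perm = 10_000
extreme = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[:len(a)].mean() - pooled[len(a):].mean()
    if abs(diff) >= abs(observed):
        extreme += 1
p_perm = (extreme + 1) / (n_perm + 1)  # "+1" avoids reporting p = 0

# Bootstrap: resample group a with replacement for a 95% CI on its mean
boot_means = np.array([rng.choice(a, size=len(a), replace=True).mean()
                       for _ in range(10_000)])
lo, hi = np.percentile(boot_means, [2.5, 97.5])

print(p_perm, lo, hi)
```

With only 6 values per group there are 924 distinct label assignments, so random reshuffling is adequate here; for much smaller groups the permutation distribution becomes too coarse to be useful, which is the sample-size caveat above.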
Outliers
Outliers are values in the data that are “far” from the other values
Occur for several reasons:
Invalid data entry
Experimental mistakes
Random chance
In any distribution, some values are far from the others
In a normal distribution, such values are rarer, but they still exist
Biological diversity
If your samples are from patient or animal samples, the outlier may be “correct” and due to biological diversity
May be an interesting finding!
Wrong assumptions
For example, in a lognormal distribution, some values are far from the others
Why test for outliers
Presence of erroneous outliers, or assuming the wrong distribution, can introduce spurious results or mask real results
Trying to detect outliers without a test can be problematic
We tend to want to observe patterns in data
Anything that appears to run counter to these patterns seems to be an outlier
We tend to see too many outliers
Before testing for outliers
Before testing for outliers:
Check the data entry
Errors here can often be fixed
Were there problems with the experiment?
If errors were observed during the experiment, remove the data associated with those errors
Many experimental protocols have quality control measures
Is it possible your data are not normally distributed?
Most outlier tests assume the (non-outlier) data are normally distributed
Was there anything different about any of the samples?
Was one of the mice phenotypically different, etc.?
Outlier tests
After addressing the concerns on the previous slide, if you still suspect an outlier you can run an outlier test
Outlier tests answer the following question:
If the data were sampled from a normal distribution, what is the chance of observing one value as far from the others as in the observed data?
Results of an outlier test
If an outlier test results in a small p-value, then the conclusion is that the outlying value is (probably) not from the same distribution as the other values
Justifies excluding it from the analysis
If the outlier test results in a high p-value, there is no evidence the value came from a different distribution
This doesn’t prove it did come from the same distribution, just that there is no strong evidence to the contrary
Guidelines on removing outliers
If you address all the previous concerns, and an outlier test gives strong evidence of an outlier, then it is legitimate to remove it from the analysis
The rules for eliminating outliers should be established before you generate the data
You should report the number of outliers removed and the rationale for doing so in any publication using the data
How outlier tests work
Outlier tests work by computing the difference between the extreme value and some measure of central tendency
That value is typically divided by a measure of the variability
The resulting ratio is compared with a table or the expected distribution of those values
Grubbs’ outlier test
Grubbs’ outlier test calculates the difference between the extreme value and the mean of all values (including the extreme value), and divides by the standard deviation
The resulting value is then compared to a table of critical values
The critical value depends on the sample size
If the computed value is larger than the critical value, the extreme value can be considered an outlier
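A sketch of Grubbs’ test (assuming scipy; the critical value is computed from the t distribution via the standard closed-form formula rather than a lookup table, and the data are hypothetical, with 14.2 as the suspected outlier):

```python
import numpy as np
from scipy import stats

def grubbs_statistic(values):
    """G = (largest absolute deviation from the mean) / (sample SD)."""
    x = np.asarray(values, dtype=float)
    return float(np.max(np.abs(x - x.mean())) / x.std(ddof=1))

def grubbs_critical(n, alpha=0.05):
    """Two-sided critical value for Grubbs' test at significance level alpha."""
    t = stats.t.ppf(1.0 - alpha / (2.0 * n), n - 2)
    return float((n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2)))

data = [9.8, 10.1, 10.3, 9.9, 10.0, 14.2]  # 14.2 is the suspected outlier
G = grubbs_statistic(data)
G_crit = grubbs_critical(len(data))
is_outlier = G > G_crit
print(G, G_crit, is_outlier)
```

Here G exceeds the n = 6 critical value (about 1.89 at alpha = 0.05), so 14.2 would be flagged. Note that Grubbs’ test assumes the non-outlier data are normally distributed and tests for a single outlier only.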
Demo
We’ll experiment with the GRHL2 Basal-A and Basal-B data sets in GraphPad, checking for outliers and testing for normality.