Slide1
Multiple Comparisons
Measures of LD
Jess Paulus, ScD
January 29, 2013
Slide2
Today’s topics
Multiple comparisons
Measures of linkage disequilibrium
D’ and r²
r² and power
Slide3
Multiple testing & significance thresholds
Concern about multiple testing
Standard thresholds (p<0.05) will lead to a large number of “significant” results
Vast majority of which are false positives
Various approaches to handling this statistically
Slide4
Slide5
Possible Errors in Statistical Inference
                                Unobserved truth in the population
Observed in the sample          Ha: SNP prevents DM                  H0: No association
Reject H0: SNP prevents DM      True positive (1 – β)                False positive, Type I error (α)
Fail to reject H0: No assoc.    False negative, Type II error (β)    True negative (1 – α)
Slide6
Probability of Errors
α = probability of a Type I error: rejecting the null hypothesis when it is in fact true (false positive). Also known as the “level of significance”; typically 5%
p value = the probability, if the null hypothesis is true, of obtaining a result as extreme as or more extreme than the one found in your study by chance alone
Slide7
Type I Error (α) in Genetic and Molecular Research
A genome-wide association scan of 500,000 SNPs, if none is truly associated, is expected to yield:
25,000 false positives by chance alone using α = 0.05
5,000 false positives by chance alone using α = 0.01
500 false positives by chance alone using α = 0.001
Slide8
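The false-positive counts quoted for the 500,000-SNP scan are the number of tests multiplied by α. A minimal sketch of that arithmetic, assuming every SNP is truly null (the function name `expected_false_positives` is illustrative, not from the slides):

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


if __name__ == "__main__":
    # Reproduce the three thresholds discussed for a 500,000-SNP scan.
    for alpha in (0.05, 0.01, 0.001):
        count = round(expected_false_positives(500_000, alpha))
        print(f"alpha = {alpha}: about {count:,} false positives expected")
```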
Multiple Comparisons Problem
The multiple comparisons (or "multiple testing") problem occurs when one considers a set, or family, of statistical inferences simultaneously
Type I errors become more likely as the number of tests grows
Several statistical techniques have been developed to adjust for multiple comparisons
Bonferroni adjustment
Slide9
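To see why Type I errors become more likely with a family of tests, one can compute the probability of at least one false positive across m tests. The sketch below assumes the tests are independent and every null hypothesis is true, an assumption that is not stated on the slides; under it the probability is 1 - (1 - α)^m. The function name is illustrative.

```python
def familywise_error_rate(n_tests: int, alpha: float) -> float:
    """Probability of at least one Type I error across n_tests tests.

    Assumes independent tests with every null hypothesis true.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for m in (1, 10, 100, 500_000):
        print(f"m = {m}: P(at least one false positive) = "
              f"{familywise_error_rate(m, 0.05):.4f}")
```

With a single test the probability equals α; it approaches 1 quickly as m grows, which is the motivation for adjusting the per-test threshold.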
Adjusting alpha
Standard Bonferroni correction
Test each SNP at the α* = α/m1 level
Where m1 = number of markers tested
Assuming m1 = 500,000, the Bonferroni-corrected threshold is α* = 0.05/500,000 = 1 × 10⁻⁷
Conservative when the tests are correlated
Permutation or simulation procedures may increase power by accounting for test correlation
Slide10
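The Bonferroni threshold α* = α/m1 can be sketched in a few lines. The p-values in the usage example are made up for illustration, and the helper names are hypothetical; the sketch rejects a test when its p-value is at or below α*.

```python
from typing import List, Sequence


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level alpha* = alpha / m1."""
    return alpha / n_tests


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag each p-value that passes the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Threshold for a 500,000-marker scan at alpha = 0.05.
    print(bonferroni_threshold(0.05, 500_000))
    # Hypothetical p-values for three markers (illustration only).
    print(bonferroni_significant([0.001, 0.02, 0.04]))
```

Because the correction treats the tests as if they were independent, it tends to be conservative when markers are correlated, which is where permutation procedures can recover power.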
Measures of LD
Slide11
Haplotype definition
Haplotype: an ordered sequence of alleles at a subset of loci along a chromosome
Moving from examining single genetic markers to sets of markers
Slide12
Measures of linkage disequilibrium
Basic data: table of haplotype frequencies
Observed haplotypes (16 chromosomes): AG, ag, AG, ag, Ag, AG, ag, AG, AG, ag, AG, Ag, ag, AG, ag, AG
         A        a        Total
G        8        0        50%
g        2        6        50%
Total    62.5%    37.5%
Slide13
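The haplotype table for the two loci (A/a and G/g) can be rebuilt by counting the 16 observed haplotypes. A minimal sketch using the haplotypes listed on the slide; the variable names are illustrative.

```python
from collections import Counter

# The 16 haplotypes from the slide, first letter = A/a locus, second = G/g locus.
haplotypes = ["AG", "ag", "AG", "ag", "Ag", "AG", "ag", "AG",
              "AG", "ag", "AG", "Ag", "ag", "AG", "ag", "AG"]

counts = Counter(haplotypes)          # missing haplotypes (here "aG") count as 0
n = len(haplotypes)

# Marginal allele frequencies at each locus.
freq_A = sum(c for h, c in counts.items() if h[0] == "A") / n
freq_G = sum(c for h, c in counts.items() if h[1] == "G") / n

if __name__ == "__main__":
    for hap in ("AG", "aG", "Ag", "ag"):
        print(hap, counts[hap])
    print(f"freq(A) = {freq_A:.3f}, freq(G) = {freq_G:.3f}")
```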
D’ and r² are the most common measures
Both measure correlation between two loci
D prime …
ranges from 0 [no LD] to 1 [complete LD]
r squared …
also ranges from 0 to 1
is the squared correlation between alleles on the same chromosome
Slide14
D
The deviation of the observed frequency of a haplotype from the frequency expected under independence is a quantity called the linkage disequilibrium (D)
If two alleles are in LD, it means D ≠ 0
If D’ = 1, there is complete dependency between loci
Linkage equilibrium means D = 0
Slide15
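The definition of D as observed minus expected haplotype frequency can be written as D = freq(AG) - freq(A) × freq(G). A minimal sketch, using the frequencies from the haplotype table (AG = 8/16, A = 62.5%, G = 50%); the function and argument names are illustrative.

```python
def linkage_disequilibrium(freq_AG: float, freq_A: float, freq_G: float) -> float:
    """D = observed AG haplotype frequency minus the frequency expected
    if alleles A and G were independent (freq_A * freq_G)."""
    return freq_AG - freq_A * freq_G


if __name__ == "__main__":
    # Frequencies taken from the haplotype table: AG = 8/16, A = 10/16, G = 8/16.
    d = linkage_disequilibrium(8 / 16, 10 / 16, 8 / 16)
    print(f"D = {d}")
    # Under linkage equilibrium the observed and expected frequencies agree.
    print(linkage_disequilibrium(0.25, 0.5, 0.5))
```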
         A      a      Total
G        n11    n10    n1·
g        n01    n00    n0·
Total    n·1    n·0

Measure    Formula    Ref.
D’                    Lewontin (1964)
r²                    Hill and Weir (1994)
*                     Levin (1953)
                      Edwards (1963)
Q                     Yule (1900)
Slide16
Observed haplotypes: AG, ag, AG, ag, Ag, AG, ag, AG, AG, ag, AG, Ag, ag, AG, ag, AG
         A        a        Total
G        8        0        50%
g        2        6        50%
Total    62.5%    37.5%
D’ = (8×6 – 0×2) / (8×6) = 1
r² = (8×6 – 0×2)² / (10×6×8×8) = 0.6
Slide17
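The worked values D’ = 1 and r² = 0.6 can be checked from the four haplotype counts. The sketch below uses the usual frequency-based definitions, which are an assumption here because the formula column of the slide did not survive: D’ = |D| / Dmax (Lewontin's normalization) and r² = D² / (pA × pa × pG × pg). The function name is illustrative.

```python
from typing import Tuple


def ld_measures(n_AG: int, n_aG: int, n_Ag: int, n_ag: int) -> Tuple[float, float]:
    """Return (D', r^2) for two biallelic loci from haplotype counts."""
    n = n_AG + n_aG + n_Ag + n_ag
    p_A = (n_AG + n_Ag) / n
    p_G = (n_AG + n_aG) / n
    d = n_AG / n - p_A * p_G

    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_A * (1 - p_G), (1 - p_A) * p_G)
    else:
        d_max = min(p_A * p_G, (1 - p_A) * (1 - p_G))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_A * (1 - p_A) * p_G * (1 - p_G)
    r2 = d * d / denom if denom > 0 else 0.0
    return d_prime, r2


if __name__ == "__main__":
    # Counts from the slide: AG = 8, aG = 0, Ag = 2, ag = 6.
    d_prime, r2 = ld_measures(8, 0, 2, 6)
    print(f"D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

D’ reaches 1 here because one of the four haplotypes (aG) is never observed, while r² stays below 1 because the allele frequencies at the two loci differ.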
r² and power
r² is directly related to study power
A low r² means a large sample size is required to detect the LD between the markers
r² × N is the “effective sample size”
If a marker M and causal gene G are in LD, then a study with N cases and controls which measures M (but not G) will have the same power to detect an association as a study with r² × N cases and controls that directly measured G
Slide18
r² and power
Example:
N = 1000 (500 cases and 500 controls)
r² = 0.4
If you had genotyped the causal gene directly, you would need a total of only N = 400 (200 cases and 200 controls), since r² × N = 0.4 × 1000 = 400
Slide19
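The effective sample size relation r² × N from the power example can be sketched as follows. Both helper names are illustrative; the second simply inverts the relation to show how many subjects a marker-based study needs to match a study that genotypes the causal variant directly.

```python
def effective_sample_size(n_total: float, r2: float) -> float:
    """Sample size of a direct-genotyping study with the same power
    as a marker-based study of n_total subjects (r2 * N)."""
    return r2 * n_total


def marker_study_size(n_direct: float, r2: float) -> float:
    """Subjects needed when measuring the marker to match a direct
    study of n_direct subjects (N = n_direct / r2)."""
    return n_direct / r2


if __name__ == "__main__":
    # Example from the slide: N = 1000, r^2 = 0.4.
    print(effective_sample_size(1000, 0.4))
    print(marker_study_size(400, 0.4))
```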
Today’s topics
Multiple comparisons
Measures of linkage disequilibrium
D’ and r²
r² and power