Multiple Comparisons - PowerPoint Presentation

Uploaded On 2017-12-07


Presentation Transcript

Slide1

Multiple Comparisons

Measures of LD

Jess Paulus, ScD

January 29, 2013

Slide2

Today’s topics

Multiple comparisons

Measures of linkage disequilibrium (LD)

D’ and r²

r² and power

Slide3

Multiple testing & significance thresholds

Concern about multiple testing

Standard thresholds (p<0.05) will lead to a large number of “significant” results

Vast majority of which are false positives

Various approaches to handling this statistically

Slide4

Slide5

Possible Errors in Statistical Inference

                                   | Unobserved truth in the population
Observed in the sample             | Hₐ: SNP prevents DM               | H₀: No association
Reject H₀ (SNP prevents DM)        | True positive (1 − β)             | False positive, Type I error (α)
Fail to reject H₀ (no association) | False negative, Type II error (β) | True negative (1 − α)

Slide6

Probability of Errors

α = probability of a Type I error: rejecting the null hypothesis when it is in fact true (false positive). Also known as the “level of significance”; typically 5%

p value = the probability, assuming the null hypothesis is true, of obtaining by chance alone a result as extreme as or more extreme than the one found in your study
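
As a toy illustration of "as extreme or more extreme", the sketch below adds up the probability of every outcome at least as extreme as the observed one, computed under the null hypothesis. The coin-flip scenario and the function name `binomial_upper_tail` are assumptions made for illustration; they do not come from the slides.

```python
from math import comb


def binomial_upper_tail(n_trials: int, n_observed: int, p_null: float = 0.5) -> float:
    """One-sided p value: P(X >= n_observed) when X ~ Binomial(n_trials, p_null)."""
    return sum(
        comb(n_trials, k) * p_null**k * (1.0 - p_null) ** (n_trials - k)
        for k in range(n_observed, n_trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical data: 9 heads in 10 flips, null hypothesis = fair coin.
    p_value = binomial_upper_tail(10, 9)
    print(f"one-sided p value = {p_value:.4f}")
    print("reject H0 at alpha = 0.05:", p_value < 0.05)
```

The sum covers 9 and 10 heads, the two outcomes at least as extreme as the one observed, which is what the definition of the p value asks for.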

Slide7

Type I Error (α) in Genetic and Molecular Research

A genome-wide association scan of 500,000 SNPs with no true associations is expected to yield:

25,000 false positives by chance alone using α = 0.05

5,000 false positives by chance alone using α = 0.01

500 false positives by chance alone using α = 0.001

Slide8
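
The false-positive counts quoted for the 500,000-SNP scan are expected values: the number of tests multiplied by α, assuming every null hypothesis is true. The sketch below reproduces them and also shows why Type I errors become more likely as tests accumulate. The function names are illustrative, and the family-wise formula additionally assumes independent tests, which the slides do not state.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float) -> float:
    """P(at least one false positive), assuming independent tests and all nulls true."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    n_snps = 500_000
    for alpha in (0.05, 0.01, 0.001):
        count = round(expected_false_positives(n_snps, alpha))
        print(f"alpha = {alpha}: about {count} expected false positives")

    # Illustrative family of 20 independent tests (a number chosen for this sketch).
    print(f"P(any false positive in 20 tests) = {familywise_error_rate(20, 0.05):.3f}")
```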

Multiple Comparisons Problem

The multiple comparisons (or "multiple testing") problem occurs when one considers a set, or family, of statistical inferences simultaneously

Type I errors become more likely as the number of tests grows

Several statistical techniques have been developed to adjust for multiple comparisons, for example the Bonferroni adjustment

Slide9

Adjusting alpha

Standard Bonferroni correction

Test each SNP at the α* = α/m₁ level, where m₁ = number of markers tested

Assuming m₁ = 500,000, the Bonferroni-corrected threshold is α* = 0.05/500,000 = 1×10⁻⁷

Conservative when the tests are correlated

Permutation or simulation procedures may increase power by accounting for test correlation

Slide10
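
The Bonferroni rule α* = α/m₁ can be sketched in a few lines. This is a minimal illustration, not the procedure of any particular software package; the function names and the three p values in the usage example are hypothetical.

```python
from typing import List, Sequence


def bonferroni_threshold(alpha: float, n_markers: int) -> float:
    """Per-test significance level alpha* = alpha / m1."""
    return alpha / n_markers


def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Flag each test whose p value is at or below alpha divided by the number of tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # Threshold for the 500,000-marker scan discussed in the slides.
    print(f"alpha* = {bonferroni_threshold(0.05, 500_000):.1e}")

    # Hypothetical p values for three markers; the threshold here is 0.05 / 3.
    print(bonferroni_significant([0.001, 0.02, 0.04]))
```

Because the threshold only depends on the number of tests, it ignores correlation between markers, which is why the correction is conservative when markers are in LD.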

Measures of LD

Jess Paulus, ScD

January 29, 2013

Slide11

Haplotype definition

Haplotype: an ordered sequence of alleles at a subset of loci along a chromosome

Moving from examining single genetic markers to sets of markers

Slide12

Measures of linkage disequilibrium

Basic data: table of haplotype frequencies

Observed haplotypes (16 chromosomes):

AG  ag  AG  ag  Ag  AG  ag  AG
AG  ag  AG  Ag  ag  AG  ag  AG

      | A     | a     | Total
G     | 8     | 0     | 50%
g     | 2     | 6     | 50%
Total | 62.5% | 37.5% |

Slide13
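
The haplotype frequency table can be rebuilt by counting the 16 observed haplotypes. The sketch below assumes the haplotype order read off the transcript (letters paired as locus 1 then locus 2); the variable and function names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple

# The 16 haplotypes as read from the slide: first letter is the A/a locus,
# second letter is the G/g locus (pairing inferred from the transcript).
HAPLOTYPES: List[str] = [
    "AG", "ag", "AG", "ag", "Ag", "AG", "ag", "AG",
    "AG", "ag", "AG", "Ag", "ag", "AG", "ag", "AG",
]


def haplotype_table(haplotypes: List[str]) -> Tuple[Dict[str, int], float, float]:
    """Return the four haplotype counts and the frequencies of alleles A and G."""
    counts = Counter(haplotypes)
    table = {h: counts.get(h, 0) for h in ("AG", "aG", "Ag", "ag")}
    n = len(haplotypes)
    freq_a_allele = sum(h[0] == "A" for h in haplotypes) / n
    freq_g_allele = sum(h[1] == "G" for h in haplotypes) / n
    return table, freq_a_allele, freq_g_allele


if __name__ == "__main__":
    table, freq_A, freq_G = haplotype_table(HAPLOTYPES)
    print(table)
    print(f"freq(A) = {freq_A:.1%}, freq(G) = {freq_G:.1%}")
```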

D’ and r² are the most common measures

Both measure correlation between two loci

D’ (D prime):

Ranges from 0 [no LD] to 1 [complete LD]

r² (r squared):

Also ranges from 0 to 1

Is the squared correlation between alleles on the same chromosome

Slide14

D

The deviation of the observed frequency of a haplotype from the frequency expected under independence is a quantity called the linkage disequilibrium (D)

If two alleles are in LD, it means D ≠ 0

If D’ = 1, there is complete dependency between loci

Linkage equilibrium means D = 0

Slide15
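
Written out for the two loci in the haplotype table (A/a and G/g), the deviation is the observed AG haplotype frequency minus the product of the two allele frequencies. The symbols p_AG, p_A and p_G are the usual two-locus notation and are an assumption here, since they do not appear on the slide. With 8 AG haplotypes out of 16, freq(A) = 62.5% and freq(G) = 50%:

```latex
D = p_{AG} - p_A \, p_G
  = \frac{8}{16} - \frac{10}{16} \cdot \frac{8}{16}
  = 0.5 - 0.3125
  = 0.1875
```

Since D ≠ 0, the two loci in this example are in LD.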

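      | A   | a   | Total
G     | n11 | n10 | n1.
g     | n01 | n00 | n0.
Total | n.1 | n.0 |

Measure          | Formula        | Ref.
D’               | [not captured] | Lewontin (1964)
r²               | [not captured] | Hill and Weir (1994)
[symbol lost]*   | [not captured] | Levin (1953)
[symbol lost]    | [not captured] | Edwards (1963)
Q                | [not captured] | Yule (1900)

Slide16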

Observed haplotypes (16 chromosomes):

AG  ag  AG  ag  Ag  AG  ag  AG
AG  ag  AG  Ag  ag  AG  ag  AG

      | A     | a     | Total
G     | 8     | 0     | 50%
g     | 2     | 6     | 50%
Total | 62.5% | 37.5% |

D’ = (8×6 − 0×2) / (8×6) = 1

r² = (8×6 − 0×2)² / (10×6×8×8) = 0.6

Slide17
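
The worked D’ and r² values for the 8 / 0 / 2 / 6 haplotype table can be checked with a short script. This sketch uses the standard frequency-based definitions (D’ = |D| / D_max and r² = D² / (p_A p_a p_G p_g)), which are an assumption because the slide formulas were not captured in the transcript; the function name `ld_measures` is illustrative.

```python
from typing import Tuple


def ld_measures(n_AG: int, n_aG: int, n_Ag: int, n_ag: int) -> Tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci from the four haplotype counts."""
    n = n_AG + n_aG + n_Ag + n_ag
    p_A = (n_AG + n_Ag) / n  # frequency of allele A
    p_G = (n_AG + n_aG) / n  # frequency of allele G
    d = n_AG / n - p_A * p_G  # observed minus expected AG frequency

    # D_max depends on the sign of D (standard Lewontin normalisation).
    if d >= 0:
        d_max = min(p_A * (1 - p_G), (1 - p_A) * p_G)
    else:
        d_max = min(p_A * p_G, (1 - p_A) * (1 - p_G))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    denom = p_A * (1 - p_A) * p_G * (1 - p_G)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


if __name__ == "__main__":
    d, d_prime, r_squared = ld_measures(n_AG=8, n_aG=0, n_Ag=2, n_ag=6)
    print(f"D = {d:.4f}, D' = {d_prime:.2f}, r^2 = {r_squared:.2f}")
```

D’ reaches 1 here because one of the four haplotypes (aG) is never observed, while r² stays below 1 because the allele frequencies at the two loci differ.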

r² and power

r² is directly related to study power

The lower the r², the larger the sample size required to detect the LD between the markers

r² × N is the “effective sample size”

If a marker M and causal gene G are in LD, then a study with N cases and controls that measures M (but not G) will have the same power to detect an association as a study with r² × N cases and controls that directly measured G

Slide18

r² and power

Example:

N = 1000 (500 cases and 500 controls)

r² = 0.4

If you had genotyped the causal gene directly, you would only need a total N = 400 (200 cases and 200 controls)

Slide19
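
The arithmetic in the example with N = 1000 and r² = 0.4 follows directly from the effective sample size r² × N. A minimal sketch, with illustrative function names; the inverse calculation (how many subjects a marker-based study needs) is the same relation rearranged.

```python
def effective_sample_size(n_marker_study: float, r_squared: float) -> float:
    """Sample size of a direct study of the causal variant with the same power."""
    return r_squared * n_marker_study


def required_marker_sample_size(n_direct_study: float, r_squared: float) -> float:
    """Sample size a marker-based study needs to match a direct study of size n_direct_study."""
    if r_squared <= 0:
        raise ValueError("r_squared must be positive")
    return n_direct_study / r_squared


if __name__ == "__main__":
    # Values from the slide: 1000 subjects genotyped at a marker with r^2 = 0.4.
    print(round(effective_sample_size(1000, 0.4)))
    # Rearranged: subjects needed at the marker to match a direct study of 400.
    print(round(required_marker_sample_size(400, 0.4)))
```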

Today’s topics

Multiple comparisons

Measures of linkage disequilibrium (LD)

D’ and r²

r² and power