1. Correcting for Sample Overlap in Association Meta-analysis Using Summary Statistics
Sebanti Sengupta
11/15/2017
2. Background
- Meta-analysis is an important strategy for genetic association studies
  - Increases sample size and power
  - Can lead to discovery of novel loci
  - Can use previously published study results
- What happens if different studies in the meta-analysis have samples in common?
- We wish to estimate the 'bottom-line' p-value that corrects for sample overlap
- Use summary statistics (Z-score, allele frequency, sample size)
5. Standard Meta-analysis Method
- Usual way to combine estimates from different studies:
  $$\hat\beta = \frac{\sum_i w_i \hat\beta_i}{\sum_i w_i}, \qquad \mathrm{Var}(\hat\beta) = \frac{\sum_i w_i^2 v_i}{\left(\sum_i w_i\right)^2}$$
  where $\hat\beta_i$ is the estimator of interest from the $i$-th study (e.g. Z-score or log odds ratio), $v_i$ the corresponding estimated variance, and $w_i$ the weight used for the $i$-th study in the meta-analysis
- The variance formula is not true if samples overlap: its numerator gains the covariance terms $2\sum_{i<j} w_i w_j \,\mathrm{Cov}(\hat\beta_i, \hat\beta_j)$
- Idea: estimate the covariance term using summary statistics
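To make the role of the covariance term concrete, here is a minimal Python sketch for two studies combined on the Z-score scale. The square-root-of-sample-size weights, the function names, and the correlation value `r` are illustrative assumptions, not values taken from the slides.

```python
import math

def naive_combined_z(z1, z2, n1, n2):
    """Sample-size-weighted Z-score, treating the two studies as independent."""
    w1, w2 = math.sqrt(n1), math.sqrt(n2)  # assumed weights: sqrt(sample size)
    return (w1 * z1 + w2 * z2) / math.sqrt(w1 ** 2 + w2 ** 2)

def null_variance_of_naive_z(n1, n2, r):
    """Variance of the naive combined Z under the null when corr(Z1, Z2) = r.

    With independent studies (r = 0) this is 1; with r > 0 it exceeds 1,
    so p-values computed from a standard normal are too small.
    """
    w1, w2 = math.sqrt(n1), math.sqrt(n2)
    return 1.0 + 2.0 * r * w1 * w2 / (w1 ** 2 + w2 ** 2)

if __name__ == "__main__":
    for r in (0.0, 0.1, 0.25):
        print(r, null_variance_of_naive_z(5000, 5000, r))
```

For two equally sized studies the null variance is simply `1 + r`, which is why ignoring the overlap inflates the test statistic.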
9. Method
- Assume that samples are homogeneous, that is, belong to the same ancestry
- Allele frequencies and effect sizes are the same in all samples
- They do not vary with overlap
10. Stratifying Markers by Sample Size
- The overlap number for a marker M may differ depending on whether it is present in the overlapping samples or not
- Possible combinations for a marker M: [table not recoverable; it listed an Overlap Number for each combination, with 0 for some]
- [Definition of the sample size of Cohort 2 not recoverable]
11. Estimating Covariance
- Suppose the trait is independent of all markers; then the sample correlation of the Z-scores can be used to estimate the covariance
- Trait-associated loci are expected to show correlation even when samples are independent
- Use Z-scores truncated at some cut-off value to estimate the correlation
- Most results shown use a cut-off of 1
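One way to estimate a correlation from truncated Z-scores is sketched below in Python. It rests on assumptions the slides do not spell out: markers with either |Z| at or above the cut-off `c` are dropped, and the correlation of a standard bivariate normal restricted to the square |Z1| < c, |Z2| < c is fitted by maximum likelihood over a grid. All function names are hypothetical, and the simulated data stand in for real summary statistics.

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def square_prob(rho, c, steps=400):
    """P(|Z1| < c, |Z2| < c) for a standard bivariate normal (midpoint rule)."""
    s = math.sqrt(1.0 - rho * rho)
    h = 2.0 * c / steps
    total = 0.0
    for k in range(steps):
        x = -c + (k + 0.5) * h
        inner = norm_cdf((c - rho * x) / s) - norm_cdf((-c - rho * x) / s)
        total += norm_pdf(x) * inner
    return total * h

def estimate_truncated_correlation(z1, z2, c=1.0, grid_step=0.005):
    """Grid-search MLE of rho using only pairs inside the truncation square."""
    pairs = [(a, b) for a, b in zip(z1, z2) if abs(a) < c and abs(b) < c]
    n = len(pairs)
    s11 = sum(a * a for a, _ in pairs)
    s22 = sum(b * b for _, b in pairs)
    s12 = sum(a * b for a, b in pairs)
    best_rho, best_ll = 0.0, -math.inf
    half = int(round(0.95 / grid_step))
    for k in range(-half, half + 1):
        rho = k * grid_step
        one_m = 1.0 - rho * rho
        # Truncated bivariate normal log-likelihood (constants dropped).
        ll = (-0.5 * n * math.log(one_m)
              - (s11 - 2.0 * rho * s12 + s22) / (2.0 * one_m)
              - n * math.log(square_prob(rho, c)))
        if ll > best_ll:
            best_rho, best_ll = rho, ll
    return best_rho, n

def simulate_null_z(n_markers, rho, seed=0):
    """Simulated null Z-score pairs with correlation rho (illustration only)."""
    rng = random.Random(seed)
    z1, z2 = [], []
    for _ in range(n_markers):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1.append(e1)
        z2.append(rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
    return z1, z2

if __name__ == "__main__":
    z1, z2 = simulate_null_z(200_000, 0.3, seed=1)
    print(estimate_truncated_correlation(z1, z2, c=1.0))
```

Dropping large Z-scores limits the influence of truly associated loci, at the cost of a noisier estimate, since small Z-scores carry less information about the correlation.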
12. Correcting for Overlap
- Meta-analysis is done by adjusting the covariance term in the weights
- The weights are updated to correct for overlap
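The updated weights themselves are not preserved in this transcript. As a hedged illustration of the general idea only, the Python sketch below keeps the weights fixed and divides the weighted sum of two Z-scores by a standard deviation that includes the covariance term. The function names and the input values are assumptions, and this may differ from the weights used in the talk.

```python
import math

def corrected_combined_z(z1, z2, w1, w2, r):
    """Weighted Z-score scaled by a variance that includes the covariance term.

    r is the estimated correlation between Z1 and Z2 under the null.
    With r = 0 this reduces to the usual weighted Z-score.
    """
    var = w1 ** 2 + w2 ** 2 + 2.0 * r * w1 * w2
    return (w1 * z1 + w2 * z2) / math.sqrt(var)

def two_sided_p(z):
    """Two-sided p-value under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))

if __name__ == "__main__":
    for r in (0.0, 0.5):
        z = corrected_combined_z(3.0, 3.0, 1.0, 1.0, r)
        print(r, z, two_sided_p(z))
```

A positive correlation shrinks the combined Z-score, so the corrected p-value is larger than the naive one.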
13. Effective Sample Size of Overlap
- We estimate an effective sample size of overlap
- Note that this need not be the actual number of samples overlapping
- E.g., for case-control studies, the estimated overlap may correspond to a range of overlap numbers, depending on what proportion of cases and controls overlap
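The estimator shown on the slide is not preserved. A commonly used relation for a quantitative trait under the null, with homogeneous samples, is corr(Z1, Z2) ≈ N0 / sqrt(N1 · N2), where N0 is the number of shared samples. The Python sketch below inverts that relation; both the formula and the clamping to the range [0, min(N1, N2)] are assumptions here, not a transcription of the talk.

```python
import math

def effective_overlap(r, n1, n2):
    """Effective overlap implied by a Z-score correlation r (assumed relation).

    Inverts r = n0 / sqrt(n1 * n2) and clamps the result to [0, min(n1, n2)].
    """
    n0 = r * math.sqrt(n1 * n2)
    return max(0.0, min(n0, float(min(n1, n2))))

if __name__ == "__main__":
    print(effective_overlap(0.25, 10_000, 10_000))
    print(effective_overlap(1.0, 2209, 2209))
```

Under this relation, two identical studies (correlation 1) give an effective overlap equal to the study size.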
14. Creating Overlapping Datasets: Case-Control Study (T2D)
- 3 European studies from DIAMANTE: FUSION, METSIM and MGI
- True combined sample size 25,240
15. Estimated Overlap Stratified by Sample Size
[Figure: Count of Markers by Observed Sample Size]

N        Category   Estimated Overlap   CI
4,418    B+B        2,209               (2,187, 2,209)
10,946   A+B+B      2,225               (2,175, 2,262)
20,921   B+B+C      2,656               (2,540, 2,765)
23,031   A+C        176                 (52, 311)
27,449   A+B+B+C    2,326               (2,238, 2,365)

Z-scores truncated at cutoff = 1
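The observed sample sizes in the table are mutually consistent if the category labels are read as sub-cohorts, with B the shared block that is counted once in each of the two studies. That reading is an assumption; under it, the short Python check below recovers the sub-cohort sizes and the true combined sample size quoted for the T2D data.

```python
# Observed per-marker sample sizes from the T2D table, keyed by category.
observed = {
    "B+B": 4418,
    "A+B+B": 10946,
    "B+B+C": 20921,
    "A+C": 23031,
    "A+B+B+C": 27449,
}

# Assumed reading: B appears in both studies, so it is counted twice.
b = observed["B+B"] // 2
a = observed["A+B+B"] - 2 * b
c = observed["B+B+C"] - 2 * b

# The remaining rows and the true combined size follow from a, b and c.
checks = {
    "A+C": a + c,
    "A+B+B+C": a + 2 * b + c,
    "true combined (A+B+C)": a + b + c,
}

if __name__ == "__main__":
    print(a, b, c)
    print(checks)
```

The true overlap under this reading is B = 2,209, which is the value the estimates for the strata containing B+B can be compared against.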
17. Meta-Analysis Results
[Figure, two panels]
- Meta-analysis without Correcting for Overlap: -log10 Naive Meta-analysis P-value against -log10 Target P-value
- Meta-analysis Correcting for Overlap: -log10 New Method P-value against -log10 Target P-value
18. Creating Overlapped Datasets: Quantitative Trait
- GLGC European cohorts, trait HDL-cholesterol
- True combined sample size 15,579
19. Estimated Overlap Stratified by Sample Size
[Figure: Count of Markers by Observed Sample Size]

Sample Size   Category   Estimated Overlap   CI
4,970         B+B        2,485               (2,478, 2,485)
10,223        B+B+C      2,409               (2,324, 2,500)
12,811        A+B+B      2,448               (2,305, 2,583)
13,094        A+C        0                   -
18,064        A+B+B+C    2,546               (2,458, 2,592)

Z-scores truncated at cutoff = 1
21. Meta-analysis Results
[Figure, two panels]
- Meta-analysis without Correcting for Overlap: -log10 Naive Meta-analysis P-value against -log10 Target P-value
- Meta-analysis Correcting for Overlap: -log10 Weight-stratified Adjusted P-value against -log10 Target P-value
22. Meta-Analysis with Multiple Studies
- For multiple studies, meta-analysis is conducted sequentially
- When meta-analyzing a pair of studies, for each marker we calculate a total weight, and this total weight is used when meta-analyzing the combined Z with a new study
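The sequential scheme can be sketched in Python as follows. The total-weight formula on the slide is not preserved, so the definition used here is an assumption: it is chosen so that, without overlap, the total weight reduces to the square root of the summed squared weights, and with overlap it shrinks accordingly. The function names and the per-step correlations are hypothetical inputs.

```python
import math

def combine_pair(z1, w1, z2, w2, r):
    """Combine two Z-scores with correlation r; return (combined Z, total weight).

    Assumed total weight: (w1^2 + w2^2) / sd, where sd is the standard deviation
    of the weighted sum. With r = 0 this equals sqrt(w1^2 + w2^2).
    """
    sd = math.sqrt(w1 ** 2 + w2 ** 2 + 2.0 * r * w1 * w2)
    z = (w1 * z1 + w2 * z2) / sd
    total_weight = (w1 ** 2 + w2 ** 2) / sd
    return z, total_weight

def sequential_meta(studies, correlations):
    """Fold studies in one at a time.

    studies: list of (z, weight) for one marker.
    correlations: correlations[k] is the assumed correlation between the
    running combined Z and study k + 1.
    """
    z, w = studies[0]
    for (z_next, w_next), r in zip(studies[1:], correlations):
        z, w = combine_pair(z, w, z_next, w_next, r)
    return z, w

if __name__ == "__main__":
    studies = [(1.0, 10.0), (1.0, 10.0), (1.0, 10.0)]
    print(sequential_meta(studies, [0.0, 0.0]))
    print(sequential_meta(studies, [0.2, 0.2]))
```

With all correlations set to zero, the result matches a one-shot weighted Z-score across the three studies, which is a useful sanity check for any sequential implementation.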
23. In Summary
- When different studies in a meta-analysis have overlapping samples, the standard methods lead to an inflation in type I error
- If overlapping sample sizes are unknown, the effective overlap sample size can be estimated
  - Assuming samples belong to the same ancestry
  - Actual overlap numbers may differ depending on the overlap pattern
- Covariance estimated using truncated Z-scores is used to correct for overlap in meta-analysis
24. Acknowledgements
- Goncalo Abecasis
- Michael Boehnke
- Daniel Taliun
25. Thanks!
26. Estimated Overlap by Varying Cutoff Values: T2D
[Figure, two panels (N = 23,031 and N = 27,449); axis: Estimated Effective Sample Size of Overlap]
27. Estimated Overlap for Varying Cut-off Values: GLGC
[Figure, two panels (N = 13,094 and N = 18,064); axis: Estimated Effective Sample Size of Overlap]