GWAS Consortia and Meta-Analysis

Uploaded On 2023-11-22

Inês Barroso, Joint Head of Human Genetics, Metabolic Disease Group Leader, Wellcome Trust Sanger Institute

Presentation Transcript

1. GWAS Consortia and Meta-Analysis
Inês Barroso
Joint Head of Human Genetics
Metabolic Disease Group Leader
Wellcome Trust Sanger Institute

2. Objectives
- Why perform meta-analysis?
- How? What are the issues to consider?
- What can you gain?
- Setting up consortia
- What next?
  - Deep replication
  - Fine-mapping
  - Rare variants

3. First Obesity Locus
- Science, 2007
- PLOS Genetics, 2007
- Nature Genetics, 2007
- No additional variants robustly associated with BMI identified

4. Second Common Obesity Locus
- Nature Genetics, 2008
- Required: >16K samples in initial analysis; replication in additional ~90K samples; association study in a population of different ancestry
- CONCLUSION: Large sample sizes are needed to detect the small effects one expects for complex traits

5. GWAS Changing Approaches
[Flow diagram; labels only]
- GWAS 1: Association Signals; Replication in Additional Populations of Similar Descent
- GWAS 1, GWAS 2, GWAS 3, GWAS 4, GWAS 5, GWAS 6, GWAS 7: Meta-Analysis in Large International Consortia
- Association Testing in Diverse Populations
- Replicating Loci
- Re-Sequencing & Fine-Mapping
- Causal Variants
- Biology

6. Why Perform Meta-Analysis?
- Maximise the value and information from pre-existing data (e.g. GIANT and BMI)
- Increase sample size
  - Sample sizes required are often too large for a single group to perform an appropriately powered study
- Increase power of the study
- Possibly increase diversity of study samples
  - Trans-ethnic fine-mapping

7. Meta-analysis: principles
- Synthesis of different datasets to obtain a summary based on evidence from the combined data
- In epidemiological terms, meta-analyses provide a better estimate of effect size
- In the GWAS setting, meta-analysis is usually initially carried out to help the discovery of further susceptibility variants of moderate/small effect size that would otherwise have escaped detection due to low power
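One common way to combine per-study results into a single summary is inverse-variance weighted fixed-effect meta-analysis. The slide does not name a specific method, so the sketch below is an illustration under that assumption; the function name and the three example studies are hypothetical.

```python
import math
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted fixed-effect meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors
    Returns (pooled beta, pooled SE, z score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]      # more precise studies weigh more
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                  # pooled SE shrinks as studies are added
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, z, p


# Hypothetical log odds ratios and standard errors from three studies
betas = [0.10, 0.08, 0.12]
ses = [0.05, 0.04, 0.06]
beta, se, z, p = fixed_effect_meta(betas, ses)
print(f"pooled beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

The pooled standard error is smaller than that of any single study, which is the gain in power the slide refers to.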

8. Sample Size and Power
- Power to detect association (p = 5 × 10^-8) at a variant with risk allele frequency 0.30 and allelic OR 1.10
- More than 20,000 cases and an equal number of controls are needed for power > 80%
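The order of magnitude on this slide can be checked with a rough power calculation. The sketch below assumes a two-sided allelic (two-proportion) test under a normal approximation and treats 0.30 as the risk allele frequency in controls. The slide states neither assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_power(n_cases, n_controls, raf, odds_ratio, alpha=5e-8):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    norm = NormalDist()
    p0 = raf                                              # assumed frequency in controls
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)   # implied frequency in cases
    # Each individual contributes two alleles to the comparison
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_crit = -norm.inv_cdf(alpha / 2.0)                   # genome-wide threshold on the z scale
    ncp = abs(p1 - p0) / se                               # expected z score
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


for n in (5_000, 10_000, 20_000, 25_000):
    print(n, round(allelic_power(n, n, raf=0.30, odds_ratio=1.10), 3))
```

Under these assumptions the 80% threshold is crossed at a little over 20,000 cases with matched controls, in line with the figure quoted on the slide.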

9. Forming consortia
- The first step in many GWAS meta-analyses involves setting up consortia to study specific traits of interest

10. Consortia Rules of Engagement
- It is helpful to decide governance structures up front
  - Steering committees
  - Working groups
- Agree principles for data sharing
  - Within the consortia
  - With the wider community
- Agree principles for authorship
- Have a written document delineating the above
  - Ask participating investigators to agree to/sign up to the above
  - Document can be referred back to when/if needed

11. GWAS Meta-Analysis Considerations (I)
- The majority of GWAS meta-analyses combine data retrospectively
  - Harmonisation of study design can be extremely difficult
- Meta-analyses can be carried out sequentially, and can be updated when new GWAS datasets for the same trait emerge
- Robust GWAS meta-analyses require a clear predefined protocol:
  - Definition of phenotype to study should be uniform
  - Agreed uniform criteria for inclusion/exclusion of samples
  - Agreed uniform QC for genotyped data (call rate, HWE, MAF)
  - There is a need to specify basic analytical options, such as the genetic model examined and the strategy for covariate adjustments
  - Need to consider how to correct for population stratification (e.g. genomic control)
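Genomic control, mentioned above as one way to correct for population stratification, can be sketched as follows. The inflation factor lambda is the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The function names are hypothetical, and the sketch assumes all p-values lie strictly between 0 and 1.

```python
from statistics import NormalDist, median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def p_to_chi2(p):
    """Convert a two-sided p-value (0 < p < 1) to a 1-df chi-square statistic."""
    z = NormalDist().inv_cdf(p / 2.0)
    return z * z


def genomic_control_lambda(p_values):
    """Inflation factor: observed median chi-square over the expected median."""
    return median(p_to_chi2(p) for p in p_values) / CHI2_1DF_MEDIAN


def gc_correct(p_values):
    """Divide each chi-square by lambda and convert back to a p-value."""
    lam = max(genomic_control_lambda(p_values), 1.0)  # by convention, never deflate
    norm = NormalDist()
    return [2.0 * norm.cdf(-((p_to_chi2(p) / lam) ** 0.5)) for p in p_values]
```

A lambda close to 1 suggests little inflation; values well above 1 suggest stratification, cryptic relatedness or other systematic bias.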

12. GWAS Meta-Analysis Considerations (II)
- Retrospective aspect of GWAS meta-analyses
  - It is critical to gather information systematically regarding phenotype, genotyping platform, local QC metrics, covariates, etc.
- Because most GWAS meta-analyses combine pre-existing datasets, these are often generated using different genotyping platforms with limited overlap in variants tested
  - How should one combine data generated on multiple different platforms? (imputation)

13. Imputation
- Allows genotypes to be estimated at SNPs with missing data
  - Failed SNPs
  - SNPs not present on a given genotyping array
- Two studies to be meta-analysed used two different genotyping platforms with little overlap in SNP content
  - Only using overlapping SNPs would exclude the majority of the data
  - Imputation allows data to be combined across platforms
- Imputation increases the coverage across the genome and the number of variants that can be tested for association

14. Imputation
[Diagram; labels only: Study 1, Study 2, Study 1 with imputed missing SNPs]
Imputation:
- Requires GWAS genotypes to be used as scaffold
- Requires reference datasets (e.g. www.hapmap.org; www.1000genomes.org) where the LD (correlation) between SNPs is known, which allows imputation of genotypes for variants not typed on a given array. Increasingly these could include reference datasets generated by whole-genome sequencing of subsets of individuals from the populations included in the study
- There is specialist software to facilitate imputation as well as meta-analysis

15. GWAS Meta-Analysis: Collecting Information for Consortia Work
- Requires summary statistics at each variant
- Information on analysis method and covariates used
- Information on the size of the study
- Information on the independence of samples
- Information on approaches taken to adjust for any population stratification (for example genomic control)
- Information on strand and build of the human genome on which allele coding has been based
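Strand and allele coding matter because each study's effect must refer to the same allele before results are combined. The sketch below shows one conservative way to align a study's odds ratio to a reference coding. The function name is hypothetical, and dropping strand-ambiguous A/T and C/G SNPs is an assumption made here, not something the slide prescribes.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonise(effect_allele, other_allele, odds_ratio, ref_effect, ref_other):
    """Align a study's odds ratio to a reference allele coding.

    Returns the aligned OR, or None when the SNP cannot be aligned safely.
    """
    study = (effect_allele.upper(), other_allele.upper())
    ref = (ref_effect.upper(), ref_other.upper())
    # Only simple SNP alleles are handled in this sketch
    if any(a not in COMPLEMENT for a in study + ref):
        return None
    # A/T and C/G SNPs look identical on both strands, so the strand
    # cannot be inferred from the alleles alone; drop them here
    if COMPLEMENT[ref[0]] == ref[1]:
        return None
    flipped = (COMPLEMENT[study[0]], COMPLEMENT[study[1]])
    for alleles in (study, flipped):       # try the reported strand, then the other
        if alleles == ref:
            return odds_ratio              # same effect allele
        if alleles == ref[::-1]:
            return 1.0 / odds_ratio        # effect allele swapped: invert the OR
    return None                            # alleles do not match the reference


print(harmonise("T", "C", 1.2, "A", "G"))  # opposite strand, same effect allele
print(harmonise("G", "A", 1.2, "A", "G"))  # same strand, swapped effect allele
```

In practice, ambiguous SNPs are sometimes resolved using allele frequencies rather than dropped; that refinement is left out of this sketch.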

16. Typical data sharing table format
STUDY TITLE
General information
- Name of study
- Name of analyst
- Email of analyst
- Study design: population-based, family-based (please give details)
Sample information
- Number of cases (females)
- Number of controls (males)
- Ethnic composition
- Possible relatedness issues: are individuals related (how?)
- Possible structure issues: mixed population?
Genotyping and imputation information
- Genotyping platform
- Summary of key QC metrics
- # SNPs passed QC
- Imputation method
- Imputation settings
- Reference data used for imputation: including build
Analytical information
- Association analysis method for imputed genotypes: accounting for uncertainty using SNPTEST or other (which?) program; using only genotypes with P(call)>X (which threshold?) as hard calls; using best guess genotypes
- Calculated GC lambda (typed SNPs)
- Calculated GC lambda (imputed SNPs)
- Covariates included: PCA, GC, none
- Genetic model

17. Typical data sharing table format

| Column header | Description |
|---|---|
| SNP | SNP rs number (if unknown, e.g. with some Affymetrix SNPs, report Affy SNP ID) |
| build | e.g. “36”, human genome build used |
| strand | e.g. “+”, human genome strand used |
| chromosome | chromosome on which SNP resides |
| position | position of SNP on chromosome in base pairs, based on human genome build used |
| imputed | “1” for imputed, “0” for directly-typed SNP passing QC |
| major_allele | e.g. “G”, major allele at that SNP, based on control frequency |
| minor_allele | e.g. “A”, minor allele at that SNP, based on control frequency |
| MAF_controls | e.g. “0.246”, minor allele frequency in controls; provide 3 digits to the right of the decimal |
| OR_allele | e.g. “A”, allele for which the OR has been estimated |
| call_rate | e.g. “0.985”, call rate for this SNP across cases and controls; provide 3 digits to the right of the decimal |
| exact_HWE_cases | exact HWE p value in cases |
| exact_HWE_controls | exact HWE p value in controls |
| OR | e.g. “1.097”, allelic odds ratio; provide 3 digits to the right of the decimal |
| lower_95%CI | e.g. “0.874”, lower 95% confidence interval of the OR; provide 3 digits to the right of the decimal |
| upper_95%CI | e.g. “1.267”, upper 95% confidence interval of the OR; provide 3 digits to the right of the decimal |
| additive_p_uncorr | additive model p value, uncorrected for genomic control |
| additive_p_corr | additive model p value, corrected for genomic control |
| impute_acc | e.g. “0.98”, metric for imputation accuracy (i.e. value for r2hat or proper_info measures, depending on imputation programme used; if some other measure used, please specify) |
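Meta-analysis software usually works on the log odds ratio and its standard error rather than on the OR and confidence limits shared in this table. The conversion can be sketched as below, assuming the 95% interval was built symmetrically on the log scale. The function name and the rs identifier are hypothetical; the numeric values are the illustrative examples from the table, which need not be mutually consistent.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(odds_ratio, lower_95, upper_95):
    """Recover the log odds ratio and its standard error from an OR and its 95% CI.

    Assumes the interval is symmetric on the log scale:
    log(upper) - log(lower) = 2 * Z_95 * SE.
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper_95) - math.log(lower_95)) / (2.0 * Z_95)
    return beta, se


# One row in the shared format; fields arrive as text and must be parsed
row = {"SNP": "rs0000000", "OR": "1.097", "lower_95%CI": "0.874", "upper_95%CI": "1.267"}
beta, se = log_or_and_se(float(row["OR"]), float(row["lower_95%CI"]), float(row["upper_95%CI"]))
print(f"{row['SNP']}: beta={beta:.4f} se={se:.4f}")
```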

18. Heterogeneity
- Results from meta-analysis of various studies may suggest between-study heterogeneity (especially when combining populations of different ancestry)
- How to interpret heterogeneity?
  - Differences in study design
  - Differences in population structure
  - Differences in environmental exposures
  - False-positive?
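Between-study heterogeneity is commonly summarised with Cochran's Q and the I² statistic. The slide does not name a specific measure, so the sketch below is an illustration using these two; the function name and example values are hypothetical.

```python
def heterogeneity(betas, ses):
    """Cochran's Q, its degrees of freedom, and I^2 (percent) for per-study effects."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    # Q: weighted squared deviations of each study from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical example: two studies with clearly different effects
q, df, i2 = heterogeneity([0.0, 0.3], [0.05, 0.05])
print(f"Q={q:.2f} on {df} df, I2={i2:.1f}%")
```

Under the null of no heterogeneity, Q follows approximately a chi-square distribution with `df` degrees of freedom; the p-value is omitted here to keep the sketch to the standard library.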

19. Benefits of GWAS Meta-Analysis
- Increased sample sizes for many disease and continuous trait consortia
  - Increased power to detect new loci
  - New pathways and important biological insights gained
  - Greater power to detect even smaller effect sizes and greater coverage of allele frequency spectrum
- Power of large collaborations/consortia
  - Design better powered replication and fine-mapping experiments (e.g. Metabochip, Immunochip)

20. GWAS Meta-Analysis Results
- As sample size increases, power increases to detect smaller effect sizes
- Effect sizes were small, with FTO being the largest
- For a given allele frequency, novel loci had slightly smaller effect sizes than previously established loci
- BMI-increasing allele frequency for new loci varied from 4-87%, covering a greater allele frequency spectrum than the previous GWAS meta-analysis with ~half the sample size (24-83%)
- Speliotes et al. NG, 2010

21. Published GWAS Meta-Analysis Studies
- Increased the number of loci discovered
- Increased the fraction of the heritability explained
- However…
- Follow-up of discovery results in published GWAS meta-analyses was limited:
  - Small number (N<30) of top signals from discovery analysis were taken into additional studies for validation
  - Information from discovery meta-analysis has not been fully exploited
- Development of custom chips to enable
  - Deep replication
  - Fine-mapping

22. Metabochip
- The Metabochip is a custom iSelect Illumina array (~196,725 SNPs)
- Designed to support large-scale follow-up of putative associations for glycaemic, cardiovascular and other metabolic traits
- The chip incorporates a number of fine-mapping regions, designed using the initial release of the 1000 Genomes to ensure high SNP coverage
- The chip also incorporates all established GWAS hits (p < 5 × 10^-8) known at the time of design

23. Metabochip: replication (Illumina ~200k iSelect array, Aug 2009)
[Diagram; labels only]
- Consortia and traits: MAGIC (FG, FI/HbA1c/2hrG); CARDIoGRAM (MI, CAD); MPV/PLT/WBC; ICBP-GWAS (SBP/DBP); GIANT (BMI/WHR, WC/Height/FATPCT); DIAGRAM (T2D, T2D AoD/EarlyOnset); QT-IGC (QT); Lipids (HDL/LDL/TG, TC)
- SNP allocations shown: 5k, 3x1k SNPs; 5k, 2x1k SNPs; 7, 3x0.7k SNPs; 2x5k SNPs; 2x5k, 3x1k SNPs; 3x5k, 1k SNPs; 5k SNPs
- Total: ~66,117 SNPs

24. Metabochip Application
- Metabochip array used for follow-up of regions with preliminary association evidence in:
  - DIAGRAM (type 2 diabetes case-control analysis)
  - MAGIC (quantitative glycaemic trait analysis)
- New loci influencing type 2 diabetes risk and glycaemic traits have been discovered (Morris et al., 2012; Scott et al., 2012)
- Metabochip facilitates investigating the genetic overlap between related cardiometabolic traits
  - Shared genetic determinants

25. Immunochip
- Custom Illumina array with ~200,000 markers
- Trynka et al., 2011
  - Application of Immunochip to coeliac disease
  - 13 new loci associated with disease risk at p < 5 × 10^-8
  - 1/3 loci with multiple independent association signals
  - 29 of the 54 fine-mapped signals seemed to be localized to single genes and, in some instances, to gene regulatory elements

26. Summary
GWAS Meta-Analysis
- Made possible by imputation
- Requires agreed-upon phenotype definition, sample and SNP QC, and analysis plan
- Facilitated the discovery of new loci due to much larger sample sizes
- Large consortia work facilitated the development of cost-effective custom arrays for deep replication and fine-mapping
- New methods for conditional analysis
  - Allow application to summary-level data
- Rare variants: next analytical challenge!

27. Fine-Mapping in Meta-Analysis Setting
- Several rounds of conditional analysis by local analysts
- Central meta-analysis of results from conditional analysis
- Iterative process
- Further insights and better coverage by including
  - SNPs imputed to the most recent reference set from 1000 Genomes
  - Variants from ongoing sequencing efforts (e.g. UK10K)
- Overlay data with ENCODE datasets to evaluate functional relevance
- If many studies and samples are involved, this process is laborious and error-prone

28. Fine-Mapping Using Summary Level Data
- Yang et al., 2012
  - Developed a method for approximate conditional and joint genome-wide association analysis
  - Method can use summary-level statistics from a meta-analysis of genome-wide association studies (GWAS) and LD information from a reference panel
  - Computationally fast
  - Avoids having to go back to individual cohorts for iterative rounds of conditional analysis
  - Avoids the need to request individual-level genotype data from participating cohorts (which is something most people are not comfortable sharing)
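The idea behind summary-level conditional and joint analysis can be illustrated with a heavily simplified two-SNP case. If genotypes and phenotype are standardised, joint effects follow from marginal effects and the LD correlation r by solving a 2x2 linear system. This is only a sketch of the principle under those assumptions, not the method of Yang et al., which also handles allele frequencies, sample sizes, standard errors and many SNPs; the function name is hypothetical.

```python
def joint_effects_two_snps(b1, b2, r):
    """Joint (mutually adjusted) effects of two SNPs from their marginal effects.

    Assumes standardised genotypes and phenotype, so that the joint effects
    solve R @ b_joint = b_marginal with R = [[1, r], [r, 1]], where r is the
    LD correlation taken from a reference panel.
    """
    if abs(r) >= 1.0:
        raise ValueError("SNPs in perfect LD cannot be separated")
    denom = 1.0 - r * r
    return (b1 - r * b2) / denom, (b2 - r * b1) / denom


# Hypothetical example: SNP 2 shows a marginal signal only because it tags SNP 1
j1, j2 = joint_effects_two_snps(b1=0.2, b2=0.1, r=0.5)
print(j1, j2)
```

In this example the marginal effect of the second SNP equals r times that of the first, so its joint effect vanishes once the first SNP is accounted for. This is the kind of signal that conditional analysis is meant to distinguish from a genuinely independent association.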

29. Where to next?
- H3A Consortia
  - Investigate genetic bases of diseases in Africa
- Trans-ethnic fine-mapping
  - Analyse GWAS data from multiple populations with different patterns of LD
- Investigate the role of rarer variants in disease and underlying traits:
  - Exome chip: targeted array with variants mapping to protein-coding regions and identified from exome sequencing projects
  - Sequence-based association studies
    - Whole-exome re-sequencing
      - Exons, splice junctions and conserved NCS
      - More “interpretable” portion of genome
      - Per base more expensive but overall cheaper
    - Whole-genome re-sequencing
      - Hypothesis free
      - Covers whole genome

30. Common Variants
[Figure: Genetic variants and human disease]
- Axes: allele frequency (very rare, rare, low frequency, common; boundaries at 0.1%, 0.5%, 5%) against effect size (low, modest, intermediate, high)
- Regions: rare variants causing Mendelian disease (linkage/candidate gene); low frequency variants with intermediate effect (now); common variants, very few of these (GWAS/candidate gene)