Polygenic Risk Scores Mayo Clinic & Illinois Alliance, Computational Genomics Course

Uploaded On 2024-02-09

Presentation Transcript

1. Polygenic Risk Scores
   Mayo Clinic & Illinois Alliance, Computational Genomics Course
   Dan Schaid, Curtis L. Carlson Professor of Genomics Research, Mayo Clinic
   Email: schaid@mayo.edu

2. Polygenic Risk Score (PRS)
   - PRS = Many Gene Score
   - Weighted sum of genotypes across genetic markers (SNPs) for a subject
   - How to choose weights?
   - How to choose SNPs?
   - What do you do with a PRS?
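The weighted sum on this slide can be sketched in a few lines. This is a minimal illustration, not the course's code: the function name, the three SNPs, and the weights are all made up, and the weights are assumed to be per-allele effects (for example, log odds ratios from a GWAS).

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per SNP) for one subject."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(w * g for w, g in zip(weights, dosages))


# Hypothetical subject genotyped at three SNPs, with made-up per-allele weights.
dosages = [0, 1, 2]
weights = [0.10, 0.05, 0.20]
score = polygenic_risk_score(dosages, weights)  # 0*0.10 + 1*0.05 + 2*0.20
```

The two open questions on the slide map directly onto the two inputs: which SNPs appear in `dosages`, and which values go into `weights`.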

3. Main Outline
   - Ingredients for PRS: Genome Wide Association Studies
   - How to develop PRS
   - Current limits of PRS
   - Measure performance of PRS
   - Clinical use of PRS
   - Current efforts to bring PRS to clinical practice

4. Background on GWAS
   - Genome-wide association study (GWAS)
   - Aim: detect associations between a genetic locus (plural: loci) and a trait
   - Genetic locus: fixed position on a chromosome
   - Trait: disease status, quantitative trait (e.g., blood pressure), etc.
   - GWAS brings us to the “neighborhood” of a causal gene
   - If lucky, we refine the region of the causal gene
   - [Figure: SNP markers 1, 2, and 3 are measured/imputed SNPs; D is the unmeasured disease-risk allele]

5. SNP array for common variants; whole genome sequencing (WGS) for rare variants
   - Sample size: 10K-100K
   - Power depends on: allele frequency, effect size, linkage disequilibrium
   - Statistical association of each genetic variant with trait
   - Refinement of genomic position depends on linkage disequilibrium
   - [Figure label: $$$]

6. SNP: single-nucleotide polymorphism
   - Germline substitution of a single nucleotide at a specific position in the genome
   - Transitions are more frequent than transversions due to biochemistry

7. [Workflow figure: Collect samples -> Extract DNA -> Genotype array or sequencer -> Genotype data]
   Source: https://www.bnl.gov/education/programs/program.php?q=169

8. https://www.illumina.com/science/technology/microarray.html

9. Genotype Array

10. Linkage Disequilibrium (LD)
   - Non-random association of alleles at different loci in a given population
   - See NCI Dictionary of Genetics Terms: https://www.cancer.gov/publications/dictionaries/genetics-dictionary
   - [Figure labels: highly correlated region (LD block); weakly correlated region]
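One common way to quantify LD between two SNPs is the squared correlation of their genotype dosages. The sketch below uses that genotype-based r² as an approximation to haplotype-based r²; the function name and the toy genotype vectors are illustrative only.

```python
def ld_r2(dose_a, dose_b):
    """Squared Pearson correlation between dosages (0/1/2) at two SNPs."""
    n = len(dose_a)
    mean_a = sum(dose_a) / n
    mean_b = sum(dose_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dose_a, dose_b))
    var_a = sum((a - mean_a) ** 2 for a in dose_a)
    var_b = sum((b - mean_b) ** 2 for b in dose_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a SNP with no variation")
    return cov * cov / (var_a * var_b)


# Toy data: the first pair always varies together, the second pair is unrelated.
r2_linked = ld_r2([0, 1, 2, 1, 0, 2], [0, 1, 2, 1, 0, 2])
r2_unlinked = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

Values near 1 correspond to the "highly correlated region" in the figure; values near 0 correspond to the "weakly correlated region".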

11. Tag SNPs & Linkage Disequilibrium
   - Haplotype: set of alleles inherited together from a single parent
   - SNPs: where genetic variants occur
   - Tag SNPs: “tag” particular haplotypes in regions with high LD to represent the region
   - Save costs by genotyping only tag SNPs
   - [Figure: alleles A, T, C uniquely identify haplotype-1]

12. Consequence of Tag SNPs
   - Tag SNPs are measures of variation for a region
   - Size of region depends on LD
   - LD differs across different ancestries
     - LD often weaker among African ancestries
     - Tag SNPs for Europeans may not represent African ancestries

13. Genotyping completed. Next: QC (selection of SNPs and samples for analyses)
   - Remove poorly genotyped SNPs
   - Remove samples that perform poorly (e.g., poor-quality DNA)
   - If the sample is mainly European, remove the small number of non-European samples (limited power for a small number of non-European samples)
   - Some studies ignore the X chromosome in analyses
   - Moral of the story: details of GWAS analyses have downstream impact on the creation of polygenic risk scores

14. Next: Genotype Imputation – Filling in Unmeasured SNPs
   - Genotype coded as dose of minor (alt) allele (0, 1, 2 copies)

       Genotype   A/A   A/G   G/G
       Dose         0     1     2

   - Imputed SNPs: from ~1M SNPs to ~20M SNPs
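The dose coding in the table above can be written as a small helper. This is an illustrative sketch: the function name and the "A/G" string format are assumptions, and imputed genotypes are in practice often fractional doses rather than whole counts.

```python
def allele_dose(genotype, counted_allele):
    """Count copies (0, 1, or 2) of counted_allele in a genotype such as 'A/G'."""
    alleles = genotype.split("/")
    if len(alleles) != 2:
        raise ValueError("expected a diploid genotype such as 'A/G'")
    return sum(1 for allele in alleles if allele == counted_allele)


# Counting the G allele reproduces the slide's coding: A/A -> 0, A/G -> 1, G/G -> 2.
doses = [allele_dose(g, "G") for g in ["A/A", "A/G", "G/G"]]
```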

15. Genotype Imputation – Fill in Unmeasured SNPs
   - Boost power of GWAS
   - Capture poorly measured SNPs
   - Fine-mapping: high resolution of genomic region
   - Combine different studies
     - Different studies use different SNP arrays
   - Improve estimates of effects of SNPs
   - [Figure labels: measured SNPs; no overlap; imputed SNPs]

16. Michigan Imputation Server; TOPMed Imputation Server
   - Courtesy of Center for Inherited Disease Research, NIH

17. Imputation accuracy
   - Reference panels (number of subjects, ancestral diversity)
   - Sample size
   - Genotyping chip
   - Allele frequency
   - Summary
     - For common variants (minor allele frequency > 1%), TOPMed is slightly better than HRC
     - TOPMed is better for African ancestry samples
     - Rare variants (< 0.1%) are poorly imputed by both HRC and TOPMed

18. Discovery & Prediction Depend on Links from SNPs to Trait
   - [Figure: SNP markers 1, 2, 3, and 4 are measured/imputed SNPs; D is the unmeasured disease-risk allele; links run from the SNPs to the trait]

19. GWAS Regression Analyses
   - Test 1 SNP at a time: dose of risk allele (0, 1, 2 copies)
   - Account for covariates
     - Age, sex, known risk factors
     - Population stratification
   - Regression model
     - Linear regression: quantitative trait (e.g., blood pressure)
     - Logistic regression: case/control
     - Cox model: cohort design, age of disease onset
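A stripped-down version of the one-SNP-at-a-time test, for a quantitative trait, might look like the following. It is a sketch under a strong simplifying assumption: covariates (age, sex, principal components) are left out, so this is plain simple linear regression of trait on dose. The function name and data are invented for illustration.

```python
import math


def snp_association(dose, trait):
    """Simple linear regression of a quantitative trait on allele dose.

    Returns (beta, standard_error). Covariates are deliberately omitted here.
    """
    n = len(dose)
    mean_x = sum(dose) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dose)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dose, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dose, trait))
    standard_error = math.sqrt(rss / (n - 2) / sxx)
    return beta, standard_error


# Made-up data: six subjects, trait rises by roughly one unit per risk allele.
dose = [0, 1, 2, 0, 1, 2]
trait = [1.0, 2.1, 2.9, 1.2, 1.9, 3.1]
beta, standard_error = snp_association(dose, trait)
```

A GWAS repeats this fit for each SNP in turn; the resulting `beta` values are the raw material for PRS weights.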

20. Population Stratification
   - Systematic difference in allele frequencies among subpopulations
   - Caused by physical separation and random changes over time
   - Bias if cases originate in different clusters than controls
   - Use all genetic data to cluster and adjust for cluster membership
     - Principal components (PCs) used as covariates
     - Reduce the number of SNPs to ~10 PCs that best ‘explain’ variation of SNPs
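To make the principal-component idea concrete, here is a toy sketch that extracts only the first PC by power iteration on a centered genotype matrix. It is illustrative, not how production tools do it: real analyses use dedicated software, LD-pruned SNPs, and around 10 PCs. The function name and the six-subject data set are hypothetical.

```python
import math


def first_pc_scores(genotypes, iterations=200):
    """PC1 score per subject from a subjects-by-SNPs dosage matrix (power iteration)."""
    n, m = len(genotypes), len(genotypes[0])
    # Center each SNP column at its mean dosage.
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    x = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Arbitrary non-zero starting vector of SNP loadings.
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        scores = [sum(xi[j] * v[j] for j in range(m)) for xi in x]            # X v
        v = [sum(x[i][j] * scores[i] for i in range(n)) for j in range(m)]    # X^T (X v)
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return [sum(xi[j] * v[j] for j in range(m)) for xi in x]


# Two made-up subpopulations with systematically different allele frequencies.
cluster_a = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]
cluster_b = [[2, 2, 2, 1], [2, 2, 1, 2], [2, 1, 2, 2]]
pc1 = first_pc_scores(cluster_a + cluster_b)
```

In this toy example the PC1 scores separate the two clusters, which is why including such scores as regression covariates can adjust for cluster membership.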

21. View GWAS Results
   - Manhattan plot: Al Olama et al., prostate cancer (doi:10.1038/ng.3094)
     - [Figure: y-axis is -log10(p-value); label: novel discovery]
   - P-value < 5 x 10^-8
     - Controls for testing millions of SNPs
     - Most robust results
   - Mix of SNPs: many with no effect on the trait, some with small effects
   - Additional SNPs with weaker effects might be useful for a PRS to predict the trait
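The genome-wide significance threshold is conventionally motivated as a Bonferroni correction. The arithmetic below assumes a family-wise error rate of 0.05 and roughly one million independent tests; both numbers are the usual convention rather than something stated on the slide.

```python
import math

alpha = 0.05                   # assumed family-wise error rate
independent_tests = 1_000_000  # assumed effective number of independent SNP tests
threshold = alpha / independent_tests


def minus_log10(p_value):
    """Height of a SNP on a Manhattan plot's y-axis."""
    return -math.log10(p_value)


threshold_height = minus_log10(threshold)  # where the significance line is drawn
```

On the Manhattan plot, SNPs that rise above `threshold_height` (about 7.3) pass the 5 x 10^-8 cutoff.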

22. GWAS Catalog https://www.ebi.ac.uk/gwas/

23. Polygenic Risk Scores (Genomic Risk Scores)

24. Key Steps to Develop PRS
   - Develop
     - Determine which SNPs
     - Estimate weights
   - Validate
     - Ideal: evaluate predictions on independent data
     - Next best: cross-validation

25. Considerations for Creating PRS from GWAS
   - Which genome build was used for GWAS?
     - Genome build: chromosome positions of SNPs
     - GRCh38/hg38 released December 2013
     - Reference allele: matches allele of reference genome
   - Which allele is the risk allele?
     - Many GWAS code dose of minor allele
     - Does the minor allele agree with the alt allele of the reference genome?
   - Summary statistics easily shared and managed
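The "which allele is the risk allele" question matters because a weight only makes sense for the allele it was estimated on. A minimal sketch of the sign flip is shown below; the function and argument names are hypothetical, and the sketch assumes both data sets report alleles on the same strand (strand flips and ambiguous A/T or C/G SNPs are not handled).

```python
def harmonize_effect(beta, gwas_effect_allele, gwas_other_allele, counted_allele):
    """Return the GWAS effect expressed per copy of the allele counted in the target data."""
    if counted_allele == gwas_effect_allele:
        return beta
    if counted_allele == gwas_other_allele:
        # The target data count the opposite allele, so the effect changes sign.
        return -beta
    raise ValueError("alleles do not match between GWAS and target data")


# Made-up SNP: the GWAS reports beta = 0.12 per copy of A (other allele G).
same_allele = harmonize_effect(0.12, "A", "G", "A")
flipped_allele = harmonize_effect(0.12, "A", "G", "G")
```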

26. My Summary Data for Each SNP
   - [Table columns: risk allele, risk allele freq, beta, se, z, p-value, reference allele; many more]
   - Choose Reference Ancestry for LD

27. Methods to Compute PRS
   - Clumping & Thresholding
     - Clumping
       - Use LD to group SNPs that are highly correlated
       - Choose 1 SNP to represent the group
     - Thresholding
       - Evaluate different p-value thresholds for best prediction
   - Bayes methods
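Clumping and thresholding can be sketched as a greedy pass over SNPs sorted by p-value. This is a simplified illustration: real tools also restrict clumps to a physical distance window, and the r² cutoff of 0.1, the function name, and the SNP IDs below are all assumptions.

```python
def clump_and_threshold(p_values, r2, p_threshold, r2_threshold=0.1):
    """Greedy clumping plus p-value thresholding.

    p_values: dict mapping SNP id -> GWAS p-value
    r2: dict mapping frozenset({snp_a, snp_b}) -> LD r2 (missing pairs count as 0)
    """
    kept = []
    for snp in sorted(p_values, key=p_values.get):
        if p_values[snp] > p_threshold:
            break  # thresholding: all remaining SNPs have larger p-values
        # Clumping: keep the SNP only if it is not in LD with a stronger kept SNP.
        if all(r2.get(frozenset((snp, k)), 0.0) < r2_threshold for k in kept):
            kept.append(snp)
    return kept


# Hypothetical SNPs: rs2 is in high LD with the more significant rs1.
p_values = {"rs1": 1e-9, "rs2": 1e-6, "rs3": 0.01, "rs4": 0.2}
r2 = {frozenset(("rs1", "rs2")): 0.8}
strict = clump_and_threshold(p_values, r2, p_threshold=0.05)
lenient = clump_and_threshold(p_values, r2, p_threshold=1.0)
```

Trying several values of `p_threshold` and keeping the one that predicts best in validation data is the "evaluate different p-value thresholds" step.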

28. Clumping & Thresholding
   - [Figure: model fit to depression across p-value thresholds, labeled from poor to best]
   - Best threshold 0.027: 12,148 SNPs explain 2.1% of variation in major depressive disorder

29. Thresholding by p-values
   - Biased overestimate of genetic effect size
   - Worsens with low power (small N) and a large number of SNPs
   - Winner’s Curse in auction setting
     - Winner = highest bidder
     - Winner pays more than the average (true) value
     - Biased overpayment increases as the number of bidders increases
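A small simulation shows why selecting SNPs by significance inflates their estimated effects. The setup is invented for illustration: every SNP has the same true effect, estimates carry normally distributed noise, and only SNPs with z above 1.96 are "winners".

```python
import random


def winners_curse_demo(true_beta=0.02, se=0.02, n_snps=10_000, z_cut=1.96, seed=1):
    """Return (mean of all estimates, mean of estimates that pass the z cutoff)."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_beta, se) for _ in range(n_snps)]
    selected = [b for b in estimates if b / se > z_cut]
    return sum(estimates) / len(estimates), sum(selected) / len(selected)


true_beta, se, z_cut = 0.02, 0.02, 1.96
mean_all, mean_selected = winners_curse_demo(true_beta, se, z_cut=z_cut)
```

Averaged over all SNPs the estimates are unbiased, but the selected subset must exceed `z_cut * se`, which here is almost twice the true effect. Lower power (larger `se` relative to `true_beta`) makes the gap worse, matching the slide.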

30. Better by Bayes Methods
   - Use more SNPs
   - Better use of LD
   - Some include functional information
   - Methods
     - LDpred: Vilhjalmsson 2015 (Am J Hum Genet), Privé 2020 (bioRxiv)
     - lassosum: Mak 2017 (Genet Epidemiol)
     - PRS-CS: Ge 2019 (Nat Commun)
     - PRS-CSx: Ruan 2020 (medRxiv)
     - SBayesR: Lloyd-Jones 2019 (Nat Commun)
     - PolyPred, PolyPred+: Weissbrod 2021 (medRxiv)

31. Idea of Bayes Methods
   - Model the probability that a SNP has no effect, a small effect, or a medium effect
   - Result: β shrunken towards 0
   - [Figure: Clump & Threshold vs. Bayes; categories: no effect, small effect, medium effect]
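The shrinkage idea can be illustrated for a single SNP with a mixture-of-normals prior (no effect, small effect, medium effect). This sketch is not LDpred, SBayesR, or any published method: it ignores LD entirely, and the prior probabilities and variances are made-up values.

```python
import math


def normal_pdf(x, variance):
    """Density at x of a normal distribution with mean 0 and the given variance."""
    return math.exp(-x * x / (2 * variance)) / math.sqrt(2 * math.pi * variance)


def shrunken_effect(beta_hat, se, prior_probs, prior_vars):
    """Posterior mean of a SNP effect under a mixture-of-normals prior (LD ignored)."""
    s2 = se * se
    # Marginal likelihood of the estimate under each component (variance 0 = no effect).
    likes = [p * normal_pdf(beta_hat, v + s2) for p, v in zip(prior_probs, prior_vars)]
    total = sum(likes)
    # Each component shrinks the estimate by v / (v + s2); weight by posterior probability.
    return sum((l / total) * (v / (v + s2)) * beta_hat for l, v in zip(likes, prior_vars))


# Assumed prior: 90% no effect, 8% small effect, 2% medium effect.
prior_probs = [0.90, 0.08, 0.02]
prior_vars = [0.0, 0.01 ** 2, 0.05 ** 2]
shrunk = shrunken_effect(0.05, 0.02, prior_probs, prior_vars)
```

Unlike hard thresholding, every SNP keeps a weight, but each weight is pulled towards 0 according to how plausible the "no effect" component is.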

32. Generally Modest Gains by Bayes Methods
   ROC-AUC (Vilhjalmsson, AJHG 2015)

   Disease                  Clumping & Thresholding   LDpred
   Type-1 Diabetes          0.84                      0.87
   Type-2 Diabetes          0.62                      0.63
   Coronary Artery Dis.     0.61                      0.60
   Crohn's Dis.             0.63                      0.67
   Rheumatoid Arth.         0.70                      0.72
   Bipolar Dis.             0.67                      0.67
   Hypertension             0.62                      0.61

33. PRS: simple vs. complex approaches
   - Linear models
     - Sum of SNP effects
     - Ignore interactions
   - Machine learning
     - Capture non-linear effects, interactions
   - To date: little evidence of interactions in GWAS

34. AI & Machine Learning: minimal gains
   - Simple weighted sum works well: Badre et al. (2021), Journal of Human Genetics
   - Traditional linear models work as well as deep learning (CNN): Bellot, Genetics 2018

35. PGS Catalog https://www.pgscatalog.org/

36. Limits of Portable PRS across Different Ancestries
   - Most GWAS in European ancestries (~79%)
   - Limited African, Asian, Hispanic ancestries
   - Prediction accuracy reduced by 2-5 fold in East Asian and African American populations
   - Differences in:
     - Population-specific causal variants
     - Population allele frequencies
     - Linkage disequilibrium tagging of causal variants
     - Possibly gene-gene or gene-environment interactions
   - Martin 2019, Nature Genetics, doi: 10.1038/s41588-019-0379-x

37. Measures of Performance of PRS
   - Receiver-Operating Characteristic (ROC) curve
   - Risk in extreme quantiles
   - Absolute risk and calibration
   - Best performance depends on
     - Heritability
     - SNP effect size
     - Sample size

38. Receiver-Operating Characteristic (ROC) Curve
   - Sensitivity: probability that a diseased subject has a positive test
   - Specificity: probability that a non-diseased subject has a negative test
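For a given cutoff on a risk score, the two definitions above reduce to simple counts. The sketch below assumes a score at or above the cutoff counts as a positive test; the function name and the six subjects are made up.

```python
def sensitivity_specificity(scores, diseased, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' is called a positive test."""
    true_pos = sum(1 for s, d in zip(scores, diseased) if d and s >= cutoff)
    false_neg = sum(1 for s, d in zip(scores, diseased) if d and s < cutoff)
    true_neg = sum(1 for s, d in zip(scores, diseased) if not d and s < cutoff)
    false_pos = sum(1 for s, d in zip(scores, diseased) if not d and s >= cutoff)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical risk scores for three non-diseased and three diseased subjects.
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
diseased = [False, False, True, True, True, False]
sensitivity, specificity = sensitivity_specificity(scores, diseased, cutoff=0.5)
```

Sweeping `cutoff` across all score values and plotting sensitivity against 1 - specificity traces out the ROC curve.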

39. Receiver Operating Characteristic Curve for PSA
   - Source: Thompson, JAMA 2005;294:66-70
   - [Figure: ROC curve with points labeled by PSA value]
   - AUC: probability that the risk score is higher for diseased than for non-diseased
     - 0.5: random (no discrimination)
     - 0.77: cholesterol prediction of CHD
     - 0.80: for population screening
     - 0.98: for pre-symptomatic diagnosis
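The probabilistic definition of AUC on this slide can be computed directly by comparing every diseased subject with every non-diseased subject. The sketch below counts ties as half a win, a common convention that is assumed here; the data are invented.

```python
def auc(scores, diseased):
    """Probability that a random diseased subject scores higher than a random non-diseased one."""
    case_scores = [s for s, d in zip(scores, diseased) if d]
    control_scores = [s for s, d in zip(scores, diseased) if not d]
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5  # ties count as half
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores: 8 of the 9 case-control pairs are ordered correctly.
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.20]
diseased = [False, False, True, True, True, False]
area = auc(scores, diseased)
```

The pairwise loop is quadratic in sample size, which is fine for a demonstration; rank-based formulas give the same number faster on large data.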

40. Upper Limit of PRS: Heritability of Trait
   - Heritability: percent of trait variance due to genes
     - Ranges 0-100%
     - Changing the environment can change heritability
   - Tag SNPs measure only a portion of genes
   - Improved measure of genes -> increased measure of heritability

41. Heritability of Height
   - Family & twin studies: ~80% heritable
   - 2008: strict p-value threshold; SNP heritability ~5%
   - 2010 (doi:10.1038/ng.608): 250K common SNPs, advanced models, larger sample; SNP heritability ~45%
   - 2019 (https://doi.org/10.1101/588020): whole genome sequencing, 21K subjects, 47M genetic variants, many rare variants; heritability ~79%

42. Max AUC Depends on Heritability of Trait (Wray, PLoS Genetics 2010)
   - Family-based heritability: Wang, Nat. Genet 2017
   - [Figure labels: hypertension, depression, dermatitis]

43. Large Sample Size Needed
   - Complex traits: many SNPs of small effects
   - [Figure: AUC vs. sample size (thousands); Zhang, Nature Commun. 2020]

44. Discrimination vs. Prediction
   - ROC/AUC better for disease diagnosis (classification)
     - Only need cases and controls
     - Global measure of discrimination – does not inform extreme risk strata
   - Predictive medicine: predict future event
     - Stratify into low/high risk groups
     - Absolute risk of future disease
   - Cook, Circulation 2007

45. Stratify into Risk Groups: Prostate Cancer (Conti, Nat Genet 2021)
   - Lowest 10%: 3-4 times LESS likely to have prostate cancer than average
   - Highest 10%: 4-5 times MORE likely to have prostate cancer than average
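Comparisons of an extreme PRS stratum with the average can be sketched as a ratio of disease rates. The example below is a toy: the function name is hypothetical, the ten subjects are invented, and published estimates such as those cited on the slide come from properly modeled odds ratios rather than this raw ratio.

```python
def relative_risk_in_top(scores, cases, fraction=0.10):
    """Disease rate in the top fraction of scores divided by the overall disease rate."""
    n = len(scores)
    n_top = max(1, int(n * fraction))
    ranked = sorted(zip(scores, cases), reverse=True)  # highest scores first
    top_rate = sum(case for _, case in ranked[:n_top]) / n_top
    overall_rate = sum(cases) / n
    return top_rate / overall_rate


# Ten made-up subjects: the highest-scoring subject is a case; overall rate is 20%.
scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
cases = [0, 0, 0, 1, 0, 0, 0, 0, 0, 1]
ratio = relative_risk_in_top(scores, cases)
```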

46. Absolute Risk
   - Future risk (next 5 yr, 10 yr, lifetime) given
     - PRS
     - Current age
     - Sex
     - Race
     - Other risk factors
   - Determine by:
     - Population disease incidence rates (age/sex/ancestry specific)
       - Cancer: Surveillance, Epidemiology, and End Results (SEER)
     - Odds ratios for risk factors + PRS
   - iCARE R package: Choudhury, PLoS One 2020
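A heavily simplified version of the absolute-risk calculation combines yearly incidence rates with a relative risk. This sketch is not the iCARE algorithm: it ignores competing causes of death and treats the population rate as the baseline for the relative risk. The rates and the relative risk of 2 are made up.

```python
import math


def absolute_risk(yearly_incidence_rates, relative_risk):
    """Probability of disease over the listed years, ignoring competing risks.

    yearly_incidence_rates: assumed baseline rate for each year of follow-up
    relative_risk: combined relative risk from the PRS and other risk factors
    """
    cumulative_hazard = sum(rate * relative_risk for rate in yearly_incidence_rates)
    return 1.0 - math.exp(-cumulative_hazard)


# Hypothetical 10-year risk: incidence 1 per 1,000 per year, relative risk 2.
ten_year_risk = absolute_risk([0.001] * 10, relative_risk=2.0)
average_risk = absolute_risk([0.001] * 10, relative_risk=1.0)
```

Because the incidence rates are supplied year by year, age-specific rates (for example from a registry) can be plugged in for the subject's current age onward.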

47. Calibration: the Achilles heel of predictive analytics (Van Calster et al., BMC Medicine (2019) 17:230)
   - Calibration: accuracy of risk estimates
     - Agreement of estimated and observed number of events
   - If poorly calibrated
     - False expectations of patients and healthcare professionals
   - How to calibrate
     - Cohort studies: large cohort followed over time
     - Biobanks are a good source: UK Biobank, US All of Us
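Agreement between estimated and observed numbers of events can be checked overall and within risk groups. The sketch below is illustrative only: the function names, the two-group split, and the six subjects are assumptions, and real calibration studies use large cohorts with follow-up.

```python
def expected_observed_ratio(predicted_risks, outcomes):
    """Expected number of events (sum of predicted risks) divided by observed events."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed events")
    return sum(predicted_risks) / observed


def calibration_by_group(predicted_risks, outcomes, n_groups=2):
    """(expected, observed) event counts within groups of increasing predicted risk."""
    ranked = sorted(zip(predicted_risks, outcomes))
    size = len(ranked) // n_groups
    table = []
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(ranked)
        chunk = ranked[g * size:end]
        table.append((sum(r for r, _ in chunk), sum(o for _, o in chunk)))
    return table


# Made-up cohort: predicted risks and whether each subject developed disease.
predicted = [0.1, 0.1, 0.2, 0.2, 0.6, 0.8]
outcomes = [0, 0, 0, 1, 1, 1]
overall_ratio = expected_observed_ratio(predicted, outcomes)
groups = calibration_by_group(predicted, outcomes)
```

A ratio near 1 in every group indicates good calibration; in this toy cohort the model under-predicts, most visibly in the high-risk group.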

48. Reasons for Poor Calibration
   - Algorithm developed on data that differ from the target population
     - Referral vs. regional medical centers
     - Changes over time: diagnostics, types of patients
   - Statistical overfitting
     - Flexible model with too many parameters (e.g., deep learning neural nets)
     - Captures random errors in data

49. Clinical Use of PRS
   - Absolute risk
     - Specific to current age & future risk
     - Traditional risk factors + PRS
   - Stratify into high risk: targeted screening
     - Behavior intervention
     - Preventive medication

50. eMERGE
   - Electronic Medical Records and Genomics (eMERGE) Network
   - NHGRI-organized and funded consortium
   - Develop PRS for 15 common diseases
   - Integrate PRS in the EHR
   - Assess outcomes following return of results

51. Women at increased risk of breast cancer are offered endocrine therapy to reduce risk
   - Standard models:
     - Gail Model (BCRAT): 5 yr, 3%
     - International Breast Cancer Intervention Study (IBIS): 10 yr, 5%
   - Does standard model + PRS influence intent to take meds?

52. PRS: Reclassify Beyond Standard Risk Factors
   - Simulated: 55-yr-old women, IBIS 10-yr risk = 0.05
   - Mavaddat, AJHG 2019: 313 SNPs for PRS for breast cancer
   - [Figure labels: higher risk; lower risk]

53. % Change in Risk due to PRS (N=151)
   - [Figure labels: intent to take endocrine therapy; more likely; less likely]

54. Considerations for Risk Predictions
   - Improved reporting standards for PRS (Wand, Nature 2021)
     - Description of construction & validation of PRS
     - Account for non-genetic risk factors
     - Ancestry limitations
   - Benefits vs. risks of predictions
     - Modifiable behaviors for high-risk strata?
     - Preventive medications?
     - Management of anxiety?

55. Summary Take-away Points
   - PRS: weighted sum of SNPs
   - PRS is an active area of development
     - New statistical methods, inclusion of gene function, different diseases/traits
   - Clinical utility
     - Risk stratification
     - Absolute risk prediction
   - Future needs:
     - More diverse ancestries, large samples
     - Cohort studies for calibration
     - Improved reporting standards (Wand, Nature 2021)