Slide1
From Genomics to Prevention of Cardiovascular diseases
Ida Surakka
Postdoctoral Fellow
Department of Internal Medicine, Division of Cardiovascular Medicine
University of Michigan
Slide2
Outline
Background for complex diseases
Genome-wide association analyses
Coronary heart disease prediction
Challenges in genetic prediction
Where to go from here?
Slide4
Inheritance
Dad’s chromosomes
Mother’s chromosomes
Meiosis
Child’s chromosomes
Slide8
Mutations
A T T G C A G C A G T C A A A G C T A G A T C A G C T A C A G C T
Slide10
Mutations
A T T G C A G C A G T C A A A G C T A G A T C A G C T A C A G C T
A T T G C A G C A G T C A A A G C T T G A T C A G C T A C A G C T
Single nucleotide polymorphism (SNP)
Slide12
Mutations
A T T G C A G C A G T C A A A G C T A G A T C A G C T A C A G C T
A T T G C A G C A G T C A A A G C T T G A T C A G C T A C A G C T
SNP
A T T G C A G A G T C A A A G C T T G A T C A G C T A C A G C T
Deletion
Slide14
Mutations
A T T G C A G C A G T C A A A G C T A G A T C A G C T A C A G C T
A T T G C A G C A G T C A A A G C T T G A T C A G C T A C A G C T
SNP
A T T G C A G A G T C A A A G C T T G A T C A G C T A C A G C T
Deletion
A T T G C A G A G T C A A A G C T T G A T C A G A C T A C A G C T
Insertion
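The three mutation types on the slide can be told apart mechanically. The sketch below is only an illustration, not a real variant caller: the function name `classify_variant` is made up here, and it assumes the two sequences differ by a single change. On the slide the deletion row also carries the SNP and the insertion row carries both earlier changes, so each row is compared with the row before it.

```python
def classify_variant(before: str, after: str) -> str:
    """Classify the single change that turns `before` into `after`.

    Hypothetical helper for illustration; assumes at most one change.
    """
    a = before.replace(" ", "")
    b = after.replace(" ", "")
    shortest = min(len(a), len(b))

    # Length of the shared prefix.
    start = 0
    while start < shortest and a[start] == b[start]:
        start += 1

    # Length of the shared suffix, never overlapping the prefix.
    end = 0
    while end < shortest - start and a[len(a) - 1 - end] == b[len(b) - 1 - end]:
        end += 1

    a_mid = a[start:len(a) - end]  # bases present only in `before`
    b_mid = b[start:len(b) - end]  # bases present only in `after`

    if not a_mid and not b_mid:
        return "identical"
    if len(a_mid) == 1 and len(b_mid) == 1:
        return "SNP"
    if a_mid and not b_mid:
        return "deletion"
    if b_mid and not a_mid:
        return "insertion"
    return "complex"


# Sequences copied from the slide.
reference = "A T T G C A G C A G T C A A A G C T A G A T C A G C T A C A G C T"
with_snp = "A T T G C A G C A G T C A A A G C T T G A T C A G C T A C A G C T"
with_deletion = "A T T G C A G A G T C A A A G C T T G A T C A G C T A C A G C T"
with_insertion = "A T T G C A G A G T C A A A G C T T G A T C A G A C T A C A G C T"

snp_call = classify_variant(reference, with_snp)
deletion_call = classify_variant(with_snp, with_deletion)
insertion_call = classify_variant(with_deletion, with_insertion)
```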
Slide15
Genetic Terminology
A mutation that does not cause a fatal phenotype and becomes common in the population is called a polymorphism or genetic variant
At each genetic locus (location in the genome) every individual has two alleles (versions), one from the mother and one from the father
By alleles we usually mean the alternate versions of a polymorphism or mutation (the term can also refer to a gene copy)
A/T (SNP, individual has inherited two different alleles)
T/T (SNP, individual has inherited the same allele from both parents)
-/T (individual has inherited a deletion from one of the parents)
ATG/G (individual has inherited an insertion from one of the parents)
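As a small illustration of the genotype notation above, the sketch below counts how many copies of a given allele an individual carries (0, 1 or 2). The function name `allele_count` is an assumption made for this example, and it assumes genotypes are written exactly as on the slide, with the two alleles separated by a slash.

```python
def allele_count(genotype: str, allele: str) -> int:
    """Number of copies (0, 1 or 2) of `allele` in a genotype such as 'A/T'."""
    return genotype.split("/").count(allele)


heterozygous_snp = allele_count("A/T", "T")    # one T allele
homozygous_snp = allele_count("T/T", "T")      # two T alleles
deletion_carrier = allele_count("-/T", "-")    # one deleted copy
insertion_carrier = allele_count("ATG/G", "ATG")  # one inserted copy
```

This per-variant count is the quantity that later enters association models and risk scores as the "number of risk alleles".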
Slide16
Monogenic vs Polygenic Disease
Mendelian inheritance
Single mutation in single gene (Gene A)
Variant impact: 100%
Phenotype usually severe and similar between carriers
Complex inheritance
Multiple mutations in multiple genes (Gene B, Gene C, Gene D, Gene E)
Variant impact: 14%, 8%, 7%, 11%
Phenotype severity depends on the genetic burden, i.e. how many contributing variants the patient carries
Slide19
Complex Disease
Multiple mutations in multiple genes (Gene B, Gene C, Gene D, Gene E)
Risk burden: 40%
Unhealthy lifestyle
Risk burden: 50%
Combined risk burden: 90%
Slide20
Well-known Examples
Heart diseases
Type 2 diabetes
Cancers
Crohn’s disease & inflammatory bowel disease
Alzheimer’s disease
Dementia
Slide21
Why Study Complex Diseases?
Most impact on population health
#1 killer in the world = cardiovascular diseases
#2 killer in the world = cancers
Most impact on health care costs
Effective prevention would save billions of dollars
General knowledge of the risk factors is still scarce
Prediction depends on the available information and is still fairly inaccurate
Complex diseases are hard to study!
Slide22
Outline
Background for complex diseases
Genome-wide association analyses
Coronary heart disease prediction
Challenges in genetic prediction
Where to go from here?
Slide23
Genome-wide Association Study
Thousands (or millions) of genetic variants measured in thousands of samples
Data matrix with thousands of rows and thousands/millions of columns
-> Seriously big data!!
Linear or logistic regression model applied to every variant
As a result we have summary statistics for thousands/millions of statistical tests
Large sample sizes are needed for adequate power because of the multiple testing penalty
The currently used significance threshold is 5e-8
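The per-variant testing loop can be sketched in a few lines. This is a toy illustration under stated assumptions, not a production GWAS pipeline: the data are simulated, the function name `linear_association` and all simulation parameters are made up for the example, only the linear (quantitative trait) case is shown, and the p-value uses a normal approximation to the test statistic. The 5e-8 threshold equals 0.05 divided by one million, i.e. a Bonferroni-style correction for roughly a million independent tests.

```python
import math
import random

GENOME_WIDE_THRESHOLD = 5e-8  # threshold from the slide; equals 0.05 / 1,000,000


def linear_association(allele_counts, phenotype):
    """Simple linear regression of a phenotype on allele counts for one variant.

    Returns (beta, standard_error, p_value). The two-sided p-value uses a
    normal approximation. Assumes the variant is not monomorphic.
    """
    n = len(phenotype)
    mean_g = sum(allele_counts) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in allele_counts)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(allele_counts, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_g
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(allele_counts, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, p_value


# Simulated toy data (all values hypothetical): 50 variants, 2000 samples.
rng = random.Random(1)
n_samples, n_variants, allele_freq = 2000, 50, 0.3
genotypes = [
    [(rng.random() < allele_freq) + (rng.random() < allele_freq) for _ in range(n_samples)]
    for _ in range(n_variants)
]
# Only variant 0 truly affects the phenotype.
phenotype = [0.5 * g + rng.gauss(0.0, 1.0) for g in genotypes[0]]

# One regression per variant gives one row of summary statistics per variant.
results = [linear_association(g, phenotype) for g in genotypes]
hits = [j for j, (_, _, p) in enumerate(results) if p < GENOME_WIDE_THRESHOLD]
```

The `results` list plays the role of the summary statistics mentioned above: one effect estimate, standard error and p-value per variant.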
Slide25
How to achieve sample sizes of hundreds of thousands needed for genome-wide association analysis with adequate power?
Two commonly used methods: meta-analysis or biobank sample collection
Slide26
Consortia Revolution
Small studies join forces to increase sample size -> Consortium
The first ones were founded ~15 years ago
There is a consortium for almost every major disease
Some of the biggest consortia today:
Global Lipids Genetics Consortium (blood lipids)
Currently 1.2 million samples
CardioGramC4D (coronary artery disease)
Currently over 1 million samples
GIANT (anthropometric traits)
Currently over 1 million samples
All of the above consortia analyzed over 30 million genetic mutations in their latest effort!!
Slide27
Meta-analysis
Combines summary statistics from multiple separate datasets
The analyses must have been performed on the same trait/disease with the same analysis method
Usually done using inverse-variance-weighted fixed-effects meta-analysis
For every variant separately:
$\hat{\beta} = \frac{\sum_i w_i \hat{\beta}_i}{\sum_i w_i}$, $w_i = \frac{1}{\mathrm{SE}_i^2}$,
where $\hat{\beta}_i$ is the effect estimate in dataset $i$ and $\mathrm{SE}_i$ is the standard error of the effect estimate in dataset $i$
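A minimal sketch of the inverse-variance-weighted fixed-effects calculation for one variant follows. The function name `fixed_effects_meta` and the two example studies are hypothetical; the standard error of the pooled estimate, the square root of one over the summed weights, is the standard companion formula and is an addition to what the slide shows.

```python
import math


def fixed_effects_meta(betas, standard_errors):
    """Inverse-variance-weighted fixed-effects meta-analysis for one variant.

    Returns the pooled effect estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se


# Two hypothetical studies reporting the same variant.
pooled_beta, pooled_se = fixed_effects_meta(betas=[0.10, 0.20], standard_errors=[0.05, 0.10])
```

The more precise study (smaller standard error) gets four times the weight of the other, so the pooled estimate sits closer to its value, and the pooled standard error is smaller than either input.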
Slide28
Example: Blood lipids (8,800 samples, 18 loci)
Global Lipids Genetics Consortium
Slide29
Example: Blood lipids (100k samples, 95 loci)
Global Lipids Genetics Consortium
Slide30
Example: Blood lipids (>1 million samples)
[Figure labels: 124, 79, 131, 58, 127]
Sarah Graham, University of Michigan
Slide31
Biobanks
The new wave in human genetics
Large collections of samples with DNA information, dense phenotyping and electronic health records available
Largest examples:
UK Biobank, 500,000 samples from the United Kingdom
Biobank Japan, 200,000 samples from Japan
Million Veteran Program, 300,000 United States veterans
FinnGen, 500,000 samples from Finland (sample collection ongoing)
Slide32
Current Stage of Complex Disease Genetics
Thousands of risk-altering genetic variants identified with genome-wide association analysis
Only a small fraction have a known biological effect
Most of this information is not used in clinical practice
How do we translate complex disease genetics into a clinically relevant form?
https://www.ebi.ac.uk/gwas/diagram
Slide33
Outline
Background for complex diseases
Genome-wide association analyses
Coronary heart disease prediction
Challenges in genetic prediction
Where to go from here?
Slide34
Coronary Heart Disease Prediction
Clinical practice uses the Framingham risk score
Sex-specific calculator for 10-year coronary heart disease risk
Takes into account
Age
Total cholesterol
Smoking
HDL cholesterol
Systolic blood pressure
Peter W. F. Wilson et al. (1998): Prediction of Coronary Heart Disease Using Risk Factor Categories, Circulation
https://www.ahajournals.org/doi/full/10.1161/01.cir.97.18.1837
Slide35
Coronary Heart Disease Genetics
Contribution of genetics estimated to be 50-60%
Over 160 genetic variants identified by the end of 2018
Largest published study had 123k CHD cases and 425k controls
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5805277
Most of the identified genetic variants are also associated with cholesterol levels or other relevant risk factors
Genetics is currently not used in clinical practice
How could genetic information be included in the prediction models?
Slide36
Coronary Heart Disease Prediction with Genetics
The first publications combining both environmental and genetic factors were published in the early 2010s
Slide37
Prediction Model
Cox proportional hazards model:
$h(t \mid x_i) = h_0(t) \exp(\beta_1 x_{i1} + \dots + \beta_p x_{ip})$
where $x_{i1}, \dots, x_{ip}$ are the values of the predictors for individual $i$
Notice the dependence on time, $t$, through the baseline hazard $h_0(t)$
The studies on the previous slide used environmental factors as predictors and compared how much the model improves when genetics is added to the model
How do we quantify the genetic burden?
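One consequence of the Cox model is that the time-dependent baseline hazard cancels when two individuals are compared, leaving a hazard ratio that depends only on the predictors and their coefficients. The sketch below illustrates this; the function names, the two predictors (smoking status and a genetic score in standard-deviation units) and the coefficient values are all made up for the example and are not estimates from any study.

```python
import math


def relative_hazard(coefficients, predictors):
    """exp(beta . x): the hazard of an individual relative to the baseline hazard."""
    return math.exp(sum(b * x for b, x in zip(coefficients, predictors)))


def hazard_ratio(coefficients, predictors_a, predictors_b):
    """Hazard ratio of individual A versus individual B; the baseline hazard cancels."""
    return relative_hazard(coefficients, predictors_a) / relative_hazard(coefficients, predictors_b)


# Hypothetical coefficients: smoking doubles the hazard, and each standard
# deviation of a genetic score multiplies it by 1.5.
coefficients = [math.log(2.0), math.log(1.5)]
smoker_high_score = [1, 1.0]
nonsmoker_average_score = [0, 0.0]

ratio = hazard_ratio(coefficients, smoker_high_score, nonsmoker_average_score)
```

Comparing the fit of such a model with and without the genetic predictor is the kind of comparison the slide describes.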
Slide38
Polygenic Risk Score
A polygenic risk score is a way to summarize the genetic burden of an individual
For the calculation we need
Summary statistics for the trait of interest (e.g. consortium analysis results)
A dataset with genetic data available
The polygenic risk score (PRS) for individual $i$ is
$\mathrm{PRS}_i = \sum_j \hat{\beta}_j g_{ij}$
where $\hat{\beta}_j$ is the effect estimate for variant $j$ from the summary statistics and $g_{ij}$ is the number of risk alleles for individual $i$ at genetic variant $j$
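The score is a weighted sum of risk allele counts, with the summary-statistic effect estimates as weights. A minimal sketch follows; the function name `polygenic_risk_score`, the three effect sizes and the two individuals are hypothetical, and real scores involve many more variants plus steps such as variant selection and matching of risk alleles that are not shown.

```python
def polygenic_risk_score(effect_sizes, risk_allele_counts):
    """Weighted sum of risk allele counts (0, 1 or 2 per variant) for one individual."""
    if len(effect_sizes) != len(risk_allele_counts):
        raise ValueError("need exactly one allele count per variant")
    return sum(beta * count for beta, count in zip(effect_sizes, risk_allele_counts))


# Hypothetical effect estimates for three variants, as if read from summary statistics.
effect_sizes = [0.12, 0.05, 0.30]

# Hypothetical risk allele counts for two individuals.
genotypes = {
    "person_1": [0, 1, 2],
    "person_2": [2, 2, 0],
}

scores = {name: polygenic_risk_score(effect_sizes, counts) for name, counts in genotypes.items()}
```

Here person_1 scores higher than person_2 despite carrying fewer risk alleles in total, because two of person_1's alleles sit at the variant with the largest effect.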
Slide39
Polygenic Risk Score
The first polygenic risk scores used only genome-wide significant SNPs in the prediction
Current state-of-the-art PRSs include information from the whole genome
Whole-genome information is obtained by selecting all independent SNPs with an effect on the trait/disease of interest
The largest scores use information on millions of SNPs at the same time -> very computationally heavy to calculate!!
The PRS approaches a normal distribution as the number of SNPs grows
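The approach to a normal distribution can be checked with a small simulation. Everything below is simulated under assumptions chosen for the example (200 independent variants, random effect sizes and allele frequencies, 1000 individuals); it is not based on real data. For a normal distribution, about 68% of individuals would fall within one standard deviation of the mean score.

```python
import random
import statistics

rng = random.Random(7)
n_variants, n_individuals = 200, 1000

# Hypothetical per-variant effect sizes and risk allele frequencies.
effects = [rng.gauss(0.0, 0.1) for _ in range(n_variants)]
frequencies = [rng.uniform(0.05, 0.5) for _ in range(n_variants)]


def simulate_score():
    """Draw two alleles per variant and return the weighted sum of risk allele counts."""
    return sum(
        beta * ((rng.random() < freq) + (rng.random() < freq))
        for beta, freq in zip(effects, frequencies)
    )


scores = [simulate_score() for _ in range(n_individuals)]
mean_score = statistics.mean(scores)
sd_score = statistics.pstdev(scores)

# Share of individuals whose score lies within one standard deviation of the mean.
within_one_sd = sum(abs(s - mean_score) <= sd_score for s in scores) / n_individuals
```

Because each score is a sum of many small independent contributions, the central limit theorem explains the bell shape; real variants are correlated, which is one reason independent SNPs are selected for the score.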
Slide41
Polygenic Risk Score for CHD
[Figure: prevalence of CHD by PRS percentile]
3-fold difference!
Number of samples ~70,000
Number of SNPs in the score 6.6M
Slide43
Polygenic Risk Score for CHD
[Figure: genetic risk score for CAD in CHD cases and controls. Dataset: HUNT]
Notice the difference between young and old?
Slide45
Polygenic Risk Score for CHD
Khera AV. et al. (2018): Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics 50: 1219-1224
Slide46
Polygenic Risk Scores for Risk Factors
Sinnott-Armstrong N. et al. (2019): Genetics of 38 blood and urine biomarkers in the UK Biobank. bioRxiv. doi: https://doi.org/10.1101/660506
Slide48
Outline
Background for complex diseases
Genome-wide association analyses
Coronary heart disease prediction
Challenges in genetic prediction
Where to go from here?
Slide49
Translating Polygenic Risk into Clinics Requires …
More accurate prediction models
Good population reference datasets, e.g. the FinnGen, UK Biobank and MVP biobanks
Easy user interface for clinicians and patients
Training for clinicians in understanding genetic risk
Education of future MDs in collaboration with the medical faculty
Identifying the preventative actions that can be used to counter the risk
Automated computation for the risk prediction
Slide50
Challenges
Population differences:
Association summary statistics are currently available mainly for European-ancestry populations
Applying SNP weights from a different population causes bias
Applying risk factor weights from a different population also causes bias
Monogenic vs polygenic
Both genetic modes should be included in the prediction
Prediction accuracy is still very low
Because complex diseases are complex, there are still many components mediating the risk that we don’t know or don’t understand
Slide51
Outline
Background for complex diseases
Genome-wide association analyses
Coronary heart disease prediction
Challenges in genetic prediction
Where to go from here?
Slide52
My Long Term Dream
Health checkup & blood test
Automated computing
Multi-ethnic reference database
Risk report interface
Personalized risks
Personalized prevention strategies
Medical counseling
Personalized prevention
Genotyping, omics, biomarkers, questionnaire, EHR-data
Slide53
Personalized Prevention
If we take genetic prediction into clinical practice, we need to understand how to prevent disease
We cannot change the genetic risk, only the lifestyle or medication of the patient
Could biomarker risk scores be the answer to finding the most suitable preventative action for an individual?
At what age should we screen for high-risk individuals to have the highest impact on population health?
Ethical questions?
Slide54
Acknowledgements
Cristen Willer
Sheunggeun (Shawn) Lee