Case-Control studies Bryce Seifert, Ph.D.

Uploaded On 2023-08-25


Presentation Transcript

1. Case-Control studies
Bryce Seifert, Ph.D. and Jen McGlaughon, Ph.D.
Biocurator Working Group
10/26/17

2.

3. Case-Control Studies
Case cohort
  Individuals ascertained from an institute, academic medical center, etc., or from multiple centers, based on a given phenotype or disease.
  Could also be part of a defined population at risk (e.g. lifetime- or relative-risk studies).
  Usually based on clinical indicators of hereditary predisposition (e.g. family history, young age of onset).
  Unselected (i.e. only ascertained based on phenotype and/or disease).
Control cohort
  Individuals that are typically unaffected and serve as a comparison group (noncases).
  Some studies ascertain controls that lack the phenotype or disease of interest but may be affected by other phenotypes (e.g. cancer-free controls that have cardiac phenotypes).
  Relative risk analyses may include relatives of probands.
  May or may not be matched to the case cohort.
Meta-analyses: case and control cohorts are typically drawn from multiple studies and published as one large set.
Fletcher et al., 1996

4. Quality Criteria
Study type: Single Variant Analysis or Aggregate Analysis
Variant Detection Methodology
  Were both cases and controls sequenced for the entire gene under analysis, or genotyped for the variant under analysis?
  Pay attention to the Methods section (sequencing, genotyping subsection). Example: cases are sequenced for all exons in PALB2 and a nonsense variant is identified. The specific variant is subsequently genotyped in controls without sequencing all PALB2 exons. Cases and controls are then not matched for variant detection methodology.
Power
  Analysis of a sufficient number of cases and controls; this depends on the prevalence of disease, the allele frequency, and the effect size of the gene being analyzed.
Bias/Confounding
  Are controls matched to cases based on ethnicity, location of recruitment, genetic ancestry, age, gender?
  Were cases and controls equivalently evaluated for presence/absence of phenotype and family history?
  Was previous testing conducted on cases?
  Were cases ascertained because they come from families with disease history (ascertainment bias)?
Statistical significance
  Weigh this value carefully, considering the strength of statistical association. If multiple pair-wise comparisons are conducted, check for multiple test correction.
Cohort composition: if cases overlap across studies, be sure not to double-count studies (e.g. for a meta-analysis encompassing 5 earlier studies, do not count both the 5 earlier studies and the meta-analysis, but just one or the other).

5. Measures of association and risk
Odds Ratio (OR): measures the frequency of variants among cases versus controls (a stronger association between variant(s) in a gene and disease results in a higher odds ratio).
  HBOC and Colorectal cancer/Polyposis Gene Disease Curation Teams (GDCT) employ a cut-off of OR = 2.0 and a 95% confidence interval that does not cross below 1.0, with p-value <0.05 or <0.001 depending on the study.
  Disease-specific OR and confidence interval thresholds should be discussed with GDCT experts.
Mantel-Haenszel Odds Ratio: often seen in meta-analyses. Pooled OR across multiple studies (fixed effects models assumed).
Fisher's exact test: measures whether the proportion of cases with variant(s) is significantly different from the proportion of controls with variant(s).
  P-value
  Similar to Chi-Squared test
Fletcher et al., 1996; McDonald, 2014
https://www.statsdirect.com/help/meta_analysis/mh.htm
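
As a rough illustration of the odds ratio and the cut-off described on this slide, the sketch below computes an OR and a 95% confidence interval from a 2x2 table. The Woolf (log-scale) interval is one common choice and is an assumption here, not necessarily what any cited study used; the function names and the counts are hypothetical.

```python
import math


def odds_ratio_with_ci(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval.
    """
    a = case_carriers                      # cases with variant(s)
    b = case_total - case_carriers         # cases without variant(s)
    c = control_carriers                   # controls with variant(s)
    d = control_total - control_carriers   # controls without variant(s)
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the Woolf interval is undefined without a correction")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def meets_cutoff(odds_ratio, ci_lower, min_or=2.0):
    """Assumed reading of the slide's cut-off: OR at least 2.0 and CI lower bound above 1.0."""
    return odds_ratio >= min_or and ci_lower > 1.0


# Hypothetical counts: 30 of 1,000 cases and 10 of 1,000 controls carry a variant.
or_value, ci_low, ci_high = odds_ratio_with_ci(30, 1000, 10, 1000)
passes = meets_cutoff(or_value, ci_low)
```

The p-value requirement on the slide is a separate check and is not covered by this sketch.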

6. Measures of association and risk
Relative risk (RR): how likely individuals with variant(s) are to become affected, relative to individuals that lack the variant(s). E.g. RR = 2: heterozygous variant carriers are 2 times more likely to develop breast cancer than individuals that lack the variant(s).
Standardized risk ratio (SRR): ratio of the weighted average of category-specific risks (e.g. ratio of the observed frequency of pathogenic variants in cancer cases to the summed frequency of all pathogenic variants in a control population).
Standardized incidence ratio (SIR): ratio of the observed number of disease cases to the expected number of disease cases (note: the expected number is what might be expected if the population had the same experience as some larger comparison population).
Hazard ratio (HR): how often disease happens in one group (e.g. individuals with pathogenic variants) versus another group (e.g. individuals without variants), over time.
Fletcher et al., 1996; Flanders and Rhodes, 1987; Ressing et al., 2010
https://www.cancer.gov/publications/dictionaries/cancer-terms?cdrid=618612
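
The RR and SIR definitions above can be sketched as simple ratios. This is a minimal illustration with hypothetical function names and made-up numbers; the stratified person-years approach to the expected count is one common way to compute an SIR and is an assumption here.

```python
def relative_risk(carriers_affected, carriers_total, noncarriers_affected, noncarriers_total):
    """Risk of disease in carriers divided by risk of disease in non-carriers."""
    risk_carriers = carriers_affected / carriers_total
    risk_noncarriers = noncarriers_affected / noncarriers_total
    return risk_carriers / risk_noncarriers


def standardized_incidence_ratio(observed_cases, person_years, reference_rates):
    """Observed cases divided by the cases expected at the reference population's rates.

    person_years and reference_rates are parallel lists, one entry per stratum
    (e.g. per age band).
    """
    expected_cases = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed_cases / expected_cases


# Hypothetical cohort: 20 of 100 carriers and 100 of 1,000 non-carriers become affected.
rr = relative_risk(20, 100, 100, 1000)

# Hypothetical SIR: 12 observed cases; two strata with 1,000 and 2,000 person-years
# and reference rates of 0.002 and 0.003 cases per person-year (8 expected cases).
sir = standardized_incidence_ratio(12, [1000, 2000], [0.002, 0.003])
```

With these made-up inputs, carriers have twice the risk of non-carriers, matching the RR = 2 reading given on the slide.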

7. Example: Breast cancer

8. PMID: 28418444; Couch et al. 2017, ATM: Breast cancer
Aggregate Variant Analysis of 29,229 Caucasian breast cancer cases referred for panel testing to Ambry.
Cases were sequenced by NGS for 16 susceptibility genes and compared to 26,644 ExAC Non-Finnish European (NFE) controls (TCGA excluded).
Case variants were those classified Pathogenic (P) or Likely Pathogenic (LP) by Ambry's classification system; control variants were those in ExAC individuals deemed P/LP by the same system.
Control P/LP variants also included nonsense, frameshift, and consensus splice-site variants.
Additionally, controls included individuals with missense variants deemed P/LP in ClinVar by clinical laboratories.

9. Factors to consider
Variant detection methodology: ATM sequenced in cases and controls (matched).
Power: total number of cases and controls sufficient for breast cancer.
Statistical significance: OR = 2.78, 95% CI: 2.22-3.62, P-value: 2.42 × 10^-19
Bias/Confounding: controls matched to cases based on ethnicity, but not on genetic ancestry, location of recruitment, or age.
4 points due to power of the study and use of ethnicity-matched controls

10. Example: Colorectal cancer

11. Case-control study: colorectal cancer
PMID: 27329137; Chubb, 2016
Cases: 1,028 unrelated patients with CRC, ≤55 years at Dx, who had at least 1 first-degree relative with CRC (1,006 after QC).
  UK residents with self-reported European ancestry.
Controls: 1,644 healthy individuals from the UK (1,609 after QC).
Using HapMap data, they excluded individuals who had non-northern European ancestry.

12. Methods
PMID: 27329137; Chubb, 2016
WES was performed on cases and controls.
First examined individual, moderately low-frequency coding variants (MAF 1-5%) for an association with CRC risk.
  To maximize statistical significance, performed a meta-analysis on an additional 5,552 cases and 6,792 controls that were genotyped.
Next examined the impact of rare alleles (MAF <1%) collectively within a gene on CRC risk by aggregating SNVs and indels in each gene and comparing the counts between cases and controls.
  To search for novel CRC susceptibility genes, confined the analysis to 863 cases without a probable disease-causing mutation in an established CRC gene.
  Used WGS on an additional 188 cases and 3,582 UK controls.

13. Gene-centric analysis
PMID: 27329137; Chubb, 2016
Performed a collapsing burden test imposing a max MAF threshold of 1% (T1 test), then applied the T1 test to 3 different types of variants: (1) Disruptive, (2) Predicted damaging, (3) Non-synonymous (all coding non-synonymous variants).
Exome-wide significance was considered to be P = 8.0 × 10^-7, corresponding to a Bonferroni correction for the testing of ~20,000 genes and three variant sets.
MLH1 P = 6.67 × 10^-7
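
The Bonferroni threshold on this slide can be checked with a short calculation. The sketch assumes a family-wise alpha of 0.05, which the slide does not state, and takes "~20,000 genes" as exactly 20,000; the helper name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


N_GENES = 20_000      # "~20,000 genes" on the slide, taken as exact here
N_VARIANT_SETS = 3    # disruptive, predicted damaging, non-synonymous
ALPHA = 0.05          # assumed family-wise error rate (not stated on the slide)

threshold = bonferroni_threshold(ALPHA, N_GENES * N_VARIANT_SETS)

mlh1_p = 6.67e-7                       # MLH1 P-value from the slide
mlh1_significant = mlh1_p < threshold
```

Under these assumptions the threshold comes out slightly above the 8.0 × 10^-7 quoted on the slide, which is consistent with the gene count being approximate; the MLH1 P-value falls below either figure.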

14. Factors to consider
PMID: 27329137; Chubb, 2016
Variant detection methodology: WES was performed for cases and controls.
Power: CRC is common, so it may have been helpful to have more cases, but experts agreed this was sufficiently powered.
Bias and confounding factors: Cases and controls were matched by location and ancestry. Controls that developed cancer during the study were excluded.
  Calculated a similarity metric to assess identity-by-state; no related individuals were identified.
  Controls would have been ~58 years old at the time the paper was submitted (2016 - 1958 = 58 years); the mean age of the cases was 47 ± 6 years.
Statistical significance: the P-value for MLH1 fell below the threshold (P = 8 × 10^-7) for multiple comparison testing.
5 points assigned

15. Example: Serrated Polyposis

16. PMID: 24512911; Gala et al. 2014, RNF43: Polyposis
20 unrelated cases from Massachusetts General Hospital with sessile serrated adenomas that fulfilled modified WHO diagnostic criteria for serrated polyposis.
4,300 European Americans from NHLBI ESP as controls.
2/20 cases with RNF43 truncating variants, 1/4,300 controls with RNF43 truncating variants.
"The cohort allelic sums test was employed to assess enrichment of strong LoF mutations of the gene set in cases compared to controls".

17. Factors to consider
Variant detection methodology: RNF43 sequenced in cases and controls (matched).
Power: Case cohort very small, even though serrated polyposis was the phenotype of interest. 555 gene comparisons were made; multiple test correction resulted in P-value <9 × 10^-5 as significant.
Statistical significance: OR, 460; 95% CI, 23.1-16,384; P = 6.8 × 10^-5.
Bias/Confounding: Large control set may have inflated the Odds Ratio; controls somewhat ethnicity-matched.
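
To show how sensitive such a small case cohort is, the sketch below runs a plain one-sided Fisher's exact test and a crude odds ratio on the carrier counts given for this study (2/20 cases, 1/4,300 controls). This is not the cohort allelic sums test used in the paper, and it counts individuals rather than alleles (an assumption), so it is not expected to reproduce the reported OR of 460 or P = 6.8 × 10^-5. The function name is hypothetical.

```python
from math import comb


def fisher_exact_one_sided(case_carriers, case_total, control_carriers, control_total):
    """P(at least `case_carriers` carriers among cases) under the hypergeometric null.

    The null treats the carriers as randomly distributed across the pooled
    case and control individuals.
    """
    n_total = case_total + control_total
    carriers_total = case_carriers + control_carriers
    denominator = comb(n_total, case_total)
    p_value = 0.0
    for x in range(case_carriers, min(carriers_total, case_total) + 1):
        p_value += comb(carriers_total, x) * comb(n_total - carriers_total, case_total - x) / denominator
    return p_value


# Counts from the slide: 2 of 20 cases and 1 of 4,300 controls carry a truncating variant.
p_enrichment = fisher_exact_one_sided(2, 20, 1, 4300)

# Crude odds ratio from the same 2x2 table (carriers vs non-carriers).
crude_or = (2 * (4300 - 1)) / ((20 - 2) * 1)
```

The crude values land in the same range as the reported ones (an OR in the hundreds, a P-value on the order of 10^-5), and moving a single carrier between cells changes both substantially, which is the point of the slide's caution about power.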

18. Example: Hypertrophic cardiomyopathy

19. Case-control study: hypertrophic cardiomyopathy
PMID: 27532257; Walsh, 2015
Cases: Data from 3,267 individuals diagnosed with HCM, 559 with DCM, and 361 with ARVC obtained from Oxford Medical Genetics Laboratory.
  Data from LMM were downloaded from previous publications (632-2,912 HCM patients, 121-756 DCM patients).
Controls: Data were downloaded from the ExAC database.
Applied a MAF threshold of 1 × 10^-4 as a conservative upper bound because variants more frequent than this in the general population would likely not be highly penetrant pathogenic mutations.
  The most common pathogenic variant in both clinical cohorts was MYBPC3 c.1504C>T (p.Arg502Trp), which was found in 1.7% of HCM cases and at a frequency of 2.5 × 10^-5 in ExAC.

20. Methods
PMID: 27532257; Walsh, 2015
For each gene, the frequency of rare variants (MAF <1 × 10^-4) in ExAC was calculated by dividing the sum of the adjusted allele count by the mean of the total adjusted alleles.
  In cardiomyopathy cohorts, it was calculated by dividing by the total number of patients analyzed for each gene.
For each gene, the frequency of rare variation in the clinical cohort was compared with that in ExAC.
  Case excess was defined by subtracting the proportion of individuals in ExAC with a filtered variant from the proportion in the clinical cohorts.
For each gene and variant class, calculated:
  Odds ratio (OR): ratio of odds of cardiomyopathy comparing rare variant carriers with noncarriers.
  Etiological fraction (EF): estimates the proportion of cases in which the exposure (a rare variant in a gene) was causal.
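
The per-gene quantities described above can be sketched as follows. The slide does not give a formula for the etiological fraction; the sketch assumes the standard attributable-fraction form EF = (OR - 1) / OR, and it treats the ExAC denominator exactly as worded on the slide. Function and parameter names are hypothetical and the input numbers are made up.

```python
def gene_burden_summary(case_carriers, cases_analyzed, exac_allele_count_sum, exac_denominator):
    """Case excess, odds ratio and etiological fraction for one gene and variant class.

    exac_denominator is the slide's "mean of the total adjusted alleles";
    EF = (OR - 1) / OR is an assumed formula, not taken from the slide.
    """
    case_freq = case_carriers / cases_analyzed
    exac_freq = exac_allele_count_sum / exac_denominator
    case_excess = case_freq - exac_freq
    odds_ratio = (case_freq / (1 - case_freq)) / (exac_freq / (1 - exac_freq))
    etiological_fraction = (odds_ratio - 1) / odds_ratio
    return {
        "case_freq": case_freq,
        "exac_freq": exac_freq,
        "case_excess": case_excess,
        "odds_ratio": odds_ratio,
        "etiological_fraction": etiological_fraction,
    }


# Hypothetical gene: rare variants in 100 of 1,000 cases versus 500 of 50,000 in the reference set.
summary = gene_burden_summary(100, 1000, 500, 50_000)
```

With these made-up inputs, the EF indicates that roughly nine in ten rare variants found in cases would be expected to be causal, which is the sense in which EF is read alongside the OR.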

21. Results
PMID: 27532257; Walsh, 2015
Figure 2. Odds ratios (OR) with 95% confidence interval (CI) for each gene tested in the hypertrophic cardiomyopathy (HCM) (n = 632 to 6,179), dilated cardiomyopathy (DCM) (n = 121 to 1,315), and arrhythmogenic right ventricular cardiomyopathy (ARVC) (n = 93 to 361) clinical cohorts compared with Exome Aggregation Consortium (ExAC) reference samples (n = mean of total adjusted allele count for rare variant carriers; for HCM genes, n = 47,153 to 60,647; for DCM genes, n = 42,697 to 60,647; and for ARVC genes, n = 51,126 to 60,218). See Supplementary Table S5a-c online for data used to generate this plot. Data have been plotted (log10 scale) for all protein-altering variants (black) and truncating variants (blue). For truncating variants, OR with 95% CI have been plotted for genes where a statistically significant difference was observed for this variant type on FET.
*Statistically significant Fisher's exact test (FET) (P = 0.05 with Bonferroni correction; HCM P ≤ 0.0025; DCM P ≤ 0.001; and ARVC P ≤ 0.006).
^Genes analyzed in fewer than 200 cases.
CTF1 and RBM20 were removed from analysis owing to poor coverage in ExAC.

22. Factors to consider
PMID: 27532257; Walsh, 2015
Variant detection methodology: Sequencing data from diagnostic labs for cases, exome sequencing for controls.
Power: HCM is relatively common and is estimated at 1:500 in the general population.
  Applied a maximum allele frequency based on the most common pathogenic mutation.
  Assessed 7,855 cardiomyopathy samples and 60,706 reference samples.
Bias and confounding factors: Controls (ExAC) not closely matched to cases.
  Unsure of the quality of phenotype evaluation of controls.
Statistical significance: Fisher's exact test to determine the significance of the observed excess in cases, OR, and EF.
2 points assigned

23. References
Ressing, M., Blettner, M., Klug, S.J. Data analysis of epidemiological studies: part 11 of a series on evaluation of scientific publications. Dtsch Arztebl Int. 2010 Mar;107(11):187-92. doi: 10.3238/arztebl.2010.0187.
Flanders, W.D., and Rhodes, P.H. Large sample confidence intervals for regression standardized risks, risk ratios, and risk differences. J Chronic Dis. 1987;40(7):697-704. PMID: 3597672
Fletcher, R.H., Fletcher, S.W., and Wagner, E.H. Clinical Epidemiology: The Essentials. 3rd ed., Baltimore: Williams & Wilkins, 1996. Accessed online 26 October 2017 (http://202.74.245.22:8080/xmlui/bitstream/handle/123456789/405/Clinical%20Epidemiology%20by%20Robert%20H.%20Fletcher.pdf?sequence=1).
McDonald, J.H. Handbook of Biological Statistics. 3rd ed., Baltimore: Sparky House Publishing, 2014. 77-85. Accessed online 24 October 2017 (http://www.biostathandbook.com/fishers.html).
https://www.cancer.gov/publications/dictionaries/cancer-terms?cdrid=618612
https://www.statsdirect.com/help/meta_analysis/mh.htm