
Data Standards and Statistical - PowerPoint Presentation

Uploaded On 2022-08-01


Data Standards and Statistical Issues for Immunogenetic Data. Richard M. Single, Associate Professor of Statistics, Department of Mathematics & Statistics, University of Vermont. HLA nomenclature: why it matters for analysis and interpretation. ID: 931910



Presentation Transcript

Slide1

Data Standards and Statistical Issues for Immunogenetic Data

Richard M. Single

Associate Professor of Statistics

Department of Mathematics & Statistics

University of Vermont

Slide2

Outline

- HLA nomenclature: Why it matters for analysis and interpretation
  - Challenges for combining HLA data from different sources
- Data standardization to facilitate meta-analyses and reproducibility
  - Developing a community standard for HLA & KIR data reporting
  - Overview of HLA data curation & ambiguity resolution
  - Example, ImmPort, next steps: GL strings & QR codes
- HLA (chrom 6) and KIR (chrom 19) interactions
  - A brief overview
- HLA and KIR: population-level evidence of co-evolution
  - Population-genetic evidence of co-evolution
  - Randomization tests and genomic controls

Slide3

HLA Nomenclature and why it matters

MHC

Slide4

HLA Nomenclature and why it matters

Challenges for HLA data management and analysis
- The HLA genes are very polymorphic;
- HLA nomenclature is complicated;
- There are multiple ways to generate HLA data;
- All common typing systems generate ambiguous data;
- There are multiple ways to report alleles and ambiguities;
- These issues make meta-analyses of HLA data from different sources very difficult.

Slide5

Klein J. et al New Eng J Med, 2000; 343:702-709

An extremely gene-rich region.

Slide6

Structure of HLA molecules

- HLA molecules are cell-surface proteins that present peptide fragments to T-cells
- They bind specific sets of peptides based on structure

Slide7

HLA-C binding pocket

[Figure: ribbon drawing with numeric labels 7, 73, 77, 80, and 90.]

Ribbon drawing from Hedrick et al. PNAS, 88, 5897-5901

Slide8

HLA classical loci and polymorphism

[Figure: map of the HLA region. Class II loci: DP (B1, A1), DQ (B1, A1), DRB (B1, A). Class I loci: C, A. Distances shown on the map: 50 kb, 850 kb, 100 kb, 1270 kb, 400 kb, 250 kb.]

Protein-level allele numbers shown in the figure: 1612, 2211, 1280, 2, 980, 31, 216, 19, 153

Source: IMGT/HLA Database Release 3.12.0, April 17, 2013

Slide9

HLA Allele Nomenclature

Example: HLA-A*24:02:01:02L

- Locus: HLA-A
- Field 1 (2-digit): serological level (where possible)
- Field 2 (4-digit): peptide level (amino acid difference)
- Field 3 (6-digit): nucleotide level [silent] (synonymous substitutions)
- Field 4 (8-digit): intron level (3' or 5' polymorphism)
- Expression suffix: N = null, L = low, S = soluble

For most analyses, we want to distinguish among unique peptide sequences, i.e., the 2-field ("4-digit") level.

This level of resolution treats alleles with the same peptide sequence for exons 2 & 3 (class I) or exon 2 (class II) as being equivalent ["binning" alleles].
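The field structure above can be illustrated with a short parsing sketch. This is a minimal illustration, not a validated nomenclature parser: the regular expression, the function names `parse_allele` and `to_two_fields`, and the restriction to the N/L/S suffixes listed on the slide are all assumptions made for this example.

```python
import re

# Assumed pattern: optional "HLA-" prefix, locus, 1 to 4 colon-separated
# numeric fields, and an optional expression suffix (only N, L, S as listed
# on the slide; other suffixes are not handled here).
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+(?::\d+){0,3})([NLS]?)$")


def parse_allele(name):
    """Split an allele name into (locus, list of fields, expression suffix)."""
    match = ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised allele name: {name!r}")
    locus, fields, suffix = match.groups()
    return locus, fields.split(":"), suffix


def to_two_fields(name):
    """Truncate an allele name to the 2-field ("4-digit") peptide level.

    The expression suffix is dropped, which is a simplification: a real
    pipeline would need to decide how to treat null alleles.
    """
    locus, fields, _suffix = parse_allele(name)
    return f"{locus}*{':'.join(fields[:2])}"


print(parse_allele("HLA-A*24:02:01:02L"))
print(to_two_fields("HLA-A*24:02:01:02L"))
```

Truncating to two fields is only one part of "binning"; alleles that differ in field 2 but share exon 2 & 3 sequence would need a separate lookup table.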

Slide10

HLA Nomenclature & Polymorphism

HLA alleles are defined by a "patchwork" of sequence-level polymorphisms. Most typing systems do not interrogate the same set of polymorphisms, e.g., DRB1*14:01:01 vs. *14:54 differ only in exon 3.

There is currently no simple way to identify which alleles could (or could not) have been detected by a given typing system.

Slide11

Distinctive Geographical Distribution of subtypes of HLA-DRB1*08

Slide12


Slide13

Data Standardization to facilitate Meta-analyses

Data standardization methods …
- Document the typing method (SSOP, SSP, SBT, …), version, exons interrogated, and the set of detectable alleles
- Perform data validation by checking against IMGT & IPD-KIR allele lists

These steps
- allow re-evaluation of raw data in future contexts
- allow information/results to be combined across datasets more easily

Slide14

Extending STREGA to Immunogenomic Studies

The STrengthening the REporting of Genetic Association studies (STREGA) statement provides community-based data reporting and analysis standards for genomic disease association studies.

The IDAWG (immunogenomics.org) has proposed an extension of STREGA: STrengthening the REporting of Immunogenomic Studies (STREIS)

Slide15

From STREGA to STREIS

Extensions to the STREGA guidelines for immunogenomic data include:
- Describing the system(s) used to store, manage, and validate genotype and allele data
- Documenting all methods applied to resolve ambiguity
- Defining any codes used to represent ambiguities
- Describing any binning or combining of alleles into common categories
- Avoiding the use of subjective terms (e.g. high-resolution typing) that may change over time

Slide16


Slide17

Allele-level Ambiguity

Group codes ("g"-codes) for alleles identical in exons 2 & 3 for class I, or exon 2 for class II
A*0201/0209/0243N/0266/0275/0283N/0289 = "A020101g"

NMDP ambiguity codes for 4-digit non-null alleles
A*0201/0209 = A*02AF
A*0201/0209/0266 = A*02AJEY
A*0201/0209/0266/0275/0289 = A*02BSFJ

Ambiguous allele sets
A*0201/0209/0243N/0266/0275/0283N/0289

Ambiguous alleles result from polymorphisms outside of assessed regions: outside of exons 2 & 3, or in sections of those exons that were not interrogated.
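As a rough illustration of how such ambiguity codes are expanded back into allele lists, the sketch below uses a lookup table containing only the three examples from this slide. The table `EXAMPLE_CODES` and the function `expand_code` are hypothetical names for this example; real NMDP code tables are much larger and are not reproduced here.

```python
# Lookup table built only from the three examples on the slide
# (code -> last two digits of each 4-digit allele in the set).
EXAMPLE_CODES = {
    "AF": ["01", "09"],
    "AJEY": ["01", "09", "66"],
    "BSFJ": ["01", "09", "66", "75", "89"],
}


def expand_code(typing, codes=EXAMPLE_CODES):
    """Expand e.g. "A*02AF" into ["A*0201", "A*0209"] (old 4-digit style)."""
    locus, rest = typing.split("*", 1)
    family, code = rest[:2], rest[2:]
    if code not in codes:
        raise KeyError(f"unknown ambiguity code: {code!r}")
    return [f"{locus}*{family}{suffix}" for suffix in codes[code]]


print(expand_code("A*02AF"))
print("/".join(expand_code("A*02BSFJ")))
```

Because the meaning of a code lives in an external table, a dataset that reports only the code cannot be interpreted without knowing which version of the table was in use.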

Slide18

Genotype-level Ambiguity

Ambiguous genotypes result from an inability to establish the phase of individual polymorphisms or entire exons.

Different combinations of alleles can lead to the same typing result.

Example: A typing result for one individual that could be explained by any of four different possible genotype sets at HLA-B.

Genotype 1: 2705, 4402
Genotype 2: 2705, 4411
Genotype 3: 2709, 4402
Genotype 4: 2709, 4411

B*2705 + B*4402 or
B*2705 + B*4411 or
B*2709 + B*4402 or
B*2709 + B*4411

Most analytical methods require a single genotype call for each individual sample.
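The four candidate genotypes in this example are the Cartesian product of the two possible alleles at each gene copy. A minimal sketch, with `possible_genotypes` as a hypothetical helper name:

```python
from itertools import product


def possible_genotypes(first_copy_options, second_copy_options):
    """List every genotype consistent with the options for each gene copy."""
    return list(product(first_copy_options, second_copy_options))


genotypes = possible_genotypes(["B*2705", "B*2709"], ["B*4402", "B*4411"])
for number, (allele_1, allele_2) in enumerate(genotypes, start=1):
    print(f"Genotype {number}: {allele_1} + {allele_2}")
```

The number of candidates grows multiplicatively with the number of options per copy, which is one reason a reduction step is needed before methods that expect a single genotype call can be applied.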

Slide19

Standardized Ambiguity Reduction

Sample #001: possible genotypes at HLA-B

Genotype 1
- HLA-B allele 1: 2703, 270502, 270503, 270504, 270505, 270506, 270508, 2710, 2713, 2717
- HLA-B allele 2: 44020101, 44020102S, 440203, 4419N, 4423N, 4424, 4427, 4433

Genotype 2
- HLA-B allele 1: 2703, 270502, 270503, 270504, 270505, 270506, 270508, 2710, 2713, 2717
- HLA-B allele 2: 440202, 4411

Genotype 3
- HLA-B allele 1: 2709
- HLA-B allele 2: 44020101, 44020102S, 440203, 4419N, 4423N, 4424, 4427, 4433

Genotype 4
- HLA-B allele 1: 2709
- HLA-B allele 2: 440202, 4411

Reduction steps: peptide-level filtering, removing non-CWD alleles, binning alleles identical over exons 2 & 3

Result after these steps
- HLA-B allele 1: 2703, 2705
- HLA-B allele 2: 4402

Regional population-level frequency data are then used to obtain unambiguous data.
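The first reduction steps can be sketched as follows. This is only an illustration under strong assumptions: the set `HYPOTHETICAL_CWD` is invented so that the output matches the slide, the truncation rule simply keeps the first four digits of the old-style names, and binning over exons 2 & 3 is not implemented because it needs sequence information.

```python
# Options for each gene copy, pooled over the four candidate genotypes.
ALLELE_1 = ["2703", "270502", "270503", "270504", "270505", "270506",
            "270508", "2710", "2713", "2717", "2709"]
ALLELE_2 = ["44020101", "44020102S", "440203", "4419N", "4423N", "4424",
            "4427", "4433", "440202", "4411"]

# Hypothetical "common and well-documented" set, chosen for this example only.
HYPOTHETICAL_CWD = {"2703", "2705", "4402"}


def reduce_options(options, cwd):
    """Truncate to 4 digits, drop alleles not in cwd, and remove repeats."""
    reduced = []
    for allele in options:
        short = allele[:4]  # peptide level; expression suffix is dropped
        if short in cwd and short not in reduced:
            reduced.append(short)
    return reduced


print(reduce_options(ALLELE_1, HYPOTHETICAL_CWD))
print(reduce_options(ALLELE_2, HYPOTHETICAL_CWD))
```

Any remaining ambiguity (here, 2703 vs. 2705) is what the regional frequency data are meant to resolve.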

Slide20

[Figure: the reduced typing (HLA-B allele 1: 2703, 2705; allele 2: 4402) resolved to 2705 and 4402. Source: immunogenomics.org]

Slide21

Slide22

Genotype List (GL) Strings

- Use a hierarchical set of operators to describe the relationships between alleles, lists of possible alleles, phased alleles, genotypes, lists of possible genotypes, and multilocus unphased genotypes, without losing typing information or increasing typing ambiguity.
- Are proposed to replace NMDP codes

Milius et al. (2013) Tissue Antigens

Slide23

Genotype List (GL) Strings

Example GL string for the genotype:

A*02:69 + A*23:30 or A*02:302 + A*23:26 or A*02:302 + A*23:39

and

B*44:02 + B*49:08
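A sketch of how this genotype could be written and read back as a GL string, using three of the format's delimiters: `+` between the two gene copies of a genotype, `|` between alternative genotypes, and `^` between loci. The function names are hypothetical, and the `~` (phased) and `/` (allele ambiguity) operators are deliberately left out of this example.

```python
def build_gl_string(loci):
    """loci: list of loci, each a list of possible genotypes (lists of alleles)."""
    return "^".join(
        "|".join("+".join(genotype) for genotype in locus) for locus in loci
    )


def parse_gl_string(gl_string):
    """Inverse of build_gl_string for strings using only ^, | and +."""
    return [
        [genotype.split("+") for genotype in locus.split("|")]
        for locus in gl_string.split("^")
    ]


example = [
    [["HLA-A*02:69", "HLA-A*23:30"],
     ["HLA-A*02:302", "HLA-A*23:26"],
     ["HLA-A*02:302", "HLA-A*23:39"]],
    [["HLA-B*44:02", "HLA-B*49:08"]],
]
gl = build_gl_string(example)
print(gl)
```

Splitting from the loosest operator (`^`) to the tightest (`+`) mirrors the hierarchy described on the previous slide, so no typing information is lost in the round trip.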

Slide24

Resources for HLA Data Validation & Analysis

Immunology Database and Analysis Portal (www.ImmPort.org)
Developed under the Bioinformatics Integration Support Contract (BISC) for NIH, NIAID, & DAIT (Division of Allergy, Immunology, and Transplantation)
- Data validation pipeline
- Analysis tools
- Standardized ambiguity reduction tools
- Data from a large number of immunogenomic studies

ImmunoGenomics Data Analysis Working Group (www.immunogenomics.org) (www.IgDAWG.org)
An international collaborative group working to …
- facilitate the sharing of immunogenomic data (HLA, KIR, etc.) and
- foster consistent analysis and interpretation of immunogenomic data

Slide25

Slide26

Slide27


Slide28

Killer cell Immunoglobulin-like Receptor (KIR)

- The KIR gene complex is located on Chromosome 19 (19q13.4)
- KIR are expressed on natural killer (NK) cells and a subset of T cells
- Certain HLA alleles serve as ligands for KIR

KIR Gene   Function     Ligand
2DL1       Inhibitory   HLA-C group2
2DS1       Activating   HLA-C group2
2DL2/3     Inhibitory   HLA-C group1
2DS2       Activating   HLA-C group1
3DL1       Inhibitory   HLA-Bw4
3DS1       Activating   HLA-Bw4

Slide29

KIR regulate NK cell activity

[Figure: two panels. Dominant inhibition: NK cell (iKIR, Act. rec.) and normal cell (HLA, ligand); no lysis, protection. Missing-self recognition: NK cell (iKIR, Act. rec.) and HIV+ targets (ligand); lysis, cytokines.]

Slide30

HLA-C alleles can be divided into two groups based on the amino acid at position 80 (& 77), which determines KIR recognition

HLA-C1
- Ser77, Asn80
- Cw1, Cw3, Cw7, Cw8, Cw12, Cw13, Cw14
- Recognized by KIR2DL3/2DL2 (NK cell inhibition)

HLA-C2
- Asn77, Lys80
- Cw2, Cw4, Cw5, Cw6, Cw15, Cw17
- Recognized by KIR2DL1 (NK cell inhibition)

Slide31

Bifurcation of HLA-B allotypes

HLA-B
- Bw4 (40%): KIR3DL1 ligands; KIR3DS1; subsets 80I and 80T
- Bw6 (60%): not a ligand for KIR

Slide32


Slide33

KIR & HLA in 30 Global Populations

Slide34

Population-level evidence for Co-evolution & Natural Selection for KIR and HLA

- Several studies hypothesized selection for KIR that suit the locale-specific HLA repertoire.
- Disease association studies point to HLA-Bw4 alleles with Isoleucine at position 80 ("Bw4-80I") as the strongest ligand for KIR3DS1.

Slide35

Correlations between frequencies for KIR and HLA Ligands

Inhibitory KIR
- KIR2DL3 vs. HLA-Cgroup1: r = 0.184
- KIR3DL1 vs. HLA-Bw4: r = 0.426
- KIR2DL1 vs. HLA-Cgroup2: r = 0.046

Slide36

Correlations between frequencies for KIR and HLA Ligands

Activating KIR
- KIR3DS1 vs. HLA-Bw4: r = -0.632
- KIR2DS1 vs. HLA-Cgroup2: r = -0.478
- KIR2DS2 vs. HLA-Cgroup1: r = -0.371

Slide37

Correlations between frequencies for KIR and HLA Ligands

Activating KIR3DS1: subsets of Bw4 alleles based on amino acid position 80
- KIR3DS1 vs. HLA-Bw4: r = -0.632
- KIR3DS1 vs. HLA-Bw4-80I: r = -0.657
- KIR3DS1 vs. HLA-Bw4-80T: r = -0.190

Single et al., Nature Genetics

Slide38

Statistical Issues

Challenges for these and other population studies
- Demographic history shapes patterns of variation & can mimic the effects of selection.
- Gene frequencies are not statistically independent among populations, due to shared demographic history.
- Ordinary Pearson correlation p-values assume independence among the observations.
- We constructed a randomization test to account for the demographic histories of the populations and focus on the genetic effect.

Slide39

Hypothesis Test for a Correlation Coefficient

Assessing the significance
- ρ = cor(X, Y)
- Null Hypothesis: H0: ρ = 0
- Statistic: Pearson's correlation coefficient

[Table with columns X and Y; values as listed on the slide: 4.1, 4.9, 8.6, 5.4, 2.3, 4.2, 5.4, 7.4, 9.2, 8.8, 7.7, 6.7, 6.4, 8.8, 4.3, 5.1, 7.6, 9.4, 3.4, 5.3]
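For reference, the ordinary Pearson p-value is usually obtained from a t statistic. The formula below is the standard textbook result, not something stated on the slide, and it assumes independent observations, which is exactly the assumption questioned in this talk.

```latex
t = \frac{r\,\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad t \sim t_{n-2} \ \text{under } H_0:\ \rho = 0
```

As a rough check, assuming n = 30 populations: for r = 0.426 this gives t = 0.426 × √28 / √(1 − 0.426²) ≈ 2.49 on 28 degrees of freedom, in line with the ordinary p-value of 0.019 reported for KIR3DL1 vs. HLA-Bw4 later in the talk.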

Slide40

Randomization Test

Bw4 alleles: 1301, 1302, 1516, 1517, 2702, 2703, 2704, 2705, 3701, 3801, 3802, 4402, 4403, 4404, 4405, ...

Bw6 alleles: 0702, 0705, 0799, 0801, 1401, 1402, 1403, 1501, 1502, 1503, 1504, 1506, 1507, 1508, 1510, ...

- Reassign Bw4/Bw6 status to simulate the null hypothesis
- Compute correlation of frequencies for KIR3DS1 & reassigned HLA
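The two steps above can be sketched in code. Everything numeric here is made-up toy data (five populations, four alleles), and several choices are assumptions rather than the author's procedure: labels are shuffled so that the number of Bw4 alleles stays fixed, the test is two-sided, and the p-value uses the common (k + 1) / (N + 1) convention.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_frequency(allele_freqs, labels, group):
    """Per-population total frequency of the alleles labelled `group`."""
    n_pops = len(next(iter(allele_freqs.values())))
    return [
        sum(f[p] for allele, f in allele_freqs.items() if labels[allele] == group)
        for p in range(n_pops)
    ]


def randomization_test(kir_freq, allele_freqs, labels, group="Bw4",
                       n_perm=2000, seed=0):
    """Shuffle Bw4/Bw6 labels among alleles and recompute the correlation."""
    rng = random.Random(seed)
    observed = pearson(kir_freq, group_frequency(allele_freqs, labels, group))
    alleles = list(labels)
    status = [labels[a] for a in alleles]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(status)
        shuffled = dict(zip(alleles, status))
        r = pearson(kir_freq, group_frequency(allele_freqs, shuffled, group))
        if abs(r) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data, invented for illustration only.
kir3ds1 = [0.10, 0.20, 0.30, 0.40, 0.50]
hla_b = {
    "2705": [0.30, 0.25, 0.20, 0.10, 0.05],
    "4402": [0.20, 0.20, 0.15, 0.15, 0.10],
    "0702": [0.25, 0.30, 0.35, 0.40, 0.45],
    "0801": [0.25, 0.25, 0.30, 0.35, 0.40],
}
status = {"2705": "Bw4", "4402": "Bw4", "0702": "Bw6", "0801": "Bw6"}
r_obs, p_perm = randomization_test(kir3ds1, hla_b, status)
print(r_obs, p_perm)
```

Because the populations and their allele frequencies are left untouched, the shared demographic history stays in the data; only the link between alleles and ligand status is broken. With just four toy alleles there are very few distinct relabelings, so the p-value here is coarse.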

Slide41

Permutation Distribution

KIR3DS1 – HLA-Bw4 correlation

r = -0.632

Permutation p-value = 0.012

Slide42

Genomic Controls

- Empirical comparisons based on genomic data or other methods that incorporate information about the demographic histories of populations (Pritchard and Donnelly, 2001).
- Our study used data from the ALFRED database to assess statistical significance: http://alfred.med.yale.edu
- We selected 538 neutral sites from 202 genes typed in the same individuals

Slide43

Genomic Data

Slide44

Genomic Data for Empirical Tests

- Randomly select two SNP sites from different chromosomes
- Find the frequencies in each population and compute the correlation
- Repeat
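A sketch of this resampling loop follows. The SNP frequencies are randomly generated placeholders rather than ALFRED data, the names `empirical_null` and `empirical_p_value` are hypothetical, and the two-sided comparison is an assumption.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def empirical_null(snps, n_pairs=1000, seed=0):
    """snps: list of (chromosome, per-population frequencies).

    Repeatedly draw two SNPs from different chromosomes and record the
    correlation of their frequencies across populations.
    """
    if len({chrom for chrom, _ in snps}) < 2:
        raise ValueError("need SNPs on at least two chromosomes")
    rng = random.Random(seed)
    null = []
    while len(null) < n_pairs:
        (chrom_a, freq_a), (chrom_b, freq_b) = rng.sample(snps, 2)
        if chrom_a != chrom_b:
            null.append(pearson(freq_a, freq_b))
    return null


def empirical_p_value(observed_r, null):
    """Share of null correlations at least as extreme as the observed one."""
    return sum(1 for r in null if abs(r) >= abs(observed_r)) / len(null)


# Placeholder data: 40 SNPs on 5 chromosomes, 30 populations each.
toy_rng = random.Random(1)
toy_snps = [(1 + i % 5, [toy_rng.random() for _ in range(30)]) for i in range(40)]
null_rs = empirical_null(toy_snps)
print(empirical_p_value(-0.632, null_rs))
```

With real data the unlinked SNP pairs carry the same demographic history as the KIR and HLA loci, so the null distribution is wider than for the independent placeholder values used here.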

Slide45

Genomic Data – Empirical Distribution

KIR3DS1 – HLA-Bw4 correlation

r = -0.632

Empirical p-value = 0.041

Slide46

Significance of Correlations*

* Ordinary Pearson p-values in red overestimate the significance of trends

locus pair        Correlation   p-value (1)     p-value (2)     p-value (3)
                                (correlation)   (permutation)   (empirical)
3DS1 - Bw4        -0.632        0.000           0.012           0.041
3DS1 - Bw4-80I    -0.657        0.000           0.009           0.038
3DS1 - Bw4-80T    -0.190        0.316           0.532           0.534
3DL1 - Bw4         0.426        0.019           0.106           0.218
3DL1 - Bw4-80I     0.416        0.022           0.115           0.191
3DL1 - Bw4-80T     0.171        0.367           0.540           0.758
2DS1 - C2         -0.478        0.008           0.243           0.149
2DL1 - C2          0.046        0.810           0.891           0.924
2DL2 - C1         -0.366        0.047           0.193           0.542
2DL3 - C1          0.184        0.331           0.458           0.328
2DS2 - C1         -0.371        0.044           0.170           0.479

(1) P-correlation is the ordinary Pearson product-moment correlation p-value.
(2) P-permutation is based on the permutation distribution under the null hypothesis.
(3) P-empirical is based on the empirical distribution for unlinked SNPs from ALFRED.

Slide47


Slide48

Acknowledgements

- NCI: Mary Carrington, Pat Martin, Gao Xiaojiang
- USP: Diogo Meyer, Rodrigo dos Santos Francisco
- Yale University: Ken and Judy Kidd
- Children's Hospital Oakland Research Inst.: Steven J. Mack, Jill A. Hollenbach
- Harvard Medical School: Alex Lancaster
- UC San Francisco: Owen Solberg
- Roche Molecular Systems: Henry A. Erlich
- Anthony Nolan Research Inst.: Steven G.E. Marsh
- NCBI/NIH: Mike Feolo
- NGIT: Jeff Wiser, Patrick Dunn, Tom Smith

Slide49

If time allows …

Slide50

Linkage Disequilibrium (LD) Measures

The two most common measures of the strength of LD are:
(1) the normalized measure of the individual LD values, namely $D'_{ij} = D_{ij} / D_{\max}$ (Lewontin 1964); and
(2) the correlation coefficient $r$ for bi-allelic data, which is most often reported as $r^2 = D^2 / (p_{A1}\, p_{A2}\, p_{B1}\, p_{B2})$. $r = 1$ only when the allelic variations at the two loci show 100% correlation.

Their multi-allelic extensions are the overall D' and Wn.
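For the bi-allelic case these measures can be computed directly from the two allele frequencies and one haplotype frequency. A minimal sketch using the usual textbook definition of D_max; the function name and the example numbers are illustrative only.

```python
def ld_measures(p_a1, p_b1, p_a1b1):
    """Return (D, D', r^2) for two bi-allelic loci.

    p_a1, p_b1: frequencies of allele A1 and allele B1
    p_a1b1:     frequency of the A1-B1 haplotype
    Assumes both loci are polymorphic (frequencies strictly between 0 and 1).
    """
    p_a2, p_b2 = 1 - p_a1, 1 - p_b1
    d = p_a1b1 - p_a1 * p_b1
    # D_max depends on the sign of D (Lewontin's normalization).
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)
    d_prime = d / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a1 * p_a2 * p_b1 * p_b2)
    return d, d_prime, r2


# Complete association: A1 always travels with B1.
print(ld_measures(0.5, 0.5, 0.5))
# Independence: haplotype frequency equals the product of allele frequencies.
print(ld_measures(0.5, 0.5, 0.25))
```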

Slide51

Standard LD measures D' and Wn

Standard LD measures (overall D' & Wn) assume/force symmetry, even though with >2 alleles per locus that is not the case.

Data Source: ImmPort Study #SDY26: Identifying polymorphisms associated with risk for the development of myopericarditis following smallpox vaccine

Slide52

Asymmetric Linkage Disequilibrium (ALD)

Interpretation:
- ALD for HLA-DRB1 conditioning on HLA-DQA1: W(DRB1/DQA1) = .58. The overall variation for DRB1 is relatively high given specific DQA1 alleles.
- ALD for HLA-DQA1 conditioning on HLA-DRB1: W(DQA1/DRB1) = .95. The overall variation for DQA1 is relatively low given specific DRB1 alleles.

ALD: row gene conditional on column gene

Thomson and Single, 2014 Genetics
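One way to compute such a conditional measure is sketched below. The formula used, W²(A/B) = (F(A|B) − F(A)) / (1 − F(A)) with F(A) the homozygosity of locus A and F(A|B) its haplotype-specific homozygosity given B, is stated here as an assumption about the definition in Thomson and Single (2014), since the slide itself does not give it. The haplotype frequencies are toy values.

```python
from collections import defaultdict
from math import sqrt


def ald(haplotypes):
    """Return (W_A/B, W_B/A) from {(a_allele, b_allele): frequency}."""
    p_a, p_b = defaultdict(float), defaultdict(float)
    for (a, b), f in haplotypes.items():
        p_a[a] += f
        p_b[b] += f
    f_a = sum(p * p for p in p_a.values())  # homozygosity at locus A
    f_b = sum(p * p for p in p_b.values())  # homozygosity at locus B
    # Conditional homozygosity of one locus given the alleles at the other.
    f_a_given_b = sum(f * f / p_b[b] for (a, b), f in haplotypes.items())
    f_b_given_a = sum(f * f / p_a[a] for (a, b), f in haplotypes.items())
    w_ab = sqrt((f_a_given_b - f_a) / (1 - f_a))
    w_ba = sqrt((f_b_given_a - f_b) / (1 - f_b))
    return w_ab, w_ba


# Toy haplotype frequencies: knowing the B allele fixes the A allele,
# but knowing the A allele does not always fix the B allele.
toy = {("a1", "b1"): 0.25, ("a1", "b2"): 0.25, ("a2", "b3"): 0.5}
w_a_given_b, w_b_given_a = ald(toy)
print(w_a_given_b, w_b_given_a)
```

In the toy data locus A is fully determined by locus B, so the first value is 1, while the second is smaller; this is the kind of asymmetry a single symmetric measure cannot show.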

Slide53

Balancing Selection Operates at Most HLA Loci

Balancing selection can result from:
- Overdominance/Heterozygote advantage
- Frequency-dependent selection
- Selective regimes that change over time/space

For HLA, the common factor in these models is rare allele advantage, which is consistent with a pathogen-directed frequency-dependent selection model.

At the Amino Acid (AA) level we see
- High AA variability at antigen recognition sites (ARS)
- Relatively even AA frequencies at ARS sites
- Higher rates of non-synonymous vs. synonymous changes at ARS

Meyer & Mack, 2008

Slide54

Homozygosity (F) and the Normalized Deviate (Fnd)

$F_{nd} = (F_{OBS} - F_{EQ}) / SD(F_{EQ})$

- Neutrality: $F_{OBS} \approx F_{EQ}$, $F_{nd} \approx 0$
- Directional Selection: $F_{OBS} > F_{EQ}$, $F_{nd} > 0$
- Balancing Selection: $F_{OBS} < F_{EQ}$, $F_{nd} < 0$
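A small numerical sketch of the normalized deviate. It assumes that the observed homozygosity is the sum of squared allele frequencies, and it takes the neutral expectation and its standard deviation as given inputs; the values 0.40 and 0.10 below are placeholders, not results from any dataset.

```python
def observed_homozygosity(allele_counts):
    """Sum of squared allele frequencies (assumed definition of F_OBS)."""
    total = sum(allele_counts)
    return sum((count / total) ** 2 for count in allele_counts)


def normalized_deviate(f_obs, f_eq, sd_f_eq):
    """Fnd = (F_OBS - F_EQ) / SD(F_EQ)."""
    return (f_obs - f_eq) / sd_f_eq


# Four equally frequent alleles: a very even distribution.
f_obs = observed_homozygosity([10, 10, 10, 10])
# Placeholder neutral expectation and standard deviation.
fnd = normalized_deviate(f_obs, f_eq=0.40, sd_f_eq=0.10)
print(f_obs, fnd)
```

An even allele distribution lowers the observed homozygosity, so Fnd comes out negative, the direction associated with balancing selection above.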

Slide55

Fnd for DRB1 AA sites in a EUR population

- Fnd << 0 gives evidence of possible balancing selection.
- Fnd >> 0 gives evidence of possible directional selection.

Slide56

LD for DRB1 AAs

[Figure: two panels. Wn (symmetric) and Asymmetric LD (ALD), with ALD shown as row gene conditional on column gene.]

Slide57

Fnd for DRB1 AA sites (Meta-Analysis)

Fnd for all polymorphic sites in a meta-analysis of 57 populations
- Fnd << 0 gives evidence of possible balancing selection.
- Fnd >> 0 gives evidence of possible directional selection.

Slide58

Asymmetric Linkage Disequilibrium (ALD)

Thomson and Single (2014) Genetics

Slide59

Asymmetric Linkage Disequilibrium (ALD)

Thomson and Single (2014) Genetics