
Expanded marker panels (Part 2) - PowerPoint Presentation

emily
Uploaded On 2022-06-28



Presentation Transcript

Slide1

Expanded marker panels (Part 2)

Daniel Kling & Andreas Tillmar

daniel.kling@rmv.se

andreas.tillmar@rmv.se

Slide2

Agenda 10am-12pm

«Small» (ForenSeq, ICMP panel)

Markers

Typing technology

Statistical concepts of relevance

Statistical evaluation/interpretation

«Large» (Microarray, WGS, other >100K markers)

Markers

Typing technology

Statistical concepts of relevance

Statistical evaluation/interpretation (incl. genetic genealogy)

Discussion

Slide3

Large panels

>10k markers (generally >100k markers)

Markers are mainly SNPs

Covers the whole genome

Markers can be associated to phenotypic traits

We can use imputation to get even more data

We can phase data

Slide4

Technologies

Typing and bioinformatics

Slide5

Microarray (chip)

+ Cheap (probably <$50)

+ Used by 23andMe, Ancestry.com

- «Huge» amount of DNA

Sequencing

- Expensive ($1000-$2000)

- Low coverage

- Bioinformatics!

+ Minute amounts of DNA

+ Base pair information (coverage)

Slide6

Microarrays in brief

Commercial chips (mainly Illumina and Affymetrix)

Up to millions of SNPs

Calling is based on intensities

Typically used for good quality reference samples

Price as low as $30 per sample

Slide7

Microarrays in brief

Microarrays can be used on samples with less than 1 ng/µl of DNA

Slide8

Microarrays in brief

Example output

Homozygote genotype AA

Slide9

Whole genome sequencing in brief

Amplification of the entire genome (fragmented)

Sequence everything!

Sequenced on NextSeq or NovaSeq/HiSeq

Typically 30X (average) coverage

Genotypes are called based on reads

Data for areas of interest (SNPs, STRs etc.) can be extracted

Slide10

Whole genome sequencing in brief

Example output

Chr1:55518316 (rs2483205)

Base   Reads   Avg quality
A      1       0
C      10      36
G      0       -
T      15      36

Genotype is C/T

Slide11

Capture sequencing in brief

Amplification of the entire genome (fragmented)

Capture and sequence areas of interest (SNPs, STRs etc.)

Can be sequenced on MiSeq and/or S5

Coverage depending on size of capture panel

Amplicon or hybridization based

Bioinformatics similar to WGS

Slide12

Probabilistic genotyping

Conditional genotype probabilities

Pr(CC|observations)=0.05

Pr(CT|observations)=0.9

Pr(TT|observations)=0.05

Others < 0.0001

Base   Reads   Avg quality
A      1       0
C      10      36
G      0       -
T      15      36
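
The idea of conditional genotype probabilities can be sketched with a toy model. The code below is a minimal illustration and not the method used by the presenters: it assumes a binomial read model, a fixed per-read error rate (`error`, a made-up value), a flat prior over CC, CT and TT, and it ignores the single low-quality A read. Because of these assumptions the numbers it prints differ from the illustrative probabilities on the slide.

```python
from math import comb

def genotype_posteriors(n_c, n_t, error=0.01):
    """Posterior Pr(genotype | reads) for CC, CT and TT under a flat prior."""
    n = n_c + n_t
    # Probability that a single read shows C, given the true genotype.
    p_read_c = {"CC": 1.0 - error, "CT": 0.5, "TT": error}
    likelihood = {
        g: comb(n, n_c) * p ** n_c * (1.0 - p) ** n_t
        for g, p in p_read_c.items()
    }
    total = sum(likelihood.values())
    return {g: value / total for g, value in likelihood.items()}

# Read counts from the table: 10 reads with C and 15 reads with T.
post = genotype_posteriors(n_c=10, n_t=15)
for genotype, prob in post.items():
    print(f"Pr({genotype}|observations) = {prob:.4g}")
```

With fewer reads or a higher assumed error rate the probability mass spreads over more than one genotype, which is the situation probabilistic genotyping is meant to handle.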

Slide13

Inference of relationships

Methods and approaches

Slide14

Brief genetics

Physical location (bp)

Genetic location (cM)

Chr1: rs12564807, rs3131972, rs12124819, rs7538305

Marker       bp       cM
rs12564807   734462   0.7
rs3131972    752721   0.8
rs12124819   776546   0.85
rs7538305    824398   0.98

Slide15

Brief genetics

Physical location (bp)

Genetic location (cM)

Chr1: rs12564807, rs3131972, rs12124819, rs7538305

Marker       bp       cM
rs12564807   734462   0.7
rs3131972    752721   0.8
rs12124819   776546   0.85
rs7538305    824398   0.98

Markers rs12564807 and rs7538305 are separated by 89 936 base pairs.

The genetic distance is 0.28 cM.
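
The two distances can be computed directly from the marker table. The sketch below uses the bp and cM values listed on the slide; the conversion from cM to a recombination fraction uses Haldane's map function, which is an assumption added here for illustration and is not mentioned in the presentation.

```python
from math import exp

# Marker positions on Chr1 as listed in the table: (bp, cM).
markers = {
    "rs12564807": (734462, 0.70),
    "rs3131972": (752721, 0.80),
    "rs12124819": (776546, 0.85),
    "rs7538305": (824398, 0.98),
}

def distances(marker_1, marker_2):
    """Return (physical distance in bp, genetic distance in cM)."""
    bp_1, cm_1 = markers[marker_1]
    bp_2, cm_2 = markers[marker_2]
    return abs(bp_2 - bp_1), abs(cm_2 - cm_1)

def haldane_recombination_fraction(cm):
    """Haldane's map function (assumes no crossover interference)."""
    morgans = cm / 100.0
    return 0.5 * (1.0 - exp(-2.0 * morgans))

bp_dist, cm_dist = distances("rs12564807", "rs7538305")
theta = haldane_recombination_fraction(cm_dist)
print(f"{bp_dist} bp, {cm_dist:.2f} cM, recombination fraction {theta:.5f}")
```

For such short distances the recombination fraction is close to the cM value divided by 100, so these four markers are tightly linked.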

Slide16

Brief genetics

Physical location (bp)

Genetic location (cM)

Both measures can be used. The cM location informs us of how linked markers are in statistical calculations, whereas the bp location informs us where in the genome markers are located.

Slide17

Identity of alleles

Identical by state (IBS)

Identical by descent (IBD)

[Pedigree figure: individuals S1, M, F and S2, each with genotype A/a at marker M1]

Zero alleles IBD, two IBS

Slide18

You share only about 0.8% of your DNA with your third cousins (3C)

So if we have 100 STR markers, third cousins would share an allele in about 1 of those
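
The 0.8% figure can be reproduced with a short calculation. The sketch below assumes full cousins with no other shared ancestry and uses the standard expectation that n-th cousins share a fraction (1/2)^(2n+1) of their autosomal DNA. It also prints the per-marker probability of sharing one allele IBD, which under the same assumptions is twice that fraction, so the slide's "1 in 100" is best read as an order-of-magnitude rule of thumb.

```python
def expected_shared_fraction(cousin_degree):
    """Expected fraction of autosomal DNA shared by full n-th cousins."""
    return 0.5 ** (2 * cousin_degree + 1)

def prob_one_allele_ibd(cousin_degree):
    """Per-marker probability that full n-th cousins share one allele IBD."""
    return 2.0 * expected_shared_fraction(cousin_degree)

n_markers = 100  # size of the STR panel in the slide's example
for degree in (1, 2, 3):
    shared = expected_shared_fraction(degree)
    k1 = prob_one_allele_ibd(degree)
    print(
        f"{degree}C: shared DNA {shared:.2%}, "
        f"expected markers with an IBD allele out of {n_markers}: {k1 * n_markers:.1f}"
    )
```

Either way the expected number of informative markers is tiny, which is why small STR panels cannot resolve distant relationships.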

Slide19

Relationships

Note: S2-2 has the same number of generations as S1-3

Slide20

Methods to infer relationships

Key points

LR compares two hypotheses

Segment measures length of half-identical stretches

IBS uses shared alleles to infer IBD

Slide21

We are looking for a way to infer the degree of relationship (or the most likely degree) based on observed DNA data (IBS)

Slide22

Likelihood ratio

Compares two hypotheses

Computes a probability

Relies on estimates of allele frequencies

Can detect rare alleles shared through ancestry

Sensitive to linkage and LD

Requires computer-intense statistical models
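
A single-marker likelihood ratio can be sketched as follows. This is a minimal illustration, not the computer-intensive model referred to above: it assumes known allele frequencies, no mutation or genotyping error, and independent markers (so it ignores linkage and LD). Relationships are described by the probabilities `k = (k0, k1, k2)` of sharing 0, 1 or 2 alleles IBD; the function names and the example frequencies are made up for this sketch.

```python
from itertools import product

def joint_probability(g1, g2, freqs, k):
    """Pr(g1, g2 | relationship) for unordered genotypes at one marker."""
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))
    p_ibd = [0.0, 0.0, 0.0]
    for a, b, c, d in product(freqs, repeat=4):
        if tuple(sorted((a, b))) != g1:
            continue
        weight = freqs[a] * freqs[b] * freqs[c] * freqs[d]
        # IBD = 0: person 2 carries two independent alleles (c, d).
        if tuple(sorted((c, d))) == g2:
            p_ibd[0] += weight
        # IBD = 1: person 2 carries the shared allele a plus c (d sums out).
        if tuple(sorted((a, c))) == g2:
            p_ibd[1] += weight
        # IBD = 2: both genotypes are copies of (a, b) (c and d sum out).
        if g1 == g2:
            p_ibd[2] += weight
    return sum(k_i * p_i for k_i, p_i in zip(k, p_ibd))

def likelihood_ratio(genotype_pairs, freqs_per_marker, k_h1, k_h2=(1.0, 0.0, 0.0)):
    """Product of per-marker LRs; H2 defaults to 'unrelated'."""
    lr = 1.0
    for (g1, g2), freqs in zip(genotype_pairs, freqs_per_marker):
        lr *= joint_probability(g1, g2, freqs, k_h1) / joint_probability(g1, g2, freqs, k_h2)
    return lr

# Both individuals are homozygous for a rare allele (frequency 0.1).
freqs = {"A": 0.1, "a": 0.9}
pair = [(("A", "A"), ("A", "A"))]
lr_parent_child = likelihood_ratio(pair, [freqs], k_h1=(0.0, 1.0, 0.0))
lr_full_siblings = likelihood_ratio(pair, [freqs], k_h1=(0.25, 0.5, 0.25))
print(lr_parent_child, lr_full_siblings)
```

The example shows why rare shared alleles are informative: the rarer the allele, the larger the LR in favour of relatedness.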

Slide23

Likelihood ratio

Potential of likelihood ratios

False positives

More markers = higher chance of conclusion and more false positives

Slide24

IBS (KING)

Computes two measures of relatedness

Kinship coefficient = Pr(one allele IBD)/4 + Pr(two alleles IBD)/2

Probability of zero alleles IBD

Measures rely on averaging over a large number of markers

Partly relies on allele frequencies

Generally robust with mixed ancestry

Does not differentiate very well for distant relatives

Can distinguish relationships up to first cousins
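
The kinship coefficient formula above can be evaluated for a few standard relationships. The sketch below only applies the formula to textbook IBD probabilities; it does not reproduce how KING estimates these quantities from genotype data.

```python
def kinship_coefficient(k1, k2):
    """Kinship coefficient = Pr(one allele IBD)/4 + Pr(two alleles IBD)/2."""
    return k1 / 4.0 + k2 / 2.0

# Expected (k0, k1, k2) for some standard relationships.
relationships = {
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "first cousins": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}

for name, (k0, k1, k2) in relationships.items():
    phi = kinship_coefficient(k1, k2)
    print(f"{name}: kinship = {phi:.4f}, Pr(zero alleles IBD) = {k0:.2f}")
```

Parent-child and full siblings have the same kinship coefficient (0.25) but different probabilities of zero alleles IBD, which illustrates why two measures are reported.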

Slide25

Segment

Computes the total length of shared DNA

Requires two different parameters (or more)

Minimum length of segment

Total number of SNPs in a segment

Relies on dense panels (>500 SNPs per 5 cM required)

Completely insensitive to population (?)

Can be extended for more complex variations

Slide26

Total cM is approximately 3490 cM (see parent child)

Why full siblings only 2613 cM?

Important: Numbers are based on IBD calling algorithms of the tests performed.

Slide27

Worked example

A G A T C G

A G A T T G

A G A T T C

A G A T T C

Forensic sample

Putative relative

Segment starts

Segment stops

How long is the segment and how many SNPs?

Slide28

Worked example

A G A T C G

A G A T T G

A G A T T C

A G A T T C

Forensic sample

Putative relative

Segment starts

Segment stops

Close relationships have many long segments

Distant relationships have few and short segments

Slide29

Worked example

Rule of thumb

5-7 cM

500-700 SNPs

Requires dense SNP panels!
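
Finding half-identical stretches can be sketched in a few lines. The code below is a minimal illustration under simplifying assumptions: genotypes are unphased and error free, a segment is any run of SNPs where the two individuals share at least one allele, and the cM positions are invented for the example. The six genotypes are read from the worked example, taking the first two sequence rows as the forensic sample and the last two as the putative relative.

```python
def shares_allele(g1, g2):
    """True if two unordered genotypes have at least one allele in common."""
    return bool(set(g1) & set(g2))

def half_identical_segments(geno_1, geno_2, cm_pos, min_cm=0.0, min_snps=1):
    """Return runs of allele sharing that pass the length and SNP thresholds."""
    runs, start = [], None
    for i, (g1, g2) in enumerate(zip(geno_1, geno_2)):
        if shares_allele(g1, g2):
            if start is None:
                start = i  # segment starts
        elif start is not None:
            runs.append((start, i - 1))  # segment stops before this SNP
            start = None
    if start is not None:
        runs.append((start, len(geno_1) - 1))

    segments = []
    for first, last in runs:
        length = cm_pos[last] - cm_pos[first]
        n_snps = last - first + 1
        if length >= min_cm and n_snps >= min_snps:
            segments.append({"start": first, "stop": last, "snps": n_snps, "cm": length})
    return segments

forensic = ["AA", "GG", "AA", "TT", "CT", "GG"]
relative = ["AA", "GG", "AA", "TT", "TT", "CC"]
cm_pos = [0.0, 1.5, 3.0, 4.5, 6.0, 7.5]  # hypothetical map positions

segments = half_identical_segments(forensic, relative, cm_pos, min_cm=5.0, min_snps=5)
total_cm = sum(s["cm"] for s in segments)
print(segments, total_cm)
```

In practice the thresholds would follow the rule of thumb (5-7 cM and 500-700 SNPs), and the lengths of all accepted segments across the genome are summed to give the total shared cM.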

Slide30

Worked example

We accumulate the segments' lengths

Segment   Chromosome   SNPs   Length
1         1            555    8 cM
2         1            6542   50 cM
...       ...          ...    ...
23        22           1400   20 cM
Total                         430 cM

Slide31

430 cM is the average sharing of the highlighted relationships

But it falls within the range of several others!

Slide32

                 LR                        Segment                IBS (KING)
Single markers   Can be very informative   Uninformative          Can provide information
Many markers     Pruning needed            Essential              Partly essential
Allele freqs     Very useful               Not used (generally)   Used to a certain degree
Linkage/recomb   Can be used!              Essential              Not used
LD               A problem                 Minor issue            Minor issue?
Population       Sensitive                 Insensitive            Partly sensitive
Statistics       Full model                Ad-hoc                 Theory of large number of markers

Slide33

Phasing and imputing

Improve and extend the raw data

Slide34

Phasing

Assigns maternal/paternal chromosomes

Instead of unordered genotypes -> ordered genotypes

Recall half-identical stretches

Improvement with regards to finding ”true” IBD segments

Complex statistical model

Induce ”errors”

Used in simulations

Slide35

Phasing - Example

Unordered genotypes

        Individual 1   Individual 2
SNP 1   A/G            A/A
SNP 2   C/T            C/T
SNP 3   C/G            G/G

Slide36

Phasing - Example

Ordered genotypes

        Individual 1   Individual 2
SNP 1   A|G            A|A
SNP 2   C|T            T|C
SNP 3   C|G            G|G

Slide37

Phasing - Example

Ordered genotypes

        Individual 1   Individual 2
SNP 1   A|G            A|A
SNP 2   C|T            T|C
SNP 3   C|G            G|G

No sharing of haplotypes

Unordered genotypes

        Individual 1   Individual 2
SNP 1   A/G            A/A
SNP 2   C/T            C/T
SNP 3   C/G            G/G

Share an allele for each marker

Slide38

Phasing - Example

Ordered genotypes

        Individual 1   Individual 2
SNP 1   A|G            A|A
SNP 2   C|T            T|C
SNP 3   C|G            G|G

No sharing of haplotypes

Unordered genotypes

        Individual 1   Individual 2
SNP 1   A/G            A/A
SNP 2   C/T            C/T
SNP 3   C/G            G/G

Share an allele for each marker

Error?

Slide39

Phasing - Example

Ordered genotypes

        Individual 1   Individual 2
SNP 1   A|G            A|A
SNP 2   C|T            T|C
SNP 3   G|C            G|G

Sharing of haplotype!

Unordered genotypes

        Individual 1   Individual 2
SNP 1   A/G            A/A
SNP 2   C/T            C/T
SNP 3   C/G            G/G

Share an allele for each marker
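
The difference between the two phasings of Individual 1 can be checked mechanically. The sketch below reads each ordered genotype as "first haplotype|second haplotype" and compares the resulting haplotypes of the two individuals; the helper names are made up for this illustration.

```python
def haplotypes(phased_genotypes):
    """Split ordered genotypes such as 'A|G' into the two haplotype strings."""
    first = "".join(g.split("|")[0] for g in phased_genotypes)
    second = "".join(g.split("|")[1] for g in phased_genotypes)
    return {first, second}

def shared_haplotypes(phased_1, phased_2):
    """Haplotypes carried by both individuals."""
    return haplotypes(phased_1) & haplotypes(phased_2)

individual_2 = ["A|A", "T|C", "G|G"]
individual_1_first_phasing = ["A|G", "C|T", "C|G"]   # SNP 3 phased as C|G
individual_1_second_phasing = ["A|G", "C|T", "G|C"]  # SNP 3 phased as G|C

no_sharing = shared_haplotypes(individual_1_first_phasing, individual_2)
sharing = shared_haplotypes(individual_1_second_phasing, individual_2)
print(no_sharing, sharing)
```

A single switch in phase at SNP 3 decides whether a shared haplotype is detected, which is why phasing errors matter when looking for "true" IBD segments.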

Slide40

Phasing – Worked example

        Individual 1   Individual 2   Individual 3   Individual 4
SNP 1   A/G            A/G            A/A            G/G
SNP 2   C/T            C/T            C/C            T/T
SNP 3   C/G            G/G            G/G            C/C

Slide41

Phasing – Worked example

Step 1

        Individual 1   Individual 2   Individual 3   Individual 4
SNP 1   A/G            A/G            A|A            G|G
SNP 2   C/T            C/T            C|C            T|T
SNP 3   C/G            G/G            G|G            C|C

Slide42

Phasing – Worked example

Step 2

        Individual 1   Individual 2   Individual 3   Individual 4
SNP 1   A/G            A|G            A|A            G|G
SNP 2   C/T            C|T            C|C            T|T
SNP 3   C/G            G|G            G|G            C|C

Slide43

Phasing – Worked example

Step 3

        Individual 1   Individual 2   Individual 3   Individual 4
SNP 1   A|G            A|G            A|A            G|G
SNP 2   C|T            C|T            C|C            T|T
SNP 3   G|C            G|G            G|G            C|C
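
The three steps can be sketched as a small program. This is a toy procedure in the spirit of parsimony-based phasing (homozygous individuals give known haplotypes, which are then used to resolve the others); it is an assumption made for illustration and is far simpler than the statistical models used by real phasing software.

```python
def complement(genotype, haplotype):
    """Haplotype that pairs with `haplotype` to give `genotype`, or None."""
    other = []
    for (a, b), h in zip(genotype, haplotype):
        if h == a:
            other.append(b)
        elif h == b:
            other.append(a)
        else:
            return None  # haplotype is incompatible with this genotype
    return "".join(other)

def phase(genotypes):
    """Phase unordered genotypes using haplotypes seen in other individuals."""
    known, phased = [], {}
    # Step 1: individuals homozygous at every SNP are trivially phased.
    for name, genotype in genotypes.items():
        if all(a == b for a, b in genotype):
            hap = "".join(a for a, _ in genotype)
            phased[name] = (hap, hap)
            if hap not in known:
                known.append(hap)
    # Steps 2 and 3: resolve the rest with a compatible known haplotype.
    progress = True
    while progress:
        progress = False
        for name, genotype in genotypes.items():
            if name in phased:
                continue
            for hap in known:
                other = complement(genotype, hap)
                if other is not None:
                    phased[name] = (hap, other)
                    if other not in known:
                        known.append(other)
                    progress = True
                    break
    return phased

genotypes = {
    "Individual 1": [("A", "G"), ("C", "T"), ("C", "G")],
    "Individual 2": [("A", "G"), ("C", "T"), ("G", "G")],
    "Individual 3": [("A", "A"), ("C", "C"), ("G", "G")],
    "Individual 4": [("G", "G"), ("T", "T"), ("C", "C")],
}
phased = phase(genotypes)
for name, (hap_1, hap_2) in phased.items():
    print(name, hap_1, hap_2)
```

Individuals that are compatible with none of the known haplotypes stay unphased in this sketch, and an individual compatible with several known haplotypes is resolved with the first one found, which is one way such a procedure can induce phasing errors.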

Slide44

Imputing

Raw data may contain locus dropouts or be incomplete

Imputing means assigning a genotype

Use complex algorithms

Use known data from large reference panels

Induce genotyping errors

Important strategy for gaining denser panels of markers

SNP profiles may be imputed from STR markers

Slide45

Imputing - Example

Missing data

        Individual 1   Individual 2
SNP 1   A/G            A/A
SNP 2   C/T            -/-
SNP 3   C/G            G/G

Slide46

Imputing - Example

Pre-phasing

        Individual 1   Individual 2
SNP 1   A|G            A|A
SNP 2   C|T            -/-
SNP 3   C|G            G|G

Slide47

Imputing - Example

Imputation

        Individual 1   Individual 2
SNP 1   A|G            A|A
SNP 2   C|T            G|G
SNP 3   C|G            G|G
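
The principle of imputing from a reference panel can be sketched with a toy example. The code below fills a missing allele on a pre-phased haplotype with the allele most common among reference haplotypes that match at the observed SNPs. The reference panel is invented for this illustration, and real imputation software uses much more elaborate statistical models.

```python
from collections import Counter

def impute_haplotype(haplotype, reference_haplotypes, missing="-"):
    """Fill missing alleles using matching haplotypes from a reference panel."""
    matches = [
        ref for ref in reference_haplotypes
        if all(h == missing or h == r for h, r in zip(haplotype, ref))
    ]
    if not matches:
        return haplotype  # nothing to learn from the panel
    imputed = []
    for i, allele in enumerate(haplotype):
        if allele == missing:
            counts = Counter(ref[i] for ref in matches)
            imputed.append(counts.most_common(1)[0][0])
        else:
            imputed.append(allele)
    return "".join(imputed)

# Hypothetical reference panel of three-SNP haplotypes.
reference_panel = ["ACG", "ACG", "ATG", "GTC", "GTG"]

# A haplotype with allele A at SNP 1, missing data at SNP 2 and G at SNP 3.
imputed = impute_haplotype("A-G", reference_panel)
print(imputed)
```

The imputed allele is only the most likely one given the panel, so a poorly matched or unrepresentative reference panel is a direct source of genotyping errors.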

Slide48

Errors

What happens with ”bad” data

Slide49

Effect of errors

Induce errors to one of the genotypes! Three types of errors:

Homozygote -> Heterozygote

Heterozygote -> Homozygote

(Homozygote -> Homozygote)

Slide50

Effect of errors

Pr(Homozygote | Heterozygote) = 0.1

True relationship

Slide51

Effect of errors

Decreasing DNA amount

Slide52

Effect of errors

Segments are reduced to smaller ones
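
Why errors shorten segments can be illustrated with a small simulation. The sketch below applies the heterozygote-to-homozygote error with probability 0.1 to an invented sample and counts the SNPs where the two individuals no longer share an allele; each such SNP would break a half-identical segment. The genotypes, seed and function names are assumptions made for this example.

```python
import random

def inject_dropout(genotypes, p_dropout, rng):
    """Turn heterozygotes into homozygotes with probability p_dropout."""
    noisy = []
    for a, b in genotypes:
        if a != b and rng.random() < p_dropout:
            kept = rng.choice((a, b))  # the allele that is still detected
            noisy.append((kept, kept))
        else:
            noisy.append((a, b))
    return noisy

def count_breaks(genotypes_1, genotypes_2):
    """Count SNPs where two individuals share no allele."""
    return sum(
        1 for g1, g2 in zip(genotypes_1, genotypes_2) if not set(g1) & set(g2)
    )

rng = random.Random(1)
n_snps = 1000
sample = [("A", "G")] * n_snps    # hypothetical trace, heterozygous everywhere
relative = [("A", "A")] * n_snps  # hypothetical relative sharing A at every SNP

clean_breaks = count_breaks(sample, relative)
noisy_sample = inject_dropout(sample, p_dropout=0.1, rng=rng)
noisy_breaks = count_breaks(noisy_sample, relative)
print(clean_breaks, noisy_breaks)
```

Without errors the whole stretch is one half-identical segment; with errors it is cut at every break, so long segments are split into several shorter ones that may fall below the length and SNP thresholds.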

Slide53

Genetic genealogy

Brief background

Slide54

Genetic genealogy (in forensics)

1. Identify traces

2. Genome analysis

3. Mask DNA profile

4. Search GEDmatch

5. Genealogical records

6. Investigations

AGCTTGCTAGCTGATCGATGCTAGCTA…

GATCGATGCTGATCGGATAATGCTGAT…


Slide55

Order test

Spit in a cup

Results

Slide56

Popularity of genealogy companies

Slide57

Raw data

GEDmatch

No genetic testing

Anonymous

«Free»

Compare

Accepts only data from these companies

Slide58

Golden State killer

>50 rapes, >12 murders, >100 burglaries

Slide59

Who are her 3rd cousins?

3rd cousins

2nd cousins

1st cousins

Perpetrator one of them!

Slide60

Golden State killer

One (or more) of his 3rd cousins had their profiles in GEDmatch!

Slide61

Genetic genealogy

Contribution from forensic genetics

Extracting DNA

Producing a ”reliable” DNA profile

Transfer the profile to genealogist

Work of the genealogist (or law enforcement)

Upload DNA profile to database

Genealogy work

Work of law enforcement

Investigative work on leads

Find suspects

STR profile comparison

Slide62

Discussion

Benefits with large marker panels

Pros and cons with the different statistical methods

Implementation in forensics
