lobSTR: A Short Tandem Repeat Profiler for Personal Genomes

Uploaded On 2020-06-16


Presentation Transcript

Slide1

lobSTR:

A Short Tandem Repeat Profiler for Personal Genomes

CSE291

02/16/17

Slide2

Melissa Gymrek

02/16/17

Profiling STR variation in humans

Intro.

lobSTR

STR Catalog

Conclusion

We know quite a lot about genetic variation…

Slide3

STRs – domains of interest

1. Medical genetics: more than 40 documented diseases
2. Forensics: 12 million STR profiles
3. Genetic genealogy: a vantage point to recent events

Diseases shown: Huntington, Fragile X, OPMD, Synpolydactyly, Ataxia (10 types), HFG syndrome, Holoprosencephaly, Pseudoachondroplasia, Myotonic dystrophy, Cleidocranial dysplasia, ALS-FTD

Despite a multitude of applications, STRs are not routinely profiled in whole genome sequencing.

ATGT CAGCAGCAGCAGCAGCAG CTGA


Slide4

Challenges in profiling STRs from short reads

1. Only reads entirely spanning an STR are informative

Non-informative read:

TGACGCATGATGT CAACAACAACAA

Ambiguity: [CAA]4, [CAA]5, ..., [CAA]??

Informative read:

ATGT CAACAACAACAACAACAA CTGA
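The spanning requirement can be illustrated with a small check on the two example reads from this slide. This is a minimal sketch, not lobSTR's implementation: the function name `repeat_count_if_spanning` is hypothetical, and it assumes the flanks and motif are known in advance and match exactly.

```python
import re

def repeat_count_if_spanning(read, left_flank, motif, right_flank):
    """Return the motif copy number if the read contains both flanks around
    an uninterrupted run of the motif; otherwise return None."""
    pattern = (re.escape(left_flank)
               + "((?:" + re.escape(motif) + ")+)"
               + re.escape(right_flank))
    match = re.search(pattern, read)
    if match is None:
        return None  # the read does not span the whole repeat
    return len(match.group(1)) // len(motif)

# Example reads from the slide: one spans the STR, one ends inside it.
informative = "ATGT" + "CAA" * 6 + "CTGA"
non_informative = "TGACGCATGATGT" + "CAA" * 4

print(repeat_count_if_spanning(informative, "ATGT", "CAA", "CTGA"))      # 6
print(repeat_count_if_spanning(non_informative, "ATGT", "CAA", "CTGA"))  # None
```

The second read shows four copies, but because the right flank is missing, the true allele could be any length of four copies or more.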

Slide5

Challenges in profiling STRs from short reads

2. Non-reference alleles present as large indels

Genome of interest: ATGT CAACAACAACAACAACAACAACAACAACAACAA TTGA
Reference:          ATGT CAACAACAACAACAACAA TTGA

15bp indel

Slide6

Challenges in profiling STRs from short reads

3. PCR “stutter noise” complicates calling

Capillary electrophoresis (figure)

Sequence reads:

ATGT CAACAACAACAACAACAA TTGA
ATGT CAACAACAACAACAACAA TTGA
ATGT CAACAACAACAACAACAA TTGA
ATGT CAACAACAACAACAA*** TTGA
ATGT CAACAACAACAACAA*** TTGA
ATGT CAACAACAACAA****** TTGA

Slide7

lobSTR algorithm

Slide8

lobSTR: an end-to-end solution for STR-profiling

Takes FASTA/FASTQ/BAM and outputs VCF

Reports read alignments and STR alleles

Supports multi-threading

Slide9

Sensing

Aim: Find informative reads and characterize the STR

Slide10

Entropy score successfully detects STRs

S_j: sequence j
Σ: symbol alphabet (dinucleotides)
f_i: frequency of symbol i

(Figure: 98.3%, 99.4%)

Sequence: AGCATATATATATATATATATATG

Dinucleotides:

i     f_i
AG    0.04
GC    0.04
CA    0.04
AT    0.43
TA    0.39
TG    0.04

E(S) = 1.8

Single nucleotides:

i     f_i
A     0.46
C     0.04
G     0.08
T     0.42

E = 1.54

Sequence: CAGCTATTCGGGACTGAGCGGTAT

Single nucleotides:

i     f_i
A     0.21
C     0.21
G     0.33
T     0.25

E = 1.97

Dinucleotides (rows shown on the slide):

i     f_i
CA    0.04
AG    0.08
GC    0.08
TA    0.08
TT    0.04
TC    0.04

E(S) = 3.71
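The formula for the score did not survive in this transcript, but the tabulated values are consistent with Shannon entropy in bits, E(S) = -Σ f_i log2(f_i), taken over overlapping dinucleotides. The sketch below rests on that inference and is not lobSTR's code; the function name `entropy` and its parameter `k` are illustrative.

```python
import math
from collections import Counter

def entropy(seq, k=2):
    """Shannon entropy (bits) of the overlapping k-mers in seq.
    k=2 gives the dinucleotide alphabet named on the slide."""
    symbols = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# The two example sequences from the slide.
str_read = "AGCATATATATATATATATATATG"      # mostly an (AT)n repeat
random_read = "CAGCTATTCGGGACTGAGCGGTAT"   # no repeat structure

print(round(entropy(str_read), 2))     # 1.84
print(round(entropy(random_read), 2))  # 3.71
```

On dinucleotides the two reads are far apart (about 1.8 versus 3.7 bits), while on single nucleotides the gap is much smaller (1.54 versus 1.97 on the slide), which is presumably why the dinucleotide alphabet is used.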

Slide11

Entropy score partitions reads

Left flank: TTTTGTGTGAAACCATGCTCGA

STR region, (TG)n: GTGTGTGTGTGTGTGTATGTGTGTGTGTGTGTGTGTGTG

Right flank: GTGTCTTAAGACTGAAATATCTAAGATTAACTTGG
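One way to turn an entropy score into a partition is to slide a window along the read and keep the longest stretch of low-entropy windows. The sketch below does this for the read on this slide. The window size (12), the threshold (2.0 bits), and both function names are illustrative assumptions, not lobSTR's actual parameters or procedure.

```python
import math
from collections import Counter

def dinucleotide_entropy(seq):
    """Shannon entropy (bits) over overlapping dinucleotides."""
    symbols = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(symbols).values())

def longest_low_entropy_region(read, window=12, threshold=2.0):
    """Return (start, end) of the longest stretch covered by consecutive
    low-entropy windows, or None if no window is below the threshold."""
    best = None
    run_start = None
    last = len(read) - window
    for start in range(last + 2):  # one extra step closes a trailing run
        low = (start <= last and
               dinucleotide_entropy(read[start:start + window]) < threshold)
        if low and run_start is None:
            run_start = start
        elif not low and run_start is not None:
            region = (run_start, start - 1 + window)
            if best is None or region[1] - region[0] > best[1] - best[0]:
                best = region
            run_start = None
    return best

left = "TTTTGTGTGAAACCATGCTCGA"
repeat = "GTGTGTGTGTGTGTGTATGTGTGTGTGTGTGTGTGTGTG"
right = "GTGTCTTAAGACTGAAATATCTAAGATTAACTTGG"
read = left + repeat + right

start, end = longest_low_entropy_region(read)
print(read[:start], read[start:end], read[end:], sep="\n")
```

With these settings the detected region covers the whole (TG)n stretch but can spill a few bases into either flank, so a real implementation would still need to refine the boundaries.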

Slide12

Alignment

Aim: Map STRs to the genome

Slide13

Avoid gapped alignment by aligning non-repetitive regions

TTTTGTGTGAAACCATGCTCGA

GTGTGTGTGTGTGTGTATGTGTGTGTGTGTGTGTGTGTG

GTGTCTTAAGACTGAAATATCTAAGATTAACTTGG

Divide and conquer approach:

Build an index of all genomic STRs

Align only the flanking regions to the reference genome

Use paired end information to add specificity
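The first two steps can be sketched with a toy exact-match lookup: index each reference STR by the bases on either side of it, then place a read using only its flanks. This is an illustration under strong assumptions, not lobSTR's aligner. The reference string, the flank length, and the function names are made up, real flanks would need mismatch-tolerant alignment, and the paired-end step is not shown.

```python
def build_flank_index(reference, str_loci, flank_len=8):
    """Map (left flank, right flank) to the (start, end) of each STR locus.
    str_loci holds half-open coordinates of the repeat tracts."""
    index = {}
    for start, end in str_loci:
        left = reference[max(0, start - flank_len):start]
        right = reference[end:end + flank_len]
        index[(left, right)] = (start, end)
    return index

def align_by_flanks(left_flank, right_flank, index, flank_len=8):
    """Place a read using only the flank bases adjacent to its STR region."""
    key = (left_flank[-flank_len:], right_flank[:flank_len])
    return index.get(key)

# Toy reference with a [CAA]6 tract at positions 12..30.
reference = "TTGACCGTATGT" + "CAA" * 6 + "TTGAGGCTAACG"
index = build_flank_index(reference, [(12, 30)])

# A read carrying 11 copies, already split into flanks and repeat.
read_left, read_repeat, read_right = "GACCGTATGT", "CAA" * 11, "TTGAGGCTAA"
locus = align_by_flanks(read_left, read_right, index)
print(locus)                                       # (12, 30)
print(len(read_repeat) - (locus[1] - locus[0]))    # 15
```

The 15 bp length difference falls out of simple arithmetic on the repeat tract, with no gapped alignment across the repeat.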

Slide14

Allelotyping

Aim: Determine the STR alleles

Slide15

Removing PCR stutter noise

Stutter probability primarily depends on STR period

(Figure: Probability of stutter vs. Period (bp))

Train parameters on hemizygous loci (male sex chromosomes)

Most stutter events are +/-1 from true allele
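Hemizygous loci are convenient for training because each has a single true allele, so any read showing a different repeat count can be attributed to stutter. The sketch below estimates a stutter rate per period on that basis. It assumes the most common read allele is the true one, and both the function name and the read counts are invented for illustration; lobSTR's actual training procedure is not described on the slide.

```python
from collections import Counter, defaultdict

def stutter_rate_by_period(loci):
    """loci: iterable of (period_bp, repeat counts seen in reads) pairs,
    all from hemizygous loci. Returns {period: fraction of stutter reads}."""
    stutter = defaultdict(int)
    total = defaultdict(int)
    for period, read_alleles in loci:
        if not read_alleles:
            continue
        # Assumption: the modal allele is the single true allele.
        true_allele, _ = Counter(read_alleles).most_common(1)[0]
        total[period] += len(read_alleles)
        stutter[period] += sum(1 for a in read_alleles if a != true_allele)
    return {period: stutter[period] / total[period] for period in total}

# Invented example data: (period, repeat counts observed in reads).
training_loci = [
    (2, [10, 10, 9, 10, 11, 10, 10, 10]),
    (2, [15, 15, 15, 14]),
    (4, [8, 8, 8, 8, 8, 8, 8, 7, 8, 8]),
]
rates = stutter_rate_by_period(training_loci)
print(rates)  # {2: 0.25, 4: 0.1}
```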

Slide16

Determine the most likely allelotype

R = (13, 13, 13, 14, 14, 14, 15)

Grid search to find maximum likelihood allelotype: L(reads | allelotype; model)

Allelotype grid (Allele 1, Allele 2):

17,17
16,16  16,17
15,15  15,16  15,17
14,14  14,15  14,16  14,17
13,13  13,14  13,15  13,16  13,17
12,12  12,13  12,14  12,15  12,16  12,17
11,11  11,12  11,13  11,14  11,15  11,16  11,17

Return confidence score: L(ML allelotype) / Σ L(allelotype)
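The grid search and confidence score can be sketched as follows, using the reads R from this slide. The stutter model here is a deliberately simple assumption: each read comes from either allele with equal probability, matches it with probability 1 - s, and is off by one repeat with probability s/2 in each direction. The value s = 0.1 and all function names are illustrative, not taken from lobSTR.

```python
import math
from itertools import combinations_with_replacement

def read_prob(read, allele, stutter=0.1):
    """P(observed repeat count | true allele) under the toy stutter model."""
    if read == allele:
        return 1 - stutter
    if abs(read - allele) == 1:
        return stutter / 2
    return 0.0

def likelihood(reads, allelotype, stutter=0.1):
    """L(reads | allelotype; model), each read drawn from either allele."""
    a1, a2 = allelotype
    return math.prod(0.5 * read_prob(r, a1, stutter)
                     + 0.5 * read_prob(r, a2, stutter) for r in reads)

def call_allelotype(reads, alleles, stutter=0.1):
    """Grid search over all unordered allele pairs; returns the maximum
    likelihood allelotype and L(ML allelotype) / sum of all likelihoods."""
    scores = {g: likelihood(reads, g, stutter)
              for g in combinations_with_replacement(alleles, 2)}
    total = sum(scores.values())
    best = max(scores, key=scores.get)
    confidence = scores[best] / total if total > 0 else 0.0
    return best, confidence

reads = (13, 13, 13, 14, 14, 14, 15)
best, confidence = call_allelotype(reads, range(11, 18))
print(best)  # (13, 14)
print(confidence)
```

Under this model the single read at 15 is explained as +1 stutter from allele 14, so the heterozygous call (13, 14) wins over alternatives such as (13, 15) or (14, 14).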

Slide17

lobSTR outperforms mainstream short read aligners at STR loci

Slide18

And now… HipSTR!
