Slide1
lobSTR: A Short Tandem Repeat Profiler for Personal Genomes
CSE291
02/16/17
Slide2
Melissa Gymrek
02/16/17
Profiling STR variation in humans
Intro.
lobSTR
STR Catalog
Conclusion
We know quite a lot about genetic variation…
Slide3
STRs – domains of interest
1. Medical genetics – more than 40 documented diseases
Huntington, Fragile X, OPMD, Synpolydactyly, Ataxia (10 types), HFG syndrome, Holoprosencephaly, Pseudoachondroplasia, Myotonic dystrophy, Cleidocranial Dysplasia, ALS-FTD
2. Forensics – 12 million STR profiles
3. Genetic genealogy – a vantage point to recent events.
Despite a multitude of applications, STRs are not routinely profiled in whole genome sequencing
ATGT CAGCAGCAGCAGCAGCAG CTGA
Slide4
Challenges in profiling STRs from short reads
1. Only reads entirely spanning an STR are informative
Informative read: ATGT CAA CAA CAA CAA CAA CAA CTGA
Non informative read: TGACGCATGATGT CAA CAA CAA CAA
[Figure labels: [CAA]5; Ambiguity: [CAA]4 : : [CAA]??]
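The difference between the two reads above can be sketched in Python. This is a minimal illustration under stated assumptions, not lobSTR's implementation: the function name `spans_repeat` and the `min_flank` parameter are hypothetical, and the repeat motif is assumed to be known and perfect.

```python
def spans_repeat(read: str, motif: str, min_flank: int = 4) -> bool:
    """Return True if the longest run of `motif` in `read` has at least
    `min_flank` non-repeat bases on both sides, i.e. the read spans the STR.

    `min_flank` is a hypothetical cutoff chosen for this example.
    """
    best_start, best_len = -1, 0
    i = 0
    while i <= len(read) - len(motif):
        if read.startswith(motif, i):
            # Extend the run one motif copy at a time.
            j = i
            while read.startswith(motif, j):
                j += len(motif)
            if j - i > best_len:
                best_start, best_len = i, j - i
            i = j
        else:
            i += 1
    if best_len == 0:
        return False
    left_flank = best_start
    right_flank = len(read) - (best_start + best_len)
    return left_flank >= min_flank and right_flank >= min_flank


# Reads taken from the slide.
informative = "ATGT" + "CAA" * 6 + "CTGA"
non_informative = "TGACGCATGATGT" + "CAA" * 4
```

The second read ends inside the repeat, so it only gives a lower bound on the number of CAA copies and cannot be used to call the allele.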
Slide5
Challenges in profiling STRs from short reads
2. Non-reference alleles present as large indels
Genome of interest: ATGT CAACAACAACAACAACAACAACAACAACAACAA TTGA
Reference: ATGT CAACAACAACAACAACAA TTGA
15bp indel
Slide6
Challenges in profiling STRs from short reads
3. PCR “stutter noise” complicates calling
[Figure: capillary electrophoresis trace]
Sequence reads:
ATGTCAACAACAACAACAACAATTGA
ATGTCAACAACAACAACAACAATTGA
ATGTCAACAACAACAACAACAATTGA
ATGTCAACAACAACAACAA***TTGA
ATGTCAACAACAACAACAA***TTGA
ATGTCAACAACAACAA******TTGA
Slide7
lobSTR algorithm
Slide8
lobSTR: an end-to-end solution for STR-profiling
Takes FASTA/FASTQ/BAM and outputs VCF
Reports read alignments and STR alleles
Supports multi-threading
Slide9
Sensing
Aim: Find informative reads and characterize STR
Slide10
Entropy score successfully detects STRs
S_j: sequence j
Σ: symbol alphabet (dinucleotides)
f_i: frequency of symbol i
[Figure labels: 98.3%, 99.4%]

Sequence: AGCATATATATATATATATATATG
Dinucleotides (i: f_i): AG 0.04, GC 0.04, CA 0.04, AT 0.43, TA 0.39, TG 0.04, …; E(S) = 1.8
Nucleotides (i: f_i): A 0.46, C 0.04, G 0.08, T 0.42; E = 1.54

Sequence: CAGCTATTCGGGACTGAGCGGTAT
Nucleotides (i: f_i): A 0.21, C 0.21, G 0.33, T 0.25; E = 1.97
Dinucleotides (i: f_i): CA 0.04, AG 0.08, GC 0.08, TA 0.08, TT 0.04, TC 0.04, …; E(S) = 3.71
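The formula for the entropy score did not survive in the slide text. A minimal sketch, assuming it is the Shannon entropy in bits over symbol frequencies, E(S) = -Σ f_i log2 f_i, is shown below. Under that assumption the code gives the slide's values for the non-repetitive sequence (1.97 and 3.71) and about 1.84 for the dinucleotides of the repetitive one (shown as 1.8). The function name `entropy` and the parameter `k` are illustrative.

```python
import math
from collections import Counter


def entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy (bits) of the overlapping k-mers of `seq`.

    k=1 uses single nucleotides, k=2 uses dinucleotides as on the slide.
    Assumption: the slide's E(S) is -sum(f_i * log2(f_i)).
    """
    symbols = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


repetitive = "AGCATATATATATATATATATATG"   # STR-containing sequence from the slide
complex_seq = "CAGCTATTCGGGACTGAGCGGTAT"  # non-repetitive sequence from the slide

for name, seq in (("repetitive", repetitive), ("complex", complex_seq)):
    print(name, round(entropy(seq, 1), 2), round(entropy(seq, 2), 2))
```

The gap between the two sequences is much wider for dinucleotides than for single nucleotides, which is consistent with the slide using a dinucleotide alphabet.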
Slide11
Entropy score partitions reads
Left flank: TTTTGTGTGAAACCATGCTCGA
STR region, (TG)n: GTGTGTGTGTGTGTGTATGTGTGTGTGTGTGTGTGTGTG
Right flank: GTGTCTTAAGACTGAAATATCTAAGATTAACTTGG
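One way to picture the partitioning is a sliding window: windows with low dinucleotide entropy are marked as repetitive, and the longest run of them is taken as the STR region. The sketch below is an illustration only, not lobSTR's actual procedure. The window size, step, and threshold of 1.5 bits are assumed values, and the function names are hypothetical. Boundaries are only resolved to the step size.

```python
import math
from collections import Counter


def dinuc_entropy(seq: str) -> float:
    """Shannon entropy (bits) of overlapping dinucleotides in `seq`."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    total = len(pairs)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(pairs).values())


def partition_read(read: str, window: int = 8, step: int = 4,
                   threshold: float = 1.5):
    """Split `read` into (left flank, STR region, right flank).

    Returns None when no window falls below `threshold`.
    All three parameters are assumptions made for this example.
    """
    starts = range(0, len(read) - window + 1, step)
    low = [s for s in starts
           if dinuc_entropy(read[s:s + window]) < threshold]
    if not low:
        return None
    # Keep the longest run of consecutive low-entropy windows.
    best = run = [low[0]]
    for s in low[1:]:
        run = run + [s] if s - run[-1] == step else [s]
        if len(run) > len(best):
            best = run
    begin, end = best[0], best[-1] + window
    return read[:begin], read[begin:end], read[end:]


# Toy read: 8 bp flank, (TG)8, 8 bp flank.
toy_read = "ACGTTGCA" + "TG" * 8 + "ACGTTGCA"
print(partition_read(toy_read))
```

On real reads such as the one on the slide, flanks that begin or end with partial repeats would need finer boundary refinement than this sketch provides.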
Slide12
Alignment
Aim: Map STRs to the genome
Slide13
Avoid gapped alignment by aligning non-repetitive regions
Left flank: TTTTGTGTGAAACCATGCTCGA
STR region: GTGTGTGTGTGTGTGTATGTGTGTGTGTGTGTGTGTGTG
Right flank: GTGTCTTAAGACTGAAATATCTAAGATTAACTTGG
Divide and conquer approach:
Build an index of all genomic STRs
Align only the flanking regions to the reference genome
Use paired end information to add specificity
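The divide and conquer idea can be sketched as follows: place the two flanks on the reference, then compare the length of the read's STR region with the reference span between them, so no gapped alignment is needed. This is a toy illustration. Exact string matching with `str.find` stands in for a real indexed aligner that tolerates mismatches, and the function name `str_length_difference` is hypothetical.

```python
def str_length_difference(reference: str, left_flank: str,
                          right_flank: str, read_str: str):
    """Return the STR length in the read minus the STR length in the
    reference (in bp), or None if either flank cannot be placed.

    Exact matching is an assumption made to keep the example short.
    """
    left_pos = reference.find(left_flank)
    if left_pos < 0:
        return None
    ref_str_start = left_pos + len(left_flank)
    # The right flank must lie downstream of the left flank.
    right_pos = reference.find(right_flank, ref_str_start)
    if right_pos < 0:
        return None
    ref_str_len = right_pos - ref_str_start
    return len(read_str) - ref_str_len


# Sequences from the "large indels" slide: 6 CAA copies in the
# reference, 11 in the genome of interest.
reference = "ATGT" + "CAA" * 6 + "TTGA"
print(str_length_difference(reference, "ATGT", "TTGA", "CAA" * 11))  # 15
```

The 4 bp flanks here are far too short to be unique in a genome; they are used only because they appear on the slide.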
Slide14
Allelotyping
Aim: determine the STR alleles
Slide15
Removing PCR stutter noise
Stutter probability primarily depends on STR period
[Figure: probability of stutter vs. period (bp)]
Train parameters on hemizygous loci (male sex chroms)
Most stutter events are +/-1 from true allele
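Training on hemizygous loci works because only one true allele is present, so any read showing a different length can be attributed to noise. A minimal sketch of a per-period estimate is given below. It assumes the most common observed length at each locus is the true allele; the function name `estimate_stutter` and the input layout are hypothetical, and the slide does not describe the actual model in this detail.

```python
from collections import Counter, defaultdict


def estimate_stutter(loci):
    """Estimate stutter probability per STR period.

    `loci` is an iterable of (period_bp, observed_repeat_counts) taken
    from hemizygous loci. Assumption: the modal repeat count at a locus
    is the true allele and every other read is a stutter product.
    Returns {period_bp: fraction of reads that differ from the mode}.
    """
    stutter = defaultdict(int)
    total = defaultdict(int)
    for period, alleles in loci:
        if not alleles:
            continue  # no reads at this locus
        true_allele, _ = Counter(alleles).most_common(1)[0]
        total[period] += len(alleles)
        stutter[period] += sum(1 for a in alleles if a != true_allele)
    return {p: stutter[p] / total[p] for p in total}


# Made-up read counts for illustration only.
example_loci = [
    (2, [10, 10, 10, 9, 11, 10, 10, 10, 10, 10]),
    (4, [7] * 19 + [8]),
]
print(estimate_stutter(example_loci))
```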
Slide16
Determine the most likely allelotype
R = (13, 13, 13, 14, 14, 14, 15)
Grid search to find maximum likelihood allelotype: L(reads | allelotype; model)
Candidate allelotypes (Allele 1, Allele 2):
17,17
16,16  16,17
15,15  15,16  15,17
14,14  14,15  14,16  14,17
13,13  13,14  13,15  13,16  13,17
12,12  12,13  12,14  12,15  12,16  12,17
11,11  11,12  11,13  11,14  11,15  11,16  11,17
Return confidence score: L(ML allelotype) / Σ L(allelotype)
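The grid search and confidence score can be sketched with a deliberately simple noise model. Everything about the model here is an assumption for illustration: each read is taken to come from either allele with probability 0.5, to show the true length with probability 1 - s, to be off by one repeat with probability s/2 on each side, and to take any other value with a small constant probability. The names `read_prob` and `call_allelotype` and the values s = 0.1 and eps = 1e-4 are hypothetical.

```python
from itertools import combinations_with_replacement


def read_prob(observed: int, allele: int,
              stutter: float = 0.1, eps: float = 1e-4) -> float:
    """P(observed repeat count | true allele) under the toy stutter model."""
    if observed == allele:
        return 1.0 - stutter
    if abs(observed - allele) == 1:
        return stutter / 2
    return eps  # larger errors: small constant (assumption)


def call_allelotype(reads, stutter: float = 0.1):
    """Grid search over diploid allelotypes (a, b) with a <= b.

    Returns (ML allelotype, confidence), where confidence is
    L(ML allelotype) / sum of L over all candidates in the grid.
    """
    candidates = range(min(reads) - 2, max(reads) + 3)
    likelihoods = {}
    for a, b in combinations_with_replacement(candidates, 2):
        lik = 1.0
        for r in reads:
            lik *= 0.5 * read_prob(r, a, stutter) + 0.5 * read_prob(r, b, stutter)
        likelihoods[(a, b)] = lik
    best = max(likelihoods, key=likelihoods.get)
    confidence = likelihoods[best] / sum(likelihoods.values())
    return best, confidence


R = (13, 13, 13, 14, 14, 14, 15)  # read set from the slide
best, confidence = call_allelotype(R)
print(best, round(confidence, 3))
```

For the slide's read set the candidate range works out to 11 through 17, the same 28-cell grid shown above, and the single read of 15 is explained as stutter from the 14 allele.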
Slide17
lobSTR outperforms mainstream short read aligners at STR loci
Slide18
And now… HipSTR!