Personalized genomics

calandra-battersby
Uploaded On 2017-03-27

Presentation Transcript

Slide1

Personalized genomics

Slide2

Goal

Input

Genomic sequence (WGS) from family

Pedigree & affectedness

Disease (standard ontology needed)

Output

Genes/mutations relevant to the disease.

Slide3

Read Mapping

BAM prep

GRCh37

Known sites

dbSNP

SNV Calling

SV Calling & Validation

Merge VCFs

Variant filtering

Pedigree & affectedness

Variant annotation

Disease Gene/Mutation

HGMD

Genomic Sequence

SeattleSeq

BAM

VCF

FASTQ

Slide4

Steps

Sequencing

Mapping

BAM Prep

Variant calling (SNV)

Variant Calling (SV)

VCF manipulation/merging

Variant annotation

Variant filtering

Disease gene association

Slide5

1. Sequencing

Platform

HiSeq/MiSeq

PacBio

Ion Proton (Life)

CGI

…

Mode

Whole Genome

Exome

RNA-Seq

…

Slide6

2. Mapping

Map short reads (FASTQ format) to a reference

Output a BAM file

Mapping tools

BWA

Bowtie

Custom

Compute/disk intensive part of the pipeline. WGS file size: ~200 Gb per sample.

Slide7

3. BAM Prep

Input: BAM file

Output: BAM file

Sorting BAM

Picard Tools

Marking (PCR) Duplicates

Picard Tools

INDEL Re-alignment

GATK

Base-Q Covariates & Recalibration

GATK

Compute intensive part of the pipeline

Slide8

4. Variant Calling

Input: multiple BAMs

Output: VCF (loci that differ from the reference)

SNVs

Broad’s GATK Caller

SVs

Custom pipelines needed

Browsing variant calls

Genome Savant

Confirming variants via resequencing

Compute intensive part of the pipeline. Integrating SVs and SNVs.

Slide9

5. SV calling & validation

BreakDancer

CNVer

Bowtie

Reprever

Extract FASTQ

VCF merging and validation

GQL + Genome Savant

Zygosity calling

Slide10

Push-button pipeline or VM

SNPs (VCF)

SVs

CNVs

BAM

GATK

BreakDancer

CNVer

ISCA

Recombination Blocks

Known SNPs

de novo SNPs

Slide11

6. Merging VCFs

Given multiple VCF files, merge them (each column corresponds to an individual sample).

Can be mostly done by VCFtools. Our goal would be to visualize problematic regions for manual validation, and to design primers for confirmation automatically.

Slide12
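As a rough illustration of the merge described in step 6 (Merging VCFs), the sketch below combines single-sample VCFs into one table with one genotype column per individual. It is a toy under stated assumptions, not what VCFtools does internally: `make_vcf`, `parse_vcf` and `merge_calls` are hypothetical helpers, sites are matched only on exact (chromosome, position, REF, ALT), and a site absent from a sample's file is reported as `./.` because a single-sample VCF cannot distinguish "homozygous reference" from "not covered".

```python
def make_vcf(sample, records):
    """Build a minimal single-sample VCF string (toy data for this sketch)."""
    header = "\t".join(["#CHROM", "POS", "ID", "REF", "ALT",
                        "QUAL", "FILTER", "INFO", "FORMAT", sample])
    lines = ["##fileformat=VCFv4.1", header]
    for chrom, pos, ref, alt, gt in records:
        lines.append("\t".join([chrom, str(pos), ".", ref, alt,
                                ".", "PASS", ".", "GT", gt]))
    return "\n".join(lines)


def parse_vcf(text):
    """Return (sample_name, {(chrom, pos, ref, alt): genotype})."""
    sample, calls = None, {}
    for line in text.strip().splitlines():
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            sample = fields[9]  # first (and only) sample column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        keys = fields[8].split(":")
        values = fields[9].split(":")
        calls[(chrom, int(pos), ref, alt)] = values[keys.index("GT")]
    return sample, calls


def merge_calls(parsed):
    """Merge parsed samples: one row per site, one genotype per sample."""
    samples = [name for name, _ in parsed]
    sites = sorted({site for _, calls in parsed for site in calls})
    rows = [(site, [calls.get(site, "./.") for _, calls in parsed])
            for site in sites]
    return samples, rows


father = make_vcf("father", [("1", 100, "A", "G", "0/1"),
                             ("1", 200, "C", "T", "1/1")])
child = make_vcf("child", [("1", 100, "A", "G", "0/1"),
                           ("2", 50, "G", "A", "0/1")])
samples, rows = merge_calls([parse_vcf(father), parse_vcf(child)])
for site, genotypes in rows:
    print(site, dict(zip(samples, genotypes)))
```

Rows where some samples show `./.` are natural candidates for the manual validation mentioned on the slide, since they are where the per-sample call sets disagree.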

7. Variant annotation

Input: variant calls (raw VCF)

Output: annotation of variants (annotated VCF)

Coding

Synonymous

Splice-variant

Regulatory

ncRNA

Annotating coding variation for deleteriousness

SIFT

PolyPhen

GERP

SeattleSeq

Slide13

GERP score

Slide14

8. Variant Filtering

Input: VCF (annotated)

Output: set of relevant variants/genes

Filters based on variant annotation

deleterious: missense/nonsense/splice

Filters based on inheritance patterns

Disease model (recessive/dominant/compound het)

Filtering tools:

Gemini (http://gemini.readthedocs.org/en/latest/)

FamAnn (https://sites.google.com/site/famannotation/home)

Slide15
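The two kinds of filter named in step 8 (Variant Filtering), annotation-based and inheritance-based, can be sketched in a few lines. This is a minimal illustration and not how Gemini or FamAnn work: the variant records, the `impact` field and the function names are hypothetical, only simple recessive and dominant models are covered, and compound heterozygosity (which requires grouping variants by gene and phasing) is left out.

```python
DELETERIOUS = {"missense", "nonsense", "splice"}


def alt_count(gt):
    """Number of non-reference alleles in a genotype such as '0/1'.

    Returns None when the genotype is missing ('./.').
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def fits_model(genotypes, affected, model):
    """Check one variant's genotypes against a simple disease model."""
    for sample, gt in genotypes.items():
        n = alt_count(gt)
        if n is None:
            return False  # assumption: a missing call rejects the variant
        if model == "recessive":
            ok = (n == 2) if affected[sample] else (n < 2)
        elif model == "dominant":
            ok = (n >= 1) if affected[sample] else (n == 0)
        else:
            raise ValueError("unsupported model: " + model)
        if not ok:
            return False
    return True


def filter_variants(variants, affected, model):
    """Keep deleterious variants whose genotypes fit the model."""
    return [v for v in variants
            if v["impact"] in DELETERIOUS
            and fits_model(v["genotypes"], affected, model)]


# Toy trio: unaffected parents, affected child (hypothetical data).
affected = {"father": False, "mother": False, "child": True}
variants = [
    {"id": "v1", "impact": "missense",
     "genotypes": {"father": "0/1", "mother": "0/1", "child": "1/1"}},
    {"id": "v2", "impact": "synonymous",
     "genotypes": {"father": "0/1", "mother": "0/1", "child": "1/1"}},
    {"id": "v3", "impact": "nonsense",
     "genotypes": {"father": "0/0", "mother": "0/0", "child": "0/1"}},
    {"id": "v4", "impact": "splice",
     "genotypes": {"father": "1/1", "mother": "0/1", "child": "1/1"}},
]
recessive_hits = [v["id"] for v in filter_variants(variants, affected, "recessive")]
dominant_hits = [v["id"] for v in filter_variants(variants, affected, "dominant")]
print(recessive_hits, dominant_hits)
```

In this toy trio the dominant model with unaffected parents only admits variants absent from both parents, so it effectively selects de novo candidates.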

9. Annotating genes

Input: collection of genes with mutations.

Output: relevant diseases, functional information

Basic Information

Genecards

Adding pathway

Ingenuity

Databases of disease-gene links

HGMD

OMIM

ClinVar

We are currently using an outdated version of HGMD, but can possibly do better, or just replace it with Step 9.

Slide16

9. Identifying Disease genes

Automated machine learning approach to correlating genes with diseases

Standard ontologies for diseases

MeSH

Disease Ontology

Standard vocabulary for gene names

ML approach (parse abstracts to make these connections)

Slide17

Computational Resource Consumption

Step                   Disk/sample   CPU/sample
Read Mapping           800 Gb        320 h*
BAM prep               150 Gb        140 h
SNV & INDEL calling    20 Gb         540 h*
SV & CNV calling       200 Gb        30 h + 30 h
Merging VCF            1.5 Gb        1 h
Variant Annotation     20 Gb         1 h
Variant Filtering      -             1 h
Disease/Gene Assoc.    ?             ?

*amenable to multithread parallelization (up to a point when memory becomes bottleneck)

Slide18
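To get a feel for the per-sample figures in the Computational Resource Consumption table, the arithmetic below totals them and estimates wall-clock time when the two starred steps are multithreaded. Several assumptions are mine and not the slide's: the "-" disk entry is counted as zero, the unknown Disease/Gene Assoc. row is excluded, "30 h + 30 h" is counted as 60 h, disk totals assume no intermediate files are deleted, and the starred steps are assumed to scale ideally with thread count (the slide notes that memory eventually becomes the bottleneck).

```python
# (step, disk in Gb, CPU hours, amenable to multithreading per the table)
STEPS = [
    ("Read Mapping",        800.0, 320.0, True),
    ("BAM prep",            150.0, 140.0, False),
    ("SNV & INDEL calling",  20.0, 540.0, True),
    ("SV & CNV calling",    200.0,  60.0, False),  # 30 h + 30 h
    ("Merging VCF",           1.5,   1.0, False),
    ("Variant Annotation",   20.0,   1.0, False),
    ("Variant Filtering",     0.0,   1.0, False),  # "-" counted as zero
]


def totals(steps):
    """Total disk (Gb) and CPU hours per sample."""
    disk = sum(d for _, d, _, _ in steps)
    cpu = sum(c for _, _, c, _ in steps)
    return disk, cpu


def wall_clock_hours(steps, threads):
    """Wall-clock estimate assuming ideal scaling of the starred steps."""
    return sum(c / threads if parallel else c
               for _, _, c, parallel in steps)


total_disk, total_cpu = totals(STEPS)
print("disk per sample (Gb):", total_disk)
print("CPU per sample (h):", total_cpu)
print("wall clock with 8 threads (h):", wall_clock_hours(STEPS, 8))
```

Under these assumptions read mapping and SNV/INDEL calling account for most of the CPU time, so they are where parallelization pays off first.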

Gene Prioritization

Variant annotation

Variant filtering

Gene Disease connection

Slide19

The HPO aims to act as a central resource to connect several genomics datasets with the diseasome.

Sebastian Köhler et al. Nucl. Acids Res. 2014;42:D966-D974

© The Author(s) 2013. Published by Oxford University Press.

Slide20

Human Phenotype Ontology

10,000 terms describing human phenotypic abnormalities (7300 human hereditary syndromes).

2741 genes used to create DAG (Disease Associated Genes)

3 independent sub-ontologies

mode of inheritance

onset and clinical course

phenotypic abnormalities

The phenotypic terms are cross-linked

Slide21

Applications of HPO

Slide22

Differential diagnosis using Phenomizer

Slide23

Sequencing

Whole genome sequencing

Exome sequencing

Disease associated genome sequencing

Slide24

Depth of coverage (exome or disease oriented sequencing)

At 20X coverage, what fraction of het variants will be called?

15% will be missed

Slide25
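The depth-of-coverage question above (what fraction of het variants is called at 20X) can be explored with a simple binomial model. The slide does not say how its 15% figure was obtained, so the sketch below does not try to reproduce it. It assumes that every read at a heterozygous site carries the alternate allele with probability 0.5, that depth is exactly the stated value at every site, and that a caller needs at least `min_alt_reads` supporting reads; that threshold and the function name are hypothetical. Real miss rates are higher because depth varies across targets and reads carry errors and mapping bias.

```python
from math import comb


def p_het_missed(depth, min_alt_reads):
    """P(fewer than min_alt_reads alt reads), alt reads ~ Binomial(depth, 0.5)."""
    return sum(comb(depth, k) for k in range(min_alt_reads)) / 2 ** depth


# How the miss probability changes with the caller's evidence threshold
# and with sequencing depth (illustrative values only).
for depth in (10, 20, 40):
    for min_alt in (3, 5, 7):
        print(depth, min_alt, round(p_het_missed(depth, min_alt), 4))
```

The model makes the qualitative point of the slide: at fixed depth a stricter threshold misses more heterozygous sites, and raising depth reduces the miss rate for any given threshold.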

Phenotypic interpretation of eXomes: PhenIX

Remove off-target and synonymous variants

Test population frequency of other variants

frequency score: max(0, 1 - 0.13 * exp(100 * f))

These are known SNPs

Scores from SIFT/PolyPhen

Most pathogenic score was taken

Final variant score: pathogenic score × frequency score

Clinical relevance score: semantic similarity between phenotypic abnormalities and 2741 genes.

Average (clinical, variant)

Slide26
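The PhenIX scoring steps listed on the slide (frequency score, most pathogenic predictor score, their product, and the average with the clinical relevance score) can be written out as a small worked example. This is a sketch of the formulas as given on the slide, not the PhenIX implementation. Assumptions of mine: `f` is the population allele frequency expressed as a fraction (0.01 for 1%), the predictor scores have already been rescaled to 0..1 with higher meaning more pathogenic, and the clinical relevance score is supplied as a precomputed number because the semantic-similarity calculation is not described here. All function names are hypothetical.

```python
import math


def frequency_score(f):
    """max(0, 1 - 0.13 * exp(100 * f)), with f as a fraction."""
    return max(0.0, 1.0 - 0.13 * math.exp(100.0 * f))


def variant_score(pathogenicity_scores, f):
    """Most pathogenic predictor score times the frequency score."""
    return max(pathogenicity_scores) * frequency_score(f)


def combined_score(clinical, variant):
    """Average of the clinical relevance and variant scores."""
    return (clinical + variant) / 2.0


# Toy values: a novel variant (f = 0) versus a common one (f = 5%).
novel = variant_score([0.2, 0.9], 0.0)
common = variant_score([0.2, 0.9], 0.05)
print("novel variant score:", novel)
print("common variant score:", common)
print("combined (clinical = 0.5):", combined_score(0.5, novel))
```

With the constants as printed on the slide, the frequency score drops to zero a little above 2% allele frequency, so common variants are eliminated regardless of predicted pathogenicity.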

Phenotypic interpretation of eXomes: PhenIX

Simulated mutation data from HGMD