Transcripts: Background and Curation Strategies

Uploaded On 2022-06-14


Slide1

Transcripts: Background and Curation Strategies

Marina DiStefano, Ph.D.

Clinical Molecular Genetics Fellow,

Harvard Medical School Genetics Training Program

Biocurator Call, 1.10.19

Slide2

Outline

Why does transcript curation matter? Examples from Hearing Loss

How are transcripts annotated?

How can you curate transcripts? Examples from Hearing Loss

Slide3

Variant interpretation requires gene curation

If a gene is not associated with a disease, the clinical significance of a variant in that gene cannot be interpreted.

Figure modified from Daniel MacArthur

Slide4

Variant Interpretation Requires Transcript Curation

Variant interpretation may depend on the transcript (we often want to assess the most severe molecular consequence)

Transcript choice may depend on the disease and on tissue-specific expression

Clinical labs often choose the longest transcript for annotation, which may not be the most biologically relevant one
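A minimal sketch of "most severe molecular consequence" selection across transcripts. It assumes each transcript's consequence is already expressed as a Sequence Ontology term, and the `SEVERITY_ORDER` list is a simplified subset modeled on the ranking Ensembl VEP uses, not a complete or authoritative table. The data are the CLRN1 consequences listed on Slide 6, where the most severe call lands on a transcript other than the reporting transcript.

```python
# ASSUMPTION: simplified subset of Sequence Ontology terms, most to least severe,
# modeled on the Ensembl VEP ranking (not the full list).
SEVERITY_ORDER = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "stop_lost",
    "start_lost",
    "inframe_insertion",
    "inframe_deletion",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "intron_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY_ORDER)}


def most_severe(consequences):
    """Return (transcript, term) for the most severe consequence.

    consequences: dict mapping transcript accession -> SO term.
    Raises KeyError for terms outside the subset above.
    """
    return min(consequences.items(), key=lambda item: RANK[item[1]])


# CLRN1 c.368C>A (NM_174878.2) annotated on each RefSeq transcript (Slide 6).
clrn1 = {
    "NM_001195794.1": "missense_variant",
    "NM_001256819.1": "stop_gained",
    "NM_052995.2": "missense_variant",
    "NM_174878.2": "missense_variant",
}
print(most_severe(clrn1))  # ('NM_001256819.1', 'stop_gained')
```

Picking the most severe consequence is mechanical; whether that transcript is biologically relevant is exactly the curation question this talk addresses.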

Slide5

Example 1: Tissue-specific Exon Expression

TBC1D24

Associated with nonsyndromic hearing loss, DOORS syndrome, and a spectrum of epilepsy conditions

RefSeq lists 2 curated (NM) transcripts: NM_001199107.1 and NM_020705.2

NM_001199107.1: expressed in mouse neurons

NM_020705.2: expressed in mouse cochlea and non-neuronal tissues

c.969_970delGT (p.Ser324Thrfs)

NM_001199107.1:c.969_970delGT (p.Ser324Thrfs) is associated with severe lethal epileptic encephalopathy without hearing loss (Guven 2013)
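One way to make the tissue-specific reasoning above explicit is to record, per transcript, where it is reported to be expressed and filter by the tissue relevant to the disease. This is a hypothetical sketch: the dictionary simply restates the mouse expression data on this slide, and `transcripts_for_tissue` is an illustrative helper, not part of any curation tool.

```python
# Expression annotations restated from the slide (mouse data); structure is hypothetical.
TRANSCRIPT_EXPRESSION = {
    "NM_001199107.1": {"neurons"},
    "NM_020705.2": {"cochlea", "non-neuronal tissues"},
}


def transcripts_for_tissue(expression, tissue):
    """Return the sorted accessions whose expression set includes `tissue`."""
    return sorted(tx for tx, tissues in expression.items() if tissue in tissues)


print(transcripts_for_tissue(TRANSCRIPT_EXPRESSION, "cochlea"))  # ['NM_020705.2']
print(transcripts_for_tissue(TRANSCRIPT_EXPRESSION, "neurons"))  # ['NM_001199107.1']
```

For a hearing-loss curation this points to NM_020705.2, while an epilepsy curation would also need NM_001199107.1.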

Slide6

Example 2: Multiple Molecular Consequences

Clarin 1 (CLRN1/USH3A)

Integral membrane glycoprotein; definitive for Usher syndrome type 3 (ClinGen HL CDWG, 3.2.17)

RefSeq lists 4 transcripts: NM_001195794, NM_052995, NM_174878, NM_001256819

Longest transcript: NM_001195794

Reporting transcript (LMM, ClinVar, and HGMD): NM_174878

Reported LP/P variants are a mixture of missense and nonsense variants

NM_001195794.1: c.368C>A (p.Ala123Asp)

NM_001256819.1: c.540C>A (p.Cys180X)

NM_052995.2: c.140C>A (p.Ala47Asp)

NM_174878.2: c.368C>A (p.Ala123Asp)

Functional studies showed that the variant protein is not correctly localized in the cell and is rapidly degraded (Isosomppi 2009). We cannot be sure which molecular consequence is causative for disease.
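The different protein-level descriptions of this one variant follow from codon arithmetic: a coding position c.N falls in codon ⌈N/3⌉, at position ((N − 1) mod 3) + 1 within that codon. The sketch below checks that arithmetic against the four descriptions on this slide. The codon table is deliberately tiny (only the codons used), and the helper names are illustrative.

```python
def codon_of(c_pos):
    """Map a 1-based coding position (c.1 = A of ATG) to (codon number, position in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


# Minimal lookup with only the codons needed here (a full table has 64 entries).
CODONS = {"TGC": "Cys", "TGT": "Cys", "TGA": "Ter"}


def substitute(codon, position, alt):
    """Return `codon` with the base at 1-based `position` replaced by `alt`."""
    i = position - 1
    return codon[:i] + alt + codon[i + 1:]


for tx, pos in [("NM_174878.2", 368), ("NM_001256819.1", 540), ("NM_052995.2", 140)]:
    print(tx, codon_of(pos))
# NM_174878.2 (123, 2)
# NM_001256819.1 (180, 3)
# NM_052995.2 (47, 2)

# On NM_001256819.1 the reference base at c.540 is C and codon 180 encodes Cys,
# so the reference codon must be TGC; C>A at its third base gives TGA, a stop codon.
mutant = substitute("TGC", 3, "A")
print(mutant, CODONS[mutant])  # TGA Ter
```

The same genomic change hits the second base of an alanine codon in three transcripts (missense) but the third base of a cysteine codon in NM_001256819.1 (nonsense), which is why the reported consequence depends on the transcript.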

Slide7

Transcript Annotation Efforts

Ensembl/GENCODE (EMBL-EBI): Transcripts are designated by automatic and manual annotation. The sequence is predicted, pulled directly from the genome build.

ENST label (ENSP is for protein)

References that use Ensembl transcripts:

gnomAD/ExAC, 100,000 Genomes Project, GTEx, DECIPHER, COSMIC
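Because gnomAD, GTEx and the other resources above report Ensembl identifiers, it helps to recognize them programmatically. A minimal sketch, assuming human stable IDs only ("ENS" + feature letter + 11 digits, optional ".version"); the gene ID is the TBC1D24 gene from the Slide 8 URL, and the transcript ID in the usage line is made up.

```python
import re

# ASSUMPTION: human Ensembl stable IDs only (no species code), optional ".version".
ENSEMBL_ID = re.compile(r"^ENS(?P<kind>[GTPE])(?P<number>\d{11})(?:\.(?P<version>\d+))?$")
KINDS = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def parse_ensembl_id(identifier):
    """Return (feature kind, numeric part, version or None) for a human Ensembl ID."""
    match = ENSEMBL_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a human Ensembl stable ID: {identifier}")
    version = int(match["version"]) if match["version"] else None
    return KINDS[match["kind"]], match["number"], version


print(parse_ensembl_id("ENSG00000162065"))     # ('gene', '00000162065', None)
print(parse_ensembl_id("ENST00000000001.2"))   # hypothetical ID: ('transcript', '00000000001', 2)
```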

Slide8

Ensembl gene page (screenshot; callouts: Evidence, Flags, Corresponding RefSeq)

https://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000162065;r=16:2475051-2509560

Slide9

Transcript Annotation Efforts

RefSeq (NCBI): Transcripts are designated by curation. These annotations are independent of the genome build (they are based on GenBank sequences).

RefSeqGene records are the most thoroughly curated

Prefixes:

N_: Transcripts supported by some evidence, whether published literature or GenBank cDNA or EST data.

X_: Predicted transcripts that are not confirmed by curation or evidence.

Second letter: M = mRNA, R = non-coding RNA, P = protein; e.g. NM_ = mRNA transcript supported by evidence

RefSeq is the predominant transcript set for clinical labs, clinical annotation pipelines, and research publications (gene curation)
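The prefix rules above can be checked mechanically, for example to reject predicted (X_) transcripts in a curation spreadsheet. A minimal sketch covering only the six prefixes named on this slide; other RefSeq prefixes such as NG_ (used for RefSeqGene records) or NC_ are deliberately out of scope. The version suffix is kept because RefSeq increments it when the sequence changes.

```python
import re

# Only NM_, NR_, NP_, XM_, XR_, XP_ are handled; NG_, NC_ and others are out of scope.
REFSEQ_ACCESSION = re.compile(
    r"^(?P<origin>[NX])(?P<molecule>[MRP])_(?P<number>\d+)(?:\.(?P<version>\d+))?$"
)
ORIGIN = {"N": "curated (evidence-supported)", "X": "predicted (not confirmed)"}
MOLECULE = {"M": "mRNA", "R": "non-coding RNA", "P": "protein"}


def parse_refseq(accession):
    """Return (origin, molecule, version or None) for a supported RefSeq accession."""
    match = REFSEQ_ACCESSION.match(accession)
    if match is None:
        raise ValueError(f"unsupported RefSeq accession: {accession}")
    version = int(match["version"]) if match["version"] else None
    return ORIGIN[match["origin"]], MOLECULE[match["molecule"]], version


for acc in ["NM_020705.2", "NM_174878"]:
    print(acc, parse_refseq(acc))
# NM_020705.2 ('curated (evidence-supported)', 'mRNA', 2)
# NM_174878 ('curated (evidence-supported)', 'mRNA', None)
```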

Slide10

RefSeqGene

https://www.ncbi.nlm.nih.gov/gene/57465#reference-sequences

Slide11

Transcript Annotation Efforts: Collaboration

LRG, Locus Reference Genomic (EMBL-EBI/NCBI collaboration):

The aim is to ensure efficient and consistent variant reporting by defining reference sequences.

This is usually a single transcript per gene

It is stable and does not change

There are currently 1182 LRGs.

LRG has curators but also solicits curation expertise from the community

Slide12

Collaboration on Hearing Loss genes:

Manually reviewed suggested NMs and updated them as necessary

Updated corresponding GENCODE annotation

Harmonized NMs and corresponding ENSTs (100% identity) to foster bidirectional exchange of data

Created LRGs using harmonized transcripts

In the future, this curation effort will feed into the new Matched Annotation from NCBI and EMBL-EBI (MANE) project

A talk in this session describes MANE in more detail

Records:

Display reference sequences and include additional annotation at the locus

Collaboration with the LRG project

Slide from Joannella Morales

Slide13

Transcript selection

Harmonized transcripts

Slide from Joannella Morales

Slide14

Newest Collaboration: MANE Project

Matched Annotation from NCBI and EMBL-EBI (MANE) project

A collaborative transcript set (the best of RefSeq and Ensembl):

Aligned to GRCh38, which is the newer genome build

100% identical for RefSeq and Ensembl (UTRs and coding)

Transcripts are well-supported, expressed, and conserved

Transcripts are fairly stable

Clinical or biological data is manually curated

MANE Select: 1 transcript per locus (50% released in Dec 2018)

MANE Plus: an evolution of the LRG project (well-supported, tissue-specific, relevant to certain user groups)

Modified from Fiona Cunningham
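The "100% identical (UTRs and coding)" criterion can be pictured as a structural comparison of two transcript models on GRCh38. This is a sketch under stated assumptions: `TranscriptModel` is a hypothetical record type and the coordinates are invented. It compares only exon boundaries and CDS start/end; a full check would also compare the transcript sequences, since a RefSeq sequence can differ from the reference genome even when the exon coordinates agree.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TranscriptModel:  # hypothetical record type
    accession: str
    chrom: str
    strand: str
    exons: tuple  # ((start, end), ...) on GRCh38, in transcript order
    cds: tuple    # (cds_start, cds_end) genomic coordinates


def same_structure(a, b):
    """True if chromosome, strand, every exon boundary (so UTRs too) and CDS match.

    Accessions are ignored; sequences are NOT compared in this sketch.
    """
    return (a.chrom, a.strand, a.exons, a.cds) == (b.chrom, b.strand, b.exons, b.cds)


# HYPOTHETICAL coordinates for illustration only.
refseq = TranscriptModel("NM_hypothetical.1", "chr1", "+",
                         ((1000, 1200), (2000, 2300), (3000, 3500)), (1100, 3200))
ensembl = replace(refseq, accession="ENST_hypothetical.1")
shorter_utr = replace(ensembl, exons=((1000, 1200), (2000, 2300), (3000, 3400)))

print(same_structure(refseq, ensembl))      # True
print(same_structure(refseq, shorter_utr))  # False: the 3' UTR ends differ
```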

Slide15

MANE Project Resources

A webinar on this effort was recently held

The 50% of MANE Select transcripts released so far are available on NCBI’s FTP site (ftp://ftp.ncbi.nlm.nih.gov/refseq/MANE)

Slides from the webinar can be found here:

https://drive.google.com/drive/folders/1HyY0vvJ9e-Ocm14JOm7Y2p8QgXU4bokt

A recording will be available soon
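Once a MANE file is downloaded, a curator typically wants a gene-to-(RefSeq, Ensembl) lookup for MANE Select. A minimal sketch, assuming a tab-separated summary with columns named `symbol`, `RefSeq_nuc`, `Ensembl_nuc` and `MANE_status`; those names are an assumption and should be checked against the header of the release actually downloaded. The rows in `SAMPLE` are hypothetical.

```python
import csv
import io

# ASSUMPTION: column names; verify against the downloaded release. Rows are HYPOTHETICAL.
SAMPLE = (
    "symbol\tRefSeq_nuc\tEnsembl_nuc\tMANE_status\n"
    "GENE_A\tNM_000001.1\tENST00000000001.1\tMANE Select\n"
    "GENE_B\tNM_000002.1\tENST00000000002.1\tMANE Select\n"
    "GENE_B\tNM_000003.1\tENST00000000003.1\tMANE Plus\n"
)


def mane_select_pairs(handle):
    """Map gene symbol -> (RefSeq accession, Ensembl accession) for MANE Select rows only."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["symbol"]: (row["RefSeq_nuc"], row["Ensembl_nuc"])
        for row in reader
        if row["MANE_status"] == "MANE Select"
    }


pairs = mane_select_pairs(io.StringIO(SAMPLE))
print(pairs["GENE_B"])  # ('NM_000002.1', 'ENST00000000002.1')
```

Filtering on the status column keeps the one-transcript-per-locus property of MANE Select even when additional MANE Plus rows exist for the same gene.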

Slide16

How to map across transcript annotations

Resources with nomenclature will help you map

ClinGen Allele Registry

ClinGen VCI (VEP)

References themselves (Ensembl)

Slide17

Allele Registry

http://reg.clinicalgenome.org/redmine/projects/registry/genboree_registry/allele?hgvs=NM_001009921.2%3Ac.3130G%3EA
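The lookup above is just an HGVS expression passed as a percent-encoded query parameter. A minimal sketch that rebuilds this slide's URL from the HGVS string; the base URL is copied from the slide and may have changed since, so treat it as an assumption rather than a stable endpoint.

```python
from urllib.parse import quote

# Base URL copied from the slide; it may no longer be current.
REGISTRY_ALLELE_PAGE = (
    "http://reg.clinicalgenome.org/redmine/projects/registry/genboree_registry/allele"
)


def allele_registry_url(hgvs):
    """Build a lookup URL; ':' and '>' in the HGVS string are percent-encoded."""
    return f"{REGISTRY_ALLELE_PAGE}?hgvs={quote(hgvs, safe='')}"


print(allele_registry_url("NM_001009921.2:c.3130G>A"))
```

Passing `safe=''` matters: with the default, `quote` would still encode `:` and `>`, but it would leave `/` unencoded, which is unsafe inside a query value for other HGVS forms.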

Slide18

Slide19

Ensembl (screenshot; callout: Corresponding RefSeq)

Slide20

Where can you find information for curating transcripts?

Slide21

Transcript curation resources

The goal is to determine the expression profile/levels of each transcript in order to make an informed choice

OMIM

PubMed/Google search

RefSeqGene

GTEx

Disease-specific efforts (RNA-seq datasets, etc.)

Slide22

OMIM

“Cloning and Expression” section is often the most useful, but go to the primary source

“Gene Structure” and “Gene Function” sections can also provide relevant information

Slide23

PubMed/Google

“XX Gene Transcript Expression”

“XX Gene Human Transcript Expression”

“XX Gene Isoforms”

“XX Gene Splicing”
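The four search templates above can be generated for any gene and turned into PubMed queries. A minimal sketch using NCBI E-utilities ESearch (`db` and `term` parameters); it only builds URLs and makes no network calls. TBC1D24 from Slide 5 stands in for "XX".

```python
from urllib.parse import urlencode

# Templates from the slide; "{gene}" replaces "XX".
TEMPLATES = [
    "{gene} Gene Transcript Expression",
    "{gene} Gene Human Transcript Expression",
    "{gene} Gene Isoforms",
    "{gene} Gene Splicing",
]
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_urls(gene):
    """Return one ESearch URL per template (spaces are encoded as '+')."""
    return [
        f"{ESEARCH}?{urlencode({'db': 'pubmed', 'term': t.format(gene=gene)})}"
        for t in TEMPLATES
    ]


for url in pubmed_search_urls("TBC1D24"):
    print(url)
```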

Slide24

GTEx

Genotype-Tissue Expression project

53 non-diseased tissue sites from ~1000 adults, post-mortem

WGS, WES, RNA-seq

https://gtexportal.org/home/gene/TBC1D24

You can evaluate expression level of the gene and transcript/exon-level expression
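Exon-level expression can be screened for exons that look spliced out where the gene is otherwise expressed. This is a hypothetical sketch: it assumes one mean normalized coverage value per exon for a single tissue (however it was exported from GTEx), compares each exon to the gene's median exon coverage, and uses an illustrative 0.1 ratio cut-off that is not a validated threshold. The coverage numbers are invented.

```python
from statistics import median


def low_usage_exons(exon_coverage, ratio_threshold=0.1):
    """Return exons whose coverage is below ratio_threshold x the median exon coverage.

    exon_coverage: dict exon -> mean normalized coverage in one tissue.
    ratio_threshold: HYPOTHETICAL cut-off for illustration.
    """
    typical = median(exon_coverage.values())
    if typical == 0:
        return []  # gene not expressed in this tissue: nothing to compare against
    return sorted(e for e, cov in exon_coverage.items() if cov / typical < ratio_threshold)


# HYPOTHETICAL coverage values for one tissue.
coverage = {"exon1": 40.0, "exon2": 35.0, "exon3": 42.0, "exon4": 0.5, "exon5": 38.0}
print(low_usage_exons(coverage))  # ['exon4']
```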

Slide25

How do you determine which transcript to use?

DiStefano et al J Mol Diagn. 2018

Slide26

Transcript Curation Process

Gene-level curation, variant spectrum

DiStefano et al. J Mol Diagn. 2018

Impact on LoF variant interpretation

LoF if the mRNA doesn’t escape NMD

Must be further defined

Is transcript biologically relevant?

Is exon biologically relevant?

PVS1
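"LoF if the mRNA doesn't escape NMD" is usually assessed with the commonly used 50-nucleotide rule: a premature termination codon (PTC) is predicted to trigger NMD unless it lies in the last exon or within about 50 nt upstream of the last exon-exon junction. A simplified sketch, assuming all exon-exon junctions fall inside the coding sequence so they can be expressed as c. positions; transcripts with non-coding 3' exons need genomic or transcript coordinates instead. The exon boundaries in the usage lines are hypothetical.

```python
def predicted_nmd(ptc_c_pos, exon_ends_c, window=50):
    """Predict NMD for a PTC under the 50-nt rule (simplified).

    ptc_c_pos: c. position of the first base of the premature stop codon.
    exon_ends_c: c. position of the last base of each exon, in order (last exon included).
    ASSUMPTION: every exon-exon junction lies within the coding sequence.
    """
    if len(exon_ends_c) < 2:
        return False  # single-exon transcript: no exon-exon junction, NMD not expected
    last_junction = exon_ends_c[-2]  # end of the penultimate exon
    if ptc_c_pos > last_junction:
        return False  # PTC in the last exon escapes NMD
    return last_junction - ptc_c_pos > window


# HYPOTHETICAL exon ends: the last exon-exon junction is after c.400.
exon_ends = [100, 250, 400, 600]
print(predicted_nmd(120, exon_ends))  # True: 280 nt upstream of the last junction
print(predicted_nmd(380, exon_ends))  # False: within 50 nt of the last junction
print(predicted_nmd(500, exon_ends))  # False: in the last exon
```

A "True" here only addresses NMD; whether PVS1 applies still depends on the transcript and exon being biologically relevant, as the flowchart notes.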

Slide27

Categorization Results

Slide28

Exon-level Curation to Support Variant Interpretation

Classify Exons

Evaluate interpreted variation (ClinVar, HGMD)

GTEx does not include cochlea

Slide29

Example 2: Exons of Uncertain Significance

Endothelin 3 (EDN3) - Waardenburg syndrome (autosomal recessive)

RefSeq lists 5 transcripts, all of which share exons 1-3 and 5

There is a frameshift variant in exon 4 in 0.6% of Finnish European alleles in gnomAD (including 2 homozygotes)

GTEx predicts exon 4 to be spliced out where EDN3 is expressed

This calls into question the pathogenicity of two DM variants in HGMD that are located in exon 4
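The frequency argument on this slide can be made concrete. Under Hardy-Weinberg equilibrium a frameshift allele at q = 0.006 predicts a homozygote frequency of q², roughly 1 in 28,000 people in that population; if loss of exon 4 caused a recessive disorder, that many affected homozygotes would be expected. The sketch below computes this and flags exons carrying LoF variants above a ceiling. The `max_credible_af` value and the exon 1 and exon 5 frequencies are hypothetical; only the exon 4 frequency comes from the slide.

```python
def homozygote_frequency(q):
    """Expected homozygote frequency q**2 under Hardy-Weinberg equilibrium."""
    return q ** 2


def exons_with_common_lof(lof_af_by_exon, max_credible_af):
    """Return exons (sorted) carrying a LoF variant more frequent than max_credible_af."""
    return sorted(exon for exon, af in lof_af_by_exon.items() if af > max_credible_af)


q = 0.006  # EDN3 exon 4 frameshift, Finnish European gnomAD alleles (from the slide)
h = homozygote_frequency(q)
print(f"{h:.1e}, about 1 in {1 / h:,.0f}")  # 3.6e-05, about 1 in 27,778

# HYPOTHETICAL ceiling and exon 1/exon 5 values, for illustration only.
lof_af = {"exon1": 0.0, "exon4": 0.006, "exon5": 0.00001}
print(exons_with_common_lof(lof_af, max_credible_af=0.001))  # ['exon4']
```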

Slide30

Impact on Interpretation

6% of all exons were of “uncertain significance”

These exons contained 124 "clinically significant" variants

These variants require further evaluation to determine if other data supports pathogenicity

Slide31

Technically challenging regions

Although it may not be a curator’s job to evaluate technical aspects of testing, it is important to be aware of sequencing issues with a gene in order to score variants appropriately. Experts can help with this. It could even be part of the precuration, as transcript choice might be.

If it is a common problem, publications will often mention it in the methods.

Example of technically challenging regions in hearing loss

Slide32

Technically Challenging Regions

NGS data from Partners Laboratory for Molecular Medicine and Children’s Hospital of Philadelphia were used to calculate average mapping quality and depth of coverage for 109 hearing loss genes

43 technically challenging exons in 20 different genes had inadequate coverage and/or homology issues which might lead to false variant calls
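The screen described above combines two per-exon metrics, depth of coverage and mapping quality (low mapping quality often signals homology). A minimal sketch of that kind of flagging; the `MIN_DEPTH` and `MIN_MAPQ` thresholds and the exon metrics are hypothetical, and are not the criteria used in the study.

```python
from dataclasses import dataclass


@dataclass
class ExonMetrics:
    exon: str
    mean_depth: float
    mean_mapq: float


# HYPOTHETICAL thresholds for illustration; labs derive these from their own validation data.
MIN_DEPTH = 20.0
MIN_MAPQ = 20.0


def technically_challenging(metrics, min_depth=MIN_DEPTH, min_mapq=MIN_MAPQ):
    """Return (exon, reasons) for exons with low coverage and/or low mapping quality."""
    flagged = []
    for m in metrics:
        reasons = []
        if m.mean_depth < min_depth:
            reasons.append("low coverage")
        if m.mean_mapq < min_mapq:
            reasons.append("low mapping quality (possible homology)")
        if reasons:
            flagged.append((m.exon, reasons))
    return flagged


# HYPOTHETICAL per-exon averages.
metrics = [
    ExonMetrics("GENE_X exon 2", 12.0, 55.0),
    ExonMetrics("GENE_X exon 5", 80.0, 8.0),
    ExonMetrics("GENE_X exon 7", 90.0, 60.0),
]
for exon, reasons in technically_challenging(metrics):
    print(exon, reasons)
```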

Slide33

Technically Challenging Regions

http://exomeslicer.chop.edu/

Niazi et al. J Mol Diagn. 2018

Slide34

Future Directions and Scaling

Scaling transcript curation is a challenge

Discussion of scaling this process is in progress

Genes can be auto-categorized by RefSeq gene annotations (C1, C2, C3)

Exons can be filtered by high-frequency LoF variants in gnomAD

Exons can be filtered by GTEx expression

Limitations:

Literature curation is still a manual process

GTEx only contains adult, post-mortem tissue

Relevant tissues and timepoints may be missing (e.g. cochlea, neonatal)

Slide35

Conclusions

Transcript curation is critical for variant interpretation

Annotation efforts are harmonizing

Resources are available to curate transcripts

Slide36

Acknowledgements

Heidi Rehm

Ahmad Abou Tayoun

Andrea Oza

Sarah Hemphill, Brandon Cushman, Andy Grant, Becky Siegert

Sami Amr

Mark Bowser, Beth Hynes, Mike Gonzalez

Joannella Morales, Fiona Cunningham, LRG team

DiStefano et al J Mol Diagn. 2018