Paul Stothard Department of Agricultural, Food and Nutritional Science (AFNS)


Presentation Transcript

Slide1

Paul Stothard
Department of Agricultural, Food and Nutritional Science (AFNS)
1400 College Plaza
8215 - 112 Street
Edmonton, Alberta
Canada T6G 2C8

September 2017

Annotation of SNPs and indels from 1000 bulls project run 6.0

Slide2

Input data

42,920,227 SNPs
1,758,199 indels

Slide3

Annotation approach

NGS-SNP (Grant et al., 2011):
- annotate_SNPs.pl script for SNPs
- annotate_INDELs.pl script for indels

The following databases were used for annotation:
- Ensembl release 87
- Entrez Gene and UniProt, used for some annotation fields (March 2017 versions)
- OMIA (June 2017)

For information about the annotation approach and output fields see: http://www.ualberta.ca/~stothard/downloads/NGS-SNP/

Grant JR, Arantes AS, Liao X, Stothard P (2011) In-depth annotation of SNPs arising from resequencing projects using NGS-SNP. Bioinformatics 27:2300-2301.

Slide4

Gene and transcript types

- All transcript and gene types considered when predicting SNP and indel consequences.
- Type of gene and transcript given in the “Comments” field, with the “Gene_Status” and “Transcript_Status” keys.
- Pseudogenes are also included. Variants affecting pseudogenes are given appropriate functional classes (e.g. “nc_transcript_variant” for “non-coding transcript variant”) and have “Gene_Biotype=pseudogene” in the “Comments” field.

Slide5

Function class

Each variant is assigned a “function class” using the Ensembl API.

The function classes are defined relative to the reference genome sequence. For example:
- stop_lost means that the non-reference allele in the input variant leads to the loss of a stop codon annotated on the reference genome.
- stop_gained indicates that the alternative allele adds a stop codon to the coding region of a transcript annotated on the reference genome.
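The reference-relative definitions above can be illustrated with a minimal sketch. It is not how the Ensembl API assigns function classes; it only compares a reference codon with the codon produced by the alternative allele, assuming the standard genetic code. The function name `stop_change` is hypothetical.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}

def stop_change(ref_codon, alt_codon):
    """Classify a codon change relative to the reference codon.

    Hypothetical helper for illustration only; it looks at a single
    codon and ignores everything else the Ensembl API considers.
    """
    ref_is_stop = ref_codon.upper() in STOP_CODONS
    alt_is_stop = alt_codon.upper() in STOP_CODONS
    if ref_is_stop and not alt_is_stop:
        return "stop_lost"              # reference stop codon is destroyed
    if alt_is_stop and not ref_is_stop:
        return "stop_gained"            # alternative allele creates a stop codon
    if ref_is_stop and alt_is_stop:
        return "stop_retained_variant"  # one stop codon replaced by another
    return None                         # no stop codon involved

print(stop_change("TGG", "TGA"))
print(stop_change("TAA", "CAA"))
```

Swapping the two arguments swaps stop_gained and stop_lost, which is why the class only has meaning relative to the reference genome.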

Slide6

One variant can have multiple function classes

A single variant can be assigned multiple function classes (also called consequences) due to the presence of multiple overlapping transcripts or genes, and due to overlap among the function classes. For example:
- a SNP can be located in the 3’UTR of one transcript and the translated region of another. Thus this SNP could have two consequence types: 3_prime_UTR_variant and missense_variant.
- a SNP in a start codon can be assigned all of the following function classes: coding_sequence_variant, missense_variant, and initiator_codon_variant.

Slide7

One variant can have multiple function classes

- We report one consequence for each variant, in the “Functional_Class” column.
- When there are multiple consequences of different types we choose the one we consider to be of the highest importance (e.g. missense_variant will be reported over synonymous_variant).
- Other consequences for each variant are given using the “Other_Consequences” key in the “Comments” column.
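The selection of a single reported consequence can be sketched as a lookup in a severity ranking. The ranking below is an assumption made for illustration: the slides only state that missense_variant is reported over synonymous_variant, and the full ordering used by the authors is not given. The names `SEVERITY_ORDER` and `split_consequences` are hypothetical.

```python
# Hypothetical ranking, most important first. Only the relative order of
# missense_variant and synonymous_variant is taken from the slides.
SEVERITY_ORDER = [
    "stop_gained",
    "stop_lost",
    "start_lost",
    "missense_variant",
    "synonymous_variant",
    "3_prime_UTR_variant",
    "intron_variant",
    "intergenic_variant",
]
RANK = {name: index for index, name in enumerate(SEVERITY_ORDER)}

def split_consequences(consequences):
    """Return (functional_class, other_consequences) for one variant."""
    if not consequences:
        raise ValueError("at least one consequence is required")
    # Unranked consequences sort after all ranked ones; ties break by name.
    ordered = sorted(set(consequences),
                     key=lambda name: (RANK.get(name, len(RANK)), name))
    return ordered[0], ordered[1:]

primary, others = split_consequences(["3_prime_UTR_variant", "missense_variant"])
print(primary)
print(others)
```

In this sketch the first element would go to the “Functional_Class” column and the remainder to the “Other_Consequences” key.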

Slide8

Annotation fields for SNPs

See “Sample annotated SNP” slides.
- The “Comments” annotation field includes several (>20) key-value pairs providing more information about certain variant types such as SIFT score (for missense_variant SNPs), and length of protein sequence lost (for stop_gained SNPs).
- The “Model_Annotations” annotation field includes up to five key-value pairs providing information related to the human orthologue of the gene containing the variant. For example, phenotypes associated with the human orthologue are listed, if available.
- The full list of fields is available at: http://www.ualberta.ca/~stothard/downloads/NGS-SNP/annotate_SNPs.html
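A minimal sketch of reading the key-value pairs in a “Comments” value follows. It assumes the `key=value;` layout seen in the sample annotated SNP and assumes that values never contain a semicolon; the function name `parse_key_values` is hypothetical.

```python
def parse_key_values(field):
    """Split 'key=value;key=value;' into a dict, skipping empty items.

    Assumes values contain no ';'. Only the first '=' in each item is
    treated as the separator, so values may themselves contain '='.
    """
    pairs = {}
    for item in field.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs[key] = value
    return pairs

# "Comments" value copied from the sample annotated SNP in this deck.
comments = (
    "Gene_Status=KNOWN_BY_PROJECTION;Transcript_Status=KNOWN_BY_PROJECTION;"
    "Gene_Biotype=protein_coding;Transcript_Biotype=protein_coding;"
    "SIFT_Prediction_Ensembl=deleterious(0);"
)
parsed = parse_key_values(comments)
print(parsed["SIFT_Prediction_Ensembl"])
```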

Slide9

Annotation fields for indels

See “Sample annotated indel” slides.
- “Comments” and “Model_Annotations” fields provide additional information in the form of key-value pairs.
- Indel position ambiguity handled appropriately for annotation and for identifying matches in dbSNP (for discussion of this issue see “Equivalent Indels – Ambiguous Functional Classes and Redundancy in Databases” by Assmus et al., 2013).
- Full list of fields available at: http://www.ualberta.ca/~stothard/downloads/NGS-SNP/annotate_INDELs.html
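Indel position ambiguity arises because deleting different stretches of a repetitive region can give the same resulting sequence. The brute-force sketch below lists every start position that yields the same sequence as a given deletion. It is an illustration on a made-up toy sequence, not the method used by annotate_INDELs.pl, and the function name is hypothetical.

```python
def equivalent_deletion_starts(sequence, start, length):
    """Return all 0-based start positions whose deletion of `length`
    bases produces the same sequence as deleting
    sequence[start:start + length]."""
    target = sequence[:start] + sequence[start + length:]
    starts = []
    for candidate in range(len(sequence) - length + 1):
        result = sequence[:candidate] + sequence[candidate + length:]
        if result == target:
            starts.append(candidate)
    return starts

# Toy sequence (made up): removing two bases from a CACACA repeat.
toy = "GGCACACATT"
print(equivalent_deletion_starts(toy, 2, 2))  # [2, 3, 4, 5, 6]
```

All five positions describe the same biological event, so a database lookup or consequence prediction that considers only one of them can miss a match or report a different function class.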

Slide10

Function class values

- For indels and SNPs the consequence type of the variant is given in the “Functional_Class” field.
- The values that appear in this field are defined by Ensembl and the Sequence Ontology (SO) project.

http://www.sequenceontology.org

Slide11

Output files

Indels: tab-delimited annotated indels.
SNPs: tab-delimited annotated SNPs.
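A tab-delimited output file can be filtered with a few lines of standard-library code. The sketch below assumes the file starts with a header row holding the field names, which the slides do not confirm. It reads from an in-memory sample rather than a real file, and the second data row is made up for illustration.

```python
import csv
import io

# In-memory stand-in for an annotated SNP file. The header row is an
# assumption; the second data row is invented for this example.
sample = (
    "CHROM\tPOS\tREF\tALT\tFunctional_Class\n"
    "1\t145114963\tT\tC\tmissense_variant\n"
    "1\t100\tA\tG\tintron_variant\n"
)

def rows_with_class(handle, functional_class):
    """Return the rows whose Functional_Class equals `functional_class`."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row["Functional_Class"] == functional_class]

hits = rows_with_class(io.StringIO(sample), "missense_variant")
for row in hits:
    print(row["CHROM"], row["POS"], row["REF"], row["ALT"])
```

For a real file, the `io.StringIO` object would be replaced by an open file handle; reading row by row keeps memory use low for files with tens of millions of variants.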

Slide12

Summary of results for SNPs


Slide13

Known vs. novel SNPs

- “Known” is used here to describe input variants where the variant and all of its alleles exist already in the reference database.

SNP type  Count
Known     31,952,868 (74.45%)
Novel     10,967,359 (25.55%)
All       42,920,227

Slide14

Numbers of SNPs in each function class

Function class                      Count
intergenic_variant                  28353891
intron_variant                      11232495
upstream_gene_variant               1510605
downstream_gene_variant             1282230
missense_variant                    185046
synonymous_variant                  179442
3_prime_UTR_variant                 99283
splice_region_variant               32194
5_prime_UTR_variant                 23431
non_coding_transcript_exon_variant  12878
stop_gained                         3831
splice_donor_variant                1876
splice_acceptor_variant             1618
mature_miRNA_variant                407
start_lost                          292
stop_lost                           257
coding_sequence_variant             243
stop_retained_variant               126
non_coding_transcript_variant       82
Total                               42920227
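As a consistency check, the per-class SNP counts can be summed and compared with the number of input SNPs. The sketch below copies the counts from this slide and also prints each class as a percentage of the total; the variable names are arbitrary.

```python
# Counts copied from the "Numbers of SNPs in each function class" slide.
snp_counts = {
    "intergenic_variant": 28353891,
    "intron_variant": 11232495,
    "upstream_gene_variant": 1510605,
    "downstream_gene_variant": 1282230,
    "missense_variant": 185046,
    "synonymous_variant": 179442,
    "3_prime_UTR_variant": 99283,
    "splice_region_variant": 32194,
    "5_prime_UTR_variant": 23431,
    "non_coding_transcript_exon_variant": 12878,
    "stop_gained": 3831,
    "splice_donor_variant": 1876,
    "splice_acceptor_variant": 1618,
    "mature_miRNA_variant": 407,
    "start_lost": 292,
    "stop_lost": 257,
    "coding_sequence_variant": 243,
    "stop_retained_variant": 126,
    "non_coding_transcript_variant": 82,
}

total = sum(snp_counts.values())
print(total == 42920227)  # the classes account for every input SNP

for name, count in snp_counts.items():
    print(f"{name}\t{count}\t{100 * count / total:.3f}%")
```

The sum matching the input total is what one would expect when each variant is reported under exactly one function class.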

Slide15

Sample annotated SNP

Field number  Field name        Value
1             CHROM             1
2             POS               145114963
3             ID                .
4             REF               T
5             ALT               C
6             QUAL              .
7             FILTER            .
8             INFO              .
9             Functional_Class  missense_variant
10            Chromosome        1

See http://www.ualberta.ca/~stothard/downloads/NGS-SNP/annotate_SNPs.html

Sample values of interest highlighted in red

Slide16

Sample annotated SNP cont.

Field number  Field name               Value
11            Chromosome_Position      145114963
12            Chromosome_Strand        forward
13            Chromosome_Reference     T
14            Chromosome_Reads         C
15            Gene_Description         Integrin beta-2 [Source:UniProtKB/Swiss-Prot;Acc:P32592]
16            Ensembl_Gene_ID          ENSBTAG00000017060
17            Entrez_Gene_Name         ITGB2
18            Entrez_Gene_ID           281877
19            Ensembl_Transcript_ID    ENSBTAT00000022687
20            Transcript_SNP_Position  488

Slide17

Sample annotated SNP cont.

Field number  Field name                       Value
21            Transcript_SNP_Reference         A
22            Transcript_SNP_Reads             G
23            Transcript_To_Chromosome_Strand  reverse
24            Ensembl_Protein_ID               ENSBTAP00000022687
25            UniProt_ID                       P32592
26            Amino_Acid_Position              128
27            Overlapping_Protein_Domains      superfamily IPR002035 von Willebrand factor, type A;pfam IPR002369 Integrin beta subunit, N-terminal;smart IPR002369 Integrin beta subunit, N-terminal;pirsf IPR015812 Integrin beta subunit;prints IPR015812 Integrin beta subunit
28            Overlapping_Protein_Features     CHAIN:23:769:Integrin beta-2.;TOPO_DOM:23:700:Extracellular. {ECO:0000255}.;DOMAIN:124:363:VWFA.;DISULFID:33:447:{ECO:0000250}.;VARIANT:128:128:D -> G (in LAD).

Slide18

Sample annotated SNP cont.

Field number  Field name                  Value
29            Amino_Acid_Reference        D
30            Amino_Acid_Reads            G
31            Amino_Acids_In_Orthologues  DDDDDDDDDDDDDDDDXDDDDDDDDDDDDXDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
32            Alignment_Score_Change      -0.454
33            C_blosum                    1.0
34            Context_Conservation        84.7
35            Orthologue_Species          Ovis_aries;Tursiops_truncatus;Sus_scrofa;Mustela_putorius_furo;Canis_lupus_familiaris;Felis_catus;Ailuropoda_melanoleuca;Equus_caballus;Pteropus_vampyrus;Myotis_lucifugus;Nomascus_leucogenys;Gorilla_gorilla_gorilla;Pongo_abelii;Callithrix_jacchus;Chlorocebus_sabaeus;Pan_troglodytes;Tupaia_belangeri;Macaca_mulatta;Homo_sapiens;Microcebus_murinus;Ictidomys_tridecemlineatus;Papio_anubis;Dipodomys_ordii;Otolemur_garnettii;Oryctolagus_cuniculus;Ochotona_princeps;Rattus_norvegicus;Mus_musculus;Mus_musculus;Carlito_syrichta;Mus_spretus;Cavia_porcellus;Mus_spretus;Sorex_araneus;Dasypus_novemcinctus;Echinops_telfairi;Procavia_capensis;Loxodonta_africana;Macropus_eugenii;Monodelphis_domestica;Ficedula_albicollis;Meleagris_gallopavo;Gallus_gallus;Pelodiscus_sinensis;Anas_platyrhynchos;Taeniopygia_guttata;Anolis_carolinensis;Xenopus_tropicalis;Ornithorhynchus_anatinus;Latimeria_chalumnae;Danio_rerio;Tetraodon_nigroviridis;Gadus_morhua;Lepisosteus_oculatus;Poecilia_formosa;Oreochromis_niloticus;Takifugu_rubripes;Astyanax_mexicanus;Xiphophorus_maculatus;Oryzias_latipes;Poecilia_formosa;Gasterosteus_aculeatus;Oreochromis_niloticus;Petromyzon_marinus;Ciona_intestinalis;Ciona_savignyi;Ciona_intestinalis;Ciona_savignyi;Drosophila_melanogaster;Ciona_intestinalis;Caenorhabditis_elegans;Ciona_intestinalis;Ciona_savignyi;Drosophila_melanogaster

Slide19

Sample annotated SNP cont.

Field number  Field name         Value
36            Gene_Ontology      [GO:0001948]:glycoprotein binding;[GO:0004872]:receptor activity;[GO:0005515]:protein binding;[GO:0006909]:phagocytosis;[GO:0007155]:cell adhesion;[GO:0007159]:leukocyte cell-cell adhesion;[GO:0007160]:cell-matrix adhesion;[GO:0007229]:integrin-mediated signaling pathway;[GO:0008305]:integrin complex;[GO:0009986]:cell surface;[GO:0016020]:membrane;[GO:0016021]:integral component of membrane;[GO:0019901]:protein kinase binding;[GO:0030369]:ICAM-3 receptor activity;[GO:0030593]:neutrophil chemotaxis;[GO:0031623]:receptor internalization;[GO:0034113]:heterotypic cell-cell adhesion;[GO:0034687]:integrin alphaL-beta2 complex;[GO:0035987]:endodermal cell differentiation;[GO:0043113]:receptor clustering;[GO:0043235]:receptor complex;[GO:0046872]:metal ion binding;[GO:0050730]:regulation of peptidyl-tyrosine phosphorylation;[GO:0050839]:cell adhesion molecule binding;[GO:0070062]:extracellular exosome;[GO:0071404]:cellular response to low-density lipoprotein particle stimulus;[GO:1903561]:extracellular vesicle
37            Model_Annotations  Phenotypes_Position=Source: OMIM Description: LEUKOCYTE ADHESION DEFICIENCY Variation_names: rs137852615|Source: Uniprot Phenotype_name: LAD1 Description: Leukocyte adhesion deficiency 1 Variation_names: rs137852615|Source: ClinVar Description: ClinVar: phenotype not specified|Source: ClinVar Description: Leukocyte adhesion deficiency type 1|Source: ClinVar Description: LEUKOCYTE ADHESION DEFICIENCY|Source: HGMD-PUBLIC Phenotype_name: HGMD_MUTATION Description: Annotated by HGMD but no phenotype description is publicly available;Phenotypes_Gene=Leukocyte adhesion deficiency type 1|GTR|MedGen|OMIM;

Slide20

Sample annotated SNP cont.

Field number  Field name      Value
38            Comments        Gene_Status=KNOWN_BY_PROJECTION;Transcript_Status=KNOWN_BY_PROJECTION;Gene_Biotype=protein_coding;Transcript_Biotype=protein_coding;SIFT_Prediction_Ensembl=deleterious(0);
39            Ref_SNPs        rs445709131
40            Is_Fully_Known  yes

Conclusion: missense variant, predicted to affect protein function, predicted to cause leukocyte adhesion deficiency.

This example is meant to demonstrate the utility of the various annotation fields.

Slide21

Same SNP annotated with Ensembl VEP

Field name          Value
Uploaded_variation  1_145114963_T/C
Location            1:145114963-145114963
Allele              C
Consequence         missense_variant
IMPACT              MODERATE
SYMBOL              ITGB2
Gene                ENSBTAG00000017060
Feature_type        Transcript
Feature             ENSBTAT00000022687
BIOTYPE             protein_coding

Slide22

Same SNP annotated with Ensembl VEP cont.

Field name          Value
EXON                5/16
cDNA_position       488
CDS_position        383
Protein_position    128
Amino_acids         D/G
Codons              gAc/gGc
Existing_variation  rs445709131
STRAND              -1
ENSP                ENSBTAP00000022687
SWISSPROT           P32592
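The CDS_position, Protein_position, Codons and Amino_acids values above are related by simple arithmetic, which the sketch below works through. It is not VEP code; the helper names are hypothetical, and the codon table holds only the two standard-code codons needed here (GAC for aspartate, GGC for glycine).

```python
# Two entries of the standard genetic code, enough for this example.
CODON_TABLE_SUBSET = {"GAC": "D", "GGC": "G"}

def protein_position(cds_position):
    """1-based amino acid position for a 1-based CDS position."""
    return (cds_position + 2) // 3

def base_within_codon(cds_position):
    """1-based position (1, 2 or 3) of the base inside its codon."""
    return (cds_position - 1) % 3 + 1

def mutate_codon(codon, position_in_codon, new_base):
    """Return the codon with the base at the 1-based position replaced."""
    index = position_in_codon - 1
    return codon[:index] + new_base + codon[index + 1:]

cds_position = 383
ref_codon = "GAC"
# The transcript is on the reverse strand, so the chromosome change T>C
# is read as A>G on the transcript.
alt_codon = mutate_codon(ref_codon, base_within_codon(cds_position), "G")

print(protein_position(cds_position))  # 128
print(alt_codon)                       # GGC
print(CODON_TABLE_SUBSET[ref_codon] + "/" + CODON_TABLE_SUBSET[alt_codon])  # D/G
```

CDS position 383 falls on the second base of codon 128, which matches the capitalised middle base in gAc/gGc.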

Slide23

Same SNP annotated with Ensembl VEP cont.

Field name  Value
UNIPARC     UPI000012DA0A
SIFT        deleterious(0)
DOMAINS     Pfam_domain:PF00362,PIRSF_domain:PIRSF002512,Prints_domain:PR01186,SMART_domains:SM00187,Superfamily_domains:SSF53300
CLIN_SIG    -
PHENO       -
BLOSUM62    -1

Slide24

Same SNP annotated with Ensembl VEP cont.

Conclusion: Ensembl VEP annotation consistent with NGS-SNP annotation but relationship to leukocyte adhesion deficiency not apparent from annotation information provided by VEP.

Slide25

Summary of results for indels


Slide26

Known vs. novel indels

Indel type  Count
Known       1,367,963 (77.80%)
Novel       390,236 (22.20%)
All         1,758,199

- “Known” is used here to describe input variants where the variant and all of its alleles exist already in the reference database.

Slide27

Numbers of indels in each function class

Function class                      Count
INTERGENIC                          1144901
intron_variant                      476122
upstream_gene_variant               66438
downstream_gene_variant             59886
3_prime_UTR_variant                 4839
frameshift_variant                  1619
inframe_deletion                    1325
splice_region_variant               1236
5_prime_UTR_variant                 849
non_coding_transcript_exon_variant  346
inframe_insertion                   253
splice_acceptor_variant             135
splice_donor_variant                104
coding_sequence_variant             72
stop_gained                         27
non_coding_transcript_variant       20
mature_miRNA_variant                17
protein_altering_variant            10
Total                               1758199

Slide28

Length distribution of all indels and indels in coding regions


Slide29

Length distribution of all deletions and deletions in coding regions


Slide30

Length distribution of all insertions and insertions in coding regions


Slide31

Sample annotated indel

Field number  Field name            Value
1             CHROM                 2
2             POS                   6218378
3             ID                    .
4             REF                   GATGAACACTCCA
5             ALT                   GA
6             QUAL                  .
7             FILTER                .
8             INFO                  .
9             Functional_Class      frameshift_variant
10            Chromosome_Reference  ATGAACACTCC

See http://www.ualberta.ca/~stothard/downloads/NGS-SNP/annotate_INDELs.html
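The relationship between the REF and ALT values and the Chromosome_Reference value in this sample can be reproduced by trimming the bases the two alleles share. The sketch below trims shared trailing bases first and then shared leading bases. This order is an assumption chosen because it reproduces the sample values; it is not taken from the annotate_INDELs.pl source, and the function names are hypothetical.

```python
def trim_alleles(ref, alt):
    """Strip bases shared by both alleles, trailing bases first.

    Returns (reference, reads), using '-' for an empty allele.
    """
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref or "-", alt or "-"

def changes_reading_frame(ref, alt):
    """True when the length difference is not a multiple of three."""
    return (len(ref) - len(alt)) % 3 != 0

reference, reads = trim_alleles("GATGAACACTCCA", "GA")
print(reference, reads)  # ATGAACACTCC -
print(changes_reading_frame("GATGAACACTCCA", "GA"))  # True
```

Trimming the leading bases first would give TGAACACTCCA instead, a deletion shifted by one base that leaves the same resulting sequence. The 11-base length difference is not a multiple of three, which is consistent with the frameshift_variant class when the deletion lies in coding sequence.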

Slide32

Sample annotated indel cont.

Field number  Field name                       Value
11            Chromosome_Reads                 -
12            Gene_Description                 Growth/differentiation factor 8 [Source:UniProtKB/Swiss-Prot;Acc:O18836]
13            Ensembl_Gene_ID                  ENSBTAG00000011808
14            Entrez_Gene_Name                 MSTN
15            Entrez_Gene_ID                   281187
16            Ensembl_Transcript_ID            ENSBTAT00000015674
17            Transcript_INDEL_Position        951
18            Transcript_INDEL_Reference       ATGAACACTCC
19            Transcript_INDEL_Reads           -
20            Transcript_To_Chromosome_Strand  forward

Slide33

Sample annotated indel cont.

Field number  Field name                    Value
21            Ensembl_Protein_ID            ENSBTAP00000015674
22            UniProt_ID                    O18836;C6KEF7;MSTN-201
23            Amino_Acid_Position           273
24            Overlapping_Protein_Domains   superfamily IPR029034 Cystine-knot cytokine
25            Overlapping_Protein_Features  CHAIN:267:375:Growth/differentiation factor 8.;DISULFID:272:282:{ECO:0000250|UniProtKB:O08689}.;CHAIN:19:375:{ECO:0000256|SAM:SignalP}.;DOMAIN:263:375:TGF_BETA_2.
26            Gene_Ontology                 [GO:0005102]:receptor binding;[GO:0005125]:cytokine activity;[GO:0005160]:transforming growth factor beta receptor binding;[GO:0005576]:extracellular region;[GO:0005615]:extracellular space;[GO:0005623]:cell;[GO:0007179]:transforming growth factor beta receptor signaling pathway;[GO:0008083]:growth factor activity;[GO:0008201]:heparin binding;[GO:0010862]:positive regulation of pathway-restricted SMAD protein phosphorylation;[GO:0014732]:skeletal muscle atrophy;[GO:0033673]:negative regulation of kinase activity;[GO:0040007]:growth;[GO:0042803]:protein homodimerization activity;[GO:0042981]:regulation of apoptotic process;[GO:0043408]:regulation of MAPK cascade;[GO:0045662]:negative regulation of myoblast differentiation;[GO:0045893]:positive regulation of transcription, DNA-templated;[GO:0046627]:negative regulation of insulin receptor signaling pathway;[GO:0046716]:muscle cell cellular homeostasis;[GO:0048147]:negative regulation of fibroblast proliferation;[GO:0048468]:cell development;[GO:0048632]:negative regulation of skeletal muscle tissue growth;[GO:0051898]:negative regulation of protein kinase B signaling;[GO:0060395]:SMAD protein signal transduction;[GO:0071549]:cellular response to dexamethasone stimulus

Slide34

Sample annotated indel cont.

Field number  Field name         Value
27            Model_Annotations  Phenotypes_Gene=Muscle hypertrophy|GTR|MedGen|OMIM|Myostatin-related muscle hypertrophy|Gene Reviews;Overlapping_Protein_Features=CHAIN:267:375:Growth/differentiation factor 8.|DISULFID:272:282:{ECO:0000250UniProtKB:O08689}.|CHAIN:19:375:{ECO:0000256SAM:SignalP}.|DOMAIN:263:375:TGF_BETA_2.;
28            Comments           Number_Of_Equivalent_Indels=1(0,1);Gene_Status=KNOWN;Transcript_Status=KNOWN;Gene_Biotype=protein_coding;Transcript_Biotype=protein_coding;Length_Downstream_Protein=102(27.20);
29            Ref_INDELs         .
30            Is_Fully_Known     no
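The “Length_Downstream_Protein=102(27.20)” value in the sample indel is consistent with a simple calculation. The sketch below rests on an interpretation that the slides do not state: that the key reports the number of residues downstream of the affected amino acid position, followed by that number as a percentage of the protein length. The protein length of 375 is taken from the CHAIN:19:375 feature of the sample, and the function name is hypothetical.

```python
def downstream_protein(protein_length, amino_acid_position):
    """Return (residues downstream, percentage of the full protein).

    Interpretation assumed for illustration; not taken from the
    NGS-SNP documentation.
    """
    remaining = protein_length - amino_acid_position
    percent = 100.0 * remaining / protein_length
    return remaining, percent

# Amino_Acid_Position is 273 in the sample; 375 is the assumed length.
remaining, percent = downstream_protein(375, 273)
print(f"{remaining}({percent:.2f})")  # 102(27.20)
```

Under this reading, roughly the last quarter of the protein lies downstream of the frameshift.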