Guide to lobSTR VCF format


Added: 2016-04-22


Slide1

Guide to lobSTR VCF format

Slide2

Steps to create VCF format

Initial round of lobSTR allelotyping to generate priors on allele frequencies

Second round of lobSTR calling to generate genotype likelihoods and posteriors for all possible genotypes

Merge VCFs from each sample using GATK

Slide3

Fixed Fields

#CHROM

Chromosome of the STR variant

POS

Start position of the STR variant

ID

Set to “.”

REF

Nucleotide at CHROM:POS in the reference genome

Slide4

Fixed Fields

##ALT=<ID=STRVAR,Description="Short tandem variation">

Alternate alleles are given as <STRVAR:$ALLELE>, where $ALLELE is the length difference, in base pairs, between the alternate allele and the reference sequence.

e.g. ALT: <STRVAR:-4>, <STRVAR:-2>, <STRVAR:2>
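
As a minimal sketch of how the symbolic alleles above could be read back into numbers, the following Python function extracts the base-pair differences from an ALT value. The function name `parse_strvar_alt` is hypothetical and not part of lobSTR; the sketch assumes every allele follows the <STRVAR:$ALLELE> form.

```python
import re

# Matches the symbolic allele form <STRVAR:$ALLELE>, with an optional sign.
_STRVAR = re.compile(r"^<STRVAR:([+-]?\d+)>$")


def parse_strvar_alt(alt_field):
    """Return the bp length differences encoded in a comma-separated ALT value.

    Hypothetical helper for illustration; not part of lobSTR.
    """
    diffs = []
    for allele in alt_field.split(","):
        match = _STRVAR.match(allele.strip())
        if match is None:
            raise ValueError(f"not a STRVAR allele: {allele!r}")
        diffs.append(int(match.group(1)))
    return diffs


print(parse_strvar_alt("<STRVAR:-4>, <STRVAR:-2>, <STRVAR:2>"))  # [-4, -2, 2]
```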

Slide5

Fixed Fields

QUAL

Set to -10*log10(1-P), where P is the posterior probability of the genotype call given the observed reads.
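
The QUAL formula can be sketched in Python as follows. This assumes the logarithm is base 10, as is usual for Phred-style scores; the function name `qual_from_posterior` is hypothetical, and how lobSTR itself handles P = 1 is not stated in the slides.

```python
import math


def qual_from_posterior(p):
    """Phred-scale the error probability 1 - p (assumes a base-10 logarithm)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("posterior probability must lie in [0, 1]")
    error = 1.0 - p
    if error == 0.0:
        # Assumption for this sketch: a certain call maps to infinity.
        return math.inf
    return -10.0 * math.log10(error)


# A call with posterior 0.99 has an error probability of about 0.01.
print(round(qual_from_posterior(0.99), 2))
```

Under this reading, each additional 10 units of QUAL corresponds to a tenfold drop in the probability that the call is wrong.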

FILTER

Defaults to “.”

Slide6

INFO fields

##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">

(Standard)

##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">

(Standard)

##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">

(Standard)

##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">

(Standard)
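
To illustrate how AC, AF, and AN relate to one another, here is a small Python sketch that derives them from a list of GT strings. It follows the standard VCF meaning of these fields (uncalled alleles, written ".", are not counted); the function name `allele_stats` and the sample genotypes are made up for the example.

```python
def allele_stats(genotypes, n_alt):
    """Compute (AC, AF, AN) from GT strings such as "0/1" or "./.".

    Hypothetical helper: allele index 0 is the reference and indices
    1..n_alt are the ALT alleles in the order listed.
    """
    counts = [0] * (n_alt + 1)
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":  # skip uncalled alleles
                counts[int(allele)] += 1
    an = sum(counts)                 # total called alleles
    ac = counts[1:]                  # one count per ALT allele
    af = [c / an if an else 0.0 for c in ac]
    return ac, af, an


ac, af, an = allele_stats(["0/1", "1/1", "./.", "0/2"], n_alt=2)
print(ac, an)  # [3, 1] 6
```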

Slide7

INFO fields

##INFO=<ID=END,Number=1,Type=Integer,Description="End position of variant">

An STR at position chr1:422395-422435 will have END=422435.

##INFO=<ID=MOTIF,Number=1,Type=String,Description="Repeat motif">

An STR with motif “AAAT” will have MOTIF=AAAT.

##INFO=<ID=REF,Number=1,Type=Float,Description="Reference copy number">

The number of copies of the MOTIF in the reference genome. E.g. if MOTIF=AT and REF=14.5, there are 14.5 copies of AT in the reference genome, spanning 14.5*2=29bp.
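
The arithmetic in the REF example can be written out as a short Python sketch. The second function goes one step beyond the slides: it assumes that an allele's copy number can be obtained by adding its base-pair difference, divided by the motif length, to the reference copy number. Both function names are hypothetical.

```python
def reference_span_bp(motif, ref_copies):
    """Length in bp covered by the repeat tract in the reference."""
    return len(motif) * ref_copies


def allele_copies(motif, ref_copies, bp_diff):
    """Copy number of an allele given its bp difference from the reference.

    Assumption for illustration: bp differences translate linearly
    into motif copies.
    """
    return ref_copies + bp_diff / len(motif)


print(reference_span_bp("AT", 14.5))   # 29.0
print(allele_copies("AT", 14.5, -4))   # 12.5
```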

Slide8

INFO fields

##INFO=<ID=VT,Number=1,Type=String,Description="Variant type">

(Standard)

For lobSTR calls, VT=STR.

##INFO=<ID=set,Number=1,Type=String,Description="Source VCF for the merged record in CombineVariants">

(Standard) Files had to be merged in stages to prevent GATK from crashing, so the file names in this field reflect those intermediate files.

Slide9

FORMAT fields

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

(Standard)

##FORMAT=<ID=ALLREADS,Number=1,Type=String,Description="All reads aligned to locus">

Gives the alleles of all reads seen in the form $allele1|$readcount1;$allele2|$readcount2, etc., where each allele is given as the number of base pairs difference from the reference and each read count is the number of reads supporting that allele. If no reads fully spanned the STR but there is other data for this locus, this field is set to “NA”.

##FORMAT=<ID=ALLPARTIALREADS,Number=1,Type=String,Description="All partially spanning reads aligned to locus">

Same format as ALLREADS, but for reads that only partially span the STR. In this case, the allele given is not an actual allele call but an upper bound on the length of the allele the read could exhibit if it were to fully span the STR.
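
Since ALLREADS and ALLPARTIALREADS share one layout, a single parser can handle both. The Python sketch below is one possible way to do it; the name `parse_read_counts` is hypothetical, and treating "." like “NA” is an assumption made for the example.

```python
def parse_read_counts(value):
    """Parse "$allele1|$readcount1;$allele2|$readcount2" into a dict.

    Keys are bp differences from the reference, values are read counts.
    Returns an empty dict for "NA" (and, by assumption, for ".").
    """
    if value in ("NA", "."):
        return {}
    counts = {}
    for entry in value.split(";"):
        if not entry:
            continue  # tolerate a trailing semicolon
        allele, read_count = entry.split("|")
        counts[int(allele)] = int(read_count)
    return counts


print(parse_read_counts("-4|12;0|9"))  # {-4: 12, 0: 9}
```

For ALLPARTIALREADS, the keys should be read as upper bounds on allele length rather than as allele calls.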

Slide10

FORMAT fields

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">

(Standard)

##FORMAT=<ID=GB,Number=1,Type=String,Description="Genotype given in bp difference from reference">

If an allelotype call was made, this gives the call in the form $A/$B, where $A and $B are the number of base pairs difference from the reference of each called allele. If no call was made, this is given as “./.”.
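
A GB value can be turned into a pair of integers with a few lines of Python. This is only a sketch; `parse_gb` is a hypothetical name, and it returns None for the no-call value.

```python
def parse_gb(value):
    """Parse a GB string "$A/$B" into a tuple of bp differences.

    Returns None when no call was made ("./.").
    """
    if value == "./.":
        return None
    first, second = value.split("/")
    return int(first), int(second)


print(parse_gb("-4/0"))  # (-4, 0)
print(parse_gb("./."))   # None
```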

Slide11

FORMAT fields

##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">

For each possible genotype, gives -10*log10(L), where L is the likelihood of the genotype given the reads seen. The order is as specified in the VCF 4.1 format: for alleles j and k with j <= k, where j and k are allele indices (0 for the reference allele, 1 and up for the entries of the ALT field), the position of genotype <allele[j],allele[k]> in this list is (k*(k+1)/2)+j.
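
The ordering formula is easier to see when enumerated. The Python sketch below follows the VCF 4.1 convention, in which allele index 0 is the reference allele and j <= k; the function names are hypothetical.

```python
def genotype_index(j, k):
    """Position of diploid genotype (j, k) in a Number=G list."""
    if j > k:
        j, k = k, j  # the formula expects j <= k
    return k * (k + 1) // 2 + j


def genotype_order(n_alleles):
    """All diploid genotypes in the order used by Number=G fields."""
    return [(j, k) for k in range(n_alleles) for j in range(k + 1)]


# With REF plus two ALT alleles there are six genotypes:
print(genotype_order(3))
# [(0, 0), (0, 1), (1, 1), (0, 2), (1, 2), (2, 2)]
```

Looking up the value for a genotype is then `values[genotype_index(j, k)]`, and the same ordering applies to any other field declared with Number=G.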

##FORMAT=<ID=GPP,Number=G,Type=Float,Description="Genotype Posterior probabilities (phred scaled, -10log10)">

For each possible genotype, gives -10*log10(P), where P is the posterior probability of the genotype given the reads seen. Based on priors on allele frequencies (assuming Hardy-Weinberg equilibrium, HWE) when available.
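
Converting GPP values back to probabilities is a one-line transformation. The Python sketch below assumes the scaling named in the header description, -10*log10(P), so that smaller values mean more probable genotypes; the function names and sample values are made up for illustration.

```python
def phred_to_probs(values):
    """Convert Phred-scaled values (-10*log10(P)) back to probabilities."""
    return [10.0 ** (-v / 10.0) for v in values]


def most_probable_index(values):
    """Index of the genotype with the highest posterior (lowest Phred value)."""
    return min(range(len(values)), key=lambda i: values[i])


gpp = [13.2, 0.4, 25.0]  # made-up values for three genotypes
print([round(p, 3) for p in phred_to_probs(gpp)])
print(most_probable_index(gpp))  # 1
```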

Slide12

FORMAT fields

##FORMAT=<ID=PP,Number=1,Type=Float,Description="Posterior probability of call">

Gives the posterior probability of the maximum a posteriori genotype call, which is the call returned in the GT field.

##FORMAT=<ID=MP,Number=1,Type=Float,Description="Upper bound on maximum partially spanning allele">

If any reads partially spanned the locus, gives the longest possible allele supported by any of those reads.

##FORMAT=<ID=PC,Number=1,Type=Integer,Description="Coverage by partially spanning reads">

Gives the number of reads partially spanning the STR locus.

Slide13

FORMAT field

##FORMAT=<ID=STITCH,Number=1,Type=Integer,Description="Number of stitched reads">

The number of paired-end reads that overlapped enough to be stitched together into one long read.

