DAVID: Database for Annotation, Visualization and Integrated Discovery
Slide1
Gene Function Annotation Tool: DAVID
Slide2
Slide3
Database for Annotation, Visualization and Integrated Discovery (DAVID)
Functional Annotation Tool
Gene Ontology
Protein interaction
Protein domain
Pathway
Disease
Gene ID Conversion
Gene Functional Classification
Slide4
DAVID workflow
Upload the gene list to the website
Gene Name Batch Viewer
Gene Functional Classification
Functional Annotation Tool
Select the categories to analyze
Get the results
Slide5
Upload the gene list
AFFYMETRIX_3PRIME_IVT_ID
AFFYMETRIX_EXON_GENE_ID
AFFYMETRIX_SNP_ID
AGILENT_CHIP_ID
AGILENT_ID
AGILENT_OLIGO_ID
ENSEMBL_GENE_ID
ENSEMBL_TRANSCRIPT_ID
ENTREZ_GENE_ID
FLYBASE_GENE_ID
FLYBASE_TRANSCRIPT_ID
GENBANK_ACCESSION
GENOMIC_GI_ACCESSION
GENPEPT_ACCESSION
ILLUMINA_ID
IPI_ID
MGI_ID
OFFICIAL_GENE_SYMBOL
PFAM_ID
PIR_ID
PROTEIN_GI_ACCESSION
REFSEQ_GENOMIC
REFSEQ_MRNA
REFSEQ_PROTEIN
REFSEQ_RNA
RGD_ID
SGD_ID
TAIR_ID
UCSC_GENE_ID
UNIGENE
UNIPROT_ACCESSION
UNIPROT_ID
UNIREF100_ID
WORMBASE_GENE_ID
WORMPEP_ID
ZFIN_ID
Not Sure
Slide6
1. Determine the species
2. Select it, then use it
3.
Slide7
Functional Annotation Tool
DAVID Gene ID: an internal ID generated from the "DAVID Gene Concept" in the DAVID system. One DAVID gene ID represents one unique gene cluster belonging to one single gene entry.
Input gene list: 817
Mapped to DAVID database: 754
DAVID IDs: 734
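The drop from submitted IDs to mapped IDs to unique DAVID IDs can be illustrated with a small sketch. The lookup table, the probe names, and the function name `summarize_mapping` are all hypothetical and only stand in for DAVID's internal mapping; the point is that several input IDs may collapse onto one DAVID gene ID.

```python
# Hypothetical lookup from submitted identifiers to DAVID gene IDs.
# Two probes (probe_C, probe_D) point at the same gene cluster.
ID_TO_DAVID = {
    "probe_A": 2001,
    "probe_B": 2002,
    "probe_C": 2003,
    "probe_D": 2003,
}


def summarize_mapping(input_ids, id_to_david):
    """Return (submitted, mapped, unique DAVID IDs) counts for a gene list."""
    # Keep only the identifiers that the lookup table recognizes.
    mapped = [gene_id for gene_id in input_ids if gene_id in id_to_david]
    # Collapse redundant identifiers onto their DAVID gene ID.
    david_ids = {id_to_david[gene_id] for gene_id in mapped}
    return len(input_ids), len(mapped), len(david_ids)


counts = summarize_mapping(
    ["probe_A", "probe_B", "probe_C", "probe_D", "probe_X"], ID_TO_DAVID
)
print(counts)
```

In this toy list, one identifier is unknown and two identifiers share a gene, which mirrors the 817 / 754 / 734 pattern on the slide at a much smaller scale.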
1.
Genes from your list involved in this annotation category
2.
4.
Single chart report for this annotation category only.
3.
99 / 734
Slide8
Functional Annotation Chart
Chart Report is an annotation-term-focused view that lists annotation terms and their associated genes under study. To avoid over-counting duplicated genes, the Fisher Exact statistics are calculated on the corresponding DAVID gene IDs, which removes all redundancy in the original IDs. Every result in the Chart Report has to pass the thresholds in the Chart Option section (by default, Max.Prob <= 0.1 and Min.Count >= 2) so that only statistically significant terms are displayed.
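As a rough illustration of the default thresholds, the filter below keeps only the rows that pass both Max.Prob and Min.Count. The row layout (`term`, `p_value`, `count`) and the function name `filter_chart` are assumptions made for this sketch, not DAVID's actual data format.

```python
def filter_chart(rows, max_prob=0.1, min_count=2):
    """Keep chart rows whose p-value and gene count pass both thresholds."""
    return [
        row
        for row in rows
        if row["p_value"] <= max_prob and row["count"] >= min_count
    ]


# Made-up rows, for illustration only.
example_rows = [
    {"term": "term_1", "p_value": 0.002, "count": 12},
    {"term": "term_2", "p_value": 0.300, "count": 15},  # fails Max.Prob
    {"term": "term_3", "p_value": 0.010, "count": 1},   # fails Min.Count
]

kept = [row["term"] for row in filter_chart(example_rows)]
print(kept)
```

Passing `max_prob=1` and `min_count=1` to the same function would return every row, which corresponds to switching the thresholds off.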
A modified Fisher Exact P-Value (EASE Score)
Number of results displayed per page
List Total (LT) - number of genes in the gene list mapping to the category of which the term is a member
Population Hits (PH) - number of genes in the background gene list mapping to a specific term
Population Total (PT) - number of genes in the background gene list mapping to the category
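The sketch below shows how these counts could feed a one-sided Fisher exact test and the more conservative EASE Score. It assumes a fourth count, List Hits (the number of genes in the gene list mapping to the term), which is not defined on this slide, and it assumes the usual description of the EASE Score as the same test computed after removing one gene from List Hits. The function names are hypothetical, and the numbers in the usage lines are illustrative only.

```python
from math import comb


def hypergeom_upper_tail(k, list_total, pop_hits, pop_total):
    """P(X >= k) for X = number of term genes among list_total genes
    drawn without replacement from pop_total genes, pop_hits of which
    carry the term."""
    if k <= 0:
        return 1.0
    upper = min(list_total, pop_hits)
    favorable = sum(
        comb(pop_hits, i) * comb(pop_total - pop_hits, list_total - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(pop_total, list_total)


def fisher_p(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher exact p-value for over-representation."""
    return hypergeom_upper_tail(list_hits, list_total, pop_hits, pop_total)


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """Assumed EASE definition: Fisher exact p-value with one hit removed."""
    return hypergeom_upper_tail(list_hits - 1, list_total, pop_hits, pop_total)


# Illustrative numbers: 3 of 300 list genes hit a term that covers
# 40 of 30000 background genes.
p_fisher = fisher_p(3, 300, 40, 30000)
p_ease = ease_score(3, 300, 40, 30000)
print(round(p_fisher, 4), round(p_ease, 4))
```

Because one hit is removed before testing, the EASE Score is never smaller than the plain Fisher exact p-value, so terms supported by very few genes are penalized.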
RT (Related Term)
Related Term Search can identify other similar terms.
Slide9
RT (Related Term)
Any given gene is associated with a set of annotation terms. If genes share a similar set of those terms, they are most likely involved in similar biological mechanisms. The algorithm adopts kappa statistics to quantitatively measure the degree of agreement in how genes share similar annotation terms. The kappa result ranges from 0 to 1. The higher the kappa value, the stronger the agreement.
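A minimal sketch of the kappa idea, assuming a plain Cohen's kappa over presence/absence of annotation terms: each gene is represented by the set of terms it carries, and `universe` is the full set of terms under consideration. The function name and the term labels are hypothetical, and DAVID's exact implementation may differ. Note that plain Cohen's kappa can fall below 0 when agreement is worse than chance; this sketch does not clamp it.

```python
def kappa(terms_a, terms_b, universe):
    """Cohen's kappa for two genes described by their annotation-term sets."""
    n = len(universe)
    if n == 0:
        raise ValueError("universe must not be empty")
    a = terms_a & universe
    b = terms_b & universe
    both = len(a & b)            # terms carried by both genes
    neither = n - len(a | b)     # terms carried by neither gene
    observed = (both + neither) / n
    p_a = len(a) / n
    p_b = len(b) / n
    # Agreement expected by chance, given each gene's term frequency.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # both genes carry all terms or none: trivially identical
    return (observed - expected) / (1 - expected)


all_terms = {"t1", "t2", "t3", "t4", "t5", "t6"}
gene_x = {"t1", "t2", "t3"}
gene_y = {"t1", "t2"}
print(round(kappa(gene_x, gene_y, all_terms), 3))
```

The same function works with the roles swapped, i.e. two annotation terms described by the sets of genes they annotate, which is the view used when searching for related terms.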
Any biological process/term from any of the functional categories listed in DAVID.
Slide10
COG_ONTOLOGY refers to an ontology from NCBI's COG database, the database of Clusters of Orthologous Groups of proteins (COGs): a tool for genome-scale analysis of protein functions and evolution.
SP_PIR_KEYWORDS are keywords defined by SwissProt/UniProt and PIR (Protein Information Resource).
UP_SEQ_FEATURE refers to the annotation category UniProt Sequence Feature, found at the UniProt site, within their report.
Annotation Category - Functional Categories
Slide11
Annotation Category – Protein domain & Protein Interaction
Protein structure
Slide12
GO terms are categorized into 3 groups:
BP - Biological Process
MF - Molecular Function
CC - Cellular Component
GOTERM_BP_1 -> GO terms under Biological Process (BP) at Level 1.
GOTERM_BP_ALL -> GO terms under Biological Process (BP) at all possible levels.
GOTERM_BP_FAT -> Basically, this test examines the significance of enriched annotations; GO FAT filters out very broad GO terms based on a measured specificity of each term (not level-specificity).
Annotation Category - Gene Ontology
Slide13
Annotation Category - Pathways
Biocarta
KEGG
Slide14
Select 11 categories
11 categories in total
Combined View Annotation
Slide15
Functional Annotation Cluster
Functional Annotation Clustering
Because annotations are redundant by nature, the Functional Annotation Chart presents similar or relevant annotations repeatedly, which dilutes the biological focus of the report. To reduce the redundancy, the newly developed Functional Annotation Clustering report groups and displays similar annotations together, which makes the biology clearer and more focused than in the traditional chart report. Functional Annotation Clustering integrates the same techniques: Kappa statistics to measure the degree of common genes between two annotations, and fuzzy heuristic clustering to classify groups of similar annotations according to their kappa values.
All genes involved in this annotation cluster
EASE Score (modified Fisher exact test)
Heat map
Adjust the Kappa statistics parameters
Adjust the fuzzy heuristic clustering parameters
P_value
Slide16
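$$\text{Enrichment Score} = \frac{\left[-\log(P_1)\right] + \left[-\log(P_2)\right] + \cdots + \left[-\log(P_n)\right]}{n}$$
where P_i is the i-th P_value and n is the number of P_values averaged.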
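The formula can be checked numerically with a short sketch. The slide writes only "log"; base 10 is an assumption made here, and the function name `enrichment_score` and the p-values are illustrative.

```python
import math


def enrichment_score(p_values):
    """Mean of -log10(p) over the p-values of the terms in one cluster.

    Base-10 logarithm is an assumption; the slide does not state the base.
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    return sum(-math.log10(p) for p in p_values) / len(p_values)


# Illustrative p-values: -log10 gives 2, 3 and 4, so the mean is 3.
score = enrichment_score([0.01, 0.001, 0.0001])
print(round(score, 6))
```

Averaging in the -log scale is equivalent to taking the geometric mean of the p-values, so a single very small p-value cannot dominate a cluster whose other terms are weak.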
Initial Group Members (any value >= 2; default = 4): the minimum number of genes in a seeding group, which affects the minimum size of each functional group in the final result. In general, a lower value attempts to include more genes in functional groups and, in particular, generates many small groups.
Final Group Members (any value >= 2; default = 4): the minimum number of genes in one final group after the “cleanup” procedure. In general, a lower value attempts to include more genes in functional groups and, in particular, generates many small groups. It works together with the previous parameter to control the minimum size of functional groups. For the final cluster, this is the number of terms that a cluster must have to be presented in the output.
Multi-linkage Threshold (any value between 0% and 100%; default = 50%): controls how seeding groups merge with each other, i.e. two groups sharing more than this percentage of gene members become one group. A higher percentage generally gives sharper separation, i.e. it generates more final functional groups with more tightly associated genes in each group. In addition, changing this parameter does not move extra genes into the unclustered group.
Slide17
If you run both functions with default settings, the results will not overlap completely. In general, the clustering result may contain more terms than the chart. In clustering, some 'non-significant' terms can be included because they are linked to 'significant' neighbors (co-members in one cluster).
If you want to completely cross-link the two reports, run the chart report with the p-value cutoff set to "1" (ground level). You will then get all possible terms, with significant or insignificant p-values.
Chart vs Cluster
Slide18
Upload the gene list to the website
Gene Name Batch Viewer
Gene Functional Classification
Functional Annotation Tool
Select the categories to analyze
Get the results
Slide19
Other Tools in DAVID
Slide20
Gene Name Batch Viewer
Slide21
Gene Functional Classification Tool
Term report
Slide22
Gene Functional Classification Tool - Create sublist
Slide23
Gene ID Conversion Tool
Slide24
Thank you for your attention