Slide1
IDENTIFICATION OF THE POWER-LAW COMPONENT IN HUMAN TRANSCRIPTOME
Vasily V. Grinev
Associate Professor
Department of Genetics
Faculty of Biology
Belarusian State University
Minsk, Republic of Belarus
Slide2
DIVERSITY OF SPLICE SITES IN HUMAN GENOME/TRANSCRIPTOME
A graphical representation of the traditional (linear) transcriptional model (A), splice sites (B) and exon (C) splicing graph models of human RCAN3 gene organisation
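An exon splicing graph of the kind named above can be sketched as a small directed graph. This is only an illustration under stated assumptions: the exon labels and transcripts are hypothetical toy data (not the real RCAN3 annotation), the function names are invented here, and treating a node's degree as the number of distinct splice partners is an assumed reading.

```python
from collections import defaultdict


def exon_splicing_graph(transcripts):
    """Build a directed exon graph from transcripts given as ordered exon lists.

    An edge u -> v means exons u and v are spliced together (adjacent)
    in at least one transcript.
    """
    graph = defaultdict(set)
    for exons in transcripts:
        for upstream, downstream in zip(exons, exons[1:]):
            graph[upstream].add(downstream)
            graph[downstream]  # touch the key so terminal exons become nodes
    return graph


def node_degrees(graph):
    """Total degree (incoming + outgoing edges) of every exon node."""
    degree = {node: len(targets) for node, targets in graph.items()}
    for targets in graph.values():
        for target in targets:
            degree[target] += 1
    return degree


# Hypothetical toy gene with four exons and three transcripts.
toy_transcripts = [
    ["E1", "E2", "E3", "E4"],  # linear model: every exon included
    ["E1", "E3", "E4"],        # exon E2 skipped
    ["E1", "E2", "E4"],        # exon E3 skipped
]
toy_graph = exon_splicing_graph(toy_transcripts)
toy_degrees = node_degrees(toy_graph)
```

In the linear model every internal exon has degree 2; alternative splicing adds edges, so some exons acquire higher degrees.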
Slide3
DISCRETE POWER-LAW MODEL
Important equations
The probability mass function
Normalization constant
The cumulative distribution function
The complementary cumulative distribution function
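For orientation, the quantities listed above can be written in the standard discrete power-law form, assuming the notation of Clauset et al. (2009): \alpha is the scaling parameter, x_{\min} the lower bound, and \zeta the Hurwitz zeta function. The symbols are an assumption taken from that paper.

```latex
% Probability mass function, defined for integer x >= x_min
p(x) = \Pr(X = x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})},
\qquad x = x_{\min},\, x_{\min}+1,\, \dots

% Normalization constant C = 1 / \zeta(\alpha, x_min), with the Hurwitz zeta function
\zeta(\alpha, x_{\min}) = \sum_{n=0}^{\infty} (n + x_{\min})^{-\alpha}

% Cumulative distribution function
F(x) = \Pr(X \le x) = 1 - \frac{\zeta(\alpha, x+1)}{\zeta(\alpha, x_{\min})}

% Complementary cumulative distribution function
P(x) = \Pr(X \ge x) = \frac{\zeta(\alpha, x)}{\zeta(\alpha, x_{\min})}
```

The last two lines are consistent with each other because \Pr(X \le x) = 1 - \Pr(X \ge x+1) for integer-valued X.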
Determination of parameters
Hurwitz zeta function
Estimation of the lower bound xmin by the Kolmogorov-Smirnov statistic
Determination of the scaling parameter α by the maximum likelihood estimator for xmin ≥ 6
Determination of the scaling parameter α by direct numerical maximization of the likelihood function itself for xmin < 6
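The parameter-determination steps above can be sketched in plain Python. This is a minimal illustration, not the author's implementation: it assumes the estimators described by Clauset et al. (2009), uses a simple Euler-Maclaurin approximation of the Hurwitz zeta function, and all function names, the search interval for the scaling parameter and the `min_tail` cut-off are assumptions made here.

```python
import math
from collections import Counter


def hurwitz_zeta(alpha, q, terms=1000):
    """Approximate sum_{k>=0} (k + q)**(-alpha) for alpha > 1.

    The first `terms` summands are added directly; the remainder is
    approximated by the leading Euler-Maclaurin correction terms.
    """
    head = sum((q + k) ** -alpha for k in range(terms))
    a = q + terms
    tail = a ** (1 - alpha) / (alpha - 1) + 0.5 * a ** -alpha + alpha * a ** (-alpha - 1) / 12
    return head + tail


def alpha_approx(tail, x_min):
    """Closed-form approximate maximum likelihood estimate (for x_min >= 6)."""
    return 1.0 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)


def alpha_numeric(tail, x_min, lo=1.01, hi=6.0, tol=1e-6):
    """Maximize the exact discrete log-likelihood by golden-section search."""
    n, sum_log = len(tail), sum(math.log(x) for x in tail)

    def log_lik(alpha):
        return -n * math.log(hurwitz_zeta(alpha, x_min)) - alpha * sum_log

    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    while hi - lo > tol:
        if log_lik(c) > log_lik(d):   # maximum lies in [lo, d]
            hi, d = d, c
            c = hi - inv_phi * (hi - lo)
        else:                          # maximum lies in [c, hi]
            lo, c = c, d
            d = lo + inv_phi * (hi - lo)
    return (lo + hi) / 2


def ks_distance(tail, x_min, alpha):
    """Largest gap between the empirical and the fitted model CDF."""
    counts, n = Counter(tail), len(tail)
    z = hurwitz_zeta(alpha, x_min)
    empirical = model = distance = 0.0
    for x in range(x_min, max(tail) + 1):
        empirical += counts[x] / n
        model += x ** -alpha / z
        distance = max(distance, abs(empirical - model))
    return distance


def fit_power_law(data, min_tail=10):
    """Scan candidate lower bounds; keep the one with the smallest KS distance.

    `data` must contain positive integers. Returns (x_min, alpha, ks_distance).
    """
    best = None
    for x_min in sorted(set(data)):
        tail = [x for x in data if x >= x_min]
        if len(tail) < min_tail:
            break
        alpha = alpha_approx(tail, x_min) if x_min >= 6 else alpha_numeric(tail, x_min)
        distance = ks_distance(tail, x_min, alpha)
        if best is None or distance < best[2]:
            best = (x_min, alpha, distance)
    return best
```

Calling `fit_power_law(values)` on a list of positive integer counts returns the selected lower bound, the corresponding scaling parameter and the Kolmogorov-Smirnov distance. The switch at a lower bound of 6 mirrors the rule stated above.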
Clauset,A., Shalizi,C.R., Newman,M.E.J. (2009) Power-law distributions in empirical data. SIAM Rev., 51, 661-703.
Newman,M.E.J. (2005) Power laws, Pareto distributions and Zipf’s law. Contemp. Phys., 46, 323-351.
Goldstein,M.L., Morris,S.A., Yen,G.G. (2004) Problems with fitting to the power-law distribution. Eur. Phys. J. B, 41, 255-258.
Slide4
COMPETITIVE STATISTICAL MODELS
1) Power-law
2) Truncated power-law
3) Yule-Simon
4) Exponential
5) Stretched exponential
6) Log-normal
7) Poisson
The probability mass functions of competitive statistical models
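One common way to write these seven models on integer data is to give each an unnormalized kernel f(x) and normalize it over x \ge x_{\min}. The kernels below follow the functional forms tabulated by Clauset et al. (2009). Applying the continuous stretched-exponential and log-normal kernels to integers in this way, and the parameter names \alpha, \lambda, \beta, \mu, \sigma, are assumptions; the parameterizations used in the presentation may differ.

```latex
% Generic normalization over the integers x >= x_min
p(x) = \frac{f(x)}{\sum_{k = x_{\min}}^{\infty} f(k)}

% Kernels f(x) for the seven models
\begin{aligned}
\text{1) Power-law:}             \quad & f(x) = x^{-\alpha} \\
\text{2) Truncated power-law:}   \quad & f(x) = x^{-\alpha} e^{-\lambda x} \\
\text{3) Yule-Simon:}            \quad & f(x) = \frac{\Gamma(x)}{\Gamma(x + \alpha)} \\
\text{4) Exponential:}           \quad & f(x) = e^{-\lambda x} \\
\text{5) Stretched exponential:} \quad & f(x) = x^{\beta - 1} e^{-\lambda x^{\beta}} \\
\text{6) Log-normal:}            \quad & f(x) = \frac{1}{x} \exp\!\left[-\frac{(\ln x - \mu)^{2}}{2\sigma^{2}}\right] \\
\text{7) Poisson:}               \quad & f(x) = \frac{\mu^{x}}{x!}
\end{aligned}
```

For the pure power law the normalizing sum is the Hurwitz zeta function \zeta(\alpha, x_{\min}).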
Comparison of alternative statistical models
1) Log-likelihood ratio test
2) Akaike information criterion
3) Bayesian information criterion
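The three comparison criteria can be sketched as follows. This is a minimal illustration under stated assumptions: the ratio test is the normalized (Vuong-style) form described by Clauset et al. (2009), the power law is compared only against a discrete exponential model, the scaling parameter is passed in as already fitted, and all function names are invented for this example.

```python
import math


def hurwitz_zeta(alpha, q, terms=1000):
    """Approximate sum_{k>=0} (k + q)**(-alpha) for alpha > 1 (Euler-Maclaurin tail)."""
    head = sum((q + k) ** -alpha for k in range(terms))
    a = q + terms
    tail = a ** (1 - alpha) / (alpha - 1) + 0.5 * a ** -alpha + alpha * a ** (-alpha - 1) / 12
    return head + tail


def power_law_loglik(data, alpha, x_min):
    """Pointwise log-probabilities under the discrete power law."""
    log_z = math.log(hurwitz_zeta(alpha, x_min))
    return [-alpha * math.log(x) - log_z for x in data]


def exponential_loglik(data, x_min):
    """Pointwise log-probabilities under a discrete exponential law on x >= x_min.

    The rate is set to its maximum likelihood value; the data must not
    all be equal to x_min.
    """
    mean_excess = sum(x - x_min for x in data) / len(data)
    rate = math.log(1.0 + 1.0 / mean_excess)
    log_norm = math.log(1.0 - math.exp(-rate))
    return [log_norm - rate * (x - x_min) for x in data]


def likelihood_ratio_test(loglik_a, loglik_b):
    """Return (R, z, p): log-likelihood ratio, normalized ratio, two-sided p-value.

    R > 0 favours model A; a small p means the sign of R is unlikely
    to be a chance fluctuation.
    """
    n = len(loglik_a)
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    ratio = sum(diffs)
    mean = ratio / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    z = ratio / (sigma * math.sqrt(n))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ratio, z, p_value


def aic(log_lik, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik
```

A typical use is to compute the pointwise log-likelihoods of two fitted models on the same tail of the data, pass both lists to `likelihood_ratio_test`, and compare `aic(sum(...), k)` and `bic(sum(...), k, n)` across models, where `k` is the number of free parameters.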
Vuong,Q.H. (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307-333.
Akaike,H. (1974) A new look at the statistical model identification. IEEE Transact. Automat. Control, 19, 716-723.
Schwarz,G.E. (1978) Estimating the dimension of a model. Ann. Stat., 6, 461-464.
Slide5
STATISTICAL ANALYSIS CONFIRMS THE PRESENCE OF A POWER-LAW COMPONENT IN THE TRANSCRIPTOME OF KASUMI-1 CELLS
Slide6
USAGE OF EXONS IN ALTERNATIVE SPLICING FOLLOWS A POWER-LAW IN HUMAN TRANSCRIPTOME
Slide7
USAGE OF EXONS IN ALTERNATIVE SPLICING FOLLOWS A POWER-LAW IN HUMAN TRANSCRIPTOME
Maximum values of splicing degrees from different models of human genes
Slide8
ARE THERE ANY SPECIFIC FEATURES ASSOCIATED WITH DIFFERENT CLASSES OF SPLICE SITES?
Every splice site was annotated with sequence, sequence-related, functional and structural features, which were extracted from four types of genomic/RNA elements
Slide9
RANDOM FOREST BASED DATA MINING
A small set of features makes it possible to distinguish between two classes of splice sites in Kasumi-1 cells
Slide10
RANDOM FOREST BASED DATA MINING
Iterative removal of misclassified splice sites leads to high classification accuracy
Slide11
RANDOM FOREST BASED DATA MINING
About half of the misclassified splice sites can be explained in several different ways
Slide12
MANY THANKS TO THE MEMBERS OF OUR TEAM:
Ilia M. Ilyushonak
Dr. Petr V. Nazarov
Dr. Laurent Vallar
Northern Institute for Cancer Research
Prof. Olaf Heidenreich
Slide13
THANK YOU FOR YOUR ATTENTION!