IDENTIFICATION OF THE POWER-LAW COMPONENT

Uploaded On 2020-08-07

Presentation Transcript

Slide1

IDENTIFICATION OF THE POWER-LAW COMPONENT IN HUMAN TRANSCRIPTOME

Vasily V. Grinev

Associate Professor

Department of Genetics

Faculty of Biology

Belarusian State University

Minsk, Republic of Belarus

Slide2

DIVERSITY OF SPLICE SITES IN HUMAN GENOME/TRANSCRIPTOME

A graphical representation of the traditional (linear) transcriptional model (A) and of the splice-site (B) and exon (C) splicing graph models of human RCAN3 gene organisation
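An exon splicing graph like the one in panel (C) can be sketched as a directed graph whose nodes are exons and whose edges join exons that are adjacent in at least one transcript. A minimal sketch, assuming each transcript is given as an ordered list of exon identifiers and assuming that an exon's "splicing degree" means the number of distinct exons it is joined to (upstream plus downstream); the slides do not state the exact definition, and the exon names below are hypothetical, not RCAN3 data.

```python
from collections import defaultdict


def exon_splicing_graph(transcripts):
    """Build an exon graph: an edge u -> v means exon v directly follows exon u in some transcript."""
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for exons in transcripts:
        for upstream, downstream in zip(exons, exons[1:]):
            successors[upstream].add(downstream)
            predecessors[downstream].add(upstream)
    return successors, predecessors


def splicing_degrees(transcripts):
    """Assumed splicing degree: distinct downstream plus distinct upstream partners of each exon."""
    successors, predecessors = exon_splicing_graph(transcripts)
    exons = {e for t in transcripts for e in t}
    return {e: len(successors[e]) + len(predecessors[e]) for e in sorted(exons)}


# Hypothetical isoforms of a toy gene with exons E1..E5 (illustrative only).
isoforms = [
    ["E1", "E2", "E3", "E5"],
    ["E1", "E3", "E5"],
    ["E1", "E2", "E4", "E5"],
]
print(splicing_degrees(isoforms))
```

Collecting these degrees over all genes gives the kind of discrete, heavy-tailed sample to which the power-law machinery on the following slides can be applied.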

Slide3

DISCRETE POWER-LAW MODEL

The probability mass function

Normalization constant

The cumulative distribution function

The complementary cumulative distribution function
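The slide's formulas did not survive in this transcript. For reference, the standard discrete power-law model of Clauset et al. (2009), which the slide cites, is written below with α as the scaling parameter and x_min as the lower bound; the slide's own notation may differ.

```latex
p(x) = \Pr(X = x) = \frac{x^{-\alpha}}{\zeta(\alpha, x_{\min})}, \qquad x \ge x_{\min},\ \alpha > 1

\zeta(\alpha, x_{\min}) = \sum_{n=0}^{\infty} \left(n + x_{\min}\right)^{-\alpha}

P(x) = \Pr(X \le x) = 1 - \frac{\zeta(\alpha, x + 1)}{\zeta(\alpha, x_{\min})}

\bar{P}(x) = \Pr(X \ge x) = \frac{\zeta(\alpha, x)}{\zeta(\alpha, x_{\min})}
```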

Important equations

Determination of parameters

Hurwitz zeta function
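Because the normalization constant is the Hurwitz zeta function ζ(α, x_min) = Σ_{n≥0} (n + x_min)^(−α), evaluating the discrete power law needs only that one special function. A minimal pure-Python sketch, assuming SciPy is not available (where it is, `scipy.special.zeta(alpha, q)` evaluates the same function): the first terms are summed directly and an Euler-Maclaurin correction approximates the remainder. Function names are illustrative, not taken from the presentation.

```python
import math


def hurwitz_zeta(s, q, n_terms=50):
    """Hurwitz zeta(s, q) = sum_{k>=0} (k + q)^(-s), for s > 1 and q > 0.

    Sums the first n_terms directly, then adds Euler-Maclaurin tail terms
    (integral, half-term and the first two Bernoulli corrections).
    """
    if s <= 1 or q <= 0:
        raise ValueError("requires s > 1 and q > 0")
    head = sum((q + k) ** -s for k in range(n_terms))
    a = q + n_terms
    tail = (a ** (1 - s) / (s - 1)
            + a ** -s / 2
            + s * a ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * a ** (-s - 3) / 720)
    return head + tail


def powerlaw_pmf(x, alpha, x_min):
    """Discrete power-law PMF: x^(-alpha) / zeta(alpha, x_min) for x >= x_min."""
    if x < x_min:
        return 0.0
    return x ** -alpha / hurwitz_zeta(alpha, x_min)


def powerlaw_ccdf(x, alpha, x_min):
    """Complementary CDF: Pr(X >= x) = zeta(alpha, x) / zeta(alpha, x_min)."""
    if x <= x_min:
        return 1.0
    return hurwitz_zeta(alpha, x) / hurwitz_zeta(alpha, x_min)


# zeta(2, 1) is the Riemann zeta value pi^2 / 6, a convenient sanity check.
print(hurwitz_zeta(2.0, 1), math.pi ** 2 / 6)
```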

Estimation of the lower bound x_min by the Kolmogorov-Smirnov statistic
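Clauset et al. (2009) choose x_min as the candidate value that minimizes the Kolmogorov-Smirnov distance D between the empirical CDF of the tail (x ≥ x_min) and the power law fitted to that tail. A minimal sketch of that scan, assuming integer data; to stay short it fits α for every candidate with the discrete approximation α ≈ 1 + n / Σ ln(x_i / (x_min − 1/2)), and the names and the `min_tail` cut-off are illustrative assumptions rather than the presentation's settings.

```python
import math
import random
from collections import Counter


def hurwitz_zeta(s, q, n_terms=50):
    """Hurwitz zeta(s, q) for s > 1, q > 0 (direct sum plus Euler-Maclaurin tail)."""
    head = sum((q + k) ** -s for k in range(n_terms))
    a = q + n_terms
    return (head + a ** (1 - s) / (s - 1) + a ** -s / 2
            + s * a ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * a ** (-s - 3) / 720)


def approx_alpha(tail, x_min):
    """Discrete approximation to the MLE of alpha (Clauset et al. 2009)."""
    return 1 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)


def ks_distance(tail, alpha, x_min):
    """Maximum |empirical CDF - fitted CDF| over the observed tail values."""
    n = len(tail)
    counts = Counter(tail)
    z_min = hurwitz_zeta(alpha, x_min)
    cumulative, d = 0, 0.0
    for x in sorted(counts):
        cumulative += counts[x]
        model_cdf = 1 - hurwitz_zeta(alpha, x + 1) / z_min  # Pr(X <= x)
        d = max(d, abs(cumulative / n - model_cdf))
    return d


def choose_x_min(data, min_tail=10):
    """Scan candidate x_min values; keep the one with the smallest KS distance."""
    best = None
    for x_min in sorted(set(data)):
        tail = [x for x in data if x >= x_min]
        if len(tail) < min_tail:  # tails only shrink as x_min grows
            break
        alpha = approx_alpha(tail, x_min)
        d = ks_distance(tail, alpha, x_min)
        if best is None or d < best[2]:
            best = (x_min, alpha, d)
    return best  # (x_min, alpha, D)


# Synthetic data from the approximate discrete sampler described by Clauset et al. (2009).
random.seed(1)
sample = [int((6 - 0.5) * (1 - random.random()) ** (-1 / (2.5 - 1)) + 0.5) for _ in range(2000)]
print(choose_x_min(sample))
```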

 

Determination of the scaling parameter α by the maximum likelihood estimator for x_min ≥ 6
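For x_min of about 6 or more, Clauset et al. (2009) show that the discrete maximum likelihood estimate is well approximated in closed form by α̂ ≈ 1 + n [Σ ln(x_i / (x_min − 1/2))]^(−1), where the sum runs over the n observations with x_i ≥ x_min. A minimal sketch of that estimator; the function name and the toy counts are hypothetical.

```python
import math


def alpha_mle_approx(data, x_min):
    """Approximate discrete power-law MLE of alpha (Clauset et al. 2009), for x_min >= ~6.

    Only observations x >= x_min enter the fit.
    """
    tail = [x for x in data if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1 + len(tail) / log_sum


# Toy counts (hypothetical values, not data from the presentation).
counts = [6, 6, 7, 8, 9, 11, 14, 20, 35, 80]
print(alpha_mle_approx(counts, 6))
```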

 

 

Determination of the scaling parameter α by direct numerical maximization of the likelihood function itself for x_min < 6
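Below x_min ≈ 6 the closed-form approximation becomes unreliable, so the log-likelihood ℓ(α) = −n ln ζ(α, x_min) − α Σ ln x_i is maximized numerically. Since ln ζ(α, x_min) is convex in α, ℓ is concave and a one-dimensional golden-section search finds its maximum. A minimal sketch, assuming the search bracket [1.01, 6] contains the optimum; the bracket, tolerance and names are illustrative choices, not the presentation's.

```python
import math


def hurwitz_zeta(s, q, n_terms=50):
    """Hurwitz zeta(s, q) for s > 1, q > 0 (direct sum plus Euler-Maclaurin tail)."""
    head = sum((q + k) ** -s for k in range(n_terms))
    a = q + n_terms
    return (head + a ** (1 - s) / (s - 1) + a ** -s / 2
            + s * a ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * a ** (-s - 3) / 720)


def log_likelihood(alpha, tail, x_min):
    """Discrete power-law log-likelihood: -n ln zeta(alpha, x_min) - alpha * sum(ln x)."""
    return (-len(tail) * math.log(hurwitz_zeta(alpha, x_min))
            - alpha * sum(math.log(x) for x in tail))


def alpha_mle_numeric(data, x_min, lo=1.01, hi=6.0, tol=1e-8):
    """Maximize the concave log-likelihood in alpha by golden-section search on [lo, hi]."""
    tail = [x for x in data if x >= x_min]
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = log_likelihood(c, tail, x_min), log_likelihood(d, tail, x_min)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = log_likelihood(c, tail, x_min)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = log_likelihood(d, tail, x_min)
    return (a + b) / 2


# Hypothetical small counts with x_min = 1.
counts = [1, 1, 1, 1, 2, 2, 3, 5, 8, 20]
print(alpha_mle_numeric(counts, 1))
```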

Clauset,A., Shalizi,C.R., Newman,M.E.J. (2009) Power-law distributions in empirical data. SIAM Rev., 51, 661-703.

Newman,M.E.J. (2005) Power laws, Pareto distributions and Zipf’s law. Contemp. Phys., 46, 323-351.

Goldstein,M.L., Morris,S.A., Yen,G.G. (2004) Problems with fitting to the power-law distribution. Eur. Phys. J. B, 41, 255-258.

Slide4

COMPETITIVE STATISTICAL MODELS

1) Power-law

2) Truncated power-law

3) Yule-Simon

4) Exponential

5) Stretched exponential

6) Log-normal

7) Poisson

The probability mass functions of competitive statistical models
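The slide's formulas for the competing models were not preserved. As an illustration of how an alternative is made comparable with the power law, here is a minimal sketch of two of the listed models, the discrete exponential and the Poisson, each restricted to x ≥ x_min and renormalized over that range, as Clauset et al. (2009) do for their alternatives. The parameterizations and names are assumptions; the presentation's exact forms and the remaining models are not reproduced.

```python
import math


def exponential_pmf(x, lam, x_min):
    """Discrete exponential on x >= x_min: (1 - e^-lam) * e^(-lam * (x - x_min))."""
    if x < x_min:
        return 0.0
    return (1 - math.exp(-lam)) * math.exp(-lam * (x - x_min))


def poisson_pmf(x, mu, x_min):
    """Poisson(mu) conditioned on x >= x_min, i.e. divided by Pr(X >= x_min)."""
    if x < x_min:
        return 0.0
    below = sum(math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1)) for k in range(x_min))
    log_p = -mu + x * math.log(mu) - math.lgamma(x + 1)
    return math.exp(log_p) / (1 - below)


def log_likelihood(pmf, data, *params):
    """Sum of log PMF values; every observation must satisfy x >= x_min."""
    return sum(math.log(pmf(x, *params)) for x in data)


print(log_likelihood(exponential_pmf, [6, 7, 8], 0.3, 6))
```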

Comparison of alternative statistical models

1) Log-likelihood ratio test

2) Akaike information criterion

3) Bayesian information criterion
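The three comparison criteria can be computed once each model has been fitted to the same tail data. A minimal sketch, assuming the maximized log-likelihoods and per-observation log-likelihoods are already available: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L (k parameters, n observations; lower is better), and Vuong's normalized log-likelihood ratio R/√(nσ²), whose sign indicates the favoured model and whose two-sided p-value comes from the standard normal distribution, in the form used by Clauset et al. (2009). Function names are illustrative.

```python
import math


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian (Schwarz) information criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def vuong_test(logp1, logp2):
    """Normalized log-likelihood ratio test for non-nested models (Vuong 1989).

    logp1, logp2: per-observation log-likelihoods under models 1 and 2.
    Returns (R, z, p): R > 0 favours model 1; p is the two-sided p-value of the
    null hypothesis that both models are equally close to the true distribution.
    Assumes the per-observation differences are not all identical (variance > 0).
    """
    n = len(logp1)
    diffs = [a - b for a, b in zip(logp1, logp2)]
    r = sum(diffs)
    mean = r / n
    var = sum((d - mean) ** 2 for d in diffs) / n
    z = r / math.sqrt(n * var)
    p = math.erfc(abs(z) / math.sqrt(2))
    return r, z, p
```

A large |z| with a small p means the sign of R can be trusted; otherwise the data do not discriminate between the two models.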

Vuong,Q.H. (1989) Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307-333.

Akaike,H. (1974) A new look at the statistical model identification. IEEE Transact. Automat. Control, 19, 716-723.

Schwarz,G.E. (1978) Estimating the dimension of a model. Ann. Stat., 6, 461-464.

Slide5

STATISTICAL ANALYSIS CONFIRMS THE PRESENCE OF A POWER-LAW COMPONENT IN THE TRANSCRIPTOME OF KASUMI-1 CELLS

Slide6

USAGE OF EXONS IN ALTERNATIVE SPLICING

FOLLOWS A POWER LAW IN THE HUMAN TRANSCRIPTOME

Slide7

USAGE OF EXONS IN ALTERNATIVE SPLICING

FOLLOWS A POWER LAW IN THE HUMAN TRANSCRIPTOME

Maximum values of splicing degrees from different models of human genes

Slide8

ARE THERE ANY SPECIFIC FEATURES ASSOCIATED WITH DIFFERENT CLASSES OF SPLICE SITES?

Every splice site was annotated with sequence, sequence-related, functional and structural features extracted from four types of genomic/RNA elements

Slide9

RANDOM FOREST BASED DATA MINING

A small set of features is sufficient to distinguish between two classes of splice sites in Kasumi-1 cells

Slide10

RANDOM FOREST BASED DATA MINING

Iterative removal of misclassified splice sites leads to high classification accuracy
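The slide does not spell out the procedure, so what follows is only a generic sketch of iterative removal of misclassified cases, assuming a classifier that returns out-of-sample predictions (for a random forest, out-of-bag predictions are the natural choice). The `predict_out_of_sample` callable and the leave-one-out nearest-centroid stand-in below are hypothetical placeholders, not the authors' random-forest pipeline; in practice a random forest would be plugged in instead.

```python
def iterative_cleaning(samples, labels, predict_out_of_sample, max_rounds=10):
    """Repeatedly drop samples whose out-of-sample prediction disagrees with their label.

    predict_out_of_sample(xs, ys) must return one predicted label per sample, each made
    by a model that did not see that sample. Returns the kept indices and the accuracy
    measured in each round (before that round's removals).
    """
    kept = list(range(len(samples)))
    history = []
    for _ in range(max_rounds):
        if not kept:
            break
        xs = [samples[i] for i in kept]
        ys = [labels[i] for i in kept]
        preds = predict_out_of_sample(xs, ys)
        correct = [p == y for p, y in zip(preds, ys)]
        history.append(sum(correct) / len(ys))
        if all(correct):
            break
        kept = [i for i, ok in zip(kept, correct) if ok]
    return kept, history


def loo_nearest_centroid(xs, ys):
    """Hypothetical stand-in classifier: leave-one-out nearest class centroid (1-D features)."""
    preds = []
    for i, x in enumerate(xs):
        centroids = {}
        for cls in sorted(set(ys)):
            vals = [v for j, (v, c) in enumerate(zip(xs, ys)) if c == cls and j != i]
            if vals:
                centroids[cls] = sum(vals) / len(vals)
        preds.append(min(centroids, key=lambda c: abs(x - centroids[c])))
    return preds


# Toy 1-D data with two deliberately mislabelled points (indices 6 and 7).
samples = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4, 0.3, 5.1]
labels = ["a", "a", "a", "b", "b", "b", "b", "a"]
print(iterative_cleaning(samples, labels, loo_nearest_centroid))
```

Note that accuracy measured on the cleaned set rises by construction, so the removed cases deserve separate inspection.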

Slide11

RANDOM FOREST BASED DATA MINING

About half of the misclassified splice sites can be explained in several different ways

Slide12

MANY THANKS TO THE MEMBERS OF OUR TEAM:

Ilia M. Ilyushonak

Dr. Petr V. Nazarov

Dr. Laurent Vallar

Northern Institute for Cancer Research

Prof. Olaf Heidenreich

Slide13

THANK YOU FOR YOUR ATTENTION!