Slide1
Joint analysis of regulatory networks and expression profiles
Ron Shamir
School of Computer Science, Tel Aviv University
April 2013
Sources:
Igor Ulitsky and Ron Shamir. Identification of Functional Modules using Network Topology and High-Throughput Data. BMC Systems Biology 1:8 (2007).
Igor Ulitsky and Ron Shamir. Identifying functional modules using expression profiles and confidence-scored protein interactions. Bioinformatics 25(9):1158-1164 (2009).
Slide2
Outline
- Background
- Joint network and expression profiles
  - Matisse
  - Cezanne
Slide3
Background
Slide4
DNA --(transcription)--> RNA --(translation)--> protein
Analogy: DNA is the hard disk, RNA is one program, protein is its output.
Slide5
DNA Microarrays / RNA-seq
- Simultaneous measurement of the expression levels of all genes / transcripts
- Perform 10^5 to 10^9 measurements in one experiment
- Allow a global view of cellular processes
- Among the most important biotechnological breakthroughs of the last / current decade
Figure source: http://www.biomedcentral.com/1471-2105/12/323/figure/F2
Slide6
The Raw Data
- Matrix axes: genes x experiments
- Entries of the raw data matrix: expression levels (ratios / absolute values / ...)
- Expression pattern for each gene
- Profile for each experiment / condition / sample / chip
- Needs normalization!
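The raw matrix and the call for normalization can be illustrated with a small sketch. It assumes genes are rows and experiments are columns, row-wise standardization (zero mean, unit variance per gene), and Pearson correlation as the similarity between expression patterns. The slides do not say which normalization or similarity measure is used, and the function names are illustrative.

```python
import math

def standardize_rows(matrix):
    """Scale each gene (row) to zero mean and unit variance across experiments."""
    result = []
    for row in matrix:
        mean = sum(row) / len(row)
        std = math.sqrt(sum((x - mean) ** 2 for x in row) / len(row))
        # A constant row carries no pattern; map it to all zeros.
        result.append([(x - mean) / std if std > 0 else 0.0 for x in row])
    return result

def pearson_similarity(matrix):
    """Pairwise Pearson correlation between the rows of an expression matrix."""
    z = standardize_rows(matrix)
    n_genes, n_cols = len(z), len(matrix[0])
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n_cols
             for j in range(n_genes)]
            for i in range(n_genes)]

# Toy matrix: 3 genes (rows) x 4 experiments (columns).
expression = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # same pattern as gene 0, different scale
    [4.0, 3.0, 2.0, 1.0],   # opposite pattern
]
similarity = pearson_similarity(expression)
```

After standardization, genes 0 and 1 have correlation 1 even though their absolute levels differ, which is why normalization matters before comparing patterns.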
Slide7
EXPANDER: EXPression ANalyzer and DisplayER
http://acgt.cs.tau.ac.il/expander
- Clustering: identify clusters of co-expressed genes (CLICK, KMeans, SOM, hierarchical)
- Biclustering: identify homogeneous submatrices (SAMBA)
- Functional enrichment: GO, TANGO
- Promoter analysis: analyze TF binding sites of co-regulated genes (PRIMA)
- microRNA function inference: FAME
- Visualization
References:
A. Maron, R. Sharan. Bioinformatics 03
A. Maron-Katz, A. Tanay, C. Linhart, I. Steinfeld, R. Sharan, Y. Shiloh, R. Elkon. BMC Bioinformatics 05
Ulitsky et al. Nature Protocols 10
Slide8
Networks of protein-protein interactions (PPIs)
- Large, readily available resource
- Representation: a network with nodes = proteins/genes and edges = interactions
Analysis methods:
- Global properties
- Motif content analysis
- Complex extraction
- Cross-species comparison
Slide9
The hairball syndrome
Slide10
- Potential inroad into pathways and function
- Can the network help to improve the analysis?
Slide11
Analysis of gene expression profiles + a network
Slide12
Goal
- Challenge: detect active functional modules, i.e. connected subnetworks of proteins whose genes are co-expressed
- "Where is the action in the network in a particular experiment?"
Slide13
Ron Shamir, RNA Antalia, April 08
Slide14
Slide15
Ulitsky & Shamir, BMC Systems Biology 07
Slide16
MATISSE: Modular Analysis for Topology of Interactions and Similarity SEts
http://acgt.cs.tau.ac.il/matisse
- Input: expression data and a PPI network
- Output: a collection of modules
  - Connected PPI subnetworks
  - Correlated expression profiles
Figure legend: two edge types, "Interaction" and "High expression similarity".
Slide17
Probabilistic model
- $S_{ij}$: expression similarity of genes i and j
- Event $M_{ij}$: i, j are mates = highly co-expressed
- $P(S_{ij} \mid M_{ij}) \sim N(\mu_m, \sigma_m^2)$
- $P(S_{ij} \mid \overline{M_{ij}}) \sim N(\mu_n, \sigma_n^2)$
- $H_0$: U is a set of unrelated genes
- $H_1$: U is a module = connected subnetwork with high internal similarity
- $R_i$: gene i is transcriptionally regulated
- $\beta_m$: fraction of mates out of module gene pairs that are transcriptionally regulated, $\beta_m = P(M_{ij} \mid R_i \wedge R_j, H_1)$
- $p_m$: fraction of mates out of all gene pairs that are transcriptionally regulated
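Slide18
Probabilistic model (2)
- Is a connected gene set U a module?
- Assuming pair independence, define $\beta^m_{ij} = \beta_m P(R_i) P(R_j)$
- Define $\beta^n_{ij} = p_m P(R_i) P(R_j)$
- Likelihood ratio: $\Pr(\text{Data} \mid H_1) / \Pr(\text{Data} \mid H_0)$
- Taking the log: a sum of terms, one per gene pair i, j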
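The per-pair terms can be written out from the definitions on this slide. What follows is a reconstruction under the stated pair-independence assumption, not a formula copied from the slide. The names $w_{ij}$, $f_m$ and $f_n$ are introduced here for illustration. $\beta^m_{ij}$ and $\beta^n_{ij}$ stand for the two quantities the slide defines as $\beta_m P(R_i)P(R_j)$ and $p_m P(R_i)P(R_j)$, that is, the probability that i and j are mates under $H_1$ and under $H_0$ respectively. $f_m$ and $f_n$ are the Gaussian densities of the similarity for mates and non-mates.

```latex
\begin{align*}
\Pr(S_{ij} \mid H_1) &= \beta^m_{ij}\, f_m(S_{ij}) + (1-\beta^m_{ij})\, f_n(S_{ij}) \\
\Pr(S_{ij} \mid H_0) &= \beta^n_{ij}\, f_m(S_{ij}) + (1-\beta^n_{ij})\, f_n(S_{ij}) \\
\log \frac{\Pr(\text{Data} \mid H_1)}{\Pr(\text{Data} \mid H_0)}
  &= \sum_{\substack{i,j \in U \\ i<j}} w_{ij},
\qquad
w_{ij} = \log \frac{\beta^m_{ij}\, f_m(S_{ij}) + (1-\beta^m_{ij})\, f_n(S_{ij})}
                   {\beta^n_{ij}\, f_m(S_{ij}) + (1-\beta^n_{ij})\, f_n(S_{ij})}
\end{align*}
```

When the mate probability is higher inside a module than in the background, $w_{ij}$ is positive for similarities typical of mates and negative for similarities typical of non-mates.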
Slide19
Probabilistic model - summary
- Similarities: a mixture of two Gaussians
- For a candidate group U, the score is the likelihood ratio of originating from a module or from the background
- Module score = gene group likelihood ratio = sum over all the gene pairs
- Find connected subgraphs U with high $W_U$
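The module score summarized above can be sketched in a few lines. This is a minimal illustration, assuming every gene is transcriptionally regulated (so each pair's mate probability is the same constant under each hypothesis) and using made-up Gaussian parameters and mate fractions. None of the numeric values or function names come from the talk.

```python
import math
from itertools import combinations

def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def pair_weight(s, beta_h1, beta_h0, mates=(0.8, 0.2), non_mates=(0.0, 0.3)):
    """Log-likelihood ratio of one similarity value s: module (H1) vs background (H0).

    mates / non_mates are (mean, std) of the two Gaussians (illustrative values).
    beta_h1 / beta_h0 are the mate probabilities of the pair under H1 / H0.
    """
    f_m = normal_pdf(s, *mates)
    f_n = normal_pdf(s, *non_mates)
    h1 = beta_h1 * f_m + (1 - beta_h1) * f_n
    h0 = beta_h0 * f_m + (1 - beta_h0) * f_n
    return math.log(h1 / h0)

def module_score(genes, similarity, beta_h1=0.6, beta_h0=0.05):
    """Sum of pair weights over all gene pairs in the candidate group."""
    return sum(pair_weight(similarity[frozenset(pair)], beta_h1, beta_h0)
               for pair in combinations(genes, 2))

# Toy similarities: a, b, c are co-expressed; d is unrelated to them.
similarity = {
    frozenset(("a", "b")): 0.9,
    frozenset(("a", "c")): 0.85,
    frozenset(("b", "c")): 0.8,
    frozenset(("a", "d")): 0.05,
    frozenset(("b", "d")): -0.1,
    frozenset(("c", "d")): 0.1,
}
tight = module_score(["a", "b", "c"], similarity)
loose = module_score(["a", "b", "d"], similarity)
```

The score only measures co-expression. The requirement that U be connected in the network has to be enforced separately by whatever search procedure proposes candidate groups.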
Slide20
Complexity
- Finding the heaviest connected subgraph is NP-hard, even without connectivity constraints (edge weights can be positive or negative)
- Devised a heuristic for the problem
Slide21
MATISSE workflow
Seed generation
Greedy optimization
Significance filtering
Slide22
Finding seeds
- Three seeding alternatives tested
- All alternatives build a seed and delete it from the network
- Building small seeds around single nodes:
  - Best neighbors
  - All neighbors
- Approximating the heaviest subgraph: delete low-degree nodes and record the heaviest subnetwork found
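The third seeding alternative reads like a greedy peeling procedure. The sketch below is one possible interpretation, assuming "degree" means the sum of a node's pair weights to the nodes still present and that one node is removed per round. It ignores the network connectivity requirement, and the function name and toy weights are invented for illustration.

```python
def heaviest_subgraph_by_peeling(weights):
    """Greedy peeling on a weighted graph.

    weights maps frozenset({u, v}) to a pair weight (possibly negative).
    Returns the heaviest node set seen while repeatedly deleting the node
    with the lowest weighted degree, together with its total weight.
    """
    remaining = set()
    for pair in weights:
        remaining |= pair

    def degree(u):
        # Sum of weights from u to the nodes that are still present.
        return sum(w for pair, w in weights.items()
                   if u in pair and pair <= remaining)

    def total():
        return sum(w for pair, w in weights.items() if pair <= remaining)

    best_nodes, best_weight = set(remaining), total()
    while len(remaining) > 1:
        worst = min(sorted(remaining), key=degree)  # sorted: deterministic ties
        remaining.remove(worst)
        current = total()
        if current > best_weight:
            best_nodes, best_weight = set(remaining), current
    return best_nodes, best_weight

# Toy pair weights: a, b, c attract each other; d and e only add negative weight.
toy_weights = {
    frozenset(("a", "b")): 2.0,
    frozenset(("a", "c")): 1.5,
    frozenset(("b", "c")): 1.0,
    frozenset(("c", "d")): -1.0,
    frozenset(("d", "e")): -2.0,
    frozenset(("b", "e")): -0.5,
}
seed_nodes, seed_weight = heaviest_subgraph_by_peeling(toy_weights)
```

On the toy input the procedure peels off d, then e, and records {a, b, c} with total weight 4.5 as the heaviest set.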
Slide23
Greedy optimization
- Simultaneous optimization of all the seeds
- The following steps are considered:
  - Node addition
  - Node removal
  - Assignment change
  - Module merge
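A reduced version of the greedy step can be sketched for a single module. The code below assumes only two of the four moves (node addition and node removal), always takes the move with the largest score gain, and requires the module to stay connected in the network. Assignment change, module merge, and the simultaneous handling of several seeds are left out. All names and toy values are illustrative.

```python
from itertools import combinations

def is_connected(nodes, adjacency):
    """True if the node set induces a connected subgraph of the network."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adjacency.get(u, ()):
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes

def score(nodes, weights):
    """Sum of pair weights inside the node set (missing pairs count as 0)."""
    return sum(weights.get(frozenset(p), 0.0)
               for p in combinations(sorted(nodes), 2))

def greedy_improve(seed, adjacency, weights):
    """Apply the best addition or removal until no move improves the score."""
    module = set(seed)
    while True:
        base = score(module, weights)
        # Node addition: any network neighbor keeps the module connected.
        frontier = {v for u in module for v in adjacency.get(u, ())} - module
        candidates = [module | {v} for v in sorted(frontier)]
        # Node removal: allowed only if the rest stays connected.
        candidates += [module - {u} for u in sorted(module)
                       if len(module) > 1 and is_connected(module - {u}, adjacency)]
        best_gain, best_module = 0.0, None
        for cand in candidates:
            gain = score(cand, weights) - base
            if gain > best_gain + 1e-12:
                best_gain, best_module = gain, cand
        if best_module is None:
            return module
        module = best_module

# Toy network: a path a - b - c - d, with co-expression weights on gene pairs.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
weights = {
    frozenset(("a", "b")): 1.0,
    frozenset(("a", "c")): 0.5,
    frozenset(("b", "c")): 1.0,
    frozenset(("a", "d")): -1.0,
    frozenset(("b", "d")): -1.0,
    frozenset(("c", "d")): -2.0,
}
module = greedy_improve({"a"}, adjacency, weights)
```

Each accepted move strictly increases the score, so the loop terminates. On the toy input the seed {a} grows to {a, b, c} and stops because adding d would lower the score.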
Slide24
Front vs. Back nodes
- Only a fraction of the genes (front nodes) have meaningful similarity values
- MATISSE can link them using other genes (back nodes)
- Back nodes correspond to:
  - Unmeasured transcripts
  - Post-translational regulation
  - Partially regulated pathways
Slide25
Advantages of MATISSE
- No p-values needed for measurements
- Works when only a fraction of the genes' expression patterns are informative
- Can handle any similarity data
- No prespecified number of modules
Slide26
Test case: Yeast osmotic shock
- Network: 65,990 PPIs & protein-DNA interactions among 6,246 genes
- Expression: 133 experimental conditions, response of perturbed strains to osmotic shock (O'Rourke & Herskowitz 04)
- Front nodes: 2,000 genes with the highest variance
Slide27
Pheromone response subnetwork
Figure legend: front nodes vs. back nodes.
Slide28
Performance comparison
Figure panels:
- % of modules with category enrichment at p < 10^-3
- % of annotations enriched at p < 10^-3 in modules
Slide29
GO and promoter analysis
Slide30
Application to stem cells
- ~150 human stem cell lines of diverse types profiled using microarrays
- Clustered profiles into groups
- Adjusted Matisse to seek subnetworks that are characteristic of each group
- Focused analysis on pluripotent stem cells
F. Müller, L. Laurent, D. Kostka, I. Ulitsky, R. Williams, C. Lu, I. Park, M. Rao, P. Schwartz, N. Schmidt, J. Loring. Nature 08
Slide31
Pluripotent stem cells network
Highlights the key protein machinery underlying pluripotency
Slide32
Ulitsky & Shamir, Bioinformatics 2009
Slide33
Accounting for PPI confidence
- PPI-based analysis is made difficult by abundant false positive / negative interactions
- Various methods can assign a confidence (probability) to individual edges
- Idea: seek modules that are connected with high probability
Ulitsky & Shamir, Bioinformatics, 2009
Slide34
What is a confidently connected module?
- With high probability, any two parts of the module are connected by an edge
- Accommodates both sparse and dense pathways
- Accommodates genes with low-confidence connectivity with many module genes
- Confidently connected modules can be found efficiently
Slide35
Connected with high probability?
Candidate definitions:
- Every two genes are connected by a confident path: biased toward dense pathways
- There is a minimum spanning tree with high-confidence edges: same as ignoring low-confidence edges
- Any two parts of the module are connected by an edge with high probability
Slide36
CEZANNE: Co-Expression Zone ANalysis using NEtworks
- Edge probability $p(e)$ gives edge weight $-\log(1-p(e))$
- For any $W \subset U$, at least one edge connects W with U\W with probability q (e.g. 0.95). Equivalently, the weight of the minimum cut of U is at least $-\log(1-q)$
- Algorithm: among the subnetworks whose minimum cut exceeds $-\log(1-q)$, find the one with the maximum co-expression score
Example (figure): nodes A, B, C, D with edge probabilities 0.7, 0.9, 0.7, 0.8; the minimum cut is marked.
- P({A},{B,C,D}) = 1 - 0.3*0.3 = 0.91
- P({A,C,D},{B}) = 0.94
- P({A,B},{C,D}) = 0.94
- P({A,B,D},{C}) = 0.994
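The example on this slide can be checked numerically. The figure itself is not available, so the edge layout below (A-B 0.7, A-C 0.7, B-C 0.8, C-D 0.9) is an assumption: it is the assignment of the four listed probabilities that reproduces the four cut values quoted on the slide. The code converts probabilities to weights $-\log(1-p)$, evaluates cuts, and finds the minimum cut by brute force, which is only practical for tiny graphs.

```python
import math
from itertools import combinations

# Assumed edge layout, inferred from the cut probabilities on the slide.
edges = {
    frozenset("AB"): 0.7,
    frozenset("AC"): 0.7,
    frozenset("BC"): 0.8,
    frozenset("CD"): 0.9,
}
nodes = sorted(set().union(*edges))

def edge_weight(p):
    """Weight of an edge that exists with probability p."""
    return -math.log(1.0 - p)

def cut_weight(side):
    """Total weight of the edges with exactly one endpoint in `side`."""
    side = set(side)
    return sum(edge_weight(p) for e, p in edges.items() if len(e & side) == 1)

def cut_probability(side):
    """Probability that at least one edge crosses the cut (independent edges)."""
    return 1.0 - math.exp(-cut_weight(side))

def minimum_cut():
    """Brute-force minimum cut; returns (weight, one side of the cut)."""
    best = None
    for size in range(1, len(nodes)):
        for side in combinations(nodes, size):
            w = cut_weight(side)
            if best is None or w < best[0] - 1e-12:
                best = (w, set(side))
    return best

def confidently_connected(q):
    """True if every cut is crossed by an edge with probability at least q."""
    return minimum_cut()[0] >= -math.log(1.0 - q)

min_weight, min_side = minimum_cut()
```

Under the assumed layout, the four cuts listed on the slide evaluate to 0.91, 0.94, 0.94 and 0.994. The weakest cut is the one that separates D, crossed with probability 0.9, so this node set would not qualify at q = 0.95. That last cut is not listed on the slide and follows only from the inferred layout.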
Slide37
How to find confidently connected modules?
- Seed identification: run MATISSE ignoring edge weights, then "slice" the modules using minimum cut, until all subnetworks are "legal"
- Greedy optimization (how to find legal moves?):
  - Adding nodes is easy to test (positive edge weights)
  - Merging modules is easy to test
  - (Re)moving modules: requires maintaining the set of 'crucial' nodes in each module
- Solvable in minutes on real-world examples
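The "slicing" step can be sketched as a recursion: split a module along its minimum cut while that cut is lighter than $-\log(1-q)$. The version below is an illustration under several assumptions. It uses a brute-force minimum cut that is only usable on small node sets (a real implementation would use a polynomial algorithm such as Stoer-Wagner), it keeps single-node pieces in the output, and the example graph and the value q = 0.905 are made up.

```python
import math
from itertools import combinations

def min_cut(nodes, edge_prob):
    """Brute-force minimum cut of the subgraph induced by `nodes`.

    Returns (weight, side), where side contains the first node in sorted order.
    """
    ordered = sorted(nodes)
    node_set = set(ordered)
    anchor, rest = ordered[0], ordered[1:]
    best_w, best_side = math.inf, None
    # Fixing the anchor on one side enumerates every bipartition exactly once.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            side = {anchor, *extra}
            w = sum(-math.log(1.0 - p) for e, p in edge_prob.items()
                    if e <= node_set and len(e & side) == 1)
            if w < best_w:
                best_w, best_side = w, side
    return best_w, best_side

def slice_module(nodes, edge_prob, q):
    """Split along minimum cuts until every piece has min cut >= -log(1-q)."""
    nodes = set(nodes)
    if len(nodes) < 2:
        return [nodes]
    w, side = min_cut(nodes, edge_prob)
    if w >= -math.log(1.0 - q):
        return [nodes]  # already "legal"
    return (slice_module(side, edge_prob, q)
            + slice_module(nodes - side, edge_prob, q))

# Illustrative graph: a triangle A, B, C plus a node D attached to C.
edge_prob = {
    frozenset("AB"): 0.7,
    frozenset("AC"): 0.7,
    frozenset("BC"): 0.8,
    frozenset("CD"): 0.9,
}
pieces = slice_module("ABCD", edge_prob, q=0.905)
```

With q = 0.905 the single edge to D (probability 0.9) is too weak, so D is sliced off, while the triangle's weakest cut (probability 0.91) passes and {A, B, C} is kept whole.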
Slide38
DNA damage response in S. cerevisiae
- 47 DNA damage response expression profiles (Gasch et al., 01)
- Front nodes: 2,074 genes with at least two-fold expression change
- Network and confidence values: purification enrichment (PE) scores (Collins et al. 07)
Slide39
DNA damage response modules

Module size | GO biological process | p-value | GO-slim protein complexes | p-value
346 | ribosome biogenesis and assembly | 1.2·10^-117 | ribosome | 5.9·10^-91
    | translation | 1.0·10^-85 | eukaryotic 43S preinitiation complex | 3.8·10^-49
    | rRNA processing | 7.5·10^-79 | small nucleolar ribonucleoprotein complex | 1.5·10^-41
    | 35S primary transcript processing | 4.6·10^-44 | DNA-directed RNA polymerase III complex | 3.1·10^-17
    | ribosome assembly | 4.3·10^-39 | exosome (RNase complex) | 4.4·10^-15
    | ribosomal large subunit biogenesis | 9.2·10^-14 | DNA-directed RNA polymerase I complex | 5.7·10^-14
    | rRNA modification | 4.4·10^-12 | Noc complex | 3.2·10^-6
38 | protein catabolism | 1.8·10^-46 | proteasome complex (sensu Eukaryota) | 5.7·10^-71
   | proteolysis | 9.0·10^-44 | proteasome core complex (sensu Eukaryota) | 9.4·10^-32
   | ubiquitin cycle | 1.1·10^-42 | |
12 | histone acetylation | 3.6·10^-13 | histone acetyltransferase complex | 2.1·10^-12
   | chromatin modification | 5.9·10^-11 | |
   | transcription from RNA polymerase II promoter | 1.4·10^-6 | |
12 | translation | 1.1·10^-14 | ribosome | 1.4·10^-15
12 | nuclear mRNA splicing, via spliceosome | 3.5·10^-21 | spliceosome complex | 3.5·10^-17
   | | | small nuclear ribonucleoprotein complex | 2.5·10^-15
10 | barbed-end actin filament capping | 4.8·10^-6 | F-actin capping protein complex | 4.8·10^-6
   | endocytosis | 1.1·10^-5 | |
   | cytoskeleton organization and biogenesis | 2.8·10^-5 | |
8 | establishment and/or maintenance of chromatin architecture | 1.1·10^-5 | chromatin remodeling complex | 4.6·10^-6
7 | glycogen metabolism | 3.0·10^-8 | protein phosphatase type 1 complex | 3.3·10^-5
  | sporulation (sensu Fungi) | 2.0·10^-6 | |
6 | translation | 1.1·10^-7 | ribosome | 4.0·10^-8
6 | tRNA processing | 2.5·10^-14 | ribonuclease P complex | 9.2·10^-8
  | rRNA processing | 2.2·10^-9 | |
4 | trehalose biosynthesis | 6.8·10^-14 | alpha,alpha-trehalose-phosphate synthase complex (UDP-forming) | 6.8·10^-14
4 | ubiquitin-dependent protein catabolism | 5.2·10^-7 | |
3 | pseudohyphal growth | 9.8·10^-7 | cAMP-dependent protein kinase complex | 9.6·10^-7
3 | proteasome assembly | 3.2·10^-6 | |
  | protein folding | 3.9·10^-6 | |

Figure labels (modules):
- Cytoplasmic ribosome biogenesis
- Proteasome
- Mitochondrial ribosome, small subunit
- Mitochondrial ribosome, large subunit
- Spliceosome
- Novel actin-localized pathway?
- Hsp90
- PKA
- Trehalose biosynthesis
- Ribonuclease P
Figure annotations:
- Suggests SWS2 as a novel member
- Novel pathway enriched with actin-localized proteins; supported in other datasets; similar deletion phenotypes
Slide40
Comparison with prior work
Combined measure of sensitivity (% of annotations enriched) and specificity (% of modules enriched) with p < 0.001
Methods compared:
- Clustering of only expression data
- Clustering expression & network (Hanisch et al., 2002)
- Expression similarity + network connectivity
- Expression similarity + confident network connectivity
Slide41
Slide42
Summary
- Algorithms using co-expression + networks to detect functionally coherent modules
- Accommodate both sparse and dense subnetworks
- Subnetworks linked to osmotic shock and DNA damage
- A general framework for confident connectivity in PPI networks
The next steps:
- Co-expression is not the only interesting way to utilize GE data
- Scaling to complex human datasets