Slide 1: Identifying Signaling Pathways
BMI/CS 776
www.biostat.wisc.edu/bmi776/
Spring 2018
Anthony Gitter
gitter@biostat.wisc.edu
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Anthony Gitter, Mark Craven, Colin Dewey
Slide 2: Goals for lecture
- Challenges of integrating high-throughput assays
- Connecting relevant genes/proteins with interaction networks
- ResponseNet algorithm
- Evaluating pathway predictions
- Classes of signaling pathway prediction methods
Slide 3: High-throughput screening
- Which genes are involved in which cellular processes?
- Hit: gene that affects the phenotype
- Phenotypes include:
  - Growth rate
  - Cell death
  - Cell size
  - Intensity of some reporter
  - Many others
Slide 4: Types of screens
- Genetic screening
  - Test genes individually or in parallel
  - Knockout, knockdown (RNA interference), overexpression, CRISPR/Cas genome editing
- Chemical screening
  - Which genes are affected by a stimulus?
Slide 5: Differentially expressed genes
- Compare mRNA transcript levels between control and treatment conditions
- Genes whose expression changes significantly are also involved in the cellular process
- Alternatively, differential protein abundance or phosphorylation
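The comparison on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's method: it ranks genes by log2 fold change between mean expression in control and treatment, whereas a real analysis would use replicates and a statistical test for significance. The function name, the threshold of 1.0, the pseudocount, and the gene names and values are all assumptions made up for the example.

```python
import math

def differentially_expressed(control, treatment, min_abs_log2fc=1.0, pseudocount=1.0):
    """Return {gene: log2 fold change} for genes whose change passes the threshold.

    control, treatment: dicts mapping gene name -> mean expression level.
    The fold-change cutoff stands in for a proper significance test.
    """
    result = {}
    for gene in control.keys() & treatment.keys():
        # The pseudocount avoids division by zero for unexpressed genes.
        ratio = (treatment[gene] + pseudocount) / (control[gene] + pseudocount)
        log2fc = math.log2(ratio)
        if abs(log2fc) >= min_abs_log2fc:
            result[gene] = log2fc
    return result

# Toy expression values, invented for illustration only.
control = {"GENE_A": 100.0, "GENE_B": 50.0, "GENE_C": 10.0}
treatment = {"GENE_A": 410.0, "GENE_B": 52.0, "GENE_C": 1.0}
de_genes = differentially_expressed(control, treatment)
```

In this toy input, GENE_A goes up and GENE_C goes down by more than two-fold, while GENE_B barely changes and is filtered out.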
Slide 6: Interpreting screens
[Figure labels: Screen hits; Differentially expressed genes]
- Very few genes detected in both
Slide 7: Assays reveal different parts of a cellular process
- Database representation of a “pathway”
[Figure source: KEGG]
Slide 8: Assays reveal different parts of a cellular process
[Figure labels: Genetic screen hits; Differentially expressed genes]
Slide 9: Pathways connect the disjoint gene lists
- Can’t rely on pathway databases
  - High-quality, low coverage
- Instead learn condition-specific pathways computationally
  - Combine data with generic physical interaction networks
Slide 10: Physical interactions
- Protein-protein interactions (PPI)
- Metabolic
- Protein-DNA (transcription factor-gene)
- Genes and proteins are different node types
[Figure labels: Prot A, Prot B, TF, Gene. Figure sources: Appling, Graz, Yeger-Lotem2009]
Slide 11: Hairball networks
- Networks are highly connected
- Can’t use naïve strategy to connect screen hits and differentially expressed genes
[Figure source: Yeger-Lotem2009]
Slide 12: Identify connections within an interaction network
[Figure source: Yeger-Lotem2009]
Slide 13: How to define a computational “pathway”
Given:
- Partially directed network of known physical interactions (e.g. PPI, kinase-substrate, TF-gene)
- Scores on source nodes
- Scores on target nodes
Do:
- Return directed paths in the network connecting sources to targets
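The Given/Do statement above can be made concrete with a small data-structure sketch. It assumes a partially directed network stored as two edge lists (directed and undirected), ignores the node scores, and uses plain breadth-first search to return one directed path from any source to any target. This naive search only illustrates the inputs and outputs; it is not ResponseNet, and the function names and the toy network are hypothetical.

```python
from collections import deque

def build_adjacency(directed_edges, undirected_edges):
    """Adjacency sets for a partially directed network.

    Directed edges (e.g. TF-gene) are traversable one way only;
    undirected edges (e.g. PPI) are traversable in both directions.
    """
    adj = {}
    for u, v in directed_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
    for u, v in undirected_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def shortest_path(adj, sources, targets):
    """Breadth-first search from all sources; return one shortest path or None."""
    targets = set(targets)
    parent = {s: None for s in sources if s in adj}
    queue = deque(parent)
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj[node]):  # sorted only to make the result deterministic
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Toy network, invented for illustration only.
adj = build_adjacency(
    directed_edges=[("ProtX", "TF1"), ("TF1", "GeneD")],
    undirected_edges=[("Hit1", "ProtX")],
)
path = shortest_path(adj, sources=["Hit1"], targets=["GeneD"])
```

Because the TF-gene edge is directed, searching in the opposite direction (from GeneD to Hit1) finds no path in this toy network.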
Slide 14: ResponseNet optimization goals
- Connect screen hits and differentially expressed genes
- Recover sparse connections
- Identify intermediate proteins missed by the screens
- Prefer high-confidence interactions
- Minimum cost flow formulation can meet these objectives
Slide 15: Construct the interaction network
[Figure labels: Protein, Gene]
Slide 16: Transform to a flow problem
[Figure labels: S, T]
Slide 17: Max flow on graphs
[Figure labels: S, T]
- Pump flow from source
- Flow conserved to target
- Incoming and outgoing flow conserved at each node
- Each edge can tolerate a different level of flow or have a different preference for sending flow along that edge
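To make the max flow idea concrete, here is a small pure-Python sketch of the Edmonds-Karp algorithm (repeated breadth-first search for augmenting paths). It is one standard way to compute a maximum flow, chosen for brevity; the slides do not say which algorithm is used. The function name, the toy capacities, and the assumption that no pair of nodes has edges in both directions are all choices made for this example.

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow.

    capacity: dict {(u, v): capacity}; assumes no edge appears in both directions.
    Returns (flow value, {(u, v): flow on that edge}).
    """
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {}).setdefault(v, 0)
        residual[u][v] += c
        residual.setdefault(v, {}).setdefault(u, 0)  # reverse arc for undoing flow
    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck
    flow = {e: c - residual[e[0]][e[1]] for e, c in capacity.items()
            if c - residual[e[0]][e[1]] > 0}
    return total, flow

# Toy capacities, invented for illustration only.
capacity = {("S", "A"): 2, ("S", "B"): 1, ("A", "C"): 1, ("B", "C"): 1, ("C", "T"): 3}
value, flow = max_flow(capacity, "S", "T")
```

In the toy graph the edges into C limit the total to 2 even though S can emit 3 units, and the flow entering C equals the flow leaving it.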
Slide 18: Weighting interactions
- Probability-like confidence of the interaction
- Example evidence: edge score of 1.0
  - 16 distinct publications supporting the edge
[Figure source: iRefWeb]
Slide 19: Weights and capacities on edges
[Figure labels: S, T; each edge is annotated with $(w_{ij}, c_{ij})$]
- $w_{ij}$: from interaction network confidence
- $c_{ij} = 1$: flow capacity
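The transformation into a flow problem can be sketched as follows. Interior edges receive the pair (weight, capacity) with capacity 1, as on the slide. How the edges touching S and T are parameterized is not stated in these slides, so this sketch makes two explicit assumptions: those edges get weight 1.0, and their capacities are the node scores normalized to sum to 1. The function and argument names are hypothetical.

```python
def build_flow_network(interactions, hit_scores, de_scores, source="S", target="T"):
    """Attach an artificial source and target to an interaction network.

    interactions: {(u, v): confidence weight in (0, 1]}
    hit_scores:   {screen hit: score}            (connected from the source)
    de_scores:    {differentially expressed gene: score} (connected to the target)
    Returns {(u, v): (weight, capacity)}.
    """
    # Interior edges: weight from network confidence, capacity fixed at 1.
    network = {edge: (w, 1.0) for edge, w in interactions.items()}
    # ASSUMPTION: S and T edges get weight 1.0 and score-proportional capacity.
    hit_total = sum(hit_scores.values())
    for hit, score in hit_scores.items():
        network[(source, hit)] = (1.0, score / hit_total)
    de_total = sum(de_scores.values())
    for gene, score in de_scores.items():
        network[(gene, target)] = (1.0, score / de_total)
    return network

# Toy inputs, invented for illustration only.
interactions = {("Hit1", "ProtX"): 0.9, ("ProtX", "TF1"): 0.8, ("TF1", "GeneD"): 0.7}
network = build_flow_network(
    interactions,
    hit_scores={"Hit1": 2.0, "Hit2": 6.0},
    de_scores={"GeneD": 3.0},
)
```

With these toy scores, Hit2 may receive three times as much flow from S as Hit1, while every interior edge keeps capacity 1.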
Slide 20: Find the minimum cost flow
[Figure labels: S, T]
- Prefer no flow on the low-weight edges if alternative paths exist
- Return the edges with non-zero flow
Slide 21: Formal minimum cost flow
[Equation annotations]
- Positive flow on an edge incurs a cost
- Cost is greater for low-weight edges
- Flow on an edge
- Parameter controlling the amount of flow from the source
Slide 22: Formal minimum cost flow
- Flow coming in to a node equals flow leaving the node
Slide 23: Formal minimum cost flow
- Flow leaving the source equals flow entering the target
Slide 24: Formal minimum cost flow
- Flow is non-negative and does not exceed edge capacity
Slide 25: Formal minimum cost flow
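The equations on the "Formal minimum cost flow" slides did not survive extraction. The block below is a reconstruction that matches the surviving annotations and follows the linear program as formulated for ResponseNet (Yeger-Lotem2009). The symbols $w_{ij}$, $c_{ij}$, $S$, and $T$ come from the slides; the names $f_{ij}$ (flow on an edge), $\gamma$ (parameter controlling the amount of flow from the source), $V'$ and $E'$ (nodes and edges of the network including $S$ and $T$), $\mathit{Gen}$ (nodes connected from $S$), and $\mathit{Tra}$ (nodes connected to $T$) are assumed notation.

```latex
\begin{aligned}
\min_{f}\quad
  & \sum_{i \in V'} \sum_{j \in V'} -\log(w_{ij})\, f_{ij}
    \;-\; \gamma \sum_{i \in \mathit{Gen}} f_{Si} \\
\text{subject to}\quad
  & \sum_{j \in V'} f_{ij} - \sum_{j \in V'} f_{ji} = 0
    \qquad \forall\, i \in V' \setminus \{S, T\} \\
  & \sum_{i \in \mathit{Gen}} f_{Si} - \sum_{i \in \mathit{Tra}} f_{iT} = 0 \\
  & 0 \le f_{ij} \le c_{ij}
    \qquad \forall\, (i, j) \in E'
\end{aligned}
```

The first term charges $-\log(w_{ij})$ per unit of flow, so low-weight edges cost more; the second term rewards flow leaving the source, scaled by $\gamma$. The three constraints correspond, in order, to the annotations on Slides 22, 23, and 24.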
Slide 26: Linear programming
- Optimization problem is a linear program
- Canonical form
- Polynomial time complexity
- Many off-the-shelf solvers
- Practical Optimization: A Gentle Introduction
  - Introduction to linear programming
  - Simplex method
  - Network flow
[Figure source: Wikipedia]
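For a toy network, the same optimization can be solved without an LP solver. The sketch below swaps in a different technique, successive shortest paths with Bellman-Ford, so that it runs with only the Python standard library. It assumes edge cost -log(weight) and a reward gamma per unit of flow leaving the source, and it keeps augmenting along the cheapest remaining path while that path costs less than gamma. The function name, the edge-dictionary layout, and the toy network are hypothetical; in practice an off-the-shelf LP solver would be used.

```python
import math

def response_net_flow(edges, gamma, source="S", target="T"):
    """Minimize sum(-log(w) * flow) - gamma * (flow out of source).

    edges: {(u, v): (weight in (0, 1], capacity)}, with no self-loops.
    Returns {(u, v): flow} for edges carrying non-zero flow.
    """
    eps = 1e-12
    graph, arcs = {}, {}
    for (u, v), (w, cap) in edges.items():
        cost = -math.log(w)
        graph.setdefault(u, [])
        graph.setdefault(v, [])
        forward = [v, cap, cost, len(graph[v])]  # [to, residual, cost, reverse index]
        graph[u].append(forward)
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])
        arcs[(u, v)] = forward
    while True:
        # Bellman-Ford: the residual graph can contain negative-cost arcs.
        dist = {n: math.inf for n in graph}
        dist[source] = 0.0
        prev = {}
        for _ in range(len(graph) - 1):
            changed = False
            for u in graph:
                if dist[u] == math.inf:
                    continue
                for arc in graph[u]:
                    v, residual_cap, cost, _rev = arc
                    if residual_cap > eps and dist[u] + cost < dist[v] - eps:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, arc)
                        changed = True
            if not changed:
                break
        # Stop when no path remains or more flow would cost more than gamma.
        if dist.get(target, math.inf) >= gamma:
            break
        push, v = math.inf, target
        while v != source:
            u, arc = prev[v]
            push = min(push, arc[1])
            v = u
        v = target
        while v != source:
            u, arc = prev[v]
            arc[1] -= push
            graph[v][arc[3]][1] += push
            v = u
    return {e: cap - arcs[e][1] for e, (w, cap) in edges.items()
            if cap - arcs[e][1] > eps}

# Toy network, invented for illustration only: (weight, capacity) per edge.
toy_edges = {
    ("S", "Hit1"): (1.0, 0.5), ("S", "Hit2"): (1.0, 0.5),
    ("Hit1", "ProtX"): (0.9, 1.0), ("ProtX", "TF1"): (0.9, 1.0),
    ("Hit1", "ProtY"): (0.2, 1.0), ("ProtY", "TF1"): (0.9, 1.0),
    ("Hit2", "ProtZ"): (0.1, 1.0), ("ProtZ", "TF1"): (0.1, 1.0),
    ("TF1", "GeneD"): (0.8, 1.0), ("GeneD", "T"): (1.0, 1.0),
}
flow_low = response_net_flow(toy_edges, gamma=2.0)
flow_high = response_net_flow(toy_edges, gamma=10.0)
```

With the smaller gamma only the high-confidence route through ProtX carries flow. Raising gamma makes it worthwhile to also send flow through the low-confidence ProtZ route, which illustrates how the parameter controls the amount of flow from the source and hence the size of the returned pathway.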
Slide 27: ResponseNet pathways
- Identifies pathway members that are neither hits nor differentially expressed
- Ste5 recovered when STE5 deletion is the perturbation
Slide 28: ResponseNet summary
Advantages
- Computationally efficient
- Integrates multiple types of data
- Incorporates interaction confidence
- Identifies biologically plausible networks
Disadvantages
- Direction of flow is not biologically meaningful
- Path length not considered
- Requires sources and targets
- Dependent on completeness and quality of input network
Slide 29: Evaluating pathway predictions
- Unlike PIQ, we don’t have a complete gold standard available for evaluation
- Can simulate “gold standard” pathways from a network
- Compare relative performance of multiple methods on independent data
Slide 30: Evaluating pathway predictions
[Figure source: Ritz2016]
Slide 31: Evaluating pathway predictions
[Figure source: Ritz2016]
Slide 32: Evaluating pathway predictions
[Figure source: MacGilvray2018]
- PR curves can evaluate node or edge recovery but not the global pathway structure
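A precision-recall curve for node (or edge) recovery can be computed as sketched below. The sketch assumes the method outputs a ranked list of predicted nodes, most confident first, and that a reference pathway serves as the gold standard; the function name and the toy lists are hypothetical. As the slide notes, this kind of evaluation scores individual nodes or edges and says nothing about whether the predicted pathway has the right overall structure.

```python
def precision_recall_points(ranked_predictions, gold_standard):
    """Return [(recall, precision)] after each prediction in the ranking.

    ranked_predictions: nodes or edges, most confident first.
    gold_standard: collection of nodes or edges in the reference pathway.
    """
    gold = set(gold_standard)
    points = []
    true_positives = 0
    for k, item in enumerate(ranked_predictions, start=1):
        if item in gold:
            true_positives += 1
        recall = true_positives / len(gold)
        precision = true_positives / k
        points.append((recall, precision))
    return points

# Toy ranking and reference pathway, invented for illustration only.
points = precision_recall_points(["A", "B", "X", "C"], {"A", "B", "C", "D"})
```

Here the third prediction (X) is not in the reference pathway, so precision drops at that rank while recall stays flat; node D is never predicted, so recall tops out at 0.75.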
Slide 33: Evaluation beyond pathway databases
- Natural language processing can also help semi-automated evaluation
  - Literome
  - Chilibot
  - iHOP
Slide 34: Classes of pathway prediction algorithms
Slide 35: Classes of pathway prediction algorithms
Slide 36: Alternative pathway identification algorithms
- k-shortest paths
  - Ruths2007
  - Shih2012
- Random walks / network diffusion / circuits
  - Tu2006
  - eQTL electrical diagrams (eQED)
  - HotNet
- Integer programs
  - Signaling-regulatory Pathway INferencE (SPINE)
  - Chasman2014
Slide 37: Alternative pathway identification algorithms
- Path-based objectives
  - Physical Network Models (PNM)
  - Maximum Edge Orientation (MEO)
  - Signaling and Dynamic Regulatory Events Miner (SDREM)
- Steiner tree
  - Prize-collecting Steiner forest (PCSF)
  - Belief propagation approximation (msgsteiner)
  - Omics Integrator implementation
- Hybrid approaches
  - PathLinker: random walk + shortest paths
  - ANAT: shortest paths + Steiner tree
Slide 38: Recent developments in pathway discovery
- Multi-task learning: jointly model several related biological conditions
  - ResponseNet extension: SAMNet
  - Steiner forest extension: Multi-PCSF
  - SDREM extension: MT-SDREM
- Temporal data
  - ResponseNet extension: TimeXNet
  - Steiner forest extension and ST-Steiner
  - Temporal Pathway Synthesizer
Slide 39: Condition-specific genes/proteins used as input
- Genetic screen hits (as causes or effects)
- Differentially expressed genes
- Transcription factors inferred from gene expression
- Proteomic changes (protein abundance or post-translational modifications)
- Kinases inferred from phosphorylation
- Genetic variants or DNA mutations
- Enzymes regulating metabolites
- Receptors or sensory proteins
- Protein interaction partners
- Pathway databases or other prior knowledge