1. Network Biology
BMI/CS 776, www.biostat.wisc.edu/bmi776/
Spring 2021
Daifeng Wang, daifeng.wang@wisc.edu
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Anthony Gitter, Mark Craven, Colin Dewey and Daifeng Wang.
2. Goals for lecture
Biological networks
Challenges of integrating high-throughput assays
Connecting relevant genes/proteins with interaction networks
ResponseNet algorithm
Evaluating pathway predictions
Classes of signaling pathway prediction methods
3. High-throughput screening
Which genes are involved in which cellular processes?
Hit: a gene that affects the phenotype
Phenotypes include: growth rate, cell death, cell size, intensity of some reporter, and many others
4. Types of screens
Genetic screening: test genes individually or in parallel by knockout, knockdown (RNA interference), overexpression, or CRISPR/Cas genome editing
Chemical screening: which genes are affected by a stimulus?
5. Differentially expressed genes
Compare mRNA transcript levels between control and treatment conditions
Genes whose expression changes significantly are likely also involved in the cellular process
Alternatively, measure differential protein abundance or phosphorylation
6. Interpreting screens
Screen hits vs. differentially expressed genes: very few genes are detected in both
7. Assays reveal different parts of a cellular process
[Figure: KEGG database representation of a "pathway"]
8. Assays reveal different parts of a cellular process
[Figure legend: genetic screen hits; differentially expressed genes]
9. Pathways connect the disjoint gene lists
Can't rely on pathway databases: high quality, but low coverage
Instead, learn condition-specific pathways computationally by combining data with generic physical interaction networks
10. Physical interactions
Protein-protein interactions (PPI)
Metabolic
Protein-DNA (transcription factor-gene)
Genes and proteins are different node types
[Figure nodes: Prot A, Prot B, TF, Gene. Sources: Appling; Graz; Yeger-Lotem 2009]
11. Hairball networks
Networks are highly connected
Can't use a naïve strategy to connect screen hits and differentially expressed genes
(Yeger-Lotem 2009)
12. Identify connections within an interaction network
(Yeger-Lotem 2009)
13. Biological network properties
Degree: number of neighbors of a node
Power-law degree distribution: most nodes have low degree; a few nodes are highly connected (hubs)
Robust to random attacks, e.g., the structure is resilient to random mutations, but mutations in hubs can damage the network
Modular organization
High clustering coefficient and short paths between nodes
Efficient signal propagation
14. Power-law degree distribution
The probability $P(k)$ that a node has $k$ neighbors decreases as a power of $k$, $P(k) \propto k^{-\gamma}$, so highly connected nodes are rare
[Figure panels: A. fulgidus (archaeon); a bacterium; C. elegans (eukaryote); average over 43 organisms]
H. Jeong et al., Nature 407 (2000)
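To make the degree distribution concrete, here is a minimal Python sketch (standard library only) that computes $P(k)$ from an undirected edge list and gives a crude estimate of the exponent $\gamma$ in $P(k) \propto k^{-\gamma}$ by least squares on log-log axes. The function names are hypothetical, the fit needs at least two distinct degrees, and the log-log regression is only illustrative; maximum-likelihood estimators are generally preferred for real networks.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return {k: P(k)}, the fraction of nodes with exactly k neighbors.

    edges: iterable of undirected (u, v) pairs; only nodes that appear in an edge are counted.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    counts = Counter(len(nbrs) for nbrs in neighbors.values())
    n = len(neighbors)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_power_law_exponent(dist):
    """Crude estimate of gamma in P(k) ~ k^(-gamma): least-squares slope on log-log axes."""
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    return -slope  # P(k) decreases with k, so the slope is negative
```

For a star with one hub and four leaves, `degree_distribution` returns `{1: 0.8, 4: 0.2}` and the fitted exponent is 1.0 (up to floating-point error).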
15. Modularity
Small, highly connected, cohesive clusters combine to form larger units
Communication between clusters goes through hubs
Hierarchical modularity overlaps with known metabolic functions
E. Ravasz et al., Science 297, 1551-1555 (2002)
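16. Measurement of modularity
Modularity Q measures the strength of a network's division into modules (low Q: weak division; high Q: strong division):
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$$
where $\frac{1}{2m}$ is a normalization, $m$ is the total number of edges, $A_{ij}$ is the edge weight between nodes $i$ and $j$, $\frac{k_i k_j}{2m}$ is the expected edge weight between $i$ and $j$ ($k_i$ is the degree of node $i$), and $\delta(c_i, c_j) = 1$ when $i$ and $j$ are in the same group (module) and 0 otherwise, so the sum runs over pairs of nodes within a module.
Clustering goal: assign each node to a module so as to maximize modularity as an objective function (a module is a group of highly connected nodes).
Brede, Europhysics Letters, 2010; Newman, PNAS, 2006.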
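As a sketch, Newman's modularity $Q = \frac{1}{2m} \sum_{i,j} [A_{ij} - k_i k_j / (2m)] \, \delta(c_i, c_j)$ can be transcribed directly into Python. This assumes an undirected, unweighted graph with no self-loops and a dict assigning each node to a module; the function name is hypothetical, and the double loop is O(n²), fine for illustration but not for genome-scale networks. Community-detection tools search over partitions to maximize Q; this only scores one partition.

```python
def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (i, j) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping node -> module label c_i.
    Implements Q = 1/(2m) * sum_{i,j} [A_ij - k_i k_j / (2m)] * delta(c_i, c_j).
    """
    m = len(edges)
    degree = {}
    adjacency = set()
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
        adjacency.add((i, j))
        adjacency.add((j, i))  # A is symmetric
    nodes = list(degree)
    total = 0.0
    for i in nodes:
        for j in nodes:  # ordered pairs, including i == j
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0
            a_ij = 1.0 if (i, j) in adjacency else 0.0
            total += a_ij - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)
```

For two triangles joined by a single edge, assigning each triangle to its own module gives Q = 5/14 (about 0.357), while putting every node in one module gives Q = 0.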
17. Clustering coefficient
Measures the average probability that two neighbors of a node are connected:
$$C_I = \frac{2 n_I}{k_I (k_I - 1)}$$
where $n_I$ is the number of edges between node I's neighbors and $k_I$ is the number of neighbors of I.
18. Clustering coefficient
High-degree nodes tend to have a low clustering coefficient C
A network's modularity is characterized by C averaged over all nodes
Metabolic networks have high intrinsic modularity
E. Ravasz et al., Science 297, 1551-1555 (2002)
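A minimal sketch of the local clustering coefficient $C_I = \frac{2 n_I}{k_I (k_I - 1)}$ and its average over all nodes, for an undirected graph given as an edge list. Assigning C = 0 to nodes with fewer than two neighbors is an assumed convention (the formula is undefined there), and the function names are hypothetical.

```python
def clustering_coefficients(edges):
    """Local clustering C_I = 2 n_I / (k_I (k_I - 1)) for each node of an undirected graph.

    n_I: number of edges among node I's neighbors; k_I: number of neighbors of I.
    Nodes with fewer than two neighbors get C = 0 (an assumed convention).
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    coeffs = {}
    for node, nbrs in neighbors.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0
            continue
        nbr_list = list(nbrs)
        # Count each pair of neighbors once and check whether they are linked.
        n_links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nbr_list[b] in neighbors[nbr_list[a]]
        )
        coeffs[node] = 2 * n_links / (k * (k - 1))
    return coeffs


def average_clustering(edges):
    """Clustering coefficient averaged over all nodes."""
    coeffs = clustering_coefficients(edges)
    return sum(coeffs.values()) / len(coeffs)
```

For a triangle 0-1-2 with a pendant node 3 attached to node 2, node 2 gets C = 1/3 (one link among its three neighbors), nodes 0 and 1 get C = 1, node 3 gets 0, and the average is 7/12.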
19. Network centralities
Topological importance of a node
G. Iacono et al., Genome Biology 20 (2019)
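The slide does not list which centralities are compared, so as an illustrative assumption this sketch computes two common ones: degree centrality (degree divided by n - 1) and closeness centrality ((n - 1) divided by the sum of shortest-path distances to all other nodes). It assumes a connected, undirected, unweighted graph with at least two nodes; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(neighbors, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def centralities(edges):
    """Degree and closeness centrality for a connected, undirected graph."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    degree = {u: len(nbrs) / (n - 1) for u, nbrs in neighbors.items()}
    closeness = {}
    for u in neighbors:
        total = sum(bfs_distances(neighbors, u).values())
        closeness[u] = (n - 1) / total
    return degree, closeness
```

On the path 0-1-2, the middle node scores 1.0 on both measures, while each end node has degree centrality 0.5 and closeness 2/3.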
20. Network problems
Network inference: infer network structure
Motif finding: identify common subgraph topologies
Pathway or module detection: identify subgraphs of genes that perform the same function or are active in the same condition
Network comparison, alignment, querying
Conserved modules: identify modules that are shared in networks of multiple species/conditions
21. Network motifs
Problem: find subgraph topologies that are statistically more frequent than expected
Brute-force approach: (1) count all topologies of subgraphs of size m; (2) randomize the graph, retaining the degree distribution, and count again; (3) output topologies that are over- or under-represented
[Figure: the feed-forward loop, over-represented in regulatory networks, contrasted with a topology that is not very common]
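A minimal sketch of the brute-force approach, specialized to a single 3-node topology, the feed-forward loop (X->Y, Y->Z, X->Z). The randomization uses degree-preserving edge swaps, so every node keeps its in-degree and out-degree, and a z-score compares the observed count with counts in the randomized graphs. Function names and default parameters (number of random graphs and swaps) are hypothetical choices, not values from the lecture.

```python
import random


def count_feed_forward_loops(edges):
    """Count feed-forward loops X->Y, Y->Z, X->Z (X, Y, Z distinct) in a directed graph."""
    out = {}
    for u, v in edges:
        if u != v:
            out.setdefault(u, set()).add(v)
    count = 0
    for x, x_targets in out.items():
        for z in x_targets:  # direct edge X -> Z
            for y in x_targets:  # X -> Y
                if y != z and z in out.get(y, ()):  # Y -> Z closes the loop
                    count += 1
    return count


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize by swapping (a->b, c->d) to (a->d, c->b), keeping in/out degrees.

    Swaps that would create self-loops or duplicate edges are rejected.
    """
    edges = list(edges)
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def motif_zscore(edges, n_random=100, n_swaps=1000, seed=0):
    """Observed count, mean randomized count, and z-score for feed-forward loops."""
    rng = random.Random(seed)
    observed = count_feed_forward_loops(edges)
    null = [
        count_feed_forward_loops(degree_preserving_shuffle(edges, n_swaps, rng))
        for _ in range(n_random)
    ]
    mean = sum(null) / n_random
    sd = (sum((x - mean) ** 2 for x in null) / n_random) ** 0.5
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, z
```

A full motif finder would enumerate every connected topology of size m instead of one hard-coded pattern; the randomize-and-compare logic stays the same.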
22. Gene regulatory network motifs
A. P. Boyle et al., Nature 512, 453-456 (2014), doi:10.1038/nature13668
23. Network modules
Modules: dense (highly connected) subgraphs, e.g., large cliques or nearly complete cliques
Problem: identify the component modules of a network
Difficulty: the definition of a module is not precise
Hierarchical networks have modules at multiple scales: at what scale should modules be defined?
24. How to define a computational "pathway"
Given: a partially directed network of known physical interactions (e.g., PPI, kinase-substrate, TF-gene), scores on source nodes, and scores on target nodes
Do: return directed paths in the network connecting sources to targets
25. Network flow problem
Find an optimal route from LA to NYC by minimizing transportation costs
$c_{i,j}$: the cost between city i and city j
$f_{i,j} = 1$ if the leg from city i to city j is in the route, 0 if not
$$\arg\min_{f} \sum_{i,j} c_{i,j} f_{i,j} \quad \text{subject to constraints}$$
Map: https://www.visualcapitalist.com/u-s-interstate-highways-transit-map/
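When every cost $c_{i,j}$ is non-negative and a single route is wanted, choosing binary $f_{i,j}$ under flow-conservation constraints amounts to a shortest-path problem. The sketch below solves it with Dijkstra's algorithm; the cities and costs in `EXAMPLE_COSTS` are made up for illustration and are not read from the map.

```python
import heapq


def cheapest_route(costs, start, goal):
    """Dijkstra's algorithm: pick legs (f_ij = 1) minimizing the summed cost c_ij.

    costs: {(city_i, city_j): c_ij} with directed, non-negative costs.
    Returns (total_cost, route), or (inf, []) if goal is unreachable.
    """
    graph = {}
    for (i, j), c in costs.items():
        graph.setdefault(i, []).append((j, c))
    best = {start: 0.0}
    previous = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break  # the first time goal is popped, its cost is final
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, c in graph.get(u, []):
            nd = d + c
            if nd < best.get(v, float("inf")):
                best[v] = nd
                previous[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in best:
        return float("inf"), []
    route = [goal]
    while route[-1] != start:
        route.append(previous[route[-1]])
    return best[goal], route[::-1]


# Hypothetical costs, for illustration only.
EXAMPLE_COSTS = {
    ("LA", "Denver"): 10, ("LA", "Dallas"): 14,
    ("Denver", "Chicago"): 10, ("Dallas", "Chicago"): 9,
    ("Dallas", "Atlanta"): 8, ("Chicago", "NYC"): 8, ("Atlanta", "NYC"): 9,
}
```

With these costs, `cheapest_route(EXAMPLE_COSTS, "LA", "NYC")` returns `(28.0, ['LA', 'Denver', 'Chicago', 'NYC'])`; the two routes through Dallas both cost 31.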
26. ResponseNet optimization goals
Connect screen hits and differentially expressed genes
Recover sparse connections
Identify intermediate proteins missed by the screens
Prefer high-confidence interactions
A minimum cost flow formulation can meet these objectives
27. Construct the interaction network
[Figure legend: Protein; Gene]
28. Transform to a flow problem
[Figure: the interaction network with a source node S and a target node T]
29. Max flow on graphs
[Figure: network with source S and target T]
Pump flow from the source; flow is conserved all the way to the target
Incoming and outgoing flow are equal at each intermediate node
Each edge can tolerate a different level of flow, or carry a different preference for sending flow along it
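A minimal sketch of maximum flow using the Edmonds-Karp algorithm (breadth-first augmenting paths), showing the two rules above: capacity limits on each edge and conservation of flow at intermediate nodes. It assumes non-negative capacities and that no antiparallel pair (u, v) and (v, u) is given, so each residual entry corresponds to exactly one input edge; the function name is hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly push flow along a shortest (fewest-edge) augmenting path.

    capacity: {(u, v): c_uv} with non-negative capacities; assumes no antiparallel
    pair (u, v) and (v, u) is given.
    Returns (total_flow, flow) where flow[(u, v)] is the flow on each input edge.
    """
    residual = {}
    for (u, v), c in capacity.items():
        residual.setdefault(u, {})[v] = c
        residual.setdefault(v, {}).setdefault(u, 0)  # reverse (residual) edge
    total = 0
    while True:
        # Breadth-first search for a source -> sink path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path left: flow on each edge = capacity - residual.
            return total, {e: c - residual[e[0]][e[1]] for e, c in capacity.items()}
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck  # consume forward capacity
            residual[v][u] += bottleneck  # allow this flow to be rerouted later
        total += bottleneck
```

With unit capacities S->a, S->b, a->b, a->T, b->T, the maximum flow is 2, and the flow entering node a equals the flow leaving it.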
30. Weighting interactions
Probability-like confidence of the interaction
Example evidence: an edge score of 1.0, with 16 distinct publications supporting the edge (iRefWeb)
31. Weights and capacities on edges
[Figure: network from S to T with each edge labeled $(w_{ij}, c_{ij})$]
$w_{ij}$: from interaction network confidence
$c_{ij} = 1$: flow capacity
32. Find the minimum cost flow
[Figure: network with source S and target T]
Prefer no flow on low-weight edges if alternative paths exist
Return the edges with non-zero flow
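The steps on the preceding slides (add S and T, weight edges by confidence, set unit capacities, find a minimum cost flow, return edges that carry flow) can be sketched in plain Python. This is a hedged illustration, not the ResponseNet implementation: it assumes confidences $w_{ij} \in (0, 1]$ turned into costs $-\log w_{ij}$ (so low-weight edges are expensive), a reward $\gamma$ per unit of flow leaving S (represented as cost $-\gamma$ on S->source edges), zero-cost edges into T, and no gene literally named "S" or "T". Instead of a general LP solver it uses successive shortest paths with Bellman-Ford, augmenting only while the cheapest S->T path has negative total cost. `responsenet_sketch` and the other names are hypothetical.

```python
import math


def add_edge(graph, u, v, cap, cost):
    """Add u->v plus a residual twin v->u with zero capacity and negated cost."""
    graph.setdefault(u, [])
    graph.setdefault(v, [])
    # Entry layout: [to, residual capacity, cost, index of twin in graph[to], is_forward]
    graph[u].append([v, cap, cost, len(graph[v]), True])
    graph[v].append([u, 0, -cost, len(graph[u]) - 1, False])


def min_cost_flow(graph, s, t):
    """Successive shortest paths: augment along the cheapest s->t residual path
    while its total cost is negative, i.e. while more flow lowers the objective."""
    total_cost = 0.0
    while True:
        dist = {node: math.inf for node in graph}
        dist[s] = 0.0
        prev = {}
        for _ in range(len(graph) - 1):  # Bellman-Ford tolerates negative costs
            updated = False
            for u, edges in graph.items():
                if dist[u] == math.inf:
                    continue
                for idx, (v, cap, cost, _, _) in enumerate(edges):
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, idx)
                        updated = True
            if not updated:
                break
        if dist[t] >= 0:  # unreachable, or extra flow would not reduce the cost
            return total_cost
        push = math.inf  # bottleneck residual capacity along the path
        v = t
        while v != s:
            u, idx = prev[v]
            push = min(push, graph[u][idx][1])
            v = u
        v = t
        while v != s:
            u, idx = prev[v]
            edge = graph[u][idx]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        total_cost += push * dist[t]


def responsenet_sketch(interactions, sources, targets, gamma):
    """Hypothetical helper: connect sources to targets with a minimum cost flow.

    interactions: {(u, v): w_uv}, directed edges with confidence w_uv in (0, 1];
        list an undirected PPI in both directions.
    gamma: reward per unit of flow leaving S; larger gamma -> more flow.
    Returns the interaction edges that carry non-zero flow.
    """
    graph = {}
    for (u, v), w in interactions.items():
        add_edge(graph, u, v, 1, -math.log(w))  # c_ij = 1; low w -> high cost
    for node in sources:
        add_edge(graph, "S", node, 1, -gamma)
    for node in targets:
        add_edge(graph, node, "T", 1, 0.0)
    min_cost_flow(graph, "S", "T")
    used = []
    for u, edges in graph.items():
        for v, _cap, _cost, rev, is_forward in edges:
            if not is_forward or u in ("S", "T") or v in ("S", "T"):
                continue
            if graph[v][rev][1] > 0:  # twin's residual capacity = flow sent on u->v
                used.append((u, v))
    return sorted(used)
```

With a high-confidence path hit1 -> p1 -> de1 (w = 0.9 per edge) and a low-confidence alternative through p2 (w = 0.2 per edge), `gamma=5.0` routes flow only through p1, while `gamma=0.1` is too small to pay for any path and returns no edges. This mirrors the two preferences on the slides: avoid low-weight edges, and let the source parameter control how much flow (and so how large a subnetwork) is selected.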
33. Formal minimum cost flow
Objective terms (figure annotations): positive flow on an edge incurs a cost; the cost is greater for low-weight edges; $f_{ij}$ is the flow on an edge; a parameter controls the amount of flow from the source
34. Formal minimum cost flow
Flow coming into a node equals flow leaving the node
35. Formal minimum cost flow
Flow leaving the source equals flow entering the target
36. Formal minimum cost flow
Flow is non-negative and does not exceed edge capacity
37. Formal minimum cost flow
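Collecting the annotated objective terms and the three constraints from the formal minimum cost flow slides, the whole program can be written as below. This is a reconstruction under two assumptions: the per-edge cost is $-\log w_{ij}$ (one choice that grows as the weight shrinks, as the annotation requires, for $w_{ij} \in (0, 1]$), and $\gamma$ is the parameter that rewards flow leaving the source $S$.

```latex
\begin{aligned}
\min_{f}\quad & \sum_{(i,j) \in E} -\log(w_{ij})\, f_{ij} \;-\; \gamma \sum_{j} f_{S,j} \\
\text{s.t.}\quad & \sum_{i} f_{ij} = \sum_{k} f_{jk} \qquad \forall j \notin \{S, T\} \\
& \sum_{j} f_{S,j} = \sum_{i} f_{i,T} \\
& 0 \le f_{ij} \le c_{ij} \qquad \forall (i,j) \in E
\end{aligned}
```

The first constraint is conservation at intermediate nodes, the second matches flow out of S with flow into T, and the third enforces non-negativity and capacity. Because the objective and every constraint are linear in $f$, the problem is a linear program.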
38. Linear programming
The optimization problem is a linear program
Canonical form
Solvable in polynomial time
Many off-the-shelf solvers
Resources: Practical Optimization: A Gentle Introduction; Introduction to linear programming; Simplex method; Network flow (Wikipedia)
39. ResponseNet pathways
Identifies pathway members that are neither screen hits nor differentially expressed
Ste5 is recovered when STE5 deletion is the perturbation
40. ResponseNet summary
Advantages: computationally efficient; integrates multiple types of data; incorporates interaction confidence; identifies biologically plausible networks
Disadvantages: the direction of flow is not biologically meaningful; path length is not considered; requires sources and targets; dependent on the completeness and quality of the input network
41. Evaluating pathway predictions
Unlike PIQ, we don't have a complete gold standard available for evaluation
Can simulate "gold standard" pathways from a network
Compare the relative performance of multiple methods on independent data
42. Evaluating pathway predictions
Ritz 2016, https://www.nature.com/articles/npjsba20162.pdf
43. Evaluating pathway predictions
Ritz 2016
44. Evaluating pathway predictions
MacGilvray 2018
Precision-recall (PR) curves can evaluate node or edge recovery, but not the global pathway structure
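A minimal sketch of how a PR curve scores edge (or node) recovery: walk down a ranked list of predictions and record precision and recall at each cutoff against a reference set. It assumes predictions are already ranked from most to least confident, that edges are compared as ordered pairs (undirected edges would need to be normalized first), and that the reference set is non-empty; the function name is hypothetical.

```python
def precision_recall_curve(ranked_predictions, gold_standard):
    """Precision and recall after each prediction in a ranked list.

    ranked_predictions: edges (or nodes) ordered from most to least confident.
    gold_standard: non-empty set of reference pathway edges (or nodes).
    Returns a list of (precision, recall) points, one per rank cutoff.
    """
    points = []
    true_positives = 0
    for rank, item in enumerate(ranked_predictions, start=1):
        if item in gold_standard:
            true_positives += 1
        points.append((true_positives / rank, true_positives / len(gold_standard)))
    return points
```

Each point depends only on membership of individual items in the reference set, which is exactly why, as the slide notes, a PR curve says nothing about whether the predicted edges assemble into a coherent pathway.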
45. Evaluation beyond pathway databases
Natural language processing can also help with semi-automated evaluation: Literome, Chilibot, iHOP
46. Classes of pathway prediction algorithms
47. Classes of pathway prediction algorithms
48. Alternative pathway identification algorithms
k-shortest paths: Ruths 2007; Shih 2012
Random walks / network diffusion / circuits: Tu 2006; eQTL electrical diagrams (eQED); HotNet
Integer programs: Signaling-regulatory Pathway INferencE (SPINE); Chasman 2014
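A minimal sketch of one member of the random walk / network diffusion family: a random walk with restart computed by power iteration. It is a generic illustration, not the specific method of Tu 2006, eQED, or HotNet. It assumes an undirected, unweighted graph whose seed nodes (e.g., screen hits) all appear in the edge list; the restart probability 0.3, the tolerance, and the function name are arbitrary choices.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    Iterates p <- (1 - restart) * W p + restart * p0, where W moves the walker
    from a node to a uniformly random neighbor.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    seeds = set(seeds)
    # Restart distribution: uniform over the seed nodes.
    p0 = {node: (1.0 / len(seeds) if node in seeds else 0.0) for node in neighbors}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {node: restart * p0[node] for node in neighbors}
        for u, prob in p.items():
            share = (1 - restart) * prob / len(neighbors[u])
            for v in neighbors[u]:
                nxt[v] += share  # spread the non-restart mass to neighbors
        if sum(abs(nxt[n] - p[n]) for n in neighbors) < tol:
            return nxt
        p = nxt
    return p
```

Nodes near the seeds accumulate more probability, so ranking nodes by their score gives a diffusion-based candidate pathway; on the path 0-1-2-3 seeded at node 0, the scores decrease steadily with distance from node 0.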
49. Alternative pathway identification algorithms
Path-based objectives: Physical Network Models (PNM); Maximum Edge Orientation (MEO); Signaling and Dynamic Regulatory Events Miner (SDREM)
Steiner tree: prize-collecting Steiner forest (PCSF); belief propagation approximation (msgsteiner); Omics Integrator implementation
Hybrid approaches: PathLinker (random walk + shortest paths); ANAT (shortest paths + Steiner tree)
50. Recent developments in pathway discovery
Multi-task learning: jointly model several related biological conditions (ResponseNet extension: SAMNet; Steiner forest extension: Multi-PCSF; SDREM extension: MT-SDREM)
Temporal data: ResponseNet extension: TimeXNet; Steiner forest extension: ST-Steiner; Temporal Pathway Synthesizer
51. Graph embedding for biological networks
Bioinformatics, Volume 36, Issue 4, 15 February 2020, Pages 1241-1251, https://doi.org/10.1093/bioinformatics/btz718
52. Condition-specific genes/proteins used as input
Genetic screen hits (as causes or effects)
Differentially expressed genes
Transcription factors inferred from gene expression
Proteomic changes (protein abundance or post-translational modifications)
Kinases inferred from phosphorylation
Genetic variants or DNA mutations
Enzymes regulating metabolites
Receptors or sensory proteins
Protein interaction partners
Pathway databases or other prior knowledge