Network Biology BMI/CS 776





Presentation Transcript

1. Network Biology
   - BMI/CS 776
   - www.biostat.wisc.edu/bmi776/
   - Spring 2021
   - Daifeng Wang, daifeng.wang@wisc.edu
   - These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Anthony Gitter, Mark Craven, Colin Dewey and Daifeng Wang

2. Goals for lecture
   - Biological networks
   - Challenges of integrating high-throughput assays
   - Connecting relevant genes/proteins with interaction networks
   - ResponseNet algorithm
   - Evaluating pathway predictions
   - Classes of signaling pathway prediction methods

3. High-throughput screening
   - Which genes are involved in which cellular processes?
   - Hit: gene that affects the phenotype
   - Phenotypes include:
     - Growth rate
     - Cell death
     - Cell size
     - Intensity of some reporter
     - Many others

4. Types of screens
   - Genetic screening
     - Test genes individually or in parallel
     - Knockout, knockdown (RNA interference), overexpression, CRISPR/Cas genome editing
   - Chemical screening
     - Which genes are affected by a stimulus?

5. Differentially expressed genes
   - Compare mRNA transcript levels between control and treatment conditions
   - Genes whose expression changes significantly are also involved in the cellular process
   - Alternatively, differential protein abundance or phosphorylation
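
As a rough illustration of comparing transcript levels between control and treatment, the sketch below flags genes by log2 fold change. The gene names, expression values, pseudocount, and threshold are all hypothetical. A fold-change cutoff is a simplification: a real analysis would assess significance with a replicate-aware statistical test.

```python
import math

def log2_fold_changes(control, treatment, pseudocount=1.0):
    """Return {gene: log2((treatment + p) / (control + p))} for genes in both dicts."""
    return {
        gene: math.log2((treatment[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
        if gene in treatment
    }

def differentially_expressed(control, treatment, min_abs_log2fc=1.0):
    """Genes whose absolute log2 fold change reaches the (assumed) threshold."""
    changes = log2_fold_changes(control, treatment)
    return sorted(g for g, value in changes.items() if abs(value) >= min_abs_log2fc)

# Hypothetical mean expression levels (made-up numbers, not real data).
control = {"geneA": 99.0, "geneB": 49.0, "geneC": 7.0}
treatment = {"geneA": 399.0, "geneB": 51.0, "geneC": 1.0}

de_genes = differentially_expressed(control, treatment)
print(de_genes)
```

The pseudocount keeps the ratio finite when a gene has zero expression in one condition.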

6. Interpreting screens
   - Screen hits
   - Differentially expressed genes
   - Very few genes detected in both

7. Assays reveal different parts of a cellular process
   - KEGG
   - Database representation of a “pathway”

8. Assays reveal different parts of a cellular process
   - Genetic screen hits
   - Differentially expressed genes

9. Pathways connect the disjoint gene lists
   - Can’t rely on pathway databases
     - High-quality, low coverage
   - Instead learn condition-specific pathways computationally
   - Combine data with generic physical interaction networks

10. Physical interactions
   - Protein-protein interactions (PPI)
   - Metabolic
   - Protein-DNA (transcription factor-gene)
   - Genes and proteins are different node types
   - Figure labels: Prot A, Prot B, TF, Gene
   - Figure credits: Appling; Graz; Yeger-Lotem 2009

11. Hairball networks
   - Networks are highly connected
   - Can’t use naïve strategy to connect screen hits and differentially expressed genes
   - Yeger-Lotem 2009

12. Identify connections within an interaction network
   - Yeger-Lotem 2009

13. Biological Network Properties
   - Degree: number of neighbors of a node
   - Power law degree distribution
     - Most nodes have low degrees
     - Few highly connected nodes (hubs)
   - Robust to random attacks
     - e.g., structure resilient to mutations
     - Mutations in hubs can damage the network
   - Modular organization
   - High clustering coefficient (short paths)
   - Efficient signal propagation

14. Power law degree distribution
   - Probability of finding a highly connected node decreases as a power of its degree k, P(k) ~ k^(-γ), rather than exponentially
   - Figure panels: A. fulgidus (archaea), a bacterium, C. elegans (eukaryote), and the average over 43 organisms
   - H. Jeong et al., Nature, 407 (2000)
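
To make the degree distribution concrete, here is a minimal sketch that computes node degrees and the empirical P(k) from an undirected edge list. The toy network and function names are hypothetical, and the edge list is assumed to contain no self-loops or duplicate edges.

```python
from collections import Counter

def degrees(edges):
    """Degree (number of neighbors) of each node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)

def degree_distribution(edges):
    """Empirical P(k): fraction of nodes that have degree k."""
    deg = degrees(edges)
    counts = Counter(deg.values())
    n_nodes = len(deg)
    return {k: counts[k] / n_nodes for k in sorted(counts)}

# Hypothetical toy network: one hub linked to five nodes, plus one extra edge.
toy_edges = [("hub", x) for x in "abcde"] + [("a", "b")]

node_degrees = degrees(toy_edges)
p_k = degree_distribution(toy_edges)
print(node_degrees)
print(p_k)
```

On a real interaction network, plotting P(k) against k on log-log axes is a common first check for a heavy-tailed distribution: most nodes fall at low k and a few hubs at high k.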

15. Modularity
   - Small highly connected cohesive clusters that combine to form larger units
   - Communication between clusters through hubs
   - Hierarchical modularity overlaps with known metabolic functions
   - E. Ravasz et al., Science 297, 1551-1555 (2002)

16. Measurement of Modularity
   - Modularity Q: measurement on strength of network division (figure contrasts low and high Q)
   - Clustering goal: assign each node a module to maximize “modularity” as an objective function (a module is a group of highly connected nodes)
   - Annotations on the equation for Q:
     - normalization
     - m: total number of edges
     - expected edge weight that would go between i and j
     - sum over nodes within a group (module)
     - edge weight between nodes i and j
   - Brede, Europhysics Letters, 2010. Newman, PNAS, 2006.
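
The equation itself is not in this transcript. The sketch below assumes the standard Newman form, Q = (1/2m) * sum over node pairs (i, j) in the same module of [A_ij - k_i * k_j / (2m)], where A_ij is the edge weight between i and j, k_i is the degree of node i, and k_i * k_j / (2m) is the expected edge weight between i and j. It handles unweighted, undirected graphs without self-loops; the function name and toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, module_of):
    """Newman modularity Q for an undirected, unweighted graph without self-loops.

    edges: list of (u, v) pairs; module_of: {node: module label}.
    """
    m = len(edges)
    degree = defaultdict(int)
    within = 0  # number of edges with both endpoints in the same module
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if module_of[u] == module_of[v]:
            within += 1
    degree_sum = defaultdict(int)  # total degree of the nodes in each module
    for node, k in degree.items():
        degree_sum[module_of[node]] += k
    # Equivalent per-module form of Q: sum_c [ e_c / m - (d_c / 2m)^2 ]
    return within / m - sum((d / (2 * m)) ** 2 for d in degree_sum.values())

# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("d", "e"), ("e", "f"), ("d", "f"),
             ("c", "d")]
two_modules = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_module = {node: 0 for node in two_modules}

q_split = modularity(toy_edges, two_modules)
q_merged = modularity(toy_edges, one_module)
print(q_split, q_merged)
```

Splitting the two triangles into separate modules scores higher than lumping every node together, which is the behavior a clustering method maximizing Q relies on.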

17. Clustering coefficient
   - Measures the average probability that two neighbors of a node are connected
   - n_I: number of edges between node I’s neighbors
   - k: number of neighbors of I
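
A minimal sketch of this quantity, assuming the usual definition C_I = 2 * n_I / (k * (k - 1)): the number of edges among the neighbors of I divided by the number of possible neighbor pairs. The formula is not preserved in the transcript, so treat this form as an assumption. Nodes with fewer than two neighbors are given a coefficient of 0 by convention, and the toy graph is hypothetical.

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def clustering_coefficient(adj, node):
    """C = 2 * n / (k * (k - 1)), with n = edges among the node's k neighbors."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention assumed here for nodes with fewer than 2 neighbors
    n_links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * n_links / (k * (k - 1))

def average_clustering(adj):
    """Clustering coefficient averaged over all nodes."""
    return sum(clustering_coefficient(adj, v) for v in adj) / len(adj)

# Hypothetical toy graph: a triangle (a, b, c) with a pendant node d attached to a.
adj = build_adjacency([("a", "b"), ("a", "c"), ("b", "c"), ("a", "d")])
print(clustering_coefficient(adj, "a"), average_clustering(adj))
```

Here node a has three neighbors but only one edge among them, so its coefficient (1/3) is lower than that of its degree-2 neighbors b and c (1.0 each).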

18. Clustering coefficient
   - High degree nodes -> low clustering coefficient (CC)
   - Network’s modularity -> CC averaged over all nodes
   - Metabolic networks have high intrinsic modularity
   - E. Ravasz et al., Science 297, 1551-1555 (2002)

19. Network centralities
   - Topological importance of a node
   - G. Iacono et al., Genome Biology 20 (2019)

20. Network problems
   - Network inference
     - Infer network structure
   - Motif finding
     - Identify common subgraph topologies
   - Pathway or module detection
     - Identify subgraphs of genes that perform the same function or are active in the same condition
   - Network comparison, alignment, querying
   - Conserved modules
     - Identify modules that are shared in networks of multiple species/conditions

21. Network motifs
   - Problem: Find subgraph topologies that are statistically more frequent than expected
   - Brute force approach
     - Count all topologies of subgraphs of size m
     - Randomize graph (retain degree distribution) and count again
     - Output topologies that are over/under represented
   - Figure labels:
     - Feed-forward loop: over-represented in regulatory networks
     - not very common
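
The brute force approach can be sketched for a single topology. The code below counts only feed-forward loops (a->b, b->c, a->c) rather than all subgraphs of size m, randomizes the graph with edge swaps that preserve every node's in-degree and out-degree, and compares the counts. Function names, the toy graph, and the number of swaps and randomizations are hypothetical choices. The input is assumed to be a list of unique directed edges.

```python
import random

def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    succ = {}
    for u, v in edges:
        if u != v:  # ignore self-loops
            succ.setdefault(u, set()).add(v)
    count = 0
    for a, targets in succ.items():
        for b in targets:
            for c in succ.get(b, ()):
                if c != a and c in targets:
                    count += 1
    return count

def degree_preserving_shuffle(edges, n_swaps, rng):
    """Swap edge endpoints: (a->b, c->d) becomes (a->d, c->b) when that is valid."""
    edges = list(edges)
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue  # the swap would create a self-loop or a duplicate edge
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def motif_enrichment(edges, n_random=100, seed=0):
    """Observed count, mean randomized count, and fraction of randomized >= observed."""
    rng = random.Random(seed)
    observed = count_feed_forward_loops(edges)
    random_counts = [
        count_feed_forward_loops(degree_preserving_shuffle(edges, 10 * len(edges), rng))
        for _ in range(n_random)
    ]
    fraction_at_least = sum(c >= observed for c in random_counts) / n_random
    return observed, sum(random_counts) / n_random, fraction_at_least

# Hypothetical toy regulatory network containing one feed-forward loop (A, B, C).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E"), ("A", "E")]
print(motif_enrichment(toy_edges))
```

A graph this small cannot support a meaningful enrichment statistic. The example only shows the count, randomize, recount loop.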

22. Gene regulatory network motifs
   - AP Boyle et al. Nature 512, 453-456 (2014) doi:10.1038/nature13668

23. Network modules
   - Modules: dense (highly-connected) subgraphs (e.g., large cliques or partially incomplete cliques)
   - Problem: Identify the component modules of a network
   - Difficulty: definition of module is not precise
     - Hierarchical networks have modules at multiple scales
     - At what scale to define modules?

24. How to define a computational “pathway”
   - Given:
     - Partially directed network of known physical interactions (e.g. PPI, kinase-substrate, TF-gene)
     - Scores on source nodes
     - Scores on target nodes
   - Do:
     - Return directed paths in the network connecting sources to targets

25. Network flow problem
   - Finding an optimal route by minimizing transportation costs from LA to NYC
   - $c_{i,j}$: the cost between City i and City j
   - $f_{i,j} = 1$ if the edge from City i to City j is in the route, $= 0$ if not
   - $\arg\min_f \sum_{i,j} c_{i,j} f_{i,j}$ s.t. constraints
   - Map: https://www.visualcapitalist.com/u-s-interstate-highways-transit-map/
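
With 0/1 route indicators and non-negative costs, this routing example is a shortest path problem. The sketch below solves it with Dijkstra's algorithm, which is a technique chosen for illustration, not something the slides prescribe. The intermediate cities and all costs are made up.

```python
import heapq

def cheapest_route(costs, start, goal):
    """Dijkstra's algorithm; costs maps (city_i, city_j) -> non-negative cost."""
    neighbors = {}
    for (i, j), c in costs.items():
        neighbors.setdefault(i, []).append((j, c))
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, city = heapq.heappop(queue)
        if city == goal:
            break
        if dist > best.get(city, float("inf")):
            continue  # stale queue entry
        for nxt, c in neighbors.get(city, []):
            new_dist = dist + c
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                previous[nxt] = city
                heapq.heappush(queue, (new_dist, nxt))
    if goal not in best:
        return None, []
    route = [goal]
    while route[-1] != start:
        route.append(previous[route[-1]])
    route.reverse()
    return best[goal], route

# Hypothetical directed road costs (illustrative numbers only).
costs = {
    ("LA", "Denver"): 10, ("LA", "Dallas"): 14,
    ("Denver", "Chicago"): 10, ("Dallas", "Chicago"): 9,
    ("Chicago", "NYC"): 8, ("Dallas", "NYC"): 16,
}
total_cost, route = cheapest_route(costs, "LA", "NYC")
# f[i, j] = 1 for edges on the chosen route and 0 otherwise.
f = {edge: int(edge in set(zip(route, route[1:]))) for edge in costs}
print(total_cost, route)
```

The indicator dictionary `f` mirrors the slide's notation: the minimized objective is the sum of `costs[e] * f[e]` over all edges.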

26. ResponseNet optimization goals
   - Connect screen hits and differentially expressed genes
   - Recover sparse connections
   - Identify intermediate proteins missed by the screens
   - Prefer high-confidence interactions
   - Minimum cost flow formulation can meet these objectives

27. Construct the interaction network
   - Figure labels: Protein, Gene

28. Transform to a flow problem
   - Figure labels: S (source), T (target)

29. Max flow on graphs
   - Figure labels: S (source), T (target)
   - Pump flow from source
   - Flow conserved to target
   - Incoming and outgoing flow conserved at each node
   - Each edge can tolerate a different level of flow or have a different preference for sending flow along that edge
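
The flow constraints above can be illustrated with a small maximum flow computation. This sketch uses the Edmonds-Karp algorithm (breadth-first augmenting paths), chosen here for brevity. The graph, capacities, and function name are hypothetical, and the input is assumed to have no pair of edges running in opposite directions between the same two nodes.

```python
from collections import deque

def max_flow(capacity, source, target):
    """Edmonds-Karp max flow; capacity maps (u, v) -> non-negative capacity."""
    residual = {}
    adj = {}
    for (u, v), c in capacity.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)  # reverse arc used to undo flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    while True:
        # Breadth-first search for a source-to-target path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            break
        path = []
        v = target
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        total += bottleneck
    flow = {e: capacity[e] - residual[e] for e in capacity if capacity[e] > residual[e]}
    return total, flow

# Hypothetical unit-capacity network between a source S and a target T.
capacity = {("S", "A"): 1, ("S", "B"): 1, ("A", "C"): 1, ("B", "C"): 1,
            ("C", "T"): 1, ("A", "T"): 1}
total, flow = max_flow(capacity, "S", "T")
print(total, sorted(flow))
```

In the returned `flow`, the amount entering each intermediate node equals the amount leaving it, and no edge carries more than its capacity.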

30. Weighting interactions
   - Probability-like confidence of the interaction
   - Example evidence:
     - edge score of 1.0
     - 16 distinct publications supporting the edge
   - iRefWeb

31. Weights and capacities on edges
   - Figure labels: S (source), T (target); each edge is labeled (w_ij, c_ij)
   - w_ij: from interaction network confidence
   - c_ij = 1: flow capacity

32. Find the minimum cost flow
   - Figure labels: S (source), T (target)
   - Prefer no flow on the low-weight edges if alternative paths exist
   - Return the edges with non-zero flow

33. Formal minimum cost flow
   - Positive flow on an edge incurs a cost
   - Cost is greater for low-weight edges
   - Equation annotations: flow on an edge; parameter controlling the amount of flow from the source

34. Formal minimum cost flow
   - Flow coming in to a node equals flow leaving the node

35. Formal minimum cost flow
   - Flow leaving the source equals flow entering the target

36. Formal minimum cost flow
   - Flow is non-negative and does not exceed edge capacity

37. Formal minimum cost flow
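
The equations for these slides are not in the transcript. The block below is a reconstruction from the slide annotations, assuming the form in which the ResponseNet objective is usually written. The cost $-\log(w_{ij})$, the set names $\mathrm{Gen}$ (nodes fed by the source, i.e. screen hits) and $\mathrm{Tra}$ (nodes draining to the target, i.e. differentially expressed genes), and the symbol $\gamma$ are assumptions, not taken from the slides.

```latex
\begin{aligned}
\min_{f}\quad
  & \sum_{(i,j) \in E} -\log(w_{ij})\, f_{ij}
    \;-\; \gamma \sum_{i \in \mathrm{Gen}} f_{Si} \\
\text{s.t.}\quad
  & \sum_{j} f_{ij} - \sum_{j} f_{ji} = 0
    \qquad \forall\, i \in V \setminus \{S, T\} \\
  & \sum_{i \in \mathrm{Gen}} f_{Si} - \sum_{i \in \mathrm{Tra}} f_{iT} = 0 \\
  & 0 \le f_{ij} \le c_{ij}
    \qquad \forall\, (i,j) \in E
\end{aligned}
```

The first term charges more for flow on low-weight edges, and the $\gamma$ term rewards flow leaving the source, so larger $\gamma$ admits more (and costlier) paths. The three constraints correspond in order to flow conservation at each node, equal flow out of the source and into the target, and the capacity bounds.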

38. Linear programming
   - Optimization problem is a linear program
   - Canonical form (figure: Wikipedia)
   - Polynomial time complexity
   - Many off-the-shelf solvers
   - Practical Optimization: A Gentle Introduction
     - Introduction to linear programming
     - Simplex method
     - Network flow
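
For small examples the minimum cost flow can also be found without an LP solver. The sketch below swaps in a successive shortest paths method (Bellman-Ford on the residual graph) as a dependency-free alternative. It assumes a per-unit edge cost of -log(weight) and a reward of `gamma` per unit of flow leaving the source, so it keeps augmenting only while the cheapest remaining path costs less than `gamma`. The function name, node names, weights, and `gamma` values are hypothetical, and `source` must differ from `target`.

```python
import math

def min_cost_flow_with_reward(edges, source, target, gamma):
    """edges: {(u, v): (weight, capacity)} with 0 < weight <= 1.

    Returns {(u, v): flow} for the edges that carry positive flow.
    """
    graph = {}    # node -> list of arcs [head, residual capacity, cost, reverse index]
    forward = {}  # original edge -> its forward arc

    def add_arc(u, v, cap, cost):
        fwd = graph.setdefault(u, [])
        rev = graph.setdefault(v, [])
        fwd.append([v, cap, cost, len(rev)])
        rev.append([u, 0.0, -cost, len(fwd) - 1])
        return fwd[-1]

    for (u, v), (weight, cap) in edges.items():
        forward[(u, v)] = add_arc(u, v, float(cap), -math.log(weight))

    while True:
        # Bellman-Ford: cheapest residual path (reverse arcs have negative cost).
        dist = {node: math.inf for node in graph}
        dist[source] = 0.0
        parent = {}
        for _ in range(len(graph)):
            changed = False
            for u, arcs in graph.items():
                if dist[u] == math.inf:
                    continue
                for idx, (v, cap, cost, _rev) in enumerate(arcs):
                    if cap > 1e-12 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v] = dist[u] + cost
                        parent[v] = (u, idx)
                        changed = True
            if not changed:
                break
        if dist.get(target, math.inf) >= gamma:
            break  # another unit of flow would cost at least as much as its reward
        bottleneck = math.inf
        v = target
        while v != source:
            u, idx = parent[v]
            bottleneck = min(bottleneck, graph[u][idx][1])
            v = u
        v = target
        while v != source:
            u, idx = parent[v]
            arc = graph[u][idx]
            arc[1] -= bottleneck
            graph[v][arc[3]][1] += bottleneck
            v = u

    flows = {e: edges[e][1] - arc[1] for e, arc in forward.items()}
    return {e: f for e, f in flows.items() if f > 1e-9}

# Hypothetical network: (weight, capacity) per edge, all capacities 1.
edges = {("S", "A"): (1.0, 1), ("S", "B"): (1.0, 1),
         ("A", "X"): (0.9, 1), ("A", "Y"): (0.3, 1), ("B", "Y"): (0.8, 1),
         ("X", "T"): (1.0, 1), ("Y", "T"): (1.0, 1)}
flow_large_gamma = min_cost_flow_with_reward(edges, "S", "T", gamma=0.5)
flow_small_gamma = min_cost_flow_with_reward(edges, "S", "T", gamma=0.2)
print(sorted(flow_large_gamma))
print(sorted(flow_small_gamma))
```

In this toy network the low-weight edge A->Y never carries flow because higher-weight alternatives exist, and lowering `gamma` from 0.5 to 0.2 drops the second path. An off-the-shelf LP solver applied to the same objective and constraints would be the more usual route for real networks.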

39. ResponseNet pathways
   - Identifies pathway members that are neither hits nor differentially expressed
   - Ste5 recovered when STE5 deletion is the perturbation

40. ResponseNet summary
   - Advantages
     - Computationally efficient
     - Integrates multiple types of data
     - Incorporates interaction confidence
     - Identifies biologically plausible networks
   - Disadvantages
     - Direction of flow is not biologically meaningful
     - Path length not considered
     - Requires sources and targets
     - Dependent on completeness and quality of input network

41. Evaluating pathway predictions
   - Unlike PIQ, we don’t have a complete gold standard available for evaluation
   - Can simulate “gold standard” pathways from a network
   - Compare relative performance of multiple methods on independent data

42. Evaluating pathway predictions
   - Ritz 2016: https://www.nature.com/articles/npjsba20162.pdf

43. Evaluating pathway predictions
   - Ritz 2016

44. Evaluating pathway predictions
   - MacGilvray 2018
   - PR curves can evaluate node or edge recovery but not the global pathway structure
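
As a minimal sketch of edge recovery evaluation, the code below computes precision and recall at each cutoff of a confidence-ranked list of predicted edges against a reference edge set. The edges are hypothetical, edges are treated as undirected, and the ranked list is assumed to contain no duplicates.

```python
def precision_recall_points(ranked_edges, gold_edges):
    """(precision, recall) after each prefix of a confidence-ranked edge list."""
    gold = {frozenset(edge) for edge in gold_edges}  # undirected comparison
    points = []
    true_positives = 0
    for rank, edge in enumerate(ranked_edges, start=1):
        if frozenset(edge) in gold:
            true_positives += 1
        points.append((true_positives / rank, true_positives / len(gold)))
    return points

# Hypothetical predictions (most confident first) and reference pathway edges.
ranked_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
gold_edges = [("B", "A"), ("C", "D"), ("E", "F")]

pr_points = precision_recall_points(ranked_edges, gold_edges)
for precision, recall in pr_points:
    print(round(precision, 3), round(recall, 3))
```

Each point scores edges independently, which is why a curve like this says nothing about whether the recovered edges form a connected pathway.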

45. Evaluation beyond pathway databases
   - Natural language processing can also help semi-automated evaluation
   - Literome
   - Chilibot
   - iHOP

46. Classes of pathway prediction algorithms

47. Classes of pathway prediction algorithms

48. Alternative pathway identification algorithms
   - k-shortest paths
     - Ruths 2007
     - Shih 2012
   - Random walks / network diffusion / circuits
     - Tu 2006
     - eQTL electrical diagrams (eQED)
     - HotNet
   - Integer programs
     - Signaling-regulatory Pathway INferencE (SPINE)
     - Chasman 2014

49. Alternative pathway identification algorithms
   - Path-based objectives
     - Physical Network Models (PNM)
     - Maximum Edge Orientation (MEO)
     - Signaling and Dynamic Regulatory Events Miner (SDREM)
   - Steiner tree
     - Prize-collecting Steiner forest (PCSF)
     - Belief propagation approximation (msgsteiner)
     - Omics Integrator implementation
   - Hybrid approaches
     - PathLinker: random walk + shortest paths
     - ANAT: shortest paths + Steiner tree

50. Recent developments in pathway discovery
   - Multi-task learning: jointly model several related biological conditions
     - ResponseNet extension: SAMNet
     - Steiner forest extension: Multi-PCSF
     - SDREM extension: MT-SDREM
   - Temporal data
     - ResponseNet extension: TimeXNet
     - Steiner forest extension and ST-Steiner
     - Temporal Pathway Synthesizer

51. Graph embedding for biological networks
   - Bioinformatics, Volume 36, Issue 4, 15 February 2020, Pages 1241–1251, https://doi.org/10.1093/bioinformatics/btz718

52. Condition-specific genes/proteins used as input
   - Genetic screen hits (as causes or effects)
   - Differentially expressed genes
   - Transcription factors inferred from gene expression
   - Proteomic changes (protein abundance or post-translational modifications)
   - Kinases inferred from phosphorylation
   - Genetic variants or DNA mutations
   - Enzymes regulating metabolites
   - Receptors or sensory proteins
   - Protein interaction partners
   - Pathway databases or other prior knowledge