
expression mapping covering whole genomes during disease, we seek compu - PDF document

Uploaded On 2016-07-23




Presentation Transcript

[...] expression mapping covering whole genomes during [...] disease, we seek computational methods to [...] sets. Is it possible, in principle, to completely infer a complex regulatory network architecture from [...] patterns of its variables? We investigated this [...] of genetic networks. Trajectories, or state transition tables of Boolean [...] information between input states and output states, one is able to infer the [...] of input elements controlling each element or gene in the network. This [...] exact for complete state transition tables. We implemented [...] (REVEAL) in a C [...] and found the problem to be tractable within the conditions tested so far. For n=50 and k=3 (inputs per element), the analysis of incomplete state transition tables (100 state transition pairs out of a possible $2^{50} \approx 10^{15}$) reliably [...] wiring sets. While this study is limited to [...] networks, the algorithm is generalizable to [...] application to realistic biological [...] sets. The ability to adequately solve the inverse problem may enable in-depth analysis of [...]

http://www.smi.stanford.edu/projects/helix/psb98/liang.pdf

Virtually all [...] molecular inputs and outputs, forming a complex feedback network. The information for the construction and maintenance of this signaling system is stored in the genome. The DNA [...] for the dynamics of RNA and [...] biochemical recognition or signaling processes. The regulatory molecules that control the expression of [...] themselves the products of [...] and within a proximal genetic network of [...] complex webs involving various intra- and [...] systems [...] on the one hand depend on the expression of the genes that encode them, and on the other hand control the expression of genes as the signals terminate at transcriptional regulation. All in all, the information in the DNA [...] dynamics of [...], the state of which at a particular time point should be reflected in gene expression patterns ([...] Sniegoski, 1996).
We [...] the basic tools to [...] with inferring the [...] from time series or state transition sets (Somogyi et al. [...]).

A rational approach to designing genetic network analysis tools is based on generating model systems on which the performance of the tools can be tested. The [...] Genes correspond to [...] correspond to the result of a signaling [...], resulting in [...] according to Boolean rules. Given a set of elements, wiring, and rules, a particular trajectory or the state [...] final repeating state cycle, for the simple reason that the network only has $2^N$ states ([...] must be found). An attractor may be a single state (corresponding to a "steady state") or may comprise several [...]

Fig. 1 A simple Boolean network. a) Wiring diagram. b) Logical (Boolean) rules. c) Complete state [...] the state at time=t, the output column ([...] elements [...]

    input      output
    A B C      A' B' C'
    0 0 0      0  0  0
    0 0 1      0  1  0
    0 1 0      1  0  0
    0 1 1      1  1  1
    1 0 0      0  1  0
    1 0 1      0  1  1
    1 1 0      1  1  1
    1 1 1      1  1  1

[...] may be envisioned as the response to injury, or even adaptation to change in nutrient environment in bacteria (see Kauffman, 1993; Somogyi and [...]).

Testing of algorithms for extracting network [...] transition measurements will require knowledge of the original network that was to be inferred. This is not yet possible with living systems. Using [...] we make about living genetic networks regarding size, connectivity, redundancy and complexity, we simulate these conditions in [...] works against these [...] confidence in [...] warranted.

Below we use systematic mutual information analysis of [...] provides us [...] quantitative information measure, the Shannon entropy, H. The Shannon entropy is defined in terms of the probability of [...] symbol or event, $p_i$, within a given [...]:

    $H = -\sum_i p_i \log p_i$

A few illustrations (Figs. 2 & 3) of a binary system shall help explain the behavior of H. In a binary system, an element, X, may be in either of s=2 states, on or off. Over a particular sequence of events (Fig. 2a), the sum of the probabilities of X being on, p(1), or off, p(0), must be equal to unity, so that

    $H(X) = -p(0) \log p(0) - p(1) \log p(1)$

H reaches its maximum when the on and off states are equiprobable (Fig. 3a), i.e. the system is using [...] information carrying state to its fullest possible extent. As one state becomes more probable than the other, H decreases; the system is becoming biased. In the [...] one probability is unity [...], [...] occurs when [...] measured in "bits" (binary digits), when using the [...]

Our aim is to [...] to establish functional [...] elements of a network. In a system of 2 binary elements, X (index i) and Y (index j), [...] essentially as above:

    $H(X) = -\sum_i p_i \log p_i$
    $H(Y) = -\sum_j p_j \log p_j$
    $H(X,Y) = -\sum_{i,j} p_{i,j} \log p_{i,j}$

There are 2 [...] sequences of X and Y, [...] are related as

    $H(X,Y) = H(Y|X) + H(X) = H(X|Y) + H(Y)$

In words, the uncertainty of X and the remaining uncertainty of Y given [...] of X, H(Y|X), i.e. the information in Y that is not shared with X, sum to the [...]

We can find an expression for the shared or [...] referred to as "rate of [...]" between an [...]:

    $M(X,Y) = H(Y) - H(Y|X) = H(X) - H(X|Y)$

[...] between X and Y corresponds to the remaining information of X if we remove the information of X that is not shared with Y. Using the [...] directly in terms of the original entropies:

    $M(X,Y) = H(X) + H(Y) - H(X,Y)$

This formulation will be important for the [...]

Fig. 2 Determination of H. a) Single [...] calculated from frequency of [...] values of X and Y. b) Distribution of [...] pairs. H is [...]

    X: 0 1 1 1 1 1 1 0 0 0
    Y: 0 0 0 1 1 0 0 1 1 1
    Pair counts (X,Y): (0,0)=1, (0,1)=3, (1,0)=4, (1,1)=2

Fig. 3 [...] (axis from 0 to 1 in steps of 0.25)

[...] diagrams of Fig. 4 illustrate the [...] these measures. We will use these information principles to [...] the critical connections [...] network elements from binary network state [...] explored in the classification of [...] (Somogyi & Fuhrman, [...]).

The core of REVEAL: [...] strategy of our algorithm is to use mutual information measures to [...] the wiring relationships from state transition [...] lead to [...] tables of the rules. We shall explain the [...] (Fig. 5) in the analysis of the network example of [...]. We begin by determining the mutual information matrix (Fig. 5) of all [...] the output state of an element, A'. If [...]

Fig. 4 Venn diagrams of [...] relationships (panel labels: H(X) + H(Y), H(Y), H(X|Y), M(X,Y)). In [...] add the [...] portions of both squares to determine one of the [...]. The small corner rectangles [...] information that X and Y have in common.
H(Y) is [...] with the corner rectangle on the left instead of the right to [...]

[...] H(X)=H(A',X), i.e. it is not even necessary to calculate M(A',X) explicitly, making the computation marginally faster. The measurement of M must be precise in the analysis of many state transition pairs, i.e. the determination of p ([...] probability) must be able to distinguish a change of [...] (T=number of state [...]).

Fig. 5 Outline of progressive M-analysis underlying [...] shown in Fig. 1. Hs and Ms are calculated from the look-up tables according to the [...] network wiring is extracted by M-analysis (left, [...]

    $H(X) = -\sum_x p(x) \log p(x)$
    $H(X,Y) = -\sum_{x,y} p(x,y) \log p(x,y)$
    $M(X,Y) = H(X) + H(Y) - H(X,Y)$
    $M(X,[Y,Z]) = H(X) + H(Y,Z) - H(X,Y,Z)$

    Input entropies: H(A)=1.00, H(B)=1.00, H(C)=1.00

    Output A', H(A')=1.00
    inputs     H(A',inputs)   M(A',inputs)   M / H(A')
    A          2.00           0.00           0.00
    B          1.00           1.00           1.00
    C          2.00           0.00           0.00

    Output B', H(B')=0.81
    inputs     H(B',inputs)   M(B',inputs)   M / H(B')
    A          1.50           0.31           0.38
    B          1.81           0.00           0.00
    C          1.50           0.31           0.38
    [A,B]      2.50           0.31           0.38
    [B,C]      2.50           0.31           0.38
    [A,C]      2.00           0.81           1.00

    Output C', H(C')=1.00
    inputs     H(C',inputs)   M(C',inputs)   M / H(C')
    A          1.81           0.19           0.19
    B          1.81           0.19           0.19
    C          1.81           0.19           0.19
    [A,B]      2.50           0.50           0.50
    [B,C]      2.50           0.50           0.50
    [A,C]      2.50           0.50           0.50
    [A,B,C]    3.00           1.00           1.00

    Extracted look-up tables: A' from input B; B' from inputs A, C; C' from inputs A, B, C.

Pacific Symposium on Biocomputing 3:18-29 (1998)

The look-up table for the input/output pairs (Fig. 1) constitutes the rule.

If not all state transitions can be explained in terms of k=1, we will [...] entropies of input pair combinations with the remaining unsolved [...], or more concisely, H(B',X,Y)=H(X,Y), then the pair [X,Y] completely determines B'. In Fig. 5, no single input can predict B', but the pair [A,C] [...]

If not all elements can be resolved in terms of k=1 and k=2, the next step is to [...] the entropies of input triplet combinations with the remaining output elements. In our [...] (Fig. 5), since [...] This applies to networks of any size.
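The entropy and mutual information measures, and the progressive k=1, k=2, k=3 search described above, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' C implementation: the function names (`entropy`, `mutual_information`, `step`, `infer_inputs`) are hypothetical, and the Boolean rules inside `step` are an assumption read off the state transition table of Fig. 1. The sequences `X` and `Y` are those of Fig. 2.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in Counter(symbols).values())


def mutual_information(xs, ys):
    """M(X,Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Sequences of Fig. 2.
X = [0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
Y = [0, 0, 0, 1, 1, 0, 0, 1, 1, 1]


def step(state):
    """One synchronous update of the Fig. 1 network (rules assumed from its table)."""
    a, b, c = state
    return (b, a | c, (a & b) | (b & c) | (a & c))


NAMES = "ABC"
inputs = list(product((0, 1), repeat=3))   # complete set of 2^3 input states
outputs = [step(s) for s in inputs]        # corresponding output states


def infer_inputs(target, inputs, outputs, tol=1e-9):
    """Return the smallest input set that fully determines output `target`.

    Uses the criterion H(out, ins) = H(ins), which is equivalent to
    M(out, ins) = H(out). Input sets are tried for k = 1, 2, ... in turn.
    """
    out = [o[target] for o in outputs]
    n = len(inputs[0])
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            ins = [tuple(s[i] for i in combo) for s in inputs]
            if abs(entropy(list(zip(out, ins))) - entropy(ins)) < tol:
                return combo
    return None


wiring = {
    NAMES[t] + "'": "".join(NAMES[i] for i in infer_inputs(t, inputs, outputs))
    for t in range(3)
}
print(round(entropy(X), 2), round(entropy(Y), 2), round(mutual_information(X, Y), 2))
print(wiring)
```

Testing H(out, ins) against H(ins) rather than computing M explicitly mirrors the shortcut noted in the text. The search stops at the first input set that satisfies the criterion, so it reports the wiring with the minimum k.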
If not all trajectories can be explained in terms of k=i-1, i-2, ..., 1 inputs, the [...] pursues the entropies of i-let [...]. If $M(Y,[X_1,X_2,X_3,\ldots,X_i]) = H(Y)$, or $H(Y,X_1,\ldots,X_i) = H(X_1,\ldots,X_i)$, then the i-let $[X_1,X_2,X_3,\ldots,X_i]$ completely determines Y (Y=output element). Naturally, the input [...] value combinations of the state transition tables covering Y and $[X_1,X_2,X_3,\ldots,X_i]$ define the look-up table of the rule.

The advantage of this algorithm is that simple networks can be calculated quickly just by comparing Hs of state transition pairs. The algorithm will [...] the Hs for higher k only as required. Of course, as k increases, the calculations of [...] below). The goal is obviously to minimize the number of computationally intensive operations. We [...] (e.g. minimization of k [...] based on probable rule restrictions. Moreover, REVEAL is [...] biologically feasible rule [...] 2 input rules, many do not depend on one or more of their [...] equivalent to a [...] k; Somogyi & Fuhrman, 1997). Since we can only detect rules that truly depend on all of their inputs, the best we can do is to infer the rule with minimum k. For this reason, the k-input rules we used in [...] rules that truly depend on all their inputs. Two of the ten two-input rules, exclusive or and [...], may be unlikely to [...] behavior in networks (atypical for biological networks), [...] would be difficult to encode in biomolecular interactions. One may [...] For k=3 rules, there are 218 rules of k=3, 30 of k=2, 6 of k=1 and 2 of k=0. Of course, we [...] construction of model networks to rules of an effective k. Moreover, k=3 rules may be [...] according to [...]

In order to infer the rule for a particular gene, our strategy is to first test if it is an effective one-input rule. Since the input gene could be anywhere in the network, there [...] possible inputs [...] one is tested in turn. For genes whose output is not determined solely by any one [...] effective k for the rule of that gene is [...] than one. We next [...] gene is determined by a [...] N(N-1)/2 pairs of possible inputs for a two-input rule.
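The rule counts quoted in the text (2 one-input, 10 two-input and 218 three-input rules that truly depend on all of their inputs) and the size of the wiring search space can be checked by brute-force enumeration. The sketch below is an illustration only; `depends_on_all_inputs` and `count_effective_rules` are hypothetical helper names, and N=50 is taken from the network size used in the paper.

```python
from itertools import product
from math import comb


def depends_on_all_inputs(table, k):
    """True if the k-input Boolean rule `table` depends on every input.

    `table` lists the 2**k outputs in the order produced by
    itertools.product((0, 1), repeat=k).
    """
    states = list(product((0, 1), repeat=k))
    index = {s: i for i, s in enumerate(states)}
    for i in range(k):
        # Input i matters if flipping it changes the output for some state.
        flips_output = any(
            table[index[s]] != table[index[s[:i] + (1 - s[i],) + s[i + 1:]]]
            for s in states
        )
        if not flips_output:
            return False
    return True


def count_effective_rules(k):
    """Number of k-input Boolean rules whose effective k equals k."""
    return sum(
        depends_on_all_inputs(table, k)
        for table in product((0, 1), repeat=2 ** k)
    )


effective = {k: count_effective_rules(k) for k in range(4)}
print(effective)

# Number of candidate input sets per gene in a network of N genes.
N = 50
candidates = {k: comb(N, k) for k in (1, 2, 3)}
print(candidates)
```

Among the 256 possible three-input tables, the remaining 38 collapse to a lower effective k (30 + 6 + 2), which is why only the minimum-k rule can be inferred. The candidate counts show how quickly the number of wirings to test grows with k.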
For each of these input pairs, we use the M-analysis to [...] the input pair for the gene. In [...] $\binom{N}{k} = \frac{N!}{(N-k)!\,k!}$ possible inputs for a k-input rule. All of the input [...]

In principle, the information theoretic computation [...] of the state transition pairs of the network for each of [...] possible wirings of the k-input rule. For a network of [...] (N=50), the number of configurations becomes [...] large to compute. Fortunately, the [...] for determining the causality relationship in the M-analysis is [...] using any finite input set [...] that the same set is used in [...] involved. For example, the criterion that C' is the output of A and B is H([A,B],C')=H(A,B). A finite set of patterns [A,B] can be used to construct a 4-bit histogram. From the histogram, we obtain the probabilities needed to compute H(A,B). H([A,B],C') is computed from an 8-bit histogram using the same set of input patterns with the output [...] [A,B] is [...] will be [...] number of mis-identifications because of incidental degeneracy. In order to [...] the size of the sample set needed to uniquely identify the right wiring for a gene, we compute the probability of mis-identification as a function of increasing the sample [...] used in computing the histogram. The probability is computed by counting the number of [...] input wirings normalized to [...] wirings for a k-input rule. Fig. 6 shows that the probability [...]. With a [...], much smaller than the total number of all [...]

The networks used in testing REVEAL [...] using a mixture of [...]

Fig. 6 Reduction of mis-identified network wiring [...] number of erroneous wirings identified by the M-analysis (normalized) versus the number of state [...] (horizontal axis from 0 to 100) [...] k value [...] The data was [...] Note that a correct solution is always found; this is subtracted from the [...]

When a gene is assigned a k-input rule, one of the k-input rules is selected for the gene at random from all eligible rules. In the case of Fig. 7, all the rules that truly depend on [...] There are 2 such one-input rules, 10 such two-input rules and [...] assigned to genes for all 150 [...] =3, all the rules [...] solution) when the number of state transition [...] reaches 100. For one-input and two-input rules, the perfect solution is [...] =60 respectively for all the genes in 150 networks. When S is smaller than these limit values, some genes are allocated more than one set of inputs by the [...] there is more than one solution. The number of solutions as a function of the number of state transition pairs was discussed in [...]

Fig. 7 Convergence of solution in random network. [...] probability of not [...] 150 random wirings for a network with 50 [...] network is constructed with one third each of effective k-input rules. As more [...] exponentially at large S after a relatively flat plateau (horizontal axis from 0 to 100). [...] =100 for k=3; at [...] =60 for k=2; and at [...]

We have shown that REVEAL performs well for networks of low k (number of [...]). For higher k, the algorithm should be [...] through a) [...] and b) [...] search efficiency of [...] e.g. by taking maximal advantage of wiring and rule constraints. We [...]

Boolean networks are based on the notion that biological networks can be represented by binary, synchronously updating switching networks. In [...] continuously in time. This can be approximated by asynchronous Boolean [...] (reviewed in Thomas, 1998), or continuous [...] the structure of logical switching networks (Glass, 1975). The issue of determining the logical structure of a continuous network based on knowledge of the transitions was addressed in a previous work on oscillating neural networks (Glass and Young, 1979).

The point of REVEAL is to [...] inference on [...] available, mutual information. [...] we concentrated on idealized Boolean networks, mutual information measures can be applied to [...] sets. Of course, [...] are introduced, flexibility will also be found in the timing.
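The remark that mutual information measures are not tied to binary data can be illustrated with a small multi-state example. The sketch below is an assumption-laden toy, not part of the paper: it uses three elements with three levels each and a hypothetical rule A' = max(B, C), and applies the same criterion H(out, ins) = H(ins) used for the Boolean case. The helper names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in Counter(symbols).values())


def minimal_inputs(states, out, tol=1e-9):
    """Smallest set of element indices that fully determines `out`.

    The test H(out, ins) = H(ins) is the same as in the binary case;
    nothing in it assumes two-valued elements.
    """
    n = len(states[0])
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            ins = [tuple(s[i] for i in combo) for s in states]
            if abs(entropy(list(zip(out, ins))) - entropy(ins)) < tol:
                return combo
    return None


LEVELS = (0, 1, 2)                          # three expression levels (assumed)
states = list(product(LEVELS, repeat=3))    # all 3^3 states of elements A, B, C
out = [max(b, c) for _, b, c in states]     # hypothetical rule: A' = max(B, C)

found = minimal_inputs(states, out)
print(len(states), round(entropy(out), 4), found)
```

With three levels the complete table already has 27 rows instead of 8, which gives a feel for how quickly the number of possible state transitions grows once more than two states are allowed.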
Since continuous behavior can be approximated by [...] sufficient resolution, REVEAL could be applied to appropriately [...] However, the introduction of multiple states will [...] the number of theoretically possible state transitions; network [...] therefore be [...] carefully when generalizing REVEAL to multivalued networks. For example, integration of cluster analysis for the [...] of shared [...] (currently applied to large scale gene expression [...]; see Michaels et al. [...]) could quickly identify wiring constraints and [...]

Finally, as REVEAL or potential successors become more refined, we [...] to consider the data sets that must be generated to allow maximal depth of [...] The algorithm relies on the analysis of state transitions or temporal responses of [...] parameters!) to [...] or internal changes will be the [...] need to be acquired and [...] How many perturbations will be necessary to capture [...] many states (if more than binary) need to be attributed to biological signaling networks [...]

[...] the support of NASA Cooperative [...] NCC2-974. We [...]

Glass, L. (1975) Classification of Biological Networks by Their Qualitative [...] 54:85-107.
Glass, L. and Young, R. (1979) Structure [...] 179:207-218.
Kauffman [...] (1993) The Origins of [...] Selection in [...]
Michaels [...], Askenazi, M., Somogyi, R. (1998) Cluster Analysis [...] Data Visualization of [...]
[...] Santa Fe Institute Working [...]
[...] Weaver, W. (1963) The Mathematical Theory of [...]
Somogyi, R. and Fuhrman, S. (1997) [...] network measure, or why the whole is more than the sum of [...] International Workshop on Information Processing in Cells [...]
Somogyi, R., Fuhrman, S., [...] Wuensche, A. [...] the Extraction of Genetic Network [...] Proc. Second World Congress of Nonlinear [...]. Elsevier.
Thomas, R. (1998) [...] Qualitative Analysis of [...]
[...] (1992) The Global Dynamics of Cellular Automata, [...]