Bayes Net Syntax and Semantics


Presentation Transcript

1. Bayes Net Syntax and Semantics

2. Bayes Net Syntax

- A set of nodes, one per variable Xi
- A directed, acyclic graph
- A conditional distribution for each node given its parent variables in the graph
- CPT (conditional probability table): each row is a distribution for the child given values of its parents

Bayes net = Topology (graph) + Local Conditional Probabilities

Example: G is the parent of C1,1, C1,2, …, C3,3.

  P(G):
    (1,1)  (1,2)  (1,3)  …
    0.11   0.11   0.11   …

  P(C1,1 | G):
    G      g     y     o     r
    (1,1)  0.01  0.1   0.3   0.59
    (1,2)  0.1   0.3   0.5   0.1
    (1,3)  0.3   0.5   0.19  0.01
    …

3. Example: Alarm Network

Graph: Burglary -> Alarm, Earthquake -> Alarm, Alarm -> John calls, Alarm -> Mary calls.

  P(B):                 P(E):
    true   0.001          true   0.002
    false  0.999          false  0.998

  P(A | B,E):
    B      E      true   false
    true   true   0.95   0.05
    true   false  0.94   0.06
    false  true   0.29   0.71
    false  false  0.001  0.999

  P(J | A):              P(M | A):
    A      true  false     A      true  false
    true   0.9   0.1       true   0.7   0.3
    false  0.05  0.95      false  0.01  0.99

Number of free parameters in each CPT: with parent range sizes d1,…,dk and child range size d, each table row must sum to 1, so a CPT has (d−1) Πi di free parameters. For the five CPTs above: 1, 1, 4, 2, 2.
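
The parameter counts above are easy to verify mechanically. Below is a minimal Python sketch (not part of the original deck; the function name cpt_free_params is chosen here for illustration) that applies the (d−1) Πi di formula to the alarm network's CPTs.

```python
# Free parameters in a CPT: each row is a distribution over a d-valued
# child and must sum to 1, so it carries (d - 1) free numbers; there is
# one row per combination of parent values, giving (d - 1) * d1 * ... * dk.

def cpt_free_params(child_size, parent_sizes=()):
    rows = 1
    for d in parent_sizes:
        rows *= d
    return (child_size - 1) * rows

# Alarm network: every variable is Boolean, so d = 2 throughout.
counts = [
    cpt_free_params(2),          # P(B)     -> 1
    cpt_free_params(2),          # P(E)     -> 1
    cpt_free_params(2, (2, 2)),  # P(A|B,E) -> 4
    cpt_free_params(2, (2,)),    # P(J|A)   -> 2
    cpt_free_params(2, (2,)),    # P(M|A)   -> 2
]
print(counts, sum(counts))  # [1, 1, 4, 2, 2] 10
```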

4. General formula for sparse BNs

Suppose:
- n variables
- maximum range size is d
- maximum number of parents is k

The full joint distribution has size O(d^n), while the Bayes net has size O(n · d^k): linear scaling with n as long as the causal structure is local.
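
To make the scaling concrete (with illustrative numbers chosen here, not taken from the slide): for n = 30 Boolean variables (d = 2) with at most k = 3 parents each, the full joint has 2^30 ≈ 10^9 entries, while the Bayes net needs at most 30 × 2^3 = 240 conditional probability values.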

5. Bayes net global semantics

Bayes nets encode joint distributions as a product of conditional distributions, one per variable:

  P(X1,…,Xn) = Πi P(Xi | Parents(Xi))

6. Example

Using the alarm network CPTs from above (P(B), P(E), P(A|B,E), P(J|A), P(M|A)):

  P(b, ¬e, a, ¬j, ¬m) = P(b) P(¬e) P(a|b,¬e) P(¬j|a) P(¬m|a)
                      = 0.001 × 0.998 × 0.94 × 0.1 × 0.3
                      ≈ 0.000028
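
As a sanity check, this product can be reproduced with a few lines of Python. This is a sketch written for this transcript, not code from the course; the dictionary CPTs mirror the tables above, storing only P(child = true | parents) since all variables are Boolean.

```python
# CPTs for the alarm network; each conditional table stores
# P(child = true | parent values).
P_B = {True: 0.001, False: 0.999}                # prior over B
P_E = {True: 0.002, False: 0.998}                # prior over E
P_A = {(True, True): 0.95, (True, False): 0.94,  # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.9, False: 0.05}                   # P(J=true | A)
P_M = {True: 0.7, False: 0.01}                   # P(M=true | A)

# P(b, ¬e, a, ¬j, ¬m) = P(b) P(¬e) P(a|b,¬e) P(¬j|a) P(¬m|a)
p = (P_B[True] * P_E[False] * P_A[(True, False)]
     * (1 - P_J[True]) * (1 - P_M[True]))
print(p)  # 2.81436e-05, i.e. ~0.000028
```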

7. Conditional independence in BNs

Compare the Bayes net global semantics

  P(X1,…,Xn) = Πi P(Xi | Parents(Xi))

with the chain rule identity

  P(X1,…,Xn) = Πi P(Xi | X1,…,Xi−1)

Assume (without loss of generality) that X1,…,Xn are sorted in topological order according to the graph (i.e., parents before children), so Parents(Xi) ⊆ {X1,…,Xi−1}.

So the Bayes net asserts the conditional independences

  P(Xi | X1,…,Xi−1) = P(Xi | Parents(Xi))

To ensure these are valid, choose parents for node Xi that "shield" it from other predecessors.

8. Example: Burglary

With the node ordering Burglary, Earthquake, Alarm, we get the network and CPTs already seen: Burglary -> Alarm <- Earthquake, with P(B) (true: 0.001), P(E) (true: 0.002), and P(A|B,E) as in the alarm network above.

What do the structure and CPTs look like if the nodes are added in a different order? (???)

9. Example: Burglary

With the node ordering Alarm, Burglary, Earthquake, the network becomes Alarm -> Burglary, with Alarm and Burglary both parents of Earthquake, and we would need to fill in the tables P(A), P(B|A), and P(E|A,B) (entries left blank here: ??).

10. Conditional independence semantics

- Every variable is conditionally independent of its non-descendants, given its parents
- Conditional independence semantics <=> global semantics

11. Markov blanket

- A variable's Markov blanket consists of its parents, its children, and its children's other parents
- Every variable is conditionally independent of all other variables given its Markov blanket

12. Summary

- Independence and conditional independence are important forms of probabilistic knowledge
- Bayes nets encode joint distributions efficiently by taking advantage of conditional independence
- Global joint probability = product of local conditionals
- Local causality => exponential reduction in total size

13. CS 188: Artificial Intelligence

Bayes Nets: Exact Inference

Instructors: Stuart Russell and Dawn Song, University of California, Berkeley

14. Bayes Nets

Part I: Representation
Part II: Exact inference
- Enumeration (always exponential complexity)
- Variable elimination (worst-case exponential complexity, often better)
- Inference is NP-hard in general
Part III: Approximate inference
Later: Learning Bayes nets from data

15. Inference

Inference: calculating some useful quantity from a probability model (joint probability distribution).

Examples:
- Posterior marginal probability: P(Q | e1,…,ek). E.g., what disease might I have?
- Most likely explanation: argmaxq,r,s P(Q=q, R=r, S=s | e1,…,ek). E.g., what did he say?

16. Inference by Enumeration in Bayes Net

Reminder of inference by enumeration: any probability of interest can be computed by summing entries from the joint distribution:

  P(Q | e) = α Σh P(Q, h, e)

Entries of the joint distribution can be obtained from a BN by multiplying the corresponding conditional probabilities. For the alarm network:

  P(B | j, m) = α Σe Σa P(B, e, a, j, m) = α Σe Σa P(B) P(e) P(a|B,e) P(j|a) P(m|a)

So inference in Bayes nets means computing sums of products of numbers: sounds easy!!

Problem: sums of exponentially many products!
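
A brute-force version of this computation fits in a short Python sketch (again, an illustration written for this transcript, not course code): enumerate all values of the hidden variables E and A, sum the joint products, and normalize over B.

```python
from itertools import product

# Alarm-network CPTs, storing P(child = true | parents) as before.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.9, False: 0.05}
P_M = {True: 0.7, False: 0.01}

def joint(b, e, a, j, m):
    """One entry of the full joint: P(B)P(E)P(A|B,E)P(J|A)P(M|A)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

# P(B | j, m): sum out the hidden variables e, a; then normalize.
unnorm = {b: sum(joint(b, e, a, True, True)
                 for e, a in product([True, False], repeat=2))
          for b in (True, False)}
z = sum(unnorm.values())
print({b: p / z for b, p in unnorm.items()})
# {True: 0.284..., False: 0.715...}
# 2^2 = 4 products per value of B here; with more hidden variables
# the number of products grows exponentially.
```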

17. Can we do better?

Consider uwy + uwz + uxy + uxz + vwy + vwz + vxy + vxz
- 16 multiplies, 7 adds
- Lots of repeated subexpressions!

Rewrite as (u+v)(w+x)(y+z)
- 2 multiplies, 3 adds

The same thing happens here:

  Σe Σa P(B) P(e) P(a|B,e) P(j|a) P(m|a)
    =   P(B) P(e)  P(a|B,e)   P(j|a)  P(m|a)
      + P(B) P(e)  P(¬a|B,e)  P(j|¬a) P(m|¬a)
      + P(B) P(¬e) P(a|B,¬e)  P(j|a)  P(m|a)
      + P(B) P(¬e) P(¬a|B,¬e) P(j|¬a) P(m|¬a)

Lots of repeated subexpressions!

18. Variable elimination: The basic ideas

Move summations inwards as far as possible:

  P(B | j, m) = α Σe Σa P(B) P(e) P(a|B,e) P(j|a) P(m|a)
              = α P(B) Σe P(e) Σa P(a|B,e) P(j|a) P(m|a)

Do the calculation from the inside out, i.e., sum over a first, then sum over e.

Problem: P(a|B,e) isn't a single number; it's a bunch of different numbers depending on the values of B and e.

Solution: use arrays of numbers (of various dimensions) with appropriate operations on them; these are called factors.

19. Factor Zoo

20. Factor Zoo I

Joint distribution: P(X,Y)
- Entries P(x,y) for all x, y
- |X| × |Y| matrix
- Sums to 1

Example, P(A,J):
  A \ J   true   false
  true    0.09   0.01
  false   0.045  0.855

Projected joint: P(x,Y)
- A slice of the joint distribution
- Entries P(x,y) for one x, all y
- |Y|-element vector
- Sums to P(x)

Example, P(a,J):
  A \ J   true   false
  true    0.09   0.01

Number of variables (capitals) = dimensionality of the table

21. Factor Zoo II

Single conditional: P(Y | x)
- Entries P(y|x) for fixed x, all y
- Sums to 1

Example, P(J|a):
  A \ J   true  false
  true    0.9   0.1

Family of conditionals: P(X | Y)
- Multiple conditionals
- Entries P(x|y) for all x, y
- Sums to |Y|

Example, P(J|A):
  A \ J   true  false
  true    0.9   0.1     <- P(J|a)
  false   0.05  0.95    <- P(J|¬a)

22. Operation 1: Pointwise product

First basic operation: pointwise product of factors (similar to a database join, not matrix multiplication!)
- New factor has the union of the variables of the two original factors
- Each entry is the product of the corresponding entries from the original factors

Example: P(J|A) × P(A) = P(A,J)

  P(J|A):                    P(A):           P(A,J):
  A \ J   true  false        true   0.1      A \ J   true   false
  true    0.9   0.1      ×   false  0.9   =  true    0.09   0.01
  false   0.05  0.95                         false   0.045  0.855
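
A sketch of this operation in Python, assuming all variables are Boolean; the (variables, table) factor representation and the name pointwise_product are choices made for this transcript, not an API from the course.

```python
from itertools import product

# A factor is (variables, table): a tuple of variable names plus a dict
# mapping a tuple of values (one per variable, in order) to a number.
def pointwise_product(f1, f2, domain=(True, False)):
    vars1, t1 = f1
    vars2, t2 = f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for vals in product(domain, repeat=len(out_vars)):
        row = dict(zip(out_vars, vals))  # assignment to the union of vars
        table[vals] = (t1[tuple(row[v] for v in vars1)]
                       * t2[tuple(row[v] for v in vars2)])
    return out_vars, table

# P(J|A) x P(A) = P(A,J), matching the tables on this slide.
P_J_given_A = (("A", "J"), {(True, True): 0.9,   (True, False): 0.1,
                            (False, True): 0.05, (False, False): 0.95})
P_A = (("A",), {(True,): 0.1, (False,): 0.9})
print(pointwise_product(P_J_given_A, P_A)[1])
# {(True, True): 0.09, (True, False): 0.01,
#  (False, True): 0.045, (False, False): 0.855}
```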

23. Example: Making larger factors

Example: P(A,J) × P(A,M) = P(A,J,M)

  P(A,J):                     P(A,M):
  A \ J   true   false        A \ M   true   false
  true    0.09   0.01         true    0.07   0.03
  false   0.045  0.855        false   0.009  0.891

The result P(A,J,M) is a 2×2×2 table, with one 2×2 slice for A=true and one for A=false.

24. Example: Making larger factors

Example: P(U,V) × P(V,W) × P(W,X) = P(U,V,W,X)

Sizes: [10,10] × [10,10] × [10,10] = [10,10,10,10], i.e., 300 numbers blow up to 10,000 numbers!

Factor blowup can make VE very expensive.

25. Operation 2: Summing out a variable

Second basic operation: summing out (or eliminating) a variable from a factor
- Shrinks a factor to a smaller one

Example: Σj P(A,J) = P(A,j) + P(A,¬j) = P(A)

  P(A,J):                        P(A):
  A \ J   true   false           true   0.1
  true    0.09   0.01      =>    false  0.9
  false   0.045  0.855
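
The companion operation, in the same hypothetical (variables, table) representation as the product sketch above:

```python
# Summing out `var`: add together the entries that agree on all the
# remaining variables, dropping `var` from the factor.
def sum_out(var, factor):
    vars_, table = factor
    i = vars_.index(var)
    out_vars = vars_[:i] + vars_[i + 1:]
    out = {}
    for vals, p in table.items():
        key = vals[:i] + vals[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return out_vars, out

P_AJ = (("A", "J"), {(True, True): 0.09,   (True, False): 0.01,
                     (False, True): 0.045, (False, False): 0.855})
print(sum_out("J", P_AJ))  # (('A',), {(True,): 0.1, (False,): 0.9})
```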

26. Summing out from a product of factors

Project the factors each way first, then sum the products.

Example:
  Σa P(a|B,e) × P(j|a) × P(m|a)
    = P(a|B,e) × P(j|a) × P(m|a) + P(¬a|B,e) × P(j|¬a) × P(m|¬a)

27. Variable Elimination

28. Variable Elimination

Query: P(Q | E1=e1,…,Ek=ek)

Start with initial factors:
- local CPTs (but instantiated by evidence)

While there are still hidden variables (not Q or evidence):
- Pick a hidden variable Hj
- Eliminate (sum out) Hj from the product of all factors mentioning Hj

Join all remaining factors and normalize.
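
Putting the two factor operations together gives the elimination loop below. This is a sketch for this transcript that reuses the hypothetical pointwise_product and sum_out functions from the earlier snippets; evidence is assumed to be already instantiated into the initial factors.

```python
def eliminate(factors, hidden_vars):
    """Variable elimination over (variables, table) factors.

    Requires pointwise_product and sum_out from the earlier sketches;
    evidence must already be instantiated into the initial factors.
    """
    for h in hidden_vars:
        touching = [f for f in factors if h in f[0]]
        factors = [f for f in factors if h not in f[0]]
        prod = touching[0]
        for f in touching[1:]:
            prod = pointwise_product(prod, f)
        factors.append(sum_out(h, prod))
    # Join the remaining factors and normalize.
    result = factors[0]
    for f in factors[1:]:
        result = pointwise_product(result, f)
    z = sum(result[1].values())
    return result[0], {k: v / z for k, v in result[1].items()}

# P(B | j, m): local CPTs with evidence j, m instantiated, so P(j|A)
# and P(m|A) become one-variable factors over A.
f_B = (("B",), {(True,): 0.001, (False,): 0.999})
f_E = (("E",), {(True,): 0.002, (False,): 0.998})
f_A = (("B", "E", "A"),
       {(b, e, a): (p if a else 1 - p)
        for (b, e), p in {(True, True): 0.95, (True, False): 0.94,
                          (False, True): 0.29, (False, False): 0.001}.items()
        for a in (True, False)})
f_j = (("A",), {(True,): 0.9, (False,): 0.05})
f_m = (("A",), {(True,): 0.7, (False,): 0.01})
print(eliminate([f_B, f_E, f_A, f_j, f_m], ["A", "E"]))
# (('B',), {(True,): 0.284..., (False,): 0.715...}),
# matching the enumeration sketch earlier.
```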

29. Example

Query: P(B | j, m). Initial factors: P(B), P(E), P(A|B,E), P(j|A), P(m|A).

Choose A: join the factors mentioning A and sum it out:

  P(A|B,E) × P(j|A) × P(m|A)  --sum out A-->  P(j,m|B,E)

Remaining factors: P(B), P(E), P(j,m|B,E)

30. Example (continued)

Choose E: join the factors mentioning E and sum it out:

  P(E) × P(j,m|B,E)  --sum out E-->  P(j,m|B)

Remaining factors: P(B), P(j,m|B)

Finish with B: P(B) × P(j,m|B) = P(j,m,B)

Normalize: P(B | j, m) = α P(j,m,B)

31. Order matters

Consider a network in which Z is the parent of A, B, C, and D, and the query is P(D).

Order the terms Z, A, B, C, D:

  P(D) = α Σz,a,b,c P(z) P(a|z) P(b|z) P(c|z) P(D|z)
       = α Σz P(z) Σa P(a|z) Σb P(b|z) Σc P(c|z) P(D|z)

  Largest factor has 2 variables (D,Z)

Order the terms A, B, C, D, Z:

  P(D) = α Σa,b,c,z P(a|z) P(b|z) P(c|z) P(D|z) P(z)
       = α Σa Σb Σc Σz P(a|z) P(b|z) P(c|z) P(D|z) P(z)

  Largest factor has 4 variables (A,B,C,D)

In general, with n leaves, a factor of size 2^n.
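
The effect of the ordering can be checked by tracking only factor schemas (sets of variables) and ignoring the numbers. This rough sketch (written for this transcript; largest_factor is a name chosen here) reports the largest factor, measured in number of variables, produced for the Z -> {A, B, C, D} network above.

```python
# Each factor is reduced to its set of variables; joining unions the
# sets, and eliminating a variable drops it from the joined set.
def largest_factor(cpts, order):
    factors = [set(c) for c in cpts]
    worst = max(len(f) for f in factors)  # the initial CPTs count too
    for h in order:
        touching = [f for f in factors if h in f]
        factors = [f for f in factors if h not in f]
        new = set().union(*touching) - {h}
        worst = max(worst, len(new))
        factors.append(new)
    return worst

# CPT schemas for the network Z -> A, B, C, D (query variable: D).
cpts = [{"Z"}, {"A", "Z"}, {"B", "Z"}, {"C", "Z"}, {"D", "Z"}]
print(largest_factor(cpts, ["A", "B", "C", "Z"]))  # 2 (e.g. P(D|Z))
print(largest_factor(cpts, ["Z", "A", "B", "C"]))  # 4 (over A, B, C, D)
```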

32. VE: Computational and Space Complexity

- The computational and space complexity of variable elimination is determined by the largest factor (and it's space that kills you)
- The elimination ordering can greatly affect the size of the largest factor, e.g., 2^n vs. 2 in the previous slide's example
- Does there always exist an ordering that only results in small factors? No!

33. Worst Case Complexity? Reduction from SAT

Variables: W, X, Y, Z, each an independent coin flip with probability 0.5.

CNF clauses:
  C1 = W ∨ X ∨ Y
  C2 = Y ∨ Z ∨ ¬W
  C3 = X ∨ Y ∨ Z

Sentence: S = C1 ∧ C2 ∧ C3

Network: W, X, Y, Z are root nodes (probability 0.5 each), the clause nodes C1, C2, C3 depend on them, and S is the conjunction of the clauses.

P(S) > 0 iff S is satisfiable => NP-hard
P(S) = K × 0.5^n, where K is the number of satisfying assignments for the clauses => #P-hard

34. Polytrees

- A polytree is a directed graph with no undirected cycles
- For polytrees, the complexity of variable elimination is linear in the network size if you eliminate from the leaves towards the roots

35. Bayes Nets

Part I: Representation
Part II: Exact inference
- Enumeration (always exponential complexity)
- Variable elimination (worst-case exponential complexity, often better)
- Inference is NP-hard in general
Part III: Approximate inference
Later: Learning Bayes nets from data