Slide1
Efficient Inference Methods for Probabilistic Logical Models
Sriraam Natarajan
Dept of Computer Science, University of Wisconsin-Madison
Slide2
Take-Away Message
Inference in SRL models is very hard!
This talk presents 3 different yet related inference methods.
The methods are independent of the underlying formalism.
They have been applied to different kinds of problems.
Slide3
The World is Inherently Uncertain
Graphical models (here, e.g., a Bayesian network) model uncertainty explicitly by representing the joint distribution.
[Figure: Bayesian network with Influenza influencing Fever and Ache; nodes are random variables, edges are direct influences]
Propositional model!
Slide4
Real-World Data (Dramatically Simplified)

PatientID  Gender  Birthdate
P1         M       3/22/63

PatientID  Date    Physician  Symptoms      Diagnosis
P1         1/1/01  Smith      palpitations  hypoglycemic
P1         2/1/03  Jones      fever, aches  influenza

PatientID  Date    Lab Test       Result
P1         1/1/01  blood glucose  42
P1         1/9/01  blood glucose  45

PatientID  SNP1  SNP2  ...  SNP500K
P1         AA    AB         BB
P2         AB    BB         AA

PatientID  Date Prescribed  Date Filled  Physician  Medication  Dose  Duration
P1         5/17/98          5/18/98      Jones      prilosec    10mg  3 months

The data is non-i.i.d. and multi-relational, with shared parameters.
Solution: First-Order Logic / Relational Databases
Slide5
Logic + Probability = Probabilistic Logic, aka Statistical Relational Learning Models
Logic + add probabilities -> Statistical Relational Learning (SRL)
Probabilities + add relations -> Statistical Relational Learning (SRL)
Uncertainty in SRL models is captured by probabilities, weights, or potential functions.
Slide6
Alphabet Soup => Endless Possibilities
Web data (web), biological data (bio), social network analysis (soc), bibliographic data (cite), epidemiological data (epi), communication data (comm), customer networks (cust), collaborative filtering problems (cf), trust networks (trust), ...
Courses: Fall 2003 - Dietterich @ OSU, Spring 2004 - Page @ UW, Spring 2007 - Neville @ Purdue, Fall 2008 - Pedro @ CMU
Models: Probabilistic Relational Models (PRM), Bayesian Logic Programs (BLP), PRISM, Stochastic Logic Programs (SLP), Independent Choice Logic (ICL), Markov Logic Networks (MLN), Relational Markov Nets (RMN), CLP-BN, Relational Bayes Nets (RBN), Probabilistic Logic Programs (PLP), ProbLog, ...
Slide7
Key Problem: Inference
Equivalent to counting 3-SAT models => #P-complete.
Even more pronounced in SRL models: a prohibitively large number of objects and relations.
Inference has been the biggest bottleneck for the use of SRL models in practice.
Slide8
Grounding / Propositionalization
Difficulty(C,D), Grade(S,C,G) :- Satisfaction(S)
With 1 student s1 and 10 courses, grounding yields:
Diff(c1,d1), Diff(c2,d1), Diff(c3,d2), Diff(c4,d4), Diff(c5,d1), Diff(c6,d3), Diff(c7,d2), Diff(c8,d2), Diff(c9,d4), Diff(c10,d2)
Grade(s1,c1,B), Grade(s1,c2,A), Grade(s1,c3,B), Grade(s1,c4,A), Grade(s1,c5,A), Grade(s1,c6,B), Grade(s1,c7,A), Grade(s1,c8,A), Grade(s1,c9,A), Grade(s1,c10,A)
all feeding into Satisfaction(s1).
Slide9
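Even this small example can be enumerated mechanically; a Python sketch with the slide's one student and ten courses (predicate names mirror the slide, the code itself is purely illustrative of how grounding scales with the domain):

```python
from itertools import product

# Toy domain from the slide: 1 student, 10 courses.
students = ["s1"]
courses = [f"c{i}" for i in range(1, 11)]

# Each (student, course) pair yields one Diff(course, _) / Grade(student, course, _)
# parent pair feeding Satisfaction(student).
groundings = [(f"Diff({c},_)", f"Grade({s},{c},_)")
              for s, c in product(students, courses)]

print(len(groundings))  # -> 10
```

With more students or courses the count is simply the product of the domain sizes, which is exactly the blow-up grounding causes.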
Realistic Example: Gene-Fold Prediction
Thanks to Irene Ong
Slide10
Recent Advances in SRL Inference
Preprocessing for inference: FROG (Shavlik & Natarajan, 2009)
Lifted exact inference: lifted variable elimination (Poole 2003; Braz et al. 2005; Milch et al. 2008); lifted VE + aggregation (Kisynski & Poole, 2009)
Sampling methods: MCMC techniques (Milch & Russell, 2006); logical particle filter (Natarajan et al. 2008; Zettlemoyer et al. 2007); lazy inference (Poon et al., 2008)
Approximate methods: lifted first-order belief propagation (Singla & Domingos, 2008); counting belief propagation (Kersting et al., 2009); MAP inference (Riedel, 2008)
Bounds propagation: anytime belief propagation (Braz et al., 2009)
Slide11
Fast Reduction of Grounded MLNs
Counting Belief Propagation
Anytime Lifted Belief Propagation
Conclusion
Slide12
Fast Reduction of Grounded MLNs
Counting Belief Propagation
Anytime Lifted Belief Propagation
Conclusion
Slide13
Markov Logic Networks (Richardson & Domingos, MLJ 2006)
Weighted logic:
P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )
where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in world x.
Standard approach:
1) Assume a finite number of constants
2) Create all possible groundings
3) Perform statistical inference (often via sampling)
Slide14
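A brute-force Python sketch of this distribution on a toy MLN (one formula, two constants, an illustrative weight; this enumerates all worlds to compute Z, which is exactly what is infeasible at scale):

```python
import math
from itertools import product

# Toy MLN with a single formula Smokes(x) -> Cancer(x); weight is made up.
w = 1.5
people = ["anna", "bob"]
atoms = [(pred, p) for pred in ("Smokes", "Cancer") for p in people]

def n_true(world):
    # n_i(x): number of true groundings of the formula in world x
    return sum(1 for p in people
               if (not world[("Smokes", p)]) or world[("Cancer", p)])

# All 2^4 = 16 possible worlds over the 4 ground atoms.
worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=len(atoms))]

Z = sum(math.exp(w * n_true(x)) for x in worlds)

def prob(world):
    return math.exp(w * n_true(world)) / Z

print(round(sum(prob(x) for x in worlds), 6))  # -> 1.0
```

The exponential number of worlds (and of groundings per formula) is why steps 2 and 3 of the standard approach are the bottleneck.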
Counting Satisfied Groundings
Typically there is lots of redundancy in FOL sentences:
∀x, y, z: p(x) ⋀ q(x, y, z) ⋀ r(z) => w(x, y, z)
If p(John) = false, then the formula is true for all y and z values.
Slide15
Factoring Out the Evidence
Let A = weighted sum of formula groundings satisfied by the evidence.
Let B_i = weighted sum of formula groundings in world i not satisfied by the evidence.
Prob(world i) = e^(A + B_i) / (e^(A + B_1) + ... + e^(A + B_n))
             = e^(B_i) / (e^(B_1) + ... + e^(B_n))
Slide16
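This factorization is just the shift invariance of the softmax; a small Python check with made-up weighted sums A and B_i:

```python
import math

# Hypothetical weighted sums: A is shared across worlds (satisfied by evidence),
# B[i] is specific to world i (not satisfied by evidence).
A = 7.3
B = [0.5, 2.0, 1.2]

with_A = [math.exp(A + b) for b in B]
without_A = [math.exp(b) for b in B]

p_with = [v / sum(with_A) for v in with_A]
p_without = [v / sum(without_A) for v in without_A]

# e^A cancels in numerator and denominator, so the distributions match.
print(all(abs(p - q) < 1e-12 for p, q in zip(p_with, p_without)))  # -> True
```

Since A never needs to be computed per world, the groundings satisfied by evidence can be dropped from the network entirely.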
Take-Away Message I
Efficiently factor out those formula groundings that the evidence satisfies.
This can potentially eliminate the need for approximate inference.
Slide17
Worked Example
∀x, y, z: GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) => AdvisedBy(x,y)
The evidence: 10,000 people at some school; 2,000 graduate students; 1,000 professors; 1,000 TAs; 500 pairs of professors in the same group.
Total number of groundings = |x| × |y| × |z| = 10^12
Slide18
GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) => AdvisedBy(x,y)
Processing GradStudent(x), with evidence GradStudent(P1), ¬GradStudent(P2), GradStudent(P3), ...
True: GradStudent(P1), GradStudent(P3), ... (the 2,000 grad students)
False: ¬GradStudent(P2), ¬GradStudent(P4), ... (the 8,000 others)
All the false values of x satisfy the clause regardless of y and z, so FROG keeps only the grad-student values of x: instead of 10^4 values for x, we have 2 × 10^3.
Groundings: 10^12 -> 2 × 10^11
Slide19
GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) => AdvisedBy(x,y)
Processing Prof(y). True: Prof(P2), ... (the 1,000 professors); False: ¬Prof(P1), ... (the 9,000 others).
Groundings: 2 × 10^11 -> 2 × 10^10
Slide20
GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) => AdvisedBy(x,y)
Processing Prof(z): same as Prof(y).
Groundings: 2 × 10^10 -> 2 × 10^9
Slide21
GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) => AdvisedBy(x,y)
Processing SameGroup(y,z): of the 10^6 (y,z) combinations, 1,000 are true SameGroup's (SameGroup(P1,P2), ...); the remaining 10^6 - 1000 (¬SameGroup(P2,P5), ...) are false.
Left: 2,000 values of x and 1,000 y:z combinations.
Groundings: 2 × 10^9 -> 2 × 10^6
Slide22
GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) => AdvisedBy(x,y)
Processing TA(x,z): of the 2 × 10^6 (x,z) combinations, 1,000 are true TA's (TA(P7,P5), ...); the remaining 2 × 10^6 - 1000 (¬TA(P8,P4), ...) are false.
Left: ≤ 1,000 values of x and ≤ 1,000 y:z combinations.
Groundings: 2 × 10^6 -> ≤ 10^6
Slide23
GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) => AdvisedBy(x,y)
Original number of groundings = 10^12
Final number of groundings ≤ 10^6
Slide24
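The reduction chain above can be recomputed in a few lines; a Python sketch using the slides' numbers (this reproduces only the per-step arithmetic, not the FROG algorithm itself):

```python
# Counts from the worked example: a school of 10,000 people.
n_people = 10_000
n_grad, n_prof = 2_000, 1_000
n_samegroup = 500 * 2          # 500 professor pairs, counted in both (y, z) orders

total = n_people ** 3                            # |x| * |y| * |z| = 10^12
after_grad = n_grad * n_people ** 2              # keep only grad-student values of x
after_prof_y = after_grad * n_prof // n_people   # keep only professor values of y
after_prof_z = after_prof_y * n_prof // n_people # same for z
after_samegroup = n_grad * n_samegroup           # only surviving (y, z) combinations
after_ta = 1_000 * 1_000                         # TA leaves <= 1000 x and <= 1000 (y, z)

print(total, after_grad, after_prof_y, after_prof_z, after_samegroup, after_ta)
```

Running it reproduces the sequence 10^12, 2 × 10^11, 2 × 10^10, 2 × 10^9, 2 × 10^6, and finally ≤ 10^6.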
Sample Results: UWash-CSE
Compared: the fully grounded net, FROG's reduced net, and FROG's reduced net without one challenging rule:
advisedBy(x,y) ⋀ advisedBy(x,z) => samePerson(y,z)
Slide25
Fast Reduction of Grounded MLNs
Counting Belief Propagation
Anytime Lifted Belief Propagation
Conclusion
Slide26
Belief Propagation
Message-passing algorithm for inference on graphical models, formulated over factor graphs.
Exact if the factor graph is a tree; approximate when it has cycles.
Loopy BP does not guarantee convergence, but is found to be very useful in practice.
[Figure: a small factor graph with variables X1, X2, X3 and factors f1, f2]
Slide27
Belief Propagation
Identical Factors
Slide28
Take-Away Message II
Counting shared factors can result in great efficiency gains for (loopy) belief propagation.
Slide29
Counting Belief Propagation
Two steps:
1) Compress the factor graph
2) Run modified BP
Slide30
Step 1: Compression
Slide31
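The compression step can be sketched as color passing: variables start with colors given by their evidence and are repeatedly refined by the colors of their neighboring factors, so that variables ending up with the same color would send and receive identical messages. A minimal Python sketch on a hypothetical chain-shaped factor graph (symmetric potentials assumed, so argument order inside a factor is ignored):

```python
# Hypothetical factor graph: a chain a - f1 - b - f2 - c - f3 - d.
factors = {"f1": ("a", "b"), "f2": ("b", "c"), "f3": ("c", "d")}

variables = sorted({v for scope in factors.values() for v in scope})
color = {v: "no-evidence" for v in variables}  # initial colors come from evidence

for _ in range(len(variables)):
    # New color = (old color, multiset of neighboring factors' argument colorings).
    color = {v: (color[v],
                 tuple(sorted(tuple(sorted(color[u] for u in scope))
                              for scope in factors.values() if v in scope)))
             for v in variables}

clusters = {}
for v in variables:
    clusters.setdefault(color[v], []).append(v)

print(sorted(sorted(c) for c in clusters.values()))  # -> [['a', 'd'], ['b', 'c']]
```

The two chain endpoints (a, d) and the two interior variables (b, c) end up in the same clusters, so modified BP needs to pass messages for only one representative of each, with counts accounting for the cluster sizes.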
Step 2: Modified Belief Propagation
Slide32
Factored Frontier (FF)
Probabilistic inference over time is central to many AI problems. In contrast to static domains, we need approximation: variables easily become correlated over time by virtue of sharing common influences in the past.
Factored Frontier [Murphy and Weiss '01]: unroll the DBN and run (loopy) BP.
Lifted First-Order FF: use CBP in place of BP.
Slide33
Lifted First-Order Factored Frontier
Experimental setup: 20 people over 10 time steps; max number of friends 5; Cancer never observed; time step randomly selected; successor fluent.
Slide34
Fast Reduction of Grounded MLNs
Counting Belief Propagation
Anytime Lifted Belief Propagation
Conclusion
Slide35
The Need for Shattering
Lifted BP depends on clusters of variables being symmetric, that is, sending and receiving identical messages.
In other words, it is about dividing random variables into cases, called "shattering".
Slide36
Intuition for Anytime Lifted BP
Model atoms: alarm(House), earthquake(Town), in(House, Town), burglary(House), next(House, Another), lives(Another, Neighbor), saw(Neighbor, Someone), masked(Someone), in(House, Item), missing(Item), partOf(Entrance, House), broken(Entrance)
The alarm can go off due to an earthquake.
The alarm can go off due to a burglary.
A "prior" factor makes the alarm going off unlikely without those causes.
Slide37
Intuition for Anytime Lifted BP
Given a home in sf, with home2 and home3 next to it, with neighbors jim and mary each seeing person1 and person2; several items in home, including a missing ring and non-missing cash; a broken front but not broken back entrance to home; and an earthquake in sf: what is the probability that home's alarm goes off?
Model atoms: alarm(House), earthquake(Town), in(House, Town), burglary(House), next(House, Another), lives(Another, Neighbor), saw(Neighbor, Someone), masked(Someone), in(House, Item), missing(Item), partOf(Entrance, House), broken(Entrance)
Slide38
Lifted Belief Propagation
Ground network: alarm(home), burglary(home), earthquake(sf), in(home, sf), partOf(front, home), broken(front), partOf(back, home), broken(back), next(home, home2), lives(home2, jim), saw(jim, person1), masked(person1), next(home, home3), lives(home2, mary), saw(mary, person2), masked(person2), in(home, ring), missing(ring), in(home, cash), missing(cash), plus in(home, Item), missing(Item) for Item not in {ring, cash, ...}
Complete shattering before belief propagation starts.
Message passing over the entire model before obtaining the query answer.
(Model for House ≠ home and Town ≠ sf not shown.)
Slide39
Intuition for Anytime Lifted BP
Same ground network, now with alarm(home) marked as the query and the observed atoms marked as evidence.
Given the earthquake, we already have a good lower bound, regardless of the burglary branch.
Shattering everything else (the burglary branch, neighbors and witnesses, items, other entrances, other houses and towns) is wasted shattering!
Slide40
Using Only a Portion of a Model
By using only a portion, we don't have to shatter other parts of the model. How can we use only a portion? A solution for propositional models already exists: box propagation (Mooij & Kappen, NIPS '08).
Slide41
Box Propagation
A way of getting bounds on the query without examining the entire network.
The query A starts with the trivial bound [0, 1].
Slide42
Box Propagation
A way of getting bounds on the query without examining the entire network.
Expanding factor f1 connecting A to B (with B still unexplored at [0, 1]) tightens A's bound to [0.36, 0.67].
Slide43
Box Propagation
A way of getting bounds on the query without examining the entire network.
Expanding B's neighbors f2 and f3 (their unexplored inputs still at [0, 1]) gives messages [0.32, 0.4] and [0.1, 0.6], bounding B by [0.05, 0.5] and tightening A's bound to [0.38, 0.50].
Slide44
Box Propagation
A way of getting bounds on the query without examining the entire network.
Further expansion (inputs now [0.2, 0.8] and [0, 1]) narrows the messages to [0.32, 0.4] and [0.3, 0.4], bounding B by [0.17, 0.3] and A by [0.41, 0.44].
Slide45
Box Propagation
A way of getting bounds on the query without examining the entire network.
Convergence after all messages are collected: the boxes collapse to point values (messages 0.36, 0.32, 0.45, 0.3; B = 0.21; the query A = 0.42).
Slide46
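Under the hood, a box is just an interval bound on a message; a minimal Python sketch for a single binary factor between the query A and a still-unexplored neighbor B (the potential values are made up, not the ones on the slides):

```python
# Hypothetical pairwise potential f1(A, B) over binary variables.
f1 = {(0, 0): 0.9, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.8}

def marginal_a1(mu_b):
    """P(A=1) if B's unexplored incoming message were (1 - mu_b, mu_b)."""
    msg = {a: sum(f1[(a, b)] * (1 - mu_b, mu_b)[b] for b in (0, 1)) for a in (0, 1)}
    return msg[1] / (msg[0] + msg[1])

# With one uncertain message the marginal is monotone in mu_b (linear-fractional),
# so evaluating the box endpoints mu_b in {0, 1} yields the bound on P(A=1).
lo = min(marginal_a1(0.0), marginal_a1(1.0))
hi = max(marginal_a1(0.0), marginal_a1(1.0))
print(round(lo, 3), round(hi, 3))  # -> 0.25 0.667
```

As more of B's own neighborhood is expanded, B's message box shrinks from [0, 1] to a narrower interval, and re-evaluating the endpoints tightens the query bound, exactly the progression shown on the slides.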
Take-Away Message III
Anytime BP = Incremental Shattering + Box Propagation
Slide47
Anytime Lifted Belief Propagation
Start from the query alone: alarm(home), with bound [0, 1].
The algorithm works by picking a cluster variable and including the factors in its blanket.
Slide48
Anytime Lifted Belief Propagation
Expanding the query's blanket adds burglary(home) and the cluster (alarm(home), in(home, Town), earthquake(Town)), obtained by unifying alarm(home) with alarm(House) in (alarm(House), in(House, Town), earthquake(Town)), producing the constraint House = home; burglary(home) is likewise obtained through unification.
The blanket factors alone can already determine a bound on the query: [0.1, 0.9].
Slide49
Anytime Lifted Belief Propagation
The cluster in(home, Town) unifies with in(home, sf) in (in(home, sf)) (which represents evidence), splitting the cluster around Town = sf: in(home, sf) and earthquake(sf) are now separate from in(home, Town) and earthquake(Town) for Town ≠ sf.
The bound remains [0.1, 0.9] because we still haven't considered the evidence on earthquakes.
Slide50
Anytime Lifted Belief Propagation
(earthquake(sf)) represents the evidence that there was an earthquake. Now the query bound becomes narrow: [0.8, 0.9].
If the bound is good enough, there is no need to further expand (and shatter) the other branches.
Slide51
Anytime Lifted Belief Propagation
We can keep expanding at will for narrower bounds: adding partOf(front, home) and broken(front) narrows the bound to [0.85, 0.9].
Slide52
Anytime Lifted Belief Propagation
... until convergence, if desired, over the full ground network (burglary(home), earthquake(sf), in(home, sf), partOf(front, home), broken(front), partOf(back, home), broken(back), next(home, home2), lives(home2, jim), saw(jim, person1), masked(person1), next(home, home3), lives(home2, mary), saw(mary, person2), masked(person2), in(home, ring), missing(ring), in(home, cash), missing(cash), ...), where the query converges to 0.8725.
Slide53
Connection to Resolution Refutation
Incremental shattering corresponds to building a proof tree for the query alarm(home), with branches:
earthquake(sf), in(home, sf)
earthquake(L), in(home, L), for L not in {sf}
burglary(home), true
...
Slide54
Fast Reduction of Grounded MLNs
Counting Belief Propagation
Anytime Lifted Belief Propagation
Conclusion
Slide55
Conclusion
Inference is the key issue in several SRL formalisms.
FROG: keeps the count of unsatisfied groundings; an order-of-magnitude reduction in the number of groundings; compares favorably to Alchemy in different domains.
Counting BP: BP + grouping nodes that send and receive identical messages; a conceptually easy, scalable BP algorithm; applications to challenging AI tasks.
Anytime BP: Incremental Shattering + Box Propagation; only the most necessary fraction of the model is considered and shattered. Status: implementation and evaluation.
Slide56
Conclusion
The algorithms are independent of the representation.
Variety of applications: parameter learning of relational models, social networks, object recognition, link prediction, activity recognition, model counting, bio-medical applications, relational RL.
Slide57
Future Work
FROG: combine with lifted inference; exploit commonality across rules.
CBP: integrate with parameter learning in SRL models; extend to multi-agent RL; lifted pairwise BP.
Anytime BP: heuristics to expand the network; understand closer connections to resolution.
SRL models: learning dynamic SRL models; structure learning remains an open issue.
Slide58
Acknowledgements*
Babak Ahmadi - Fraunhofer Institute
Rodrigo de Salvo Braz - SRI International
Hung Bui - SRI International
Vitor Santos Costa - U Porto
Kristian Kersting - Fraunhofer Institute
Gautam Kunapuli - UW Madison
David Page - UW Madison
Stuart Russell - UC Berkeley
Jude Shavlik - UW Madison
Prasad Tadepalli - Oregon State University
* Ordered by last name
Slide59
Thanks!