
Efficient Inference Methods for Probabilistic Logical Models

Sriraam Natarajan
Dept. of Computer Science, University of Wisconsin-Madison

Take-Away Message

Inference in SRL models is very hard!
This talk presents 3 different yet related inference methods.
The methods are independent of the underlying formalism.
They have been applied to different kinds of problems.

The World is Inherently Uncertain

Graphical models (here, e.g., a Bayesian network) model uncertainty explicitly by representing the joint distribution.
[Figure: a Bayesian network with nodes Influenza, Fever, and Ache; nodes are random variables, edges are direct influences. A propositional model!]

Real-World Data (Dramatically Simplified)

PatientID | Gender | Birthdate
P1        | M      | 3/22/63

PatientID | Date   | Physician | Symptoms     | Diagnosis
P1        | 1/1/01 | Smith     | palpitations | hypoglycemic
P1        | 2/1/03 | Jones     | fever, aches | influenza

PatientID | Date   | Lab Test      | Result
P1        | 1/1/01 | blood glucose | 42
P1        | 1/9/01 | blood glucose | 45

PatientID | SNP1 | SNP2 | ... | SNP500K
P1        | AA   | AB   | ... | BB
P2        | AB   | BB   | ... | AA

PatientID | Date Prescribed | Date Filled | Physician | Medication | Dose | Duration
P1        | 5/17/98         | 5/18/98     | Jones     | prilosec   | 10mg | 3 months

This data is non-i.i.d. and multi-relational.
Solution: first-order logic / relational databases, with shared parameters.

Logic + Probability = Probabilistic Logic aka Statistical Relational Learning Models

Add probabilities to logic, or add relations to probabilistic models: both paths lead to Statistical Relational Learning (SRL).
Uncertainty in SRL models is captured by probabilities, weights, or potential functions.

Alphabetic Soup => Endless Possibilities

Data: web data (web), biological data (bio), social network analysis (soc), bibliographic data (cite), epidemiological data (epi), communication data (comm), customer networks (cust), collaborative filtering problems (cf), trust networks (trust), ...
Courses: Fall 2003 – Dietterich @ OSU; Spring 2004 – Page @ UW; Spring 2007 – Neville @ Purdue; Fall 2008 – Pedro @ CMU
Models: Probabilistic Relational Models (PRM), Bayesian Logic Programs (BLP), PRISM, Stochastic Logic Programs (SLP), Independent Choice Logic (ICL), Markov Logic Networks (MLN), Relational Markov Nets (RMN), CLP-BN, Relational Bayes Nets (RBN), Probabilistic Logic Programs (PLP), ProbLog, ...

Key Problem - Inference

Inference is equivalent to counting 3SAT models => #P-complete.
The problem is even more pronounced in SRL models: a prohibitively large number of objects and relations.
Inference has been the biggest bottleneck for the use of SRL models in practice.

Grounding / Propositionalization

Rule: Difficulty(C, D), Grade(S, C, G) :- Satisfaction(S)
With 1 student s1 and 10 courses, the rule grounds out into (see the sketch below):

Diff(c1,d1), Diff(c2,d1), Diff(c3,d2), Diff(c4,d4), Diff(c5,d1), Diff(c6,d3), Diff(c7,d2), Diff(c8,d2), Diff(c9,d4), Diff(c10,d2)
Grade(s1,c1,B), Grade(s1,c2,A), Grade(s1,c3,B), Grade(s1,c4,A), Grade(s1,c5,A), Grade(s1,c6,B), Grade(s1,c7,A), Grade(s1,c8,A), Grade(s1,c9,A), Grade(s1,c10,A)
Satisfaction(s1)
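A minimal sketch of this blow-up (the difficulty and grade assignments below are illustrative, not from the talk):

    from itertools import product

    # One student, ten courses: every (course, difficulty, grade) combination
    # becomes a ground clause of Difficulty(C,D), Grade(S,C,G) :- Satisfaction(S).
    students = ["s1"]
    courses = [f"c{i}" for i in range(1, 11)]
    difficulties = ["d1", "d2", "d3", "d4"]
    grades = ["A", "B"]

    groundings = [
        (f"Diff({c},{d})", f"Grade({s},{c},{g})", f"Satisfaction({s})")
        for s, c, d, g in product(students, courses, difficulties, grades)
    ]
    print(len(groundings))  # 1 * 10 * 4 * 2 = 80 ground clauses for a tiny domain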

Realistic Example – Gene-fold Prediction

[Figure omitted. Thanks to Irene Ong.]

Recent Advances in SRL Inference

Preprocessing for inference: FROG – Shavlik & Natarajan (2009)
Lifted exact inference: Lifted Variable Elimination – Poole (2003), Braz et al. (2005), Milch et al. (2008); Lifted VE + Aggregation – Kisynski & Poole (2009)
Sampling methods: MCMC techniques – Milch & Russell (2006); Logical Particle Filter – Natarajan et al. (2008), Zettlemoyer et al. (2007); Lazy Inference – Poon et al. (2008)
Approximate methods: Lifted First-Order Belief Propagation – Singla & Domingos (2008); Counting Belief Propagation – Kersting et al. (2009); MAP Inference – Riedel (2008)
Bounds propagation: Anytime Belief Propagation – Braz et al. (2009)

Outline

Fast Reduction of Grounded MLNs
Counting Belief Propagation
Anytime Lifted Belief Propagation
Conclusion

Markov Logic Networks

Weighted logic (Richardson & Domingos, MLJ 2006).
Standard approach: 1) assume a finite number of constants, 2) create all possible groundings, 3) perform statistical inference (often via sampling).

P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) )

where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x.
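A toy sketch of this distribution (weights, grounding counts, and worlds are invented; this is not any particular system's implementation):

    import math

    def world_prob(worlds, formulas):
        """P(X=x) = exp(sum_i w_i * n_i(x)) / Z over an explicit list of worlds.
        formulas: list of (weight, n) pairs, where n(x) counts true groundings."""
        scores = [math.exp(sum(w * n(x) for w, n in formulas)) for x in worlds]
        Z = sum(scores)
        return [s / Z for s in scores]

    # Two hypothetical worlds differing in how many groundings of one formula hold:
    formulas = [(1.5, lambda x: x["true_groundings"])]
    worlds = [{"true_groundings": 3}, {"true_groundings": 1}]
    print(world_prob(worlds, formulas))  # the world with more true groundings wins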

Counting Satisfied Groundings

Typically there is lots of redundancy in FOL sentences:

∀x, y, z: p(x) ⋀ q(x, y, z) ⋀ r(z) ⇒ w(x, y, z)

If p(John) = false, then the formula is true for all Y and Z values.
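A sketch of the saving this enables, with invented domain sizes and evidence:

    # For the clause p(x) AND q(x,y,z) AND r(z) => w(x,y,z), any x with
    # p(x) = False makes the antecedent false, so every (y, z) grounding is
    # satisfied and can be counted with one multiplication instead of enumerated.
    Y_SIZE, Z_SIZE = 1000, 1000
    p_evidence = {"John": False, "Anna": True}  # invented evidence

    bulk = sum(Y_SIZE * Z_SIZE for x, val in p_evidence.items() if not val)
    print(bulk)  # 1,000,000 satisfied groundings counted without creating them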

Factoring Out the Evidence

Let A = the weighted sum of formulas satisfied by the evidence.
Let B_i = the weighted sum of formulas in world i not satisfied by the evidence.

Prob(world i) = e^(A + B_i) / (e^(A + B_1) + ... + e^(A + B_n)) = e^(B_i) / (e^(B_1) + ... + e^(B_n))

The common factor e^A cancels, so the evidence-satisfied groundings never have to be created.
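A quick numeric check of this identity, with made-up values for A and the B_i:

    import math

    A = 7.0                      # weight from formulas the evidence satisfies
    B = [0.5, 1.2, 2.0]          # per-world weight from the remaining formulas

    with_evidence = [math.exp(A + b) for b in B]
    factored_out = [math.exp(b) for b in B]

    p1 = [v / sum(with_evidence) for v in with_evidence]
    p2 = [v / sum(factored_out) for v in factored_out]
    print(all(abs(a - b) < 1e-12 for a, b in zip(p1, p2)))  # True: e^A cancels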

Take-Away Message - I

Efficiently factor out those formula groundings that the evidence satisfies.
This can potentially eliminate the need for approximate inference.

Worked Example

∀x, y, z: GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x, z) ⋀ SameGroup(y, z) ⇒ AdvisedBy(x, y)

The evidence: 10,000 people at some school; 2,000 graduate students; 1,000 professors; 1,000 TAs; 500 pairs of professors in the same group.

Total number of groundings = |x| * |y| * |z| = 10^4 * 10^4 * 10^4 = 10^12

GradStudent(x)

Clause: GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) ⇒ AdvisedBy(x,y)
Evidence: GradStudent(P1), ¬GradStudent(P2), GradStudent(P3), ¬GradStudent(P4), ...
True for the 2,000 grad students; false for the 8,000 others.
All X values with GradStudent(x) false satisfy the clause regardless of Y and Z, so FROG keeps only the X values where GradStudent(x) is true: instead of 10^4 values for X, we have 2 × 10^3.
Groundings: 10^12 → 2 × 10^11

Prof(y)

Evidence: Prof(P2), ¬Prof(P1), ...: 1,000 professors are true; the 9,000 others are false.
Any Y value with Prof(y) false satisfies the clause, so only the 1,000 professors remain for Y.
Groundings: 2 × 10^11 → 2 × 10^10

Prof(z)

Same as Prof(y): only the 1,000 professors remain for Z.
Groundings: 2 × 10^10 → 2 × 10^9

SameGroup(y, z)

There are 10^6 Y:Z combinations; the evidence contains 1,000 true SameGroup facts (e.g., SameGroup(P1, P2)), and the other 10^6 - 1,000 combinations are false (e.g., ¬SameGroup(P2, P5)).
Only the 1,000 true Y:Z combinations survive, with 2,000 values of X.
Groundings: 2 × 10^9 → 2 × 10^6

TA(x, z)

There are 2 × 10^6 X:Z combinations; the evidence contains 1,000 true TA facts (e.g., TA(P7, P5)), and the other 2 × 10^6 - 1,000 are false (e.g., ¬TA(P8, P4)).
At most 1,000 values of X survive, with 1,000 Y:Z combinations.
Groundings: 2 × 10^6 → 10^6

GradStudent(x) ⋀ Prof(y) ⋀ Prof(z) ⋀ TA(x,z) ⋀ SameGroup(y,z) ⇒ AdvisedBy(x,y)

Original number of groundings: 10^12
Final number of groundings: 10^6 (see the sanity check below)
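As a sanity check, the chain of reductions above can be replayed with the slides' counts (a sketch, not FROG itself):

    x_vals = y_vals = z_vals = 10_000          # 10,000 people at the school
    print(f"{x_vals * y_vals * z_vals:.0e}")   # 1e+12 initial groundings

    x_vals = 2_000                             # GradStudent(x): grad students only
    y_vals = 1_000                             # Prof(y): professors only
    z_vals = 1_000                             # Prof(z): professors only
    print(f"{x_vals * y_vals * z_vals:.0e}")   # 2e+09

    yz_pairs = 1_000                           # SameGroup(y,z): only true pairs
    print(f"{x_vals * yz_pairs:.0e}")          # 2e+06

    x_vals = 1_000                             # TA(x,z): at most the 1,000 TAs
    print(f"{x_vals * yz_pairs:.0e}")          # 1e+06 remaining groundings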

Sample Results: UWash-CSE

[Chart: number of groundings for the fully grounded net, FROG's reduced net, and FROG's reduced net without one challenging rule: advisedBy(x,y) ⋀ advisedBy(x,z) ⇒ samePerson(y,z).]

Outline (next: Counting Belief Propagation)

Belief Propagation

A message passing algorithm for inference on graphical models, formulated over factor graphs.
Exact if the factor graph is a tree; approximate when it has cycles.
Loopy BP does not guarantee convergence, but is found to be very useful in practice.
[Figure: a small factor graph with variables X1, X2, X3 and factors f1, f2.]
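For concreteness, here is a minimal sum-product pass on a small chain factor graph of my own (binary variables, invented potentials); on a tree like this, BP is exact:

    import numpy as np

    # Chain factor graph X1 - f1 - X2 - f2 - X3 with invented potentials.
    f1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # f1(x1, x2)
    f2 = np.array([[0.7, 0.3], [0.4, 0.6]])   # f2(x2, x3)

    msg_x1_to_f1 = np.ones(2)                 # leaf variables send uniform messages
    msg_x3_to_f2 = np.ones(2)

    msg_f1_to_x2 = f1.T @ msg_x1_to_f1        # sum over x1 of f1(x1,x2) * msg(x1)
    msg_f2_to_x2 = f2 @ msg_x3_to_f2          # sum over x3 of f2(x2,x3) * msg(x3)

    belief_x2 = msg_f1_to_x2 * msg_f2_to_x2
    print(belief_x2 / belief_x2.sum())        # exact marginal of X2 on this tree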

Belief Propagation

[Figure: a factor graph in which several factors are identical.]

Take-Away Message – II

Counting shared factors can result in great efficiency gains for (loopy) belief propagation.

Counting Belief Propagation

Two steps (a compression sketch in code follows below):
1. Compress the factor graph.
2. Run modified BP.
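A sketch of the compression idea, in the spirit of step 1 (the toy graph and the color-refinement loop are my illustration, not the paper's code): nodes that would send and receive identical messages end up with the same color and are grouped.

    from collections import Counter

    # Toy factor graph: three symmetric ground atoms, each tied to one factor.
    neighbors = {
        "smokes(a)": ["f(a)"], "smokes(b)": ["f(b)"], "smokes(c)": ["f(c)"],
        "f(a)": ["smokes(a)"], "f(b)": ["smokes(b)"], "f(c)": ["smokes(c)"],
    }
    # Initial colors: node kind (plus evidence, if any - none here).
    color = {n: ("factor" if n.startswith("f") else "var") for n in neighbors}

    for _ in range(3):  # refine colors by the multiset of neighbor colors
        color = {
            n: (color[n], tuple(sorted(Counter(color[m] for m in neighbors[n]).items())))
            for n in neighbors
        }

    print(len(set(color.values())))  # 2: one supernode of atoms, one of factors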

Step 1: Compression

[Figure omitted.]

Step 2: Modified Belief Propagation

[Figure omitted.]

Factored Frontier (FF)

Probabilistic inference over time is central to many AI problems.
In contrast to static domains, we need approximation: variables easily become correlated over time by virtue of sharing common influences in the past.
Factored Frontier [Murphy and Weiss 01]: unroll the DBN, then run (loopy) BP.
Lifted First-Order FF: use CBP in place of BP.
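A toy sketch of the factored-frontier approximation as I read it (invented CPDs; the actual FF algorithm runs loopy BP on the unrolled DBN): the frontier over two state variables is approximated by the product of their marginals.

    # Two binary state variables x, y; each depends on both parents at t-1.
    # FF keeps only the marginals P(x_t), P(y_t) and treats the frontier as
    # their product when pushing through the CPDs.
    cpd_x = {(0, 0): 0.1, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.9}  # P(x_t=1 | x,y)
    cpd_y = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.7, (1, 1): 0.8}  # P(y_t=1 | x,y)

    px, py = 0.5, 0.5
    for t in range(10):
        parents = {(a, b): (px if a else 1 - px) * (py if b else 1 - py)
                   for a in (0, 1) for b in (0, 1)}  # product-of-marginals frontier
        px = sum(cpd_x[ab] * w for ab, w in parents.items())
        py = sum(cpd_y[ab] * w for ab, w in parents.items())

    print(round(px, 3), round(py, 3))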

Lifted First-Order Factored Frontier

Experimental setup: 20 people over 10 time steps; max number of friends 5; cancer never observed; time step randomly selected; successor fluent.
[Results figure omitted.]

Outline (next: Anytime Lifted Belief Propagation)

The Need for Shattering

Lifted BP depends on clusters of variables being symmetric, that is, sending and receiving identical messages.
In other words, it is about dividing random variables into cases, which is called "shattering".

Intuition for Anytime Lifted BP

Model predicates: alarm(House), earthquake(Town), in(House, Town), burglary(House), next(House, Another), lives(Another, Neighbor), saw(Neighbor, Someone), masked(Someone), in(House, Item), missing(Item), partOf(Entrance, House), broken(Entrance).

The alarm can go off due to an earthquake.
The alarm can go off due to a burglary.
A "prior" factor makes the alarm going off unlikely without those causes.

Intuition for Anytime Lifted BP

Given a home in sf, with home2 and home3 next to it, with neighbors jim and mary, each seeing person1 and person2; several items in home, including a missing ring and non-missing cash; broken front but not broken back entrances to home; and an earthquake in sf:
what is the probability that home's alarm goes off?

Lifted Belief Propagation

[Figure: the network after complete shattering, with clusters alarm(home), burglary(home), earthquake(sf), in(home, sf), partOf(front, home), broken(front), partOf(back, home), broken(back), next(home, home2), next(home, home3), lives(home2, jim), lives(home2, mary), saw(jim, person1), saw(mary, person2), masked(person1), masked(person2), in(home, ring), missing(ring), in(home, cash), missing(cash), and in(home, Item), missing(Item) for Item not in {ring, cash, ...}.]

Complete shattering happens before belief propagation starts, and messages are passed over the entire model before obtaining the query answer.
(The model for House ≠ home and Town ≠ sf is not shown.)

Intuition for Anytime Lifted BP

[Figure: the same shattered network, now annotated with the query alarm(home) and the evidence.]

Given the earthquake, we already have a good lower bound on the query, regardless of the burglary branch. All the rest was wasted shattering!

Using Only a Portion of a Model

By using only a portion, we don't have to shatter the other parts of the model.
How can we use only a portion? A solution for propositional models already exists: box propagation (Mooij & Kappen, NIPS '08).

Box Propagation

A way of getting bounds on the query without examining the entire network.
[Figures: the box on query A tightens as the network is expanded. Initially A is boxed at [0, 1]. Including factor f1 and variable B (with B still boxed at [0, 1]) gives [0.36, 0.67]. Including factors f2 and f3 with loosely boxed inputs gives [0.38, 0.50], then [0.41, 0.44] as the incoming boxes shrink. Convergence after all messages are collected: the boxes collapse to point beliefs and A converges to 0.42.]
A numeric sketch of a single step follows below.
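A numeric sketch of one box-propagation step for a single binary factor (the factor table is invented, and Mooij & Kappen's method is more general): bounding the incoming message bounds the outgoing belief.

    # Factor f(A, B) between the query A and an unexplored neighbor B.
    f = [[0.9, 0.2],   # f(A=0, B=0), f(A=0, B=1)
         [0.1, 0.8]]   # f(A=1, B=0), f(A=1, B=1)

    def belief_a1(p_b1):
        """Belief in A=1 if the incoming message said P(B=1) = p_b1."""
        msg = [f[a][0] * (1 - p_b1) + f[a][1] * p_b1 for a in (0, 1)]
        return msg[1] / (msg[0] + msg[1])

    lo, hi = 0.0, 1.0                      # B unexplored: its message is boxed [0, 1]
    box = sorted((belief_a1(lo), belief_a1(hi)))
    print(box)  # [0.1, 0.8]: bounds on P(A=1) without looking past B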

Take-Away Message - III

Anytime BP = Incremental Shattering + Box Propagation

Anytime Lifted Belief Propagation

Start from the query alone: alarm(home), bounded by [0, 1].
The algorithm works by picking a cluster variable and including the factors in its blanket.

Anytime Lifted Belief Propagation

[Figure: alarm(home) connected to the factor (alarm(home), in(home, Town), earthquake(Town)) and to burglary(home); query box [0.1, 0.9].]

The factor (alarm(home), in(home, Town), earthquake(Town)) is obtained after unifying alarm(home) with alarm(House) in (alarm(House), in(House, Town), earthquake(Town)), producing the constraint House = home. The burglary branch arises the same way, through unification.
The blanket factors alone can already determine a bound on the query.

Anytime Lifted Belief Propagation

[Figure: the cluster in(home, Town) splits into in(home, sf) and in(home, Town) with Town ≠ sf; the burglary(home) and earthquake(Town) branches are unchanged; query box still [0.1, 0.9].]

The cluster in(home, Town) unifies with in(home, sf) in the factor (in(home, sf)), which represents evidence, splitting the cluster around Town = sf.
The bound remains the same because we still haven't considered the evidence on earthquakes.

Anytime Lifted Belief Propagation

[Figure: the factor (earthquake(sf)), representing the evidence that there was an earthquake, is now included; the query box narrows to [0.8, 0.9].]

If the bound is good enough, there is no need to further expand (and shatter) the other branches.

Anytime Lifted Belief Propagation

[Figure: expanding further, to partOf(front, home) and broken(front); the query box narrows to [0.85, 0.9].]

We can keep expanding at will for narrower bounds...

Anytime Lifted Belief Propagation

[Figure: the fully expanded network, including the burglary, neighbor, item, and entrance branches.]

... until convergence, if desired: the query belief converges to 0.8725.

Connection to Resolution Refutation

Incremental shattering corresponds to building a proof tree.
[Figure: a proof tree rooted at alarm(home), with branches earthquake(sf) ⋀ in(home, sf); earthquake(L) ⋀ in(home, L) for L not in {sf}; burglary(home); and true, ...]

Outline (next: Conclusion)

Conclusion

Inference is the key issue in several SRL formalisms.
FROG: factors out the formula groundings satisfied by the evidence, keeping only the rest. Order-of-magnitude reduction in the number of groundings; compares favorably to Alchemy in different domains.
Counting BP: BP + grouping nodes sending and receiving identical messages. A conceptually easy, scalable BP algorithm, with applications to challenging AI tasks.
Anytime BP: incremental shattering + box propagation. Only the most necessary fraction of the model is considered and shattered. Status: implementation and evaluation.

Conclusion

The algorithms are independent of the representation.
Variety of applications: parameter learning of relational models, social networks, object recognition, link prediction, activity recognition, model counting, bio-medical applications, relational RL.

Future Work

FROG: combine with lifted inference; exploit commonality across rules.
CBP: integrate with parameter learning in SRL models; extend to multi-agent RL and lifted pairwise BP.
Anytime BP: heuristics to expand the network; understand closer connections to resolution.
SRL models: learning dynamic SRL models; structure learning remains an open issue.

Acknowledgements*

Babak Ahmadi - Fraunhofer Institute
Rodrigo de Salvo Braz - SRI International
Hung Bui - SRI International
Vitor Santos Costa - U Porto
Kristian Kersting - Fraunhofer Institute
Gautam Kunapuli - UW Madison
David Page - UW Madison
Stuart Russell - UC Berkeley
Jude Shavlik - UW Madison
Prasad Tadepalli - Oregon State University

* Ordered by last name

Thanks!