
Markov Logic Network Structure Via Hypergraph Lifting
Stanley Kok, Dept. of Computer Science and Eng., University of Washington, Seattle, USA
Joint work with Pedro Domingos




Presentation Transcript

Slide 1: Learning Markov Logic Network Structure Via Hypergraph Lifting

Stanley Kok
Dept. of Computer Science and Eng., University of Washington, Seattle, USA
Joint work with Pedro Domingos

Slide 2: Synopsis of LHL

Input: Relational DB

Advises          TAs            Teaches
Pete   Sam       Sam    CS1     Pete   CS1
Pete   Saul      Sam    CS2     Pete   CS2
Paul   Sara      Sara   CS1     Paul   CS2

Output: Probabilistic KB

 2.7  Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s)
 1.4  Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c)
-1.1  TAs(s, c) ⇒ Advises(s, p)
…

[Figure: the DB drawn as a hypergraph over the constants Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8 with Teaches, TAs, and Advises hyperedges, and the corresponding lifted hypergraph over the clusters Professor, Student, and Course.]

Goal of LHL

Slide 3: Experimental Results

[Figure: bar charts of area under the precision-recall curve (AUC) and conditional log-likelihood (CLL) for LHL, BUSL, and MSL.]

Slide 4: Outline

Background
Learning via Hypergraph Lifting
Experiments
Future Work

Slide 5: Markov Logic

A logical KB is a set of hard constraints on the set of possible worlds.
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible.
Give each formula a weight (higher weight ⇒ stronger constraint).

Slide 6: Markov Logic

A Markov logic network (MLN) is a set of pairs (F, w):
F is a formula in first-order logic
w is a real number

P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

where x is a vector of truth assignments to ground atoms, Z is the partition function, w_i is the weight of the i-th formula, and n_i(x) is the number of true groundings of the i-th formula.
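The distribution above can be made concrete with a tiny, brute-force sketch (this is purely illustrative, not Alchemy's implementation): enumerate all worlds of a three-atom domain with the single weighted rule from the synopsis slide.

```python
import math
from itertools import product

# Minimal illustration of the MLN distribution
# P(x) = (1/Z) exp(sum_i w_i * n_i(x)),
# where n_i(x) is the number of true groundings of formula i in world x.

def unnormalized(world, formulas):
    """world: dict ground-atom -> bool; formulas: list of (weight, n_i)."""
    return math.exp(sum(w * n(world) for w, n in formulas))

def probability(world, all_worlds, formulas):
    z = sum(unnormalized(x, formulas) for x in all_worlds)  # partition function Z
    return unnormalized(world, formulas) / z

# One formula with weight 2.7: Teaches(Pete,CS1) ^ TAs(Sam,CS1) => Advises(Pete,Sam)
def n_rule(w):
    return int(not (w["Teaches(Pete,CS1)"] and w["TAs(Sam,CS1)"]) or w["Advises(Pete,Sam)"])

formulas = [(2.7, n_rule)]
atoms = ["Teaches(Pete,CS1)", "TAs(Sam,CS1)", "Advises(Pete,Sam)"]
all_worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=3)]
# The one world that violates the formula is less probable, not impossible.
```

Only the world where Pete teaches CS1, Sam TAs CS1, but Pete does not advise Sam violates the formula, so its probability is lower but nonzero, which is exactly the "soft constraint" behavior described above.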

Slide 7: MLN Structure Learning

A challenging task; few approaches to date [Kok & Domingos, ICML'05; Mihalkova & Mooney, ICML'07; Biba et al., ECAI'08; Huynh & Mooney, ICML'08].
Most MLN structure learners:
greedily and systematically enumerate formulas;
are computationally expensive, with a large search space;
are susceptible to local optima.

Slide 8: MSL [Kok & Domingos, ICML'05]

While beam not empty
    Add unit clauses to beam
    While beam has changed
        For each clause c in beam
            c' ← add a literal to c
            newClauses ← newClauses ∪ {c'}
        beam ← k best clauses in beam ∪ newClauses
    Add best clause in beam to MLN
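The inner beam loop of MSL can be sketched in Python. This is a schematic rendering only: the clause representation and the `refine` and `score` functions below are illustrative stand-ins (MSL actually scores clauses by weighted pseudo-log-likelihood gain).

```python
# Schematic of MSL's beam search for one clause to add to the MLN.
# refine(c) -> list of clauses with one more literal than c
# score(c)  -> real number (higher is better); stubbed here

def beam_search(unit_clauses, refine, score, k=10):
    beam = list(unit_clauses)
    changed = True
    while changed:
        # Extend every clause in the beam by one literal.
        new_clauses = [c2 for c in beam for c2 in refine(c)]
        # Keep the k best clauses among old and new candidates.
        best = sorted(beam + new_clauses, key=score, reverse=True)[:k]
        changed = best != beam
        beam = best
    return max(beam, key=score)

# Toy usage with clauses as frozensets of literal strings and a made-up
# score that prefers 2-literal clauses.
literals = ["Teaches(p,c)", "TAs(s,c)", "Advises(p,s)"]
def refine(c):
    return [c | {lit} for lit in literals if lit not in c]
def toy_score(c):
    return -abs(len(c) - 2)
best = beam_search([frozenset([lit]) for lit in literals], refine, toy_score)
```

The loop terminates once the top-k set stops changing, mirroring the "while beam has changed" condition in the pseudocode.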

Slide 9: Relational Pathfinding [Richards & Mooney, AAAI'92]

Find paths of linked ground atoms → formulas.
Path ≡ a conjunction that is true at least once.
Exponential search space of paths; restricted to short paths.

[Figure: the ground hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8; the path
Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1)
is variabilized to
Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c).]
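Relational pathfinding as described above can be sketched as a depth-first search for chains of ground atoms that share constants. This is an illustrative sketch, not the Richards & Mooney implementation; the exponential blow-up it suffers on larger databases is exactly why the search must be restricted to short paths.

```python
# Depth-first search for linked paths of ground atoms: consecutive
# atoms in a path must share at least one constant.

def find_paths(atoms, start, max_len=3):
    """atoms: list of (predicate, args) tuples; start: one such atom.
    Returns all linked paths (tuples of atoms) of length <= max_len."""
    paths = []

    def extend(path, constants):
        paths.append(tuple(path))
        if len(path) == max_len:
            return
        for atom in atoms:
            pred, args = atom
            # Link only atoms that share a constant with the path so far.
            if atom not in path and constants & set(args):
                extend(path + [atom], constants | set(args))

    extend([start], set(start[1]))
    return paths

db = [("Advises", ("Pete", "Sam")),
      ("Teaches", ("Pete", "CS1")),
      ("TAs", ("Sam", "CS1"))]
paths = find_paths(db, db[0])
# Includes the 3-atom path Advises(Pete,Sam), Teaches(Pete,CS1), TAs(Sam,CS1),
# the ground path that is later variabilized into a formula.
```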

Slide 10: BUSL [Mihalkova & Mooney, ICML'07]

Finds short paths with a form of relational pathfinding.
Path → Boolean variable → node in a Markov network.
Greedily tries to link the nodes with edges.
Cliques → clauses: form disjunctions of the atoms in a clique's nodes.
Greedily adds clauses to an empty MLN.

Example: from the clique {Advises(p,s), Teaches(p,c), TAs(s,c)}, form disjunctions such as
Advises(p,s) ∨ Teaches(p,c) ∨ TAs(s,c)
¬Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)
…

Slide 11: Outline

Background
Learning via Hypergraph Lifting
Experiments
Future Work

Slide 12: Learning via Hypergraph Lifting (LHL)

Uses relational pathfinding to a fuller extent.
Induces a hypergraph over clusters of constants.

[Figure: the ground hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8 is "lifted" into a hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1, …, CS8}, with Teaches, TAs, and Advises hyperedges.]

Slide 13: Learning via Hypergraph Lifting (LHL)

Uses a hypergraph (V, E):
V: a set of nodes
E: a set of labeled, non-empty, ordered subsets of V
Finds paths in a hypergraph.
Path: a set of hyperedges such that for any two hyperedges e0 and en in the set, there exists a sequence of hyperedges in the set that leads from e0 to en.

Slide 14: Learning via Hypergraph Lifting (LHL)

A relational DB can be viewed as a hypergraph:
Nodes ≡ constants
Hyperedges ≡ true ground atoms

DB:
Advises          TAs            Teaches
Pete   Sam       Sam    CS1     Pete   CS1
Pete   Saul      Sam    CS2     Pete   CS2
Paul   Sara      Sara   CS1     Paul   CS2

[Figure: the corresponding hypergraph over the constants with Teaches, TAs, and Advises hyperedges.]
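The "DB as hypergraph" view above is straightforward to realize in code. The data structure below is an illustrative choice, not LHL's internal representation: nodes are the constants, and each true ground atom becomes a labeled, ordered hyperedge.

```python
# View a relational DB as a hypergraph: nodes = constants,
# hyperedges = (predicate, argument-tuple) for each true ground atom.

def db_to_hypergraph(db):
    """db: dict predicate -> list of constant tuples (true ground atoms)."""
    nodes = {c for tuples in db.values() for t in tuples for c in t}
    hyperedges = [(pred, t) for pred, tuples in db.items() for t in tuples]
    return nodes, hyperedges

# The DB from the slide.
db = {"Advises": [("Pete", "Sam"), ("Pete", "Saul"), ("Paul", "Sara")],
      "TAs":     [("Sam", "CS1"), ("Sam", "CS2"), ("Sara", "CS1")],
      "Teaches": [("Pete", "CS1"), ("Pete", "CS2"), ("Paul", "CS2")]}
nodes, edges = db_to_hypergraph(db)
```

Keeping the argument tuple ordered matters because hyperedges are ordered subsets of V: Teaches(Pete, CS1) and a hypothetical Teaches(CS1, Pete) would be different edges.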

Slide 15: LHL = Clustering + Relational Pathfinding

LHL "lifts" the hypergraph into a more compact representation:
jointly clusters the nodes into higher-level concepts;
clusters the hyperedges;
traces paths in the lifted hypergraph.

[Figure: the ground hypergraph and its lifted counterpart over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1, …, CS8}.]

Slide 16: Learning via Hypergraph Lifting

LHL has three components:
LiftGraph: lifts the hypergraph
FindPaths: finds paths in the lifted hypergraph
CreateMLN: creates rules from the paths, and adds good ones to an empty MLN

Slide 17: LiftGraph

Defined using Markov logic.
Jointly clusters constants in a bottom-up agglomerative manner, allowing information to propagate from one cluster to another.
Ground atoms are also clustered.
The number of clusters need not be specified in advance.
Each lifted hyperedge contains ≥ one true ground atom.

Slide 18: Learning Problem in LiftGraph

Find the cluster assignment C that maximizes the posterior probability
P(C | D) ∝ P(D | C) P(C)
where D is the truth values of the ground atoms, P(D | C) is defined with an MLN, and P(C) is defined with another MLN.

Slide 19: LiftGraph's P(D | C) MLN

For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule.

[Figure: the clusters Professor = {Pete, Paul, Pat, Phil}, Student = {Sam, Sara, Saul, Sue}, and Course = {CS1, …, CS8} with Teaches, TAs, and Advises hyperedges; the Teaches rule is defined over the Professor and Course clusters.]

Slide 20: LiftGraph's P(D | C) MLN

For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule, e.g.:
p ∈ Professor ∧ c ∈ Course ⇒ Teaches(p, c)
where Professor = {Pete, Paul, Pat, Phil} and Course = {CS1, …, CS8}.

Slide 21: LiftGraph's P(D | C) MLN

For each predicate r, we have a default atom prediction rule covering the remaining cluster combinations, e.g.:
x ∈ Cluster1 ∧ y ∈ Cluster2 ∧ DefaultClusterCombination ⇒ Teaches(x, y)

[Figure: candidate cluster combinations, e.g. x ∈ Student = {Sam, Sara, Saul, Sue} or x ∈ Professor = {Pete, Paul, Pat, Phil}, with y ∈ Course = {CS1, …, CS8}.]

Slide 22: LiftGraph's P(C) MLN

Each symbol belongs to exactly one cluster (infinite weight).
Exponential prior on the number of cluster combinations (negative weight -λ).

Slide 23: LiftGraph

Hard assignments of constants to clusters.
Weights and log-posterior computed in closed form.
Searches for the cluster assignment with the highest log-posterior.
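The agglomerative search described on this slide can be sketched as greedy pairwise merging. This is only a schematic: the `log_posterior` below is a toy stand-in, whereas LiftGraph computes the true log-posterior in closed form from its two MLNs.

```python
# Greedy bottom-up agglomerative clustering: start with singleton
# clusters and merge the pair that most improves the (stubbed)
# log-posterior, stopping when no merge improves it.

def lift_greedy(constants, log_posterior):
    """log_posterior(clusters) -> real; clusters is a list of frozensets."""
    clusters = [frozenset([c]) for c in constants]
    improved = True
    while improved:
        improved = False
        best, best_score = None, log_posterior(clusters)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
                merged.append(clusters[i] | clusters[j])
                s = log_posterior(merged)
                if s > best_score:
                    best, best_score = merged, s
        if best is not None:
            clusters, improved = best, True
    return clusters

# Toy posterior: prefer fewer clusters, but forbid mixing people with
# courses (a crude stand-in for the closed-form score).
def toy_score(clusters):
    if any(len({c[:2] == "CS" for c in cl}) > 1 for cl in clusters):
        return float("-inf")
    return -len(clusters)

result = lift_greedy(["Pete", "Paul", "CS1", "CS2"], toy_score)
```

With the toy score, the search recovers one "professor" cluster and one "course" cluster, mirroring how LiftGraph groups constants into higher-level concepts.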

Slide 24: LiftGraph's Search Algorithm

[Figure: the constants Pete and Paul, each initially in its own cluster, are merged into a single cluster; Teaches and Advises hyperedges link them to CS1–CS3 and to Sam and Sara.]

Slide 25: LiftGraph's Search Algorithm

[Figure: the merging continues: CS1, CS2, and CS3 are merged into one cluster and Sam and Sara into another, yielding lifted Teaches and Advises hyperedges over the clusters.]

Slide 26: FindPaths

[Figure: the lifted hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1, …, CS8} with Teaches, TAs, and Advises hyperedges.]

Paths found (over the clusters):
Advises([Professor], [Student])
Advises([Professor], [Student]), Teaches([Professor], [Course])
Advises([Professor], [Student]), Teaches([Professor], [Course]), TAs([Student], [Course])

Slide 27: Clause Creation

A path over clusters, Advises([Professor], [Student]), Teaches([Professor], [Course]), TAs([Student], [Course]), is variabilized into a conjunction:
Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)
From the conjunction, form clauses over its atoms:
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c)
…
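Turning one variabilized path into its candidate clauses is a simple enumeration of sign assignments, which can be sketched as follows (an illustration of the idea; LHL then scores the candidates and keeps only the good ones).

```python
from itertools import product

# From a conjunction of atoms, form candidate clauses by disjoining
# the atoms under every combination of signs ("!" marks negation).

def clauses_from_path(path):
    """path: list of atom strings, e.g. ["Advises(p,s)", "Teaches(p,c)"]."""
    clauses = []
    for signs in product([True, False], repeat=len(path)):
        lits = [a if pos else "!" + a for a, pos in zip(path, signs)]
        clauses.append(" v ".join(lits))
    return clauses

path = ["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"]
cands = clauses_from_path(path)
# A 3-atom path yields 2^3 = 8 candidate clauses.
```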

Slide 28: Clause Pruning

Clause                                         Score
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)    -1.15
 Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)    -1.17
…
¬Advises(p, s) ∨ ¬Teaches(p, c)                -2.21
¬Advises(p, s) ∨ TAs(s, c)                     -2.23
¬Advises(p, s)                                 -2.03
¬Teaches(p, c)                                 -3.13
¬Teaches(p, c) ∨ TAs(s, c)                     -2.93
TAs(s, c)                                      -3.93
…

Slide 29: Clause Pruning

[Same clauses and scores as the previous slide.]
Compare each clause against its sub-clauses (taken individually).

Slide 30: MLN Creation

Add clauses to an empty MLN in order of decreasing score.
Retrain the weights of the clauses each time a clause is added.
Retain a clause in the MLN if the overall score improves.
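The greedy loop above can be sketched as follows. This is schematic only: `mln_score` is a stub standing in for retraining the clause weights and re-evaluating the overall score (e.g., weighted pseudo-log-likelihood) after each tentative addition.

```python
# Greedy MLN creation: try clauses in order of decreasing individual
# score; keep a clause only if the overall MLN score improves.

def create_mln(scored_clauses, mln_score):
    """scored_clauses: list of (clause, score); mln_score(clauses) -> real."""
    mln = []
    current = mln_score(mln)
    for clause, _ in sorted(scored_clauses, key=lambda cs: cs[1], reverse=True):
        trial = mln_score(mln + [clause])  # stands in for weight retraining
        if trial > current:
            mln.append(clause)
            current = trial
    return mln

# Toy usage: the overall score is just a sum of per-clause gains here,
# so a clause with a negative gain is tried and then dropped.
gains = {"Advises-rule": 2.0, "Teaches-rule": -1.0, "TAs-rule": 1.0}
def toy_mln_score(clauses):
    return sum(gains[c] for c in clauses)

scored = [("Advises-rule", 2.0), ("Teaches-rule", 1.5), ("TAs-rule", 1.0)]
mln = create_mln(scored, toy_mln_score)
```

Note the distinction the slide draws: a clause's individual score only sets the order of consideration, while the retrained overall score decides whether it stays.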

Slide 31: Outline

Background
Learning via Hypergraph Lifting
Experiments
Future Work

Slide 32: Datasets

IMDB
Created from the IMDB.com DB.
Movies, actors, etc., and their relationships.
17,793 ground atoms; 1,224 true ones.

UW-CSE
Describes an academic department.
Students, faculty, etc., and their relationships.
260,254 ground atoms; 2,112 true ones.

Slide 33: Datasets

Cora
Citations to computer science papers.
Papers, authors, titles, etc., and their relationships.
687,422 ground atoms; 42,558 true ones.

Slide 34: Methodology

Five-fold cross-validation.
Inferred the probability of truth for the groundings of each predicate, with the groundings of all other predicates as evidence.
Evaluation measures: area under the precision-recall curve (AUC) and average conditional log-likelihood (CLL).

Slide 35: Methodology

MCMC inference algorithms in Alchemy to evaluate the test atoms.
1 million samples.
24 hours.

Slide 36: Methodology

Compared with:
MSL [Kok & Domingos, ICML'05]
BUSL [Mihalkova & Mooney, ICML'07]
Lesion study:
NoLiftGraph: LHL with no hypergraph lifting; finds paths directly from the unlifted hypergraph.
NoPathFinding: LHL with no pathfinding; uses the MLN representing LiftGraph.

Slide 37: LHL vs. BUSL vs. MSL (Area under Prec-Recall Curve)

System   IMDB AUC       IMDB CLL       UW-CSE AUC     UW-CSE CLL
LHL      0.69 ± 0.01    -0.13 ± 0.00   0.22 ± 0.01    -0.04 ± 0.00
BUSL     0.47 ± 0.01    -0.14 ± 0.00   0.21 ± 0.01    -0.05 ± 0.00
MSL      0.47 ± 0.01    -0.17 ± 0.00   0.18 ± 0.01    -0.57 ± 0.00

System   Cora AUC       Cora CLL
LHL      0.87 ± 0.00    -0.26 ± 0.00
BUSL     0.17 ± 0.00    -0.37 ± 0.00
MSL      0.17 ± 0.00    -0.37 ± 0.00

[Figure: bar charts of AUC for IMDB, UW-CSE, and Cora.]

Slide 38: LHL vs. BUSL vs. MSL (Conditional Log-likelihood)

[Same tables as the previous slide.]
[Figure: bar charts of CLL for IMDB, UW-CSE, and Cora.]

Slide 39: LHL vs. BUSL vs. MSL (Runtime)

System   IMDB (min)      UW-CSE (hr)     Cora (hr)
LHL      15.63 ± 1.88    7.55 ± 1.51     14.82 ± 1.78
BUSL     4.69 ± 1.02     12.97 ± 9.80    18.65 ± 9.52
MSL      0.17 ± 0.10     2.13 ± 0.38     65.60 ± 1.82

[Figure: bar charts of runtime for IMDB (minutes), UW-CSE (hours), and Cora (hours).]

Slide 40: LHL vs. NoLiftGraph (Area under Prec-Recall Curve)

System          Cora AUC       Cora CLL
LHL             0.87 ± 0.00    -0.26 ± 0.00
LHL-FindPaths   0.91 ± 0.00    -0.17 ± 0.00

[Figure: bar charts of AUC for IMDB, UW-CSE, and Cora comparing NoLiftGraph and LHL.]

Slide 41: LHL vs. NoLiftGraph (Conditional Log-likelihood)

[Same table as the previous slide.]
[Figure: bar charts of CLL for IMDB, UW-CSE, and Cora comparing NoLiftGraph and LHL.]

Slide 42: LHL vs. NoLiftGraph (Runtime)

System          IMDB (min)   UW-CSE (hr)   Cora (hr)
LHL             15.63        7.55          14.82
LHL-FindPaths   242.41       158.24        5935.5

[Figure: bar charts of runtime for IMDB, UW-CSE, and Cora comparing NoLiftGraph and LHL.]

Slide 43: LHL vs. NoPathFinding

System          IMDB AUC       IMDB CLL       UW-CSE AUC     UW-CSE CLL
LHL             0.69 ± 0.01    -0.13 ± 0.00   0.22 ± 0.01    -0.04 ± 0.00
LHL-LiftGraph   0.45 ± 0.01    -0.27 ± 0.00   0.14 ± 0.01    -0.06 ± 0.00

[Figure: bar charts of AUC and CLL for IMDB and UW-CSE comparing NoPathFinding and LHL.]

Slide 44: Examples of Rules Learned

If a is an actor and d is a director, and they both worked in the same movie, then a probably worked under d.
If p is a professor, and p co-authored a paper with s, then s is likely a student.
If papers x and y have the same author, then x and y are likely to be the same paper.

Slide 45: Outline

Motivation
Background
Learning via Hypergraph Lifting
Experiments
Future Work

Slide 46: Future Work

Integrate the components of LHL.
Integrate LHL with lifted inference [Singla & Domingos, AAAI'08].
Construct an ontology simultaneously with the probabilistic KB.
Further scale LHL up.
Apply LHL to larger, richer domains, e.g., the Web.

Slide 47: Conclusion

LHL = Clustering + Relational Pathfinding.
"Lifts" the data into a more compact form, which is essential for speeding up relational pathfinding.
LHL outperforms state-of-the-art structure learners.