Slide 1: Learning Markov Logic Network Structure Via Hypergraph Lifting

Stanley Kok
Dept. of Computer Science and Eng., University of Washington, Seattle, USA
Joint work with Pedro Domingos
Slide 2: Synopsis of LHL

Input: Relational DB

  Advises          TAs            Teaches
  Pete   Sam       Sam   CS1      Pete   CS1
  Pete   Saul      Sam   CS2      Pete   CS2
  Paul   Sara      Sara  CS1      Paul   CS2
  ...    ...       ...   ...      ...    ...

Output: Probabilistic KB

   2.7  Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s)
   1.4  Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c)
  -1.1  TAs(s, c) ⇒ Advises(s, p)
   ...

Goal of LHL: "lift" the ground hypergraph over the constants (Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, CS1-CS8, linked by Teaches, TAs, and Advises hyperedges) into a hypergraph over the clusters Professor, Student, and Course.
Slide 3: Experimental Results

[Bar charts: area under the precision-recall curve (AUC) and conditional log-likelihood (CLL) for LHL, BUSL, and MSL.]
Slide 4: Outline

- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work
Slide 5: Markov Logic

- A logical KB is a set of hard constraints on the set of possible worlds
- Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
- Give each formula a weight (higher weight = stronger constraint)
Slide 6: Markov Logic

A Markov logic network (MLN) is a set of pairs (F, w):
- F is a formula in first-order logic
- w is a real number

  P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

where x is a vector of truth assignments to ground atoms, Z is the partition function, w_i is the weight of the i-th formula, and n_i(x) is the number of true groundings of the i-th formula.
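As a concrete illustration of this distribution, the sketch below brute-forces P(x) for a tiny hypothetical MLN with a single formula, Smokes(A) ⇒ Cancer(A), over two ground atoms; the predicate names and the weight are made up for the example:

```python
import itertools
import math

# Hypothetical one-formula MLN: "Smokes(A) => Cancer(A)" with weight 1.5,
# so there are two ground atoms and a single grounding of the formula.
ATOMS = ["Smokes(A)", "Cancer(A)"]
W = 1.5

def n_true(world):
    # Number of true groundings of the formula in this world (0 or 1 here).
    return 1 if (not world["Smokes(A)"]) or world["Cancer(A)"] else 0

# Enumerate all truth assignments to compute the partition function Z.
worlds = [dict(zip(ATOMS, vals))
          for vals in itertools.product([False, True], repeat=len(ATOMS))]
Z = sum(math.exp(W * n_true(x)) for x in worlds)

def prob(world):
    return math.exp(W * n_true(world)) / Z

# A world that violates the formula is less probable, but not impossible.
violating = {"Smokes(A)": True, "Cancer(A)": False}
satisfying = {"Smokes(A)": True, "Cancer(A)": True}
assert 0 < prob(violating) < prob(satisfying)
```

Brute-force enumeration is only feasible for toy worlds; real MLN systems estimate these quantities with approximate inference.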
Slide 7: MLN Structure Learning

A challenging task; few approaches to date [Kok & Domingos, ICML'05; Mihalkova & Mooney, ICML'07; Biba et al., ECAI'08; Huynh & Mooney, ICML'08].
Most MLN structure learners:
- Greedily and systematically enumerate formulas
- Computationally expensive; large search space
- Susceptible to local optima
Slide 8: MSL [Kok & Domingos, ICML'05]

While beam not empty
    Add unit clauses to beam
    While beam has changed
        For each clause c in beam
            c' ← add a literal to c
            newClauses ← newClauses ∪ c'
        beam ← k best clauses in beam ∪ newClauses
    Add best clause in beam to MLN
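A runnable sketch of the inner beam-search loop above, under stated assumptions: clauses are represented as frozensets of literal strings, and the scoring function is a stand-in (MSL's real score is based on weighted pseudo-log-likelihood over the training DB):

```python
# Hypothetical literal pool; a clause is a frozenset of literals.
LITERALS = ["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)",
            "!Advises(p,s)", "!Teaches(p,c)", "!TAs(s,c)"]

def score(clause):
    # Stand-in for MSL's real clause score (weighted pseudo-log-likelihood);
    # here we simply prefer shorter clauses, for illustration only.
    return -len(clause)

def msl_beam_search(k=3, max_len=3):
    beam = [frozenset([lit]) for lit in LITERALS]  # add unit clauses to beam
    changed = True
    while changed:                                 # while beam has changed
        new_clauses = set()
        for c in beam:
            for lit in LITERALS:                   # c' <- add a literal to c
                if lit not in c and len(c) < max_len:
                    new_clauses.add(c | {lit})
        candidates = set(beam) | new_clauses
        new_beam = sorted(candidates, key=score, reverse=True)[:k]
        changed = set(new_beam) != set(beam)       # k best clauses kept
        beam = new_beam
    return [max(beam, key=score)]                  # add best clause to MLN

mln = msl_beam_search()
```

The outer loop of the slide (repeating until the beam empties) is elided; this sketch performs one round and returns the single best clause found.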
Slide 9: Relational Pathfinding [Richards & Mooney, AAAI'92]

- Find paths of linked ground atoms → formulas
- A path ≡ a conjunction that is true at least once
- Exponential search space of paths, so restricted to short paths

Example: in the hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1-CS8 (Teaches, TAs, Advises hyperedges), the path

  Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1)

is variabilized to

  Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)
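The variabilization step above can be sketched directly: replace each distinct constant in the ground path with a fresh variable. The path representation and the variable-naming scheme (p, q, r, ... rather than the slide's mnemonic p, s, c) are assumptions for the example:

```python
from string import ascii_lowercase

def variabilize(path):
    """Replace each distinct constant in a path of ground atoms with a fresh
    variable. path: list of (predicate, args) tuples; supports up to 11
    distinct constants (variables p..z) in this sketch."""
    mapping = {}
    out = []
    for pred, args in path:
        new_args = []
        for a in args:
            if a not in mapping:
                mapping[a] = ascii_lowercase[len(mapping) + 15]  # p, q, r, ...
            new_args.append(mapping[a])
        out.append(f"{pred}({', '.join(new_args)})")
    return " ^ ".join(out)

path = [("Advises", ("Pete", "Sam")),
        ("Teaches", ("Pete", "CS1")),
        ("TAs", ("Sam", "CS1"))]
# Pete -> p, Sam -> q, CS1 -> r
assert variabilize(path) == "Advises(p, q) ^ Teaches(p, r) ^ TAs(q, r)"
```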
Slide 10: BUSL [Mihalkova & Mooney, ICML'07]

- Finds short paths with a form of relational pathfinding
- Path → Boolean variable → node in a Markov network
- Greedily tries to link the nodes with edges
- Cliques → clauses: form disjunctions of the atoms in a clique's nodes
- Greedily adds clauses to an empty MLN

Example: a clique over the nodes Advises(p,s) ∧ Teaches(p,c) and TAs(s,c) yields clauses such as

  Advises(p,s) ∨ Teaches(p,c) ∨ TAs(s,c)
  ¬Advises(p,s) ∨ ¬Teaches(p,c) ∨ TAs(s,c)
  ...
Slide 11: Outline

- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work
Slide 12: Learning via Hypergraph Lifting (LHL)

- Uses relational pathfinding to a fuller extent
- Induces a hypergraph over clusters of constants

[Figure: the ground hypergraph over the constants Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1-CS8 (Teaches, TAs, Advises hyperedges) is "lifted" into a hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1-CS8}.]
Slide 13: Learning via Hypergraph Lifting (LHL)

Uses a hypergraph (V, E):
- V: a set of nodes
- E: a set of labeled, non-empty, ordered subsets of V

Finds paths in the hypergraph.
Path: a set of hyperedges such that for any two hyperedges e_0 and e_n in the set, there exists a sequence of hyperedges in the set leading from e_0 to e_n.
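The path condition above is a connectivity requirement: treating two hyperedges as adjacent when they share a node, every hyperedge in the set must be reachable from every other. A minimal sketch of that check (the tuple-based hyperedge representation is an assumption):

```python
def is_path(hyperedges):
    """Check the slide's path condition via breadth-first search: the set of
    hyperedges is connected, where two hyperedges are adjacent iff they share
    a node. Each hyperedge: (label, tuple_of_nodes)."""
    if not hyperedges:
        return False
    edges = list(hyperedges)
    seen = {0}
    frontier = [0]
    while frontier:
        i = frontier.pop()
        for j in range(len(edges)):
            if j not in seen and set(edges[i][1]) & set(edges[j][1]):
                seen.add(j)
                frontier.append(j)
    return len(seen) == len(edges)

connected = [("Advises", ("Pete", "Sam")),
             ("Teaches", ("Pete", "CS1")),
             ("TAs", ("Sam", "CS1"))]
disconnected = [("Advises", ("Pete", "Sam")),
                ("Teaches", ("Paul", "CS2"))]
assert is_path(connected)
assert not is_path(disconnected)
```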
Slide 14: Learning via Hypergraph Lifting (LHL)

A relational DB can be viewed as a hypergraph:
- Nodes ≡ constants
- Hyperedges ≡ true ground atoms

[Figure: the Advises, TAs, and Teaches tables of the DB drawn as a hypergraph over the constants Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1-CS8.]
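This DB-to-hypergraph view can be sketched directly: each constant becomes a node and each true ground atom becomes a labeled, ordered hyperedge. The DB fragment below reproduces the tuples shown in the slides' tables:

```python
from collections import defaultdict

# Fragment of the slides' DB: predicate name -> list of true ground tuples.
db = {
    "Advises": [("Pete", "Sam"), ("Pete", "Saul"), ("Paul", "Sara")],
    "TAs":     [("Sam", "CS1"), ("Sam", "CS2"), ("Sara", "CS1")],
    "Teaches": [("Pete", "CS1"), ("Pete", "CS2"), ("Paul", "CS2")],
}

def db_to_hypergraph(db):
    nodes = set()
    hyperedges = []               # (label, ordered tuple of nodes)
    incident = defaultdict(list)  # node -> hyperedges touching it
    for pred, tuples in db.items():
        for args in tuples:
            nodes.update(args)
            hyperedges.append((pred, args))
            for a in args:
                incident[a].append((pred, args))
    return nodes, hyperedges, incident

nodes, hyperedges, incident = db_to_hypergraph(db)
assert "Pete" in nodes and len(hyperedges) == 9
assert ("Teaches", ("Pete", "CS1")) in incident["Pete"]
```

The incidence index makes relational pathfinding cheap: the hyperedges reachable from a partial path are exactly those incident to its nodes.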
Slide 15: LHL = Clustering + Relational Pathfinding

- LHL "lifts" the hypergraph into a more compact representation
- Jointly clusters nodes into higher-level concepts
- Clusters hyperedges
- Traces paths in the lifted hypergraph

[Figure: the ground hypergraph over constants is "lifted" into a hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1-CS8}.]
Slide 16: Learning via Hypergraph Lifting

LHL has three components:
- LiftGraph: lifts the hypergraph
- FindPaths: finds paths in the lifted hypergraph
- CreateMLN: creates rules from paths, and adds good ones to an empty MLN
Slide 17: LiftGraph

- Defined using Markov logic
- Jointly clusters constants in a bottom-up agglomerative manner
- Allows information to propagate from one cluster to another
- Ground atoms are also clustered
- #Clusters need not be specified in advance
- Each lifted hyperedge contains ≥ one true ground atom
Slide 18: Learning Problem in LiftGraph

Find the cluster assignment C that maximizes the posterior probability

  P(C | D) ∝ P(D | C) P(C)

where D is the truth values of the ground atoms, P(D | C) is defined with an MLN, and P(C) is defined with another MLN.
Slide 19: LiftGraph's P(D | C) MLN

For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule.

[Figure: the clusters Professor = {Pete, Paul, Pat, Phil}, Student = {Sam, Sara, Saul, Sue}, and Course = {CS1-CS8}, with Teaches, TAs, and Advises hyperedges; the Teaches hyperedge connects the Professor and Course clusters.]
Slide 20: LiftGraph's P(D | C) MLN

For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule, e.g.

  p ∈ Professor ∧ c ∈ Course ⇒ Teaches(p, c)

where Professor = {Pete, Paul, Pat, Phil} and Course = {CS1, ..., CS8}.
Slide 21: LiftGraph's P(D | C) MLN

For each predicate r, we have a default atom prediction rule covering the remaining ("default") cluster combinations, e.g.

  x ∈ Professor ∧ y ∈ Student ⇒ Teaches(x, y)
  ...

for each default cluster combination, such as (Professor, Student).
Slide 22: LiftGraph's P(C) MLN

- Each symbol belongs to exactly one cluster (infinite weight)
- Exponential prior on #cluster combinations (negative weight -λ)
Slide 23: LiftGraph

- Hard assignments of constants to clusters
- Weights and log-posterior computed in closed form
- Searches for the cluster assignment with the highest log-posterior
Slides 24-25: LiftGraph's Search Algorithm

[Figures: bottom-up agglomerative search over the constants Pete, Paul, Sam, Sara, and CS1-CS3 (Teaches, Advises hyperedges). Candidate merges are evaluated: Pete and Paul are merged into {Pete, Paul}, CS1 and CS2 into {CS1, CS2} and then with CS3 into {CS1, CS2, CS3}, and Sam and Sara into {Sam, Sara}.]
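The agglomerative search depicted above can be sketched as a greedy merge loop. The scoring function here is a toy stand-in (LiftGraph's real objective is the closed-form log-posterior over cluster assignments); the constants and the reward structure are assumptions for the example:

```python
def greedy_agglomerate(constants, score):
    """Bottom-up agglomerative clustering: start with singleton clusters and
    repeatedly apply the best-scoring merge until no merge improves the score.
    `score` maps a list of clusters (frozensets) to a number; a stand-in for
    LiftGraph's closed-form log-posterior."""
    clusters = [frozenset([c]) for c in constants]
    improved = True
    while improved:
        improved = False
        best, best_pair = score(clusters), None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
                merged.append(clusters[i] | clusters[j])
                if score(merged) > best:
                    best, best_pair = score(merged), (i, j)
        if best_pair:
            i, j = best_pair
            merged = [c for k, c in enumerate(clusters) if k not in (i, j)]
            merged.append(clusters[i] | clusters[j])
            clusters, improved = merged, True
    return clusters

# Toy score: fewer clusters are better, mixing professors with courses is bad.
def toy_score(clusters):
    profs = {"Pete", "Paul"}
    mixed = sum(1 for c in clusters if c & profs and c - profs)
    return -len(clusters) - 10 * mixed

out = greedy_agglomerate(["Pete", "Paul", "CS1", "CS2"], toy_score)
assert sorted(map(sorted, out)) == [["CS1", "CS2"], ["Paul", "Pete"]]
```

Because merges are greedy and hard, the search can stop at a local optimum; the closed-form scoring makes each candidate merge cheap to evaluate.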
Slide 26: FindPaths

[Figure: the lifted hypergraph over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1-CS8}, with Teaches, TAs, and Advises hyperedges.]

Paths found (the blanks stand for the clusters at each argument position):

  Advises( , )
  Advises( , ), Teaches( , )
  Advises( , ), Teaches( , ), TAs( , )
Slide 27: Clause Creation

A path is a conjunction of lifted hyperedges:

  Advises( , ) ∧ Teaches( , ) ∧ TAs( , )

Each cluster is replaced with a variable:

  Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)

and the conjunction is converted into clauses with varying signs:

  ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
  Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
  Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c)
  ...
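The sign-varying step can be sketched by enumerating sign combinations over the path's atoms; whether all 2^n combinations are generated is an assumption of this sketch (the slide shows several and elides the rest):

```python
from itertools import product

def path_to_clauses(atoms):
    """Convert a conjunction of atoms (a variabilized path) into clauses by
    disjoining the atoms under every combination of signs.
    atoms: list of strings, e.g. ["Advises(p,s)", "Teaches(p,c)"]."""
    clauses = []
    for signs in product([True, False], repeat=len(atoms)):
        lits = [a if pos else "!" + a for a, pos in zip(atoms, signs)]
        clauses.append(" v ".join(lits))
    return clauses

clauses = path_to_clauses(["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"])
assert len(clauses) == 8
assert "!Advises(p,s) v !Teaches(p,c) v !TAs(s,c)" in clauses
```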
Slides 28-29: Clause Pruning

  Clause                                         Score
  ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)    -1.15
   Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)    -1.17
  ...
  ¬Advises(p, s) ∨ ¬Teaches(p, c)                -2.21
  ¬Advises(p, s) ∨ TAs(s, c)                     -2.23
  ...
  ¬Advises(p, s)                                 -2.03
  ¬Teaches(p, c)                                 -3.13
  ¬Teaches(p, c) ∨ TAs(s, c)                     -2.93
   TAs(s, c)                                     -3.93
  ...

Compare each clause against its sub-clauses (taken individually).
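The comparison step can be sketched as: keep a clause only if it scores better than every one of its sub-clauses taken individually. Representing clauses as frozensets of literals, and treating unscored sub-clauses as non-blocking, are assumptions of this sketch:

```python
from itertools import combinations

def sub_clauses(clause):
    """All proper non-empty sub-clauses of a clause (frozenset of literals)."""
    lits = sorted(clause)
    for r in range(1, len(lits)):
        for combo in combinations(lits, r):
            yield frozenset(combo)

def prune(scored):
    """Keep a clause only if it scores strictly better than each sub-clause,
    taken individually. Unscored sub-clauses are treated as -inf (assumption).
    scored: dict frozenset(literals) -> score."""
    kept = {}
    for clause, s in scored.items():
        if all(s > scored.get(sub, float("-inf")) for sub in sub_clauses(clause)):
            kept[clause] = s
    return kept

# Scores echoing the slide's table (literals abbreviated for the example).
scored = {
    frozenset({"!Advises", "!Teaches", "TAs"}): -1.15,
    frozenset({"!Advises", "!Teaches"}): -2.21,
    frozenset({"!Advises", "TAs"}): -2.23,
    frozenset({"!Advises"}): -2.03,
    frozenset({"!Teaches"}): -3.13,
    frozenset({"TAs"}): -3.93,
}
kept = prune(scored)
assert frozenset({"!Advises", "!Teaches", "TAs"}) in kept
assert frozenset({"!Advises", "!Teaches"}) not in kept  # beaten by !Advises
```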
Slide 30: MLN Creation

- Add clauses to an empty MLN in order of decreasing score
- Retrain the weights of the clauses each time a clause is added
- Retain a clause in the MLN if the overall score improves
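The loop above can be sketched as follows; `overall_score` is a stand-in for retraining all weights and rescoring the whole MLN, and the toy scoring function is an assumption for the example:

```python
def create_mln(scored_clauses, overall_score):
    """Add clauses in order of decreasing individual score; after each trial
    addition (which would retrain all weights in the real system), retain the
    clause only if the overall score improves."""
    mln = []
    best = overall_score(mln)
    for clause, _ in sorted(scored_clauses.items(), key=lambda kv: -kv[1]):
        candidate = mln + [clause]
        s = overall_score(candidate)
        if s > best:
            mln, best = candidate, s
    return mln

# Toy overall score: reward covering new literals, lightly penalize size,
# so redundant clauses are rejected.
def toy_overall(mln):
    covered = set().union(*mln) if mln else set()
    return len(covered) - 0.1 * len(mln)

scored = {frozenset({"A", "B"}): -1.0,
          frozenset({"A"}): -2.0,
          frozenset({"C"}): -1.5}
mln = create_mln(scored, toy_overall)
assert frozenset({"A", "B"}) in mln and frozenset({"C"}) in mln
assert frozenset({"A"}) not in mln  # redundant: does not improve the score
```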
Slide 31: Outline

- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work
Slides 32-33: Datasets

IMDB
- Created from the IMDB.com DB
- Movies, actors, etc., and their relationships
- 17,793 ground atoms; 1224 true ones

UW-CSE
- Describes an academic department
- Students, faculty, etc., and their relationships
- 260,254 ground atoms; 2112 true ones

Cora
- Citations to computer science papers
- Papers, authors, titles, etc., and their relationships
- 687,422 ground atoms; 42,558 true ones
Slide 34: Methodology

- Five-fold cross validation
- Inferred the probability of being true for the groundings of each predicate, with the groundings of all other predicates as evidence
- Evaluation measures: area under the precision-recall curve (AUC) and average conditional log-likelihood (CLL)

Slide 35: Methodology

- MCMC inference algorithms in Alchemy to evaluate the test atoms
- 1 million samples
- 24 hours

Slide 36: Methodology

Compared with:
- MSL [Kok & Domingos, ICML'05]
- BUSL [Mihalkova & Mooney, ICML'07]

Lesion study:
- NoLiftGraph: LHL with no hypergraph lifting; finds paths directly from the unlifted hypergraph
- NoPathFinding: LHL with no pathfinding; uses the MLN representing the LiftGraph
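The CLL measure used above can be sketched in a few lines: given the inferred probability that each test ground atom is true, average the log-likelihood assigned to the actual truth values:

```python
import math

def avg_cll(probs, truths):
    """Average conditional log-likelihood of test ground atoms, given the
    inferred probability that each atom is true."""
    lls = [math.log(p if t else 1.0 - p) for p, t in zip(probs, truths)]
    return sum(lls) / len(lls)

# Perfectly confident correct predictions give CLL 0; anything less is negative.
assert abs(avg_cll([1.0, 1.0], [True, True])) < 1e-12
assert avg_cll([0.9, 0.2], [True, False]) < 0.0
```

AUC, by contrast, depends only on the ranking of the probabilities, which is why the slides report both.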
Slides 37-38: LHL vs. BUSL vs. MSL

Area under the precision-recall curve (AUC) and conditional log-likelihood (CLL):

          IMDB                          UW-CSE
  System  AUC           CLL            AUC           CLL
  LHL     0.69 ± 0.01   -0.13 ± 0.00   0.22 ± 0.01   -0.04 ± 0.00
  BUSL    0.47 ± 0.01   -0.14 ± 0.00   0.21 ± 0.01   -0.05 ± 0.00
  MSL     0.47 ± 0.01   -0.17 ± 0.00   0.18 ± 0.01   -0.57 ± 0.00

          Cora
  System  AUC           CLL
  LHL     0.87 ± 0.00   -0.26 ± 0.00
  BUSL    0.17 ± 0.00   -0.37 ± 0.00
  MSL     0.17 ± 0.00   -0.37 ± 0.00

[Bar charts of the same AUC and CLL numbers for IMDB, UW-CSE, and Cora.]
Slide 39: LHL vs. BUSL vs. MSL (Runtime)

  System  IMDB (min)     UW-CSE (hr)    Cora (hr)
  LHL     15.63 ± 1.88    7.55 ± 1.51   14.82 ± 1.78
  BUSL     4.69 ± 1.02   12.97 ± 9.80   18.65 ± 9.52
  MSL      0.17 ± 0.10    2.13 ± 0.38   65.60 ± 1.82
Slides 40-41: LHL vs. NoLiftGraph

AUC and CLL on Cora (bar charts also cover IMDB and UW-CSE):

  System         AUC           CLL
  LHL            0.87 ± 0.00   -0.26 ± 0.00
  NoLiftGraph    0.91 ± 0.00   -0.17 ± 0.00
Slide 42: LHL vs. NoLiftGraph (Runtime)

  System         IMDB (min)   UW-CSE (hr)   Cora (hr)
  LHL            15.63        7.55          14.82
  NoLiftGraph    242.41       158.24        5935.5
Slide 43: LHL vs. NoPathFinding

          IMDB                                  UW-CSE
  System           AUC           CLL            AUC           CLL
  LHL              0.69 ± 0.01   -0.13 ± 0.00   0.22 ± 0.01   -0.04 ± 0.00
  NoPathFinding    0.45 ± 0.01   -0.27 ± 0.00   0.14 ± 0.01   -0.06 ± 0.00
Slide 44: Examples of Rules Learned

- If a is an actor and d is a director, and they both worked in the same movie, then a probably worked under d.
- If p is a professor, and p co-authored a paper with s, then s is likely a student.
- If papers x and y have the same author, then x and y are likely to be the same paper.
Slide 45: Outline

- Motivation
- Background
- Learning via Hypergraph Lifting
- Experiments
- Future Work
Slide 46: Future Work

- Integrate the components of LHL
- Integrate LHL with lifted inference [Singla & Domingos, AAAI'08]
- Construct an ontology simultaneously with the probabilistic KB
- Further scale LHL up
- Apply LHL to larger, richer domains, e.g., the Web
Slide 47: Conclusion

- LHL = Clustering + Relational Pathfinding
- "Lifts" data into a more compact form, essential for speeding up relational pathfinding
- LHL outperforms state-of-the-art structure learners