Slide1
Querying Factorized Probabilistic Triple Databases
Denis Krompaß (1), Maximilian Nickel (2) and Volker Tresp (1, 3)
(1) Department of Computer Science, Ludwig Maximilian University
(2) MIT, Cambridge and Istituto Italiano di Tecnologia
(3) Corporate Technology, Siemens AG
21.10.2014
Slide2
Outline
Introduction and Motivation
Knowledge Bases are Triple Stores
Exploiting Uncertainty in KBs
Constructing Probabilistic Knowledge Bases with the RESCAL Tensor-Factorization
Complex Querying of Factorized Probabilistic Knowledge Base Representations
Slide3
Knowledge Bases are Triple Stores
Represent facts of the world in a machine-readable form: SUBJECT, PREDICATE, OBJECT
+ Machine readable
+ Utilize background knowledge in applications
+ Provide additional information (search)
- Many false facts
- Incomplete
- Only a few attempts to represent triple uncertainty
Slide4
Benefits of Exploiting Uncertainty in Knowledge Bases
“Does Jack know someone who is a friend of Lucy?”

Deterministic Database (p ≥ 0.5)
Subject  Object
Jack     Jane
Jim      Lucy

Subject  Object
Jack     Jane
Jane     Jim
Jim      Lucy

“No, Jack does not know such a person.”

Probabilistic Tuple-Independent Database
Subject  Object  P    RndVar
Jack     Jane    0.5  X1
Jack     Lucy    0.1  X2
Jack     Jim     0.1  X3
Jane     Lucy    0.4  X4
…        …       …    …

Subject  Object  P    RndVar
Jack     Jane    0.7  Y1
Jack     Lucy    0.2  Y2
Jack     Jim     0.4  Y3
Jane     Lucy    0.4  Y4
…        …       …    …

“Jack might know such a person with a probability of 63%.”*
*Considering that a person knows and is a friend of itself

[Figure: adjacency tensor with entries such as 0.1, 0.7, 0.4, 0.5, 0.8 and 0.2; unknown entries are marked “?”.]
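The 63% answer can be reproduced with the independent-project rule for tuple-independent databases: P = 1 - ∏_z (1 - P(knows(Jack, z)) · P(friendOf(z, Lucy))). The sketch below is only an illustration and rests on assumptions that the slides do not state: the Y table is read as `knows` and the X table as `friendOf`, and the entry friendOf(Jim, Lucy) = 0.7 is assumed because that row is not among the visible rows. Reflexive triples get probability 1, following the footnote.

```python
# Assumed reading of the slide tables: Y = knows, X = friendOf.
knows = {("Jack", "Jane"): 0.7, ("Jack", "Lucy"): 0.2, ("Jack", "Jim"): 0.4}
friend_of = {
    ("Jack", "Jane"): 0.5,
    ("Jack", "Lucy"): 0.1,
    ("Jack", "Jim"): 0.1,
    ("Jane", "Lucy"): 0.4,
    ("Jim", "Lucy"): 0.7,  # assumption: this row is not visible on the slide
}
people = ["Jack", "Jane", "Jim", "Lucy"]


def prob(relation, s, o):
    """Triple probability; reflexive triples count as certain, missing ones as 0."""
    if s == o:
        return 1.0
    return relation.get((s, o), 0.0)


def knows_friend_of(subject, target):
    """P(exists z: knows(subject, z) and friendOf(z, target)), independent-project."""
    p_none = 1.0
    for z in people:
        p_none *= 1.0 - prob(knows, subject, z) * prob(friend_of, z, target)
    return 1.0 - p_none


p = knows_friend_of("Jack", "Lucy")
print(f"{p:.2f}")
```

Under these assumptions the result is about 0.627, which rounds to the 63% quoted on the slide. The product over z is valid here because each intermediate person contributes a distinct pair of tuples, so the per-person events are independent.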
Slide5
Challenges when Exploiting Uncertainty in Knowledge Bases
“Does Jack know someone who is a friend of Lucy?”
Can be intractable for larger KBs
  Millions of entities
  Thousands of relation types
  Intractably many possible triples
Unsafe queries can be intractable due to exponential complexity (#P)
Even for safe queries, polynomial complexity can lead to long query processing times.
[Figure: the Deterministic Database and the Probabilistic Tuple-Independent Database from Slide4, connected by the labels “Reintroducing Uncertainty” (RESCAL) and “Complex Querying”.]
“Jack might know such a person with a probability of 63%.”*
*Considering that a person knows and is a friend of itself
Slide6
Probabilistic KB Construction with RESCAL

Subject  Object
Jack     Jane
Jim      Lucy

Subject  Object
Jack     Jane
Jane     Jim
Jim      Lucy

[Figure: the Adjacency Tensor is approximated (≈) by a product of three factors (× ×). Tensor entries shown include 0.1, 0.5, 0.7, 0.8, 0.2, 0.6 and 0.4; unknown entries are marked “?”. Further labels: 1.0, -∞, +∞.]

Subject  Object  P    RndVar
Jack     Jane    0.5  X1
Jack     Lucy    0.1  X2
Jack     Jim     0.1  X3
Jane     Lucy    0.7  X4
…        …       …    …

Subject  Object  P    RndVar
Jack     Jane    0.6  Y1
Jack     Lucy    0.2  Y2
Jack     Jim     0.4  Y3
Jane     Lucy    0.8  Y4
…        …       …    …

Explicit Representation: 1.6 × 10^11 Triples
Factorized Representation: 225 × 10^6 Parameters
Denis Krompaß et al. Large-Scale Factorization of Type-Constrained Multi-Relational Data. DSAA’2014

Query a triple:
“Is Jack a friend of Lucy?”
“Jack might be a friend of Lucy with a probability of 0.1 %”
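A triple query on the factorized representation can be sketched as follows. In the RESCAL model, each relation slice X_k of the adjacency tensor is approximated by A R_k Aᵀ, so a single triple is scored as a_sᵀ R_k a_o. The factor values below are made up for illustration, and the helper names `score`, `explicit_entries` and `factor_parameters` are hypothetical. The raw score is not claimed to be a calibrated probability here.

```python
# Toy latent factors (made-up values): one row of A per entity, rank r = 2.
A = {
    "Jack": [0.9, 0.1],
    "Jane": [0.2, 0.8],
    "Jim": [0.5, 0.5],
    "Lucy": [0.1, 0.9],
}
# One r x r core matrix per relation type (made-up values).
R = {"friendOf": [[0.1, 0.6], [0.3, 0.2]]}


def score(subject, relation, obj):
    """Bilinear RESCAL score a_s^T R_k a_o for a single triple."""
    a_s, a_o, r_k = A[subject], A[obj], R[relation]
    return sum(
        a_s[i] * r_k[i][j] * a_o[j]
        for i in range(len(a_s))
        for j in range(len(a_o))
    )


def explicit_entries(n_entities, n_relations):
    """Number of possible triples in the explicit adjacency tensor."""
    return n_entities * n_entities * n_relations


def factor_parameters(n_entities, n_relations, rank):
    """Parameters in A (n x r) plus all core matrices R_k (r x r each)."""
    return n_entities * rank + n_relations * rank * rank


s = score("Jack", "friendOf", "Lucy")
```

The two counting helpers illustrate why the factorized form is compact: the explicit tensor grows quadratically in the number of entities, while the factors grow only linearly in it for a fixed rank.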
Slide7
Complex Querying: Pure Extensional Query Evaluation on Factorized PKBs
“Does Jack know someone who is a friend of Lucy?”
“Does Jack know a soccer player?”
For each existential quantifier the independent-project rule has to be applied
Nested loops
Complexity already scales cubically in the size of the database if we ask the query for all persons in the database
[Figure: adjacency tensor with entries such as 0.1, 0.5, 0.7, 0.8, 0.2, 0.6 and 0.4.]
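The nested loops can be sketched as follows. Asking “does x know someone who is a friend of y?” for every pair (x, y) applies the independent-project rule over every intermediate z, which gives three nested loops over the n entities. The callback `triple_prob` is a hypothetical stand-in for a probability lookup in the factorized PKB, and the constant 0.5 used in the toy call is made up.

```python
from itertools import product


def query_all_pairs(entities, triple_prob):
    """P(exists z: knows(x, z) and friendOf(z, y)) for every pair (x, y).

    triple_prob(relation, s, o) is a hypothetical lookup returning the
    probability of one triple, e.g. read from a factorized PKB.
    """
    result = {}
    evaluations = 0
    for x, y in product(entities, repeat=2):  # n^2 answer pairs
        p_none = 1.0
        for z in entities:  # independent-project over z
            p_knows = triple_prob("knows", x, z)
            p_friend = triple_prob("friendOf", z, y)
            p_none *= 1.0 - p_knows * p_friend
            evaluations += 1
        result[(x, y)] = 1.0 - p_none
    return result, evaluations


# Toy stand-in: every triple has probability 0.5.
entities = ["Jack", "Jane", "Jim", "Lucy"]
answers, evaluations = query_all_pairs(entities, lambda rel, s, o: 0.5)
```

The `evaluations` counter equals n³, which makes the cubic growth visible; every additional existential quantifier would add another loop.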
Slide8
Querying DBpedia-Music (44345 Entities, 7 Relations)
“What songs or albums from the Pop-Rock genre are from musical artists that have/had a contract with Atlantic Records?”
“Which musical artists from the Hip-Hop Music genre have/had a contract with Shady, Aftermath or Death Row Records?”
<2 seconds
Slide9
Avoiding Independent-Project
“Does Jack know a soccer player?”
knows(), soccer()
knowsSoccerPlayerOf()
A is already known from past factorization of the initial KB
Construct deterministic compound relation X(*)
Approximate latent representation of compound relation X(*)
[Figure labels: Initial Deterministic KB, Factorized KB]
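The two steps above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the relation arguments were lost from the slide, so treating `soccer` as a binary relation (for example, player to team) is an assumption; the KB slices and the factor matrix `A` are random toy data; and fitting the core matrix by least squares via the pseudo-inverse is one plausible way to approximate the latent representation with `A` held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rank = 6, 3

# Toy deterministic KB slices (rows = subjects, columns = objects).
X_knows = (rng.random((n, n)) < 0.4).astype(float)
X_soccer = (rng.random((n, n)) < 0.4).astype(float)

# Stand-in for the entity factors from the earlier factorization (random here).
A = rng.standard_normal((n, rank))

# Step 1: deterministic compound relation via a boolean join:
# X_star[s, o] = 1 iff some z has knows(s, z) and soccer(z, o).
X_star = ((X_knows @ X_soccer) > 0).astype(float)

# Step 2: keep A fixed and fit only the small core matrix R_star,
# minimizing ||X_star - A R_star A^T||_F in the least-squares sense.
A_pinv = np.linalg.pinv(A)
R_star = A_pinv @ X_star @ A_pinv.T


def compound_score(s, o):
    """Score of the compound triple (s, o) without any loop over z."""
    return float(A[s] @ R_star @ A[o])
```

Once `R_star` is available, the compound relation is queried like any other relation in the factorized KB, so the existential quantifier no longer requires the independent-project loop at query time.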
Slide10
Querying DBpedia-Music (44345 Entities, 7 Relations)
“What songs or albums from the Pop-Rock genre are from musical artists that have/had a contract with Atlantic Records?”
“Which musical artists from the Hip-Hop Music genre have/had a contract with Shady, Aftermath or Death Row Records?”
Slide11
Querying DBpedia-Music (44345 Entities, 7 Relations)
“Which musical artists from the Hip-Hop music genre are associated with musical artists that have/had a contract with Interscope Records and are involved in an album whose first letter is a ‘T’?”
Pure Extensional Query Evaluation: Processing not finished after 6 hours!
Exploiting Approximated Compound-Relations: 3.6 seconds, AUC: 0.985
Slide12
Summary
Uncertainty can be reintroduced into deterministic representations of Knowledge Bases with RESCAL
The factorized representation can be exploited for complex querying
Extensional query evaluation can be significantly accelerated by exploiting the RESCAL model (Compound Relations)
Slide13
Questions?
http://www.dbs.ifi.lmu.de/~krompass/
Denis.Krompass@campus.lmu.de