Slide1
Gerhard Weikum
Max Planck Institute for Informatics & Saarland University
http://www.mpi-inf.mpg.de/~weikum/
Semantic Search: from Names and Phrases to Entities and Relations
Slide2
Acknowledgements
Slide3
Big Picture: Opportunities Now!
KB Population
Info Extraction
Semantic Authoring
Entity Linkage
Web of Data
Web of Users & Contents
Very Large Knowledge Bases
Semantic Docs
Disambiguation
Slide4
Big Picture: Opportunities Now!
This talk: How Do We Search this World of Knowledge, Data, and Text (and cope with ambiguity)?
For Knowledge Harvesting, see talks at College de France and at VLDB School in Kunming
Slide5
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2011-09-19_colored.png
Web of Data: RDF, Tables, Microdata
YAGO
Cyc
TextRunner / ReVerb
WikiTaxonomy / WikiNet
SUMO
ConceptNet 5
BabelNet
ReadTheWeb
30 billion SPO triples (RDF), and growing
Slide6
Web of Data: RDF, Tables, Microdata
YAGO
10M entities in 350K classes, 120M facts for 100 relations, 100 languages, 95% accuracy
4M entities in 250 classes, 500M facts for 6000 properties, live updates
25M entities in 2000 topics, 100M facts for 4000 properties, powers Google knowledge graph
Ennio_Morricone type composer
Ennio_Morricone type GrammyAwardWinner
composer subclassOf musician
Ennio_Morricone bornIn Rome
Rome locatedIn Italy
Ennio_Morricone created Ecstasy_of_Gold
Ennio_Morricone wroteMusicFor The_Good,_the_Bad_,and_the_Ugly
Sergio_Leone directed The_Good,_the_Bad_,and_the_Ugly
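The SPO triples above can be queried by matching triple patterns with variables. The following is a minimal sketch, assuming an in-memory list of triples; the helper names `match` and `query` are hypothetical and not part of any system named in the talk.

```python
# Triples copied from the slide; everything else is an illustrative assumption.
triples = [
    ("Ennio_Morricone", "type", "composer"),
    ("Ennio_Morricone", "type", "GrammyAwardWinner"),
    ("composer", "subclassOf", "musician"),
    ("Ennio_Morricone", "bornIn", "Rome"),
    ("Rome", "locatedIn", "Italy"),
    ("Ennio_Morricone", "created", "Ecstasy_of_Gold"),
]

def match(pattern, triple, binding):
    """Unify one triple pattern with one triple; return the extended binding or None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended

def query(patterns, triples):
    """Evaluate a conjunction of triple patterns by nested-loop joins."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            for t in triples:
                extended = match(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

# Composers born in a place that is located in Italy.
answers = query(
    [("?x", "type", "composer"), ("?x", "bornIn", "?y"), ("?y", "locatedIn", "Italy")],
    triples,
)
print(answers)
```

A real RDF store would use indexes instead of nested loops; the sketch only shows how variables are shared across patterns.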
Slide7
Linked RDF Triples on the Web
Nodes:
dbpedia.org/resource/Ennio_Morricone
dbpedia.org/resource/Rome
rdf.freebase.com/ns/en.rome
data.nytimes.com/51688803696189142301
geonames.org/3169070/roma
Coord N 41° 54' 10'' E 12° 29' 2''
imdb.com/name/nm0910607/
imdb.com/title/tt0361748/
yago/wordnet:Actor109765278
yago/wordnet:Artist109812338
yago/wikicategory:ItalianComposer
Edge labels: owl:sameAs, rdf:type, rdfs:subclassOf, dbpprop:citizenOf, prop:actedIn, prop:composedMusicFor
500 million links
Slide8
Embedding (RDF) Microdata in HTML Pages
May 2, 2011
Maestro Morricone will perform on the stage of the Smetana Hall to conduct the Czech National Symphony Orchestra and Choir. The concert will feature both Classical compositions and soundtracks such as the Ecstasy of Gold. In programme two concerts for July 14th and 15th.
<html … May 2, 2011
<div typeof="event:music">
<span id="Maestro_Morricone">Maestro Morricone
<a rel="sameAs" resource="dbpedia/Ennio_Morricone"/></span>
… <span property="event:location">Smetana Hall</span> …
<span property="rdf:type" resource="yago:performance">The concert</span> will feature …
<span property="event:date" content="14-07-2011"></span>July 1
</div>
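To illustrate how such annotations can be read back by a program, here is a minimal sketch using Python's standard `html.parser`. It assumes flat (non-nested) `<span property=...>` elements; the class name `PropertyExtractor` and the sample markup are hypothetical, and this is not a conforming RDFa processor.

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collect (property, value) pairs from <span property="..."> elements."""

    def __init__(self):
        super().__init__()
        self.found = []    # extracted (property, value) pairs
        self._open = []    # per open <span>: property awaiting its text, or None
        self._text = []    # text fragments of the current annotated span

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        attributes = dict(attrs)
        prop = attributes.get("property")
        if prop and "content" in attributes:
            # A machine-readable value overrides the visible text.
            self.found.append((prop, attributes["content"]))
            prop = None
        if prop:
            self._text = []
        self._open.append(prop)

    def handle_data(self, data):
        if self._open and self._open[-1]:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self._open:
            return
        prop = self._open.pop()
        if prop:
            self.found.append((prop, "".join(self._text).strip()))

# Hypothetical, well-formed sample in the spirit of the slide.
sample = (
    '<div typeof="event:music">'
    '<span property="event:location">Smetana Hall</span> '
    '<span property="event:date" content="14-07-2011">July 14th</span>'
    '</div>'
)
extractor = PropertyExtractor()
extractor.feed(sample)
print(extractor.found)
```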
Supported by RDFa and microformats like schema.org
Slide9
Outline
Opportunities Now
Semantic Search Today
Entity Name Disambiguation
Question Answering
Disambiguation Reloaded
Wrap-Up
Slide10
Semantic Search Today (1)
Slide11
Semantic Search Today (1)
Slide12
Semantic Search Today (1)
Slide13
Semantic Search Today (1)
Slide14
Semantic Search Today (1)
Slide15
Semantic Search Today (2)
Select ?x Where {
  ?x type composer [western movie] .
  ?x wasBornIn ?y .
  ?y locatedIn Europe . }
Slide16
Semantic Search Today (2)
Select ?x Where {
  ?x type composer .
  ?x participatedIn ?y .
  ?y type western_film . }
Slide17
Semantic Search Today (3)
Slide18
Semantic Search Today (3)
Slide19
Semantic Search Today (3)
Slide20
Semantic Search Today (4)
Slide21
Semantic Search Today (4)
Key problem in semantic search: diversity and ambiguity of names and phrases!
Slide22
Outline
Opportunities Now
Semantic Search Today
Entity Name Disambiguation
Question Answering
Disambiguation Reloaded
Wrap-Up
Slide23
Three Different NLP Problems
Harry fought with you know who. He defeats the dark lord.
Three NLP tasks:
1) named-entity detection: segment & label by HMM or CRF (e.g. Stanford NER tagger)
2) co-reference resolution: link to preceding NP (trained classifier over linguistic features)
3) named-entity disambiguation: map each mention (name) to canonical entity (entry in KB)
Candidate entities: Harry Potter, Dirty Harry, Prince Harry of England, Lord Voldemort, The Who (band)
Slide24
Named Entity Disambiguation
Sergio talked to Ennio about Eli's role in the Ecstasy scene. This sequence on the graveyard was a highlight in Sergio's trilogy of western films.
Mentions (surface names)
Entities (meanings)
KB
Sergio means Sergio_Leone
Sergio means Serge_Gainsbourg
Ennio means Ennio_Antonelli
Ennio means Ennio_Morricone
Eli means Eli_(bible)
Eli means ExtremeLightInfrastructure
Eli means Eli_Wallach
Ecstasy means Ecstasy_(drug)
Ecstasy means Ecstasy_of_Gold
trilogy means Star_Wars_Trilogy
trilogy means Lord_of_the_Rings
trilogy means Dollars_Trilogy
… … …
Entity nodes: Eli (bible), Eli Wallach, Dollars Trilogy, Lord of the Rings, Star Wars Trilogy, Benny Andersson, Benny Goodman, Ecstasy of Gold, Ecstasy (drug)
Slide25
Mention-Entity Graph
Sergio talked to Ennio about Eli's role in the Ecstasy scene. This sequence on the graveyard was a highlight in Sergio's trilogy of western films.
Entity nodes: Dollars Trilogy, Lord of the Rings, Star Wars, Ecstasy of Gold, Ecstasy (drug), Eli (bible), Eli Wallach
KB+Stats: weighted undirected graph with two types of nodes
Popularity(m,e): freq(e|m), length(e), #links(e)
Similarity(m,e): cos/Dice/KL (context(m), context(e))
bag-of-words or language model: words, bigrams, phrases
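As one concrete instance of Similarity(m,e), the cosine between bag-of-words vectors of context(m) and context(e) can be sketched as follows. The entity context strings are made up for illustration, and whitespace tokenization is a simplifying assumption.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lower-cased term-frequency vector from whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# context(m): words around the mention "Ecstasy" in the example text.
mention_context = bag_of_words("role in the ecstasy scene trilogy of western films")

# context(e): hypothetical descriptions of the two candidate entities.
entity_contexts = {
    "Ecstasy_of_Gold": bag_of_words("soundtrack music for western films graveyard scene"),
    "Ecstasy_(drug)": bag_of_words("psychoactive drug party pills"),
}

similarity = {e: cosine(mention_context, ctx) for e, ctx in entity_contexts.items()}
print(similarity)
```

Dice or KL divergence over language models would replace `cosine` without changing the surrounding structure.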
Slide26
Mention-Entity Graph
KB+Stats: weighted undirected graph with two types of nodes
Popularity(m,e): freq(e|m), length(e), #links(e)
Similarity(m,e): cos/Dice/KL (context(m), context(e))
joint mapping
Slide27
Mention-Entity Graph
KB+Stats: weighted undirected graph with two types of nodes
Popularity(m,e): freq(m,e|m), length(e), #links(e)
Similarity(m,e): cos/Dice/KL (context(m), context(e))
Coherence(e,e'): dist(types), overlap(links), overlap(anchor words)
Slide28
Mention-Entity Graph
KB+Stats: weighted undirected graph with two types of nodes
Popularity(m,e): freq(m,e|m), length(e), #links(e)
Similarity(m,e): cos/Dice/KL (context(m), context(e))
Coherence(e,e'): dist(types), overlap(links), overlap(anchor words)
Types of candidate entities:
American Jews, film actors, artists, Academy Award winners
Metallica songs, Ennio Morricone songs, artifacts, soundtrack music
spaghetti westerns, film trilogies, movies, artifacts
Slide29
Mention-Entity Graph
KB+Stats: weighted undirected graph with two types of nodes
Popularity(m,e): freq(m,e|m), length(e), #links(e)
Similarity(m,e): cos/Dice/KL (context(m), context(e))
Coherence(e,e'): dist(types), overlap(links), overlap(anchor words)
Links of candidate entities:
http://.../wiki/Dollars_Trilogy
http://.../wiki/The_Good,_the_Bad,_the_Ugly
http://.../wiki/Clint_Eastwood
http://.../wiki/Honorary_Academy_Award
http://.../wiki/The_Good,_the_Bad,_the_Ugly
http://.../wiki/Metallica
http://.../wiki/Bellagio_(casino)
http://.../wiki/Ennio_Morricone
http://.../wiki/Sergio_Leone
http://.../wiki/The_Good,_the_Bad,_the_Ugly
http://.../wiki/For_a_Few_Dollars_More
http://.../wiki/Ennio_Morricone
Slide30
Mention-Entity Graph
KB+Stats: weighted undirected graph with two types of nodes
Popularity(m,e): freq(m,e|m), length(e), #links(e)
Similarity(m,e): cos/Dice/KL (context(m), context(e))
Coherence(e,e'): dist(types), overlap(links), overlap(anchor words)
Anchor words of candidate entities:
Metallica on Morricone tribute
Bellagio water fountain show
Yo-Yo Ma
Ennio Morricone composition
The Magnificent Seven
The Good, the Bad, and the Ugly
Clint Eastwood
University of Texas at Austin
For a Few Dollars More
The Good, the Bad, and the Ugly
Man with No Name trilogy
soundtrack by Ennio Morricone
Slide31
Joint Mapping
Build mention-entity graph or joint-inference factor graph from knowledge and statistics in KB
Compute high-likelihood mapping (ML or MAP) or dense subgraph such that: each m is connected to exactly one e (or at most one e)
(Figure: mention-entity graph with numeric edge weights)
Slide32
Coherence Graph Algorithm
[J. Hoffart et al.: EMNLP'11]
Compute dense subgraph to maximize min weighted degree among entity nodes such that: each m is connected to exactly one e (or at most one e)
Greedy approximation: iteratively remove weakest entity and its edges
Keep alternative solutions, then use local/randomized search
(Figure: mention-entity graph with numeric edge weights and weighted degrees)
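The greedy approximation can be sketched as follows. This is a simplified illustration, not the published implementation: it repeatedly removes the entity with the smallest weighted degree unless that entity is the last candidate of some mention, and it omits the "keep alternative solutions" and local-search steps. All edge weights below are made up.

```python
from collections import defaultdict

def weighted_degree(entity, me_edges, ee_edges, alive):
    """Sum of weights of edges from `entity` to mentions and to surviving entities."""
    total = sum(w for (_, e), w in me_edges.items() if e == entity)
    for (a, b), w in ee_edges.items():
        if (a == entity and b in alive) or (b == entity and a in alive):
            total += w
    return total

def greedy_disambiguate(me_edges, ee_edges):
    """Return a mention -> entity mapping by iteratively removing the weakest entity."""
    candidates = defaultdict(set)
    for mention, entity in me_edges:
        candidates[mention].add(entity)
    alive = {entity for (_, entity) in me_edges}
    while True:
        # An entity that is the only remaining candidate of a mention must stay.
        protected = {next(iter(c)) for c in candidates.values() if len(c) == 1}
        removable = sorted(alive - protected)
        if not removable:
            break
        weakest = min(removable, key=lambda e: weighted_degree(e, me_edges, ee_edges, alive))
        alive.discard(weakest)
        for c in candidates.values():
            c.discard(weakest)
    # If several protected candidates remain for a mention, take the strongest edge.
    return {m: max(sorted(c), key=lambda e: me_edges[(m, e)]) for m, c in candidates.items()}

# Hypothetical mention-entity weights (popularity/similarity).
me_edges = {
    ("Eli", "Eli_(bible)"): 50, ("Eli", "Eli_Wallach"): 30,
    ("Ecstasy", "Ecstasy_(drug)"): 60, ("Ecstasy", "Ecstasy_of_Gold"): 20,
    ("trilogy", "Star_Wars_Trilogy"): 50, ("trilogy", "Dollars_Trilogy"): 30,
}
# Hypothetical entity-entity coherence weights.
ee_edges = {
    ("Eli_Wallach", "Ecstasy_of_Gold"): 90,
    ("Eli_Wallach", "Dollars_Trilogy"): 80,
    ("Ecstasy_of_Gold", "Dollars_Trilogy"): 90,
    ("Ecstasy_(drug)", "Star_Wars_Trilogy"): 5,
}
mapping = greedy_disambiguate(me_edges, ee_edges)
print(mapping)
```

In this toy graph the popularity weights alone would favor Eli (bible), Ecstasy (drug), and Star Wars; the coherence edges pull the result toward the mutually related film entities.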
Slide33
Mention-Entity Popularity Weights
Collect hyperlink anchor-text / link-target pairs from
Wikipedia redirects
Wikipedia links between articles
Interwiki links between Wikipedia editions
Web links pointing to Wikipedia articles
…
Build statistics to estimate P[entity | name]
Need dictionary with entities' names:
full names: Arnold Alois Schwarzenegger, Los Angeles, Microsoft Corp.
short names: Arnold, Arnie, Mr. Schwarzenegger, New York, Microsoft, …
nicknames & aliases: Terminator, City of Angels, Evil Empire, …
acronyms: LA, UCLA, MS, MSFT
role names: the Austrian action hero, Californian governor, CEO of MS, …
… plus gender info (useful for resolving pronouns in context): Bill and Melinda met at MS. They fell in love and he kissed her.
[Milne/Witten 2008, Spitkovsky/Chang 2012]
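Estimating P[entity | name] from anchor-text / link-target pairs amounts to relative-frequency counting. A minimal sketch, assuming the pairs have already been harvested; the pairs listed here are invented for illustration, and `build_prior` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

def build_prior(anchor_pairs):
    """Estimate P[entity | name] as the relative frequency of link targets per anchor text."""
    counts = defaultdict(Counter)
    for name, entity in anchor_pairs:
        counts[name][entity] += 1
    prior = {}
    for name, targets in counts.items():
        total = sum(targets.values())
        prior[name] = {entity: n / total for entity, n in targets.items()}
    return prior

# Made-up (anchor text, link target) pairs.
anchor_pairs = [
    ("Arnold", "Arnold_Schwarzenegger"),
    ("Arnold", "Arnold_Schwarzenegger"),
    ("Arnold", "Arnold_Schwarzenegger"),
    ("Arnold", "Arnold_Palmer"),
    ("MS", "Microsoft"),
]
prior = build_prior(anchor_pairs)
print(prior["Arnold"])
```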
Slide34
Mention-Entity Similarity Edges
Precompute characteristic keyphrases q for each entity e: anchor texts or noun phrases in e page with high PMI, e.g. "Metallica tribute to Ennio Morricone"
Match keyphrase q of candidate e in context of mention m, e.g. "The Ecstasy piece was covered by Metallica on the Morricone tribute album."
Score factors: extent of partial matches, weight of matched words
Compute overall similarity of context(m) and candidate e
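A partial keyphrase match can be scored, for example, as the weighted share of keyphrase words that occur in the mention context. This is a deliberately simplified stand-in and not the scoring formula of the slide, which did not survive extraction; `keyphrase_match_score` and the unit default weight are assumptions.

```python
def keyphrase_match_score(keyphrase, context, word_weight=None):
    """Weighted fraction of keyphrase words that also occur in the context."""
    word_weight = word_weight or {}
    phrase_words = keyphrase.lower().split()
    context_words = set(context.lower().replace(".", " ").replace(",", " ").split())
    total = sum(word_weight.get(w, 1.0) for w in phrase_words)
    matched = sum(word_weight.get(w, 1.0) for w in phrase_words if w in context_words)
    return matched / total if total else 0.0

keyphrase = "Metallica tribute to Ennio Morricone"
context = "The Ecstasy piece was covered by Metallica on the Morricone tribute album."

# With unit weights, 3 of the 5 keyphrase words (metallica, tribute, morricone) match.
score = keyphrase_match_score(keyphrase, context)
print(score)
```

Passing per-word weights (for instance derived from PMI or IDF) lets rare, characteristic words count more than function words such as "to".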
Slide35
Entity-Entity Coherence Edges
Precompute overlap of incoming links for entities e1 and e2
Alternatively compute overlap of anchor texts for e1 and e2, or overlap of keyphrases, or similarity of bag-of-words, or …
Optionally combine with type distance of e1 and e2 (e.g., Jaccard index for type instances)
For special types of e1 and e2 (locations, people, etc.) use spatial or temporal distance
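One simple way to turn link overlap into a coherence weight is the Jaccard index of the two entities' incoming-link sets. The sketch below uses Jaccard as an assumed overlap measure (the slide does not fix one here), and the in-link sets are invented.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two collections, 0.0 if both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical sets of pages linking to each entity.
inlinks = {
    "Ecstasy_of_Gold": {"Ennio_Morricone", "Metallica", "The_Good,_the_Bad,_the_Ugly"},
    "Dollars_Trilogy": {"Ennio_Morricone", "Sergio_Leone", "The_Good,_the_Bad,_the_Ugly"},
    "Ecstasy_(drug)": {"MDMA", "Rave"},
}

coh_film = jaccard(inlinks["Ecstasy_of_Gold"], inlinks["Dollars_Trilogy"])
coh_drug = jaccard(inlinks["Ecstasy_(drug)"], inlinks["Dollars_Trilogy"])
print(coh_film, coh_drug)
```

The same function applies unchanged to sets of anchor words, keyphrases, or type instances.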
Slide36
AIDA: Accurate Online Disambiguation
http://www.mpi-inf.mpg.de/yago-naga/aida/
Slide37
AIDA: Accurate Online Disambiguation
http://www.mpi-inf.mpg.de/yago-naga/aida/
Slide38
AIDA: Very Difficult Example
http://www.mpi-inf.mpg.de/yago-naga/aida/
Slide39
AIDA: Very Difficult Example
http://www.mpi-inf.mpg.de/yago-naga/aida/
Slide40
AIDA: Accurate Online Disambiguation
http://www.mpi-inf.mpg.de/yago-naga/aida/
Slide41
AIDA: Accurate Online Disambiguation
http://www.mpi-inf.mpg.de/yago-naga/aida/
Slide42
Some NED Online Tools
J. Hoffart et al.: EMNLP 2011, VLDB 2011
https://d5gate.ag5.mpi-sb.mpg.de/webaida/
P. Ferragina, U. Scaiella: CIKM 2010
http://tagme.di.unipi.it/
R. Isele, C. Bizer: VLDB 2012
http://spotlight.dbpedia.org/demo/index.html
Reuters Open Calais
http://viewer.opencalais.com/
S. Kulkarni, A. Singh, G. Ramakrishnan, S. Chakrabarti: KDD 2009
http://www.cse.iitb.ac.in/soumen/doc/CSAW/
D. Milne, I. Witten: CIKM 2008
http://wikipedia-miner.cms.waikato.ac.nz/demos/annotate/
perhaps more
some use Stanford NER tagger for detecting mentions
http://nlp.stanford.edu/software/CRF-NER.shtml
Slide43
NED: Experimental Evaluation
Benchmark: Extended CoNLL 2003 dataset: 1400 newswire articles, originally annotated with mention markup (NER), now with NED mappings to Yago and Freebase
difficult texts:
… Australia beats India … → Australian_Cricket_Team
… White House talks to Kremlin … → President_of_the_USA
… EDS made a contract with … → HP_Enterprise_Services
Results:
Best: AIDA method with prior+sim+coh + robustness test
82% precision @100% recall, 87% mean average precision
Comparison to other methods, see paper
J. Hoffart et al.: Robust Disambiguation of Named Entities in Text, EMNLP 2011
http://www.mpi-inf.mpg.de/yago-naga/aida/
Slide44
Ongoing Research & Remaining Challenges
More efficient graph algorithms (multicore, etc.)
Short and difficult texts:
tweets, headlines, etc.
fictional texts: novels, song lyrics, etc.
incoherent texts
Disambiguation beyond entity names:
coreferences: pronouns, paraphrases, etc.
common nouns, verbal phrases (general WSD)
Leverage deep-parsing structures, leverage semantic types
Example: Page played Kashmir on his Gibson (subj, obj, mod)
Allow mentions of unknown entities, mapped to null
Structured Web data: tables and lists
Slide45
Variants of NED at Web Scale
How to run this on big batch of 1 million input texts?
partition inputs across distributed machines, organize dictionary appropriately, …
exploit cross-document contexts
How to handle Web-scale inputs (100 million pages) restricted to a set of interesting entities? (e.g. tracking politicians and companies)
Tools can map short text onto entities in a few seconds
Slide46
Outline
Opportunities Now
Semantic Search Today
Entity Name Disambiguation
Question Answering
Disambiguation Reloaded
Wrap-Up
Slide47
Deep Question Answering
99 cents got me a 4-pack of Ytterlig coasters from this Swedish chain
This town is known as "Sin City" & its downtown is "Glitter Gulch"
William Wilkinson's "An Account of the Principalities of Wallachia and Moldavia" inspired this author's most famous novel
As of 2010, this is the only former Yugoslav republic in the EU
question classification & decomposition
knowledge back-ends
YAGO
D. Ferrucci et al.: Building Watson. AI Magazine, Fall 2010.
IBM Journal of R&D 56(3/4), 2012: This is Watson.
Slide48
Semantic Keyword Search
Need to map (groups of) keywords onto entities & relationships based on name-entity similarities/probabilities
[Ilyas et al. Sigmod'10]
q: composer Rome scores westerns
Candidate meanings:
Media Composer video editor
composer (creator of music)
Rome (Italy)
Rome (NY)
Lazio Roma
AS Roma
goal in football
film music
western movies
western world
Western (airline)
Western (NY)
Western Digital
Candidate relationships:
… born in …
… plays for …
… used in …
… recorded at …
Slide49
Natural Language Questions are Natural
Who composed scores for westerns and is from Rome?
translate question into Sparql query:
dependency parsing to decompose question
mapping of question units onto entities, classes, relations
map results into tabular or visual presentation or speech
Slide50
From Questions to Queries
NL question: Who composed scores for westerns and is from Rome?
Dependency parsing exposes structure of question
"triploids" (sub-cues):
Who composed scores
scores for westerns
is from Rome
Slide51
From Triploids to Triples
Who composed scores for westerns and is from Rome?
Triploids:
Who is from Rome
Who composed scores
scores for westerns
Triples:
?x composed scores
?x bornIn Rome
scores contributesTo ?y
?y type westernMovie
?x type composer
?x composed ?s
?s contributesTo ?y
?s type music
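Once the triploids are mapped to triple patterns, assembling the query text is mechanical. A minimal sketch, assuming all non-variable terms live in one namespace; the prefix `http://example.org/` is a placeholder and `to_sparql` is a hypothetical helper, not part of the system described in the talk.

```python
def to_sparql(select_var, patterns, namespace="http://example.org/"):
    """Render triple patterns as a SPARQL SELECT query string."""
    def term(t):
        # Variables stay as they are; constants get the default prefix.
        return t if t.startswith("?") else ":" + t

    lines = ["  " + " ".join(term(t) for t in pattern) + " ." for pattern in patterns]
    return (
        f"PREFIX : <{namespace}>\n"
        f"SELECT {select_var} WHERE {{\n" + "\n".join(lines) + "\n}"
    )

# Triple patterns from the slide, after replacing "scores" by the variable ?s.
patterns = [
    ("?x", "type", "composer"),
    ("?x", "composed", "?s"),
    ("?s", "type", "music"),
    ("?s", "contributesTo", "?y"),
    ("?y", "type", "westernMovie"),
    ("?x", "bornIn", "Rome"),
]
sparql = to_sparql("?x", patterns)
print(sparql)
```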
Slide52
Pattern Dictionary for Relations
[N. Nakashole et al.: EMNLP 2012]
Problem: cope with language diversity & ambiguity
Example: composed …, wrote …, created …, …
WordNet-style dictionary/taxonomy for relational phrases based on SOL patterns (syntactic-lexical-ontological)
Relational phrases can be synonymous:
"graduated from" / "obtained degree in * from"
"and $PRP ADJ advisor" / "under the supervision of"
One relational phrase can subsume another:
"wife of" / "spouse of"
Relational phrases are typed:
<person> graduated from <university>
<singer> released <album>
<singer> covered <song>
<book> covered <event>
Slide53
PATTY: Pattern Taxonomy for Relations
[N. Nakashole et al.: EMNLP 2012, demo at VLDB 2012]
350 000 SOL patterns with 4 million instances
Derived from large data (Wikipedia, NYT, ClueWeb) by scalable sequence mining
accessible at: www.mpi-inf.mpg.de/yago-naga/patty
Slide54
Disambiguation Mapping for Triploids
Who composed scores for westerns and is from Rome?
Triploids: q1, q2, q3, q4
Phrases: Who, composed, composed scores, scores for, westerns, is from, Rome
Semantic items:
e: Rome (Italy)
e: Lazio Roma
c: person
c: musician
e: WHO
r: created
r: wroteComposition
r: wroteSoftware
c: soundtrack
r: soundtrackFor
r: shootsGoalFor
r: bornIn
r: actedIn
c: western movie
e: Western Digital
weighted edges (coherence, similarity, etc.)
Combinatorial Optimization by ILP (with type constraints etc.)
Slide55
Relaxing Overconstrained Queries
Select ?p Where {
  ?p composed ?s . ?s type music .
  ?s for ?m . ?m type movie .
  ?p bornIn Rome . }
Select ?p Where {
  ?p composed ?s . ?s type music .
  ?s for ?m . ?m type movie [western] .
  ?p bornIn Rome . }
Select ?p Where {
  ?p ?rel1 ?s [composed] . ?s type music .
  ?s ?rel2 ?m . ?m type movie [western] .
  ?p bornIn Rome . }
with extended SPARQL-FullText: SPOX quad patterns
(S. Elbassuoni et al.: CIKM'10, ESWC'11, SIGIR'12)
Slide56
Preliminary Results
(M. Yahya et al.: WWW'12, EMNLP'12)
http://www.mpi-inf.mpg.de/yago-naga/deanna/
Slide57
Outline
Opportunities Now
Semantic Search Today
Entity Name Disambiguation
Question Answering
Disambiguation Reloaded
Wrap-Up
Slide58
Disambiguation Mapping
[M. Yahya et al.: EMNLP'12]
Who composed scores for westerns and is from Rome?
Triploids: q1, q2, q3, q4
Phrases: Who, composed, composed scores, scores for, westerns, is from, Rome
Semantic items:
e:Rome (Italy)
e:Lazio Roma
c:person
c:musician
e:WHO
r:created
r:wroteComposition
r:wroteSoftware
c:soundtrack
r:soundtrackFor
r:shootsGoalFor
r:bornIn
r:actedIn
c:western movie
e:Western Digital
weighted edges (coherence, similarity, etc.)
Selection: $X_i$
Assignment: $Y_{ij}$
Joint Mapping: $Z_{kl}$
Slide59
Disambig. Mapping: Objective Function
Who composed scores for westerns and is from Rome?
Selection: $X_i$
Assignment: $Y_{ij}$
Joint Mapping: $Z_{kl}$
Edge weights: $w_{ij}$, $v_{kl}$
maximize
$$\sum_{i,j} w_{ij} Y_{ij} + \sum_{k,l} v_{kl} Z_{kl} + \dots$$
subject to:
$$Y_{ij} \le X_i \quad \text{for all } i,j$$
$$\sum_{j} Y_{ij} \le 1 \quad \text{for all } i$$
$$Z_{kl} \le \sum_{i,j} Y_{ik} \quad \text{and} \quad Z_{kl} \le \sum_{j} Y_{il} \quad \text{for all } k,l$$
$$X_i, Y_{ij}, Z_{kl} \in \{0,1\}$$
Slide60
Disambig. Mapping: Constraints
Who composed scores for westerns and is from Rome?
Selection: $X_i$, $Q_{hi}$
Assignment: $Y_{ij}$
Joint Mapping: $Z_{kl}$
Edge weights: $w_{ij}$, $v_{kl}$
maximize
$$\sum_{i,j} w_{ij} Y_{ij} + \sum_{k,l} v_{kl} Z_{kl} + \dots$$
subject to:
$$Q_{hi} = 1 \;\Rightarrow\; \sum_{g} Q_{hg} = 3 \quad \text{for all } h,i$$
$$X_i + X_g \le 1 \quad \text{for all mutually exclusive } i,g$$
$$Q_{hi} = 1 \;\Rightarrow\; \sum_{g,j} Q_{hg} Y_{gj} = 1 \quad \text{for relation nodes } j$$
Slide61
Disambig. Mapping: Type Constraints
Who composed scores for westerns and is from Rome?
Selection: $X_i$, $Q_{hi}$
Assignment: $Y_{ij}$
Joint Mapping: $Z_{kl}$
Edge weights: $w_{ij}$, $v_{kl}$
maximize
$$\sum_{i,j} w_{ij} Y_{ij} + \sum_{k,l} v_{kl} Z_{kl} + \dots$$
subject to:
if $Y_{ij} = 1$ and $j$ is relation node and $Z_{kj} = 1$ and $Z_{jl} = 1$, then domain($j$) must be compatible with types($k$) and range($j$) must be compatible with types($l$)
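The structure of this optimization can be illustrated on a toy instance by exhaustive search instead of an ILP solver. The sketch below is an assumption-laden simplification: each phrase picks at most one semantic item, a joint-mapping edge contributes its weight only when both endpoints are picked, and the selection variables and type constraints are left out. All weights are made up.

```python
from itertools import product

def solve_mapping(candidates, w, v):
    """Brute-force the assignment maximizing assignment weights plus coherence weights."""
    phrases = list(candidates)
    best_score, best_choice = float("-inf"), None
    # Each phrase chooses one candidate or None, i.e. at most one Y_ij per phrase i.
    for choice in product(*[[None] + candidates[p] for p in phrases]):
        chosen = {item for item in choice if item is not None}
        score = sum(w[(p, item)] for p, item in zip(phrases, choice) if item is not None)
        # Z_kl may be 1 only if both semantic items k and l are chosen.
        score += sum(weight for (k, l), weight in v.items() if k in chosen and l in chosen)
        if score > best_score:
            best_score, best_choice = score, dict(zip(phrases, choice))
    return best_choice, best_score

# Hypothetical candidates and weights for three phrases of the example question.
candidates = {
    "Rome": ["e:Rome (Italy)", "e:Lazio Roma"],
    "is from": ["r:bornIn", "r:shootsGoalFor"],
    "westerns": ["c:western movie", "e:Western Digital"],
}
w = {
    ("Rome", "e:Rome (Italy)"): 0.6, ("Rome", "e:Lazio Roma"): 0.4,
    ("is from", "r:bornIn"): 0.5, ("is from", "r:shootsGoalFor"): 0.5,
    ("westerns", "c:western movie"): 0.5, ("westerns", "e:Western Digital"): 0.5,
}
v = {
    ("r:bornIn", "e:Rome (Italy)"): 0.9,
    ("r:shootsGoalFor", "e:Lazio Roma"): 0.8,
    ("c:western movie", "r:bornIn"): 0.3,
}
best, score = solve_mapping(candidates, w, v)
print(best, score)
```

Exhaustive search is exponential in the number of phrases, which is why the full problem is handed to an ILP optimizer.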
ILP optimizers like Gurobi solve this in 1 or 2 seconds
Slide62
Outline
Opportunities Now
Semantic Search Today
Entity Name Disambiguation
Question Answering
Disambiguation Reloaded
Wrap-Up
Slide63
Summary
Web of Data & Knowledge & Text (RDF + Phrases) Calls for Semantic Search by Entities, Classes & Relations
Diversity & Ambiguity of Names and Phrases Calls for Disambiguation Mapping
Strong Story for Entity Name Disambiguation
Ongoing Work on Relation Phrase Disambiguation
Cornerstone of Question Answering with Natural Language or Advanced Keywords
Great opportunity towards next-generation search
Challenging problems: robustness, scale, dynamics & transfer
Slide64
Take-Home Message
Solve "Who composed the Ecstasy and other pieces for westerns?"
can solve semantic search with natural-language disambiguation