Outline Introduction Harvesting - PowerPoint Presentation

Uploaded On 2019-06-27

Presentation Transcript

Slide1

Outline

Introduction
Harvesting Classes
Harvesting Facts
Common Sense Knowledge
Knowledge Consolidation
Web Content Analytics
Wrap-Up

Goal
Extraction from text
Consistency reasoning
Extraction from Tables
Open IE

Slide2

Source-centric IE vs. Yield-centric IE

Source-centric IE (one source)

Document 1: Surajit obtained his PhD in CS from Stanford ...

instanceOf (Surajit, scientist)
inField (Surajit, c.science)
almaMater (Surajit, Stanford U) …

1) recall!
2) precision

Yield-centric IE + (optional) targeted relations (many sources)

worksAt
Student              University
Surajit Chaudhuri    Stanford U
Jim Gray             UC Berkeley
…                    …

hasAdvisor
Student              Advisor
Surajit Chaudhuri    Jeffrey Ullman
Jim Gray             Mike Harrison
…                    …

1) precision!
2) recall

Slide3

We focus on yield-centric IE

Yield-centric IE + (optional) targeted relations (many sources)

worksAt
Student              University
Surajit Chaudhuri    Stanford U
Jim Gray             UC Berkeley
…                    …

hasAdvisor
Student              Advisor
Surajit Chaudhuri    Jeffrey Ullman
Jim Gray             Mike Harrison
…                    …

1) precision!
2) recall

Slide4

Goal: Find facts of given binary relations

Given binary relations with type signature
hasAdvisor: Person × Person
graduatedAt: Person × University
bornOn: Person × Date

...find instances of these relations
hasAdvisor (JimGray, MikeHarrison)
hasAdvisor (Susan Davidson, Hector Garcia-Molina)
graduatedAt (JimGray, Berkeley)
graduatedAt (HectorGarcia-Molina, Stanford)
bornOn (JohnLennon, 9-Oct-1940)
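A type signature can act as a cheap filter on fact candidates. The following is a minimal sketch, assuming a lookup table from entity names to types; the `ENTITY_TYPES` table and the `type_checks` helper are hypothetical and only illustrate the idea.

```python
# Hypothetical entity typing, for illustration only.
ENTITY_TYPES = {
    "JimGray": "Person",
    "MikeHarrison": "Person",
    "Berkeley": "University",
    "JohnLennon": "Person",
    "9-Oct-1940": "Date",
}

# Type signatures of the relations named on the slide.
SIGNATURES = {
    "hasAdvisor": ("Person", "Person"),
    "graduatedAt": ("Person", "University"),
    "bornOn": ("Person", "Date"),
}


def type_checks(relation, subject, obj):
    """Return True if (subject, obj) matches the relation's type signature."""
    expected = SIGNATURES.get(relation)
    if expected is None:
        return False
    return (ENTITY_TYPES.get(subject), ENTITY_TYPES.get(obj)) == expected


print(type_checks("hasAdvisor", "JimGray", "MikeHarrison"))
print(type_checks("graduatedAt", "JimGray", "MikeHarrison"))
```

The second call fails the check because MikeHarrison is typed as a Person, not a University.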

Slide5

Facts yield patterns – and vice versa
[Brin@WebDB1998 "DIPRE"; Agichtein@SIGMOD2001 "Snowball"]

Facts
(JimGray, MikeHarrison)
(BarbaraLiskov, JohnMcCarthy)

Patterns
X and his advisor Y
X under the guidance of Y
X and Y in their paper
X co-authored with Y
X rarely met his advisor Y

Facts & Fact Candidates
(Surajit, Jeff)
(Sunita, Mike)
(Alon, Jeff)
(Renee, Yannis)
(Surajit, Microsoft)
(Sunita, Soumen)
(Surajit, Moshe)
(Alon, Larry)
(Soumen, Sunita)

good for recall
noisy, drifting
not robust enough for high precision

Slide6

Facts yield patterns – and vice versa

Extensions:
1. use statistics to estimate the trustworthiness of patterns
2. use counter examples to "punish" bad patterns [Ravichandran 2002; Suchanek 2006; ...]
3. use deep parsing to generalize patterns [Bunescu 2005, Suchanek 2006, …]
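The pattern-fact duality and the first two extensions can be sketched in a few lines. This is an illustration only, not the DIPRE or Snowball algorithm: sentences are assumed to be already reduced to (X, connecting text, Y) triples, hasAdvisor is assumed to be functional (one advisor per student) so that a different Y for a known X counts as a counter example, and the sentence data and the 0.8 threshold are made up.

```python
# Hypothetical pre-processed sentences: (X, text between X and Y, Y).
SENTENCES = [
    ("JimGray", "and his advisor", "MikeHarrison"),
    ("BarbaraLiskov", "co-authored with", "JohnMcCarthy"),
    ("JimGray", "co-authored with", "Surajit"),
    ("Surajit", "and his advisor", "Jeff"),
    ("Sunita", "co-authored with", "Soumen"),
]
SEEDS = {("JimGray", "MikeHarrison"), ("BarbaraLiskov", "JohnMcCarthy")}


def find_patterns(sentences, facts):
    """Facts yield patterns: keep every text that connects a known fact."""
    return {text for x, text, y in sentences if (x, y) in facts}


def find_candidates(sentences, patterns):
    """Patterns yield fact candidates: every pair connected by a pattern."""
    return {(x, y) for x, text, y in sentences if text in patterns}


def confidence(pattern, sentences, known_advisor):
    """Share of checkable matches that agree with known facts.

    A match (x, y) with a known x but a different y is a counter example.
    """
    matches = [(x, y) for x, text, y in sentences if text == pattern]
    positive = sum(1 for x, y in matches if known_advisor.get(x) == y)
    negative = sum(1 for x, y in matches
                   if x in known_advisor and known_advisor[x] != y)
    total = positive + negative
    return positive / total if total else 0.0


known = dict(SEEDS)
patterns = find_patterns(SENTENCES, SEEDS)
trusted = {p for p in patterns if confidence(p, SENTENCES, known) >= 0.8}
new_facts = find_candidates(SENTENCES, trusted) - SEEDS
print(trusted, new_facts)
```

Without the confidence filter, the noisy pattern "co-authored with" would also contribute (JimGray, Surajit) and (Sunita, Soumen), which is the drift the slide warns about.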

Slide7

Outline

Introduction
Harvesting Classes
Harvesting Facts
Common Sense Knowledge
Knowledge Consolidation
Web Content Analytics
Wrap-Up

Goal
Extraction from text
Consistency reasoning
Extraction from Tables
Open IE

Slide8

Reasoning
[Suchanek@WWW2009]

occurs("Elvis","died in",528)
occurs("Einstein","died in",1955)
died(Einstein,1955), born(Elvis, 1935)
occurs(X',P,Y) & means(X',X) & R(X,Y) => pattern(P,R)
occurs(X',P,Y) & means(X',X) & pattern(P,R) => R(X,Y)
born(X,Y) & died(X,Z) => Z>Y
…

Einstein died in 1955
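The three rules can be traced by hand on the slide's own statements. The sketch below applies them as hard rules in a fixed order, which is a simplification: it is not the weighted reasoning of the cited work, and the trivial `means` mapping (each name denotes the entity of the same name) is an assumption.

```python
occurs = [("Elvis", "died in", 528), ("Einstein", "died in", 1955)]
means = {"Elvis": "Elvis", "Einstein": "Einstein"}  # assumed disambiguation
facts = {("died", "Einstein", 1955), ("born", "Elvis", 1935)}


def learn_patterns(occurs, means, facts):
    """occurs(X',P,Y) & means(X',X) & R(X,Y) => pattern(P,R)"""
    return {(p, r)
            for name, p, y in occurs
            for r, x, value in facts
            if means.get(name) == x and value == y}


def apply_patterns(occurs, means, patterns):
    """occurs(X',P,Y) & means(X',X) & pattern(P,R) => R(X,Y)"""
    return {(r, means[name], y)
            for name, p, y in occurs
            for pattern_text, r in patterns
            if p == pattern_text and name in means}


def consistent(fact, facts):
    """born(X,Y) & died(X,Z) => Z>Y, checked for died candidates only."""
    relation, x, z = fact
    if relation != "died":
        return True
    return all(z > y for r, entity, y in facts if r == "born" and entity == x)


patterns = learn_patterns(occurs, means, facts)
candidates = apply_patterns(occurs, means, patterns)
accepted = {f for f in candidates if consistent(f, facts)}
rejected = candidates - accepted
print(patterns, accepted, rejected)
```

The pattern "died in" is learned from the Einstein statement, and the candidate died(Elvis, 528) is then rejected because it contradicts born(Elvis, 1935).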

Slide9

Reasoning
[Suchanek@WWW2009]

Solving a weighted MAX SAT problem at scale
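To make the term concrete: in weighted MAX SAT, each clause carries a weight and the goal is a truth assignment that maximizes the total weight of satisfied clauses. The sketch below solves a toy instance by brute force, which is exponential in the number of variables and therefore the opposite of "at scale"; the variable names and weights are invented for illustration.

```python
from itertools import product


def weighted_max_sat(variables, clauses):
    """Exhaustively search for the assignment with maximal satisfied weight.

    clauses: list of (weight, literals), where literals is a list of
    (variable, wanted_truth_value) pairs joined by OR.
    """
    best_weight, best_assignment = -1.0, None
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        weight = sum(w for w, literals in clauses
                     if any(assignment[v] == wanted for v, wanted in literals))
        if weight > best_weight:
            best_weight, best_assignment = weight, assignment
    return best_assignment, best_weight


VARIABLES = ["died(Elvis,528)", "born(Elvis,1935)"]
CLAUSES = [
    (1.0, [("died(Elvis,528)", True)]),    # weak textual evidence
    (3.0, [("born(Elvis,1935)", True)]),   # stronger prior knowledge
    # born(X,Y) & died(X,Z) => Z>Y rules out holding both statements:
    (10.0, [("died(Elvis,528)", False), ("born(Elvis,1935)", False)]),
]

assignment, weight = weighted_max_sat(VARIABLES, CLAUSES)
print(assignment, weight)
```

With these weights, the best assignment keeps born(Elvis,1935) and drops died(Elvis,528).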

Slide11

Reasoning
[Suchanek@WWW2009]

Extensions:
parallelize the reasoning by performing a min cut on the dependency graph [Nakashole@WSDM2011 "Prospera"]
use Markov logic networks to represent the entire joint probability distribution [M. Richardson / P. Domingos 2006]

Slide12

Using Markov Logic Networks

We can model/compute
the marginal probabilities
the joint distribution
the MAP (=maximum a posteriori), i.e. the most likely world

[Figure: World 1 and World 2, each with its probability]

Application: Extracting facts at large scale [Zhu@WWW2009 "StatSnowball", "EntityCube"]
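In a Markov logic network, the probability of a world is proportional to the exponential of the summed weights of the ground formulas it satisfies. The sketch below enumerates all worlds over two atoms to compute the joint distribution, a marginal, and the MAP world. The atoms, formulas and weights are made up, and real systems avoid this exhaustive enumeration.

```python
import math
from itertools import product

ATOMS = ["born_1935", "died_528"]

# Hypothetical weighted ground formulas: (weight, test on a world).
FORMULAS = [
    (1.5, lambda w: w["born_1935"]),
    (0.5, lambda w: w["died_528"]),
    (2.0, lambda w: not (w["born_1935"] and w["died_528"])),
]


def joint_distribution(atoms, formulas):
    """Return [(world, probability)] with P(world) = exp(sum of weights) / Z."""
    worlds = [dict(zip(atoms, values))
              for values in product([False, True], repeat=len(atoms))]
    scores = [math.exp(sum(weight for weight, holds in formulas if holds(world)))
              for world in worlds]
    z = sum(scores)  # partition function
    return [(world, score / z) for world, score in zip(worlds, scores)]


def marginal(atom, distribution):
    """Probability that the atom is true, summed over all worlds."""
    return sum(p for world, p in distribution if world[atom])


distribution = joint_distribution(ATOMS, FORMULAS)
map_world, map_probability = max(distribution, key=lambda pair: pair[1])
print(map_world, marginal("born_1935", distribution))
```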

Slide13

Outline

Introduction
Harvesting Classes
Harvesting Facts
Common Sense Knowledge
Knowledge Consolidation
Web Content Analytics
Wrap-Up

Goal
√ Extraction from text
√ Consistency reasoning
√ Extraction from Tables
Open IE

Slide14

Web Tables provide relational information
[Cafarella et al: PVLDB 08; Sarawagi et al: PVLDB 09]

Slide15

Web Tables can be annotated with YAGO
[Limaye, Sarawagi, Chakrabarti: PVLDB 10]

Goal: enable semantic search over Web tables

Idea:
Map column headers to Yago classes,
Map cell values to Yago entities
Using joint inference for factor-graph learning model

Title                      Author
A short history of time    S Hawkins
Hitchhiker's guide         D Adams

Annotation labels in the figure: Book, Person, Entity, hasAuthor

Slide16

Statistics yield semantics of Web tables
[Venetis, Halevy et al: PVLDB 11]

Idea: Infer classes from co-occurrences, headers are class names

Result from 12 million Web tables:
1.5 million labeled columns (=classes)
155 million instances (=values)

but: classes & entities are not canonicalized. Instances may include:
Google Inc., Google, NASDAQ GOOG, Google search engine, …
Jet Li, Li Lianjie, Ley Lin Git, Li Yangzhong, Nameless hero, …
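One simple way to read "infer classes from co-occurrences" is to let every cell value vote for the classes it co-occurs with elsewhere. The sketch below is a simplification and not the model of the cited paper: the co-occurrence counts are invented, and the score is just the sum of estimated P(class | value) over the column.

```python
from collections import defaultdict

# Hypothetical (value, class) co-occurrence counts, e.g. harvested from text.
COOCCURRENCE = {
    ("Google", "company"): 90,
    ("Google", "search engine"): 60,
    ("Google Inc.", "company"): 40,
    ("NASDAQ GOOG", "company"): 5,
    ("NASDAQ GOOG", "stock"): 15,
}


def infer_column_class(values, cooccurrence):
    """Pick the class with the highest summed P(class | value) over the column."""
    totals = defaultdict(int)
    for (value, _), count in cooccurrence.items():
        totals[value] += count
    scores = defaultdict(float)
    for (value, cls), count in cooccurrence.items():
        if value in values:
            scores[cls] += count / totals[value]
    return max(scores, key=scores.get) if scores else None


column = ["Google Inc.", "Google", "NASDAQ GOOG"]
print(infer_column_class(column, COOCCURRENCE))
```

The column gets a class label, but the three cell values stay three distinct strings, which is the canonicalization gap noted on the slide.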

Slide17

ID-Based Extraction

887128476661

Unique identifiers exist for books (ISBN), products (GTIN), companies (VAT), people (emails*), etc.

Unique identifiers can be found by regular expression + check digit verification
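For GTIN-style identifiers, "regular expression + check digit verification" can be sketched as follows. The regular expression and the function names are choices made for this example; the check uses the standard GTIN rule, where the payload digits are weighted 3, 1, 3, 1, ... starting from the right.

```python
import re

# Candidate identifiers: digit runs of length 8, 12, 13 or 14 (GTIN lengths).
GTIN_CANDIDATE = re.compile(r"(?<!\d)(\d{8}|\d{12,14})(?!\d)")


def has_valid_check_digit(code):
    """Verify the last digit against the weighted sum of the other digits."""
    digits = [int(c) for c in code]
    payload, check = digits[:-1], digits[-1]
    total = sum(d * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(payload)))
    return (10 - total % 10) % 10 == check


def find_gtins(text):
    """Return all digit runs in the text that pass the check digit test."""
    return [m.group(1) for m in GTIN_CANDIDATE.finditer(text)
            if has_valid_check_digit(m.group(1))]


print(find_gtins("Shoe 887128476661, order no. 887128476662, ISBN 9780306406157"))
```

The number shown on the slide, 887128476661, passes the check; changing its last digit makes it fail.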

Slide18

ID-Based Extraction

id     Name               URL
123    Puma PowerTech     url1
123    Please choose      url1
123    Puma PowerTech     url2
123    Puma Power Shoe    url2
124    Puma Slow Cat      url3
779    Please choose      url3
779    Canon PowerShot    url3
…
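Once identifiers are found, the names seen next to them still have to be cleaned up. The sketch below shows one plausible heuristic, which is an assumption and not necessarily what the cited system does: names that appear with several different ids (such as "Please choose") are discarded as page boilerplate, and the most frequent remaining name wins for each id.

```python
from collections import Counter, defaultdict

# Rows as shown in the table: (id, name, url).
ROWS = [
    (123, "Puma PowerTech", "url1"),
    (123, "Please choose", "url1"),
    (123, "Puma PowerTech", "url2"),
    (123, "Puma Power Shoe", "url2"),
    (124, "Puma Slow Cat", "url3"),
    (779, "Please choose", "url3"),
    (779, "Canon PowerShot", "url3"),
]


def names_by_id(rows):
    """Map each id to its most frequent name, ignoring names shared across ids."""
    ids_per_name = defaultdict(set)
    for ident, name, _url in rows:
        ids_per_name[name].add(ident)
    votes = defaultdict(Counter)
    for ident, name, _url in rows:
        if len(ids_per_name[name]) == 1:  # a name tied to exactly one id
            votes[ident][name] += 1
    return {ident: counter.most_common(1)[0][0]
            for ident, counter in votes.items()}


print(names_by_id(ROWS))
```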

Slide19

ID-Based Extraction

[Talaika@WebDB2015 "IBEX"]

Slide20

Outline

Introduction
Harvesting Classes
Harvesting Facts
Common Sense Knowledge
Knowledge Consolidation
Web Content Analytics
Wrap-Up

Goal
Extraction from text
Consistency reasoning
Extraction from Tables
Open IE

Slide21

Open Information Extraction

So far we assumed given relations with type signatures <entity1, relation, entity2>

<CarlaBruni marriedTo NicolasSarkozy> ∈ Person × R × Person
<NataliePortman wonAward AcademyAward> ∈ Person × R × Prize

Open IE aims to discover new entities and new relation types <name1, phrase, name2>

Madame Bruni in her happy marriage with Sarko…

<Madame Bruni, her happy marriage with, Sarko>

Slide22

Open IE with ReVerb
[A. Fader et al. 2011, T. Lin 2012, Mausam 2012]

Idea: Consider all subject-verb-object triples as facts.

Problem 1: uninformative extractions
“Gold has an atomic weight of 196” → <Gold, has, atomicweight>
“Faust made a deal with the devil” → <Faust, made, a deal>

Problem 2: over-specific extractions
“Elvis is the first and greatest rock and roll star of America” → <..., is the first and greatest rock and roll star of, …>

Solutions:
1. enforce regular expressions over POS tags, such as VB (N | ADJ | ADV | PRN | DET)* PREP
2. require the relation phrase to appear with many distinct arg pairs
3. intersect with Freebase
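The POS-tag constraint from the first solution can be tried on the Faust sentence. The sketch below assumes the sentence is already tokenized and tagged with the tag names used on the slide (the tagging here is done by hand), and it naively takes everything before and after the relation phrase as the two arguments, which a real extractor would not do.

```python
FILLER_TAGS = {"N", "ADJ", "ADV", "PRN", "DET"}


def relation_span(tagged):
    """Return (start, end) of the first VB (N|ADJ|ADV|PRN|DET)* PREP match."""
    for start, (_, tag) in enumerate(tagged):
        if tag != "VB":
            continue
        end = start + 1
        while end < len(tagged) and tagged[end][1] in FILLER_TAGS:
            end += 1
        if end < len(tagged) and tagged[end][1] == "PREP":
            return start, end + 1
    return None


def extract_triple(tagged):
    """Split a tagged sentence into (argument 1, relation phrase, argument 2)."""
    span = relation_span(tagged)
    if span is None:
        return None
    start, end = span
    words = [word for word, _ in tagged]
    return (" ".join(words[:start]),
            " ".join(words[start:end]),
            " ".join(words[end:]))


# Hand-tagged example sentence from the slide.
FAUST = [("Faust", "N"), ("made", "VB"), ("a", "DET"), ("deal", "N"),
         ("with", "PREP"), ("the", "DET"), ("devil", "N")]

print(extract_triple(FAUST))
```

Matching the whole phrase up to the preposition yields "made a deal with" as the relation, instead of the uninformative <Faust, made, a deal>.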

Slide23


http://openie.cs.washington.edu/


Slide24

Enhanced Patterns
[Nakashole@EMNLP2012 "PATTY"]

Syntactic-Ontological-Lexical (SOL) patterns combine
ontological types
lexical surface form
syntactic properties

Amy Winehouse’s cosy voice in her song ‘Rehab’
Jim Morrison’s haunting voice and charisma in ‘The End’
Joan Baez’s angel-like voice in ‘Farewell Angelina’
SOL pattern: <singer> ’s ADJECTIVE voice * in <song>

Patterns can subsume each other: "wife of" => "spouse of", which means that we can create synsets of patterns and arrange them in a taxonomy.
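Subsumption between patterns can be approximated by comparing the sets of entity pairs each pattern was observed with. The sketch below treats a pattern as more general if its support set contains the other's, and as synonymous if the sets are equal; the support sets are invented, and the exact-inclusion test is a simplification that a real system would likely soften.

```python
# Hypothetical support sets: entity pairs seen with each pattern.
SUPPORT = {
    "wife of": {("CarlaBruni", "NicolasSarkozy")},
    "spouse of": {("CarlaBruni", "NicolasSarkozy"),
                  ("NicolasSarkozy", "CarlaBruni")},
    "married to": {("CarlaBruni", "NicolasSarkozy"),
                   ("NicolasSarkozy", "CarlaBruni")},
}


def subsumes(general, specific, support):
    """True if every pair supporting `specific` also supports `general`."""
    return support[specific] <= support[general]


def synonymous(a, b, support):
    """Patterns with identical support sets fall into one synset."""
    return support[a] == support[b]


def taxonomy_edges(support):
    """Return (specific, general) edges for strict support-set inclusion."""
    return sorted((s, g) for s in support for g in support
                  if s != g and support[s] < support[g])


print(taxonomy_edges(SUPPORT))
```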

Slide25

Enhanced Patterns
[Nakashole@EMNLP2012 "PATTY"]

350 000 SOL patterns with 4 million instances
accessible at: www.mpi-inf.mpg.de/yago-naga/patty

Slide26

Open Problems and Grand Challenges

Real-time & incremental fact extraction for continuous KB growth & maintenance (life-cycle management over years and decades)

Extensions to ternary & higher-arity relations
events in context: who did what to/with whom when where why …?

Robust fact extraction with both high precision & recall
as highly automated (self-tuning) as possible

Extend the approaches to other languages