Uploaded On 2016-07-23

Presentation Transcript

Slide1

Enabling Search for Facts and Implied Facts in Historical Documents

David W. Embley, Stephen W. Liddle, Deryle W. Lonsdale, Spencer Machado, Thomas Packer, Joseph Park, Nathan Tate, Andrew Zitzelberger
Brigham Young University

BYU Data Extraction Research Group

Slide2

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

…

Slide3

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

grandchildren of Mary Ely

…

Slide4

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

grandchildren of Mary Ely

Slide5

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

grandchildren of Mary Ely

…

Slide6

grandchildren of Mary Ely

WoK-HD (A Web of Knowledge Superimposed over Historical Documents)

…

Slide7

WoK-HD Input

Slide8

Querying for Facts & Implied Facts

Slide9

Querying for Facts & Implied Facts

Animation of:
  Extraction query, results, highlighting
  Reasoned query, results, reasoning chain, highlighting

Slide10

Extraction Ontologies

Slide11

Extraction Ontologies

Slide12

Fact Extraction

Slide13

Fact Extraction

Slide14

Fact Extraction

Slide15

Reasoning for Implied Facts

Slide16

Reasoning for Implied Facts

Slide17

Reasoning for Implied Facts

Slide18

Reasoning for Implied Facts

Slide19

Query Interpretation

“Mary Ely” grandchild

Slide20

Query Interpretation

“Mary Ely” grandchild

Slide21

Query Interpretation

“Mary Ely” grandchild

Slide22

Generated SPARQL Query

Slide23

Generated SPARQL Query

Slide24

Query Results

Slide25

Results of Processing the Ely Ancestry (all 830 Pages)

Number of facts extracted: 22,251
  8,740 Person-Birthdate facts
  3,803 Person-Deathdate facts
  9,708 children facts, including
    5,020 Child-has-parent-Person facts
    2,394 Son-of-Person facts
    2,294 Daughter-of-Person facts
Number of implied grandchild facts inferred: 5,277
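The implied grandchild facts counted above are not stated in the source text; they follow from chaining two extracted child-of-parent facts. The sketch below illustrates that idea in plain Python. It is only an illustration under stated assumptions, not the WoK-HD implementation, which according to the slides relies on reasoning rules and generated SPARQL queries. The function name `infer_grandchildren` and the placeholder names "Child A", "Child B", "Grandchild 1", and "Grandchild 2" are hypothetical and are not taken from the Ely Ancestry.

```python
from collections import defaultdict


def infer_grandchildren(child_of):
    """Derive implied (grandchild, grandparent) pairs.

    child_of: iterable of (child, parent) pairs, standing in for
    extracted Child-has-parent-Person facts (hypothetical format).
    """
    # Index the stated facts: parent -> set of children.
    children = defaultdict(set)
    for child, parent in child_of:
        children[parent].add(child)

    # Chain two child-of links: the children of a person's children
    # are that person's grandchildren.
    implied = set()
    for grandparent, kids in children.items():
        for kid in kids:
            for grandchild in children.get(kid, ()):
                implied.add((grandchild, grandparent))
    return implied


# Hypothetical facts; the names are placeholders, not real extraction output.
facts = [
    ("Child A", "Mary Ely"),
    ("Child B", "Mary Ely"),
    ("Grandchild 1", "Child A"),
    ("Grandchild 2", "Child A"),
]

grandchildren_of_mary = sorted(
    g for g, gp in infer_grandchildren(facts) if gp == "Mary Ely"
)
print(grandchildren_of_mary)  # ['Grandchild 1', 'Grandchild 2']
```

Because every implied fact is built from two stated facts, each result can be traced back to the facts that justify it, which is the kind of reasoning chain the querying slides refer to.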

Processing time:
  ~18 seconds per page
  CPU time: ~4 hours
  Processing 10 in parallel: ~24 minutes

Slide26

Results of Processing the Ely Ancestry (all 830 Pages)

Precision: .52 (by randomly selecting & checking 100 of the 22,251 facts)

Recall: .33 & Precision: .40 (by randomly selecting and checking 2 fact-filled family pages)

Errors:
  Name recognizer
  Text pattern expectations
  OCR

Varying accuracy (for pages checked):
  Recall: .11, Precision: .11 (bad combination of all problems)
  Recall: .50, Precision: .68 (some problems, but closer to expectations)
  Recall: .59, Precision: .71 (10 pages, mostly as expected)
  Recall: .91, Precision: .94 (tuned, no problems except a few OCR errors)

Slide27

Current and Future Work

Implementation Status:
  Full line works (but is fragile & needs finishing touches)
  HyKSS integrated (but not all features)

Scalability:
  Handcrafted extraction ontologies & reasoning rules (worth the work for certain applications)
  ListReader (plus bootstrapping for lists and general extraction)
  Optimization (especially for query processing)

Integration:
  Mapping extraction ontologies to domain ontologies
  Object identity for people and places

Slide28

Summary and Conclusion

WoK-HD:
  Superimposes a web of knowledge over a collection of historical documents
  Works as a proof-of-concept prototype

To build and deploy the WoK-HD successfully:
  Efficient implementation
  Better, more cost-effective extraction
  Integration and record linkage

www.deg.byu.edu