Slot Filling based on Knowledge Graph and Truth Finding

Uploaded On 2019-11-21

Presentation Transcript

Slot Filling based on Knowledge Graph and Truth Finding
Dian Yu, Haibo Li, Hongzhao Huang and Heng Ji
Computer Science Department, Rensselaer Polytechnic Institute
{yud2, jih}@rpi.edu
November 18, 2013

Our Starting Point = 0
BLENDER SF2010 System
BLENDER SFV2012 System
Why? Because Heng wants everything new in her new place.

Outline
Limitations of State-of-the-art
Our Vision and Approach Overview
Knowledge Graph Construction
  Knowledge Path Extraction
  Knowledge Path Selection
  Knowledge Graph Clustering
Truth Finding
  2-Layer Mutual Enhancement Truth Finding
  Credibility Initialization
  Credibility Propagation
Experimental Results
Remaining Challenges
Conclusions and Future Work

Unpleasant Situation of Slot Filling 2009-2012
The most challenging task in KBP
Most previous systems hit the 30% “performance ceiling”
No significant publications on this task at major venues
What are the bottlenecks?
  Limited amount of labeled data → supervised learning is infeasible
  Low coverage of patterns → construct knowledge graph with enriched semantic IE annotations and coreference
  Knowledge gap → linguistic constraints mining, path selection and graph clustering
  Conflicting results → truth finding

Inspiration from Gravitational Theory
[Figure labels: Query, Slot Filler?, Slot Type?]

Approach Overview
[Figure: pipeline diagram. Component labels: Query, Query Expansion, Source Corpus, Information Retrieval, Semantic Annotation (Information Extraction, Dependency Parsing), Knowledge Graph Construction (Path Extraction, Path Selection, Graph Clustering), Truth Finding, Slot Fills, Wikipedia Mining, Merged KBs, Alternative Name, Slot Fill Extraction, Redundancy Removal & Filler Normalization]

Bottleneck: Low Coverage of Patterns
Manually crafted/edited patterns: low coverage; expensive
Bootstrapping: hard to generalize; long-tail distribution
Typical dependency patterns for per:place_of_birth
  <Query_PER> nsubjpass-1 born prep_in <Filler_LOC>
  <Query_PER> partmod born prep_in <Filler_LOC>
  <Query_PER> nsubjpass-1 born prep_on <Filler_LOC>
  <Query_PER> rcmod born prep_in <Filler_LOC>
Missing some simple cases
  Charles Gwathmey [1] was born on June 19, 1938, in Charlotte [2], N.C.
  Dependency path between [1] and [2]: ['nsubjpass', 'born', 'prep_on', 'June', 'prep_in', 'N.C', 'nn']

Bottleneck: Low Coverage of Patterns
Typical dependency patterns for per:place_of_death
  <Q_PER> nsubj-1 dies prep_in <A_LOC>
  <Q_PER> nsubj-1 died prep_in <A_LOC>
  <Q_PER> nsubj-1 died prep_on <A_LOC>
  <Q_PER> nsubj-1 died prep_in hospital nn <A_LOC>
Missing some simple cases
  “60 Minutes” was the brainchild of Don Hewitt [1], the show's longtime executive producer who died Wednesday of pancreatic cancer at his home in Bridgehampton, N.Y. [2], at age 86.
  Dependency path between [1] and [2]: ['appos', 'producer', 'nsubj', 'died', 'who', 'rcmod', 'died', 'prep_at', 'home', 'prep_in']
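The coverage problem on the two pattern slides can be illustrated with a minimal sketch. It assumes a path is represented as a tuple of edge labels and words and that matching is exact; the names `BIRTH_PATTERNS` and `matches_any` are hypothetical and not taken from the system.

```python
# Hypothetical representation: each pattern is the tuple of dependency labels
# and words that lie between the query and the filler.
BIRTH_PATTERNS = {
    ("nsubjpass-1", "born", "prep_in"),
    ("partmod", "born", "prep_in"),
    ("nsubjpass-1", "born", "prep_on"),
    ("rcmod", "born", "prep_in"),
}


def matches_any(path, patterns):
    """Return True only if the path is exactly one of the known patterns."""
    return tuple(path) in patterns


# A path that one of the listed patterns covers.
simple = ["nsubjpass-1", "born", "prep_in"]

# The Charles Gwathmey path from the slide: the date phrase sits between
# the verb and the location, so no listed pattern matches it.
gwathmey = ["nsubjpass", "born", "prep_on", "June", "prep_in", "N.C", "nn"]

print(matches_any(simple, BIRTH_PATTERNS))
print(matches_any(gwathmey, BIRTH_PATTERNS))
```

Under exact matching, every intervening modifier needs its own pattern, which is one way to read the "low coverage" bottleneck.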

Knowledge Gap 1
Deep Knowledge Acquisition: Nominal Coreference
  Almost overnight, he became fabulously rich, with a $3-million book deal, a $100,000 speech making fee, and a lucrative multifaceted consulting business, Giuliani Partners. As a celebrity rainmaker and lawyer, his income last year exceeded $17 million. His consulting partners included seven of those who were with him on 9/11, and in 2002 Alan Placa, his boyhood pal, went to work at the firm.
  After successful karting career in Europe, Perera became part of the Toyota F1 Young Drivers Development Program and was a Formula One test driver for the Japanese company in 2006.
  “Alexandra Burke is out with the video for her second single … taken from the British artist’s debut album”
  “a woman charged with running a prostitution ring … her business, Pamela Martin and Associates”
Our Solution: Online knowledge graph construction; enrich paths with semantic annotations and Information Extraction (coreference/relation/event)

Knowledge Path Extraction
① Relevant Document Set
② Sentence Set [Tree representation]
③ Extracted Paths

Knowledge Path Extraction
Mays, 50, had died in his sleep at his Tampa home the morning of June 28.
Extracted Knowledge Paths:
  1) Mays…amod…50
  2) Mays…nsubj…died…prep_at…home…Tampa
  3) Mays…nsubj…died…prep_at…June, 28
[Figure: node annotations {PER, NAM, Billy Mays} {Death-Trigger} {NUM} {GPE, NAM, FL-USA} {06/28/2009, TIME-WITHIN} {FAC, NOM} {Located} {PER, PRO, Mays}]

Knowledge Path Extraction
Each node is an entity/time/value mention extent or a word, enriched by
  Entity type/subtype
  Time normalization, role
  Mention head
  Full entity mention name a mention node refers to
  Slot type of trigger phrases mined from Gigaword, Wikipedia articles and KBs
Each edge is
  a derivation path from syntactic parsing, or
  a type-labeled dependency path, or
  an event/semantic relation extracted by IE, labeled with argument roles
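As a rough sketch of how a path between a query and a candidate filler could be read off a dependency parse, the example below runs a breadth-first search over labeled edges. It is not the system's extraction code: the function name `extract_path`, the (head, label, dependent) triple format, and the `nn` label on the home/Tampa edge are assumptions made for illustration, and the semantic annotations on nodes are omitted.

```python
from collections import deque


def extract_path(edges, source, target):
    """Breadth-first search over an undirected view of labeled dependency edges.

    edges: iterable of (head, label, dependent) triples.
    Returns the alternating node/label sequence from source to target, or None.
    """
    neighbours = {}
    for head, label, dependent in edges:
        neighbours.setdefault(head, []).append((label, dependent))
        neighbours.setdefault(dependent, []).append((label, head))

    queue = deque([(source, [source])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for label, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label, nxt]))
    return None


# Toy parse of "Mays, 50, had died in his sleep at his Tampa home ..."
edges = [
    ("died", "nsubj", "Mays"),
    ("Mays", "amod", "50"),
    ("died", "prep_at", "home"),
    ("home", "nn", "Tampa"),  # assumed label; the slide does not name this edge
]

path = extract_path(edges, "Mays", "Tampa")
print(path)
```

The returned sequence corresponds to path 2) on the Mays slide, with the assumed `nn` edge made explicit.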

Knowledge Gap 2
Deep Knowledge Acquisition: Implicit paraphrases & long-tail distribution
“employee/member”:
Sutil, a trained pianist, tested for Midland in 2006 and raced for Spyker in 2007 where he scored one point in the Japanese Grand Prix.
Daimler Chrysler reports 2004 profits of $3.3 billion; Chrysler earns $1.9 billion.
In her second term, she received a seat on the powerful Ways and Means Committee
Jennifer Dunn was the face of the Washington state Republican Party for more than two decades
State of Residence: Davis became Virginia's first Republican woman elected to Congress in 2000, and she was a member of the House Armed Services Committee and the Foreign Affairs Committee
Buchwald lied about his age and escaped into the Marine Corps.
By 1942, Peterson was performing with one of Canada's leading big bands, the Johnny Holmes Orchestra.
Even more: “would join”, “would be appointed”, “will start at”, “went to work”, “was transferred to”, “was recruited by”, “took over as”, “succeeded PERSON”, “began to teach piano”, …
“spouse”: Buchwald's 1952 wedding -- Lena Horne arranged for it to be held in London's Westminster Cathedral -- was attended by Gene Kelly, John Huston, Jose Ferrer, Perle Mesta and Rosemary Clooney, to name a few

Too Rich is not Always a Good Thing
Need to filter out noisy contexts: 97% of paths are irrelevant
Our Solution: Multi-Layer Path Selection
Encode slot type-specific linguistic constraints for deep understanding
Constraint Examples
  Candidate/context node attributes (entity type, mention type, time, number, url…)
  Stop words, upper case/lower case
  Match name gazetteer and existing KBs (YAGO, Wikipedia infoboxes, Freebase, DBPedia) and a KB mined from Wikipedia
  Path length
    His most noticeable moment in the public eye came in 1979, when Muslim militants in Iran seized the U.S. Embassy and took the Americans stationed there hostage.
    path = ('poss', 'moment', 'nsubj', 'came', 'advcl', 'seized', 'nsubj', 'Muslim militants', 'amod')
  Coreference link/relation argument roles/event argument roles
  Position of a particular node/edge type in the path
  Semantic categories of context nodes from IE annotations
  Entity node’s role in the entire sentence (e.g. remove commenter/reporter)
    Filter “origin” if the person is a commenter: “Canada and Russia, they have unbelievable rosters,” Forsberg said.

More Constraint Examples
Edge type
  place_of_death and place_of_birth paths should include prep_in or prep_at edges
  Filter “Employee” if the dependency path includes “prep_on”
    Manh called on the Asian Development Bank to play a greater role in helping improve national infrastructure. (path: ['nsubj', 'called', 'prep_on'])
Lexical constraints based on trigger phrases/words [Heng’s several days of paper-pen work]
  Mining from Gigaword, Wikipedia articles
  Mining from KBs (Wikipedia infoboxes, Freebase, YAGO, DBPedia)
  CMU NELL knowledge base (e.g. religion list from is-a relation)
  Examples:
    “top-employees”: chief executive officer, chief financial officer, chief operating officer, chief strategy and development officer, chief information officer, e-commerce and security officer, …
    “headquarters”: based, headquarter, headquarters, 's
  Disease list from medical ontology
Comparison with competing context nodes
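The two edge-type constraints above can be sketched as a simple filter. This is only an illustration under stated assumptions: the tables `REQUIRED_EDGES` and `FORBIDDEN_EDGES`, the function `keep_path`, and the slot key `"employee"` are hypothetical names, and the real system combines many more constraint types.

```python
# Slot-specific edge constraints taken from the slide; the table layout and
# the slot key "employee" are illustrative assumptions.
REQUIRED_EDGES = {
    "per:place_of_birth": {"prep_in", "prep_at"},
    "per:place_of_death": {"prep_in", "prep_at"},
}
FORBIDDEN_EDGES = {
    "employee": {"prep_on"},
}


def keep_path(slot, path):
    """Return False if the path violates an edge-type constraint for the slot."""
    labels = set(path)
    required = REQUIRED_EDGES.get(slot)
    if required and not (labels & required):
        return False  # none of the required edge labels is present
    if labels & FORBIDDEN_EDGES.get(slot, set()):
        return False  # a forbidden edge label is present
    return True


print(keep_path("per:place_of_birth", ["nsubjpass", "born", "prep_in"]))
print(keep_path("per:place_of_birth", ["nsubjpass", "born", "conj_but", "moved", "prep_to"]))
print(keep_path("employee", ["nsubj", "called", "prep_on"]))
```

Keeping the constraints in per-slot tables makes it easy to add further rules without touching the filter itself, which is one plausible reading of "slot type-specific" constraints.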

Slot Filling != Binary Relation Extraction
A sentence is usually anchored by a predicate instead of a pair of entities
Slot fillers need to be extracted from multiple documents instead of a local context involving two entities (main difference from ACE relation extraction)
Capture the interactions among query, candidate slot filler and all other (competing) entities from global contexts, instead of only the path between query and candidate slot filler
Cross-slot, cross-entity reasoning is required
Generalization of similar specific graphs
Model a candidate mention or context word’s latent semantic role based on its local context knowledge graph

Small Universe of a Mention/Word
Filler competing with popular entities involved in centroid events/topics
“Hewitt was born Dec. 14, 1922, in New York City, but his family soon moved to Boston, where his father worked as the classified advertising manager for the Boston Herald American.”
Query: Hewitt
Candidate Filler 1: New York City
  Path: 'nsubjpass', 'born', 'prep_in'
Candidate Filler 2: Boston
  Path: 'nsubjpass', 'born', 'conj_but', 'moved', 'prep_to'
[Figure: Small Universe of “Boston”. Node and edge labels: Boston, movement, coref, Herald American, mod, his father, employee, family, Hewitt, his family]

Knowledge Graph Clustering
Hypothesis: Entity mentions/words that share similar local graph structures and labels are likely to play similar roles
Local graphs for correct spouse fillers: [figure]
Local graphs for incorrect spouse fillers: [figure]

Knowledge Graph Clustering
Similarity Measures
  Structure similarity
    number of nodes
    number of edges
    radius
    degree assortativity of graph
    the maximum degree centrality for nodes in the graph (density)
  Similarity between attributes of nodes and edges (PageRank)
More powerful for persons than organizations
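Several of the structural measures listed above can be computed directly from an edge list. The sketch below covers node count, edge count, radius and maximum degree centrality for a connected, undirected local graph; degree assortativity and the attribute similarity are left out. The function name `local_graph_features` and the toy edges are illustrative assumptions, not the system's implementation.

```python
from collections import deque


def local_graph_features(edges):
    """Structural features of a connected, undirected local graph (>= 2 nodes)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    def eccentricity(start):
        # Longest shortest-path distance from start, found by BFS.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return max(dist.values())

    n = len(adj)
    return {
        "nodes": n,
        "edges": len({frozenset(e) for e in edges}),
        # Radius: the smallest eccentricity over all nodes.
        "radius": min(eccentricity(v) for v in adj),
        # Degree centrality of a node is degree / (n - 1); take the maximum.
        "max_degree_centrality": max(len(nb) for nb in adj.values()) / (n - 1),
    }


# Toy local graph loosely based on the "Boston" example (edges are assumed).
toy_edges = [
    ("moved", "Boston"),
    ("Boston", "Herald American"),
    ("Herald American", "father"),
    ("father", "Hewitt"),
]
features = local_graph_features(toy_edges)
print(features)
```

Feature dictionaries like this could then be compared across candidate fillers with any vector distance; which distance and clustering method the system uses is not stated on the slide.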

Truth Finding
Negative Statement
  Steinmeier, who became Chancellor Angela Merkel's foreign minister in 2005, has denied the U.S. planned to send Kurnaz to Germany.
Conflicting Evidence from Multiple Sources
  Yolanda King, daughter of Martin Luther King Jr., dies ATLANTA
  She was 51
  King died late Tuesday in Santa Monica, California, at age 51, said Steve Klein, a spokesman for the King Center
A unique challenge that did not exist in traditional single-document information extraction
Our Solution
  Develop a new validation approach based on a "truth-finding" framework
  Propagate evidence among system, evidence and claim

Hypotheses
Hypothesis 1: A claim is likely to be true if it is supported by many trustworthy pieces of evidence. A piece of evidence is more likely to be trustworthy if many of the claims it supports are true.
Hypothesis 2: A claim or piece of evidence is more likely to be true if it is extracted by many trustworthy systems. A system is more likely to be credible if it can extract many trustworthy claims or pieces of evidence.

2-Layer Mutual Enhancement Truth Finding
[Figure: Claim-Evidence Networks and Claim-System Networks, linking Evidence, Claim and System nodes]

What’s New
There might be multiple true claims, some redundant, some distinct but of the same type
Most previous truth finding methods relied on the wisdom of the crowd (“great minds think alike”); but majority voting may not always work because certain implicit truths might only be discovered by a few good systems/sources
The performance of a system may vary over time
Systems may share similar resources and be dependent on each other
Not only provide confidence scores, but also detailed evidence and aspects

Credibility Initialization
[Figure labels: System 1, System 2]

Credibility Initialization
Initializing scores for claims: evaluate each claim based on evidence from its dependency path
[Figure labels: Query, Filler, dependency tag, Entity, dependency tag]
sentence: US actress Patricia Neal dies at 84.
query: Patricia Neal
slot: per:origin
filler: US
dependency path from query to filler: nn

Credibility Propagation based on Tri-HITS (Huang, 2012)
Propagate credibility scores from claim to system
Update system credibility
Propagate credibility from system to claim
Update claim credibility
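The four propagation steps can be sketched as a plain HITS-style mutual reinforcement between systems and claims. This is a simplified stand-in, not Tri-HITS itself: it omits the evidence layer, and the function name `propagate`, the sum-to-one normalization, the iteration count and the toy data are all assumptions.

```python
def propagate(extractions, init_claim, iterations=20):
    """Alternate claim -> system and system -> claim credibility updates.

    extractions: dict mapping each system to the set of claims it extracted.
    init_claim: dict mapping each claim to its initial credibility score.
    """
    claim_score = dict(init_claim)
    system_score = {s: 1.0 for s in extractions}
    for _ in range(iterations):
        # Propagate from claims to systems, then normalize system scores.
        for s, claims in extractions.items():
            system_score[s] = sum(claim_score[c] for c in claims)
        norm = sum(system_score.values()) or 1.0
        system_score = {s: v / norm for s, v in system_score.items()}

        # Propagate from systems to claims, then normalize claim scores.
        for c in claim_score:
            claim_score[c] = sum(
                system_score[s] for s, cs in extractions.items() if c in cs
            )
        norm = sum(claim_score.values()) or 1.0
        claim_score = {c: v / norm for c, v in claim_score.items()}
    return system_score, claim_score


# Hypothetical toy input: three systems, three claims, uniform initial scores.
extractions = {"A": {"c1", "c2"}, "B": {"c1"}, "C": {"c3"}}
init_claim = {"c1": 1 / 3, "c2": 1 / 3, "c3": 1 / 3}
systems, claims = propagate(extractions, init_claim)
print(sorted(claims, key=claims.get, reverse=True))
print(sorted(systems, key=systems.get, reverse=True))
```

In this toy run the claim extracted by two systems ends up with the highest score, and the system that extracts the most credible claims ranks first; a claim found only by an isolated system is pushed down, which is exactly the majority-voting weakness the "What's New" slide says the full model must guard against.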

Overall Performance
System        Data                       Precision   Recall   F-Measure
2010 System   2010 Eval Data             27.4%       26.6%    27.0%
2013 System   2012 Eval Data (Approx.)   36.4%       61.0%    45.6%
2013 System   2013 Eval Data             40.7%       29.0%    33.9%
Our fresh system significantly outperforms our old system (18.6% higher F-Measure: 45.6% vs. 27.0%)
The pool is much better this year (11.7% lower F-Measure on 2013 Eval Data than on 2012 Eval Data)
Top 3 among all teams; Top 1 among all DEFT teams
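The F-Measure column is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R). A short check, with the three rows copied from the slide and the helper name `f_measure` chosen for illustration:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F-measure %) as given on the slide.
rows = [
    (27.4, 26.6, 27.0),  # 2010 system, 2010 eval data
    (36.4, 61.0, 45.6),  # 2013 system, 2012 eval data (approx.)
    (40.7, 29.0, 33.9),  # 2013 system, 2013 eval data
]

for precision, recall, reported in rows:
    computed = round(f_measure(precision, recall), 1)
    print(precision, recall, reported, computed)
```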

Impact of Knowledge Path Extraction
Alternative name feedback based query expansion: 1.1% gain in F-Measure
Entity Coreference Resolution to enrich knowledge paths: 1.2% gain in F-Measure
Relax path length constraints: 0.5% gain in F-Measure

Impact of Knowledge Path Selection and Truth Finding
[Figure: F-Measure of KBP2013 SF systems]

Remaining Challenges
Name Tagging Errors
Coreference Resolution Errors
  He worked his way up the organization under founder Ted Arison and his son Micky, who now leads Carnival Corp. and called Dickinson, “one of the most influential people in the development of the modern-day cruise industry.”
  Indiana Muslim running for Congress wants to combat ignorance about his [Andre Carson] faith
  INDIANAPOLIS -- A convert to Islam stands an election victory away from becoming the second Muslim elected to Congress and a role model for a faith community seeking to make its mark in national politics.
Vague Justification
  It was in December 1970 that Anderson criticized Hoover's pretrial attack on two Roman Catholic priests, Daniel J. and Philip F. Berrigan, who were later convicted of destroying draft board records. → religion filler?
Fuzzy Definition
  She and Russell Simmons, 50, have two daughters: 8-year-old Ming Lee and 5-year-old Aoki Lee.

Remaining Challenges
Distinguish Slot Directions
  Organization parent/subsidiary; members/member_of
Implicit Relations
  He [Pascal Yoadimnadji] has been evacuated to France on Wednesday after falling ill and slipping into a coma in Chad, Ambassador Moukhtar Wawa Dahab told The Associated Press. His wife, who accompanied Yoadimnadji to Paris, will repatriate his body to Chad, the amba. → is he dead? in Paris?
  Until last week, Palin was relatively unknown outside Alaska, and as facts have dribbled out about her, the McCain campaign has insisted that its examination of her background was thorough and that nothing that has come out about her was a surprise. → does she live in Alaska?
  The list says that the state is owed $2,665,305 in personal income taxes by singer Dionne Warwick of South Orange, N.J., with the tax lien dating back to 1997. → does she live in NJ?
  Vernon Bellecourt -- whose Ojibwe name, WaBun-Inini, means "Man of Dawn" or "Daybreak" -- was born on the White Earth Indian Reservation in Minnesota. He left home at 15 after finding work in a carnival. → did he live in Minnesota?

Conclusions and Future Work
Mined and incorporated rich knowledge from multiple lexical, syntactic and semantic levels for slot filling
Proposed a new knowledge graph representation
Developed a new truth-finding framework for answer validation
Married low-level IE with high-level Data Mining
Future Work
  Incorporate more knowledge resources such as NELL into path selection
  Hierarchical knowledge graph clustering
  Collective joint extraction across queries and slot types
  Truth Finding
    Source: Publication Agency, Reporter’s profile, social network and his/her role in the event, Reporting time and location
    System: add history, profile and confidence values (this year’s data is not very discriminative)
    Claim: compute similarity based on coreference resolution, entity/event clustering and equivalence, modeling complexity; distance from answers from the top-tier systems
    Evidence Dimensions: soft constraints in path selection

Resources Sharing Plans
January 2014
  Heng’s paper-pen made constraints & dictionaries
  BLENDER KB (merged and cleaned from Wikipedia infoboxes, Freebase, YAGO and DBPedia)
March 2014
  Slot Filling system to share with KBP community; integrated into BBN DEFT platform
