Uploaded On 2023-06-22

Presentation Transcript

1. Entity Queries
Seminar by Pankaj Vanwari, under the guidance of Dr. S. Sudarshan

2. Overview of Presentation
- Introduction to Entity Queries
- Keyword search on structured data
- Querying over unstructured data
- Entity queries using ontology-based extraction
- Entity-relationship queries
- Conclusion and future work
Entity Queries by Pankaj Vanwari under guidance of Dr. S. Sudarshan

3. Introduction
- Querying a database using keyword search
- Restricted to retrieving pages/documents
- Entity search on the World Wide Web
- Annotations and semantic links to text
- Wikipedia, WordNet, etc. as sources
- Entity near queries, indexing and ranking
- Entity-relationship search to find relationships between the entities

4. Keyword search over graph structured data
- Simple searching and browsing of data.
- The user types a few keywords and then follows the hyperlinks interactively.
- The database is modeled as a graph.
- Uses proximity-based ranking, based on foreign key and other similar links.
- Useful for searching an enterprise database for information without a query language.

5. BANKS (Browsing ANd Keyword Searching)
- RDB tuples constitute the nodes of the graph.
- Each foreign key - primary key link is a directed edge (to avoid “hubs”).
- A link with higher importance is given a lower weight.
- The query result is a rooted directed tree.
- Backward edge (v, u) has a weight based on the number of links to v from the nodes of the same type as u.

6. Formal database model of BANKS
- s(R(u), R(v)) denotes the similarity between the relations R(u) and R(v) of nodes u and v.
- If edge (u, v) exists but (v, u) does not, then the weight is w(u, v) = s(R(u), R(v)).
- If (u, v) does not exist and (v, u) does, then w(u, v) = INv(u) * s(R(v), R(u)).
- If both exist, then the weight is the minimum of the two values above.
- The overall relevance score is obtained from the normalized edge and node scores.
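The three weight rules above can be sketched in Python. This is a minimal illustration, not the BANKS implementation: the function name `banks_edge_weights`, the toy citation data and the constant similarity function are assumptions, and INv(u) is read as the number of links into u from nodes of the same relation as v, following the backward-edge description on the previous slide.

```python
from collections import defaultdict


def banks_edge_weights(edges, relation_of, similarity):
    """Compute w(u, v) for every node pair joined by a link in either direction.

    edges:       set of directed (u, v) links (foreign key -> primary key)
    relation_of: dict mapping each node to the name of its relation
    similarity:  function s(rel_a, rel_b) returning a positive number
    """
    # indeg[(u, rel)] = number of links into u from nodes of relation rel,
    # which is how this sketch interprets INv(u) with rel = R(v).
    indeg = defaultdict(int)
    for a, b in edges:
        indeg[(b, relation_of[a])] += 1

    def forward(u, v):
        # Rule 1: the link (u, v) exists in the data.
        return similarity(relation_of[u], relation_of[v])

    def backward(u, v):
        # Rule 2: only (v, u) exists, so scale by INv(u).
        return indeg[(u, relation_of[v])] * similarity(relation_of[v], relation_of[u])

    weights = {}
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            candidates = []
            if (a, b) in edges:
                candidates.append(forward(a, b))
            if (b, a) in edges:
                candidates.append(backward(a, b))
            # Rule 3: when both directions exist, take the minimum.
            weights[(a, b)] = min(candidates)
    return weights


# Toy data (hypothetical): three "cites" tuples reference one "paper" tuple.
edges = {("c1", "p"), ("c2", "p"), ("c3", "p")}
relation_of = {"c1": "cites", "c2": "cites", "c3": "cites", "p": "paper"}
weights = banks_edge_weights(edges, relation_of, lambda a, b: 1.0)
print(weights[("c1", "p")], weights[("p", "c1")])
```

In this toy graph the backward edge out of the heavily referenced node `p` is three times as heavy as the forward edge into it, which is the effect the indegree factor is meant to have.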

7. Querying over unstructured data
- The World Wide Web supports keyword search but not entity search.
- Entities are first-class citizens, as opposed to pages.
- There is no schema information on web documents to browse, as there is in BANKS.
- Statistics from a large corpus, with scoring and ranking from IR, can be useful.
- Challenges: indexing and annotations.

8. CSAW
- Scaling entity search to the World Wide Web.
- Major components: Catalog, Corpus and Query Processor.
- Data model of CSAW.
- Indexes used in the CSAW system: the stem and full atype indexes, the reachability index and the forward index.
- Scoring in CSAW: selector energy, gap and decay, and aggregation.
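The scoring terms listed above (selector energy, gap, decay, aggregation) can be illustrated with a small Python sketch. It assumes that each matched query word (a selector) carries an energy, that the energy decays with the token gap between the selector and the candidate entity, and that the decayed energies are aggregated into one score. The exponential decay, the `decay_rate` parameter and the function name `candidate_score` are illustrative assumptions, not the functions used by CSAW.

```python
import math


def candidate_score(candidate_pos, selector_hits, energy, decay_rate=0.2, aggregate=sum):
    """Score one candidate entity occurrence in a document.

    candidate_pos: token position of the candidate
    selector_hits: list of (word, position) pairs for matched query words
    energy:        dict mapping each selector word to its energy (e.g. an IDF-like value)
    decay_rate:    assumed exponential decay constant per token of gap
    aggregate:     how contributions are combined, e.g. sum or max
    """
    contributions = []
    for word, pos in selector_hits:
        gap = abs(pos - candidate_pos)  # distance in tokens
        contributions.append(energy[word] * math.exp(-decay_rate * gap))
    return aggregate(contributions) if contributions else 0.0


# Hypothetical example: the same selector at two different gaps.
near = candidate_score(10, [("phone", 9)], {"phone": 2.0})
far = candidate_score(10, [("phone", 2)], {"phone": 2.0})
print(near > far)
```

Passing `aggregate=max` instead of `sum` keeps only the strongest selector, which is one of the simplest alternative aggregation choices.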

9. Entity Search with Dual-Inversion Index
- Dual-inversion index: a document-inverted index and an entity-inverted index.
- Document-inverted index: given an entity type E, maps to the documents in which entities of type E occur.
- Entity-inverted index: takes keywords as input and returns entity instances as output.
- Comparison of the document-inverted and entity-inverted indices.
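A minimal Python sketch of the two indexes follows. It assumes documents are already tokenized and annotated, with each entity occurrence stored as an `(entity_type, instance)` tuple, and it assumes that a keyword is associated with an entity when they appear within a fixed token window. The window rule, the function name `build_dual_inversion` and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def build_dual_inversion(docs, window=3):
    """Build both indexes from annotated documents.

    docs: dict doc_id -> list of tokens, where a token is either a keyword
          string or an (entity_type, instance) tuple.
    Returns (doc_index, entity_index):
      doc_index:    entity type -> list of (doc_id, position, instance)
      entity_index: keyword     -> set of (entity_type, instance)
    """
    doc_index = defaultdict(list)
    entity_index = defaultdict(set)
    for doc_id, tokens in docs.items():
        for pos, tok in enumerate(tokens):
            if not isinstance(tok, tuple):
                continue  # plain keyword, handled via its neighbouring entities
            etype, instance = tok
            doc_index[etype].append((doc_id, pos, instance))
            lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
            for neighbour in tokens[lo:hi]:
                if isinstance(neighbour, str):
                    entity_index[neighbour.lower()].add((etype, instance))
    return doc_index, entity_index


# Hypothetical annotated document.
docs = {"d1": ["amazon", "customer", "service", ("phone", "555-0100")]}
doc_index, entity_index = build_dual_inversion(docs)
print(doc_index["phone"])
print(entity_index["amazon"])
```

With the document-inverted index a query still has to scan documents to find matching entities, whereas the entity-inverted index returns entity instances directly from the keywords.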

10. EntityRank (searching entities directly and holistically)
- Integrates both local and global information in ranking.
- Example query: ow(amazon customer service #phone)
- Entity search needs to be contextual, holistic, uncertain, associative, and discriminative.
- Three-layer model: Access (global), Recognition (local) and Validation (hypothesis testing).

11. Entity queries using ontology based extraction
- Knowledge representation models such as RDFS, with a general-purpose ontology on top of these representations.
- Two ways of extracting knowledge structures automatically from text corpora: NLP/machine learning or human annotations.
- YAGO, YAGO2 and ESTER are all based on the second approach, with differences.

12. YAGO (Yet Another Great Ontology)
- YAGO combines Wikipedia categories with the WordNet ontology.
- Extracts facts based on fixed relations.
- A fact is a triple that has a fact identifier. Formally, $y : I \to (I \cup C \cup R) \times R \times (I \cup C \cup R)$, where $I$ is the set of fact identifiers, $C$ the set of common entities and $R$ the set of relation names.
- Compatible with RDF.
- Relations: Type, SubClassOf, Means, …
- Other relations: BornInYear, PoliticianOf, …
- Meta relations: Describes, Context, …
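The fact model can be made concrete with a small Python sketch. A dictionary from fact identifiers to triples plays the role of the function y, and a checker verifies that every argument comes from I ∪ C ∪ R and every relation from R. The identifiers `#1` to `#3`, the entity names, the `FoundIn` relation and the function name `is_valid_yago` are illustrative assumptions; the injectivity check reflects the usual requirement that no two identifiers name the same triple.

```python
def is_valid_yago(facts, entities, relations):
    """Check that `facts` fits the shape I -> (I ∪ C ∪ R) × R × (I ∪ C ∪ R).

    facts:     dict fact_id -> (arg1, relation, arg2)   (the function y)
    entities:  set of common entities C
    relations: set of relation names R
    """
    universe = set(facts) | set(entities) | set(relations)
    triples = list(facts.values())
    injective = len(set(triples)) == len(triples)  # no triple has two identifiers
    well_typed = all(
        a in universe and r in relations and b in universe for a, r, b in triples
    )
    return injective and well_typed


# Hypothetical mini-ontology; "#3" is a fact about fact "#1".
entities = {"AlbertEinstein", "physicist", "scientist", "1879", "Wikipedia"}
relations = {"Type", "SubClassOf", "BornInYear", "FoundIn"}
facts = {
    "#1": ("AlbertEinstein", "BornInYear", "1879"),
    "#2": ("physicist", "SubClassOf", "scientist"),
    "#3": ("#1", "FoundIn", "Wikipedia"),
}
print(is_valid_yago(facts, entities, relations))
```

Because fact identifiers belong to the argument domain, a fact such as `#3` can describe another fact, which a plain subject-relation-object table could not express directly.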

13. YAGO2 (extension of YAGO)
- Focus on temporal and spatial knowledge.
- Declarative rules stored in text files.
Temporal dimension
- Facts can only hold time points; time spans are represented by two relations.
- 4 entity types (people, groups, artifacts and events).
- 9 relations generalized to 2 relations (StartsExistingOn and EndsExistingOn).

14. YAGO2 continued…
Spatial dimension
- Harvests geo-entities from two sources: Wikipedia and GeoNames.
- The class yagoGeoEntity groups all geo-entities, which are related by hasGeoCoordinates to yagoGeoCoordinates.
- 3 entity types (events, groups and artifacts).
- 2 relations generalized to placedIn.
- The relation occursIn holds between a fact and a geo-entity.

15. ESTER (Efficient Search on Text, Entities and Relations)
- Combined full-text and ontology search system.
- Input is a corpus and an ontology.
- Three components: an entity recognizer, a query engine, and a user interface.
- Entity recognition adds, at position 0, the artificial word <c>:<x> for each top-level category c of which x is an instance.
- For a fact (x, r, y) from YAGO, add the following artificial words: at position 1, add <r>:<p>, and at position p, add entity:<y>.
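The artificial-word scheme can be sketched in Python as follows. The slide does not say how the position p is chosen, so this sketch assumes one fresh position per fact, counted upward from 2. The function name `artificial_words`, the parameter `first_free_pos` and the sample entity, category and relation names are hypothetical.

```python
def artificial_words(x, top_categories, facts, first_free_pos=2):
    """Return the (position, word) pairs added for entity x.

    x:              entity name
    top_categories: top-level categories of which x is an instance
    facts:          iterable of (subject, relation, object) triples
    first_free_pos: assumed first position available for fact targets
    """
    # Position 0: one <c>:<x> word per top-level category c.
    words = [(0, f"{c}:{x}") for c in top_categories]
    p = first_free_pos
    for subject, r, y in facts:
        if subject != x:
            continue  # only facts about x contribute words for x
        words.append((1, f"{r}:{p}"))       # position 1: <r>:<p>
        words.append((p, f"entity:{y}"))    # position p: entity:<y>
        p += 1  # assumption: each fact gets its own position
    return words


# Hypothetical entity with one category and one fact.
words = artificial_words(
    "john_lennon", ["person"], [("john_lennon", "born_in", "liverpool")]
)
print(words)
```

Encoding the fact as two words that share the position value p lets an ordinary positional text index link the relation to its target without a separate fact table.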

16. ESTER continued…
- The query engine produces lists of word-in-document occurrences; each item consists of a document id, a word id, a score, and a position within the document.
- Two basic operations: prefix search and join.
- Given two occurrence lists produced by prefix search, the join operation computes a single list of all items whose word ids occur in both lists, sorted by document id.
- Proactive user interface.
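The join operation as described above can be written in a few lines of Python. This is a naive sketch based only on the slide's wording, not the ESTER implementation; the `Occurrence` type and the sample lists are assumptions for illustration.

```python
from typing import List, NamedTuple


class Occurrence(NamedTuple):
    """One word-in-document occurrence, with the four fields named on the slide."""
    doc_id: int
    word_id: int
    score: float
    position: int


def join(left: List[Occurrence], right: List[Occurrence]) -> List[Occurrence]:
    """Keep the items from both lists whose word id occurs in both lists,
    and return them sorted by document id."""
    common = {o.word_id for o in left} & {o.word_id for o in right}
    merged = [o for o in list(left) + list(right) if o.word_id in common]
    return sorted(merged, key=lambda o: o.doc_id)


# Hypothetical occurrence lists, as two prefix searches might return them.
left = [Occurrence(2, 7, 0.5, 3), Occurrence(1, 9, 0.4, 0)]
right = [Occurrence(1, 7, 0.9, 5), Occurrence(3, 8, 0.1, 1)]
result = join(left, right)
print(result)
```

Only word id 7 appears in both lists here, so the result holds the two occurrences of that word, ordered by document id.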

17. Entity relationship queries over annotated web
- Example query: “Find cities and countries in Europe where cities are capitals of respective countries”.
- ERQ handles relationships among entities across several pages.
- High algorithmic complexity.
- Scoring entities individually and aggregating the scores.

18. WikiERQ: SSQ (Shallow Semantic Queries)
- ERQ directly over text.
- Example query: “Find cities and countries in Europe where cities are capitals of respective countries”.
- Position-based BCM for ranking answers.
- Key components: proximity, ordering and mutual exclusion.
- Single-predicate scoring.
- Multiple-predicate scoring.

19. WikiBANKS
- The extended graph model combines the graph model of BANKS with a document model.
- Each Wikipedia page/document is represented by a node in the graph.
- Near query model: find C near (K)
- Query evaluation algorithm: evaluate the selection predicates individually as near queries, then use the resulting entity lists to evaluate the relation predicates (2 approaches).

20. WikiCSAW
- ERQ over the highly scalable CSAW system.
- Queries in a master-slave configuration.
- Category keyword mapping.
- Optimizing ERQ over CSAW: entity-type and keyword pair postings to improve the merge step; compound token-AND iterator.
- Scoring based on entity, relation and node prestige, with weights.

21. Conclusion and Future Work
- Challenges faced by the different approaches.
- Adding artificial words to link other pages by enterprise (manually or by defining rules).
- Integration of data through standards like RDF.
- Domain-centric concept search to handle scalability.
- Ontology-based mapping of user keywords to domains for higher accuracy.
- Need for annotation of relations.
- Complex operations for ad hoc queries.

22. Questions?

23. Thank You