Slide1
Deep Representation of Biological Knowledge for Question Answering
Vinay K. Chaudhri
Slide2
Outline
Introduction
KB_Bio_101: Biology textbook knowledge base
Representing structure and function
Representation and reasoning needs
Upper ontology
Representing structure
Representing function
Representing structure-function relationship
Answering questions
Some open research problems
Summary
Slide3
Deep Representation
Classes, relations, disjointness, multiple inheritance, domain/range constraints
Sufficient properties
Every instance of a class is related to a set of individuals, which are themselves related in arbitrary ways
Frame-based Systems
Representation language: Open Knowledge Base Connectivity (OKBC)
Reasoning operations: get-class-subclasses, get-slot-values, get-facet-values, get-instance-types
Bio-medical Ontologies (e.g., Gene Ontology)
Representation language: Web Ontology Language (OWL)
Reasoning operations: classification, consistency checking
KB Bio 101
Representation language: existential rules (OOKB)
Reasoning operations: query answering, non-standard unification
Slide4
Deep Representation
Bio-medical Ontologies (e.g., Gene Ontology)
Ontology language: very few relations
is_a, has_part, located_in, has_participant
KB Bio 101
Ontology language: small number of domain-general relations
has_part, possesses, has_region, material, element
agent, object, instrument, base, raw-material, result
Frame-based Systems
Ontology language: small number of domain-specific relations
Process_direction, Cell_line_location, Associated_downstream_activities
T-edge, A-edge, P-edge
Slide5
Question Answering
Watson
Primarily wh-questions: what, when, where, etc.
Short answers: 94.5% of correct answers were the titles of some Wikipedia page
No dialog: there was a single-shot interaction with the system
KB Bio 101
Educationally useful questions: describe, compare, relate
Long answers: answers are synthesized and are like an essay
Drill down: unlimited drill-down on the returned answer is available
Slide6
KB_Bio_101
Campbell Biology is a textbook used in advanced placement biology courses in schools and in introductory biology courses in colleges
A team of biologists used AURA to curate the KB from the textbook, using a sophisticated knowledge authoring process
The KB is a valuable asset: it was created by an estimated 12 person-years of encoding effort by biologists, and an estimated 5 person-years of work on the upper ontology (CLib)
Vulcan has released this asset free of charge for research purposes
http://www.ai.sri.com/halo/halobook2010/exported-kb/biokb.html
KB Bio 101 is used in an electronic textbook application
Slide7
Slide8
KB_Bio_101 Statistics
# Classes: 6430
# Relations: 455
# Constants: 634
Avg. # Skolems / Class: 24
Avg. # Atoms / Necessary Condition: 64
Avg. # Atoms / Sufficient Condition: 4
Regarding Class Axioms:
# Constant Typings: 714
# Taxonomical Axioms: 6993
# Disjointness Axioms: 18616
# Equality Assertions: 108755
# Qualified Number Restrictions: 936
Regarding Relation Axioms:
# DRAs: 449
# RRAs: 447
# RHAs: 13
# QRHAs: 39
# IRAs: 212
# 12NAs / # N21As: 10 / 132
# TRANS + # GTRANS: 431
Regarding Other Aspects:
# Cyclical Classes: 1008
# Cycles: 8604
Avg. Cycle Length: 41
# Skolem Functions: 73815
Slide9
Core Themes in Biology
Core theme: Challenge
Structure and Function: Relating structure and function
Regulation: Qualitative reasoning about dynamic processes
Energy Transfer: Representing energy production, consumption
Continuity and Change: Representing genetic change across generations
Evolution: Models of population dynamics
Science as a Process: Experimentation and hypothesis testing
Interdependence in Nature: Represent large inter-related complex systems
Science, Technology, Society: Represent technological and social forces
Slide10
Core Theme: Structure & Function
Structure and function are correlated at all levels of biological organization: the form fits the function
Figures from Biology (9th Edition) by Neil A. Campbell and Jane B. Reece.
Copyright © 2011 by Pearson Education, Inc. Used by permission of Pearson Education, Inc.
Slide11
Computational Meaning of a Core Theme
Identify the requirements in terms of a set of questions
Diagnostic questions
- Help assess the basics of KR&R
Educationally useful questions
- The question must be of interest to teachers and students
- The question must be "Google hard"
- The question should not require solving an open-ended research problem
Slide12
Diagnostic S&F Questions
What is the structure of X?
What is the function of X?
Slide13
Educationally Useful Questions
Relate Structures to Functions
- What structure of a biomembrane facilitates a function of the biomembrane, namely phagocytosis?
Qualitative Comparisons
- If the Loop of Henle gets longer, how will its function be impacted?
Detailed Comparisons
- What is the functional similarity between prions and viroids?
Similarity Reasoning
- Glucose is to Glycogen as ATP is to what?
Negatively Modified Structures Impacting Functions
- If hydrogen is removed from a saturated fatty acid, then how is its function impacted?
Slide14
Starting Point - Component Library
A simple upper ontology designed to be accessible to domain experts (Barker et al., KCAP 2001)
Other key distinctions:
Roles
Properties
See http://www.ai.sri.com/halo/public/clib/20130328/clib-tree.html for more information
Slide15
Component Library
A vocabulary of relations to describe events
Event to Entity: agent, object, instrument, raw-material, result, site, origin
Event to Event: first-subevent, next-event, causes, enables, prevents, inhibits, by-means-of
Event to Value: direction, distance, duration, frequency, intensity, rate
Slide16
Representing Structure
Structure of an entity represents its parts, their spatial arrangements and sizes
Meronymic: has-part, has-region, material, possesses, element
Spatial: is-at, is-inside, is-outside, abuts, is-between, is-along
Properties: length, diameter, height, area, depth, volume
Slide17
Choosing Structural Slots
Inspired by work of Maria Keet, but simplified for use by biologists:
It must make sense to say "X has Y" in English
X has-region Y if
- Y is a region of space defined in relation to X
- It does not make sense to associate Y with properties such as mass or density, but Y can be associated with measures such as length, area, or volume
X has material Y only if Y is tangible and pervasive in X
X has element Y if
- X is a set of entities of the same type (or sibling types) that Y is an instance of
X possesses Y only if
- Y is Energy, bond or gradient
Otherwise X has part Y
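The slot-selection guidelines on this slide can be read as a small decision procedure. The sketch below is only an illustration of that reading: the `Candidate` fields, the function name, and the order in which the tests are applied are assumptions made here and are not part of AURA or the Component Library.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical description of Y in the statement 'X has Y'."""
    is_spatial_region: bool = False    # Y is a region of space defined relative to X
    has_mass_or_density: bool = False  # mass or density can sensibly be ascribed to Y
    is_tangible: bool = False
    is_pervasive: bool = False         # Y is spread throughout X
    x_is_set_of_y_type: bool = False   # X is a set whose members share Y's type
    kind: str = ""                     # e.g. "energy", "bond", "gradient"


def choose_structural_slot(y: Candidate) -> str:
    """Pick a structural slot for 'X has Y' (test order is an assumption)."""
    if y.is_spatial_region and not y.has_mass_or_density:
        return "has-region"
    if y.is_tangible and y.is_pervasive:
        return "material"
    if y.x_is_set_of_y_type:
        return "element"
    if y.kind in {"energy", "bond", "gradient"}:
        return "possesses"
    return "has-part"  # the default when nothing more specific applies


# A region with length and shape but no mass is modeled with has-region.
skeleton_slot = choose_structural_slot(Candidate(is_spatial_region=True))
# A discrete, tangible component falls through to has-part.
ribosome_slot = choose_structural_slot(Candidate(is_tangible=True))
```

Under these assumptions `skeleton_slot` is `"has-region"` and `ribosome_slot` is `"has-part"`; a biologist would still need to judge each flag by hand.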
Slide18
Example Structure Representation
Slide19
A difficult example: Carbon Skeleton
What should be the relationship between an organic molecule and a skeleton?
It is more than simply a set of entities
Can have length and shape
Is not an entity in its own right
- Biologists do not associate mass with it
The remaining choice is has-region
- Behaves differently than a human skeleton
Slide20
Representing Functions
Is function a primitive or a computed notion?
Could function be inferred from participant relations, thus reducing the encoding time?
Slide21
Representing Functions
Is function a primitive or a computed notion?
It is a primitive notion and should be encoded by a biologist
has-function
Slide22
Representing Functions
What is a function?
- We understand functions as "special" events in which an entity participates
- Alternatively, a function is an event which is a reason for an entity's existence
- The "special" nature of functions will be indicated by using a new slot called has-function
Types of functions
- Inherent functions of an entity: these will appear on the entity's concept graph
- Contextual functions of an entity: these will appear on another entity or event's concept graph
Slide23
Example of an Inherent Function
An inherent function of a Golgi Apparatus is to store chemicals
This is true regardless of which specific type of cell it is a part of
Inherent functions are placed on the Entity graph, using the has-function slot
Slide24
Example of Function in an Environment
Not every smooth ER detoxifies drugs
However, drug detoxification is the function of a smooth ER in a liver cell
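The distinction between inherent and contextual functions can be pictured with concept graphs stored as sets of triples. This is a minimal sketch under assumed names: the dictionary layout, the helper functions, and identifiers such as `Store-Chemicals` and `Detoxify-Drugs` are illustrative and are not taken from KB_Bio_101.

```python
# Each concept owns a graph: a set of (subject, slot, value) triples.
# Assumption: a has-function triple on the entity's own graph is inherent,
# and one placed on another concept's graph is contextual.
concept_graphs = {
    "Golgi-Apparatus": {("Golgi-Apparatus", "has-function", "Store-Chemicals")},
    "Smooth-ER": set(),
    "Liver-Cell": {
        ("Liver-Cell", "has-part", "Smooth-ER"),
        ("Smooth-ER", "has-function", "Detoxify-Drugs"),
    },
}


def inherent_functions(entity):
    """Functions stated on the entity's own concept graph."""
    return {
        value
        for subject, slot, value in concept_graphs.get(entity, set())
        if subject == entity and slot == "has-function"
    }


def contextual_functions(entity):
    """Functions stated for the entity on other concepts' graphs, keyed by context."""
    by_context = {}
    for owner, triples in concept_graphs.items():
        if owner == entity:
            continue
        functions = {
            value
            for subject, slot, value in triples
            if subject == entity and slot == "has-function"
        }
        if functions:
            by_context[owner] = functions
    return by_context
```

With this layout, asking for the functions of a smooth ER in general returns nothing, while asking in the context of a liver cell returns drug detoxification, which mirrors the two slides above.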
Slide25
Answering Questions
Create an ABOX
- Instantiate every concept in the knowledge base and compute the individuals it is related to up to depth three
Conjunctive query answering
- Reduce questions to conjunctive queries on an ABOX
Path finding
- Find all possible paths between two individuals
Comparisons
- Compute intersection and difference between two sets of triples
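The three reasoning steps on this slide can be sketched over a toy ABox of triples. Everything below is an assumption made for illustration: the individuals, the `?x` variable syntax, the function names, and the `max_len` bound are hypothetical, and the actual system's query engine is not described in the slides.

```python
# Toy ABox of ground triples (subject, relation, object); all names are illustrative.
ABOX = {
    ("cell1", "instance-of", "Cell"),
    ("cell1", "has-part", "ribosome1"),
    ("cell1", "has-part", "nucleus1"),
    ("nucleus1", "has-part", "chromosome1"),
    ("chromosome1", "is-inside", "nucleus1"),
}


def match(pattern, fact, binding):
    """Extend binding so that pattern equals fact, or return None on a clash."""
    new = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):  # variable: bind it or check the earlier binding
            if new.setdefault(term, value) != value:
                return None
        elif term != value:       # constant: must match exactly
            return None
    return new


def answer(query, abox, binding=None):
    """Yield each variable binding that satisfies every triple pattern in query."""
    binding = {} if binding is None else binding
    if not query:
        yield binding
        return
    for fact in abox:
        extended = match(query[0], fact, binding)
        if extended is not None:
            yield from answer(query[1:], abox, extended)


def find_paths(abox, start, goal, max_len=3):
    """Return all simple directed paths (lists of triples) from start to goal."""
    found = []

    def walk(node, trail, seen):
        if node == goal and trail:
            found.append(trail)
            return
        if len(trail) == max_len:
            return
        for s, r, o in sorted(abox):
            if s == node and o not in seen:
                walk(o, trail + [(s, r, o)], seen | {o})

    walk(start, [], {start})
    return found


def compare(triples_a, triples_b):
    """Split two sets of triples into shared and distinguishing parts."""
    return {
        "shared": triples_a & triples_b,
        "only_a": triples_a - triples_b,
        "only_b": triples_b - triples_a,
    }


query = [("?c", "instance-of", "Cell"), ("?c", "has-part", "?p"), ("?p", "has-part", "?q")]
bindings = list(answer(query, ABOX))
paths = find_paths(ABOX, "cell1", "chromosome1")
diff = compare(
    {("x", "has-part", "Ribosome"), ("x", "has-part", "Nucleus")},
    {("x", "has-part", "Ribosome")},
)
```

The query asks for a cell with a part that itself has a part; on this ABox it has a single answer. The comparison treats each description as a set of triples about a placeholder individual `x`, so the intersection gives similarities and the set differences give distinguishing features.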
Slide26
Slide27
Slide28
Slide29
Slide30
Generating good sentences is a research problem in natural language generation
See http://kbgen.org
Slide31
Relate Structure to Function
What structures of a plasma membrane facilitate a function of the plasma membrane, namely active movement of ions?
Slide32
Path-Based Similarity Reasoning
Model Relation
Path Similar Relation
Slide33
Prior Work
Structure, Behavior & Function (Chandrasekaran, 2000)
Basic Formal Ontology (Arp & Smith, 2008)
General Formal Ontology (Herre et al., 2006)
DOLCE (Borgo et al., 2010)
Slide34
Open Research Problems
Representing core themes
Generality outside the current textbook
Slide35
Representing Structure and Function
What are some longer-term research problems?
Defining spatial slots for the whole book
- Specifically, boundaries, regions and cavities
- Preliminary work done by Bennett et al. published at the 2013 Conference on Spatial Information Theory
- Available at: http://www.ai.sri.com/pub_list/1959
Specifying the structure at multiple levels of detail and from multiple perspectives
Slide36
Representing rest of the textbook
Frequently, the challenge lies in taking a piece of biological information and reducing it to a known modeling approach
Deep KR Challenge Workshop
https://sites.google.com/site/dkrckcap2011/
Number of Sentences from chapters 2-12
Slide37
Generalization to multiple textbooks
Textbook
Middle school biology
Comparable to Campbell biology
Cell biology
Neuroscience
Introductory college physics
Introductory college algebra
Introductory college US history
Introductory college psychology
Slide38
Generalization to multiple textbooks
Textbook
General aspects:
- Conceptual and qualitative knowledge cuts across domains
- Some domains are more mathematical than others and require mathematical/symbolic problem solving
- Challenges in representing Campbell also exist in other disciplines: models, hypotheses, experiments
Unique aspects:
- Each domain requires domain-specific vocabulary design
- Each domain has some new question formulation challenges
- Each domain has some new unique representation needs
Slide39
Summary
Deep representation of biological knowledge can enable advanced forms of question answering, such as comparing and relating
We have made substantial advances toward this goal by using a language based on existential rules and an ontology richer than those used in current bio-ontologies
Such capability has been found useful for education, and we expect similar benefits in bio-informatics
Slide40
Thank You!
Slide41
Backup Slides
Slide42
KR in KB_Bio_101
All the favorite features
Classes
Necessary and sufficient conditions
Disjointness
Multiple inheritance
Relations and property values
Domain, range
Inverse relations
Transitivity
Relation hierarchy
Relation composition
Qualified number restrictions
Nominals
Cell ⊑ Living-Entity ⊓ (∃hasPart.Ribosome) ⊓ (∃hasPart.Chromosome)
Every Cell is a Living Entity and has a Ribosome and a Chromosome part
Slide43
KR in OOKB
Unique Features in relation to DLs (and FDNC, Datalog±, ASPfs)
Graph-structured descriptions
Every Eukaryotic Cell is a Cell and
has parts Eukaryotic Chromosome, Nucleus and a Ribosome such that
Eukaryotic Chromosome is inside the Nucleus
The knowledge shown in red is not expressible in known decidable description logics such as OWL 2
This can be captured in Rule Languages
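As an illustration of how a rule language might capture this description, the Eukaryotic Cell statement can be written as a single existential rule. The predicate names below are chosen here for readability and are assumptions, not the identifiers used in KB_Bio_101.

```latex
\forall x \, \Bigl( \mathit{EukaryoticCell}(x) \rightarrow
  \mathit{Cell}(x) \wedge \exists c, n, r \, \bigl(
    \mathit{hasPart}(x, c) \wedge \mathit{EukaryoticChromosome}(c) \wedge
    \mathit{hasPart}(x, n) \wedge \mathit{Nucleus}(n) \wedge
    \mathit{hasPart}(x, r) \wedge \mathit{Ribosome}(r) \wedge
    \mathit{isInside}(c, n)
  \bigr) \Bigr)
```

The final conjunct relates two existentially quantified parts, c and n, to each other. That link between sibling individuals is what makes the description graph-structured rather than tree-shaped.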
Slide44
KR in OOKB
Unique Features in relation to DLs
Inherit and Specialize
In the Eukaryotic Cell
Ribosome was inherited from Cell
Chromosome was inherited from Cell and specialized to Eukaryotic Chromosome
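One way to picture inherit-and-specialize is a lookup that collects a class's parts from its superclass and drops any inherited part for which a more specific local part exists. This is only a sketch under assumed data structures: the `SUPERCLASS` and `LOCAL_PARTS` tables and the replacement-by-subclass rule are simplifications introduced here and are not how AURA or OOKB implement inheritance.

```python
# Hypothetical taxonomy (child -> parent) and locally asserted parts per class.
SUPERCLASS = {
    "Eukaryotic-Cell": "Cell",
    "Eukaryotic-Chromosome": "Chromosome",
}
LOCAL_PARTS = {
    "Cell": ["Ribosome", "Chromosome"],
    "Eukaryotic-Cell": ["Eukaryotic-Chromosome", "Nucleus"],
}


def is_subclass(cls, ancestor):
    """True if cls equals ancestor or lies below it in the taxonomy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def parts(cls):
    """Local parts plus inherited parts that no local part specializes."""
    local = list(LOCAL_PARTS.get(cls, []))
    parent = SUPERCLASS.get(cls)
    if parent is None:
        return local
    inherited = [
        part
        for part in parts(parent)
        if not any(is_subclass(own, part) for own in local)
    ]
    return inherited + local


eukaryotic_parts = parts("Eukaryotic-Cell")
```

Here `Ribosome` is inherited unchanged from `Cell`, while the inherited `Chromosome` is replaced by the local `Eukaryotic-Chromosome`, matching the example on the slide.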
Slide47
KR in KB_Bio_101
Computational properties
Reasoning with KR in KB_Bio_101 is, in general, undecidable
There are, however, some decidable fragments that introduce guardedness and acyclic structure in the KB
- Object Oriented Knowledge Bases in Logic Programming (Chaudhri et al., Technical Communications of the International Conference on Logic Programming, 2013)
- Available at: http://www.ai.sri.com/pub_list/1958
A more thorough formal investigation is an open problem
Challenge for TPTP reasoners: http://www.ai.sri.com/pub_list/1937
Challenge for OWL reasoners: http://www.ai.sri.com/pub_list/1961
Challenge for ASP reasoners
Slide48
Structure Function Relationship
We know how an entity participates in a function
Slide49
Structure Function Relationship
We do not know how an entity participates in a function
has-part
For example, Chlorophyll-A contains Porphyrin. The textbook says that Porphyrin facilitates Chlorophyll-A's function of absorbing violet-blue light, but does not say how.
Slide50
Structure Function Relationship
We do not know how an entity participates in a function
Slide51
System architecture
Encoding
Problem solving
Significant effort devoted to the usability of answers by students
Slide52
Knowledge Authoring in AURA
Knowledge engineers provide a small library of domain-independent representations
The Component Library (CLIB) contains classes representing physical actions, e.g., Move, Attach, Penetrate, and semantic relations, e.g., agent, object, has-part (Barker, Clark, Porter, KCAP'01)
See http://www.ai.sri.com/pub_list/864
Biologists apply those representations to encode biology knowledge
AURA provides graphical editing
See http://www.ai.sri.com/pub_list/1545 and http://www.ai.sri.com/pub_list/865