Slide1
Query Language Constructs for Provenance
Murali Mani, Mohamad Alawa, Arunlal Kalyanasundaram
University of Michigan, Flint
Presented at IDEAS 2011.
Slide2
Provenance Metadata
Data about origins of data
Applications:
Check whether a data item is valid – in health records
How much do we trust an inference/observation – in scientific computation
Audit trails – in manufacturing/shipping/trading
The database community has found that provenance can be useful in
updating views
maintenance of materialized views
interpretation of query results
querying probabilistic/uncertain data
In short, numerous applications …
Slide3
OPM (Open Provenance Model)
http://openprovenance.org/
Developed by several researchers who have been involved with provenance
Describes a logical representation of provenance information for a wide variety of applications.
Provenance information represented as a directed graph consisting of:
Nodes (can be artifact, process, or agent)
Edges or dependencies. There are 5 types of edges
used: a process used an artifact
wasGeneratedBy: an artifact was generated by a process
wasControlledBy: a process was controlled by an agent
wasTriggeredBy: a process was triggered by another process
wasDerivedFrom: an artifact was derived from another artifact
Nodes and edges have annotations (attribute-value pairs)
Slide4
OPM: A Simple Example
[Figure: OPM graph with process P (type=division), artifacts A1, A2, A3, A4, and edges labeled used(divisor), used(dividend), wasGeneratedBy(remainder), wasGeneratedBy(quotient)]
A1, A2 are artifacts
P = a process that is performing division (A1/A2) – note the used edges between P and A1, A2
A3, A4 are artifacts generated by P (representing quotient, remainder) – note the wasGeneratedBy edges between P and A3, A4
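The division example can be written down as a small typed graph. The following is a minimal sketch, not the OPM reference implementation: the class names `Node` and `Edge`, the `role` annotation key, and the helper `division_example` are hypothetical. It assumes that OPM edges point from effect to cause (process to the artifact it used, artifact to the process that generated it) and that A1/A2 are dividend/divisor and A3/A4 are quotient/remainder, following the order in the text.

```python
from dataclasses import dataclass, field

# Allowed (source kind, target kind) for each of the five OPM edge types.
EDGE_TYPES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass
class Node:
    name: str
    kind: str  # "artifact", "process", or "agent"
    annotations: dict = field(default_factory=dict)


@dataclass
class Edge:
    kind: str
    source: Node
    target: Node
    annotations: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject edges whose endpoints do not match the edge type.
        expected = EDGE_TYPES[self.kind]
        actual = (self.source.kind, self.target.kind)
        if actual != expected:
            raise ValueError(f"{self.kind} must connect {expected}, got {actual}")


def division_example():
    """Build the division graph: P uses A1 and A2, and generates A3 and A4."""
    a1, a2, a3, a4 = (Node(n, "artifact") for n in ("A1", "A2", "A3", "A4"))
    p = Node("P", "process", {"type": "division"})
    edges = [
        Edge("used", p, a1, {"role": "dividend"}),   # role key is an assumption
        Edge("used", p, a2, {"role": "divisor"}),
        Edge("wasGeneratedBy", a3, p, {"role": "quotient"}),
        Edge("wasGeneratedBy", a4, p, {"role": "remainder"}),
    ]
    return [a1, a2, a3, a4, p], edges
```

Checking endpoint kinds at construction time keeps an ill-typed edge, such as a used edge between two artifacts, out of the graph.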
Example taken from http://openprovenance.org/tutorial/
Slide5
Queries for OPM
We can write complex “multi-step inference” queries using Datalog/SQL based on the different edges in OPM
Example: find artifacts directly or indirectly derived from another artifact (recursive query using wasDerivedFrom edges)
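The recursive wasDerivedFrom query mentioned above can be sketched as a fixpoint computation. This is an illustration only: it emulates the usual two-rule Datalog program in Python, and the function name `derived_from_closure` and the artifact names are hypothetical.

```python
def derived_from_closure(was_derived_from):
    """Return all (x, y) pairs where x is directly or indirectly derived from y.

    Datalog equivalent:
        derived(X, Y) :- wasDerivedFrom(X, Y).
        derived(X, Z) :- wasDerivedFrom(X, Y), derived(Y, Z).
    """
    derived = set(was_derived_from)
    while True:
        # Join one direct edge with the pairs found so far.
        new_pairs = {
            (x, z)
            for (x, y) in was_derived_from
            for (y2, z) in derived
            if y == y2
        } - derived
        if not new_pairs:
            return derived  # fixpoint reached
        derived |= new_pairs


# Hypothetical chain: report <- aggregated <- cleaned <- raw
edges = {("report", "aggregated"), ("aggregated", "cleaned"), ("cleaned", "raw")}
closure = derived_from_closure(edges)
print(sorted(x for (x, y) in closure if y == "raw"))
# → ['aggregated', 'cleaned', 'report']
```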
However, is it sufficient? We may need to express
Sub-graph isomorphism (given a graph query pattern, check whether the pattern appears in a provenance graph)
Studied in graph query languages ([Graph-QL], [OPQL], …)
Shortest path queries (using some notion of distance)
Typically not studied in graph query languages
Slide6
Our approach
Two sets of constructs
Constructs for Querying Content
Select nodes, edges based on annotations (attribute values) associated with them
Operators include typical relational algebra operators: select, project, union, …
Constructs for Querying Structure
6 basic functions
from (e)/to (e): node from where e starts/e ends
from⁻¹ (n)/to⁻¹ (n): edges that start at node n/end at node n
next (n): nodes to which there is an edge from n
prev (n): nodes from which there is an edge to n
Generalized selection operator, specified by a node condition and an edge condition over a graph G
The node condition specifies what nodes in G must appear in the result
The edge condition specifies what edges in G must appear in the result
Result: a sub-graph of G (i.e., its nodes are a subset of the nodes of G, and its edges are a subset of the edges of G)
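The six structural functions and the generalized selection operator can be sketched over a simple in-memory graph. This is one possible reading under stated assumptions, not the authors' implementation: the class `Graph`, the method names `from_`, `from_inv`, `to_inv`, and the function `select` are hypothetical, and the two conditions are modeled as functions that return the node set and edge set to keep.

```python
class Graph:
    """Directed graph: nodes is a set of ids, edges maps edge id -> (source, target)."""

    def __init__(self, nodes, edges):
        self.nodes = set(nodes)
        self.edges = dict(edges)

    def from_(self, e):      # from(e): node where e starts
        return self.edges[e][0]

    def to(self, e):         # to(e): node where e ends
        return self.edges[e][1]

    def from_inv(self, n):   # from^-1(n): edges that start at n
        return {e for e, (s, _) in self.edges.items() if s == n}

    def to_inv(self, n):     # to^-1(n): edges that end at n
        return {e for e, (_, t) in self.edges.items() if t == n}

    def next(self, n):       # next(n): nodes reached by an edge from n
        return {self.to(e) for e in self.from_inv(n)}

    def prev(self, n):       # prev(n): nodes with an edge to n
        return {self.from_(e) for e in self.to_inv(n)}


def select(g, node_cond, edge_cond):
    """Generalized selection: keep the nodes and edges chosen by the two conditions.

    Intersecting with g guarantees the result's nodes and edges are subsets of g's.
    """
    nodes = node_cond(g) & g.nodes
    edges = {e: g.edges[e] for e in edge_cond(g) if e in g.edges}
    return Graph(nodes, edges)


# The division graph: P uses A1, A2; A3, A4 were generated by P.
g = Graph(
    {"P", "A1", "A2", "A3", "A4"},
    {"u1": ("P", "A1"), "u2": ("P", "A2"), "g1": ("A3", "P"), "g2": ("A4", "P")},
)
# Select P, the nodes P points to, and the edges that start at P.
inputs = select(g, lambda g: {"P"} | g.next("P"), lambda g: g.from_inv("P"))
```

Passing the conditions as functions of the graph lets either condition be defined in terms of the other, which the descendant and shortest-path examples rely on.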
Slide7
Examples of Generalized Selection Operator
Descendant graph given a set of nodes S
Node condition = set of nodes n such that there is a path from some s ∈ S to n
Edge condition = set of edges between the nodes selected by the node condition
Shortest path graph between s and t
Edge condition = set of edges on the shortest path between s and t
Node condition = set of nodes adjacent to an edge selected by the edge condition
Note: The constructs for querying content and for querying structure can be integrated to yield a powerful query model that can express a wide range of queries.
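Both examples of the generalized selection operator can be sketched directly. The code below is a hedged illustration, not the authors' algorithm: the function names are hypothetical, distance is assumed to be the number of edges (the slides leave the notion of distance open), the start nodes are assumed to count as their own descendants, and only one shortest path is returned when several exist.

```python
from collections import deque


def descendant_graph(edges, sources):
    """Nodes reachable from any node in sources, plus the edges between those nodes.

    edges maps edge id -> (source node, target node).
    """
    nodes = set(sources)
    frontier = deque(sources)
    while frontier:
        n = frontier.popleft()
        for s, t in edges.values():
            if s == n and t not in nodes:
                nodes.add(t)
                frontier.append(t)
    kept = {e: (s, t) for e, (s, t) in edges.items() if s in nodes and t in nodes}
    return nodes, kept


def shortest_path_graph(edges, s, t):
    """Edges on one fewest-edges path from s to t, plus the nodes they touch."""
    parent = {s: None}  # node -> id of the edge used to reach it
    frontier = deque([s])
    while frontier:
        n = frontier.popleft()
        if n == t:
            break
        for e, (src, dst) in edges.items():
            if src == n and dst not in parent:
                parent[dst] = e
                frontier.append(dst)
    if t not in parent:
        return set(), {}  # t is unreachable from s
    kept = {}
    node = t
    while parent[node] is not None:  # walk back from t to s
        e = parent[node]
        kept[e] = edges[e]
        node = edges[e][0]
    nodes = {n for endpoints in kept.values() for n in endpoints}
    return nodes, kept


# Hypothetical graph used only for illustration.
edges = {
    "e1": ("a", "b"), "e2": ("b", "c"), "e3": ("a", "c"),
    "e4": ("c", "d"), "e5": ("x", "a"),
}
print(descendant_graph(edges, {"b"}))
print(shortest_path_graph(edges, "a", "d"))
```

In the first function the edge set is derived from the node set, and in the second the node set is derived from the edge set, mirroring the two definitions above.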
Slide8
Conclusions and Future Work
Observation: A provenance query language should not be restricted to Datalog/SQL.
Developed a query model that provides constructs for querying structure and for querying content.
Using our query model, we can express a wide range of queries including shortest path (not expressible using SQL/Datalog).
Slide9
References
[Graph-QL]: He, H., and Singh, A. K. 2008. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. ACM SIGMOD (2008).
[OPQL]: Lim, C., Lu, S., Chebotko, A., and Fotouhi, F. 2011. OPQL: A First OPM-Level Query Language for Scientific Workflow Provenance. IEEE SCC (2011).
[OPM]: The Open Provenance Model (OPM), available at http://openprovenance.org/