Query Language Constructs for Provenance

Uploaded On 2016-03-08

Presentation Transcript

Slide1

Query Language Constructs for Provenance

Murali Mani, Mohamad Alawa, Arunlal Kalyanasundaram

University of Michigan, Flint

Presented at IDEAS 2011.

Slide2

Provenance Metadata

Data about origins of data

Applications:

Check whether data item is valid – in health records

How much do we trust an inference/observation – scientific computation

Audit trails – manufacturing/shipping/trading

Database community found provenance could be useful in

updating views

maintenance of materialized views

interpretation of query results

querying probabilistic/uncertain data

In short, numerous applications …

Slide3

OPM (Open Provenance Model)

http://openprovenance.org/

Developed by several researchers who have been involved with provenance

Describes a logical representation of provenance information for a wide variety of applications.

Provenance information represented as a directed graph consisting of:

Nodes (can be artifact, process, or agent)

Edges (dependencies), of which there are 5 types:

used: a process used an artifact

wasGeneratedBy: an artifact was generated by a process

wasControlledBy: a process was controlled by an agent

wasTriggeredBy: a process was triggered by another process

wasDerivedFrom: an artifact was derived from another artifact

Nodes and edges have annotations (attribute-value pairs)

Slide4

OPM: A Simple Example

[Figure: OPM graph with process P (type=division) and artifacts A1, A2, A3, A4. Edge labels: used(dividend), used(divisor), wasGeneratedBy(quotient), wasGeneratedBy(remainder)]

A1, A2 are artifacts

P = a process that is performing division (A1/A2) – note the used edges between P and A1, A2

A3, A4 are artifacts generated by P (representing quotient, remainder) – note the wasGeneratedBy edges between P and A3, A4
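The division example can be written down as a small in-memory graph. The following is a minimal sketch, not the OPM reference implementation: the class `ProvenanceGraph`, its methods, and the `role` annotation key are hypothetical names chosen for illustration, and edges are assumed to point from effect to cause (so a used edge runs from the process to the artifact). The assignment of dividend to A1 and divisor to A2 is read off the "A1/A2" wording above.

```python
from dataclasses import dataclass, field

# Edge type -> (kind of source node, kind of target node), following the
# five dependency types listed on the OPM slide. Assumption: an edge
# points from the effect to its cause.
EDGE_ENDPOINTS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}
NODE_KINDS = {"artifact", "process", "agent"}


@dataclass
class ProvenanceGraph:
    """Hypothetical container: nodes and edges carry annotation dicts."""
    nodes: dict = field(default_factory=dict)  # node id -> (kind, annotations)
    edges: list = field(default_factory=list)  # (type, source, target, annotations)

    def add_node(self, node_id, kind, **annotations):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = (kind, annotations)

    def add_edge(self, edge_type, source, target, **annotations):
        if edge_type not in EDGE_ENDPOINTS:
            raise ValueError(f"unknown edge type: {edge_type}")
        actual = (self.nodes[source][0], self.nodes[target][0])
        if actual != EDGE_ENDPOINTS[edge_type]:
            raise ValueError(f"{edge_type} cannot connect {actual}")
        self.edges.append((edge_type, source, target, annotations))


# The division example: P divides A1 by A2, producing A3 and A4.
g = ProvenanceGraph()
g.add_node("P", "process", type="division")
for artifact in ("A1", "A2", "A3", "A4"):
    g.add_node(artifact, "artifact")
g.add_edge("used", "P", "A1", role="dividend")
g.add_edge("used", "P", "A2", role="divisor")
g.add_edge("wasGeneratedBy", "A3", "P", role="quotient")
g.add_edge("wasGeneratedBy", "A4", "P", role="remainder")
```

Checking endpoint kinds in `add_edge` is a design choice of this sketch; it rejects, for example, a used edge whose source is an artifact.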

Example taken from http://openprovenance.org/tutorial/

Slide5

Queries for OPM

We can write complex “multi-step inference” queries using Datalog/SQL based on the different edges in OPM

Example: find artifacts directly or indirectly derived from another artifact (recursive query using wasDerivedFrom edges)
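As an illustration of such a recursive query, here is a minimal sketch using a recursive common table expression in SQLite, driven from Python's standard `sqlite3` module. The table layout (`wasDerivedFrom(derived, source)`), the sample rows, and the helper `derived_artifacts` are assumptions made for this example; the slides do not prescribe a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per wasDerivedFrom edge.
conn.execute("CREATE TABLE wasDerivedFrom (derived TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO wasDerivedFrom VALUES (?, ?)",
    [("B", "A"), ("C", "B"), ("D", "C"), ("E", "X")],
)

# Base case: artifacts derived directly from :start.
# Recursive case: artifacts derived from anything already found.
# UNION (rather than UNION ALL) drops duplicates, so cycles terminate.
QUERY = """
WITH RECURSIVE derived_from(artifact) AS (
    SELECT derived FROM wasDerivedFrom WHERE source = :start
    UNION
    SELECT w.derived
    FROM wasDerivedFrom AS w
    JOIN derived_from AS d ON w.source = d.artifact
)
SELECT artifact FROM derived_from ORDER BY artifact
"""


def derived_artifacts(connection, start):
    """Return artifacts directly or indirectly derived from `start`."""
    return [row[0] for row in connection.execute(QUERY, {"start": start})]


print(derived_artifacts(conn, "A"))  # ['B', 'C', 'D']
```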

However, is it sufficient? We may need to express

Sub-graph isomorphism (given a graph query pattern, check whether the pattern appears in a provenance graph)

Studied in graph query languages ([Graph-QL], [OPQL], …)

Shortest path queries (using some notion of distance)

Typically not studied in graph query languages

Slide6

Our approach

Two sets of constructs

Constructs for Querying Content

Select nodes, edges based on annotations (attribute values) associated with them

Operators include typical relational algebra operators: select, project, union,

Constructs for Querying Structure

6 basic functions

from (e)/to (e): node where edge e starts/ends

from⁻¹ (n)/to⁻¹ (n): edges that start at node n/end at node n

next (n): nodes to which there is an edge from n

prev (n): nodes from which there is an edge to n

Generalized selection operator on a graph G, specified by a node condition and an edge condition

The node condition specifies what nodes in G must appear in the result

The edge condition specifies what edges in G must appear in the result

Result: a graph that is a sub-graph of G (i.e., its nodes are a subset of the nodes of G, and its edges are a subset of the edges of G)
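The six basic functions and the generalized selection operator can be sketched in a few lines of Python. This is an illustrative reading of the slide, not the authors' implementation: the `Graph` type, the edge encoding as `(source, label, target)` tuples, and the names `select`, `node_cond`, and `edge_cond` are all assumptions, since the operator's original notation did not survive in this transcript.

```python
from typing import NamedTuple


class Graph(NamedTuple):
    nodes: frozenset  # node ids
    edges: frozenset  # (source, label, target) tuples


# The six basic structural functions.
def from_(e):            # node where edge e starts
    return e[0]

def to(e):               # node where edge e ends
    return e[2]

def from_inv(g, n):      # edges that start at node n
    return {e for e in g.edges if from_(e) == n}

def to_inv(g, n):        # edges that end at node n
    return {e for e in g.edges if to(e) == n}

def next_(g, n):         # nodes reachable from n over one edge
    return {to(e) for e in from_inv(g, n)}

def prev(g, n):          # nodes with an edge into n
    return {from_(e) for e in to_inv(g, n)}


def select(g, node_cond, edge_cond):
    """Generalized selection (assumed form): each condition maps the graph
    to the set of nodes / edges to keep. Intersecting with g guarantees the
    result is a sub-graph of g; keeping endpoints consistent with the
    selected edges is left to the caller."""
    return Graph(frozenset(node_cond(g)) & g.nodes,
                 frozenset(edge_cond(g)) & g.edges)


# The division example, with edges pointing from effect to cause.
g = Graph(
    nodes=frozenset({"P", "A1", "A2", "A3", "A4"}),
    edges=frozenset({
        ("P", "used", "A1"), ("P", "used", "A2"),
        ("A3", "wasGeneratedBy", "P"), ("A4", "wasGeneratedBy", "P"),
    }),
)

# Sub-graph of everything P used: P, its successors, and its outgoing edges.
inputs_of_p = select(
    g,
    node_cond=lambda gr: {"P"} | next_(gr, "P"),
    edge_cond=lambda gr: from_inv(gr, "P"),
)
```

Passing the conditions as functions of the graph lets one condition be defined in terms of the other, which is how the slide's own examples are phrased.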

 Slide7

Examples of Generalized Selection Operator

Descendant graph given a set of nodes S

Node condition = set of nodes n such that there is a path from some s ∈ S to n

Edge condition = set of edges between the nodes selected by the node condition

Shortest path graph between s and t

Edge condition = set of edges on the shortest path between s and t

Node condition = set of nodes adjacent to an edge selected by the edge condition

Note: The constructs for querying content and for querying structure can be integrated to yield a powerful query model that can express a wide range of queries.
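The two examples, and the way a content condition can feed a structural query, can be sketched as follows. This is a minimal illustration under stated assumptions: the sample graph, the `kind` annotation, and the function names are hypothetical; distance is taken to be the number of edges (the slides only say "some notion of distance"); paths follow edge direction; and the start nodes themselves are included among the descendants.

```python
from collections import deque

# Hypothetical sample data: node id -> annotations, and directed edges.
NODES = {
    "a": {"kind": "artifact"}, "b": {"kind": "process"},
    "c": {"kind": "artifact"}, "d": {"kind": "process"},
    "e": {"kind": "artifact"},
}
EDGES = {("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("d", "e")}


def successors(edges, n):
    return sorted(t for s, t in edges if s == n)


def descendant_graph(edges, starts):
    """Nodes reachable from any start node, plus the edges between them."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        n = queue.popleft()
        for m in successors(edges, n):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen, {(s, t) for s, t in edges if s in seen and t in seen}


def shortest_path_graph(edges, s, t):
    """Edges on a fewest-hops path from s to t (breadth-first search),
    plus the nodes adjacent to those edges. Empty if t is unreachable."""
    parent = {s: None}
    queue = deque([s])
    while queue:
        n = queue.popleft()
        if n == t:
            break
        for m in successors(edges, n):
            if m not in parent:
                parent[m] = n
                queue.append(m)
    if t not in parent:
        return set(), set()
    path_edges, n = set(), t
    while parent[n] is not None:      # walk back from t to s
        path_edges.add((parent[n], n))
        n = parent[n]
    return {x for edge in path_edges for x in edge}, path_edges


# Content + structure: select processes by annotation, then take
# the descendant graph of that selection.
processes = {n for n, ann in NODES.items() if ann.get("kind") == "process"}
desc_nodes, desc_edges = descendant_graph(EDGES, processes)
path_nodes, path_edges = shortest_path_graph(EDGES, "a", "e")
```

With weighted edges, the breadth-first search would need to be replaced by a weighted shortest-path algorithm such as Dijkstra's.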

 Slide8

Conclusions and Future Work

Observation: Provenance query language should not be restricted to Datalog/SQL.

Developed a query model that provides constructs for querying structure and for querying content.

Using our query model, we can express a wide range of queries including shortest path (not expressible using SQL/Datalog).

Slide9

References

[Graph-QL]: He, H., and Singh, A. K. 2008. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. ACM SIGMOD (2008).

[OPQL]: Lim, C., Lu, S., Chebotko, A., and Fotouhi, F. 2011. OPQL: A First OPM-Level Query Language for Scientific Workflow Provenance. IEEE SCC (2011).

[OPM]: The OPM Provenance Model (OPM), available at http://openprovenance.org/