Slide1
Group 2
Charge: identify and briefly describe four most important computational challenges for data citation; give examples / use cases. (We did not spend time on negative identifications.)
Participants: Altman, Cohen-Boulakia, Davidson, Duerr, Fan, Goble, Groth, Howe, Martone, Tannen
Slide2
1. Modeling the referent of a data citation
Define a formal framework to be used by different fields to give their respective definition of referent
Referents can be very different things: a set of tuples, a bitstream, a landing page; they can be extensional or intensional (see next)
Relevant to all three categories of users: data exporters, data citers, citation consumers
Slide3
2. Handling intensional referents
Extensional referent: a data set that exists as such somewhere.
Intensional: defined by computational means, e.g., a query, a workflow
Use case: when existing extensional referents are too large/complicated
Relevant to all three categories of users: data exporters, data citers, citation consumers
Slide4
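Challenges 1 and 2 (a common referent model, and intensional referents) can be illustrated with a minimal Python sketch. It assumes a referent is ultimately a set of tuples and that a simple row filter can stand in for a query or workflow. The class names, the `resolve` method, and the sample data are hypothetical and are not taken from the group's discussion.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple, Union

Row = Tuple  # one tuple of a data set


@dataclass(frozen=True)
class ExtensionalReferent:
    """A data set that exists as such: the tuples are stored directly."""
    rows: FrozenSet[Row]

    def resolve(self) -> FrozenSet[Row]:
        return self.rows


@dataclass(frozen=True)
class IntensionalReferent:
    """A data set defined by computation over another referent."""
    base: "Referent"
    query: Callable[[Row], bool]  # assumption: a row filter stands in for a query/workflow
    description: str              # human-readable form of the query, for the citation text

    def resolve(self) -> FrozenSet[Row]:
        # The cited data is recomputed on demand rather than stored.
        return frozenset(r for r in self.base.resolve() if self.query(r))


Referent = Union[ExtensionalReferent, IntensionalReferent]

# Hypothetical sample data.
species = ExtensionalReferent(frozenset({("oak", 2001), ("elm", 2005), ("ash", 2010)}))
recent = IntensionalReferent(species, lambda r: r[1] >= 2005, "year >= 2005")
print(sorted(recent.resolve()))
```

One consequence of this design is that an intensional referent only stays stable if its base referent does, which is where versioning and fixity would come in.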
3. Information closure, e.g. attribution stacking
When the referent contains, explicitly or implicitly, citations or links to other referents it depends on
Limit how far (deep?) you go (until you hit ground?)
Use cases: closure in sources, in attribution, and in time (e.g., the history of a referent)
Relevant to data exporters and data citers (make it transparent to citation consumers)
Slide5
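The question in challenge 3 of how deep to follow dependencies can be sketched as a depth-limited traversal of a dependency graph. The sketch below assumes the dependencies are already known and stored as a plain dictionary. The function name, the `max_depth` parameter, and the sample graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def citation_closure(depends_on: Dict[str, List[str]],
                     start: str,
                     max_depth: Optional[int] = None) -> Set[str]:
    """Return every referent reachable from `start` within `max_depth` hops.

    max_depth=None follows links "until you hit ground", i.e. until
    referents with no further dependencies are reached.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand beyond the requested depth
        for dep in depends_on.get(node, []):
            if dep not in seen:  # also guards against cycles
                seen.add(dep)
                queue.append((dep, depth + 1))
    seen.discard(start)
    return seen


# Hypothetical dependency graph: a table built from a merged database,
# which in turn draws on two sources.
deps = {
    "paper_table": ["merged_db"],
    "merged_db": ["source_a", "source_b"],
    "source_a": ["raw_a"],
}
print(sorted(citation_closure(deps, "paper_table", max_depth=2)))
```

Breadth-first order is used so that each referent is reached at its minimum depth, which keeps the depth limit meaningful when a referent is reachable along several paths.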
4. Automatic detection of referent relationship
Can we automatically detect and verify whether the (external) referents of two different citations are related/overlapping?
Related to “fixity”?
Use case: citations to fragments of Facebook
Relevant to citation consumers
Slide6
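For challenge 4, the simplest case is two referents that can both be resolved to sets of tuples. Under that assumption, which will not hold for many real referents, relatedness reduces to set comparison. The function names and relationship labels below are hypothetical.

```python
from typing import FrozenSet, Tuple

Row = Tuple


def relate(a: FrozenSet[Row], b: FrozenSet[Row]) -> str:
    """Classify how two resolved referents relate to each other."""
    if a == b:
        return "identical"
    if a < b:
        return "subset"
    if a > b:
        return "superset"
    if a & b:
        return "overlapping"
    return "disjoint"


def jaccard(a: FrozenSet[Row], b: FrozenSet[Row]) -> float:
    """Share of tuples common to both referents (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical resolved referents of two different citations.
x = frozenset({("u1", "post1"), ("u1", "post2"), ("u2", "post3")})
y = frozenset({("u1", "post2"), ("u2", "post3"), ("u3", "post4")})
print(relate(x, y), jaccard(x, y))
```

This comparison is only verifiable if each referent is fixed at the time of citation, which is one reading of the link to fixity noted in the challenge.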
5. Collecting citations during data processing
During the execution of an ensemble of workflows using multiple data sets
Calculating which data sets used are significant enough for citations
Use case: GBIF
Relevant to data citers
Slide7
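Challenge 5 can be sketched as a collector that workflow steps report to, followed by a significance filter. The sketch assumes that "significant enough" means contributing at least a minimum share of the records used, which is only one possible criterion. The class, its methods, and the data set identifiers are hypothetical.

```python
from collections import Counter
from typing import List


class CitationCollector:
    """Accumulates, across workflow steps, how many records each data set contributed."""

    def __init__(self) -> None:
        self.used: Counter = Counter()

    def record(self, dataset_id: str, n_records: int = 1) -> None:
        # Called by a workflow step each time it reads from a data set.
        self.used[dataset_id] += n_records

    def citations(self, min_share: float = 0.0) -> List[str]:
        """Data sets whose share of all records used is at least `min_share`."""
        total = sum(self.used.values())
        if total == 0:
            return []
        return sorted(d for d, n in self.used.items() if n / total >= min_share)


# Hypothetical run: three data sets feed an ensemble of workflows.
collector = CitationCollector()
collector.record("ds_a", 90)
collector.record("ds_b", 9)
collector.record("ds_c", 1)
print(collector.citations(min_share=0.05))
```

A share threshold can drop small contributors entirely, so a real policy might cite every data set and use the share only for ordering or weighting.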
6. A language for specifying levels of granularity
Whenever possible, automatically infer levels of granularity
Reconcile conflicts between the data exporters and the data citers
Relevant to data exporters and data citers
Slide8
7. Citing semantically unique data sets that have multiple syntactic/physical representations
Semantic resolution: a big problem everywhere, not just data citation; we hope to find computationally tractable instances
Related to “fixity”?
Use case: DBpedia has multiple serializations
Relevant to data exporters, data citers, and citation consumers
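One tractable instance of challenge 7 is when every serialization can be parsed into the same set of tuples. In that case a fingerprint computed over a canonical form is independent of the physical representation. The sketch below assumes flat tabular data in CSV and JSON and compares values as strings. It does not attempt RDF graph canonicalization, which a case such as DBpedia would require. All function names and sample data are hypothetical.

```python
import csv
import hashlib
import io
import json
from typing import Iterable, List, Sequence, Tuple


def fingerprint(rows: Iterable[Sequence]) -> str:
    """SHA-256 over a canonical form: values as strings, rows sorted."""
    canonical = sorted(tuple(str(v) for v in row) for row in rows)
    payload = json.dumps(canonical, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def rows_from_csv(text: str, columns: List[str]) -> List[Tuple]:
    reader = csv.DictReader(io.StringIO(text))
    return [tuple(rec[c] for c in columns) for rec in reader]


def rows_from_json(text: str, columns: List[str]) -> List[Tuple]:
    return [tuple(rec[c] for c in columns) for rec in json.loads(text)]


# The same two records in two serializations, with different row and key order.
csv_text = "name,year\nelm,2005\noak,2001\n"
json_text = '[{"year": 2001, "name": "oak"}, {"name": "elm", "year": 2005}]'
cols = ["name", "year"]

fp_csv = fingerprint(rows_from_csv(csv_text, cols))
fp_json = fingerprint(rows_from_json(json_text, cols))
print(fp_csv == fp_json)
```

The fingerprint captures only syntactic normalization. Deciding that two differently modeled data sets mean the same thing remains the harder semantic resolution problem named in the challenge.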