Data Lens Provenance

Data Lens Provenance Ontology

The Data Lens Provenance Ontology describes how a Lens processes data. It extends the PROV Ontology (see PROV-O) and the RDF Graph Literals and Named Graphs vocabulary (RDFG), and is written in the OWL 2 Web Ontology Language (OWL2). The following table lists all the prefixes used in this document.

Prefix    Namespace

dlo       http://www.data-lens.co.uk/ontology#

prov      http://www.w3.org/ns/prov#

rdfg      http://www.w3.org/2009/rdfg#

rdf       http://www.w3.org/1999/02/22-rdf-syntax-ns#
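As a minimal sketch of how these prefixes are read, the snippet below expands a prefixed name such as prov:Entity into its full IRI. The helper name expand is hypothetical and is not part of the Data Lens software; only the prefix-to-namespace pairs come from the table.

```python
# Prefix-to-namespace pairs copied from the table in this document.
PREFIXES = {
    "dlo": "http://www.data-lens.co.uk/ontology#",
    "prov": "http://www.w3.org/ns/prov#",
    "rdfg": "http://www.w3.org/2009/rdfg#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(prefixed_name):
    """Expand a prefixed name (hypothetical helper) into a full IRI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local_name


print(expand("prov:Entity"))
print(expand("dlo:subActivityOf"))
```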

This diagram outlines the main classes of the Data Lens Provenance Ontology augmented with the PROV and RDFG ontologies.

 

The Provenance Terms

Graphs

  • rdfg:Graph - An RDF graph which is assigned a name in the form of a URI.

  • dlo:ProvenanceGraph - A named graph that holds the provenance metadata. To link output data with the relevant provenance sub-graph, all the data triples are collected in a named graph, which the provenance describes as being of type prov:Entity and rdfg:Graph.
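The typing of the data graph can be sketched as two quads placed in the provenance graph. This is an illustration only: the graph IRIs under example.org and the helper names are hypothetical, and the N-Quads output is just one possible serialisation.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"
RDFG = "http://www.w3.org/2009/rdfg#"


def describe_data_graph(data_graph, provenance_graph):
    """Return (subject, predicate, object, graph) quads typing the data graph."""
    return [
        (data_graph, RDF_TYPE, PROV + "Entity", provenance_graph),
        (data_graph, RDF_TYPE, RDFG + "Graph", provenance_graph),
    ]


def to_nquads(quads):
    """Serialise quads whose four terms are all IRIs as N-Quads lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> <{g}> ." for s, p, o, g in quads)


# Hypothetical graph names, used only for illustration.
quads = describe_data_graph(
    "http://example.org/graph/output-1",
    "http://example.org/graph/provenance-1",
)
print(to_nquads(quads))
```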

Classes

Agent

  • prov:Agent - An agent is something that bears some form of responsibility for an activity taking place, for the existence of an entity, or for another agent's activity.

  • prov:SoftwareAgent - A software agent is running software.

  • SoftwareAgent - The Data Lens software that carries out the conversion of some input data into an RDF graph.

Activity

  • prov:Activity - An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.

  • KafkaActivity - An activity that starts when a single message is received from the Kafka topic being listened to. The activity ends when the output Kafka topic receives either a valid results file or an error message.

  • LensExecution - The activity of converting the input data to an RDF graph. It starts on receiving the full path to the mapping files directory and, if needed, the input file. The SQL Lens receives the input dataset details in the mapping file instead. A Lens execution consists of several sub-activities. These differ between Lenses, but they are usually either a raw input data conversion or a single iteration.
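The KafkaActivity lifecycle described above can be sketched without a Kafka client. In this sketch, convert stands in for the Lens and publish stands in for writing to the output topic; both callbacks, the KafkaActivityRecord class, and the outcome labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class KafkaActivityRecord:
    """Hypothetical in-memory record of one KafkaActivity."""
    started_at: datetime
    ended_at: Optional[datetime] = None
    outcome: Optional[str] = None


def run_kafka_activity(message, convert, publish):
    """Start on message receipt; end once a result or an error is published."""
    activity = KafkaActivityRecord(started_at=datetime.now(timezone.utc))
    try:
        outcome = ("results", convert(message))
    except Exception as error:  # any failure is reported as an error message
        outcome = ("error", str(error))
    publish(outcome)  # stand-in for sending to the output topic
    activity.ended_at = datetime.now(timezone.utc)
    activity.outcome = outcome[0]
    return activity


published = []
record = run_kafka_activity("input.csv", str.upper, published.append)
print(record.outcome, published)
```

The record keeps both timestamps so that they could later be written out as prov:startedAtTime and prov:endedAtTime.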

Entity

  • prov:Entity - An entity is a physical, digital, conceptual, or other kind of thing with some fixed aspects; entities may be real or imaginary.

  • FileEntity - Raw input data. It is used when the input data is provided as a file (Structured File Lens and Document Lens).

Properties

  • prov:used - Usage is the beginning of utilising an entity by an activity. Before usage, the activity had not begun to utilise this entity and could not have been affected by the entity. In our model, this property declares a relation between a Lens execution and an input file.

  • prov:wasGeneratedBy - Generation is the completion of production of a new entity by an activity. This entity did not exist before generation and becomes available for usage after this generation.

  • prov:wasDerivedFrom - A derivation is a transformation of an entity into another, an update of an entity resulting in a new one, or the construction of a new entity based on a pre-existing entity.

  • prov:wasAttributedTo - Attribution is the ascribing of an entity to an agent.

  • prov:wasAssociatedWith - An activity association is an assignment of responsibility to an agent for an activity, indicating that the agent had a role in the activity.

  • prov:startedAtTime - Start is when an activity is deemed to have been started by an entity, known as a trigger. The activity did not exist before its start.

  • prov:endedAtTime - End is when an activity is deemed to have been ended by an entity, known as a trigger. The activity no longer exists after its end.

  • dlo:kafkaTopicName - Name of the topic the message was read from.

  • dlo:kafkaMessageKey - Value of the key of a Kafka message.

  • dlo:kafkaMessageValue - Value of a Kafka message.

  • dlo:applicationName - Name of the agent. The value is hard-coded.

  • dlo:friendlyName - User-friendly name of the agent. The value is configurable. It usually reflects the user’s specific flavour.

  • dlo:applicationVersion - The version of the agent.

  • dlo:subActivityOf - A sub-activity is an activity that forms an integral part of another activity. Usually, the main activity triggers the sub-activity.

  • dlo:isSubGraphOf - A sub-graph is a graph created by a sub-activity.
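To show how the PROV properties above fit together, here is a minimal sketch that builds the triples for one Lens execution. All IRIs under example.org are hypothetical, the helper names are not part of any library, and the prov:wasDerivedFrom link from the output graph to the input file is an assumption about how a Lens might record derivation.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value):
    """Wrap an IRI in angle brackets (N-Triples style)."""
    return f"<{value}>"


def date_time(moment):
    """Render a datetime as an xsd:dateTime typed literal."""
    return f'"{moment.isoformat()}"^^<{XSD_DATETIME}>'


def lens_execution_triples(execution, agent, input_file, output_graph, started, ended):
    """Return (subject, predicate, object) strings for one Lens execution."""
    def prov(term):
        return iri(PROV + term)

    return [
        (iri(execution), prov("used"), iri(input_file)),
        (iri(execution), prov("wasAssociatedWith"), iri(agent)),
        (iri(execution), prov("startedAtTime"), date_time(started)),
        (iri(execution), prov("endedAtTime"), date_time(ended)),
        (iri(output_graph), prov("wasGeneratedBy"), iri(execution)),
        (iri(output_graph), prov("wasDerivedFrom"), iri(input_file)),  # assumption
        (iri(output_graph), prov("wasAttributedTo"), iri(agent)),
    ]


# Hypothetical identifiers and timestamps, used only for illustration.
triples = lens_execution_triples(
    "http://example.org/activity/lens-1",
    "http://example.org/agent/structured-file-lens",
    "http://example.org/file/input.csv",
    "http://example.org/graph/output-1",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc),
)
for subject, predicate, obj in triples:
    print(subject, predicate, obj, ".")
```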

 

Extended terms

Classes and subclasses

  • dlo:DocumentLensExecution

    • dlo:DocumetLensInputFileDownload - Download of the input file in the Document Lens.

    • dlo:DocumentLensModelUnification - Unification of RDF data and metadata models in the Document Lens.

    • dlo:DocumentLensConceptAnnotation - Annotation of concepts from text in the Document Lens.

    • dlo:DocumentLensOutputFileUpload - Upload of the output file in the Document Lens.

    • dlo:DocumentLensSemantification - Storage of information in RDF format in the Document Lens.

    • dlo:DocumentLensTextExctraction - Extraction of text from files in the Document Lens.

  • dlo:StructuredFileLensExecution

    • dlo:StructuredFileLensIteration - A single iteration over the data from the source file. If the input file is a CSV file and its size exceeds the specified maximum threshold, the data is processed in several iterations.

  • dlo:SQLLensExecution

    • dlo:QueryProcessing - Execution of a customised SQL query. If a limit and an offset are specified in the query and the query result is larger than that limit, the data is processed in several iterations (instances of QueryProcessing).

    • dlo:TableProcessing - SQL query against a single table. This corresponds to using rr:tableName in the SQL mapping file, or the execution of "SELECT * FROM ...".

    • dlo:DatabaseRequest - A request to a database for the most atomic SQL query.
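The iteration behaviour described for StructuredFileLensIteration and QueryProcessing can be illustrated with a small planning function. This is a sketch under stated assumptions: it presumes the total row count is known up front and numbers iterations from 1, neither of which is confirmed by this document, and plan_iterations is a hypothetical name.

```python
def plan_iterations(total_rows, limit):
    """Split total_rows into (iteration, offset, row_count) chunks of at most limit rows."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [
        (number, offset, min(limit, total_rows - offset))
        for number, offset in enumerate(range(0, total_rows, limit), start=1)
    ]


# 2500 rows with a limit of 1000 are covered by three iterations.
for number, offset, row_count in plan_iterations(2500, 1000):
    print(f"iteration {number}: LIMIT {row_count} OFFSET {offset}")
```

When the data fits within the limit, the plan contains a single iteration, which matches the case where no splitting is needed.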

Properties

  • dlo:iteration - Number of iterations expressed as an integer.

  • dlo:sqlQuery - The SQL query in the current iteration.

  • dlo:tableName - The SQL database table name in dlo:TableProcessing.
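As an illustration of how these properties might be attached to a single dlo:QueryProcessing iteration, the sketch below emits N-Triples-style strings. The activity IRIs and helper names are hypothetical, and typing dlo:iteration as xsd:integer is an assumption based on "expressed as an integer".

```python
DLO = "http://www.data-lens.co.uk/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(text):
    """Render a plain string literal with N-Triples escaping."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def query_processing_triples(activity, parent, iteration, sql_query):
    """Describe one QueryProcessing iteration as (subject, predicate, object) strings."""
    subject = f"<{activity}>"
    return [
        (subject, f"<{RDF_TYPE}>", f"<{DLO}QueryProcessing>"),
        (subject, f"<{DLO}subActivityOf>", f"<{parent}>"),
        (subject, f"<{DLO}iteration>", f'"{iteration}"^^<{XSD}integer>'),  # assumed datatype
        (subject, f"<{DLO}sqlQuery>", literal(sql_query)),
    ]


# Hypothetical activity IRIs and query, used only for illustration.
for triple in query_processing_triples(
    "http://example.org/activity/query-2",
    "http://example.org/activity/sql-lens-1",
    2,
    'SELECT * FROM "people" LIMIT 1000 OFFSET 1000',
):
    print(*triple, ".")
```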