...

This is the full User Guide for the SQL Lens, it contains an in-depth set of instructions to fully set up, configure, and run the Lens so you can start ingesting data as part of an end-to-end system. For a guide to get the Lens up and running in the quickest and simplest possible way, see the Quick Start Guide. Once deployed, you can utilise any of our ready-made sample input, mapping, and expected output files to test your Lens. For a list of what has changed since the last release, visit the User Release Notes.

...

Table of Contents

...

The first step in configuring the SQL Lens is to create a mapping file. The mapping file defines the links between your source database and your target model (ontology). It can be created using our online Data Lens Mapping Tool, which provides an intuitive web-based UI. Log in here to get started, and select the option for SQL Lens. The SQL Lens is capable of ingesting relational databases. Alternatively, the Mapping Tool can be deployed to your own infrastructure; this enables additional functionality such as the ability to update mapping files on a running Lens. To do this, follow the instructions at /wiki/spaces/DLD/pages/336560199.

However, if you wish to create your RML mapping files manually, there is a detailed step-by-step guide on creating one from scratch.

...

All Lenses supplied by Data Lens are configurable through the use of Environment Variables. How to declare these environment variables will differ slightly depending on how you choose to run the Lens, so please see Running the Lens for more info. For a breakdown of every configuration option in the SQL Lens, see the full list here. A short sketch for sanity-checking the mandatory variables follows the list below.

Mandatory Configuration

  • License - LICENSE

    • This is the license key required to operate the lens, request your new unique license key here.

  • Mapping Directory URL - MAPPINGS_DIR_URL

    • This is the directory where your mapping file(s) is located. As with all directories, this can be either local or on a remote S3 bucket. Mapping files for the SQL Lens can be created using our Mapping Config Web App and can be pushed directly to a running Lens.

  • Output directory URL - OUTPUT_DIR_URL

    • This is the directory where all generated RDF files are saved to. This also supports local and remote URLs.

  • Provenance Output Directory URL - PROV_OUTPUT_DIR_URL

    • Out of the box, the SQL Lens supports Provenance, and it is generated by default. Once generated, the Provenance is saved to output files separate from the transformed source data. This option specifies the directory where provenance RDF files are saved, which also supports local and remote URLs.

    • If you do not wish to generate Provenance, you can turn it off by setting the RECORD_PROVO variable to false. In this case, the PROV_OUTPUT_DIR_URL option is no longer required. For more information on Provenance configuration, see below.
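
Purely as an illustration (this is not the Lens's internal code), the following minimal Java sketch reads the mandatory keys named above from the environment and reports any that are missing. It can be a quick way to verify that a container or shell has them set before starting the Lens.

    import java.util.List;

    public class MandatoryConfigCheck {
        public static void main(String[] args) {
            // Mandatory keys from this section; PROV_OUTPUT_DIR_URL is only
            // required while RECORD_PROVO is not set to "false".
            List<String> keys = List.of(
                    "LICENSE",
                    "MAPPINGS_DIR_URL",
                    "OUTPUT_DIR_URL",
                    "PROV_OUTPUT_DIR_URL");

            for (String key : keys) {
                String value = System.getenv(key);
                if (value == null || value.isBlank()) {
                    System.out.println("Missing mandatory variable: " + key);
                } else {
                    System.out.println(key + " = " + value);
                }
            }
        }
    }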

...

  • SQL Limit - SQL_LIMIT

    • The SQL Limit sets the maximum number of records that can be processed in any one query. This means that if your database contains more records than this value, the Lens will batch process the records from the query and output multiple RDF files (see the batching sketch after this list).

    • To enable batching, this value must be an integer greater than zero. It defaults to zero, meaning that iterative queries are switched off.

  • SQL Offset - SQL_OFFSET

    • The SQL Offset provides the ability to offset the start index of the iterative processing. This defaults to zero.

  • JDBC Connector - JDBC_CONNECTOR

    • The JDBC driver is a software component that enables a Java application to interact with a database. This currently defaults to com.mysql.cj.jdbc.Driver, which is the driver for a MySQL database; however, you can override this with the JDBC driver of the database you wish to use.
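
To illustrate how SQL_LIMIT and SQL_OFFSET behave conceptually, the sketch below is our own example (not the Lens's implementation) of paging through a MySQL table with LIMIT/OFFSET using the default com.mysql.cj.jdbc.Driver. The connection URL, credentials, and table name are placeholders, and it assumes the MySQL Connector/J driver is on the classpath.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class LimitOffsetExample {
        public static void main(String[] args) throws SQLException {
            int sqlLimit = 1000;   // analogous to SQL_LIMIT (batch size)
            int sqlOffset = 0;     // analogous to SQL_OFFSET (start index)

            // Placeholder connection details for a local MySQL database.
            String url = "jdbc:mysql://localhost:3306/exampledb";
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                while (true) {
                    String sql = "SELECT * FROM example_table LIMIT ? OFFSET ?";
                    try (PreparedStatement ps = conn.prepareStatement(sql)) {
                        ps.setInt(1, sqlLimit);
                        ps.setInt(2, sqlOffset);
                        int rows = 0;
                        try (ResultSet rs = ps.executeQuery()) {
                            while (rs.next()) {
                                rows++;  // each batch would become one RDF output file
                            }
                        }
                        if (rows < sqlLimit) {
                            break;              // last batch reached
                        }
                        sqlOffset += sqlLimit;  // move on to the next batch
                    }
                }
            }
        }
    }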

AWS Configuration

If you wish to use cloud services such as Amazon Web Services, you need to specify an AWS Access Key, Secret Key, and AWS Region through AWS_ACCESS_KEY, AWS_SECRET_KEY, and S3_REGION respectively. Providing your AWS credentials gives the Lens permission to access, download, and upload remote files in S3 Buckets. The S3 Region option specifies the AWS region where your files and services reside. Please note that all services must be in the same region, including the EC2 instance if you choose to run the Lens in one.
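
As a rough sketch of what these three values correspond to (using the AWS SDK for Java v2 for illustration only; the Lens does not require you to write any of this), credentials and a region equivalent to AWS_ACCESS_KEY, AWS_SECRET_KEY, and S3_REGION could be used to build an S3 client as follows. The key values and bucket name are placeholders.

    import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
    import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;

    public class S3AccessExample {
        public static void main(String[] args) {
            // Placeholder values standing in for AWS_ACCESS_KEY and AWS_SECRET_KEY.
            AwsBasicCredentials credentials =
                    AwsBasicCredentials.create("ACCESS_KEY", "SECRET_KEY");

            // Region standing in for S3_REGION.
            try (S3Client s3 = S3Client.builder()
                    .region(Region.EU_WEST_2)
                    .credentialsProvider(StaticCredentialsProvider.create(credentials))
                    .build()) {
                // List the objects in a placeholder bucket holding mapping files.
                s3.listObjectsV2(ListObjectsV2Request.builder()
                                .bucket("my-mapping-bucket")
                                .build())
                  .contents()
                  .forEach(obj -> System.out.println(obj.key()));
            }
        }
    }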

...

One of the many ways to interface with the Lens is through the use of Apache Kafka. With the SQL Lens, a Kafka Message Queue can be used for managing the output of data from the Lens. To properly set up your Kafka Cluster, see the instructions here. Once complete, use the following Kafka configuration variables to connect the cluster with your Lens. If you do not wish to use Kafka, please set the variable LENS_RUN_STANDALONE to true.

...

There is also a further selection of optional configurations for given situations, see here for the full list.

Directories in Lenses

...

Info

For more information on the Architecture and Deployment of an Enterprise System, see our guide.

AWS Marketplace

We now have full support for the Amazon Web Services Marketplace, where you can directly subscribe to a Lens. Then, using our CloudFormation Templates, you can deploy a one-click solution to run your Lens. See here for further details and instructions to get you started.

...

Ingesting Data / Triggering the Lens

While the mapping file created previously handles the querying of your source database(s), triggering the Lens to start the ingestion of data can be done in a number of ways.

...

In addition to the RESTful service, there is also a built-in Quartz Time Scheduler. This uses a user-configurable Cron Expression to set up a time-based job scheduler which will schedule the Lens to ingest your specified data from your database(s) periodically at fixed times, dates, or intervals.
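
As an illustration of the kind of Cron Expression Quartz accepts (the expression below is our own example, firing at 02:00 every day; the configuration variable that holds it is covered elsewhere in this guide), the following snippet simply parses the expression and prints the next few fire times.

    import java.util.Date;
    import org.quartz.CronExpression;

    public class CronExample {
        public static void main(String[] args) throws Exception {
            // Quartz cron fields: second, minute, hour, day-of-month, month, day-of-week.
            // This example fires at 02:00 every day.
            CronExpression expression = new CronExpression("0 0 2 * * ?");

            Date next = new Date();
            for (int i = 0; i < 3; i++) {
                next = expression.getNextValidTimeAfter(next);
                System.out.println("Next scheduled ingestion: " + next);
            }
        }
    }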

...

If you have a Kafka Cluster set up and running, then the successfully generated RDF file URL(s) will be pushed to your Kafka Queue. They will be pushed to the Topic specified in the KAFKA_TOPIC_NAME_SUCCESS config option, which defaults to “success_queue”. This happens with both methods of triggering the Lens. One of the many advantages of this approach is that the transformed data can then be ingested using our Lens Writer, which will publish the RDF to a Semantic Knowledge Graph (or selection of Property Graphs) of your choice!
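
One possible way to consume these messages downstream is sketched below using the standard Apache Kafka Java client. The bootstrap address and consumer group are placeholders, and the message value is assumed here to be a plain string URL pointing at a generated RDF file.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SuccessQueueConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "lens-output-reader");       // placeholder
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Default success topic name from KAFKA_TOPIC_NAME_SUCCESS.
                consumer.subscribe(List.of("success_queue"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println("Generated RDF file: " + record.value());
                    }
                }
            }
        }
    }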

...

For more information on how the provenance is laid out, as well as how to query it from your Triple Store, see the Provenance Guide.

...

REST API Endpoints

In addition to the Process Endpoint designed for triggering the ingestion of data into the Lens, there is a selection of built-in exposed endpoints for you to call.
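
As a minimal sketch of calling the Process Endpoint over HTTP with Java's built-in HttpClient, the example below triggers an ingestion run. The host, port, and path shown are placeholders only; substitute the values for your own deployment as documented in the endpoint reference.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class TriggerLensExample {
        public static void main(String[] args) throws Exception {
            // Placeholder host, port, and path for the Process Endpoint.
            URI processEndpoint = URI.create("http://localhost:8080/process");

            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(processEndpoint).GET().build();

            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            System.out.println("Status: " + response.statusCode());
            System.out.println("Body:   " + response.body());
        }
    }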

...

If, when designing the mapping file for your Lens, you require an operation that cannot be performed using the built-in functions, it is possible to create and use your own. This can be done by triggering this endpoint with a Turtle Mapping and a compiled Jar containing your new functions. For further instructions on how to correctly carry out this process, please see our guide.
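
Purely as an illustration of the kind of logic such a compiled Jar might contain (the class name, method signature, and the way it is referenced from the Turtle mapping are assumptions here; the linked guide describes the real requirements), a custom function could be as simple as:

    public class CustomLensFunctions {

        // Hypothetical example function: normalise a database value before it is
        // written into the generated RDF. How such a method must be declared and
        // referenced from the Turtle mapping is described in the linked guide.
        public static String normaliseCode(String rawValue) {
            if (rawValue == null) {
                return null;
            }
            return rawValue.trim().toUpperCase();
        }
    }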