...
This is the full User Guide for the SQL Lens. It contains an in-depth set of instructions to fully set up, configure, and run the Lens so you can start ingesting data as part of an end-to-end system. For the quickest and simplest way to get the Lens up and running, see the Quick Start Guide. Once deployed, you can utilise any of our ready-made sample input, mapping, and expected output files to test your Lens. For a list of what has changed since the last release, visit the User Release Notes.
...
Table of Contents
...
The first step in configuring the SQL Lens is to create a mapping file. The mapping file creates the links between your source database and your target model (ontology). It can be created using our online Data Lens Mapping Tool, which provides an intuitive web-based UI. Log in here to get started, and select the option for SQL Lens. The SQL Lens is capable of ingesting relational databases. Alternatively, the Mapping Tool can be deployed to your own infrastructure; this enables additional functionality such as the ability to update mapping files on a running Lens. To do this, follow these instructions /wiki/spaces/DLD/pages/336560199.
However, if you wish to create your RML mapping files manually, there is a detailed step-by-step guide on creating one from scratch.
...
All Lenses supplied by Data Lens are configurable through the use of Environment Variables. How to declare these environment variables will differ slightly depending on how you choose to run the Lens, so please see Running the Lens for more information. For a breakdown of every configuration option in the SQL Lens, see the full list here.
Mandatory Configuration
License -
LICENSE
This is the license key required to operate the Lens. Request your new unique license key here.
Mapping Directory URL -
MAPPINGS_DIR_URL
This is the directory where your mapping file(s) are located. As with all directories, this can be either local or on a remote S3 bucket. Mapping files for the SQL Lens can be created using our Mapping Config Web App and can be pushed directly to a running Lens.
Output directory URL -
OUTPUT_DIR_URL
This is the directory where all generated RDF files are saved to. This also supports local and remote URLs.
Provenance Output Directory URL -
PROV_OUTPUT_DIR_URL
Out of the box, the SQL Lens supports Provenance, and it is generated by default. Once generated, the Provenance is saved to output files separate from the transformed source data. This option specifies the directory where provenance RDF files are saved to, which also supports local and remote URLs.
If you do not wish to generate Provenance, you can turn it off by setting the
RECORD_PROVO
variable to false. In this case, the
PROV_OUTPUT_DIR_URL
option is no longer required. For more information on Provenance configuration, see below.
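As a sketch of how the mandatory options above fit together, the following shows one way the Lens might be launched as a container. The image name, tag, and directory paths are illustrative placeholders, not official values; substitute the image and directories from your own deployment.

```shell
# Illustrative only: image name and paths are placeholders for your own setup.
docker run -d \
  -e LICENSE="your-license-key" \
  -e MAPPINGS_DIR_URL="file:///var/local/mappings/" \
  -e OUTPUT_DIR_URL="file:///var/local/output/" \
  -e PROV_OUTPUT_DIR_URL="file:///var/local/prov-output/" \
  datalens/sql-lens:latest
```

Each directory option accepts either a local path or a remote S3 URL, as described above.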
...
SQL Limit -
SQL_LIMIT
The SQL Limit provides the maximum number of records that can be processed in any one query. This means that if your database contains more records than this set variable, the Lens will batch process the records from the query and output multiple RDF files.
To enable batching, this value must be an integer greater than zero. It defaults to zero, which switches iterative queries off.
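To illustrate the batching arithmetic, the snippet below works out how many iterative queries (and therefore RDF output files) would be produced for a given table size. The row count and limit are made-up example numbers, not Lens defaults.

```shell
# Illustrative arithmetic only: row count and limit are example values.
TOTAL_ROWS=250000   # rows in the source table
SQL_LIMIT=100000    # maximum rows per query

# Ceiling division: number of batched queries / RDF output files produced.
BATCHES=$(( (TOTAL_ROWS + SQL_LIMIT - 1) / SQL_LIMIT ))
echo "$BATCHES"   # prints 3
```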
SQL Offset -
SQL_OFFSET
The SQL Offset provides the ability to offset the start index of the iterative processing. This defaults to zero.
JDBC Connector -
JDBC_CONNECTOR
The JDBC driver is a software component enabling a Java application to interact with a database. This currently defaults to
com.mysql.cj.jdbc.Driver
, which is the driver for a MySQL database; however, you can override this with the JDBC driver class for the database you wish to use.
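For example, to point the Lens at a PostgreSQL database you might override the default driver class as below. org.postgresql.Driver is the standard PostgreSQL JDBC driver class; whether its jar is bundled with the Lens is an assumption to verify for your deployment.

```shell
# Hypothetical override: use the PostgreSQL JDBC driver instead of the
# MySQL default. Confirm the driver jar is available to the Lens.
export JDBC_CONNECTOR="org.postgresql.Driver"
```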
AWS Configuration
If you wish to use cloud services such as Amazon Web Services, you need to specify an AWS Access Key, Secret Key, and AWS Region through AWS_ACCESS_KEY
, AWS_SECRET_KEY
, and S3_REGION
respectively. Providing your AWS credentials gives the Lens permission to access, download, and upload remote files in S3 buckets. The S3 Region option specifies the AWS region where your files and services reside. Please note that all services must be in the same region, including the EC2 instance if you choose to run the Lens on one.
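Taken together, the three AWS options might be set as follows. The values shown are placeholders; substitute your own credentials and the region your buckets and services actually live in.

```shell
# Placeholder credentials and region -- substitute your own values.
export AWS_ACCESS_KEY="AKIA..."
export AWS_SECRET_KEY="your-secret-key"
export S3_REGION="eu-west-2"   # must match the region of ALL your AWS services
```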
...
One of the many ways to interface with the Lens is through the use of Apache Kafka. With the SQL Lens, a Kafka Message Queue can be used for managing the output of data from the Lens. To properly set up your Kafka Cluster, see the instructions here. Once complete, use the following Kafka configuration variables to connect the cluster with your Lens. If you do not wish to use Kafka, please set the variable LENS_RUN_STANDALONE
to true.
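If you are not connecting a Kafka cluster, standalone mode is enabled with a single variable, as sketched below. The broker connection variables for the non-standalone case are not listed here; check the full configuration list for their names.

```shell
# Run the Lens without Kafka: generated RDF URLs will not be pushed to a
# message queue. Set to false (the assumed default) when using a cluster.
export LENS_RUN_STANDALONE=true
```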
...
There is also a further selection of optional configurations for given situations, see here for the full list.
Directories in Lenses
...
An Amazon Web Services Elastic Container Service (ECS) cluster, hosting a single EC2 instance. Running the following containers:
Apache Kafka
Data Lens: SQL Lens
Data Lens: Lens Writer
Data Lens: Lens Mapping Configuration Tool
An S3 bucket
Info: For more information on the Architecture and Deployment of an Enterprise System, see our guide.
...
Ingesting Data / Triggering the Lens
While the mapping file previously created handles the querying of your target databases, triggering the Lens to start the ingestion of data can be done in a number of ways.
...
In addition to the RESTful service, there is also a built-in Quartz Time Scheduler. This uses a user-configurable Cron Expression to set up a time-based job scheduler which will schedule the Lens to ingest your specified data from your database(s) periodically at fixed times, dates, or intervals.
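As an illustration of the scheduler, a Quartz cron expression has six (optionally seven) fields: second, minute, hour, day-of-month, month, day-of-week, and an optional year. The variable name below is hypothetical; check the full configuration list for the actual scheduler property the Lens reads.

```shell
# LENS_CRON is a hypothetical variable name for illustration only.
# "0 0 2 * * ?" is a valid Quartz expression: fire at 02:00 every day.
export LENS_CRON="0 0 2 * * ?"
```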
...
If you have a Kafka Cluster set up and running, then the successfully generated RDF file URL(s) will be pushed to your Kafka Queue, on the Topic specified in the KAFKA_TOPIC_NAME_SUCCESS
config option, which defaults to “success_queue”. This will happen with both methods of triggering the Lens. One of the many advantages of using this approach is that now this transformed data can be ingested using our Lens Writer which will publish the RDF to a Semantic Knowledge Graph (or selection of Property Graphs) of your choice!
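To verify what the Lens is publishing, you could inspect the success topic with Kafka's standard console consumer. The broker address is a placeholder for your own cluster, and the topic name assumes the default shown above.

```shell
# Read every message on the success topic (placeholder broker address).
kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic success_queue \
  --from-beginning
```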
...
For more information on how the provenance is laid out, as well as how to query it from your Triple Store, see the Provenance Guide.
...
REST API Endpoints
In addition to the Process Endpoint designed for triggering the ingestion of data into the Lens, there is a selection of built-in exposed endpoints for you to call.
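As a sketch of calling the Process Endpoint, the host, port, and path below are assumptions, not documented values; consult the endpoint list for your deployment before use.

```shell
# Hypothetical trigger call -- host, port, and path are placeholders.
curl "http://localhost:8080/process"
```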
...