Quick Start Guide - SQL Lens v2.0

This is a quick start guide to get the SQL Lens up and running in the quickest and simplest possible way so you can start ingesting and transforming data straight away. For a more in-depth set of instructions go to the User Guide.

In this guide we will be setting up and running the Lens as a docker image deployed to your local machine or as the AWS Marketplace offering run on ECS, however we support a number of cloud deployments technologies, including full support of AWS. Once deployed, you can utilise any of our ready-made sample mapping files and expected output files to test your Lens.

 

1. Creating a Mapping File

The first step in configuring the SQL Lens is to create a mapping file. The mapping file is what creates the links between your source database and your target model (ontology). To create a mapping file, we have created a detailed step by step guide on creating one, along with a number of examples. Within this mapping file is also where you specify your SQL Database credentials, along with any SQL queries you may want to execute in order to retrieve your data.

 

2. Configuring the Lens

All Lenses supplied by Data Lens are configurable through the use of Environment Variables. There are number of options to tailor the Lens to your needs, however to get started, the following config options are recommended:

  • License - LICENSE

    • This is the license key required to operate the Lens when being run on a local machine outside of AWS, request your new unique license key here.

  • Lens Directory - LENS_DIRECTORY

    • This is the directory where all Lens files are stored (assuming individual file dir config haven’t been edited). On Lens startup, if this has been declared, it will create folders at the specified location for mapping, output, yaml-mapping, provenance output, and config backup.

    • By default, this option is set to a local directory within the docker container (file:///var/local/) so isn't mandatory. As with all directories in the Lens, this can be either local or on a remote S3 bucket - we recommend using S3 when running the Lens on AWS (for example - s3://example-bucket/sqllens/)

  • Record Provenance - RECORD_PROVO

    • In the Lenses, time-series data is supported as standard, so every time a Lens ingests some data, we add provenance information. This means that you have a full record of data over time. Allowing you to see what the state if the data was at any moment. 

    • When setting up and testing a Lens for the first time, it may be practical to turn off provenance until your environment is ready for production. This is done by setting RECORD_PROVO to false, this can then be turned on at a later date by calling the updateConfig endpoint. Provenance files will then be saved to where the PROV_OUTPUT_DIR_URL option is set to or the provenance directory created by the Lens directory config option.

 

3. Running the Lens

All of our Lenses are designed and built to be versatile, allowing them to be set up and run on a number of environments, including in cloud or on-premise. This is achieved through the use of Docker Containers. To run the Lens on AWS, simply use the CloudFormation template we have created to start up an ECS Cluster with all the required permissions and networking, with the Lens running within as a task. Alternatively, to run the Lens' Docker image locally, please first ensure you have Docker installed. Once installed, execute a docker run command with the following structure, and Docker will start the container and run the Lens from your downloaded image. NB The tag for the image you are using may differ to the example here.

For UNIX based machines (macOS and Linux):

docker run \ -e LICENSE=<<<REQUEST LICENSE>>> \ -e LENS_DIRECTORY=file:///data/sqllens/ \ -e RECORD_PROVO=false \ -p 8080:8080 \ -v /User/DataLens/sqlens/:/data/sqllens/ \ lens-sql:Release_2.0.9.234

For Windows:

docker run ^ -e LICENSE=<<<REQUEST LICENSE>>> ^ -e LENS_DIRECTORY=file:///data/sqllens/ ^ -e RECORD_PROVO=false ^ -p 8080:8080 ^ -v //c//User/DataLens/sqlens/:/data/sqllens/ ^ lens-sql:Release_2.0.9.234

The above examples demonstrate how to override configuration options using environment variables in your Lens. Given the Lens is run on port 8080, line 5 exposes and binds that port of the host machine so that the APIs can be triggered. The -v flag seen on line 6 mounts the working directory into the container; when the host directory of a bind-mounted volume doesn’t exist, Docker will automatically create this directory on the host for you. And finally, line 7 is the name and version of the Docker image you wish to run. For more information of running Docker Images, see the official Docs.

 

4. Ingesting Data / Triggering the Lens

Using the process endpoint will trigger the Lens, for example, using a GET request: <lens-ip>:<lens-port>/process → http://127.0.0.1:8080/process

Once the ingestion and transformation has successfully been processed, the response returned from the Lens is in the form of a JSON. This JSON response contains a list of all the URLs of generated RDF files as well as some other useful information. Multiple files are only generated when specifying an SQL Limit and Offset in your mapping query.

Sample output:

{ "successfulIterations": 3, "outputFileLocations": [ "file:///data/sqllens/output/SQL-Lens-31235bfd-a6fb-43aa-bb51-e3d41c481983.nq", "file:///data/sqllens/output/SQL-Lens-5b27a2a4-c971-4fd4-b128-d131bf5c3981.nq", "file:///data/sqllens/output/SQL-Lens-5ecd6e81-65ff-4dbc-aed8-d6d320b7d04a.nq" ], "processingTime": 20 }

Cron job

The process to trigger the ingestion of your DB data can be done in multiple ways, in addition to the RESTful service, there is also a built-in Quartz Time Scheduler. This uses a user-configurable Cron Expression to set up a time-based job scheduler which will schedule the Lens to ingest your specified data from your database(s) periodically at fixed times, dates, or intervals. More information on this can be found in the full User Guide.

Kafka

You can also trigger the Lens by sending a message to a Kafka queue. To utilise this you must ensure LENS_RUN_STANDALONE is set to false and the Kafka configuration is configured correctly for you brokers. Once set up, you can simply push an empty message to the topic specified by the KAFKA_TOPIC_NAME_SOURCE.