This quick start guide gets the SQL Lens up and running in the simplest possible way so you can start ingesting and transforming data straight away. For a more in-depth set of instructions, go to the User Guide.
In this guide we will set up and run the Lens as a Docker image deployed to your local machine; however, we support a number of cloud deployment technologies, including full support for AWS.
1. Creating a Mapping File
The first step in configuring the SQL Lens is to create a mapping file. The mapping file is what creates the links between your source database and your target model (ontology). This can be created using our online Data Lens Mapping Tool utilising an intuitive web-based UI. Log in here to get started, and select the option for SQL Lens.
2. Configuring the Lens
All Lenses supplied by Data Lens are configurable through the use of Environment Variables. The following config options are required for running the Lens:
License -
LICENSE
This is the license key required to operate the Lens; request your new unique license key here.
Mapping Directory URL -
MAPPINGS_DIR_URL
This is the directory where your mapping file(s) is located. As with all directories, this can be either local or on a remote S3 bucket. Mapping files for the SQL Lens can be created using our Mapping Config Web App and can be pushed directly to a running Lens.
For this quick start, we will use a local directory.
Output directory URL -
OUTPUT_DIR_URL
This is the directory where all generated RDF files are saved to. This also supports local and remote URLs.
For this quick start, we will use a local directory.
Run Standalone Mode -
LENS_RUN_STANDALONE
Each of the Lenses are designed to be run as part of a larger end-to-end system, with the end result of data being uploaded into Semantic Knowledge Graphs or Property Graphs. As part of this process, Apache Kafka message queues are used for communicating between services.
For this quick start, we are going to enable standalone mode by setting this value to
true
, so that the Lens won't attempt to connect to external services.
Record Provenance -
RECORD_PROVO
or PROV_OUTPUT_DIR_URL
In the Lenses, time-series data is supported as standard, so every time a Lens ingests some data, we add provenance information. This means that you have a full record of the data over time, allowing you to see what the state of the data was at any moment.
For this quick start, we can take two approaches. One is to turn provenance off by setting
RECORD_PROVO
to false
, and the other is to leave it on (true by default) and set the PROV_OUTPUT_DIR_URL
to a directory for your provenance RDF files to be saved to. The latter is the approach we will take in the following examples.
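Putting the options above together, the quick-start configuration can be expressed as plain environment variables. The paths below are illustrative examples only; adjust them for your machine, and replace the license placeholder with your own key:

```shell
# Example quick-start configuration; the paths and the license placeholder
# are illustrative values, not defaults shipped with the Lens.
export LICENSE="<your-license-key>"                     # your unique license key
export MAPPINGS_DIR_URL="file:///data"                  # where the mapping file(s) live
export OUTPUT_DIR_URL="file:///data/output/"            # generated RDF files are written here
export LENS_RUN_STANDALONE=true                         # no Kafka or other external services
export PROV_OUTPUT_DIR_URL="file:///data/prov-output/"  # provenance RDF files are written here
```

Exporting the variables this way means they can be passed straight through to the container, as shown in the next section.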
3. Running the Lens
All of our Lenses are designed and built to be versatile, allowing them to be set up and run in a number of environments, whether in the cloud or on-premise. This is achieved through the use of Docker containers. For this quick start guide, we are going to use the simplest method of deployment: running the Lens' Docker image locally. To do this, please first ensure you have Docker installed. Once it is installed, run a command with the following structure and Docker will start the container and run the Lens from your downloaded image.
For UNIX based machines (macOS and Linux):
docker run \
  -v /var/data:/data \
  -e LICENSE \
  -e MAPPINGS_DIR_URL=file:///data \
  -e OUTPUT_DIR_URL=file:///data/output/ \
  -e LENS_RUN_STANDALONE=true \
  -e PROV_OUTPUT_DIR_URL=file:///data/prov-output/ \
  -p 8080:8080 \
  lens-sql:latest
For Windows:
docker run ^
  -v /data:/data ^
  -e LICENSE ^
  -e MAPPINGS_DIR_URL="file:///data" ^
  -e OUTPUT_DIR_URL="file:///data/output" ^
  -e LENS_RUN_STANDALONE=true ^
  -e PROV_OUTPUT_DIR_URL="file:///data/prov-output/" ^
  -p 8080:8080 ^
  lens-sql:latest
The above examples demonstrate how to override configuration options in your Lens using environment variables. Line 3 passes in an environment variable saved on the machine, whereas lines 4-7 pass in string values directly. Since the Lens runs on port 8080, line 8 publishes that port to the host machine so that the APIs can be triggered. The -v
flag on line 2 mounts the working directory into the container; when the host directory of a bind-mounted volume doesn't exist, Docker automatically creates it on the host for you. Finally, line 9 is the name and version of the Docker image you wish to run.
For more information on running Docker images, see the official Docs.
4. Ingesting Data / Triggering the Lens
Calling the process endpoint triggers the Lens, for example using a GET request: <lens-ip>:<lens-port>/process
→ http://127.0.0.1:8080/process
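As a minimal sketch, the trigger URL can be assembled from the host and port and called with curl. The actual call is commented out here, since it requires a Lens already running on this machine:

```shell
# Build the /process URL for a Lens running locally on the default port.
LENS_IP="127.0.0.1"   # assumption: the Lens is deployed on this machine
LENS_PORT="8080"      # the port published by the docker run command
PROCESS_URL="http://${LENS_IP}:${LENS_PORT}/process"
echo "$PROCESS_URL"
# curl -s "$PROCESS_URL"   # uncomment to trigger an ingestion run
```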
Once the ingestion and transformation have completed successfully, the Lens returns a JSON response containing a list of the URLs of all generated RDF files. Multiple files are generated only when you specify an SQL Limit and Offset in your mapping query.
Sample output:
{
  "outputFileLocation1": "file:///data/output/SQL-Lens-31235bfd-a6fb-43aa-bb51-e3d41c481983.nq",
  "outputFileLocation2": "file:///data/output/SQL-Lens-5b27a2a4-c971-4fd4-b128-d131bf5c3981.nq",
  "outputFileLocation3": "file:///data/output/SQL-Lens-5ecd6e81-65ff-4dbc-aed8-d6d320b7d04a.nq"
}
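The file URLs can be pulled out of a response like this with standard shell tools. Here the sample response is written to a temporary file purely for illustration; in practice you would capture the body of the HTTP response instead:

```shell
# Save the sample response shown above, then extract the generated file URLs.
cat > /tmp/lens-response.json <<'EOF'
{
  "outputFileLocation1": "file:///data/output/SQL-Lens-31235bfd-a6fb-43aa-bb51-e3d41c481983.nq",
  "outputFileLocation2": "file:///data/output/SQL-Lens-5b27a2a4-c971-4fd4-b128-d131bf5c3981.nq",
  "outputFileLocation3": "file:///data/output/SQL-Lens-5ecd6e81-65ff-4dbc-aed8-d6d320b7d04a.nq"
}
EOF
# Print one generated file URL per line.
grep -o 'file:///[^"]*' /tmp/lens-response.json
```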
Cron job
The ingestion of your DB data can be triggered in two ways. In addition to the RESTful service, there is also a built-in Quartz Time Scheduler. This uses a user-configurable Cron Expression to set up a time-based job that schedules the Lens to ingest your specified data from your database(s) periodically at fixed times, dates, or intervals. More information on this can be found in the full User Guide.
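Quartz cron expressions differ from standard Unix cron in that they start with a seconds field and use ? for "no specific value". A sketch of a daily schedule is shown below; how the expression is actually supplied to the Lens is covered in the User Guide:

```shell
# Quartz cron fields: seconds minutes hours day-of-month month day-of-week
# '?' means "no specific value" (one of the two day fields must be '?').
SCHEDULE="0 0 2 * * ?"   # fire every day at 02:00
echo "$SCHEDULE"
```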