This quick start guide gets the Stream Lens up and running as quickly and simply as possible, so you can start ingesting and transforming data straight away. For more in-depth instructions, see the User Guide.

In this guide we will set up and run the Lens as a Docker image deployed to your local machine; however, we support a number of cloud deployment technologies, including full support for AWS. Once deployed, you can use any of our ready-made sample input, mapping, and expected output files to test your Lens.

 

1. Creating the Mapping File

The first step in configuring the Stream Lens is to create a mapping file. The mapping file creates the links between your source data and your target model (ontology). You can create it with our online Data Lens Mapping Tool, which provides an intuitive web-based UI. Log in here to get started, and select the option for Structured File Lens. Like the Stream Lens, the Structured File Lens is capable of ingesting XML, CSV, and JSON files. The creation of mapping files differs slightly between file types, so be sure to select the correct options for your use case. Alternatively, if you wish to create your RML mapping files manually, there is a detailed step-by-step guide on creating one from scratch, along with a number of examples.

 

2. Configuring the Lens

All Lenses supplied by Data Lens are configurable through the use of Environment Variables. The following config options are required for running the Lens:

 

3. Running the Lens

All of our Lenses are designed and built to be versatile, allowing them to be set up and run in a number of environments, including in the cloud or on-premise. This is achieved through the use of Docker containers. For this quick start guide, we are going to use the simplest method of deployment, which is to run the Lens's Docker image locally. To do this, we will use a docker-compose file to build our stack. First, please ensure you have Docker installed.

  1. Once you have pulled the latest version of the Stream Lens, tag your local image as latest using: docker tag datalensltd/lens-stream:{VERSION} datalensltd/lens-stream:latest.

  2. Next, download the docker-compose file and the lens-configuration file. We will go into more detail about how the docker-compose file works in the full user guide.

  3. Configure your Lens by opening and adding your variables to the lens-configuration.env file.

    1. In this example, we have set our mapping and output directories to a local directory simply by assigning each variable a string value. The LICENSE entry has no value, so in this format its value will be taken from the machine's environment variable named LICENSE.

    2. The /tmp directory on the local machine is mounted as a volume at the /mnt/efs directory in the running Docker container. Therefore, if you wish to use locally stored input, mapping, and output files with the provided docker-compose file, you should store your files in the /tmp directory on your local machine.

      # Please configure your lens, example: KEY=VALUE. Leaving a field blank will take the value from your environment variables.
      
      LICENSE
      MAPPINGS_DIR_URL=file:///mnt/efs/mapping/
      OUTPUT_DIR_URL=file:///mnt/efs/output/
      PROV_OUTPUT_DIR_URL=file:///mnt/efs/prov-output/

  4. Please ensure the /tmp/data/ directory exists on the host machine using: mkdir -p /tmp/data/.

  5. Now, from within the directory where you downloaded the compose and config files, run the Lens using docker-compose up.
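The configuration file from step 3 can be checked before the stack is started. The following is a minimal sketch, not Docker's own parser: it mirrors the behaviour described in the file's comment, where a key with no value is taken from the host environment. The function name resolve_env_file is hypothetical and used only for illustration.

```python
import os


def resolve_env_file(text, environ=None):
    """Resolve KEY=VALUE lines from an env file.

    A bare KEY (no '=') falls back to the given environment, as the guide
    describes for LICENSE. This is an illustrative approximation only.
    """
    environ = os.environ if environ is None else environ
    resolved = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep:
            resolved[key] = value  # explicit value given in the file
        else:
            resolved[key] = environ.get(key)  # None if the host variable is unset
    return resolved
```

Any key that resolves to None here is one that was left blank in the file and is also missing from the host environment, so it is worth setting before running docker-compose up.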

For more information on running Docker images with docker-compose, see the official Docs.
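Because of the volume mount described in step 3, a file stored under /tmp on the host is seen by the Lens under /mnt/efs. The sketch below shows that translation for a single file, assuming the mount points from the provided docker-compose file are unchanged. The helper name host_path_to_lens_url is hypothetical.

```python
from pathlib import PurePosixPath

# Mount points as described in this guide; adjust if your compose file differs.
HOST_MOUNT = PurePosixPath("/tmp")
CONTAINER_MOUNT = PurePosixPath("/mnt/efs")


def host_path_to_lens_url(host_path):
    """Translate a host file path under /tmp into the file:// URL seen inside the container."""
    # relative_to raises ValueError when the path is outside the mounted directory.
    relative = PurePosixPath(host_path).relative_to(HOST_MOUNT)
    return f"file://{CONTAINER_MOUNT / relative}"
```

For example, host_path_to_lens_url("/tmp/input/input-data.csv") gives file:///mnt/efs/input/input-data.csv, which is the form used in the configuration values above.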

 

4. Ingesting Data / Triggering the Lens

The easiest way to ingest a file into the Stream Lens is to use the built-in endpoint. Using the process endpoint, you can specify the URL of a file to ingest, and in return, you will be provided with the URL of the generated RDF data file.

For example, using a GET request:

<lens-ip>:<lens-port>/process?inputFileURL=<input-file-url>

http://127.0.0.1:8080/process?inputFileURL=file:///var/local/input-data.csv
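The request can also be built and sent from a script. The following is a minimal sketch using only the Python standard library; the function names build_process_url and trigger_lens are hypothetical, and the http scheme and the timeout value are assumptions. Colons and slashes are left unescaped so that the result matches the example above, while other characters such as spaces are percent-encoded.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_process_url(host, port, input_file_url):
    """Build the GET URL for the process endpoint described in this guide."""
    # safe=":/" keeps the file URL readable; quote (not quote_plus) encodes spaces as %20.
    query = urlencode({"inputFileURL": input_file_url}, safe=":/", quote_via=quote)
    return f"http://{host}:{port}/process?{query}"


def trigger_lens(url, timeout=60):
    """Send the GET request and return the raw response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling trigger_lens(build_process_url("127.0.0.1", 8080, "file:///var/local/input-data.csv")) against a running Lens would return the response body as a string.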

Once an input file has been processed successfully, the Lens returns a JSON response. Within it is the outputFileLocations element, which contains a list of the URLs of all RDF files generated by that transformation.

Sample output:

{
    "input": "file:///mnt/efs/input/input-data.csv",
    "failedIterations": 0,
    "successfulIterations": 1,
    "outputFileLocations": [
        "file:///mnt/efs/output/Stream-Lens-44682bd6-3fbc-429b-988d-40dda8892328.nq"
    ]
}
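A response like the sample above can be read with a few lines of Python. This is a sketch based only on the fields shown in the sample; treating a failedIterations value of zero as success is an assumption, and the function name summarise_response is hypothetical.

```python
import json


def summarise_response(response_text):
    """Extract the generated file URLs and a simple success flag from a Lens response."""
    result = json.loads(response_text)
    return {
        # Assumption: zero failed iterations means the whole input was processed.
        "ok": result.get("failedIterations", 0) == 0,
        "outputs": list(result.get("outputFileLocations", [])),
    }
```

Applied to the sample output, the "outputs" entry holds the single .nq file URL and "ok" is True.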