...
This is the full User Guide for the Lens Writer. It contains an in-depth set of instructions to fully set up, configure, and run the Writer so you can start writing data as part of an end-to-end system. For a guide to getting the Writer up and running in the quickest and simplest way possible, see the Quick Start Guide. Once deployed, you can utilise any of our ready-made sample NQuad output files to test your Writer. For a list of what has changed since the last release, visit the User Release Notes.
...
Table of Contents
...
As with the Lenses supplied by Data Lens, the Lens Writer is also configurable through the use of Environment Variables. How to declare these environment variables will differ slightly depending on how you choose to run the Writer, so please see Running the Lens Writer for more info. For a breakdown of every configuration option in the Lens Writer, see the full list here.
Mandatory Configuration
For the Lens Writer to operate, the following configuration options are required.
License - `LICENSE`
This is the license key required to operate the Lens; request your new unique license key here.
Triple Store Endpoint - `TRIPLESTORE_ENDPOINT`
This is the endpoint of the Triple Store you wish to upload your RDF to, and it is therefore required for the Lens Writer to work.
Triple Store Type - `TRIPLESTORE_TYPE`
This is the type of your Triple Store. Some graphs support the default `sparql` type (e.g. AllegroGraph); however, certain graphs require a specific type declaration. These include `graphdb`, `stardog`, `blazegraph`, `neptune`, and `neo4j`. Please see the Types of Graph section for more info.
Triple Store Username and Password - `TRIPLESTORE_USERNAME` and `TRIPLESTORE_PASSWORD`
These are the username and password for your Triple Store. You can leave these fields blank if your Triple Store does not require any authentication.
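As a sketch, the mandatory options above might be supplied as environment variables when running the Writer as a Docker container. The image name, tag, port mapping, and all values below are placeholders, not official defaults; substitute your own deployment details.

```shell
# Hypothetical example: start the Lens Writer with the mandatory
# configuration passed as environment variables. Image name, port,
# and all values shown are placeholders.
docker run -d \
  -e LICENSE="your-license-key" \
  -e TRIPLESTORE_ENDPOINT="http://triplestore.example.com:7200/repositories/demo" \
  -e TRIPLESTORE_TYPE="graphdb" \
  -e TRIPLESTORE_USERNAME="admin" \
  -e TRIPLESTORE_PASSWORD="secret" \
  datalens/lens-writer:latest
```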
...
One of the many ways to interface with the Writer is through the use of Apache Kafka. With the Lens Writer, a Kafka Message Queue can be used for managing the input of RDF data into the Writer. To properly set up your Kafka Cluster, see the instructions here. Once complete, use the following Kafka configuration variables to connect the cluster with your Writer. If you do not wish to use Kafka, please set the variable `LENS_RUN_STANDALONE` to `true`.
...
All other Kafka configuration variables can be found here, all of which have default values that can be overridden.
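As a sketch of overriding one such default, the success-topic name could be set at container start-up. Only `KAFKA_TOPIC_NAME_SUCCESS` and `LENS_RUN_STANDALONE` are named in this guide; the image name and topic value below are placeholders.

```shell
# Hypothetical example: override the default success-topic name
# ("success_queue") when starting the Writer. Image name and the
# chosen topic value are placeholders.
docker run -d \
  -e KAFKA_TOPIC_NAME_SUCCESS="my_success_queue" \
  -e LENS_RUN_STANDALONE="false" \
  datalens/lens-writer:latest
```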
...
Logging in the Lens Writer works the same way as in the Lens and, like most functionality, is configurable through the use of environment variables; the list of override-able options and their descriptions can be found here. When running the Lens Writer locally from the command line using the instructions below, the Writer will automatically log to your terminal instance. In addition, archived logs are saved within the Docker container at `/var/log/datalens/archive/current/` and `/var/log/datalens/json/archive/` for text and JSON logs respectively, while the current logs can be found at `/var/log/datalens/text/current/` and `/var/log/datalens/json/current/`. By default, a maximum of 7 log files will be archived for each file type; however, this can be overridden. If running a Writer in the cloud in an AWS environment, connect to your instance via SSH or PuTTY; the logging locations outlined above apply there too.
...
Neo4j is a Property Graph database management system with native graph storage and processing. As Neo4j is not a Semantic Knowledge Graph, storing RDF data in Neo4j in a lossless manner may require additional configuration options to be set. The defaults have been set, as seen here, for the most likely scenario, and also to allow imported RDF to be subsequently exported without losing a single triple in the process. As with all config, this can be overridden to suit your needs. For more information on how your data is represented in Neo4j see below.
Optional Configuration
There is also a further selection of optional configurations for given situations, see here for the full list.
Accessing the configuration of a running Writer
Once a Writer has started and is operational, you can view its current config by calling one of the Writer’s built-in APIs; this is explained in more detail below. Please note that in order to change any config variable on a running Writer, it must be shut down and restarted.
...
The deployment approach we recommend at Data Lens is to use Amazon Web Services, both to store your source and RDF data, and to host and run your Lenses and Writer. We have written a brief DevOps guide intended to support you in deploying Data Lens into AWS.
...
The workflow the guide aims to achieve is as follows:
A source data file is placed into the S3 bucket
The Lambda is monitoring this bucket and notifies Kafka
The Lens reads the message from Kafka and transforms the source data file into RDF
The transformed data is passed to the Writer, which writes it to a Semantic Knowledge Graph or Property Graph
This is achieved by setting up the following architecture:
An Amazon Web Services Elastic Container Service (ECS) cluster, hosting a single EC2 instance, running the following containers:
Apache Kafka
Data Lens: Lens Writer
An S3 bucket
Info: For more information on the Architecture and Deployment of an Enterprise System, see our guide.
...
Ingesting RDF Data
The Lens Writer is designed to ingest RDF data, most commonly in the form of NQuads (.nq) files, and this can be done in a number of ways.
...
To use a local URL for directories and files, both the formats `file:///var/local/data-lens-output/` and `/var/local/data-lens-output/` are supported.
To use a remote http(s) URL for files, `https://example.com/input-rdf-file.nq` is supported.
To use a remote AWS S3 URL for directories and files, `s3://example/folder/` is supported, where the format is `s3://<bucket-name>/<directory>/<file-name>`. If you are using an S3 bucket for any directory, you must specify an AWS access key and secret key.
Also included in the Writer is the ability to delete your source NQuad input files after they have been ingested into your Triple Store. This is done by setting the `DELETE_SOURCE` config value to `true`. Enabling this means that your S3 Bucket or local file store will not continuously fill up with RDF NQuad data generated from your Lenses.
Endpoint
First, the easiest way to ingest an RDF file into the Lens Writer is to use the built-in APIs. Using the `process` GET endpoint, you can specify the URL of an RDF file to ingest, and in return, you will be provided with the success status of the operation.
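A minimal sketch of such a call, assuming the Writer is reachable on `localhost:8080` and that the endpoint accepts the file URL as a query parameter named `url` (the host, port, and parameter name are assumptions; consult the endpoint reference for the actual values in your deployment):

```shell
# Hypothetical example: trigger ingestion of a single RDF file via
# the process GET endpoint. Host, port, and the "url" query
# parameter name are assumptions, not confirmed values.
curl "http://localhost:8080/process?url=s3://examplebucket/folder/input-rdf-data.nq"
```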
...
The second, and the more versatile and scalable, ingestion method is to use a message queue such as Apache Kafka. To set up a Kafka Cluster, follow the instructions here; in short, to ingest RDF files into the Lens Writer you require a Producer. The topic name this Producer publishes to must be the same name that you specified in the `KAFKA_TOPIC_NAME_SUCCESS` config option (defaults to `success_queue`). Please ensure that this is the same as the success queue topic name in the Lenses you wish to ingest transformed data from. Once set up, if manually pushing data to Kafka, each message sent from the Producer must consist solely of the URL of the file, for example `s3://examplebucket/folder/input-rdf-data.nq`.
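As a sketch, a file URL can be pushed manually with Kafka's stock console producer. The broker address is a placeholder, and the topic name assumes the default `success_queue`.

```shell
# Hypothetical example: manually push one file URL onto the success
# topic using Kafka's console producer. The broker address
# (localhost:9092) is a placeholder for your cluster's address.
echo "s3://examplebucket/folder/input-rdf-data.nq" | \
  kafka-console-producer.sh \
    --bootstrap-server localhost:9092 \
    --topic success_queue
```

Each message body is just the bare URL, matching the requirement above that messages consist solely of the file URL.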
...
For more information on how the provenance is laid out, as well as how to query it from your Triple Store, see the Provenance Guide.
...
REST API Endpoints
In addition to the Process Endpoint designed for ingesting data into the Writer, there is a selection of built-in exposed endpoints for you to call.
...
As previously outlined in the Ingesting Data via Endpoint section, using the process endpoint is one way of triggering the Lens to ingest your source data. When an execution of the Writer fails after being triggered in this way, the response will be a status `400 Bad Request`, as follows.
...