Configurable Options - Lens Writer v1.4

Below are tables listing all of the configurable options for the Lens Writer. To see how to set config variables, see the Quick Start Guide or the full User Guide. Mandatory variables are highlighted in red.
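As a rough illustration of how these environment variables and their defaults fit together, the following Python sketch reads a subset of the Lens Writer variables and falls back to the defaults listed in the tables when a variable is unset. `load_writer_config`, `WriterConfig` and the validation rules are hypothetical. In particular, treating the six TRIPLESTORE_TYPE values named in the table as the complete set is an assumption, and the Writer's own parsing may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# ASSUMPTION: the six values named in the TRIPLESTORE_TYPE description are treated
# as the complete set; the Writer itself may accept others.
KNOWN_TRIPLESTORE_TYPES = {"sparql", "graphdb", "stardog", "blazegraph", "neptune", "neo4j"}


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a 'true'/'false' variable, using the documented default when unset or blank."""
    raw = env.get(name, "").strip().lower()
    if raw == "":
        return default
    if raw not in ("true", "false"):
        raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")
    return raw == "true"


@dataclass
class WriterConfig:  # hypothetical container, not part of the Lens Writer
    friendly_name: str
    license_key: str
    triplestore_endpoint: str
    triplestore_type: str
    triplestore_reasoning: bool
    run_standalone: bool
    ingestion_mode: str


def load_writer_config(env: Optional[Mapping[str, str]] = None) -> WriterConfig:
    """Read a subset of the Lens Writer variables, applying the defaults from the table."""
    env = os.environ if env is None else env
    store_type = env.get("TRIPLESTORE_TYPE", "sparql").strip().lower()
    if store_type not in KNOWN_TRIPLESTORE_TYPES:
        raise ValueError(f"unrecognised TRIPLESTORE_TYPE: {store_type!r}")
    mode = env.get("INGESTION_MODE", "insert").strip().lower()
    if mode not in ("insert", "update"):
        raise ValueError(f"INGESTION_MODE must be 'insert' or 'update', got {mode!r}")
    return WriterConfig(
        friendly_name=env.get("FRIENDLY_NAME", "Lens-Writer"),
        license_key=env.get("LICENSE", ""),  # no default given in the table
        triplestore_endpoint=env.get("TRIPLESTORE_ENDPOINT", ""),  # no default given
        triplestore_type=store_type,
        triplestore_reasoning=env_bool(env, "TRIPLESTORE_REASONING", False),
        run_standalone=env_bool(env, "LENS_RUN_STANDALONE", False),
        ingestion_mode=mode,
    )
```

For example, `load_writer_config({"TRIPLESTORE_TYPE": "GraphDB"})` normalises the type to `graphdb` and leaves every other field at its documented default.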

Lens Writer Configuration

Environment Variable

Default Value

Description

Version

FRIENDLY_NAME

Lens-Writer

The name to give your Writer instance.

v1.3+

LICENSE

 

The license key required to run the Writer.

v1.3+

TRIPLESTORE_ENDPOINT

 

The endpoint of the Triple Store to which you wish to upload your RDF.

v1.3+

TRIPLESTORE_TYPE

sparql

The Triple Store type. Some Triple Stores (e.g. AllegroGraph) support the default sparql type; others require a specific type declaration, including graphdb, stardog, blazegraph, neptune, and neo4j.

v1.3+

TRIPLESTORE_REASONING

false

Whether you want reasoning enabled or disabled.

v1.3+

TRIPLESTORE_USERNAME

 

The username of your Triple Store. Leave blank if your Triple Store does not require any authentication.

v1.3+

TRIPLESTORE_PASSWORD

 

The password of your Triple Store. Leave blank if your Triple Store does not require any authentication.

v1.3+

S3_REGION

us-east-1

The region in AWS where your files and services reside. Note: all services must be in the same region.

v1.3+

AWS_ACCESS_KEY

 

Your access key for AWS.

v1.3+

AWS_SECRET_KEY

 

Your secret key for AWS.

v1.3+

DELETE_SOURCE

false

Whether to delete the source N-Quads file after it has been written to the Triple Store.

v1.3+

LENS_RUN_STANDALONE

false

The Lens Writer is designed to run as part of a larger end-to-end system, in which the Lens provides the Writer with RDF files to write to a Triple Store. Kafka is used to communicate between the services in this system, and this communication is enabled by default. To run the Lens Writer standalone, without communicating with other services, set this property to true.

v1.3+

INGESTION_MODE

insert

How to process the ingested data.

  • 'insert': the new data are ingested in full and are not linked with existing data. The new dataset adds new values to existing subject-predicate pairs.

  • 'update': the new data are used to update the existing data. The new dataset replaces the values of existing subject-predicate pairs.

v1.4+
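The difference between the two INGESTION_MODE values can be illustrated with a small in-memory model in which the Triple Store is a map from (subject, predicate) pairs to sets of values. This is a hypothetical sketch: the `ingest` function and the data model are illustrative assumptions, not the Writer's implementation, which operates on your Triple Store.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Key = Tuple[str, str]  # (subject, predicate)


def ingest(store: Dict[Key, Set[str]],
           triples: Iterable[Tuple[str, str, str]],
           mode: str = "insert") -> Dict[Key, Set[str]]:
    """Apply (subject, predicate, object) triples to a hypothetical in-memory store.

    insert: new values are added alongside existing values for each subject-predicate.
    update: existing values for each subject-predicate in the batch are replaced.
    """
    if mode not in ("insert", "update"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Group the batch first so an update keeps every new value for a pair, not just the last.
    batch: Dict[Key, Set[str]] = defaultdict(set)
    for subject, predicate, obj in triples:
        batch[(subject, predicate)].add(obj)
    for key, values in batch.items():
        if mode == "insert":
            # Build a new set rather than mutating one the caller may still hold.
            store[key] = store.get(key, set()) | values
        else:
            store[key] = set(values)
    return store
```

With an existing value `"30"` for `("ex:alice", "ex:age")`, ingesting `"31"` in insert mode leaves both values, whereas update mode leaves only `"31"`. Pairs that do not appear in the new batch are untouched in either mode.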

Kafka Configuration

Environment Variable

Default Value

Description

Version

KAFKA_BROKERS

localhost:9092

The address of the Kafka broker, which tells the Writer where to find your Kafka cluster. Use the format <kafka-ip>:<kafka-port>; the recommended port is 9092.

v1.3+

KAFKA_TOPIC_NAME_SOURCE

source_urls

The topic from which the consumer reads messages containing the URLs of input files to ingest.

v1.3+

KAFKA_TOPIC_NAME_DLQ

dead_letter_queue

The topic to which messages containing the reasons for failures within the Writer are pushed. These messages are formatted as JSON.

v1.3+

KAFKA_TOPIC_NAME_SUCCESS

success_queue

The topic to which messages containing the file URLs of successfully transformed RDF data files are sent.

v1.3+

KAFKA_GROUP_ID_CONFIG

consumerGroup1

The identifier of the group this consumer belongs to.

v1.3+

KAFKA_AUTO_OFFSET_RESET_CONFIG

earliest

What to do when there is no initial offset in Kafka or if an offset is out of range.

earliest: automatically reset the offset to the earliest offset

latest: automatically reset the offset to the latest offset

v1.3+

KAFKA_MAX_POLL_RECORDS

100

The maximum number of records returned in a single call to poll.

v1.3+

KAFKA_TIMEOUT

1000000

Kafka consumer polling timeout.

v1.3+
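Two of the Kafka settings above carry small pieces of logic: splitting KAFKA_BROKERS into a host and port, and deciding where a consumer starts when KAFKA_AUTO_OFFSET_RESET_CONFIG applies. The sketch below assumes KAFKA_BROKERS holds a single `<kafka-ip>:<kafka-port>` pair (the table does not say whether a list is accepted) and that KAFKA_AUTO_OFFSET_RESET_CONFIG maps onto the standard Kafka consumer `auto.offset.reset` setting, as its name suggests. `parse_broker` and `starting_offset` are hypothetical helpers, not part of the Writer.

```python
from typing import Optional, Tuple


def parse_broker(value: str = "localhost:9092") -> Tuple[str, int]:
    """Split a single <kafka-ip>:<kafka-port> value into (host, port)."""
    host, sep, port = value.strip().rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected <kafka-ip>:<kafka-port>, got {value!r}")
    port_num = int(port)  # raises ValueError for a non-numeric port
    if not 0 < port_num < 65536:
        raise ValueError(f"port out of range: {port_num}")
    return host, port_num


def starting_offset(committed: Optional[int], earliest: int, latest: int,
                    reset: str = "earliest") -> int:
    """Pick where a consumer group starts reading one partition.

    committed: the group's committed offset, or None if there is none.
    earliest/latest: the first and end offsets currently held by the partition.
    """
    if committed is not None and earliest <= committed <= latest:
        return committed  # a valid committed offset is always used; reset does not apply
    if reset == "earliest":
        return earliest  # no offset, or out of range: start from the oldest record
    if reset == "latest":
        return latest  # no offset, or out of range: read only new records
    raise ValueError(f"unsupported reset policy: {reset!r}")
```

With the default `earliest`, a consumer group with no committed offset reprocesses every message still retained on the topic; `latest` skips those and waits for new messages.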

Neo4j Configuration

Environment Variable

Default Value

Description

Version

NEO4J_HANDLE_VOCAB_URIS

KEEP

  • 'SHORTEN': Full URIs are shortened using prefixes for property names, relationship names and labels

  • 'IGNORE': URIs are ignored and only local names are kept

  • 'MAP': Vocabulary element mappings are applied on import

  • 'KEEP': URIs are kept unchanged

v1.3+

NEO4J_APPLY_NEO4J_NAMING

false

When set to true and used in combination with NEO4J_HANDLE_VOCAB_URIS set to 'IGNORE', Neo4j capitalisation conventions are applied to vocabulary elements (all caps for relationship types, an initial capital for labels, etc.).

v1.3+

NEO4J_HANDLE_MULTIVAL

ARRAY

  • 'OVERWRITE': Property values are kept single-valued. Multiple values in the imported RDF are overwritten (only the last one is kept).

  • 'ARRAY': Properties are stored in an array enabling storage of multiple values.

v1.3+

NEO4J_KEEP_LANG_TAG

true

When set to true, the language tag is kept along with the property value. Useful for multilingual datasets.

v1.3+

NEO4J_TYPES_TO_LABEL

false

When set to true, rdf:type statements are imported as node labels in Neo4j.

v1.3+

NEO4J_VERIFY_URI_SYNTAX

true

By default, URI syntax is checked. This can be disabled by setting this parameter to false.

v1.3+

NEO4J_KEEP_CUSTOM_DATA_TYPES

true

When set to true, all properties containing a custom data type will be saved as a string followed by their custom data type IRIs.

v1.3+
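These options appear to mirror the neosemantics (n10s) RDF import parameters, such as the handleVocabUris parameter mentioned above. The sketch below illustrates the KEEP, IGNORE and SHORTEN choices for a vocabulary URI and the ARRAY and OVERWRITE choices for repeated property values. It is a hypothetical illustration: the `prefix__localName` form used for SHORTEN is an assumption, MAP is omitted because it depends on user-defined mappings, and the helper names are not part of the Writer or of n10s.

```python
from typing import Dict, Optional


def local_name(uri: str) -> str:
    """Return the part of a URI after the last '#' or '/'."""
    cut = max(uri.rfind("#"), uri.rfind("/"))
    return uri[cut + 1:]


def handle_vocab_uri(uri: str, mode: str = "KEEP",
                     prefixes: Optional[Dict[str, str]] = None) -> str:
    """Illustrate NEO4J_HANDLE_VOCAB_URIS for one property, relationship or label URI.

    prefixes maps namespace URIs to prefixes, e.g. {"http://schema.org/": "sch"}.
    """
    if mode == "KEEP":
        return uri  # URI kept unchanged
    if mode == "IGNORE":
        return local_name(uri)  # only the local name is kept
    if mode == "SHORTEN":
        # ASSUMPTION: shortened names take the form prefix__localName.
        for namespace, prefix in (prefixes or {}).items():
            if uri.startswith(namespace):
                return f"{prefix}__{uri[len(namespace):]}"
        raise KeyError(f"no prefix defined for {uri!r}")  # this sketch simply gives up
    raise ValueError(f"unsupported mode: {mode!r}")


def add_property(node: dict, key: str, value: str, multival: str = "ARRAY") -> dict:
    """Illustrate NEO4J_HANDLE_MULTIVAL when the same property arrives more than once."""
    if multival == "OVERWRITE":
        node[key] = value  # single-valued: only the last value survives
    elif multival == "ARRAY":
        node.setdefault(key, []).append(value)  # every value is kept
    else:
        raise ValueError(f"unsupported mode: {multival!r}")
    return node
```

For example, `http://schema.org/name` stays unchanged under KEEP, becomes `name` under IGNORE, and becomes `sch__name` under SHORTEN when `sch` is the prefix registered for `http://schema.org/`.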

Provenance Configuration

Environment Variable

Default Value

Description

Version

RECORD_PROVO

false

Currently, the Lens Writer does not generate its own provenance metadata, so this option is set to false.

v1.3+

Logging Configuration

Environment Variable

Default Value

Description

Version

LOGGING_LEVEL

WARN

Global log level

v1.3+

LOGGING_LOGGERS_DATALENS

DEBUG

Log level for Data Lens loggers

v1.3+

LOGGING_LOGGERS_DROPWIZARD

INFO

Log level for Dropwizard loggers

v1.3+

LOGGING_APPENDERS_CONSOLE_TIMEZONE

UTC

Timezone for console logging

v1.3+

LOGGING_APPENDERS_TXT_FILE_THRESHOLD

ALL

Threshold for text file logging

v1.3+

Log Format (not overridable)

%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n

Pattern for logging messages

v1.3+

Current Log Filename (not overridable)

/var/log/datalens/text/current/application_${applicationName}_${timeStamp}.txt.log

Pattern for log file name

v1.3+

LOGGING_APPENDERS_TXT_FILE_ARCHIVE

true

Whether to archive text log files

v1.3+

Archived Log Filename Pattern (not overridable)

/var/log/datalens/text/archive/application_${applicationName}_${timeStamp}_to_%d{yyyy-MM-dd}.txt.log

Log file rollover frequency depends on the date pattern in this file name: %d{yyyy-MM-dd} rolls over daily. For example, %d{yyyy-MM-ww} declares weekly rollover.

v1.3+

LOGGING_APPENDERS_TXT_FILE_ARCHIVED_TXT_FILE_COUNT

7

Maximum number of archived text log files

v1.3+

LOGGING_APPENDERS_TXT_FILE_TIMEZONE

UTC

Timezone for text file logging

v1.3+

LOGGING_APPENDERS_JSON_FILE_THRESHOLD

ALL

Threshold for JSON file logging

v1.3+

Log Format (not overridable)

%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n

Pattern for logging messages

v1.3+

Current Log Filename (not overridable)

/var/log/datalens/json/current/application_${applicationName}_${timeStamp}.json.log

Pattern for log file name

v1.3+

LOGGING_APPENDERS_JSON_FILE_ARCHIVE

true

Whether to archive JSON log files

v1.3+

Archived Log Filename Pattern (not overridable)

/var/log/datalens/json/archive/application_${applicationName}_${timeStamp}_to_%d{yyyy-MM-dd}.json.log

Log file rollover frequency depends on the date pattern in this file name: %d{yyyy-MM-dd} rolls over daily. For example, %d{yyyy-MM-ww} declares weekly rollover.

v1.3+

LOGGING_APPENDERS_JSON_FILE_ARCHIVED_FILE_COUNT

7

Maximum number of archived JSON log files

v1.3+

LOGGING_APPENDERS_JSON_FILE_TIMEZONE

UTC

Timezone for JSON file logging

v1.3+

LOGGING_APPENDERS_JSON_FILE_LAYOUT_TYPE

json

The layout type for the JSON logger

v1.3+
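The logging variables combine in two stages: a logger's level (LOGGING_LEVEL globally, overridden for Data Lens and Dropwizard loggers) decides whether an event is logged at all, and each appender's threshold then decides whether that event is written to the text or JSON file. The Python sketch below models that behaviour using the level ordering of Logback, on which Dropwizard logging is built. The logger-name prefix `com.datalens` is a hypothetical placeholder (the actual Data Lens package name is not given here), and `effective_level` and `is_written` are illustrative helpers, not part of the Writer.

```python
from typing import Dict

# Logback level ordering, lowest to highest; ALL and OFF are threshold-only values.
LEVELS = ["ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "OFF"]
RANK = {name: i for i, name in enumerate(LEVELS)}

ROOT_LEVEL = "WARN"  # LOGGING_LEVEL default
# "com.datalens" is a HYPOTHETICAL prefix; io.dropwizard is Dropwizard's package.
LOGGER_LEVELS: Dict[str, str] = {"com.datalens": "DEBUG", "io.dropwizard": "INFO"}


def effective_level(logger_name: str, overrides: Dict[str, str] = LOGGER_LEVELS,
                    root: str = ROOT_LEVEL) -> str:
    """Return the level of the longest configured prefix of logger_name, else the root level."""
    best = None
    for prefix in overrides:
        if logger_name == prefix or logger_name.startswith(prefix + "."):
            if best is None or len(prefix) > len(best):
                best = prefix
    return overrides[best] if best is not None else root


def is_written(logger_name: str, event_level: str, appender_threshold: str = "ALL") -> bool:
    """An event reaches a log file only if it passes both the logger level and the threshold."""
    rank = RANK[event_level]
    return rank >= RANK[effective_level(logger_name)] and rank >= RANK[appender_threshold]
```

With the defaults, a DEBUG event from a Data Lens logger is written to both files (threshold ALL), while an INFO event from any logger without an override is dropped because the global level is WARN.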
