Below is a table of all the configurable options in the Structured File Lens. To see how to set config variables, see the Quick Start Guide or the full User Guide. From version 2.0 onwards, less configuration is required to get your Lens operational: setting the Lens Directory is enough. In addition, no configuration is required at startup, as config can be updated on a running Lens.

...

Environment Variable

Entry

Default Value

Description

FRIENDLY_NAME

friendlyName

Structured-File-Lens

The name you wish to call your Lens.

LICENSE

license

 

The license key required for running the Lens. Only required when running a non-AWS Marketplace version of the Lens.

LENS_DIRECTORY

lensDirectory

file:///var/local/

The directory where all Lens files are stored (assuming the individual directory settings have not been edited). On startup, if this has been declared, the Lens creates folders at the specified location for mapping, output, yaml-mapping, prov output, and config backup.

MAPPINGS_DIR_URL

mappingsDirUrl

 file:///var/local/mapping/

The URL of the directory containing the mapping file(s). Can be local or remote, see here for more details. 

MASTER_MAPPING_FILE

masterMappingFile

mapping.ttl

The filename of the master mapping file 

YAML_MAPPINGS_DIR_URL

yamlMappingsDirUrl

file:///var/local/yaml-mapping/

The URL of the directory containing the yaml mapping file(s) if used.

OUTPUT_DIR_URL

outputDirUrl

 file:///var/local/output/

The URL of the directory you wish the generated RDF to be output to. Can be local or remote, see here for more details.

OUTPUT_FILE_FORMAT

outputFileFormat

nquads

The file type that will be constructed when the RDF is created. The options are: nquads, ntriples, jsonld, turtle, trig, and trix.

CONFIG_BACKUP

configBackup

file:///var/local/config-backup/

The URL of the directory where the config is backed up when the upload config endpoint is called.

MAX_CSV_ROWS

maxCsvRows

100000

The maximum number of rows a CSV file can contain before it is split into smaller files (of the specified length) that are processed individually.

VALIDATE_CSV

validateCsv

true

By default, the Lens checks that a CSV file is valid and removes any invalid or multiline rows. To turn this off, set this property to false.

S3_REGION

s3Region

us-east-1

The region in AWS where your files and services reside. Note: all services must be in the same region.

AWS_ACCESS_KEY

awsAccessKey

 

Your access key for AWS

AWS_SECRET_KEY

awsSecretKey

 

Your secret key for AWS

LENS_RUN_STANDALONE

runStandalone

true

Each Lens is designed to run as part of a larger end-to-end system whose end result is data uploaded to a Knowledge or Property Graph. As part of this process, Kafka is used to communicate between services. This is enabled by default; however, if you want to run the Lens standalone, without communicating with other services, set this property to true.

UPLOAD_CUSTOM_FUNCTIONS

uploadCustomFunctions

false

If the built-in functions do not perform an operation you need, you can create and use your own. To do this, set this variable to true and follow the instructions laid out in this guide.
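
As a rough illustration of how the defaults in the table above fit together, the following Python sketch resolves a handful of the settings from environment variables. It is not the Lens's own implementation: the helper name `resolve_settings` is hypothetical, and the rule that the sub-directories are derived from `LENS_DIRECTORY` unless set explicitly is an assumption based on the default values listed. The provenance output folder is omitted because its name is not given in the table.

```python
import os
from typing import Mapping, Optional

# Output formats listed in the OUTPUT_FILE_FORMAT row of the table.
OUTPUT_FORMATS = {"nquads", "ntriples", "jsonld", "turtle", "trig", "trix"}

# Sub-folder names taken from the default values in the table.
SUBDIRS = {
    "MAPPINGS_DIR_URL": "mapping/",
    "YAML_MAPPINGS_DIR_URL": "yaml-mapping/",
    "OUTPUT_DIR_URL": "output/",
    "CONFIG_BACKUP": "config-backup/",
}


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: apply the documented defaults to a set of variables."""
    env = os.environ if env is None else env

    lens_dir = env.get("LENS_DIRECTORY", "file:///var/local/")
    if not lens_dir.endswith("/"):
        lens_dir += "/"

    settings = {"LENS_DIRECTORY": lens_dir}
    for name, subdir in SUBDIRS.items():
        # Assumption: an explicitly set directory wins over the derived one.
        settings[name] = env.get(name, lens_dir + subdir)

    output_format = env.get("OUTPUT_FILE_FORMAT", "nquads")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported OUTPUT_FILE_FORMAT: {output_format}")
    settings["OUTPUT_FILE_FORMAT"] = output_format

    settings["MAX_CSV_ROWS"] = int(env.get("MAX_CSV_ROWS", "100000"))
    settings["VALIDATE_CSV"] = env.get("VALIDATE_CSV", "true").lower() == "true"
    return settings
```

Calling `resolve_settings({"LENS_DIRECTORY": "file:///data/lens"})` would, under these assumptions, place the mapping directory at `file:///data/lens/mapping/` while leaving every other setting at its documented default.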

 

Kafka Configuration

Environment Variable

Entry

Default Value

Description

KAFKA_BROKERS

kafkaBrokers

localhost:9092

The address the Lens uses to find your Kafka cluster, in the form <kafka-ip>:<kafka-port>. The recommended port is 9092.

KAFKA_TOPIC_NAME_SOURCE

topicNameSource

source_urls

The topic from which the Consumer reads messages containing the URLs of the input files to ingest.

KAFKA_TOPIC_NAME_DLQ

topicNameDLQ

dead_letter_queue

The topic to which the Lens pushes messages containing the reasons for a failure. These messages are represented as JSON.

KAFKA_TOPIC_NAME_SUCCESS

topicNameSuccess

success_queue

The topic to which messages containing the file URLs of the successfully transformed RDF data files are sent.

KAFKA_GROUP_ID_CONFIG

groupIdConfig

consumerGroup1

 The identifier of the group this consumer belongs to.

KAFKA_AUTO_OFFSET_RESET_CONFIG

autoOffsetResetConfig

earliest

What to do when there is no initial offset in Kafka or if an offset is out of range.

earliest: automatically reset the offset to the earliest offset

latest: automatically reset the offset to the latest offset

KAFKA_MAX_POLL_RECORDS

maxPollRecords

100

 The maximum number of records returned in a single call to poll.

KAFKA_TIMEOUT

timeout

1000000

Kafka consumer polling timeout.
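
Several of the variables above appear to correspond to standard Kafka consumer properties. The Python sketch below shows one plausible mapping, using the documented defaults. The mapping itself is an assumption inferred from the variable names, not taken from the Lens source, and `consumer_properties` is a hypothetical helper. The property names on the right-hand side (`bootstrap.servers`, `group.id`, `auto.offset.reset`, `max.poll.records`) are standard Kafka consumer configuration keys.

```python
from typing import Mapping

# Environment variable -> (standard Kafka consumer property, documented default).
# Assumption: this correspondence is inferred from the variable names.
CONSUMER_PROPERTIES = {
    "KAFKA_BROKERS": ("bootstrap.servers", "localhost:9092"),
    "KAFKA_GROUP_ID_CONFIG": ("group.id", "consumerGroup1"),
    "KAFKA_AUTO_OFFSET_RESET_CONFIG": ("auto.offset.reset", "earliest"),
    "KAFKA_MAX_POLL_RECORDS": ("max.poll.records", "100"),
}


def consumer_properties(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: build consumer properties from the variables above."""
    props = {
        prop: env.get(variable, default)
        for variable, (prop, default) in CONSUMER_PROPERTIES.items()
    }
    # Only the two values described in the table are accepted in this sketch.
    if props["auto.offset.reset"] not in {"earliest", "latest"}:
        raise ValueError("auto.offset.reset must be 'earliest' or 'latest'")
    props["max.poll.records"] = int(props["max.poll.records"])
    return props
```

The topic names and the timeout are left out because they are used when subscribing and polling rather than passed as consumer properties.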

...

Environment Variable

Default Value

Description

LOGGING_LEVEL

WARN

Global log level

LOGGING_APPENDERS_CONSOLE_TIMEZONE

UTC

Timezone for console logging

LOGGING_APPENDERS_TXT_FILE_THRESHOLD

ALL

Threshold for text logging

Log Format (not overridable)

%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n

Pattern for logging messages

Current Log Filename (not overridable)

/var/log/datalens/text/current/application_${applicationName}_${timeStamp}.txt.log

Pattern for log file name

LOGGING_APPENDERS_TXT_FILE_ARCHIVE

true

Archive log text files

Archived Log Filename Pattern (not overridable)

/var/log/datalens/text/archive/application_${applicationName}_${timeStamp}_to_%d{yyyy-MM-dd}.txt.log

Log file rollover frequency depends on the %d date pattern in this filename pattern. For example, %d{yyyy-MM-ww} declares a weekly rollover.

LOGGING_APPENDERS_TXT_FILE_ARCHIVED_TXT_FILE_COUNT

7

Max number of archived text files

LOGGING_APPENDERS_TXT_FILE_TIMEZONE

UTC

Timezone for text file logging

LOGGING_APPENDERS_JSON_FILE_THRESHOLD

ALL

Threshold for JSON logging

Log Format (not overridable)

%-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n

Pattern for logging messages

Current Log Filename (not overridable)

/var/log/datalens/json/current/application_${applicationName}_${timeStamp}.json.log

Pattern for log file name

LOGGING_APPENDERS_JSON_FILE_ARCHIVE

true

Archive JSON log files

Archived Log Filename Pattern (not overridable)

/var/log/datalens/json/archive/application_${applicationName}_${timeStamp}_to_%d{yyyy-MM-dd}.json.log

Log file rollover frequency depends on the %d date pattern in this filename pattern. For example, %d{yyyy-MM-ww} declares a weekly rollover.

LOGGING_APPENDERS_JSON_FILE_ARCHIVED_FILE_COUNT

7

Max number of archived JSON files

LOGGING_APPENDERS_JSON_FILE_TIMEZONE

UTC

Timezone for JSON file logging

LOGGING_APPENDERS_JSON_FILE_LAYOUT_TYPE

json

The layout type for the json logger
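
For anyone post-processing the text logs, the fixed log format above can be picked apart with a regular expression. The Python sketch below is one way to do that under stated assumptions: the helper `parse_log_line` is hypothetical, and because `%X{code}` may be empty, the code and message are returned together rather than split.

```python
import re
from typing import Optional

# Mirrors: %-6level [%d{HH:mm:ss.SSS}] [%t] %logger{5} - %X{code} %msg %n
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+)\s+"                      # level, left-justified and padded
    r"\[(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\] "    # HH:mm:ss.SSS
    r"\[(?P<thread>[^\]]*)\] "                    # thread name
    r"(?P<logger>\S+) - "                         # abbreviated logger name
    r"(?P<rest>.*)$"                              # code (may be empty) and message
)


def parse_log_line(line: str) -> Optional[dict]:
    """Hypothetical helper: split one text log line into its fields."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None
    fields = match.groupdict()
    fields["rest"] = fields["rest"].strip()
    return fields
```

Lines that do not fit the pattern, such as stack trace continuation lines, yield `None` and can be appended to the previous entry by the caller.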

...