...

Once you have your Structured File Lens and your Lens Writer up and running, the example below describes an end-to-end, enterprise-ready, highly scalable system for ingesting your structured files (CSV/XML/JSON) into your Knowledge or Property Graph. The intended flow of your data through the systems is as follows:

Source File System → Kafka → Structured File Lens → Kafka → Lens Writer → Triple Store

The first thing to determine is where your source data files are stored. Whether locally, remotely, or in an S3 Bucket, the Structured File Lens must be told where to retrieve these files from. Here we utilise message queues in the form of Apache Kafka. By setting up a Kafka Producer that publishes to the topic named in the Lens’ KAFKA_TOPIC_NAME_SOURCE config variable (defaults to “source_urls”), you can send file URLs directly to the Lens. Once set up, each message sent from the Producer must consist solely of the URL of the file, for example, s3://examplebucket/folder/input-data.csv. Additionally, if you are using Kafka and S3 Buckets, you can use our AWS Lambda to automatically push a message to the Kafka Queue whenever a new file is uploaded to your S3 Bucket, which in turn will automatically trigger the Lens to ingest and transform this data.
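As a minimal sketch, assuming a Kafka broker reachable at localhost:9092 and the default source_urls topic (both are placeholders to adjust for your own deployment), a Producer that pushes a single file URL to the Lens could look like the following, written with the kafka-python client:

  # Minimal sketch: send one file URL to the Structured File Lens via Kafka.
  # The broker address is an assumption; the topic must match your
  # KAFKA_TOPIC_NAME_SOURCE setting (default "source_urls").
  from kafka import KafkaProducer  # kafka-python client

  producer = KafkaProducer(bootstrap_servers="localhost:9092")

  # The message body must be nothing but the file URL itself.
  producer.send("source_urls", b"s3://examplebucket/folder/input-data.csv")
  producer.flush()  # ensure the message is delivered before the script exits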

...

Once you have your SQL Lens and your Lens Writer up and running, the example below describes an end-to-end, enterprise-ready, highly scalable system for ingesting data from your Relational SQL Databases into your Knowledge or Property Graph. The intended flow of your data through the systems is as follows:

Relational SQL Database → Cron Scheduler / API Endpoint → SQL Lens → Kafka → Lens Writer → Triple Store

As seen in the SQL Lens User Guide, the connection to your Database lies within the mapping files that you have created. The Lens can be triggered to start ingesting data from your DB in two ways. One is to use the exposed API Endpoint: this is simply a GET request targeting the Lens, for example, http://<lens-ip>:<lens-port>/process. The other is to use a Cron Expression to set up a time-based job scheduler, which will schedule the Lens to ingest your specified data from your database(s) periodically at fixed times, dates, or intervals.
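As a sketch of the API route, assuming the Lens is reachable at localhost:8080 (substitute your own <lens-ip> and <lens-port>), a single ingestion run can be triggered from Python with the requests library:

  # Minimal sketch: trigger one SQL Lens ingestion run over HTTP.
  # The host and port are placeholder assumptions for <lens-ip>:<lens-port>.
  import requests

  response = requests.get("http://localhost:8080/process")
  print(response.status_code, response.text)  # inspect the reply to confirm the run started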

...

RESTful Lens + Kafka + Lens Writer

Once you have your RESTful Lens and your Lens Writer up and running, the example below describes an end-to-end, enterprise-ready, highly scalable system for ingesting data from your REST API endpoint into your Knowledge or Property Graph. The intended flow of your data through the systems is as follows:

REST API → Cron Scheduler / API Endpoint → RESTful Lens → Kafka → Lens Writer → Triple Store

As seen in the RESTful Lens User Guide, the connection to your REST API lies within the JSON_REST_CONFIG_URL configuration variable. The Lens can be triggered to start ingesting data from your API in two ways. One is to use the Lens' exposed API Endpoint: this is simply a GET request targeting the Lens, for example, http://<lens-ip>:<lens-port>/process. The other is to use a Cron Expression to set up a time-based job scheduler, which will schedule the Lens to ingest your API data periodically at fixed times, dates, or intervals.
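As a sketch of the scheduled route, the snippet below stands in for a cron job by calling the Lens' /process endpoint once an hour; the host, port, and interval are illustrative assumptions, and in production the equivalent would normally be a crontab entry or a job scheduler driven by your Cron Expression:

  # Minimal sketch: trigger the RESTful Lens periodically, as a cron job would.
  # Host, port, and the one-hour interval are assumptions; adjust to taste.
  import time
  import requests

  LENS_PROCESS_URL = "http://localhost:8080/process"  # placeholder for <lens-ip>:<lens-port>

  while True:
      requests.get(LENS_PROCESS_URL)  # ask the Lens to run one ingestion pass
      time.sleep(60 * 60)             # wait an hour before the next run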

...

Document Lens + Kafka + Lens Writer

Once you have your Document Lens and your Lens Writer up and running, the example below describes an end-to-end, enterprise-ready, highly scalable system for ingesting your document files (PDF/doc(x)/txt) into your Knowledge or Property Graph. The intended flow of your data through the systems is as follows:

Source File System → Kafka → Document Lens → Kafka → Lens Writer → Triple Store

The first thing to determine is where your source data files are stored. Whether locally, remotely, or in an S3 Bucket, the Document Lens must be told where to retrieve these files from. Here we utilise message queues in the form of Apache Kafka. By setting up a Kafka Producer that publishes to the topic named in the Lens’ KAFKA_TOPIC_NAME_SOURCE config variable (defaults to “source_urls”), you can send file URLs directly to the Lens. Once set up, each message sent from the Producer must consist solely of the URL of the file, for example, s3://examplebucket/folder/input-data.pdf. Additionally, if you are using Kafka and S3 Buckets, you can use our AWS Lambda to automatically push a message to the Kafka Queue whenever a new file is uploaded to your S3 Bucket, which in turn will automatically trigger the Lens to ingest and transform this data.
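The packaged AWS Lambda provides this S3-to-Kafka notification out of the box; purely to illustrate the mechanism, a hand-rolled handler might look like the sketch below (the broker environment variable and the handler itself are assumptions, not the packaged Lambda):

  # Illustrative sketch only: forward S3 "ObjectCreated" events to Kafka.
  # KAFKA_BROKERS is an assumed environment variable, not part of the packaged Lambda.
  import os
  from kafka import KafkaProducer  # kafka-python client

  producer = KafkaProducer(bootstrap_servers=os.environ.get("KAFKA_BROKERS", "localhost:9092"))

  def lambda_handler(event, context):
      # Each S3 event record names the bucket and key of the newly uploaded file.
      for record in event["Records"]:
          bucket = record["s3"]["bucket"]["name"]
          key = record["s3"]["object"]["key"]
          # The message body is just the s3:// URL, exactly as the Lens expects.
          producer.send("source_urls", f"s3://{bucket}/{key}".encode("utf-8"))
      producer.flush()
      return {"statusCode": 200}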

...