AWS Marketplace Deployment

Data Lens offers all of its lenses as products in AWS Marketplace. Using AWS CloudFormation templates, we provide a one-click solution for deploying any of the lenses. Each template creates an AWS ECS service containing a task that runs the required lens. The template also creates all the other infrastructure required to host and access the ECS service: a VPC, one public subnet, two private subnets, route tables, security groups and roles.

For security, the service runs in a private subnet and can be accessed only from a bastion host in the public subnet. The second private subnet is required only so that a load balancer can be attached to the service; it is otherwise redundant.

This section provides information specific to running the lenses as products from AWS Marketplace. For more detailed instructions on working with lenses, please see the main user guide for the specific lens you are working with.

AWS Architecture

Quick Create Stacks

Each lens has a Quick Create Stack for deployment, which can be used once you have subscribed to the lens on the relevant AWS Marketplace product page. Before creating the stack, you only need to fill out the parameter fields, which let you input custom values when you create or update the stack, and check the box acknowledging that CloudFormation may create IAM resources. The parameters required for each lens are described below.

Structured File Lens

  • ECSAMI is the machine image for the bastion host EC2 instance. It defaults to the Amazon-recommended image and can be left as is unless you have very specific requirements for the bastion host.

  • KeyName is the name of an existing EC2 KeyPair that can be used for SSH access to the bastion host. It is required to run the lens, because the /process endpoint of the lens can only be reached by a REST request sent from the bastion host.

  • MappingsDirUrl is the URL of the directory where your mapping file is located. NB: this must be an S3 path, i.e. one beginning with s3:// . An example S3 path is included as the default and must be replaced with your actual bucket path. For the lens to be able to access your bucket, the stack must also be run in the same region as the bucket. Details for creating a mapping file can be found here.

  • OutputDirUrl is the URL of the directory where your transformed RDF output files should be stored. As with MappingsDirUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket.

  • ProvOutputDirUrl is the URL of the directory where your provenance output files should be stored. As with MappingsDirUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket.
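
Because the directory parameters above must all be S3 paths, it can help to check them before pasting them into the stack form. The following is a minimal sketch in Python, not part of the lens itself; the helper name parse_s3_dir_url and the bucket names are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlparse


def parse_s3_dir_url(url):
    """Split an s3:// directory URL into (bucket, key_prefix).

    Raises ValueError if the value is not an S3 path, which is the
    form the stack parameters are documented to require.
    """
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an S3 path beginning with s3://, got {url!r}")
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical parameter values; replace with your own bucket paths.
params = {
    "MappingsDirUrl": "s3://my-example-bucket/mappings/",
    "OutputDirUrl": "s3://my-example-bucket/output/",
    "ProvOutputDirUrl": "s3://my-example-bucket/prov-output/",
}

for name, value in params.items():
    bucket, prefix = parse_s3_dir_url(value)
    print(f"{name}: bucket={bucket} prefix={prefix}")
```

This only checks the form of the path. It does not confirm that the bucket exists or that it is in the same region as the stack, which still has to be verified in the AWS console.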

Once the parameters have been entered and the acknowledgment check box has been ticked, the stack can be created.

SQL Lens

  • ECSAMI is the machine image for the bastion host EC2 instance. It defaults to the Amazon-recommended image and can be left as is unless you have very specific requirements for the bastion host.

  • KeyName is the name of an existing EC2 KeyPair that can be used for SSH access to the bastion host. It is required to run the lens, because the /process endpoint of the lens can only be reached by a REST request sent from the bastion host.

  • MappingsDirUrl is the URL of the directory where your mapping file is located. NB: this must be an S3 path, i.e. one beginning with s3:// . An example S3 path is included as the default and must be replaced with your actual bucket path. For the lens to be able to access your bucket, the stack must also be run in the same region as the bucket. Details for creating a mapping file can be found here.

  • OutputDirUrl is the URL of the directory where your transformed RDF output files should be stored. As with MappingsDirUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket.

  • ProvOutputDirUrl is the URL of the directory where your provenance output files should be stored. As with MappingsDirUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket.

  • SQLLimit sets the maximum number of records that can be processed in any one query. If your database contains more records than this limit, the lens batch processes the records from the query and outputs multiple RDF files. To enable this, set the value to an integer greater than zero. It defaults to zero, meaning that iterative queries are switched off.

  • SQLOffset offsets the start index of the iterative processing. It defaults to zero.
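
To illustrate how SQLLimit and SQLOffset interact, here is a minimal Python sketch of limit/offset batching. It is not the lens's actual implementation; the function name batch_windows and the total_records argument are hypothetical, and the treatment of a zero limit as a single unbounded query is an assumption based on the description above.

```python
def batch_windows(total_records, sql_limit, sql_offset=0):
    """Yield (offset, limit) pairs covering the records to process.

    A limit of zero is treated as "iterative queries off": a single
    window with no limit (an assumption made for this illustration).
    """
    if sql_limit < 0 or sql_offset < 0:
        raise ValueError("SQLLimit and SQLOffset must not be negative")
    if sql_limit == 0:
        yield (sql_offset, None)
        return
    offset = sql_offset
    while offset < total_records:
        yield (offset, sql_limit)
        offset += sql_limit


# 2500 records with a limit of 1000 are covered by three batches,
# each of which would correspond to one output RDF file.
print(list(batch_windows(2500, 1000)))
# [(0, 1000), (1000, 1000), (2000, 1000)]
```

With an SQLOffset of 500 the same call would start at record 500 and produce two batches, (500, 1000) and (1500, 1000).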

Once the parameters have been entered and the acknowledgment check box has been ticked, the stack can be created.

RESTful Lens

  • ECSAMI is the machine image for the bastion host EC2 instance. It defaults to the Amazon-recommended image and can be left as is unless you have very specific requirements for the bastion host.

  • EndpointMode selects which of its two modes the RESTful Lens operates in, depending on whether the lens is being used with a standard RESTful endpoint or with one that conforms to the JSON:API specification.

  • JsonApiConfigUrl is the URL of the directory where the lens can obtain the JSON:API config file. For details on how to create a JSON:API config file please see here. NB: this must be an S3 path, i.e. one beginning with s3:// . An example S3 path is included as the default and must be replaced with your actual bucket path. For the lens to be able to access your bucket, the stack must also be run in the same region as the bucket. If you are not using JSON:API mode, this field is not required.

  • JsonRestConfigUrl is the URL of the directory where the lens can obtain the JSON REST config file. For details on how to create a JSON REST config file please see here. As with JsonApiConfigUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket. If you are not using JSON REST mode, this field is not required.

  • KeyName is the name of an existing EC2 KeyPair that can be used for SSH access to the bastion host. It is required to run the lens, because the /process endpoint of the lens can only be reached by a REST request sent from the bastion host.

  • MappingsDirUrl is the URL of the directory where your mapping file is located. Details for creating a mapping file can be found here. As with JsonApiConfigUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket.

  • OutputDirUrl is the URL of the directory where your transformed RDF output files should be stored. As with JsonApiConfigUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket.

  • ProvOutputDirUrl is the URL of the directory where your provenance output files should be stored. As with JsonApiConfigUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket.

Once the parameters have been entered and the acknowledgment check box has been ticked, the stack can be created.

Document Lens

  • ECSAMI is the machine image for the bastion host EC2 instance. It defaults to the Amazon-recommended image and can be left as is unless you have very specific requirements for the bastion host.

  • KeyName is the name of an existing EC2 KeyPair that can be used for SSH access to the bastion host. It is required to run the lens, because the /process endpoint of the lens can only be reached by a REST request sent from the bastion host.

  • OutputDirUrl is the URL of the directory where your transformed RDF output files should be stored. NB: this must be an S3 path, i.e. one beginning with s3:// . The example default must be replaced with your actual bucket path, and the stack must be created in the same region as the bucket.

  • ProvOutputDirUrl is the URL of the directory where your provenance output files should be stored. As with OutputDirUrl, the example default must be replaced with your own S3 path, and the stack must be created in the same region as the bucket.

Once the parameters have been entered and the acknowledgment check box has been ticked, the stack can be created.

Lens Writer

  • ECSAMI is the machine image for the bastion host EC2 instance. It defaults to the Amazon-recommended image and can be left as is unless you have very specific requirements for the bastion host.

  • IngestionMode determines how the ingested data is processed. In 'insert' mode the new data is ingested in full and is not linked with already existing data: the new dataset adds new values to existing subject-predicate combinations, creating additional triples. In 'update' mode the new data is used to update the existing data: the new dataset replaces the object values of existing subject-predicate combinations, amending the existing triples.

  • KeyName is the name of an existing EC2 KeyPair that can be used for SSH access to the bastion host. It is required to run the lens, because the /process endpoint of the lens can only be reached by a REST request sent from the bastion host.

  • TripleStoreEndpoint is the endpoint of the triple store that you wish to upload RDF to.

  • TripleStorePassword is the password for the triple store. This can be left blank if your triple store does not require any authentication.

  • TripleStoreType is the type of your triple store. It is required because the methods the Lens Writer needs in order to connect vary slightly between stores. AllegroGraph supports the default sparql type, but GraphDB, Stardog, Blazegraph, Neptune, Neo4j and RDFox have their own implementations.

  • TripleStoreUsername is the username for your triple store. This can be left blank if your triple store does not require any authentication.
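
The difference between the two IngestionMode values can be sketched with plain Python sets of (subject, predicate, object) tuples. This is only an illustration of the semantics described above, not the Lens Writer's actual code; the function name ingest and the example triples are hypothetical.

```python
def ingest(existing, incoming, mode):
    """Return the triples held after ingesting `incoming` into `existing`.

    'insert' keeps every existing triple and adds the new ones.
    'update' first drops existing triples that share a subject-predicate
    pair with an incoming triple, so the new object values replace the old.
    """
    existing, incoming = set(existing), set(incoming)
    if mode == "insert":
        return existing | incoming
    if mode == "update":
        replaced = {(s, p) for (s, p, _o) in incoming}
        kept = {t for t in existing if (t[0], t[1]) not in replaced}
        return kept | incoming
    raise ValueError(f"unknown ingestion mode: {mode!r}")


existing = {("ex:alice", "ex:age", "30"), ("ex:alice", "ex:name", "Alice")}
incoming = {("ex:alice", "ex:age", "31")}

# insert: ex:alice ends up with two ex:age values (30 and 31).
print(sorted(ingest(existing, incoming, "insert")))
# update: the ex:age value 30 is replaced by 31.
print(sorted(ingest(existing, incoming, "update")))
```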

Once the parameters have been entered and the acknowledgment check box has been ticked, the stack can be created.

Connecting To Lenses

Once the creation of a stack has completed in CloudFormation, an Outputs section should be available. For every lens, the Outputs section contains the DNS name of the bastion host and the DNS name of the load balancer attached to the ECS cluster, with the suffix for the endpoint that runs the process method on the lens (ApiEndpoint).

As the lens is in a private subnet, we need to SSH into the bastion host from our local terminal to be able to send REST requests to the lens. The format for this command is as follows:

ssh -i "<private-key-file>" ec2-user@<bastion-host-dns-name>

The private key file is the one belonging to the key pair we named in the KeyName parameter. The bastion host DNS name is provided as an Output from the stack. Putting these into the above format gives a command such as:

ssh -i "cloudformation-key.pem" ec2-user@ec2-3-228-6-67.compute-1.amazonaws.com

From the bastion host we can then use curl to perform a GET request to the ApiEndpoint value, providing whatever specific parameters the lens requires.
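
As an alternative to curl, the same GET request can be sketched in Python, assuming Python 3 is available on the bastion host. The load balancer hostname and the query parameter name exampleParam below are placeholders, not values defined by any lens; consult the user guide for your lens for the real parameter names. Prepending http:// when the ApiEndpoint value has no scheme is also an assumption.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def build_process_url(api_endpoint, params):
    """Attach URL-encoded query parameters to the ApiEndpoint value."""
    if "://" not in api_endpoint:
        # Assumption: plain HTTP when the stack output carries no scheme.
        api_endpoint = "http://" + api_endpoint
    parts = urlsplit(api_endpoint)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


def call_lens(api_endpoint, params, timeout=60):
    """Send the GET request and return the response body as text."""
    with urlopen(build_process_url(api_endpoint, params), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder endpoint and parameter, for illustration only.
url = build_process_url(
    "example-lb.elb.amazonaws.com/process",
    {"exampleParam": "s3://my-example-bucket/input.csv"},
)
print(url)
```

Only build_process_url is exercised here; call_lens performs the actual request and must be run from the bastion host, since the load balancer is not reachable from outside the VPC.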