Introduction
The previous example showed how to set up a local Kubernetes cluster and run Sneller. Running the cluster inside AWS EKS is a much more practical solution and provides the following advantages:
- More flexible scaling
- Use S3 for object storage
- Use IAM roles for least-privileged operation
Instead of running all scripts manually, it is much better to provision your infrastructure using code. AWS does provide CloudFormation, but we'll use Terraform instead. Terraform also works for other cloud environments, such as Microsoft Azure or Google Cloud Platform. Additionally, it has providers for Kubernetes, Helm and other systems.
Creating the complete infrastructure consists of two steps:
- Step 1: Create the VPC and EKS cluster
- Step 2: Deploy Sneller
You can also find all the scripts in the https://github.com/SnellerInc/examples/tree/master/terraform/install-eks repository.
Step 1: Create the VPC and EKS cluster
The first step will set up the VPC and EKS cluster. This is based on the Provision an EKS Cluster (AWS) example in the Hashicorp documentation. Make sure to check it out to get more in-depth knowledge about setting up the VPC and EKS cluster.
Setting up Terraform
The first step is to set up Terraform. It uses the AWS provider with the current user's AWS credentials, so make sure you have sufficient rights.
This script uses the following variables:
- region specifies the AWS region where the cluster is deployed.
- instance_type specifies the instance type of the EKS nodes. Make sure to choose an instance type that supports AVX-512 and has enough memory to cache data in memory.
- prefix specifies a prefix that is used for all global resources. Some resources need globally unique names (e.g. S3 buckets). If you don't specify a prefix, then a random 4-character prefix is generated instead.
- step1/main.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
}
}
}
provider "aws" {
region = var.region
}
variable "region" {
type = string
description = "AWS region"
default = "us-east-1"
}
variable "instance_type" {
type = string
description = "Instance type for the EKS nodes"
default = "m5.large"
}
variable "prefix" {
type = string
description = "Prefix for all resources (required to make resources unique)"
default = "" # a 4 character random prefix will be used, when left empty
}
To ensure that we always have a prefix, we need some “magic” to create a randomized prefix if no prefix was set. Note that Terraform also stores the random prefix in the state, so it won’t change between runs.
- step1/prefix.tf
resource "random_string" "random_prefix" {
length = 4
special = false
numeric = false
upper = false
}
locals {
prefix = var.prefix != "" ? "${var.prefix}-" : "${random_string.random_prefix.id}-"
}
Creating the VPC
The EKS cluster will run in its own private VPC named <prefix>sneller. The VPC will use up to 3 availability zones to ensure high availability. It creates both a public and a private subnet for each availability zone.
- step1/vpc.tf
data "aws_availability_zones" "available" {
state = "available"
}
locals {
cluster_name = "${local.prefix}sneller"
cidr = "10.0.0.0/16"
}
module "vpc" {
# See https://registry.terraform.io/modules/terraform-aws-modules/vpc/aws/latest
source = "terraform-aws-modules/vpc/aws"
version = "4.0.2"
name = "${local.prefix}sneller"
cidr = local.cidr
# Use up to 3 availability zones
azs = slice(data.aws_availability_zones.available.names, 0, 3)
public_subnets = [for i, az in module.vpc.azs : cidrsubnet(local.cidr, 8, i)]
private_subnets = [for i, az in module.vpc.azs : cidrsubnet(local.cidr, 8, 128 + i)]
enable_nat_gateway = true
single_nat_gateway = true
enable_dns_hostnames = true
public_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/elb" = 1
}
private_subnet_tags = {
"kubernetes.io/cluster/${local.cluster_name}" = "shared"
"kubernetes.io/role/internal-elb" = 1
}
}
Create the Kubernetes cluster (EKS)
The EKS cluster is created using the following script. There is only a single node group that uses on-demand instances. The eks_managed_node_groups map can be extended to also make use of spot instances (see the sketch after the script); refer to the terraform-aws-modules/eks/aws documentation for more information.
- step1/eks.tf
module "eks" {
# See https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest
source = "terraform-aws-modules/eks/aws"
version = "19.15.1"
cluster_name = local.cluster_name
cluster_version = "1.26"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.private_subnets
cluster_endpoint_public_access = true
eks_managed_node_group_defaults = {
instance_types = [var.instance_type]
}
eks_managed_node_groups = {
primary = {
name = "primary"
instance_types = [var.instance_type]
min_size = 1
max_size = 10
desired_size = 3
}
}
}
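As an illustration of the extension mentioned above, a second node group that requests spot capacity could look roughly like this. This is only a sketch: the spot group, its name, sizes and capacity_type are not part of the original configuration and should be tuned to your workload.
eks_managed_node_groups = {
  primary = {
    name           = "primary"
    instance_types = [var.instance_type]
    min_size       = 1
    max_size       = 10
    desired_size   = 3
  }
  # Hypothetical additional node group that uses spot instances
  spot = {
    name           = "spot"
    capacity_type  = "SPOT"
    instance_types = [var.instance_type]
    min_size       = 0
    max_size       = 10
    desired_size   = 2
  }
}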
Output
We need some information from this step in the next step, so all relevant values are exported as Terraform outputs.
- step1/output.tf
output "region" {
description = "AWS region"
value = var.region
}
output "prefix" {
description = "Prefix for all resources"
value = local.prefix
}
output "cluster_name" {
description = "Kubernetes Cluster Name"
value = module.eks.cluster_name
}
output "vpc_id" {
description = "VPC identifier"
value = module.vpc.vpc_id
}
output "provider_arn" {
description = "OIDC provider ARN"
value = module.eks.oidc_provider_arn
}
Create the infrastructure
This is all we need to create the VPC and EKS cluster, so you can now initialize Terraform and apply the script:
export TF_VAR_prefix=test # make sure to use your own prefix here
export TF_VAR_region=us-east-2
terraform init # only needed once
terraform apply
If everything is fine, then Terraform will show you a detailed plan
of the required infrastructure changes. If you want to make changes,
then alter the variables or Terraform scripts and run terraform apply
again.
Note that some changes may require a rebuild of the VPC and/or EKS cluster. If they are recreated, then you lose all data inside the cluster. ALWAYS check the Terraform plan before making any changes!!
If you need to destroy the VPC and EKS cluster, then run terraform destroy. Make sure you first destroy the infrastructure of step 2 before destroying the infrastructure of step 1.
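For example, using the step1 and step2 directories from the repository layout used in this tutorial:
cd step2
terraform destroy   # remove the Sneller deployment first
cd ../step1
terraform destroy   # then tear down the EKS cluster and VPC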
Step 2: Deploy Sneller
Setting up Terraform
The first step is to set up Terraform again for this step. It uses the AWS provider, but it now also uses the Kubernetes and Helm providers to provision the cluster.
This script reads the output from the previous step (via the data.terraform_remote_state.step1 resource), so it knows about the prefix, EKS cluster, etc. It also introduces the following new variables:
- namespace specifies the namespace within the Kubernetes cluster (defaults to sneller).
- database / table specify the name of the database and table.
- step2/main.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
}
kubernetes = {
source = "hashicorp/kubernetes"
}
helm = {
source = "hashicorp/helm"
}
}
}
provider "aws" {
region = data.terraform_remote_state.step1.outputs.region
}
provider "kubernetes" {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = [
"eks",
"get-token",
"--cluster-name",
data.aws_eks_cluster.cluster.name
]
}
}
provider "helm" {
kubernetes {
host = data.aws_eks_cluster.cluster.endpoint
cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority.0.data)
exec {
api_version = "client.authentication.k8s.io/v1beta1"
command = "aws"
args = [
"eks",
"get-token",
"--cluster-name",
data.aws_eks_cluster.cluster.name
]
}
}
}
variable "namespace" {
type = string
description = "Kubernetes namespace"
default = "sneller"
}
variable "database" {
type = string
description = "Database name"
default = "tutorial"
}
variable "table" {
type = string
description = "Table name"
default = "table1"
}
data "terraform_remote_state" "step1" {
backend = "local"
config = {
path = "../step1/terraform.tfstate"
}
}
data "aws_eks_cluster" "cluster" {
name = local.cluster_name
}
locals {
region = data.terraform_remote_state.step1.outputs.region
vpc_id = data.terraform_remote_state.step1.outputs.vpc_id
prefix = data.terraform_remote_state.step1.outputs.prefix
cluster_name = data.terraform_remote_state.step1.outputs.cluster_name
provider_arn = data.terraform_remote_state.step1.outputs.provider_arn
}
Creating the namespace
It is good practice to group resources for a specific service within a namespace, so a new namespace is created using the following resource:
- step2/namespace.tf
# Kubernetes namespace for Sneller
resource "kubernetes_namespace" "sneller" {
metadata {
name = var.namespace
}
}
Creating the S3 buckets
Sneller uses two kinds of buckets:
- Source buckets that hold the data that will be ingested. The data can either stay in these buckets or it can be removed after ingestion.
- Ingestion bucket that holds the data that has been ingested by Sneller. The query engine always uses this data, so make sure it isn’t deleted (it’s not a cache). You can always export data back to the original JSON format.
Source bucket
First we'll create the source bucket and make sure public access is denied. In this example we'll also add three small NDJSON-encoded files to the bucket, so we have some trial data to work with.
- step2/s3-source.tf
locals {
sneller_source_prefix = "sample_data/"
}
resource "aws_s3_bucket" "sneller_source" {
bucket = "${local.prefix}sneller-source"
tags = {
Name = "Source bucket for Sneller"
}
}
resource "aws_s3_bucket_public_access_block" "sneller_source" {
bucket = aws_s3_bucket.sneller_source.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_object" "sneller_source_data" {
for_each = fileset(path.module, "${local.sneller_source_prefix}*")
key = each.key
bucket = aws_s3_bucket.sneller_source.id
source = each.key
}
These are the three NDJSON-encoded data files:
- step2/sample_data/test1.ndjson
{"value":1}
- step2/sample_data/test2.ndjson
{"value":2}
{"value":3}
- step2/sample_data/test3.ndjson
{"value":4}
{"value":5}
{"value":6}
Ingestion bucket
The ingestion bucket should also disallow public access. It holds the table definition file that is stored at s3://<ingestion-bucket>/db/<dbname>/<tablename>/definition.json and points to the sample data files.
- step2/s3-ingest.tf
resource "aws_s3_bucket" "sneller_ingest" {
bucket = "${local.prefix}sneller-ingest"
force_destroy = true
tags = {
Name = "Ingest bucket for Sneller"
}
}
resource "aws_s3_bucket_public_access_block" "sneller_ingest" {
bucket = aws_s3_bucket.sneller_ingest.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_object" "table_def" {
key = "db/${var.database}/${var.table}/definition.json"
bucket = aws_s3_bucket.sneller_ingest.id
content = jsonencode({
input = [
{
pattern = "s3://${aws_s3_bucket.sneller_source.bucket}/${local.sneller_source_prefix}*.ndjson"
format = "json"
}
]
})
}
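For reference, with the default variables and a prefix of test (a hypothetical value), the generated definition.json object would look roughly like this:
{
  "input": [
    {
      "pattern": "s3://test-sneller-source/sample_data/*.ndjson",
      "format": "json"
    }
  ]
}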
Service Accounts
Kubernetes has a concept of service accounts to give workloads in the cluster an identity with custom security policies. A service account is a Kubernetes concept, but it can be mapped to an IAM role, as explained in detail in IAM roles for service accounts. This ensures that the services within the pod run with the security policies of that IAM role.
We have two services and we'll apply the principle of least privilege to both:
- sdb requires read-only rights to the source bucket and read/write access to the ingestion bucket.
- snellerd requires only read-only rights to the ingestion bucket.
Both services will use their own dedicated service account. Mapping a service account to an IAM role requires the Kubernetes service account itself, an IAM role (with some IAM policies attached), and a trust relationship, so that when EKS requests a certain service account, the pod is granted the identity of the associated IAM role.
The service account itself is decorated with the eks.amazonaws.com/role-arn
annotation that holds the requested IAM role for the service account. The IAM
role itself is created and a trust-relationship is set up to allow the cluster’s
OIDC provider to use this role for specific service accounts. This is a rather
complicated setup, but Terraform makes this simple by using the
iam-role-for-service-accounts-eks
submodule.
The service account definition with IAM role for sdb
looks like this:
- step2/sa-sdb.tf
resource "kubernetes_service_account" "sdb" {
metadata {
name = "sdb"
namespace = kubernetes_namespace.sneller.metadata[0].name
annotations = {
"eks.amazonaws.com/role-arn" : module.iam_eks_sdb.iam_role_arn
}
}
}
module "iam_eks_sdb" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "5.20.0"
role_name = "${local.prefix}sdb"
role_policy_arns = {
policy = aws_iam_policy.sdb.arn
}
oidc_providers = {
main = {
provider_arn = local.provider_arn
namespace_service_accounts = ["${var.namespace}:sdb"]
}
}
}
resource "aws_iam_policy" "sdb" {
name = "${local.prefix}sdb"
policy = data.aws_iam_policy_document.sdb.json
}
data "aws_iam_policy_document" "sdb" {
# Read access for the source bucket
statement {
actions = ["s3:ListBucket"]
resources = [aws_s3_bucket.sneller_source.arn]
}
statement {
actions = ["s3:GetObject"]
resources = ["${aws_s3_bucket.sneller_source.arn}/*"]
}
# Read/Write access for the ingest bucket
statement {
actions = ["s3:ListBucket"]
resources = [aws_s3_bucket.sneller_ingest.arn]
condition {
test = "StringLike"
variable = "s3:prefix"
values = ["db/*"]
}
}
statement {
actions = ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"]
resources = ["${aws_s3_bucket.sneller_ingest.arn}/db/*"]
}
}
The service account definition with IAM role for snellerd
looks like this:
- step2/sa-snellerd.tf
resource "kubernetes_service_account" "snellerd" {
metadata {
name = "snellerd"
namespace = kubernetes_namespace.sneller.metadata[0].name
annotations = {
"eks.amazonaws.com/role-arn" : module.iam_eks_snellerd.iam_role_arn
}
}
}
module "iam_eks_snellerd" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "5.20.0"
role_name = "${local.prefix}snellerd"
role_policy_arns = {
policy = aws_iam_policy.snellerd.arn
}
oidc_providers = {
main = {
provider_arn = local.provider_arn
namespace_service_accounts = ["${var.namespace}:snellerd"]
}
}
}
resource "aws_iam_policy" "snellerd" {
name = "${local.prefix}snellerd"
policy = data.aws_iam_policy_document.snellerd.json
}
data "aws_iam_policy_document" "snellerd" {
# Read access for the ingest bucket
statement {
actions = ["s3:ListBucket"]
resources = [aws_s3_bucket.sneller_ingest.arn]
condition {
test = "StringLike"
variable = "s3:prefix"
values = ["db/*"]
}
}
statement {
actions = ["s3:GetObject"]
resources = ["${aws_s3_bucket.sneller_ingest.arn}/db/*"]
}
}
The great thing about mapping service accounts to IAM roles is that we don't need static AWS access keys/secrets anymore. The service account uses an automatically rotating web identity that can be exchanged for temporary AWS credentials. These temporary AWS credentials are only valid for about an hour and rotate automatically.
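You can see this mechanism at work from inside a running pod: EKS injects the IAM role and a web identity token file as environment variables, which the AWS SDK picks up automatically. A minimal check, assuming the Helm release has created a sneller-snellerd deployment in the sneller namespace:
kubectl exec -n sneller deploy/sneller-snellerd -- env | grep AWS_
# AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE should be listed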
Deploy Sneller
The final step is to actually deploy Sneller. We have created a Helm chart to allow for easy installation. The chart installs three pods that run the Sneller daemon and schedules an sdb job that ingests new data every minute.
The index file is protected using a 256-bit index key that should be treated as a secret. It is used by sdb to sign the index file and by the Sneller daemon to verify that the index file hasn't been tampered with. Note that the index file also contains the ETag (hash) of the ingested data files, so tampering with data files will immediately render them invalid.
The chart's service accounts are mapped to the service accounts that we created before, and the S3 bucket is set to the ingest bucket.
- step2/sneller.tf
locals {
commit = substr("7cf4289fb3bcc03464b9f9228391bd7a3348346b", 0, 7)
}
resource "random_id" "index_key" {
byte_length = 32
}
resource "helm_release" "sneller" {
name = "sneller"
namespace = kubernetes_namespace.sneller.metadata[0].name
repository = "https://charts.sneller.ai"
chart = "sneller"
version = "0.0.0-${local.commit}"
set {
name = "snellerd.image"
value = "snellerinc/snellerd:${local.commit}-master"
}
set {
name = "sdb.image"
value = "snellerinc/sdb:${local.commit}-master"
}
set {
name = "snellerd.serviceAccountName"
value = kubernetes_service_account.snellerd.metadata[0].name
type = "string"
}
set {
name = "sdb.serviceAccountName"
value = kubernetes_service_account.sdb.metadata[0].name
type = "string"
}
set {
name = "sdb.cronJob"
value = "* * * * *"
}
set {
name = "sdb.database"
value = var.database
type = "string"
}
set {
name = "sdb.tablePattern"
value = var.table
type = "string"
}
set {
name = "snellerd.replicaCount"
value = 3 # TODO: Fetch from the number of actual nodes
}
set {
name = "secrets.index.values.snellerIndexKey"
value = random_id.index_key.b64_std
type = "string"
}
set {
name = "secrets.s3.values.awsRegion"
value = aws_s3_bucket.sneller_ingest.region
}
set {
name = "configuration.values.s3Bucket"
value = "s3://${aws_s3_bucket.sneller_ingest.bucket}"
}
}
Output
We need some information from this step when querying Sneller, so all relevant values are exported as Terraform outputs again.
- step2/output.tf
output "namespace" {
description = "Kubernetes namespace"
value = var.namespace
}
output "database" {
description = "Database name"
value = var.database
}
output "table" {
description = "Table name"
value = var.table
}
Create the infrastructure
This is all we need to actually deploy Sneller, so you can now initialize Terraform and apply the script:
terraform init # only needed once
terraform apply
If everything is fine, then Terraform will show you a detailed plan
of the required infrastructure changes. If you want to make changes,
then alter the variables or Terraform scripts and run terraform apply
again.
Note that some changes may require re-ingesting the data or you may even lose your source data. ALWAYS check the Terraform plan before making any changes!
Using Sneller
With this setup, the Sneller daemon can by default only be accessed from within the Kubernetes cluster. First we need to get access to our cluster, so we'll issue the following commands to update the kubeconfig file.
cd ../step1
export EKS_NAME=$(terraform output -json cluster_name | jq -r '.')
export EKS_REGION=$(terraform output -json region | jq -r '.')
aws eks update-kubeconfig --region $EKS_REGION --name $EKS_NAME
cd ../step2
export SNELLER_NAMESPACE=$(terraform output -json namespace | jq -r '.')
export SNELLER_DATABASE=$(terraform output -json database | jq -r '.')
export SNELLER_TABLE=$(terraform output -json table | jq -r '.')
kubectl config set-context --current --namespace=$SNELLER_NAMESPACE
Sneller always requires a bearer token for access. This bearer token can be fetched as follows:
export SNELLER_TOKEN=$(kubectl get secret sneller-token --template={{.data.snellerToken}} | base64 --decode)
Accessing Sneller within the cluster
The Helm chart exposes Sneller as a service, so it can be accessed in the standard way. You can fire up a pod inside the cluster using the following command (make sure you have already read the token into SNELLER_TOKEN):
kubectl run test --restart=Never --rm -i --tty --image=curlimages/curl:latest \
--env "SNELLER_TOKEN=$SNELLER_TOKEN" \
--env "SNELLER_NAMESPACE=$SNELLER_NAMESPACE" \
--env "SNELLER_DATABASE=$SNELLER_DATABASE" \
--env "SNELLER_TABLE=$SNELLER_TABLE" \
--command -- /bin/sh
You should now be running inside a pod in the cluster and you can invoke a query using the following statement:
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/json" \
-s "http://sneller-snellerd.$SNELLER_NAMESPACE.svc.cluster.local:8000/query?database=$SNELLER_DATABASE" \
--data-raw "SELECT COUNT(*) FROM $SNELLER_TABLE"
Once you are done, you can simply enter exit
and the pod is gone.
Accessing Sneller locally (using port forwarding)
It's also possible to use kubectl to set up port forwarding to your local machine:
kubectl port-forward service/sneller-snellerd 8000 > /dev/null &
SNELLERD_PID=$!
The Sneller daemon port-forwarding is running in the background and can be
stopped again using kill $SNELLERD_PID
when it’s not needed anymore. For now
we’ll keep it running. Now that port-forwarding is active, we should be able to
access the Sneller daemon:
curl http://localhost:8000
Now you can invoke a query using:
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/json" \
"http://localhost:8000/query?database=$SNELLER_DATABASE" \
--data-raw "SELECT COUNT(*) FROM $SNELLER_TABLE"
Using ingress to expose the Sneller daemon
The typical way to expose REST endpoints to the outside world is by using an Ingress controller. Kubernetes supports different Ingress controllers, but in this example we'll use the AWS Load Balancer Controller.
Allowing ingress using AWS Load Balancer Controller
A full walk-through about installing the AWS Load Balancer Controller can be found in the AWS documentation.
Creating the service account
The Load Balancer Controller needs to create AWS resources, so it requires some IAM permissions. The iam-role-for-service-accounts-eks submodule creates an IAM role with the proper policy attached and ensures that the role can be assumed by the specified Kubernetes service account.
- step3/sa-aws-load-balancer-controller.tf
locals {
alb_service_account_name = "aws-load-balancer-controller"
}
resource "kubernetes_service_account" "aws_load_balancer" {
metadata {
name = local.alb_service_account_name
namespace = var.namespace
labels = {
"app.kubernetes.io/name" = local.alb_service_account_name
"app.kubernetes.io/component" = "controller"
}
annotations = {
"eks.amazonaws.com/role-arn" = module.lb_role.iam_role_arn
"eks.amazonaws.com/sts-regional-endpoints" = "true"
}
}
}
module "lb_role" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
role_name = "${local.prefix}sneller-lb"
attach_load_balancer_controller_policy = true
oidc_providers = {
main = {
provider_arn = local.provider_arn
namespace_service_accounts = ["${var.namespace}:${local.alb_service_account_name}"]
}
}
}
Installing the AWS Load Balancer Controller
With the appropriate service account in place, the AWS Load Balancer Controller can be installed using the following Helm release:
- step3/aws-load-balancer-controller.tf
resource "helm_release" "lb" {
name = "aws-load-balancer-controller"
repository = "https://aws.github.io/eks-charts"
chart = "aws-load-balancer-controller"
namespace = var.namespace
depends_on = [
kubernetes_service_account.aws_load_balancer
]
set {
name = "clusterName"
value = local.cluster_name
}
set {
name = "region"
value = local.region
}
set {
name = "vpcId"
value = local.vpc_id
}
set {
name = "serviceAccount.create"
value = "false"
}
set {
name = "serviceAccount.name"
value = kubernetes_service_account.aws_load_balancer.metadata[0].name
}
}
The Helm chart requires several values, but they are straightforward. Check the documentation to view all the chart values.
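Once the infrastructure has been applied, you can verify that the controller is running. This assumes the default sneller namespace and the default deployment name created by the chart:
kubectl get deployment aws-load-balancer-controller -n sneller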
Choosing a hostname in your domain
When the service is accessed from the internet, it needs a fully qualified domain name (FQDN). The FQDN is composed of the hostname and domain variables:
- step3/ingress-vars.tf
variable "hostname" {
type = string
description = "Hostname (excluding domain) at which the Sneller service should be available"
}
variable "domain" {
type = string
description = "Domain where the Sneller service should be available"
}
locals {
fqdn = "${var.hostname}.${var.domain}"
}
output "fqdn" {
description = "FQDN of the Sneller service"
value = local.fqdn
}
Changing the service type
The default Sneller service type is ClusterIP, which only allows access from within the cluster, so we need to change the service type to NodePort instead. The service type can be set using the snellerd.serviceType value in the Helm chart, as shown below.
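In Terraform this is just another set block on the helm_release (the same setting appears again in the full step3/sneller.tf further down):
set {
  name  = "snellerd.serviceType"
  value = "NodePort"
}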
Creating the ingress resource
The AWS Load Balancer Controller works in a straightforward way: each ingress resource with the proper annotations is exposed via an AWS load balancer. Check the documentation for a complete list of all annotations.
First, we'll enable ingress by setting the Helm chart value ingress.enabled to true and ingress.hosts.0 to the FQDN of the service. To expose the service to the internet, the following annotations should be set too:
- kubernetes.io/ingress.class is set to alb to create an application load balancer.
- alb.ingress.kubernetes.io/scheme is set to internet-facing to ensure a public load balancer that is accessible from the internet.
- alb.ingress.kubernetes.io/certificate-arn can optionally be set to the ARN of the certificate that is used to enable TLS.
Now that the Sneller configuration depends on the AWS Load Balancer Controller, it shouldn't be created before the controller has been installed. This can be done by adding depends_on = [helm_release.lb] to the Sneller Helm release.
The updated sneller.tf
now has some additional settings:
- step3/sneller.tf
locals {
commit = substr("7cf4289fb3bcc03464b9f9228391bd7a3348346b", 0, 7)
}
resource "random_id" "index_key" {
byte_length = 32
}
resource "helm_release" "sneller" {
depends_on = [helm_release.lb]
name = "sneller"
namespace = kubernetes_namespace.sneller.metadata[0].name
repository = "https://charts.sneller.ai"
chart = "sneller"
version = "0.0.0-${local.commit}"
set {
name = "snellerd.image"
value = "snellerinc/snellerd:${local.commit}-master"
}
set {
name = "sdb.image"
value = "snellerinc/sdb:${local.commit}-master"
}
set {
name = "snellerd.serviceAccountName"
value = kubernetes_service_account.snellerd.metadata[0].name
type = "string"
}
set {
name = "sdb.serviceAccountName"
value = kubernetes_service_account.sdb.metadata[0].name
type = "string"
}
set {
name = "sdb.cronJob"
value = "* * * * *"
}
set {
name = "sdb.database"
value = var.database
type = "string"
}
set {
name = "sdb.tablePattern"
value = var.table
type = "string"
}
set {
name = "snellerd.replicaCount"
value = 3 # TODO: Fetch from the number of actual nodes
}
set {
name = "secrets.index.values.snellerIndexKey"
value = random_id.index_key.b64_std
type = "string"
}
set {
name = "secrets.s3.values.awsRegion"
value = aws_s3_bucket.sneller_ingest.region
}
set {
name = "configuration.values.s3Bucket"
value = "s3://${aws_s3_bucket.sneller_ingest.bucket}"
}
# The following settings are only used when exposing
# Sneller via the AWS ingress controller.
set {
name = "snellerd.serviceType"
value = "NodePort"
}
set {
name = "ingress.enabled"
value = true
}
set {
name = "ingress.annotations.alb\\.ingress\\.kubernetes\\.io/scheme"
value = "internet-facing"
}
set {
name = "ingress.annotations.kubernetes\\.io/ingress\\.class"
value = "alb"
}
set {
name = "ingress.hosts.0"
value = local.fqdn
}
# The following settings are only used when exposing
# Sneller using TLS certificates
set {
name = "ingress.annotations.alb\\.ingress\\.kubernetes\\.io/certificate-arn"
value = aws_acm_certificate.sneller.arn
}
}
Create a DNS entry
The AWS load balancer that is created for the ingress has a complicated name
that is generated by AWS, such as k8s-sneller-snellers-a9299ac223-552088442.us-east-1.elb.amazonaws.com
,
so it should be mapped to the FQDN that has been defined using the hostname
and domain
variables.
Domain hosted in same AWS account
If the domain is hosted in the same AWS account, then you can use the following Terraform script to create an alias to the AWS load balancer:
- step3/dns.tf
data "aws_route53_zone" "domain" {
name = var.domain
}
data "aws_lb" "sneller" {
depends_on = [helm_release.sneller]
tags = {
"elbv2.k8s.aws/cluster" = local.cluster_name
"ingress.k8s.aws/resource" = "LoadBalancer"
"ingress.k8s.aws/stack" = "${var.namespace}/sneller-snellerd"
}
}
resource "aws_route53_record" "ingress_alias" {
zone_id = data.aws_route53_zone.domain.zone_id
name = local.fqdn
type = "A"
alias {
name = "dualstack.${data.aws_lb.sneller.dns_name}"
zone_id = data.aws_lb.sneller.zone_id
evaluate_target_health = true
}
}
Domain hosted outside the AWS account
If the domain is not hosted in this AWS account, then you should create
a CNAME
entry in your domain that points to the AWS load balancer. Once
the load balancer has been created, it can be obtained using the following
command:
kubectl get ingress sneller-snellerd
Make sure you update your DNS configuration, so the actual FQDN points to the load balancer.
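If you want to retrieve just the load balancer hostname (for example to script the DNS update), something like the following should work:
kubectl get ingress sneller-snellerd \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'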
If your DNS provider also provides a Terraform plug-in, then you can use
Terraform to create this CNAME
entry too.
Create a certificate
AWS automates certificate management using AWS Certificate Manager (ACM), which can issue and renew certificates automatically. In this setup, ACM validates the certificate using a DNS challenge. If you want to automate this process, then ensure that the required DNS entries are created.
Domain hosted in same AWS account
If the domain is hosted in the same AWS account, then you can use the following Terraform script to generate a certificate and the required DNS entries that are used for validation.
- step3/certificate.tf
resource "aws_acm_certificate" "sneller" {
domain_name = local.fqdn
# beware of including domains used by other regions. See https://github.com/SnellerInc/sneller-core/issues/2784
subject_alternative_names = []
validation_method = "DNS"
lifecycle {
create_before_destroy = true
}
}
resource "aws_route53_record" "sneller_cert" {
for_each = {
for dvo in aws_acm_certificate.sneller.domain_validation_options : dvo.domain_name => {
name = dvo.resource_record_name
record = dvo.resource_record_value
type = dvo.resource_record_type
}
}
allow_overwrite = true
name = each.value.name
records = [each.value.record]
ttl = 60
type = each.value.type
zone_id = data.aws_route53_zone.domain.zone_id
}
resource "aws_acm_certificate_validation" "sneller" {
timeouts { # to avoid getting stuck for the entire default duration i.e. 75min
create = "15m"
}
certificate_arn = aws_acm_certificate.sneller.arn
validation_record_fqdns = [for record in aws_route53_record.sneller_cert : record.fqdn]
}
Domain hosted outside the AWS account
If the domain is not hosted in this AWS account, then you should create the required DNS entries in your domain by hand (or automate it using your DNS provider’s Terraform plug-in).
You may also want to use another ingress controller (e.g. nginx) and use cert-manager to issue the certificates. cert-manager can also validate certificates using an HTTP challenge, which may be easier in some situations.
Create the infrastructure
Now that the ingress and certificate are set up, we'll update the cluster with the new configuration:
export TF_VAR_hostname=sneller
export TF_VAR_domain=example.com # make sure to use your own domain
terraform apply
If everything is fine, then Terraform will show you a detailed plan
of the required infrastructure changes. If you want to make changes,
then alter the variables or Terraform scripts and run terraform apply
again.
Run some queries
Export the following variables to obtain the proper database, table, Sneller end-point and token:
export SNELLER_DATABASE=$(terraform output -json database | jq -r '.')
export SNELLER_TABLE=$(terraform output -json table | jq -r '.')
export SNELLER_ENDPOINT=$(terraform output -json fqdn | jq -r '.')
export SNELLER_TOKEN=$(kubectl get secret sneller-token --template={{.data.snellerToken}} | base64 --decode)
Now you can run the query directly on the Sneller end-point using TLS:
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/json" \
-s "https://$SNELLER_ENDPOINT/query?database=$SNELLER_DATABASE" \
--data-raw "SELECT COUNT(*) FROM $SNELLER_TABLE"
Adding data
Data can be added simply by uploading new objects to the source bucket. They will be automatically picked up by the sdb cron job that runs every minute.
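For example, assuming the test prefix and the sample_data/ key prefix used earlier (adjust the bucket name to match your own prefix):
echo '{"value":7}' > test4.ndjson
aws s3 cp test4.ndjson s3://test-sneller-source/sample_data/test4.ndjson
Because the file matches the sample_data/*.ndjson pattern in definition.json, it will be ingested on the next sdb run and show up in query results within a minute or so.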