Prerequisites
Before ingesting AWS logging with these scripts, make sure that:
- You already have credentials to access your AWS account.
- You have registered with Sneller Cloud and have a bearer token.
- You have set up Sneller with a proper ingestion bucket and IAM roles.
If you followed the cloud onboarding, then you should be fine. You can download the Terraform scripts as follows:
git clone https://github.com/snellerinc/examples
cd examples/terraform/ingest-aws-logging
All examples are written for Linux and should also work on macOS. They have also been tested in WSL2 (Windows Subsystem for Linux).
Summary
The scripts have been written to work with the onboarding scripts, so you should be able to run them like this:
export TF_VAR_sneller_token=<your-bearer-token>
terraform init # only needed once
terraform apply
The Terraform scripts perform the following tasks:
- Create an S3 bucket for AWS logging and allow AWS to store logging in it. The script tries to detect the prefix that was used during onboarding and uses the same prefix for the logging bucket.
- Allow the Sneller IAM role read-only access to the bucket with AWS logging, so it can be ingested.
- Create a table definition (database: aws, table: cloudtrail) that ingests the CloudTrail logging.
- Create a table definition (database: aws, table: flow) that ingests the default VPC flow logging.
AWS batches the delivery of log events, so it may take a while before data shows up in Sneller. Browsing through the AWS console ensures that some API calls are invoked on your account, and spinning up an EC2 instance in the default VPC also generates some VPC flow activity.
If you haven't done so already, set the environment variables that are used to access the Sneller query engine:
export SNELLER_TOKEN=<your token here>
export SNELLER_ENDPOINT=https://snellerd-production.<region>.sneller.ai
Now run the following command to determine the number of events per service (via CloudTrail):
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/json" \
-s "$SNELLER_ENDPOINT/query?database=aws" \
--data-raw "SELECT eventSource, COUNT(*) FROM cloudtrail GROUP BY eventSource ORDER BY COUNT(*) DESC LIMIT 100"
The following command determines the number of flow records per interface ID (via VPC flow logs):
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/json" \
-s "$SNELLER_ENDPOINT/query?database=aws" \
--data-raw "SELECT interface_id, COUNT(*) FROM flow GROUP BY interface_id ORDER BY COUNT(*) DESC LIMIT 10"
Details
Setting up Terraform
These scripts depend on the AWS and Sneller providers. The AWS provider uses the current user’s AWS credentials, so make sure you have sufficient permissions.
This script uses the following variables:
- region specifies the AWS region of your Sneller instance (default: us-east-1).
- sneller_token should hold the Sneller bearer token. If it’s not set, Terraform will ask for it.
- prefix specifies a prefix that is used for the S3 bucket name. If you don’t specify a prefix, the script tries to autodetect the prefix used during onboarding or generates a new random prefix.
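As a minimal sketch, you could also set these variables via a terraform.tfvars file instead of the TF_VAR_... environment variables shown above (the values below are hypothetical placeholders):
# terraform.tfvars (example values, replace with your own)
region        = "us-east-1"
sneller_token = "<your-bearer-token>"
prefix        = "acme" # optional; leave it out to autodetect or generate a prefix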
- main.tf
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
}
sneller = {
source = "snellerinc/sneller"
}
}
}
provider "aws" {
region = var.region
}
provider "sneller" {
api_endpoint = "https://api-production.${var.region}.sneller.ai/"
default_region = var.region
token = var.sneller_token
}
variable "region" {
type = string
description = "AWS region"
default = "us-east-1"
}
variable "sneller_token" {
type = string
description = "Sneller token"
}
variable "database" {
type = string
description = "Database name for the AWS logging tables"
default = "aws"
}
variable "prefix" {
type = string
description = "Prefix for all resources (required to make resources unique)"
default = "" # a 4 character random prefix will be used, when left empty
}
data "aws_caller_identity" "current" {}
data "aws_partition" "current" {}
The next steps require the SQS queue and IAM role that Sneller uses, so these are obtained via the sneller_tenant_region data source, which provides this information.
- sneller-tenant-region.tf
data "sneller_tenant_region" "sneller" {
region = var.region
}
locals {
# Role name is the text after the slash in the IAM role ARN
sneller_iam_role_name = split("/", data.sneller_tenant_region.sneller.role_arn)[1]
}
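If you want to verify which IAM role and SQS queue the scripts will use, you could add output blocks like the sketch below. These are not part of the original scripts; the output names are arbitrary, but role_arn and sqs_arn are the data source attributes used elsewhere in the scripts:
# Optional: expose the values returned by the Sneller tenant data source
output "sneller_role_arn" {
  value = data.sneller_tenant_region.sneller.role_arn
}
output "sneller_sqs_arn" {
  value = data.sneller_tenant_region.sneller.sqs_arn
}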
Some “magic” is used to automatically derive the prefix from the current Sneller IAM role. When that’s not possible, a random prefix will be generated:
- prefix.tf
resource "random_string" "random_prefix" {
length = 4
special = false
numeric = false
upper = false
}
locals {
# If no prefix is set, then we first check if there is a
# prefix in the IAM role-name that we can use. If not,
# then a unique 4 character prefix is used instead.
_suggested_prefix = endswith(local.sneller_iam_role_name, "-sneller") ? trimsuffix(local.sneller_iam_role_name, "-sneller") : random_string.random_prefix.id
prefix = var.prefix != "" ? "${var.prefix}-" : "${local._suggested_prefix}-"
}
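To illustrate the derivation with a hypothetical role name:
# Hypothetical example: the Sneller IAM role is named "acme-sneller"
#   local.sneller_iam_role_name = "acme-sneller"
#   local._suggested_prefix     = "acme"   (the "-sneller" suffix is trimmed)
#   local.prefix                = "acme-"
# The logging bucket then becomes "acme-sneller-aws-logging".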
S3 bucket for AWS logging
All AWS logging is written into an S3 bucket with the following characteristics:
- Disallow public access.
- Add the bucket policy to allow the AWS services to write to the bucket.
- Add S3 event notification to notify Sneller when new AWS logging objects are available.
- s3-aws-logging.tf
# Cloudtrail delivers the log files to the following bucket
resource "aws_s3_bucket" "sneller_aws_logging" {
bucket = "${local.prefix}sneller-aws-logging"
force_destroy = true
tags = {
Name = "AWS logging data"
}
}
# Public access to the Cloudtrail log bucket is disabled
resource "aws_s3_bucket_public_access_block" "sneller_aws_logging" {
bucket = aws_s3_bucket.sneller_aws_logging.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
# Enable S3 event notification to notify the Sneller ingestion
# pipeline to ingest new data as it arrives
resource "aws_s3_bucket_notification" "sneller_aws_logging" {
bucket = aws_s3_bucket.sneller_aws_logging.id
queue {
id = "sneller-aws-logging"
queue_arn = data.sneller_tenant_region.sneller.sqs_arn
events = ["s3:ObjectCreated:*"]
}
}
# Cloudtrail should be granted access to deliver log files to the bucket
resource "aws_s3_bucket_policy" "sneller_aws_logging" {
bucket = aws_s3_bucket.sneller_aws_logging.id
policy = data.aws_iam_policy_document.sneller_aws_logging_bucket_policy.json
}
data "aws_iam_policy_document" "sneller_aws_logging_bucket_policy" {
source_policy_documents = [
data.aws_iam_policy_document.sneller_cloudtrail_bucket_policy.json, # required for CloudTrail logging
data.aws_iam_policy_document.sneller_flow_bucket_policy.json, # required for VPC Flow logging
]
}
Note that this file doesn’t contain the actual bucket policy, but merges all the bucket policies for the individual logging services.
Enable IAM role to access AWS logging
The IAM role that is assumed by Sneller to read the source data should be granted access to the AWS log data:
- iam-role-aws-logging.tf
resource "aws_iam_role_policy" "sneller_aws_logging" {
role = local.sneller_iam_role_name
name = "aws-logging"
policy = data.aws_iam_policy_document.sneller_aws_logging.json
}
data "aws_iam_policy_document" "sneller_aws_logging" {
# Read access for the cloudtrail bucket
statement {
actions = ["s3:ListBucket"]
resources = [aws_s3_bucket.sneller_aws_logging.arn]
}
statement {
actions = ["s3:GetObject"]
resources = ["${aws_s3_bucket.sneller_aws_logging.arn}/*"]
}
}
CloudTrail logging
In this example, all service logging in all regions will be enabled, but this can be customized using event filtering (a sketch follows the aws-cloudtrail.tf listing below). The CloudTrail logging is stored in the logging bucket, so a policy is added that allows CloudTrail to write to this bucket.
The data is exposed via the cloudtrail Sneller table that is created here as well. The table is partitioned by the region of the CloudTrail data, which makes queries on a single region faster and more cost-efficient.
- aws-cloudtrail.tf
locals {
cloudtrail_name = "sneller"
}
# Table that holds all the ingested Cloudtrail log files
resource "sneller_table" "aws_cloudtrail" {
# Enable this for production to avoid trashing your table
# lifecycle { prevent_destroy = true }
database = var.database
table = "cloudtrail"
inputs = [
{
pattern = "s3://${aws_s3_bucket.sneller_aws_logging.bucket}/AWSLogs/${data.aws_caller_identity.current.account_id}/CloudTrail/{region}/*/*/*/*.json.gz"
format = "cloudtrail.json.gz"
}
]
partitions = [
{
field = "region"
}
]
}
# Enable CloudTrail in the AWS account
resource "aws_cloudtrail" "sneller" {
# The S3 bucket policy needs to be set before CloudTrail
# can write to the bucket
depends_on = [ aws_s3_bucket_policy.sneller_aws_logging ]
name = local.cloudtrail_name
s3_bucket_name = aws_s3_bucket.sneller_aws_logging.id
include_global_service_events = true # also log global events (i.e. IAM)
is_multi_region_trail = true # log from all AWS regions
# You can also filter which events should be logged. Refer to
# https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/cloudtrail
# for more detailed information
}
# AWS logging bucket policy that allows the CloudTrail service to
# write to the S3 bucket that holds all the source data.
data "aws_iam_policy_document" "sneller_cloudtrail_bucket_policy" {
# See https://docs.aws.amazon.com/awscloudtrail/latest/userguide/create-s3-bucket-policy-for-cloudtrail.html
statement {
sid = "AWSCloudTrailAclCheck"
effect = "Allow"
principals {
type = "Service"
identifiers = ["cloudtrail.amazonaws.com"]
}
actions = ["s3:GetBucketAcl"]
resources = [aws_s3_bucket.sneller_aws_logging.arn]
condition {
test = "StringEquals"
variable = "aws:SourceArn"
values = ["arn:${data.aws_partition.current.partition}:cloudtrail:${var.region}:${data.aws_caller_identity.current.account_id}:trail/${local.cloudtrail_name}"]
}
}
statement {
sid = "AWSCloudTrailWrite"
effect = "Allow"
principals {
type = "Service"
identifiers = ["cloudtrail.amazonaws.com"]
}
actions = ["s3:PutObject"]
resources = ["${aws_s3_bucket.sneller_aws_logging.arn}/AWSLogs/${data.aws_caller_identity.current.account_id}/*"]
condition {
test = "StringEquals"
variable = "s3:x-amz-acl"
values = ["bucket-owner-full-control"]
}
condition {
test = "StringEquals"
variable = "aws:SourceArn"
values = ["arn:${data.aws_partition.current.partition}:cloudtrail:${var.region}:${data.aws_caller_identity.current.account_id}:trail/${local.cloudtrail_name}"]
}
}
}
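For example, to record only management events that modify resources (skipping read-only API calls), you could add an event_selector block to the aws_cloudtrail resource. This is a hedged sketch based on the standard AWS provider arguments, not part of the scripts above:
# Sketch: add inside resource "aws_cloudtrail" "sneller" to record only
# management events that modify resources; read-only calls are skipped.
event_selector {
  read_write_type           = "WriteOnly" # one of "ReadOnly", "WriteOnly", "All"
  include_management_events = true
}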
VPC flow logging
In this example, VPC flow logs are captured for the region’s default VPC. The flow logs are stored in the logging bucket, so a policy is added that allows the log delivery service to write to this bucket.
The data is exposed via the flow Sneller table that is created here as well.
- aws-flow.tf
resource "aws_flow_log" "sneller" {
# The S3 bucket policy needs to be set before flow logging
# can write to the bucket
depends_on = [ aws_s3_bucket_policy.sneller_aws_logging ]
log_destination_type = "s3"
log_destination = aws_s3_bucket.sneller_aws_logging.arn
traffic_type = "ALL"
vpc_id = data.aws_vpc.default.id
}
data "aws_vpc" "default" {
default = true
}
# Table that holds all the ingested flow log files
resource "sneller_table" "aws_flow" {
# Enable this for production to avoid trashing your table
# lifecycle { prevent_destroy = true }
database = var.database
table = "flow"
inputs = [
{
pattern = "s3://${aws_s3_bucket.sneller_aws_logging.bucket}/AWSLogs/${data.aws_caller_identity.current.account_id}/vpcflowlogs/{region}/*/*/*/*.log.gz"
format = "csv.gz"
csv_hints = {
skip_records = 1
separator = " "
fields = [
{ name = "version", type = "int" },
{ name = "account_id", type = "string" },
{ name = "interface_id", type = "string" },
{ name = "srcaddr", type = "string" },
{ name = "dstaddr", type = "string" },
{ name = "srcport", type = "int" },
{ name = "dstport", type = "int" },
{ name = "protocol", type = "int" },
{ name = "packets", type = "int" },
{ name = "bytes", type = "int" },
{ name = "start", type = "datetime", format = "unix_seconds" },
{ name = "end", type = "datetime", format = "unix_seconds" },
{ name = "action", type = "string" },
{ name = "log_status", type = "string" },
]
}
}
]
partitions = [
{
field = "region"
}
]
}
# AWS logging bucket policy that allows the Flow logging delivery service to
# write to the S3 bucket that holds all the source data.
data "aws_iam_policy_document" "sneller_flow_bucket_policy" {
# See https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs-s3.html
statement {
sid = "AWSLogDeliveryWrite"
effect = "Allow"
principals {
type = "Service"
identifiers = ["delivery.logs.amazonaws.com"]
}
actions = ["s3:PutObject"]
resources = ["${aws_s3_bucket.sneller_aws_logging.arn}/*"]
condition {
test = "StringEquals"
variable = "aws:SourceAccount"
values = [data.aws_caller_identity.current.account_id]
}
condition {
test = "StringEquals"
variable = "s3:x-amz-acl"
values = ["bucket-owner-full-control"]
}
condition {
test = "ArnLike"
variable = "aws:SourceArn"
values = ["arn:${data.aws_partition.current.partition}:logs:${var.region}:${data.aws_caller_identity.current.account_id}:*"]
}
}
statement {
sid = "AWSLogDeliveryAclCheck"
effect = "Allow"
principals {
type = "Service"
identifiers = ["delivery.logs.amazonaws.com"]
}
actions = ["s3:GetBucketAcl","s3:ListBucket"]
resources = [aws_s3_bucket.sneller_aws_logging.arn]
condition {
test = "StringEquals"
variable = "aws:SourceAccount"
values = [data.aws_caller_identity.current.account_id]
}
condition {
test = "ArnLike"
variable = "aws:SourceArn"
values = ["arn:${data.aws_partition.current.partition}:logs:${var.region}:${data.aws_caller_identity.current.account_id}:*"]
}
}
}