Introduction
Sneller can ingest data from a variety of sources. This section lists sample definition.json
files that can be used for easy setup.
Create new table
All (definition) state for a table is stored in a single JSON file in S3. See here for the full specification.
Creating a new table is straightforward:
- Create a definition.json file and update the pattern attribute to point to the correct location in the source bucket.
- Copy the definition.json file to the ingestion bucket under the path db/<db-name>/<table-name>/ (and update <db-name> and <table-name> accordingly).
- If not already configured for another table, enable S3 Event Notifications for the source bucket.
That is all there is to it (see here for an example). Any existing data in the source bucket will be ingested, and new objects will be added automatically as they are created.
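The first two steps above can be sketched in Python. This is a minimal illustration rather than an official tool: the helper names `build_definition` and `definition_key`, and the database and table names `mydb` and `cloudtrail`, are hypothetical. The sketch only builds the JSON document and the destination key; uploading the result to the ingestion bucket is left to whichever S3 client you already use.

```python
import json


def build_definition(pattern, fmt, partitions=None):
    """Build a minimal table definition with one input and optional partitions."""
    definition = {"input": [{"pattern": pattern, "format": fmt}]}
    if partitions:
        definition["partitions"] = [{"field": name} for name in partitions]
    return definition


def definition_key(db_name, table_name):
    """Object key of definition.json inside the ingestion bucket."""
    return f"db/{db_name}/{table_name}/definition.json"


# Placeholder bucket and account values are kept exactly as in the templates.
definition = build_definition(
    "s3://config-bucket-XXXXXXXX/cloudtrail/AWSLogs/ACCOUNT-ID/CloudTrail/{region}/*/*/*/*.json.gz",
    "cloudtrail.json.gz",
    partitions=["region"],
)
body = json.dumps(definition, indent=2)
key = definition_key("mydb", "cloudtrail")  # hypothetical database/table names

print(key)
print(body)
```

Keeping the key construction in one small function makes it harder to misplace the file, since the table is identified purely by its path under db/.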
Table template definitions
AWS CloudTrail
CloudTrail is Amazon’s most comprehensive logging service and covers virtually everything (with very few exceptions). We have provided Terraform scripts that enable CloudTrail logging and make sure it is ingested into Sneller. If you would rather set up CloudTrail ingestion manually, follow these instructions and use the following table definition:
- aws-cloudtrail-definition.json
{
  "input": [
    {
      "pattern": "s3://config-bucket-XXXXXXXX/cloudtrail/AWSLogs/ACCOUNT-ID/CloudTrail/{region}/*/*/*/*.json.gz",
      "format": "cloudtrail.json.gz"
    }
  ],
  "partitions": [ { "field": "region" } ]
}
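To see how the {region} placeholder in the pattern relates to the region partition, the following Python sketch translates a pattern into a regular expression and extracts the captured value from an object path. The matching rules used here are an assumption made for illustration (each * and each {name} is taken to match within a single path segment); they are not taken from the Sneller specification. The function names and the sample object path are hypothetical.

```python
import re

PATTERN = (
    "s3://config-bucket-XXXXXXXX/cloudtrail/AWSLogs/ACCOUNT-ID/"
    "CloudTrail/{region}/*/*/*/*.json.gz"
)


def pattern_to_regex(pattern):
    """Translate a definition pattern into a compiled regex (assumed semantics)."""
    parts = []
    # Split into {name} captures, '*' wildcards and literal text.
    for token in re.split(r"(\{[A-Za-z_]\w*\}|\*)", pattern):
        capture = re.fullmatch(r"\{([A-Za-z_]\w*)\}", token)
        if token == "*":
            parts.append(r"[^/]*")  # assumption: '*' does not cross '/'
        elif capture:
            parts.append(f"(?P<{capture.group(1)}>[^/]+)")
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


def extract_partitions(pattern, path):
    """Return the captured {name} values, or None if the path does not match."""
    match = pattern_to_regex(pattern).fullmatch(path)
    return match.groupdict() if match else None


# Hypothetical object path, for illustration only.
sample = (
    "s3://config-bucket-XXXXXXXX/cloudtrail/AWSLogs/ACCOUNT-ID/"
    "CloudTrail/us-east-1/2023/01/15/example.json.gz"
)
print(extract_partitions(PATTERN, sample))
```

A quick check like this can catch a pattern with the wrong number of path segments before the definition is copied to the ingestion bucket.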
AWS Config
Setup: instructions.
- aws-config-definition.json
{
  "input": [
    {
      "pattern": "s3://config-bucket-XXXXXXXX/AWSLogs/ACCOUNT-ID/Config/{region}/*/*/*/*/*.log.gz",
      "format": "json.gz"
    }
  ],
  "partitions": [ { "field": "region" } ]
}
AWS WAF
Setup: instructions.
- aws-waf-definition.json
{
  "input": [
    {
      "pattern": "s3://aws-waf-logs-DOC-EXAMPLE-BUCKET-SUFFIX/DOC-EXAMPLE-KEY-NAME-PREFIX/AWSLogs/ACCOUNT-ID/WAFLogs/{region}/web-acl-name/*/*/*/*/*/*.log.gz",
      "format": "json.gz"
    }
  ],
  "partitions": [ { "field": "region" } ]
}
AWS S3 Inventory
Setup: instructions (use Apache Parquet as the output format).
- aws-s3-inventory-definition.json
{
  "input": [
    {
      "pattern": "s3://DESTINATION-PREFIX/SOURCE-BUCKET/config-ID/data/*.parquet",
      "format": "parquet"
    }
  ]
}
AWS Route53 (Resolver query logs)
Setup: instructions.
- aws-route53-definition.json
{
  "input": [
    {
      "pattern": "s3://your_bucket_name/AWSLogs/ACCOUNT-ID/*.log.gz",
      "format": "json.gz"
    }
  ]
}
AWS VPC Flow Logs
Setup: instructions. We have provided Terraform scripts that enable VPC flow logging for the default VPC and make sure it is ingested into Sneller.
- aws-vpc-flow-logs-definition.json
{
  "input": [
    {
      "pattern": "s3://YOUR_SOURCE_BUCKET_HERE/vpcflowlogs/AWSLogs/YOUR_AWS_ACCOUNT_ID_HERE/vpcflowlogs/{region}/{yyyy}/{mm}/{dd}/*.log.gz",
      "format": "csv.gz",
      "hints": {
        "skip_records": 1,
        "separator": " ",
        "missing_values": [
          "-"
        ],
        "fields": [
          { "name": "version", "type": "int" },
          { "name": "account_id", "type": "string" },
          { "name": "interface_id", "type": "string" },
          { "name": "srcaddr", "type": "string" },
          { "name": "dstaddr", "type": "string" },
          { "name": "srcport", "type": "int" },
          { "name": "dstport", "type": "int" },
          { "name": "protocol", "type": "int" },
          { "name": "packets", "type": "int" },
          { "name": "bytes", "type": "int" },
          { "name": "start", "type": "datetime", "format": "unix_seconds" },
          { "name": "end", "type": "datetime", "format": "unix_seconds" },
          { "name": "action", "type": "string" },
          { "name": "log_status", "type": "string" }
        ]
      }
    }
  ],
  "partitions": [
    { "field": "region" },
    { "field": "date", "value": "$yyyy/$mm/$dd" }
  ],
  "retention_policy": {
    "field": "end",
    "valid_for": "1m"
  }
}
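The hints in this definition describe how each text line becomes a typed record. The Python sketch below mimics that mapping so the effect of skip_records, separator, missing_values and the field types is easier to picture. It is only an approximation for illustration, not Sneller's ingestion code: missing values are mapped to None here, which is an assumption, and the sample log lines and the helper name `parse_flow_log` are illustrative.

```python
from datetime import datetime, timezone

# Field names and types copied from the "fields" hint above.
FIELDS = [
    ("version", "int"), ("account_id", "string"), ("interface_id", "string"),
    ("srcaddr", "string"), ("dstaddr", "string"), ("srcport", "int"),
    ("dstport", "int"), ("protocol", "int"), ("packets", "int"),
    ("bytes", "int"), ("start", "unix_seconds"), ("end", "unix_seconds"),
    ("action", "string"), ("log_status", "string"),
]
MISSING_VALUES = {"-"}  # "missing_values" hint
SKIP_RECORDS = 1        # "skip_records" hint: the header line
SEPARATOR = " "         # "separator" hint


def parse_flow_log(text):
    """Parse the text of a flow log file into a list of typed records."""
    records = []
    for line in text.splitlines()[SKIP_RECORDS:]:
        record = {}
        for (name, kind), raw in zip(FIELDS, line.split(SEPARATOR)):
            if raw in MISSING_VALUES:
                record[name] = None  # assumption: missing values become None
            elif kind == "int":
                record[name] = int(raw)
            elif kind == "unix_seconds":
                record[name] = datetime.fromtimestamp(int(raw), tz=timezone.utc)
            else:
                record[name] = raw
        records.append(record)
    return records


# Illustrative input: a header line followed by two records.
sample = "\n".join([
    "version account-id interface-id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log-status",
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
    "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 - - - - - - - "
    "1431280876 1431280934 - NODATA",
])
records = parse_flow_log(sample)
print(records[0]["srcaddr"], records[0]["dstport"], records[1]["log_status"])
```

Because the end field is parsed as a timestamp, it can serve as the field referenced by retention_policy in the definition.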
AWS S3 Access
Setup: instructions.
Note: Alternatively, CloudTrail can be used; see here for more info.
- aws-s3-access-definition.json
{
  "input": [
    {
      "pattern": "s3://YOUR_SOURCE_BUCKET_HERE/AWSLogs/YOUR_AWS_ACCOUNT_ID_HERE/*.log.gz",
      "format": "csv.gz",
      "hints": {
        "skip_records": 1,
        "separator": " ",
        "missing_values": [
          "-"
        ],
        "fields": [
          { "name": "bucket_owner", "type": "string" },
          { "name": "bucket", "type": "string" },
          { "name": "time", "type": "datetime" }
        ]
      }
    }
  ]
}
AWS ELB
- aws-elb-definition.json
{
  "input": [
    {
      "pattern": "s3://bucket/prefix/AWSLogs/ACCOUNT-ID/elasticloadbalancing/{region}/*/*/*/*.log.gz",
      "format": "csv.gz",
      "hints": {
        "skip_records": 1,
        "separator": " ",
        "missing_values": [
          "-"
        ],
        "fields": [
          { "name": "type", "type": "string" },
          { "name": "time", "type": "datetime" },
          { "name": "elb", "type": "string" }
        ]
      }
    }
  ]
}