Tutorial
This tutorial assumes you have successfully completed the steps in the previous section and continues from there.
Add some more data
With event notifications set up, we can simply copy new files to the S3 source bucket. You can either add some ND-JSON encoded files to the sample_data folder and run terraform apply again, or manually copy some JSON data using the AWS CLI:
Note: since Sneller supports dynamic schemas, it will ingest any valid JSON data, irrespective of the structure of that data.
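For example, a small ND-JSON file where every line has a different shape will still ingest fine. The file name and fields below are purely illustrative:

cat > extra.ndjson <<EOF
{"name": "Alice", "age": 30}
{"name": "Bob", "city": "Amsterdam", "tags": ["admin", "ops"]}
{"event": "login", "ok": true}
EOF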
export SNELLER_SOURCE=$(terraform output -json sneller_source | jq -r '.')
aws s3 cp . s3://$SNELLER_SOURCE/sample_data/ --recursive --exclude "*" --include "*.ndjson"
AWS will automatically send S3 event notifications for the source bucket to the SQS queue, and Sneller will immediately add the data to the table.
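If you want to verify the wiring, you can inspect the notification configuration of the source bucket with the AWS CLI; it should contain a queue configuration pointing at the SQS queue that Sneller consumes from:

aws s3api get-bucket-notification-configuration --bucket $SNELLER_SOURCE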
Query again
After a few seconds the data will be available. If you run the query again, you should see a larger count of records:
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/json" \
-s "$SNELLER_ENDPOINT/query?database=$SNELLER_DATABASE" \
--data-raw "SELECT COUNT(*) FROM $SNELLER_TABLE"
Of course, you can continue copying more data into the source bucket to ingest it.
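One way to do this, assuming any additional ND-JSON files live in a local more_data folder (the folder name is just an example), is a simple loop:

for f in more_data/*.ndjson; do
    aws s3 cp "$f" s3://$SNELLER_SOURCE/sample_data/
done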
Create a new table
Creating a new table is very simple, since it only involves creating a new definition.json file in the right location and adding data to the source bucket.
For this example, we will create a table for the gharchive (GitHub archive) data and ingest some data.
Add table definition
You can simply create the definition.json file like this and copy it into the S3 ingestion bucket:
export SNELLER_SOURCE=$(terraform output -json sneller_source | jq -r '.')
export SNELLER_INGEST=$(terraform output -json sneller_ingest | jq -r '.')
cat > definition.json <<EOF
{
    "input": [
        {
            "pattern": "s3://$SNELLER_SOURCE/gharchive/*.json.gz",
            "format": "json.gz"
        }
    ]
}
EOF
aws s3 cp definition.json s3://$SNELLER_INGEST/db/demo/gharchive/
Note: the pattern in the definition.json file refers to the source bucket, whereas the definition.json itself goes into the ingestion bucket.
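A quick way to double-check the layout is to list both locations. The first listing should show the definition.json in the ingestion bucket; the second will stay empty until you add data in the next step:

aws s3 ls s3://$SNELLER_INGEST/db/demo/gharchive/
aws s3 ls s3://$SNELLER_SOURCE/gharchive/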
Add some data
Simply copy some data into the source bucket at the correct path to add it to the gharchive table:
wget https://data.gharchive.org/2015-01-01-{15..16}.json.gz
aws s3 mv 2015-01-01-15.json.gz s3://$SNELLER_SOURCE/gharchive/
aws s3 mv 2015-01-01-16.json.gz s3://$SNELLER_SOURCE/gharchive/
Query the table
Now you can simply query the gharchive table (from the demo database):
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/json" \
-s "$SNELLER_ENDPOINT/query?database=demo" \
--data-raw "SELECT COUNT(*) FROM gharchive"
or do a more adventurous query …
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/x-ndjson" \
-s "$SNELLER_ENDPOINT/query?database=demo" \
--data-raw "SELECT type, COUNT(*) FROM gharchive GROUP BY type ORDER BY COUNT(*) DESC"
… and copy some more data …
wget https://data.gharchive.org/2015-01-01-{17..18}.json.gz
aws s3 mv 2015-01-01-17.json.gz s3://$SNELLER_SOURCE/gharchive/
aws s3 mv 2015-01-01-18.json.gz s3://$SNELLER_SOURCE/gharchive/
… and repeat the query (for more results) …
curl -H "Authorization: Bearer $SNELLER_TOKEN" \
-H "Accept: application/x-ndjson" \
-s "$SNELLER_ENDPOINT/query?database=demo" \
--data-raw "SELECT type, COUNT(*) FROM gharchive GROUP BY type ORDER BY COUNT(*) DESC"
Ingesting CloudTrail
For a more elaborate example, see ingesting AWS CloudTrail (and some other AWS services).
Final words
Although we would hate to see you go, in case you want to tear everything down, here’s how to do it:
terraform destroy
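Depending on how the Terraform configuration defines the buckets (force_destroy or not), terraform destroy may refuse to delete S3 buckets that still contain objects. In that case you can empty them first; note that this removes all source and ingested data:

aws s3 rm s3://$SNELLER_SOURCE --recursive
aws s3 rm s3://$SNELLER_INGEST --recursive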
Hasta la vista, baby 😀