Introduction
In the previous part of the walkthrough we learned how to store data in object storage (e.g. AWS S3) and how to ensure that the data is highly available and stored securely. In this part we will move away from using sdb as a query engine and start using the Sneller daemon instead.
Using the Sneller daemon requires a bit more background and a few decisions, so this part of the walkthrough is more theoretical and doesn’t include sample queries. But don’t worry, the next part is hands-on again.
Sneller daemon
The Sneller daemon uses the exact same query engine as sdb, but it can distribute the work of executing a query across multiple nodes. The daemon can be split into two parts:
- The daemon provides an HTTP endpoint that allows query execution. This part also performs query planning and distributes the query across the other daemons. Finally, it combines all the partial results and passes them back to the client.
- The worker part waits for workloads that are distributed by the daemon and does the actual data crunching for the query execution.
The Sneller daemon combines both roles in a single executable and all nodes are considered equal. To provide high availability, the Sneller daemon is typically deployed behind a load balancer that distributes the load across the various instances.
Using multiple Sneller daemons ensures that Sneller is highly available and makes it a cloud-native, robust query engine that scales linearly and can handle huge amounts of data. Another benefit is that each Sneller node caches data in memory, which reduces the network I/O between the Sneller nodes and object storage.
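To give a feel for how clients interact with the daemon, the sketch below submits a SQL query to the HTTP endpoint (or to the load balancer in front of the daemons). The host name, endpoint path, query parameters and token handling are illustrative assumptions only; consult the Sneller API documentation for the exact interface.

```sh
# Hypothetical example of querying a Sneller daemon over HTTP.
# The host, path (/executeQuery), database/table names and token are
# placeholders and may not match the actual API.
curl -s "https://sneller.example.com/executeQuery?database=mydb" \
  -H "Authorization: Bearer $SNELLER_TOKEN" \
  -H "Accept: application/json" \
  --data-raw "SELECT COUNT(*) FROM mytable"
```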
Sneller Cloud
The easiest way to use the Sneller daemon is to sign up with Sneller and let us manage the Sneller daemons for you. Sneller Cloud is a SaaS offering that allows you to use our servers on a pay-per-request basis. This means that you only pay for the queries that you run.
A full example of how to register and set up your first table is shown in this tutorial.
On-premise hosting
Sneller also allows you to run your own cluster. The most convenient method is to use our Kubernetes Helm chart to deploy Sneller in your Kubernetes cluster. It is technically possible to run it on bare-metal servers, but Kubernetes is the only supported scenario.
A full example of how to use Sneller in your own EKS cluster is shown in this tutorial.
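As a rough sketch of what such a deployment could look like, the commands below add a Helm repository and install a release into its own namespace. The repository URL, chart name and values are placeholders, not the actual chart details; the tutorial above documents the real installation steps.

```sh
# Hypothetical Helm-based deployment; repository URL, chart name and
# configuration values are placeholders.
helm repo add sneller https://charts.sneller.example.com
helm repo update

# Install the chart into a dedicated namespace. Real values (replica
# count, S3 credentials, etc.) come from the chart's documentation.
helm install sneller sneller/sneller \
  --namespace sneller --create-namespace \
  --set replicaCount=3
```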
Comparison
Sneller Cloud has some distinct advantages over on-premise hosting. Because of its pay-per-byte-scanned pricing of $50 per petabyte ($0.05 per terabyte) scanned, it is generally easier and cheaper to run than do-it-yourself on-premise hosting. For example, a query that scans 10 TB of data costs $0.50.
| | Sneller Cloud | On-premise hosting |
|---|---|---|
| Pricing | Pay per byte scanned | Free¹ |
| Scaling | Automatic | Manual: you need to scale your cluster yourself based on the current load |
| Management | Managed | Unmanaged: you need to update your instances yourself (e.g. OS patching, updating Sneller) |
| Ingestion | Automatic² | Manually run sdb sync to ingest new data (see the example below) |
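For the on-premise ingestion row above, synchronizing a table could look like the sketch below. The database and table names are placeholders; earlier parts of this walkthrough cover configuring sdb and its object-storage credentials.

```sh
# Hypothetical example: ingest newly arrived source objects by running
# sdb sync manually (database and table names are placeholders).
# In practice this is often scheduled, e.g. from a cron job.
sdb sync mydb mytable
```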
Next…
In the next part of the series we will show you how to run a cluster of Sneller daemons on your own infrastructure using Kubernetes.