Integer division is not provided natively by x86 SIMD instruction set extensions, and AVX-512 is no exception. In this article, we present a vectorized method for dividing signed 64-bit integers that takes advantage of AVX-512.
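To make the problem concrete, here is a minimal scalar sketch in portable C. It is not the article's AVX-512 method. It emulates eight 64-bit lanes (one 512-bit register's worth) and divides each lane with fixed-length binary long division, using masks instead of data-dependent branches. The function name `div_i64_lanes`, the lane count, and the policies for a zero divisor (result 0) and for `INT64_MIN / -1` (wraps to `INT64_MIN`) are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* eight 64-bit lanes fill one 512-bit register */

/* All-ones if x is negative, zero otherwise, without a branch. */
static uint64_t sign_mask(int64_t x) {
    return (uint64_t)0 - ((uint64_t)x >> 63);
}

/* |x| as an unsigned value; also correct for INT64_MIN. */
static uint64_t abs_u64(int64_t x) {
    uint64_t m = sign_mask(x);
    return ((uint64_t)x ^ m) - m;
}

/* q[k] = a[k] / b[k] for every lane, truncating toward zero like C's '/'.
   Assumed policies: a zero divisor yields 0; INT64_MIN / -1 wraps to INT64_MIN. */
void div_i64_lanes(const int64_t a[LANES], const int64_t b[LANES],
                   int64_t q[LANES]) {
    for (size_t k = 0; k < LANES; k++) {
        uint64_t n = abs_u64(a[k]);
        uint64_t d = abs_u64(b[k]);
        uint64_t quo = 0, rem = 0;

        /* Restoring long division: always 64 steps, no early exit. */
        for (int bit = 63; bit >= 0; bit--) {
            rem = (rem << 1) | ((n >> bit) & 1);
            uint64_t ge = (uint64_t)0 - (uint64_t)(rem >= d); /* all-ones if rem >= d */
            rem -= d & ge;              /* subtract only where the mask is set */
            quo |= (ge & 1) << bit;     /* record the quotient bit */
        }

        quo &= (uint64_t)0 - (uint64_t)(d != 0);  /* zero divisor -> 0 */

        /* Negate the magnitude when exactly one operand is negative.
           The final cast relies on two's-complement wrap, as on mainstream compilers. */
        uint64_t s = sign_mask(a[k]) ^ sign_mask(b[k]);
        q[k] = (int64_t)((quo ^ s) - s);
    }
}
```

Because every lane runs the same 64 iterations, the cost is identical for all inputs, which is what lets the approach map onto SIMD; the price is far more work per lane than a scalar hardware divide.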
Sneller uses 16 parallel data lanes for almost all tasks, including loading and decompressing data, all without branches. To achieve this, we rely heavily on the predicated instruction execution provided by the AVX-512 instruction set. In this post, we walk through a simple example: converting a string to uppercase, an operation our string processing functions use frequently.
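The core trick can be sketched in portable C, assuming ASCII-only input. This is a hypothetical illustration, not Sneller's code: bytes in `'a'..'z'` differ from their uppercase forms only in bit 5 (`0x20`), so a per-byte mask can select which bytes to flip. The `active` bitmask plays the role of an AVX-512 opmask register by disabling lanes past the end of the input. For simplicity, the sketch treats each byte of a 16-byte block as one lane, which is not necessarily how the engine assigns data to its 16 lanes. The names `upper_block` and `ascii_upper` are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 16  /* one 16-byte block per step in this sketch */

/* Uppercase ASCII 'a'..'z' within one 16-byte block. Bit i of `active`
   marks byte i as live, standing in for an AVX-512 opmask register. */
static void upper_block(uint8_t block[LANES], uint16_t active) {
    for (int i = 0; i < LANES; i++) {
        uint8_t c = block[i];
        uint8_t is_lower = (uint8_t)(c - 'a') < 26;  /* 1 for 'a'..'z' */
        uint8_t live = (active >> i) & 1;
        /* 'a'..'z' and 'A'..'Z' differ only in bit 5 (0x20) */
        block[i] = c ^ (uint8_t)((is_lower & live) << 5);
    }
}

/* Uppercase s[0..len) in place, 16 bytes at a time. */
void ascii_upper(uint8_t *s, size_t len) {
    size_t i = 0;
    for (; i + LANES <= len; i += LANES)
        upper_block(s + i, 0xFFFF);

    /* Tail: copy the remaining bytes into a scratch block and mark only
       those lanes active, mimicking a masked load/store. */
    size_t rest = len - i;
    if (rest > 0) {
        uint8_t tmp[LANES] = {0};
        memcpy(tmp, s + i, rest);
        upper_block(tmp, (uint16_t)((1u << rest) - 1));
        memcpy(s + i, tmp, rest);
    }
}
```

Bytes at or above `0x80` are left untouched, so UTF-8 multibyte sequences pass through unchanged rather than being case-converted.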
We present a high-performance regular expression engine that uses 16 parallel lanes and needs neither branching nor backtracking. The engine was developed for Intel Ice Lake processors and is written in AVX-512 assembly.
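One common way to match a regular expression without backtracking is to run a deterministic finite automaton (DFA) over all lanes in lockstep. The C sketch below assumes a hypothetical pattern `^ab*c$` with a hand-built transition table; it is not the article's engine, which may be built quite differently. Lanes that have run out of input keep their state through a mask select, and the per-lane pointer select stands in for a masked load. The names `match_lanes`, `build_table`, and `delta` are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 16

/* DFA states for the hypothetical pattern ^ab*c$ */
enum { START, IN_B, ACCEPT, DEAD, NSTATES };

static uint8_t delta[NSTATES][256];

static void build_table(void) {
    memset(delta, DEAD, sizeof delta);  /* every unlisted transition is dead */
    delta[START]['a'] = IN_B;
    delta[IN_B]['b'] = IN_B;
    delta[IN_B]['c'] = ACCEPT;
}

/* Bit k of the result is set when lane k's whole input matches. */
uint16_t match_lanes(const uint8_t *const s[LANES], const size_t len[LANES]) {
    static const uint8_t pad = 0;  /* harmless byte for lanes without input */
    uint8_t state[LANES];
    size_t maxlen = 0;

    build_table();  /* rebuilt per call to keep the sketch short */
    for (int k = 0; k < LANES; k++) {
        state[k] = START;
        if (len[k] > maxlen) maxlen = len[k];
    }

    /* Every lane takes the same number of steps; there is no backtracking. */
    for (size_t i = 0; i < maxlen; i++) {
        for (int k = 0; k < LANES; k++) {
            int has_input = i < len[k];
            uint8_t live = (uint8_t)(0 - has_input);         /* 0xFF or 0x00 */
            const uint8_t *p = has_input ? s[k] + i : &pad;  /* models a masked load */
            uint8_t next = delta[state[k]][*p];
            /* keep the old state in lanes whose input is exhausted */
            state[k] = (uint8_t)((next & live) | (state[k] & ~live));
        }
    }

    uint16_t mask = 0;
    for (int k = 0; k < LANES; k++)
        mask |= (uint16_t)((state[k] == ACCEPT) << k);
    return mask;
}
```

A DFA reduces each lane's matching state to one small integer, which is why lanes never need to branch or backtrack; the trade-off is that the table can grow very large for complex patterns.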
We present a method for case-insensitive comparison of UTF-8 encoded strings that uses 16 parallel lanes and no branching. We use this method to implement the ILIKE operator; the implementation targets Intel Skylake-X processors and is written in AVX-512 assembly.
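A heavily simplified version of the idea, restricted to ASCII, folds both inputs and OR-accumulates the differences so the loop never exits early. This C sketch is an illustration under our own assumptions, not the ILIKE implementation: bytes at or above `0x80` are compared exactly, and the caller must check that the lengths match first. Full Unicode case folding is harder because it can map characters of different encoded lengths to each other (U+212A KELVIN SIGN, three bytes in UTF-8, folds to ASCII `k`). The names `fold_ascii` and `eq_ignore_ascii_case` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold 'A'..'Z' to 'a'..'z'; every other byte, including UTF-8
   lead and continuation bytes (>= 0x80), passes through unchanged. */
static uint8_t fold_ascii(uint8_t c) {
    uint8_t is_upper = (uint8_t)(c - 'A') < 26;  /* 1 for 'A'..'Z' */
    return c | (uint8_t)(is_upper << 5);         /* set bit 5 (0x20) */
}

/* Returns 1 if a[0..n) and b[0..n) are equal ignoring ASCII case.
   Mismatches are OR-accumulated instead of triggering an early exit. */
int eq_ignore_ascii_case(const uint8_t *a, const uint8_t *b, size_t n) {
    uint8_t diff = 0;
    for (size_t i = 0; i < n; i++)
        diff |= fold_ascii(a[i]) ^ fold_ascii(b[i]);
    return diff == 0;
}
```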
This article explores converting multiple signed 64-bit integers to strings without branches, taking advantage of AVX-512. Most research and implementations focus on speeding up the conversion of a single value rather than performing many conversions at once. At Sneller, we use AVX-512 to process 16 values in parallel, and here we describe how our query engine does this.
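The following C sketch shows one branch-free way to structure the per-lane work, under assumptions of our own: every lane always produces 19 digits right-aligned in a scratch buffer (enough for the magnitude of `INT64_MIN`), counts its significant digits by comparing against powers of ten, and writes the minus sign unconditionally at a position that is only included in the result for negative values. The name `i64_to_str_lanes`, the lane count of eight, and the output layout are hypothetical. A real AVX-512 version would also need to replace `% 10` and `/ 10`, since x86 SIMD has no integer division instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8    /* eight 64-bit values fill one 512-bit register */
#define MAXDIG 19  /* 9223372036854775808 (|INT64_MIN|) has 19 digits */
#define MAXLEN 20  /* sign plus MAXDIG digits */

/* Converts LANES values; out[k] receives a NUL-terminated string and
   len[k] its length. Loop trip counts do not depend on the values. */
void i64_to_str_lanes(const int64_t v[LANES], char out[][MAXLEN + 1],
                      int len[LANES]) {
    for (int k = 0; k < LANES; k++) {
        uint64_t neg = (uint64_t)v[k] >> 63;     /* 1 if negative, else 0 */
        uint64_t m = 0 - neg;                    /* all-ones if negative */
        uint64_t u = ((uint64_t)v[k] ^ m) - m;   /* |v|, valid for INT64_MIN */

        /* significant digits: 1 + how many of 10^1..10^18 u reaches */
        int ndig = 1;
        uint64_t p = 10;
        for (int j = 1; j < MAXDIG; j++) {
            ndig += (u >= p);
            p *= 10;
        }

        /* always emit MAXDIG digits into buf[1..MAXLEN-1], least significant last */
        char buf[MAXLEN];
        uint64_t t = u;
        for (int j = 0; j < MAXDIG; j++) {
            buf[MAXLEN - 1 - j] = (char)('0' + t % 10);
            t /= 10;
        }

        /* write '-' just before the digits; it is kept only when neg == 1 */
        buf[MAXLEN - 1 - ndig] = '-';
        int start = MAXLEN - ndig - (int)neg;
        int total = ndig + (int)neg;
        memcpy(out[k], buf + start, (size_t)total);
        out[k][total] = '\0';
        len[k] = total;
    }
}
```

The unconditional `'-'` store is the key design choice: it trades one wasted write in non-negative lanes for a layout that needs no sign-dependent control flow.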
One of Sneller’s novel features is a bytecode-based virtual machine written almost entirely in AVX-512 assembly. Sneller is far from the first project to incorporate SIMD acceleration into a query engine, but our interpreter is unusual in that it is implemented in assembly.