We present a high-performance regular expression engine that uses 16 parallel lanes and needs neither branching nor backtracking. The engine targets Intel Ice Lake processors and is written in AVX-512 assembly.
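To make "no backtracking" concrete, here is a minimal scalar sketch in Python, not the engine's actual code. It assumes the pattern has been compiled to a DFA, so that every lane advances by one table lookup per input character. The pattern `ab*c`, the table `TRANS`, and the function `match_lanes` are all hypothetical names chosen for illustration. A padding class that maps each state to itself lets short inputs idle while longer ones finish, which stands in for lane masking.

```python
# Hypothetical DFA for the full-match pattern "ab*c" (illustration only).
# States: 0 = start, 1 = seen 'a' followed by zero or more 'b', 2 = accept, 3 = dead.
ACCEPT = 2
CLASS = {"a": 0, "b": 1, "c": 2}  # character classes used as table columns
OTHER, PAD = 3, 4                  # any other character / past end of input

# TRANS[state][char_class] -> next state. The PAD column keeps the state unchanged.
TRANS = [
    [1, 3, 3, 3, 0],
    [3, 1, 2, 3, 1],
    [3, 3, 3, 3, 2],
    [3, 3, 3, 3, 3],
]

def match_lanes(inputs):
    """Advance every input through the DFA in lockstep, one character per step."""
    states = [0] * len(inputs)
    longest = max((len(s) for s in inputs), default=0)
    for i in range(longest):
        # Classify the i-th character of every lane; exhausted lanes get PAD.
        classes = [CLASS.get(s[i], OTHER) if i < len(s) else PAD for s in inputs]
        # One table lookup per lane: no input position is ever revisited.
        states = [TRANS[st][c] for st, c in zip(states, classes)]
    return [st == ACCEPT for st in states]
```

Each input character is consumed exactly once, so the running time depends only on the input length, never on how the pattern is written. A SIMD version would replace the per-lane list comprehension with gather or permute instructions over 16 lanes.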
We present a method for case-insensitive comparison of UTF-8 encoded strings that uses 16 parallel lanes and no branching. We use this method to implement the ILIKE operator in AVX-512 assembly for Intel Skylake-X processors.
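As a rough illustration of comparing without branching, the Python sketch below handles the ASCII subset only. It is an assumption-laden simplification: full UTF-8 case folding needs multi-byte handling that is not shown here, and the helper names `fold_ascii` and `equal_fold_ascii` are hypothetical. The idea is to fold case with arithmetic instead of an `if`, and to accumulate differences with OR instead of exiting at the first mismatch.

```python
def fold_ascii(b: int) -> int:
    """Lower-case one byte arithmetically; bytes outside 'A'..'Z' pass through."""
    # (b - 0x41) & 0xFF is below 26 only when b is in 'A'..'Z'.
    is_upper = ((b - 0x41) & 0xFF) < 26
    # Setting bit 5 turns an upper-case ASCII letter into its lower-case form.
    return b | (is_upper << 5)

def equal_fold_ascii(a: bytes, b: bytes) -> bool:
    """Compare two byte strings, ignoring ASCII case, without an early exit."""
    diff = len(a) ^ len(b)  # nonzero when the lengths differ
    for x, y in zip(a, b):
        diff |= fold_ascii(x) ^ fold_ascii(y)
    return diff == 0
```

For example, `equal_fold_ascii(b"Hello", b"hELLO")` is true, while `b"["` and `b"{"` stay distinct even though they differ only in bit 5, because neither byte falls in the `'A'..'Z'` range.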
This article explores how to convert multiple signed 64-bit integers to strings without branching by taking advantage of AVX-512 extensions. Most research and implementations focus on speeding up the conversion of a single value rather than converting many values at once. At Sneller, we use AVX-512 to process 16 values in parallel, and here we describe how our query engine does it.
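The general shape of a lockstep conversion can be modeled in scalar Python. This is a sketch under stated assumptions rather than the routine used in the query engine: the function `itoa_lanes` and the constant `MAX_DIGITS` are hypothetical, and plain lists stand in for vector registers. Every lane runs the same fixed number of digit-extraction steps, and a bit mask of nonzero digits determines how many characters to keep.

```python
MAX_DIGITS = 19  # the largest 64-bit magnitude, 2**63, has 19 decimal digits

def itoa_lanes(values):
    """Convert signed 64-bit integers to decimal strings, all lanes in lockstep."""
    negative = [v < 0 for v in values]
    magnitude = [abs(v) for v in values]
    digits = [[0] * MAX_DIGITS for _ in values]
    # Fixed trip count: every lane performs all 19 steps regardless of its value.
    for pos in range(MAX_DIGITS - 1, -1, -1):
        for lane, m in enumerate(magnitude):
            magnitude[lane], digits[lane][pos] = divmod(m, 10)
    out = []
    for lane, row in enumerate(digits):
        # Bit i of the mask is set when the digit with weight 10**i is nonzero.
        mask = sum((d != 0) << (MAX_DIGITS - 1 - i) for i, d in enumerate(row))
        # The highest set bit gives the digit count; zero still prints one digit.
        count = max(1, mask.bit_length())
        text = "".join(chr(ord("0") + d) for d in row[MAX_DIGITS - count:])
        # Multiplying by a boolean selects the sign without an if statement.
        out.append("-" * negative[lane] + text)
    return out
```

Because the loop count does not depend on the data, small and large values cost the same, which is the property that lets a vectorized version keep all 16 lanes busy.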
One of Sneller’s novel features is a bytecode-based virtual machine written almost entirely in AVX-512 assembly. While Sneller is far from the first project to incorporate SIMD acceleration into a query engine, our interpreter is unusual in that it is implemented entirely in assembly.