Branchless Code With AVX-512
Sneller processes data in 16 parallel lanes for almost all tasks, including loading and decompressing data, and does so without branches. To achieve this, we rely heavily on the predicated instruction execution provided by the AVX-512 instruction set. In this post, we walk through a simple example: converting a string to uppercase, an operation our string processing functions use frequently.
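To make the idea of predication concrete before looking at real vector code, here is a minimal scalar sketch in C. It is a model of the technique, not Sneller's implementation and not actual AVX-512 code. It assumes one ASCII character per lane, and the names `LANES`, `lowercase_mask`, and `to_upper16` are hypothetical, chosen only for illustration. Each lane contributes one bit to a 16-bit mask, and that mask, rather than an `if`, decides which lanes are modified.

```c
#include <stdint.h>

/* Scalar model of one 16-lane predicated step. The loops stand in for the
 * lanes of a vector register; real AVX-512 code would process all lanes
 * with a single instruction. Assumption: one ASCII character per lane. */
enum { LANES = 16 };

/* Build a 16-bit mask: bit i is set when lane i holds 'a'..'z'. */
uint16_t lowercase_mask(const uint8_t lanes[LANES]) {
    uint16_t mask = 0;
    for (int i = 0; i < LANES; i++) {
        /* Unsigned wrap-around turns the two-sided range check into one compare. */
        int is_lower = (uint8_t)(lanes[i] - 'a') < 26;
        mask |= (uint16_t)(is_lower << i);
    }
    return mask;
}

/* Subtract 0x20 only in lanes selected by the mask. No lane takes a
 * data-dependent branch: unselected lanes subtract zero. */
uint16_t to_upper16(uint8_t lanes[LANES]) {
    uint16_t mask = lowercase_mask(lanes);
    for (int i = 0; i < LANES; i++) {
        uint8_t bit = (uint8_t)((mask >> i) & 1);
        lanes[i] = (uint8_t)(lanes[i] - (bit << 5)); /* bit << 5 is 0x20 or 0 */
    }
    return mask;
}
```

The point of the sketch is the data flow: a comparison produces a mask, and the mask gates the arithmetic. Every lane executes the same instructions whether or not it holds a lowercase letter.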