Efficient Computation of Positional Population Counts Using SIMD Instructions

by Marcus D. R. Klarqvist, et al.

In several fields such as statistics, machine learning, and bioinformatics, categorical variables are frequently represented as one-hot encoded vectors. For example, given 8 distinct values, we map each value to a byte in which exactly one bit is set. We are motivated to quickly compute statistics over such encodings. Given a stream of k-bit words, we seek to compute k distinct sums, one for each of the bit positions 0, 1, 2, ..., k-1. If the k-bit words are one-hot encoded, then the sums correspond to a frequency histogram. This multiple-sum problem is a generalization of the population-count problem, where we seek the sum of all bit values. Accordingly, we refer to the multiple-sum problem as a positional population count. Using SIMD (Single Instruction, Multiple Data) instructions from recent Intel processors, we describe algorithms for computing the 16-bit positional population count using less than half of a CPU cycle per 16-bit word. For sufficiently large inputs, our best approach uses up to 400 times fewer instructions and is up to 50 times faster than baseline code using only regular (non-SIMD) instructions.
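To make the problem statement concrete, here is a minimal scalar sketch of a positional population count. It is not one of the SIMD algorithms the abstract refers to; it only illustrates the definition, in the spirit of a non-SIMD baseline. The function name `positional_popcount` and the sample data are assumptions made for illustration.

```python
from typing import Iterable, List


def positional_popcount(words: Iterable[int], k: int = 16) -> List[int]:
    """Return k sums; counts[i] is the number of words that have bit i set.

    Hypothetical reference helper: a plain scalar loop, not a SIMD kernel.
    """
    counts = [0] * k
    for w in words:
        for i in range(k):
            # Extract bit i of the word and add it to the matching sum.
            counts[i] += (w >> i) & 1
    return counts


# One-hot example: each word has exactly one bit set, so the result
# is a frequency histogram of the encoded categories.
one_hot = [0b0001, 0b0100, 0b0100, 0b0001, 0b0010, 0b0100]
histogram = positional_popcount(one_hot, k=4)

# Summing the positional counts gives the ordinary population count
# of the whole stream, which is why the problem generalizes it.
total_bits = sum(histogram)
```

In this sketch the inner loop touches every bit position of every word, which is the per-word cost a SIMD approach aims to reduce.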




