Efficient Computation of Positional Population Counts Using SIMD Instructions

11/07/2019
by Marcus D. R. Klarqvist, et al.

In several fields such as statistics, machine learning, and bioinformatics, categorical variables are frequently represented as one-hot encoded vectors. For example, given 8 distinct values, we map each value to a byte in which only a single bit is set. We are motivated to quickly compute statistics over such encodings. Given a stream of k-bit words, we seek to compute k distinct sums corresponding to the bit values at indexes 0, 1, 2, ..., k-1. If the k-bit words are one-hot encoded, then the sums correspond to a frequency histogram. This multiple-sum problem is a generalization of the population-count problem, where we seek the sum of all bit values. Accordingly, we refer to the multiple-sum problem as a positional population count. Using SIMD (Single Instruction, Multiple Data) instructions from recent Intel processors, we describe algorithms for computing the 16-bit positional population count using less than half of a CPU cycle per 16-bit word. Our best approach uses up to 400 times fewer instructions and is up to 50 times faster than baseline code using only regular (non-SIMD) instructions, for sufficiently large inputs.
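To make the problem statement concrete, the following is a minimal scalar sketch of a 16-bit positional population count in C. It is only an illustration of the kind of non-SIMD baseline the abstract mentions, not the authors' code and not one of their SIMD algorithms; the function name `pospopcnt_u16_scalar` and the 32-bit counter width are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar baseline (name and counter width are assumptions).
 * After the call, counts[b] holds the number of input words whose bit b
 * is set, for b = 0..15. Counters are 32-bit, so this sketch assumes
 * fewer than 2^32 input words. */
void pospopcnt_u16_scalar(const uint16_t *data, size_t len,
                          uint32_t counts[16]) {
    assert(counts != NULL);
    assert(data != NULL || len == 0);

    /* Start every positional sum at zero. */
    memset(counts, 0, 16 * sizeof(uint32_t));

    for (size_t i = 0; i < len; i++) {
        uint16_t w = data[i];
        /* Add the value of each bit position to its own counter. */
        for (int b = 0; b < 16; b++) {
            counts[b] += (uint32_t)((w >> b) & 1u);
        }
    }
}
```

When every input word is one-hot encoded, `counts` is the frequency histogram of the encoded categories. Summing all 16 counters gives the ordinary population count of the whole input, which is the sense in which the positional problem generalizes the population-count problem.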


12/12/2021

Faster-Than-Native Alternatives for x86 VP2INTERSECT Instructions

We present faster-than-native alternatives for the full AVX512-VP2INTERS...
05/15/2018

On the complexity of the correctness problem for non-zeroness test instruction sequences

In this paper, we consider the programming of the function on bit string...
04/18/2019

Quantitative Expressiveness of Instruction Sequence Classes for Computation on Single Bit Registers

The number of instructions of an instruction sequence is taken for its l...
10/02/2019

Base64 encoding and decoding at almost the speed of a memory copy

Many common document formats on the Internet are text-only such as email...
02/27/2018

Constructing graphs with limited resources

We discuss the amount of physical resources required to construct a give...
03/30/2017

Faster Base64 Encoding and Decoding Using AVX2 Instructions

Web developers use base64 formats to include images, fonts, sounds and o...
09/25/2017

Stream VByte: Faster Byte-Oriented Integer Compression

Arrays of integers are often compressed in search engines. Though there ...