Squares: A Fast Counter-Based RNG

by Bernard Widynski, et al.

In this article, we present a new counter-based random number generator (RNG) based on John von Neumann's middle square. We've discovered that only three rounds of squaring are sufficient to provide satisfactory random data. This appears to be one of the fastest counter-based RNGs.





1 Introduction

In 2011, D. E. Shaw Research published “Parallel Random Numbers: As Easy as 1, 2, 3” [Salmon]. The paper introduced a new type of RNG, the counter-based RNG. It is distinguished from the conventional RNG in that there is no state. Random numbers are generated using only a counter. The Philox 4x32-10 RNG described in the paper has been installed in MATLAB, NVIDIA’s cuRAND, and Intel’s MKL. Philox generates random data with ten rounds of computation. In this paper, we propose a new RNG that uses John von Neumann’s middle-square transformation [Neumann]. This new RNG, which we will call “Squares”, uses only three rounds of squaring. In our timing tests it is about twice as fast as Philox and produces data of equivalent or better quality.

2 Algorithm

The squares RNG was derived using ideas from “Middle Square Weyl Sequence RNG” [Widynski]. The msws generator uses a half-square implementation. That is, only half of the actual square is computed. The upper bits of this half square are the “middle” that is returned. These middle bits are easily obtained by either rotating or shifting the result. The middle square provides the randomization. Uniformity and period length are obtained by adding in a Weyl sequence.

For the squares RNG, we replaced the Weyl sequence (w += s) with a counter multiplied by a key. This turns out to be in effect the same thing. Mathematically, (w += s) is equivalent to w = i * s mod $2^{64}$ for i = 0 to $2^{64}-1$. That is, i * s will produce the same sequence as (w += s). In place of i and s, we use a counter and a key. So, if we add counter * key to a square, we should see the same effect as adding a Weyl sequence. The output will be uniform and $2^{64}$ random numbers will be available per key. In the squares RNG, several rounds of squaring and adding are computed and the result is returned. Three rounds have been shown to be sufficient to pass the statistical tests. The squares RNG in C is shown below.

#include <stdint.h>

inline static uint32_t squares(uint64_t ctr, uint64_t key) {
   uint64_t x, y, z;
   y = x = ctr * key; z = y + key;
   x = x*x + y; x = (x>>32) | (x<<32);        /* round 1 */
   x = x*x + z; x = (x>>32) | (x<<32);        /* round 2 */
   return (x*x + y) >> 32;                    /* round 3 */
}
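
The equivalence between the running sum (w += s) and the product i * s described above can be checked directly. The following is a minimal sketch, not part of the original software; the function name `weyl_matches_product` is hypothetical. It relies only on the fact that unsigned 64-bit arithmetic in C wraps modulo $2^{64}$.

```c
#include <stdint.h>

/* Returns 1 if the running sum (w += s) agrees with i * s (mod 2^64)
   for i = 0 .. n-1, and 0 otherwise. Unsigned 64-bit arithmetic in C
   wraps modulo 2^64, so no explicit reduction is needed. */
static int weyl_matches_product(uint64_t s, uint64_t n) {
   uint64_t w = 0;
   for (uint64_t i = 0; i < n; i++) {
      if (w != i * s) return 0;   /* w has been incremented i times here */
      w += s;
   }
   return 1;
}
```

Calling this with any 64-bit constant s should return 1, which is why a counter multiplied by a key can stand in for the Weyl sequence without carrying any state.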

3 Discussion

We used the parameters “ctr” and “key” to be consistent with Philox parameters. This generator would be used in a similar way as Philox.
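
As an illustration of this calling pattern, here is a minimal sketch. The helpers `squares_fill` and `squares_unit` are hypothetical names introduced for this example and are not part of the software download. The three-round function is repeated so that the block compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Three-round squares RNG as listed in Section 2. */
static inline uint32_t squares(uint64_t ctr, uint64_t key) {
   uint64_t x, y, z;
   y = x = ctr * key; z = y + key;
   x = x*x + y; x = (x>>32) | (x<<32);        /* round 1 */
   x = x*x + z; x = (x>>32) | (x<<32);        /* round 2 */
   return (uint32_t)((x*x + y) >> 32);        /* round 3 */
}

/* Hypothetical helper: fill out[0..n-1] with the outputs for counters
   first, first+1, ..., first+n-1 under one key. No state is carried
   between calls, so any element can be recomputed independently. */
static void squares_fill(uint32_t *out, size_t n, uint64_t first, uint64_t key) {
   for (size_t i = 0; i < n; i++)
      out[i] = squares(first + i, key);
}

/* Hypothetical helper: map one 32-bit output to a double in [0, 1). */
static double squares_unit(uint64_t ctr, uint64_t key) {
   return squares(ctr, key) / 4294967296.0;   /* divide by 2^32 */
}
```

Because the output depends only on (ctr, key), separate threads can be handed disjoint counter ranges under the same key, or the same counter range under different keys.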

After computing the square, a rotation (circular shift) by 32 bits is performed. This is done to position the random data into the lower 32 bits which results in a better randomization on the next round.
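
The effect of the rotation can be seen in isolation. This is a small sketch with a hypothetical helper name, `rot32`, wrapping the same expression that appears in each round of the generator.

```c
#include <stdint.h>

/* Rotate a 64-bit word by 32 bits: the upper and lower halves swap
   places, so the upper 32 bits of the half square land in the lower
   32 bits before the next squaring. */
static uint64_t rot32(uint64_t x) {
   return (x >> 32) | (x << 32);
}
```

For example, `rot32(0x0123456789abcdef)` yields `0x89abcdef01234567`, and applying the rotation twice restores the original word.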

The key should be an irregular bit pattern with roughly half ones and half zeros. A utility in the software download (available at http://squaresrng.wixsite.com/rand) is provided to create such keys. The keys are chosen so that the upper 8 hexadecimal digits are different and also that the lower 8 hexadecimal digits are different. Different digits assure sufficient change when adding ctr*key on each invocation of the RNG.
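
The two stated properties of a key can be checked mechanically. The sketch below is not the key utility from the software download and makes no claim to match its method; the function names and the `slack` tolerance on the ones count are assumptions made for illustration. A key passing these checks is not necessarily one the utility would produce.

```c
#include <stdint.h>

/* Returns 1 if the eight hexadecimal digits of v are pairwise different. */
static int hex_digits_distinct(uint32_t v) {
   unsigned seen = 0;                 /* bit d is set once digit d has appeared */
   for (int i = 0; i < 8; i++) {
      unsigned d = (v >> (4 * i)) & 0xFu;
      if (seen & (1u << d)) return 0;
      seen |= 1u << d;
   }
   return 1;
}

/* Counts the one bits in a 64-bit word. */
static int popcount64(uint64_t v) {
   int n = 0;
   while (v) { v &= v - 1; n++; }     /* clears the lowest set bit each pass */
   return n;
}

/* Hypothetical check: distinct digits in each 32-bit half, and a ones
   count within `slack` of 32. The slack value is an assumption. */
static int key_passes_digit_checks(uint64_t key, int slack) {
   int ones = popcount64(key);
   return hex_digits_distinct((uint32_t)(key >> 32))
       && hex_digits_distinct((uint32_t)key)
       && ones >= 32 - slack && ones <= 32 + slack;
}
```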

For keys generated by the key utility, either ctr*key or (ctr+1)*key will have non-zero digits. The variables y and z store these values. Adding one or the other will assure non-zero digits in the computation. This improves the randomization and also provides uniformity.

Since the counter is a 64-bit integer, one can generate $2^{64}$ random numbers per key. Even on modern supercomputers, $2^{64}$ is sufficient for most usages. Assume we have a supercomputer with 10 million cores. If the $2^{64}$ counter values are divided equally among all the cores, one could provide a stream of about 1.8 trillion random numbers per core. Should longer streams be needed, of course, one could use more keys, but it is likely that for most usages a single key would be adequate. The key utility provided in the software download can provide about 2 billion keys. This should be a sufficient number for quite some time in the future.
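
The per-core figure follows from $2^{64} / 10^{7} \approx 1.84 \times 10^{12}$. A minimal sketch of one way to partition the counter space is given below; the function names are hypothetical and the contiguous-block layout is an assumption, not something prescribed by the paper.

```c
#include <stdint.h>

/* Number of counters each of `cores` streams receives when the 2^64
   counter space is split evenly (cores must be non-zero). Computed
   from 2^64 - 1 because 2^64 itself does not fit in a uint64_t; the
   floor is the same unless cores divides 2^64 exactly. */
static uint64_t counters_per_core(uint64_t cores) {
   return UINT64_MAX / cores;
}

/* First counter of the contiguous block assigned to core index `core`
   (0-based, core < cores). */
static uint64_t first_counter(uint64_t core, uint64_t cores) {
   return core * counters_per_core(cores);
}
```

With 10 million cores, `counters_per_core` returns 1844674407370, the roughly 1.8 trillion values per core mentioned above.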

4 Statistical and Timing Tests

The test validation for squares was similar to the validation for Philox. D. E. Shaw Research stated that Philox was subjected to at least 89 BigCrush [Lecuyera] tests. For squares, we ran 300 BigCrush tests using random keys. We also ran inter-stream correlation tests, subset of counter space tests, counter tests with increments other than one, bits-reversed tests, and a basic uniformity test, all with no failures. Similar to Philox, the squares RNG is “crush-resistant”. Additionally, we subjected squares to the PractRand [Dotyhumphrey] test. We ran 300 PractRand tests with random keys to 256 gigabytes each, with no failures.

The time to generate one billion random numbers was computed using an Intel Core i7-9700 3.0 GHz processor running Cygwin64. The time for Philox was 2.21 sec. The time for squares was 1.04 sec.

5 Summary

In this paper, we briefly described a new counter-based RNG based on John von Neumann’s middle square. Counter-based RNGs have no state and are well suited to parallel computing. Many of the ideas from the “Middle Square Weyl Sequence RNG” were used in this new generator. We discovered that with only three rounds of squaring we could obtain satisfactory data. The squares RNG was subjected to similar testing as Philox and shows twice the speed with equivalent or better data.

A software package with example programs is available at


I would like to thank D. E. Shaw Research for creating the counter-based RNG. To those people who helped bring about the Middle Square Weyl Sequence RNG, I remain thankful.

Also, I think I might mention the following. I had not actually been working on RNGs for some time. I was daydreaming and reminiscing about things I had worked on and for some reason remembered Prof. Knuth, the author of “The Art of Computer Programming” [Knuth]. I soon found myself questioning if one could create a counter-based RNG using the middle square. I tried some ideas on my home computer and after a few attempts had passed BigCrush. Merely remembering Prof. Knuth led to this investigation. We didn’t actually interact. Nevertheless, I think I should give Knuth credit for inspiration.

Also, I think I’ll mention one more thing. Similar to the Middle Square Weyl Sequence RNG, I encountered a problem with uniformity. One case had too many zeros in the output. I didn’t see an easy fix. However, I was reminded of the 3n+1 problem, also known as the Ulam conjecture or Collatz conjecture. After remembering this, a simple solution occurred to me. All we needed to do was add (ctr+1) * key. This solved the problem. The similarity is the +1. Again, no actual interaction, but I think that Ulam in some indirect sense inspired this solution and I am thankful.


Appendix - Four Round Version

In this appendix we present a four round version of the squares RNG. It was subjected to the same statistical tests as the three round version. It is presented here because of a potential for future usage. With an extra round of squaring it may be more likely to pass some (as of yet unknown) statistical test. This version is slower than the three round version, but still faster than Philox. It can generate one billion random numbers in 1.34 sec on an Intel Core i7-9700 3.0 GHz processor running Cygwin64.

inline static uint32_t squares(uint64_t ctr, uint64_t key) {
   uint64_t x, y, z;
   y = x = ctr * key; z = y + key;
   x = x*x + y; x = (x>>32) | (x<<32);        /* round 1 */
   x = x*x + z; x = (x>>32) | (x<<32);        /* round 2 */
   x = x*x + y; x = (x>>32) | (x<<32);        /* round 3 */
   return (x*x + z) >> 32;                    /* round 4 */
}