Random Sampling: Practice Makes Imperfect

10/25/2018
by Philip B. Stark, et al.

The pseudo-random number generators (PRNGs), sampling algorithms, and algorithms for generating random integers in some common statistical packages and programming languages are unnecessarily inaccurate, by an amount that may matter for statistical inference. Most use PRNGs with state spaces that are too small for contemporary sampling problems and methods such as the bootstrap and permutation tests. The random sampling algorithms in many packages rely on the false assumption that PRNGs produce IID U[0, 1) outputs. The discreteness of PRNG outputs and the limited state space of common PRNGs cause those algorithms to perform poorly in practice. Statistics packages and scientific programming languages should use cryptographically secure PRNGs by default (not for their security properties, but for their statistical ones), and offer weaker PRNGs only as an option. Software should not use methods that assume PRNG outputs are IID U[0,1) random variables, such as generating a random sample by permuting the population and taking the first k items or generating random integers by multiplying a pseudo-random binary fraction or float by a constant and rounding the result. More accurate methods are available.
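Two of the claims above can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: all function names are hypothetical, the word size `w` is a toy parameter, and `secrets.randbits` stands in for whatever cryptographically secure bit source a package might use. It shows (1) that mapping a `w`-bit PRNG output to an integer by multiplying and rounding down cannot be exactly uniform unless the range size divides `2**w`, (2) one well-known alternative, rejection sampling on the minimum number of random bits, and (3) a pigeonhole count showing how quickly the number of permutations outgrows a PRNG's state space.

```python
import secrets


def randint_multiply_floor(u_bits, w, m):
    """Map a w-bit PRNG output to {0, ..., m-1} as floor(m * u), with u = u_bits / 2**w."""
    return (m * u_bits) >> w


def bias_of_multiply_floor(w, m):
    """Enumerate all 2**w possible outputs and return (min, max) hits per integer.

    Assumes m <= 2**w. If min != max, the method is not uniform even for an
    ideal PRNG whose w-bit outputs are equally likely.
    """
    counts = [0] * m
    for x in range(2 ** w):
        counts[randint_multiply_floor(x, w, m)] += 1
    return min(counts), max(counts)


def randint_rejection(m, getrandbits=secrets.randbits):
    """Uniform integer in {0, ..., m-1}: draw just enough bits, reject values >= m."""
    if m <= 0:
        raise ValueError("m must be positive")
    if m == 1:
        return 0
    k = (m - 1).bit_length()  # smallest k with 2**k >= m
    while True:
        r = getrandbits(k)
        if r < m:  # accepted with probability m / 2**k > 1/2
            return r


def smallest_n_exceeding_state_space(state_bits):
    """Smallest n with n! > 2**state_bits.

    A PRNG with that many state bits cannot produce every permutation
    of n items, whatever algorithm consumes its output.
    """
    n, fact = 1, 1
    while fact <= 2 ** state_bits:
        n += 1
        fact *= n
    return n
```

With a toy 4-bit word and `m = 3`, `bias_of_multiply_floor(4, 3)` returns `(5, 6)`: one value is hit 6 times out of 16 and the other two 5 times each. `smallest_n_exceeding_state_space(32)` returns `13`, so a generator with 32 bits of state cannot reach every ordering of even 13 items.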


