Random problems with R

by Kellie Ottoboni, et al.

R (Version 3.5.1 patched) has an issue with its random sampling functionality. R generates random integers between 1 and m by multiplying random floats by m, taking the floor, and adding 1 to the result. A random float can take only finitely many values, and those values generally cannot be split evenly among m integers; these well-known quantization effects result in a non-uniform distribution on {1, …, m}. The departure from uniformity, which depends on m, can be substantial. Because the sample function in R relies on generating random integers, random sampling in R is biased. There is an easy fix: construct random integers directly from random bits, rather than multiplying a random float by m. That is the strategy taken in Python's numpy.random.randint() function, among others. Example source code in Python is available at https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py (see the functions getrandbits() and randbelow_from_randbits()).
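To make the quantization effect of the multiply-and-floor rule concrete, here is a minimal Python sketch. It assumes, as a simplification, that a random float is one of 2**bits equally likely values u = j / 2**bits; the abstract does not state the resolution of R's floats, so `bits` is a free parameter here. It counts exactly how many floats land on each integer under floor(m * u) + 1. The function names floor_method_counts and max_min_ratio are hypothetical.

```python
from fractions import Fraction


def floor_method_counts(m: int, bits: int) -> list[int]:
    """Count how many of the 2**bits equally likely floats u = j / 2**bits
    are mapped to each integer 1..m by the rule floor(m * u) + 1.

    Integer arithmetic keeps the floor exact. Brute force, so keep `bits` small.
    """
    n = 2 ** bits
    counts = [0] * m
    for j in range(n):
        # (m * j) // n == floor(m * u); index 0 corresponds to the value 1
        counts[(m * j) // n] += 1
    return counts


def max_min_ratio(m: int, bits: int) -> Fraction:
    """Probability of the most likely value in 1..m divided by that of the least likely.

    Every value receives either q = 2**bits // m or q + 1 of the floats,
    and exactly 2**bits % m values receive q + 1.
    """
    q, r = divmod(2 ** bits, m)
    if q == 0:
        raise ValueError("m > 2**bits: some values in 1..m can never occur")
    return Fraction(q + 1, q) if r else Fraction(1)


if __name__ == "__main__":
    print(floor_method_counts(3, 8))       # [86, 85, 85]
    print(max_min_ratio(3, 8))             # 86/85
    # With a hypothetical 32-bit resolution, any m strictly between 2**31
    # and 2**32 makes some values exactly twice as likely as others.
    print(max_min_ratio(3 * 2**30, 32))    # 2
```

Unless m divides 2**bits, some integers receive one more float than others. The relative excess (q + 1)/q is negligible for small m but grows as m approaches 2**bits, which is one way to see why the bias depends on m and can become substantial.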

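The fix described in the abstract, building random integers directly from random bits, can be sketched in Python as below. This sketch uses rejection sampling, a standard way to make the construction exactly uniform: draw just enough bits to cover 0..m-1 and redraw whenever the result is m or larger. It is an illustration under that assumption, not the cryptorandom code. The names randint_from_bits and sample_without_replacement are hypothetical, and Python's random.Random.getrandbits stands in for the bit source. The second function shows how a sampler that relies on random integers inherits their (un)biasedness.

```python
import random
from collections import Counter
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def randint_from_bits(m: int, getrandbits: Callable[[int], int]) -> int:
    """Return an integer in 1..m built directly from random bits.

    Draw k bits with 2**(k-1) < m <= 2**k and reject draws >= m. Each accepted
    bit pattern is equally likely, so the result is exactly uniform provided
    the bit source is; each draw is accepted with probability above 1/2.
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    if m == 1:
        return 1  # avoid requesting zero bits
    k = (m - 1).bit_length()  # smallest k with 2**k >= m
    while True:
        r = getrandbits(k)
        if r < m:
            return r + 1


def sample_without_replacement(population: Sequence[T], size: int,
                               getrandbits: Callable[[int], int]) -> list[T]:
    """Partial Fisher-Yates shuffle driven by randint_from_bits.

    Any bias in the integer generator would pass straight into the sample;
    with an exactly uniform generator, every ordered sample is equally likely.
    """
    pool = list(population)
    n = len(pool)
    if not 0 <= size <= n:
        raise ValueError("size must be between 0 and len(population)")
    for i in range(size):
        # pick uniformly among the n - i items not yet selected (indices i..n-1)
        j = i + randint_from_bits(n - i, getrandbits) - 1
        pool[i], pool[j] = pool[j], pool[i]
    return pool[:size]


if __name__ == "__main__":
    rng = random.Random(2018)  # seeded Mersenne Twister used as the bit source
    draws = Counter(randint_from_bits(3, rng.getrandbits) for _ in range(30_000))
    print(sorted(draws.items()))
    print(sample_without_replacement(range(10), 4, rng.getrandbits))
```

Drawing (m - 1).bit_length() bits keeps the expected number of draws per integer below 2. Passing the bit source in as a parameter keeps the construction independent of any particular generator.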

