Random problems with R

09/18/2018 ∙ by Kellie Ottoboni, et al. ∙ 0

R (Version 3.5.1 patched) has two issues with its random sampling functionality. First, it uses a version of the Mersenne Twister known to have a seeding problem, which was corrected by the authors of the Mersenne Twister in 2002. Updated C source code is available at http://www.math.sci.hiroshima-u.ac.jp/ m-mat/MT/MT2002/CODES/mt19937ar.c. Second, R generates random integers between 1 and m by multiplying random floats by m, taking the floor, and adding 1 to the result. Well-known quantization effects in this approach result in a non-uniform distribution on { 1, ..., m}. The difference, which depends on m, can be substantial. Because the sample function in R relies on generating random integers, random sampling in R is biased. There is an easy fix: construct random integers directly from random bits, rather than multiplying a random float by m. That is the strategy taken in Python's numpy.random.randint() function, among others. Example source code in Python is available at https://github.com/statlab/cryptorandom/blob/master/cryptorandom/cryptorandom.py (see functions getrandbits() and randbelow_from_randbits()).

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 1

page 2

page 3

page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

References

  • Goldberg [1991] D. Goldberg.

    What every computer scientist should know about floating-point arithmetic.

    ACM Computing Surveys, 23:5–48, 1991.
  • Knuth [1997] Donald E. Knuth. Art of Computer Programming, Volume 2: Seminumerical Algorithms. Addison-Wesley Professional, Reading, Mass, 3 edition edition, November 1997. ISBN 978-0-201-89684-8.
  • Matsumoto and Nishimura [1998] M. Matsumoto and T. Nishimura. Mersenne twister: A 623-dimensionally equidistributed uniform pseudorandom number generator. ACM Trans. on Modeling and Computer Simulation, 8:3–30, 1998. doi: 10.1145/272991.272995.
  • R Core Team [2018] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2018. URL https://www.R-project.org.