Several randomness test suites have been proposed as evaluation methods for random or pseudorandom number generators (PRNGs) [1, 2]. In these suites, randomness is tested at two levels. The first-level test is an individual test that yields a p-value, as well as a pass or fail result, for each tested sequence. The second-level test evaluates the results of the first-level tests. As one of the second-level tests, the uniformity of the p-values obtained by the first-level test is tested using a goodness-of-fit test. However, it is known that, depending on the first-level test, the exact distribution of p-values differs from the uniform distribution [3, 4, 5]. For the χ² test adopted as one of the second-level tests in the test suite NIST SP800-22, the effect of this difference on the test results was analyzed, and upper limits of the sample size (number of tested sequences) were proposed by F. Pareschi et al. [3], by H. Haramoto [4], and in other work. Pareschi et al. also considered adopting the Kolmogorov-Smirnov (K-S) test as a second-level test, but their analysis was limited to the case in which the first-level tests are based on the binomial distribution.
In this study, we adopt the K-S test as the second-level test without restricting the nature of the first-level tests. We analyze the effect of the deviation of the exact distribution of p-values from the uniform distribution on $[0, 1]$, which is usually assumed under the null hypothesis of randomness. To this end, we derive an inequality that provides an upper bound on the expected value of the K-S test statistic. The obtained inequality is examined numerically for a toy distribution of p-values and for some of the practical first-level tests in NIST SP800-22. The inequality also allows us to estimate the maximum sample size that avoids a high probability of incorrectly identifying an ideal generator as non-random. To improve the second-level test, we propose using the K-S test with the empirical distribution of p-values obtained by applying the first-level test to ideal random sequences. In practice, we propose using pseudorandom sequences obtained from the chaotic true orbits of the Bernoulli map [6, 7] as a substitute for such ideal random sequences.
2 Second-level randomness test based on the K-S test
Using the K-S test, we can test the goodness-of-fit between the empirical distribution and the reference distribution, or between two empirical distributions.
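Let $p_1, p_2, \dots, p_N$ be the p-values obtained by the first-level randomness test. The empirical distribution $F_N$ with $N$ samples is defined as

$$F_N(x) = \frac{1}{N}\,\#\{\, i \mid p_i \le x,\ 1 \le i \le N \,\}, \tag{1}$$

where $x \in [0, 1]$ and $\#A$ denotes the number of elements in a set $A$.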
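Equation (1) translates directly into code. The following is a minimal sketch, assuming the p-values are held in a plain Python list; the helper name `make_empirical_cdf` is hypothetical. Sorting once and using binary search makes each evaluation of $F_N(x)$ cost $O(\log N)$.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def make_empirical_cdf(p_values: Sequence[float]) -> Callable[[float], float]:
    """Return F_N with F_N(x) = #{i : p_i <= x} / N, as in Equation (1)."""
    ordered = sorted(p_values)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one p-value is required")

    def f_n(x: float) -> float:
        # bisect_right counts the p-values that are <= x (ties included).
        return bisect_right(ordered, x) / n

    return f_n


F_N = make_empirical_cdf([0.1, 0.5, 0.5, 0.9])
print(F_N(0.05), F_N(0.5), F_N(1.0))  # 0.0 0.75 1.0
```

Using `bisect_right` rather than `bisect_left` is what makes the count include p-values equal to $x$, matching the "$\le$" in Equation (1).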
Let the null hypothesis be that $p_1, \dots, p_N$ are independently drawn from the reference distribution $G$. The reference distribution is usually assumed to be the uniform distribution $U$ on $[0, 1]$, i.e., $U(x) = x$. However, depending on the first-level randomness test, the exact distribution of the p-values can differ from $U$.
The test statistic of the one-sample K-S test with reference distribution $G$ is defined as

$$D_N = \sqrt{N}\,\sup_{x \in [0,1]} \left| F_N(x) - G(x) \right|. \tag{2}$$
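The null hypothesis is accepted if

$$D_N < K_\alpha, \tag{3}$$

where $K_\alpha$ is the boundary value for the significance level $\alpha$. For large $N$ and small $\alpha$, this boundary value can be approximated as $K_\alpha \approx \sqrt{-\tfrac{1}{2}\ln(\alpha/2)}$. The boundary values for and are given by and , respectively.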
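As an illustration of Equations (2) and (3) with $G = U$, the sketch below computes $D_N$ exactly from the sorted p-values and compares it with the asymptotic approximation of $K_\alpha$. The function names are hypothetical, and the approximation is only reliable for large $N$ and small $\alpha$.

```python
import math
from typing import Sequence


def ks_statistic_uniform(p_values: Sequence[float]) -> float:
    """D_N = sqrt(N) * sup_x |F_N(x) - x| for the uniform reference G(x) = x."""
    xs = sorted(p_values)
    n = len(xs)
    # For a continuous G, the supremum is reached at a sample point: compare G
    # with the ECDF's value (i+1)/n there and with its left limit i/n.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return math.sqrt(n) * max(d_plus, d_minus)


def boundary_value(alpha: float) -> float:
    """Large-N, small-alpha approximation K_alpha ~ sqrt(-ln(alpha/2) / 2)."""
    return math.sqrt(-0.5 * math.log(alpha / 2))


def accept_uniformity(p_values: Sequence[float], alpha: float) -> bool:
    """Second-level decision of inequality (3): accept H0 if D_N < K_alpha."""
    return ks_statistic_uniform(p_values) < boundary_value(alpha)


evenly_spaced = [0.125, 0.375, 0.625, 0.875]
print(ks_statistic_uniform(evenly_spaced), accept_uniformity(evenly_spaced, 0.01))
```

Evenly spaced p-values give the smallest possible statistic for their $N$, whereas p-values piled up near zero are rejected at any usual $\alpha$.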
3 Inequality for the expected value of the test statistic
Let $F$ be the exact distribution of $p_i$. The test statistic of the K-S test with the exact reference distribution $F$ is defined as

$$D_N^* = \sqrt{N}\,\sup_{x \in [0,1]} \left| F_N(x) - F(x) \right|. \tag{4}$$
Under the null hypothesis, the distribution of $D_N^*$ asymptotically follows the Kolmogorov distribution if the exact reference distribution $F$ is continuous. If the distribution of p-values of the first-level test is discrete, $F$ is not continuous but piecewise constant. Following Pareschi et al. [3], we also assume that the distribution of $D_N^*$ still follows the Kolmogorov distribution even when $F$ is piecewise constant.
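In the following, we analyze the difference between the expected values of the test statistics $D_N$ and $D_N^*$ under this assumption. Applying the triangle inequality $|F_N(x) - G(x)| \le |F_N(x) - F(x)| + |F(x) - G(x)|$ to the right-hand side of Equation (2), and the same inequality with the roles of $F$ and $G$ exchanged, we obtain

$$\left| D_N - D_N^* \right| \le \sqrt{N}\,\delta, \tag{5}$$

where

$$\delta = \sup_{x \in [0,1]} \left| F(x) - G(x) \right|. \tag{6}$$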
This $\delta$ is a constant determined by the reference distribution $G$ and the exact distribution $F$ of the first-level test; it does not depend on $N$.
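For a discrete first-level test, $F$ is a step function and $G = U$ is the identity on $[0, 1]$, so the supremum in Equation (6) can be evaluated exactly. On each interval where $F$ is constant, $|F(x) - x|$ is largest at one of the interval's endpoints, so only the jump points need to be checked. This sketch assumes that $F$ is supplied as a hypothetical list of `(p_value, probability)` atoms.

```python
from typing import Iterable, Tuple


def delta_vs_uniform(atoms: Iterable[Tuple[float, float]]) -> float:
    """delta = sup_{x in [0,1]} |F(x) - x| for a discrete p-value distribution F.

    `atoms` lists (p-value, probability) pairs whose probabilities sum to 1.
    """
    delta = 0.0
    cum = 0.0    # value of F on the current constant piece
    left = 0.0   # left end of the current constant piece
    for v, w in sorted(atoms):
        # On [left, v), F(x) = cum; check both ends of the interval.
        delta = max(delta, abs(cum - left), abs(cum - v))
        cum += w
        left = v
    # Last piece [left, 1], where F = cum (= 1 up to rounding).
    return max(delta, abs(cum - left), abs(cum - 1.0))


# Toy example: p-values equally likely to be 0.25, 0.5, 0.75 or 1.0.
print(delta_vs_uniform([(0.25, 0.25), (0.5, 0.25), (0.75, 0.25), (1.0, 0.25)]))  # 0.25
```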
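Taking the expectation of inequality (5) with respect to the $N$-fold product of the probability measure determined by $F$ (that is, treating $p_1, \dots, p_N$ as independent samples from $F$), we obtain

$$\left| E[D_N] - E[D_N^*] \right| \le E\big[\,|D_N - D_N^*|\,\big] \le \sqrt{N}\,\delta. \tag{7}$$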
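It is known that the expected value $E[D_N^*]$ converges to the constant

$$\sqrt{\frac{\pi}{2}}\,\ln 2 \approx 0.8687$$

as $N \to \infty$, and that this constant, the mean of the Kolmogorov distribution, is independent of $F$.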
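The limit can be checked by a small Monte Carlo simulation. This sketch draws p-values from $U(0, 1)$ with Python's `random` module; because $D_N^*$ is distribution-free for continuous $F$, uniform samples suffice. The sample sizes, number of trials and seed are arbitrary illustrative choices, and the estimate carries both Monte Carlo error and a finite-$N$ bias.

```python
import math
import random


def ks_uniform(sample):
    """sqrt(N) * sup_x |F_N(x) - x| for a sample on [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    return math.sqrt(n) * d


def mean_ks_under_null(n, trials, seed=0):
    """Monte Carlo estimate of E[D_N^*] when the p-values follow the exact F."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += ks_uniform([rng.random() for _ in range(n)])
    return total / trials


LIMIT = math.sqrt(math.pi / 2) * math.log(2)  # mean of the Kolmogorov distribution

for n in (50, 500):
    print(n, round(mean_ks_under_null(n, 1000), 3), round(LIMIT, 4))
```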
Inequality (7) implies that the difference $|E[D_N] - E[D_N^*]|$ is bounded above by $\sqrt{N}\,\delta$. Note that for the χ² test, the difference between the expected value of the test statistic based on a reference distribution that differs from the exact distribution and that based on the exact distribution is proportional to $N$.
From this perspective, the K-S test can be regarded as more robust to an increasing sample size than the χ² test, because for the K-S test the difference in the expected test statistics grows only in proportion to $\sqrt{N}$. However, for the same reason, the power of the K-S test is expected to be lower than that of the χ² test.
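Furthermore, inequality (7) can be used to evaluate the safety of the randomness test. If a difference $\varepsilon$ between $E[D_N]$ and $E[D_N^*]$ is admissible, the bound $\sqrt{N}\,\delta \le \varepsilon$ holds for all sample sizes up to $N_{\max} = \lfloor (\varepsilon/\delta)^2 \rfloor$.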
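The maximum safe sample size follows directly from $\sqrt{N}\,\delta \le \varepsilon$. This sketch uses a hypothetical helper name, adds a small correction loop so that floating-point rounding of $(\varepsilon/\delta)^2$ cannot move the result across the boundary, and uses illustrative values of $\varepsilon$ and $\delta$ chosen to be exact in binary.

```python
import math


def max_sample_size(epsilon: float, delta: float) -> float:
    """Largest N with sqrt(N) * delta <= epsilon, i.e. floor((epsilon / delta)**2).

    Returns math.inf when delta == 0 (reference and exact distribution agree).
    """
    if delta < 0 or epsilon < 0:
        raise ValueError("epsilon and delta must be non-negative")
    if delta == 0:
        return math.inf
    n = math.floor((epsilon / delta) ** 2)
    # Correct possible floating-point rounding at the boundary.
    while math.sqrt(n + 1) * delta <= epsilon:
        n += 1
    while n > 0 and math.sqrt(n) * delta > epsilon:
        n -= 1
    return n


print(max_sample_size(0.25, 0.03125), max_sample_size(0.5, 0.03125))  # 64 256
```

Doubling the admissible difference $\varepsilon$ quadruples $N_{\max}$, which is the $\sqrt{N}$ scaling of inequality (7) seen from the other side.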
Table 1: Second-level test results for the Mersenne twister-based PRNG (mean and standard deviation (SD) of ten p-values, and pass rates). (a) The one-sample K-S test with the uniform distribution; (b) the two-sample K-S test with the empirical distribution.

| No. | Test name | (a) p-value mean | (a) p-value SD | (a) pass rate (larger α) | (a) pass rate (smaller α) | (b) p-value mean | (b) p-value SD | (b) pass rate (larger α) | (b) pass rate (smaller α) |
|---|---|---|---|---|---|---|---|---|---|
| 2 | Block Frequency Test | 0.499 | 0.303 | 10/10 | 10/10 | 0.511 | 0.329 | 10/10 | 10/10 |
| 4 | Longest Run of Ones Test | 0.000 | 0.000 | 0/10 | 0/10 | 0.594 | 0.252 | 10/10 | 10/10 |
| 5 | Binary Matrix Rank Test | 0.000 | 0.000 | 0/10 | 0/10 | 0.504 | 0.321 | 10/10 | 10/10 |
| 6 | Discrete Fourier Transform Test | | | | | | | | |
| 7 | Non-overlapping Template Matching Test (1) | 0.394 | 0.314 | 10/10 | 10/10 | 0.656 | 0.303 | 10/10 | 10/10 |
| 8 | Overlapping Template Matching Test | 0.000 | 0.000 | 0/10 | 0/10 | 0.636 | 0.260 | 10/10 | 10/10 |
| 9 | Maurer’s “Universal Statistical” Test | 0.000 | 0.000 | 0/10 | 0/10 | 0.439 | 0.240 | 10/10 | 10/10 |
| 10 | Linear Complexity Test | 0.064 | 0.121 | 6/10 | 10/10 | 0.489 | 0.291 | 10/10 | 10/10 |
| 11 | Serial Test (1) | 0.415 | 0.131 | 10/10 | 10/10 | 0.520 | 0.170 | 10/10 | 10/10 |
| 12 | Approximate Entropy Test | 0.000 | 0.000 | 0/10 | 0/10 | 0.394 | 0.347 | 9/10 | 10/10 |
| 13 | Cumulative Sums Test (1) | 0.089 | 0.138 | 7/10 | 10/10 | 0.409 | 0.247 | 10/10 | 10/10 |
4 Two-sample K-S test with ideal empirical distribution
A simple way to improve the K-S-based second-level test is to use the statistic $D_N^*$ instead of $D_N$ when the exact distribution $F$ of the target first-level test is known. In this case, the test statistic is free from the error caused by the mismatch between $G$ and $F$. However, it is not always possible to compute the exact distribution for a given first-level test. Therefore, as an alternative, we examine a method that uses, as the reference distribution, the empirical distribution of p-values obtained by applying the first-level test to ideal random sequences.
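Let $q_1, q_2, \dots, q_M$ be the p-values obtained by the first-level test for ideal or nearly ideal random sequences. By definition, $q_j$ follows the exact distribution $F$. Similar to Equation (1), the empirical distribution of $q_1, \dots, q_M$ is defined as

$$\tilde{F}_M(x) = \frac{1}{M}\,\#\{\, j \mid q_j \le x,\ 1 \le j \le M \,\}.$$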
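Using the two-sample K-S test, the goodness-of-fit between the empirical distributions $F_N$ and $\tilde{F}_M$ can also be tested as a second-level randomness test. The test statistic of this two-sample K-S test is defined as

$$D_{N,M} = \sqrt{\frac{NM}{N+M}}\,\sup_{x \in [0,1]} \left| F_N(x) - \tilde{F}_M(x) \right|.$$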
For the two-sample K-S test, the null hypothesis that $p_1, \dots, p_N$ and $q_1, \dots, q_M$ are drawn from the same exact distribution is accepted if

$$D_{N,M} < K_\alpha$$

for the significance level $\alpha$.
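The two-sample statistic can be computed by evaluating both empirical distributions at every pooled sample point. In the sketch below, `p` stands for the first-level p-values of the generator under test and `q` for those of the (nearly) ideal reference generator. The function names are hypothetical, and the acceptance threshold reuses the same asymptotic approximation of $K_\alpha$ as the one-sample case.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_two_sample(p: Sequence[float], q: Sequence[float]) -> float:
    """D_{N,M} = sqrt(N*M/(N+M)) * sup_x |F_N(x) - tilde F_M(x)|."""
    ps, qs = sorted(p), sorted(q)
    n, m = len(ps), len(qs)
    # Both ECDFs are right-continuous step functions that only jump at sample
    # points, so the supremum is attained at one of the pooled sample points.
    sup = max(abs(bisect_right(ps, x) / n - bisect_right(qs, x) / m) for x in ps + qs)
    return math.sqrt(n * m / (n + m)) * sup


def boundary_value(alpha: float) -> float:
    """Asymptotic approximation K_alpha ~ sqrt(-ln(alpha/2) / 2)."""
    return math.sqrt(-0.5 * math.log(alpha / 2))


def accept_same_distribution(p: Sequence[float], q: Sequence[float], alpha: float) -> bool:
    """Accept H0 (same exact distribution) if D_{N,M} < K_alpha."""
    return ks_two_sample(p, q) < boundary_value(alpha)


print(ks_two_sample([0.1, 0.2], [0.8, 0.9]))  # 1.0: the samples do not overlap
```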
By providing an irrational algebraic number as the initial state, we can generate a chaotic true orbit of the Bernoulli map with infinite precision. The binary sequence is then obtained by assigning a bit to each point of the orbit.
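The exact-arithmetic idea can be illustrated with a quadratic irrational. This sketch assumes the doubling form of the Bernoulli map, $x_{n+1} = 2x_n \bmod 1$, the bit assignment $b_n = \lfloor 2x_n \rfloor$, and the initial state $x_0 = \sqrt{2} - 1$. These are illustrative choices, and the generators in [6, 7] may choose initial states and extract bits differently. Each state is stored exactly as $a + b\sqrt{d}$ with integers $a$, $b$ and $d$, and `math.isqrt` gives the exact floor because $\sqrt{d}$ is irrational.

```python
from math import isqrt
from typing import List


def bernoulli_true_orbit_bits(a: int, b: int, d: int, n_bits: int) -> List[int]:
    """Bits b_n = floor(2 x_n) along the exact orbit of x_{n+1} = 2 x_n mod 1.

    The state is x = a + b*sqrt(d) with integers a, b > 0 and non-square d,
    and must satisfy 0 < x < 1.  No rounding ever occurs.
    """
    if b <= 0 or isqrt(d) ** 2 == d:
        raise ValueError("need b > 0 and a non-square d")
    bits = []
    for _ in range(n_bits):
        a2, b2 = 2 * a, 2 * b                  # 2x = a2 + b2*sqrt(d)
        # floor(b2*sqrt(d)) = isqrt(b2^2 d) exactly, since sqrt(d) is irrational.
        bit = a2 + isqrt(b2 * b2 * d)          # floor(2x), either 0 or 1
        bits.append(bit)
        a, b = a2 - bit, b2                    # x <- 2x - floor(2x)
    return bits


print(bernoulli_true_orbit_bits(-1, 1, 2, 16))  # x0 = sqrt(2) - 1
```

With this bit assignment, a single orbit reproduces the binary expansion of $x_0$, which makes a convenient sanity check. The integer coefficient $b$ doubles at every step, so the cost per bit grows slowly with the length of the orbit.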
5 Numerical results
5.1 Examples of second-level tests based on the K-S test
As a first numerical experiment, two second-level tests based on the K-S test were applied to some of the first-level tests in NIST SP800-22. The first was the one-sample K-S test with the reference distribution $U$, and the second was the two-sample K-S test with a separately prepared empirical distribution $\tilde{F}_M$. We performed each second-level test ten times; each run used the $N$ p-values obtained by applying the first-level test to $N$ sequences of length $n$. The tested sequences were generated by a PRNG based on the Mersenne twister. The empirical distribution used as the reference was constructed from the results of the first-level tests for the PRNG based on the chaotic true orbit of the Bernoulli map with and .
The results of the one-sample K-S test and the two-sample K-S test are shown in columns (a) and (b) of Table 1, respectively. For each randomness test, the table lists the mean and the standard deviation of the ten obtained p-values, together with the pass rate (the number of passes divided by ten). For the first-level tests Nos. 7, 11, and 13, which consist of several tests, only the result for one test is shown as an example. The random excursions test and the random excursions variant test were excluded because the number of obtained p-values varies with the tested sequence. The one-sample K-S test with the uniform distribution failed completely for the first-level tests Nos. 4, 5, 6, 8, 9, and 12. By contrast, almost all the results of the two-sample K-S test with the empirical distribution were successful. These results suggest that the second-level test is improved by using the two-sample K-S test with an empirical distribution constructed from a high-quality PRNG.
5.2 Examination of the derived inequality
To examine inequality (7), we numerically analyze the difference between the test statistics $D_N$ and $D_N^*$ for a particular exact distribution $F$ under the reference distribution $G = U$. As a toy model, we consider an exact distribution $F_c$ that is a piecewise linear function of $x$ parameterized by a constant $c$.
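For a given sample size $N$ and constant parameter $c$, we randomly generate $p_1, \dots, p_N$ following the distribution $F_c$ and calculate $D_N$ and $D_N^*$, repeating this $T$ times. We then obtain the mean values $\overline{D}_N$ and $\overline{D}_N^*$ and their difference $\Delta = \overline{D}_N - \overline{D}_N^*$. In Fig. 2, $\Delta$ (circles) and $\sqrt{N}\,\delta$ (solid line) are shown for two values of $c$. Here, ten samples of $\Delta$ are plotted for each $N$. As a result, $\Delta$ is less than $\sqrt{N}\,\delta$ in both cases and approaches $\sqrt{N}\,\delta$ with increasing $N$. This result is consistent with inequality (7).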
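The toy-model experiment can be reproduced in outline. Since this sketch does not assume the exact form of $F_c$, it substitutes a hypothetical piecewise-linear CDF with a single kink at $x = 1/2$: $F_c(x) = (1+c)x$ for $x \le 1/2$ and $F_c(x) = (1+c)/2 + (1-c)(x - 1/2)$ otherwise, with $|c| < 1$, for which $\delta = |c|/2$. It samples by inverse transform, computes $D_N$ against $U$ and $D_N^*$ against $F_c$, and returns the mean difference together with the bound $\sqrt{N}\,\delta$. By inequality (5), every single trial must also satisfy $|D_N - D_N^*| \le \sqrt{N}\,\delta$.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def make_toy_cdf(c: float) -> Tuple[Callable[[float], float], Callable[[float], float], float]:
    """Hypothetical piecewise-linear CDF F_c with a kink at 1/2 (|c| < 1).

    Returns (F_c, inverse of F_c, delta) where delta = sup|F_c(x) - x| = |c|/2.
    """
    if not -1 < c < 1:
        raise ValueError("need |c| < 1")
    mid = (1 + c) / 2  # F_c(1/2)

    def cdf(x: float) -> float:
        return (1 + c) * x if x <= 0.5 else mid + (1 - c) * (x - 0.5)

    def inv(u: float) -> float:
        return u / (1 + c) if u <= mid else 0.5 + (u - mid) / (1 - c)

    return cdf, inv, abs(c) / 2


def ks_stat(sorted_xs: Sequence[float], cdf: Callable[[float], float]) -> float:
    """sqrt(N) * sup_x |F_N(x) - cdf(x)| for a continuous reference cdf."""
    n = len(sorted_xs)
    d = max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(sorted_xs))
    return math.sqrt(n) * d


def toy_experiment(n: int, c: float, trials: int, seed: int = 0) -> Tuple[float, float, List[float]]:
    """Return (mean of D_N - D_N^*, bound sqrt(N)*delta, per-trial differences)."""
    cdf, inv, delta = make_toy_cdf(c)
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        xs = sorted(inv(rng.random()) for _ in range(n))  # p_i ~ F_c
        d_n = ks_stat(xs, lambda x: x)                      # reference G = U
        d_star = ks_stat(xs, cdf)                           # exact reference F_c
        diffs.append(d_n - d_star)
    return sum(diffs) / trials, math.sqrt(n) * delta, diffs


for n in (100, 1000, 5000):
    mean_diff, bound, _ = toy_experiment(n, 0.2, trials=50)
    print(n, round(mean_diff, 3), round(bound, 3))
```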
5.3 Safe sample sizes for the frequency test and the binary matrix rank test
Here, we analyze the frequency test and the binary matrix rank test shown in Table 1 as examples. The frequency test was analyzed by Pareschi et al. [3] as an example of a test based on the binomial distribution. As a different example, we analyzed the binary matrix rank test, which is based on the trinomial distribution. The binary matrix rank test also failed the one-sample K-S test with the uniform distribution. For these two tests, we calculated the exact distributions $F$ for the sequence length $n$ and obtained the exact value of the constant $\delta$. The statistics $\overline{D}_N$ and $\overline{D}_N^*$ and their difference

$$\Delta = \overline{D}_N - \overline{D}_N^*$$

were also calculated from the test results shown in column (a) of Table 1. The results are shown in Table 2, together with the range of the standard error of the mean (SEM). The difference $\Delta$ is less than $\sqrt{N}\,\delta$ for both tests, and these results are consistent with inequality (7).
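For the frequency (monobit) test, the exact p-value distribution under ideal randomness follows from the binomial distribution. With $k \sim \mathrm{Bin}(n, 1/2)$ ones among $n$ bits, $S_n = 2k - n$, and NIST SP800-22 defines the p-value as $\operatorname{erfc}(|S_n|/\sqrt{2n})$. The sketch below builds this discrete distribution and evaluates $\delta$ against $U$ exactly at its jump points. It is an illustrative reimplementation, not the authors' code, and it does not cover the binary matrix rank test (trinomial case).

```python
import math
from typing import Dict, List, Tuple


def frequency_test_atoms(n: int) -> List[Tuple[float, float]]:
    """Exact (p-value, probability) pairs of the NIST frequency test for n ideal bits."""
    by_abs_s: Dict[int, float] = {}
    log_norm = math.lgamma(n + 1) - n * math.log(2)
    for k in range(n + 1):                      # k = number of ones
        w = math.exp(log_norm - math.lgamma(k + 1) - math.lgamma(n - k + 1))
        s = abs(2 * k - n)                      # |S_n|, S_n = sum of +/-1 values
        by_abs_s[s] = by_abs_s.get(s, 0.0) + w
    return [(math.erfc(s / math.sqrt(2 * n)), w) for s, w in by_abs_s.items()]


def delta_vs_uniform(atoms: List[Tuple[float, float]]) -> float:
    """sup_x |F(x) - x| for the step function F defined by the atoms."""
    delta, cum, left = 0.0, 0.0, 0.0
    for v, w in sorted(atoms):
        delta = max(delta, abs(cum - left), abs(cum - v))
        cum, left = cum + w, v
    return max(delta, abs(cum - left), abs(cum - 1.0))


for n in (100, 1000, 10000):
    print(n, delta_vs_uniform(frequency_test_atoms(n)))
```

Combined with $N_{\max} = \lfloor (\varepsilon/\delta)^2 \rfloor$, the resulting $\delta$ gives the maximum safe sample size for a chosen $\varepsilon$. Working with `lgamma` in log space keeps the binomial weights finite even for long sequences.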
For the safety of these tests, we can obtain the maximum sample size for the given admissible difference of the expected values of and , as mentioned in Section 3. For example, if , which is 10% of the boundary value , is admissible, the maximum sample size is for the frequency test and for the binary matrix rank test. Furthermore, the sample size , which is the recommended parameter of NIST SP800-22, is safe if is admissible for the frequency test, and is admissible for the binary matrix rank test.
|Frequency Test||Binary Matrix Rank Test|
In this work, we derived an inequality that provides an upper bound on the difference between the expected values of the test statistics for the K-S-test-based second-level randomness test. The derived inequality was examined numerically, and the results were consistent with it. In addition, for several randomness tests in NIST SP800-22, we examined a second-level test that uses the two-sample K-S test with a nearly ideal empirical distribution constructed from the PRNG based on the chaotic true orbit. These results are expected to be useful for evaluating the safety of randomness tests that use the K-S test. In future work, we intend to analyze other goodness-of-fit tests, such as the Cramér-von Mises test and the Anderson-Darling test.
This work was supported by JSPS KAKENHI Grant Numbers 16KK0005, 17K00355. The computation was carried out using the computer resources offered under the category of General Projects by the Research Institute for Information Technology, Kyushu University.
- [1] L. E. Bassham et al., NIST SP800-22 Rev. 1a: A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications, 2010, https://csrc.nist.gov/publications/detail/sp/800-22/rev-1a/final (accessed 27 Sep. 2021).
- [2] P. L’Ecuyer and R. Simard, TestU01: A C Library for Empirical Testing of Random Number Generators, ACM Transactions on Mathematical Software, 33 (2007), 1–40.
- [3] F. Pareschi, R. Rovatti and G. Setti, On Statistical Tests for Randomness Included in the NIST SP800-22 Test Suite and Based on the Binomial Distribution, IEEE Transactions on Information Forensics and Security, 7 (2012), 491–505.
- [4] H. Haramoto, Study on Upper Limit of Sample Size for a Two-level Test in NIST SP800-22, Japan J. Indust. Appl. Math., 38 (2021), 193–209.
- [5] A. Yamaguchi and A. Saito, On the statistical test of randomness based on the uniformity of p-values used in NIST statistical test suite (in Japanese), in: Proc. of the 2015 JSIAM Annual Meeting, pp. 34–35, JSIAM, 2015.
- [6] A. Saito and A. Yamaguchi, Pseudorandom Number Generation using Chaotic True Orbits of the Bernoulli Map, Chaos, 26 (2016), 063112.
- [7] A. Saito and A. Yamaguchi, Pseudorandom Number Generator based on the Bernoulli Map on Cubic Algebraic Integers, Chaos, 28 (2018), 103122.
- [8] W. H. Press et al., Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1992.
- [9] G. Marsaglia, W. W. Tsang and J. Wang, Evaluating Kolmogorov’s Distribution, J. Statistical Software, 8 (2003), 1–4.
- [10] M. Matsumoto and T. Nishimura, A Nonempirical Test on the Weight of Pseudorandom Number Generators, in: Proc. of Monte Carlo and Quasi-Monte Carlo Methods 2000, pp. 381–395, Springer, 2002.
- [11] A. Yamaguchi and A. Saito, Analysis of the Effect of Discreteness of the p-value Distribution on the Randomness Test using the Goodness-of-Fit Test with a Uniform Distribution (in Japanese), in: Proc. of the 2018 JSIAM Annual Meeting, pp. 123–124, JSIAM, 2018.