Random number sequences are widely used in fields such as information security and stochastic algorithms. In general, the properties of the random number sequences used in these applications affect their efficiency and effectiveness. In information security contexts, including cryptography, the quality of randomness is particularly important because security strength depends on it. Thus, it is important to evaluate the randomness of sequences or of their generators from different perspectives.
Randomness tests are one such means of evaluation. They are hypothesis tests whose null hypothesis is that the given sequence is truly random. Randomness tests do not require information about the generator of the given sequence, so they can be applied regardless of the generator. Recently, Tamura and Shikano used randomness tests to inspect the properties of a quantum computer developed by IBM and showed that this computer does not work as expected [1].
There are many types of randomness tests, and several test suites have been proposed [2, 3, 4]. SP800-22 [4], published by the US National Institute of Standards and Technology (NIST), is one of the most well-known test suites. The first version was published in 2001, and revision 1a, published in 2010, is currently in effect. Revision 1a consists of 188 test items in the default setup, which can be categorized into 15 test types. Some of the tests included therein are parametric, so multiple test items can be implemented for each such test.
For each test item included in SP800-22, a criterion is specified for deciding whether the given sequences pass. However, no criterion that covers all test items simultaneously is specified. This is a critical problem, common to many test suites, that needs to be addressed for the effective use of randomness tests. In general, the test items included in one test suite are not independent of each other; i.e., the p-values computed by multiple test items are not independently distributed, and their joint distribution is not known. This makes it difficult to specify a rational criterion across all test items. As an exception, Sugita [5] proposed a test suite consisting of mutually independent items, where p-values are computed based on “parity”. Previous studies have reported empirical results concerning the dependency among particular randomness tests [6, 7, 8, 9, 10, 11, 12, 13]. These studies were conducted under the null hypothesis or under a certain alternative hypothesis. Considering dependency under an alternative hypothesis can be important from the perspective of computational burden. Notwithstanding the importance of this work, the dependency among the test items included in a given test suite has not received sufficient attention in the literature. In this paper, we address this gap by focusing on test suite dependency under the null hypothesis, as a step toward specifying a criterion across all test items.
To specify a criterion across all test items in SP800-22, it is critically important to study the Non-overlapping Template Matching Test, one of the test types included in SP800-22. This test counts the number of occurrences of a short string called a “template” in a given sequence and computes a p-value based on that number. Various short strings can be used as templates, so the Non-overlapping Template Matching Test gives rise to multiple test items. Such test items account for 148 of the 188 items in SP800-22. Thus, understanding the dependency among these 148 test items is fundamental to understanding the dependency among the test items in SP800-22. We investigate the dependency between two Non-overlapping Template Matching Test items by deriving the joint probability density function of their p-values. We then propose a transformation that renders multiple test items independent of each other.
The remainder of the paper is organized as follows. In section 2, we introduce the Non-overlapping Template Matching Test and discuss its soundness. In section 3, we derive the joint distribution of two Non-overlapping Template Matching Test p-values. In section 4, we propose a method to render the test items independent. Finally, in section 5 we offer conclusions.
2 Non-overlapping Template Matching Test
In this section, we introduce the Non-overlapping Template Matching Test and confirm its soundness. Here, “soundness” means that the p-value follows the uniform distribution on $[0, 1]$ under the null hypothesis, at least approximately.
The Non-overlapping Template Matching Test algorithm is as follows:
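1. Divide the given $n$-bit sequence $\varepsilon$ into $N$ blocks $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_N$. Here, each block is $M$-bit and $n = NM$.
2. For $j = 1, \dots, N$, compute $W_j$ as the number of occurrences of a given $m$-bit string $B$, called the “template”, in $\varepsilon_j$, i.e.,
$$W_j = \#\left\{ i \in \{1, \dots, M-m+1\} \;\middle|\; \varepsilon_j[i, i+m-1] = B \right\}.$$
Here, $s[a, b]$ means the $(b-a+1)$-bit substring from the $a$-th bit to the $b$-th bit of a string $s$.
3. Compute $\chi^2(\mathrm{obs})$ as follows:
$$\chi^2(\mathrm{obs}) = \sum_{j=1}^{N} \frac{(W_j - \mu)^2}{\sigma^2}, \qquad \mu = \frac{M-m+1}{2^m}, \qquad \sigma^2 = M\left(\frac{1}{2^m} - \frac{2m-1}{2^{2m}}\right).$$
4. Perform a $\chi^2$-test with $N$ degrees of freedom for $\chi^2(\mathrm{obs})$ and compute the p-value $P = \mathrm{igamc}\left(\frac{N}{2}, \frac{\chi^2(\mathrm{obs})}{2}\right)$, where $\mathrm{igamc}$ is the regularized upper incomplete gamma function.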
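The four steps can be sketched in Python as follows. This is a minimal illustration, not the NIST reference implementation: it assumes the SP800-22 constants $\mu = (M-m+1)/2^m$ and $\sigma^2 = M(2^{-m} - (2m-1)2^{-2m})$, takes $M = \lfloor n/N \rfloor$ and discards leftover bits, and evaluates the p-value with a hand-written helper `igamc_int` (a hypothetical name) that is exact only when $N/2$ is an integer, as it is for the default $N = 8$.

```python
import math
import random


def igamc_int(a: int, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) for a positive integer a.

    For integer a, Q(a, x) = exp(-x) * sum_{k=0}^{a-1} x**k / k!.
    """
    term = total = 1.0
    for k in range(1, a):
        term *= x / k
        total += term
    return math.exp(-x) * total


def non_overlapping_template_test(bits: str, template: str, n_blocks: int = 8) -> float:
    """p-value of the Non-overlapping Template Matching Test (sketch).

    bits     -- the n-bit sequence as a string of '0'/'1' characters
    template -- the m-bit template B
    n_blocks -- N; must be even so that igamc_int applies
    """
    if n_blocks % 2:
        raise ValueError("this sketch only supports an even number of blocks N")
    m = len(template)
    M = len(bits) // n_blocks  # block length; trailing bits are discarded
    mu = (M - m + 1) / 2**m
    sigma2 = M * (1 / 2**m - (2 * m - 1) / 2 ** (2 * m))
    chi2_obs = 0.0
    for j in range(n_blocks):
        block = bits[j * M:(j + 1) * M]
        # str.count skips past each match, i.e. it counts non-overlapping occurrences.
        w_j = block.count(template)
        chi2_obs += (w_j - mu) ** 2 / sigma2
    return igamc_int(n_blocks // 2, chi2_obs / 2)


# Usage: one 10^6-bit sequence from Python's Mersenne Twister.
rng = random.Random(2024)
n = 1_000_000
seq = format(rng.getrandbits(n), f"0{n}b")
print(non_overlapping_template_test(seq, "000000001"))
```

Representing the sequence as a character string keeps the counting step a single `str.count` call; a production implementation would more likely work on packed bits.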
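We can use any short string $B$ of length $m$ as a template provided that $B$ satisfies
$$B[1, k] \neq B[m-k+1, m]$$
for $k = 1, \dots, m-1$. This condition means that no proper prefix of $B$ coincides with the suffix of the same length, so two occurrences of $B$ can never overlap, and counting non-overlapping occurrences is the same as counting all occurrences. For $m = 9$, there are 148 templates satisfying the condition. The sample program provided by NIST uses these 148 9-bit templates in the default setup and performs 148 test items as Non-overlapping Template Matching Tests, i.e., each test item runs the above algorithm with one of the 148 templates.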
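The template condition can be checked mechanically. The sketch below (function names are ours) enumerates every $m$-bit string and keeps those for which no proper prefix equals the suffix of the same length; for $m = 9$ it yields the 148 templates mentioned above.

```python
from itertools import product


def is_aperiodic(t: str) -> bool:
    """True if B[1, k] != B[m-k+1, m] for every k = 1, ..., m-1,
    i.e. no proper prefix of t equals the suffix of the same length."""
    m = len(t)
    return all(t[:k] != t[m - k:] for k in range(1, m))


def aperiodic_templates(m: int) -> list[str]:
    """All m-bit templates satisfying the condition, in lexicographic order."""
    candidates = ("".join(bits) for bits in product("01", repeat=m))
    return [t for t in candidates if is_aperiodic(t)]


templates = aperiodic_templates(9)
print(len(templates))  # 148
```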
To satisfy the property of soundness, $\chi^2(\mathrm{obs})$ is required to at least approximately follow a $\chi^2$-distribution with $N$ degrees of freedom under the null hypothesis. The requirement is satisfied if each $(W_j - \mu)/\sigma$ independently follows the standard normal distribution, at least approximately. Since the blocks do not overlap with each other, independence is assured. We therefore show that $(W_j - \mu)/\sigma$ approximately follows $\mathcal{N}(0, 1)$. This also serves as preparation for the next section.
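First, we confirm that the mean of $W_j$ is $\mu$ and that its variance is approximately $\sigma^2$. We regard $\varepsilon_j$ as an $M$-bit random variable uniformly distributed on $\{0, 1\}^M$, and for $i = 1, \dots, M-m+1$ let $X_i$ be a random variable described as
$$X_i = \begin{cases} 1 & \text{if } \varepsilon_j[i, i+m-1] = B, \\ 0 & \text{otherwise.} \end{cases}$$
Since $\varepsilon_j$ is uniformly distributed, it is straightforward that
$$E[X_i] = \frac{1}{2^m}, \qquad E[X_i X_k] = \begin{cases} 0 & (0 < |i-k| < m), \\ \dfrac{1}{2^{2m}} & (|i-k| \ge m), \end{cases}$$
where the first case holds because, by the condition on $B$, two occurrences of $B$ cannot overlap. Using $X_i$, we can represent $W_j$ as
$$W_j = \sum_{i=1}^{M-m+1} X_i, \qquad \text{so that} \qquad E[W_j] = \frac{M-m+1}{2^m} = \mu.$$
Similarly, for $M \ge 2m-2$ the variance can be derived as follows:
$$V[W_j] = \sum_{i=1}^{M-m+1} V[X_i] + 2\sum_{i<k} \mathrm{Cov}[X_i, X_k] = (M-m+1)\left(\frac{1}{2^m} - \frac{2m-1}{2^{2m}}\right) + \frac{m(m-1)}{2^{2m}} = \sigma^2 - \frac{m-1}{2^m} + \frac{(m-1)(3m-1)}{2^{2m}}.$$
Then, $\sigma^2$ approximates the variance of $W_j$ for sufficiently large $M$, since the difference does not grow with $M$.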
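For small blocks, the mean and variance of $W_j$ can be computed exactly by enumerating all $2^M$ blocks. The sketch below does this with exact rational arithmetic and compares the result with $\mu = (M-m+1)/2^m$, with the SP800-22 value $\sigma^2 = M(2^{-m} - (2m-1)2^{-2m})$, and with the closed form $(M-m+1)(2^{-m} - (2m-1)2^{-2m}) + m(m-1)2^{-2m}$. That closed form is an elementary derivation from the fact that occurrences of an aperiodic template cannot overlap; it assumes $M \ge 2m-2$. The template `0001` and the block lengths are arbitrary choices.

```python
from fractions import Fraction
from itertools import product


def exact_moments(template: str, M: int) -> tuple[Fraction, Fraction]:
    """Exact E[W] and V[W] for W = number of occurrences of the template
    in a uniformly random M-bit block (brute force over all 2**M blocks)."""
    m = len(template)
    counts = []
    for bits in product("01", repeat=M):
        block = "".join(bits)
        counts.append(sum(block[i:i + m] == template for i in range(M - m + 1)))
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean**2
    return mean, var


def sigma2_nist(m: int, M: int) -> Fraction:
    """sigma^2 as used by the test."""
    return M * (Fraction(1, 2**m) - Fraction(2 * m - 1, 2 ** (2 * m)))


def var_exact(m: int, M: int) -> Fraction:
    """Exact variance for an aperiodic m-bit template; assumes M >= 2m - 2."""
    return ((M - m + 1) * (Fraction(1, 2**m) - Fraction(2 * m - 1, 2 ** (2 * m)))
            + Fraction(m * (m - 1), 2 ** (2 * m)))


template = "0001"  # aperiodic, m = 4
m = len(template)
for M in (8, 10, 12):
    mean, var = exact_moments(template, M)
    print(M, mean == Fraction(M - m + 1, 2**m), var == var_exact(m, M),
          float(var / sigma2_nist(m, M)))
```

The printed ratio $V[W_j]/\sigma^2$ is below 1 for these tiny blocks and tends to 1 as $M$ grows, because the gap between the two is a constant.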
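Next, we confirm that $(W_j - \mu)/\sigma$ approximately follows a normal distribution. We use the following theorem [14]:
Theorem 2.1. Consider a random variable sequence $X_1, X_2, \dots$ satisfying the following conditions:
1. There exists a positive integer $m$ such that $(X_1, \dots, X_i)$ and $(X_{i+m}, X_{i+m+1}, \dots)$ are independent of each other for all $i$.
2. For an arbitrary positive integer $k$, the joint distribution of $(X_i, X_{i+1}, \dots, X_{i+k})$ does not depend on $i$.
Then, the distribution of $\left(\sum_{i=1}^{L} X_i - L\,E[X_1]\right)/\sqrt{L}$ converges to a normal distribution as $L \to \infty$.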
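The indicator variables $X_i$ defined above satisfy both conditions, because $X_i$ depends only on bits $i, \dots, i+m-1$ of $\varepsilon_j$, and these bits are independent and identically distributed under the null hypothesis. Combined with the mean and variance computed above, this shows that $(W_j - \mu)/\sigma$ approximately follows $\mathcal{N}(0, 1)$ for sufficiently large $M$. Consequently, the soundness of the Non-overlapping Template Matching Test is established.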
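A quick Monte Carlo experiment illustrates this conclusion. The sketch below standardizes the count of a template in many independent random blocks, using Python's `random` module (which is a Mersenne Twister) as the bit source, and checks that the standardized values have mean close to 0, variance close to 1, and about 95% of their mass inside $\pm 1.96$. The template `0001`, the block length $M = 4096$ and the number of blocks are illustrative choices; a short template is used only so that each block contains many occurrences.

```python
import random
import statistics


def standardized_counts(template: str, M: int, n_blocks: int, seed: int = 1) -> list[float]:
    """(W - mu) / sigma for n_blocks independent uniformly random M-bit blocks."""
    m = len(template)
    mu = (M - m + 1) / 2**m
    sigma = (M * (1 / 2**m - (2 * m - 1) / 2 ** (2 * m))) ** 0.5
    rng = random.Random(seed)  # CPython's random module uses the Mersenne Twister
    zs = []
    for _ in range(n_blocks):
        block = format(rng.getrandbits(M), f"0{M}b")
        zs.append((block.count(template) - mu) / sigma)
    return zs


zs = standardized_counts("0001", M=4096, n_blocks=5000)
mean, var = statistics.fmean(zs), statistics.pvariance(zs)
inside = sum(abs(z) < 1.96 for z in zs) / len(zs)
print(f"mean={mean:.3f}  variance={var:.3f}  P(|Z|<1.96)={inside:.3f}")
```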
3 Joint distribution of p-values
Consider two $m$-bit templates $B$ and $B'$. In the following, we attach a prime (e.g., $W'_j$, $\chi'^2(\mathrm{obs})$, $P'$) to denote the variable corresponding to $B'$. The purpose of this section is to obtain the joint distribution of $(P, P')$ under the null hypothesis.
3.1 Derivation of the joint distribution
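First, we derive the joint distribution of $\left((W_j - \mu)/\sigma, (W'_j - \mu)/\sigma\right)$. Theorem 2.1 can be extended to multi-dimensional cases, and thus this pair follows a two-dimensional normal distribution as $M \to \infty$. Since we already know that both marginal distributions are standard normal, the joint distribution is entirely specified once we derive the correlation coefficient. Using $W_j = \sum_i X_i$ and $W'_j = \sum_k X'_k$, we obtain
$$\mathrm{Cov}[W_j, W'_j] = \sum_{i=1}^{M-m+1} \sum_{k=1}^{M-m+1} \left(E[X_i X'_k] - \frac{1}{2^{2m}}\right).$$
We define the following notation: for $d = 1, \dots, m-1$,
$$a_d(B, B') = \begin{cases} 1 & \text{if } B[d+1, m] = B'[1, m-d], \\ 0 & \text{otherwise}, \end{cases}$$
i.e., $a_d(B, B') = 1$ when an occurrence of $B'$ can start $d$ bits after an occurrence of $B$ while overlapping it. Then, for $B \neq B'$, we obtain
$$E[X_i X'_k] = \begin{cases} 0 & (k = i), \\ a_{k-i}(B, B')\, 2^{-(m+k-i)} & (0 < k-i < m), \\ a_{i-k}(B', B)\, 2^{-(m+i-k)} & (0 < i-k < m), \\ 2^{-2m} & (|i-k| \ge m). \end{cases}$$
From the above, the correlation coefficient of the joint distribution is computed as follows:
$$\rho = \frac{\sum_{d=1}^{m-1} \left(a_d(B, B') + a_d(B', B)\right) 2^{m-d} - (2m-1)}{2^m - (2m-1)} + O\!\left(\frac{1}{M}\right). \tag{31}$$
Hereafter, we ignore the remainder term of (31).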
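The leading term of the correlation coefficient can be evaluated mechanically. In the sketch below (names are ours), `a(b1, b2, d)` is 1 when the last $m-d$ bits of `b1` equal the first $m-d$ bits of `b2`, i.e. when an occurrence of `b2` can start $d$ positions after an occurrence of `b1`. The function then returns $\left[\sum_{d=1}^{m-1}(a_d(B,B') + a_d(B',B))2^{m-d} - (2m-1)\right] / \left[2^m - (2m-1)\right]$ for two distinct templates of equal length, with the $O(1/M)$ remainder dropped. The notation is ours, and the formula is meant as a cross-check derived from the overlap structure of the two templates.

```python
from fractions import Fraction


def a(b1: str, b2: str, d: int) -> int:
    """1 if b2 can start d bits after b1 with the two occurrences overlapping."""
    m = len(b1)
    return int(b1[d:] == b2[:m - d])


def asymptotic_correlation(b1: str, b2: str) -> Fraction:
    """Leading (M -> infinity) correlation between the standardized counts
    of two distinct aperiodic templates of equal length m."""
    m = len(b1)
    if len(b2) != m or b1 == b2:
        raise ValueError("need two distinct templates of the same length")
    num = sum((a(b1, b2, d) + a(b2, b1, d)) * 2 ** (m - d) for d in range(1, m))
    return Fraction(num - (2 * m - 1), 2**m - (2 * m - 1))


print(asymptotic_correlation("000000001", "100000000"))  # 1
print(asymptotic_correlation("000000011", "110000000"))  # 27/55
```

A value of exactly 1, as for the first pair, means that the two standardized counts are asymptotically perfectly correlated.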
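Next, we consider the joint cumulative distribution of $(\chi^2(\mathrm{obs}), \chi'^2(\mathrm{obs}))$. Assume that the random variable pairs $(Z_1, Z'_1), (Z_2, Z'_2), \dots, (Z_N, Z'_N)$ independently follow a two-dimensional normal distribution and that each marginal distribution is a standard normal distribution. We denote the correlation coefficient between $Z_j$ and $Z'_j$ as $\rho$, which does not depend on $j$. We define $\chi^2$ and $\chi'^2$ as follows:
$$\chi^2 = \sum_{j=1}^{N} Z_j^2, \qquad \chi'^2 = \sum_{j=1}^{N} Z_j'^2.$$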
It is straightforward that
Thus, we consider only the case that and . In the sample program provided by NIST, the number of blocks $N$ is fixed at 8, so we adopt the assumption that $N$ is even. When
, the joint cumulative distribution functionis as follows:
where is the incomplete gamma function. Derivation of (35) is elaborated in the Appendix. Using , the joint cumulative distribution of is given as .
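Finally, we obtain the joint distribution of $(P, P')$ using the joint cumulative distribution of $(\chi^2(\mathrm{obs}), \chi'^2(\mathrm{obs}))$. Assume that $\chi^2(p)$ is the value of $\chi^2(\mathrm{obs})$ when the value of the corresponding p-value is $p$. By the definition of the p-value, we have
$$\Pr(P \le p,\ P' \le p') = \Pr\left(\chi^2(\mathrm{obs}) \ge \chi^2(p),\ \chi'^2(\mathrm{obs}) \ge \chi^2(p')\right) = p + p' - 1 + F\left(\chi^2(p), \chi^2(p')\right), \tag{36}$$
where $F$ denotes the joint cumulative distribution function of $(\chi^2(\mathrm{obs}), \chi'^2(\mathrm{obs}))$.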
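The effect of $\rho$ on the joint behaviour of two p-values can also be explored by simulation, without the closed form of the joint cumulative distribution function. The sketch below draws $N = 8$ pairs of standard normal variables with correlation $\rho$ (as $Z' = \rho Z + \sqrt{1-\rho^2}\,E$ with $E$ independent of $Z$), forms the two $\chi^2$ statistics, converts them to p-values, and estimates $\Pr(P \le p, P' \le p')$, which by inclusion-exclusion equals $p + p' - 1 + F(\chi^2(p), \chi^2(p'))$. With $\rho = 0$ the estimate should be close to $pp'$; a strong correlation inflates it. All parameter values are illustrative, and `igamc_int` is a hypothetical helper valid only for integer $N/2$.

```python
import math
import random


def igamc_int(a: int, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) for a positive integer a."""
    term = total = 1.0
    for k in range(1, a):
        term *= x / k
        total += term
    return math.exp(-x) * total


def joint_rejection_rate(rho: float, p: float, p2: float,
                         n_blocks: int = 8, trials: int = 20000, seed: int = 7) -> float:
    """Monte Carlo estimate of Pr(P <= p, P' <= p2) for two chi-square p-values
    built from n_blocks independent pairs (Z_j, Z'_j) with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(n_blocks)]
        z2 = [rho * x + s * rng.gauss(0.0, 1.0) for x in z]
        pv = igamc_int(n_blocks // 2, sum(x * x for x in z) / 2)
        pv2 = igamc_int(n_blocks // 2, sum(x * x for x in z2) / 2)
        if pv <= p and pv2 <= p2:
            hits += 1
    return hits / trials


for rho in (0.0, 0.5, 0.9):
    print(rho, joint_rejection_rate(rho, 0.5, 0.5))
```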
We generated sequences using the Mersenne twister [15]. The length of each sequence is -bit. Using these sequences, we computed two-dimensional joint distributions of p-values and compared the experimental distributions with the theoretical ones based on (36). We used the template pairs and for expository purposes. The correlation coefficients defined in the previous subsection can be computed as and .
For each sequence, the sample program provided by NIST uses 148 templates and computes 148 Non-overlapping Template Matching Test p-values in the default setup. Thus, we would need to extend the discussion in the previous section and derive the 148-dimensional joint distribution of the 148 p-values. However, working with a 148-dimensional distribution is computationally onerous, so we seek to circumvent it. One solution is to transform the 148 test items into new test items that are independent of each other.
4.1 Proposed method
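We propose a method to transform the test items by orthogonalization of a multi-dimensional normal distribution. We consider $m$-bit templates $B_1, B_2, \dots, B_K$ and introduce the notation
$$\boldsymbol{Z}_j = \left(\frac{W_j^{(1)} - \mu}{\sigma}, \frac{W_j^{(2)} - \mu}{\sigma}, \dots, \frac{W_j^{(K)} - \mu}{\sigma}\right)^{\top}, \qquad j = 1, \dots, N,$$
where $W_j^{(k)}$ denotes $W_j$ computed with template $B_k$. Assume that $M$ is sufficiently large. The covariance matrix of $\boldsymbol{Z}_j$, denoted by $\Sigma$, is described as follows:
$$\Sigma = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1K} \\ \rho_{21} & 1 & \cdots & \rho_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{K1} & \rho_{K2} & \cdots & 1 \end{pmatrix},$$
where $\rho_{kl} = \rho_{lk}$ is the correlation coefficient (31) for the pair of templates $B_k$ and $B_l$.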
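If $\det\Sigma \neq 0$, then the probability density function of $\boldsymbol{Z}_j$ is described as
$$f(\boldsymbol{z}) = \frac{1}{(2\pi)^{K/2}\sqrt{\det\Sigma}} \exp\left(-\frac{1}{2}\boldsymbol{z}^{\top}\Sigma^{-1}\boldsymbol{z}\right)$$
asymptotically as $M \to \infty$. Since $\Sigma$ is a real symmetric matrix, there exists an orthogonal matrix $U$ such that
$$U^{\top}\Sigma U = \Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_K),$$
where $\Lambda$ is a diagonal matrix. If $\det\Sigma \neq 0$, then the diagonal elements of $\Lambda$ are all positive, because $\Sigma$ is the covariance matrix of $\boldsymbol{Z}_j$ (hence positive semidefinite) and none of its eigenvalues vanishes. We then use the diagonal matrix
$$\Lambda^{-1/2} = \mathrm{diag}\left(\lambda_1^{-1/2}, \dots, \lambda_K^{-1/2}\right), \qquad \Lambda^{-1/2}\Lambda\Lambda^{-1/2} = I,$$
where $I$ is the unit matrix. Using $U$ and $\Lambda^{-1/2}$, we transform $\boldsymbol{Z}_j$ to $\boldsymbol{Y}_j$ as
$$\boldsymbol{Y}_j = \Lambda^{-1/2} U^{\top} \boldsymbol{Z}_j.$$
Then, we obtain the probability density function of $\boldsymbol{Y}_j$ as follows:
$$g(\boldsymbol{y}) = \frac{1}{(2\pi)^{K/2}} \exp\left(-\frac{1}{2}\boldsymbol{y}^{\top}\boldsymbol{y}\right).$$
Hence, the components of $\boldsymbol{Y}_j$ follow the standard normal distribution independently of each other. If, for each $j$, we replace $\boldsymbol{Z}_j$ with $\boldsymbol{Y}_j$, that is, compute the $\chi^2$ statistic of the $k$-th new item as $\sum_{j=1}^{N} Y_{j,k}^2$, then the test items become independent of each other.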
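Numerically, this transformation is a whitening by eigendecomposition. The sketch below uses NumPy's `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors as the columns of $U$, and builds $T = \Lambda^{-1/2}U^{\top}$ so that $\boldsymbol{Y}_j = T\boldsymbol{Z}_j$. The $3 \times 3$ matrix is a made-up positive-definite example, not the covariance matrix of actual templates, and `whitening_matrix` is a hypothetical helper name.

```python
import numpy as np


def whitening_matrix(cov: np.ndarray, tol: float = 1e-12) -> np.ndarray:
    """Return T = Lambda^{-1/2} U^T, where U^T cov U = Lambda (cov symmetric).

    Raises ValueError if cov has a (numerically) zero or negative eigenvalue,
    i.e. if det(cov) = 0 and the transformation does not exist.
    """
    eigvals, U = np.linalg.eigh(cov)  # ascending eigenvalues, eigenvector columns
    if eigvals.min() <= tol:
        raise ValueError("covariance matrix is singular; drop dependent templates first")
    return np.diag(eigvals ** -0.5) @ U.T


# Hypothetical 3x3 correlation matrix, for illustration only.
cov = np.array([[1.0, 0.3, -0.1],
                [0.3, 1.0, 0.2],
                [-0.1, 0.2, 1.0]])
T = whitening_matrix(cov)
print(np.round(T @ cov @ T.T, 10))  # close to the identity matrix

# Transform simulated Z_j (one per row) into Y_j = T Z_j and inspect their covariance.
rng = np.random.default_rng(0)
z = rng.multivariate_normal(np.zeros(3), cov, size=100_000)
y = z @ T.T
print(np.round(np.cov(y, rowvar=False), 2))
```

Refusing to proceed when an eigenvalue is (numerically) zero mirrors the requirement $\det\Sigma \neq 0$; in that case some templates have to be removed first.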
However, it is not ensured that $\det\Sigma \neq 0$. Indeed, $\Sigma$ defined on the 148 9-bit templates has zero eigenvalues. Consider templates and . If , then it is ensured that an integer exists such that
as . In other words, wherever and occur, they form a pair. Then, we have
Equation (45) is one reason why $\Sigma$ has zero eigenvalues. Thus, we need to remove at least or
from our discussion. Technically, we can identify the templates to remove by checking the eigenvectors corresponding to zero eigenvalues. When we consider the 148 9-bit templates, we need to remove either or , and either or . Finally, we need to remove or or or .
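The pairing behind the zero eigenvalues can be checked empirically. We assume, consistently with the removal of 100000000 and 111111110 in the experiment below, that two of the dependent pairs are (000000001, 100000000) and (011111111, 111111110); this identification is our own inference. Every run of at least eight 0s that is preceded by a 1 contains exactly one occurrence of 100000000, and every such run followed by a 1 contains exactly one occurrence of 000000001 (and symmetrically for runs of 1s), so the two counts in a block can differ by at most one. The sketch below confirms this on random blocks; block length, seed and number of blocks are arbitrary.

```python
import random


def count_gap(block: str, b1: str, b2: str) -> int:
    """Difference between the numbers of occurrences of two templates in a block."""
    return block.count(b1) - block.count(b2)


PAIRS = [("100000000", "000000001"), ("011111111", "111111110")]

rng = random.Random(3)
M = 4096
gaps = set()
for _ in range(2000):
    block = format(rng.getrandbits(M), f"0{M}b")
    for b1, b2 in PAIRS:
        gaps.add(count_gap(block, b1, b2))
print(sorted(gaps))  # a subset of [-1, 0, 1]
```

Because the gap stays bounded while $\sigma$ grows like $\sqrt{M}$, the standardized counts of such a pair become perfectly correlated as $M \to \infty$.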
We used the Mersenne twister [15] and AES-128 [16] and generated sequences for each generator. The length of the sequences is -bit. AES-128 was used in counter mode. We executed 145 test items on these sequences using 145 9-bit templates (all except 100000000, 111111110 and 001010101) and compared the results before and after the transformation.
Figure 3 shows, for each possible number of test items rejecting a sequence, the number of sequences rejected by exactly that many items. Here, “reject” means that a p-value is less than 0.01. The black line denotes the expected values when all test items are independent of each other. This result implies that the transformation yields a series of test items that are independent of each other, and thus that the proposed transformation is effective.
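If the 145 test items were independent and each rejected a random sequence with probability 0.01 (the probability that a uniformly distributed p-value falls below 0.01), the number of rejecting items per sequence would follow a binomial distribution with parameters 145 and 0.01. The sketch below computes the corresponding expected frequencies, one way to obtain a reference curve like the black line in Figure 3; `n_sequences` is a placeholder for the number of generated sequences, and the function name is ours.

```python
from math import comb


def expected_frequencies(n_sequences: int, n_items: int = 145, alpha: float = 0.01) -> list[float]:
    """Expected number of sequences rejected by exactly k items, k = 0..n_items,
    assuming the items are independent and each rejects with probability alpha."""
    return [n_sequences * comb(n_items, k) * alpha**k * (1 - alpha) ** (n_items - k)
            for k in range(n_items + 1)]


freq = expected_frequencies(n_sequences=1)  # per-sequence probabilities
for k in range(6):
    print(k, round(freq[k], 4))
```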
We derived the joint distribution of two Non-overlapping Template Matching Test p-values in the limit of infinite block size. The results suggest that the test items cannot be regarded as independent of each other.
We also proposed a method to remove this dependency using orthogonalization of a multi-dimensional normal distribution. Experimental results support the efficacy of the method. We therefore expect the method to contribute to establishing a rational criterion across all test items in SP800-22, and thus to the appropriate use of randomness tests.
[1] K. Tamura and Y. Shikano, “Quantum Random Numbers generated by the Cloud Superconducting Quantum Computer,” arXiv:1906.04410 (2019).
[2] G. Marsaglia, “DIEHARD: a battery of tests of randomness,” http://stat.fsu.edu/geo (1996).
[3] P. L’Ecuyer and R. Simard, “TestU01: A C library for empirical testing of random number generators,” ACM Trans. on Mathematical Software 33.4 (2007): 22.
[4] A. Rukhin et al., “A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications,” National Institute of Standards and Technology Special Publication 800-22 revision 1a (2010).
[5] H. Sugita, “Orthogonal test series for pseudorandomness test,” RIMS Kôkyûroku 1127 (2000): 80-87 (in Japanese).
[6] P. Hellekalek and S. Wegenkittl, “Empirical evidence concerning AES,” ACM Trans. on Modeling and Computer Simulation 13.4 (2003): 322-333.
[7] M. S. Turan, A. Doğanaksoy and S. Boztaş, “On independence and sensitivity of statistical randomness tests,” International Conference on Sequences and Their Applications, Springer, Berlin, Heidelberg (2008).
[8] A. Doğanaksoy, B. Ege and K. Mus, “Extended results for independence and sensitivity of NIST randomness tests,” Information Security and Cryptography Conference, ISC Turkey (2008).
[9] L. Fan, H. Chen and S. Gao, “A General Method to Evaluate the Correlation of Randomness Tests,” in: Information Security Applications, WISA 2013, Lecture Notes in Computer Science, vol. 8267, Springer, Cham (2014).
[10] A. Yamaguchi and A. Saito, “Analysis of NIST Randomness test for correlative sequences generated by chaotic true orbit,” Proceedings of JSIAM 2016 (2016) (in Japanese).
[11] F. Sulak et al., “On the independence of statistical randomness tests included in the NIST test suite,” Turkish Journal of Electrical Engineering & Computer Sciences 25.5 (2017): 3673-3683.
[12] A. Doğanaksoy et al., “Mutual correlation of NIST statistical randomness tests and comparison of their sensitivities on transformed sequences,” Turkish Journal of Electrical Engineering & Computer Sciences 25.2 (2017): 655-665.
[13] A. Iwasaki, “Analysis of NIST SP800-22 focusing on randomness of each sequence,” JSIAM Letters 10 (2018): 1-4.
[14] W. Hoeffding and H. Robbins, “The central limit theorem for dependent random variables,” Duke Mathematical Journal 15.3 (1948): 773-780.
[15] M. Matsumoto and T. Nishimura, “Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator,” ACM Trans. on Modeling and Computer Simulation 8.1 (1998): 3-30.
[16] V. Rijmen and J. Daemen, “Advanced encryption standard,” Proceedings of Federal Information Processing Standards Publications, National Institute of Standards and Technology (2001): 19-22.
Appendix Derivation of (35)
We present the derivation of (35). Assume that the random variable pair $(Z, Z')$ follows a two-dimensional normal distribution and that each marginal distribution is a standard normal distribution. We denote the correlation coefficient between $Z$ and $Z'$ as $\rho$ and assume that . Then, the distribution of $(Z, Z')$ is specified, and its probability density function is described as
$$f(z, z') = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{z^2 - 2\rho z z' + z'^2}{2(1-\rho^2)}\right).$$
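Assume that $(Z_1, Z'_1), (Z_2, Z'_2), \dots$ independently follow this distribution. For an arbitrary positive integer $n$, we define $(\chi_n^2, \chi_n'^2)$ as
$$\chi_n^2 = \sum_{k=1}^{n} Z_k^2, \qquad \chi_n'^2 = \sum_{k=1}^{n} Z_k'^2.$$
We represent the probability density function of $(\chi_n^2, \chi_n'^2)$ as $f_n(x, y)$, the characteristic function of $f_n$ as $\phi_n(s, t)$, and the joint cumulative distribution function of $(\chi_n^2, \chi_n'^2)$ as $F_n(x, y)$. For arbitrary $n$, the probability that $\chi_n^2 < 0$ or $\chi_n'^2 < 0$ is zero, so in the following we consider the region where $x > 0$ and $y > 0$.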
By definition, satisfies
Then, the corresponding characteristic function is computed as follows:
Here, $i$ is the imaginary unit. We introduce the following change of variables:
Using and , we get
Since , we have
Here, the root of a complex number is defined as
Since the pairs $(Z_k, Z'_k)$ are mutually independent, we obtain
$$\phi_n(s, t) = \phi_1(s, t)^n.$$
Then, we have
We assume $N$ to be even as per section 3 of the main text. Since ,
Then, since , using the integration path shown in Figure 4, Jordan’s lemma can be applied as follows:
Then, we obtain
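The characteristic-function step of this derivation can be cross-checked numerically. For a standard bivariate normal pair with correlation $\rho$, $E[\exp(i(sZ^2 + tZ'^2))] = \left[(1-2is)(1-2it) + 4\rho^2 st\right]^{-1/2}$, with the principal square root near $s = t = 0$, and by independence the $n$-pair version is its $n$-th power. For even $N$ the exponent $-N/2$ is a negative integer, so $\phi_N$ is a rational function of $s$ and $t$ without branch cuts, which makes residue calculus with Jordan's lemma convenient. The sketch below compares the single-pair formula with a Monte Carlo estimate at one point; $\rho$, $s$, $t$ and the sample size are illustrative, and the function names are ours.

```python
import cmath
import math
import random


def phi1(s: float, t: float, rho: float) -> complex:
    """Characteristic function of (Z**2, Z'**2) for a standard bivariate normal
    pair with correlation rho (principal square root; valid near s = t = 0)."""
    d = (1 - 2j * s) * (1 - 2j * t) + 4 * rho**2 * s * t
    return 1 / cmath.sqrt(d)


def phi1_monte_carlo(s: float, t: float, rho: float,
                     samples: int = 200_000, seed: int = 11) -> complex:
    """Sample mean of exp(i(s Z^2 + t Z'^2)) with Z' = rho Z + sqrt(1 - rho^2) E."""
    rng = random.Random(seed)
    c = math.sqrt(1 - rho**2)
    acc = 0j
    for _ in range(samples):
        z = rng.gauss(0.0, 1.0)
        z2 = rho * z + c * rng.gauss(0.0, 1.0)
        acc += cmath.exp(1j * (s * z * z + t * z2 * z2))
    return acc / samples


s, t, rho = 0.3, 0.2, 0.6
print(phi1(s, t, rho), phi1_monte_carlo(s, t, rho))
# For N independent pairs, phi_N = phi1 ** N; with N = 8 this is d ** -4, a rational function.
```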