Independent Randomness Tests based on the Orthogonalized Non-overlapping Template Matching Test

08/20/2019 ∙ by Atsushi Iwasaki, et al. ∙ Kyoto University

In general, randomness tests included in a test suite are not independent of each other. This makes it difficult to fix a rational criterion through the whole test suite with an explicit significance level. In this paper, we focus on the Non-overlapping Template Matching Test, which is a randomness test included in the NIST statistical test suite. The test uses a parameter called "template" and we can consider a test item for each template. We investigate the dependency between two test items by deriving the joint probability density function of the two p-values and propose a transformation that makes multiple test items independent of each other.

1 Introduction

Random number sequences are widely used in many fields such as information security and various stochastic algorithms. In general, the properties of the random number sequences used in these applications affect their efficiency and effectiveness. In information security contexts including cryptography, the quality of randomness is particularly important because security strength depends on this quality. Thus, it is necessary and important to evaluate the randomness of sequences or their generators from different perspectives.

Randomness tests are one such means of evaluation. They are hypothesis tests whose null hypothesis is that the given sequence is truly random. Randomness tests do not require information about the generator of the given sequence. Thus, they can be widely used without regard to the generator. Recently, Tamura and Shikano used randomness tests to inspect the properties of a quantum computer developed by IBM and showed that this computer does not work as expected [1].

There are many types of randomness tests and particular test suites have been proposed [2, 3, 4]. SP800-22 [4] published by the US National Institute of Standards and Technology (NIST) is one of the most well-known test suites. The first version was published in 2001 and revision 1a published in 2010 is currently valid. Revision 1a consists of 188 test items in the default setup that can be categorized into 15 test types. Some of the tests included therein are parametric, and so multiple items can be implemented for each test.

For each test item included in SP800-22, a criterion is specified for whether the given sequences pass. However, no criterion that simultaneously covers all test items is specified. This is a critical problem common to many test suites that needs to be addressed for the effective use of randomness tests. In general, test items included in one test suite are not independent of each other, i.e., the p-values computed by multiple test items are not independently distributed and their joint distribution is not known. This makes it difficult to specify a rational criterion through all test items. As an exception, Sugita proposed a test suite consisting of items that are independent of each other [5], where p-values are computed based on “parity”.

Previous studies have reported empirical results concerning the dependency among particular randomness tests [6, 7, 8, 9, 10, 11, 12, 13]. These studies were conducted under the null or under a certain alternative hypothesis. Considering dependency under an alternative hypothesis can be important from the perspective of computational burden. Notwithstanding the importance of extant work in this domain, the dependency among test items included in a given test suite has not received sufficient attention in the literature. In this paper, we aim to address this by focusing on test suite dependency under the null, culminating in the specification of a criterion through all test items.

To specify a criterion through all test items in SP800-22, it is critically important to study the Non-overlapping Template Matching Test, which is one type of randomness test included in SP800-22. This test counts the number of occurrences of a short string called “template” in a given sequence and computes a p-value based on that number. Various short strings can be used as templates, and so we can consider multiple test items in the context of the Non-overlapping Template Matching Test. Such test items account for 148 of the 188 items in SP800-22. Thus, understanding the dependency among these 148 test items is fundamental for understanding the dependency among test items in SP800-22. We investigate the dependency between two Non-overlapping Template Matching Test items by deriving the joint probability density function of the two p-values. Next, we propose a transformation to render multiple test items independent of each other.

The remainder of the paper is organized as follows. In section 2, we introduce the Non-overlapping Template Matching Test and discuss its soundness. In section 3, we derive the joint distribution of two Non-overlapping Template Matching Test p-values. In section 4, we propose a method to render the test items independent. Finally, in section 5 we offer conclusions.

2 Non-overlapping Template Matching Test

In this section, we introduce the Non-overlapping Template Matching Test and confirm its soundness. Here, “soundness” means that a p-value follows the uniform distribution on $[0, 1]$ under the null hypothesis, at least approximately.

2.1 Algorithm

The Non-overlapping Template Matching Test algorithm is as follows:

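  1. Divide the given $n$-bit sequence into $N$ blocks $X_1, \dots, X_N$. Here, each block is $M$-bit and $M = \lfloor n/N \rfloor$.

  2. For $j = 1, \dots, N$, compute $W_j$ as the occurrences of a given $m$-bit short string $B$ called “template” on $X_j$, i.e.,

    $W_j = \#\{\, i \mid 1 \le i \le M-m+1,\ X_j[i, i+m-1] = B \,\}$    (1)

    Here, $X[i, i+m-1]$ means the $m$-bit substring from the $i$-th bit to the $(i+m-1)$-th bit of a string $X$.

  3. Compute $\chi^2$ as follows:

    $\chi^2 = \sum_{j=1}^{N} \frac{(W_j - \mu)^2}{\sigma^2}$    (2)

    where

    $\mu = \frac{M-m+1}{2^m}$    (3)
    $\sigma^2 = M \left( \frac{1}{2^m} - \frac{2m-1}{2^{2m}} \right)$    (4)
  4. Perform a $\chi^2$-test with $N$ degrees of freedom for $\chi^2$ and compute the p-value $p$.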
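The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the NIST reference program: the function names and the symbols `M`, `m`, `mu` and `sigma2` are assumptions following the usual SP800-22 description of the test, and the p-value uses the closed form of the chi-square upper tail that holds only when the number of blocks is even (for example the default of 8).

```python
import math

def count_template(block: str, template: str) -> int:
    """Count occurrences of `template` in `block` with a non-overlapping scan."""
    m = len(template)
    count, i = 0, 0
    while i <= len(block) - m:
        if block[i:i + m] == template:
            count += 1
            i += m  # jump past the match; for aperiodic templates this loses nothing
        else:
            i += 1
    return count

def chi2_upper_tail_even_df(chi2: float, df: int) -> float:
    """Upper tail of the chi-square distribution, valid for even df only.

    For df = 2k it equals exp(-x) * sum_{i<k} x^i / i! with x = chi2 / 2.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch assumes a positive, even number of blocks")
    x = chi2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= x / i
        total += term
    return math.exp(-x) * total

def template_matching_p_value(bits: str, template: str, num_blocks: int = 8) -> float:
    """p-value of one test item for a '0'/'1' string `bits` (assumed notation)."""
    m = len(template)
    M = len(bits) // num_blocks                     # block length
    mu = (M - m + 1) / 2 ** m                       # assumed mean of the count
    sigma2 = M * (1 / 2 ** m - (2 * m - 1) / 2 ** (2 * m))  # assumed variance
    chi2 = 0.0
    for j in range(num_blocks):
        w = count_template(bits[j * M:(j + 1) * M], template)
        chi2 += (w - mu) ** 2 / sigma2
    return chi2_upper_tail_even_df(chi2, num_blocks)
```

Calling `template_matching_p_value(bits, "000000001")` on a bit string then gives one test item; repeating the call with a different template gives another item computed from the same blocks, which is the source of the dependency studied below.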

We can use any short string $B$ as a template provided $B$ satisfies

$B[1, k] \neq B[m-k+1, m]$    (5)


for $k = 1, \dots, m-1$. Where $m = 9$, there are 148 templates satisfying the condition. The sample program provided by NIST uses these 148 9-bit templates in the default setup. The program performs 148 test items as Non-overlapping Template Matching Tests, i.e., each test item runs the above algorithm with one of the 148 templates.
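The count of 148 can be checked by brute force. The sketch below assumes that condition (5) is the usual "aperiodic" requirement, namely that no proper prefix of the template equals the suffix of the same length, so that two occurrences can never overlap; the function names are hypothetical.

```python
from itertools import product

def is_aperiodic(template: str) -> bool:
    """True if no proper prefix of `template` equals its suffix of the same length."""
    m = len(template)
    return all(template[:k] != template[m - k:] for k in range(1, m))

def aperiodic_templates(m: int) -> list[str]:
    """All m-bit strings over {0, 1} that satisfy the aperiodicity condition."""
    candidates = ("".join(bits) for bits in product("01", repeat=m))
    return [t for t in candidates if is_aperiodic(t)]

if __name__ == "__main__":
    print(len(aperiodic_templates(9)))  # 148
```

Under this assumption the three templates named later in the paper (100000000, 111111110 and 001010101) all belong to the list.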

2.2 Soundness

To satisfy the property of soundness, $\chi^2$ is required to at least approximately follow a $\chi^2$-distribution with $N$ degrees of freedom under the null hypothesis. The requirement is satisfied if each $W_j$ independently follows a normal distribution $\mathcal{N}(\mu, \sigma^2)$, at least approximately. Since the blocks do not overlap with each other, independence is assured. Thus, it remains to show that $W_j$ follows $\mathcal{N}(\mu, \sigma^2)$. This also serves as preparation for the next section.

First, we confirm that the average and the variance of

are and , respectively. We regard as a

-bit random variable uniformly distributed on

and let be a random variable described as

(6)

Since is uniformly distributed, it is straightforward that

(7)
(8)

Using , we can represent as

(9)

Then, by (7) and (8),

(10)
(11)
(12)
(13)
(14)

Similarly, the standard deviation can be derived as follows:

(15)
(16)

By (5) and (6),

(17)

Substituting (17) into (16), we obtain

(18)
(19)
(20)

Then, $\sigma^2$ approximates the variance of $W_j$ for sufficiently large $M$.
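The mean and variance can be checked exactly for small blocks by enumerating every block. The sketch below is an illustration under assumed notation (`M` for the block length, `m` for the template length, and the SP800-22 expressions for the mean and variance); it is feasible only for small `M` because it visits all $2^M$ blocks.

```python
from fractions import Fraction
from itertools import product

def occurrences(block: str, template: str) -> int:
    """Number of positions at which `template` occurs in `block`."""
    m = len(template)
    return sum(block[i:i + m] == template for i in range(len(block) - m + 1))

def exact_moments(M: int, template: str) -> tuple[Fraction, Fraction]:
    """Exact mean and variance of the count over all 2^M equally likely blocks."""
    counts = [occurrences("".join(b), template) for b in product("01", repeat=M)]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean ** 2
    return mean, var

def reference_moments(M: int, m: int) -> tuple[Fraction, Fraction]:
    """Mean and variance used by the test statistic (assumed SP800-22 form)."""
    mu = Fraction(M - m + 1, 2 ** m)
    sigma2 = M * (Fraction(1, 2 ** m) - Fraction(2 * m - 1, 2 ** (2 * m)))
    return mu, sigma2

if __name__ == "__main__":
    for M in (8, 12):
        print(M, exact_moments(M, "01"), reference_moments(M, 2))
```

For the template `01` the mean agrees exactly, while the exact variance differs from the reference value by the same constant for both block lengths, so the relative error shrinks as the block grows.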

Next, we confirm that approximately follows a normal distribution. We use the following theorem [14]:

Theorem 2.1

Consider a random variable sequence satisfying the following conditions:

  • There exists a positive integer such that and are independent of each other for all .

  • For an arbitrary positive integer , the joint distribution of does not depend on .

  • .

Then, the distribution of converges to a normal distribution as .

As , the random variable sequence satisfies the three conditions in Theorem 2.1. Then, by (9) it is shown that approximately follows a normal distribution.

From the above, it is proven that follows for sufficiently large . Consequently, the soundness of the Non-overlapping Template Matching Test is established.

3 Joint distribution of p-values

Consider two -bit templates and . In the following, we use the notation to denote variable corresponding to . The purpose of this section is to obtain the joint distribution of under the null hypothesis.

3.1 Derivation of the joint distribution

First, we derive the joint distribution of . Theorem 2.1 can be extended to multi-dimensional cases and thus it is shown that follows a two-dimensional normal distribution as . Since we already know that both marginal distributions follow a standard normal distribution, the joint distribution is entirely specified if we derive the correlation coefficient. Using and , we obtain

(21)
(22)
(23)

We define the following notation:

(24)

Then, we obtain

(25)

Substituting (25) into (23), we have

(26)
(27)
(28)

From the above, the correlation coefficient of the joint distribution is computed as follows:

(29)
(30)
(31)

Hereafter, we ignore the remainder term of (31).
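As an illustration of how such a correlation coefficient can be evaluated, the sketch below computes a large-block limit directly from the ways two templates can overlap. It is an independent reading of the computation, not necessarily the form of (31): it assumes both templates have the same length, that the blocks are uniformly random, and that the variance per bit is the SP800-22 value. The function name is hypothetical.

```python
from fractions import Fraction

def asymptotic_correlation(a: str, b: str) -> Fraction:
    """Limit of the correlation between the counts of templates a and b.

    Sums, over every relative shift d with |d| < m, the covariance between
    "a occurs at position i" and "b occurs at position i + d".
    """
    m = len(a)
    if len(b) != m:
        raise ValueError("templates must have the same length")
    p = Fraction(1, 2 ** m)            # probability of a match at one position
    # Shift 0: both templates match at the same position only if they are equal.
    cov = (p if a == b else Fraction(0)) - p * p
    for d in range(1, m):
        joint = Fraction(1, 2 ** (m + d))
        # a first, b starting d bits later: the suffix of a must equal the prefix of b.
        ab = joint if a[d:] == b[:m - d] else Fraction(0)
        # b first, a starting d bits later.
        ba = joint if b[d:] == a[:m - d] else Fraction(0)
        cov += (ab - p * p) + (ba - p * p)
    var = p - (2 * m - 1) * p * p      # assumed variance per bit
    return cov / var

if __name__ == "__main__":
    print(asymptotic_correlation("001010101", "010101011"))
    print(asymptotic_correlation("001010101", "101010100"))
```

A useful sanity check is that an aperiodic template paired with itself gives exactly 1, and that swapping the two arguments does not change the result.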

Next, we consider the joint cumulative distribution of . Assume that random variable pairs , , , independently follow a two-dimensional normal distribution and that each marginal distribution is a standard normal distribution. We denote the correlation coefficient between and as , which does not depend on . We define and as follows:

(32)
(33)

It is straightforward that

(34)

Thus, we consider only the case that and . In the sample program provided by NIST, the number of blocks is fixed at 8, and so we adopt the assumption that is even. When

, the joint cumulative distribution function

is as follows:

(35)

where is the incomplete gamma function. Derivation of (35) is elaborated in the Appendix. Using , the joint cumulative distribution of is given as .

Finally, we obtain the joint distribution of using the joint cumulative distribution of . Assume that is the value of when the value of the corresponding p-value is . By the definition of the p-value, we have

(36)

3.2 Experiment

We generated sequences using the Mersenne twister [15]. The length of each sequence is -bit. Using the sequences, we computed two-dimensional joint distributions of p-values and compared the experimental distributions with the theoretical ones based on (36). For expository purposes, we used the template pairs (001010101, 010101011) and (001010101, 101010100). The correlation coefficients defined in the previous subsection can be computed as and .

Figure 1: Joint distribution of p-values with templates 001010101 and 010101011. (a) Experiment; (b) Theory.

Figure 2: Joint distribution of p-values with templates 001010101 and 101010100. (a) Experiment; (b) Theory.

The results are illustrated in Figures 1 and 2. The experimental distributions agree well with the theory discussed in this section. They also show that the test items cannot be regarded as independent of each other in practice.
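A small simulation in the same spirit can be run with Python's standard `random` module, which is based on the Mersenne twister. The block length, number of blocks and seed below are hypothetical choices made for a quick run, not the settings of the experiment above, and the sketch estimates only the correlation between the two occurrence counts rather than the full joint distribution of p-values.

```python
import math
import random

def empirical_correlation(a: str, b: str, block_bits: int = 4096,
                          num_blocks: int = 2000, seed: int = 1) -> float:
    """Sample correlation between the counts of templates a and b over random blocks."""
    rng = random.Random(seed)  # Mersenne twister generator from the standard library
    wa, wb = [], []
    for _ in range(num_blocks):
        block = format(rng.getrandbits(block_bits), f"0{block_bits}b")
        # str.count scans without overlap, which for aperiodic templates
        # equals the number of all occurrences.
        wa.append(block.count(a))
        wb.append(block.count(b))
    mean_a = sum(wa) / num_blocks
    mean_b = sum(wb) / num_blocks
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(wa, wb)) / num_blocks
    var_a = sum((x - mean_a) ** 2 for x in wa) / num_blocks
    var_b = sum((y - mean_b) ** 2 for y in wb) / num_blocks
    return cov / math.sqrt(var_a * var_b)

if __name__ == "__main__":
    print(empirical_correlation("001010101", "010101011"))
    print(empirical_correlation("001010101", "101010100"))
```

Both estimates come out clearly positive, which is consistent with the observation that the two test items in each pair are not independent.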

4 Orthogonalization

For each sequence, the sample program provided by NIST uses 148 templates and computes 148 Non-overlapping Template Matching Test p-values in the default setup. Thus, we would need to extend the discussion in the previous section and obtain the 148-dimensional joint distribution of the 148 p-values. However, handling a 148-dimensional distribution is computationally onerous, so this needs to be circumvented. One solution is to transform the 148 test items into new test items that are independent of each other.

4.1 Proposed method

We propose a method to transform the test items by orthogonalization of a multi-dimensional normal distribution. We consider -bit templates , , , and introduce notation as

Assume that is sufficiently large. The covariance matrix of denoted by is described as follows:

If , then the probability density function of is described as

when . Since

is a real symmetric matrix, then there exists an orthogonal matrix

such that

(37)

where is a diagonal matrix. If , then the diagonal elements of are all positive because is the covariance matrix of . Then, using a diagonal matrix described as

(38)

we get

(39)

where is a unit matrix. Using and , we transform to as

(40)

Then, we obtain the probability density function of as follows:

(41)
(42)
(43)

Then, components of follow the standard normal distribution independent of each other. For each , if we replace with , then the test items become independent of each other.
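For two templates the transformation can be written out in closed form, which makes the idea easy to check by hand. The sketch below covers only this two-dimensional special case, with a covariance matrix of unit diagonal and off-diagonal entry `rho`; the names `U`, `inv_sqrt` and `T` are assumptions for illustration, and the general case would need a numerical eigendecomposition of the full covariance matrix.

```python
import math

def whitening_matrix_2d(rho: float) -> list[list[float]]:
    """Matrix T such that T * Sigma * T^T is the identity, Sigma = [[1, rho], [rho, 1]].

    Sigma has eigenvalues 1 + rho and 1 - rho with eigenvectors (1, 1)/sqrt(2)
    and (1, -1)/sqrt(2). T rotates onto these axes and rescales each one.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("covariance matrix is singular: an eigenvalue is zero")
    s = 1.0 / math.sqrt(2.0)
    U = [[s, s], [s, -s]]                     # columns are the eigenvectors
    inv_sqrt = [1.0 / math.sqrt(1.0 + rho), 1.0 / math.sqrt(1.0 - rho)]
    return [[inv_sqrt[i] * U[j][i] for j in range(2)] for i in range(2)]

def apply(T: list[list[float]], x: list[float]) -> list[float]:
    """Transform a pair of standardized counts into decorrelated components."""
    return [sum(T[i][j] * x[j] for j in range(2)) for i in range(2)]

def covariance_after(T: list[list[float]], rho: float) -> list[list[float]]:
    """Covariance of the transformed pair, computed as T * Sigma * T^T."""
    sigma = [[1.0, rho], [rho, 1.0]]
    ts = [[sum(T[i][k] * sigma[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(ts[i][k] * T[j][k] for k in range(2)) for j in range(2)] for i in range(2)]

if __name__ == "__main__":
    T = whitening_matrix_2d(0.65)
    print(covariance_after(T, 0.65))  # close to the 2x2 identity matrix
```

The explicit error for `rho` equal to 1 or -1 mirrors the restriction in the text: the rescaling step divides by the square root of each eigenvalue, so it is undefined when an eigenvalue is zero.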

However, it is not ensured that . Indeed,

defined on 148 9-bit templates has zero eigenvalues. Consider templates

and . If , then it is ensured that an integer exists such that

(44)

as . In other words, where and occur they form a pair. Then, we have

(45)

Equation (45) is one reason why has zero eigenvalues. Thus, we need to remove at least or

from our discussion. Technically, we can identify templates to remove by checking eigenvectors corresponding to zero eigenvalues. When we consider the 148 9-bit templates, we need to remove either

or , and or . Finally, we need to remove or or or .

4.2 Experiment

We used the Mersenne twister [15] and AES-128 [16] and generated sequences for each generator. The length of the sequences is -bit. AES-128 was used in counter mode. We executed 145 test items for these sequences using 145 9-bit templates (all except 100000000, 111111110 and 001010101) and compared the results before and after the transformation.

Figure 3: Number of rejected sequences before and after the transformation

Figure 3 shows, for each number of rejecting test items, the frequency of sequences rejected by that many items. Here, “reject” means that a p-value is less than 0.01. The black line denotes the expected values in the case that all test items are independent of each other. This result implies that the transformed test items are independent of each other, and thus the proposed transformation is effective.
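The reference curve for independent test items can be reproduced from the binomial distribution: if 145 items each reject with probability 0.01 independently, the number of rejections per sequence is binomial with those parameters. In the sketch below the number of sequences is a hypothetical placeholder, since it only scales the curve, and the function name is an assumption.

```python
import math

def expected_rejection_histogram(num_items: int = 145, alpha: float = 0.01,
                                 num_sequences: int = 1000,
                                 max_rejections: int = 10) -> list[float]:
    """Expected number of sequences rejected by exactly k items, k = 0..max_rejections,
    assuming all items are independent and each rejects with probability alpha."""
    return [
        num_sequences
        * math.comb(num_items, k)
        * alpha ** k
        * (1.0 - alpha) ** (num_items - k)
        for k in range(max_rejections + 1)
    ]

if __name__ == "__main__":
    for k, expected in enumerate(expected_rejection_histogram()):
        print(k, round(expected, 2))
```

Comparing an observed histogram against these values, before and after the transformation, is the kind of check that Figure 3 summarizes.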

5 Conclusions

We derived the joint distribution of two Non-overlapping Template Matching Test p-values in the limit of infinite block size. The results show that the test items cannot be regarded as independent of each other.

We also proposed a method to remove dependency using orthogonalization of a multi-dimensional normal distribution. Experimental results testify to the efficacy of the method. Thus, it is expected that the method will contribute to fixing a rational criterion through all test items in SP800-22 and thus to the appropriate use of randomness tests.

References

  • [1] K. Tamura and Y. Shikano, “Quantum Random Numbers generated by the Cloud Superconducting Quantum Computer,” arXiv:1906.04410 (2019).
  • [2] G. Marsaglia, “DIEHARD: a battery of tests of randomness,” http://stat.fsu.edu/geo (1996).
  • [3] P. L’Ecuyer and R. Simard, “TestU01: A C library for empirical testing of random number generators,” ACM Trans. on Mathematical Software 33.4 (2007): 22.
  • [4] A. Rukhin, et al., “A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications,” National Institute of Standards and Technology Special Publication 800-22 revision 1a (2010).
  • [5] H. Sugita, “Orthogonal test series for pseudorandomness test,” RIMS Kôkyûroku 1127 (2000): 80-87 (in Japanese).
  • [6] P. Hellekalek and S. Wegenkittl, “Empirical evidence concerning AES,” ACM Trans. on Modeling and Computer Simulation 13.4 (2003): 322-333.
  • [7] M. S. Turan, A. Doğanaksoy and S. Boztaş, “On independence and sensitivity of statistical randomness tests,” International Conference on Sequences and Their Applications. Springer, Berlin, Heidelberg, 2008.
  • [8] A. Doğanaksoy, B. Ege and K. Mus, “Extended results for independence and sensitivity of NIST randomness tests,” Information Security and Cryptography Conference, ISC Turkey, 2008.
  • [9] L. Fan, H. Chen and S. Gao, “A General Method to Evaluate the Correlation of Randomness Tests,” in: Information Security Applications, WISA 2013, Lecture Notes in Computer Science, vol 8267. Springer, Cham (2014).
  • [10] A. Yamaguchi and A. Saito, “Analysis of NIST Randomness test for correlative sequences generated by chaotic true orbit,” Proceedings of JSIAM 2016, 2016 (in Japanese).
  • [11] F. Sulak, et al., “On the independence of statistical randomness tests included in the NIST test suite,” Turkish Journal of Electrical Engineering & Computer Sciences 25.5 (2017): 3673-3683.
  • [12] A. Doğanaksoy, et al., “Mutual correlation of NIST statistical randomness tests and comparison of their sensitivities on transformed sequences,” Turkish Journal of Electrical Engineering & Computer Sciences 25.2 (2017): 655-665.
  • [13] A. Iwasaki, “Analysis of NIST SP800-22 focusing on randomness of each sequence,” JSIAM Letters 10 (2018): 1-4.
  • [14] W. Hoeffding and H. Robbins, “The central limit theorem for dependent random variables,” Duke Mathematical Journal 15.3 (1948): 773-780.

  • [15] M. Matsumoto and T. Nishimura, “Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator,” ACM Trans. on Modeling and Computer Simulation 8.1 (1998): 3-30.
  • [16] V. Rijmen and J. Daemen, “Advanced encryption standard,” Proceedings of Federal Information Processing Standards Publications, National Institute of Standards and Technology (2001): 19-22.

Appendix Derivation of (35)

We present the derivation of (35). Assume that random variable pair follows a two-dimensional normal distribution and that each marginal distribution is a standard normal distribution. We denote the correlation coefficient between and as and assume that . Then, the distribution of is specified and the probability density function is described as

(46)

Assume that independently follow . For arbitrary positive integer , we define as

(47)

We represent the probability density function of as

, the characteristic function of

as and the joint cumulative distribution function of as . For arbitrary , the probability that or is zero, so in the following we consider the region where and .

By definition, satisfies

(48)

Then, the corresponding characteristic function is computed as follows:

(49)
(50)

Here, is the imaginary unit. We introduce the following change of variables:

(51)

Using and , we get

(52)
(53)
(54)
(55)
(56)
(57)

Since , we have

(58)
(59)

By (58) and (59),

(60)
(61)

Here, the root of a complex number is defined as

(62)

Substituting (60) and (61) into (57), we arrive at

(63)
(64)

Since are mutually independent, we obtain

(65)
(66)

Then, we have

(67)
(68)
(69)

We assume to be even as per section 3 of the main text. Since ,

(70)

Then, since , using the integration path shown in Figure 4, Jordan’s lemma can be applied as follows:

(71)
(72)
(73)
(74)
Figure 4: Integration path in the complex plane (axes: Re, Im)

Then, we obtain

(75)