An extension of the Plancherel measure

05/17/2018 ∙ by Miklós Arató, et al. ∙ National Science Foundation, Eötvös Loránd University, National Bank of Hungary, University of Delaware

Given a distribution on the unit square and an iid sample from it, the first thing a statistician might do is to test the hypothesis that the sample is indeed iid. For this purpose an extension of the Plancherel measure is introduced. Recent literature on the asymptotic behavior of the Plancherel measure is discussed, with an extension to the new setup. Models for random permutations are described and the power of different tests is compared.


1 Introduction

Let the sample consist of iid random variables, uniformly distributed on [0,1]. Step by step we define the process for all n, starting with the first observation. At each step the process records the number of levels, the Young numbers (the number of elements sitting on each level) and the new positions of the numbers. Note that the data do not change in the course of the algorithm; we only introduce a combinatorial structure reflecting their values and order. The process is a dynamically structured ordered sample.

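The random variables generated from the sequence (the number of levels and the Young numbers) are monotone non-decreasing in n, and on each level the positions are monotone.

For a demonstration, let us look at the first two steps of the process. For n = 1 there is a single level, holding the first number. For n = 2 there are two possible cases. If the second number is larger than the first, there is still a single level, holding both numbers in increasing order. Otherwise the second number takes the place of the first one on level one and pushes it up to a new, second level.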

The algorithm works as follows. For each n, the new number sits down on the first level, in the place assigned to it by its order of magnitude among the numbers already there. If the first level holds any numbers greater than the newcomer, the newcomer pushes up the first (smallest) of them. On the next level the algorithm is repeated: the pushed number sits down in the place determined by its magnitude and pushes up the first number on that level that is larger than it.
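As far as we can tell, this description coincides with Schensted's row insertion, the first half of the Robinson-Schensted correspondence. The following Python sketch assumes distinct values, since ties have probability zero for continuous data. The names `insert` and `level_process` are our own, hypothetical, and not taken from the paper. Each level is stored as an increasing list.

```python
from bisect import bisect_right

def insert(levels, x):
    """Insert x into `levels` (a list of increasing lists), in place.

    Returns the 0-based index of the level on which a new place appears.
    """
    level = 0
    while True:
        if level == len(levels):
            levels.append([x])          # x opens a new, higher level
            return level
        row = levels[level]
        pos = bisect_right(row, x)      # index of the first element larger than x
        if pos == len(row):
            row.append(x)               # nothing larger here: x sits at the end
            return level
        row[pos], x = x, row[pos]       # x sits down, the larger one is pushed up
        level += 1

def level_process(sample):
    """Levels after feeding the whole sample, lowest level first."""
    levels = []
    for x in sample:
        insert(levels, x)
    return levels
```

We read the rank sequence 5, 2, 11, 9, 8, 1, 3, 10, 4, 7, 6 off the evolution table of the example below. Feeding it to `level_process` gives `[[1, 3, 4, 6], [2, 7, 10], [5, 8], [9], [11]]`, which is the standard Young tableau of that example.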

For each j, we consider the level of the jth number of the ordered sample after the arrival of the nth number, together with the whole sequence of these levels.

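For example, if n = 11 and the rank numbers are 5, 2, 11, 9, 8, 1, 3, 10, 4, 7, 6, then the levels of the numbers of the ordered sample are 1, 2, 1, 1, 3, 1, 2, 3, 4, 2, 5. Let us first collect the indices where the level equals one, then those where it equals two, three, four and five, each collection on its own line:

1 3 4 6
2 7 10
5 8
9
11

The result is called a standard Young tableau: it is increasing along both rows and columns. These numbers are determined by the rank numbers of the sample. We call the sequence generated by the rank numbers the level process of a permutation. The next table shows the evolution of the sequences, one block per arrival, with the highest level on top. Here, for the sake of clarity, each column corresponds to a rank. We write the inverse rank numbers (the arrival times) in place of the sliding elements, and we replace the sleeping ones with zeros.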

1 2 3 4 5 6 7 8 9 10 11
- - - - - - - - - - -
0 0 0 0 1 0 0 0 0 0 0
- - - - - - - - - - -
0 0 0 0 2 0 0 0 0 0 0
0 2 0 0 1 0 0 0 0 0 0
- - - - - - - - - - -
0 0 0 0 2 0 0 0 0 0 0
0 2 0 0 1 0 0 0 0 0 3
- - - - - - - - - - -
0 0 0 0 2 0 0 0 0 0 4
0 2 0 0 1 0 0 0 4 0 3
- - - - - - - - - - -
0 0 0 0 0 0 0 0 0 0 5
0 0 0 0 2 0 0 0 5 0 4
0 2 0 0 1 0 0 5 4 0 3
- - - - - - - - - - -
0 0 0 0 0 0 0 0 0 0 6
0 0 0 0 6 0 0 0 0 0 5
0 6 0 0 2 0 0 0 5 0 4
6 2 0 0 1 0 0 5 4 0 3
- - - - - - - - - - -
0 0 0 0 0 0 0 0 0 0 6
0 0 0 0 6 0 0 0 7 0 5
0 6 0 0 2 0 0 7 5 0 4
6 2 7 0 1 0 0 5 4 0 3
- - - - - - - - - - -
0 0 0 0 0 0 0 0 0 0 6
0 0 0 0 6 0 0 0 7 0 5
0 6 0 0 2 0 0 7 5 0 4
6 2 7 0 1 0 0 5 4 8 3
- - - - - - - - - - -
0 0 0 0 0 0 0 0 0 0 6
0 0 0 0 6 0 0 0 7 0 5
0 6 0 0 2 0 0 7 5 9 4
6 2 7 9 1 0 0 5 4 8 3
- - - - - - - - - - -
0 0 0 0 0 0 0 0 0 0 6
0 0 0 0 6 0 0 0 7 0 5
0 6 0 0 2 0 0 7 5 9 4
6 2 7 9 1 0 10 5 4 8 3
- - - - - - - - - - -
0 0 0 0 0 0 0 0 0 0 11
0 0 0 0 0 0 0 0 11 0 6
0 0 0 0 6 0 0 11 7 0 5
0 6 0 0 2 0 11 7 5 9 4
6 2 7 9 1 11 10 5 4 8 3
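The evolution table above can be regenerated mechanically. This sketch uses our own bookkeeping, not the paper's. Column v records, on level k, the time at which the value v reached level k, and 0 marks a level that the value has not reached. Within each block, rows are listed from the highest level down, as in the table.

```python
from bisect import bisect_right

def evolution(ranks):
    """Yield one block of the table per arrival, highest level first.

    Column v (1-based) holds, on level k, the time at which value v reached
    level k, and 0 if it has not reached that level yet.
    """
    n = len(ranks)
    levels = []                                  # current contents of each level
    history = {v: [] for v in range(1, n + 1)}   # arrival times per value and level
    for t, x in enumerate(ranks, start=1):
        level = 0
        while True:
            history[x].append(t)                 # x reaches this level at time t
            if level == len(levels):
                levels.append([x])
                break
            row = levels[level]
            pos = bisect_right(row, x)
            if pos == len(row):
                row.append(x)
                break
            row[pos], x = x, row[pos]            # the larger value is pushed up
            level += 1
        yield [[history[v][k] if len(history[v]) > k else 0
                for v in range(1, n + 1)]
               for k in reversed(range(len(levels)))]
```

For example, `for block in evolution([5, 2, 11, 9, 8, 1, 3, 10, 4, 7, 6]): print(block)` prints the eleven blocks. The bottom row of the last block is `[6, 2, 7, 9, 1, 11, 10, 5, 4, 8, 3]`.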

The first element of the permutation is 5, so we put the number 1 in the fifth column. All other entries remain zero.

The second element is 2 (unfortunately), but this coincidence causes no complication: we put a 2 in the second column and in the fifth column, too.

The third is 11, hence the number 3 appears in the eleventh column.

The fourth is 9, and the number 4 shows up in the ninth and the eleventh columns on the appropriate levels.

The fifth element is 8: the number 5 goes to the eighth column (and not to the fifth).

And so on. The number of rank sequences leading to this tableau is determined by the so-called hook numbers of the tableau:

8 5 3 1
6 3 1
4 1
2
1
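The hook numbers listed above can be computed from the Young numbers alone. The hook of a cell counts the cell itself, the cells to its right in its row and the cells below it in its column. Here is a minimal sketch of the hook length formula; the function names are ours.

```python
from math import factorial, prod

def hook_lengths(shape):
    """Hook numbers of a Young diagram given by its (non-increasing) Young numbers."""
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]  # column heights
    return [[(r - j - 1) + (cols[j] - i - 1) + 1 for j in range(r)]    # arm + leg + 1
            for i, r in enumerate(shape)]

def count_tableaux(shape):
    """n! divided by the product of the hook numbers."""
    n = sum(shape)
    return factorial(n) // prod(prod(row) for row in hook_lengths(shape))
```

For the example shape `[4, 3, 2, 1, 1]`, the product of the hook numbers is 17280, and 11!/17280 = 2310. The corresponding Plancherel probability 2310²/11! is about 0.134.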

We know that 11! is divisible by the product of these numbers, and the quotient is the number of rank sequences resulting in our tableau. Surprisingly, the number of tableaux resulting in the same sequence of Young numbers is this quotient, too. The explanation is that the inverse of a permutation produces a tableau of the same shape as the permutation itself, and the two tableaux determine the permutation uniquely. This fact leads to the Plancherel distribution. The question is the joint distribution of the Young numbers and the limiting behavior of the process as n goes to infinity. The average divided by

and the standard deviation of

under the Plancherel distribution, for some values of n, are as follows:

n ave st.dev.
11 0.95 0.9480
1.46 1.8468
1.72 3.7652
1.82 6.7102
1.85 12.4306
1.86 19.6362

The standard deviation is close to , cf. [2], [15], [16] and [20]. We use instead of the statistic given by the sum of the logarithms of the hook numbers. For the process is shown in Figure 1; the curves sample it only for some fixed . Instead of the sample

comes from the uniform distribution on

The figure shows the numbers according to their level: the first coordinate is and the second is .

Figure 2 shows a density estimate (red curve) of

for ; it appears to be Gaussian (blue curve).

Figure 3 shows the numbers for the case attaining the minimum of the product of the hook numbers. The minimum of is while its expected value is . The difference is relatively large compared with the standard deviation, which is .

The curves follow such well-pronounced lines only when the corresponding Plancherel probability is close to its expected value, which is the behavior expected when the sample elements are independent. We think that the process offers an appropriate statistic for testing the hypothesis that the sample is iid.

The next table gives some parameters for samples from an Ornstein-Uhlenbeck process.

ave st.dev
0.5 63008 3.00
0.95 63017 9.23
0.995 63129 101.35

Switching the distribution of the sample to standard exponential changes the behavior of the process a bit, but the quantile transformation provides the appropriate link between the two cases, so we use the same notation. The distribution of

is exponential with parameter ; hence we investigate the rescaled process.

One can hope that the distribution of the process stabilizes as n goes to infinity; we call the limit the extended Plancherel distribution.
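The quantile transformation of the standard exponential distribution, F⁻¹(u) = -log(1 - u), is strictly increasing. It therefore leaves all rank numbers unchanged, and with them the whole level process; only the values attached to the positions change. The following quick numerical check illustrates this; the seed and the sample size are arbitrary choices of ours.

```python
import math
import random

def ranks(xs):
    """Rank numbers 1..n of a sequence of distinct numbers."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

rng = random.Random(2018)
u = [rng.random() for _ in range(20)]     # uniform sample on [0, 1)
e = [-math.log1p(-x) for x in u]          # standard exponential via -log(1 - u)
```

Both `ranks(u)` and `ranks(e)` return the same permutation of 1..20.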

2 The core process

The statistic defined as n! divided by the product of the hook numbers counts all permutations belonging to the same level process. The minimum of this number is one. Exactly two permutations are uniquely determined by their level process: the increasing one, whose tableau is a single row, and the decreasing one, whose tableau is a single column. The maximum of the statistic is close to . For a fixed n and every integer, let us sum up the number of permutations whose statistic lies between and . Then for most the result is positive, and in such cases it has to be larger than . Surprisingly, the distribution is concentrated in a narrow interval containing the permutations that we call typical (see [17]).
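For small n, both claims at the start of this section can be checked by brute force. The sketch below enumerates all 720 permutations of six elements and builds their tableaux with the row insertion described in the Introduction. It then checks two things. First, every tableau is reached by exactly n!/(product of hooks) permutations. Second, only the increasing and the decreasing permutations are alone in their class. The helper names are ours.

```python
from bisect import bisect_right
from collections import Counter
from itertools import permutations
from math import factorial, prod

def tableau(perm):
    """Rows of the tableau produced by row insertion of perm."""
    rows = []
    for x in perm:
        for row in rows:
            pos = bisect_right(row, x)
            if pos == len(row):
                row.append(x)
                break
            row[pos], x = x, row[pos]   # x settles, the larger entry moves up
        else:
            rows.append([x])            # x was pushed out of every level
    return tuple(tuple(r) for r in rows)

def count_tableaux(shape):
    """n! divided by the product of the hook numbers."""
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hooks = prod(r - j + cols[j] - i - 1 for i, r in enumerate(shape) for j in range(r))
    return factorial(sum(shape)) // hooks

def shape_of(t):
    return tuple(len(r) for r in t)

n = 6
perms = list(permutations(range(1, n + 1)))
classes = Counter(tableau(p) for p in perms)    # permutations per tableau
singletons = [p for p in perms if classes[tableau(p)] == 1]
```

Here `singletons` contains only `(1, 2, 3, 4, 5, 6)` and `(6, 5, 4, 3, 2, 1)`.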

3 Conditional independence

For fixed n and k, the first k elements of a permutation of the set {1, ..., n} cut the set into two parts: the initializing set and its complement. We say that a random permutation can be cut properly at k if the initial segment and the rest of the permutation are conditionally independent, given that the initializing set is a given subset with k elements. We say that a random permutation is proper (decomposable) if it can be cut properly for all k.

If n = 11, then there are 56 Young diagrams, and 27 of them have larger Plancherel probability than . Their total probability is . We therefore propose the test that accepts the uniform distribution of a random permutation if the diagram of the corresponding Young tableau belongs to this set of 27 elements. The set is given in the next table, one diagram per row, listed by its Young numbers.

7 2 1 1
6 4 1
6 3 2
6 3 1 1
6 2 2 1
6 2 1 1 1
5 4 2
5 4 1 1
5 3 3
5 3 2 1
5 3 1 1 1
5 2 2 2
5 2 2 1 1
5 2 1 1 1 1
4 4 2 1
4 4 1 1 1
4 3 3 1
4 3 2 2
4 3 2 1 1
4 3 1 1 1 1
4 2 2 2 1
4 2 2 1 1 1
4 2 1 1 1 1 1
3 3 3 1 1
3 3 2 2 1
3 3 2 1 1 1
3 2 2 2 1 1
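The exact Plancherel probabilities behind such a table can be computed with the hook length formula. The Plancherel probability of a diagram with n cells is the square of its number of standard tableaux divided by n!. The cut-off used in the text is not recoverable from this copy, so `threshold` is a free, hypothetical parameter of the helper `typical_set`.

```python
from fractions import Fraction
from math import factorial, prod

def partitions(n, max_part=None):
    """All partitions of n (Young numbers in non-increasing order)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

def count_tableaux(shape):
    """n! divided by the product of the hook numbers."""
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    hooks = prod(r - j + cols[j] - i - 1 for i, r in enumerate(shape) for j in range(r))
    return factorial(sum(shape)) // hooks

def plancherel(n):
    """Exact Plancherel probabilities f(shape)^2 / n! of all shapes with n cells."""
    return {lam: Fraction(count_tableaux(lam) ** 2, factorial(n)) for lam in partitions(n)}

def typical_set(n, threshold):
    """Shapes whose probability exceeds `threshold` (hypothetical cut-off), most probable first."""
    probs = plancherel(n)
    return sorted((lam for lam, p in probs.items() if p > threshold),
                  key=probs.get, reverse=True)
```

For a chosen cut-off, the total probability of the selected set is `sum(plancherel(11)[lam] for lam in typical_set(11, threshold))`.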

The most probable tableau is ; its Plancherel probability is if . Take the uniform distribution on the set of permutations whose Young tableau has this shape, and form its divergence projection onto the set of distributions that can be cut properly. The resulting distribution has divergence from the uniform one.

Proper random permutations have a simple parametrization: for every subset of the underlying set there is a probability distribution. When we define the elements of the random permutation step by step, at each step we use the distribution attached to the subset of the remaining elements.
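The step-by-step construction of a proper random permutation can be sketched as follows. The interface is hypothetical: `dist(remaining)` returns a probability distribution, as a dict, on the set of elements not yet used. The example `size_biased`, with weights proportional to the elements, is only an illustration of ours.

```python
import random
from fractions import Fraction
from itertools import permutations

def perm_probability(perm, dist):
    """Probability of perm when each next element is drawn from dist(remaining)."""
    remaining = frozenset(perm)
    p = Fraction(1)
    for x in perm:
        p *= dist(remaining)[x]
        remaining = remaining - {x}
    return p

def sample(elements, dist, rng):
    """Draw a proper random permutation of `elements` step by step."""
    remaining = frozenset(elements)
    out = []
    while remaining:
        d = dist(remaining)
        items = sorted(d)
        x = rng.choices(items, weights=[float(d[i]) for i in items])[0]
        out.append(x)
        remaining = remaining - {x}
    return out

def uniform(remaining):
    return {x: Fraction(1, len(remaining)) for x in remaining}

def size_biased(remaining):
    # hypothetical example: weights proportional to the elements themselves
    total = sum(remaining)
    return {x: Fraction(x, total) for x in remaining}

perms = list(permutations(range(1, 5)))
```

With `uniform`, every permutation of four elements gets probability 1/24. With any choice of `dist`, the probabilities of all permutations add up to one.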

We call the random permutation double proper if both the random permutation and its inverse are proper. If all permutations have positive probability, the logarithms of the probabilities lie in a subset of dimension ([6]). The space itself has a rather sophisticated structure, but there is a subspace with simple characteristics:

where is a real-valued matrix. We call this the checkerboard model. If we have only one permutation, even inference for such a simple model is out of the question without further structural assumptions on the matrix. The graph of the two-dimensional points formed by the positions and the values of a permutation is a simple pictorial statistic of the permutation. If we can discern any structure in the graph, the permutation is certainly not random, and there is hope of fitting a matrix with a relatively small number of degrees of freedom.

Suppose we have a two-dimensional random variable. Then we can construct the same processes from the iid sequence of its second coordinates as we have done from the first coordinates, and the pair of processes may reflect the joint distribution of the two coordinates. There is another possibility. Consider the permutation that moves the ordered sample of the first coordinates to the ordered sample of the second ones; the level process of this permutation directly reflects the joint distribution. If the two coordinates are independent, then the distribution of this permutation is uniform and the distribution of the level process is Plancherel. In the general case the distribution resembles the checkerboard one. If the pair has a two-dimensional normal distribution, then the distribution of the level process depends only on the correlation of the two coordinates. We call this process the Gaussian level process (see [5] and [19]).
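Here is a sketch of the second construction, under our reading of the convention: sort the pairs by their first coordinate and record the ranks of the second coordinates. Only ranks enter, so any strictly increasing transformation of either coordinate leaves the permutation unchanged. This is why, for a two-dimensional normal pair, only the correlation matters. The seed, the sample size and the correlation below are arbitrary.

```python
import math
import random

def rank_permutation(xs, ys):
    """Entry i is the rank of the y paired with the i-th smallest x."""
    by_x = sorted(range(len(xs)), key=lambda i: xs[i])
    by_y = sorted(range(len(ys)), key=lambda i: ys[i])
    y_rank = {i: r for r, i in enumerate(by_y, start=1)}
    return [y_rank[i] for i in by_x]

def gaussian_pairs(n, rho, rng):
    """n pairs from a standard two-dimensional normal law with correlation rho."""
    s = math.sqrt(1 - rho * rho)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        pairs.append((z1, rho * z1 + s * z2))
    return pairs

rng = random.Random(7)
xs, ys = zip(*gaussian_pairs(50, 0.7, rng))
perm = rank_permutation(xs, ys)
# strictly increasing marginal transformations leave the permutation unchanged
same = rank_permutation([2 * x for x in xs], [math.exp(y) for y in ys])
```

The resulting `perm` can be fed to the row insertion of the Introduction to obtain the Gaussian level process.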

4 Complexity

There are many statistics measuring different aspects of permutations. Any permutation can be turned into a sequence of numbers by drawing an iid sample and reordering its ordered sample according to the given permutation. With this method we can resample a single permutation: we multiply it by iid uniformly distributed random permutations and look for any deviation between the original permutation and its random descendants. One possible method is compression: if the given permutation has a short description, it is not highly complex ([1], [4], [10] and [12]).
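Both ideas can be sketched briefly. The first is turning a permutation into a sequence with those rank numbers; the second is a compression-based comparison with random descendants. The complexity proxy is our own crude choice and not a measure from the cited works: the zlib-compressed length of the successive differences mod n, which assumes n ≤ 256. The function names are hypothetical.

```python
import random
import zlib

def realize(perm, rng):
    """A sorted iid uniform sample, reordered so that its rank numbers are perm (values 1..n)."""
    ordered = sorted(rng.random() for _ in perm)
    return [ordered[r - 1] for r in perm]

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def compose(p, q):
    """(p o q)(i) = p(q(i)) for permutations given as lists of values 1..n."""
    return [p[j - 1] for j in q]

def compressed_size(perm):
    """Bytes needed by zlib for the successive differences mod n (assumes n <= 256)."""
    n = len(perm)
    diffs = bytes((b - a) % n for a, b in zip(perm, perm[1:]))
    return len(zlib.compress(diffs, 9))

def resampling_pvalue(perm, rng, draws=99):
    """Share of random descendants that compress at least as well as perm."""
    n = len(perm)
    base = compressed_size(perm)
    sizes = [compressed_size(compose(perm, rng.sample(range(1, n + 1), n)))
             for _ in range(draws)]
    return sum(s <= base for s in sizes) / draws
```

For the identity permutation of 200 elements, none of the random descendants compresses as well, so the sketch reports a p-value of zero.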

Another possible approach is an extension of Lovász's graphons, see [11], [13] and [7]. The idea is to assign to every subset with k elements of a permutation of the first n integers the rank numbers corresponding to the numbers in that subset.
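Pattern counting is the basic ingredient of this approach. Every k-element subset of positions is reduced to the rank numbers of its values, its pattern. The following brute-force sketch is our own and suits small n only, since it visits all C(n, k) subsets.

```python
from collections import Counter
from itertools import combinations
from math import comb

def pattern(values):
    """Reduce distinct numbers to their rank numbers 1..k."""
    order = sorted(values)
    return tuple(order.index(v) + 1 for v in values)

def pattern_counts(perm, k):
    """How often each pattern of length k occurs in perm."""
    return Counter(pattern([perm[i] for i in idx])
                   for idx in combinations(range(len(perm)), k))

def pattern_densities(perm, k):
    """Pattern counts divided by the number of k-element subsets."""
    total = comb(len(perm), k)
    return {pat: c / total for pat, c in pattern_counts(perm, k).items()}
```

For the identity permutation every subset has the increasing pattern, so its density is 1.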

5 Dynamics

One facet of a permutation is that it represents a dynamics: π(i) = j means that the number i moves to the number j, which makes sense even when i = j. Using the neighboring pairs cyclically, we can define a new permutation as

The attractors of this dynamics may reflect the complexity of the permutation.
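The permutation derived from neighboring pairs is not recoverable from this copy of the text, so the sketch below shows only the dynamics of a permutation itself. A permutation is a bijection, so every orbit is periodic, and its cycles play the role of the attractors. The function name is ours.

```python
def cycles(perm):
    """Cycle decomposition of the dynamics i -> perm[i - 1] on {1, ..., n}."""
    n = len(perm)
    seen = [False] * (n + 1)
    result = []
    for start in range(1, n + 1):
        if not seen[start]:
            cycle = []
            i = start
            while not seen[i]:
                seen[i] = True
                cycle.append(i)
                i = perm[i - 1]    # follow the dynamics one step
            result.append(cycle)
    return result
```

For the rank sequence 5, 2, 11, 9, 8, 1, 3, 10, 4, 7, 6 of the Introduction, this gives one cycle of length eight, a fixed point 2 and the two-cycle (4 9).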

6 Testing IID property

In nonparametric statistics the sample elements are replaced by their rank numbers ([21], [3] and [8]), and the typical resampling method is to apply random permutations to some parts of the sample. We try to bridge this traditional field of statistics and the Plancherel measure.

7 Variations

Victor Reiner, Franco Saliola and Volkmar Welker [18] initiated the use of fixed-size subsets of the elements of a permutation without replacing the elements with their rank numbers. They conjecture that the squares of all singular values of the matrix connecting these parts with the original permutations are integers.

8 Power studies

For arbitrary real let us define the distribution by

where is an appropriate scaling factor (of course ). The first question is how our test relates to other tests with respect to this exponential family. It would be interesting to compare the power of the hook-number-product test against this exponential family with its power against the Gaussian level alternative (see [14]).
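Here is a Monte Carlo sketch of such a power study. It rests on several assumptions of ours:

- The statistic is the sum of the logarithms of the hook numbers of the tableau.
- Its null distribution is simulated from uniform random permutations.
- The test rejects when the statistic is atypically large, a one-sided simplification of the typical-set test of Section 3.
- The alternative is the Gaussian level process with correlation `rho`.

Sample sizes and seeds are arbitrary.

```python
import math
import random
from bisect import bisect_right

def shape(perm):
    """Young numbers of the tableau built by row insertion."""
    rows = []
    for x in perm:
        for row in rows:
            pos = bisect_right(row, x)
            if pos == len(row):
                row.append(x)
                break
            row[pos], x = x, row[pos]
        else:
            rows.append([x])
    return [len(r) for r in rows]

def log_hook_product(shape):
    """Sum of the logarithms of the hook numbers."""
    cols = [sum(1 for r in shape if r > j) for j in range(shape[0])]
    return sum(math.log(r - j + cols[j] - i - 1)
               for i, r in enumerate(shape) for j in range(r))

def statistic(perm):
    return log_hook_product(shape(perm))

def gaussian_sequence(n, rho, rng):
    """Second coordinates ordered by the first ones; only their relative order matters."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0, 1) for x in xs]
    order = sorted(range(n), key=lambda i: xs[i])
    return [ys[i] for i in order]

def power(n, rho, rng, reps=200, null_reps=500, alpha=0.05):
    """Share of Gaussian samples whose statistic exceeds the simulated null quantile."""
    null = sorted(statistic(rng.sample(range(n), n)) for _ in range(null_reps))
    crit = null[int((1 - alpha) * null_reps)]
    hits = sum(statistic(gaussian_sequence(n, rho, rng)) > crit for _ in range(reps))
    return hits / reps
```

Strong correlation pushes the tableau towards a single long row, so the hook product approaches n! and the test rejects almost always. At `rho = 0` the rejection rate stays near `alpha`.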

9 Where does the information come from?

Once upon a time, fifty years ago, there was a conference in Debrecen. At this conference Shinzo Watanabe gave a lecture with the title we are using here again ([22]). The truth is that our world was originally full of information. It was the Demiurge who filtered out the superfluous parts of this information and imposed pre-existing Forms on the chaotic material. At the above-mentioned conference Imre Csiszár, Gyula Katona and Gábor Tusnády proved the conservation of entropy. Now we think that the controlled loss of information does the trick: when we reduce a permutation to its level process, we condense the information of the permutation in a proper way.

10 Laudation

Many thanks to Imre Csiszár for giving us the privilege of receiving and benefiting from his wise advice throughout our lives. In particular, we thank him for insightfully encouraging us to write a paper on the present theme, which we are pleased to dedicate to him on the occasion of his 80th birthday. We wish him many happy returns in an arbitrarily long life in good health and spirits.


References

  • [1] José M. Amigó: Permutation complexity in dynamical systems, Springer Series in Synergetics Springer 2010
  • [2] Alexander I. Bufetov: On the Vershik-Kerov conjecture concerning the Shannon-McMillan-Breiman theorem for Plancherel family of measures on the space of Young diagrams, Geom. Funct. Anal. 22 938-975
  • [3] Jin Seo Cho and Halbert White: Generalized runs tests for the IID hypothesis, Journal of Econometrics, 162, (2011), 326-344
  • [4] Volker Claus: Complexity measures on permutations, in: Informatik (ed.: Johannes Buchmann, Harald Ganzinger and Wolfgang J. Paul) Universität Saarbrücken, 81-94
  • [5] W. J. Conover: Some locally most powerful rank tests for correlation, Journal of Modern Applied Statistical Methods 1 19-23
  • [6] Villő Csiszár: Conditional independence relations and loglinear models for random matchings, Acta Mathematica Hungarica 122 (2009) 131-152
  • [7] V. Féray, P.-L. Méliot and A. Nikeghbali: Graphons, permutons and the Thoma simplex: three mod-Gaussian moduli spaces, arXiv:1712.06841v2 [math.PR], 19 Mar 2018
  • [8] Konstantinos Fokianos and Maria Pitsillou: Consistent testing for pairwise dependence in time series, Technometrics, 59 (2017) 262-270
  • [9] Jaroslav Hájek, Zbyněk Šidák and P. K. Sen: Theory of rank tests, New York, Academic Press 1999
  • [10] Taichi Haruna and Kohei Nakajima: Permutation complexity and coupling measures in hidden Markov models, arXiv:1204.1821v3, 26 Jun, 2013.
  • [11] Carlos Hoppe, Yoshiharu Kohayakawa, Carlos Gustavo Moreira and Rudini Menezes Sampaio: Limits of permutation sequences through permutation regularity, arXiv:1106.1663v1, 8 Jun, 2011.
  • [12] Karsten Keller, Teresa Mangold, Inga Stolz and Jenna Werner: Permutation entropy: new ideas and challenges, Entropy 19 (2017)
  • [13] Richard Kenyon, Daniel Král, Charles Radin and Peter Winkler: Permutations with fixed pattern densities, arXiv:1506.02340v2, August 31, 2015.
  • [14] Lucien Le Cam: Comparison of experiments - a short review, in: Statistics, Probability and Game Theory, IMS Lecture Notes - Monograph Series 30 (1996) 127-138
  • [15] Pierre-Loïc Méliot: Representation theory of symmetric groups, CRC Press, A Chapman & Hall Book 2017
  • [16] Sevak Mkrtchyan: Entropy of Schur-Weyl measures, Annales de l’Institut Henri Poincaré - Probabilités et Statistiques, 50, (2014) 678-713.
  • [17] Boris Pittel: On the distribution of the number of Young tableaux for a uniformly random diagram, Advances in Applied Mathematics, 29, (2002) 184-214.
  • [18] Victor Reiner, Franco Saliola and Volkmar Welker: Spectra of symmetrized shuffling operators, Mem. Amer. Math. Soc., 228, (2014)
  • [19] Gábor J. Székely and Maria L. Rizzo: Partial distance correlations with methods for dissimilarities, The Annals of Statistics, 42, 2382-2412
  • [20] A. M. Vershik: Asymptotic theory of path spaces of graded graphs and its applications, Jpn. J. Math. 11 (2016) 151-218
  • [21] Abraham Wald and Jacob Wolfowitz: Statistical tests based on permutations of observations, The Annals of Mathematical Statistics 15(1944) 358-372
  • [22] Shinzo Watanabe: Where does the information come from? Proceedings of the Colloquium on Information Theory, Ed. Alfréd Rényi, János Bolyai Mathematical Society Budapest, Hungary 1968 511-513