A Stirling-type formula for the distribution of the length of longest increasing subsequences, applied to finite size corrections to the random matrix limit

The discrete distribution of the length of longest increasing subsequences in random permutations of order n is deeply related to random matrix theory. In a seminal work, Baik, Deift and Johansson provided an asymptotics in terms of the distribution of the largest level of the large matrix limit of GUE. As a numerical approximation, however, this asymptotics is inaccurate for small lengths and has a slow convergence rate, conjectured to be just of order n^-1/3. Here, we suggest a different type of approximation, based on Hayman's generalization of Stirling's formula. Such a formula gives already a couple of correct digits of the length distribution for n as small as 20 but allows numerical evaluations, with a uniform error of apparent order n^-3/4, for n as large as 10^12; thus closing the gap between a table of exact values (that has recently been compiled for up to n=1000) and the random matrix limit. Being much more efficient and accurate than Monte-Carlo simulations for larger n, the Stirling-type formula allows for a precise numerical understanding of the first few finite size correction terms to the random matrix limit, a study that has recently been initiated by Forrester and Mays, who visualized the form of the first such term. We display also the second one, of order n^-2/3, and derive (heuristically) expansions of expected value and variance of the length, exhibiting several more terms than previously known.

READ FULL TEXT VIEW PDF

page 2

page 4

07/13/2019

On the convergence rate of some nonlocal energies

We study the rate of convergence of some nonlocal functionals recently c...
02/22/2019

Convergence Rate of Empirical Spectral Distribution of Random Matrices from Linear Codes

It is known that the empirical spectral distribution of random matrices ...
10/25/2019

Convergence Analysis of the Randomized Newton Method with Determinantal Sampling

We analyze the convergence rate of the Randomized Newton Method (RNM) in...
08/04/2021

The volume-of-tube method for Gaussian random fields with inhomogeneous variance

The tube method or the volume-of-tube method approximates the tail proba...
03/25/2019

Computation of the Expected Euler Characteristic for the Largest Eigenvalue of a Real Non-central Wishart Matrix

We give an approximate formula for the distribution of the largest eigen...
08/21/2017

Economic Design of Memory-Type Control Charts: The Fallacy of the Formula Proposed by Lorenzen and Vance (1986)

The memory-type control charts, such as EWMA and CUSUM, are powerful too...
02/28/2020

Criteria for the numerical constant recognition

The need for recognition/approximation of functions in terms of elementa...

1. Introduction

As witnessed by a number of outstanding surveys and monographs (see, e.g., [MR1694204, MR3468920, MR3468738, MR2334203]

), a surprisingly rich topic in combinatorics and probability theory, deeply related to representation theory and to random matrix theory, is the study of the lengths

of longest increasing subsequences of permutations on the set and of the behavior of their distribution in the limit . Here, is defined as the maximum of all for which there are such that . Writing permutations in the form we get, e.g., for , where one of the longest increasing subsequences has been highlighted. Enumeration of the permutations with a given can be encoded probabilistically: by equipping the symmetric group on 

with the uniform distribution,

becomes a discrete random variable with cumulative probability distribution (CDF)

and probability distribution (PDF)

.

Constructive combinatorics.

Using the Robinson–Schensted correspondence [MR121305], one gets the distribution of in the following form, see, e.g., [MR843332, §§3.3–3.7]:

(1)

Here denotes an integer partition of and is the number of standard Young tableaux of shape , as given by the hook length formula. By generating all partitions of order , in 1968 Baer and Brock [MR228216] computed tables of up to ; in 2000 Odlyzko and Rains [MR1771285] for (the tables are online, see [OdlyzkoTable]), reporting a computing time for of about 12 hours (here , the number of partitions, is of size ). This quickly becomes infeasible,111See Remark 3.1 below for a method to compute the exact rational values of the distribution of based on random matrix theory, which has recently been used to tabulate , , up to . as is already as large as for . Another use of the combinatorial methods is the approximation of the distribution of by Monte Carlo simulations [MR228216, MR1771285]: one samples random permutations  and calculates by the Robinson–Schensted correspondence.222In Mathematica, a single trial is generated by the command


Figure 1. Discrete distribution of for near its mode vs. the random matrix limit given by the leading order terms in (3) and (33a) (solid red line); here and in the figures below, discrete distributions are shown as blue bars centered at the integers. Left: CDF ; right: PDF . The discrete distributions were computed using the Stirling-type formula (5

), with additive errors estimated to be smaller than

, cf. Fig. 3, which is well below plotting accuracy.

Analytic combinatorics and the random matrix limit.

For analytic methods of enumeration the starting points is a more or less explicit representation of a suitable generating function; here, the suitable one turns out to be the exponential generating function of the CDF , when considered as a sequence of with the length fixed:

(We note that is an entire function of exponential type.) In fact, Gessel [Gessel90, p. 280] obtained in 1990 the explicit representation

(2)

in terms of a Toeplitz determinant of the modified Bessel functions , , which are entire functions of exponential type themselves. By relating, first, the Toeplitz determinant to the machinery of Riemann–Hilbert problems to study a double-scaling limit of the generating function and by using, next, a Tauberian theorem333The Tauberian part (“de-poissonization”) makes it hard to get more than the leading order of the asymptotics. to induce from that limit an asymptotics of the coefficients, Baik, Deift and Johansson [MR1682248] succeeded 1999 in establishing444In the first place, [MR1682248, Thm. 1.1] states the following limit to hold pointwise in :

However, since the limit distribution is continuous, by a standard Tauberian follow-up [MR1652247, Lemma 2.1] of the Portmanteau theorem in probability theory, this convergence is known to hold, in fact, uniformly in .

(3)

uniformly in . Here,

denotes the Tracy–Widom distribution for

, that is, the probability that in the soft-edge scaling limit of the Gaussian unitary ensemble (GUE) the largest eigenvalue is bounded from above by

. This distribution can be evaluated numerically based on its representation either in terms of the Airy kernel determinant [MR1236195] or in terms of the Painlevé II transcendent [MR1257246]; see Remark 3.2 and [MR2895091] for details.

As impressive as the use of the limit (3) might look as a numerical approximation to the distribution of near its mode for larger , cf. Fig. 1, there are two notable deficiencies: first, since the error term in (3) is additive (i.e., w.r.t. absolute scale), the approximation is rather poor for ; second, the convergence rate is rather slow, in fact conjectured to be just of the order , see [arxiv.2205.05257] and the discussion below. Both deficiencies are well illustrated in Table 1 for and in Fig. 2 for .

Stirling-type (5) Monte-Carlo random matrix (3)
1   
2   
3
4
5
6
7
8
9
Table 1. The values of and various of its approximations for . The Monte Carlo simulation was run with samples. Clearly observable is a multiplicative (i.e., relative) error of already less than for the Stirling-type formula (5) as well as an additive (i.e., absolute) error of order for Monte Carlo and an additive error of order for the limit (3) from random matrix theory.

A Stirling-type formula

In this paper we suggest a different type of numerical approximation to the distribution of that enjoys the following advantages: (a) it has a small multiplicative (i.e., relative) error, (b) it has faster convergence rates, apparently even faster than (3) with its first finite size correction term added, and (c) it is much faster to compute than Monte Carlo simulations. In fact, the distribution of for , as shown in Fig. 1, exhibits an estimated maximum additive error of less than and took just about five seconds to compute; whereas Forrester and Mays [arxiv.2205.05257] have recently reported a computing time of about 14 hours to generate Monte Carlo trials for this ; the error of such a simulation is expected to be of the order .

Specifically, we use Hayman’s generalization [Hayman56] of Stirling’s formula for -admissible functions; for expositions see [MR2483235, MR1373678, Wong89]. For simplicity, as is the case here for , assume that

is an entire function with positive coefficients and consider the auxiliary functions

If is -admissible, then for each the equation has a unique solution such that and the following generalization of Stirling’s formula555A generalization of Stirling’s classical formula, indeed: for the -admissible function , cf. Thm. 2.1 Criterion II.g, we have , , and (4) specifies to

As in Table 1, the error is already below for as small as . holds true:

(4)

We observe that the error is multiplicative here. In Thm. 2.2 we will prove, using some theory of entire functions, the -admissibility of the generating functions . Hence the Stirling-type formula (4) applies without further ado to their coefficients . Since the error is multiplicative, nothing changes if we multiply the approximation by and we get

(5)

where we have labeled all quantities when applied to by an additional index . An approximation to is then obtained simply by taking differences. The power of these approximations, if used as a numerical tool even for as small as , is illustrated in Table 1 and Figs. 13, as well as in Table 2 below.

Figure 2. Discrete distribution of for near its mode vs. the random matrix limit given by the leading order terms in (8) and (33a) (solid red line). Left: CDF ; right: PDF . The exact values of the distribution of and their approximation by the Stirling-type formula (5) differ just by additive errors of the order , see Fig. 3, which is well below plotting accuracy.

Numerical evaluation of the generating function

For the Stirling-type formula (5) to be easily accessible in practice, we require an expression for that can be numerically evaluated, for , in a stable, accurate, and efficient fashion. Since the direct evaluation of the Toeplitz determinant (2) is numerically highly unstable, and has a rather unfavorable complexity of for larger , we look for alternative representations. One option—used in [MR2754188] to numerically extract from by Cauchy integrals over circles in the complex plane that are centered at the origin with the same radius as in (5)—is the machinery, cf., e.g., [MR1935759], to transform Toeplitz determinants into Fredholm determinants which are then amenable for the numerical method developed in [MR2600548]. However, since we need the values of the generating function for real only, there is a much more efficient option, which comes from yet another connection to random matrix theory.

To establish this connection we first note that an exponentially generating function of a sequence of probability distributions has a probabilistic meaning if multiplied by : a process called poissonization. Namely, if the draws from the permutation groups are independent for all orders and if we take

to be a further independent random variable with Poisson distribution of intensity

, we see that

is, for fixed , the cumulative probability distribution of the composite discrete random variable . On the other hand, for fixed , also turns out to be a probability distribution w.r.t. the continuous variable : specifically, in terms of precisely the Toeplitz determinant (2), Forrester and Hughes [MR1303076, Eq. (3.33)] arrived in 1994 at the representation

(6)

Here, denotes the probability that, in the hard-edge scaling limit of the Laguerre unitary ensemble (LUE) with parameter , the smallest eigenvalue is bounded from below by . Now, the point here is that this distribution can be evaluated numerically, stable and accurate with a complexity that is largely independent of , based on two alternative representations: either in terms of the Bessel kernel determinant [MR1236195] or in terms of the Jimbo–Miwa–Okamoto -form of the Painlevé III (-PIII) transcendent [MR1266485]; see [MR2895091] for details. We will show in Sect. 3 that the auxiliary functions and fit into both frameworks, too.

Figure 3. Maximum absolute (i.e., additive) errors of various approximations to (left panel) the CDF and (right panel) the PDF in double logarithmic scaling of the axes, based on tabulated exact values up to , cf. Remark 3.1; solid lines are fits of the form for . Red : error of the leading order terms (random matrix limit) in (8), (33a), with (CDF) and (PDF). Green : error of the expansions (8), (33a) truncated after the first finite size correction term, with (CDF) and (PDF), where has been approximated as in Fig. 4. Blue : error of the Stirling-type formula (5), with (CDF) and (PDF).

Finite size corrections to the random matrix limit

A double logarithmic plot of the additive errors (taking the maximum w.r.t. ) in approximating the distribution by either the random matrix limit (3) or by the Stirling-type formula (5) exhibits nearly straight lines, see Fig. 3 for chosen between and . Fitting the slopes to simple rationals strongly suggests that, as , uniformly in ,

(7a)
(7b)

The approximation order (and the size of the implied constant) in (7b) is much better than the one in (7a) so that the Stirling-type formula can be used to reveal the structure of the term in the random matrix limit. In fact, as the error plot in Fig. 3 suggests and we will more carefully argue in Sect. 4.1, this can even be iterated yet another step and we are led to the specific conjecture666Note that is always a discrete variable in this paper, so there is no need for taking integer parts here.

(8)

as , uniformly in . Compelling evidence for the existence of the functions and is given in the left panels of Figs. 4 and 6.777Based on Monte-Carlo simulations, Forrester and Mays [arxiv.2205.05257, Eq. (1.10) and Fig. 7] were recently also led to conjecture an expansion of the form

However, the substantially larger errors of Monte Carlo simulations as compared to the Stirling-type formula (5) would inhibit them from getting, in reasonable time, much evidence about the next finite size correction term.

Figure 4. Rescaled differences between the distributions of and their expansions truncated after the leading order term (i.e., the random matrix limit), see (8) for the CDF resp. (33a) for the PDF; data points (to avoid clutter just every is displayed) have been calculated using the Stirling-type formula (5) for (red ), (green ), (blue ). Left: CDF errors rescaled by , horizontal axis is , cf. [arxiv.2205.05257, Fig. 7] for a similar figure with data from Monte Carlo simulations for and . The solid line is a polynomial of degree  fitted to the data points for with ; it approximates in that interval. Right: PDF errors rescaled by , horizontal axis is . The solid line displays the function as an approximation of , with the polynomial taken from the left panel. The dotted line shows the term only.

Additionally, we will argue in Sect. 4.3 that (8) allows us to derive an expansion of the expected value of , specifically

(9)

Similarly, we will derive in Sect. 4.4 an expansion of the variance of of the form

(10)

The values of and are the known values of mean and variance of the Tracy–Widom distribution , cf. [MR2895091, Table 10]. The leading parts of (9) up to and of (10) up to  had been established previously by Baik, Deift and Johansson [MR1682248, Thm. 1.2].

2. -Admissibility of the Generating Function and its Implications

2.1. -admissible functions

For simplicity we restrict ourselves to entire functions. We refrain from displaying the rather lengthy technical definition of -admissibility,888Since we consider entire functions only, -admissibility is here understood to hold in all of . which is difficult to be verified in practice and therefore seldomly directly used. Instead, we start by collecting some usefuls facts and criteria from Hayman’s original paper [Hayman56]:999Interestingly, the powerful criterion in part III (which is [Hayman56, Thm. XI]) is missing from the otherwise excellent expositions [MR2483235, MR1373678, Wong89] of -admissibility.

Theorem 2.1 (Hayman 1956).

Let and be entire functions and let denote a polynomial with real coefficients.

I. If is -admissible, then:

  • for large , so that in particular the auxiliary functions101010In terms of differential operators we have .

    (11)

    are well defined there;

  • for large there is strictly convex in , strictly monotonically increasing, and such that as ; in particular, for large integers there is a unique that solves , it is as ;

  • if the coefficients of are all positive, then I.b holds for all ;

  • as , uniformly in ,

    (12)

II. If and are -admissible, then:

  • , and are -admissible;

  • if the leading coefficient of is positive, and are -admissible;

  • if the Taylor coefficients of are eventually positive, is -admissible.

III. If has genus zero111111By definition, an entire function has genus zero if it is a polynomial or if it can be represented as a convergent infinite product of the form

where is a constant, is the order of the zero at , and is the sequence of the non-zero zeros, where each one is listed as often as multiplicity requires. with, for some , at most finitely many zeros in the sector and satisfies I.a such that as , then is -admissible.

Obviously, the Stirling-type formula (4) is obtained from the approximation result (12) by just inserting the particular choice .

Remark 2.1.

We observe that (12) has an interesting probabilistic content:121212Note that the particular case (which is -admissible by Thm. 2.1

.II.g) specifies to the well-known normal approximation of the Poisson distribution for large intensities—which is a simple consequence of the central limit theorem if we observe that the sum of

independent Poisson random variates of intensity is one of intensity . as a distribution in the discrete variable , the Boltzmann131313We follow the terminology in the theory of Boltzmann samplers [MR2095975], a framework for the random generation of combinatorial structures. probabilities associated with an -admissible entire function are, for large intensities , approximately normal with mean and variance ; see the right panel of Fig. 5 for an illustrative example using the generating function . The additional freedom that is provided in the normal approximation (12) by the uniformity w.r.t. will be put to good use in Sect. 2.3.

Figure 5. Left: the auxiliary function (blue solid line) associated with the generating function , together with the asymptotics as (green dotted line) and as (red dashed line). Right: Illustration of the approximation (12) of the Boltzmann probabilities (blue bars) associated with the generating function for intensity , cf. the notation in (5

). The normal distribution (red solid line) has mean

(cf. the left panel) and variance .

The classification of entire functions (by quantities such as genus, order, type, etc.) and their distribution of zeros is deeply related to the analysis of their essential singularity at . For the purposes of this paper, the following simple criterion is actually all we need. The proof uses some theory of entire functions, which can be found, e.g., in [MR589888].

Lemma 2.1.

Let be an entire function of exponential type with positive Maclaurin coefficients. If there are constants such that there holds, for the principal branch of the power function and for each , the asymptotic expansion141414See Remark A.1 for the uniformity implied with the notation as while .

(13)

then is -admissible. For the associated auxiliary functions and satisfy

(14)
Proof.

The expansion (13) is equivalent to

(15)

which readily implies:

  • since is arbitrary, has the Phragmén–Lindelöf indicator

    so that has order and type , hence genus zero;

  • for sufficiently large , there are no zeros of with

Since the Maclaurin coefficients of are positive we have for and the auxiliary functions , in (11) are well-defined for . In fact, both functions can be analytically continued into the domain of uniformity of the expansion (15) and by differentiating this expansion (which is, because of analyticity, legitimate by a theorem of Ritt, cf. [MR0435697]) we obtain (14); this implies, in particular, as . Thus, all the assumptions of Thm. 2.1.III are satisfied and is shown to be -admissible. ∎

2.2. Singularity analysis of the generating function at

Establishing an expansion of the form (13) for as given by (2), that is to say, for the Toeplitz determinant

(16)

suggests to start with the expansions (valid for all , see [MR0435697, p. 251])

(17a)
(17b)

This does not yield (13) at once, as there could be, however unlikely it would be, eventually a catastrophic cancellation of all of the expansion terms when being inserted into the determinant expression defining . For the specific cases a computer algebra system shows that exactly the first terms of the expansion (17) mutually cancel each other in forming the determinant, and we get by this approach151515Odlyzko [MR1373678, Ex. 10.9] reports that he and Wilf had used this approach, before 1995, for small in the framework of the method of “subtraction of singularities” in asymptotic enumeration. He states the expansions for and , cf. [MR1373678, Eqs. (10.30)/(10.39)], with a misrepresented constant factor in , though. No attempt, however, was made back then to guess the general form. the expansions

All of them, inherited from (17), are valid as while with the uniformity content implied by the symbol . From these instances, in view of (17) and the multilinearity of the determinant, we guess that

and observe

Consulting the OEIS161616https://oeis.org/A000178 (On-Line Encyclopedia of Integer Sequences) suggests the coefficients to be generally of the form

Though this is very likely to hold for all —a fact that would at once yield the -admissiblity of all the generating function by Lemma 2.1—a proof seems to be elusive along these lines, but see Remark 2.3 for a remedy.

Inspired by the fact that the one-dimensional Laplace’s method easily gives the leading order term in (17) when applied to the Fourier representation

(18)

we represent the Toeplitz determinant in terms of a multidimensional integral and study the limit by the multidimensional Laplace method discussed in the Appendix. In fact, (18) shows that the symbol of the Toeplitz determinant is and a classical formula of Szegő’s [Sz1915, p. 493] from 1915, thus gives, without further calculation, the integral representation171717This induces, see (6), an integral representation of the distribution which has been derived in 1994 by Forrester [MR1271945] using generalized hypergeometric functions defined in terms of Jack polynomials.

(19)

where

denotes the Vandermonde determinant of the complex numbers .

Remark 2.2.

By Weyl’s integration formula on the unitary group , cf. [MR2105088, Eq. 1.5.89], the integral (19) can be written as

where the expectation is taken with respect to the Haar measure. Without any reference to (2), this form was derived in 1998 by Rains [Rains98, Cor. 4.1] directly from the identity

which he had obtained most elegantly from the representation theory of the symmetric group.

We are now able to prove our main theorem.

Theorem 2.2.

For each and there holds the asymptotic expansion

Thus, by Lemma 2.1, the generating functions are -admissible and their auxiliary functions satisfy, as ,

(20)
Proof.

We write (19) in the form

The phase function of this multidimensional integrand, that is to say

takes it minimum at with the expansion as , where denotes Euclidian length. Likewise we get for the non-exponential factor181818This factor is zero at , so that Hsu’s variant (39) of Laplace’s method, which is the one predominantly found in the literature, does not yield the leading term of . That we have to expand the non-exponential factor up to degree for the first non-zero contribution to show up, corresponds to the mutual cancellation of the leading terms of (17) when being inserted into the Toeplitz determinant that defines .

where the degree of the homogeneous polynomial is .

Therefore, by the multidimensional Laplace method as given in Corollary A.1, see also formula (45) following it, we obtain immediately

with

(21)

where the evaluation of this multiple integral is well-known in random matrix theory, e.g., as a consequence of Selberg’s integral formula, cf. [MR2760897, Eq. (2.5.11)]. ∎

Remark 2.3.

By (6), Thm. 2.2 implies