Joint Probabilities within Random Permutations

A celebrated analogy between prime factorizations of integers and cycle decompositions of permutations is explored here. Asymptotic formulas characterizing semismooth numbers (possessing at most several large factors) carry over to random permutations. We offer a survey of practical methods for computing relevant probabilities of a bivariate or trivariate flavor.

02/12/2022


1 Density

Difficulties presented by the numerical integration of are evident in Figure 2.  The surface appears to touch the -plane only when ; its prominent ridge occurs along the line because corresponds to a unique point of nondifferentiability for ; its remaining boundary hovers over the broken line , everywhere finite except in the vicinity of .

Complications are compounded for the three other densities (which are, in themselves, approximations).  Figure 3 contains a plot of

\[ f_{13}(x,z) = \int_z^x f_{123}(x,y,z)\,dy. \]

The surface appears to touch the -plane when and simultaneously, as well as everywhere along the broken line .

Figure 4 contains a plot of

\[ f_{14}(x,w) = \int_w^{\min\{x,1/3\}} \int_z^x f_{1234}(x,y,z,w)\,dy\,dz. \]

The (precipitously rising) surface appears to touch the -plane only when and simultaneously; its remaining boundary hovers over the broken line , everywhere finite except in the vicinity of . The vertical scale is more expansive here than for the other plots.

Figure 5 contains a plot of

\[ f_{23}(y,z) = \int_y^1 f_{123}(x,y,z)\,dx. \]

The (fairly undulating) surface appears to touch the -plane only when .  Unlike the other densities, a singularity here occurs at .

2 Correlation

Let

\[ E(x) = \int_x^\infty \frac{e^{-t}}{t}\,dt = -\mathrm{Ei}(-x), \qquad x > 0 \]

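be the exponential integral.  Upon normalization, the $h$th moment of the $r$th longest cycle length is [10, 11, 12]

\[ \lim_{n\to\infty} \frac{\mathbb{E}(\Lambda_r^h)}{n^h} = \frac{1}{h!\,(r-1)!} \int_0^\infty x^{h-1} E(x)^{r-1} \exp[-E(x)-x]\,dx \]

(in this paper, rank $r = 1$, $2$, $3$ or $4$; height $h = 1$ or $2$).  The cross-correlation between the $r$th longest and $s$th longest cycle lengths is

\[ \kappa_{r,s} = \frac{\mathbb{E}(\Lambda_r\Lambda_s) - \mathbb{E}(\Lambda_r)\,\mathbb{E}(\Lambda_s)}{\sqrt{\mathbb{E}(\Lambda_r^2) - \mathbb{E}(\Lambda_r)^2}\,\sqrt{\mathbb{E}(\Lambda_s^2) - \mathbb{E}(\Lambda_s)^2}} \to \begin{cases} -0.75803584... & \text{if } r=1 \text{ and } s=2, \\ -0.78421290... & \text{if } r=1 \text{ and } s=3, \\ -0.68442819... & \text{if } r=1 \text{ and } s=4, \\ +0.35549741... & \text{if } r=2 \text{ and } s=3 \end{cases} \]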

with cross-moments given by [13, 14]

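\[ \lim_{n\to\infty} \frac{\mathbb{E}(\Lambda_1\Lambda_2)}{n^2} = \frac12 \int_0^\infty\!\int_0^x \exp[-E(y)-x-y]\,dy\,dx, \]
\[ \lim_{n\to\infty} \frac{\mathbb{E}(\Lambda_1\Lambda_3)}{n^2} = \frac12 \int_0^\infty\!\int_0^x\!\int_0^y \frac{1}{y}\exp[-E(z)-x-y-z]\,dz\,dy\,dx, \]
\[ \lim_{n\to\infty} \frac{\mathbb{E}(\Lambda_1\Lambda_4)}{n^2} = \frac12 \int_0^\infty\!\int_0^x\!\int_0^y\!\int_0^z \frac{1}{yz}\exp[-E(w)-x-y-z-w]\,dw\,dz\,dy\,dx, \]
\[ \lim_{n\to\infty} \frac{\mathbb{E}(\Lambda_2\Lambda_3)}{n^2} = \frac12 \int_0^\infty\!\int_0^x\!\int_0^y \frac{1}{x}\exp[-E(z)-x-y-z]\,dz\,dy\,dx. \]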

The fact that $\Lambda_1$ is negatively correlated with the other $\Lambda_s$, yet $\Lambda_2$ is positively correlated with $\Lambda_3$, is due to longest cycles typically occupying a giant-size portion of permutations, but second-longest cycles less so.
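As a rough numerical check of the moment formula and of $\kappa_{1,2}$, the following Python sketch evaluates the integral for $h = r = 1$ by Simpson's rule and estimates the mean and the correlation by simulation. The helper names (`exp_integral`, `limiting_moment`, `cycle_lengths`, `simulate`), the truncation of the integral at 40, the sample sizes and the seed are illustrative assumptions, not taken from the paper. The simulation relies on the standard fact that the cycle containing a given element of a uniform random $n$-permutation has length uniform on $\{1, \ldots, n\}$.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def exp_integral(x):
    """E(x) = integral from x to infinity of exp(-t)/t dt, for x > 0."""
    if x <= 1.0:
        # Power series: E(x) = -gamma - ln x - sum_{k>=1} (-x)^k / (k * k!)
        total, term = 0.0, 1.0
        for k in range(1, 60):
            term *= -x / k              # term = (-x)^k / k!
            total += term / k
        return -EULER_GAMMA - math.log(x) - total
    # Continued fraction, evaluated by the modified Lentz iteration.
    b = x + 1.0
    c = 1e300
    d = 1.0 / b
    h = d
    for i in range(1, 200):
        a = -float(i * i)
        b += 2.0
        d = 1.0 / (a * d + b)
        c = b + a / c
        delta = c * d
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x)

def limiting_moment(h, r, upper=40.0, steps=8000):
    """Simpson's rule for the limit of E(Lambda_r^h) / n^h (integral cut at `upper`)."""
    def integrand(x):
        if x == 0.0:
            return 0.0                  # limiting value of the integrand at 0
        e = exp_integral(x)
        return x ** (h - 1) * e ** (r - 1) * math.exp(-e - x)
    width = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * width)
    return total * width / 3.0 / (math.factorial(h) * math.factorial(r - 1))

def cycle_lengths(n, rng):
    """Cycle lengths of a uniform random n-permutation, longest first."""
    lengths = []
    while n > 0:
        length = rng.randint(1, n)      # cycle through the smallest unused element
        lengths.append(length)
        n -= length
    return sorted(lengths, reverse=True)

def simulate(n=1000, trials=20000, seed=1):
    """Sample mean of Lambda_1/n and sample correlation of (Lambda_1, Lambda_2)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(trials):
        c = cycle_lengths(n, rng) + [0]  # Lambda_2 = 0 if there is a single cycle
        xs.append(c[0] / n)
        ys.append(c[1] / n)
    mx, my = sum(xs) / trials, sum(ys) / trials
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / trials
    vx = sum((x - mx) ** 2 for x in xs) / trials
    vy = sum((y - my) ** 2 for y in ys) / trials
    return mx, cov / math.sqrt(vx * vy)

print(limiting_moment(1, 1))
print(simulate())
```

For $h = r = 1$ the integral is the Golomb-Dickman constant [30], approximately 0.6243, and the simulated correlation should land near the limiting value $-0.758$ quoted above, up to sampling noise and finite-$n$ effects.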

3 Distribution

Bach & Peralta [15] discussed a remarkable heuristic model, based on random bisection, that simplifies the computation of joint probabilities involving $\Lambda_1$ and $\Lambda_2$.  In the same paper, they rigorously proved that asymptotic predictions emanating from the model are valid.  Subsequent researchers extended the work to $\Lambda_1$ and $\Lambda_3$, to $\Lambda_1$ and $\Lambda_4$, and to $\Lambda_2$ and $\Lambda_3$.  We shall not enter into details of the model nor its absolute confirmation, preferring instead to dwell on numerical results and certain relative verifications.

3.1 First and Second

For , Bach & Peralta [15] demonstrated that

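\[ \lim_{n\to\infty} P\left\{ \frac{\Lambda_2}{n} \le a \ \&\ \frac{\Lambda_1}{n} \le b \right\} = \underbrace{\rho\!\left(\frac{1}{a}\right)}_{=\,I_0(a)} + \underbrace{\int_a^b \rho\!\left(\frac{1-x}{a}\right) \frac{dx}{x}}_{=\,I_1(a,b)}. \]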

Note the slight change from earlier (writing $\Lambda_2$ before $\Lambda_1$), a convention we adopt so as to be consistent with the literature.  Let $J_1(a,b) = I_0(a) + I_1(a,b)$.  Return now to the example from the introduction.  Evaluating

\[ J_1\!\left(\tfrac13, \tfrac12\right) = \rho(3) + \int_{1/3}^{1/2} \rho\!\left(\frac{1-x}{1/3}\right) \frac{dx}{x} \]

is less numerically problematic than evaluating

\[ \underbrace{\int_0^{1/3}\!\int_0^x f_{12}(x,y)\,dy\,dx}_{=\,\rho(3)} + \int_{1/3}^{1/2}\!\int_0^{1/3} f_{12}(x,y)\,dy\,dx \]

for two reasons:

• a double integral has been miraculously reduced to a single integral,

• the argument of $\rho$ within the integral is $(1-x)/a$ rather than $(1-x-y)/y$, which is unstable as $y \to 0^+$.

The advantages of using the Bach & Peralta formulation will become more apparent as we move forward (incidentally, their is the same as our ).
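The single-integral form is easy to evaluate with elementary tools. The sketch below tabulates Dickman's $\rho$ from the standard delay differential equation $u\rho'(u) = -\rho(u-1)$ with $\rho = 1$ on $[0,1]$, then applies the midpoint rule to $J_1(a,b) = \rho(1/a) + \int_a^b \rho((1-x)/a)\,dx/x$. The grid resolution, the table range $u \le 6$, the linear interpolation and the function names are assumptions made for illustration; this is not the author's computation.

```python
import math

STEPS_PER_UNIT = 2000    # grid resolution for the rho table (arbitrary choice)
U_MAX = 6                # the table covers 0 <= u <= U_MAX

def dickman_table():
    """Tabulate rho on [0, U_MAX] from rho(u) = rho(u0) - int_{u0}^{u} rho(t-1)/t dt."""
    n = STEPS_PER_UNIT
    h = 1.0 / n
    size = U_MAX * n + 1
    table = [1.0] * size                     # rho(u) = 1 for 0 <= u <= 1
    for i in range(n + 1, size):
        # One trapezoid step of the integral over [u_{i-1}, u_i].
        left = table[i - 1 - n] / ((i - 1) * h)
        right = table[i - n] / (i * h)
        table[i] = table[i - 1] - 0.5 * h * (left + right)
    return table

RHO = dickman_table()

def rho(u):
    """Linear interpolation in the table; rho(u) is taken to be 0 for u < 0."""
    if u < 0.0:
        return 0.0
    pos = u * STEPS_PER_UNIT
    if pos > len(RHO) - 1:
        raise ValueError("argument outside the tabulated range")
    i = min(int(pos), len(RHO) - 2)
    frac = pos - i
    return (1.0 - frac) * RHO[i] + frac * RHO[i + 1]

def J1(a, b, steps=2000):
    """rho(1/a) + int_a^b rho((1 - x)/a) dx/x by the composite midpoint rule."""
    width = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * width
        total += rho((1.0 - x) / a) / x
    return rho(1.0 / a) + total * width

print(J1(1 / 3, 1 / 2))
print(J1(1 / 3, 1.0))
```

The two printed values can be compared with 0.22465184 and 0.85277932, the sums $I_0 + I_1$ formed from the tabulated values for $a = 1/3$.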

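a     I0(a)       I1(a,1)     I1(a,1/2)   I1(a,1/3)   I1(a,1/4)   I1(a,1/5)
1/2   0.30685282  0.69314718
1/3   0.04860839  0.80417093  0.17604345
1/4   0.00491093  0.61877013  0.09148808  0.01974468
1/5   0.00035472  0.46286746  0.03043740  0.00578984  0.00149456
1/6   0.00001965  0.36519810  0.00849154  0.00107262  0.00029307  0.00008552

Table 1: $I_0(a)$ and $I_1(a,b)$ for $a = 1/2, 1/3, \ldots, 1/6$ and $b = 1, 1/2, \ldots, 1/5$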

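a     b=1         b=1/2       b=1/3       b=1/4       b=1/5       b=1/6
1/2   1.00000000  0.30685282
1/3   0.85277932  0.22465184  0.04860839
1/4   0.62368106  0.09639901  0.02465561  0.00491093
1/5   0.46322219  0.03079212  0.00614457  0.00184928  0.00035472
1/6   0.36521775  0.00851119  0.00109227  0.00031272  0.00010517  0.00001965

Table 2: $J_1(a,b)$ for $a = 1/2, 1/3, \ldots, 1/6$ and $b = 1, 1/2, \ldots, 1/6$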

A verification of is as follows:

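\[ \frac{\partial J_1}{\partial b} = \rho\!\left(\frac{1-b}{a}\right) \frac{1}{b} \]

by the Second Fundamental Theorem of Calculus, hence, using $\xi\rho'(\xi) = -\rho(\xi-1)$,

\[ \frac{\partial^2 J_1}{\partial a\,\partial b} = -\rho'\!\left(\frac{1-b}{a}\right) \frac{1-b}{a^2}\,\frac{1}{b} = \rho\!\left(\frac{1-a-b}{a}\right) \frac{1}{ab} = f_{12}(b,a) \]

as anticipated by Billingsley [5].  An interpretation of $I_1$ is helpful: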

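\[ I_1(a,b) = \lim_{n\to\infty} P\left\{ \frac{\Lambda_2}{n} \le a \ \&\ a < \frac{\Lambda_1}{n} \le b \right\} \]

i.e., the probability that exactly one cycle has length in the interval $(an, bn]$ and all others have length $\le an$.  We have, for instance,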

\[ \left. \frac{\partial I_1}{\partial a} \right|_{b=1} = 0, \qquad I_1(a,1) \approx 0.8285 \]

when , the value maximizing as .

3.2 First and Third

For and , Lambert [16] demonstrated that

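\[ J_2(a,b) = \lim_{n\to\infty} P\left\{ \frac{\Lambda_3}{n} \le a \ \&\ \frac{\Lambda_1}{n} \le b \right\} = J_1(a,b) + \underbrace{\int_a^b\!\int_y^b \rho\!\left(\frac{1-x-y}{a}\right) \frac{dx}{x}\,\frac{dy}{y}}_{=\,I_2(a,b)}. \]

(Incidentally, his is the same as our .)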

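a     b=1         b=1/2       b=1/3       b=1/4       b=1/5
1/3   0.14722068  0.08220098
1/4   0.36143259  0.19556747  0.01998464
1/5   0.46463747  0.20709082  0.02278925  0.00201596
1/6   0.48588944  0.16644726  0.01263312  0.00136571  0.00013356

Table 3: $I_2(a,b)$ for $a = 1/3, 1/4, 1/5, 1/6$ and $b = 1, 1/2, \ldots, 1/5$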

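a     b=1         b=1/2       b=1/3       b=1/4       b=1/5       b=1/6
1/3   1.00000000  0.30685282  0.04860839
1/4   0.98511365  0.29196647  0.04464025  0.00491093
1/5   0.92785965  0.23788294  0.02893382  0.00386524  0.00035472
1/6   0.85110720  0.17495845  0.01372538  0.00167843  0.00023872  0.00001965

Table 4: $J_2(a,b)$ for $a = 1/3, 1/4, 1/5, 1/6$ and $b = 1, 1/2, \ldots, 1/6$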

A verification of is as follows:

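\[ \begin{aligned} \frac{\partial I_2}{\partial b} &= \frac12 \frac{\partial}{\partial b} \int_a^b\!\int_a^b \rho\!\left(\frac{1-x-y}{a}\right) \frac{dx}{x}\,\frac{dy}{y} \\ &= \frac12 \int_a^b \rho\!\left(\frac{1-b-y}{a}\right) \frac{1}{b}\,\frac{dy}{y} + \frac12 \int_a^b \rho\!\left(\frac{1-x-b}{a}\right) \frac{1}{b}\,\frac{dx}{x} = \int_a^b \rho\!\left(\frac{1-x-b}{a}\right) \frac{1}{b}\,\frac{dx}{x} \end{aligned} \]

by symmetry; thus by Leibniz's Rule,

\[ \begin{aligned} \frac{\partial^2 I_2}{\partial a\,\partial b} &= -\int_a^b \rho'\!\left(\frac{1-x-b}{a}\right) \frac{1-x-b}{a^2}\,\frac{1}{b}\,\frac{dx}{x} - \rho\!\left(\frac{1-a-b}{a}\right) \frac{1}{ab} \\ &= \int_a^b \frac{\rho\!\left(\frac{1-a-x-b}{a}\right)}{\frac{1-x-b}{a}} \cdot \frac{1-x-b}{a^2\,x\,b}\,dx - \frac{\partial^2 J_1}{\partial a\,\partial b} \end{aligned} \]

hence

\[ \frac{\partial^2 J_2}{\partial a\,\partial b} = \int_a^b \frac{\rho\!\left(\frac{1-a-x-b}{a}\right)}{a\,x\,b}\,dx = \int_a^b f_{123}(b,x,a)\,dx = f_{13}(b,a), \]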

as was to be shown.  An interpretation of is helpful:

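\[ I_2(a,b) = \lim_{n\to\infty} P\left\{ \frac{\Lambda_3}{n} \le a \ \&\ a < \frac{\Lambda_2}{n} \le \frac{\Lambda_1}{n} \le b \right\} \]

i.e., the probability that exactly two cycles have length in the interval $(an, bn]$ and all others have length $\le an$.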
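To illustrate how such a double integral might be evaluated, here is a minimal Python sketch of $I_2(a,b) = \int_a^b \int_y^b \rho((1-x-y)/a)\,dx/x\,dy/y$ using a nested midpoint rule. It assumes $\rho(u) = 0$ for $u < 0$ and uses the closed form $\rho(u) = 1 - \ln u$ on $1 \le u \le 2$, so it only covers parameter choices for which the argument of $\rho$ never exceeds 2; the function names and step counts are illustrative choices, not the author's method.

```python
import math

def rho(u):
    """Dickman's function for u <= 2 only (closed form); enough for the cases below."""
    if u < 0.0:
        return 0.0
    if u <= 1.0:
        return 1.0
    if u <= 2.0:
        return 1.0 - math.log(u)
    raise ValueError("this sketch only covers arguments u <= 2")

def I2(a, b, steps=400):
    """Nested midpoint rule for int_a^b int_y^b rho((1-x-y)/a) dx/x dy/y."""
    total = 0.0
    wy = (b - a) / steps
    for j in range(steps):
        y = a + (j + 0.5) * wy
        top = min(b, 1.0 - y)        # the integrand vanishes once x + y > 1
        if top <= y:
            continue
        wx = (top - y) / steps
        inner = 0.0
        for k in range(steps):
            x = y + (k + 0.5) * wx
            inner += rho((1.0 - x - y) / a) / x
        total += inner * wx / y
    return total * wy

print(I2(1 / 3, 1.0))
print(I2(1 / 4, 1 / 2))
```

The printed values can be compared with the differences $J_2 - J_1$ formed from the tabulated values, namely 0.14722068 for $(a,b) = (1/3, 1)$ and 0.19556747 for $(a,b) = (1/4, 1/2)$.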

3.3 First and Fourth

For and , Cavallar [17] and Zhang [18] independently demonstrated that

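\[ J_3(a,b) = \lim_{n\to\infty} P\left\{ \frac{\Lambda_4}{n} \le a \ \&\ \frac{\Lambda_1}{n} \le b \right\} = J_2(a,b) + \underbrace{\int_a^b\!\int_z^b\!\int_y^b \rho\!\left(\frac{1-x-y-z}{a}\right) \frac{dx}{x}\,\frac{dy}{y}\,\frac{dz}{z}}_{=\,I_3(a,b)}. \]

(Incidentally, Cavallar's is the same as our while Zhang's is the same as our .)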

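a     b=1         b=1/2       b=1/3       b=1/4       b=1/5
1/4   0.01488635  0.01488635  0.00396814
1/5   0.07126587  0.06809540  0.01884107  0.00094238
1/6   0.14082221  0.12382378  0.02870816  0.00222512  0.00009015

Table 5: $I_3(a,b)$ for $a = 1/4, 1/5, 1/6$ and $b = 1, 1/2, \ldots, 1/5$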

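a     b=1         b=1/2       b=1/3       b=1/4       b=1/5       b=1/6
1/4   1.00000000  0.30685282  0.04860839  0.00491093
1/5   0.99912552  0.30597834  0.04777489  0.00480762  0.00035472
1/6   0.99192941  0.29878222  0.04243355  0.00390355  0.00032887  0.00001965

Table 6: $J_3(a,b)$ for $a = 1/4, 1/5, 1/6$ and $b = 1, 1/2, \ldots, 1/6$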

We omit details of the verification of , except to mention the start point

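\[ \frac{\partial I_3}{\partial b} = \frac16 \frac{\partial}{\partial b} \int_a^b\!\int_a^b\!\int_a^b \rho\!\left(\frac{1-x-y-z}{a}\right) \frac{dx}{x}\,\frac{dy}{y}\,\frac{dz}{z} \]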

and the end point .  An interpretation of is helpful:

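\[ I_3(a,b) = \lim_{n\to\infty} P\left\{ \frac{\Lambda_4}{n} \le a \ \&\ a < \frac{\Lambda_3}{n} \le \frac{\Lambda_1}{n} \le b \right\} \]

i.e., the probability that exactly three cycles have length in the interval $(an, bn]$ and all others have length $\le an$.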

3.4 Second and Third

For , and , Ekkelkamp [19, 20] demonstrated that

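\[ \lim_{n\to\infty} P\left\{ \frac{\Lambda_3}{n} \le a,\ a < \frac{\Lambda_2}{n} \le b \ \&\ \frac{\Lambda_1}{n} \le c \right\} = \int_a^b\!\int_y^c \rho\!\left(\frac{1-x-y}{a}\right) \frac{dx}{x}\,\frac{dy}{y} \]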

under the additional condition .  If we were to suppose that this condition is unnecessary and set , then by definition of , we would have

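\[ L_1(a,b) = \lim_{n\to\infty} P\left\{ \frac{\Lambda_3}{n} \le a \ \&\ \frac{\Lambda_2}{n} \le b \right\} = \underbrace{\rho_2\!\left(\frac{1}{a}\right)}_{=\,K_0(a)} + \underbrace{\int_a^b\!\int_y^1 \rho_1\!\left(\frac{1-x-y}{a}\right) \frac{dx}{x}\,\frac{dy}{y}}_{=\,K_1(a,b)} \]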

where is similar (but not identical) to :

\[ K_1(a,b) = \lim_{n\to\infty} P\left\{ \frac{\Lambda_3}{n} \le a \ \&\ a < \frac{\Lambda_2}{n} \le b \right\}. \]

On the one hand, our supposition is evidently false.  In the following, we compare provisional theoretical values (eight digits of precision) against simulated values (just two digits):

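a     K0(a)       K1(a,1/3)          K1(a,1/4)          K1(a,1/5)
1/4   0.62368106  0.27362816 (0.21)
1/5   0.46322219  0.40043992 (0.32)  0.17285583 (0.14)
1/6   0.36521775  0.43489680 (0.35)  0.24479052 (0.20)  0.10650591 (0.09)

Table 7: $K_0(a)$ and $K_1(a,b)$ for $a = 1/4, 1/5, 1/6$ and $b = 1/3, 1/4, 1/5$ (simulated values in parentheses)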

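a     b=1/2       b=1/3              b=1/4              b=1/5              b=1/6
1/3   1.00000000  0.85277932
1/4   0.98511365  0.89730922 (0.84)  0.62368106
1/5   0.92785965  0.86366210 (0.79)  0.63607802 (0.60)  0.46322219
1/6   0.85110720  0.80011455 (0.72)  0.61000827 (0.56)  0.47172366 (0.45)  0.36521775

Table 8: $L_1(a,b)$ for $a = 1/3, 1/4, 1/5, 1/6$ and $b = 1/2, 1/3, \ldots, 1/6$ (simulated values in parentheses)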

where special cases

\[ L_1(a,b) = \begin{cases} \rho_2(1/b) & \text{if } a = b \le 1/3, \\ \rho_3(1/a) & \text{if } a \le 1/3 \text{ and } b = 1/2 \end{cases} \]

are surely true.
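A simulation of the kind used for the two-digit values can be sketched as follows. It samples cycle types of uniform random $n$-permutations (the cycle containing a given element has length uniform on $\{1, \ldots, n\}$, and the remainder is again uniform) and counts how often $\Lambda_3/n \le a$ and $\Lambda_2/n \le b$ both hold. The permutation size, number of trials, seed and function names are assumptions for illustration; the paper does not describe its simulation in this section.

```python
import random

def top_cycle_fractions(n, rng, count=3):
    """Return the `count` largest cycle lengths of a random n-permutation, divided by n."""
    lengths = []
    remaining = n
    while remaining > 0:
        length = rng.randint(1, remaining)
        lengths.append(length)
        remaining -= length
    lengths.sort(reverse=True)
    lengths += [0] * count               # pad when there are fewer than `count` cycles
    return [length / n for length in lengths[:count]]

def estimate_L1(a, b, n=3000, trials=40000, seed=7):
    """Monte Carlo estimate of P{Lambda_3/n <= a and Lambda_2/n <= b}."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        _, second, third = top_cycle_fractions(n, rng)
        if third <= a and second <= b:
            hits += 1
    return hits / trials

print(estimate_L1(1 / 4, 1 / 3))   # provisional theory gives 0.89730922
print(estimate_L1(1 / 3, 1 / 3))   # special case a = b: rho_2(3) = 0.85277932
```

With these settings the first estimate should fall well below the provisional value and close to the simulated 0.84, while the second should agree with the exact special case up to sampling noise.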

On the other hand, a verification of is as follows:

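\[ \frac{\partial L_1}{\partial b} = \frac{\partial}{\partial b} \int_a^b\!\int_y^1 \rho\!\left(\frac{1-x-y}{a}\right) \frac{dx}{x}\,\frac{dy}{y} = \int_b^1 \rho\!\left(\frac{1-x-b}{a}\right) \frac{1}{b}\,\frac{dx}{x} \]

hence by Leibniz's Rule,

\[ \begin{aligned} \frac{\partial^2 L_1}{\partial a\,\partial b} &= -\int_b^1 \rho'\!\left(\frac{1-x-b}{a}\right) \frac{1-x-b}{a^2}\,\frac{1}{b}\,\frac{dx}{x} = \int_b^1 \frac{\rho\!\left(\frac{1-a-b-x}{a}\right)}{\frac{1-b-x}{a}} \cdot \frac{1-b-x}{a^2\,b\,x}\,dx \\ &= \int_b^1 \frac{\rho\!\left(\frac{1-a-b-x}{a}\right)}{a\,b\,x}\,dx = \int_b^1 f_{123}(x,b,a)\,dx = f_{23}(b,a), \end{aligned} \]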

as was to be shown.  If a correction term of the form could be incorporated into , rendering it suitably smaller, then the above argument would still go through.  Determining such expressions , is an open problem.

For , , and , Ekkelkamp [19, 20] further demonstrated that

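\[ \lim_{n\to\infty} P\left\{ \frac{\Lambda_4}{n} \le \alpha,\ \alpha < \frac{\Lambda_3}{n} \le \beta,\ \beta < \frac{\Lambda_2}{n} \le \gamma \ \&\ \frac{\Lambda_1}{n} \le \delta \right\} = \int_\alpha^\beta\!\int_z^\gamma\!\int_y^\delta \rho\!\left(\frac{1-x-y-z}{\alpha}\right) \frac{dx}{x}\,\frac{dy}{y}\,\frac{dz}{z} \]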

under the additional condition .  Such a formula might eventually assist in calculating

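\[ \lim_{n\to\infty} P\left\{ \frac{\Lambda_4}{n} \le \alpha \ \&\ \frac{\Lambda_2}{n} \le \gamma \right\}, \qquad \lim_{n\to\infty} P\left\{ \frac{\Lambda_4}{n} \le \alpha \ \&\ \frac{\Lambda_3}{n} \le \beta \right\}. \]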

We leave this task for others.  Accuracy can be improved by including a subordinate term (we have studied only main terms of asymptotic expansions); this fact was mentioned in [21], citing [19], but for proofs one must refer to [20].  It is striking that so much of this material remains unpublished (seemingly abandoned but thankfully preserved in doctoral dissertations; see [22, 23] for more).

An odd confession is necessary at this point and it is almost surely overdue.  The multivariate probabilities discussed here were originally conceived not in the context of $n$-permutations as $n \to \infty$, but instead in the difficult realm of large integers (prime factorizations with cryptographic applications).  Knuth & Trabb Pardo [3, 24, 25] were the first to tenuously observe this analogy.  Lloyd [26, 27] reflected, “They do not explain the coincidence… No isomorphism of the problems is established”.  Early in his article, Tao [28] wrote how a certain calculation doesn’t offer understanding for “why there is such a link”, but later gave what he called a “satisfying conceptual (as opposed to computational) explanation”.  After decades of waiting, the fog has apparently lifted.

A counterpart of Billingsley’s :

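\[ g_{1234}(x,y,z,w) = \frac{1}{16\,x\,y\,z\,w}\,\sigma\!\left(\frac{1-x-y-z-w}{w}\right) \frac{1}{\sqrt{w}}, \]
\[ 1 > x > y > z > w > 0, \qquad x + y + z + w < 1; \]
\[ \xi\sigma'(\xi) + \tfrac12\sigma(\xi) + \tfrac12\sigma(\xi-1) = 0 \ \text{ for } \xi > 1, \qquad \sigma(\xi) = 1/\sqrt{\xi} \ \text{ for } 0 < \xi \le 1 \]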

is applicable to the study of connected components in random mappings [6, 8].  Let $\Lambda_1$ and $\Lambda_2$ denote the largest and second-largest such components.  We use similar notation, but different techniques (because not as much is known about $\sigma$ as about $\rho$).  For example,

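\[ \lim_{n\to\infty} P\left\{ \frac{\Lambda_1}{n} > \frac12 \right\} = \int_{1/2}^1 g_1(x)\,dx = \int_{1/2}^1 \frac{1}{2x}\,\sigma\!\left(\frac{1-x}{x}\right) \frac{dx}{\sqrt{x}} = \frac12 \int_{1/2}^1 \frac{dx}{x\sqrt{1-x}} = \ln(1+\sqrt{2}). \]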
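The closed form can be checked by hand. A short worked derivation, assuming only the substitution $u = \sqrt{1-x}$ (so that $x = 1 - u^2$ and $dx = -2u\,du$), runs as follows:

```latex
\[
\frac12 \int_{1/2}^{1} \frac{dx}{x\sqrt{1-x}}
  = \frac12 \int_{0}^{1/\sqrt{2}} \frac{2\,du}{1-u^{2}}
  = \frac12 \left[ \ln\frac{1+u}{1-u} \right]_{0}^{1/\sqrt{2}}
  = \frac12 \ln\frac{\sqrt{2}+1}{\sqrt{2}-1}
  = \frac12 \ln\bigl(1+\sqrt{2}\bigr)^{2}
  = \ln\bigl(1+\sqrt{2}\bigr) \approx 0.881374 .
\]
```

The second step also uses $\sigma(\xi) = 1/\sqrt{\xi}$ for $0 < \xi \le 1$, which applies because $(1-x)/x \le 1$ whenever $x \ge 1/2$.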

Call this probability $Q$.  The analog here of what we called in the introduction is

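\[ \begin{aligned} & 1 - \lim_{n\to\infty} P\left\{ \frac{\Lambda_1}{n} > \frac12 \right\} - \lim_{n\to\infty} P\left\{ \frac{\Lambda_1}{n} \le \frac12 \ \&\ \frac13 < \frac{\Lambda_2}{n} \le \frac12 \right\} \\ &\quad = 1 - Q - \int_{1/3}^{1/2}\!\int_{1/3}^{x} g_{12}(x,y)\,dy\,dx = 1 - Q - \int_{1/3}^{1/2}\!\int_{1/3}^{x} \frac{1}{4xy}\,\sigma\!\left(\frac{1-x-y}{y}\right) \frac{dy\,dx}{\sqrt{y}} \\ &\quad = 1 - Q - \frac14 \int_{1/3}^{1/2}\!\int_{1/3}^{x} \frac{dy\,dx}{x\,y\,\sqrt{1-x-y}} = 0.065484671719... \end{aligned} \]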

and the analog of we called is

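\[ \begin{aligned} & \lim_{n\to\infty} P\left\{ \frac{\Lambda_1}{n} > \frac12 \right\} - \lim_{n\to\infty} P\left\{ \frac{\Lambda_1}{n} > \frac12 \ \&\ \frac13 < \frac{\Lambda_2}{n} \le \frac12 \right\} \\ &\quad = Q - \int_{1/2}^{2/3}\!\int_{1/3}^{1-x} g_{12}(x,y)\,dy\,dx = Q - \int_{1/2}^{2/3}\!\int_{1/3}^{1-x} \frac{1}{4xy}\,\sigma\!\left(\frac{1-x-y}{y}\right) \frac{dy\,dx}{\sqrt{y}} \\ &\quad = Q - \frac14 \int_{1/2}^{2/3}\!\int_{1/3}^{1-x} \frac{dy\,dx}{x\,y\,\sqrt{1-x-y}} = 0.780087954710.... \end{aligned} \]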

Thus the analog of (associated with the orangebrown triangle in Figure 1) is

 limn→∞P{Λ2n>13}=1−A−(1−A−B)=0.154427373569...

and should lead in due course to a formula for , generalizing .
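The two double integrals above have an integrable singularity along $x + y = 1$, which can be removed analytically before any numerical work. The Python sketch below integrates over $y$ in closed form, using the antiderivative $-c^{-1/2} \ln\bigl((\sqrt{c}+t)/(\sqrt{c}-t)\bigr)$ with $c = 1-x$ and $t = \sqrt{1-x-y}$, and then applies the midpoint rule in $x$. The function names and step count are illustrative assumptions, and this is one possible approach rather than the computation used in the paper.

```python
import math

Q = math.log(1.0 + math.sqrt(2.0))    # limiting P{Lambda_1/n > 1/2} from the text

def inner(x, y_low, y_high):
    """Closed form of int_{y_low}^{y_high} dy / (y * sqrt(1 - x - y))."""
    root = math.sqrt(1.0 - x)
    def antiderivative(y):
        t = math.sqrt(max(1.0 - x - y, 0.0))   # guard against rounding below zero
        return -math.log((root + t) / (root - t)) / root
    return antiderivative(y_high) - antiderivative(y_low)

def quarter_integral(x_low, x_high, y_low, y_high, steps=20000):
    """Midpoint rule in x for (1/4) * int int dy dx / (x * y * sqrt(1 - x - y))."""
    width = (x_high - x_low) / steps
    total = 0.0
    for k in range(steps):
        x = x_low + (k + 0.5) * width
        total += inner(x, y_low(x), y_high(x)) / x
    return 0.25 * total * width

first = 1.0 - Q - quarter_integral(1 / 3, 1 / 2, lambda x: 1 / 3, lambda x: x)
second = Q - quarter_integral(1 / 2, 2 / 3, lambda x: 1 / 3, lambda x: 1.0 - x)

print(first)
print(second)
print(1.0 - first - second)
```

The first two printed numbers can be compared with 0.065484671719... and 0.780087954710...; one minus their sum agrees numerically with the value 0.154427373569... quoted for the limit of $P\{\Lambda_2/n > 1/3\}$.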

Given a random $n$-permutation, let $S_r$ denote the length of the $r$th shortest cycle (with a default value assigned if the permutation has no $r$th cycle) and $C_k$ denote the number of cycles of length $k$.  Since, as $n \to \infty$, the distribution of $C_k$ approaches Poisson($1/k$) and $C_1$, $C_2$, $C_3$, … become asymptotically independent [29], we can calculate corresponding probabilities for $S_r$.  For example,

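\[ P\{S_1 = 1\} = P\{C_1 \ge 1\} = 1 - P\{C_1 = 0\} = 1 - e^{-1}, \]
\[ \begin{aligned} P\{S_1 = 2\} &= P\{C_1 = 0 \ \&\ C_2 \ge 1\} = P\{C_1 = 0\} - P\{C_1 = 0 \ \&\ C_2 = 0\} \\ &= P\{C_1 = 0\}\,(1 - P\{C_2 = 0\}) = e^{-1}(1 - e^{-1/2}) = e^{-1} - e^{-3/2} \end{aligned} \]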

and, more generally,

\[ P\{S_1 = i\} = e^{-H_{i-1}} - e^{-H_i}, \qquad H_m = \sum_{k=1}^{m} \frac{1}{k}. \]

It is understood that these are limiting quantities as $n \to \infty$.  As another example,

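\[ P\{S_2 = 1\} = P\{C_1 \ge 2\} = 1 - P\{C_1 \le 1\} = 1 - 2e^{-1}, \]
\[ \begin{aligned} P\{S_2 = 2\} &= P\{C_1 = 1 \ \&\ C_2 \ge 1\} + P\{C_1 = 0 \ \&\ C_2 \ge 2\} \\ &= P\{C_1 = 1\} - P\{C_1 = 1 \ \&\ C_2 = 0\} + P\{C_1 = 0\} - P\{C_1 = 0 \ \&\ C_2 \le 1\} \\ &= e^{-1}(1 - e^{-1/2}) + e^{-1}\left(1 - \tfrac32 e^{-1/2}\right) = 2e^{-1} - \tfrac52 e^{-3/2} \end{aligned} \]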

and

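\[ P\{S_2 = j\} = (H_{j-1} + 1)\,e^{-H_{j-1}} - (H_j + 1)\,e^{-H_j}. \]

\[ P\{S_1 = i \ \&\ S_2 = j\} = \begin{cases} e^{-H_{i-1}} - \left(1 + \frac{1}{i}\right) e^{-H_i} & \text{if } i = j, \\ \frac{1}{i}\left(e^{-H_{j-1}} - e^{-H_j}\right) & \text{if } i < j \end{cases} \]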

enabling a conjecture: .  A proof still remains out of reach.
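The shortest-cycle formulas above lend themselves to a direct consistency check. The Python sketch below rebuilds the joint probability from independent Poisson($1/k$) cycle counts and confirms that summing the joint formula over $i \le j$ reproduces the marginal of $S_2$. It assumes the second case of the joint formula applies for $i < j$; the function names are hypothetical, chosen for illustration.

```python
import math

def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m, with H_0 = 0."""
    return sum(1.0 / k for k in range(1, m + 1))

def p_s1(i):
    """Limiting P{S1 = i}."""
    return math.exp(-harmonic(i - 1)) - math.exp(-harmonic(i))

def p_s2(j):
    """Limiting P{S2 = j}."""
    return ((harmonic(j - 1) + 1.0) * math.exp(-harmonic(j - 1))
            - (harmonic(j) + 1.0) * math.exp(-harmonic(j)))

def p_joint(i, j):
    """Limiting P{S1 = i & S2 = j} from the displayed case formula (i <= j)."""
    if i == j:
        return math.exp(-harmonic(i - 1)) - (1.0 + 1.0 / i) * math.exp(-harmonic(i))
    if i < j:
        return (math.exp(-harmonic(j - 1)) - math.exp(-harmonic(j))) / i
    return 0.0

def p_joint_poisson(i, j):
    """Same probability built directly from independent C_k ~ Poisson(1/k), k <= j."""
    if i > j:
        return 0.0
    prob = 1.0
    for k in range(1, j + 1):
        p0 = math.exp(-1.0 / k)       # P{C_k = 0}
        p1 = p0 / k                   # P{C_k = 1}
        if k == i == j:
            prob *= 1.0 - p0 - p1     # at least two cycles of length i
        elif k == i:
            prob *= p1                # exactly one cycle of length i
        elif k == j:
            prob *= 1.0 - p0          # at least one cycle of length j
        else:
            prob *= p0                # no cycles of any other length up to j
    return prob

for j in range(1, 6):
    marginal = sum(p_joint(i, j) for i in range(1, j + 1))
    print(j, p_s2(j), marginal)
```

Each printed row should show the same value twice, since the sum over $i$ telescopes to $(H_{j-1}+1)e^{-H_{j-1}} - (H_j+1)e^{-H_j}$.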

6 Acknowledgements

I am grateful to Michael Rogers, Josef Meixner, Nicholas Pippenger, Eran Tromer, John Kingman, Andrew Barbour, Ross Maller and Joseph Blitzstein for helpful discussions.  The creators of Mathematica, as well as administrators of the MIT Engaging Cluster, earn my gratitude every day.  Interest in this subject has, for me, spanned many years [30, 31].  A sequel to this paper will be released soon [32].

References

• [1] S. R. Finch, Permute, Graph, Map, Derange, arXiv:2111.05720.
• [2] S. R. Finch, Rounds, Color, Parity, Squares, arXiv:2111.14487.
• [3] D. E. Knuth and L. Trabb Pardo, Analysis of a simple factorization algorithm, Theoret. Comput. Sci. 3 (1976) 321–348; also in Selected Papers on Analysis of Algorithms, CSLI, 2000, pp. 303-339; MR0498355.
• [4] S. R. Finch, Second best, Third worst, Fourth in line, arXiv:2202.07621.
• [5] P. Billingsley, On the distribution of large prime divisors, Period. Math. Hungar. 2 (1972) 283–289; MR0335462.
• [6] G. A. Watterson, The stationary distribution of the infinitely-many neutral alleles diffusion model, J. Appl. Probab. 13 (1976) 639–651; 14 (1977) 897; MR0504014 and MR0504015.
• [7] A. M. Vershik, Asymptotic distribution of factorizations of natural numbers into prime divisors (in Russian), Dokl. Akad. Nauk SSSR v. 289 (1986) n. 2, 269–272; Engl. transl. in Soviet Math. Dokl. v. 34 (1987) 57–61; MR0856456.
• [8] R. Arratia, A. D. Barbour and S. Tavaré, Random combinatorial structures and prime factorizations, Notices Amer. Math. Soc. 44 (1997) 903–910; MR1467654.
• [9] J. F. C. Kingman, Poisson processes revisited, Probab. Math. Statist. 26 (2006) 77–95; MR2301889.
• [10] L. A. Shepp and S. P. Lloyd, Ordered cycle lengths in a random permutation, Trans. Amer. Math. Soc. 121 (1966) 340–357; MR0195117.
• [11] R. Arratia, A. D. Barbour and S. Tavaré, Logarithmic Combinatorial Structures: a Probabilistic Approach, Europ. Math. Society, 2003, pp. 21-24, 52, 87–89, 118; MR2032426.
• [12] R. G. Pinsky, A view from the bridge spanning combinatorics and probability, arXiv:2105.13834.
• [13] R. C. Griffiths, On the distribution of allele frequencies in a diffusion model, Theoret. Population Biol. 15 (1979) 140–158; MR0528914.
• [14] T. Shi, Cycle lengths of θ-biased random permutations, B.S. thesis, Harvey Mudd College, 2014, http://scholarship.claremont.edu/hmc_theses/65/.
• [15] E. Bach and R. Peralta, Asymptotic semismoothness probabilities, Math. Comp. 65 (1996) 1701–1715; MR1370848.
• [16] R. Lambert, Computational Aspects of Discrete Logarithms, Ph.D. thesis, Univ. of Waterloo, 1996.
• [17] S. H. Cavallar, On the Number Field Sieve Integer Factorisation Algorithm, Ph.D. thesis, Univ. Leiden, 2002; ch. 2 also in The Three-Large-Primes Variant of the Number Field Sieve, CWI report MAS-R0219, 2002, http://ir.cwi.nl/pub/4222.
• [18] C. Zhang, An Extension of the Dickman Function and its Application, Ph.D. thesis, Purdue Univ., 2002; Distribution of k-semismooth integers, PanAmer. Math. J. 18 (2008) 45–60; MR2467928.
• [19] W. H. Ekkelkamp, The role of semismooth numbers in factoring large numbers, Proc. Conf. on Algorithmic Number Theory, ed. A.-M. Ernvall-Hytönen, M. Jutila, J. Karhumäki and A. Lepistö, Turku Centre for Computer Science, 2007, pp. 40–44; http://oldtucs.abo.fi/publications/.
• [20] W. H. Ekkelkamp, On the Amount of Sieving in Factorization Methods, Ph.D. thesis, Univ. Leiden, 2010; http://www.universiteitleiden.nl/en/research/research-output/.
• [21] E. Bach and J. Sorenson, Approximately counting semismooth integers, Proc. 38th Internat. Symp. on Symbolic and Algebraic Computation (ISSAC), ACM, 2013, pp. 23–30; arXiv:1301.5293; MR3206336.
• [22] E. H. Cliffe, Reflections on the Number Field Sieve, Ph.D. thesis, Univ. of Bath, 2007; http://researchportal.bath.ac.uk/en/studentTheses/.
• [23] E. Tromer, Hardware-Based Cryptanalysis, Ph.D. thesis, Weizmann Institute of Science, 2007; http://www.cs.tau.ac.il/~tromer/phd-dissertation/.
• [24] A. Granville, The anatomy of integers and permutations, unpublished note, 2008, http://dms.umontreal.ca/~andrew/PDF/Anatomy.pdf.
• [25] A. Granville, J. Granville and R. J. Lewis, Prime Suspects. The Anatomy of Integers and Permutations, Princeton Univ. Press, 2019, pp. 200–201; MR3966460.
• [26] S. P. Lloyd, Ordered prime divisors of a random integer, Annals of Probab. 12 (1984) 1205–1212; MR0757777.
• [27] J. F. C. Kingman, The Poisson-Dirichlet distribution and the frequency of large prime divisors, unpublished note, 2004, http://www.newton.ac.uk/documents/preprints/.
• [28] T. Tao, Cycles of a random permutation, and irreducible factors of a random polynomial, unpublished note, 2015, http://terrytao.wordpress.com/2015/07/15/.
• [29] R. Arratia and S. Tavaré, The cycle structure of random permutations, Annals of Probab. 20 (1992) 1567–1591; MR1175278.
• [30] S. R. Finch, Golomb-Dickman constant, Mathematical Constants, Cambridge Univ. Press, 2003, pp. 284–292; MR2003519.
• [31] S. R. Finch, Extreme prime factors, Mathematical Constants II, Cambridge Univ. Press, 2019, pp. 171–172; MR3887550.
• [32] S. R. Finch, Components and cycles of random mappings, forthcoming.

Steven Finch
MIT Sloan School of Management
Cambridge, MA, USA
steven_finch@harvard.edu