Efficient uniform generation of random derangements with the expected distribution of cycle lengths

by   J. R. G. Mendonça, et al.
Paris-Sud University

We show how to generate random derangements with the expected distribution of cycle lengths by two different techniques: random restricted transpositions and sequential importance sampling. The algorithms are simple to understand and implement and possess a performance comparable to or better than those of currently known methods. Our data suggest that the mixing time (in the total variance distance) of the algorithm based on random restricted transpositions is O(n^an^2) with a ≃1/2 and n the size of the derangement. For the sequential importance sampling algorithm we prove that it generates random derangements in O(n) time with a small probability O(1/n) of failing.


On the uniform generation of random derangements

We show how to generate random derangements with the expected distributi...

Symmetrized importance samplers for stochastic differential equations

We study a class of importance sampling methods for stochastic different...

Stein Variational Adaptive Importance Sampling

We propose a novel adaptive importance sampling algorithm which incorpor...

An Importance Sampling Algorithm Based on Evidence Pre-propagation

Precision achieved by stochastic sampling algorithms for Bayesian networ...

Random Sampling using k-vector

This work introduces two new techniques for random number generation wit...

Randomized sequential importance sampling for estimating the number of perfect matchings in bipartite graphs

We introduce novel randomized sequential importance sampling algorithms ...

Random derangements and the Ewens Sampling Formula

We study derangements of {1,2,...,n} under the Ewens distribution with p...

1 Introduction

Derangements are permutations on labels such that for all , . Derangements are useful in a number of applications like in the testing of software branch instructions and random paths and data randomization and experimental design bacher ; pd-rlg-sph ; sedgewick . A well known algorithm to generate random derangements is Sattolo’s algorithm, that outputs a random cyclic derangement in time gries ; prodinger ; sattolo ; wilson . An algorithm to generate random derangements in general (not only cyclic derangements) has been given in analco ; iran . Algorithms to generate all -derangements in lexicographic or Gray order have also been developed akl ; baril ; korsh .

In this paper we propose two procedures to generate random derangements with the expected distribution of cycle lengths: one based on the randomization of derangements by random restricted transpositions (a random walk in the set of derangements) and the other based on a simple sequential importance sampling scheme. The generation of restricted permutations by means of sequential importance sampling methods is closely related with the problem of estimating the permanent of a matrix, an important problem in, e. g., graph theory, statistical mechanics, and experimental design

beichl ; brualdi ; pd-rlg-sph . Simulations show that the randomization algorithm samples a derangement uniformly in time, where is the size of the derangement and , while the sequential importance sampling algorithm does it in time but with a small probability of failing. The algorithms are straighforward to understand and implement and can be modified to perform related computations of interest in many areas.

2 Mathematical preliminaries

Let us briefly recapitulate some notation and terminology on permutations; for detailed accounts see arratia ; charalambides .

We denote a permutation of a set of labels (an -permutation), formally a bijection of onto itself, by , where . If and are two -permutations, their product is given by the composition . A cycle of length in a -permutation is a sequence of indices such that , …, , and , completing the cycle. Fixed points are -cycles, transpositions are -cycles. An -permutation with cycles of length , , is said to be of type , with . For example, the -permutation has cycles and is of type , where we have omitted the trailing .

The number of -permutations with cycles is given by the unsigned Stirling number of the first kind . Useful formulae involving these numbers are , , and the recursion relation . We have , counting just the identity permutation , , counting -permutations with fixed points, that can be taken in different ways, plus a transposition of the remaining two labels, and , the number of cyclic -derangements. It can also be shown that , where is the th harmonic number. Obviously, , the total number of -permutations.

Let us denote the set of all -derangements by . It is well known that


the so-called rencontres (or subfactorial, ) numbers. Let us also denote the set of -cycle -derangements, irrespective of their type, by . Note that for . If we want to generate random -derangements uniformly over , we must be able to generate -cycle random -derangements with probabilities


where . The following proposition establishes the (little known, hard-to-find) cardinality of the sets .

Proposition 2.1.

The cardinality of the set is given by


The number of -permutations with cycles is . Of these, have at least one fixed point, have at least two fixed points, and so on. Perusal of the inclusion-exclusion principle then furnishes the result. ∎

Proposition 2.2.

The numbers obey the recursion relation


with and , .

Proposition 2.2 follows from (3) by index manipulations and the properties of the binomial coefficients and Stirling numbers of the first kind. The numbers are sometimes called associated Stirling number of the first kind. Equation (4) generalizes the recursion relation for the rencontres numbers. Equation (3) recovers and for , while we find that for . Another noteworthy identity, valid for even, is , the number of derangements with the property that , a. k. a. fixed-point-free involutions. We see that already for small we have and .

Remark 2.3.

One can consider the distribution of -derangements over possible cycle types (instead of cycle lengths) for a “finer” view of the distribution. The number of -permutations of type is given by Cauchy’s formula


The analogue of (2) is given by , where is the conjugacy class formed by all -permutations of type . For example, for cyclic derangements and all other , and we obtain , as before.

3 Generating random derangements by random transpositions

Our first approach to generate random

-derangements uniformly distributed over

consists in taking an initial -derangement and to scramble it by random restricted transpositions enough to obtain a sample from the uniform distribution over . By restricted transpositions we mean swaps avoiding pairs for which or . Algorithm T describes the generation of random -derangements according to this idea, where is a constant establishing the amount of random restricted transpositions to be attempted and is a computer generated pseudorandom uniform deviate in .

Remark 3.1.

Algorithm T is applicable only for , since it is not possible to connect the even permutations and by a single transposition.

A good choice for the initial derangement in Algorithm T is any cyclic derangement (cycle length ), for example, . A particularly bad choice would be an involution ( even, cycle length ), for example, , because then the algorithm would never be able to generate derangements with . (Incidentally, this suggests the use of Algorithm T to generate random fixed-point-free involutions.) To avoid this problem we hardcoded the requirement that Algorithm T starts with a cyclic derangement. If several parallel streams of random derangements are needed, one can set different initial random cyclic derangements from a one-line implementation of Sattolo’s algorithm.

Remark 3.2.

The minimum number of transpositions necessary to take a cyclic -derangement into a -cycle -derangement is , , since transpositions of labels that belong to the same cycle split it into two cycles,


and, conversely, transpositions involving labels of different cycles join them into a single cycle. If Algorithm T is started with a cyclic derangement then one must set .

0:  Initial cyclic -derangement
1:   number of restricted transpositions to attempt
2:  for  to  do
3:      ,
4:      if  then
5:           swap
6:      end if
7:  end for
7:  For sufficiently large , is a random derangement uniformly distributed in
Algorithm T Random derangements by random restricted transpositions

We run Algorithm T for and different values of and collect data. Simulations were performed on Intel Xeon E5-1650 v3 processors running -O3 compiler-optimized C code (GCC v. ) over Linux kernel at  GHz, while the numbers (3) were calculated on the software package Mathematica 11.3 wolfram . We draw our pseudo-random numbers from Vigna’s superb xoshiro256+ generator xoshiro . Our results appear in Table 1. We see from that table that with random restricted transpositions there is a slight excess of probability mass in the lower -cycle sets with , and . Trying to scramble the initial -derrangement by restricted transpositions performs better. The difference between attempting and random restricted transpositions is much less pronounced. Figures for derangements of higher cycle number fluctuate more due to the finite size of the sample. The data suggest that Algorithm T can generate a random -derangement uniformly distributed on with random restricted transpositions, employing pseudorandom numbers in the process. This is further discussed in Section 5.

Remark 3.3.

It is a classic result that transpositions are needed before a shuffle by unrestricted transpositions becomes “sufficiently random” aldous ; persi . A similar analysis for random restricted transpositions over derangements is complicated by the fact that derangements do not form a group. Recently, the analysis of the spectral gap of the Markov transition kernel of the process provided the upper bound , with and a decreasing function of aaron . This bound results from involved estimations and approximations and may not be very accurate. We are not aware of other rigorous results on this particular problem. Related results for the mixing time of the random transposition walk on permutations with “one-sided restrictions” ( for given ) appear in hanlon .

Cycles Algorithm T () Algorithm S Exact
Eqs. (1)–(3)
runtime (sec)
Table 1: Proportion of -derangements in measured in samples generated by Algorithms T and S for . The notation reads . Data for Algorithm S are based on a run with a ratio of completed/attempted derangements of . Runtimes are given for comparison.

4 Sequential importance sampling of derangements

4.1 The SIS algorithm

Sequential importance sampling (SIS) is an importance sampling scheme with the sampling weights built up sequentially. The idea is particularly suited to sample composite objects from a complicated sample space with many restrictions and for which the high-dimensional volume , from which the uniform distribution follows, may not be easily calculable. However, since we can always write


we can think of “telescoping” the sampling of by first sampling , then use the updated information brought by the knowledge of to sample and so on. In Monte Carlo simulations, the right-hand side of (7) actually becomes , with the distributions estimated or inferred incrementally based on approximate weighting functions for the partial objects . Expositions of the SIS framework of interest to what follows appear in cdhl ; pd-rlg-sph .

Algorithm S

describes a SIS algorithm to generate random derangements inspired by the analogous problem of sampling contingency tables with restrictions

cdhl ; pd-rlg-sph as well as by the problem of estimating the permanent of a matrix kuznetsov ; liang ; rasmussen . Our presentation of Algorithm S is not the most efficient for implementation; the auxiliary sets , for instance, are not actually needed and were included only to facilitate the analysis of the algorithm, and the tests in line 4 can be reduced to a single test in the last pass, since all except perhaps .

The distribution of cycle lengths in derangements generated by Algorithm S is presented in Table 1; there is excellent agreement between the data and the exact values.

2:  for  to  do
4:      if  then
5:           choose uniformly at random
8:      else
9:           fail
10:      end if
11:  end for
11:  If completed, is a random derangement uniformly distributed in
Algorithm S Random derangements by sequential importance sampling

4.2 Failure probability of the SIS algorithm

In the th pass of the loop in Algorithm S, can pick (lines 56) one of either or labels, depending on whether label has already been picked. This guarantees the construction of the -derangement till the last but one element . The -derangement is completed only if the last remaining label is different from , such that does not pick . The probability that Algorithm S fails is thus given by


Now, according to Algorithm S, line 5, we have


where we write for in the argument of the expectation to improve readability, and the failure probability becomes


where stand for . Expression (10) is but a probabilistic version of the inclusion-exclusion principle known as Poincaré’s formula in elementary probability.

The explicit computation of (10) is a cumbersome business, and we will not pursue it here. Otherwise, the following theorem establishes an easy upper bound on the failure probability of Algorithm S.

Theorem 4.1.

Algorithm S fails with probability .


Recall that in the th pass of the loop in Algorithm S we have or , such that the expectation obeys


and it immediately follows that


Remark 4.2.

In A we provide an improved upper bound to based on an “independent sets” approximation that works remarkably well.

The measured failure rate for the SIS data in Table 1 is , not far from . A sample of runs of Algorithm S of derangements each with gives an average failure rate of

with a sample minimum of 0.014130 and maximum of 0.014991, where the digits within parentheses indicate the uncertainty at one standard deviation in the corresponding last digits of the datum. Figure 

1 depicts Monte Carlo data for the failure probability (10) against the upper bound . Each data point was obtained as an average over runs of Algorithm S of derangements each except for , for which the runs are of derangements each.

Figure 1: Measured failure rate for Algorithm S against the upper bound . Error bars in the data are much smaller than the symbols shown.

4.3 Uniformity of the SIS algorithm

In the SIS approach, the ensuing sampling probabilities may deviate considerably from the uniform distribution. An important aspect of Algorithm S is that it generates each with probability , as the data in Table 1 suggest. We prove that this is indeed the case based on related results in cdhl ; kuznetsov ; rasmussen .

Definition 4.3.

We call the matrix with on the diagonal and elsewhere the derangement matrix of order .

The following well known property of clarifies its denomination and will be useful later.

Lemma 4.4.

The permanent of is given by the th rencontre number .


The result is classic and we only outline its proof; for details consult brualdi . The permanent of is given by


where the first sum is over all permutations of and the other sum is the Laplace (row) development of the permanent in terms of the submatrices obtained from by deleting its th row and th column. Perusal of the Laplace development of twice eventually leads to the recursion relation with and . But this is exactly the recursion relation for the rencontre numbers. ∎

The next lemma elucidates the close relationship between the sets of Algorithm S and .

Lemma 4.5.

The product of the , , generated by Algorithm S

is a one-sample unbiased estimator for



We have to prove that . From Algorithm S, lines 3 and 7, it follows, with for some realization of , that


where is the indicator function that equals if is true and otherwise. Now if we write for and multiply the expectations (14) we find that


Expression (15) is nothing but the expression for the permanent of a matrix with elements if and otherwise, that is, of the derangement matrix , and by Lemma 4.4 it follows that . ∎

Theorem 4.6.

Algorithm S samples uniformly.


Algorithm S generates permutations with probability . Since


we have


By Lemmas 4.4 and 4.5 the r. h. s. of (17) equals , and we are golden. ∎

5 Mixing time of the restricted transpositions shuffle

To shed some light on the question of how many random restricted transpositions are necessary to generate random derangements uniformly over , we investigate the convergence of Algorithm T numerically. This can be done by monitoring the evolution of the empirical probabilities along the run of the algorithm towards the exact probabilities given by (2).

Let be the measure that puts mass on the set and be the empirical measure


where is the derangement obtained after attempting restricted transpositions by Algorithm T on a given initial derangement . The total variance distance between and is given by aldous ; persi


The right-hand side of (19) can be seen as the “histogram distance” between and in the norm. Clearly, . This distance allows us to define as the time it takes for to fall within distance of ,


It is usual to define the mixing time by setting or

, this last figure being reminiscent of the spectral analysis of Markov chains. We set

. This choice is motivated by the following pragmatic reasons:

  • We want the derangements output by Algorithm T to be as uniformly distributed over as possible, so the smaller the the better the assessment of the algorithm and the choice of the constant ;

  • Most of the probability mass is concentrated on a few cycle numbers (see Table 1 and Remark 5.1 below), such that even relatively small differences between and are likely to induce noticeable biases in the output of Algorithm T;

  • With we found that , meaning that not even every possible derangement had chance to be generated if the initial derangement is cyclic (see Remark 3.2).

Remark 5.1.

It is well known that the number of -cycles of random

-permutations is Poisson distributed with mean

, such that as the CLT implies that the length of the cycles of random permutations follow a normal ditribution with mean and variance ; see, e. g., arratia and the references there in. soria proved that the same holds for permutations with no cycles of length less than a given , using complex asymptotics of exponential generating functions; analco and iran provide the analysis for the particular case of derangements. Figure 2 displays the exact distribution of -cycles for derangements with together with the normal density . For we obtain from equations (2)–(3) that and , while and . The distribution of cycle lengths in Figure 2 indeed looks close to a normal

, albeit slightly skewed. We did not go to greater

because Stirling numbers of the first kind are notoriously hard to compute even by computer algebra systems running on modern workstations. Recently, the cycle structure of certain types of restricted permutations (with ) was also shown to be asymptotically normal ozel .

Figure 2: Distribution of cycle lengths of random -derangements for together with the normal densities (shorter, in blue) and (taller, in orange) with and .

Starting with a cyclic derangement, i. e., with and all other , we run Algorithm T and collect statistics on . Figure 3 displays the average over runs for . The behavior of does not show sign of the cutoff phenomenon—a sharp transition from unmixed state () to mixed state () over a small window of time . Table 2 lists the average obtained over samples for larger derangements at . An adjustment of the data to the form


furnishes and . Our data thus suggest that with , roughly an lower than the upper bound given by aaron . It is tempting to conjecture that (and, perhaps, that ) exactly, cf. last two lines of Table 2, although our data do not support the case unequivocally.

Figure 3: Total variance distance (averaged over runs) between the empirical measure (with ) and the stationary measure of the process defined by Algorithm T for . The dashed line indicates the level .
Table 2: Mixing time evaluated at obtained from an average trajectory over samples; see Figure 3. The second line displays the best guess to involving only integer and semi-integer exponents. The last line displays the adjusted value of supposing a dependence like in (21) with .

6 Summary and conclusions

While simple rejection-sampling generates random derangements with an acceptance rate of , thus being (plus the cost of verifying if the permutation generated is a derangement, which does not impact the complexity of the algorithm but impacts its runtime), Sattolo’s algorithm only generates cyclic derangements, and Martínez-Panholzer-Prodinger algorithm, with guaranteed uniformity, is , we described two procedures, Algorithms T and S, that are competitive for the efficient uniform generation of random derangements. We note in passing that Algorithm T can easily be used also to generate random fixed-point-free involutions.

We found, empirically, that random restricted transpositions with suffice to spread an initial -derangement over uniformly. The fact that for all as long as and explains the good statistics (see Table 1) displayed by Algorithm T with . There are currently very few analytical results on the mixing time of the random restricted transposition walk implemented by Algorithm T; the upper bound obtained in aaron is roughly above our empirical findings.

Algorithm T employs pseudorandom numbers and Algorithm S employs pseudorandom numbers to generate an -derangement uniformly distributed over . In this way, even if we set with some , both algorithms perform better than currently known methods, with comparable runtime performances between them.


The author thanks Aaron Smith (U. Ottawa) for useful correspondence and suggestions on a previous version of the manuscript, the LPTMS for kind hospitality during a sabbatical leave in France, and FAPESP (Brazil) for partial support through grants nr. 2015/21580-0 and 2017/22166-9.

Appendix A Improved upper bound for the failure probability of Algorithm S

In the th pass of the loop in Algorithm S we have


The difficulty in the calculation of resides in the calculation of . We can simplify this calculation by ignoring the conditioning of the event on the event , i. e., by ignoring correlations between the values assumed by the many along a “path” in the algorithm. This approximation is clearly better in the beginning of the construction of , when is small, than later. In doing so we obtain


such that . This approximate is greater than the true , because conditioning on can only restrict, not enlarge, the set of indices available to . Numerical evidence supports this claim. We thus have that the approximate value of is greater than its true value, and the failure probability (10) becomes bounded by


The surprisingly excellent agreement between this upper bound and the Monte Carlo data can be appreciated in Figure 4; compare with Figure 1. Note how the data lie only very slightly below the upper bound (24).

Figure 4: Measured failure rate for Algorithm S against the upper bound (24). Error bars in the data are much smaller than the symbols shown.


  • (1) D. Aldous, J. A. Fill, Reversible Markov Chains and Random Walks on Graphs. Unfinished monograph (recompiled 2014) available at http://www.stat.berkeley.edu/~aldous/RWG/book.html.
  • (2) S. G. Akl, A new algorithm for generating derangements, BIT Numer. Math. 20 (1) (1989) 2–7.
  • (3) R. Arratia, A. D. Barbour, S. Tavaré, Logarithmic Combinatorial Structures: A Probabilistic Approach. EMS, Zürich (2003).
  • (4) A. Bacher, O. Bodini, H.-K. Hwang, T.-H. Tsai, Generating random permutations by coin tossing: Classical algorithms, new analysis, and modern implementation, ACM Trans. Algorithms 13 (2) (2017) 24.
  • (5) J. L. Baril, V. Vajnovszki, Gray code for derangements, Discrete Appl. Math. 140 (1–3) (2004) 207–221.
  • (6) I. Beichl, F. Sullivan, Approximating the permanent via importance sampling with application to the dimer covering problem, J. Comput. Phys. 149 (1) (1999) 128–147.
  • (7) R. A. Brualdi, R. J. Ryser, Combinatorial Matrix Theory, Cambridge University Press, Cambridge, 1991.
  • (8) C. A. Charalambides, Enumerative Combinatorics, Chapman & Hall/CRC, Boca Raton, 2002.
  • (9) Y. Chen, P. Diaconis, S. P. Holmes, J. S. Liu, Sequential Monte Carlo methods for statistical analysis of tables, J. Am. Stat. Assoc. 100 (469) (2005) 109–120.
  • (10) P. Diaconis, Group Representations in Probability and Statistics, IMS, Hayward (1988).
  • (11) P. Diaconis, R. L. Graham, S. P. Holmes, Statistical problems involving permutations with restricted positions, in: M. de Gunst, C. Klaassen, A. Van der Vaart (Eds.), State of the Art in Probability and Statistics: Festschrift for Willem R. van Zwet, IMS, Beachwood, 2001, pp. 195–222.
  • (12) P. Flajolet, M. Soria, Gaussian limiting distributions for the number of components in combinatorial structures, J. Comb. Theor. Ser. A 53 (2) (1990) 165–182.
  • (13) D. Gries, J. Xue, Generating a random cyclic permutation, BIT Numer. Math. 28 (3) (1988) 569–572.
  • (14) P. Hanlon, A random walk on the rook placements on a Ferrer’s board, Electron. J. Comb. 3 (2) (1996) 26.
  • (15) J. F. Korsh, P. S. LaFollette, Constant time generation of derangements, Inf. Process. Lett. 90 (4) (2004) 181–186.
  • (16) N. Y. Kuznetsov, Computing the permanent by importance sampling method, Cybern. Syst. Anal. 32 (6) (1996) 749–755.
  • (17) H. Liang, L. Shi, F. Bai, X. Liu, Random path method with pivoting for computing permanents of matrices, Appl. Math. Comput. 185 (1) (2007) 59–71.
  • (18) C. Martínez, A. Panholzer, H. Prodinger, Generating random derangements, in: R. Sedgewick, W. Szpankowski (Eds.), 2008 Proc. Fifth Workshop on Analytic Algorithmics and Combinatorics – ANALCO, SIAM, Philadelphia, 2008, pp. 234–240.
  • (19) E. Ozel, The number of -cycles in a family of restricted permutations, arXiv:1710.07885 (2017).
  • (20) A. Panholzer, H. Prodinger, M. Riedel, Measuring post-quickselect disorder, J. Iran. Stat. Soc. 3 (2) (2004) 219–249.
  • (21) H. Prodinger, On the analysis of an algorithm to generate a random cyclic permutation, Ars Comb. 65 (2002) 75–78.
  • (22) L. E. Rasmussen, Approximating the permanent: A simple approach, Random Struct. Algor. 5 (2) (1994) 349–361.
  • (23) S. Sattolo, An algorithm to generate a random cyclic permutation, Inf. Process. Lett. 22 (6) (1986) 315–317.
  • (24) R. Sedgewick, Permutation generation methods, Comput. Surv. 9 (2) (1977) 137–164.
  • (25) A. Smith, Comparison theory for Markov chains on different state spaces and application to random walk on derangements, J. Theor. Probab. 28 (4) (2015) 1406–1430.
  • (26) S. Vigna, xoshiro/xoroshiro generators and the PRNG shootout. Available at http://xoshiro.di.unimi.it/.
  • (27) M. C. Wilson, Random and exhaustive generation of permutations and cycles, Ann. Comb. 12 (4) (2009) 509–520.
  • (28) Wolfram Research, Inc., Mathematica, Version 11.3, Champaign, IL (2018).