1 Introduction
Derangements are permutations $\sigma$ of $n$ labels such that $\sigma(i) \neq i$ for all $i$. Derangements are useful in a number of applications, such as the testing of software branch instructions and random paths, data randomization, and experimental design bacher; pdrlgsph; sedgewick. A well known algorithm to generate random derangements is Sattolo's algorithm, which outputs a random cyclic derangement in $O(n)$ time gries; prodinger; sattolo; wilson. An algorithm to generate random derangements in general (not only cyclic derangements) has been given in analco; iran. Algorithms to generate all derangements in lexicographic or Gray order have also been developed akl; baril; korsh.
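For concreteness, Sattolo's algorithm admits a one-loop implementation; here is a minimal Python sketch (the function name and the 0-based labels are our own):

```python
import random

def sattolo(n):
    """Return a uniformly random cyclic derangement of the labels 0..n-1
    (Sattolo, 1986). For n >= 2 the output is a single n-cycle and hence
    has no fixed points."""
    sigma = list(range(n))
    for i in range(n - 1, 0, -1):
        j = random.randrange(i)          # j < i: position i never keeps its label
        sigma[i], sigma[j] = sigma[j], sigma[i]
    return sigma
```

Swapping with an index $j$ strictly below $i$ at every step is what forces the output to be a single $n$-cycle; letting $j$ range over $0, \dots, i$ instead would yield the ordinary Fisher–Yates shuffle.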
In this paper we propose two procedures to generate random derangements with the expected distribution of cycle lengths: one based on the randomization of derangements by random restricted transpositions (a random walk in the set of derangements), and the other based on a simple sequential importance sampling scheme. The generation of restricted permutations by means of sequential importance sampling methods is closely related to the problem of estimating the permanent of a matrix, an important problem in, e.g., graph theory, statistical mechanics, and experimental design
beichl; brualdi; pdrlgsph. Simulations show that the randomization algorithm samples a derangement uniformly in $O(n \ln n)$ time, where $n$ is the size of the derangement, while the sequential importance sampling algorithm does it in $O(n)$ time but with a small probability of failing. The algorithms are straightforward to understand and implement and can be modified to perform related computations of interest in many areas.
2 Mathematical preliminaries
Let us briefly recapitulate some notation and terminology on permutations; for detailed accounts see arratia ; charalambides .
We denote a permutation of a set of $n$ labels (an $n$-permutation), formally a bijection of $\{1, \dots, n\}$ onto itself, by $\sigma = \sigma(1)\,\sigma(2) \cdots \sigma(n)$, where $\sigma(i)$ is the image of label $i$. If $\sigma$ and $\tau$ are two permutations, their product $\sigma\tau$ is given by the composition $\sigma\tau(i) = \sigma(\tau(i))$. A cycle of length $\ell$ in a permutation $\sigma$ is a sequence of indices $i_1, i_2, \dots, i_\ell$ such that $\sigma(i_1) = i_2$, …, $\sigma(i_{\ell-1}) = i_\ell$, and $\sigma(i_\ell) = i_1$, completing the cycle. Fixed points are $1$-cycles, transpositions are $2$-cycles. An $n$-permutation with $c_\ell$ cycles of length $\ell$, $1 \leq \ell \leq n$, is said to be of type $(c_1, c_2, \dots, c_n)$, with $c_1 + 2c_2 + \cdots + nc_n = n$. For example, the $5$-permutation $\sigma = 2\,1\,4\,5\,3$ has cycles $(1\,2)$ and $(3\,4\,5)$ and is of type $(0,1,1)$, where we have omitted the trailing zeros.
The number of $n$-permutations with $k$ cycles is given by the unsigned Stirling number of the first kind $\left[{n \atop k}\right]$. Useful formulae involving these numbers are $\left[{n \atop 0}\right] = \delta_{n,0}$, $\left[{n \atop n}\right] = 1$, and the recursion relation $\left[{n+1 \atop k}\right] = n \left[{n \atop k}\right] + \left[{n \atop k-1}\right]$. We have $\left[{n \atop n}\right] = 1$, counting just the identity permutation $\iota = 1\,2 \cdots n$; $\left[{n \atop n-1}\right] = \binom{n}{2}$, counting permutations with $n-2$ fixed points, that can be taken in $\binom{n}{2}$ different ways, plus a transposition of the remaining two labels; and $\left[{n \atop 1}\right] = (n-1)!$, the number of cyclic permutations, which for $n \geq 2$ are cyclic derangements. It can also be shown that $\left[{n \atop 2}\right] = (n-1)!\,H_{n-1}$, where $H_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}$ is the $n$th harmonic number. Obviously, $\sum_{k=0}^{n} \left[{n \atop k}\right] = n!$, the total number of $n$-permutations.
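Since the recursion relation just quoted is used repeatedly in what follows, a short tabulation routine is handy; a minimal sketch in Python (function name ours):

```python
def stirling1_unsigned(nmax):
    """Table c[n][k] of unsigned Stirling numbers of the first kind,
    built from c(n+1, k) = n*c(n, k) + c(n, k-1) with c(0, 0) = 1."""
    c = [[0] * (nmax + 1) for _ in range(nmax + 1)]
    c[0][0] = 1
    for n in range(nmax):
        for k in range(nmax + 1):
            c[n + 1][k] = n * c[n][k] + (c[n][k - 1] if k > 0 else 0)
    return c
```

The table reproduces the identities above, e.g. row $n = 5$ sums to $5! = 120$ and $\left[{5 \atop 1}\right] = 4! = 24$.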
Let us denote the set of all derangements of $n$ labels by $\mathcal{D}_n$. It is well known that
(1) $D_n = |\mathcal{D}_n| = n! \sum_{k=0}^{n} \frac{(-1)^k}{k!},$
the so-called rencontres (or subfactorial, $D_n = \,!n$) numbers. Let us also denote the set of $k$-cycle derangements, irrespective of their type, by $\mathcal{D}_n^{(k)}$. Note that $\mathcal{D}_n^{(k)} = \emptyset$ for $k > \lfloor n/2 \rfloor$, since every cycle of a derangement has length at least $2$. If we want to generate random derangements uniformly over $\mathcal{D}_n$, we must be able to generate $k$-cycle random derangements with probabilities
(2) $p_k(n) = \frac{d_k(n)}{D_n},$
where $d_k(n) = |\mathcal{D}_n^{(k)}|$. The following proposition establishes the (little known, hard-to-find) cardinality of the sets $\mathcal{D}_n^{(k)}$.
Proposition 2.1.
The cardinality of the set $\mathcal{D}_n^{(k)}$ is given by
(3) $d_k(n) = \sum_{j=0}^{k} (-1)^j \binom{n}{j} \left[{n-j \atop k-j}\right].$
Proof.
The number of $n$-permutations with $k$ cycles is $\left[{n \atop k}\right]$. Of these, $\binom{n}{1}\left[{n-1 \atop k-1}\right]$ have at least one fixed point, $\binom{n}{2}\left[{n-2 \atop k-2}\right]$ have at least two fixed points, and so on. Perusal of the inclusion–exclusion principle then furnishes the result. ∎
Proposition 2.2.
The numbers $d_k(n)$ obey the recursion relation
(4) $d_k(n+1) = n \bigl[\, d_k(n) + d_{k-1}(n-1) \,\bigr],$
with $d_0(0) = 1$ and $d_k(n) = 0$ for $k \leq 0$ or $k > \lfloor n/2 \rfloor$, $n \geq 1$.
Proposition 2.2 follows from (3) by index manipulations and the properties of the binomial coefficients and Stirling numbers of the first kind. The numbers $d_k(n)$ are sometimes called associated Stirling numbers of the first kind. Summed over $k$, equation (4) generalizes the recursion relation $D_{n+1} = n(D_n + D_{n-1})$ for the rencontres numbers. Equation (3) recovers $d_1(n) = (n-1)!$ for $n \geq 2$, while we find that $d_k(n) = 0$ for $k > \lfloor n/2 \rfloor$. Another noteworthy identity, valid for $n$ even, is $d_{n/2}(n) = (n-1)!! = n!/(2^{n/2}(n/2)!)$, the number of derangements with the property that $\sigma(\sigma(i)) = i$ for all $i$, a.k.a. fixed-point-free involutions. We see that these numbers grow quickly already for small $n$.
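The recursion (4), in the form we read it, $d_k(n+1) = n\,[d_k(n) + d_{k-1}(n-1)]$, is straightforward to tabulate; a Python sketch (assumed form of the recursion, function name ours):

```python
def d_table(nmax):
    """Table d[n][k] = |D_n^(k)|, the number of n-derangements with k
    cycles, from the (assumed) recursion d_k(n+1) = n*(d_k(n) + d_{k-1}(n-1))
    with d_0(0) = 1 and all other boundary values zero."""
    d = [[0] * (nmax + 1) for _ in range(nmax + 1)]
    d[0][0] = 1
    for n in range(1, nmax):
        for k in range(1, nmax + 1):
            d[n + 1][k] = n * (d[n][k] + d[n - 1][k - 1])
    return d
```

Row sums reproduce the rencontres numbers, e.g. $\sum_k d_k(6) = 265 = D_6$, and $d_2(4) = 3$ counts the three fixed-point-free involutions of four labels.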
Remark 2.3.
One can consider the distribution of derangements over the possible cycle types (instead of cycle lengths) for a “finer” view of the distribution. The number of $n$-permutations of type $(c_1, \dots, c_n)$ is given by Cauchy's formula
(5) $N(c_1, \dots, c_n) = \frac{n!}{\prod_{\ell=1}^{n} \ell^{c_\ell}\, c_\ell!}.$
The analogue of (2) is given by $p_{\mathcal{K}}(n) = |\mathcal{K}|/D_n$, where $\mathcal{K}$ is the conjugacy class formed by all permutations of type $(c_1, \dots, c_n)$ with $c_1 = 0$. For example, for cyclic derangements $c_n = 1$ and all other $c_\ell = 0$, and we obtain $|\mathcal{K}| = (n-1)!$, as before.
3 Generating random derangements by random transpositions
Our first approach to generate random derangements uniformly distributed over $\mathcal{D}_n$ consists in taking an initial derangement and scrambling it by enough random restricted transpositions to obtain a sample from the uniform distribution over $\mathcal{D}_n$. By restricted transpositions we mean swaps $\sigma(i) \leftrightarrow \sigma(j)$ avoiding pairs for which $\sigma(i) = j$ or $\sigma(j) = i$, since these swaps would create fixed points. Algorithm T describes the generation of random derangements according to this idea, where $\theta$ is a constant establishing the number of random restricted transpositions to be attempted and $u$ denotes a computer-generated pseudorandom uniform deviate in $[0, 1)$.
Remark 3.1.
Algorithm T is applicable only for $n \geq 4$, since it is not possible to connect the even permutations $2\,3\,1$ and $3\,1\,2$, the only derangements of three labels, by a single transposition.
A good choice for the initial derangement in Algorithm T is any cyclic derangement (a single cycle of length $n$), for example, $\sigma = 2\,3 \cdots n\,1$. A particularly bad choice would be a fixed-point-free involution ($n$ even, all cycles of length $2$), for example, $\sigma = 2\,1\,4\,3 \cdots n\,(n-1)$, because then the algorithm would start as far away from the cyclic derangements, in number of restricted transpositions, as possible. (Incidentally, this suggests the use of Algorithm T to generate random fixed-point-free involutions.) To avoid this problem we hardcoded the requirement that Algorithm T starts with a cyclic derangement. If several parallel streams of random derangements are needed, one can seed different initial random cyclic derangements from a one-line implementation of Sattolo's algorithm.
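A minimal sketch of the randomization just described (our reading of Algorithm T; the paper's reference implementation is in C):

```python
import random
from math import log

def algorithm_T(n, theta=2.0, rng=random):
    """Sketch of the restricted-transposition walk (our reading of
    Algorithm T): start from the cyclic derangement 2 3 ... n 1 and
    attempt about theta*n*ln(n) random restricted transpositions."""
    sigma = list(range(1, n)) + [0]      # cyclic derangement, 0-based labels
    for _ in range(int(theta * n * log(n)) + 1):
        i, j = rng.randrange(n), rng.randrange(n)
        # restricted: the swap is allowed only if sigma[i] != j and
        # sigma[j] != i, exactly the condition for the result to remain
        # a derangement; rejected attempts still count towards the total
        if i != j and sigma[i] != j and sigma[j] != i:
            sigma[i], sigma[j] = sigma[j], sigma[i]
    return sigma
```

Because the initial permutation is a derangement and the rejected swaps are exactly those that would create a fixed point, every intermediate state of the walk is itself a derangement.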
Remark 3.2.
The minimum number of restricted transpositions necessary to take a cyclic derangement into a $k$-cycle derangement is $k - 1$, $1 \leq k \leq \lfloor n/2 \rfloor$, since transpositions of labels that belong to the same cycle split it into two cycles,
(6) $(i_1\, i_2 \cdots i_r\, j_1\, j_2 \cdots j_s)\,(i_1\, j_1) = (i_1\, j_2 \cdots j_s)\,(j_1\, i_2 \cdots i_r),$
and, conversely, transpositions involving labels of different cycles join them into a single cycle. If Algorithm T is started with a cyclic derangement, then one must allow for at least $\lfloor n/2 \rfloor - 1$ successful restricted transpositions for every cycle class to be reachable.
We run Algorithm T for several values of $\theta$ and collect data. Simulations were performed on Intel Xeon E5-1650 v3 processors running -O3 compiler-optimized C code (GCC) over Linux, while the numbers (3) were calculated with the software package Mathematica 11.3 wolfram. We draw our pseudorandom numbers from Vigna's superb xoshiro256+ generator xoshiro. Our results appear in Table 1. We see from that table that with too few random restricted transpositions there is a slight excess of probability mass in the lower cycle sets. Scrambling the initial derangement with more restricted transpositions performs better, and past that point the differences between runs attempting ever more random restricted transpositions become much less pronounced. Figures for derangements of higher cycle number fluctuate more due to the finite size of the sample. The data suggest that Algorithm T can generate a random derangement uniformly distributed on $\mathcal{D}_n$ with $O(n \ln n)$ random restricted transpositions, employing $O(n \ln n)$ pseudorandom numbers in the process. This is further discussed in Section 5.
Remark 3.3.
It is a classic result that $\frac{1}{2} n \ln n$ random transpositions are needed before a shuffle by unrestricted transpositions becomes “sufficiently random” aldous; persi. A similar analysis for random restricted transpositions over derangements is complicated by the fact that derangements do not form a group. Recently, the analysis of the spectral gap of the Markov transition kernel of the process provided an upper bound of order $n \ln n$ with a large leading constant that decreases with $n$ aaron. This bound results from involved estimations and approximations and may not be very accurate. We are not aware of other rigorous results on this particular problem. Related results for the mixing time of the random transposition walk on permutations with “one-sided restrictions” appear in hanlon.
[Table 1: empirical probabilities of the cycle classes obtained from Algorithm T (for several values of $\theta$) and from Algorithm S, compared with the exact probabilities from Eqs. (1)–(3), together with runtimes in seconds.]
4 Sequential importance sampling of derangements
4.1 The SIS algorithm
Sequential importance sampling (SIS) is an importance sampling scheme with the sampling weights built up sequentially. The idea is particularly suited to sample composite objects $x = (x_1, x_2, \dots, x_n)$ from a complicated sample space $\Omega$ with many restrictions and for which the high-dimensional volume $|\Omega|$, from which the uniform distribution $1/|\Omega|$ follows, may not be easily calculable. However, since we can always write
(7) $p(x) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_1, \dots, x_{n-1}),$
we can think of “telescoping” the sampling of $x$ by first sampling $x_1$, then using the updated information brought by the knowledge of $x_1$ to sample $x_2$, and so on. In Monte Carlo simulations, the right-hand side of (7) actually becomes an approximation $q(x) = q_1(x_1)\, q_2(x_2 \mid x_1) \cdots q_n(x_n \mid x_1, \dots, x_{n-1})$, with the distributions $q_t$ estimated or inferred incrementally based on approximate weighting functions for the partial objects $(x_1, \dots, x_t)$. Expositions of the SIS framework of interest to what follows appear in cdhl; pdrlgsph.
Algorithm S describes a SIS algorithm to generate random derangements inspired by the analogous problem of sampling contingency tables with restrictions cdhl; pdrlgsph as well as by the problem of estimating the permanent of a matrix kuznetsov; liang; rasmussen. Our presentation of Algorithm S is not the most efficient for implementation; the auxiliary sets $A_i$, for instance, are not actually needed and were included only to facilitate the analysis of the algorithm, and the tests in line 4 can be reduced to a single test in the last pass, since all $A_i \neq \emptyset$ except perhaps $A_n$.
4.2 Failure probability of the SIS algorithm
In the $i$th pass of the loop in Algorithm S, $\sigma(i)$ can pick (lines 5–6) one of either $|A_i| = n - i$ or $|A_i| = n - i + 1$ labels, depending on whether label $i$ has already been picked or not. This guarantees the construction of the derangement up to the last but one element $\sigma(n-1)$. The derangement is completed only if the last remaining label is different from $n$, such that $\sigma(n)$ does not pick $n$. The probability that Algorithm S fails is thus given by
(8) $p_{\mathrm{fail}} = \Pr\bigl[\, n \notin \{\sigma(1), \dots, \sigma(n-1)\} \,\bigr].$
Now, according to Algorithm S, line 5, we have
(9) $\Pr\bigl[\sigma(i) = n \mid A_1, \dots, A_i\bigr] = \frac{1}{|A_i|}\,\mathbb{1}\{n \in A_i\},$
and the failure probability becomes
(10) $p_{\mathrm{fail}} = \mathbb{E}\Biggl[\, \prod_{i=1}^{n-1} \Bigl(1 - \frac{1}{|A_i|}\Bigr) \Biggr],$
where the expectation runs over the realizations of the sets $A_1, \dots, A_{n-1}$ along a run of the algorithm; note that $n \in A_i$ in every pass of a failing run. Expression (10) is but a probabilistic version of the inclusion–exclusion principle known as Poincaré's formula in elementary probability.
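For concreteness, the sequential construction analyzed above can be sketched as follows (our reading of Algorithm S; the actual pseudocode, including the bookkeeping of the sets $A_i$, may differ in details):

```python
import random

def algorithm_S(n, rng=random):
    """Sketch of the sequential construction (our reading of Algorithm S):
    build sigma(1), ..., sigma(n) one pass at a time, at pass i drawing
    sigma(i) uniformly from the still-unused labels other than i.
    Returns None when the construction fails."""
    unused = set(range(1, n + 1))
    sigma = [None] * (n + 1)             # 1-based; sigma[0] is unused
    for i in range(1, n + 1):
        A_i = unused - {i}               # labels still allowed at position i
        if not A_i:                      # only label i remains: failure
            return None
        sigma[i] = rng.choice(tuple(A_i))
        unused.discard(sigma[i])
    return sigma[1:]
```

The only way the construction can fail is at the last pass, when the single remaining label happens to be $n$, in agreement with (8).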
The explicit computation of (10) is a cumbersome business, and we will not pursue it here. Instead, the following theorem establishes an easy upper bound on the failure probability of Algorithm S.
Theorem 4.1.
Algorithm S fails with probability $p_{\mathrm{fail}} \leq 1/n$.
Proof.
Recall that in the $i$th pass of the loop in Algorithm S we have $|A_i| = n - i$ or $|A_i| = n - i + 1$, such that each factor in the expectation obeys
(11) $1 - \frac{1}{|A_i|} \leq 1 - \frac{1}{n - i + 1} = \frac{n - i}{n - i + 1},$
and it immediately follows that
(12) $p_{\mathrm{fail}} \leq \prod_{i=1}^{n-1} \frac{n - i}{n - i + 1} = \frac{1}{n}.$
∎
Remark 4.2.
In Appendix A we provide an improved upper bound to $p_{\mathrm{fail}}$ based on an “independent sets” approximation that works remarkably well.
The measured failure rate for the SIS data in Table 1 is close to, but below, the upper bound $1/n$. A larger sample of runs of Algorithm S gives an average failure rate with a sample minimum of 0.014130 and a maximum of 0.014991 across the runs. Figure 1 depicts Monte Carlo data for the failure probability (10) against the upper bound $1/n$. Each data point was obtained as an average over many independent runs of Algorithm S.
4.3 Uniformity of the SIS algorithm
In the SIS approach, the ensuing sampling probabilities may deviate considerably from the uniform distribution. An important aspect of Algorithm S is that it generates each $\sigma \in \mathcal{D}_n$ with probability $1/D_n$, as the data in Table 1 suggest. We prove that this is indeed the case based on related results in cdhl; kuznetsov; rasmussen.
Definition 4.3.
We call the $n \times n$ matrix $J_n$ with $0$ on the diagonal and $1$ elsewhere the derangement matrix of order $n$.
The following well known property of $J_n$ clarifies its denomination and will be useful later.
Lemma 4.4.
The permanent of $J_n$ is given by the $n$th rencontres number, $\operatorname{per}(J_n) = D_n$.
Proof.
The result is classic and we only outline its proof; for details consult brualdi. The permanent of an $n \times n$ matrix $A = (a_{ij})$ is given by
(13) $\operatorname{per}(A) = \sum_{\sigma} \prod_{i=1}^{n} a_{i\sigma(i)} = \sum_{j=1}^{n} a_{ij} \operatorname{per}(A_{ij}),$
where the first sum is over all permutations of $n$ labels and the other sum is the Laplace (row) development of the permanent in terms of the submatrices $A_{ij}$ obtained from $A$ by deleting its $i$th row and $j$th column. Perusal of the Laplace development of $\operatorname{per}(J_n)$ twice eventually leads to the recursion relation $\operatorname{per}(J_n) = (n-1)\bigl[\operatorname{per}(J_{n-1}) + \operatorname{per}(J_{n-2})\bigr]$ with $\operatorname{per}(J_1) = 0$ and $\operatorname{per}(J_2) = 1$. But this is exactly the recursion relation for the rencontres numbers. ∎
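Both sides of Lemma 4.4 are easy to check numerically for small $n$; a brute-force sketch in Python (function names ours):

```python
from itertools import permutations

def per_J(n):
    """Brute-force permanent of the derangement matrix J_n (0 on the
    diagonal, 1 elsewhere): it simply counts the permutations that
    avoid the diagonal, i.e. the derangements."""
    return sum(all(s[i] != i for i in range(n)) for s in permutations(range(n)))

def rencontres(n):
    """D_n from the recursion D_n = (n-1)(D_{n-1} + D_{n-2}),
    with D_0 = 1 and D_1 = 0."""
    a, b = 1, 0                          # D_0, D_1
    for m in range(2, n + 1):
        a, b = b, (m - 1) * (b + a)
    return b if n >= 1 else a
```

For instance, `per_J(5)` and `rencontres(5)` both return $44 = D_5$.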
The next lemma elucidates the close relationship between the sets $A_i$ of Algorithm S and $\operatorname{per}(J_n)$.
Lemma 4.5.
$\mathbb{E}\bigl[\,|A_1|\,|A_2| \cdots |A_n|\,\bigr] = D_n$, where failed runs of Algorithm S contribute $0$ to the expectation.
Proof.
We have to prove that $\mathbb{E}[\,|A_1| \cdots |A_n|\,] = \operatorname{per}(J_n)$. From Algorithm S, lines 3 and 7, it follows, with $\sigma(i) = j$ for some realization of the $i$th pass, that
(14) $|A_i| = \sum_{j=1}^{n} \mathbb{1}\{j \in A_i\},$
where $\mathbb{1}\{E\}$ is the indicator function that equals $1$ if $E$ is true and $0$ otherwise. Now if we average the product $|A_1| \cdots |A_n|$ over all paths that the algorithm can take, each path choosing $\sigma(i) = j \in A_i$ with probability $1/|A_i|$ in the $i$th pass, the factors $|A_i|$ cancel the sampling probabilities $1/|A_i|$ and we find that
(15) $\mathbb{E}\bigl[\,|A_1| \cdots |A_n|\,\bigr] = \sum_{\sigma} \prod_{i=1}^{n} b_{i\sigma(i)}.$
Expression (15) is nothing but the expression for the permanent of a matrix with elements $b_{ij} = 1$ if $j \neq i$ and $b_{ij} = 0$ otherwise, that is, of the derangement matrix $J_n$, and by Lemma 4.4 it follows that $\mathbb{E}[\,|A_1| \cdots |A_n|\,] = D_n$. ∎
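Lemma 4.5 can also be verified exactly for small $n$ by enumerating every path of the construction together with its probability; under our reading of Algorithm S the factors $|A_i|$ cancel the sampling probabilities and the expectation collapses to a plain count (function name ours):

```python
def sis_expected_weight(n):
    """Exact expectation of the SIS weight |A_1|*...*|A_n| (zero on
    failure), obtained by enumerating all choice paths of the sequential
    construction; by Lemma 4.5 it should equal D_n."""
    def rec(i, unused):
        if i > n:
            return 1
        A = unused - {i}
        # E[|A_i| * rest] = |A_i| * (1/|A_i|) * sum over choices = plain sum;
        # an empty A (failure) contributes 0
        return sum(rec(i + 1, unused - {j}) for j in A)
    return rec(1, frozenset(range(1, n + 1)))
```

For $n = 2, 3, 4, 5$ this returns $1, 2, 9, 44$, i.e. the rencontres numbers $D_n$.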
Theorem 4.6.
Algorithm S samples $\mathcal{D}_n$ uniformly.
5 Mixing time of the restricted transpositions shuffle
To shed some light on the question of how many random restricted transpositions are necessary to generate random derangements uniformly over $\mathcal{D}_n$, we investigate the convergence of Algorithm T numerically. This can be done by monitoring the evolution of the empirical probabilities of the cycle classes along the run of the algorithm towards the exact probabilities given by (2).
Let $\pi$ be the measure that puts mass $p_k(n)$ on the set $\mathcal{D}_n^{(k)}$ and $\hat{\pi}_t$ be the empirical measure
(18) $\hat{\pi}_t(k) = \frac{1}{R} \sum_{r=1}^{R} \mathbb{1}\bigl\{\sigma_t^{(r)} \in \mathcal{D}_n^{(k)}\bigr\},$
where $\sigma_t^{(r)}$ is the derangement obtained after attempting $t$ restricted transpositions by Algorithm T on a given initial derangement in the $r$th of $R$ independent runs. The total variation distance between $\hat{\pi}_t$ and $\pi$ is given by aldous; persi
(19) $d(t) = \|\hat{\pi}_t - \pi\|_{\mathrm{TV}} = \frac{1}{2} \sum_{k=1}^{\lfloor n/2 \rfloor} \bigl|\hat{\pi}_t(k) - p_k(n)\bigr|.$
The right-hand side of (19) can be seen as the “histogram distance” between $\hat{\pi}_t$ and $\pi$ in the $\ell_1$ norm. Clearly, $0 \leq d(t) \leq 1$. This distance allows us to define $\tau(\epsilon)$ as the time it takes for $\hat{\pi}_t$ to fall within distance $\epsilon$ of $\pi$,
(20) $\tau(\epsilon) = \min\{\, t : d(t) \leq \epsilon \,\}.$
It is usual to define the mixing time by setting $\epsilon = 1/4$ or $\epsilon = 1/(2e)$, this last figure being reminiscent of the spectral analysis of Markov chains. We set a smaller $\epsilon$. This choice is motivated by the following pragmatic reasons:
We want the derangements output by Algorithm T to be as uniformly distributed over $\mathcal{D}_n$ as possible, so the smaller the $\epsilon$ the better the assessment of the algorithm and the choice of the constant $\theta$;

With larger $\epsilon$ we found values of $\tau(\epsilon)$ so small that not even every possible derangement had a chance to be generated if the initial derangement is cyclic (see Remark 3.2).
Remark 5.1.
It is well known that the number of cycles of random $n$-permutations is Poisson distributed with mean $\ln n$, such that as $n \to \infty$ the central limit theorem implies that the number of cycles of random permutations follows a normal distribution with mean $\ln n$ and variance $\ln n$; see, e.g., arratia and the references therein. Flajolet and Soria soria proved that the same holds for permutations with no cycles of length less than a given threshold, using complex asymptotics of exponential generating functions; analco and iran provide the analysis for the particular case of derangements. Figure 2 displays the exact distribution of cycles for derangements together with the approximating normal density, computed from the exact mean and variance furnished by equations (2)–(3). The distribution of cycle numbers in Figure 2 indeed looks close to a normal, albeit slightly skewed. We did not go to greater $n$ because Stirling numbers of the first kind are notoriously hard to compute even by computer algebra systems running on modern workstations. Recently, the cycle structure of certain types of restricted permutations was also shown to be asymptotically normal ozel.
Starting with a cyclic derangement, i.e., with $c_n = 1$ and all other $c_\ell = 0$, we run Algorithm T and collect statistics on $\tau(\epsilon)$. Figure 3 displays the average $\tau(\epsilon)$ over independent runs. The behavior of $d(t)$ does not show sign of the cutoff phenomenon, a sharp transition from an unmixed state ($d \simeq 1$) to a mixed state ($d \simeq 0$) over a small window of time. Table 2 lists the average $\tau(\epsilon)$ obtained over samples of larger derangements. An adjustment of the data to the form
(21) $\tau = a\, n^{b} \ln n$
furnishes estimates of the constants $a$ and $b$. Our data thus suggest that $\tau \sim \theta\, n \ln n$ with $\theta$ of order one, roughly an order of magnitude lower than the upper bound given by aaron. It is tempting to conjecture that $b = 1$ (and, perhaps, that $a = 1/2$) exactly, cf. the last two lines of Table 2, although our data do not support the case unequivocally.
6 Summary and conclusions
While simple rejection sampling generates random derangements with an acceptance rate of $D_n/n! \simeq e^{-1} \approx 0.37$, thus being $O(n)$ in expectation (plus the cost of verifying whether the permutation generated is a derangement, which does not impact the complexity of the algorithm but impacts its runtime), Sattolo's algorithm only generates cyclic derangements, and the Martínez–Panholzer–Prodinger algorithm, with guaranteed uniformity, is considerably more involved to implement, we described two procedures, Algorithms T and S, that are competitive for the efficient uniform generation of random derangements. We note in passing that Algorithm T can easily be used also to generate random fixed-point-free involutions.
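For reference, the rejection-sampling baseline mentioned above takes only a few lines; a Python sketch (function name ours):

```python
import random

def rejection_derangement(n, rng=random):
    """Rejection sampling: shuffle until a derangement appears.
    The acceptance rate tends to 1/e, so only about e trials are
    needed on average."""
    while True:
        sigma = list(range(n))
        rng.shuffle(sigma)
        if all(sigma[i] != i for i in range(n)):
            return sigma
```

On average about $e \approx 2.72$ shuffles of cost $O(n)$ each are needed, so the expected total time is $O(n)$.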
We found, empirically, that $\theta\, n \ln n$ random restricted transpositions with $\theta$ of order one suffice to spread an initial derangement over $\mathcal{D}_n$ uniformly. The fact that the empirical probabilities remain close to the exact ones for all cycle classes once enough transpositions have been attempted explains the good statistics (see Table 1) displayed by Algorithm T. There are currently very few analytical results on the mixing time of the random restricted transposition walk implemented by Algorithm T; the upper bound obtained in aaron is roughly an order of magnitude above our empirical findings.
Acknowledgments
The author thanks Aaron Smith (U. Ottawa) for useful correspondence and suggestions on a previous version of the manuscript, the LPTMS for kind hospitality during a sabbatical leave in France, and FAPESP (Brazil) for partial support through grants nr. 2015/21580-0 and 2017/22166-9.
Appendix A Improved upper bound for the failure probability of Algorithm S
In the $i$th pass of the loop in Algorithm S we have
(22) $\mathbb{E}\Bigl[\frac{1}{|A_i|}\Bigr] = \frac{q_i}{n - i + 1} + \frac{1 - q_i}{n - i}, \qquad q_i = \Pr\bigl[\, i \in \{\sigma(1), \dots, \sigma(i-1)\} \,\bigr].$
The difficulty in the calculation of $p_{\mathrm{fail}}$ resides in the calculation of $q_i$. We can simplify this calculation by ignoring the conditioning of the event $\{\sigma(j) = i\}$ on the previous passes, i.e., by ignoring correlations between the values assumed by the many $\sigma(j)$ along a “path” in the algorithm. This approximation is clearly better in the beginning of the construction of $\sigma$, when $i$ is small, than later. In doing so we obtain
(23) $q_i \approx 1 - \prod_{j=1}^{i-1} \Bigl(1 - \frac{1}{n - j}\Bigr) = \frac{i - 1}{n - 1},$
such that $\mathbb{E}[1/|A_i|]$ becomes an explicit function of $i$ and $n$. The approximate $|A_i|$ is greater than the true one, because conditioning on the previous passes can only restrict, not enlarge, the set of indices available to $\sigma(i)$. Numerical evidence supports this claim. We thus have that the approximate value of $1 - \mathbb{E}[1/|A_i|]$ is greater than its true value, and the failure probability (10) becomes bounded by
(24) $p_{\mathrm{fail}} \lesssim \prod_{i=1}^{n-1} \biggl[\, 1 - \frac{1}{n - i} \Bigl(1 - \frac{i - 1}{n - 1}\Bigr) - \frac{1}{n - i + 1} \cdot \frac{i - 1}{n - 1} \,\biggr].$
The surprisingly excellent agreement between this upper bound and the Monte Carlo data can be appreciated in Figure 4; compare with Figure 1. Note how the data lie only very slightly below the upper bound (24).
References
 (1) D. Aldous, J. A. Fill, Reversible Markov Chains and Random Walks on Graphs. Unfinished monograph (recompiled 2014) available at http://www.stat.berkeley.edu/~aldous/RWG/book.html.
 (2) S. G. Akl, A new algorithm for generating derangements, BIT Numer. Math. 20 (1) (1980) 2–7.
 (3) R. Arratia, A. D. Barbour, S. Tavaré, Logarithmic Combinatorial Structures: A Probabilistic Approach. EMS, Zürich (2003).
 (4) A. Bacher, O. Bodini, H.K. Hwang, T.H. Tsai, Generating random permutations by coin tossing: Classical algorithms, new analysis, and modern implementation, ACM Trans. Algorithms 13 (2) (2017) 24.
 (5) J. L. Baril, V. Vajnovszki, Gray code for derangements, Discrete Appl. Math. 140 (1–3) (2004) 207–221.
 (6) I. Beichl, F. Sullivan, Approximating the permanent via importance sampling with application to the dimer covering problem, J. Comput. Phys. 149 (1) (1999) 128–147.
 (7) R. A. Brualdi, R. J. Ryser, Combinatorial Matrix Theory, Cambridge University Press, Cambridge, 1991.
 (8) C. A. Charalambides, Enumerative Combinatorics, Chapman & Hall/CRC, Boca Raton, 2002.
 (9) Y. Chen, P. Diaconis, S. P. Holmes, J. S. Liu, Sequential Monte Carlo methods for statistical analysis of tables, J. Am. Stat. Assoc. 100 (469) (2005) 109–120.
 (10) P. Diaconis, Group Representations in Probability and Statistics, IMS, Hayward (1988).
 (11) P. Diaconis, R. L. Graham, S. P. Holmes, Statistical problems involving permutations with restricted positions, in: M. de Gunst, C. Klaassen, A. Van der Vaart (Eds.), State of the Art in Probability and Statistics: Festschrift for Willem R. van Zwet, IMS, Beachwood, 2001, pp. 195–222.
 (12) P. Flajolet, M. Soria, Gaussian limiting distributions for the number of components in combinatorial structures, J. Comb. Theor. Ser. A 53 (2) (1990) 165–182.
 (13) D. Gries, J. Xue, Generating a random cyclic permutation, BIT Numer. Math. 28 (3) (1988) 569–572.
 (14) P. Hanlon, A random walk on the rook placements on a Ferrer’s board, Electron. J. Comb. 3 (2) (1996) 26.
 (15) J. F. Korsh, P. S. LaFollette, Constant time generation of derangements, Inf. Process. Lett. 90 (4) (2004) 181–186.
 (16) N. Y. Kuznetsov, Computing the permanent by importance sampling method, Cybern. Syst. Anal. 32 (6) (1996) 749–755.
 (17) H. Liang, L. Shi, F. Bai, X. Liu, Random path method with pivoting for computing permanents of matrices, Appl. Math. Comput. 185 (1) (2007) 59–71.
 (18) C. Martínez, A. Panholzer, H. Prodinger, Generating random derangements, in: R. Sedgewick, W. Szpankowski (Eds.), 2008 Proc. Fifth Workshop on Analytic Algorithmics and Combinatorics – ANALCO, SIAM, Philadelphia, 2008, pp. 234–240.
 (19) E. Ozel, The number of cycles in a family of restricted permutations, arXiv:1710.07885 (2017).
 (20) A. Panholzer, H. Prodinger, M. Riedel, Measuring postquickselect disorder, J. Iran. Stat. Soc. 3 (2) (2004) 219–249.
 (21) H. Prodinger, On the analysis of an algorithm to generate a random cyclic permutation, Ars Comb. 65 (2002) 75–78.
 (22) L. E. Rasmussen, Approximating the permanent: A simple approach, Random Struct. Algor. 5 (2) (1994) 349–361.
 (23) S. Sattolo, An algorithm to generate a random cyclic permutation, Inf. Process. Lett. 22 (6) (1986) 315–317.
 (24) R. Sedgewick, Permutation generation methods, Comput. Surv. 9 (2) (1977) 137–164.
 (25) A. Smith, Comparison theory for Markov chains on different state spaces and application to random walk on derangements, J. Theor. Probab. 28 (4) (2015) 1406–1430.
 (26) S. Vigna, xoshiro/xoroshiro generators and the PRNG shootout. Available at http://xoshiro.di.unimi.it/.
 (27) M. C. Wilson, Random and exhaustive generation of permutations and cycles, Ann. Comb. 12 (4) (2009) 509–520.
 (28) Wolfram Research, Inc., Mathematica, Version 11.3, Champaign, IL (2018).