1 Introduction
The search for socalled quantum computational supremacy has garnered a great deal of interest and investment in recent years. One of the most promising candidates for this goal was introduced at STOC ’11 by Aaronson and Arkhipov who described an experimental setup in linear optics known as Boson Sampling.
Since its introduction, Boson Sampling has attracted a great deal of attention with numerous experimental efforts around the world attempting implementations for various problem sizes (see e.g. Bentivegna et al., 2015; Spring et al., 2013; Broome et al., 2013; Tillmann et al., 2013; Crespi et al., 2013; Spagnolo et al., 2014; Latmiral et al., 2016). The ultimate goal is to exhibit a physical quantum experiment of such a scale that it would be hard if not impossible to simulate the output classically and thereby to establish socalled ‘quantum supremacy’. In terms of the physical Boson Sampling experiment, corresponds to the number of photons and the number of output modes and increasing either of these is difficult in practice. Progress has therefore been slow (see Lund et al., 2017, and the references therein) with the experimental record until recently being (Wang et al., 2016). However in a breakthrough result in 2019, Wang et al. demonstrated a Boson Sampling experiment with photons and . Their Boson Sampling experimental setup has input photons of which are detected, following the model of Aaronson and Brod (2016).
Phrased in purely mathematical terms, the task is to generate independent random samples from a particular probability distribution on all multisets of size with elements chosen from as follows. It is convenient to represent a multiset by an array consisting of elements of the multiset in nondecreasing order. We denote the set of distinct values of by , with where is the multiplicity of the value in . The cardinality of is known to be – see Feller (1968), for example.
Now let be the complex valued matrix consisting of the first columns of a given
dimensional Haar random unitary matrix,
. For each , build an matrix where the th row of is row in for and define a probability mass function (pmf) on as(1) 
where is the permanent of and in the definition the summation is for all , the set of permutations of . For given , the Boson Sampling problem is to simulate random samples from the pmf either with a quantum photonic device or with classical computing hardware.
Aaronson and Arkhipov show that exact Boson Sampling is not efficiently solvable by a classical computer unless and the polynomial hierarchy collapses to the third level. Although the original proof restricted the range of and a brute force evaluation of the probabilities of each multiset, as a preliminary to random sampling, requires the calculation of permanents of matrices, each one of which takes time with the fastest known algorithms. Previously we gave a faster classical sampling algorithm running in time and linear space (Clifford and Clifford, 2018). This suggested that at least photons would be needed in any Boson Sampling experiment to demonstrate socalled quantum computational supremacy. A more recent fine grained complexity theoretic analysis of Boson Sampling suggests that in fact input photons will be needed to achieve quantum computational supremacy (Dalzell et al., 2018).
In practical terms increasing the number of input photons is not the only difficultly when performing Boson Sampling. Experiments also get increasingly difficult to perform as the number of output modes increases. Current experiments, for example, have typically set the number of modes to be a small multiple of the number of photons and it is likely this trend will continue in the near future.
The original hardness result for exact Boson Sampling of Aaronson and Arkhipov required that as a crucial step in their reduction. For approximate Boson Sampling the number of outputs has to increase at least quadratically with the number of input photons for any known hardness results to apply. On the other hand, there is no existing evidence that approximate Boson Sampling is easier than exact Boson Sampling for any set of parameters. Moreover, Grier and Schaeffer (2016) showed that computing the permanent of a unitary matrix itself is hard. By applying this result to the proof technique of Aaronson and Arkhipov it follows that even with the number of output modes equal to the number of input photons, Boson Sampling cannot be performed in polynomial time classically unless . This raises the question of whether experimentalists should now focus on increasing the number of input photons alone, keeping the number of output modes to be a small multiple of, or even equal to, the number of input modes.
We answer this question in the negative and show that if for constant then it is indeed possible to sample from the Boson Sampling distribution significantly more quickly than was known before. We establish the expected complexity averaged over all realisations. We simplicity, we first give the result for the case . The time complexity for the general case is given in Section 6.
Theorem 1.
The expected time complexity of Boson Sampling with is
The additional space complexity on top of that needed to store the input is .
Our sampling algorithm also applies directly to the closely related problem of scattershot Boson Sampling (Bentivegna et al., 2015). From a mathematical perspective this produces no additional complications beyond the specification of a different matrix. Once the matrix is specified we can apply our new sampling algorithm directly.
2 Related work
Terhal and DiVincenzo (2004) were the first to recognise that studying the complexity of sampling from lowdepth quantum circuits could be useful for demonstrating a separation between quantum and classical computation. Since that time the goal of finding complexity separations for quantum sampling problems has received a great deal of interest. We refer the interested reader to Lund et al. (2017) and Harrow and Montanaro (2017) for surveys on the topic.
The search for faster classical algorithms for Boson Sampling has been of central interest since the problem was first explicitly formulated by Aaronson and Arkhipov
. The first significant breakthrough was a Markov chain Monte Carlo (MCMC) sampling procedure developed by
Neville et al. (2017). The overall approach of MCMC is to take samples from some easy to compute proposal distribution and then to accept them depending on their individual likelihoods. Neville et al. were able to provide numerical evidence that for limited problem sizes only approximately such permanent computations were needed to take one sample approximately from the Boson Sampling distribution. This MCMC approach is however necessarily approximate and does not give provable bounds on the quality of its approximation.In a previous paper (Clifford and Clifford, 2018) we presented an exact Boson Sampling algorithm running in time per sample, costing approximately two permanent calculations of matrices for all values of . This exact sampling algorithm takes a completely different approach from that of MCMC, focussing both on speed and provable correctness. Furthermore, Shchesnovich (2019) argues that our algorithm can achieve greater efficiency in various regimes of and .
3 Methods
3.1 Computing the permanent of low rank matrices
The complexity of computing the permanent of an arbitrary complex matrix was shown to be (Ryser, 1963) and subsequently (Nijenhuis and Wilf, 1978; Glynn, 2010). The computation time is decreased when there are several identical rows or columns, resulting in a matrix of reduced rank (Tichy, 2011; Shchesnovich, 2013, 2019). We will show how these ideas can be implemented in practice to produce a faster algorithm that suits our needs more closely.
Let be the first columns of a complex matrix, . Assume the rows of are distinct and let be a matrix with repeated rows drawn from . Specifically, let be a multiset of size with elements from and let be its associated array of multiplicities, then the th row of will be row of , for .
According to Ryser’s formula
(2) 
The subset can be written as where . Note there are elements in It follows there are ways of choosing a subset of a given size , when is between and . Furthermore whenever .
Ryser’s formula can then be expressed as
(3) 
A straightforward implementation of the formula requires us to iterate over the tuples . Since each term in the summation requires operations, the run time is then as shown by Shchesnovich (2013). Similar complexity is achieved but with an improved constant factor overhead when starting from Glynn’s formula (Chin and Huh, 2018).
To make this iteration as fast as possible we would ideally like to perform the iteration in such a way that at each stage we change only one of the values and these are changed by . This can be achieved using Guan codes (otherwise known as generalised Gray codes) (Guan, 1998), borrowing from the idea of Nijenhuis and Wilf (1978) who used a basic Gray code to speed up Ryser’s algorithm.
Theorem 2.
Using Guan codes, can be calculated in time.
Proof.
There are terms in the outside set of summations in (3). By using Guan’s algorithm we can move through the tuples exhaustively, adding or subtracting from a single element. This means that the set of values can be updated in operations. The product over is a further operations. Updating the relevant binomial term is a operation. The total operation count is then . ∎
3.2 Laplace expansion
We make extensive use of the Laplace expansion for permanents (see Marcus and Minc, 1965, page 578), namely that for any matrix ,
where is the submatrix with row and column removed. Note that only depends on the submatrix of with the th row removed. An important consequence is that when is modified by changing its th row, the modified permanent can be calculated in steps, provided the values are available. As explained in Clifford and Clifford (2018), we can take advantage of this observation to quickly compute a set of permanents of matrices each with one row differing from the other.
We now show that computation of all of the values has the same time complexity as computing , the permanent of a single matrix when there are repeated rows in . We derive this result for matrices with repeated rows using a combination of Ryser’s algorithm and Guan codes.
Lemma 1.
Let be a complex matrix with repeated rows specified by multiplicities and let be submatrices of with row and column removed, . The collection can be evaluated jointly with the same time complexity as that of , with additional space.
Proof of Lemma.
By applying Ryser’s formula (3) to for a given value of we have:
(4) 
where is the multiplicity of repeated rows in and .
Working through values of in Guan code order, the terms can be evaluated in combined time for every new . This is because successive arrays differ by one in a single element. The product of the terms can be computed in time giving time to compute for a single value of , but of course this has to be replicated times to cover all values of .
To compute more efficiently we observe that each product can be expressed as where and are forward and backward cumulative products, with .
We can therefore compute all of the partial products in time, giving an overall total time complexity of for jointly computing , and since , the time complexity is as claimed. Furthermore the computation time has constant factor overheads similar to that of computing . Other than the original matrix, space used is dominated by the two arrays of cumulative products, both of length . ∎
4 Boson Sampling algorithm
Clifford and Clifford (2018) provide the following algorithm for Boson Sampling:
Calculation of the time complexity proceeds as in Clifford and Clifford (2018), with a few modifications to take account of repeated rows in evaluating permanents of the submatrices involved. At stage , let be the multiplicities of the values in . The operation count in applying Lemma 1 is . Using the Laplace expansion for permanents, as described in Section 3.2, an array of length is obtained by summing terms for each . Taking a single sample from the pmf proportional to the array takes time. This gives the time complexity bound for stage of + and hence a total operation count of
(5) 
Importantly at the conclusion of stage , the matrix has been found. In other words at this intermediate stage a random submatrix has been drawn for the Boson Sampling problem with photons and output modes. Equivalently, the multiset formed by sorting in nondecreasing order is then a random sample from the Boson Sampling distribution on .
We now consider the averagecase time complexity when the algorithm is applied with a random choice of Haar unitary matrix, .
5 Marginal uniformity of the Boson Sampling distribution
Using quantum theoretic considerations, Arkhipov and Kuperberg (2012) have shown that the marginal Boson Sampling pmf averaged over Haar random unitaries is uniform on the space of multisets.
Theorem 3 (Arkhipov and Kuperberg).
With as in (1) and
a random matrix drawn from the
dimensional Haar random unitary distribution, the marginal Boson Sampling pmf is given by(6) 
The proof is immediate from quantum theoretic considerations (Arkhipov and Kuperberg, 2012). It can also be derived from properties of the Weingarten function in random matrix theory as follows.
Suppose is an dimensional Haar random unitary matrix. The Weingarten function is defined to be
for and in , where is the complex conjugate of and is the set of permutations of .
Theorem 4.
Let , , and be arrays of positive integers and let
then
As a immediate corollary we have:
Corollary 1.
Proof.
Turning now to the Boson Sampling distribution on multisets of size with elements in where a multiset is represented by an array consisting of elements of in nondecreasing order. As before where is the multiplicity of the value in .
From the definition (1) we now have
(7) 
using Theorem 4. For each the final summation in (7) is , so the expectation becomes
from Corollary 1 and collecting factorial terms. The last term counts the number of ways that the elements of the array can be permuted without changing the array, so that and the result (6) follows.
6 Averagecase time complexity of Boson Sampling
The averagecase complexity of Algorithm A is the expected value of (5) when is drawn from the Haar random unitary distribution. We start by considering the term . Since the multiplicity array is an alternative representation of the multiset , we know from Theorem 3 that
is uniformly distributed over the set of all multiplicity arrays,
.Recall that or equivalently as can be shown with the usual “stars and bars” argument of Feller (1968), i.e. we place bars at locations in and stars at the remaining locations. The total number of such arrangements is . Adding further bars at each end, i.e. at locations and , the associated array is the count of stars between bars.
Lemma 2.
Suppose that is sampled uniformly from , the set of all possible multiplicity arrays then
(8) 
Proof.
To see this, start with the stars and bars arrangement for a particular array as above. Now consider adding a new bar between each existing neighbouring pair of bars. If there are stars between a pair of bars, the new bar can be located in of places, for example if there is one star the new bar can be before or after it. The number of arrangements of stars and bars for a given array is then .
Since is uniformly distributed on and ,
The result now follows because is the total number of arrangements of the new and old stars and bars, i.e. the number of ways of placing bars among integer locations. ∎
Corollary 2.
With the conditions of Lemma 2 and supposing that for some fixed value then
In particular when , and as
Proof.
Apply Stirling’s formula to the factorial terms in (8). ∎
Corollary 3.
Let a complex matrix with repeated rows having multiplicity where is uniformly distributed on then the expected running time of is of order
We now prepare to prove the generalisation of Theorem 1.
Theorem 5.
The averagecase time complexity of Boson Sampling is bounded above by a term of order
In particular when for some fixed value of this is of order
Proof.
Note that the first term in the time complexity of Algorithm A given in (5) is equivalent to the total time complexity for evaluating the set of permanents of , . In particular, evaluation of the permanent of the final matrix has time complexity where is the array of multiplicities in . From Lemma 2 this has a simple reduction in the average case, since is uniformly distributed on by Theorem 3.
At the intermediate stage , the matrix can be viewed as the final matrix in a Boson Sampling algorithm with reduced size, i.e. where columns are taken from rather than . Again the averagecase complexity has a simple form from Lemma 2 so the expected value of the first term in the time complexity of Algorithm A given in (5), averaging over , is
Incorporating the second term in (5) gives the complexity as claimed. Finally, Stirling’s formula gives the asymptotic form. ∎
References

Aaronson and Arkhipov (2011)
Aaronson, S. and Arkhipov, A. (2011).
The computational complexity of linear optics.
In STOC ’11: Proc. 43^{rd}
Annual ACM Symp. Theory of Computing
, pages 333–342.  Aaronson and Brod (2016) Aaronson, S. and Brod, D. J. (2016). Bosonsampling with lost photons. Physical Review A, 93(1):012335.
 Arkhipov and Kuperberg (2012) Arkhipov, A. and Kuperberg, G. (2012). The bosonic birthday paradox. Geometry & Topology Monographs, 18:1–7.
 Bentivegna et al. (2015) Bentivegna, M., Spagnolo, N., Vitelli, C., Flamini, F., Viggianiello, N., Latmiral, L., Mataloni, P., Brod, D. J., Galvão, E. F., Crespi, A., et al. (2015). Experimental scattershot boson sampling. Science advances, 1(3):e1400255.
 Broome et al. (2013) Broome, M. A., Fedrizzi, A., RahimiKeshari, S., Dove, J., Aaronson, S., Ralph, T. C., and White, A. G. (2013). Photonic boson sampling in a tunable circuit. Science, 339(6121):794–798.
 Chin and Huh (2018) Chin, S. and Huh, J. (2018). Generalized concurrence in boson sampling. Scientific reports, 8(1):1–9.
 Clifford and Clifford (2018) Clifford, P. and Clifford, R. (2018). The classical complexity of boson sampling. pages 146–155.
 Collins (2003) Collins, B. (2003). Moments and cumulants of polynomial random variables on unitary groups, the ItzyksonZuber integral, and free probability. International Mathematics Research Notices, 2003(17):953–982.
 Collins and Śniady (2006) Collins, B. and Śniady, P. (2006). Integration with respect to the Haar measure on unitary, orthogonal and symplectic group. Communications in Mathematical Physics, 264(3):773–795.
 Crespi et al. (2013) Crespi, A., Osellame, R., Ramponi, R., Brod, D. J., Galvão, E. F., Spagnolo, N., Vitelli, C., Maiorino, E., Mataloni, P., and Sciarrino, F. (2013). Integrated multimode interferometers with arbitrary designs for photonic boson sampling. Nature Photonics, 7(7):545–549.
 Dalzell et al. (2018) Dalzell, A. M., Harrow, A. W., Koh, D. E., and La Placa, R. L. (2018). How many qubits are needed for quantum computational supremacy? arXiv preprint arXiv:1805.05224.

Feller (1968)
Feller, W. (1968).
An introduction to probability theory and its applications.  Vol. 1
. Wiley.  Glynn (2010) Glynn, D. G. (2010). The permanent of a square matrix. European Journal of Combinatorics, 31(7):1887–1891.
 Grier and Schaeffer (2016) Grier, D. and Schaeffer, L. (2016). New hardness results for the permanent using linear optics. arXiv preprint arXiv:1610.04670.
 Guan (1998) Guan, D.J. (1998). Generalized Gray codes with applications. Proceedings of the National Science Council. ROC(A), 22(6):841–848.
 Harrow and Montanaro (2017) Harrow, A. W. and Montanaro, A. (2017). Quantum computational supremacy. Nature, 549(7671):203–209.
 Latmiral et al. (2016) Latmiral, L., Spagnolo, N., and Sciarrino, F. (2016). Towards quantum supremacy with lossy scattershot boson sampling. New Journal of Physics, 18(11).
 Lund et al. (2017) Lund, A., Bremner, M. J., and Ralph, T. (2017). Quantum sampling problems, BosonSampling and quantum supremacy. arXiv:1702.03061.
 Marcus and Minc (1965) Marcus, M. and Minc, H. (1965). Permanents. The American Mathematical Monthly, 72(6):577–591.
 Neville et al. (2017) Neville, A., Sparrow, C., Clifford, R., Johnston, E., Birchall, P. M., Montanaro, A., and Laing, A. (2017). Classical boson sampling algorithms with superior performance to nearterm experiments. Nature Physics, 13(12):1153–1157.
 Nijenhuis and Wilf (1978) Nijenhuis, A. and Wilf, H. S. (1978). Combinatorial algorithms: for computers and calculators. Academic press.
 Petz and Réffy (2004) Petz, D. and Réffy, J. (2004). On asymptotics of large Haar distributed unitary matrices. Periodica Mathematica Hungarica, 49(1):103–117.
 Ryser (1963) Ryser, H. J. (1963). Combinatorial Mathematics, volume 14 of Carus Mathematical Monographs. American Mathematical Soc.
 Shchesnovich (2013) Shchesnovich, V. (2013). Asymptotic evaluation of bosonic probability amplitudes in linear unitary networks in the case of large number of bosons. International Journal of Quantum Information, 11(05):1350045.
 Shchesnovich (2019) Shchesnovich, V. (2019). On the classical complexity of sampling from quantum interference of indistinguishable bosons. arXiv preprint arXiv:1904.02013v3.
 Spagnolo et al. (2014) Spagnolo, N., Vitelli, C., Bentivegna, M., Brod, D. J., Crespi, A., Flamini, F., Giacomini, S., Milani, G., Ramponi, R., Mataloni, P., et al. (2014). Experimental validation of photonic boson sampling. Nature Photonics, 8(8):615–620.
 Spring et al. (2013) Spring, J. B., Metcalf, B. J., Humphreys, P. C., Kolthammer, W. S., Jin, X.M., Barbieri, M., Datta, A., ThomasPeter, N., Langford, N. K., Kundys, D., et al. (2013). Boson sampling on a photonic chip. Science, 339(6121):798–801.
 Terhal and DiVincenzo (2004) Terhal, B. M. and DiVincenzo, D. P. (2004). Adaptive quantum computation, constant depth quantum circuits and arthurmerlin games. In Quantum Information and Computation, volume 4, pages 134–145.
 Tichy (2011) Tichy, M. C. (2011). Entanglement and Interference of Identical Particles. PhD thesis, Freiburg University.
 Tillmann et al. (2013) Tillmann, M., Dakić, B., Heilmann, R., Nolte, S., Szameit, A., and Walther, P. (2013). Experimental boson sampling. Nature Photonics, 7(7):540–544.
 Wang et al. (2016) Wang, H., He, Y., Li, Y.H., Su, Z.E., Li, B., Huang, H.L., Ding, X., Chen, M.C., Liu, C., Qin, J., Li, J.P., He, Y.M., Schneider, C., Kamp, M., Peng, C.Z., Hoefling, S., Lu, C.Y., and Pan, J.W. (2016). Multiphoton bosonsampling machines beating early classical computers. arXiv:1612.06956.
 Wang et al. (2019) Wang, H., Qin, J., Ding, X., Chen, M.C., Chen, S., You, X., He, Y.M., Jiang, X., Wang, Z., You, L., et al. (2019). Boson sampling with 20 input photons in 60mode interferometers at state spaces. arXiv:910.09930.