The main contribution of this paper is a new paradigm in which to approach the question: how many resources does a computation cost to perform?
Traditionally, in theoretical computer science, this question has been approached using abstract computational models: the Turing machine, the λ-calculus, cellular automata, recursive functions, families of acyclic circuits, etc. These models typically abstract away the physical model of computation. While seemingly very different, all models have the following in common: a finite, non-empty set of states, and a set of transition functions that map between those states. A computation is then simply an ordered subset of these transition functions that, when applied consecutively, map any state (input) to another one (output). Despite this seeming simplicity, these models have been powerful enough to provide an enormous set of results, from uncomputability to complexity hierarchies. The limitation of this computing paradigm lies in the implicit assumption that the underlying physical model of computation is classical.
To remedy this, quantum models of computation have been introduced to either complement or replace the classical ones in algorithmic analysis. These quantum models (the quantum Turing machine deutsch97 , quantum cellular automata qca , quantum circuits, the one-way model one-way , etc.) merely replace the set of possible states mentioned above with an infinite one (countable or uncountable), and then set the transition functions to be unitary operators (or quantum super-operators in the case of some models).
Here we take this progression one step further. We begin by acknowledging both that information is physical, and that computation is a physical process. The complexity of computation, which accounts for spatial and temporal resources, is hence an accounting of the physical resources required for a computation. Since time, energy and quantum correlations are intertwined in the so-called quantum speed limit, the proper accounting of the cost of computation should include these quantities. The quantum speed limit taddei2013quantum ; deffner2017quantum ; campaioli2018tightening states that the time to transform any quantum state to another state under a physical map is no smaller than , given by
Here is the Bures angle between the two states, is related to the average energy margolus1998maximum and
is related to the average standard deviation of energy mandelstam1991uncertainty . Here is the instantaneous ground state of the time-dependent Hamiltonian and is the instantaneous standard deviation of the energy. As we will show below, holding the intrinsic resources spent to perform a computation constant and exploiting correlations rely on the quantum speed limit.
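As a rough numerical illustration of the quantities named above, the following sketch evaluates the Bures angle and the two textbook speed-limit expressions for a single qubit. It assumes a time-independent Hamiltonian, pure states, an orthogonal target state and units with ħ = 1; the function names are hypothetical, and the formulas are the standard Margolus-Levitin and Mandelstam-Tamm forms rather than necessarily the exact expression used in this paper.

```python
import numpy as np

def bures_angle(psi, phi):
    """Bures angle arccos|<psi|phi>| between two normalised pure states."""
    overlap = abs(np.vdot(psi, phi))
    return float(np.arccos(np.clip(overlap, 0.0, 1.0)))

def energy_stats(H, psi):
    """Return (mean energy above the ground state, energy standard deviation)."""
    mean = np.vdot(psi, H @ psi).real
    mean_sq = np.vdot(psi, H @ (H @ psi)).real
    ground = np.linalg.eigvalsh(H)[0]          # eigenvalues come back in ascending order
    return mean - ground, float(np.sqrt(max(mean_sq - mean**2, 0.0)))

def speed_limit_orthogonal(H, psi):
    """max(pi/(2E), pi/(2 dE)): minimum time to reach an orthogonal state (hbar = 1)."""
    E, dE = energy_stats(H, psi)
    return max(np.pi / (2 * E), np.pi / (2 * dE))

sigma_x = np.array([[0, 1], [1, 0]], dtype=complex)
zero = np.array([1, 0], dtype=complex)
one = np.array([0, 1], dtype=complex)

angle = bures_angle(zero, one)                 # angle between |0> and |1>
tau = speed_limit_orthogonal(sigma_x, zero)    # bound for flipping |0> under sigma_x
```

For this example both bounds coincide with the actual flip time of the qubit under the chosen Hamiltonian, so the limit is saturated.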
A physical system that stores information, and a physical system that carries out the computational process, must have Hamiltonians ascribed to them that account for these correlations. We hence make the following observation: every physical system has a Hamiltonian, and this Hamiltonian informs the correlations that decide the cost of computation. This leads us to what we consider the most fully fleshed out, yet still completely general model of computation. This model is presented graphically in Fig. 1 and in detail in the supplement. Several works on the thermodynamic cost of computation have also included the Hamiltonian to account for the energetic, entropic and correlation cost of computation del2011thermodynamic ; faist2015minimal ; faist2018fundamental . Previously, free-energy changes and correlations between the system and the environment have been discussed. However, the approach here of studying Hamiltonian correlations within the computer system itself is novel. In our proposal, the unitary problem is fixed, but the Hamiltonian paths are varied to exploit correlations. Beyond the addition of the Hamiltonians of every individual system, as previously discussed, we also see the addition of a battery. This allows us to simply hold the energetic resources constant while exploiting correlations to speed up computation.
While our computing machine is intrinsically quantum, and all computations are quantum evolutions in our model, we wish to discuss the particular case where our quantum computer performs a classical computation. By this we mean that the input and output of the computing machine are both classical states; the output is precisely that which would be output by a classical reversible circuit; the circuit has depth that grows asymptotically in the input as the time required for our computation; and, finally, the dynamics of the computer’s evolution closely mimic that of the circuit. A formal definition of a classical computation in our model is given in Def. 6. Next, we present our main result.
Coherent Speedup of Classical Computation
There is a well known tradeoff between the energy-per-second cost of a computation and the time-cost of a computation. To exemplify this tradeoff explicitly, consider the logical NOT gate. This can be implemented quantum mechanically by the application of the unitary operator
. The unitary operator in turn can be implemented by setting the Hamiltonian of the qubit system to for a time .
If instead we use the Hamiltonian , then we cut the time needed to complete the evolution of a gate in half. In general the time scales in inverse proportion to where we define
are the maximum and minimum eigenvalues of the Hamiltonian, respectively.
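The energy-time trade-off for a single NOT gate can be checked numerically. The sketch below assumes the Pauli-X matrix as the NOT unitary, takes the driving Hamiltonian to be Pauli-X itself (one convenient choice, not necessarily the one used in the paper), and reads the function of the maximum and minimum eigenvalues as their difference; `evolve`, `spread` and `same_up_to_phase` are hypothetical helper names.

```python
import numpy as np

def evolve(H, t):
    """Return exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return vecs @ np.diag(np.exp(-1j * vals * t)) @ vecs.conj().T

def spread(H):
    """Difference between the largest and smallest eigenvalue of H (assumed reading)."""
    vals = np.linalg.eigvalsh(H)
    return float(vals[-1] - vals[0])

def same_up_to_phase(A, B, tol=1e-9):
    """True if A equals B multiplied by a global phase factor."""
    k = np.argmax(np.abs(B))                 # flat index of the largest entry of B
    phase = A.flat[k] / B.flat[k]
    return bool(abs(abs(phase) - 1) < tol and np.allclose(A, phase * B, atol=tol))

NOT = np.array([[0, 1], [1, 0]], dtype=complex)   # Pauli-X
H_slow = NOT                                      # assumed Hamiltonian choice
H_fast = 2 * NOT                                  # same Hamiltonian at twice the energy scale

slow_ok = same_up_to_phase(evolve(H_slow, np.pi / 2), NOT)
fast_ok = same_up_to_phase(evolve(H_fast, np.pi / 4), NOT)
shifted = spread(H_slow + 5 * np.eye(2))          # shifting the energy origin changes nothing
```

Doubling the Hamiltonian doubles the eigenvalue difference and halves the evolution time, while a shift of the energy origin leaves the difference unchanged.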
The function is particularly useful in that it allows us to calculate the actual energy of the Hamiltonian. The most common method for doing so is to use the norm of the Hamiltonian . However, the origin of the energy scale has no physical meaning. Hence, the origin (minimum eigenvalue) has to be set to zero for the norm to be meaningful.
While the function is not actually a norm, it is often used as one as it gives a meaningful number in a more general context than the Hamiltonian norm. For instance, in metrology Kok2010 , it is used as part of the calculation of the generalised Heisenberg limit that bounds the speed of any quantum metrological procedure.
In the context of computation we can see that by increasing the value of p(H), where H is the Hamiltonian driving the evolution of a quantum register during the implementation of a quantum gate, any computation can be sped up, almost arbitrarily so, at the trade-off cost of increasing the instantaneous energy of the system, or energy-per-second. (Footnote 1: As S. Lloyd points out lloyd , this speed-up is not quite arbitrary. If one increases the instantaneous energy of the system too high, one runs the risk of creating a black hole where one’s computer used to exist.)
Hence, in order for time complexity of algorithms to remain meaningful within our model we must set a limit , where is constant that can scale at most linearly in the size of the input .
Having bounded the norm of the Hamiltonian it may seem that our model offers no benefits over well-established models, like the circuit model, that more simply set a unitary gate-set, and quantify the use of said gates. A central advantage of our model is that it allows us to discern, and take advantage of, resources that are not readily discernable in a model based on unitary operators. One such resource is quantum correlations. We use the NOT gate to exemplify how to exploit this resource.
If one wishes to speed-up a single NOT gate there is nothing one can do beyond increasing the energy of the system (which we’ve established now we cannot do). Suppose, however, we wish to implement two such gates, one each on two separate qubits. The traditional way of implementing these two gates in parallel is to apply the unitary to each individual qubit (see Fig. 2). Collectively the system’s Hamiltonian is set to , and evolved for time .
Another way to implement this joint operation is coherently. Instead of the Hamiltonian used above we use . Note that . So, in both cases the system needs to be evolved under the appropriate Hamiltonian for time . However, p(H) is a resource for state transformations (and hence computation), since the bound clearly depends on this quantity. Hence, to compare the parallel and the coherent strategies fairly, one must fix the resources, namely p(H). Since whereas , we can scale , and stay within the same norm limit as the parallel implementation . Therefore, implements both NOT gates in half the time that requires. This argument is at the heart of the quantum advantage in energy storage Binder2015 ; Campaioli2017 .
We now extend this argument to implementing the Toffoli gate in a coherently parallel fashion. Since the Toffoli gate is universal for reversible classical computation, this means that arbitrary classical computation can be made coherently parallel using the arguments described here. Note that any classical computation can be done reversibly with the addition of the necessary ancillary bits bennett1 ; bennett2 . This argument can be extended to any number of parallel gates, and gives an fold efficiency advantage for arbitrary computations.
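The two-qubit comparison can be reproduced numerically. This sketch assumes that the parallel Hamiltonian is the sum of one Pauli-X term per qubit and that the coherent Hamiltonian is the tensor product of the two Pauli-X operators, and it again reads p(H) as the difference between the extreme eigenvalues. The names `H_par`, `H_coh` and the helper functions are illustrative only.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def evolve(H, t):
    """Return exp(-i H t) for a Hermitian matrix H via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return vecs @ np.diag(np.exp(-1j * vals * t)) @ vecs.conj().T

def spread(H):
    """Largest minus smallest eigenvalue of H (assumed reading of p(H))."""
    vals = np.linalg.eigvalsh(H)
    return float(vals[-1] - vals[0])

H_par = np.kron(X, I2) + np.kron(I2, X)   # independent drive on each qubit (assumption)
H_coh = np.kron(X, X)                     # single correlated drive (assumption)
target = np.kron(X, X)                    # NOT applied to both qubits

def gate_fidelity(U):
    """|Tr(target^dagger U)| / dim, which is 1 exactly when U matches target up to a phase."""
    return abs(np.trace(target.conj().T @ U)) / target.shape[0]

par = gate_fidelity(evolve(H_par, np.pi / 2))
coh = gate_fidelity(evolve(H_coh, np.pi / 2))
coh_scaled = gate_fidelity(evolve(2 * H_coh, np.pi / 4))
```

Under these assumptions the unscaled coherent Hamiltonian has half the eigenvalue difference of the parallel one, so it can be doubled without exceeding the parallel budget, and the doubled version completes both NOT gates in half the time.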
Any classical reversible computation on bits can be implemented as a circuit consisting of layers each consisting of Toffoli gates. At each layer (time-step) one may choose to implement the Toffoli gates sequentially, in parallel, or coherently together using the method we described above. Using this latter method allows us to significantly reduce the time cost of the computation.
The time savings for each Toffoli gate of using the method we propose here is proportional to the total number of gates being implemented. Explicitly, if there are Toffoli gates in the current layer, then each Toffoli gate requires times the time required to implement the Toffoli gate in a standard fashion. This in turn implies that the time cost of implementing each layer (using the above method) is constant regardless of the number of Toffoli gates in said layer. This has the net effect of de facto reducing the computational complexity of any reversible circuit to (below) its depth complexity.
This advantage is maximised in the case of highly parallelisable reversible circuits (where at every depth). In such cases the above method has the effect of reducing the time required to run the algorithm by two polynomial orders (a algorithm can be performed with resources, an algorithm with resources etc.)
Any irreversible classical circuit over the gate set AND, NOT can be transformed into a reversible one over the gate set Toffoli using one of many techniques bennett1 ; bennett2 ; amy2017 . Hence, the above technique can be applied to any classical algorithm. However, to show that doing so would yield any benefit one needs to show that the conversion from a classical irreversible circuit to a reversible one does not increase the complexity of the circuit in such a way that it overwhelms any advantages gained from our coherent parallelisation method. Fortunately, Bennett’s original method, which consists of replacing all AND gates with Toffoli ones (and introducing the necessary ancillary bits), neither increases the computational time complexity nor changes the computational depth complexity. (Footnote 2: While it does increase the space complexity, this is irrelevant to our analysis here.) Hence, coherent parallelisation can be used to reduce the computational cost of any classical algorithm to below its depth complexity.
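The gate replacement at the core of this conversion can be illustrated at the level of bits. The sketch below is a minimal illustration of how a Toffoli gate with suitably initialised ancillary bits reproduces AND and NOT; it is not Bennett's full construction, and the function names are hypothetical.

```python
from itertools import product

def toffoli(a, b, c):
    """Reversible Toffoli gate: flip the target c exactly when a and b are both 1."""
    return a, b, c ^ (a & b)

def reversible_and(a, b):
    """AND written into a fresh ancillary bit initialised to 0."""
    return toffoli(a, b, 0)[2]

def reversible_not(c):
    """NOT obtained by fixing both control bits to 1."""
    return toffoli(1, 1, c)[2]

# Applying the gate twice restores every input, so no information is lost.
self_inverse = all(
    toffoli(*toffoli(a, b, c)) == (a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
and_table = [reversible_and(a, b) for a, b in product((0, 1), repeat=2)]
not_table = [reversible_not(c) for c in (0, 1)]
```

Each irreversible gate maps to exactly one Toffoli gate here, which is why gate count and depth are preserved while the number of bits grows.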
Coherent parallelisation, while being a computation-speedup technique, bears more resemblance to efficiency increasing methods in other fields—like metrology—rather than quantum algorithms. First, coherent parallelisation exploits the same type of correlations in the Hamiltonian that Heisenberg-limited metrology and quantum enhanced charging do. As such, it is an intrinsically quantum effect with no classical analogue. Second, this is quite different from the advantage that well-known quantum algorithms have over classical. Algorithms like Shor’s or Grover’s display an advantage over classical counterparts when solving particular problems. Furthermore, an algorithm like Shor’s is different, logically and dynamically, from any classical algorithm that attempts to solve the same problem.
By comparison, the speed-up method we showcased in the previous section can be used to accelerate any possible classical computation. This speed-up is such that it can reduce the cost of solving a problem by up to two polynomial orders. This is clearly less than the advantage that Shor’s algorithm has over classical factorisation algorithms. However, coherent parallelisation is a method that can be applied much more generally. It is much more general than even Grover’s search algorithm. While, technically, Grover’s algorithm can be used to solve any problem within the class NP, it only gives an advantage for problems that do not (currently) have an algorithm that solves the problem more efficiently than brute-force search.
By contrast, coherent parallelisation provides an advantage on quite literally any problem. The proportion of this advantage grows the more parallelisable the problem is. The maximum advantage is achieved for problems that can be efficiently solved using parallel computation using low (logarithmic) depth circuits, i.e. problems in the class NC. Any such problem can be solved with coherent parallelisation in logarithmic time. Hence coherent parallelisation is particularly well-suited for speeding up ubiquitous mathematical tasks such as matrix multiplication, and physically important tasks such as Monte Carlo simulations, genetic algorithms and many particle-physics simulations. Other computations that are particularly well-suited for coherent parallelisation and worth mentioning due to their real-world applications include machine-learning tasks like hyperparameter grid search and important cryptographic tasks such as proof-of-work in crypto-currencies and blockchain technologies.
On the opposite front, it is important to differentiate coherent parallelisation from standard parallel computation. First note that standard parallelisation can be (and often is) used to speed up a computation. It does not, however, reduce the computational cost (complexity) of a computation. Whether one performs a series of gates sequentially, or in parallel, one must pay the (energetic) costs of said gates. Further, the time to perform each gate remains unchanged.
By contrast coherent parallelisation does reduce the time to implement each gate by a factor proportional to the number of gates coherently parallelised.
In other words, both coherent and traditional parallelisation speed up a computation by a factor of where is the (minimum) number of gates that can be performed in parallel at each time step, because all these gates are performed simultaneously. However, coherent parallelisation further reduces the time to perform each gate by a factor of . This translates into a further speed-up, over and above parallel computation, by a factor of . This gives coherent parallelisation a time speed-up factor over sequential algorithms of .
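The accounting in this paragraph can be written out as a few lines of arithmetic. The sketch below is an idealised illustration that assumes every gate takes the same unit time in the sequential case and that a layer of several coherently parallelised gates runs each gate faster by a factor equal to the number of gates in that layer; `run_times` is a hypothetical helper, not part of the paper.

```python
def run_times(gates_per_layer, gate_time=1.0):
    """Return (sequential, parallel, coherent) run times for a layered circuit.

    gates_per_layer lists the (positive) number of gates in each layer.
    """
    sequential = gate_time * sum(gates_per_layer)           # one gate after another
    parallel = gate_time * len(gates_per_layer)             # one gate time per layer
    coherent = sum(gate_time / n for n in gates_per_layer)  # each layer sped up by its width
    return sequential, parallel, coherent

# Three layers with four gates each.
seq, par, coh = run_times([4, 4, 4])
speedup_parallel = seq / par      # factor gained by running gates simultaneously
speedup_coherent = seq / coh      # total factor for coherent parallelisation
```

With four gates per layer the parallel speed-up is 4 and the coherent speed-up is 16, i.e. the square of the layer width, matching the two factors described in the text.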
A possible objection to the above analysis would be that we have changed the rules of the game to our advantage by changing the gate sets. In computational complexity theory and algorithm analysis a standard fixed gate set (usually AND, NOT ) is used across the board when comparing algorithms. In discussing quantum computation this has always been a shorthand, however, and not a very accurate one at that. While computational cost accounting done by counting the number of gates is certainly convenient, it often hides the true cost of computation. In reality, not all gates have the same cost, either energy- or time-wise. A more accurate resource accounting is, and has always been, to measure the total energy needed to perform a computation. According to this metric, coherent parallelisation performs admirably.
It is important to note that the energy costs of a computation cannot be properly gleaned, nor can the advantages of a method such as coherent parallelisation be recognised, by looking solely at a quantum-circuit (or similar) model of computation. To achieve that one needs to look directly at the physical properties of the processes performing the computation, i.e. the Hamiltonians themselves. We close by arguing that while coherent parallelisation has strong practical and theoretical implications on its own, it is perhaps most important in its role as a showcase for this approach to studying computation. We believe that in the long run it is this new approach to studying computation, with the models and frameworks that include correlations and physical Hamiltonians into computational complexity theory, that will be the most fruitful contribution of this work.
SV acknowledges support from IITB-IRCC grant number 16IRCCSG019 and from the National Research Foundation, Prime Minister’s Office, Singapore under its Competitive Research Programme (CRP Award No. NRF-CRP14-2014-02).
- (1) Dorner, U. et al. Optimal quantum phase estimation. Physical review letters 102, 040403 (2009).
- (2) Giovannetti, V., Lloyd, S. & Maccone, L. Quantum metrology. Physical review letters 96, 010401 (2006).
- (3) Zwierz, M., Pérez-Delgado, C. A. & Kok, P. General optimality of the heisenberg limit for quantum metrology. Phys. Rev. Lett. 105, 180402 (2010).
- (4) Giovannetti, V., Lloyd, S. & Maccone, L. Advances in quantum metrology. Nature photonics 5, 222 (2011).
- (5) Degen, C. L., Reinhard, F. & Cappellaro, P. Quantum sensing. Reviews of modern physics 89, 035002 (2017).
- (6) Pérez-Delgado, C. A., Pearce, M. E. & Kok, P. Fundamental limits of classical and quantum imaging. Phys. Rev. Lett. 109, 123601 (2012).
- (7) Brida, G., Genovese, M. & Berchera, I. R. Experimental realization of sub-shot-noise quantum imaging. Nature Photonics 4, 227 (2010).
- (8) Binder, F. C., Vinjanampathy, S., Modi, K. & Goold, J. Quantacell: powerful charging of quantum batteries 49, 143001 (2015). eprint 1505.07835.
- (9) Campaioli, F. et al. Enhancing the Charging Power of Quantum Batteries. Physical Review Letters 118 (2017). eprint 1612.04991.
- (10) Ferraro, D., Campisi, M., Andolina, G. M., Pellegrini, V. & Polini, M. High-power collective charging of a solid-state quantum battery. Physical review letters 120, 117702 (2018).
- (11) Le, T. P., Levinsen, J., Modi, K., Parish, M. M. & Pollock, F. A. Spin-chain model of a many-body quantum battery. Physical Review A 97, 022106 (2018).
- (12) Deutsch, D. Quantum theory, the church–turing principle and the universal quantum computer. Proc. R. Soc. A 400, 97–117 (1985).
- (13) Pérez-Delgado, C. A. & Cheung, D. Local unitary quantum cellular automata. Phys. Rev. A 76, 032320 (2007).
- (14) Raussendorf, R. & Briegel, H. J. A one-way quantum computer. Phys. Rev. Lett. 86, 5188–5191 (2001).
- (15) Taddei, M. M., Escher, B. M., Davidovich, L. & de Matos Filho, R. L. Quantum speed limit for physical processes. Physical review letters 110, 050402 (2013).
- (16) Deffner, S. & Campbell, S. Quantum speed limits: from Heisenberg's uncertainty principle to optimal quantum control. Journal of Physics A: Mathematical and Theoretical 50, 453001 (2017).
- (17) Campaioli, F., Pollock, F. A., Binder, F. C. & Modi, K. Tightening quantum speed limits for almost all states. Physical review letters 120, 060409 (2018).
- (18) Margolus, N. & Levitin, L. B. The maximum speed of dynamical evolution. In Physica D, vol. 120, 188–195 (Elsevier Science Publishers BV, 1998).
- (19) Mandelstam, L. & Tamm, I. The uncertainty relation between energy and time in non-relativistic quantum mechanics. In Selected Papers, 115–123 (Springer, 1991).
- (20) Del Rio, L., Åberg, J., Renner, R., Dahlsten, O. & Vedral, V. The thermodynamic meaning of negative entropy. Nature 474, 61 (2011).
- (21) Faist, P., Dupuis, F., Oppenheim, J. & Renner, R. The minimal work cost of information processing. Nature communications 6, 7669 (2015).
- (22) Faist, P. & Renner, R. Fundamental work cost of quantum processes. Physical Review X 8, 021011 (2018).
- (23) Bennett, C. H. Logical reversibility of computation. IBM journal of Research and Development 17, 525–532 (1973).
- (24) Bennett, C. H. Time/space trade-offs for reversible computation. SIAM Journal on Computing 18, 766–776 (1989).
- (25) Amy, M., Roetteler, M. & Svore, K. M. Verified compilation of space-efficient reversible circuits. In International Conference on Computer Aided Verification, 3–21 (Springer, 2017).
- (26) Lloyd, S. Ultimate physical limits to computation. Nature 406, 1047 (2000).
We begin with a formal definition of the function that we have used in the main body of the paper.
Let be any Hermitian operator. Then
where , are the maximum and minimum eigenvalues of respectively.
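The symbols of this definition did not survive in the text above. One reading that is consistent with the surrounding discussion (a function of the extreme eigenvalues that ignores the origin of the energy scale and is not a norm) is the eigenvalue spread. The block below states that assumed form, together with the elementary properties that follow from it, for a Hermitian operator $H$ and real numbers $a$, $c$:

```latex
\begin{align}
  p(H) &= \lambda_{\max}(H) - \lambda_{\min}(H)
       && \text{(assumed form of the definition)} \\
  p(H + c\,\mathbb{1}) &= p(H)
       && \text{(independent of the energy origin)} \\
  p(aH) &= |a|\,p(H)
       && \text{(scales with the energy scale)} \\
  p(c\,\mathbb{1}) &= 0
       && \text{(vanishes on nonzero operators, hence not a norm)}
\end{align}
```

When the minimum eigenvalue is set to zero, this quantity coincides with the operator norm of $H$, which matches the remark in the main text about fixing the origin.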
Next is a discussion of the model of computation we are presenting for the first time in this paper.
Model of Computation.— We start this section with a formal definition of our computational model:
Definition 2 (Computing Machine).
A Computing Machine (CM) consists of a closed physical system with three subsystems : the battery, control, and input/output systems respectively.
consists of an unbounded number of two-dimensional subsystems each with Hamiltonian .
consists of a countably infinite dimensional system with Hamiltonian that can be arbitrarily chosen. All but a finite subsystem of dimension is set to the ground state of at the beginning of this computation. The state () of the subsystem at the start (end) of the computation is called the input (output).
exchanges energy with the battery subsystem to power the application of a Hamiltonian for a time to the input/output system during computation. This can be done with standard energy conserving unitaries. This Hamiltonian is such that
where is the size of the input and is a constant that at most scales linearly in .
The purpose of our model of computation is to act as the most general abstraction of a natural process that can perform computation, without ignoring any of the necessary physical properties of such a process.
The purpose of the battery subsystem is to account for the energy required to perform the computation. In order to be able to compare meaningfully different computations, a standard battery Hamiltonian is chosen for every possible computer and computation performed.
The input/output subsystem is how the computer communicates with the external world and meaningfully performs computation. The subsystem is initialised to the input state before computation. At the end of the computation the subsystem should then hold the output state. An arbitrary Hamiltonian for this system is allowed in order to be able to model, and quantify the energetic resources in, computation on different information carriers. These information carriers can be anything from ions in a trap or potential well, to nuclei, to anyons, depending on the actual implementation of the quantum computer, and they all have different real-world Hamiltonians.
While we leave the possibility open in our model to any possible input/output system, we will be particularly interested in (and restrict further discussion to) systems with a homogenous repeating structure, e.g. spin particles each with Hamiltonian and pairwise Ising interaction.
Finally, the sole purpose of the control subsystem is to provide a locus for the computation itself. It mediates between the battery and the input/output system, and performs the computation itself by drawing power from the former, and applying an external Hamiltonian to the latter. This Hamiltonian is time dependent, and arbitrarily chosen based on the computation to be performed. As mentioned in the main text, in order to maintain the meaningfulness of algorithmic time complexity within our model we must bound from above. Our bound is dependent on to allow for parallel computation (performing multiple gates on different qubits at the same time). If we further limit to be a constant independent of we can define a sequential computing machine. In this paper we focus on the more general (parallel) model.
Tied to computation are the concepts of problem and algorithm. In theoretical computer science it is common to focus attention on decision problems, and algorithms that solve these. This is inadequate for our needs as we wish to explore the full gamut of possibilities with respect to transformations on quantum states. Hence we set the following definitions:
Definition 3 (Problem).
Let be the set of all density operators. A problem is a function
such that for every the dimensionality of the support of is the same as that of .
Clearly, not all problems are solvable (at least by a quantum computer). For those problems that are solvable the function can be represented as a unitary operator . We call these problems unitary.
Definition 4 (Algorithm).
Given a unitary problem an algorithm that solves the problem is a function
where is the set of all time-dependent Hamiltonian operators. If represents the input size then
is an ordered pair of a time-dependent Hamiltonian on qubits and a time such that
where is the usual time-ordering operator.
When the algorithm is clear we will use the following shorthand notation: and are the appropriate Hamiltonian and time required for an input of size . Similarly . Further, it will be convenient to define where to be the partial run of the computation for time .
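The time-ordered evolution that appears in this definition can be approximated numerically by a product of short steps, each with a constant Hamiltonian. The sketch below assumes ħ = 1 and a midpoint sampling rule; `time_ordered_evolution` and the example drive `ramp` are hypothetical names chosen for illustration.

```python
import numpy as np

def step(H, dt):
    """Return exp(-i H dt) for a Hermitian matrix H via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return vecs @ np.diag(np.exp(-1j * vals * dt)) @ vecs.conj().T

def time_ordered_evolution(hamiltonian, total_time, steps=500):
    """Approximate the time-ordered exponential of -i H(t) over [0, total_time]."""
    dt = total_time / steps
    dim = hamiltonian(0.0).shape[0]
    U = np.eye(dim, dtype=complex)
    for k in range(steps):
        t_mid = (k + 0.5) * dt                   # sample H at the midpoint of each slice
        U = step(hamiltonian(t_mid), dt) @ U     # later times multiply from the left
    return U

X = np.array([[0, 1], [1, 0]], dtype=complex)

def ramp(t):
    """Illustrative drive: a linearly ramped Pauli-X term whose integral over [0, 1] is pi/2."""
    return np.pi * t * X

U = time_ordered_evolution(ramp, 1.0)            # should act as a NOT gate up to a phase
partial = time_ordered_evolution(ramp, 0.5)      # partial run of the same computation
```

Because this example Hamiltonian commutes with itself at all times, the product reproduces the NOT gate up to a global phase; for non-commuting Hamiltonians the ordering of the factors matters and the step count controls the accuracy.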
While every computation is a quantum process in our model, we can meaningfully isolate the cases where these evolutions describe a classical, reversible computation. To do so, we must first define a computational basis.
Definition 5 (Computational basis).
Given a computing machine a computational basis for it is a basis for (a finite-dimensional subspace of) its input/output system such that for any finite
There are many ways (computational models) to describe classical computations. Here we use the well understood standard circuit model. We understand a uniform family of reversible circuits to consist of circuits consisting of only Toffoli gates, each acting on bits of input. Let be the circuit depth of . For every let be the result of running the circuit on the binary representation of as input. Further, for any let be the result of running the circuit truncated at depth , on input .
Also when discussing a particular algorithm described as a family of circuits, we will use , and to refer to its circuit-cost complexity and circuit-depth complexity respectively.
Definition 6 (Reversible Classical Computation).
Let be any computational basis. Then a classical computation is one where
For every , is such that for there exists a such that
There exists a uniform family of classical reversible circuits such that:
For every and for every input
The following holds:
For every , there exists a discrete set of time points , such that , and for every input , for every depth :
It is worth unpacking the above definition. Part 1 simply states that the computation maps computational basis states to computational basis states. Part 2 states that there exists a classical reversible circuit such that a) the output is the same for any particular input, b) the depth complexity of the circuit scales asymptotically in exactly the same way as the time required to run the computation in our machine and finally, c) there exist discrete time points during the computation at which the state of the input/output system is a computational basis state that corresponds exactly to the binary state the circuit would be in if run to the appropriate depth. It should be immediately clear from the above that the computing machine model defined here is a universal model of computation.
Coherent Parallelisation.— We now focus on our method for increasing the efficiency/speed of arbitrary classical computations.
Consider a quantum system with
identical sub-systems (qubits, qudits or the tensor product thereof). Let, for any Hermitian and/or unitary operator to mean applied to the subsystem. Formally:
For any Hermitian, unitary operator we define
We then have the following result.
Let be a Hermitian, unitary operator. Then, for any non-zero integer :
Since is both Hermitian and unitary, its only possible eigenvalues are . So either , or . If , then the lemma follows trivially. Therefore, let us assume . Let be valued eigenkets respectively of . Then
To complete the proof we must show that are the maximum and minimum valued (respectively) eigenkets of both and .
We show that the largest eigenvalue of is
. We proceed by contradiction. Assume there exists a vector such that
where and , and that this is the largest valued eigenket of . Given that is an eigenket of it must be that it may be written as
where each ket is an eigenket of . And further,
where in the last equality we used the facts that is an eigenket of , and that is both Hermitian and unitary. From here it follows that , which is a contradiction. An identical argument can be used to show that is the minimum eigenvalue of , and similar arguments can be used to show that the are the maximum/minimum eigenvalues of . ∎
The following theorem follows directly from the previous lemma, and our computing machine definition.
Theorem 1 (Coherent parallelisation).
Let be any Hermitian unitary gate acting on a -dimensional system or qudit. Implementing gates in parallel using a standard parallel computation implementation is times slower than using a coherent parallelisation approach .
Without loss of generality let the bound . Then, in order to use the standard parallelisation method within the bound set, one must use a normalised version of the parallel Hamiltonian . On the other hand, to implement the gates using coherent parallelisation one may use the standard coherent parallelisation Hamiltonian as defined above, since it is already normalised to . Then , as required. However,
Which shows that using the Hamiltonian one can implement the desired gate a factor of times faster than using . ∎
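The eigenvalue bookkeeping behind the lemma and Theorem 1 can be checked numerically for the Toffoli gate. The sketch below assumes that the parallel Hamiltonian is the sum of one copy of the gate per subsystem, that the coherent Hamiltonian is the tensor product of the copies, and that the relevant function is the difference between the extreme eigenvalues; all names are hypothetical.

```python
import numpy as np
from functools import reduce

def spread(H):
    """Largest minus smallest eigenvalue of a Hermitian matrix (assumed reading of p)."""
    vals = np.linalg.eigvalsh(H)
    return float(vals[-1] - vals[0])

def on_site(U, i, N):
    """U acting on subsystem i of N identical subsystems, identity elsewhere."""
    d = U.shape[0]
    factors = [U if j == i else np.eye(d) for j in range(N)]
    return reduce(np.kron, factors)

def parallel_hamiltonian(U, N):
    """Sum of one copy of U per subsystem (assumed parallel form)."""
    return sum(on_site(U, i, N) for i in range(N))

def coherent_hamiltonian(U, N):
    """Tensor product of N copies of U (assumed coherent form)."""
    return reduce(np.kron, [U] * N)

# Toffoli gate as an 8x8 permutation matrix: swap the basis states |110> and |111>.
toffoli = np.eye(8)
toffoli[[6, 7]] = toffoli[[7, 6]]

N = 3
p_par = spread(parallel_hamiltonian(toffoli, N))
p_coh = spread(coherent_hamiltonian(toffoli, N))
ratio = p_par / p_coh    # factor by which the parallel Hamiltonian must be scaled down
```

Under these assumptions the parallel Hamiltonian has an eigenvalue difference N times larger than the coherent one, so bringing both to the same budget slows the parallel implementation by a factor of N.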
Theorem 2 (Coherent parallelisation of reversible circuits).
Let be a uniform family of reversible circuits, and let be the implementation of said circuit as a computing machine algorithm, where is the time required to run on an input of size . The same computation can be performed using coherent parallelisation in time where .
First we note that the average number of gates in at each depth is given by . Hence, by Thm. 1 the implementation of the gates of at depth can be sped up on average by a factor of using coherent parallelisation over a standard implementation. Taking the behaviour at the asymptotic limit as gives us the desired result. ∎
Note that in the previous theorem we’re comparing a coherent parallelisation implementation to a standard computing machine implementation of a classical reversible circuit. However, this latter implementation is already parallel (all gates at any depth are taken to be implemented at once). Obviously, a parallel implementation has a speed factor advantage of over a sequential implementation. We’ve hence proven the following corollary.
Let be a uniform family of reversible circuits. The same computation can be performed using coherent parallelisation in time where .
We conclude with the following result.
Theorem 3 (Coherent parallelisation of classical circuits).
Let be a uniform family of classical circuits over the universal gate set NAND. The same computation can be performed using coherent parallelisation in time where .
For this proof we first convert to a reversible family of circuits that has both the same depth- and circuit-complexity, and then simply apply Corollary 1. For the first step we use a result by Bennett bennett1 ; bennett2 that states that any irreversible circuit family with space complexity , circuit depth complexity and circuit complexity can be perfectly simulated using a reversible circuit with space complexity , circuit depth complexity and circuit complexity . ∎
It is worth noting that there are many methods to convert an irreversible circuit into a reversible circuit, all of which have a space/depth complexity tradeoff. For our purposes, Bennett’s method is optimal as it allows us to reach the theoretical optimal time performance for coherent parallelisation. For many real-world applications it may be beneficial to consider newer irreversible-to-reversible transformation methods amy2017 .