Simulation of Quantum Circuits via Stabilizer Frames

by   Héctor J. García, et al.
University of Michigan

Generic quantum-circuit simulation appears intractable for conventional computers and may be unnecessary because useful quantum circuits exhibit significant structure that can be exploited during simulation. For example, Gottesman and Knill identified an important subclass, called stabilizer circuits, which can be simulated efficiently using group-theory techniques and insights from quantum physics. Realistic circuits enriched with quantum error-correcting codes and fault-tolerant procedures are dominated by stabilizer subcircuits and contain a relatively small number of non-Clifford components. Therefore, we develop new data structures and algorithms that facilitate parallel simulation of such circuits. Stabilizer frames offer more compact storage than previous approaches but require more sophisticated bookkeeping. Our implementation, called Quipu, simulates certain quantum arithmetic circuits (e.g., reversible ripple-carry adders) in polynomial time and space for equal superpositions of n-qubits. On such instances, known linear-algebraic simulation techniques, such as the (state-of-the-art) BDD-based simulator QuIDDPro, take exponential time. We simulate quantum Fourier transform and quantum fault-tolerant circuits using Quipu, and the results demonstrate that our stabilizer-based technique empirically outperforms QuIDDPro in all cases. While previous high-performance, structure-aware simulations of quantum circuits were difficult to parallelize, we demonstrate that Quipu can be parallelized with a nontrivial computational speedup.



page 1

page 2

page 3

page 4


Faster Schrödinger-style simulation of quantum circuits

Recent demonstrations of superconducting quantum computers by Google and...

An Algebraic Quantum Circuit Compression Algorithm for Hamiltonian Simulation

Quantum computing is a promising technology that harnesses the peculiari...

Bit-Slicing the Hilbert Space: Scaling Up Accurate Quantum Circuit Simulation to a New Level

Quantum computing is greatly advanced in recent years and is expected to...

A Tensor Network based Decision Diagram for Representation of Quantum Circuits

Tensor networks have been successfully applied in simulation of quantum ...

Cache Blocking Technique to Large Scale Quantum Computing Simulation on Supercomputers

Classical computers require large memory resources and computational pow...

Finding fault-tolerant Clifford circuits using satisfiability modulo theories solvers and decoding merged color-surface codes

Universal fault-tolerant quantum computers will require the use of effic...

Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits

Classical simulation of quantum computers will continue to play an essen...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Quantum information processing manipulates quantum states rather than conventional - bits. It has been demonstrated with a variety of physical technologies (NMR, ion traps, Josephson junctions in superconductors, optics) and used in recently developed commercial products. Examples of such products include MagiQ’s quantum key distribution system and ID-Quantique’s quantum random number generator. Shor’s factoring algorithm [24] and Grover’s search algorithm [13] apply the principles of quantum information to carry out computation asymptotically more efficiently than conventional computers. These developments fueled research efforts to design, build and program scalable quantum computers. Due to the high volatility of quantum information, quantum error-correcting codes (QECC) and effective fault-tolerant (FT) architectures are necessary to build reliable quantum computers. For instance, the work in [22] describes practical FT architectures for quantum computers, and [15] explores architectures for constructing reliable communication channels via distribution of high-fidelity EPR pairs in a large quantum computer. Most quantum algorithms are described in terms of quantum circuits and, just like conventional digital circuits, require functional simulation to determine the best FT design choices given limited resources. In particular, high-performance simulation is a key component in quantum design flows [25]

that facilitates analysis of trade-offs between performance and accuracy. Simulating quantum circuits on a conventional computer is a difficult problem. The matrices representing quantum gates, and the vectors that model quantum states grow exponentially with an increase in the number of

qubits – the quantum analogue of the classical bit. Several software packages have been developed for quantum-circuit simulation including Oemer’s Quantum Computation Language (QCL) [21] and Viamontes’ Quantum Information Decision Diagrams (QuIDD) implemented in the QuIDDPro package [26]. While QCL simulates circuits directly using state vectors, QuIDDPro uses a variant of binary decision diagrams to store state vectors more compactly in some cases. Since the state-vector representation requires excessive computational resources in general, simulation-based reliability studies (e.g. fault-injection analysis) of quantum FT architectures using general-purpose simulators has been limited to small quantum circuits [5]. Therefore, designing fast simulation techniques that target quantum FT circuits facilitates more robust reliability analysis of larger quantum circuits.

Fig. 1: Clifford gates: Hadamard (H), Phase (P) and controlled-NOT (CNOT).

Stabilizer circuits and states. Gottesman [12] and Knill identified an important subclass of quantum circuits, called stabilizer circuits, which can be simulated efficiently on classical computers. Stabilizer circuits are exclusively composed of Clifford gates – Hadamard, Phase and controlled-NOT gates (Figure 1) followed by one-qubit measurements in the computational basis. Such circuits are applied to a computational basis state (usually ) and produce output states known as stabilizer states. Because of their extensive applications in QECC and FT architectures, stabilizer circuits have been studied heavily [1, 12]. Stabilizer circuits can be simulated in polynomial-time by keeping track of the Pauli operators that stabilize111An operator is said to stabilize a state iff . the quantum state. Such stabilizer operators are maintained during simulation and uniquely represent stabilizer states up to an unobservable global phase.222According to quantum physics, the global phase of a quantum state is unobservable and does not need to be simulated. Thus, this technique offers an exponential improvement over the computational resources needed to simulate stabilizer circuits using vector-based representations.

Aaronson and Gottesman [1] proposed an improved technique that uses a bit-vector representation to simulate stabilizer circuits. Aaronson implemented this simulation approach in his CHP software package. Compared to other vector-based simulators (QuIDDPro, QCL) the technique in [1] does not maintain the global phase of a state and simulates each Clifford gate in time using space. The overall runtime of CHP is dominated by measurements, which require time to simulate.

Stabilizer-based simulation of generic circuits. We propose a generalization of the stabilizer formalism that admits simulation of non-Clifford gates such as Toffoli333The Toffoli gate is a -bit gate that maps (, , ) to (, , ). gates. This line of research was first outlined in [1], where the authors describe a stabilizer-based representation that stores an arbitrary quantum state as a sum of density-matrix444Density matrices are self-adjoint positive-semidefinite matrices of trace , that describe the statistical state of a quantum system [19]. terms. In contrast, we store arbitrary states as superpositions555 A superposition is a norm- linear combination of terms. of pure stabilizer states. Such superpositions are stored more compactly than the approach from [1], although we do not handle mixed stabilizer states. The key obstacle to the more efficient pure-state approach has been the need to maintain the global phase of each stabilizer state in a superposition, where such phases become relative. We develop a new algorithm to overcome this obstacle. We store stabilizer-state superpositions compactly using our proposed stabilizer frame data structure. To speed up relevant algorithms, we

store generator sets for each stabilizer frame in row-echelon form

to avoid expensive Gaussian elimination during simulation. The main advantages of using stabilizer-state superpositions to simulate quantum circuits are:

  • Stabilizer subcircuits are simulated with high efficiency.

  • Superpositions can be restructured and compressed on the fly during simulation to reduce resource requirements.

  • Operations performed on such superpositions can be computed in parallel and lend themselves to distributed or asynchronous processing.

Our stabilizer-based technique simulates certain quantum arithmetic circuits in polynomial time and space for input states consisting of equal superpositions of computational-basis states. On such instances, well-known generic simulation techniques take exponential time. We simulate various quantum Fourier transform and quantum fault-tolerant circuits, and the results demonstrate that our data structure leads to orders-of-magnitude improvement in runtime and memory as compared to state-of-the-art simulators.

In the remaining part of this document, we assume at least a superficial familiarity with quantum computing. Section 2 describes key concepts related to quantum-circuit simulation and the stabilizer formalism. In Section 3, we introduce stabilizer frames and describe relevant algorithms. Section 4 describes in detail our simulation flow implemented in Quipu, and Section 5 discusses a parallel implementation of our technique. In Section 6, we describe empirical validation of Quipu and in single- and multi-threaded variants, as well as comparisons with state-of-the-art simulators. Section 7 closes with concluding remarks.

2 Background and Previous Work

Quantum information processes, including quantum algorithms, are often modeled using quantum circuits and, just like conventional digital circuits, are represented by diagrams [19, 26]. Quantum circuits are sequences of gate operations that act on some register of qubits – the basic unit of information in a quantum system. The quantum state of a single qubit is described by a two-dimensional complex-valued vector. In contrast to classical bits, qubits can be in a superposition of the and states. Formally, , where and are the two-dimensional computational basis states and are probability amplitudes that satisfy . An

-qubit register is the tensor product of

single qubits and thus is modeled by a complex vector , where each is a binary string representing the value of each basis state. Furthermore, satisfies . Each gate operation or quantum gate is a unitary matrix that operates on a small subset of the qubits in a register. For example, the quantum analogue of a NOT gate is the operator ,

Similarly, the two-qubit CNOT operator flips the second qubit (target) iff the first qubit (control) is set to , e.g.,

Another operator of particular importance is the Hadamard (H), which is frequently used to put a qubit in a superposition of computational-basis states, e.g.,

Note that the H gate generates unbiased superpositions in the sense that the squares of the absolute value of the amplitudes are equal.

The dynamics involved in observing a quantum state are described by non-unitary measurement operators [19, Section 2.2.3]. There are different types of quantum measurements, but the type most pertinent to our discussion comprises projective measurements in the computational basis, i.e., measurements with respect to the or basis states. The corresponding measurement operators are and , respectively. The probability of obtaining outcome on the qubit of state is given by the inner product , where is the conjugate transpose of . For example, the probability of obtaining upon measuring is

Cofactors of quantum states. The output states obtained after performing computational-basis measurements are called cofactors, and are states of the form and . These states are orthogonal to each other and add up to the original state. The norms of cofactors and the original state are subject to the Pythagorean theorem. We denote the - and -cofactor by and , respectively, where is the index of the measured qubit. One can also consider iterated cofactors, such as double cofactors , , and . Cofactoring with respect to all qubits produces amplitudes of individual basis vectors. Readers familiar with cofactors of Boolean functions can use intuition from logic optimization and Boolean function theory.

2.1 Quantum circuits and simulation

To simulate a quantum circuit , we first initialize the quantum system to some desired state (usually a basis state). can be represented using a fixed-size data structure (e.g., an array of complex numbers) or a variable-size data structure (e.g., algebraic decision diagram). We then track the evolution of via its internal representation as the gates in are applied one at a time, eventually producing the output state [1, 19, 26]. Most quantum-circuit simulators [8, 20, 21, 26] support some form of the linear-algebraic operations described earlier. The drawback of such simulators is that their runtime and memory requirements grows exponentially in the number of qubits. This holds true not only in the worst case but also in practical applications involving quantum arithmetic and quantum FT circuits.

Gottesman developed a simulation method involving the Heisenberg model [12] often used by physicists to describe atomic-scale phenomena. In this model, one keeps track of the symmetries of an object rather than represent the object explicitly. In the context of quantum-circuit simulation, this model represents quantum states by their symmetries, rather than complex-valued vectors and amplitudes. The symmetries are operators for which these states are

-eigenvectors. Algebraically, symmetries form

group structures, which can be specified compactly by group generators [14].

2.2 The stabilizer formalism

A unitary operator stabilizes a state iff is a –eigenvector of , i.e., . We are interested in operators derived from the Pauli matrices: , and the identity . The one-qubit states stabilized by the Pauli matrices are:

: :
: :
: :
TABLE I: Multiplication table for Pauli matrices. Shaded cells indicate anticommuting products.

Observe that stabilizes all states and does not stabilize any state. Thus, the entangled state is stabilized by the Pauli operators , , and . As shown in Table I, it turns out that the Pauli matrices along with and the multiplicative factors , , form a closed group under matrix multiplication [19]. Formally, the Pauli group on qubits consists of the -fold tensor product of Pauli matrices, such that and . For brevity, the tensor-product symbol is often omitted so that is denoted by a string of , , and characters or Pauli literals, and a separate integer value for the phase . This string-integer pair representation allows us to compute the product of Pauli operators without explicitly computing the tensor products,666This holds true due to the identity: . e.g., . Since , can have at most irredundant generators [19]. The key idea behind the stabilizer formalism is to represent an -qubit quantum state by its stabilizer group – the subgroup of that stabilizes .

Theorem 1

For an -qubit pure state and , . If , is specified uniquely by and is called a stabilizer state.

(i) To prove that is commutative, let such that . If and anticommute, - - - -. Thus, and cannot both be elements of .

(ii) To prove that every element of is of degree , let such that . Observe that for . Since , we obtain and .

(iii) From group theory, a finite Abelian group with identity element such that for every element in the group must be .

(iv) We now prove that . First note that each independent generator imposes the linear constraint on the -dimensional vector space. The subspace of vectors that satisfy such a constraint has dimension , or half the space. Let be the set of generators for . We add independent generators to one by one and impose their linear constraints, to limit to the shared -eigenvector. Thus the size of is at most . In the case , the independent generators reduce the subspace of possible states to dimension one. Thus, is uniquely specified.

The proof of Theorem 1 shows that is specified by only irredundant stabilizer generators. Therefore, an arbitrary -qubit stabilizer state can be represented by a stabilizer matrix whose rows represent a set of generators for . (Hence we use the terms generator set and stabilizer matrix interchangeably.) Since each is a string of Pauli literals, the size of the matrix is . The fact that implies that the leading phase of can only be  and not .777Suppose the phase of is , then - which is not possible since - does not stabilize any state. Therefore, we store the phases of each separately using a binary vector of size .

The storage cost for is , which is an exponential improvement over the cost often encountered in vector-based representations.

Example 1. The state is uniquely specified by any of the following matrices: , , . One obtains from by left-multiplying the second row by the first. Similarly, is obtained from or via row multiplication. Observe that multiplying any row by itself yields , which stabilizes . However, cannot be used as a generator because it is redundant and carries no information about the structure of . This holds true in general for of any size.

Theorem 1 suggests that Pauli literals can be represented using two bits, e.g., , , and . Therefore, a stabilizer matrix can be encoded using an binary matrix or tableau. This approach induces a linear map because vector addition in is equivalent to multiplication of Pauli operators up to a global phase. The tableau implementation of the stabilizer formalism is covered in [1, 19].

Observation 1. Consider a stabilizer state represented by a set of generators of its stabilizer group . Recall from the proof of Theorem 1 that, since , each generator imposes a linear constraint on . Therefore, the set of generators can be viewed as a system of linear equations whose solution yields the (for some between and ) non-zero computational-basis amplitudes that make up . Thus, one needs to perform Gaussian elimination to obtain such basis amplitudes from a generator set.  
Canonical stabilizer matrices. Observe from Example 1 that, although stabilizer states are uniquely determined by their stabilizer group, the set of generators may be selected in different ways. Any stabilizer matrix can be rearranged by applying sequences of elementary row operations in order to obtain the row-reduced echelon form structure depicted in Figure 2b. This particular stabilizer-matrix structure defines a canonical representation for stabilizer states [10, 12]. The elementary row operations that can be performed on a stabilizer matrix are transposition, which swaps two rows of the matrix, and multiplication, which left-multiplies one row with another. Such operations do not modify the stabilizer state and resemble the steps performed during a Gaussian elimination888Since Gaussian elimination essentially inverts the matrix, this could be sped up to time by using fast matrix inversion algorithms. However, - time Gaussian elimination seems more practical. procedure. Several row-echelon (standard) forms for stabilizer generators along with relevant algorithms to obtain them have been introduced in the literature [3, 12, 19].

Stabilizer-circuit simulation. The computational-basis states are stabilizer states that can be represented by the stabilizer-matrix structure depicted in Figure 2a. In this matrix form, the sign of each row along with its corresponding -literal designates whether the state of the qubit is () or (). Suppose we want to simulate circuit . Stabilizer-based simulation first initializes to specify some basis state. Then, to simulate the action of each gate , we conjugate each row of by .999 Since , the resulting state is stabilized by because . We require that the conjugation maps to another string of Pauli literals so that the resulting matrix is well-formed. It turns out that the H, P and CNOT gates have such mappings, i.e., these gates conjugate the Pauli group onto itself [12, 19]. Table II lists mappings for the H, P and CNOT gates.



Fig. 2: (a) Stabilizer-matrix structure for basis states. (b) Row-echelon form for stabilizer matrices. The -block contains a minimal set of generators with literals. Generators with literals only appear in the -block.
Gate Input Output
Gate Input Output
TABLE II: Conjugation of Pauli-group elements by Clifford gates [19]. For the case, subscript indicates the control and the target.

Example 2. Suppose we simulate a CNOT gate on . Using the stabilizer representation, . The rows of stabilize as required.

Since H, P and CNOT gates are directly simulated via the stabilizer formalism, these gates are also known as stabilizer gates and any circuit composed exclusively of such gates is called a unitary stabilizer circuit. Table II shows that at most two columns of are updated when a Clifford (stabilizer) gate is simulated. Therefore, such gates are simulated in time. Furthermore, for any pair of Pauli operators and , , where if and commute, and otherwise. Thus, Pauli gates can also be simulated in linear time as they only permute the phase vector of the stabilizer matrix.

Theorem 2

An -qubit stabilizer state can be obtained by applying a stabilizer circuit to the computational-basis state.

The work in [1] represents the generators using a tableau, and then shows how to construct a canonical stabilizer circuit from the tableau. We refer the reader to [1, Theorem 8] for details of the proof. Algorithms for obtaining more compact canonical circuits are discussed in [11].

Corollary 3

An -qubit stabilizer state can be transformed by Clifford gates into the computational-basis state.

Since every stabilizer state can be produced by applying some unitary stabilizer circuit to the state, it suffices to reverse to perform the inverse transformation. To reverse a stabilizer circuit, reverse the order of gates and replace every gate with .

The stabilizer formalism also admits measurements in the computational basis [12]. Conveniently, the formalism avoids the direct computation of measurement operators and inner products (Section 2). However, the updates to for such gates are not as efficient as for Clifford gates. Note that any qubit in a stabilizer state is either in a () state or in an unbiased superposition of both. The former case is called a deterministic outcome and the latter a random outcome. We can tell these cases apart in time by searching for or literals in the column of . If such literals are found, the qubit must be in a superposition and the outcome is random with equal probability (); otherwise the outcome is deterministic ( or ).

Case 1 – randomized outcomes: one flips an unbiased coin to decide the outcome and then updates to make it consistent with the outcome. Let be a row in with an literal in its position, and let be the Pauli operator with a literal in its position and everywhere else. The phase of is set to if and if . Observe that and anticommute. If any other rows in anticommute with , multiply them by to make them commute with . Then, replace with . Since this process requires up to row multiplications, the overall runtime is .

Case 2 – deterministic outcomes: no updates to are necessary but we need to figure out whether the qubit is in the or state, i.e., whether the qubit is stabilized by or -. One approach is to perform Gaussian elimination (GE) to put in row-echelon form. This removes redundant literals from and makes it possible to identify the row containing a in its position and ’s everywhere else. The phase of such a row decides the outcome of the measurement. Since this is a GE-based approach, it takes time in practice.

The work in [1] improved the runtime of deterministic measurements by doubling the size of to include destabilizer generators in addition to the stabilizer generators. Such destabilizer generators help identify which specific row multiplications to compute in order to decide the measurement outcome. This approach avoids GE and thus deterministic measurements are computed in time. In Section 3, we describe a different approach that computes such measurements in linear time without extra storage but with an increase in runtime when simulating Clifford gates.

In quantum mechanics, the states and are considered phase-equivalent because does not affect the statistics of measurement. Since the stabilizer formalism simulates Clifford gates via their action-by-conjugation, such global phases are not maintained.

Example 3. Suppose we have state , which is stabilized by -. Conjugating the stabilizer by the Phase gate yields -=-. However, in the state-vector representation, . Thus the global phase is not maintained by the stabilizer.

Since global phases are unobservable, they do not need to be maintained when simulating a single stabilizer state. However, in Section 4, we show that such phases must be maintained when dealing with stabilizer-state superpositions, where global phases become relative.

3 Stabilizer Frames: Data Structure
and Algorithms

The Clifford gates by themselves do not form a universal set for quantum computation [1, 19]. However, the Hadamard and Toffoli () gates do [2]. To simulate and other non-Clifford gates, we extend the formalism to include the representation of arbitrary quantum states as superpositions of stabilizer states.

Example 4. Recall from Section 2.2 that computational-basis states are stabilizer states. Thus, any one-qubit state is a superposition of the stabilizer states and . In general, any state decomposition in a computational basis is a stabilizer superposition.

Suppose in Example 3 is an unbiased state such that where . Then can be represented using a single stabilizer state instead of two (up to a global phase). The key idea behind our technique is to identify and compress large unbiased superpositions on the fly during simulation to reduce resource requirements. To this end, we leverage the following observation and derive a compact data structure for representing stabilizer-state superpositions.

Observation 2. Given an -qubit stabilizer state , there exists an orthonormal basis including and consisting entirely of stabilizer states. One such basis is obtained directly from the stabilizer of by changing the signs of an arbitrary, non-empty subset of generators of , i.e., by modifying the phase vector of the stabilizer matrix for .101010Let and be the stabilizer groups for and , respectively. If there exist and such that -, and are orthogonal since is a -eigenvector of and is a -eigenvector of . Thus, one can produce additional orthogonal stabilizer states. Such states, together with , form an orthonormal basis. This basis is unambiguously specified by a single stabilizer state, and any one basis state specifies the same basis.

Definition 1. An -qubit stabilizer frame is a set of stabilizer states that forms an orthogonal subspace basis in the Hilbert space. We represent by a pair consisting of () a stabilizer matrix and () a set of distinct phase vectors , where . We use to denote the ordered assignment of the elements in as the ()-phases of the rows in . Therefore, state is represented by . The size of the frame, which we denote by , is equal to .

Each phase vector can be viewed as a binary (-) encoding of the integer index that denotes the respective basis vector. Thus, when dealing with qubits or less, a phase vector can be compactly represented by a -bit integer (modern CPUs also support -bit integers). To represent an arbitrary state using , one additionally maintains a vector of complex amplitudes , which corresponds to the decomposition of in the basis defined by , i.e., and . Observe that each forms a pair with phase vector in since . Any stabilizer state can be viewed as a one-element frame.

Example 5. Let . Then can be represented by the stabilizer frame depicted in Figure 3.

Fig. 3: A two-qubit stabilizer state whose frame representation uses two phase vectors.

We now describe several frame operations that are useful for manipulating stabilizer-state superpositions.

. Consider the stabilizer basis defined by frame . A stabilizer or Pauli gate acting on maps such a basis to , where is the global phase of stabilizer state . Since we obtain a new stabilizer basis that spans the same subspace, this operation effectively rotates the stabilizer frame. Computationally, we perform a frame rotation as follows. First, update the stabilizer matrix associated with as per Section 2.2. Then, iterate over the phase vectors in and update each one accordingly (Table II). Let be the decomposition of onto . Frame rotation simulates the action of Clifford gate on since,


Observe that the global phase of each becomes relative with respect to . Therefore, our approach requires that we compute such phases explicitly in order to maintain a consistent representation.

Global phases of states in . Recall from Theorem 2 that any stabilizer state for some stabilizer circuit . To compute the global phase of , one keeps track of the global factors generated when each gate in is simulated in sequence. In the context of frames, we maintain the global phase of each state in using the amplitude vector . Let be the matrix associated with and let be the ()-phase vector in that forms a pair with . When simulating Clifford gate , each is updated as follows:

1) Set the leading phases of the rows in to .
2) Obtain a basis state from and store its non-zero amplitude . If is the Hadamard gate, it may be necessary to sample a sum of two non-zero basis amplitudes (one real, one imaginary).
3) Consider the matrix representation of and the vector representation of , and compute via matrix-vector multiplication.
4) Obtain from and store its amplitude .
5) Compute the global factor generated as .

By Observation 1, needs to be in row-echelon form (Figure 2b) in order to sample the computational-basis amplitudes and . Thus, simulating gates with global-phase maintenance would take time for -qubit stabilizer frames. To improve this, we introduce a simulation invariant.

Invariant 1

The stabilizer matrix associated with remains in row-echelon form during simulation.

Since Clifford gates affect at most two columns of , Invariant 1 can be repaired with row multiplications. Since each row multiplication takes , the runtime required to update during global-phase maintenance simulation is . Therefore, for an -qubit stabilizer frame, the overall runtime for simulating a single Clifford gate is since one can memoize the updates to required to compute each . Another advantage of maintaining this invariant is that the outcome of deterministic measurements (Section 2.2) can be decided in time linear in since it eliminates the need to perform Gaussian elimination.

. This operation facilitates measurement of stabilizer-state superpositions and simulation of non-Clifford gates using frames. (Recall from Section 2 that post-measurement states are also called cofactors.) Here, is the cofactor index. Let be the stabilizer basis defined by . Frame cofactoring maps such a basis to . Therefore, after a frame is cofactored, its size either remains the same (qubit was in a deterministic state and thus one of its cofactors is empty) or doubles (qubit was in a superposition state and thus both cofactors are non-empty). We now describe the steps required to cofactor .

1) Check associated with to determine whether qubit is a random (deterministic) state. (Section 2.2)
2a) Suppose qubit is in a deterministic state. Since is maintained in row-echelon form (Invariant 1) no frame updates are necessary.
2b) In the randomized-outcome case, apply the measurement algorithm from Section 2.2 to while forcing the outcome to . This is done twice – once for each -cofactor, and the row operations performed on are memoized each time.
3) Let be the ()-phase vector in that forms a pair with . Iterate over each pair and update its elements according to the memoized operations.
4) For each , insert a new phase vector-amplitude pair corresponding to the cofactor state added to the stabilizer basis.

Similar to frame rotation, the runtime of cofactoring is linear in the number of phase vectors and quadratic in the number of qubits. However, after this operation, the number of phase vectors (states) in will have grown by a (worst case) factor of two. Furthermore, any state represented by is invariant under frame cofactoring.

Example 6. Figure 4 shows how is cofactored with respect to its first two qubits. A total of four cofactor states are obtained.

4 Simulating Quantum Circuits
with Stabilizer Frames

Let be the stabilizer frame used to represent the -qubit state . Following our discussion in Section 3, any stabilizer or Pauli gate can be simulated directly via frame rotation.

Suppose we want to simulate the action of , where and are the control qubits, and is the target. First, we decompose into all four of its double cofactors (Section 2) over the control qubits to obtain the following equal superposition of orthogonal states:

Here, we assume the most general case where all the cofactors are non-empty. The number of states in the superposition obtained could also be two (one control qubit has an empty cofactor) or one (both control qubits have empty cofactors). After cofactoring, we compute the action of the Toffoli as,


where is the Pauli gate (NOT) acting on target . We simulate Equation 4 with the following frame operations. (An example of the process is depicted in Figure 4.)

1) .
2) .
3) Let be the Pauli operator with a literal in its position and everywhere else. Due to Steps 1 and 2, the matrix associated with must have two rows of the form and . Let and be the indices of such rows, respectively. For each phase vector ,
if the and elements of are both - (i.e., if the phase vector corresponds to the cofactor), flip the value of element in (apply to this cofactor).
Fig. 4: Simulation of using a stabilizer-state superposition (Equation 4). Here, , and . Amplitudes are omitted for clarity and the phases are prepended to matrix rows. The gate is applied to the third qubit of the cofactor.

Controlled-phase gates can also be simulated using stabilizer frames. This gate applies a phase-shift factor of if both the control qubit and target qubit are set. Thus, we compute the action of as,


Equation 4 can be simulated via frame-based simulation using a similar approach as discussed for gates. Let be the decomposition of onto . First, cofactor over the and qubits. Then, for any phase vector that corresponds to the cofactor, set . Observe that, in contrast to gates, controlled- gates produce biased superpositions. The Hadamard and controlled- gates are used to implement the quantum Fourier transform circuit, which plays a key role in Shor’s factoring algorithm.

Measuring . Since the states in are orthogonal, the outcome probability when measuring is calculated as the sum of the normalized outcome probabilities of each state. The normalization is with respect to the amplitudes stored in

, and thus the overall measurement outcome may have a non-uniform distribution. Formally, let

be the superposition of states represented by , the probability of observing outcome upon measuring qubit is,

where denotes the measurement operator in the computational basis as discussed in Section 2. The outcome probability for each stabilizer state is computed as outlined in Section 2.2. Once we compute , we flip a (biased) coin to decide the outcome and cofactor the frame such that only the states that are consistent with the measurement remain in the frame.

Prior work on simulation of non-Clifford gates using the stabilizer formalism can be found in [1] where the authors represent a quantum state as a sum of density-matrix terms while simulating non-Clifford operations acting on distinct qubits. In contrast, the number of states in our technique is although we do not handle density matrices and perform more sophisticated bookkeeping.

Technology-dependent gate decompositions. Stabilizer-frame simulation can be specialized to quantum technologies that are represented by libraries of primitive gates (and, optionally, macro gates) with restrictions on qubit interactions as well as gate parallelism and scheduling. The work in [17] describes several primitive gate libraries for different quantum technologies including quantum dot, ion trap and superconducting systems. Such libraries can be incorporated into frame-based simulation by decomposing the primitive gates into linear combinations of Pauli or Clifford operators, subject to qubit-interaction constraints. For example, the gate and its inverse, which are primitive gates in most quantum machine descriptions, can be simulated as .

Multiframe simulation. Although a single frame is sufficient to represent a stabilizer-state superposition , one can sometimes tame the exponential growth of states in by constructing a multiframe representation. Such a representation cuts down the total number of states required to represent , thus improving the scalability of our technique. Our experiments in Section 6 show that, when simulating ripple-carry adders, the number of states in grows linearly when multiframes are used but exponentially when a single frame is used. To this end, we introduce an additional frame operation.

. One derives a multiframe representation directly from a single frame by examining the set of phase vectors and identifying candidate pairs that can be coalesced into a single phase vector associated with a different stabilizer matrix. Since we maintain the stabilizer matrix of a frame in row-echelon form (Invariant 1), examining the phases corresponding to -rows (-literal in column and ’s in all other columns) allows us to identify the columns in that need to be modified in order to coalesce candidate pairs. More generally, suppose are a pair of phase vectors from the same -qubit frame. Then is considered a candidate iff it has the following properties: (i) and are equal up to entries corresponding to -rows (where is the qubit the row stabilizes), and (ii) for some (where and are the frame amplitudes paired with and ). Let be the indices of a set of differing phase-vector elements, and let be the qubits stabilized by the -rows identified by . The steps in our coalescing procedure are:

1) Sort phase vectors such that candidate pairs with differing elements are next to each other.
2) Coalesce candidates into a new set of phase vectors .
3) Create a new frame consisting of and matrix , where =CNOTCNOTCNOTPH.
4) Repeat Steps 2–3 until no candidate pairs remain.

During simulation, we execute the coalescing operation after the set of phase vectors in a frame is expanded via cofactoring. Therefore, the choice of (and thus ) is driven by the -rows produced after a frame is cofactored. (Recall that cofactoring modifies such that each cofactored qubit is stabilized by a operator.) The output of our coalescing operation is a list of -qubit frames (i.e., a multiframe) that together represent the same superposition as the original input frame . The size of the multiframe generated is half the number of phase vectors in the input frame. The runtime of this procedure is dominated by Step 1. Each phase-vector comparison takes time. Therefore, the runtime of Step 1 and our overall coalescing procedure is for a single frame with phase vectors.

Example 7. Suppose we coalesce the frame depicted in Figure 5. Candidate pairs are and , with and , respectively. To obtain , conjugate the second column of by an H gate (Step 3), which will coalesce into a single phase vector . Similarly, to obtain , conjugate the second column by H, then conjugate the second and third columns by CNOT, which will coalesce . Observe that no P gates are applied since for all pairs in .

Fig. 5: Example of how a multiframe representation is derived from a single-frame representation.

Candidate pairs can be identified even in the absence of -rows in an -qubit . By Corollary 3, one can always find a stabilizer circuit that maps to the matrix structure depicted in Figure 2a, whose rows are all of form. Several -time algorithms exist for obtaining [1, 3, 11]. We leverage such algorithms to extend our coalescing operation as follows:

1) Find that maps to computational-basis form.
2) .
3) .
4)  for .

To simulate stabilizer, Toffoli and controlled- gates using multiframe , we apply single-frame operations to each frame in the list independently. For Toffoli and controlled- gates, additional steps are required:

1) Apply the coalescing procedure to each frame and insert the new ‘‘coalesced’’ frames in the list.
2) Merge frames with equivalent stabilizer matrices.
3) Repeat Steps 1–2 until no new frames are generated.

Orthogonality of multiframes. We introduce the following invariant to facilitate simulation of quantum measurements on multiframes.

Invariant 2

The stabilizer frames that represent a superposition of stabilizer states remain mutually orthogonal during simulation, i.e., every pair of (basis) vectors from any two frames are orthogonal.

Given multiframe , one needs to consider two separate tasks in order to maintain Invariant 2. The first task is to verify the pairwise orthogonality of the states in . The orthogonality of two -qubit stabilizer states can be checked using the inner-product algorithm describe in [11], which takes

time. To improve this, we derive a heuristic based on Observation 

3, which takes advantage of similarities across the (canonical) matrices in to avoid expensive inner-product computations in many cases. We note that, when simulating quantum circuits that exhibit significant structure, contains similar stabilizer matrices with equivalent rows (Pauli operators). Let be the set of -qubit stabilizer matrices in . Our heuristic keeps track of a set of Pauli operators , that form an intersection across the matrices in .

Example 8. Consider the multiframe from Figure 5. The intersection consists of (first row of both ).

By Observation 3, if two phase vectors (states) have different entries corresponding to the Pauli operators in , then the states are orthogonal and no inner-product computation is required. For certain practical instances, including the benchmarks described in Section 6, we obtain a non-empty and our heuristic proves effective. When is empty or the phase-vector pair is equivalent, we use the algorithm from [11] to verify orthogonality. Therefore, in the worst case, checking pairwise orthogonality of the states in takes time for a multiframe that represents a -state superposition.

The second task to consider when maintaining Invariant 2 is the orthogonalization of the states in when our check fails. To accomplish this, we iteratively apply the operation to each frame in in order to decompose into a single frame. At each iteration, we select a pivot qubit based on the composition of Pauli literals in the corresponding column. We apply the operation only if there exists a pair of matrices in that contain a different set of Pauli literals in the pivot column.

1) Find pivot qubit .
2)  for .
3) Merge frames with equivalent stabilizer matrices.
4) Repeat Steps 1–3 until a single frame remains.
Fig. 6: Overall simulation flow for Quipu.

Observe that the order of pivot selection does not impact the performance of orthogonalization. Each iteration of the algorithm can potentially double the number of states in the superposition. Since the algorithm terminates when a single frame remains, the resulting states are represent by distinct phase vectors and are therefore pairwise orthogonal. In the worst case, all qubits are selected as pivots and the resulting frame constitutes a computational-basis decomposition of size .

The overall simulation flow of our frame-based techniques is shown in Figure 6 and implemented in our software package Quipu.

Example 9. Figure 7 depicts the main steps of the Quipu simulation flow for a small non-Clifford circuit.

Fig. 7: Example simulation flow for a small non-Clifford circuit (top left) using Quipu. The multiframes obtained are pairwise orthogonal and thus no orthogonalization is required.

5 Parallel Simulation

Unlike other techniques based on compact representations of quantum states (e.g., using BDD data structures [26]), most frame-based operations are inherently parallel and lend themselves to multi-threaded implementation. The only step in Figure 6 that presents a bottleneck for a parallel implementation is the orthogonalization procedure, which requires communication across frames. All other processes at both the single- and multi-frame levels can be executed on large subsets of phase vectors independently.

template<class Func, class Iter, class... Params>
auto async_launch( Func f, const Iter begin, const Iter end,
                  Params... p )
  -> vector< decltype( async(f, begin, end, p...) ) >
  vector< decltype( async(f, begin, end, p...) ) > futures;
  int size = distance(begin, end);
  int n = size/MTHREAD;
  for(int i = 0; i < MTHREAD; i++)      {
    Iter first = begin + i*n;
    Iter last = (i < MTHREAD - 1) ? begin + (i+1)*n : end;
    futures.push_back( async( f, first, last, p...) );
  return futures;
Fig. 8: Our C++11 template function for parallel execution of the frame operations described in Section 4.

We implemented a multithreaded version of Quipu using the C++11 thread support library. Each worker thread is launched via the std::async() function. Figure 8 shows our wrapper function for executing calls to std::async(). The async_launch() function takes as input: () a frame operation (Func f), () a range of phase-vector elements defined by Iter begin and Iter end, and () any additional parameters (Params... p) required for the frame operation. Furthermore, the function returns a vector of std::future – the C++11 mechanism for accessing the result of an asynchronous operation scheduled by the C++ runtime support system. The workload (number of phase vectors) of each thread is distributed evenly across the number of cores in the system (MTHREAD). The results from each thread are joined only when orthogonalization procedures are performed since they require communication between multiple threads. This is accomplished by calling the std::future::get() function on each future. All Clifford gates and measurements are simulated in parallel.

6 Empirical Validation

We tested a single-threaded and multi-threaded versions of Quipu on a conventional Linux server using several benchmark sets consisting of stabilizer circuits, quantum ripple-carry adders, quantum Fourier transform circuits and quantum fault-tolerant (FT) circuits.

We used a straightforward implementation of the state-vector model using an array of complex amplitudes to perform functional verification of: () all benchmarks with qubits and (Quipu output for such benchmarks. We simulated such circuits and checked for equivalence among the resultant states and operators [26]. In the case of stabilizer circuits, we used the equivalence-checking method described in [1, 11].

Avg. Runtime (secs)

                      Number of qubits
Fig. 9: Average time needed by Quipu and CHP to simulate an -qubit stabilizer circuit with gates and measurements. Quipu is asymptotically as fast as CHP but is not limited to stabilizer circuits.

Stabilizer circuits. We compared the runtime performance of single-threaded Quipu against that of CHP using a benchmark set similar to the one used in [1]. We generated random stabilizer circuits on qubits, for . The use of randomly generated benchmarks is justified for our experiments because (i) our algorithms are not explicitly sensitive to circuit topology and (ii) random stabilizer circuits have been considered representative [16]. For each , we generated the circuits as follows: fix a parameter ; then choose random unitary gates (CNOT, P or H) each with probability . Then measure each qubit in sequence. We measured the number of seconds needed to simulate the entire circuit. The entire procedure was repeated for ranging from to in increments of . Figure 9 shows the average time needed by Quipu and CHP to simulate this benchmark set. The purpose of this comparison is to evaluate the overhead of supporting generic circuit simulation in Quipu. Since CHP is specialized to stabilizer circuits, we do not expect Quipu to be faster. When , the simulation time appears to grow roughly linearly in for both simulators. However, when the number of unitary gates is doubled (), the runtime of both simulators grows roughly quadratically. Therefore, the performance of both simulators depends strongly on the circuit being simulated. Although Quipu is slower than CHP, we note that Quipu maintains global phases whereas CHP does not. Nonetheless, Figure 9 shows that Quipu is asymptotically as fast as CHP when simulating stabilizer circuits that contain a linear number of measurements. The multithreaded speedup in Quipu for non-Clifford circuits is not readily available for stabilizer circuits.

Ripple-carry adders. Our second benchmark set consists of -bit ripple-carry (Cuccaro) adder [7] circuits, which often appear as components in many arithmetic circuits [18]. The Cuccaro circuit for is shown in Figure 10. Such circuits act on two -qubit input registers, one ancilla qubit and one carry qubit for a total of qubits. We applied H gates to all input qubits in order to simulate addition on a superposition of computational-basis states. Figure 11 shows the average runtime needed to simulate this benchmark set using Quipu. For comparison, we ran the same benchmarks on an optimized version of QuIDDPro, called QPLite 111111QPLite is up to faster since it removes overhead related to QuIDDPro’s interpreted front-end for quantum programming., specific to circuit simulation [26]. When , QPLite is faster than Quipu because the QuIDD representing the state vector remains compact during simulation. However, for , the compactness of the QuIDD is considerably reduced, and the majority of QPLite’s runtime is spent in non-local pointer-chasing and memory (de)allocation. Thus, QPLite fails to scale on such benchmarks and one observes an exponential increase in runtime. Memory usage for both Quipu and QPLite was nearly unchanged for these benchmarks. Quipu consumed MB on average while QPLite consumed almost twice as much (MB).

Fig. 10: Ripple-carry (Cuccaro) adder for -bit numbers and [7, Figure 6]. The third qubit from the top is an ancilla and the qubit is the carry. The -register is overwritten with the result .

Runtime (secs)

-bit Cuccaro adder ( qubits)
Fig. 11: Average runtime and memory needed by Quipu and QuIDDPro to simulate -bit Cuccaro adders on an equal superposition of all computational basis states.

We ran the same benchmarks using single and multiframes. The number of states in the superposition grows exponentially in for a single frame, but linearly in when multiple frames are allowed. This is because gates produce large equal superpositions that are effectively compressed by our coalescing technique. Since our frame-based algorithms require poly() time for states in a superposition, Quipu simulates Cuccaro circuits in polynomial time and space for input states consisting of large superpositions of basis states. On such instances, known linear-algebraic simulation techniques (e.g., QuIDDPro) take exponential time while Quipu’s runtime grows quadratically (best quadratic fit with ).

The work in [18] describes additional quantum arithmetic circuits that are based on Cuccaro adders (e.g., subtractors, conditional adders, comparators). We used Quipu to simulate such circuits and observed similar runtime performance as that shown in Figure 11.

Fig. 12: The three-qubit QFT circuit.

Quantum Fourier transform (QFT) circuits. Our third benchmark set consists of circuits that implement the -qubit QFT, which computes the discrete Fourier transform of the amplitudes in the input quantum state. Let