HybridQ: A Hybrid Simulator for Quantum Circuits

by   Salvatore Mandrà, et al.

Developing state-of-the-art classical simulators of quantum circuits is of utmost importance to test and evaluate early quantum technology and understand the true potential of full-blown error-corrected quantum computers. In the past few years, multiple theoretical and numerical advances have continuously pushed the boundary of what is classically simulable, hence the development of a plethora of tools which are often limited to a specific purpose or designed for a particular hardware (e.g. CPUs vs. GPUs). Moreover, such tools are typically developed using tailored languages and syntax, which makes it hard to compare results from, and create hybrid approaches using, different simulation techniques. To support unified and optimized use of these techniques across platforms, we developed HybridQ, a highly extensible platform designed to provide a common framework to integrate multiple state-of-the-art techniques to run on a variety of hardware. The philosophy behind its development has been driven by three main pillars: "Easy to Use", "Easy to Extend", and "Use the Best Available Technology". The powerful tools of HybridQ allow users to manipulate, develop, and extend noiseless and noisy circuits for different hardware architectures. HybridQ supports large-scale high-performance computing (HPC) simulations, automatically balancing workload among different processor nodes and enabling the use of multiple backends to maximize parallel efficiency. Everything is then glued together by a simple and expressive language that allows seamless switching from one technique to another as well as from one hardware to the next, without the need to write lengthy translations, thus greatly simplifying the development of new hybrid algorithms and techniques.



There are no comments yet.


page 1


Establishing the Quantum Supremacy Frontier with a 281 Pflop/s Simulation

Noisy Intermediate-Scale Quantum (NISQ) computers aim to perform computa...

Intel Quantum Simulator: A cloud-ready high-performance simulator of quantum circuits

Classical simulation of quantum computers will continue to play an essen...

Faster Schrödinger-style simulation of quantum circuits

Recent demonstrations of superconducting quantum computers by Google and...

Parameterized quantum circuits as machine learning models

Hybrid quantum-classical systems make it possible to utilize existing qu...

Reliable quantum circuits have defects

State of the art quantum computing architectures are founded on the deci...

Cache Blocking Technique to Large Scale Quantum Computing Simulation on Supercomputers

Classical computers require large memory resources and computational pow...

Webots.HPC: A Parallel Robotics Simulation Pipeline for Autonomous Vehicles on High Performance Computing

In the rapidly evolving and maturing field of robotics, computer simulat...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction and Motivations

Since the historical milestone of quantum supremacy achieved by the Google team [q_suprem_google]

, quantum technology has continued to advance at an incredibly fast pace. More and more laboratories and companies have developed their own quantum hardware using a variety of materials, including superconducting qubits, trapped-ions, or photonic quantum chips paving the road for full-blown error-corrected quantum computers. At the same time, multiple theoretical and numerical advances have raised the bar for a potential quantum advantage and pushed the boundaries of what is classically simulable

[mi2021information, huang2020classical]

. Among the different techniques to simulate large-scale quantum circuits, three are widely used: state vector simulation

[guerreschi2020intel, pednault2019leveraging]

, tensor contraction

[villalonga2020establishing, villalonga2016flexible, gray2021hyper, zlokapa2020boundaries, pan2020contracting, markov_tn, markov2018quantum], and expansion around efficiently simulable quantum circuits [aaronson_gottesmann_stabilizer_sim, Bravyi2019simulationofquantum, bravyi2016improved, jozsa2008matchgates, aharonov2003simple]. To aid the development of new quantum algorithms and the large-scale simulation of quantum circuits, a plethora of tools have been developed. Many of these tools, however, consider only a few simulation techniques and can typically be run only on specific hardware (e.g. CPUs or GPUs). Moreover, such tools use tailored languages and syntax specific to an application, making it hard to compare results and develop hybrid algorithms. To support unified and optimized use of these techniques across platforms, we developed HybridQ  a highly extensible platform designed to provide a framework to integrate multiple state-of-the-art simulation techniques, and run seamlessly on a variety of hardware types. HybridQ is entirely written in Python, and only uses compiled languages such as to achieve state-of-the-art performance in core parts of the simulation. The simple and expressive language developed in HybridQ allows users to design and manipulate both noiseless and noisy circuits, and to seamlessly switch from one simulation technique to another without the burden of rewriting an entirely new algorithm. HybridQ also allows simulation on multiple backends, including CPUs, GPUs, and TPUs, both on single nodes or on heterogeneous HPC clusters.

Among the many innovations introduced in HybridQ, it is worth highlighting:

  • Fully integrates multiple techniques under the same common framework:

    • State Vector Simulation: A multi-threaded core which uses AVX instructions to achieve state-of-the-art performance in matrix-vector multiplication of quantum states. The core, which uses a similar syntax of numpy.dot, is integrated into HybridQ whenever a matrix-vector multiplication is needed.

    • Tensor Contraction: HybridQ fully integrates cotengra[cotengra] (one of the most advanced tools used to identify optimal contractions of tensors) and quimb[quimb] to run large-scale simulations of quantum circuits.

    • Clifford Expansion: A novel approach based on fusing together multiple non-Clifford gates and then finding the smallest number of stabilizer operators to represent the fused result was devised. Further, a parallelized branch-and-bound technique was introduced to minimize the number of branches, thereby greatly reducing the computational cost of the simulation.

  • HybridQ supports large-scale simulations on heterogeneous HPC clusters via MPI. More precisely, without any prior knowledge from the user, it automatically splits and balances the workload among different processor nodes and allows the use of multiple backends to maximize parallel efficiency.

  • Simulations of both noiseless and noisy circuits are supported in HybridQ. More importantly, it fully integrates noise in its framework, allowing the simulation of noisy circuits regardless of the numerical technique chosen (including tensor contraction).

  • Instead of developing monolithic gates which are hard to extend and develop, the concept of “properties” is introduced to speedup and simplify the development of new gates. Each property is developed independently and enables new abstract “features” (such as the concept of having qubits, of having parameters, or simply of being unitary). Gates are then built by collecting multiple properties. In this manner, any improvement on a specific feature will immediately percolate to any gate using such a property. This approach avoids multiple development of the same features on different gates, maximizing code recycling.

  • HybridQ fully integrates the use of “symbols” (as in sympy[sympy]) in almost every part of the framework. For instance, one may use sympy.symbols to parametrize gates, which can be eventually specified at a later time.

Ii Gates and Properties

Gates are at the core of any simulator of quantum circuits, as they represent the basic operations to manipulate quantum states. Therefore, gates must be expressive enough to allow arbitrary simulations but, at the same time, simple enough to allow users of any level to intuitively simulate existing quantum hardware.

To this end, HybridQ uses a bottom-up approach based on the development of “properties”. More precisely, instead of building monolithic gates, each one with its own specification and design, properties are build independently so that multiple gates can inherit from them. This not only allows a high reuse of existing code, but it drastically simplify the development and extension of existing gates as new properties are created and deployed. Indeed, basic properties used across the library are defined in base.property, and more specialized properties are built on top of them. As an example, the property to have “qubits” can be implemented as simply as follows:

from hybridq import base
class Qubits(base.__Base__):
    Class with ’qubits’.
# Generate new type
QubitGate = base.generate(’QubitGate’,
                          (Qubits, ),
                          qubits=(1, 2, 3))
# Generate new gate
gate = QubitGate()
# Output gate qubits
> (1, 2, 3)

In HybridQ, there are two types of gates: gates acting on quantum states (gate.Gate) and gates acting on density matrices (dm.gate.Gate). In both cases, gates can be specified by their common names. For instance:

from hybridq.gate import Gate
from sympy.abc import g
import numpy as np
# Generate a Hadamard gate
> Gate_H(name=’H’, qubits=(1,), M=numpy.ndarray(shape=(2, 2), dtype=float64))**1.2
# Generate an X-Rotation
Gate(’RX’, params=[np.pi])**2.2
> Gate_RX(name=’RX’, n_qubits=1, $\phi$=2.2$\pi$)
# Generate a controlled-phase gate
Gate(’CPHASE’, qubits=[’a’, ’b’], params=[g]).adj()
> Gate_CPHASE^+(name=’CPHASE’, qubits=(’a’, ’b’), params=(g,))

allows to specify the Hadamard gate and the controlled-phase gate respectively. Gates can also be specified directly providing a matrix:

from hybridq.gate import MatrixGate
import numpy as np
# Generate a gate by specifying its matrix
MatrixGate(np.eye(2), qubits=[’qubit’])**2.2
> MatrixGate(name=’MATRIX’,
>            qubits=(’qubit’,),
>            M=numpy.ndarray(shape=(2, 2),
>            dtype=float64))**2.2

If a closer look is taken at the specific inheritance, one can see that different gates may inherit from different properties to better describe them:

from hybridq.gate import Gate
# Output inheritance for Hadamard and X-Rotation
> [hybridq.base.base.Gate_H,
>  hybridq.gate.gate.BaseGate,
>  hybridq.gate.property.CliffordGate,
>  hybridq.gate.property.MatrixGate,
>  hybridq.gate.property.SelfAdjointUnitaryGate,
>  hybridq.gate.property.UnitaryGate,
>  hybridq.gate.property.PowerMatrixGate,
>  hybridq.gate.property.PowerGate,
>  hybridq.gate.property.QubitGate,
>  hybridq.base.property.Tags,
>  hybridq.base.property.Name,
>  hybridq.base.base.__Base__,
>  object]
> [hybridq.base.base.Gate_RX,
>  hybridq.gate.gate.BaseGate,
>  hybridq.gate.property.RotationGate,
>  hybridq.gate.property.ParamGate,
>  hybridq.base.property.Params,
>  hybridq.gate.property.UnitaryGate,
>  hybridq.gate.property.PowerMatrixGate,
>  hybridq.gate.property.PowerGate,
>  hybridq.gate.property.QubitGate,
>  hybridq.base.property.Tags,
>  hybridq.base.property.Name,
>  hybridq.base.base.__Base__,
>  object]

Beyond the most common gates, HybridQ provides a variety of useful gate.Gate that allows great freedom in specifying quantum circuits:

  • StochasticGate(gates, p):

    gates are sampled with probability

    p every time the gate is applied to a quantum state,

  • TupleGate(gates): gates are gathered together and applied to the quantum state one by one,

  • Control(c_qubits, gate): The application of gate to the quantum state is controlled by c_qubits,

  • Projection(state, qubits): The quantum state is projected to state,

  • Measure(qubits): A measurement operation on qubits is applied to the quantum state,

  • FunctionalGate(f, qubits): f is applied to the quantum state. Observe that f can be an arbitrary function, allowing the easy and straightforward implementation of oracles such as the Grover gate [grover1996fast] or QAOA simulations [farhi2016quantum, wang2018quantum, hadfield2019quantum],

  • SchmidtGate(gates, s): This gate is used to represent decomposed gates in the form , with and being arbitrary gates. Arbitrary gates can be decomposed in SchmidGate by using gate.utils.decompose. Similarly, SchmidGate can be merged back to a MatrixGate by using gate.utils.merge.

At the moment,

HybridQ implements the following dm.gate.Gate for density matrix simulations:

  • KrausSuperGate(gates, s): The KrausSuperGate is a specialization of the SchmidtGate for density matrices. More specifically, the gate implements:

    with being the density matrix, and and being arbitrary gates.

  • MatrixSuperGate: This gate is the operator representation of the superoperator defining a quantum map. More precisely, once is vectorized, MatrixSuperGate is applied to using a matrix-vector multiplication.

All gate.Gate and dm.gate.Gate provide a useful way to “tag” them. Tagging can be used to identify and/or filter specific gates:

from hybridq.gate.utils import get_available_gates
from hybridq.gate import Gate
import numpy as np
# Generate random gates with tags
gates = [
           for _ in range(10)
# Filter only gates with tags[’a’] == 0
list(filter(lambda g: g.tags[’a’] == 0, gates))
> [Gate_U3(name=’U3’,
>          n_qubits=1,
>          n_params=3,
>          tags={’a’: 0}),
>  Gate_CX(name=’CX’,
>          n_qubits=2,
>          M=numpy.ndarray(shape=(4, 4),
>                          dtype=int64),
>          tags={’a’: 0}),
>  Gate_ISWAP(name=’ISWAP’,
>             n_qubits=2,
>             M=numpy.ndarray(shape=(4, 4),
>                             dtype=complex128),
>             tags={’a’: 0}),
>  Gate_FSIM(name=’FSIM’,
>            n_qubits=2,
>            n_params=2,
>            tags={’a’: 0})]

Ii-a Noisy gates

HybridQ fully supports multiple noise channels (noise.channel) which are different specializations of the more general KrausSuperGate. Among them, it is worth mentioning:

  • GlobalPauliChannel: A general channel describing the map , where are length vectors (for qubits) with entries in describing a tensor product of single-qubit Pauli operators.

  • GlobalDepolarizingChannel: A special case of GlobalPauliChannel, describing , where is the dimension,

    the identity matrix, and

    a probability.

  • DephasingChannel: A single qubit dephasing channel of the form , where is a Pauli matrix specified by the user.

  • AmplitudeDampingChannel: A single qubit channel with Kraus operators , . The user can additionally specify a non-zero excitation rate (for ), i.e. implementing a generalized amplitude damping channel.

Each of the “Global” channels also supports a corresponding “Local” version, with noise applied independently to each qubit specified. The user can specify unique parameters for each qubit, or a single set of values to use on every qubit. Moreover, these noise channels can be ‘attached’ to standard gate.Gate, in order to implement a noisy gate.

Ii-B Grover Oracle

In this section, we demonstrate how simple it is to construct a Grover gate [grover1996fast] in HybridQ. The first step consists in providing a function f to update the quantum state:

from hybridq.gate import Projection
import numpy as np
def grover(self, psi, order):
    Given a quantum state ‘psi‘ with qubits given
    by ‘order‘, invert the phase of the state
    # First, let’s create a projector
    proj = Projection(state=’0’ * self.n_qubits,
    # Project quantum state
    psi0, new_order = proj(psi=psi,
    # Check that order hasn’t changed
    assert(new_order == order)
    # Update quantum state
    psi -= 2 * psi0
    # Return quantum state
    return psi, order

Once the Grover oracle is defined, a FunctionalGate can be used to define the Grover gate:

from hybridq.gate import FunctionalGate
import numpy as np
# Define a Grover gate that will act
# on qubits 1 and 2 only
grover_gate = FunctionalGate(f=grover,
                             qubits=[1, 2])
# Define a superposition of three qubits
psi = np.ones(shape=(2, 2, 2),
psi /= np.linalg.norm(psi)
# Apply Grover gate to state
new_psi, _ = grover_gate(psi, order=[0, 1, 2])
# Print output
for i, x in enumerate(new_psi.ravel()):
    print(’{0}: {1:+g}’.format(
> 000: -0.353553
> 001: +0.353553
> 010: +0.353553
> 011: +0.353553
> 100: -0.353553
> 101: +0.353553
> 110: +0.353553
> 111: +0.353553

Iii Circuits

If gates are the “instructions“ to manipulate quantum states, circuits are nothing less than “programs” with multiple instructions to follow. Following the same philosophy we used to build gates, the circuit design in HybridQ is kept as simple as possible, with multiple “utilities” acting on them.

There are two classes of circuits in HybridQ: regular circuits acting on quantum states (circuit.Circuit) and the “super” circuits acting on density matrices (dm.circuit.Circuit, with the latter being a generalization of the former) and supports the inclusion of super operators. Indeed, while circuit.Circuit can only accept gate.Gate, dm.circuit.Circuit can accept both gate.Gate and dm.gate.Gate:

from hybridq.gate import Gate
from hybridq.circuit import Circuit
from hybridq.dm.gate import Gate as SuperGate
from hybridq.dm.circuit import Circuit as SuperCircuit
# Generate a few gates
g1 = Gate(’X’)
g2 = Gate(’PROJECTION’, state=’01’)
g3 = SuperGate(’KRAUS’, gates=(Gate(’RX’), Gate(’RY’)))
# Generate a regular circuit
Circuit([g1, g2])
> Circuit([
>   Gate_X(name=’X’,
>          n_qubits=1,
>          M=numpy.ndarray(shape=(2, 2),
>          dtype=int64)),
>   ProjectionGate(name=’PROJECTION’,
>                  n_qubits=2,
>                  state=’01’)
> ])
# Generate a super circuit
SuperCircuit([g1, g2, g3])
> Circuit([
>   Gate_X(name=’X’,
>          n_qubits=1,
>          M=numpy.ndarray(shape=(2, 2),
>          dtype=int64)),
>   ProjectionGate(name=’PROJECTION’,
>                  n_qubits=2,
>                  state=’01’)
>   KrausSuperGate(name=’KRAUS’,
>                  gates=(...),
>                  s=1)
> ])
# Adding super gates to regular circuits fails
Circuit([g1, g2, g3])
> ValueError: ’KrausSuperGate’ not supported.

Iii-a Circuit utilities

While regular and super circuits are kept simple, HybridQ provides powerful utilities to manipulate them. All circuit utilities can be found in circuit.utils and dm.circuit.utils respectively. Among them, it is worth mentioning:

  • simplify(circuit): Simplify circuit as much as possible. If use_matrix_commutation=True, commutation using the matrix representation of gates is used to maximize the simplification,

  • matrix(circuit): Return the matrix representation of circuit,

  • to_matrix_gate(circuit): Convert circuit to a gate.MatrixGate,

  • isclose(a, b): Given two circuits a and b, determine if the two circuits are close within an absolute tollerance of atol. If use_matrix_commutation=True, commutation using the matrix representation of gates is also used,

  • compress(circuit): Compress gates in circuit so that gates in the new circuit will not have more than max_n_qubits qubits, Compression is very important to improve the performance of circuit simulations. Indeed, larger gates may better exploit vectorization instructions present in modern hardware and, therefore, reduce the computational cost to simulate the circuit,

  • pop(circuit, pinned_qubits): Remove gates in circuit outside the lightcone generated by pinned_qubits. Gates can be removed from either side of the circuits (direction).

from hybridq.gate import Gate
from hybridq.circuit import Circuit, utils
# Generate a random circuit
circuit = Circuit(
    Gate(’CPHASE’, qubits=[q1, q2], params=[1])
      for q1 in range(5)
      for q2 in range(q1 + 1, 5))
# Compress gates
> Circuit([
>   MatrixGate(name=’MATRIX’, qubits=(0, 1, 2), ...)
>   MatrixGate(name=’MATRIX’, qubits=(0, 3, 4), ...)
>   MatrixGate(name=’MATRIX’, qubits=(1, 3, 4), ...)
>   MatrixGate(name=’MATRIX’, qubits=(2, 3, 4), ...)
> ])
# Shuffle circuit and invert it
inv_circuit = Circuit(circuit[x]
  for x in np.random.permutation(
# Simplify
utils.simplify(circuit + inv_circuit)
> Circuit([
> ])
# If use_matrix_commutation is False, circuit
# and circuit_inv cannot cancel each other
assert (len(utils.simplify(circuit + inv_circuit,
            use_matrix_commutation=False)) == 0)
> AssertionError

Iv Simulations

In the last few years, following the steady improvement of quantum technology to build increasingly large quantum processors, different numerical algorithms have been designed to reduce the computational cost to simulate quantum circuits. With a few exceptions, the most successful among them can be classified in one of the following categories: state vector simulation

[guerreschi2020intel, pednault2019leveraging], tensor contraction [villalonga2020establishing, villalonga2016flexible, gray2021hyper, zlokapa2020boundaries, pan2020contracting, markov_tn, markov2018quantum], and expansion around efficiently simulable quantum circuits [aaronson_gottesmann_stabilizer_sim, Bravyi2019simulationofquantum, bravyi2016improved, jozsa2008matchgates, aharonov2003simple].

All of these techniques have pros and cons, and their performance largely depends on the structure of the quantum circuit to simulate. For instance, the computational cost of state vector simulations grows only polynomially with the number of gates but its memory requirement increases exponentially with the number of qubits, limiting it to  50 qubits [markov2018quantum]. On the other hand, tensor contraction [markov_tn] with its “slicing” technique [markov_tn, markov2018quantum] allows the simulation of circuits with a large number of qubits [villalonga2016flexible, villalonga2020establishing]. However, its computational complexity is bounded by the treewidth of the quantum circuit and it quickly grows (often exponentially) with its depth. The expansion around efficiently simulable circuits [bravyi2016improved, jozsa2008matchgates, aharonov2003simple] may allow simulations of deep quantum circuits with multiple qubits if the expansion has a limited number of terms. One of the most well-known expansions is around stabilizer circuits made of Clifford gates only [aaronson_gottesmann_stabilizer_sim, bravyi2016improved]. In this case, the complexity grows exponentially with the number of non-Clifford gates [Bravyi2019simulationofquantum]. While stabilizer circuits (which contain only Clifford gates) are widely used in error-correction schemes, hard-to-simulate circuits [bravyi2016improved] and application-driven circuits [hastings2014improving, wecker2014gate] often have a large number of non-Clifford gates, making this technique unsuitable for such cases.

In the spirit to keep the library easy to use and allow the development of hybrid algorithms (giving rise to the name HybridQ), all simulation techniques in HybridQ can be called using a single entry point, which uses the same syntax, regardless of the choice of the numerical methods or hardware. More precisely, HybridQ provides circuit.simulation.simulate and circuit.dm.simulation.simulate to simulate quantum state and density matrices respectively, with the latter entry point being a macro which transforms a super circuit to a regular circuit and eventually calls the former entry point. At the moment, HybridQ provides support to simulate both quantum state and density matrices using state vector simulation (optimize=’evolution’), tensor contraction (optimize=’tn’), and Clifford expansion (optimize=’clifford’).

HybridQ fully supports HPC simulations for both optimize=’tn’ and optimize=’clifford’, but we are planning to include MPI parallelization for optimize=’evolution’ in the next HybridQ release. To parallelize HybridQ simulations on HPC clusters, it suffices to call the Python script using mpiexec from any terminal: HybridQ will automatically detect the use of MPI and split the workload among different nodes.

Iv-a Initial and Final State Specification

In HybridQ, the initial and final quantum states are specified by using the keywords initial_state and final_state, that is, the final quantum state is projected to the provided state (for optimize=’tn’ only). In both cases, the quantum states can be provided by using either arrays, strings, or circuit.Circuit (for the simulation of density matrices only). Arrays must be numpy.ndarray convertible, with a number of dimensions proportional to the number of qubits (for quantum state simulations) or twice the number of qubits (for density matrix simulations). At the moment, HybridQ only supports two-state (qubit) dimensions, but we are planning to allow -state (“qudit”) dimensions with in the upcoming release.

Similarly, strings must have either chars (for quantum state simulations) or chars (for density matrix simulations), with being the number of qubits. The allowed tokens are:

  • 0, 1: Set qubit to either 0 or 1 state in the computational basis,

  • +, -: Set qubit to either + or - state in the X basis,

  • .: The corresponding qubit is left “open” while contracting the circuit (for optimize=’tn’ only),

  • [a-zA-Z]: If two or more qubits have the same letter, a multi-index Kronecker delta is attached to them (that is, qubits are traced-out together. For optimize=’tn’ only). This is particularly useful to trace out subsets of qubits while performing the simulation of density matrices.

Iv-B State Vector

HybridQ provides state vector simulation of the quantum state with the keyword optimize=’evolution’. Currently, HybridQ supports two different numerical engines to perform matrix-vector multiplications: optimize=’evolution-hybridq’ (the default value), which uses a in-house developed C core that exploits AVX instructions and parallelization via OpenMP, and optimize=’evolution-einsum’, which uses NumPy [numpy] and opt-einsum [opt_einsum] as the numerical backend.

For the C backend, state vectors are stored as a pair of contiguous array of AVX packed floating numbers (-float/-double for AVX2), one for the real part and one for the imaginary of the state vector. If the matrix-vector multiplication involves the least significative qubits, qubits are swapped to fully exploit AVX instructions. For the einsum backend, state vectors are stored as numpy.ndarray of complex numbers.

Numerical simulations on GPUs and TPUs are enabled for optimize=’evolution-einsum’ by using JAX [jax] as the backend (backend=’jax’). Multi-threaded optimization is supported for optimize=’evolution-hybridq’ only via OpenMP, and the number of used threads can be tuned by modifying the environment variable OMP_NUM_THREADS (if not set, all cores are used by default). At the moment, optimize=’evolution’ does not support HPC simulations using MPI, but we are planning to include such feature in the upcoming release.

from hybridq.dm.circuit.simulation import simulate \
  as dm_simulate
from hybridq.circuit.simulation import simulate
from hybridq.extras.random import get_rqc
from hybridq.circuit import utils
from tqdm.auto import tqdm
from time import time
from os import system
import numpy as np
# Get CPU name
system(’lscpu | egrep "Model name|^CPU\(s\)"’)
> CPU(s):     8
> Model name: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
# Get random circuit
circuit = get_rqc(n_qubits=6, n_gates=50)
# Set superposition as initial state
initial_state = ’+’ * 6
# Simulate circuit using einsum on CPU
psi_cpu = simulate(circuit,
# Simulate circuit using einsum on GPU
psi_gpu = simulate(circuit,
# Simulate density matrix using C++ core on CPU
dm_cpu = dm_simulate(circuit,
# Get density matrix from quantum state
dm_psi = np.reshape(np.kron(psi_cpu.ravel(),
                    (2, ) * 12)
# Checks
assert (np.allclose(psi_cpu, psi_gpu))
assert (np.allclose(dm_psi, dm_cpu))
# Get some random circuits
cs = {n: [get_rqc(n, n**2) for _ in range(5)]
        for n in tqdm(range(12, 29, 2),
        desc=’Generating random circuits’)}
# Compute typical depth
depth = {n:np.average([len(utils.moments(c))
          for c in cs[n]]) for n in cs}
print(’Typical depth’)
for n,d in depth.items():
    bar = ’@’ * int(d / 10)
    print(f’{n}: {bar:s} {d:1.2f}’)
> Typical depth
> -------------
> 12: @@@ 32.82
> 14: @@@@ 42.82
> 16: @@@@ 49.18
> 18: @@@@@ 52.73
> 20: @@@@@@ 61.36
> 22: @@@@@@ 68.82
> 24: @@@@@@@ 73.73
> 26: @@@@@@@@ 84.00
> 28: @@@@@@@@@ 90.00
# Initialize times
times = {n: [] for n in cs}
# Simulate circuits
for n in tqdm(range(12, 29, 2), ’Simulating’):
    for c in tqdm(cs[n], leave=False):
        t_ini = time()
        psi = simulate(c,
        t_end = time()
        times[n].append(t_end - t_ini)
print(’Typical runtime (s)’)
for n,t in [(n,np.average(times[n]))
              for n in times]:
    bar = ’@’ * max(1, int(t / 2))
    print(f’{n}: {bar:s} {t:1.2f}’)
> Typical runtime (s)
> -------------------
> 12: @ 0.83
> 14: @ 1.12
> 16: @ 1.49
> 18: @ 1.98
> 20: @ 2.52
> 22: @ 3.39
> 24: @@ 5.22
> 26: @@@@@ 11.50
> 28: @@@@@@@@@@@@@@@@@@@ 38.81

Iv-C Tensor Contraction

Tensor contraction [markov_tn] has been shown to be a particularly powerful approach to simulating quantum circuits in the Noisy Intermediate-Scale Quantum regime [preskill2018quantum, q_suprem_google, villalonga2020establishing, villalonga2016flexible, mi2021information, gray2021hyper, huang2020classical, pan2020contracting]. HybridQ fully supports the simulation of quantum circuits using tensor contraction via Quimb [quimb] and CoTenGra [cotengra] More precisely, given a Circuit, and both the initial_state and final_state, HybridQ automatically identifies the best contraction (using CoTenGra), and eventually performs the tensor network contraction (using Quimb) to simulate the quantum circuit. If the largest intermediate tensor is larger than the provided max_largest_intermediate, slicing is automatically applied to the tensor (thanks to CoTenGra). Multiple parameters can be passed to both Quimb and CoTenGra through the circuit.simulation.simulate entrypoint (use the help command to check all the supported options). Depending on the backend, multi-thread optimization is supported via either OpenMP or MKL, and the number of used threads can be tuned by modifying the environment variable OMP_NUM_THREADS and MKL_NUM_THREADS respectively. Finding the optimal contraction scheme can be also parallelized, which is activated by using parallel=True (False by default).

HybridQ fully supports MPI parallelization for optimize=’tn’. More precisely, slices are automatically distributed among the available nodes. Once completed, slices are then recollected using a divide-and-conquer approach to avoid overloading a single node.

from hybridq.dm.circuit.simulation import simulate \
  as dm_simulate
from hybridq.circuit.simulation import simulate
from hybridq.extras.random import get_rqc
import numpy as np
# Get random circuit
circuit = get_rqc(n_qubits=6, n_gates=50)
# Set superposition as initial state
initial_state = ’+’ * 6
# Project all qubits to the state +,
# excluding qubit 1 which is left "open"
final_state = ’+.++++’
# Simulate using tensor contraction (on GPU)
psi = simulate(circuit,
               optimize=’tn’, parallel=True,
# For the density matrix simulation, let’s add
# some extra qubits to eventually trace out
dm_circuit = circuit + get_rqc(2, 5, indexes=[6, 7])
# Let’s trace out qubits 6 and 7
dm_initial_state = 2 * (initial_state + ’++’)
dm_final_state = 2 * (final_state + ’ab’)
# Simulate density matrix using tensor contraction
dm = dm_simulate(dm_circuit,
                 parallel=True, optimize=’tn’)
# Get density matrix from quantum state
dm_psi = np.reshape(np.kron(psi.ravel(),
                    (2, ) * 2)
# Check
assert (np.allclose(dm_psi, dm))

Iv-D Clifford Expansion

As demonstrated by Gottesman and Knill [gottesman1998heisenberg], circuits made of Clifford gates only (i.e., gates which are stabilizers of the Pauli group) can be simulated in polynomial time [aaronson_gottesmann_stabilizer_sim]. Such circuits are called “stabilizer” circuits. Similarly, a “stabilizer state” is any quantum state that can be obtained by applying a stabilizer circuit to a state in the computational basis. Such states are widely used in quantum error-correction. More generally, if a circuit is made of non-Clifford gates, the simulation cost is exponential in , with a power which depends on the non-Clifford gate. The exponential cost derives from the “expansion” of the non-Clifford gates, which eventually requires the simulation of independent stabilizer circuits, with called the stabilizer “rank” [aaronson_gottesmann_stabilizer_sim, bravyi2016improved]. For instance, controlled phases can be expanded using stabilizers:

where we used the fact that qubits in stabilizer states can always be projected to either or in the computational basis in polynomial time. In general, if a qubit non-Clifford gate is applied to a stabilizer state, the resulting non-stabilizer state can always be expanded using no more than stabilizers, with being the number of elements different from zero in the matrix representation of the non-Clifford gate.

To simulate density matrices in HybridQ (optimize=’tn’), we start from a Pauli “string” of the form , with being the number of qubits, and gates are applied one by one as . If is a Clifford gate, since it stabilizes the Pauli group, will just be a different Pauli string (that is, Clifford gates “move” Pauli strings to Pauli strings). However, if is a non-Clifford gate, will be expanded as superposition of orthogonal Pauli strings [mi2021information]:

That is, the simulation of the density matrix “branches” times. Once all the gates are applied, the final density matrix will be a superposition of orthogonal Pauli strings. Observe that the amount of memory required to store a density matrix expanded in Pauli string is , with being the dominant term. If few non-Clifford gates are applied, all the relevant details of the density matrix can be stored in memory [mi2021information].

To further improve the performance of simulating density matrices by Clifford expansion, we introduce in HybridQ the two following innovations:

  • Rather than applying gates one by one, HybridQ compresses multiple gates into larger ones, which are eventually applied to the Pauli strings. This approach drastically reduces the amount of branches by avoiding unnecessary branches (see example below). Indeed, consider the simple example of multiple non-Clifford gates applied to the same group of qubits. If each gate were applied one by one, the number of branches would be the product of each single . However, if compressed together to a single qubit gate, the total number of branches may only be .

  • Instead of expanding all the non-Clifford gates at the beginning of the simulation, we dynamically branch the Pauli strings by using a depth-first expansion. While this approach may require more memory than expanding all the non-Clifford gates beforehand, the dynamical branching effectively reduces the amount of branches [mi2021information] to simulate, consequently reducing the computational cost to simulate the density matrix. Also, the depth-first expansion ensures that the amount of memory required to keep track of branches grows only polynomially with the number of non-Clifford gates.

To achieve compile-time performance, the core parts of the Clifford expansion are Just-In-Time (JIT) compiled using Numba (therefore, the first time optimize=’clifford’ is used, it may take a while).

HybridQ fully supports multi-thread parallelization for optimize=’clifford’, which is enabled by using parallel=True. When enabled, an initial breadth-first expansion is used to gather multiple branches which are eventually distributed among different threads. HPC simulations via MPI are also fully supported for optimize=’clifford’. Similarly, a breadth-first expansion to gather multiple branches which are eventually distributed to multiple nodes. MPI and parallel=True can be used together to maximize the performance.

from hybridq.gate import Gate
from hybridq.dm.circuit.simulation import simulate \
  as dm_simulate
from hybridq.extras.random import get_rqc
from hybridq.circuit import utils
from hybridq.utils import kron
import numpy as np
# Get random circuit
circuit = get_rqc(6, 20)
# Get random paulis
paulis = [
    Gate(np.random.choice(list(’IXYZ’)), [q])
      for q in circuit.all_qubits()
# Simulate density using clifford expansion
# (no compression)
pauli_dm_0, info_0 = dm_simulate(circuit,
# Simulate density using clifford expansion
# (with compression)
pauli_dm_4, info_4 = dm_simulate(circuit,
for c,info in zip((0, 4), (info_0, info_4)):
    n = info[’n_explored_branches’]
    r = info[’runtime (s)’]
    print(f’Explored {n:7,} branches in’, end= )
    print(f’{r:1.3f} seconds (compress={c})’)
> Explored 745,614 branches in 2.530s (compress=0)
> Explored  19,273 branches in 0.425s (compress=4)
# Reconstruct density matrices
dm_0 = sum(w * kron(*[Gate(g).matrix() for g in p])
            for p,w in pauli_dm_0.items())
dm_4 = sum(w * kron(*[Gate(g).matrix() for g in p])
            for p,w in pauli_dm_4.items())
# Check reconstruction
assert (np.allclose(dm_0, dm_4, atol=1e-5))
# Compute density matrix directly from
# ’circuit’ and ’paulis’
dm_full = utils.matrix(circuit + \
                       paulis + \
# Check
assert (np.allclose(dm_0, dm_full, atol=1e-5))

V Simulations with noise

HybridQ supports the inclusion of noise at the Kraus operator level (see Section II-A). In particular, it is possible to implement generic transformations of the form


thus allowing in principle any quantum map to be represented [q_computation_nielsen_chuang].

At the highest level of generality, a quantum map can be specified by the left and right operators, the matrix of coefficients , as well as the qubits on which the noise acts. A user may also define a quantum map much more concisely through an alternative API, or use the pre-defined noise models as in Section II-A. An example of a single qubit DepolarizingChannel during a rotation is shown in Fig. 1.

Fig. 1: Depolarizing noise during a rotation on a qubit starting in the state. Plot shows the expectation value along . Each step is a rotation by a fixed amount.

HybridQ also allows noise models to be easily built on top of pre-existing circuits. For example, one can attach two-qubit noise channels to the two-qubit gates in a circuit. We have a code snippet below which shows such an example.

Noise can be simulated either at the density matrix level, or by pure state sampling of the Kraus operators. Whilst the former is exact, the system reachable sizes more quickly become prohibitive since the density matrix scales as for qubits (instead of for a pure state). Pure state sampling allows one to reach larger system sizes than is possible through density matrix simulation alone, at the expense of accuracy (from the sampling). The code is designed to support seamlessly change between the two paradigms by setting a single argument. Thus, the user can perform full density matrix simulations up to the size they can reach within the limits of their hardware, and then switch to sampling mode.

V-a Performance

Noisy simulations are fully compatible with all of the backends available for pure state simulation. This enables one to perform noisy simulations using the backend, GPU’s, or even via tensor network contraction. Here, we compare some wall-clock speed tests for HybridQ versus one of the most well-known simulators of open quantum systems, QuTiP (version 4.7) [qutip2]. All tests are performed on the same machine with a 2.5 GHz Quad-Core Intel Core i7 chip, and 16GB RAM111We use the latest version of QuTiP as of this writing, installed from source with OpenMP, with system specifications: QuTiP Version: 4.7.0.dev0+nogit, Numpy Version: 1.19.2, Scipy Version: 1.5.0, Cython Version: 0.29.24, Matplotlib Version: 3.2.2, Python Version: 3.8.5, Number of CPUs: 4, BLAS Info: INTEL MKL, OPENMP Installed: True, INTEL MKL Ext: True. All notebooks to reproduce results in this Section are available at [GH_HybridQ].

In Fig. 2 we show the time taken to apply a local depolarizing channel on each qubit in the system. Shown by the circular markers are two HybridQ protocols, one using NumPy’s einsum (optimize=’evolution-einsum’, red-dash), and the other the core (optimize=’evolution-hybridq’, blue-solid). We find after around qubits, HybridQ substrantially outperforms QuTiP. This is likely due to the fact that QuTiP represents objects in a sparse format which means we generally expect the simulations will be slower, especially at larger sizes. In the comparison we also include all the time to set-up and prepare the circuit for the simulation in HybridQ which involves some pre-processing, thus there is likely a greater overhead at the smaller system sizes.

Fig. 2: Comparison of QuTiP to HybridQ speed on density matrix operations. We show the wall-clock time versus system size (number of qubits). We apply a depolarizing channel on each qubit of a random density matrix, utilising two of the HybridQ backends. Note the ‘evolution-hybridq‘ backend only works for more than 5 qubits in the density matrix setting.

Lastly, in Fig. 3 we perform a particular simulation using the built-in tensor network contraction. Here we find an even starker performance difference. In this simulation, we first construct a random quantum circuit. Then, after each Gate, we add a GlobalDepolarizingChannel with noise (acting on one or both qubits of the gate respectively). We consider the task of extracting the reduced density matrix of one particular qubit. This can be achieved via the optimize=’tn’ optimization method in HybridQ  which when applied to a density matrix is equivalent to the partial trace (when we sum over the left and right indices of the density matrix as required). Some example code to find the reduced density matrix of qubit via tensor contraction, in a noisy simulation, with initial state being the state, is shown below:

from hybridq.dm.circuit.simulation import simulate as dm_simulate
from hybridq.extras.random import get_rqc
from hybridq.noise.utils import add_depolarizing_noise
# Generate a random quantum circuit
circuit = get_rqc(n_qubits=3, n_gates=200)
# Add 1% noise to single qubit gates,
# and 1.5% to two qubit gates
noisy_circuit = add_depolarizing_noise(circuit,
# Simulate by contracting qubit 1,2 indices,
# and leave 0 open
rho_0 = dm_simulate(noisy_circuit,

We find that HybridQ scales very favorably here (note that the number of gates is not scaling with ). In contrast, there is no native way to perform such a calculation in QuTiP (as far as we are aware). We therefore perform the entire state evolution, and trace out the unwanted qubits in that case.

Fig. 3: Tensor network contraction on a density matrix circuit. For each system size, we generate a random quantum circuit of

total gates, which we then add depolarizing noise after each gate. Using the tensor network contraction, we extract the reduced state of the 0’th qubit. In QuTiP we obtain this state by performing the full simulation and tracing out the other degrees of freedom.

Vi Summary

In this paper we presented HybridQ, a highly extensible platform designed to provide a common framework to integrate multiple state-of-the-art techniques to run on a variety of hardware. HybridQ provides a simple and expressive language that allows seamless switching from one technique to another as well as from one hardware to the next, without the need to write lengthy translations, thus greatly simplifying the development of new hybrid algorithms and techniques.

At the moment, HybridQ can simulate quantum circuits using state-vector evolution, tensor contraction and Clifford expansion. All the aforementioned methods allow multi-thread parallelization on single nodes, while tensor contraction and Clifford expansion allow HPC parallelization on multiple nodes via MPI (state-vector parallelization on multiple nodes is planned on the next HybridQ release).

HybridQ fully integrates noise simulations for all the supported simulation techniques, by “converting” noisy circuits to regular circuits. For instance, it is possible to compute expectation values of local operators of noisy circuits by using tensor contraction, without the need to simulate the full density matrix. This immediately translate to a faster simulation as shown in Fig. 3.

Code Availability


 is an open source tool and freely available under the

Apache 2.0 licence at GitHub@NASA[GH_HybridQ].


We are grateful for support from NASA Ames Research Center, particularly the NASA Transformative Aeronautic Concepts Program, and also from DARPA under IAA 8839, Annex 114. The authors acknowledge the support from the NASA Ames Research Center and the support from the NASA Advanced Supercomputing Division for providing access to the NASA HPC systems, Pleiades and Merope. J.M. is supported by NASA Academic Mission Services (NAMS), contract number NNA16BD14C. SM. is supported by the Prime Contract No. 80ARC020D0010 with the NASA Ames Research Center. The United States Government retains, and by accepting the article for publication, the publisher acknowledges that the United States Government retains, a non-exclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for United States Government purposes.

Appendix A Artifact Description Appendix: HybridQ: A Hybrid Simulator for Quantum Circuits

A-a Abstract

HybridQ is a highly extensible platform designed to provide a common framework to integrate multiple state-of-the-art techniques to simulate large scale quantum circuits on a variety of hardware.

A-B Description

The philosophy behind development of HybridQ has been driven by three main pillars: Easy to Use, Easy to Extend, and Use the Best Available Technology. The powerful tools of HybridQ allow users to manipulate, develop, and extend noiseless and noisy quantum circuits for different hardware architectures. HybridQ supports large-scale high-performance computing (HPC) simulations, automatically balancing workload among different processor nodes and enabling the use of multiple backends to maximize parallel efficiency. Everything is then glued together by a simple and expressive language that allows seamless switching from one technique to another as well as from one hardware to the next, without the need to write lengthy translations, thus greatly simplifying the development of new hybrid algorithms and techniques.

A-B1 Software installation

HybridQ can be installed by either using pip:

pip install git+https://github.com/nasa/hybridq

or by using conda:

conda env create -f envinronment.yml

A-B2 Software dependencies

HybridQ requires:

  • Python 3.7+

  • C compiler (optional)

All the other dependencies are installed through pip (see [GH_HybridQ] for the full list of requirements).

A-C Experiment workflow

All scripts included in the main text are complete and reproducible.

A-D Evaluation and expected result

All results are matched with either theoretical results or against well established third-parties tools.

A-E Code Availability

HybridQ is an open source tool and freely available under the Apache 2.0 licence at GitHub@NASA[GH_HybridQ].