Enabling multi-programming mechanism for quantum computing in the NISQ era

02/10/2021
by   Siyuan Niu, et al.

As NISQ devices have several physical limitations and unavoidable noisy quantum operations, only small circuits can be executed on a quantum machine to get reliable results. This leads to the quantum hardware under-utilization issue. Here, we address this problem and improve the quantum hardware throughput by proposing a multi-programming approach that executes multiple quantum circuits on quantum hardware simultaneously. We first introduce a parallelism manager to select an appropriate number of circuits to be executed at the same time. Second, we present two different qubit partitioning algorithms, one greedy and one heuristic, to allocate reliable partitions to multiple circuits. Third, we use the Simultaneous Randomized Benchmarking protocol to characterize the crosstalk properties and consider them in the qubit partition process to avoid crosstalk effects during simultaneous executions. Finally, we enhance the mapping transition algorithm to make circuits executable on hardware using a decreased number of inserted gates. We demonstrate the performance of our multi-programming approach by executing circuits of different sizes on IBM quantum hardware simultaneously. We also apply this method to the VQE algorithm to reduce its overhead.


1. Introduction

Quantum computing promises an exponential speedup over classical computers for certain computational tasks (doi:10.1137/S0097539795293172, ; cao2019quantum, ; egger2020quantum, ; farhi2014quantum, ; lanyon2010towards, ; tang2020quantum, ; kerenidis2020quantum, ). Although quantum technologies are continuously improving, current quantum devices still qualify as Noisy Intermediate-Scale Quantum (NISQ) hardware (Preskill2018quantumcomputingin, ), with several physical constraints. For example, for superconducting devices, which we target in this paper, connections are only allowed between two neighbouring qubits. Besides, the gate operations of NISQ devices are noisy and have unavoidable error rates. As we do not have enough qubits to realize Quantum Error Correction (calderbank1998quantum, ; calderbank1996good, ; fowler2012surface, ), only small circuits with limited depth can obtain reliable results when executed on quantum hardware, which wastes hardware resources. Moreover, with the growing demand for access to quantum hardware, this under-utilization increases the waiting time for users, which indicates the need to improve the hardware throughput.

As the qubit number of the hardware increases and the error rates improve, it becomes possible to execute multiple circuits on a quantum chip simultaneously. The multi-programming mapping problem was first introduced by (das2019case, ), which demonstrated that the throughput and utilization of NISQ hardware can be enhanced by executing several circuits at the same time. Ref (dou2020new, ) further improved it in terms of fidelity and gate number by proposing a Community Detection Assisted Partition algorithm along with the X-SWAP scheme (we refer to this algorithm as CDAP for brevity). However, their results showed that when executing multiple quantum circuits simultaneously, the activity of one circuit can negatively impact the fidelity of others, due to the difficulty of allocating reliable regions to each circuit, the higher chance of crosstalk error (sheldon2016procedure, ), and the limitation of qubit movement (only inside the partition). Previous works (das2019case, ; dou2020new, ) have left these issues largely unexplored and have not addressed the problem holistically: (1) Hardware topology and calibration data are not fully analyzed, so circuits may be allocated to unreliable or sparsely connected partitions while robust qubits and links are ignored. (2) These works use only SWAP gates for the mapping transition process, and the modified circuits always have a large number of additional gates. (3) Crosstalk error is not considered when allocating partitions for circuits. For example, the X-SWAP scheme (dou2020new, ) for reducing the number of inserted SWAPs can only be performed when the two circuits are allocated to neighbouring partitions, which can introduce crosstalk and decrease the circuit output fidelity. Detrimental crosstalk impact when executing multiple parallel instructions has been reported in (murali2020software, ; ash2020analysis, ; ash2020experimental, ) by using Simultaneous Randomized Benchmarking (SRB) (gambetta2012characterization, ). In the presence of crosstalk, gate error can be increased by an order of magnitude. Ref (ash2020analysis, ) even proposed a fault-attack model using crosstalk in a multi-programming environment.

It is important to investigate the multi-programming approach in the NISQ era, especially for Variational Quantum Algorithms (VQAs) (cerezo2020variational, ). For example, the multi-programming mechanism makes it possible to reliably execute several ansatz states in parallel on one quantum processor, such as in the Variational Quantum Eigensolver (VQE) (peruzzo2014variational, ; kandala2017hardware, ), the Variational Quantum Linear Solver (VQLS) (bravo2020variational, ; huang2019near, ), or the Variational Quantum Classifier (VQC) (havlivcek2019supervised, ; romero2019variational, ). It is also general enough to be applied to other quantum circuits regardless of applications or algorithms.

In this work, we address the problem of multi-programming while considering the impact of hardware topology, calibration data, and crosstalk without losing circuit fidelity. First, we introduce a parallelism manager that selects the number of circuits to be executed on the quantum hardware simultaneously. Second, we present two different qubit partition algorithms to allocate reliable partitions to different circuits. One is a greedy partition algorithm which provides optimal choices. The other is based on a heuristic which gives nearly optimal results and significantly reduces the time complexity. Third, we consider crosstalk error during the partition process to lower the crosstalk effect during simultaneous executions. Then, we improve the mapping transition step of the qubit mapping problem to make quantum circuits executable on quantum hardware with a reduced number of additional gates. Finally, we evaluate our algorithm on real quantum hardware by first executing circuits of different sizes at the same time and then applying it to the VQE algorithm to estimate the ground state energy of the deuteron. To the best of our knowledge, this is the first attempt to propose a complete multi-programming process flow for executing an optimal number of workloads in parallel while ensuring the output fidelity by analyzing the hardware limitations.

2. Results

2.1. Multi-programming workflow

Figure 1. Overview of the proposed multi-programming framework. The input layer includes the quantum hardware information and multiple quantum circuit workloads. The parallelism manager decides whether to execute circuits simultaneously or independently. For simultaneous executions, it works with the hardware-aware multi-programming compiler to select an optimal number of shared workloads to be executed at the same time. Then, the scheduler makes all the circuits executable on the quantum hardware and we can obtain the results of the output circuits.

The multi-programming workflow is schematically shown in Fig. 1, which includes the following steps:

  • Input layer. It contains a list of small quantum circuits written in OpenQASM language (cross2017open, ), and the quantum hardware information, including the hardware topology, calibration data, and crosstalk effect.

  • Parallelism manager. It determines whether to execute circuits concurrently or separately. If simultaneous execution is allowed, it further decides the number of circuits to be executed on the hardware at the same time without losing fidelity, based on the fidelity metric included in the hardware-aware multi-programming compiler.

  • Hardware-aware multi-programming compiler. Qubits are partitioned into several reliable regions, which are allocated to different quantum circuits using qubit partition algorithms. Then, the partition fidelity is evaluated by the post qubit partition process. We introduce a fidelity metric here which helps to decide whether this number of circuits can be executed simultaneously or whether the number needs to be reduced based on their properties.

  • Scheduler. The mapping transition algorithm is applied and circuits are transpiled to be executable on real quantum hardware.

  • Output layer. Output circuits are executed on the quantum hardware simultaneously or independently according to the previous steps and the experimental results are obtained.

2.2. Parallelism manager

Figure 2. Process flow of each block that constitutes our multi-programming approach. (a) The parallelism manager selects circuits according to their densities and passes them to the hardware-aware multi-programming compiler. (b) The qubit partition algorithms allocate reliable regions to multiple circuits. $\Delta S$ is the difference between partition scores when partitioning independently and simultaneously, which is the fidelity metric. $\delta$ is the threshold set by the user. The fidelity metric helps to select the optimal number of simultaneous circuits to be executed. (c) The scheduler performs the mapping transition algorithm and makes quantum circuits executable on real quantum hardware.

In order to determine the optimal number of circuits that can be executed on the hardware in parallel without losing fidelity, here, we introduce the parallelism manager, shown in Fig. 2a.

Suppose we have a list of $n$ circuit workloads, where circuit $i$ requires $n_i$ qubits, that are expected to be executed on $N$-qubit hardware. First, the circuits are sorted according to their densities. The density of a circuit is defined as the number of CNOTs divided by the qubit number of the circuit, $\#\mathrm{CNOTs}/n_i$ (dou2020new, ). Then, we pick $K$ circuits, where $K$ is the maximum number of circuits that can be executed on the hardware at the same time, i.e., $\sum_{i=1}^{K} n_i \le N$. If $K$ is equal to one, then all the circuits should be executed independently. Otherwise, these circuits are passed to the hardware-aware multi-programming compiler. The two components work together to decide an optimal number of simultaneous circuits to be executed.
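The selection step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Circuit` class and `select_parallel_circuits` are hypothetical names, circuits are sorted from highest to lowest density, and the selected circuits are taken as the longest prefix of the sorted list whose total qubit count fits on the hardware.

```python
from dataclasses import dataclass


@dataclass
class Circuit:
    # Hypothetical container; only the fields needed for the density are kept.
    name: str
    num_qubits: int
    num_cnots: int

    @property
    def density(self) -> float:
        # Density as defined in the text: CNOT count divided by qubit count.
        return self.num_cnots / self.num_qubits


def select_parallel_circuits(circuits, hardware_qubits):
    """Sort by density (highest first) and keep the longest prefix that fits."""
    ordered = sorted(circuits, key=lambda c: c.density, reverse=True)
    selected, used = [], 0
    for circuit in ordered:
        if used + circuit.num_qubits > hardware_qubits:
            break  # assumption: stop at the first circuit that does not fit
        selected.append(circuit)
        used += circuit.num_qubits
    return selected


workloads = [
    Circuit("a", num_qubits=3, num_cnots=6),   # density 2.0
    Circuit("b", num_qubits=5, num_cnots=25),  # density 5.0
    Circuit("c", num_qubits=4, num_cnots=12),  # density 3.0
]
chosen = select_parallel_circuits(workloads, hardware_qubits=10)
print([c.name for c in chosen])
```

With ten hardware qubits, the two densest circuits (five and four qubits) fit and the third does not, so two circuits would be handed to the compiler.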

2.3. Hardware-aware multi-programming compiler

2.3.1. Qubit partition


Here, we present the key features of the qubit partition algorithms. A motivational example can be found in Supplementary Note 2.

Crosstalk effect characterization.  
Crosstalk is one of the major noise sources in NISQ devices, which can corrupt a quantum state due to quantum operations on other qubits (sarovar2020detecting, ). There are two types of crosstalk. The first one is quantum crosstalk, which is caused by the always-on ZZ interaction (mundada2019suppression, ; zhao2020high, ). The second one is classical crosstalk caused by the incorrect control of the qubits. The calibration data provided by IBM do not include the crosstalk error. To consider the crosstalk effect in partition algorithms, we must first characterize it in the hardware. Several protocols have been presented in (gambetta2012characterization, ; bialczak2010quantum, ; proctor2019direct, ; erhard2019characterizing, ; huang2020alibaba, ) to benchmark the crosstalk effect in quantum devices. In this paper, we choose the most widely used protocol, Simultaneous Randomized Benchmarking (SRB) (gambetta2012characterization, ), to detect and quantify the crosstalk between CNOT pairs when executing them in parallel.

We characterize the crosstalk effect following the optimization methods presented in (murali2020software, ). On IBM quantum devices, the crosstalk effect is significant only at one-hop distance between CNOT pairs (murali2020software, ), as illustrated in Fig. 3(a), when the control pulse of one qubit propagates an unwanted drive to nearby qubits that have similar resonant frequencies. Therefore, we perform SRB only on CNOT pairs that are separated by one-hop distance. For pairs whose distance is greater than one hop, the crosstalk effects are very weak and we ignore them. This allows us to parallelize the SRB experiments of multiple CNOT pairs when they are separated by two or more hops. For example, on IBM Q 27 Toronto (ibmq_toronto) (IBMQ27Toronto, ), three such CNOT pairs can be characterized in parallel.

We perform the crosstalk characterization on IBM Q 27 Toronto twice. The results show that, although the absolute gate errors vary every day, the pairs that have a strong crosstalk effect remain the same across days. An SRB experiment on a CNOT pair $(g_i, g_j)$ gives the error rates $E(g_i|g_j)$ and $E(g_j|g_i)$. Here, $E(g_i|g_j)$ represents the CNOT error rate of $g_i$ when $g_i$ and $g_j$ are executed in parallel. If there is a crosstalk effect between the two CNOTs, it will lead to $E(g_i|g_j) > E(g_i)$ or $E(g_j|g_i) > E(g_j)$. The crosstalk effect characterization is expensive and time-consuming. Some of the pairs do not have a crosstalk effect, whereas the CNOT error of the pair most affected by crosstalk is increased by more than five times. Therefore, we extract the pairs with significant crosstalk effect, i.e., $E(g_i|g_j) > 3 \times E(g_i)$, and only characterize these pairs when crosstalk properties are needed. We choose the same factor of 3 to identify the pairs with strong crosstalk error as in (murali2020software, ). The result of crosstalk effect characterization on IBM Q 27 Toronto is shown in Fig. 3(b).
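As a small illustration of the factor-of-3 rule, the following Python sketch flags the CNOTs whose error rate under simultaneous execution exceeds three times their independent error rate. The function name, the dictionary layout, and the numbers are assumptions made for the example; they are not measured data.

```python
CROSSTALK_FACTOR = 3.0  # same factor as the one quoted in the text


def strong_crosstalk_pairs(independent_error, conditional_error,
                           factor=CROSSTALK_FACTOR):
    """Return (victim, aggressor) pairs with E(victim|aggressor) > factor * E(victim)."""
    flagged = []
    for (victim, aggressor), error in conditional_error.items():
        if error > factor * independent_error[victim]:
            flagged.append((victim, aggressor))
    return sorted(flagged)


# Made-up numbers for illustration only.
independent = {"cx_0_1": 0.010, "cx_2_3": 0.012}
conditional = {
    ("cx_0_1", "cx_2_3"): 0.045,  # 4.5x the independent error: flagged
    ("cx_2_3", "cx_0_1"): 0.020,  # below 3x the independent error: not flagged
}
print(strong_crosstalk_pairs(independent, conditional))
```

Note that the rule is directional: one CNOT of a pair can be flagged as affected while the other is not.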

Figure 3. Characterization of crosstalk effect. (a) Crosstalk pairs separated by one-hop distance. The two CNOTs of a crosstalk pair must be executable at the same time, so they cannot share a qubit; one hop is therefore the minimum distance between them. (b) Crosstalk effect results of IBM Q 27 Toronto using SRB. The arrow of each red dashed line points to the CNOT that is significantly affected by crosstalk; CNOTs with arrows in both directions affect each other when they are executed simultaneously. As we choose 3 as the factor to pick pairs with strong crosstalk, a CNOT whose error rate under simultaneous execution stays below three times its independent error rate has no arrow pointing at it.

Greedy sub-graph partition algorithm.  
We develop a Greedy Sub-graph Partition algorithm (GSP) for qubit partition process which is able to provide theoretically the optimal partitions for different quantum circuits (see Supplementary Note 3 for pseudo-code of GSP). The first step of the GSP algorithm is to traverse the overall hardware to find all the possible partitions for a given circuit. For example, suppose we have a five-qubit circuit, we find all the subgraphs of the hardware topology (also called coupling graph) containing five qubits as the partition candidates. Each candidate has a score to represent its fidelity depending on the topology and calibration data. The partition with the best fidelity is selected and all the qubits inside of the partition are marked as used qubits so they cannot be assigned to other circuits. For the next circuit, a subgraph with the required number of qubits is assigned and we check if there is an overlap on this partition to partitions of previous circuits. If not, the subgraph is a partition candidate for the given circuit and the same process is applied to each subsequent circuit. To account for crosstalk, we check if any pairs in a subgraph have strong crosstalk effect caused by the allocated partitions of other circuits. If so, the score of the subgraph is adjusted to take crosstalk error into account.

In order to evaluate the reliability of a partition, two factors need to be considered: the partition topology, and the error rates of the two-qubit links together with the readout error of each qubit. One-qubit gates are ignored for simplicity and because of their relatively low error rates compared to the other quantum operations. If a qubit pair in a partition suffers strong crosstalk from other partitions, the CNOT error of this pair is adjusted to include the crosstalk effect. Note that the most recent calibration data should be retrieved through the IBM Quantum Experience before each usage to ensure that the algorithm has access to the most accurate and up-to-date information. To evaluate the partition topology, we determine the longest shortest path (also called graph diameter) of the partition, denoted $L$. The smaller the longest shortest path is, the better the partition is connected, and the fewer SWAP gates are needed to connect two qubits in the partition.

We devise a reliability score metric for a partition that is the sum of the graph diameter $L$, the average CNOT error rate of the links times the number of CNOTs of the circuit, and the sum of the readout error rates of the qubits in the partition (Eq. 1). Note that the CNOT error rate includes the crosstalk effect if it exists.

\[
\mathrm{Score}_g = L + \mathrm{Avg}_{\mathrm{CNOT}} \times \#\mathrm{CNOTs} + \sum_{Q_i \in P} R_{Q_i} \tag{1}
\]

Here, $P$ is the partition, $\mathrm{Avg}_{\mathrm{CNOT}}$ is the average CNOT error rate of its links, $\#\mathrm{CNOTs}$ is the number of CNOTs of the circuit, and $R_{Q_i}$ is the readout error rate of qubit $Q_i$.

The graph diameter is always prioritized in this equation, since it is more than one order of magnitude larger than the other two factors. The partition with the smallest reliability score is selected. It is supposed to have the best connectivity and the lowest error rate. Moreover, the partition algorithm prioritizes the quantum circuit with a large density because the input circuits are ordered by their densities during the parallelism manager process. The partition algorithm is then called for each circuit in order. However, the GSP algorithm is expensive and time-consuming. For small circuits, the GSP algorithm gives the best choice of partition. It is also useful as a baseline to compare with other partition algorithms. Beyond the NISQ era, a better approach should be explored to overcome the complexity overhead.
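To make the exhaustive search concrete, here is a minimal pure-Python sketch of the idea: enumerate every connected subset of free qubits of the required size, score each one as graph diameter plus average CNOT error times CNOT count plus summed readout error, and keep the smallest score. It is an illustration under assumptions, not the pseudo-code of Supplementary Note 3: the function names, the data layout (`cnot_error` keyed by sorted qubit pairs), and the toy calibration values are hypothetical, and the crosstalk adjustment is left out.

```python
from collections import deque
from itertools import combinations


def diameter(nodes, adjacency):
    """Longest shortest path of the induced subgraph, or None if disconnected."""
    nodes = set(nodes)
    longest = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search restricted to `nodes`
            u = queue.popleft()
            for v in adjacency[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(nodes):
            return None
        longest = max(longest, max(dist.values()))
    return longest


def reliability_score(nodes, adjacency, cnot_error, readout_error, num_cnots):
    """Diameter + average link error * CNOT count + summed readout error."""
    d = diameter(nodes, adjacency)
    if d is None:
        return None
    links = [(u, v) for u, v in combinations(sorted(nodes), 2) if v in adjacency[u]]
    avg_cnot = sum(cnot_error[l] for l in links) / len(links) if links else 0.0
    return d + avg_cnot * num_cnots + sum(readout_error[q] for q in nodes)


def greedy_partition(adjacency, cnot_error, readout_error,
                     num_qubits, num_cnots, used=frozenset()):
    """Try every subset of free qubits and return (score, qubits) of the best one."""
    best = None
    free = sorted(q for q in adjacency if q not in used)
    for candidate in combinations(free, num_qubits):
        score = reliability_score(candidate, adjacency, cnot_error,
                                  readout_error, num_cnots)
        if score is not None and (best is None or score < best[0]):
            best = (score, candidate)
    return best


# Toy T-shaped device with made-up calibration data.
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
cnot_error = {(0, 1): 0.010, (1, 2): 0.020, (1, 4): 0.015, (2, 3): 0.030}
readout_error = {q: 0.02 for q in adjacency}
best = greedy_partition(adjacency, cnot_error, readout_error,
                        num_qubits=3, num_cnots=10)
print(best)
```

The `combinations` loop is what makes this approach costly: the number of candidate subsets grows combinatorially with the hardware size and the circuit width.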

Qubit fidelity degree-based heuristic sub-graph partition algorithm.  
The Qubit fidelity degree-based Heuristic Sub-graph Partition algorithm (QHSP) should perform as well as GSP but without the large runtime overhead.

In QHSP, when allocating partitions, we favor qubits with high fidelity. We define the fidelity degree of qubit $Q_i$ based on the CNOT and readout fidelities of this qubit as in Eq. 2.

\[
F\_Degree_{Q_i} = \sum_{Q_j \in N(Q_i)} \lambda \times \left(1 - E[Q_i][Q_j]\right) + (1 - \lambda) \times \left(1 - R_{Q_i}\right) \tag{2}
\]

$N(Q_i)$ are the neighbour qubits connected to $Q_i$, $E$ is the CNOT error matrix, and $R$ is the readout error rate. $\lambda$ is a user-defined parameter that weights the CNOT error rate against the readout error rate. Such a parameter is useful for two reasons: (1) Typically, in a quantum circuit, the number of CNOT operations is different from the number of measurement operations. Hence, the user can choose $\lambda$ based on the relative number of operations. (2) For some qubits, the readout error rate is one or more orders of magnitude larger than the CNOT error rate. Thus, it is reasonable to add a weight parameter.

The fidelity degree metric reveals two aspects of the qubit. The first one is the connectivity of the qubit. The more neighbours a qubit has, the larger its fidelity degree is. The second one is the reliability of the qubit, accounting for CNOT and readout error rates. Thus, the metric allows us to select a reliable qubit with good connectivity. Instead of trying all the possible subgraph combinations (as in the GSP algorithm), we propose the QHSP algorithm to build partitions that contain qubits with high fidelity degree while significantly reducing runtime.
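The two aspects can be seen in a short Python sketch. It assumes one plausible form of the metric, a weighted CNOT fidelity summed over the neighbours plus a weighted readout fidelity, with `lam` standing for the user-defined weight; the function name, the data layout, and the error values are hypothetical.

```python
def fidelity_degree(qubit, adjacency, cnot_error, readout_error, lam=0.5):
    """Assumed form: sum over neighbours of lam * (1 - CNOT error),
    plus (1 - lam) * (1 - readout error) of the qubit itself."""
    cnot_term = sum(
        lam * (1.0 - cnot_error[frozenset((qubit, neighbour))])
        for neighbour in adjacency[qubit]
    )
    return cnot_term + (1.0 - lam) * (1.0 - readout_error[qubit])


# Three qubits on a line, with made-up error rates.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
cnot_error = {frozenset((0, 1)): 0.01, frozenset((1, 2)): 0.02}
readout_error = {0: 0.02, 1: 0.03, 2: 0.04}

degrees = {q: fidelity_degree(q, adjacency, cnot_error, readout_error)
           for q in adjacency}
print(degrees)
```

The middle qubit obtains the largest value because it contributes two CNOT terms, which reflects the connectivity aspect; between the two end qubits, the one with the lower error rates scores higher.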

To further improve the algorithm, we construct a list of qubits with good connectivity as starting points. We sort all physical qubits (qubits used in hardware) by their physical node degree, which is defined as the number of links of a physical qubit. Note that the physical node degree is different from the fidelity degree. Similarly, we obtain the largest logical node degree among the logical qubits (qubits used in the quantum circuit), where the logical node degree of a qubit is the number of different qubits connected to it through CNOT operations. Next, we compare these two metrics.

If the largest physical node degree is less than the largest logical node degree, it means we cannot find a suitable physical qubit to map the logical qubit with the largest logical node degree that satisfies all the connections. In this case, we only collect the physical qubits with the largest physical node degree. Otherwise, the physical qubits whose physical node degree is greater than or equal to the largest logical node degree are collected as starting points. By limiting the starting points, this heuristic partition algorithm becomes even faster.
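The comparison of the two node degrees can be written compactly. The sketch below is an illustration with hypothetical function names; the circuit is represented only by its list of CNOTs as (control, target) pairs of logical qubits.

```python
def largest_logical_degree(cnot_gates):
    """Largest number of distinct CNOT partners of any logical qubit."""
    partners = {}
    for control, target in cnot_gates:
        partners.setdefault(control, set()).add(target)
        partners.setdefault(target, set()).add(control)
    return max((len(p) for p in partners.values()), default=0)


def starting_points(adjacency, cnot_gates):
    """Physical qubits used as seeds for the partition growth."""
    physical_degree = {q: len(neighbours) for q, neighbours in adjacency.items()}
    largest_physical = max(physical_degree.values())
    # If no physical qubit can satisfy the largest logical degree, the bound
    # falls back to the largest physical degree; otherwise it is the largest
    # logical degree. Both cases reduce to taking the minimum.
    bound = min(largest_physical, largest_logical_degree(cnot_gates))
    return sorted(q for q, d in physical_degree.items() if d >= bound)


# Toy T-shaped device: qubit 1 has three links, qubit 2 has two.
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
chain = [(0, 1), (1, 2), (0, 1)]         # largest logical degree is 2
star = [(0, 1), (0, 2), (0, 3), (0, 4)]  # largest logical degree is 4
print(starting_points(adjacency, chain), starting_points(adjacency, star))
```

For the chain circuit both qubits with at least two links qualify, whereas for the star circuit only the best-connected physical qubit is kept.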

For each qubit in the starting points list, it explores its neighbours and finds the neighbour qubit with the highest fidelity degree calculated in Eq. 2, and merges it into the sub-partition. Then, the qubit inside of the sub-partition with the highest fidelity degree explores its neighbour qubits and merges the best one. The process is repeated until the number of qubits inside of the sub-partition is equal to the number of qubits needed. This sub-partition is considered as a subgraph and is added to the partition candidates (see Supplementary Note 3 for pseudo-code of QHSP).
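The growth of one sub-partition from a starting point can be sketched as follows. This is an illustration rather than the pseudo-code of Supplementary Note 3: `grow_partition` is a hypothetical name, the fidelity degrees are passed in as a precomputed dictionary with made-up values, and falling back to the next-best member when the best one has no free neighbour is an assumption.

```python
def grow_partition(start, size, adjacency, degree, used=frozenset()):
    """Grow a sub-partition from `start` until it holds `size` qubits.

    `degree` maps each physical qubit to its fidelity degree.
    Returns None when the region around `start` is too small.
    """
    partition = [start]
    while len(partition) < size:
        # Visit members from highest to lowest fidelity degree.
        for member in sorted(partition, key=lambda q: degree[q], reverse=True):
            free = [nb for nb in adjacency[member]
                    if nb not in partition and nb not in used]
            if free:
                # Merge the free neighbour with the highest fidelity degree.
                partition.append(max(free, key=lambda q: degree[q]))
                break
        else:
            return None  # no member has a free neighbour left
    return partition


# Toy T-shaped device with made-up fidelity degrees.
adjacency = {0: [1], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [1]}
degree = {0: 0.90, 1: 2.80, 2: 1.90, 3: 0.80, 4: 0.95}
print(grow_partition(1, 3, adjacency, degree))
```

Starting from qubit 1, the sketch first merges qubit 2 (the neighbour with the highest fidelity degree) and then qubit 4, because qubit 1 is still the member with the highest fidelity degree and qubit 4 is its best remaining neighbour.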

After obtaining all the partition candidates, we compute the fidelity score for each of them. As we start from a qubit with high physical node degree and merge neighbour qubits with high fidelity degree, the constructed partition is supposed to be well-connected. Hence, we do not need to check the connectivity of the partition using the longest shortest path as in Eq. 1 of the GSP algorithm; we only need to compare the error rates. The fidelity score metric is simplified by only calculating the CNOT and readout error rates as in Eq. 3. It is calculated for each partition candidate and the best one is selected. See Supplementary Note 3 for a detailed example of QHSP.

\[
\mathrm{Score}_h = \mathrm{Avg}_{\mathrm{CNOT}} \times \#\mathrm{CNOTs} + \sum_{Q_i \in P} R_{Q_i} \tag{3}
\]

Runtime analysis  
The runtime of both algorithms depends on the number of hardware qubits, the number of circuit qubits to be allocated in a partition, and the number of gates in the circuit.

For the GSP algorithm, in most cases the number of circuit qubits is less than the number of hardware qubits, and the time cost increases exponentially as the number of circuit qubits grows. The QHSP algorithm starts by collecting a list of starting points, which contains at most as many entries as there are hardware qubits, and its time cost is polynomial. For a detailed explanation of the runtime analysis, see Supplementary Note 3.

2.3.2. Post qubit partition


By default, the multi-programming mechanism reduces circuit fidelity compared to the standalone circuit execution mode. If the fidelity reduction is significant, circuits should be executed independently or the number of simultaneous circuits should be reduced, even though the hardware throughput decreases as well. Therefore, we consistently check the circuit fidelity difference between independent and concurrent execution.

We start with the qubit partition process for each circuit independently and obtain the fidelity score of the partition. Next, the qubit partition process is applied to these circuits together to compute the fidelity score when executing them simultaneously. The difference between the fidelity scores is denoted $\Delta S$, which is the fidelity metric. If $\Delta S$ is less than a specific threshold $\delta$, simultaneous circuit execution does not significantly degrade the fidelity score, so the circuits can be executed concurrently. Otherwise, they should be executed independently or the number of simultaneous circuits should be reduced. The fidelity metric, together with the parallelism manager, helps to define the optimal number of simultaneous circuits to be executed.
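The decision loop can be illustrated with a short Python sketch. Several details are assumptions made for the example: the fidelity metric is taken as the largest per-circuit increase of the partition score, circuits are dropped one at a time from the end of the density-ordered list, and the two scoring callbacks are hypothetical stand-ins for the qubit partition algorithms.

```python
def fidelity_metric(independent_scores, simultaneous_scores):
    """Largest per-circuit increase of the partition score (lower is better)."""
    return max(s - i for i, s in zip(independent_scores, simultaneous_scores))


def number_of_parallel_circuits(circuits, score_independent,
                                score_simultaneous, threshold):
    """Reduce the batch until the fidelity metric falls below the threshold."""
    count = len(circuits)
    while count > 1:
        batch = circuits[:count]
        independent = [score_independent(c) for c in batch]
        simultaneous = score_simultaneous(batch)
        if fidelity_metric(independent, simultaneous) < threshold:
            return count
        count -= 1
    return 1  # fall back to independent execution


# Toy scorers: every extra co-located circuit costs 0.05 in partition score.
def toy_independent(circuit):
    return 2.0


def toy_simultaneous(batch):
    return [2.0 + 0.05 * (len(batch) - 1)] * len(batch)


circuits = ["c1", "c2", "c3", "c4"]
print(number_of_parallel_circuits(circuits, toy_independent,
                                  toy_simultaneous, threshold=0.12))
```

With the toy scorers, four circuits give a difference of about 0.15, which exceeds the threshold of 0.12, while three circuits give about 0.10, so three circuits would be executed together.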

2.4. Scheduler

2.4.1. Mapping transition algorithm


The circuits need to be transformed to be executable on real quantum hardware, which includes two steps: initial mapping and mapping transition. The initial mapping of each circuit is created while taking into account swap error rate and swap distance to perform qubit movement operations (niu2020hardware, ). The initial mapping of the simultaneous mapping transition process is obtained by merging the initial mapping of each circuit according to its partition. We further improve the mapping transition algorithm (niu2020hardware, ) by modifying the heuristic cost function to better select the inserted gate. We also introduce the Bridge gate to the simultaneous mapping transition process for multi-programming.

First, each quantum circuit is transformed into a more convenient format, a Directed Acyclic Graph (DAG) circuit, which represents the operation dependencies of the circuit without considering the connectivity constraints. Then, the compiler traverses the DAG circuit and goes through each quantum gate sequentially. A gate that does not depend on other gates (i.e., all the gates before it have been executed) is allocated to the first layer, denoted $F$. The compiler checks if the gates in the first layer are hardware-compliant. The hardware-compliant gates can be executed on the hardware directly without modification. They are added to the scheduler, removed from the first layer and marked as executed. If the first layer is still not empty, some gates are not executable on the hardware and a SWAP or Bridge gate is needed. We collect all the possible SWAPs and Bridges, and use the cost function (see Eq. 5) to find the best candidate. The process is repeated until all the gates are marked as executed (see Supplementary Note 4 for pseudo-code of the simultaneous mapping transition algorithm).
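The notions of first layer and hardware compliance can be illustrated without building an explicit DAG. In the sketch below, which uses hypothetical function names and a toy coupling map, a gate is given as a tuple of logical qubits in program order, and a gate belongs to the first layer when no earlier unexecuted gate acts on any of its qubits.

```python
def first_layer(gates, executed):
    """Indices of unexecuted gates with no unexecuted predecessor."""
    blocked = set()  # qubits touched by an earlier unexecuted gate
    layer = []
    for index, qubits in enumerate(gates):
        if index in executed:
            continue
        if not blocked.intersection(qubits):
            layer.append(index)
        blocked.update(qubits)
    return layer


def hardware_compliant(qubits, mapping, coupling):
    """One-qubit gates always comply; a CNOT needs a physical link."""
    if len(qubits) == 1:
        return True
    a, b = (mapping[q] for q in qubits)
    return (a, b) in coupling or (b, a) in coupling


gates = [(0, 1), (2, 3), (1, 2), (0,)]  # logical qubits, program order
coupling = {(0, 1), (1, 2), (2, 3)}     # toy line-shaped device
mapping = {q: q for q in range(4)}      # identity initial mapping

print(first_layer(gates, executed=set()))
print(first_layer(gates, executed={0, 1}))
print(hardware_compliant((0, 2), mapping, coupling))
```

A CNOT between logical qubits 0 and 2 is not compliant on the toy line, which is the situation in which a SWAP or Bridge candidate would be evaluated.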

A SWAP gate requires three CNOTs, and inserting a SWAP gate changes the current mapping. A Bridge gate requires four CNOTs; inserting a Bridge gate does not change the current mapping, and it can only be used to execute a CNOT when the distance between the control qubit and the target qubit is exactly two. Both gates add three supplementary CNOTs, since the Bridge gate already includes the execution of the original CNOT. The SWAP gate is preferred when it has a positive impact on the following gates, which are allocated in the extended layer $E$, i.e., when it makes these gates executable or reduces the distance between their control and target qubits. Otherwise, a Bridge gate is preferred.
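Both decompositions can be checked with a few lines of Python. Because a circuit made only of CNOTs permutes the computational basis states without introducing phases, verifying its action on all classical bit strings is enough to confirm the identity. The gate orderings below are the standard textbook decompositions and are not taken from the paper's figures.

```python
from itertools import product


def cnot(bits, control, target):
    """Apply a CNOT to a tuple of classical bits."""
    bits = list(bits)
    bits[target] ^= bits[control]
    return tuple(bits)


def apply(circuit, bits):
    """Apply a list of (control, target) CNOTs in order."""
    for control, target in circuit:
        bits = cnot(bits, control, target)
    return bits


# SWAP of qubits 0 and 1: three CNOTs, the two qubit states are exchanged.
SWAP_01 = [(0, 1), (1, 0), (0, 1)]
# Bridge: CNOT with control 0 and target 2 through the middle qubit 1.
BRIDGE_02 = [(1, 2), (0, 1), (1, 2), (0, 1)]

for bits in product((0, 1), repeat=3):
    a, b, c = bits
    assert apply(SWAP_01, bits) == (b, a, c)        # states exchanged
    assert apply(BRIDGE_02, bits) == (a, b, c ^ a)  # middle qubit restored
print("both decompositions verified")
```

The Bridge leaves the middle qubit unchanged and keeps every logical qubit in place, which is why it does not alter the mapping, whereas the SWAP moves the two states and therefore does.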

A cost function is introduced to evaluate the cost of inserting a SWAP or Bridge. We use the following distance matrix (Eq. 4) as in (niu2020hardware, ) to quantify the impact of the SWAP or Bridge gate,

\[
D = \alpha_1 \times S + \alpha_2 \times \mathcal{E} \tag{4}
\]

where $S$ is the swap distance matrix and $\mathcal{E}$ is the swap error matrix. We set $\alpha_1$ and $\alpha_2$ to 0.5 to equally consider the swap distance and swap error rate. In (niu2020hardware, ), only the impact of a SWAP or Bridge on other gates (first and extended layer) was considered, without considering the cost of the inserted gate itself. As each of them is composed of either three or four CNOTs, this cost cannot be ignored. Hence, in our multi-programming mapping transition algorithm, we take this self impact into account and create a list of both SWAP and Bridge candidates, labeled as "tentative gates". The heuristic cost function is:

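\[
H = \frac{1}{|F| + N_{Tent}} \left( \sum_{g \in F} D[\pi(g.q_1)][\pi(g.q_2)] + \sum_{g \in Tent} D[\pi(g.q_1)][\pi(g.q_2)] \right) + W \times \frac{1}{|E|} \sum_{g \in E} D[\pi(g.q_1)][\pi(g.q_2)] \tag{5}
\]

where $F$ is the first layer, $E$ is the extended layer, $D$ is the distance matrix of Eq. 4, $W$ is the parameter that weights the impact of the extended layer, $N_{Tent}$ is the number of gates of the tentative gate, $Tent$ represents a SWAP or Bridge gate, and $\pi$ represents the mapping. A SWAP gate has three CNOTs, thus $N_{Tent}$ is three and we consider the impact of three CNOTs on the first layer. The mapping $\pi$ is the new mapping after inserting the SWAP. For a Bridge gate, $N_{Tent}$ is four and we consider four CNOTs on the first layer, and the mapping $\pi$ is the current mapping, as a Bridge gate does not change it. We weight the impact on the extended layer to prioritize the first layer. This cost function helps the compiler choose between inserting a SWAP and inserting a Bridge gate.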
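The following Python sketch shows how such a cost could be evaluated for one SWAP candidate and one Bridge candidate. It rests on several assumptions that should be read as such: the cost averages the distance over the first-layer gates and the CNOTs of the tentative gate, then adds the weighted average over the extended layer; the hop count on a four-qubit line stands in for the distance matrix; the weight of 0.5 is arbitrary; and the blocked gate is kept in the first layer for both candidates.

```python
def heuristic_cost(distance, mapping, first_layer, extended_layer,
                   tentative_cnots, weight=0.5):
    """Assumed cost: mean distance over first-layer gates and tentative CNOTs,
    plus `weight` times the mean distance over the extended layer.

    `mapping` maps logical to physical qubits AFTER the tentative gate;
    `tentative_cnots` lists the physical CNOTs of the SWAP or Bridge.
    """
    def gate_distance(gate):
        q1, q2 = gate
        return distance[mapping[q1]][mapping[q2]]

    first = sum(gate_distance(g) for g in first_layer)
    tentative = sum(distance[p1][p2] for p1, p2 in tentative_cnots)
    cost = (first + tentative) / (len(first_layer) + len(tentative_cnots))
    if extended_layer:
        extended = sum(gate_distance(g) for g in extended_layer)
        cost += weight * extended / len(extended_layer)
    return cost


# Hop-count distance on a line 0-1-2-3 (a stand-in for the real matrix).
distance = [[abs(i - j) for j in range(4)] for i in range(4)]
first = [(0, 2)]     # blocked CNOT between logical qubits 0 and 2
extended = [(0, 3)]  # a following CNOT

# Candidate 1: SWAP on physical qubits 0 and 1 (logical 0 and 1 trade places).
swap_cost = heuristic_cost(distance, {0: 1, 1: 0, 2: 2, 3: 3}, first, extended,
                           tentative_cnots=[(0, 1), (1, 0), (0, 1)])
# Candidate 2: Bridge through physical qubit 1 (mapping unchanged).
bridge_cost = heuristic_cost(distance, {0: 0, 1: 1, 2: 2, 3: 3}, first, extended,
                             tentative_cnots=[(1, 2), (0, 1), (1, 2), (0, 1)])
print(swap_cost, bridge_cost)
```

In this toy case the SWAP obtains the lower cost because it also brings the qubits of the following CNOT closer together, which matches the stated preference for a SWAP when it benefits the extended layer.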

2.5. Application: simultaneous executions of multiple circuits of different size

2.5.1. Experimental results


Figure 4. Comparison of fidelity and number of additional gates on IBM Q 27 Toronto when executing two circuits simultaneously. (a) Fidelity. (b) Number of additional gates.
Figure 5. Comparison of fidelity and number of additional gates on IBM Q 65 Manhattan when executing three circuits simultaneously. (a) Fidelity. (b) Number of additional gates.
Figure 6. Comparison of fidelity and number of additional gates on IBM Q 65 Manhattan when executing four circuits simultaneously. (a) Fidelity. (b) Number of additional gates.

We first evaluated our multi-programming approach by executing a list of different-size benchmarks at the same time on two quantum devices, IBM Q 27 Toronto and IBM Q 65 Manhattan (ibmq_manhattan) (IBMQ65Manhattan, ) (see Supplementary Note 1 for further information about the selected quantum hardware). All the benchmarks are collected from the previous work (zulehner2018efficient, ), including several functions taken from RevLib (WGT+:2008, ) as well as some quantum algorithms written in Quipper (green2013quipper, ) or Scaffold (abhari2012scaffold, ). These benchmarks are widely used in the quantum community and their details are shown in Table 1. We chose small, shallow-depth quantum circuits since only small circuits can obtain reliable results when executed on real quantum hardware. The metrics we used to evaluate our algorithm include Probability of a Successful Trial (PST), number of additional CNOT gates, and Trial Reduction Factor (TRF); see Methods for a detailed explanation.

Table 1. Information of benchmarks

Several published qubit mapping algorithms (li2019tackling, ; wille2019mapping, ; murali2019noise, ; zhu2020dynamic, ; niu2020hardware, ; guerreschi2018two, ; itoko2020optimization, ) and multi-programming mapping algorithms are available as discussed in section 1. HA (niu2020hardware, ) seems to be the best qubit mapping algorithm in terms of the number of additional gates and circuit fidelity. We use HA as the baseline for independent executions of multiple circuits. CDAP algorithm proposed in (dou2020new, ) seems to be the best multi-programming mapping algorithm and is considered as the baseline for concurrent executions of multiple circuits.

To summarize, we compare our multi-programming algorithms, 1) GSP + improved mapping transition (labeled as GSP) and 2) QHSP + improved mapping transition (labeled as QHSP), with the baseline CDAP. The loss of fidelity due to simultaneous executions of multiple circuits is reported by comparing concurrent versus independent executions. Moreover, we compare the partition + improved mapping transition algorithm based on HA (labeled as PHA) versus HA on independent executions to show the impact of partition in large quantum hardware for a small circuit. The details of the configuration of algorithms are presented in Methods.

We first ran two quantum circuits on IBM Q 27 Toronto simultaneously. Results on output state fidelity and the number of additional gates are shown in Fig. 4. For independent executions, the fidelity is improved by 46.8% and the number of additional gates is reduced by 8.7% comparing PHA to HA. For simultaneous executions, QHSP and GSP allocate the same partitions except for the first experiment, (ID1, ID1). In this experiment, GSP improves the fidelity by 6% compared to QHSP. Partition results might differ due to varying calibration data and the choice of $\lambda$, but the difference in partition fidelity score between the two algorithms is small. The results show that QHSP is able to allocate nearly optimal partitions while reducing runtime significantly. Therefore, for the remaining experiments, we only evaluate the QHSP algorithm. QHSP improves the fidelity by 28.9% and reduces the additional gate number by 52.3% compared to CDAP. Comparing simultaneous (QHSP) versus independent (PHA) executions for two circuits, fidelity decreases by 5.8% and the number of additional gates is almost the same. During the post-partition process, $\Delta S$ does not pass the threshold and TRF is two.

Next, we executed three and four quantum circuits simultaneously on IBM Q 65 Manhattan. Fig. 5 and Fig. 6 show the comparison of fidelity and the number of additional gates. PHA always outperforms HA for independent executions. QHSP significantly outperforms CDAP as the number of simultaneous circuits increases. The output fidelity is increased by 74.8% and 55.3% on average for the two cases. The reduction in the number of inserted gates is always more than 50%. The threshold is still not passed and TRF becomes three and four, respectively. Moreover, fidelities decrease by 1.5% and 6.7% when comparing simultaneous (QHSP) versus independent (PHA) executions.

Finally, to evaluate the hardware limitations of executing multiple circuits in parallel, we set the threshold to 0.2. With this setting, all five benchmarks can be executed simultaneously on IBM Q 65 Manhattan. The partition fidelity difference is 0.18. Results show that the fidelity of simultaneous executions (QHSP) is decreased by 9.5% compared to independent executions (PHA). Both the fidelity and the additional gate number of QHSP are improved by more than 50% compared to CDAP. The complete experimental results can be found in Supplementary Note 5.

2.5.2. Result analysis


For independent executions, PHA is always better than HA for two reasons: (1) The initial mapping of the two algorithms is based on a random process. During the experiment, we perform the initial mapping generation process ten times and select the best one. For PHA, however, we first restrict the random process to a reliable and well-connected small partition rather than the overall hardware space used by HA. Therefore, with only ten trials, PHA finds a better initial mapping. (2) We improve the mapping transition process of PHA, which can make a better selection between the SWAP and Bridge gates. HA is shown to be sufficient for hardware with a small number of qubits, for example a 5-qubit quantum chip. To map a circuit on large hardware, it is better to first restrict the search space to a reliable small partition and then find the initial mapping. This qubit partition approach can be applied to the general qubit mapping problem to limit the search space when the target hardware is large.

For simultaneous executions, QHSP performs better than CDAP for the following reasons: (1) CDAP constructs a hierarchy tree according to the modularity-based FN community detection algorithm (newman2004fast, ). The tree is constructed by calculating the modularity of the overall hardware coupling graph. However, when allocating a partition to a circuit, we focus on the topology and calibration data inside the partition, rather than the whole hardware. As the number of partitions to allocate increases, the performance of CDAP becomes worse. (2) CDAP only considers the SWAP gate to realize the connection and ignores the Bridge gate, which can significantly reduce the number of additional gates. (3) CDAP does not consider the crosstalk effect. Although the X-SWAP scheme used in CDAP can slightly reduce the number of additional gates, it only works when the allocated partitions are close to each other, which increases the crosstalk effect. In contrast, QHSP takes the partition topology, error rate, and crosstalk effect into consideration and can provide better partitions. QHSP uses almost the same number of additional gates as PHA, whereas fidelity is decreased by less than 10% compared to PHA if the threshold is set to 0.1.

2.6. Application: Estimate the ground state energy of deuteron

In order to demonstrate the potential interest to apply the multi-programming mechanism to existing quantum algorithms, we investigated it on VQE algorithm. To do this, we performed the same experiment as (gokhale2020optimization, ; dumitrescu2018cloud, ) on IBM Q 65 Manhattan, estimating the ground state energy of deuteron, which is the nucleus of a deuterium atom, an isotope of hydrogen.

Deuteron can be modeled using a 2-qubit Hamiltonian spanning four Pauli strings: and (gokhale2020optimization, ; dumitrescu2018cloud, ). If we use the naive measurement to calculate the state energy, one ansatz corresponds to four different measurements. Pauli operator grouping has been proposed to reduce this overhead by utilizing simultaneous measurement (gokhale2020optimization, ; kandala2017hardware, ; crawford2019efficient, ). For example, the Pauli strings can be partitioned into two commuting families: {} and {} using the approach proposed in (gokhale2020optimization, ). This allows one parameterized ansatz to be measured with two measurements instead of the four required by the naive method.
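The grouping idea can be illustrated with a minimal sketch. It relies on the standard fact that two Pauli strings commute when they differ on an even number of positions where both are non-identity. The greedy grouping below is a simple stand-in and is not the method of (gokhale2020optimization, ); the function names and the four example strings are illustrative assumptions, not taken from the text above.

```python
from typing import List


def commute(p: str, q: str) -> bool:
    """Pauli strings commute iff they anticommute on an even number of positions."""
    anti = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return anti % 2 == 0


def group_commuting(paulis: List[str]) -> List[List[str]]:
    """Greedy grouping: place each string in the first family it commutes with."""
    families: List[List[str]] = []
    for p in paulis:
        for family in families:
            if all(commute(p, q) for q in family):
                family.append(p)
                break
        else:
            # No existing family is compatible, so open a new one.
            families.append([p])
    return families


# Illustrative two-qubit Pauli strings (hypothetical example input).
families = group_commuting(["ZI", "IZ", "XX", "YY"])
print(len(families), families)
```

Each family can be measured with one circuit, so the number of families gives the number of measurements per ansatz in this sketch.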

We used a simplified Unitary Coupled Cluster ansatz with a single parameter and three gates, as described in (gokhale2020optimization, ; dumitrescu2018cloud, ). The algorithm configuration of this experiment is explained in Methods. We applied our multi-programming method on top of the Pauli operator grouping approach (labeled as PG) (gokhale2020optimization, ). We performed this experiment twice on different days. For the first experiment, the parallelism manager worked with the hardware-aware multi-programming compiler to select ten circuits for simultaneous execution without passing the fidelity threshold. This corresponds to performing five optimisations (five different parameterized circuits) at the same time, since one parameterized circuit needs two measurements. The selected ten circuits were passed to the scheduler to be executed in parallel. The required number of circuit executions is reduced by a factor of ten compared to PG. Note that, compared to the naive measurement, the number of circuits needed is reduced by a factor of 20. The result is shown in Fig. 7(a). The error rate is quite high for both executions: 29.7% for PG and 64.4% for multi-programming + PG. The result of the second experiment is shown in Fig. 7(b). In this case, four optimisations (eight circuits) were selected to be executed at the same time with respect to the fidelity threshold. The error rate is 9.3% and 7% for the two methods, respectively. Here, applying multi-programming even improves the output fidelity. The large fidelity difference between the two experiments is due to the different calibration data of the device, which are the input of our multi-programming approach. The complete result of the two experiments, including hardware throughput, is shown in Fig. 7(c).
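The reduction factors quoted for the first experiment follow from simple counting, sketched below. The helper name `executions_needed` is a hypothetical one introduced for illustration; the numbers (five parameterized circuits, four naive measurements, two grouped measurements, ten circuits per simultaneous execution) are the ones stated above.

```python
def executions_needed(num_ansatz: int, measurements_per_ansatz: int,
                      circuits_per_execution: int) -> int:
    """Hardware executions needed when several circuits share one execution."""
    total_circuits = num_ansatz * measurements_per_ansatz
    # Ceiling division: a partially filled execution still counts as one.
    return -(-total_circuits // circuits_per_execution)


naive = executions_needed(5, 4, 1)        # four measurements per ansatz, one circuit at a time
grouped = executions_needed(5, 2, 1)      # Pauli grouping (PG), one circuit at a time
multi_pg = executions_needed(5, 2, 10)    # PG plus ten simultaneous circuits

print(grouped // multi_pg, naive // multi_pg)
```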

Figure 7. The estimation of the ground state energy of deuteron under PG and multi-programming + PG. (a) Five optimisations with ten measurements. (b) Four optimisations with eight measurements. (c) The complete result of the two experiments. is the number of simultaneous circuits.

3. Discussion

In this article, we presented a multi-programming approach that allows multiple circuits to be executed on a quantum chip simultaneously without losing fidelity. We introduced the parallelism manager and a fidelity metric to select the optimal number of circuits to be executed at the same time. Moreover, we proposed a hardware-aware multi-programming compiler which contains two qubit partition algorithms that take hardware topology, calibration data, and crosstalk effect into account to allocate reliable partitions to different quantum circuits. We also demonstrated an improved simultaneous mapping transition algorithm which helps to transpile the circuits on quantum hardware with a reduced number of inserted gates.

We first executed a list of circuits of different sizes simultaneously and compared our algorithm with the state-of-the-art multi-programming approach. Experimental results showed that our approach can outperform the state of the art in terms of both output fidelity and the number of additional gates. Then, we investigated our multi-programming approach on VQE algorithm to estimate the ground state energy of deuteron, showing the added value of applying our approach to existing quantum algorithms. The multi-programming approach is evaluated on IBM hardware, but it is general enough to be adapted to other quantum hardware.

Based on the experimental results, we found that the main concern with the multi-programming mechanism is the trade-off between output fidelity and hardware throughput. For example, one must decide which programs to execute simultaneously and how many of them can be executed without losing fidelity. Here, we list several guidelines to help the user utilize our multi-programming approach.

  • Check the target hardware topology and calibration data. The multi-programming mechanism is more suitable for a quantum chip that is large relative to the quantum circuit and has a low error rate.

  • Choose appropriate fidelity threshold for post qubit partition process. A high threshold can improve the hardware throughput but lead to the reduction of output fidelity. It should be set carefully depending on the size of the benchmark. For benchmarks of small size that we used in experiments, it is reasonable to set the threshold to 0.1.

  • The number of circuits that can be executed simultaneously will mainly depend on the fidelity threshold and the calibration data of the hardware.

  • QHSP algorithm is suggested for the partition process due to efficiency and GSP is recommended to evaluate the quality of the partition algorithm. Using both algorithms, one can explore which circuits can be executed simultaneously and how many of them within the given fidelity threshold.
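The role of the fidelity threshold in these guidelines can be sketched as follows. This is a minimal illustration under stated assumptions, not the parallelism manager itself: the function name, the input layout, and all score values are hypothetical, and the check simply compares each circuit's partition fidelity score when it runs alone against its score when the chip is shared.

```python
from typing import List


def max_parallel_circuits(independent: List[float],
                          simultaneous: List[List[float]],
                          threshold: float) -> int:
    """Largest number m of circuits whose score differences stay within threshold.

    independent[i]      : score of circuit i when it has the chip to itself.
    simultaneous[m - 1] : scores of circuits 0..m-1 when m circuits share the chip.
    """
    for m in range(len(simultaneous), 0, -1):
        diffs = [abs(independent[i] - simultaneous[m - 1][i]) for i in range(m)]
        if max(diffs) <= threshold:
            return m
    return 1  # Fall back to executing a single circuit.


# Hypothetical partition fidelity scores for three circuits.
independent = [0.90, 0.88, 0.85]
simultaneous = [[0.90], [0.90, 0.84], [0.90, 0.84, 0.70]]

print(max_parallel_circuits(independent, simultaneous, 0.1))
print(max_parallel_circuits(independent, simultaneous, 0.2))
```

With these made-up scores, a larger threshold admits more simultaneous circuits, which mirrors the throughput versus fidelity trade-off described in the guidelines.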

Quantum hardware development with more and more qubits will enable the simultaneous execution of multiple quantum programs and may become a linchpin for quantum algorithms requiring parallel sub-problem executions. Variational quantum algorithms are becoming a leading strategy to demonstrate quantum advantage for practical applications. In such algorithms, the preparation of the parameterized quantum state and the measurement of the expectation value are realized on shallow circuits (zhang2020shallow, ). Taking VQE as an example, the Hamiltonian can be decomposed into several Pauli operators, and simultaneous measurement by grouping Pauli operators has been proposed in (gokhale2020optimization, ; kandala2017hardware, ; crawford2019efficient, ) to reduce the overhead of the algorithm. Based on our experiment, we have shown that the overhead of VQE can be further reduced by executing several sets of Pauli operators at the same time using the multi-programming mechanism.

For future work, we would like to apply our multi-programming algorithm to other variational quantum algorithms such as VQLS or VQC to enable the preparation of states in parallel and to reduce the overhead of these algorithms. Moreover, in our qubit partition algorithms, we take the crosstalk effects into consideration by characterizing them and adding them to the fidelity score of the partition, which avoids crosstalk errors at a high level. There are other approaches that eliminate crosstalk errors more cheaply than performing the SRB protocol, for example using commutativity rules to reorder the simultaneous gate operations (murali2020software, ; itoko2020optimization, ). However, these methods face challenges such as the trade-off between crosstalk and decoherence. More crosstalk mitigation techniques need to be developed specifically for simultaneous executions. In addition, not all the benchmarks have the same circuit depth. Taking this time dependency into consideration, choosing the optimal combination of circuits of different depths to run simultaneously can also be the focus of future work.

4. Methods

4.1. Metrics

Here are the detailed explanations of the metrics that we use to evaluate our algorithm.

  1. Probability of a Successful Trial (PST) (tannu2019not, ). This metric is defined by the number of trials that give the expected result divided by the total number of trials. The expected result is obtained by executing the quantum circuit on the simulator. To have a precise estimation of the PST, we execute each quantum circuit on the quantum hardware for a large number of trials (8192).

  2. Number of additional CNOT gates. This metric is related to the number of SWAP or Bridge gates inserted. This metric can show the ability of the algorithm to reduce the number of additional gates.

  3. Trial Reduction Factor (TRF). This metric is introduced in (das2019case, ) to evaluate the improvement of the throughput thanks to the multi-programming mechanism. It is defined as the ratio of the number of trials needed when quantum circuits are executed independently to the number of trials needed when they are executed simultaneously.
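The first and third metrics can be written down in a few lines. The sketch below assumes that the expected result is a single bitstring and that measurement outcomes are available as a dictionary of counts; both the function names and the example counts are hypothetical.

```python
from typing import Dict


def pst(counts: Dict[str, int], expected: str) -> float:
    """Probability of a Successful Trial: expected outcomes over all trials."""
    total = sum(counts.values())
    return counts.get(expected, 0) / total


def trf(independent_trials: int, simultaneous_trials: int) -> float:
    """Trial Reduction Factor: independent trials over simultaneous trials."""
    return independent_trials / simultaneous_trials


# Hypothetical counts for one circuit executed with 8192 trials.
counts = {"00": 6144, "01": 1024, "10": 1024}
print(pst(counts, "00"))

# Two circuits with 8192 trials each, executed one by one versus together.
print(trf(2 * 8192, 8192))
```

In the second call, two circuits that share one batch of 8192 trials give a TRF of two, which matches the value reported for two simultaneous circuits.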

4.2. Algorithm configurations

Here, we consider the algorithm configurations of different multi-programming and standalone mapping approaches. We select the best initial mapping out of ten attempts for HA, PHA, GSP, and QHSP. Weight parameter in the cost function (Eq. 5) is set to 0.5 and the size of the extended layer is set to 20. Parameters and are set to 0.5 respectively to consider equally the swap distance and swap error rate. For the experiments of multiple different-size circuits, the weight parameter of QHSP (Eq. 2) is set to because of the relatively large number of CNOT gates in benchmarks, whereas for deuteron experiment, is set to because of the small number of CNOTs of the parameterized circuit. The threshold for post qubit partition is set to 0.1 to ensure the multi-programming fidelity. Due to the expensive cost of SRB, we perform SRB only on IBM Q 27 Toronto and collect the pairs with significant crosstalk effect. Only the collected pairs are characterized and their crosstalk properties are provided to the partition process. The experimental results on IBM Q 65 Manhattan do not consider the crosstalk effect. For each algorithm, we only evaluate the mapping transition process, which means no optimisation methods like gate commutation or cancellation are applied.

The algorithm is implemented in Python and evaluated on a PC with 1 Intel i5-5300U CPU and 8 GB memory. Operating System is Ubuntu 18.04. All the experiments were performed on the IBM quantum information science kit (qiskit) (Qiskit, ) and the version used is 0.21.0.

5. Data Availability

The source code of the algorithms used in this paper is available on the Github repository (Github, ).

6. Supplementary Information

6.1. Supplementary note - Hardware information

Single qubit error rate: to
CNOT error rate: to
Readout error rate: to

Figure 8. IBM Q 27 Toronto topology and error rates.

Single qubit error rate: to
CNOT error rate: to
Readout error rate: to

Figure 9. IBM Q 65 Manhattan topology and calibration data.

Noise can cause several kinds of errors during execution: (1) coherence errors due to the fragile nature of qubits, which can only maintain information for a limited amount of time; (2) operational errors, including gate errors and measurement (readout) errors; and (3) crosstalk errors, where operations on other qubits disturb the state of a qubit that should remain isolated.

Supplementary Fig. 8 shows the hardware topology and the calibration data of IBM Q 27 Toronto. We list the calibration data of single-qubit error rate, CNOT error rate, and readout error rate. Note that these errors are not constant and change at each re-calibration of the chip, and IBM does not provide statistics on crosstalk error. The other device that we choose to evaluate our algorithm is IBM Q 65 Manhattan. Its topology and calibration data are shown in Supplementary Fig. 9. The CNOT error rate is one order of magnitude higher than the single-qubit error rate. Moreover, the readout error rate is of the same order of magnitude as, or higher than, the CNOT error rate. In this paper, we only focus on CNOT error rate and readout error rate because of the relatively low error rates of one-qubit gates.

It is important to note that neither the links between qubits nor the qubits themselves are equally reliable with respect to CNOT error rate and readout error rate. Taking IBM Q 27 Toronto as an example, the best CNOT gate has an error rate 4.8 times lower than the worst CNOT, and the most reliable qubit has a readout error rate 31.7 times lower than the worst qubit. Therefore, qubits cannot be treated equally, and we need to consider the error difference between the links and qubits.

In this article, we mainly focus on IBM architectures. But the proposed methods are general enough to be applied to any other quantum chips that use the quantum-gate model of computation, such as Google’s Sycamore (google-quantum-supremacy, ) or Rigetti’s Aspen-8.

6.2. Supplementary note - Motivational Example

Figure 10. A motivational example of qubit partition problem (error rate in %). (a) Partition without considering operational error. (b) Partition considering operational error without considering crosstalk effect. (c) Partition considering both operational error and crosstalk effect.

To motivate the qubit partition problem, we execute two small circuits QC1 and QC2 simultaneously on IBM Q 27 Toronto with different partitions (Supplementary Fig. 10). CNOT error rate of each link is shown in the figure and the unreliable links and qubits with high readout error rates are highlighted in red. Both circuits have five qubits with a different number of gates as listed in Supplementary Fig. 11.

There are two constraints to be considered when executing multiple circuits concurrently. First, each circuit should be allocated to a partition containing reliable physical qubits, and allocated physical qubits cannot be shared among quantum circuits. Second, qubits can be moved only inside their circuit partition; in other words, qubits can be swapped within the same partition only. Thus, finding reliable partitions for multiple circuits is an important step in the multi-programming mapping problem.
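The two constraints can be expressed as simple checks. The sketch below is only an illustration: the function names and the qubit indices assigned to QC1 and QC2 are hypothetical and do not correspond to the partitions used in the figure.

```python
from typing import Dict, Set, Tuple


def partitions_are_disjoint(partitions: Dict[str, Set[int]]) -> bool:
    """First constraint: no physical qubit is shared between circuits."""
    seen: Set[int] = set()
    for qubits in partitions.values():
        if seen & qubits:
            return False
        seen |= qubits
    return True


def swap_allowed(swap: Tuple[int, int], partition: Set[int]) -> bool:
    """Second constraint: a SWAP may only act on qubits of the same partition."""
    return swap[0] in partition and swap[1] in partition


# Hypothetical allocation of two five-qubit circuits.
partitions = {"QC1": {0, 1, 2, 3, 4}, "QC2": {10, 12, 13, 14, 15}}
print(partitions_are_disjoint(partitions))
print(swap_allowed((1, 2), partitions["QC1"]))
print(swap_allowed((4, 10), partitions["QC1"]))
```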

We compare three partitions with the same topology to show the impact of different error sources on the output fidelity: (1) Partition P1 without considering the operational error (Supplementary Fig. 10(a)). (2) Partition P2 only considering operational error without the crosstalk effect (Supplementary Fig. 10(b)). (3) Partition P3 considering both operational error and crosstalk effect (Supplementary Fig. 10(c)). Note that the operational error includes CNOT error and readout error. For illustration, we fix the partition of QC2 to and only change the partition of QC1. It is important to note that if we have different topologies, the fidelity of the circuit will be different as well because the number of additional gates is strongly related to the hardware topology.

Results in Supplementary Fig. 11(b) show that Partition P1 has the lowest fidelity. Partition P2 considers operational error and selects with reliable qubits and links. However, it does not consider the crosstalk effect. Since is the neighbour of , when and are executed at the same time, they can affect each other and disturb the qubit state. Partition P3 includes and considers both operational error and crosstalk effect. P3 does not have the crosstalk effect and is slightly better than P2 in terms of the operational error, however, the output fidelity of QC1 is increased by .

Figure 11. Results of the motivational example. (a) Circuit information and output fidelity results of different partitions. n: qubit number. g: gate number of the circuit. (b) Output fidelity results of different partitions.

6.3. Supplementary note - Qubit Partition

In this note, we first demonstrate the pseudo-code of GSP algorithm. Then, we show an example of QHSP algorithm and its pseudo-code. Finally, we explain the runtime analysis of the two algorithms in detail.

6.3.1. Greedy sub-graph partition algorithm


The pseudo-code of GSP is shown in Algorithm 1.

input : Quantum circuit (circuit), Coupling graph (graph), Calibration data (calib), Crosstalk properties crosstalk_props, Used qubits (Used_qubits)
output : A list of candidate partitions sub_graph_list
begin
      qubit_num ← circuit.qubit_num;
      Set sub_graph_list to empty list;
      for sub_graph ∈ combinations (graph, qubit_num) do
            if sub_graph is connected then
                  if Used_qubits is empty then
                        sub_graph.Set_Partition_Score (graph, calib, circuit);
                        sub_graph_list.append (sub_graph);
                  else if no qubit in sub_graph is in Used_qubits then
                        crosstalk_pairs ← Find_Crosstalk_pairs (sub_graph, crosstalk_props, Used_qubits);
                        sub_graph.Set_Partition_Score (graph, calib, circuit, crosstalk_pairs);
                        sub_graph_list.append (sub_graph);
                  end if
            end if
      end for
      return sub_graph_list;
end
Algorithm 1 GSP algorithm
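A runnable sketch of the enumeration in Algorithm 1 is given below. It is an illustration under stated assumptions, not the released implementation: the scoring function is a hypothetical stand-in (mean CNOT fidelity of the internal links times mean readout fidelity), the crosstalk term is omitted, and the four-qubit line device with its error rates is made up.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Edge = FrozenSet[int]


def is_connected(nodes: Tuple[int, ...], edges: Set[Edge]) -> bool:
    """Depth-first search restricted to the given nodes."""
    node_set = set(nodes)
    stack, seen = [nodes[0]], {nodes[0]}
    while stack:
        u = stack.pop()
        for v in node_set - seen:
            if frozenset((u, v)) in edges:
                seen.add(v)
                stack.append(v)
    return seen == node_set


def partition_score(nodes: Tuple[int, ...], cnot_error: Dict[Edge, float],
                    readout_error: Dict[int, float]) -> float:
    """Hypothetical score (assumption): mean link fidelity times mean readout fidelity."""
    links = [e for e in cnot_error if e <= set(nodes)]
    cnot_fid = sum(1 - cnot_error[e] for e in links) / len(links) if links else 1.0
    readout_fid = sum(1 - readout_error[q] for q in nodes) / len(nodes)
    return cnot_fid * readout_fid


def gsp(qubit_num: int, cnot_error: Dict[Edge, float],
        readout_error: Dict[int, float],
        used_qubits: FrozenSet[int] = frozenset()) -> List[Tuple[float, Tuple[int, ...]]]:
    """Enumerate all connected, unused qubit subsets and rank them by score."""
    edges = set(cnot_error)
    candidates = []
    for sub_graph in combinations(sorted(readout_error), qubit_num):
        if used_qubits & set(sub_graph):
            continue  # Qubits already allocated to another circuit.
        if is_connected(sub_graph, edges):
            score = partition_score(sub_graph, cnot_error, readout_error)
            candidates.append((score, sub_graph))
    return sorted(candidates, reverse=True)


# Made-up four-qubit line device 0-1-2-3.
cnot_error = {frozenset((0, 1)): 0.01, frozenset((1, 2)): 0.02, frozenset((2, 3)): 0.05}
readout_error = {0: 0.02, 1: 0.03, 2: 0.02, 3: 0.10}

ranked = gsp(3, cnot_error, readout_error)
print(ranked[0][1])
```

Because every subset of the requested size is examined, the ranking is exhaustive, which is why GSP is useful as a quality reference and costly on large devices.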

6.3.2. Qubit fidelity degree-based heuristic sub-graph partition algorithm


Supplementary Fig. 12 shows an example of applying QHSP on IBM Q 5 Valencia (5-qubit ibmq_valencia) (IBMQ5Valencia, ) for a four-qubit circuit. The calibration data of IBM Q 5 Valencia, including readout error rate and CNOT error rate, is shown in Supplementary Fig. 12(a). The fidelity degree of qubit calculated by Eq. 2 is shown in Supplementary Fig. 12(c). Here, we consider a circuit of medium size and set to two. Suppose the largest logical degree is three. Therefore, is selected as the starting point since it is the only physical qubit that has the same physical node degree as the largest logical degree. It has three neighbour qubits: , , and . is merged into the sub-partition because it has the highest fidelity degree among neighbour qubits. The sub-partition becomes . As the fidelity degree of is larger than , the algorithm will select again the remaining neighbour qubit with the largest fidelity degree of , which is . The sub-partition becomes . is still the qubit with the largest fidelity degree in the current sub-partition, its neighbour qubit – is merged. The final sub-partition is and it can be considered as a partition candidate. The merging process is shown in Supplementary Fig. 12(b).

The pseudo-code of QHSP is shown in Algorithm 2.

input : Quantum circuit (circuit), Coupling graph (graph), Calibration data (calib), Crosstalk properties crosstalk_props, Used qubits (Used_qubits), Starting points starting_points
output : A list of candidate partitions sub_graph_list
begin
      circ_qubit_num ← circuit.qubit_num;
      Set sub_graph_list to empty list;
      for i ∈ starting_points do
            Set sub_graph to empty list;
            qubit_num ← 0;
            while qubit_num < circ_qubit_num do
                  if sub_graph is empty then
                        sub_graph.append (i);
                        qubit_num ← qubit_num + 1;
                        continue;
                  end if
                  best_qubit ← find_best_qubit (sub_graph, graph, calib);
                  if best_qubit ≠ None then
                        sub_graph.append (best_qubit);
                        qubit_num ← qubit_num + 1;
                  else
                        break;
                  end if
            end while
            if len (sub_graph) = circ_qubit_num then
                  if Used_qubits is empty then
                        sub_graph.Set_Partition_Error (graph, calib, circuit);
                        sub_graph_list.append (sub_graph);
                  else if no qubit in sub_graph is in Used_qubits then
                        crosstalk_pairs ← Find_Crosstalk_pairs (sub_graph, crosstalk_props, Used_qubits);
                        sub_graph.Set_Partition_Error (graph, calib, circuit, crosstalk_pairs);
                        sub_graph_list.append (sub_graph);
                  end if
            end if
      end for
      return sub_graph_list;
end
Algorithm 2 QHSP algorithm
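The merging loop of Algorithm 2 can be sketched for a single starting point as follows. This is a simplified illustration: the fidelity degrees are taken as precomputed inputs rather than derived from Eq. 2, the function name is hypothetical, and the T-shaped five-qubit graph with its values is made up for the example.

```python
from typing import Dict, FrozenSet, List, Optional, Set


def qhsp_partition(start: int, size: int, neighbours: Dict[int, Set[int]],
                   fidelity_degree: Dict[int, float],
                   used_qubits: FrozenSet[int] = frozenset()) -> Optional[List[int]]:
    """Grow a sub-partition from `start` by repeatedly merging the best neighbour."""
    sub_graph = [start]
    while len(sub_graph) < size:
        best = None
        # Visit members from highest to lowest fidelity degree and take the
        # first one that still has a free neighbour.
        for q in sorted(sub_graph, key=lambda x: fidelity_degree[x], reverse=True):
            free = [v for v in neighbours[q]
                    if v not in sub_graph and v not in used_qubits]
            if free:
                best = max(free, key=lambda v: fidelity_degree[v])
                break
        if best is None:
            return None  # The sub-partition cannot grow to the requested size.
        sub_graph.append(best)
    return sub_graph


# Made-up T-shaped five-qubit device: 0-1, 1-2, 1-3, 3-4.
neighbours = {0: {1}, 1: {0, 2, 3}, 2: {1}, 3: {1, 4}, 4: {3}}
fidelity_degree = {0: 1.90, 1: 5.60, 2: 1.80, 3: 3.70, 4: 1.85}

candidate = qhsp_partition(1, 4, neighbours, fidelity_degree)
print(candidate)
```

Each merge only inspects the current sub-partition and the neighbours of one qubit, so the cost per candidate stays small compared with the exhaustive enumeration of GSP.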
Figure 12. Example of qubit partition on IBM Q 5 Valencia for a four-qubit circuit using QHSP. Suppose the largest logical degree of the target circuit is three. (a) Calibration data of IBM Q 5 Valencia. The value inside of the node represents the readout error rate (in %), and the value above the link represents the CNOT error rate (in %). (b) Process of constructing a partition candidate using QHSP. (c) The physical node degree and the fidelity degree of each qubit calculated by Eq. 2.

6.3.3. Runtime analysis


Let be the number of hardware qubits and the number of qubits in the circuit to be allocated in a partition. GSP algorithm selects all the combinations of subgraphs from -qubit hardware and takes time, which is . For each subgraph, it computes its fidelity score including calculating the longest shortest path, which scales at . It ends up being equivalent to . In most cases, the number of circuit qubits is less than the number of hardware qubits, thus the time complexity becomes . It increases exponentially as the number of qubits of the circuit increases.
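To give a feeling for this growth, the sketch below counts the qubit subsets that an exhaustive enumeration has to examine before the connectivity check. The function name and the symbols n (hardware qubits) and k (circuit qubits) are assumptions introduced for this illustration; the device sizes 27 and 65 are those of the two chips used in the experiments.

```python
from math import comb


def gsp_subset_count(n: int, k: int) -> int:
    """Number of k-qubit subsets of an n-qubit device (binomial coefficient)."""
    return comb(n, k)


for n in (27, 65):
    for k in (3, 5, 10):
        print(f"n={n:2d} k={k:2d} subsets={gsp_subset_count(n, k)}")
```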

QHSP algorithm starts by collecting a list of starting points where . To get the starting points, we sort the physical qubits by their physical node degree, which takes . Then, we iterate over all the gates of the circuit (e.g. circuit has gates) and sort the logical qubits according to the logical node degree, which takes . Next, for each starting point, it iteratively merges the best neighbour qubit until each sub-partition contains qubits. To find the best neighbour qubit, the algorithm finds the best qubit in a sub-partition and traverses all its neighbours to select the one with the highest fidelity degree. Finding the best qubit in the sub-partition is where is the number of qubits in a sub-partition. The average number of qubits is , so this process takes time on average. Finding the best neighbour qubit is because of the nearest-neighbor connectivity of superconducting devices. Overall, the QHSP takes time, and it can be truncated to , which is polynomial.

6.4. Supplementary note - Mapping Transition Algorithm

In this note, we present the pseudo-code of our simultaneous mapping transition algorithm (see Algorithm 3).

input : Circuits , Coupling graph , Distance matrices , Initial mapping , First layers
output : Final schedule schedule
begin
      ;
      while  not all gates are executed do
            Set swap_bridge_lists to empty list;
            for  in  do
                  for gate in  do
                        if gate is hardware-compliant then
                              schedule.append (gate);
                              Remove gate from ;
                        end if
                  end for
                  if  is not empty then
                        swap_bridge_candidate_list FindSwapBridgePairs (, );
                        swap_bridge_lists.append (swap_bridge_candidate_list);
                  end if
            end for
            for swap_bridge_candidate_list swap_bridge_lists do
                  for  swap_bridge_candidate_list do
                        Map_Update (, );
                        ;
                        for gate  do
                              + (gate, )
                        end for
                        . (, , );
                        Update the extended layer ;
                        ;
                        for gate  do
                              + (gate, );
                        end for
                  end for
                  Choose the best gate ;
                  Map_Update (, );
            end for
            Update the First layers;
      end while
      return schedule
end
Algorithm 3 Simultaneous mapping transition algorithm

6.5. Supplementary note - Experimental results

In this note, we demonstrate the exact experimental results when executing a different number of circuits on the two devices, IBM Q 27 Toronto and IBM Q 65 Manhattan, at the same time.

  • : average of PSTs. : runtime in seconds of the partition process. : comparison of average fidelity.

Table 2. Comparison of fidelity when executing two circuits simultaneously on IBM Q 27 Toronto.
  • : number of additional gates. : sum of number of additional gates. : comparison of sum of number of additional gates.

Table 3. Comparison of number of additional gates when executing two circuits simultaneously on IBM Q 27 Toronto.
  • : average of PSTs. : runtime in seconds of the partition process. : comparison of average fidelity.

Table 4. Comparison of fidelity when executing three circuits simultaneously on IBM Q 65 Manhattan.
  • : number of additional gates. : sum of number of additional gates. : comparison of sum of number of additional gates.

Table 5. Comparison of number of additional gates when executing three circuits simultaneously on IBM Q 65 Manhattan.
  • : average of PSTs. : runtime in seconds of the partition process. : comparison of average fidelity.

Table 6. Comparison of fidelity when executing four circuits simultaneously on IBM Q 65 Manhattan.
  • : number of additional gates. : sum of number of additional gates. : comparison of sum of number of additional gates.

Table 7. Comparison of number of additional gates when executing four circuits simultaneously on IBM Q 65 Manhattan.

7. Acknowledgment

This work is funded by the QuantUM Initiative of the Region Occitanie, University of Montpellier and IBM Montpellier. The authors would like to thank Xinglei Dou and Lei Liu for the meaningful discussions and exchanges. The authors are very grateful to Adrien Suau for the helpful suggestions and feedback on an early version of this manuscript. We acknowledge use of the IBM Q for this work. The views expressed are those of the authors and do not reflect the official policy or position of IBM or the IBM Q team.

8. Author Contributions

S.N and A.T.S contributed equally to this work. A.T.S proposed the problem formalism. S.N implemented the algorithms and wrote the paper. A.T.S revised the paper. Both authors reviewed and discussed the analyses and results of the work.

9. Competing interests

The authors declare no competing interests.

10. Additional Information

Correspondence and requests for materials should be addressed to S.N.