An efficient quantum circuits optimizing scheme compared with QISKit

07/04/2018 ∙ by Xin Zhang, et al. ∙ Chongqing University

Recently, the development of quantum chips has made great progress: the number of qubits is increasing and the fidelity is getting higher. However, the qubits of these chips are not always fully connected, which sets additional barriers for implementing quantum algorithms and programming quantum programs. In this paper, we introduce a general circuit optimizing scheme, which can efficiently adjust and optimize quantum circuits according to an arbitrary given qubit layout by adding additional quantum gates, exchanging qubits and merging single-qubit gates. Compared with the optimizing algorithm of IBM's QISKit, the quantum gates consumed by our scheme are 74.7% of those consumed by QISKit, and the execution time is only 12.9%.


1 Introduction

Quantum computing has attracted increasing attention in recent years because of its tremendous computing power [1, 2, 3]. More and more companies and scientific research institutions are devoting themselves to developing quantum chips with more qubits and higher fidelity. While most theoretical studies assume that interactions between arbitrary pairs of qubits are available, almost all realistic chips have certain constraints on qubit connectivity [4, 5]. For example, IBM’s 5-qubit superconducting chips Tenerife and Yorktown [6] adopt neighboring connectivity (illustrated in Fig.1 (a) and Fig.1 (b), respectively). The work in [7] uses a 4-qubit superconducting chip in which the four qubits are not directly connected to each other but are connected through a central resonator. That is, the layout of this chip is central, as shown in Fig.1 (c). In addition, CAS-Alibaba Quantum Laboratory’s 11-qubit superconducting chip [8] and Tsinghua University’s 4-qubit NMR chip [9] both reduce full connectivity to linear nearest-neighbor connectivity, as shown in Fig.1 (d). Clearly, this lack of full connectivity sets additional barriers for implementing quantum algorithms and programming quantum programs.

(a) Tenerife
(b) Yorktown
(c) Central layout
(d) Linear layout
Figure 1: The four different physical layouts

On the other hand, decoherence [10] is a huge challenge for quantum computing, and quantum programs must be executed within the coherence time [11]. To get more reliable results, we need to reduce the quantum circuit depth [12] as far as possible. However, for non-fully connected physical layouts, executing arbitrary quantum programs requires additional quantum gates to adjust the original program, which inevitably increases the depth. Therefore, it is of great practical significance to design an optimization algorithm that keeps this overhead as small as possible.
As early as 2007, D. Cheung et al. discussed non-fully connected physical layouts [4]. By adding SWAP gates, they turned illegal CNOT operations into legitimate operations and proved that, for star-shaped or linear nearest-neighbor connectivity, the adjustment can be completed with a number of additional quantum gates that depends on $n$, where $n$ stands for the number of qubits. In 2017, IBM developed a quantum information science kit, namely QISKit [13], which contains an algorithm that can adjust and optimize quantum programs according to any layout. Recently, in order to find more efficient solutions, IBM organized the QISKit Developer Challenge [14]. As for the optimization of quantum circuits, in order to simulate more qubits on classical computers, E. Pednault et al. proposed a method, namely slice [15], that splits the original quantum circuit into multiple subcircuits. In this way, they simulated random quantum circuits of depth 27 and of depth 23 on two 2D lattices of qubits on the IBM Blue Gene/Q supercomputer, which increased the number of entangled qubits that classical computers can simulate. However, the slice approach focuses on simulating more entangled qubits, so it does not take the physical layout into account and is only applicable to programs with short circuit depth.
In this paper, we propose a general quantum circuit optimizing scheme which can efficiently adjust and optimize any quantum circuit according to any layout. The remainder of this paper is organized as follows: Section 2 briefly introduces the necessary concepts. In Section 3, the design of our optimizing scheme is presented in detail. We then compare the cost and efficiency of our scheme with QISKit’s optimizing method in Section 4. The conclusion and future research can be found in Section 5.

2 Preliminaries

2.1 QISKit

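QISKit is a quantum information science kit developed by IBM, which takes quantum programs written in Open-QASM [16] as input. It adjusts and optimizes the input programs according to the given layout, and then executes them on its built-in QASM-simulator or on cloud-based quantum chips.
Open-QASM is a variant of QASM [17], which is designed to control a physical system with a parameterized gate set. Specifically, Open-QASM takes $\{\mathrm{CNOT}, U_1, U_2, U_3\}$ as the basic quantum gate set, where

$$U_1(\lambda) = \begin{pmatrix} 1 & 0 \\ 0 & e^{i\lambda} \end{pmatrix}, \quad U_2(\phi,\lambda) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -e^{i\lambda} \\ e^{i\phi} & e^{i(\phi+\lambda)} \end{pmatrix}, \quad U_3(\theta,\phi,\lambda) = \begin{pmatrix} \cos\frac{\theta}{2} & -e^{i\lambda}\sin\frac{\theta}{2} \\ e^{i\phi}\sin\frac{\theta}{2} & e^{i(\phi+\lambda)}\cos\frac{\theta}{2} \end{pmatrix} \tag{1}$$

Because the gate parameters are continuous, this set actually contains an infinite number of single-qubit gates, and it is universal [18]. For comparison with QISKit, our optimizing scheme also takes it as the basic set of quantum gates.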
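The parameterized single-qubit gates can be checked numerically. The sketch below is illustrative only: it assumes the QISKit convention for $U_3(\theta,\phi,\lambda)$, treats $U_2$ and $U_1$ as special cases of it, and uses helper names (`u3`, `matmul`, `close`) that are introduced here rather than taken from the paper. It checks that a sample gate is unitary and that the Hadamard gate is $U_2(0,\pi)$.

```python
import cmath
import math

def u3(theta, phi, lam):
    """U3(theta, phi, lambda) as a 2x2 matrix (assumed QISKit convention)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]

def u2(phi, lam):
    """U2 is U3 with theta fixed to pi/2."""
    return u3(math.pi / 2, phi, lam)

def u1(lam):
    """U1 is U3 with theta = phi = 0, i.e. a pure phase gate."""
    return u3(0.0, 0.0, lam)

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-9):
    """Entry-wise comparison with a small tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

IDENTITY = [[1, 0], [0, 1]]
HADAMARD = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]

gate = u3(0.7, 1.3, -0.4)                      # arbitrary sample angles
is_unitary = close(matmul(gate, dagger(gate)), IDENTITY)
h_is_u2 = close(u2(0.0, math.pi), HADAMARD)
print(is_unitary, h_is_u2)
```

Every choice of the three angles gives a different gate, which is why the set is infinite even though it has only three gate families.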

2.2 Common solutions

Before introducing the common solutions, we need to point out the main obstacles that hinder the execution of quantum programs:

  • Obstacle-1: the direction of a CNOT gate is illegal, as shown by the red line in Fig.2 (a);

  • Obstacle-2: the connectivity between two specific qubits is illegal, as shown by the blue line.

(a) Given Layout
(b) Actual Layout
Figure 2: An example of Obstacle-1 and Obstacle-2.

For Obstacle-1, a common solution is to flip the direction with 4 additional H gates:

$$\mathrm{cnot}(q_i, q_j) = (H \otimes H)\,\mathrm{cnot}(q_j, q_i)\,(H \otimes H) \tag{2}$$
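The direction flip can be verified with plain matrix arithmetic. The following sketch is not from the paper; the helper names and the basis ordering $|q_0 q_1\rangle$ are assumptions made for illustration. It conjugates a CNOT by Hadamard gates on both qubits and compares the result with the CNOT of the opposite direction.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def close(a, b, tol=1e-9):
    """Entry-wise comparison with a small tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

S = 1 / math.sqrt(2)
H = [[S, S], [S, -S]]

# Basis order |q0 q1>: 00, 01, 10, 11.
CNOT_01 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # control q0
CNOT_10 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]  # control q1

HH = kron(H, H)                                 # H on both qubits
flipped = matmul(HH, matmul(CNOT_01, HH))       # 4 H gates around one CNOT
direction_flipped = close(flipped, CNOT_10)
print(direction_flipped)
```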

As for Obstacle-2, the basic idea is to exchange the states of qubits with SWAP gates. For example, although the CNOT between the two qubits joined by the blue line is illegal in Fig.2 (a), we can accomplish the same task in another way, such as the circuit shown in Fig.3.
However, the additional overhead of this solution is costly, especially for sparse physical layouts. Specifically, the overhead grows with the number of intermediate nodes on the shortest path between the control-qubit and the target-qubit, and each SWAP gate used along that path costs 3 CNOT gates and 4 H gates.
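To make the overhead concrete, the sketch below finds the intermediate nodes with a breadth-first search and counts the gates of the basic solution. It is a rough illustration under stated assumptions: the layout is an undirected adjacency dictionary, one SWAP is needed per intermediate node to bring the qubits together, the same number is needed to restore the state afterwards, and each SWAP costs 3 CNOT gates and 4 H gates. The function names are hypothetical.

```python
from collections import deque

def intermediate_nodes(layout, control, target):
    """Qubits strictly between control and target on one shortest path."""
    parent = {control: None}
    queue = deque([control])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in layout[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    if target not in parent:
        raise ValueError("control and target are not connected")
    path = []
    node = parent[target]
    while node is not None and node != control:
        path.append(node)
        node = parent[node]
    return path[::-1]

def naive_overhead(n_intermediate):
    """Gate counts when SWAPs are applied and later undone (assumption)."""
    swaps = 2 * n_intermediate          # move there, then restore
    return {"cnot": 3 * swaps, "h": 4 * swaps}

linear = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
central = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

between = intermediate_nodes(linear, 0, 3)      # qubits 1 and 2
overhead = naive_overhead(len(between))
hub = intermediate_nodes(central, 1, 2)         # only the central qubit
print(between, overhead, hub)
```

On the linear layout, a single CNOT between qubits 0 and 3 already needs four SWAP gates under these assumptions, which is the overhead that the scheme in Section 3 tries to avoid.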

(a) An implementation of cnot()
(b) SWAP()
Figure 3: An equivalent circuit of cnot(), where SWAP() is implemented by (b).

3 Our Optimizing Scheme

As mentioned before, the non-fully connected layout is widely adopted. There are only two ways to execute arbitrary quantum programs:

  • Hardware solution: Completely changing the layouts of chips and constructing fully connected chips;

  • Software solution: Designing a circuit optimization algorithm, which is able to adjust the original quantum program to meet requirements of the chip.

Our optimizing scheme is an efficient, general solution at the software level. Specifically, we design the following three steps to adjust and optimize quantum programs, building on the common solutions described in Section 2.2.

3.1 The global adjustment of qubits

The global adjustment of qubits means that, before the quantum program is executed, we compare the connectivity required by the program with the given layout and directly exchange the qubits. The greatest advantage of this step is that it consumes no additional quantum gates. Therefore, the number of additional quantum gates consumed is minimal if all illegal CNOT gates can be handled in this step. For simplicity, we assume in this step and in the local adjustment that every edge in the given layout is bidirectional; that is, Obstacle-1 is ignored in these two steps.
Specifically, this step can be described as Algorithm 1. In Algorithm 1, we extract all CNOT gates from the quantum program and traverse them from front to back. Once we encounter an illegal CNOT gate, we try to find an available qubit mapping that adjusts the whole Open-QASM code without making any already-traversed CNOT gate illegal. At each adjustment, the number of available mappings is determined by the numbers of qubits adjacent to the control-qubit and to the target-qubit in the given layout, less the number of mappings that would make some traversed CNOT gate illegal. The traversal terminates when there is no illegal CNOT gate or no available mapping remains.
Suppose that there are $k$ possible mappings, where $k$ is related to the given layout and the connectivity of the quantum program. At this point, we need to estimate the cost of solving Obstacle-2 in the program adjusted according to each of these $k+1$ mappings ($k$ mappings and one empty mapping). We then take the mapping with the smallest cost as the global adjustment mapping. The reason for estimating rather than calculating accurately, and the estimation process itself, are explained in the next part. Finally, we adjust the qubits of the original Open-QASM code according to the global mapping. The classical register, which stores the results of the measurement, does not need to be modified. For example, the CNOT gate that is illegal in Fig.2 can be adjusted by a global mapping, as shown in Fig.4.

Input: the set of CNOT gates in QP, C; the set of legal CNOT gates, L; the record of all possible costs, costs; the record of all possible mappings, maps; the current mapping, cur
Output: the mapping of qubits’ ID, maps[index]

GlobalAdjust(C, L)
    costs ← [ ], maps ← [ ] and cur ← [ ]
    Adjust(C, L, costs, maps, cur)
    index ← getIndexofMinValue(costs)
    return maps[index]

Adjust(C, L, costs, maps, cur)
    candidates ← [ ]
    for CNOT in C do
        if CNOT not in L then
            c ← control-qubit of CNOT and t ← target-qubit of CNOT
            adj_c ← getAdjacentQubit(c) and adj_t ← getAdjacentQubit(t)
            trial ← the mappings built from adj_c and adj_t
            for map in trial do
                C' ← copy of C
                change qubits’ ID in C' according to map
                if no illegal CNOT in the traversed part of C' then
                    add map to candidates
            break
    if candidates == [ ] then
        cost ← estimateCost(C)
        add cost to costs and add cur to maps
    for map in candidates do
        cur' ← cur and add map to cur'
        C' ← copy of C
        change qubits’ ID in C' according to map
        if no illegal CNOT in C' then
            add the cost of C' to costs and add cur' to maps
        else
            Adjust(C', L, costs, maps, cur')
Algorithm 1 Global Adjustment
(a) Before adjusting
(b) After adjusting
Figure 4: Adjusting the circuit according to the global mapping; (b) can be executed on Fig.2 (a)
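The idea of a gate-free global relabelling can be illustrated with a much simpler search than Algorithm 1. The sketch below is a substitute technique, not the paper's recursive procedure: it enumerates every permutation of qubit labels and keeps the one with the fewest illegal CNOT gates, which is only practical for a handful of qubits. Edges are treated as bidirectional, as in this step of the scheme, and all names are hypothetical.

```python
from itertools import permutations

def count_illegal(cnots, edges, mapping):
    """Number of CNOTs whose remapped qubits are not joined by a layout edge."""
    return sum(1 for c, t in cnots
               if frozenset((mapping[c], mapping[t])) not in edges)

def best_global_mapping(cnots, layout_edges, n_qubits):
    """Try every relabelling; return (illegal_count, mapping) with the fewest illegal CNOTs."""
    edges = {frozenset(e) for e in layout_edges}
    best_cost, best_map = None, None
    for mapping in permutations(range(n_qubits)):
        cost = count_illegal(cnots, edges, mapping)
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, mapping
        if best_cost == 0:              # nothing left to fix, stop early
            break
    return best_cost, best_map

# Linear layout 0-1-2-3 and a program given as (control, target) pairs.
linear_edges = [(0, 1), (1, 2), (2, 3)]
program = [(0, 2), (2, 1), (1, 3)]

before = count_illegal(program, {frozenset(e) for e in linear_edges}, (0, 1, 2, 3))
cost, mapping = best_global_mapping(program, linear_edges, 4)
remapped = [(mapping[c], mapping[t]) for c, t in program]
print(before, cost, mapping, remapped)
```

In this example the program's interaction graph is itself a path, so a relabelling alone removes every illegal CNOT without adding a single gate.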

3.2 The local adjustment of qubits

In this step, the exchange of qubits’ state mainly depends on adding SWAP gates. Compared with the basic solution described in Section 2.2, our scheme has the following differences:

  • There is no need to use SWAP gates again to restore the state. Instead, we use the qubits involved in the exchange and intermediate qubits to generate a local mapping, then modify the subsequent gates and classical registers according to the mapping;

  • Due to the existence of the first difference, the effect of exchanging control-qubit with intermediate qubits by SWAP gates and exchanging target-qubit with these qubits is completely different for the subsequent code. Therefore, we need to calculate the gate costs in the two cases respectively and take the smaller one as the object of exchange.

However, it is difficult to calculate the costs of these two cases accurately. During the calculation, we will encounter several illegal CNOT gates, and for each illegal CNOT we have two solutions. The solution space is therefore a binary tree whose height is $n$ and whose number of leaf nodes is approximately $2^n$, where $n$ stands for the number of illegal CNOT gates. Classical computers cannot complete such a large-scale calculation in a relatively short time, so we can only estimate the cost. Essentially, the estimation process is greedy and is easily trapped in a local optimum. As the scale of quantum programs increases, the effect of this greedy choice becomes more obvious, which can be seen in Section 4.
In our scheme, the cost of adjusting the Open-QASM code is estimated by Equation (4), which combines, for the $i$th illegal CNOT, the number of intermediate qubits between its control-qubit and its target-qubit, the cost of 3 CNOT gates and 4 H gates, and a correction factor. Among the various estimation formulas we tried, the result obtained by Equation (4) is optimal. The reason for adding the correction factor in Equation (4) is that the later a CNOT gate is executed, the more easily it is influenced by the previous adjustments. That is, the estimation is not reliable for the later CNOT gates. Multiplying the estimation results by a factor that keeps decreasing as the estimation progresses has a certain correction effect.
To improve the accuracy of the estimation, we accurately calculate the top 4 layers of the binary tree and estimate the cost of the subsequent gates for each of the resulting 16 cases, where 4 is the optimal value determined after repeated trials. We then add the estimated result and the calculated result together and choose the smallest of the 16 cases.
Specifically, we traverse the Open-QASM code. Whenever we encounter an illegal CNOT, we call Algorithm 2 to adjust it and then update the subsequent code and the classical register, until the traversal terminates. It can be seen from Algorithm 2 that the mapping generated by the Adjust function only affects the code after the illegal CNOT, which is why we call this step local adjustment.
At this point, there is no Obstacle-2 in quantum programs. Then we traverse the new Open-QASM code again to handle Obstacle-1 by Equation (2).
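A stripped-down version of the local adjustment shows why not swapping back saves gates. The sketch below is a simplification and not Algorithm 2: it always moves the control-qubit towards the target-qubit, with no cost comparison between the two cases and no look-ahead, and it assumes a connected, undirected layout. It keeps a running logical-to-physical mapping instead of restoring the state after each CNOT. All names are hypothetical.

```python
from collections import deque

def shortest_path(layout, src, dst):
    """Breadth-first shortest path from src to dst (layout assumed connected)."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in layout[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path, node = [], dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]

def route(cnots, layout, n_qubits):
    """Insert SWAPs so each CNOT acts on adjacent physical qubits; never swap back."""
    pos = list(range(n_qubits))         # pos[logical] = physical
    out = []
    for c, t in cnots:
        path = shortest_path(layout, pos[c], pos[t])
        # Swap the control along the path until it sits next to the target.
        for a, b in zip(path[:-2], path[1:-1]):
            out.append(("swap", a, b))
            la, lb = pos.index(a), pos.index(b)
            pos[la], pos[lb] = b, a     # the local mapping is updated, not undone
        out.append(("cnot", pos[c], pos[t]))
    return out, pos

linear = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
circuit, final_pos = route([(0, 3), (0, 3)], linear, 4)
n_swaps = sum(1 for g in circuit if g[0] == "swap")
print(circuit, final_pos, n_swaps)
```

The second CNOT reuses the mapping left behind by the first, so it needs no SWAP at all. The final mapping plays the role of the update applied to the subsequent code and the classical register: it records which physical qubit holds each logical qubit at measurement time.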

Input: the Open-QASM code of the quantum program, QP; the first illegal CNOT, illC; the remaining CNOTs after illC in QP, rest; the record of all possible costs, costs; the cost in the current case, cost; the record of all possible mappings, maps; the current mapping, cur; the depth of recursion, d
Output: the adjusted Open-QASM code, QP

LocalAdjust(QP, illC, rest)
    cost ← 0, costs ← [ ], maps ← [ ] and cur ← [ ]
    Adjust(illC, rest, costs, cost, maps, cur, 0)
    index ← getIndexofMinValue(costs)
    add SWAP gates to QP according to maps[index]
    change qubits’ ID in QP according to maps[index]
    return QP

Adjust(illC, rest, costs, cost, maps, cur, d)
    nodes ← getIntermediateNode(illC)
    n ← nodes.length
    for qubit in {control-qubit, target-qubit} of illC do
        dest ← nodes[0]
        if qubit is control-qubit then
            dest ← nodes[n − 1]
        map ← constructMapBetweenQ(qubit, dest)
        rest' ← copy of rest with qubits’ ID changed according to map
        newIllC ← getFirstIllegalCnot(rest')
        newRest ← getAllCnotAfterNewIllC(rest', newIllC)
        if cur != [ ] then
            map ← cur followed by map
        if newIllC is None then
            add cost to costs and map to maps
        else if d == maximum depth then
            cost ← cost + estimateCost(newRest)
            add cost to costs and add map to maps
        else
            Adjust(newIllC, newRest, costs, cost, maps, map, d + 1)
Algorithm 2 Local Adjustment

3.3 The mergence of single-qubit gates

In this step, we will reduce the circuit depth by merging single-qubit gates. At first, we need to determine which kind of single-qubit gates can be merged.
The random quantum circuit shown in Fig.5 (a) contains three CNOT gates, and these gates divide the execution process of each of the three qubits into three parts. The single-qubit gates within each part can be merged, so we can reduce Fig.5 (a) to Fig.5 (b). From this example, we conclude that for any qubit $q$, the multi-qubit gates involving $q$ divide the execution process of $q$ into subintervals, and the single-qubit gates in each subinterval can be merged into one gate.

(a) Before merging
(b) After merging
Figure 5: The change of a quantum random circuit before and after merging single-qubit gates.
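The subinterval rule can be sketched on a symbolic gate list. The code below is illustrative only: gates are tuples, single-qubit gates are identified by name rather than by matrix, and the function name is hypothetical. Single-qubit gates accumulate per qubit and are emitted as one merged gate whenever a CNOT touches that qubit, or at the end of the circuit.

```python
def merge_single_qubit_runs(gates):
    """Collapse each run of single-qubit gates between CNOTs into one merged gate."""
    pending = {}    # qubit -> names of single-qubit gates waiting to be merged
    merged = []

    def flush(q):
        # Emit the accumulated run on qubit q as one gate, if there is one.
        if pending.get(q):
            merged.append(("merged", q, tuple(pending.pop(q))))

    for gate in gates:
        if gate[0] == "cx":
            _, control, target = gate
            flush(control)              # a CNOT closes the subinterval on both qubits
            flush(target)
            merged.append(gate)
        else:
            name, q = gate
            pending.setdefault(q, []).append(name)
    for q in sorted(pending):           # runs left open at the end of the circuit
        flush(q)
    return merged

circuit = [("h", 0), ("t", 0), ("h", 1), ("cx", 0, 1),
           ("s", 1), ("h", 1), ("cx", 1, 2), ("x", 2)]
result = merge_single_qubit_runs(circuit)
single_before = sum(1 for g in circuit if g[0] != "cx")
single_after = sum(1 for g in result if g[0] == "merged")
print(result, single_before, single_after)
```

A real implementation would replace each tuple of names by the product of the corresponding gate matrices, which is the subject of the rest of this section.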

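As mentioned before, all single-qubit gates in Open-QASM are instances of $U_1$, $U_2$ or $U_3$. Therefore, merging single-qubit gates actually contains 9 different cases: $U_1 \times U_1$, $U_1 \times U_2$, $U_2 \times U_1$, $U_1 \times U_3$, $U_3 \times U_1$, $U_2 \times U_2$, $U_2 \times U_3$, $U_3 \times U_2$ and $U_3 \times U_3$. In order to handle these cases, we need the Z-Y decompositions [19] of $U_1$, $U_2$ and $U_3$. By Equation (1), up to a global phase, we obtain:

$$U_1(\lambda) \simeq R_z(\lambda), \quad U_2(\phi,\lambda) \simeq R_z(\phi)\,R_y(\pi/2)\,R_z(\lambda), \quad U_3(\theta,\phi,\lambda) \simeq R_z(\phi)\,R_y(\theta)\,R_z(\lambda) \tag{5}$$

For the first five cases, one factor is a pure Z rotation, so we can merge them directly [18]. As for the last four cases, we have:

$$R_z(\phi_1)\,R_y(\theta_1)\,R_z(\lambda_1) \cdot R_z(\phi_2)\,R_y(\theta_2)\,R_z(\lambda_2) = R_z(\phi_1)\,R_y(\theta_1)\,R_z(\lambda_1+\phi_2)\,R_y(\theta_2)\,R_z(\lambda_2) \tag{6}$$

where $\theta_k = \pi/2$ whenever the $k$th factor is a $U_2$ gate. The key to this kind of merging lies in transforming the Y-Z-Y product in the middle of Equation (6) into a Z-Y-Z decomposition, and we use QISKit’s merge method proposed in [20] to solve this problem. At this point, the adjustment and optimization of the original quantum program according to any given layout is complete.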
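As a numerical cross-check of the merging step, two $U_3$ gates can be multiplied and the product read back as a single $U_3$ gate times a global phase. This sketch deliberately uses a different and simpler technique than the QISKit merge method of [20]: it works on the 2x2 matrix directly. It assumes the QISKit convention for $U_3$, and `to_u3_angles` is a hypothetical helper introduced for illustration.

```python
import cmath
import math

def u3(theta, phi, lam):
    """U3(theta, phi, lambda) as a 2x2 matrix (assumed QISKit convention)."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -cmath.exp(1j * lam) * s],
            [cmath.exp(1j * phi) * s, cmath.exp(1j * (phi + lam)) * c]]

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-9):
    """Entry-wise comparison with a small tolerance."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def to_u3_angles(m, eps=1e-12):
    """Return (theta, phi, lam, phase) with m = exp(i*phase) * U3(theta, phi, lam)."""
    theta = 2 * math.atan2(abs(m[1][0]), abs(m[0][0]))
    if abs(m[0][0]) > eps and abs(m[1][0]) > eps:
        phase = cmath.phase(m[0][0])
        phi = cmath.phase(m[1][0]) - phase
        lam = cmath.phase(-m[0][1]) - phase
    elif abs(m[1][0]) <= eps:           # diagonal matrix: only phi + lam matters
        phase = cmath.phase(m[0][0])
        phi, lam = 0.0, cmath.phase(m[1][1]) - phase
    else:                               # anti-diagonal matrix: fix lam = 0
        phase = cmath.phase(-m[0][1])
        phi, lam = cmath.phase(m[1][0]) - phase, 0.0
    return theta, phi, lam, phase

first = u3(0.3, 1.1, -0.7)              # arbitrary sample angles
second = u3(2.0, 0.4, 0.9)
product = matmul(first, second)         # two gates applied back to back

theta, phi, lam, phase = to_u3_angles(product)
rebuilt = [[cmath.exp(1j * phase) * x for x in row] for row in u3(theta, phi, lam)]
same_gate = close(rebuilt, product)
print(same_gate)
```

The global phase has no observable effect, so the two original gates can be replaced by the single gate $U_3(\theta,\phi,\lambda)$ returned by the helper.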

4 Numerical Results

In this section, we take QISKit’s optimizing method as the benchmark to evaluate the performance of our optimizing scheme on quantum programs of different scales and on different layouts of quantum chips. In addition, we count the cost of gates with the method proposed in the QISKit Developer Challenge, which computes the cost from the number of CNOT gates and the number of single-qubit gates in the optimized quantum circuit.

4.1 Platform

Hardware Platform
All the experiments in this paper are executed on a PC with an Intel Core i7 processor and 8GB of RAM. Furthermore, we have no special hardware acceleration, such as a GPU.
Software Platform
In order to verify the correctness of our scheme, we use the QASM-simulator to execute the optimized circuits. In addition, we use a special method to generate random quantum circuits: it first generates random circuits whose quantum gates belong to one gate set [21], and then decomposes these gates into gates belonging to the basic gate set [22]. The advantage of this method is that we can fully test different connections between qubits, and that the fairness of the comparison between our optimizing scheme and QISKit (version=0.4.11) can be guaranteed. The detailed execution flow of our experiments is shown in Fig.6.

Figure 6: Execution Flow Chart

It should be noted that, for accuracy, the circuit depth mentioned in the following is still the depth of the circuit before decomposition, and the actual depth is about 7 times that.

4.2 Results

As is well known, the number of qubits and the circuit depth are important indicators of the scale of quantum programs. Therefore, the experiments are designed as follows: for each number of qubits and each circuit depth in the tested ranges, we generate several different random quantum circuits. We then choose four common connected graphs (linear, central, neighboring and circular) and use our optimizing scheme and QISKit’s algorithm to adjust and optimize these random circuits according to each of these layouts. That is, each algorithm handles every generated circuit once per layout. Finally, the optimized quantum programs are executed by the QASM-simulator. If the result of our scheme is consistent with QISKit’s result, we count the cost and the execution time of each circuit.
All quantum circuits, layouts and the source code of our scheme can be found on GitHub (https://github.com/zhangxin20121923/QISKit_Deve_Challenge).
Comparison with QISKit’s optimizing method
Table 1 shows the quantum gate consumption of the original random quantum circuits, together with the total gate cost and compile time required to adjust and optimize these circuits by our scheme and by QISKit.

                    Time (s)      Gate Cost
Original Circuit    0             3084391
Our Scheme          16472.48      6703061
QISKit              127751.99     8974717
Table 1: The overall statistics

Overall, the quantum gates consumed by our scheme are 74.7% of those consumed by QISKit, and the execution time is only 12.9% of QISKit’s.
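The two percentages follow directly from the totals in Table 1. The short sketch below simply repeats that arithmetic; the variable names are chosen here for illustration.

```python
# Totals copied from Table 1: (compile time in seconds, gate cost).
table_1 = {
    "Original Circuit": (0.0, 3084391),
    "Our Scheme": (16472.48, 6703061),
    "QISKit": (127751.99, 8974717),
}

our_time, our_cost = table_1["Our Scheme"]
qiskit_time, qiskit_cost = table_1["QISKit"]

gate_ratio = our_cost / qiskit_cost     # share of QISKit's gate cost
time_ratio = our_time / qiskit_time     # share of QISKit's compile time
speedup = qiskit_time / our_time        # how many times faster overall

print(f"gate cost: {gate_ratio:.1%} of QISKit")
print(f"time:      {time_ratio:.1%} of QISKit")
print(f"speed-up:  {speedup:.2f}x")
```

Because these are totals over all circuits, the larger circuits dominate the result, which is why a per-circuit statistic is used for the layout comparison.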
Specifically, the performance of our scheme varies for different scales of quantum circuits.

(a) Gate Cost
(b) Efficiency
Figure 7: Experimental Results

Fig.7 (a) and Fig.7 (b) illustrate, for various numbers of qubits and circuit depths, the ratio between QISKit and our scheme in terms of the cost of quantum gates and of efficiency, respectively. Both ratios are formed from the gate cost and execution time of QISKit’s algorithm and those of our method. Fig.7 shows that in all cases we executed, our algorithm uses fewer quantum gates to adjust and optimize the original circuits, and does so in less time. In the worst case (more qubits and larger circuit depth), we use 6% fewer gates and are about 5 times as efficient; in the best case (more qubits and smaller circuit depth), we use 63% fewer gates and are about 20 times as efficient.
These results are consistent with the theory. When the number of qubits is large and the circuit depth is small, the choice is more reliable and the performance is better, because we recursively calculate 4 layers of the solution space tree. When the number of qubits is small, the layout tends to be fully connected and our scheme has no advantage. When the circuit depth is large, we are easily trapped in a local optimum, and the performance of our scheme is worse than for small depths.
Performance in different physical layouts
For the four layouts we have chosen, there are also significant differences in the cost of quantum gates and in execution time. In order to treat circuits of different scales fairly and to prevent the statistics from being dominated by large-scale circuits, we no longer sum up the gate costs of the different cases directly (as was done in Table 1). Instead, the statistics are computed per circuit from the gate cost of the $i$th original circuit, the gate costs of the $i$th circuit adjusted by QISKit and by our scheme, and the times required to compile the $i$th circuit by QISKit and by our scheme.
Fig.8 (a) reports, for the central, linear, circle and neighbour layouts, the gate cost of our scheme and of QISKit’s optimizing method as multiples of the gate cost of the original circuit. Fig.8 (b) illustrates that for the linear, circle and neighbour layouts, our scheme is about 4 times faster than QISKit, while for the central layout our scheme is about 17.3 times as fast as QISKit’s method.

(a) Costs of four layouts
(b) Efficiency of four layouts
Figure 8: Experimental Results

5 Conclusions and Future Research

Considering the cost of physical implementation, the layouts of most existing quantum chips are not fully connected, which sets additional barriers for implementing quantum algorithms and programming quantum programs. Therefore, a better approach is to let the compiler of the quantum computer automate the task of adjusting and optimizing quantum programs according to any given layout. We propose a general optimizing scheme that accomplishes this task by adding additional logic gates, exchanging qubits in the quantum register and merging single-qubit gates. Compared with QISKit’s optimizing method, the quantum gates consumed by our scheme are 74.7% of QISKit’s and the execution time is only 12.9% overall. For circuits with more qubits and smaller circuit depth, this advantage is more obvious. In addition, several common connected graphs (linear, central, neighboring and circular) are compared as well. In all four cases, our scheme has advantages. Especially for the central layout, our scheme adjusts and optimizes the original quantum circuits about 17.3 times as fast as QISKit’s optimizing algorithm.
Future Research
In our scheme, we often make greedy choices when the circuit depth of the quantum program is large. However, the experimental results in Section 4 show that we made wrong choices in some cases and were trapped in a local optimum. If we can find better selection criteria, or even calculate the global optimum, we will further reduce the consumption of additional logic gates.
In addition, merging single-qubit logic gates requires high-precision floating-point calculation, which takes up about 70% of the total compile time. Whether more efficient merging methods can be found is a problem worth considering. In order to further evaluate different physical layouts, we also plan to discuss with the R&D teams of actual quantum chips how to combine the actual overhead of designing different layouts with the expense at the software level.

References