Optimality Study of Existing Quantum Computing Layout Synthesis Tools

02/22/2020
by Bochen Tan, et al.

Layout synthesis, an important step in quantum computing, processes quantum circuits to satisfy device layout constraints. In this paper, we construct QUEKO benchmarks for this problem, which have known optimal depths and gate counts. We use QUEKO to evaluate the optimality of current layout synthesis tools, including Cirq from Google, Qiskit from IBM, t|ket⟩ from Cambridge Quantum Computing, and recent academic work. To our surprise, despite over a decade of research and development by academia and industry on compilation and synthesis for quantum circuits, we are still able to demonstrate large optimality gaps: 1.5-12x on average on a smaller device and 5-45x on average on a larger device. This suggests substantial room for improving the efficiency of quantum computers through better layout synthesis tools. Finally, we also prove the NP-completeness of the layout synthesis problem for quantum computing. We have made the QUEKO benchmarks open source.

1 Introduction

Recently, a quantum processor “Sycamore” from Google was shown to have a clear advantage over classical supercomputers on a problem named sampling random quantum circuits [1]. It is widely expected that in the near future, quantum computing (QC) will outperform its classical counterpart on more and more problems. Achieving this computational advantage, however, requires executing larger and larger quantum circuits. A quantum circuit consists of quantum gates acting on qubits. It was shown that gates acting on only one or two qubits suffice for universal quantum computing [2].

After a quantum circuit is designed, it needs to be mapped to a QC device. However, the qubit connections required by two-qubit gates are often greatly constrained by QC device layouts. QC layout synthesis resolves this issue by producing an initial mapping from the qubits in the circuit to the physical qubits on the QC device, adjusting the mapping by inserting new gates so that two-qubit gates become legal, and scheduling all the gates. The resulting circuit preserves the original functionality and is executable on the QC device.

Quantum circuits in earlier experiments contained only dozens of gates and ran on small devices, e.g., those with 5 or fewer qubits. In those cases, layout synthesis was usually done by exhaustive enumeration. However, the task becomes increasingly intractable as circuits get deeper and wider. Nowadays, a cutting-edge QC experiment requires executing a circuit with 53 qubits, 1113 single-qubit gates, and 430 two-qubit gates [1]. For a general circuit of this size, the number of possible initial mappings is 53! (about 4 × 10^69), and the subsequent scheduling and legalization steps have large solution spaces as well. Clearly, design automation is necessary. In addition, the size of the circuits that QC hardware is able to execute has been scaling exponentially in the past few years [3]. This fast increase in hardware capacity presents an even bigger challenge to layout synthesis.

Several layout synthesis tools are available, and there are also benchmarks that help us compare them. However, it is currently unknown how far these tools are from the optimal solutions. In this paper, we present the QUEKO benchmarks (quantum mapping examples with known optimal), which are quantum circuits with known optimal depth for the given QC device.
Then, we evaluate four existing QC layout synthesis tools with QUEKO, namely t|ket⟩ (https://github.com/CQCL/pytket), the greedy router included in Cirq (https://github.com/quantumlib/Cirq), DenseLayout plus StochasticSwap included in Qiskit (https://Qiskit.org/), and the recent academic work by Zulehner et al. [4]. To our surprise, rather large optimality gaps are discovered even for feasible-depth circuits: 2-10x on a smaller device and 5-25x on a larger device.

The optimality gaps revealed in this study have a strong implication. If we can consistently halve the circuit depth through better layout synthesis, we effectively double the decoherence time of a QC device, which is equivalent to a major advance in experimental physics and electrical engineering. Therefore, the gaps call for more research investment in QC layout synthesis. To draw a parallel, the VLSI CAD optimality study conducted more than 15 years ago using the PEKO benchmarks [5] revealed that the optimality gaps of the leading academic and industrial placers were very large at that time. It spurred further research investment, resulting in wirelength reductions equivalent to two or more generations of Moore’s Law scaling, but in a more cost-efficient way [6].

The rest of this paper is organized as follows: Sec. 2 reviews relevant background of QC; Sec. 3 formulates the QC layout synthesis problem; Sec. 4 reviews related work; Sec. 5 describes the construction of the QUEKO benchmarks; Sec. 6 evaluates the aforementioned tools with QUEKO; Sec. 7 proves the NP-completeness of QC layout synthesis; Sec. 8 concludes.

2 Background

2.1 Qubits

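A qubit is in a quantum state |ψ⟩ represented by a vector in a two-dimensional Hilbert space with ℓ2-norm equal to 1:

$$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle, \qquad |\alpha|^2 + |\beta|^2 = 1, \tag{1}$$

where the two basis vectors are |0⟩ = (1, 0)^T and |1⟩ = (0, 1)^T. A quantum state of multiple qubits lies in the tensor product of the individual Hilbert spaces. For instance, a general two-qubit state |ψ⟩ is

$$|\psi\rangle = \alpha_{00}|00\rangle + \alpha_{01}|01\rangle + \alpha_{10}|10\rangle + \alpha_{11}|11\rangle, \qquad \sum_{i,j \in \{0,1\}} |\alpha_{ij}|^2 = 1, \tag{2}$$

where we omit the tensor product notation between kets for convenience, e.g., |01⟩ = |0⟩ ⊗ |1⟩. A joint state of two individual qubits is

$$(\alpha_0|0\rangle + \beta_0|1\rangle)(\alpha_1|0\rangle + \beta_1|1\rangle) = \alpha_0\alpha_1|00\rangle + \alpha_0\beta_1|01\rangle + \beta_0\alpha_1|10\rangle + \beta_0\beta_1|11\rangle. \tag{3}$$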
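As a small numerical illustration of Eqs. (1) to (3), the Python sketch below represents states as plain lists of amplitudes, forms a joint state with the tensor (Kronecker) product, and checks the unit-norm condition. The helper names are our own, and the basis order |00⟩, |01⟩, |10⟩, |11⟩ is an assumption of the sketch. It also uses the standard fact that a two-qubit state factors into two single-qubit states exactly when α00·α11 − α01·α10 = 0, which shows that Eq. (2) is strictly more general than Eq. (3).

```python
import math

def tensor(a, b):
    """Tensor (Kronecker) product of two state vectors.

    For two single-qubit states the result lists the amplitudes of
    |00>, |01>, |10>, |11> in that order (first factor = left qubit)."""
    return [x * y for x in a for y in b]

def normalized(state, tol=1e-9):
    """True if the amplitudes have unit l2-norm, as Eqs. (1) and (2) require."""
    return abs(sum(abs(amp) ** 2 for amp in state) - 1.0) < tol

def is_product(state, tol=1e-9):
    """A two-qubit state factors as in Eq. (3) iff a00*a11 - a01*a10 == 0."""
    a00, a01, a10, a11 = state
    return abs(a00 * a11 - a01 * a10) < tol

plus = [1 / math.sqrt(2), 1 / math.sqrt(2)]             # (|0> + |1>) / sqrt(2)
one = [0.0, 1.0]                                         # |1>
joint = tensor(plus, one)                                # amplitudes 0, 1/sqrt(2), 0, 1/sqrt(2)
bell = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]    # (|00> + |11>) / sqrt(2)

print(normalized(joint), is_product(joint))  # True True
print(normalized(bell), is_product(bell))    # True False
```

The Bell state is normalized but cannot be written in the form of Eq. (3), i.e., it is entangled.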

2.2 Quantum Gates

A quantum gate transforms an input state to an output state. Some common single-qubit gates are given in Eq. (4), where † denotes the conjugate transpose.

(4)

Two common two-qubit gates are given in Eq. (5).

(5)

NAND gates are sufficient for universal classical computing. For universal quantum computing, there are multiple complete gate sets. Table 1 lists three such sets chosen by the QC frameworks Cirq, Qiskit, and pyQuil (https://github.com/rigetti/pyquil). The exact matrix representations of these gates are not specified here because they are irrelevant to the purpose of this paper.

Framework Single-qubit gate Two-qubit entangling gate
Cirq (Google) , phased power
Qiskit (IBM) ,
pyQuil (Rigetti)
Table 1: Complete quantum gate set examples

Another important gate is the Toffoli gate, which is universal for reversible logic and thus essential for QC logic synthesis [7]. It is a quantum gate on three qubits and can be decomposed into single-qubit and two-qubit gates, as shown in the following subsection.

(6)

2.3 Quantum Circuit

In QC, a circuit or program is usually given as a piece of QASM code [8], e.g., Fig. 4(a). The code is simple to read: it merely lists each gate sequentially, like instructions in a traditional assembly language. We thus define a quantum circuit as a list of quantum gates.
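Because a QASM program is just a sequence of gate statements, reading it as a list of gates is straightforward. The sketch below parses a restricted subset of OpenQASM 2.0 (one gate per line, a single register named q, no parameters or measurements); these restrictions, the helper names, and the example listing are our assumptions. The listing is a standard Clifford+T decomposition of the Toffoli gate into 9 single-qubit and 6 CNOT gates and is not claimed to reproduce Fig. 4(a) line for line.

```python
import re

# A standard Clifford+T decomposition of the Toffoli gate (controls q[0], q[1];
# target q[2]), written in OpenQASM 2.0 syntax for illustration.
TOFFOLI_QASM = """
OPENQASM 2.0;
include "qelib1.inc";
qreg q[3];
h q[2];
cx q[1],q[2];
tdg q[2];
cx q[0],q[2];
t q[2];
cx q[1],q[2];
tdg q[2];
cx q[0],q[2];
t q[1];
t q[2];
h q[2];
cx q[0],q[1];
t q[0];
tdg q[1];
cx q[0],q[1];
"""

# gate name, then one or more comma-separated operands of the form q[i]
GATE_LINE = re.compile(r"^(\w+)\s+(q\[\d+\](?:\s*,\s*q\[\d+\])*)\s*;$")

def parse_gates(qasm: str):
    """Return the circuit as a list of (name, qubit-index tuple) in program order."""
    gates = []
    for line in qasm.splitlines():
        line = line.strip()
        if not line or line.startswith(("OPENQASM", "include", "qreg", "creg")):
            continue  # skip blank lines, the header, and register declarations
        match = GATE_LINE.match(line)
        if match is None:
            raise ValueError(f"unsupported statement: {line}")
        name, args = match.groups()
        qubits = tuple(int(i) for i in re.findall(r"q\[(\d+)\]", args))
        gates.append((name, qubits))
    return gates

circuit = parse_gates(TOFFOLI_QASM)
print(len(circuit))  # 15
print(circuit[:2])   # [('h', (2,)), ('cx', (1, 2))]
```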

(a) QASM code of Toffoli circuit
(b) 1D diagram of Toffoli circuit (c) Two-qubit gate set of Toffoli circuit
Figure 4: Toffoli circuit (Single-qubit gates are colored gray. Identical two-qubit gates applied at different times have the same color, e.g., and are orange because they are both but at different times.)

It is important to note that the qubits in a quantum circuit are logical qubits. We do not use “logical” to indicate error correction in this paper. Additionally, we refer to the qubit count, gate count, single-qubit gate count, and two-qubit gate count of a circuit. We treat a gate as the set of qubits it acts on, so its cardinality is 1 for a single-qubit gate and 2 for a two-qubit gate. The union of two gates denotes the set of qubits involved in either of them, and their intersection denotes the set of qubits involved in both.

A 1D circuit diagram can also represent the circuit, e.g., Fig. 4(b). In such a diagram, each wire stands for a logical qubit. We draw the control qubit of a CNOT gate as a dot (•) and the target qubit as ⊕. The gates in the diagram are executed from left to right, and gates aligned vertically are executed simultaneously. The 1D diagram thus provides some timing information, but only implicitly. Due to its 1D nature, the diagram cannot clearly show some features of the quantum circuit. For instance, two gates that can be executed simultaneously are sometimes separated by some horizontal distance to avoid overlapping in the diagram.
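The set-based notation above maps directly onto Python frozensets. The sketch below is only an illustration with names of our choosing: it encodes a standard 15-gate Clifford+T decomposition of the Toffoli circuit on logical qubits 0, 1, 2 (gate names dropped), derives the qubit, gate, single-qubit, and two-qubit gate counts from the gate cardinalities, and uses union and intersection to find the qubits involved in either or both of two gates. A non-empty intersection is exactly what creates a dependency between two gates.

```python
# Each gate is modelled as the set of logical qubits it acts on.
gates = [frozenset(q) for q in [
    (2,), (1, 2), (2,), (0, 2), (2,), (1, 2), (2,), (0, 2),
    (1,), (2,), (2,), (0, 1), (0,), (1,), (0, 1),
]]

n = len(frozenset().union(*gates))          # qubit count
g = len(gates)                              # gate count
g1 = sum(1 for x in gates if len(x) == 1)   # single-qubit gates (cardinality 1)
g2 = sum(1 for x in gates if len(x) == 2)   # two-qubit gates (cardinality 2)
print(n, g, g1, g2)   # 3 15 9 6

a, b = gates[1], gates[3]   # the CNOTs on {1, 2} and on {0, 2}
print(sorted(a | b))        # [0, 1, 2]: qubits involved in a or b
print(sorted(a & b))        # [2]: shared qubit, so b must wait for a

# The two-qubit gate set: the distinct qubit pairs used by two-qubit gates.
pairs = {x for x in gates if len(x) == 2}
print(len(pairs))           # 3: the pairs {0,1}, {0,2}, {1,2} form a triangle
```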

2.4 Quantum Computing Device Representation

We represent the layout of a QC device with a device graph in which each node stands for a physical qubit and each edge stands for a connection that enables two-qubit entangling gates, i.e., such gates can only be performed on two physical qubits that are connected. This graph is also called the coupling graph or qubit connectivity. The device graphs used in this paper are shown in Fig. 10.

(c) IBM’s Tokyo device graph
(d) IBM’s Rochester device graph
(e) Google’s Sycamore device graph
Figure 10: QC device examples
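A device graph can be stored as an adjacency map. The sketch below, with helper names of our own, checks whether a gate is executable under a given logical-to-physical mapping. The 5-qubit T-shaped graph is a hypothetical example chosen for illustration, not a reproduction of any device in Fig. 10.

```python
from collections import defaultdict

def device_graph(edges):
    """Undirected device graph as an adjacency map: physical qubit -> neighbours."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def executable(gate, mapping, adj):
    """Single-qubit gates are always executable; a two-qubit gate is executable
    only if its logical qubits are mapped to physical qubits joined by an edge."""
    if len(gate) == 1:
        return True
    q0, q1 = gate
    return mapping[q1] in adj[mapping[q0]]

T_SHAPE = device_graph([(0, 1), (1, 2), (1, 3), (3, 4)])  # hypothetical device
mapping = {0: 1, 1: 0, 2: 2}                              # logical -> physical
print(executable((0, 1), mapping, T_SHAPE))  # True: physical 1-0 is an edge
print(executable((1, 2), mapping, T_SHAPE))  # False: physical 0-2 is not
```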

3 Problem Formulation

Layout synthesis is divided into two sub-tasks: initial placement, which produces an initial mapping from logical qubits to physical qubits, and gate scheduling, which decides when and where to execute each input gate and inserts SWAP gates to make two-qubit gates legal on the device graph.

(a) Initial placement for Toffoli circuit
(b) Scheduled Toffoli circuit
Figure 13: Quantum computing layout synthesis for Toffoli circuit on IBM’s Ourense device

3.1 Initial Placement

During initial placement, we need to find a mapping from logical qubits in the quantum circuit to physical qubits on the device that benefits the subsequent gate scheduling. If the two-qubit gate set, consisting of all the two-qubit gates, can be embedded in the device graph during initial placement (e.g., Fig. 4(c) can be embedded in Fig. 10(c)), gate scheduling need not insert any gates. In general, however, the situation is not so ideal. For example, if we want to map the Toffoli circuit shown in Fig. 4(b) onto the Ourense device (see Fig. 13(a)), some additional gates must be inserted, since the Ourense device graph does not contain any triangles, but the two-qubit gates in Fig. 4(c) form a triangle. A valid initial placement in this case is given in Fig. 13(a).
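Whether a circuit can be placed without any SWAP gates is a subgraph-embedding question, which the sketch below answers by brute force over all injective initial mappings. This is exponential and only meant for toy instances; the function name and the triangle-free T-shaped example graph are our own assumptions. It reproduces the argument above: the three CNOT pairs of a Toffoli decomposition form a triangle, which cannot be embedded in a triangle-free graph but can be once an edge closing a triangle is added.

```python
from itertools import permutations

def swap_free_placement(pairs, n_logical, edges, n_physical):
    """Return an initial mapping under which every two-qubit gate acts on
    adjacent physical qubits (so no SWAPs are needed), or None if none exists.
    Exhaustive search over injective mappings: toy instances only."""
    edge_set = {frozenset(e) for e in edges}
    for image in permutations(range(n_physical), n_logical):
        # image[i] is the physical qubit assigned to logical qubit i
        if all(frozenset((image[a], image[b])) in edge_set for a, b in pairs):
            return dict(enumerate(image))
    return None

toffoli_pairs = [(1, 2), (0, 2), (0, 1)]   # a triangle of logical qubits
t_shape = [(0, 1), (1, 2), (1, 3), (3, 4)] # hypothetical triangle-free device
with_triangle = t_shape + [(0, 2)]         # same device plus one extra edge
print(swap_free_placement(toffoli_pairs, 3, t_shape, 5))        # None
print(swap_free_placement(toffoli_pairs, 3, with_triangle, 5))  # {0: 0, 1: 1, 2: 2}
```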

3.2 Gate Scheduling

Given a quantum circuit, e.g., Fig. 4(b), gate scheduling produces the spacetime coordinates of each gate. The coordinates specify when and where the gate is applied. We say that a gate is scheduled to cycle t if its time coordinate is t. For a single-qubit gate, the space coordinate is a physical qubit; for a two-qubit gate, it is an edge in the device graph. SWAP gates may need to be inserted during gate scheduling to ensure that all two-qubit gates are executable. The input gates plus the inserted SWAP gates constitute the scheduled gate list. Since only SWAP gates are inserted and all the input gates are contained in the scheduled gate list, the functionality of the input circuit remains unchanged after layout synthesis. Additionally, gate scheduling must respect the dependencies in the quantum circuit: if a gate acts on a qubit, it can only be executed after all prior gates acting on that qubit have been executed.

A valid but not necessarily optimal gate scheduling example is given in Fig. 13(b). The time coordinates of all the gates are displayed at the bottom, and the space coordinates can be inferred from the mapping. There is an injective map from the original gates to the scheduled gates. The three gates in the dashed box constitute a SWAP gate, and the adjusted mapping is shown after them. The SWAP gate is inserted so that the two logical qubits of the next two-qubit gate are on connected physical qubits, which makes that gate executable.
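Two facts used in this subsection can be checked mechanically: three alternating CNOT gates implement a SWAP, and a SWAP on a physical edge exchanges the logical qubits sitting on its two endpoints. The sketch below, with hypothetical helper names, checks the first fact on all computational basis states (sufficient here because both circuits act as permutations of the basis states, with no phases) and implements the mapping update for the second.

```python
def cnot(bits, control, target):
    """CNOT acting on a computational-basis state given as a tuple of bits."""
    bits = list(bits)
    bits[target] ^= bits[control]
    return tuple(bits)

# CNOT(0->1), CNOT(1->0), CNOT(0->1) exchanges the two bits of every basis state.
swap_by_cnots = all(
    cnot(cnot(cnot(s, 0, 1), 1, 0), 0, 1) == (s[1], s[0])
    for s in [(0, 0), (0, 1), (1, 0), (1, 1)]
)
print(swap_by_cnots)  # True

def apply_swap(mapping, p, r):
    """Update a logical->physical mapping after a SWAP on physical edge (p, r)."""
    owner = {phys: log for log, phys in mapping.items()}
    new = dict(mapping)
    if p in owner:
        new[owner[p]] = r
    if r in owner:
        new[owner[r]] = p
    return new

mapping = {0: 1, 1: 0, 2: 2}      # illustrative mapping on a T-shaped device
print(apply_swap(mapping, 1, 2))  # {0: 2, 1: 0, 2: 1}
```

For instance, on a hypothetical T-shaped graph with edges (0, 1), (1, 2), (1, 3), (3, 4), logical qubits 1 and 2 initially sit on the non-adjacent physical qubits 0 and 2; after a SWAP on physical edge (1, 2), they sit on the adjacent qubits 0 and 1.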

3.3 Formal Definition of (Depth-Optimal) Layout Synthesis Problem in Quantum Computing

Input

A device graph and a list of quantum gates acting on a set of logical qubits. All the input gates are in the implementable gate set of the device, e.g., a set from Table 1. The number of logical qubits is at most the number of physical qubits.

Output

An initial mapping, and a scheduled quantum circuit consisting of a new list of gates, including SWAP gates, where each gate has a spacetime coordinate. From here on, we use a tilde to denote that a gate is scheduled.

Constraints
  • Feasible two-qubit gates: all the two-qubit gates in the scheduled circuit must be on two qubits connected in the device graph. Formally, for to , if , then .

  • Executing all gates: because we assume any circuit optimization/simplification is done prior to layout synthesis, all input gates should be executed. Formally, there is an injective map such that for to .

  • Respecting dependencies: for to , if and then .
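The three constraints translate into a direct checker. The sketch below is our own illustration, not code from the paper: each scheduled gate is a triple (src, phys, t) whose src is the index of the input gate it executes (None for an inserted SWAP), and it additionally enforces that a physical qubit takes part in at most one gate per cycle. Dependencies are checked by replaying the schedule in time order, tracking the mapping through SWAPs and requiring the input gates on each logical qubit to run in program order. It returns the depth of a valid schedule, i.e., the objective defined below, and None otherwise.

```python
def check_schedule(input_gates, edges, initial_mapping, scheduled):
    """Return the depth of a valid schedule, or None if a constraint is violated.

    input_gates     -- logical-qubit tuples in program order, e.g. (0,) or (0, 1)
    edges           -- device-graph edges as pairs of physical qubits
    initial_mapping -- dict: logical qubit -> physical qubit
    scheduled       -- (src, phys, t) triples: src indexes input_gates, or is
                       None for an inserted SWAP; phys is a tuple of physical
                       qubits; t is the cycle (time coordinate)
    """
    edge_set = {frozenset(e) for e in edges}
    # Feasible two-qubit gates: every two-qubit gate sits on a device edge.
    if any(len(p) == 2 and frozenset(p) not in edge_set for _, p, _ in scheduled):
        return None
    # Executing all gates: each input gate is scheduled exactly once.
    if sorted(s for s, _, _ in scheduled if s is not None) != list(range(len(input_gates))):
        return None
    # A physical qubit can take part in at most one gate per cycle.
    slots = [(q, t) for _, p, t in scheduled for q in p]
    if len(slots) != len(set(slots)):
        return None
    # Respecting dependencies: replay in time order, tracking the mapping
    # through SWAPs and the program order of gates on every logical qubit.
    mapping = dict(initial_mapping)
    last = {}  # logical qubit -> index of the last input gate run on it
    for src, p, _ in sorted(scheduled, key=lambda item: item[2]):
        if src is None:  # inserted SWAP: exchange the occupants of p[0], p[1]
            if len(p) != 2:
                return None
            owner = {phys: log for log, phys in mapping.items()}
            for here, there in ((p[0], p[1]), (p[1], p[0])):
                if here in owner:
                    mapping[owner[here]] = there
            continue
        logical = input_gates[src]
        if tuple(mapping.get(q) for q in logical) != tuple(p):
            return None  # gate placed on the wrong physical qubits
        if any(last.get(q, -1) > src for q in logical):
            return None  # a later gate on the same qubit already ran
        for q in logical:
            last[q] = src
    return max((t for _, _, t in scheduled), default=0)

T_SHAPE = [(0, 1), (1, 2), (1, 3), (3, 4)]    # hypothetical device
gates = [(0, 2)]                              # one CNOT on logical qubits 0 and 2
start = {0: 0, 1: 1, 2: 2}
plan = [(None, (1, 2), 1), (0, (0, 1), 2)]    # SWAP first, then the CNOT
print(check_schedule(gates, T_SHAPE, start, plan))              # 2
print(check_schedule(gates, T_SHAPE, start, [(0, (0, 2), 1)]))  # None
```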

Objective

Minimizing the circuit depth, which is the maximal time coordinate over all scheduled gates. In this paper, we use depth as the default objective, but other objectives can be used as well, e.g., the number of additional gates or the fidelity of the scheduled circuit. The output and the constraints of the problem are independent of the objective. However, other objectives such as fidelity may require more input information.
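Because every valid schedule must respect the dependencies and inserted SWAP gates can only delay gates, the length of the longest dependency chain of the input circuit is a lower bound on the depth objective; it is attained whenever an initial mapping makes every gate executable without SWAPs. The as-soon-as-possible (ASAP) sketch below computes this bound for a circuit given as a list of logical-qubit tuples; the helper name and the example circuit are our own illustration.

```python
def asap_depth(gates):
    """Longest dependency chain of a circuit given as logical-qubit tuples:
    each gate is placed one cycle after the latest earlier gate sharing a qubit."""
    ready = {}  # logical qubit -> cycle of the last gate on it
    depth = 0
    for qubits in gates:
        t = 1 + max(ready.get(q, 0) for q in qubits)
        for q in qubits:
            ready[q] = t
        depth = max(depth, t)
    return depth

# A standard 15-gate Clifford+T Toffoli decomposition on logical qubits 0, 1, 2.
TOFFOLI = [(2,), (1, 2), (2,), (0, 2), (2,), (1, 2), (2,), (0, 2),
           (1,), (2,), (2,), (0, 1), (0,), (1,), (0, 1)]
print(asap_depth(TOFFOLI))  # 11
```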

4 Related Work

In the most general sense, the task of QC layout synthesis is to generate a quantum circuit that satisfies QC device constraints and fulfills the functionality of the input circuit. Related works on this problem include [9, 10, 11, 12, 13, 14, 15, 16, 17, 4, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]. These works consider different variations of the problem. [9, 11, 12, 13, 24, 14, 15, 32] focus on multidimensional array device graphs (linear arrays for 1D, grids for 2D, and so on). [28] focuses specifically on SU(4) circuits and includes post-synthesis optimization. [34] focuses on adjusting the mapping after synthesis to improve fidelity. [19] considers some commutation rules. [25, 26, 27] consider the scheduling of QAOA circuits, in which the order of some two-qubit gates can be exchanged despite dependencies because they commute; this does not apply to general quantum circuits.

The produced quantum circuit should be not only executable but also efficient, and efficiency can be measured with different metrics. The metric can be the additional “cost”, which is usually proportional to the number of additional gates [10, 11, 12, 13, 24, 14, 15, 17, 18, 22, 19, 4, 28, 21, 16, 20]; circuit depth, as in this paper, since qubits can only function well within the decoherence time [9, 25, 27, 26, 23]; circuit fidelity, since a common practice nowadays is to execute a circuit multiple times and analyze the statistics of the results [30, 34, 32, 33]; or a mix of the above [31, 29]. Detailed discussions of the complexity of QC layout synthesis can be found in Sec. 7; they indicate that large-scale instances are unlikely to be solvable both exactly and efficiently.

From the perspective of solution techniques, current works can be divided into two categories. The first group focuses on deriving exact solutions for moderate-sized instances with the help of solvers [23, 27, 25, 26, 24, 20, 16, 13, 32, 33].
[24, 16] use a PBO (pseudo-Boolean optimization) solver to decide the SWAP insertion but do not explicitly schedule the gates. The same goes for [13, 20], which use a MIP (mixed integer programming) solver and an SMT (satisfiability modulo theories) solver, respectively. [27, 25, 26] use a temporal planner to schedule QAOA circuits specifically. The previous works closest to this paper are [23, 32, 33]. However, [32, 33] use an SMT solver to maximize fidelity. [23] splits circuits into “levels” and inserts gates to transform the mapping between the levels; this model of quantum circuits may not yield an optimal solution. Under this imperfect “levels” model, [23] aims to derive a depth-optimal solution with integer linear programming.

The second group of related works uses heuristic search techniques [4, 10, 9, 29, 31, 11, 12, 14, 15, 17, 18, 22, 19, 28, 30, 21]. Below, we only discuss the works targeting general device layouts [4, 10, 29, 31, 17, 21, 18, 22, 19, 28, 30]. The general approach is to split the circuit into small sub-circuits for which layout synthesis can be done efficiently, and then to search for the mapping transformations between these sub-circuits. A sub-circuit can be a “level” or “layer” as mentioned in the last paragraph [4, 21, 18, 31, 22, 29, 30], a set of several levels [10], several levels but only for a few specific qubits [28], or individual gates [17, 19]. To find the mapping transformation, [17] inserts SWAP gates to move the two qubits required by the next two-qubit gate along a shortest path; [22, 19, 29] also consider the distances between the qubits of later two-qubit gates; [30] additionally considers fidelity in the qubit movements; [4, 28] use the sum of distances plus the number of SWAP gates as the cost function in A* search; [31] uses bidirectional search; [18] uses a 4-approximation algorithm; [21] exploits an existing approximate solution of token swapping; [10] recursively considers SWAP gates as cuts in the device graph.

The complexity of the problem also makes these solutions difficult to evaluate. Current benchmarks are usually quantum circuit libraries of realistic functions, e.g., RevLib [35], or certain random circuits thought to represent the worst case, e.g., SU(4) circuits [28]. So far, researchers can only compare their tools against each other, without knowing how far they are from the optimum. This paper aims to fill this gap. The QC layout synthesis problem is still quite new to the compiler and design automation communities, so the name of the problem varies.
It can be placement [10, 13], routing [22], compiling quantum circuits [19, 28, 32, 25, 26, 27, 33], quantum circuit transformation [18], mapping circuits to QC architectures [4, 16, 24, 20, 29, 23, 31], conversion [11] or optimization [12] of circuits in QC architecture, realization of quantum circuits [14, 15], or qubit allocation [17, 30, 34, 21].
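To make the shortest-path idea attributed to [17] above concrete, here is a deliberately naive greedy router: before each two-qubit gate whose qubits are not adjacent, it moves the first qubit along a breadth-first-search shortest path until it neighbours the second. This is our own simplified sketch for illustration, not the algorithm of [17] or of any tool evaluated in this paper; it ignores look-ahead, parallelism, and the choice of initial mapping, and all names are hypothetical.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """Breadth-first search on the device graph; returns [src, ..., dst]."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        raise ValueError("physical qubits are not connected")
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]

def greedy_route(gates, adj, mapping):
    """Naive routing sketch: before each two-qubit gate whose qubits are not
    adjacent, SWAP its first qubit along a shortest path toward the second.
    Returns the output sequence and the final logical->physical mapping."""
    mapping = dict(mapping)
    out = []
    for qubits in gates:
        if len(qubits) == 2:
            path = shortest_path(adj, mapping[qubits[0]], mapping[qubits[1]])
            # Swap along path[0] .. path[-2] so the first qubit ends next to path[-1].
            for p, r in zip(path[:-2], path[1:-1]):
                owner = {phys: log for log, phys in mapping.items()}
                for here, there in ((p, r), (r, p)):
                    if here in owner:
                        mapping[owner[here]] = there
                out.append(("swap", p, r))
        out.append(("gate", qubits))
    return out, mapping

# Hypothetical 5-qubit line 0-1-2-3-4 as an adjacency map.
LINE = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
ops, final = greedy_route([(0, 1)], LINE, {0: 0, 1: 4})
print(ops)    # [('swap', 0, 1), ('swap', 1, 2), ('swap', 2, 3), ('gate', (0, 1))]
print(final)  # {0: 3, 1: 4}
```

Each SWAP here is serialized; real routers also try to overlap SWAPs with other gates, which matters for the depth objective used in this paper.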

5 QUEKO Benchmarks

This work is inspired by PEKO [5], placement examples with known optimal. Placement is a crucial step in classical integrated circuit design, where modules are placed on a chip with the objective of minimizing the expected total wirelength. This problem is NP-hard in general, but generating benchmarks with known optimal solutions proves to be feasible. Similarly, for a generic input quantum circuit and a generic device graph, finding the scheduled circuit with optimal depth is NP-complete, as will be proved in Sec. 7. However, it is feasible to construct some benchmarks with known optimal solutions. Given a target device graph and a target depth, we can construct a depth-optimal circuit. Then, by relabeling the qubits, we derive a QUEKO benchmark.

Additionally, QUEKO has a fully customizable feature: the gate density vector. Its two components intuitively stand for the densities of single-qubit gates and two-qubit gates in the whole circuit; they are computed from a circuit's number of logical qubits, its numbers of single-qubit and two-qubit gates, and the length of its longest dependency chain, as can be done, for example, for the circuit in Fig. 4(b). Likewise, we can extract density vectors from other existing circuits with known functionalities, so that the QUEKO benchmarks imitate real-world circuits while having known optimal depths.

The construction of QUEKO, shown in Algorithm 1, starts by checking the validity of the input data, computing the numbers of single-qubit and two-qubit gates. If these numbers are too small, there would be too few gates to generate a circuit with the target depth; if the densities are too large, there would be too many gates for the given depth and device graph. We define the matching bound of a graph to be the minimal size of its maximal matchings. This means we can always find at least that many edges in the graph that pairwise share no vertices. If there are too many two-qubit gates relative to this bound, there could be too many two-qubit gates for the given depth and device graph. In short, if any of these three conditions holds, we return an error to reject the input data.
Otherwise, we proceed to three phases: backbone construction, sprinkling, and scrambling.
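The matching bound and the admissibility test can be prototyped on small graphs. The sketch below computes the matching bound by brute force exactly as defined above (the minimum size of a maximal matching). The admissibility check is only one possible reading of the three conditions, and it ASSUMES that the gate counts are derived from the density vector (d1, d2) as g1 = ⌈d1·n·D⌉ and g2 = ⌈d2·n·D/2⌉ for n physical qubits and target depth D, i.e., that densities count occupied qubit-cycle slots with a two-qubit gate occupying two slots. These formulas and all names in the code are our assumptions, not the paper's stated definitions.

```python
import math
from itertools import combinations

def matching_bound(edges):
    """Minimum size of a maximal matching (brute force, toy graphs only)."""
    for k in range(len(edges) + 1):
        for chosen in combinations(edges, k):
            touched = [v for e in chosen for v in e]
            if len(touched) != len(set(touched)):
                continue  # two chosen edges share a vertex: not a matching
            covered = set(touched)
            if all(u in covered or v in covered for u, v in edges):
                return k  # no edge can be added, so the matching is maximal

def admissible(n_qubits, edges, depth, d1, d2):
    """One possible reading of the input checks of Algorithm 1.

    ASSUMPTION (hypothetical formulas): g1 = ceil(d1*n*D) and
    g2 = ceil(d2*n*D/2). Returns (g1, g2), or None if the input is rejected."""
    g1 = math.ceil(d1 * n_qubits * depth)
    g2 = math.ceil(d2 * n_qubits * depth / 2)
    if g1 + g2 < depth:                      # too few gates for the backbone
        return None
    if d1 + d2 > 1:                          # more gates than qubit-cycle slots
        return None
    if g2 > depth * matching_bound(edges):   # disjoint edges might run out
        return None
    return g1, g2

T_SHAPE = [(0, 1), (1, 2), (1, 3), (3, 4)]  # hypothetical 5-qubit device
print(matching_bound(T_SHAPE))              # 1: edge (1, 3) alone is maximal
print(admissible(5, T_SHAPE, 4, 0.2, 0.4))  # (4, 4)
print(admissible(5, T_SHAPE, 4, 0.2, 0.6))  # None: 6 > 4 * 1
```

The T-shaped example shows why the bound is conservative: a maximum matching of this graph has two edges, (0, 1) and (3, 4), but the single edge (1, 3) is already maximal, so only one disjoint edge per cycle is guaranteed.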

Input: a device graph with and its matching bound , a depth target , and a gate density vector
Output: QUEKO benchmark , where and are the numbers of single-/two-qubit gates
1:  ,
2:  if  or or  then
3:     return  error: input data not admissible
4:  end if
5:  , // and are how many single-/two-qubit gates we have used // Backbone construction phase
6:  for  to  do
7:      // randomly decide the type of the gate
8:     if  and  then
9:        
10:        while  and  do
11:           
12:        end while
13:        ,
14:     else
15:        
16:        while  and  do
17:           
18:        end while
19:        ,
20:     end if
21:  end for// Sprinkling phase
22:  for  to  do
23:     
24:     if  and  then
25:        
26:        while  such that and  do
27:           
28:        end while
29:        
30:     else
31:        
32:        while  such that and  do
33:           
34:        end while
35:        
36:     end if
37:  end for
38:   according to , to // Scrambling phase
39:   a random mapping from to
40:  for  to  do
41:     if  then
42:        
43:     else
44:        
45:     end if
46:  end for
47:  return  
Algorithm 1 QUEKO construction
(a) Device graph
(b) Backbone construction phase
(c) Sprinkling phase
(d) Scrambling phase
(e) Output circuit 1D diagram
(f) Output QASM code
Figure 20: QUEKO construction visualization

5.1 Backbone Construction Phase

This phase “grows” a sequence of gates, each depending on the previous one, constituting a dependency chain whose length equals the target depth. This chain serves as the “backbone” of the circuit. For example, we start from the device graph in Fig. 20(a), which is just a rotated version of a device graph shown earlier, and pick three executable gates at consecutive cycles. They constitute a dependency chain of length 3, since all of them act on the same qubit. This is shown in Fig. 20(b), where gates at different cycles are put on different “slices” from left to right, and the “backbone” is colored green. More rigorously, we first choose a random node or edge of the device graph as the first gate. In every subsequent iteration, we randomly choose a gate that overlaps with the previous one. Thus, the two gates share at least one qubit, which, by the dependency constraint, forces the new gate to be executed after the previous one. On the other hand, since every backbone gate is executable, each one takes at most a single cycle, so the optimal schedule places consecutive backbone gates in consecutive cycles. The gate sequence therefore constitutes a dependency chain whose length equals the target depth. Because of this “backbone”, the final depth of the scheduled circuit cannot be lower than the target depth. Note that we do not need any SWAP gates for backbone construction.
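A minimal sketch of backbone construction, under our own simplifications: the gate type is drawn with a fixed probability p_two instead of being tied to the remaining gate budgets as in Algorithm 1, and a single-qubit gate is used whenever no overlapping edge exists. Each new gate is chosen to overlap the previous one, so consecutive gates always share a qubit and the chain is as long as the number of cycles. Function and parameter names are hypothetical.

```python
import random

def build_backbone(edges, n_qubits, depth, p_two, rng):
    """Sketch of backbone construction: one gate per cycle 1..depth, each
    overlapping the previous gate, so the gates form a dependency chain.

    Simplifying assumptions: the gate type is drawn with fixed probability
    p_two, and a single-qubit gate is used when no overlapping edge exists."""
    chain, prev = [], None
    for t in range(1, depth + 1):
        singles = [(q,) for q in range(n_qubits) if prev is None or q in prev]
        pairs = [e for e in edges if prev is None or set(e) & set(prev)]
        pool = pairs if (pairs and rng.random() < p_two) else singles
        gate = rng.choice(pool)
        chain.append((gate, t))  # (physical qubits, time coordinate)
        prev = gate
    return chain

T_SHAPE = [(0, 1), (1, 2), (1, 3), (3, 4)]  # hypothetical 5-qubit device
for gate, t in build_backbone(T_SHAPE, 5, 6, 0.5, random.Random(7)):
    print(t, gate)
```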

5.2 Sprinkling Phase

The backbone construction phase uses one gate per cycle, i.e., as many gates as the target depth; we then randomly “sprinkle” the remaining gates, as shown in Fig. 20(c). We randomly select spacetime coordinates that do not overlap with any existing gates and whose time coordinate does not exceed the target depth. After sprinkling, a circuit with the required number of gates is created. All its gates are executable, its depth equals the target depth, and its gate density vector approximates the input vector. (There may be minor rounding errors due to the ceiling function.) It is worth noting that although only one longest dependency chain is explicitly generated in the backbone construction phase, the sprinkling phase may implicitly generate more: a gate sprinkled on suitable qubits and cycles can complete another dependency chain of the same maximal length in the output circuit. The higher the gate densities, the more likely it is that such implicit longest dependency chains are generated.
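The sprinkling step can be sketched as repeatedly drawing a random free spacetime slot. In this illustration (names and the order of placement are our choices), two-qubit gates are placed first because they need two free qubits in the same cycle; every gate stays within the target depth, so the depth of the schedule does not grow and all two-qubit gates remain on device edges.

```python
import random

def sprinkle(schedule, edges, n_qubits, depth, extra1, extra2, rng):
    """Add extra2 two-qubit and extra1 single-qubit gates at random free
    spacetime coordinates with time <= depth. `schedule` holds
    (physical qubits, time) pairs; the result never exceeds `depth`."""
    busy = {(q, t) for gate, t in schedule for q in gate}
    out = list(schedule)
    singles = [(q,) for q in range(n_qubits)]
    for candidates, count in ((edges, extra2), (singles, extra1)):
        for _ in range(count):
            free = [(g, t) for t in range(1, depth + 1) for g in candidates
                    if all((q, t) not in busy for q in g)]
            if not free:
                raise ValueError("no free spacetime coordinate left")
            gate, t = rng.choice(free)
            busy.update((q, t) for q in gate)
            out.append((gate, t))
    return out

T_SHAPE = [(0, 1), (1, 2), (1, 3), (3, 4)]      # hypothetical 5-qubit device
backbone = [((1,), 1), ((1, 3), 2), ((3,), 3)]   # a depth-3 backbone on it
circuit = sprinkle(backbone, T_SHAPE, 5, 3, extra1=3, extra2=1, rng=random.Random(0))
print(len(circuit), max(t for _, t in circuit))  # 7 3
```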

5.3 Scrambling Phase

As shown in Fig. 20(d), we sort all the gates by their time coordinates and apply a random mapping from physical qubits to indices, first to the gates at the first cycle, then to those at the second cycle, and so on. The result is a QUEKO benchmark, as shown in Fig. 20(e) and Fig. 20(f). A QC layout synthesis tool has to reverse the scrambling to find the depth-optimal solution, which is nontrivial.
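Scrambling can be sketched as sorting the gates by time coordinate, discarding the times, and relabeling the physical qubits through a random permutation. In this illustration (names are ours), the permutation phi is returned only so that its inverse can be used to check that it recovers a placement on which every two-qubit gate sits on a device edge; a benchmark would keep phi hidden.

```python
import random

def scramble(schedule, n_qubits, rng):
    """Sort gates by time coordinate, drop the times, and relabel physical
    qubits by a random permutation phi (phi[p] = index given to physical p).
    Returns the benchmark gate list and phi; a real benchmark hides phi."""
    phi = list(range(n_qubits))
    rng.shuffle(phi)
    ordered = sorted(schedule, key=lambda item: item[1])  # stable sort by time
    return [tuple(phi[p] for p in gate) for gate, _ in ordered], phi

schedule = [((1,), 1), ((1, 3), 2), ((3,), 3), ((3, 4), 1), ((0,), 2)]
gates, phi = scramble(schedule, 5, random.Random(1))
print(gates)  # the same gates, relabelled and listed in time order
```

Inverting phi gives back a placement under which the circuit runs in its original depth without any SWAP gates, which is exactly what a layout synthesis tool has to rediscover.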

6 Experiment

6.1 Experimental Setup

To evaluate QC layout synthesis tools with QUEKO, device graphs, depths, and gate density vectors are required. In this subsection, we specify our choices of these parameters and of the tools to evaluate. All the experiments were run on an Ubuntu 16.04 server with two Intel Xeon E5-2699 v3 CPUs and 128 GB of main memory. The QUEKO benchmarks are open source under the BSD license.

6.1.1 Device Graph

We used representative devices from three different QC hardware providers: Sycamore from Google [1], Tokyo and Rochester from IBM (https://www.ibm.com/quantum-computing/technology/systems), and Aspen-4 from Rigetti (https://www.rigetti.com/qpu). The graphs of these devices are shown in Fig. 10. Sycamore has 54 qubits, of which 53 are active; Rochester also has 53 qubits. Both are state-of-the-art devices, but Sycamore has richer connectivity. Aspen-4 has 16 qubits, and Tokyo has 20 qubits. Both are highly competitive devices, but Tokyo has greater connectivity. We have only listed superconducting devices because they are by far the most advanced QC devices. This does not mean that QUEKO cannot generalize to other technologies such as quantum dots (https://sqc.com.au): our approach is valid as long as the basic quantum gates of the technology are single-qubit and two-qubit gates.

6.1.2 Depth

We constructed two sets of benchmarks with different depth ranges. The first set, the near-term feasible benchmarks (BNTF), has depths from 5 to 45. In fact, one of the largest quantum circuits executed today has depth 41 [1], which is about the same as the upper bound of BNTF. We intended to find out the layout synthesis performance within the current execution capacity. The second set, the benchmarks for scaling study (BSS), has depths from 100 to 900. BSS indicates how these tools will perform as the decoherence time of QC devices improves in the future.

6.1.3 Gate Density Vector

We picked two special gate density vectors in the experiment: one based on the quantum circuits used in Google’s quantum supremacy experiment [1], denoted “QSE” below, and one based on the Toffoli circuit, denoted “TFL” below. It is beneficial to study QSE, since it is the only circuit so far with which experimental QC has shown a clear advantage. We chose TFL because existing QC logic synthesis algorithms are based largely on reversible logic synthesis, which uses the Toffoli gate as a fundamental element [7]. We also swept through possible gate density vectors and generated benchmarks for the impact of gate density (BIGD).

6.1.4 Layout Synthesis Tools

Currently, Google, IBM, and Rigetti are considered front-runners in superconducting QC. Their QC frameworks (Cirq, Qiskit, and pyQuil) include tools for layout synthesis. Unfortunately, we were unable to break down the pyQuil compilation into layout synthesis and optimization, so pyQuil was excluded from the experiments. We also included a recent academic work from Zulehner et al. [4].

The routing module in Cirq is designed to solve the layout synthesis problem. So far, only one router, named greedy, has been released in the development version. We used the greedy router in Cirq version 0.7.0 as one of the layout synthesis tools.

Qiskit offers the most precise control over its so-called “transpiler”. The transpilation is divided into individual passes, and users can define their own “pass manager” to make use of the various transpiling modules offered. For the layout synthesis problem, there are Layout modules that generate an initial mapping and Swap modules that insert SWAP gates into the circuit to enable two-qubit gates. Among the various combinations, we chose DenseLayout and StochasticSwap, which seemed to have the best overall performance. The version of Qiskit in the experiments is 0.13.0.

Another highly competitive “router”, t|ket⟩, comes from Cambridge Quantum Computing. It is not open source but provides a convenient interface to Cirq, Qiskit, and pyQuil. We used version 0.3.0 in the experiments.

6.2 Experimental Results

6.2.1 Performance on BNTF

(a) Smaller device (Aspen-4), sparser circuits (TFL)
(b) Larger device (Sycamore), denser circuits (QSE)
Figure 23: Performance of QC layout synthesis tools on BNTF (Lines are average.)

In Fig. 23, the horizontal axis is the optimal depth and the vertical axis is the depth ratio, i.e., the depth of the layout synthesis result divided by the optimal depth. In the case of a smaller device (Aspen-4) and sparser circuits (TFL), the optimality gap is about 10x for Zulehner et al. [4], 5x for Cirq, 2x for t|ket⟩, and 2x for Qiskit. In the case of a larger device (Sycamore) and denser circuits (QSE), the optimality gap is about 4x to 5x for Qiskit, while the optimality gaps of Cirq and t|ket⟩ grow with depth, from 15x to 25x and from 3x to 7x, respectively. Zulehner et al. [4] is not included in Fig. 23(b) because, for the larger device, it used so much memory that the operating system consistently killed it before it finished. This also happened sometimes in the smaller-device experiments, so there are fewer blue data points than other types of points in Fig. 23(a).

6.2.2 Performance on BSS

(a) t|ket⟩ scaling behavior
(b) Qiskit scaling behavior
Figure 26: t|ket⟩ and Qiskit performance on BSS (Lines are averages.)

We further studied the scaling of t|ket⟩ and Qiskit on different devices, as shown in Fig. 26. Surprisingly, in general, as the depth increases, the depth ratio first decreases and then converges. A possible reason for this phenomenon is that as the circuit deepens, the influence of initial placement becomes smaller than the influence of SWAP insertion. For t|ket⟩, the optimality gap converges to about 6x for Rochester, 5x for Sycamore, 3x for Tokyo, and 2x for Aspen-4; the corresponding values for Qiskit are about 4x, 3x, 2x, and 2x. Larger devices (Rochester and Sycamore) thus bring about larger optimality gaps. When the numbers of physical qubits are similar, richer connectivity (Sycamore versus Rochester) brings about smaller optimality gaps.

6.2.3 Performance on BIGD

(a) t|ket⟩ performance in depth ratio
(b) Qiskit performance in depth ratio
Figure 29: t|ket⟩ and Qiskit performance on BIGD (Data are averages over 10 runs.)

To better understand the impact of gate density on layout synthesis performance, we fixed the device to Tokyo and the depth to 45, and swept through possible gate densities. The results are shown in Fig. 29. Within a column, the single-qubit gate density increases from top to bottom. Qiskit seems rather insensitive to this change, which is sensible since single-qubit gates do not cause difficulty in layout synthesis. However, t|ket⟩ is still sensitive to this change. Both tools are more sensitive to changes in the horizontal direction than in the vertical direction. Since the challenge to layout synthesis comes mainly, if not solely, from two-qubit gates, this result is expected.

7 Complexity

Given the large optimality gaps, it is natural to investigate the computational complexity of the depth-optimal QC layout synthesis problem, which has remained unknown until now. Several related results are known. For example, determining the minimal number of SWAP gates to insert is NP-complete: [17] proves this by reduction from the subgraph isomorphism problem, and [10] by reduction from the Hamiltonian cycle problem. The NP-completeness of depth-optimal QC layout synthesis for QAOA circuits is proven in [36] by reduction from 3-SAT. In this section, we prove it for general quantum circuits by reduction from the Hamiltonian cycle problem, as stated in Theorem 1.

Theorem 1.

Depth-optimal QC layout synthesis is NP-complete.

Proof.

The original depth-optimal QC layout synthesis problem is no easier than its decision version, which has the same input and constraints but outputs whether the depth of the scheduled circuit can be at most a given target $T$. Inspired by [10], we show that the problem of determining whether a graph contains a Hamiltonian cycle is reducible to the depth-decision QC layout synthesis problem. The former is NP-complete, so the latter is also NP-complete. Suppose the graph for the Hamiltonian cycle problem is $G=(V,E)$, where $|V|=N$. We construct a depth-decision QC layout synthesis problem using $G$ as the device graph and $T=N$ as the target depth. The input circuit on logical qubits $q_1,\dots,q_N$ consists of $N$ "levels" of gates. Level $i$ contains a two-qubit gate $g(q_i, q_{i+1})$, where $q_{N+1} \equiv q_1$. All the other logical qubits at level $i$ are occupied by single-qubit gates, so that the input circuit is "full". If $G$ has a Hamiltonian cycle $p_{k_1}, p_{k_2}, \dots, p_{k_N}, p_{k_1}$, let the initial mapping be $\pi(q_i) = p_{k_i}$ for $i=1$ to $N$. With this mapping, each gate $g(q_i, q_{i+1})$ acts on the edge $(p_{k_i}, p_{k_{i+1}})$ of $G$ (with $k_{N+1} \equiv k_1$), so all the gates in the constructed circuit can be executed without any SWAP gates, within depth $N$. Conversely, suppose there exists a scheduled circuit with depth at most $T$. We first claim that the mapping cannot change during the execution of the circuit. Every gate in a level depends on some gate in the previous level, so every gate in level $i$ has a dependency chain of length $i$, and cycle $i$ is the earliest cycle in which it can be scheduled. Because every qubit is busy in every level, inserting any SWAP gate during gate scheduling must lengthen some dependency chain, making the depth of the scheduled circuit larger than $T$. Therefore, if a solution within $T$ cycles exists, each gate in level $i$ must be scheduled at exactly cycle $i$, and no SWAP gates can be inserted. It is also easy to see that the input gates cannot constitute any SWAP gates. Consequently, the mapping $\pi$ from logical to physical qubits remains unchanged throughout all the cycles of the scheduled circuit. Since all the gates $g(q_i, q_{i+1})$ are executable, they are mapped to edges of $G$. This means $\pi(q_1), \pi(q_2), \dots, \pi(q_N), \pi(q_1)$ is a Hamiltonian cycle in $G$.
In conclusion, we have established that $G$ has a Hamiltonian cycle if and only if the constructed QC layout synthesis problem has a solution within depth $T=N$. The construction has $N$ levels of at most $N$ gates each, so it takes polynomial time. Thus, the Hamiltonian cycle problem is reducible to the QC layout synthesis problem, and since the former is known to be NP-complete, so is the latter. ∎
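A small brute-force sketch can sanity-check the reduction on toy graphs. The helper names (`build_instance`, `has_swap_free_solution`, and so on) are hypothetical, qubits are 0-indexed, and the check relies on the proof's argument that any depth-$N$ solution is SWAP-free, so it searches only over fixed initial mappings. Both searches are exponential in $N$ and are meant only for graphs with a handful of vertices.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Gate = Tuple[int, ...]
Edges = Set[FrozenSet[int]]


def make_edges(pairs: Iterable[Tuple[int, int]]) -> Edges:
    """Undirected device graph stored as a set of 2-element frozensets."""
    return {frozenset(p) for p in pairs}


def build_instance(n: int) -> List[List[Gate]]:
    """The constructed circuit, one list of gates per level (n >= 3).

    Level i holds the two-qubit gate on (q_i, q_{(i+1) mod n}); every other
    qubit gets a single-qubit gate, so each level keeps all n qubits busy.
    """
    levels: List[List[Gate]] = []
    for i in range(n):
        a, b = i, (i + 1) % n
        level: List[Gate] = [(a, b)]
        level += [(q,) for q in range(n) if q not in (a, b)]
        levels.append(level)
    return levels


def mapping_is_executable(levels: List[List[Gate]], edges: Edges,
                          mapping: Dict[int, int]) -> bool:
    """True if, under a fixed logical->physical mapping (no SWAPs),
    every two-qubit gate acts on an edge of the device graph."""
    for level in levels:
        for gate in level:
            if len(gate) == 2:
                a, b = gate
                if frozenset((mapping[a], mapping[b])) not in edges:
                    return False
    return True


def has_swap_free_solution(n: int, edges: Edges) -> bool:
    """Try every initial mapping q_i -> perm[i] on the constructed circuit."""
    levels = build_instance(n)
    return any(mapping_is_executable(levels, edges, dict(enumerate(perm)))
               for perm in permutations(range(n)))


def has_hamiltonian_cycle(n: int, edges: Edges) -> bool:
    """Brute-force Hamiltonian cycle test (vertex 0 fixed to skip rotations)."""
    for rest in permutations(range(1, n)):
        cycle = (0,) + rest
        if all(frozenset((cycle[i], cycle[(i + 1) % n])) in edges
               for i in range(n)):
            return True
    return False


if __name__ == "__main__":
    square = make_edges([(0, 1), (1, 2), (2, 3), (3, 0)])
    star = make_edges([(0, 1), (0, 2), (0, 3)])
    for graph in (square, star):
        # square: True True; star: False False
        print(has_hamiltonian_cycle(4, graph), has_swap_free_solution(4, graph))
```

The two predicates agree on every graph because a mapping $q_i \mapsto$ `perm[i]` makes all level gates executable exactly when consecutive entries of `perm`, including the wrap-around pair, are adjacent, which is the definition of a Hamiltonian cycle.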

8 Conclusion and Future Work

In this paper, we formulated the quantum computing layout synthesis problem and proved its NP-completeness. We constructed QUEKO benchmarks, each with a known optimal depth for the given device. With QUEKO, we examined four existing quantum computing layout synthesis tools (the greedy router in Cirq, t|ket⟩, DenseLayout plus StochasticSwap in Qiskit, and Zulehner et al. [4]) and obtained rather surprising results. Despite over ten years of research and development by both academia and industry, the current QC compilation flow is far from optimal. In fact, even taking the best performance among all evaluated tools, the optimality gaps range from 2x to 25x for circuits of feasible depth on existing devices. These gaps reveal substantial room for research into QC layout synthesis, potentially equivalent to an order-of-magnitude improvement in decoherence time, which would otherwise require much larger investment in quantum device technologies. We plan to use QUEKO benchmarks as a guide toward better layout synthesis tools. In addition, we plan to extend our research to construct examples with known optimal solutions for fidelity optimization, as multiple studies have shown that fidelity is a very important metric for quantum circuits in the NISQ era.

Acknowledgement

The authors would like to thank Iris Cong and Nengkun Yu for valuable comments on the manuscript. This work is partially supported by NEC under the Center for Domain-Specific Computing Industrial Partnership Program.

References

  • [1] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell et al., “Quantum supremacy using a programmable superconducting processor,” Nature, vol. 574, no. 7779, pp. 505–510, 2019.
  • [2] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information.   Cambridge, UK: Cambridge University Press, 2010.
  • [3] J. Gambetta and S. Sheldon, “Cramming more power into a quantum device,” https://www.ibm.com/blogs/research/2019/03/power-quantum-device/, Mar. 2019.
  • [4] A. Zulehner, A. Paler, and R. Wille, “Efficient mapping of quantum circuits to the IBM QX architectures,” in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), March 2018, pp. 1135–1138.
  • [5] Chin-Chih Chang, J. Cong, M. Romesis, and Min Xie, “Optimality and scalability study of existing placement algorithms,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 4, pp. 537–549, April 2004.
  • [6] “‘Huge opportunity’ in IC design optimization gained by Semiconductor Research Corporation, National Science Foundation: CAD innovation could save industry billions,” https://www.src.org/newsroom/press-release/2007/41/, Dec. 2007.
  • [7] V. V. Shende, A. K. Prasad, I. L. Markov, and J. P. Hayes, “Synthesis of reversible logic circuits,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 22, no. 6, pp. 710–722, June 2003.
  • [8] A. W. Cross, L. S. Bishop, J. A. Smolin, and J. M. Gambetta, “Open quantum assembly language,” 2017.
  • [9] M. Whitney, N. Isailovic, Y. Patel, and J. Kubiatowicz, “Automated generation of layout and control for quantum circuits,” in Proceedings of the 4th International Conference on Computing Frontiers, ser. CF ’07.   New York, NY, USA: Association for Computing Machinery, 2007, pp. 83–94. [Online]. Available: https://doi.org/10.1145/1242531.1242546
  • [10] D. Maslov, S. M. Falconer, and M. Mosca, “Quantum circuit placement,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 4, pp. 752–763, April 2008.
  • [11] Y. Hirata, M. Nakanishi, S. Yamashita, and Y. Nakashima, “An efficient conversion of quantum circuits to a linear nearest neighbor architecture,” Quantum Information & Computation, vol. 11, no. 1&2, pp. 142–166, 2011.
  • [12] A. Shafaei, M. Saeedi, and M. Pedram, “Optimization of quantum circuits for interaction distance in linear nearest neighbor architectures,” in Proceedings of the 50th Annual Design Automation Conference, ser. DAC ’13.   New York, NY, USA: Association for Computing Machinery, 2013. [Online]. Available: https://doi.org/10.1145/2463209.2488785
  • [13] A. Shafaei, M. Saeedi, and M. Pedram, “Qubit placement to minimize communication overhead in 2D quantum architectures,” in 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 2014, pp. 495–500.
  • [14] A. Kole, K. Datta, and I. Sengupta, “A heuristic for linear nearest neighbor realization of quantum circuits by SWAP gate insertion using N-gate lookahead,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 6, no. 1, pp. 62–72, March 2016.
  • [15] ——, “A new heuristic for N-dimensional nearest neighbor realization of a quantum circuit,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 1, pp. 182–192, Jan 2018.
  • [16] R. Wille, A. Lye, and R. Drechsler, “Optimal SWAP gate insertion for nearest neighbor quantum circuits,” in 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 2014, pp. 489–494.
  • [17] M. Y. Siraichi, V. F. D. Santos, S. Collange, and F. M. Q. Pereira, “Qubit allocation,” in Proceedings of the 2018 International Symposium on Code Generation and Optimization, ser. CGO 2018.   New York, NY, USA: Association for Computing Machinery, 2018, pp. 113–125. [Online]. Available: https://doi.org/10.1145/3168822
  • [18] A. M. Childs, E. Schoute, and C. M. Unsal, “Circuit transformations for quantum architectures,” in 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019).   Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.
  • [19] T. Itoko, R. Raymond, T. Imamichi, A. Matsuo, and A. W. Cross, “Quantum circuit compilers using gate commutation rules,” in Proceedings of the 24th Asia and South Pacific Design Automation Conference, ser. ASPDAC ’19.   New York, NY, USA: Association for Computing Machinery, 2019, pp. 191–196. [Online]. Available: https://doi.org/10.1145/3287624.3287701
  • [20] R. Wille, L. Burgholzer, and A. Zulehner, “Mapping quantum circuits to IBM QX architectures using the minimal number of SWAP and H operations,” in Proceedings of the 56th Annual Design Automation Conference 2019, ser. DAC ’19.   New York, NY, USA: Association for Computing Machinery, 2019. [Online]. Available: https://doi.org/10.1145/3316781.3317859
  • [21] M. Y. Siraichi, V. F. D. Santos, S. Collange, and F. M. Q. Pereira, “Qubit allocation as a combination of subgraph isomorphism and token swapping,” Proc. ACM Program. Lang., vol. 3, no. OOPSLA, Oct. 2019. [Online]. Available: https://doi.org/10.1145/3360546
  • [22] A. Cowtan, S. Dilkes, R. Duncan, A. Krajenbrink, W. Simmons, and S. Sivarajah, “On the qubit routing problem,” in 14th Conference on the Theory of Quantum Computation, Communication and Cryptography, 2019.
  • [23] D. Bhattacharjee, A. A. Saki, M. Alam, A. Chattopadhyay, and S. Ghosh, “MUQUT: Multi-constraint quantum circuit mapping on NISQ computers: Invited paper,” in 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Nov 2019, pp. 1–7.
  • [24] A. Lye, R. Wille, and R. Drechsler, “Determining the minimal number of SWAP gates for multi-dimensional nearest neighbor quantum circuits,” in The 20th Asia and South Pacific Design Automation Conference, Jan 2015, pp. 178–183.
  • [25] D. Venturelli, M. Do, E. Rieffel, and J. Frank, “Temporal planning for compilation of quantum approximate optimization circuits,” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, 2017, pp. 4440–4446. [Online]. Available: https://doi.org/10.24963/ijcai.2017/620
  • [26] ——, “Compiling quantum circuits to realistic hardware architectures using temporal planners,” Quantum Science and Technology, vol. 3, no. 2, p. 025004, Feb. 2018. [Online]. Available: https://doi.org/10.1088%2F2058-9565%2Faaa331
  • [27] K. E. Booth, M. Do, J. C. Beck, E. Rieffel, D. Venturelli, and J. Frank, “Comparing and integrating constraint programming and temporal planning for quantum circuit compilation,” in Twenty-Eighth International Conference on Automated Planning and Scheduling, 2018.
  • [28] A. Zulehner and R. Wille, “Compiling SU(4) quantum circuits to IBM QX architectures,” in Proceedings of the 24th Asia and South Pacific Design Automation Conference, ser. ASPDAC ’19.   New York, NY, USA: Association for Computing Machinery, 2019, pp. 185–190. [Online]. Available: https://doi.org/10.1145/3287624.3287704
  • [29] A. Kole, S. Hillmich, K. Datta, R. Wille, and I. Sengupta, “Improved mapping of quantum circuits to IBM QX architectures,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 1–1, 2019.
  • [30] S. S. Tannu and M. K. Qureshi, “Not all qubits are created equal: A case for variability-aware policies for NISQ-era quantum computers,” in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’19.   New York, NY, USA: Association for Computing Machinery, 2019, pp. 987–999. [Online]. Available: https://doi.org/10.1145/3297858.3304007
  • [31] G. Li, Y. Ding, and Y. Xie, “Tackling the qubit mapping problem for NISQ-era quantum devices,” in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’19.   New York, NY, USA: Association for Computing Machinery, 2019, pp. 1001–1014. [Online]. Available: https://doi.org/10.1145/3297858.3304023
  • [32] P. Murali, J. M. Baker, A. Javadi-Abhari, F. T. Chong, and M. Martonosi, “Noise-adaptive compiler mappings for noisy intermediate-scale quantum computers,” in Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’19.   New York, NY, USA: Association for Computing Machinery, 2019, pp. 1015–1029. [Online]. Available: https://doi.org/10.1145/3297858.3304075
  • [33] P. Murali, N. M. Linke, M. Martonosi, A. J. Abhari, N. H. Nguyen, and C. H. Alderete, “Full-stack, real-system quantum computer studies: Architectural comparisons and design insights,” in Proceedings of the 46th International Symposium on Computer Architecture, ser. ISCA ’19.   New York, NY, USA: Association for Computing Machinery, 2019, pp. 527–540. [Online]. Available: https://doi.org/10.1145/3307650.3322273
  • [34] A. Ash-Saki, M. Alam, and S. Ghosh, “QURE: Qubit re-allocation in noisy intermediate-scale quantum computers,” in Proceedings of the 56th Annual Design Automation Conference 2019, ser. DAC ’19.   New York, NY, USA: Association for Computing Machinery, 2019. [Online]. Available: https://doi.org/10.1145/3316781.3317888
  • [35] R. Wille, D. Große, L. Teuber, G. W. Dueck, and R. Drechsler, “RevLib: An online resource for reversible functions and reversible circuits,” in 38th International Symposium on Multiple Valued Logic (ISMVL 2008), May 2008, pp. 220–225.
  • [36] A. Botea, A. Kishimoto, and R. Marinescu, “On the complexity of quantum circuit compilation,” in Eleventh Annual Symposium on Combinatorial Search, 2018.