Efficient ancilla-free reversible and quantum circuits for the Hidden Weighted Bit function

by Sergey Bravyi, et al.

The Hidden Weighted Bit function plays an important role in the study of classical models of computation. A common belief is that this function is exponentially hard for the implementation by reversible ancilla-free circuits, even though introducing a small number of ancillae allows a very efficient implementation. In this paper, we refute the exponential hardness conjecture by developing a polynomial-size reversible ancilla-free circuit computing the Hidden Weighted Bit function. Our circuit has size O(n^6.42), where n is the number of input bits. We also show that the Hidden Weighted Bit function can be computed by a quantum ancilla-free circuit of size O(n^2). The technical tools employed come from a combination of Theoretical Computer Science (Barrington's theorem) and Physics (simulation of fermionic Hamiltonians) techniques.




1 Introduction

The origins of the Hidden Weighted Bit function go back to the study of models of classical computation. This function, denoted hwb, takes as input an n-bit string x and outputs the k-th bit of x, where k is the Hamming weight of x; if the input weight is 0, the output is 0. It is best known for combining the ease of algorithmic description and implementation by classical Boolean circuits with the hardness of representation by Ordered Binary Decision Diagrams (OBDDs) [1], a popular tool in VLSI [2]. The gap between logarithmic-depth implementations of hwb by circuits (recall that hwb ∈ NC^1 but hwb ∉ AC^0) and an exponential lower bound for the size of the OBDD [3] is a startling two exponents. Relaxing the constraints on the type of Binary Decision Diagram considered or restricting the computations by circuits enables a multitude of implementations with polynomial cost [4].

The Hidden Weighted Bit function was first introduced in the context of reversible and quantum computations about 15 years ago by I. L. Markov and K. N. Patel (unpublished), and the earliest explicit mention dates to the year 2005 [5]. The original specification is irreversible, and required a slight modification to comply with the restrictions of reversible and quantum computations. Specifically, the Hidden Weighted Bit function was redefined to become the cyclic shift to the right by the input weight. We denote this reversible specification as HWB. Formally, HWB(x) is defined as the cyclic shift of its input x to the right by k positions, where k is the Hamming weight of x. The following shows the truth table of the 3-input HWB:

input:  000 100 010 110 001 101 011 111
output: 000 010 001 101 100 011 110 111
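The truth table above can be reproduced with a few lines of Python. This is a minimal sketch, not taken from the paper: the function name `hwb` and the convention of writing the input as a left-to-right string of bits are illustrative assumptions.

```python
def hwb(bits: str) -> str:
    """Reversible HWB: cyclic shift of `bits` to the right by its Hamming weight."""
    n = len(bits)
    k = bits.count("1") % n if n else 0
    # Moving the last k characters to the front is a right shift by k positions.
    return bits[n - k:] + bits[:n - k]


# Inputs listed in the same order as the truth table in the text.
inputs = ["000", "100", "010", "110", "001", "101", "011", "111"]
outputs = [hwb(x) for x in inputs]

# The map is a permutation of the 3-bit strings, hence reversible.
is_permutation = sorted(outputs) == sorted(inputs)
```

Because a cyclic shift preserves the Hamming weight, the inverse simply shifts to the left by the same weight.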

Since its introduction, HWB has been used as a test case by numerous authors working on the synthesis and optimization of reversible and quantum circuits.

Despite a stream of improvements in the respective circuit sizes by various research groups [6, 7, 8, 9], the best known ancilla-free reversible circuits exhibit exponential scaling in the number of gates. The synthesis algorithms benefiting from the inclusion of additional gates, such as multiple-control multiple-target Toffoli, Fredkin, and Peres gates [5, 8, 10], also failed to find an efficient implementation without ancillae. In 2013, this culminated in HWB receiving the designation of a “hard” benchmark function [11]. A recent asymptotically optimal synthesis algorithm over the library with NOT, CNOT, and Toffoli gates [12], introduced in 2015, was also unable to find an efficient ancilla-free implementation. An ancilla-free quantum circuit can be obtained by employing an asymptotically optimal quantum circuit synthesis algorithm such as [13], but the quantum gate count appears to remain exponential and larger than what can be obtained by applying the asymptotically optimal reversible logic synthesis algorithm [12].

The introduction of even a small number of ancillae changes the picture dramatically. Just ancillary (qu)bits suffice to develop a reversible circuit with gates [14]. Barrington’s theorem [15] allows one to obtain a polynomial-size reversible circuit using three ancillae. This polynomial-size three-ancilla reversible circuit can be obtained by computing the individual bits of the input weight through Barrington’s theorem, and using such bits logarithmically many times to control-SWAP the respective input (qu)bits into their desired positions. Finally, the existence of a polynomial-size quantum circuit using a single ancilla follows from [16].

The state of the art, in both the classical reversible and quantum settings, thus points to an exponential difference in the gate count between circuits with no ancillae and circuits with a constant number of ancillae. In this paper, we demonstrate efficient implementations of the HWB function by ancilla-free reversible and quantum circuits, thereby reducing these exponential differences to polynomial. Specifically, our reversible ancilla-free circuit requires O(n^6.42) gates and our quantum ancilla-free circuit requires O(n^2) gates. These results refute the exponential hardness belief and remove HWB from the class of hard benchmarks.

We next sketch the main ideas behind our ancilla-free circuits. We begin with the reversible circuit. Our construction works as follows. First, we show that the n-bit HWB function can be decomposed into a product of gates denoted , where is a symmetric Boolean function and is a subset with input bits. The gate cyclically shifts the -bit register if , and does nothing when . To implement , we first break it down into a product of gates of the form , where , each is a fixed set of Boolean -tuples, and are symmetric Boolean functions. The gate restricts the operation of the corresponding gate onto the set and simultaneously separates the set of bits being cycle-shifted from the set controlling these shifts. This allows us to employ Barrington’s theorem [15] to implement the gates in an ancilla-free fashion by expressing them as polynomial-size branching programs with the input and computing into . Each instruction in such a program realizes a permutation of -bit strings controlled by a single bit, and it can thus be mapped into a reversible circuit over wires.

Next we introduce our quantum ancilla-free circuit. Let be the n-qubit unitary operator implementing the HWB function. By definition, , where is the cyclic shift of n qubits. Suppose we can find an n-qubit Hamiltonian such that and commutes with the Hamming weight operator . Then . Thus it suffices to construct a quantum circuit simulating the time evolution under the Hamiltonian . Since the cyclic shift is analogous to the translation operator for a particle moving on a circle, the Hamiltonian generating the cyclic shift is analogous to the particle’s momentum operator. This observation suggests that can be diagonalized by a suitable Fourier transform. We formalize this intuition using the language of fermions and the fermionic Fourier transform, which is routinely used in Physics and quantum simulation algorithms [17, 18]. The desired Hamiltonian such that is shown to have the form , where is a (modified) fermionic Fourier transform and is a simple diagonal Hamiltonian. We also show that commutes with the Hamming weight operator , so that . We demonstrate that each layer in this decomposition of can be implemented by a quantum circuit of size O(n^2).

The rest of the paper is organized as follows. Section 2 introduces a simple modification of the known -gate -ancilla reversible circuit that requires gates and ancillary bits. Section 3 describes an O(n^6.42)-gate ancilla-free reversible circuit. Section 4 reports an ancilla-free O(n^2)-gate quantum circuit. These sections are independent of each other and can be read in any order. Appendices A and B prove technical lemmas stated in Section 4.

2 Reversible circuit of size using ancillas

We start with the description of a modification of the previously reported classical/reversible circuit that implements with gates and ancillae [14]. Compared to [14], our circuit features favorable asymptotics. However, it uses twice the computational/ancillary space.

Similarly to [14], we break down the computation into three stages:

  1. Compute the input weight.

  2. Apply controlled-SWAP gates to SWAP the inputs into their correct positions as specified by the input weight.

  3. Restore the value of the ancillary register to 0 by appending the inverse of stage 1.

Note that stage 3 is omitted in [14], allowing a direct comparison to our circuit illustrated in Fig. 1. The difference between our construction and [14] is how we compute the input weight. Specifically, we use the same “plus-one” approach to calculate the weight into the ancillary register; however, we implement the integer increment function differently. Given input , , the register , where the input weight is being computed into, and temporary storage , “increment by one” works as follows. If , apply . For :

  1. if : apply Toffoli gate to ; for from to apply the Toffoli gate ;

  2. if or apply ;
    else apply .

  3. if : for from down to apply the half adder, computed by the circuit . Apply .

In our implementation, the register is used to store necessary digit shifts. The advertised asymptotics follow by inspection of the above construction. We illustrate our circuit in Fig. 1 for .

Figure 1: 10-stage reversible circuit applying the -bit to . Each of the first gate stages increments by one depending on the value of the input variable; the next Fredkin gate stages perform controlled-SWAPs. Vertical red lines separate these stages. Not shown is the garbage uncomputation, which can be performed by appending the inverse of the weight calculation circuit ( gate part).

3 Ancilla-free reversible circuit of size O(n^6.42)

In this section we show how to construct an ancilla-free classical reversible circuit of size O(n^6.42) implementing HWB. We focus on , noting that optimal circuits with up to are already known.

Let be the total number of bits, and be the input. In some discussions where it is convenient, we label these bits by the integers . Suppose is a subset of bits and is a symmetric Boolean function (that is, depends only on the Hamming weight of ). Define a reversible gate

where the output is obtained from the input by applying the cyclic shift to the register if . Otherwise, when , the gate does nothing. Note that, because the symmetric function does not depend on the order of the bits, is a permutation of the set . Moreover, is an even permutation, since it is a product of length-5 cycles and each length-5 cycle is an even permutation.

Define to be a reversible gate that applies the cyclic shift of some bits defined by the cycle (where are all distinct) if the symmetric function evaluates to one and does nothing otherwise. We call the targets. We call a collection of -type gates a layer when the sets of their targets do not overlap.

We next construct by first expressing it as a circuit with the -type gates, then breaking down the -type gates into elementary reversible gates and -type gates, and finally expressing the -type gates in terms of the elementary reversible gates.

Lemma 1.

The -bit function can be implemented by an ancilla-free circuit with layers of -type gates.


We will create a circuit with layers numbered . At each layer, the gates take the form . Select the symmetric functions as follows: let iff the th power of in the binary expansion of the weight equals one. Note that are symmetric functions since the calculation of weight does not depend on the order the bits are added in. The function can now be expressed as


For any , let and . Then by elementary modular arithmetic,

and the targets of any two distinct in this product do not overlap. This shows that each of the factors in Eq. (1) can be written as a layer of -type gates. ∎
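The idea behind Lemma 1 can be checked numerically. The sketch below is not the paper's circuit; it only illustrates the decomposition of HWB into one controlled cyclic shift per binary digit of the weight. The function names and the tuple representation of bit strings are assumptions made for illustration.

```python
from itertools import product


def rotate_right(bits, k):
    """Cyclic shift of a tuple of bits to the right by k positions."""
    n = len(bits)
    k %= n
    return bits[n - k:] + bits[:n - k]


def hwb_direct(bits):
    """Reference definition: shift right by the Hamming weight."""
    return rotate_right(bits, sum(bits))


def hwb_layered(bits):
    """One layer per binary digit of the weight: shift by 2**j iff digit j is set."""
    out = bits
    for j in range(len(bits).bit_length()):
        weight = sum(out)  # unchanged by earlier layers: shifts preserve the weight
        if (weight >> j) & 1:
            out = rotate_right(out, 1 << j)
    return out


# Exhaustive comparison of the two definitions for small register sizes.
all_match = all(
    hwb_layered(x) == hwb_direct(x)
    for n in range(1, 9)
    for x in product((0, 1), repeat=n)
)
```

The control of each layer is a symmetric function of the input, because the weight does not change when the bits are permuted; this is what lets every layer read the weight of its own input.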

Figure 2: Implementation of the -bit cyclic shift using the gates and .

We next implement each of layers of cyclic shift gates in Lemma 1 as circuits with -type gates by expressing the cycles as products of length-5 cycles. Note that a length-5 cycle is always an even permutation and is an odd permutation when is even. It is not possible to implement an odd permutation as a product of even permutations. However, with one exception, the -type gates come in pairs (recall that their number, , is a power of two) and thus they can usually be paired up to form an even permutation that can then be decomposed into a product of length-5 cycles. The one exception is the leftmost gate in Eq. (1), , when is even. We handle this case first.
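The parity argument can be made concrete with a small sketch. The helper names below are hypothetical; the code only checks the standard fact that a cycle of length k is an even permutation exactly when k is odd, and that two disjoint cycles of equal even length pair up into an even permutation.

```python
def cycle_to_perm(cycle, size):
    """Permutation of range(size), as a list of images, realizing the given cycle."""
    perm = list(range(size))
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm


def parity(perm):
    """Return 0 for an even permutation and 1 for an odd one."""
    seen = [False] * len(perm)
    transpositions = 0
    for start in range(len(perm)):
        length = 0
        i = start
        while not seen[i]:
            seen[i] = True
            i = perm[i]
            length += 1
        if length:
            transpositions += length - 1  # a cycle of length L is L - 1 transpositions
    return transpositions % 2


five_cycle_parity = parity(cycle_to_perm([0, 1, 2, 3, 4], 5))
cycle_parities = {k: parity(cycle_to_perm(list(range(k)), k)) for k in range(2, 10)}

# Two disjoint 4-cycles: each is odd, their product is even.
pair = cycle_to_perm([0, 1, 2, 3], 8)
for a, b in zip([4, 5, 6, 7], [5, 6, 7, 4]):
    pair[a] = b
pair_parity = parity(pair)
```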

Figure 3: Implementation of , where .

Lemma 2.

can be implemented by a reversible circuit with elementary gates.


The Boolean function can be implemented on the top bit to control all bit SWAPs on the bottom bits, and it can be implemented on the bottom bit to control all bit SWAPs on the top bits. The number of controlled-SWAP gates required is , and the total number of the CNOT gates required to compute/uncompute the control register is . We illustrated this construction in Fig. 3 for . ∎

Lemma 3.

For :

  1. for , pairs of two gates can be implemented by an ancilla-free circuit using constantly many gates ;

  2. for odd the gate can be implemented by an ancilla-free circuit using gates .

  3. for even pairs of gates can be implemented by an ancilla-free circuit using gates ;


1. There are three cases to consider: , , and .

  • and can be implemented simultaneously by the circuit . This is equivalent to saying that the following permutation equality holds: . Note that the bit ‘’ can be found since . We will show only the permutation equalities in the rest of the proof, since it is trivial to translate those to circuits.

  • To implement a pair of gates and rely on the cycle product equality .

  • Cycles and can be obtained by the equality

    where first and second part require two gates each, as described in the cases and , for a total of four gates.

2. The goal is to develop a circuit with gates implementing the gate , where is odd. There are two cases to consider, and .

Case 1: . We want to implement the integer permutation given by the cyclic shift by the cyclic shifts of length . This can be done as follows,

This decomposition uses length-5 cycles, resulting in the ability to implement gate using gates. This construction is illustrated in Fig. 2 for .
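One way to see that a long cycle of odd length can be assembled from length-5 cycles is to chain 5-cycles that overlap in a single point. The sketch below assumes a cycle of length 4m + 1 and a particular chaining order; both are illustrative assumptions and may differ from the decomposition used in the proof. The helper names are hypothetical.

```python
def cycle_to_perm(cycle, size):
    """Permutation of range(size), as a list of images, realizing the given cycle."""
    perm = list(range(size))
    for a, b in zip(cycle, cycle[1:] + cycle[:1]):
        perm[a] = b
    return perm


def compose(p, q):
    """Composition p after q: apply q first, then p."""
    return [p[q[i]] for i in range(len(q))]


def chain_of_five_cycles(m):
    """Product c_0 c_1 ... c_{m-1}, where c_t = (4t, 4t+1, 4t+2, 4t+3, 4t+4).

    Consecutive cycles share exactly one point; the rightmost cycle acts first.
    """
    size = 4 * m + 1
    result = list(range(size))
    for t in range(m):
        five = cycle_to_perm([4 * t + r for r in range(5)], size)
        result = compose(result, five)
    return result


def full_cycle(size):
    """The single cycle (0, 1, ..., size - 1)."""
    return [(i + 1) % size for i in range(size)]


# m five-cycles reproduce the cycle of length 4m + 1.
chains_ok = all(chain_of_five_cycles(m) == full_cycle(4 * m + 1) for m in range(1, 7))
```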

Case 2: . Use the formula

Since we already implemented with gates in Case 1 above, this implementation requires gates.

3. The goal is to implement a pair of and where is even. Write

Here, requires two gates per item 1. case , and each of and requires gates per item 2.

Observe how the above proof implies that the number of gates required to implement each of stages in Eq. (1) for is between and . Thus, per Lemma 2, the total number of elementary and gates required to implement over qubits is between and .

We next show how to implement as a branching program, using Barrington’s theorem [15], by closely following the original proof. In preparation for using Barrington’s theorem, we first remove the dependence of the functions in on the variables inside the set , to allow the desired cyclic shift to be controlled by the values of variables outside the set itself. To accomplish this, note that acts trivially on the strings 00000 and 11111; those can be ignored. This leaves 30 five-bit strings not fixed by the operation, which can be partitioned into six disjoint subsets , and , with five strings each. Every subset contains the cyclic shifts of some fixed 5-bit string, and is defined as follows:


We implement by performing the cyclic shifts of one subset at a time.
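The partition of 5-bit strings into orbits under the cyclic shift is easy to enumerate. The following sketch (with hypothetical names) lists the orbits and counts those of size one and those of size five; it does not reproduce the paper's labelling of the subsets.

```python
from itertools import product


def orbit(s):
    """All cyclic shifts of the string s."""
    return frozenset(s[k:] + s[:k] for k in range(len(s)))


strings = ["".join(bits) for bits in product("01", repeat=5)]
orbits = {orbit(s) for s in strings}

fixed = sorted(min(o) for o in orbits if len(o) == 1)   # strings the shift leaves alone
moving = sorted(min(o) for o in orbits if len(o) == 5)  # one representative per orbit
```

Since 5 is prime, every string other than the two constant ones has exactly five distinct cyclic shifts, which is why no orbit of intermediate size appears.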

First, let us introduce some more notation. Given a bit string , write , where is the restriction of onto the register and is the rest of . Let be the Hamming weight of bit strings in (note that all strings in the same subset have the same weight). Define a Boolean function such that iff appears in the binary expansion of . Then

Define a gate

that maps an input to an output according to the following rules:

  • if then ;

  • if and then ;

  • if and then is obtained from by cyclically shifting the elements of .

By definition, the cyclic shift of bits in the register can be realized by cyclically shifting elements of each subset for . Thus


Here the order in the product does not matter because the gates pairwise commute. Note that the dependence of function on the variables inside the set has now been removed, and we can proceed to implementing as a branching program, and finally mapping the instructions used by the branching program into reversible gates.

Recall some relevant notation used in Barrington’s paper [15]. Let be the group of permutations of numbers, . Given a -tuple of distinct integers and , we write to denote the -cycle. Let be the identity permutation. A branching program of length with Boolean input variables is a list of instructions with and , such that is applied if , and is executed when . Given a permutation , the branching program is said to -compute a Boolean function if executing the list of all instructions in the program results in (the identity permutation) for all inputs such that and permutation for all inputs such that .
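To make the notion of a permutation branching program concrete, here is a minimal sketch that evaluates a four-instruction program computing the AND of two bits via a commutator of two 5-cycles, in the spirit of Barrington's construction. The instruction format `(variable index, permutation if the bit is 1, permutation if the bit is 0)`, the 0-based labels, and the particular cycles are assumptions chosen for illustration.

```python
IDENTITY = (0, 1, 2, 3, 4)


def cycle(*points):
    """Permutation of {0,...,4}, as a tuple of images, realizing the given cycle."""
    perm = list(range(5))
    for a, b in zip(points, points[1:] + points[:1]):
        perm[a] = b
    return tuple(perm)


def inverse(perm):
    inv = [0] * len(perm)
    for i, image in enumerate(perm):
        inv[image] = i
    return tuple(inv)


def run(program, x):
    """Apply the selected permutation of each instruction, in program order."""
    state = IDENTITY
    for var, if_one, if_zero in program:
        g = if_one if x[var] else if_zero
        state = tuple(g[state[i]] for i in range(5))
    return state


sigma = cycle(0, 1, 2, 3, 4)
tau = cycle(0, 2, 4, 3, 1)

# sigma, tau, sigma^-1, tau^-1: a commutator when both bits are 1, identity otherwise.
and_program = [
    (0, sigma, IDENTITY),
    (1, tau, IDENTITY),
    (0, inverse(sigma), IDENTITY),
    (1, inverse(tau), IDENTITY),
]

results = {x: run(and_program, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

For the input (1, 1) the program yields a single 5-cycle, so the construction can be iterated: the output of one program can play the role of sigma or tau in the next level of the recursion.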

Barrington’s theorem asserts that any function in the class NC^1 can be -computed by a branching program of polynomial size [15]. We next specialize the proof of the theorem to explicitly develop a short branching program that -computes the Boolean function . Recall that iff appears in the binary expansion of with being the weight of bit strings in . It suffices to develop a branching program computing the Boolean function with and by appending at most two constant binary variables encoding to the bit string .

While the original proof [15] explored the mapping of logarithmic-depth classical circuits over library, we focus on the classical circuits over 3-input 1-output and gates. Recall that the library is universal for classical computations if constant inputs are allowed.

Lemma 4.

Suppose is an -bit string and is the -th bit in the binary representation of . The function can be -computed by a branching program of size .


First, we describe a logarithmic-depth classical circuit that computes functions for the range of applicable values , and second, report expressions for and in the form of a branching program that can be used in the recursion [15, Proof of Theorem 1]. The length of the branching program computing is upper bounded by taking the maximal length of the program implementing or to the power of the circuit depth.

First, construct a classical circuit with and gates that implements . To do so, we develop a circuit that computes all bits of the , and for the purpose of implementing a given single Boolean component, discard all gates that compute the bits we are not interested in. Such operation does not increase the depth of the circuit, and may, in fact, decrease it slightly.

To find , we employ a circuit consisting of two stages. First, compose a circuit of depth with 3-input 2-output Full Adder gates by grouping as many triples of digits of same significance at each step as possible (note that and are implemented in parallel). We finish this first stage when the output contains two -digit integer numbers and such that . To analyze this circuit, it is convenient to group all bits needing to be added into the smallest set of integer numbers, and count the reduction in the number of integers left to be added by treating layers of gates as Carry-Save Adders [19, 20]. A Carry-Save Adder is defined as the 3-integer into 2-integer adder, which is implemented by applying the Full Adders to the individual components of the three integer numbers at the input. Since the number of integers left to be added changes by a factor of at each step, and every step is implemented by a depth- circuit, the depth of the first stage is . To find the individual components of , the second stage adds two -digit integer numbers and . This can be accomplished by any logarithmic-depth integer addition circuit in depth , such as [21]. The total depth is thus .
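The first stage of this construction can be sketched in a few lines. The code below assumes that the Full Adder produces the 3-input parity as its sum bit and the 3-input majority as its carry bit, and it models a Carry-Save Adder as bitwise operations on Python integers. The function names and the grouping of operands into consecutive triples are illustrative choices, not the paper's.

```python
from itertools import product


def carry_save(a, b, c):
    """3-integer to 2-integer adder: a + b + c == s + carry."""
    s = a ^ b ^ c                                # bitwise 3-input parity
    carry = ((a & b) | (a & c) | (b & c)) << 1   # bitwise 3-input majority, shifted
    return s, carry


def weight_by_carry_save(bits):
    """Reduce the input bits to two integers with layers of carry-save adders.

    Returns (weight, number_of_layers). The final `sum` stands in for the
    second stage, the logarithmic-depth addition of the two remaining integers.
    """
    numbers = list(bits)
    layers = 0
    while len(numbers) > 2:
        grouped = len(numbers) - len(numbers) % 3
        reduced = []
        for i in range(0, grouped, 3):
            reduced.extend(carry_save(*numbers[i:i + 3]))
        reduced.extend(numbers[grouped:])  # operands left over in this layer
        numbers = reduced
        layers += 1
    return sum(numbers), layers


# Exhaustive check of the computed weight on all 8-bit inputs.
weights_ok = all(
    weight_by_carry_save(x)[0] == sum(x) for x in product((0, 1), repeat=8)
)
layers_for_8 = weight_by_carry_save((1,) * 8)[1]
```

Each layer turns every three operands into two, so the number of layers grows logarithmically in the number of input bits, in line with the depth analysis above.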

Next, construct -programs computing the and functions:


The branching program that -computes is created by recursively replacing gates and in the circuit constructed above with the branching programs Eq. (3) and Eq. (3), where each is either one of the primary input variables or one of the intermediate variables in the circuit computing , until all instructions are controlled by constants and primary variables . The recoding of branches of the program -computing a desired intermediate variable when (note how Eq. (3) and Eq. (3) -compute the gates, but not -compute them for arbitrary ) is accomplished in accordance with [15, Lemma 1]. The total length of the branching program is thus upper bounded by the size of longest branching program implementation of the basic gates used ( and ) raised to the power the depth of the circuit it encodes,

We conclude this section by summarizing the main result in a Theorem.

Theorem 1.

The n-bit HWB function can be implemented by an ancilla-free reversible circuit of size O(n^6.42).


First, implement each instruction where is either a primary variable or a constant and the sets are defined per Eq. (2), using constantly many basic reversible gates. This can be accomplished by employing a reversible logic synthesis algorithm, e.g., [9]. Next, use Lemma 4 with and to implement all necessary gates, using a branching program with instructions. Each such branching program requires basic reversible gates since every instruction requires constantly many basic reversible gates. Use six gates to implement one gate, using Eq. (3). Each thus costs basic reversible gates. Combine Lemma 1, Lemma 2, and Lemma 3 to implement using gates, implying the total basic reversible gate count of

4 Ancilla-free quantum circuit of size O(n^2)

Consider a register of qubits and let be the cyclic shift operator,

The hidden weighted bit function may be written as


In other words, implements the -th power of on the subspace with the Hamming weight . Here we show that can be implemented by an ancilla-free quantum circuit of size O(n^2). The circuit is expressed using Clifford gates and single-qubit -rotations.



Let W be the Hamming weight operator. Our starting point is

Lemma 5.

Suppose for some -qubit Hamiltonian that commutes with . Then


Indeed, let be the subspace spanned by all basis states with the Hamming weight . The full Hilbert space of qubits is the direct sum . Let us say that an operator is block-diagonal if maps each subspace into itself. Since commutes with , we infer that is block-diagonal. Therefore and are also block-diagonal. Note that and have the same restriction onto . Thus and have the same restriction onto . By assumption, . Thus and have the same restriction onto . Likewise, is block-diagonal and the restriction of onto is . We conclude that and have the same restriction onto for all . Since both operators are block-diagonal, one has . ∎
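The statement of Lemma 5 can be checked numerically on a small register. The sketch below uses one convenient Hamiltonian that exponentiates to the cyclic shift, built from the spectral projectors of the shift operator; this is an assumption made for illustration and is not the fermionic construction developed in this section. All names (`C`, `U`, `projector`, and so on) are hypothetical, and plain Python lists stand in for matrices.

```python
import cmath
from itertools import product

n = 3
states = list(product((0, 1), repeat=n))
index = {s: i for i, s in enumerate(states)}
dim = len(states)


def rotate_right(s, k):
    k %= n
    return s[n - k:] + s[:n - k]


def permutation_matrix(f):
    m = [[0j] * dim for _ in range(dim)]
    for s in states:
        m[index[f(s)]][index[s]] = 1 + 0j
    return m


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(dim)) for j in range(dim)]
            for i in range(dim)]


def combine(terms):
    """Linear combination of matrices given as (coefficient, matrix) pairs."""
    return [[sum(c * m[i][j] for c, m in terms) for j in range(dim)]
            for i in range(dim)]


def distance(a, b):
    return max(abs(a[i][j] - b[i][j]) for i in range(dim) for j in range(dim))


C = permutation_matrix(lambda s: rotate_right(s, 1))        # cyclic shift
U = permutation_matrix(lambda s: rotate_right(s, sum(s)))   # HWB as a unitary

powers = [permutation_matrix(lambda s: s)]
for _ in range(n - 1):
    powers.append(matmul(C, powers[-1]))

omega = cmath.exp(2j * cmath.pi / n)


def projector(k):
    """Projector onto the eigenspace of C with eigenvalue omega**k."""
    return combine([(omega ** (-j * k) / n, powers[j]) for j in range(n)])


def weight_phase(k):
    """Diagonal matrix exp(i * (2*pi*k/n) * W), with W the Hamming weight operator."""
    return [[omega ** (k * sum(s)) if i == j else 0j for j in range(dim)]
            for i, s in enumerate(states)]


# H = sum_k (2*pi*k/n) P_k is a polynomial in C, so it commutes with W.
exp_iH = combine([(omega ** k, projector(k)) for k in range(n)])
exp_iHW = combine([(1, matmul(projector(k), weight_phase(k))) for k in range(n)])

error_shift = distance(exp_iH, C)
error_hwb = distance(exp_iHW, U)
```

Both errors vanish up to floating-point rounding, so exponentiating the product of this Hamiltonian with the weight operator reproduces the HWB permutation on three qubits.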