QuantumFlow_Tutorial
A step-by-step tutorial of QuantumFlow, using the MNIST sub-dataset {3,6} and the 16-2-2 neural network as an example.
Along with the development of AI democratization, machine learning approaches, in particular neural networks, have been applied to a wide range of applications. In different application scenarios, the neural network is accelerated on a tailored computing platform. The acceleration of neural networks on classical computing platforms, such as CPU, GPU, FPGA, and ASIC, has been widely studied; however, as the scale of applications keeps growing, the memory bottleneck becomes obvious, a problem widely known as the memory wall. In response to this challenge, quantum computing, which can represent 2^N states with N quantum bits (qubits), is regarded as a promising solution. It is therefore pressing to understand how to design quantum circuits for accelerating neural networks. Most recently, initial works have studied how to map neural networks to actual quantum processors. To better understand the state-of-the-art design and inspire new design methodology, this paper carries out a case study to demonstrate an end-to-end implementation. On the neural network side, we employ the multilayer perceptron to complete image classification tasks using the standard and widely used MNIST dataset. On the quantum computing side, we target IBM Quantum processors, which can be programmed and simulated using IBM Qiskit. This work targets the acceleration of the inference phase of a trained neural network on the quantum processor. Along with the case study, we demonstrate the typical procedure for mapping neural networks to quantum circuits.
In the past few years, we have witnessed many breakthroughs in both the machine learning and quantum computing research fields. In machine learning, automated machine learning (AutoML) (Zoph and Le, 2016; Zoph et al., 2018) significantly reduces the cost of designing neural networks, helping to achieve AI democratization. In quantum computing, the scale of actual quantum computers has been evolving rapidly (e.g., IBM (IBM, 2020) recently announced plans to debut a quantum computer with 1,121 quantum bits (qubits) in 2023). Both research fields, however, have met bottlenecks when applying theoretical knowledge in practice. With large inputs, the size of machine learning models (i.e., neural networks) significantly exceeds the resources provided by classical computing platforms (e.g., GPU and FPGA); on the other hand, the development of quantum applications is far behind the development of quantum hardware, that is, there is a lack of killer applications that take full advantage of the high parallelism provided by a quantum computer. As a result, it is natural to see the emergence of a new research field, quantum machine learning.
As with applying machine learning to classical hardware accelerators, when machine learning meets quantum computers, there will be many opportunities along with the challenges. The development of machine learning on classical hardware accelerators experienced two phases: (1) the design of neural network tailored hardware (Zhang et al., 2015; Jiang et al., 2018; Zhang et al., 2018b; Jiang et al., 2019a; Li et al., 2016, 2017), and (2) the co-design of neural network and hardware accelerator (Jiang et al., 2019c; Jiang et al., 2019b; Yang et al., 2020a; Bian et al., 2020; Jiang et al., 2020a; Ding et al., 2020; Wu et al., 2019; Cai et al., 2018; Tan et al., 2019; Hao et al., 2019b, a; Zeng et al., 2020; Wu et al., 2020). To best exploit the power of the quantum computer, it would be essential to co-design the neural network and the quantum circuit; however, because the basic logic gates of quantum circuits differ from those of classical circuits, it is still unclear how to design a quantum accelerator for neural networks.
In this work, we aim to fill this missing link by providing an open-source design framework. In general, the full acceleration system is divided into three parts: data pre-processing and data post-processing on a classical computer, and the neural network accelerator on the quantum circuit. The quantum circuit further includes quantum-state preparation and quantum computing-based neural computation. In the remainder of this paper, we introduce all the above components in detail and demonstrate the implementation using IBM Qiskit for quantum circuit design and PyTorch for processing the machine learning model.
The remainder of the paper is organized as follows. Section 2 presents an overview of the full system. Section 3 presents the case study on the MNIST dataset. Insights are discussed in Section 4. Finally, concluding remarks are given in Section 5.
Figure 1 demonstrates three types of neural network design: (1) the classical hardware accelerator; (2) the pure quantum computing based accelerator; (3) the hybrid quantum and classical accelerator. All of these accelerators follow the same flow that the data will be first pre-processed, then the neural computation is accelerated, and finally, the output data will go through the post-processing to obtain the final results.
After the success of deep neural networks (e.g., AlexNet (Krizhevsky et al., 2017) and VGGNet (Simonyan and Zisserman, 2014)) in achieving high accuracy, designing hardware accelerators became a hot topic in accelerating the execution of deep neural networks. On the application-specific integrated circuit (ASIC), works (Du et al., 2015; Zhang et al., 2018a; Zhang and Garg, 2018; Zhang et al., 2019b; Chen et al., 2016) studied how to design neural network accelerators using different dataflows, including weight stationary, output stationary, etc. Selecting the dataflow for a dedicated neural computation can maximize data reuse, which reduces data movement and accelerates the process; this led to the co-design of neural networks and ASICs (Yang et al., 2020b).
On the FPGA, work (Zhang et al., 2015) first proposed the tiling-based design to accelerate the neural computation, and works (Jiang et al., 2018; Zhang et al., 2018b; Jiang et al., 2019a; Li et al., 2015) gave different designs and extended the implementation to multiple FPGAs. Driven by AutoML, work (Jiang et al., 2019c) proposed the first co-design framework to involve the FPGA implementation in the search loop, so that both software accuracy and hardware efficiency can be maximized. The co-design philosophy is also applied in other designs (Zhang et al., 2019a; Jiang et al., 2020d; Hao et al., 2019b, a), and in this direction there exist many research works that further take model compression into consideration (Lu et al., 2019; Jiang et al., 2020c) and accelerate the search process (Li et al., 2020; Zhang et al., 2020).
Most recently, works have emerged that use the quantum circuit to accelerate neural computation. Typical works include (Francesco et al., 2019; Tacchino et al., 2020; Jiang et al., 2020b), among which the work (Jiang et al., 2020b) first demonstrates the potential quantum advantage that can be achieved by using a co-design philosophy. These works encode data to either qubits (Francesco et al., 2019) or qubit states (Jiang et al., 2020b) and use superconducting-based quantum computers to run neural networks. These methods have the following limitation: due to the short decoherence times in superconducting-based quantum computers, conditional logic is not supported in the computing process. This makes it hard to implement a function that is not differentiable at all points, like the commonly used Rectified Linear Unit (ReLU) in machine learning models. However, the approach also has advantages: the design can be directly evaluated on an actual quantum computer, and there is no communication across the quantum-classical interface during the computation.
The quantum circuit design includes two components: one for quantum-state preparation and one for neural computation, as shown in Figure 1(b). After the neural computation component, the qubits are measured to extract the output data, which is then sent to the data post-processing unit to obtain the final results.
To overcome the disadvantage of pure quantum computing and make full use of classical computing, hybrid quantum-classical computing for machine learning tasks has been proposed (Broughton et al., 2020). It establishes a computing paradigm where different neurons can be implemented on either quantum or classical computers, as demonstrated in Figure 1(c). This brings flexibility in implementing functions (e.g., ReLU). However, at the same time, it leads to massive data transfer between quantum and classical computers.
This work focuses on providing a full workflow, starting from data pre-processing, going through quantum computing acceleration, and ending with data post-processing. We use the MNIST dataset as an example to carry out a case study.
Both the computing architecture and the neural operation affect the design. In this work, for the computing architecture, we focus on the pure quantum computing design, since it can be easily extended to the hybrid quantum-classical design by connecting the inputs and outputs of the quantum accelerator to a traditional classical accelerator; for the neural network, we focus on the multi-layer perceptron, whose operation is the basis of a large number of neural computations, such as convolution.
In this section, we will demonstrate the detailed implementation of four components in the pure quantum computing based neural computation as shown in Figure 1(b): data pre-processing, quantum-state preparation, neural computation, and data post-processing.
The first step of the whole procedure is to prepare the data to be encoded to quantum states. Note that using $N$ qubits to represent $2^N$ data places constraints on the numbers; more specifically, if a vector $v$ of $2^N$ data can be arranged in the first column of a unitary matrix $U$, then, starting from the initial state $|0\rangle^{\otimes N}$, we can obtain $v$ by conducting $U|0\rangle^{\otimes N}$, where $|0\rangle^{\otimes N}$ represents the zero state with $N$ qubits.
Listing 1 demonstrates the data conversion from classical data to quantum data. We utilize the transforms in torchvision to complete the data conversion. More specifically, we create the ToQuantumData class in Line 5. It receives a tensor (the original data) as input (Line 6). We apply the Singular Value Decomposition (svd) provided by np.linalg to obtain the unitary matrix output_matrix (Line 14), and then we extract its first column as output_data (Line 16), where output_matrix represents $U$ and output_data represents $v$. After we build the ToQuantumData class, we integrate it into one “transform” variable, which can further include data pre-processing functions, such as image resize (Line 20) and data normalization (Line 21). In creating the data loader, we can apply the “transform” to the dataset (e.g., we can obtain the training data by using “train_data=datasets.MNIST(root=datapath, train=True,download=True, transform=transform)”).
Theoretically, with the unitary matrix $U$, we can directly apply it as an oracle on the quantum circuit to change the state from the zero state to $v$. This process is widely known as quantum-state preparation. The efficiency of quantum-state preparation can significantly affect the complexity of the whole circuit, and therefore it is important to improve the efficiency of this process. In general, there are two typical ways to perform quantum-state preparation: (1) the quantum random access memory (qRAM) (Lvovsky et al., 2009) based approach (Allcock et al., 2020; Kerenidis and Prakash, 2016) and (2) the computing-based approach (Sanders et al., 2019; Grover, 2000; Bausch, 2020). In the qRAM-based approach, the vector $v$ is stored in a binary-tree-based structure in qRAM, which can be queried in quantum superposition and can generate the states efficiently. IBM Qiskit provides an initialization function to perform quantum-state preparation, which is based on the method in (Shende et al., 2006).
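The SVD-based conversion described above can be sketched in plain NumPy. This is a minimal sketch rather than the tutorial's Listing 1: the function name to_quantum_data is hypothetical, the torchvision wrapper is omitted, and the input is assumed to be a non-zero real vector whose length is a power of two.

```python
import numpy as np

def to_quantum_data(vec):
    """Return (output_matrix, output_data) for a non-zero real vector.

    Hypothetical helper: output_matrix is orthogonal (a real unitary) and
    its first column, output_data, is the input rescaled to unit length.
    """
    vec = np.asarray(vec, dtype=np.float64).reshape(-1)
    n = vec.size
    # Place the data in the first column of an otherwise empty square matrix.
    input_matrix = np.zeros((n, n))
    input_matrix[:, 0] = vec
    # SVD gives input_matrix = u @ diag(s) @ vh. Dropping the singular
    # values leaves the orthogonal matrix u @ vh.
    u, _, vh = np.linalg.svd(input_matrix)
    output_matrix = u @ vh
    output_data = output_matrix[:, 0]
    return output_matrix, output_data

pixels = np.arange(1.0, 17.0)  # stand-in for a flattened 4x4 image
U, amplitudes = to_quantum_data(pixels)
```

Because the squared amplitudes of a quantum state must sum to one, the extracted column is a normalized copy of the input, which is the constraint on the numbers mentioned at the start of this step.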
In Listing 2, we give the code to initialize the quantum states, using the unitary matrix converted from the original data in Listing 1 (see Line 18). In this code snippet, we first create a 4-qubit QuantumRegister “inp” (Line 6) and the quantum circuit (Line 7). Then, we convert the input data to data_matrix, which is used to initialize the circuit through the UnitaryGate function from qiskit.extensions. Finally, from Line 10 to Line 14, we output the states of all qubits to verify the correctness.
Now, we have encoded the image data (16 inputs) onto 4 qubits. The next step is to perform the neural computation, that is, the weighted sum followed by a quadratic function, using the given binary weights. Neural computation is the key component in a quantum machine learning implementation. To clearly introduce this component, we first consider the computation of the hidden layer, which can be further divided into two stages: (1) multiplying inputs and weights, and (2) applying the quadratic function on the weighted sum. Then, we present the computation of the output layer to obtain the final results.
Computation of one neuron in the hidden layer
Stage 1: multiplying inputs and weights. Since the weights are given, they are pre-determined. We use quantum gates to apply the weights to the inputs. The quantum gates applied here include the $X$ gate and the 3-controlled-Z gate with 3 trigger qubits. The function of the 3-controlled-Z gate is to flip the sign of state $|1111\rangle$, and the function of the $X$ gate is to swap the amplitude of one state with that of another.
For example, suppose the weight for state $|0011\rangle$ is $-1$. We apply it to the input in three steps. First, we swap the amplitude of state $|0011\rangle$ to state $|1111\rangle$ using two $X$ gates on the first two qubits. Then, in the second step, we apply the controlled-Z gate to flip the sign of the state $|1111\rangle$. Finally, in the third step, we swap the amplitude of state $|1111\rangle$ back to state $|0011\rangle$ using two $X$ gates on the first two qubits. In this way, we can traverse all weights and apply the above three steps to flip the sign of the corresponding states. Note that since the non-linear function is a quadratic function, if the number of $-1$ weights is larger than the number of $+1$ weights, we can flip the signs of all weights to minimize the number of gates put in the circuit.
Listing 3 demonstrates the procedure of multiplying inputs and weights. In the listing, the function cccz uses basic quantum logic gates to realize the 3-controlled-Z gate. The basic gates involved include the Toffoli gate (i.e., CCX) and the controlled-Z gate (i.e., CZ). Since this function needs auxiliary (a.k.a. ancilla) qubits, we include 2 additional qubits in the quantum circuit, as shown in Lines 19-20.
The function neg_weights_gate flips the sign of the given state by applying the 3-step process. Lines 11-13 complete the first step, swapping the amplitude of the given state to the state $|1111\rangle$. Then, the cccz gate is applied to complete the second step. Finally, from Line 15 to Line 17, the amplitude is swapped back to the given state.
With the above two functions, we traverse the weights to assign the sign to each state in Lines 21-27. Note that, after this operation, the state vector has changed from the initial state to one in which the amplitude of each state carries the sign of its weight.
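The three-step sign flip can be checked with a small state-vector simulation. This is a sketch in NumPy, not the Qiskit code of Listing 3: the helper names apply_x, apply_cccz and neg_weight are hypothetical, the 3-controlled-Z is modeled directly as a sign change on $|1111\rangle$ (so no ancilla qubits appear), and the leftmost bit of a ket label is assumed to be the first qubit.

```python
import numpy as np

N_QUBITS = 4

def apply_x(state, qubit):
    """Apply an X gate on `qubit` (0 = leftmost bit of the ket label)."""
    mask = 1 << (N_QUBITS - 1 - qubit)
    # X swaps the amplitudes of every pair of basis states that differ
    # only in the chosen bit.
    return state[np.arange(state.size) ^ mask]

def apply_cccz(state):
    """Model of the 3-controlled-Z gate: flip the sign of |1111>."""
    out = state.copy()
    out[-1] = -out[-1]
    return out

def neg_weight(state, label):
    """Flip the sign of the amplitude of basis state `label`, e.g. '0011'."""
    zero_bits = [q for q, bit in enumerate(label) if bit == "0"]
    for q in zero_bits:        # step 1: move the target amplitude onto |1111>
        state = apply_x(state, q)
    state = apply_cccz(state)  # step 2: flip the sign of |1111>
    for q in zero_bits:        # step 3: move the amplitude back
        state = apply_x(state, q)
    return state

state = np.arange(1.0, 17.0)   # distinct amplitudes so any mix-up is visible
flipped = neg_weight(state, "0011")
```

Only the entry for $|0011\rangle$ (index 3) changes sign. A state with k zero bits costs 2k X gates plus one 3-controlled-Z, which is why flipping all weights pays off when most of them are $-1$.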
Stage 2: applying a quadratic function on the weighted sum. This stage also follows 3 steps. In the first step, we apply Hadamard (H) gates on all qubits to accumulate the amplitudes of all states onto the zero state. Then, the second step swaps the amplitudes of the zero state and the one-state $|1111\rangle$. Finally, the last step applies the N-control-X gate to extract the amplitude to one output qubit, in which the probability of measuring $|1\rangle$ is equal to the square of the weighted sum.
In the first step, the H gates can be applied to accumulate the amplitudes of the states because the first row of $H^{\otimes 4}$ is $\frac{1}{4}[1, 1, \ldots, 1]$, and applying $H^{\otimes 4}$ performs the multiplication between this matrix and the state vector. As a result, the amplitude of $|0000\rangle$ will be the weighted sum with a coefficient of $\frac{1}{4}$.
Listing 4 demonstrates the implementation of the quadratic function on the weighted sum in Qiskit. In the listing, the function ccccx is built from the basic Toffoli gate (i.e., CCX) to implement a 4-control-X gate, used together with the swap of amplitude between the zero state $|0000\rangle$ and the one-state $|1111\rangle$. In Line 14, an additional output qubit holds the result of the neural computation; it is added to the quantum circuit in Lines 10-11.
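The effect of the Hadamard step can be verified numerically. The following is a sketch in NumPy under stated assumptions, not the Qiskit code of Listing 4: the input amplitudes and binary weights are randomly generated stand-ins, and the X and 4-control-X steps are not simulated, since they only move the resulting amplitude onto the output qubit.

```python
import numpy as np

H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)

def hadamard_all(n_qubits):
    """Build the matrix of H applied to each of `n_qubits` qubits."""
    op = np.array([[1.0]])
    for _ in range(n_qubits):
        op = np.kron(op, H)
    return op

rng = np.random.default_rng(0)
x = rng.random(16)
x /= np.linalg.norm(x)                # unit-length input amplitudes (made up)
w = rng.choice([-1.0, 1.0], size=16)  # binary weights (made up)

signed = w * x                        # state vector after Stage 1
after_h = hadamard_all(4) @ signed    # step 1: H on every qubit
amp_zero = after_h[0]                 # amplitude of |0000>
prob_one = amp_zero ** 2              # probability of |1> on the output qubit
```

Here amp_zero equals the weighted sum np.dot(w, x) scaled by 1/4, so prob_one is the square of that scaled weighted sum, which is the quadratic activation this stage realizes.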
A neural network with multiple neurons in the hidden layer has one set of weights per neuron. We can apply the above neural computation to each set of weights to obtain one output qubit per neuron.
Computation of one neuron in the output layer
With these output qubits, we have two choices: (1) go to the classical computer, encode these outputs to qubits, and then repeat the computation used for the hidden layer to obtain the final results; (2) continue to use these qubits to directly compute the outputs, in which case the fundamental computation needs to be changed to the multiplication between random variables, because the data associated with a qubit represents the probability of the qubit being in the $|1\rangle$ state.
In the following, we demonstrate the implementation of the second choice (for the fundamental details, please refer to (Jiang et al., 2020b; Tacchino et al., 2020)). In this example, we follow the network structure with 2 neurons in the hidden layer. In addition, we consider only one parameter for the normalization function, using one additional qubit for each output neuron. The inputs are the outputs of the 2 neurons in the hidden layer and the weights for the output neuron; norm_flag_1 and norm_para_1 are the normalization-related parameters for that output neuron. Then, we have the following implementation.
The above listing follows the same 2-stage pattern as the computation in the hidden layer. If we change every sub-index from 1 to 2, we obtain the quantum circuit for the second output neuron.
After all outputs are computed and stored in the out_q_1 and out_q_2 qubits, we can then measure the output qubits, run a simulation or execute on the IBM Q processors, and finally obtain the classification as follows.
Listing 6 demonstrates the above three tasks. The fire_ibmq function can execute the constructed circuit either in simulation or on a given IBM Q processor backend. The parameter “shots” defines the number of times the circuit is executed. Finally, the counts for each state are returned. In this implementation, the probability of each qubit (instead of each state) gives the probability of choosing the corresponding class. Therefore, we create the “analyze” function to get the probability for each qubit. Finally, we obtain the classification result by extracting the index of the maximum probability in the “class_prob” set.
Note that Listing 6 can also be applied to hybrid quantum-classical computing.
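The per-qubit post-processing can be sketched without any quantum library. This is a minimal sketch, not the tutorial's “analyze” function: the names qubit_probabilities and classify are hypothetical, the counts dictionary is made-up example data in the bitstring-to-count form that an execution returns, and position i of each bitstring is assumed to correspond to class i (the real mapping depends on the order in which the output qubits are measured).

```python
def qubit_probabilities(counts):
    """Return, for each bit position, the fraction of shots that read '1'."""
    shots = sum(counts.values())
    n_bits = len(next(iter(counts)))
    ones = [0] * n_bits
    for bitstring, count in counts.items():
        for pos, bit in enumerate(bitstring):
            if bit == "1":
                ones[pos] += count
    return [c / shots for c in ones]

def classify(counts):
    """Pick the class whose output qubit is most likely to be measured as 1."""
    class_prob = qubit_probabilities(counts)
    return class_prob.index(max(class_prob)), class_prob

# Made-up counts for two output qubits over 1000 shots.
counts = {"00": 100, "01": 150, "10": 600, "11": 150}
label, class_prob = classify(counts)
```

Marginalizing over states matters here: the state "11" contributes to both classes, so the class probabilities do not need to sum to one.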
From the study of implementing neural networks onto the quantum circuits, there are several insights in terms of achieving quantum advantages, listed as follows.
Data encoding: this case study encodes $2^N$ data into the states of $N$ qubits, which provides the opportunity to achieve quantum advantage when conducting inference for each input. An alternative is to encode each datum to its own qubit; however, since each datum needs to be operated on in the neural computation, such an encoding approach can hardly achieve quantum advantage.
Quantum-state preparation: by encoding data into qubit states, we can achieve quantum advantage only if the quantum-state preparation itself can be conducted efficiently, with low complexity.
Quantum computing-based neural computation: neural computation can also become the performance bottleneck. Using the design in Listing 3, which flips one sign at a time, the number of gates grows with the number of states whose sign must be flipped. To overcome this, (Jiang et al., 2020b) proposed a co-design approach to reduce the number of gates.
This work demonstrates a framework for implementing neural networks on quantum circuits. It is composed of three main components: data pre-processing, neural computation acceleration, and data post-processing. Following this workflow, the data is first encoded to quantum states and then operated on to complete the operations in a neural network. The source code can be found at https://github.com/weiwenjiang/QML_tutorial
This work is partially supported by IBM and University of Notre Dame (IBM-ND) Quantum program, and in part by the IBM-ILLINOIS Center for Cognitive Computing Systems Research.
Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks. IEEE Journal of Solid-State Circuits 52, 1 (2016), 127–138.
An FPGA implementation of a restricted Boltzmann machine classifier using stochastic bit streams. In 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 68–69.
Using stochastic computing to reduce the hardware requirements for a restricted Boltzmann machine classifier. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 36–41.
Neural network classifiers using stochastic computing with a hardware-oriented approximate activation function. In 2017 IEEE International Conference on Computer Design (ICCD). IEEE, 97–104.
Quantum implementation of an artificial feed-forward neural network. Quantum Science and Technology (2020).
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2820–2828.
Co-exploring neural architecture and network-on-chip design for real-time artificial intelligence. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 85–90.
Thundervolt: enabling aggressive voltage underscaling and timing error resilience for energy efficient deep learning accelerators. In Proceedings of the 55th Annual Design Automation Conference. 1–6.