1 Introduction
Convolutional neural networks (CNNs), proposed by Yann LeCun et al. [lecun1998gradient] in 1989, are among the most powerful algorithms in deep learning. The main advantage of CNNs is that they use multiple feature extraction stages to learn important features from the data automatically and accurately, without human supervision. Owing to this advantage, CNNs have been tremendously successful in a broad array of high-level computer vision problems, including image recognition
[krizhevsky2012imagenet, simonyan2014very, szegedy2015going, he2016deep], object detection [girshick2014rich, girshick2015fast, redmon2016you], and image segmentation [he2017mask, ronneberger2015u, chen2018encoder]. In recent years, with further development in deep learning, CNNs have also shown promising performance in other machine learning areas such as time series forecasting [borovykh2017conditional, chen2020probabilistic], speech recognition [45774] and recommendation systems [yuan2019simple].
In parallel, with recent achievements in quantum technologies (e.g. noisy intermediate-scale quantum (NISQ) processors are now available), the domain of quantum machine learning has attracted growing attention and triggered an enormous amount of work. Quantum machine learning is a research area that aims to utilize quantum mechanical effects such as superposition and entanglement to improve the performance of machine learning algorithms. Even though quantum machine learning is a new discipline, it has witnessed a number of successful quantum extensions of classical machine learning methods, including support vector machines [varatharajan2018big], clustering [kerenidis2018q, otterbach2017unsupervised], and principal component analysis [article].
Among quantum machine learning algorithms, quantum convolutional neural networks (QCNNs), also known as hybrid quantum-classical convolutional neural networks, are a family of variational quantum algorithms that have recently become a very active research field. The central idea of QCNNs is to construct a quantum convolutional layer within neural networks based on parameterized quantum circuits to estimate complex kernel functions in a high-dimensional Hilbert space. Inspired by CNNs, Liu et al.
[liu2019hybrid] proposed the first QCNN model and implemented it for image recognition. Afterwards, the QCNN model was investigated further in various works [cong2019quantum, henderson2020quanvolutional, oh2020tutorial, chen2020quantum, houssein2021hybrid, alam2021iccad]. Recently, it has been demonstrated in [yang2021decentralizing] that QCNN models can also achieve promising results in speech recognition.
Despite these successes, QCNNs suffer from computational bottlenecks that make them time-consuming to train. Firstly, quantum operations applied on $n$-qubit quantum circuits require unitary matrices of size $2^n \times 2^n$, which scales exponentially with the size of the quantum circuit. Moreover, when QCNNs are trained on a real quantum device, computing gradients with the parameter-shift rule [mitarai2018quantum, schuld2019evaluating] requires additional quantum circuit executions. As an example, a quantum filter with trainable parameters adds further quantum circuit executions for each training sample to compute the required gradients. Even though this problem can be mitigated when QCNNs are implemented on quantum simulators that support more efficient gradient computation methods such as backpropagation [Linnainmaa:1976, Rumelhart:1986we] and the adjoint differentiation method [jones2020efficient], QCNNs inevitably face another challenge. In CNNs, a convolutional layer, due to local connectivity, performs a large number of element-wise multiplication operations to obtain each output feature map, and the computational cost increases significantly with the feature map size. Fortunately, this computational issue in CNNs can be handled by using vectorization techniques [Chellapilla_highperformance, vectorization]. QCNNs, as the counterpart of CNNs, have the same problem. However, unlike CNNs, most current quantum devices, including quantum hardware and quantum simulators, do not support vectorization. Despite the availability of more mature quantum devices in the NISQ era, executing a large number of quantum circuits would be impractical in general.
A few works have investigated how to reduce the runtime complexity of QCNNs. In the first family of works, the number of qubits required for the quantum circuit is kept small by using classical data preprocessing techniques to reduce the dimension of the input features fed into the quantum (convolutional) layer. For instance, Pramanik et al. [pramanik2021quantum]
employ principal component analysis (PCA) to reduce the VGG16 features for the variational quantum classifier (VQC), while Hur et al. [hur2021quantum] adopt autoencoding (AutoEnc) for the dimensionality reduction. Nevertheless, the performance of a model trained in this way is likely to be compromised by the limited expressive power of the reduced features, as shown in [pramanik2021quantum]. The second family of works focuses on how to efficiently encode classical data into quantum states. Schuld and Killoran [schuld2019quantum] propose and implement amplitude encoding for variational quantum circuits, which is explored further in [mattern2021variational] for the Flexible Representation of Quantum Images (FRQI). This type of encoding is efficient in terms of the qubits required for data encoding, but it relies on quantum circuits that are too deep to be practical on NISQ devices. In a different direction, some recent works [schuld2018supervised, larose2020robust, alam2021iccad] propose angle encoding (also referred to as qubit encoding) and its variants (e.g. dense angle encoding), which use a constant quantum circuit depth for state preparation. This encoding scheme requires one qubit to encode one or a limited number of components of the input feature vector, and thus is not efficient for high-dimensional input features from a resource perspective. To trade off between these two encoding methods, Hur et al. [hur2021quantum] further develop a hybrid encoding approach which requires fewer qubits than angle encoding and uses a shallower quantum circuit than amplitude encoding. Moreover, Henderson et al. [henderson2020quanvolutional] employ a threshold-based encoding technique to reduce the input-state space, making it possible to obtain the output feature map through a lookup table during the quantum convolution process without executing the same quantum circuit repeatedly on image segments. This method is easy to implement, but it is infeasible on real quantum devices, as mentioned in [henderson2020quanvolutional].
Having reviewed all these challenges and developments, in this work we propose a novel hybrid quantum-classical architecture which we call the quantum dilated convolutional neural network (QDCNN).
Our approach, motivated by the dilated convolution in deep learning, is an extension of the architectures presented in [liu2019hybrid] and [henderson2020quanvolutional], and helps reduce the computational cost of QCNNs in a different way from the aforementioned approaches. Dilated convolution, also known as atrous convolution, was originally developed for efficiently computing the undecimated discrete wavelet transform [Holschneider1989]. In recent years, dilated convolution has attracted more and more attention, and is widely used in semantic segmentation [Yu2016MultiScaleCA, Chen2015SemanticIS, chen2017deeplab, chen2017rethinking, chen2018encoder, hamaguchi2018effective]. Following these successes, dilated convolution has also been adopted for a broader set of tasks, such as object localization [kudo2017dilated], time series forecasting [borovykh2017conditional, chen2020probabilistic] and sound classification [chen2019environmental]. The advantage of dilated convolution is that it effectively expands the field of view of filters to capture larger context without increasing the number of parameters or the computational complexity. By virtue of dilated convolution, the proposed QDCNNs can generally improve the computational efficiency of existing QCNNs while achieving better task performance.
In summary, the contributions of our work are:

We propose a novel quantum convolutional neural network architecture based on the quantum dilated convolution operation. To the best of our knowledge, our work is the first attempt to combine the concept of dilated convolution with variational quantum circuits.

We conduct experiments on the MNIST and Fashion-MNIST datasets and demonstrate the superior performance of QDCNN models over QCNN models.
2 Method
2.1 Preliminaries
2.1.1 Convolution Operation
The convolutional layer, which performs an operation called a "convolution", plays a central role in CNNs. In the context of convolutional networks, a convolution is a linear operation that involves the multiplication of a set of weights with the input. For a convolution operation, a kernel or filter is defined as a feature extractor, which is a two-dimensional (2D) array of learnable weights. A filter is applied to a filter-size patch of the input image called the receptive field, and a dot product is performed between the pixels within the receptive field and the weight values in the filter. Afterwards, the filter shifts to the next patch according to a step size called the stride, and repeats the above process until it has swept across the entire image. The final output from the series of dot products between the filter weights and the values underneath the filter is called a feature map. Let us denote the output feature map by $F$ and the input image by $X$. In the 2D convolution process, the feature map $F$ is obtained by applying a filter $K$ to the input image $X$:
$$F(i, j) = \sum_{m} \sum_{n} X(i + m,\, j + n)\, K(m, n) \tag{1}$$
where $i$ and $j$ are location indices of $F$. The output feature map, due to the convolution operation, usually has a smaller spatial resolution than the input image. This reduction in dimensions can be avoided by employing the zero padding technique, namely adding a border of pixels with value zero around the edges of the input image before the application of a filter. A hyperparameter called padding can be defined to determine how many zero values to add to the border of the image. Generally, the spatial resolution $H_{\text{out}}$ and $W_{\text{out}}$ of the resulting feature map, extracted from an $H_{\text{in}} \times W_{\text{in}}$ input image by a $k_h \times k_w$ kernel, can be calculated as
$$H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - k_h}{s} \right\rfloor + 1 \tag{2}$$
$$W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2p - k_w}{s} \right\rfloor + 1 \tag{3}$$
where $p$ and $s$ represent padding and stride respectively.
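The sliding-window procedure and the output-size rule described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `conv_output_size` and `conv2d` are hypothetical, the size rule is assumed to be the usual floor((n + 2p - k)/s) + 1, and padding is left out of `conv2d` for brevity.

```python
def conv_output_size(size_in, kernel, padding=0, stride=1):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (size_in + 2 * padding - kernel) // stride + 1


def conv2d(image, kernel, stride=1):
    """Naive 2D convolution (no padding) on nested lists."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = conv_output_size(len(image), k_h, 0, stride)
    out_w = conv_output_size(len(image[0]), k_w, 0, stride)
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Dot product between the filter and the receptive field
            # whose top-left corner is (i * stride, j * stride).
            acc = 0
            for m in range(k_h):
                for n in range(k_w):
                    acc += image[i * stride + m][j * stride + n] * kernel[m][n]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 4x4 image and 2x2 filter (values chosen only for illustration).
image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d(image, kernel, stride=2)
print(feature_map)  # [[-4, 2], [7, 6]]
```

With a stride of 2 and no padding, the 4-by-4 input shrinks to a 2-by-2 feature map, which is the resolution loss that zero padding is meant to offset.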
2.1.2 Dilated Convolution
Dilated convolution is a type of convolution that expands the kernel by inserting holes (i.e. points with a weight of zero) between consecutive kernel elements. In simple terms, dilated convolution is just a convolution applied to the input with defined gaps. Compared to standard convolution, dilated convolution introduces an extra hyperparameter called the dilation rate, which determines the stride with which the input pixels are sampled. According to the definition of dilated convolution, $r - 1$ zero values are inserted between two consecutive filter values if the dilation rate is denoted by $r$. In this spirit, Eq. (1) needs to be reformulated as
$$F(i, j) = \sum_{m} \sum_{n} X(i + r\,m,\, j + r\,n)\, K(m, n) \tag{4}$$
in the context of dilated convolution, where $F$, $X$ and $K$ denote the output feature map, the input image and the filter. It can be seen from Eq. (4) that dilated convolution is able to capture a larger receptive field without introducing more learnable parameters compared to standard convolution with the same kernel size. Moreover, for dilated convolution, we also need to rewrite Eq. (2) and Eq. (3) as
$$H_{\text{out}} = \left\lfloor \frac{H_{\text{in}} + 2p - r\,(k_h - 1) - 1}{s} \right\rfloor + 1 \tag{5}$$
$$W_{\text{out}} = \left\lfloor \frac{W_{\text{in}} + 2p - r\,(k_w - 1) - 1}{s} \right\rfloor + 1 \tag{6}$$
with input size $H_{\text{in}} \times W_{\text{in}}$, kernel size $k_h \times k_w$, padding $p$ and stride $s$. These expressions indicate that dilated convolution generally results in a feature map of smaller size compared to standard convolution for the same set of hyperparameters. It is worth noting that standard convolution can be regarded as a special case of dilated convolution with dilation rate $r = 1$.
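As a small illustration of the dilated case, the sketch below samples the input with gaps of size `dilation` and uses the standard effective kernel extent r(k - 1) + 1 for the output size. The helper names `dilated_output_size` and `dilated_conv2d` are hypothetical, padding is omitted, and the 5-by-5 input is a toy example rather than data from the paper.

```python
def dilated_output_size(size_in, kernel, dilation=1, padding=0, stride=1):
    """Output length along one axis for a dilated convolution."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size_in + 2 * padding - effective_kernel) // stride + 1


def dilated_conv2d(image, kernel, dilation=1, stride=1):
    """Naive dilated 2D convolution (no padding) on nested lists."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = dilated_output_size(len(image), k_h, dilation, 0, stride)
    out_w = dilated_output_size(len(image[0]), k_w, dilation, 0, stride)
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for m in range(k_h):
                for n in range(k_w):
                    # Input pixels are sampled `dilation` apart.
                    pixel = image[i * stride + dilation * m][j * stride + dilation * n]
                    acc += pixel * kernel[m][n]
            row.append(acc)
        feature_map.append(row)
    return feature_map


# Toy 5x5 image with entries 0..24 and a 2x2 filter.
image = [[5 * row + col for col in range(5)] for row in range(5)]
kernel = [[1, 0],
          [0, 1]]
dilated = dilated_conv2d(image, kernel, dilation=2)
standard = dilated_conv2d(image, kernel, dilation=1)
print(len(dilated), len(standard))  # 3 4
```

Setting `dilation=1` recovers the standard convolution, and the dilated output (3-by-3) is smaller than the standard one (4-by-4) even though both filters hold four weights.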
2.1.3 Quantum Convolution
In contrast to classical convolution, quantum convolution is a new type of convolution based on quantum circuits and it generally consists of three modules:

ENCODING MODULE. In this module, classical data is encoded into a quantum state which will be further processed in the quantum convolutional circuit. There exist various encoding methods such as angle encoding, amplitude encoding and basis encoding. A summary of them can be found in the literature [dataencoding]. Among these methods, angle encoding is the most commonly used encoding approach. In this encoding scheme, the classical input is treated as the rotation angle of a single-qubit rotation gate (e.g. an $R_Y$ rotation gate). For example, a classical variable or feature $x$ can be encoded by $R_Y(x)$, which is applied on some initial state (e.g. vacuum state $|0\rangle$). In this sense, we can say that the classical information is encoded into the state of a qubit. This type of angle encoding is called one variable/qubit encoding. This approach requires $N$ qubits to encode $N$ input variables. To reduce the required qubits, we can also encode multiple variables by sequential rotations applied on a single qubit. For example, input variables $x_1$, $x_2$ and $x_3$ can be encoded using three rotation gates, one per variable, applied successively on a single qubit. This angle encoding is called multiple variables/qubit encoding or dense angle encoding. In this paper, we focus on the one variable/qubit encoding method. Let us denote by $U_{\text{enc}}(\mathbf{x})$ the encoding operator, where $\mathbf{x}$ is the input vector. Then the encoded quantum state is obtained by
$$|\psi_{\mathbf{x}}\rangle = U_{\text{enc}}(\mathbf{x})\, |0\rangle^{\otimes N} \tag{7}$$
It is worth noting that $U_{\text{enc}}$ usually contains the Hadamard gate, which transforms the initial state into a superposition state.

ENTANGLEMENT MODULE. In this module, a cluster of single- and multi-qubit gates is applied to the encoded quantum state obtained from the previous module. Multi-qubit gates are usually the CNOT gate and parametric controlled rotation gates (e.g. $CR(\theta)$, where $\theta$ is a trainable parameter), and they are used to generate correlated quantum states, namely entangled states. Single-qubit gates are mainly parametric rotation gates. This combination of single- and multi-qubit gates is referred to as a parameterized layer in a QCNN and is designed to extract task-specific features. This parameterized layer is usually repeated multiple times to extend the feature space. If we denote all unitary operations in the entanglement module by $U_{\text{ent}}(\boldsymbol{\theta})$ for simplicity, the output quantum state will be
$$|\psi_{\text{out}}\rangle = U_{\text{ent}}(\boldsymbol{\theta})\, |\psi_{\mathbf{x}}\rangle \tag{8}$$

DECODING MODULE. At this stage, a certain local observable $A$ acting on $m$ qubits (e.g. the Pauli-Z operator $\sigma_z$) is measured in the final quantum state from the entanglement module, where $m$ is equal to or smaller than the total number of qubits in the quantum system. The expectation value of the chosen observable can be obtained by repeated measurements:
$$\langle A \rangle = \langle \psi_{\text{out}} |\, A \,| \psi_{\text{out}} \rangle \tag{9}$$
So the purpose of this layer is to extract a classical output vector $\mathbf{y}$ by using the mapping from the quantum state to a classical vector:
$$f : |\psi_{\text{out}}\rangle \mapsto \mathbf{y} \tag{10}$$
This classical vector can be used as the input features for the subsequent layer in the QCNN.
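The three modules can be traced by hand on a two-qubit toy circuit. The sketch below is a plain-Python state-vector calculation, not the circuit used in the paper: it assumes RY angle encoding on each qubit, a single fixed CNOT as a stand-in for the entanglement module, and Pauli-Z expectation values for decoding. All function names are hypothetical.

```python
import math


def ry_state(angle):
    """Real amplitudes of RY(angle)|0> = cos(angle/2)|0> + sin(angle/2)|1>."""
    return [math.cos(angle / 2), math.sin(angle / 2)]


def encode_two_qubits(x0, x1):
    """Encoding module: one variable per qubit, amplitudes indexed by 2*q0 + q1."""
    a, b = ry_state(x0), ry_state(x1)
    return [a[q0] * b[q1] for q0 in (0, 1) for q1 in (0, 1)]


def apply_cnot(state):
    """Entanglement module (toy): CNOT with qubit 0 as control, qubit 1 as target."""
    swapped = list(state)
    swapped[2], swapped[3] = swapped[3], swapped[2]  # |10> <-> |11>
    return swapped


def expval_z(state, qubit):
    """Decoding module: exact expectation value of Pauli-Z on one qubit."""
    total = 0.0
    for index, amplitude in enumerate(state):
        bit = (index >> (1 - qubit)) & 1
        total += (1 - 2 * bit) * amplitude * amplitude
    return total


x0, x1 = 0.3, 1.1  # two classical features (arbitrary example values)
state = apply_cnot(encode_two_qubits(x0, x1))
outputs = [expval_z(state, 0), expval_z(state, 1)]
print(outputs)
```

For this toy circuit the second output equals cos(x0) * cos(x1), so the measured value on one qubit already depends on both input features, which is the kind of correlation the entanglement module is meant to introduce. On hardware the expectation values would be estimated from repeated measurements rather than computed exactly.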
2.2 QDCNN
The proposed QDCNN is designed in the same fashion as the QCNNs described in the literature [liu2019hybrid, henderson2020quanvolutional]. Our model integrates quantum layers with classical layers, and the quantum circuit ansatz can be placed anywhere in the model (e.g. at the beginning of the network or at intermediate layers).
The key difference between our method and existing QCNNs is that dilated convolution is employed for the quantum convolutional layer. The quantum layer in QDCNNs is therefore called the quantum dilated convolutional (QDC) layer. An example of a QDC layer is illustrated in Fig. 1. Due to the mechanism of dilated convolution, the quantum kernel in our model generally covers larger image patches (i.e. receptive fields). For example, with the $2 \times 2$ kernel used in our experiments, a quantum dilated convolution with a dilation rate of 3 has a receptive field of $4 \times 4$, while the standard quantum convolution with the same kernel size has a receptive field of only $2 \times 2$. It is noteworthy that even though the quantum dilated convolution expands the receptive field, the number of data points fed into the quantum convolution circuit is the same as for the standard quantum convolution. This means that the quantum dilated convolution does not require more qubits than the standard quantum convolution with the same kernel size.
Our QDCNN model has two main advantages. Firstly, thanks to the enlarged receptive field, the quantum kernel of the QDC layer slides across the image fewer times than in existing QCNN models (if there is no padding and the stride is the same). This can be understood by comparing Eqs. (2) and (3) with Eqs. (5) and (6) respectively. Therefore, using the QDC layer helps reduce the number of quantum circuit executions during the quantum convolution process. In the NISQ era, long training time is one of the biggest challenges facing QCNN models. This difficulty mainly stems from the large number of quantum circuit executions in the quantum layers. In the quantum feature mapping process, due to the probabilistic nature of quantum measurement, each measurement is usually repeated multiple times (e.g. 1024) to obtain the expectation values of some observables, which can be considered as the extracted quantum feature maps. Reducing the number of quantum circuit executions therefore plays a crucial role in mitigating the long-running-time problem of QCNNs. Our proposed quantum dilation is a powerful tool to explicitly control the number of quantum circuit executions in the quantum layer.
The second advantage of our model is that it can improve the performance (e.g. classification accuracy) of existing QCNN models. Due to the expanded receptive field, the QDC layer in our model generally reduces the spatial resolution of the resulting feature maps. However, these feature maps are extracted from larger receptive fields of the image and hence contain long-range context, which plays an essential role in many machine learning tasks such as image recognition and image segmentation.
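The effect of dilation on the number of circuit executions can be made concrete by enumerating the patches that a quantum kernel would receive. The sketch below is an illustration under stated assumptions: a 28-by-28 input, a 2-by-2 kernel (four pixels for four qubits), stride 2 and no padding, with a hypothetical helper `dilated_patches` that only collects pixel values and runs no quantum circuit.

```python
def dilated_patches(image, kernel=2, dilation=1, stride=2):
    """Collect the pixel groups fed to the quantum kernel, one list per position."""
    span = dilation * (kernel - 1) + 1  # side length of the receptive field
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - span + 1, stride):
        for left in range(0, width - span + 1, stride):
            patches.append([image[top + dilation * m][left + dilation * n]
                            for m in range(kernel) for n in range(kernel)])
    return patches


# Dummy 28x28 image whose entries are their own flat index.
image = [[28 * row + col for col in range(28)] for row in range(28)]

# One quantum circuit is executed per patch.
counts = {r: len(dilated_patches(image, dilation=r)) for r in (1, 2, 3)}
first_patch_r3 = dilated_patches(image, dilation=3)[0]
print(counts)          # {1: 196, 2: 169, 3: 169}
print(first_patch_r3)  # [0, 3, 84, 87]
```

Every patch still contains exactly four values, so the qubit count is unchanged, while the number of kernel positions per image drops from 196 to 169 for either dilation rate under these settings. The first patch for a dilation rate of 3 shows that the sampled pixels are three positions apart rather than adjacent.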
3 Experiments
In this section, we conduct two experiments to evaluate the performance of our proposed QDCNN model and compare it with the existing QCNN model. In Experiment A and Experiment B, we construct quantum convolutional models with non-trainable and trainable quantum filters, respectively.
3.1 Experiment settings
3.1.1 Dataset
We choose the MNIST and Fashion-MNIST image benchmark datasets [deng2012mnist, xiao2017fashion] for our experiments. The MNIST dataset contains 10 classes of handwritten digits from '0' to '9', while the Fashion-MNIST dataset is a collection of 10 classes of clothing items such as t-shirts, dresses and shoes. Both datasets have 60,000 training samples and 10,000 test samples of 28-by-28 grayscale images. Because training and validation are expensive, we pick a subset of each of the MNIST and Fashion-MNIST datasets, each consisting of 1,000 balanced training samples and 200 balanced test samples.
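One way to draw such balanced subsets is to sample the same number of indices from every class, for example 100 per class for 1,000 training samples and 20 per class for 200 test samples. The text does not say how the subsets were actually selected, so the sketch below is only one plausible procedure; the helper `balanced_subset`, the fixed seed and the toy label list are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def balanced_subset(labels, per_class, seed=0):
    """Return sorted sample indices with `per_class` examples for every label."""
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = []
    for label in sorted(by_label):
        chosen.extend(rng.sample(by_label[label], per_class))
    return sorted(chosen)


# Toy labels standing in for the real dataset labels (10 classes).
toy_labels = [i % 10 for i in range(5000)]
train_idx = balanced_subset(toy_labels, per_class=100)
class_counts = Counter(toy_labels[i] for i in train_idx)
print(len(train_idx), sorted(class_counts.values()))
```

The returned indices can then be used to slice the image and label arrays of whichever data loader is in use.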
3.1.2 Tested Models
In this research, we consider two types of models:

QDCNN Model. We employ the architecture of the most basic convolution-inspired hybrid quantum-classical neural network. Our QDCNN model consists of one QDC layer with one filter and one fully-connected layer with 10 neurons. The kernel size and stride for the QDC layer are set to $2 \times 2$ and 2, respectively, unless otherwise specified. The quantum circuit ansatz of the QDC layer is designed as follows. The one variable/qubit encoding scheme is adopted to encode the input image. Specifically, $2 \times 2$ pixels are encoded into a 4-qubit state using RY rotation gates. Note that these pixels are not adjacent to each other in the input image, due to the quantum dilated convolution. The resulting 4-qubit state is further transformed by a subsequent random parameterized quantum circuit, which may create entanglement. The decoding method follows the same spirit as [PennylaneQCNN], in which each expectation value is mapped to a different channel of a single output pixel. Consequently, even though there is only one filter, the quantum layer can transform the input 2D image into four feature maps. This type of quantum layer might benefit the model performance as it allows for correlation among channels of the output feature maps. For both the MNIST and Fashion-MNIST datasets, the QDC layer extracts from the $28 \times 28$ input image a feature tensor of size $13 \times 13 \times 4$, which is then transformed to 10 output probabilities by the fully-connected layer with softmax activation. To evaluate how the dilation rate impacts the model performance, we consider two QDCNN models with dilation rates $r = 2$ and $r = 3$. We refer to these two models as QDCNN_r2 and QDCNN_r3 respectively for the rest of the paper.

QCNN Model. We choose the standard QCNN model as our benchmark model. The QCNN model follows the same structure as our QDCNN model, with the only difference that it uses a standard quantum kernel rather than a dilated quantum kernel.
The random quantum circuit in each of these models consists of two 4-qubit random layers, each of which has four non-trainable or trainable parameters. For a fair comparison, all of these random circuits share the same architecture generated by the same random seed.
3.1.3 Training Setup
In Experiment A, after applying the non-trainable quantum filter to transform the original image data into feature maps, we use a mini-batch size of 32 and the Adam optimizer with a learning rate of 0.01 to train each model for 30 epochs. In Experiment B, due to the computational cost of training the parametric quantum circuits involved in the trainable quantum filter, we reduce the batch size to four and train all models for 20 epochs, with the other hyperparameters unchanged.
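To give a rough sense of why trainable quantum filters are costly when gradients are obtained with the parameter-shift rule, the sketch below applies the standard two-term rule to a toy expectation function with eight parameters, matching the eight trainable parameters of the random circuit. The function `expectation` is a hypothetical stand-in (a product of cos terms, as produced by independent RY rotations measured in Pauli-Z), not the circuit used in the experiments.

```python
import math


def expectation(params):
    """Toy stand-in for one circuit execution: prod_i cos(theta_i)."""
    value = 1.0
    for theta in params:
        value *= math.cos(theta)
    return value


def parameter_shift_gradient(f, params):
    """Two-term parameter-shift rule: two extra evaluations per parameter."""
    grads, evaluations = [], 0
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        grads.append((f(plus) - f(minus)) / 2)
        evaluations += 2
    return grads, evaluations


params = [0.1 * (k + 1) for k in range(8)]  # eight trainable angles (example values)
grads, evaluations = parameter_shift_gradient(expectation, params)
print(evaluations)  # 16
```

Under this rule, each of the eight parameters costs two additional evaluations, so one gradient needs 16 circuit evaluations for every kernel position of every training image. This multiplicative overhead is the reason simulators with backpropagation support are attractive for training.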
3.1.4 Experimental Environment
Experiments are conducted on a local computer with a 6-core CPU (2.2 GHz) using PennyLane [bergholm2018pennylane], Qulacs [suzuki2011qulacs] and PyTorch [paszke2019pytorch]. PennyLane is an open-source Python-based framework that enables automatic differentiation for hybrid quantum-classical computations. It is compatible with mainstream machine learning frameworks such as TensorFlow [abadi2016tensorflow] and PyTorch, and it has a large plugin ecosystem which offers access to numerous quantum devices (i.e. simulators and hardware) from different vendors including IBM, Google, Microsoft, Rigetti and QunaSys. In Experiment A, we perform the quantum processing of the original image data using the Qulacs simulator [Qulacs], a high-performance C++ quantum simulator made available through the community-contributed PennyLane-Qulacs plugin [PennylaneQulacs]. In Experiment B, considering the large number of quantum circuit executions required by the parameter-shift rule, we instead train all hybrid models using the built-in PennyLane simulator default.qubit, which supports the backpropagation method with the PyTorch interface.
3.2 Results
As demonstrated in Table I and Table II, our proposed QDCNN models exhibit significant performance benefits over the QCNN model in terms of runtime efficiency. More specifically, QDCNN models in Experiments A and B speed up the model training process by up to 15.24% and 18.42% respectively, compared with the QCNN model (QDCNN_r2 and QDCNN_r3 have similar training times because their QDC layers output feature maps of the same size in both experiments). These speedups result from the reduced number of quantum circuit executions, as discussed in Section 2.2. Take Experiment A as an example. In this experiment, $13 \times 13 = 169$ quantum circuits need to be executed per image for the QDC layers with dilation rate $r = 2$ and with dilation rate $r = 3$, while $14 \times 14 = 196$ quantum circuits are needed for the standard quantum convolutional layer. This means that QDCNN_r2 and QDCNN_r3 require 27 fewer quantum circuit executions per image than QCNN models. Compared with Experiment A, it generally takes much longer to train the hybrid models in Experiment B, even though the quantum filter in each of them has only eight trainable parameters coming from the random circuit. This is mainly due to the fact that PennyLane does not support vectorization for quantum circuit executions. Nevertheless, quantum dilated convolution still helps reduce the training time significantly in this case.
Furthermore, it can also be seen from Table I and Table II that our QDCNN models generally achieve higher recognition accuracy than the QCNN model. In particular, QDCNN_r3 achieves the best performance with regard to both validation loss and accuracy across all tasks. With its QDC layer of dilation rate 3, QDCNN_r3 provides up to 31.74% lower validation loss and up to 3% higher validation accuracy than the QCNN model. This performance boost mainly stems from the larger-scale contextual information captured by the QDC layer.
Table I: Results of Experiment A (non-trainable quantum filters).
Dataset        Method     Test acc   Test loss   Running time
MNIST          QCNN       88.00%     0.4389      624.648 s
MNIST          QDCNN_r2   88.50%     0.3858      529.472 s
MNIST          QDCNN_r3   91.00%     0.3466      530.270 s
Fashion-MNIST  QCNN       78.50%     0.6319      610.978 s
Fashion-MNIST  QDCNN_r2   80.00%     0.6354      526.060 s
Fashion-MNIST  QDCNN_r3   81.00%     0.6031      525.182 s
Table II: Results of Experiment B (trainable quantum filters).
Dataset        Method     Test acc   Test loss   Running time
MNIST          QCNN       86.50%     0.5341      7.420 h
MNIST          QDCNN_r2   86.50%     0.5315      6.436 h
MNIST          QDCNN_r3   89.50%     0.3646      6.053 h
Fashion-MNIST  QCNN       79.50%     0.8923      7.443 h
Fashion-MNIST  QDCNN_r2   77.00%     1.1038      6.222 h
Fashion-MNIST  QDCNN_r3   80.50%     0.8837      6.372 h
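The headline percentages quoted in the results can be re-derived from the table entries. The short check below uses the MNIST rows only and a hypothetical helper `relative_reduction`; it is a consistency check on the reported numbers, not an additional experiment.

```python
def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


# MNIST running times: QCNN vs. fastest QDCNN variant in each experiment.
speedup_a = relative_reduction(624.648, 529.472)  # Experiment A, seconds
speedup_b = relative_reduction(7.420, 6.053)      # Experiment B, hours

# MNIST loss in Experiment B: QCNN vs. QDCNN_r3.
loss_drop = relative_reduction(0.5341, 0.3646)

print(round(speedup_a, 2), round(speedup_b, 2), round(loss_drop, 2))
# 15.24 18.42 31.74
```

These values match the "up to" figures in the text, which therefore correspond to the MNIST results; the Fashion-MNIST rows give smaller reductions.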
4 Conclusion
In this work, we propose the QDCNN model, which adapts the idea of dilated convolution in deep learning to quantum neural networks. We show through empirical evidence that the QDCNN model outperforms the recent QCNN method in terms of computation time and recognition accuracy. In particular, we find that quantum dilated convolution with a larger dilation rate generally contributes to better model performance. Dilated convolution has been extensively studied in the area of deep learning, but little work has been done to explore it in the context of quantum machine learning. Our work constitutes a first step in this direction. Given the promising results on both the MNIST and Fashion-MNIST datasets, our QDCNN approach deserves further investigation in the future.