Transfer learning in hybrid classical-quantum neural networks

by Andrea Mari, et al.

We extend the concept of transfer learning, widely applied in modern machine learning algorithms, to the emerging context of hybrid neural networks composed of classical and quantum elements. We propose different implementations of hybrid transfer learning, but we focus mainly on the paradigm in which a pre-trained classical network is modified and augmented by a final variational quantum circuit. This approach is particularly attractive in the current era of intermediate-scale quantum technology since it allows one to optimally pre-process high-dimensional data (e.g., images) with any state-of-the-art classical network and to embed a select set of highly informative features into a quantum processor. We present several proof-of-concept examples of the convenient application of quantum transfer learning for image recognition and quantum state classification. We use the cross-platform software library PennyLane to experimentally test a high-resolution image classifier with two different quantum computers, respectively provided by IBM and Rigetti.







I Introduction

Transfer learning is a typical example of an artificial intelligence technique that was originally inspired by biological intelligence. It originates from the simple observation that the knowledge acquired in a specific context can be transferred to a different area. For example, when we learn a second language we do not start from scratch, but we make use of our previous linguistic knowledge. Sometimes transfer learning is the only way to approach complex cognitive tasks, e.g., before learning quantum mechanics it is advisable to first study linear algebra. This general idea has also been successfully applied to the design of artificial neural networks Pratt (1993); Pan and Yang (2009); Torrey and Shavlik (2010). It has been shown Raina et al. (2007); Yosinski et al. (2014) that in many situations, instead of training a full network from scratch, it is more efficient to start from a pre-trained deep network and then optimize only some of the final layers for a particular task and dataset of interest (see Fig. 1).

The aim of this work is to investigate the potential of the transfer learning paradigm in the context of quantum machine learning Biamonte et al. (2017); Schuld et al. (2015); Dunjko et al. (2016). We focus on hybrid models Farhi and Neven (2018); Schuld and Killoran (2019); McClean et al. (2016), i.e., the scenario in which quantum variational circuits Peruzzo et al. (2014); Schuld et al. (2018); Perdomo-Ortiz et al. (2018); McClean et al. (2016); Sim et al. (2019); Killoran et al. (2018) and classical neural networks can be jointly trained to accomplish hard computational tasks. In this setting, in addition to the standard classical-to-classical (CC) transfer learning strategy in which some pre-acquired knowledge is transferred between classical networks, three new variants of transfer learning naturally emerge: classical to quantum (CQ), quantum to classical (QC) and quantum to quantum (QQ).

In the current era of Noisy Intermediate-Scale Quantum (NISQ) devices Preskill (2018), CQ transfer learning is particularly appealing since it makes it possible to classically pre-process large input samples (e.g., high-resolution images) with any state-of-the-art deep neural network and then to manipulate a few highly informative features with a variational quantum circuit. This scheme is quite convenient since it makes use of the power of quantum computers, combined with the successful and well-tested methods of classical machine learning. On the other hand, QC and QQ transfer learning might also be very interesting approaches, especially once large quantum computers become available. In this case, fixed quantum circuits might be pre-trained as generic quantum feature extractors, mimicking well-known classical models which are often used as pre-trained blocks: e.g., AlexNet Krizhevsky et al. (2012), ResNet He et al. (2016), Inception Szegedy et al. (2015), VGGNet Simonyan and Zisserman (2014), etc. (for image processing), or ULMFiT Howard and Ruder (2018), Transformer Vaswani et al. (2017), BERT Devlin et al. (2018), etc. (for natural language processing). In summary, such classical state-of-the-art deep networks can either be used in CC and CQ transfer learning or replaced by quantum circuits in the QC and QQ variants of the same technique.

Figure 1: General representation of the transfer learning method, where each of the neural networks $A$ and $B$ can be either classical or quantum. Network $A$ is pre-trained on a generic dataset $D_A$ and for a generic task $T_A$. A reduced network $A'$, obtained by removing some of the final layers of $A$, is used as a fixed feature extractor. The second network $B$, usually much smaller than $A'$, is optimized on the specific dataset $D_B$ and for the specific task $T_B$.

Up to now, the transfer learning approach has been largely unexplored in the quantum domain with the exception of a few interesting applications, for example, in modeling many-body quantum systems Ch'ng et al. (2017); Huembeli et al. (2018); Zen et al. (2019), in the connection of a classical autoencoder to a quantum Boltzmann machine Piat et al. (2018) and in the initialization of variational quantum networks Verdon et al. (2019). With the present work we aim at developing a more general and systematic theory, specifically tailored to the emerging paradigms of variational quantum circuits and hybrid neural networks.

For all the models theoretically proposed in this work, proof-of-principle examples of practical implementations are presented and numerically simulated. Moreover we also experimentally tested one of our models on physical quantum processors—ibmqx4 by IBM and Aspen-4-4Q-A by Rigetti—demonstrating for the first time the successful classification of high resolution images with a quantum computer.

II Hybrid classical-quantum networks

Before presenting the main ideas of this work, we begin by reviewing basic concepts of hybrid networks and introduce some notation.

II.1 Classical neural networks

A very successful model in classical machine learning is that of deep feed-forward neural networks Goodfellow et al. (2016). The elementary block of a deep network is called a layer and maps input vectors of $n_0$ real elements to output vectors of $n_1$ real elements. Its typical structure consists of an affine operation followed by a non-linear function $\varphi$ applied element-wise,

$$L_{n_0 \to n_1}: \mathbf{x} \to \mathbf{y} = \varphi(W\mathbf{x} + \mathbf{b}). \qquad (1)$$

Here, the subscript $n_0 \to n_1$ indicates the number of input and output variables, $\mathbf{x}$ and $\mathbf{y}$ are the input and output vectors, $W$ is an $n_1 \times n_0$ matrix and $\mathbf{b}$ is a constant vector of $n_1$ elements. The elements of $W$ and $\mathbf{b}$ are arbitrary real parameters (respectively known as weights and biases) which are supposed to be trained, i.e., optimized for a particular task. The nonlinear function $\varphi$ is quite arbitrary but common choices are the hyperbolic tangent or the rectified linear unit defined as $\mathrm{ReLU}(x) = \max(0, x)$.

A classical deep neural network is the concatenation of many layers, in which the output of the first is the input of the second and so on:

$$C = L_{n_{d-1} \to n_d} \circ \cdots \circ L_{n_1 \to n_2} \circ L_{n_0 \to n_1}, \qquad (2)$$

where different layers have different weights. Characteristic hyper-parameters of a deep network are its depth $d$ (number of layers) and the number of features (number of variables) for each layer, i.e., the sequence of integers $n_0, n_1, \dots, n_{d-1}, n_d$.
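As a minimal sketch of the two definitions above, the following pure-Python snippet builds a layer as "affine map followed by an element-wise non-linearity" and chains layers so that each output feeds the next input. The helper names (`make_layer`, `compose`) and all numerical weights are hypothetical and chosen only for illustration.

```python
import math

def make_layer(W, b, phi=math.tanh):
    """Return a layer x -> phi(W x + b), with phi applied element-wise."""
    def layer(x):
        return [phi(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
                for row, b_i in zip(W, b)]
    return layer

def compose(*layers):
    """Chain layers: the output of each layer is the input of the next."""
    def network(x):
        for layer in layers:
            x = layer(x)
        return x
    return network

def relu(z):
    """Rectified linear unit, max(0, z)."""
    return max(0.0, z)

# Hypothetical weights for a depth-2 network with feature sizes 2 -> 3 -> 1.
layer_2_to_3 = make_layer(W=[[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]],
                          b=[0.0, 0.1, -0.1], phi=relu)
layer_3_to_1 = make_layer(W=[[1.0, 1.0, 1.0]], b=[0.0], phi=math.tanh)

network = compose(layer_2_to_3, layer_3_to_1)
output = network([1.0, 0.5])
```

The sequence of feature sizes (here 2, 3, 1) is fixed entirely by the shapes of the weight matrices, which is the freedom that a quantum network lacks.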

II.2 Variational quantum circuits

One of the possible quantum generalizations of feed-forward neural networks can be given in terms of variational quantum circuits Farhi and Neven (2018); Schuld and Killoran (2019); McClean et al. (2016); Peruzzo et al. (2014); Schuld et al. (2018); Perdomo-Ortiz et al. (2018); Sim et al. (2019); Killoran et al. (2018). Following the analogy with the classical case, one can define a quantum layer as a unitary operation which can be physically realized by a low-depth variational circuit acting on the input state $|x\rangle$ of $n_q$ quantum subsystems (e.g., qubits or continuous variable modes) and producing the output state $|y\rangle$:

$$\mathcal{L}: |x\rangle \to |y\rangle = U(\mathbf{w})|x\rangle, \qquad (3)$$

where $\mathbf{w}$ is an array of classical variational parameters. Examples of quantum layers could be: a sequence of single-qubit rotations followed by a fixed sequence of entangling gates Schuld et al. (2018); Sim et al. (2019) or, for the case of optical modes, some active and passive Gaussian operations followed by single-mode non-Gaussian gates Killoran et al. (2018). Notice that, differently from a classical layer, a quantum layer preserves the Hilbert-space dimension of the input states. This fact is due to the fundamental unitary nature of quantum mechanics and, as discussed at the end of this section, should be taken into account when designing quantum networks.

A variational quantum circuit of depth $q$ is a concatenation of many quantum layers, corresponding to the product of many unitaries parametrized by different weights:

$$Q = \mathcal{L}_q \circ \cdots \circ \mathcal{L}_2 \circ \mathcal{L}_1. \qquad (4)$$

In order to inject classical data in a quantum network we need to embed a real vector $\mathbf{x}$ into a quantum state $|x\rangle$. This can also be done by a variational embedding layer depending on $\mathbf{x}$ and applied to some reference state (e.g., the vacuum or ground state),

$$\mathcal{E}: \mathbf{x} \to |x\rangle = E(\mathbf{x})|0\rangle. \qquad (5)$$

Typical examples are single-qubit rotations or single-mode displacements parametrized by $\mathbf{x}$. Notice that, differently from $\mathcal{L}$, the embedding layer $\mathcal{E}$ is a map from a classical vector space to a quantum Hilbert space.

Conversely, the extraction of a classical output vector $\mathbf{y}$ from the quantum circuit can be obtained by measuring the expectation values of $n_q$ local observables $\hat{\mathbf{y}} = [\hat{y}_1, \hat{y}_2, \dots, \hat{y}_{n_q}]$. We can define this process as a measurement layer, mapping a quantum state to a classical vector:

$$\mathcal{M}: |x\rangle \to \mathbf{y} = \langle x|\hat{\mathbf{y}}|x\rangle. \qquad (6)$$

Globally, the full quantum network including the initial embedding layer and the final measurement can be written as

$$\mathcal{Q} = \mathcal{M} \circ Q \circ \mathcal{E}. \qquad (7)$$

The full network $\mathcal{Q}$ is a map from a classical vector space to a classical vector space depending on classical weights. Therefore, even though it may contain a quantum computation hidden in the quantum circuit, if considered from a global point of view, $\mathcal{Q}$ is simply a black-box analogous to the classical deep network defined in Eq. (2).
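To make the "embedding, variational circuit, measurement" chain concrete, here is a worked single-qubit example. It assumes, purely for illustration, an embedding by a rotation $R_y(x)$ of the reference state $|0\rangle$, a single variational layer $R_y(w)$, and a measurement of the Pauli $Z$ observable; these specific gate choices are assumptions of the sketch rather than a prescription.

```latex
\begin{aligned}
\text{embedding:}\quad & x \;\mapsto\; |x\rangle = R_y(x)|0\rangle
      = \cos\tfrac{x}{2}\,|0\rangle + \sin\tfrac{x}{2}\,|1\rangle, \\
\text{variational layer:}\quad & |x\rangle \;\mapsto\; R_y(w)|x\rangle
      = \cos\tfrac{x+w}{2}\,|0\rangle + \sin\tfrac{x+w}{2}\,|1\rangle, \\
\text{measurement:}\quad & y = \langle Z \rangle
      = \cos^2\tfrac{x+w}{2} - \sin^2\tfrac{x+w}{2} = \cos(x + w).
\end{aligned}
```

Seen from outside, the whole chain is just the classical function $x \mapsto \cos(x + w)$ with one trainable weight $w$, which is the black-box picture described above.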

However, especially when dealing with real NISQ devices, there are technical limitations and physical constraints which should be taken into account: while in the classical feed-forward network of Eq. (2) we have complete freedom in the choice of the number of features for each layer, in the quantum network of Eq. (7) all these numbers are often linked to the size of the physical system. For example, even if not strictly necessary, typical variational embedding layers encode each classical element of $\mathbf{x}$ into a single subsystem and so, in many practical situations, one has:

$$\#\,\text{inputs} = \#\,\text{subsystems} = \#\,\text{outputs}. \qquad (8)$$

This common constraint of a variational quantum network could be overcome by:

  1. adding ancillary subsystems and discarding/measuring some of them in the middle of the circuit;

  2. engineering more complex embedding and measuring layers;

  3. adding pre-processing and post-processing classical layers.

In this work, mainly because of its technical simplicity, we choose the third option and we formalize it through the notion of dressed quantum circuits introduced in the next subsection.

II.3 Dressed quantum circuits

In order to apply transfer learning at the classical-quantum interface, we need to connect classical neural networks to quantum variational circuits. Since in general the size of the classical and quantum networks can be very different, it is convenient to use a more flexible model of quantum circuits.

Let us consider the variational circuit defined in Eq. (7) and based on $n_q$ subsystems. With the aim of adding some basic pre-processing and post-processing of the input and output data we place a classical layer at the beginning and at the end of the quantum network, obtaining what we might call a dressed quantum circuit:

$$\tilde{\mathcal{Q}} = L_{n_q \to n_{\rm out}} \circ \mathcal{Q} \circ L_{n_{\rm in} \to n_q}, \qquad (9)$$

where $L_{n \to n'}$ is given in Eq. (1) and $\mathcal{Q}$ is the associated bare quantum circuit defined in Eq. (7). Differently from a complex hybrid network in which the computation is shared between cooperating classical and quantum processors, in this case the main computation is performed by the quantum circuit $\mathcal{Q}$, while the classical layers are mainly responsible for the data embedding and readout. A similar hybrid model was studied in Benedetti et al. (2018), but applied to a generative quantum Helmholtz machine.

We can say that from a hardware point of view a dressed quantum circuit is almost equivalent to a bare one. On the other hand, it has two important advantages:

  1. the two classical layers can be trained to optimally perform the embedding of the input data and the post-processing of the measurement results;

  2. the numbers of input and output variables are independent of the number of subsystems, allowing for flexible connections to other classical or quantum networks.

Even if our main motivation for introducing the notion of dressed quantum circuits is a smoother implementation of transfer learning schemes, this is also a quite powerful machine learning model in itself and constitutes a non-trivial contribution of this work. In the Examples section, a dressed quantum circuit is successfully applied to the classification of a non-linear benchmark dataset (2D spirals).
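The structure of a dressed quantum circuit can be sketched as a simple function composition. In the snippet below the bare quantum circuit is replaced by a deliberately trivial stand-in (`toy_bare_circuit`, a hypothetical helper): each qubit is prepared as $R_y(x_k)|0\rangle$, rotated by $R_y(w_k)$ and measured in $Z$, which gives $\cos(x_k + w_k)$ exactly because no entangling gates are used. All names, sizes and weights are assumptions made for illustration; a real bare circuit would be evaluated on a simulator or on hardware.

```python
import math

def linear(W, b, phi=lambda z: z):
    """Classical layer x -> phi(W x + b), with phi applied element-wise."""
    def layer(x):
        return [phi(sum(w * xj for w, xj in zip(row, x)) + bi)
                for row, bi in zip(W, b)]
    return layer

def toy_bare_circuit(weights):
    """Stand-in for a bare quantum circuit on len(weights) qubits.

    Without entanglement, RY(w_k) RY(x_k)|0> measured in Z gives
    cos(x_k + w_k), so no state-vector simulation is needed here.
    """
    def circuit(x):
        return [math.cos(xk + wk) for xk, wk in zip(x, weights)]
    return circuit

def dress(pre, bare, post):
    """Dressed circuit: classical layer -> bare circuit -> classical layer."""
    return lambda x: post(bare(pre(x)))

# Hypothetical sizes: 3 input features, 2 qubits, 1 output.
pre = linear([[0.2, -0.1, 0.4], [0.0, 0.3, -0.2]], [0.0, 0.1], phi=math.tanh)
bare = toy_bare_circuit([0.5, -0.5])
post = linear([[1.0, -1.0]], [0.0])

dressed = dress(pre, bare, post)
y = dressed([1.0, 2.0, 3.0])
```

The number of inputs (3) and outputs (1) is set by the two classical layers alone, while the quantum part only ever sees as many values as it has qubits (2).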

III Transfer learning

In this section we discuss the main topic of this work, i.e., the idea of transferring some pre-acquired “knowledge” between two networks, say from network $A$ to network $B$, where each of them could be either classical or quantum.

As discussed in the previous section, if considered as a black box, the global structure of a quantum variational circuit is similar to that of a classical network (see Eqs. (7), (9) and (2)). For this reason, we are going to define the transfer learning scheme in terms of two generic networks $A$ and $B$, independently from their classical or quantum physical nature.

Generic transfer learning scheme (see Fig. 1):

  1. Take a network $A$ that has been pre-trained on a dataset $D_A$ and for a given task $T_A$.

  2. Remove some of the final layers. In this way, the resulting truncated network $A'$ can be used as a feature extractor.

  3. Connect a new trainable network $B$ at the end of the pre-trained network $A'$.

  4. Keep the weights of $A'$ constant, and train the final block $B$ with a new dataset $D_B$ and/or for a new task of interest $T_B$.
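The four steps above can be sketched in a few lines of pure Python. Everything in this snippet is hypothetical: the "pre-trained" weights of the first network are made up, the new trainable block is a simple logistic regression, and the dataset is a four-point toy problem. The only point is to show that gradient updates touch the new block while the truncated feature extractor stays frozen.

```python
import copy
import math

def forward(layers, x):
    """Apply a list of (W, b) tanh layers in sequence."""
    for W, b in layers:
        x = [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
             for row, bi in zip(W, b)]
    return x

# Step 1: a network assumed to be pre-trained (weights are made up).
pretrained = [
    ([[1.5, -0.5], [-0.5, 1.5]], [0.0, 0.0]),  # 2 -> 2
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),   # 2 -> 2
    ([[0.3, 0.7]], [0.0]),                     # 2 -> 1, head for the old task
]

# Step 2: remove the final layer to obtain a fixed feature extractor.
extractor = pretrained[:-1]
frozen_copy = copy.deepcopy(extractor)

# Step 3: attach a new trainable block (logistic regression on the features).
head = {"w": [0.0, 0.0], "b": 0.0}

def predict(x):
    features = forward(extractor, x)
    z = sum(w * f for w, f in zip(head["w"], features)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z)), features

# New toy dataset: the label is 1 when the first coordinate is the larger one.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.8, 0.2], 1), ([0.1, 0.9], 0)]

def loss():
    total = 0.0
    for x, t in data:
        p, _ = predict(x)
        total -= math.log(p if t == 1 else 1.0 - p)
    return total / len(data)

# Step 4: stochastic gradient descent on the new block only.
initial_loss = loss()
for _ in range(200):
    for x, t in data:
        p, features = predict(x)
        for k in range(len(head["w"])):
            head["w"][k] -= 0.5 * (p - t) * features[k]
        head["b"] -= 0.5 * (p - t)
final_loss = loss()
```

Because the extractor is never written to, the cost of each training step is dominated by the small final block, which is the practical appeal of the scheme.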

Following the common convention used in classical machine learning Pratt (1993); Pan and Yang (2009); Torrey and Shavlik (2010); Raina et al. (2007); Yosinski et al. (2014), all situations in which there is a change of dataset and/or a change of the final task can be identified as transfer learning methods. The general intuition behind this training approach is that, even if $A$ has been optimized for a specific problem, it can still act as a convenient feature extractor also for a different problem. This trick is improved by truncating the final layers of $A$ (step 2), since the final activations of a network are usually more tuned to the specific problem, while intermediate features are more generic and so more suitable for transfer learning.

In our hybrid setting, the fact that the networks $A$ and $B$ can be either classical or quantum gives rise to a rich variety of hybrid transfer learning models summarized in Table 1.

| Network A | Network B | Transfer learning scheme |
|-----------|-----------|--------------------------|
| Classical | Classical | CC (Pratt (1993); Pan and Yang (2009); Torrey and Shavlik (2010); Raina et al. (2007); Yosinski et al. (2014)) |
| Classical | Quantum   | CQ (Examples 2 and 3) |
| Quantum   | Classical | QC (Example 4) |
| Quantum   | Quantum   | QQ (Example 5) |

Table 1: Transfer learning schemes in hybrid classical-quantum networks. The corresponding proof-of-principle examples are indicated in the table and presented in Section IV.

For the reader familiar with quantum communication theory, this kind of classification might look similar to that of hybrid channels in which information can be exchanged between quantum and classical systems. Here however there is a fundamental difference: what is actually transferred in this case is not raw information but some more structured and organized learned representations. We expect that the problem of transferring structured knowledge between systems governed by different physical laws (classical/quantum) could stimulate many interesting foundational and philosophical questions. The aim of the present work is however much more pragmatic and consists of studying practical applications of this idea.

III.1 Classical to quantum transfer learning

As discussed in the introduction, the CQ transfer learning approach is perhaps the most appealing one in the current technological era of NISQ devices. Indeed today we are in a situation in which intermediate-scale quantum computers are approaching the quantum supremacy milestone Harrow and Montanaro (2017); Arute et al. (2019) and, at the same time, we have at our disposal the very successful and well-tested tools of classical deep learning. The latter are widely recognized as among the best-performing machine learning algorithms, especially for image and text processing.

In this classical field, transfer learning is already a very common approach, thanks to the large zoo of pre-trained deep networks which are publicly available Canziani et al. (2016). CQ transfer learning consists of using exactly those classical pre-trained models as feature extractors and then post-processing such features on a quantum computer, for example by using them as input variables for the dressed quantum circuit model introduced in Eq. (9). This hybrid approach is very convenient for processing high-resolution images since, in this configuration, a quantum computer is applied only to a fairly limited number of abstract features, which is much more feasible compared to embedding millions of raw pixels in a quantum system. We note that alternative approaches for dealing with large images have also been proposed recently Piat et al. (2018); Henderson et al. (2019); Shiba et al. (2019); Liu et al. (2019).

We applied our model for the task of image classification in several numerical examples and we also tested the algorithm with two real quantum computers provided by IBM and Rigetti. All the details about the technical implementation and the associated results are reported in the next Section, Examples 2 and 3.

III.2 Quantum to classical transfer learning

By switching the roles of the classical and quantum networks, one can also obtain the QC variant of transfer learning. In this case a pre-trained quantum system behaves as a kind of feature extractor, i.e., a device performing a (potentially classically intractable) computation resulting in an output vector of numerical values associated to the input. As a second step, a classical network is used to further process the extracted features for the specific problem of interest. This scheme can be very useful in two important situations: (i) if the dataset consists of quantum states (e.g., in a state classification problem), and (ii) if we have at our disposal a very good quantum computer which outperforms current classical feature extractors at some task.

For case (i), one can imagine a situation in which a single instance of a variational quantum circuit is first pre-trained and then used as a kind of multipurpose measurement device. Indeed one could make many different experimental analyses by simply letting input quantum systems pass through the same fixed circuit and applying different classical machine learning algorithms to the associated measured variables.

For case (ii) instead, one can envisage a multi-party scenario in which many classical clients $B$ can independently send samples of their specific datasets to a common quantum server $A$ which is pre-trained to extract generic features by performing a fixed quantum computation. Server $A$ can send back the resulting features to the classical clients $B$, which can now locally train their specific machine learning models on pre-processed data.

Given the current status of quantum technology, case (ii) is likely beyond a near-term implementation. On the other hand, case (i) could already represent a realistic scenario with current technology.

In Example 4 of the next Section, we present a proof-of-concept example in which a pre-trained quantum network introduced in Ref. Killoran et al. (2018) is combined with a classical post-processing network for solving a quantum state classification problem.

III.3 Quantum to quantum transfer learning

The last possibility is the QQ transfer learning scheme, where the same technique is applied in a fully quantum mechanical fashion. In this case a quantum network is pre-trained for a generic task and dataset. Successively, some of the final quantum layers are removed, and replaced by a trainable quantum network which will be optimized for a specific problem. The main difference from the previous cases is that, since the process is fully quantum without intermediate measurements, features are implicitly transferred in the form of a quantum state, allowing for coherent superpositions.

The main motivation for applying a QQ transfer learning scheme is to reduce the total training time: instead of training a large variational quantum circuit, it is more efficient to initialize it with some pre-trained weights and then optimize only a couple of final layers. From a physical point of view, such optimization of the final layers could be interpreted as a change of the measurement basis which is tuned to the specific problem of interest.

If compared with classical computers, current NISQ devices are not only noisy and small: they are also relatively slow. Training a quantum circuit might take a long time since it requires taking many measurement shots (i.e., performing a large number of actual quantum experiments) for each optimization step (e.g., for computing the gradient). Therefore any approach which can reduce the total training time, as for example the QQ transfer learning scheme, could be very helpful.
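The cost of measurement shots can be illustrated with a small sampling sketch. It assumes a single qubit in the state $R_y(\theta)|0\rangle$, for which a $Z$ measurement returns $+1$ with probability $\cos^2(\theta/2)$ and the exact expectation value is $\cos\theta$; the function name `estimate_z` and the chosen numbers are hypothetical. The statistical error of the estimate shrinks only as the inverse square root of the number of shots, which is why every optimization step requires many repetitions of the experiment.

```python
import math
import random

def estimate_z(theta, shots, rng):
    """Estimate <Z> for the state RY(theta)|0> from a finite number of shots.

    Each shot yields +1 with probability cos^2(theta/2) and -1 otherwise,
    so the exact expectation value is cos(theta).
    """
    p_plus = math.cos(theta / 2) ** 2
    outcomes = (1 if rng.random() < p_plus else -1 for _ in range(shots))
    return sum(outcomes) / shots

rng = random.Random(0)  # fixed seed so that the sketch is reproducible
theta = 0.7
exact = math.cos(theta)
for shots in (10, 1000, 100000):
    estimate = estimate_z(theta, shots, rng)
    print(f"shots={shots:6d}  estimate={estimate:+.4f}  exact={exact:+.4f}")
```

A gradient evaluation needs several such estimates per trainable parameter, so reducing the number of trained parameters directly reduces the number of circuit executions.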

In Example 5 of the next Section, we trained a quantum state classifier by following a QQ transfer learning approach.

IV Examples

Example 1 - A 2D classifier based on a dressed quantum circuit

This first example demonstrates the dressed quantum circuit model introduced in Eq. (9).

Figure 2: Classification based on a classical network (left) and a dressed quantum circuit (right) of the same dataset consisting of two sets of points (blue and red) organized in two concentric spirals. Training points are pale-colored while test points are sharp-colored. The model decision function is evaluated for the whole 2D plane, determining the blue and red regions which should ideally match the color of the data points. The test accuracy of each model is reported in the bottom-right of the corresponding plot.

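We consider a typical benchmark dataset consisting of two classes of points (blue and red) organized in two concentric spirals as shown in Fig. 2. Each point is characterized by two real coordinates and we assume that we have at our disposal a quantum processor of 4 qubits. Since we have two real coordinates as input and two real variables as output (one-hot encoding the blue and red classes), we use the following model of a dressed quantum circuit:

$$\tilde{\mathcal{Q}} = L_{4 \to 2} \circ \mathcal{Q} \circ L_{2 \to 4}, \qquad (10)$$

where $L_{2 \to 4}$ represents a classical layer having the structure of Eq. (1) with $\varphi = \tanh$, $\mathcal{Q}$ is a (bare) variational quantum circuit, and $L_{4 \to 2}$ is a linear classical layer without activation, i.e., with $\varphi(\mathbf{y}) = \mathbf{y}$. The structure of the variational circuit is as in Eq. (7). The chosen embedding map prepares each qubit in a balanced superposition of $|0\rangle$ and $|1\rangle$ and then performs a rotation around the $y$ axis of the Bloch sphere parametrized by a classical vector $\mathbf{x}$:

$$\mathcal{E}(\mathbf{x}) = \bigotimes_{k=1}^{4} \left( R_y(x_k \pi/2)\, H \right) |0000\rangle, \qquad (11)$$

where $H$ is the single-qubit Hadamard gate. The trainable circuit is composed of 5 variational layers $Q = \mathcal{L}_5 \circ \mathcal{L}_4 \circ \cdots \circ \mathcal{L}_1$, where

$$\mathcal{L}(\mathbf{w}): |x\rangle \to |y\rangle = K \bigotimes_{k=1}^{4} R_y(w_k)\, |x\rangle, \qquad (12)$$
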
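and $K$ is an entangling unitary operation made of three controlled NOT gates:

$$K = (\mathrm{CNOT} \otimes \mathbb{I}_{3,4})\,(\mathbb{I}_{1,2} \otimes \mathrm{CNOT})\,(\mathbb{I}_{1} \otimes \mathrm{CNOT} \otimes \mathbb{I}_{4}). \qquad (13)$$
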
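Finally, the measurement layer is simply given by the expectation value of the $Z = \mathrm{diag}(1, -1)$ Pauli matrix, locally estimated for each qubit:

$$\mathcal{M}: |y\rangle \to \mathbf{y} = \left( \langle y|Z_1|y\rangle, \langle y|Z_2|y\rangle, \langle y|Z_3|y\rangle, \langle y|Z_4|y\rangle \right). \qquad (14)$$

Given an input point of coordinates $\mathbf{x} = (x_1, x_2)$, the classification is done according to $\mathrm{argmax}(\mathbf{y})$, where $\mathbf{y} = (y_1, y_2)$ is the output of the dressed quantum circuit (10).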
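For readers who want to experiment without a quantum software stack, the bare 4-qubit circuit described above can be sketched with a small state-vector simulation in pure Python. This is not the PennyLane implementation used for the reported results: the function names are hypothetical, the input rotation angle is assumed to be $x_k \pi/2$, and the order in which the three CNOT gates are applied is an assumption of this sketch. Since Hadamard, $R_y$ and CNOT are all real matrices, real amplitudes are sufficient.

```python
import math

N_QUBITS = 4
HADAMARD = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
            [1 / math.sqrt(2), -1 / math.sqrt(2)]]

def ry(theta):
    """Real 2x2 matrix of a rotation by theta around the y axis."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def apply_1q(state, gate, k):
    """Apply a 2x2 gate to qubit k (qubit 0 is the leftmost bit)."""
    bit = 1 << (N_QUBITS - 1 - k)
    new = state[:]
    for i in range(len(state)):
        if not i & bit:
            a0, a1 = state[i], state[i | bit]
            new[i] = gate[0][0] * a0 + gate[0][1] * a1
            new[i | bit] = gate[1][0] * a0 + gate[1][1] * a1
    return new

def apply_cnot(state, control, target):
    """Flip the target qubit on every basis state whose control bit is 1."""
    cbit = 1 << (N_QUBITS - 1 - control)
    tbit = 1 << (N_QUBITS - 1 - target)
    return [state[i ^ tbit] if i & cbit else state[i]
            for i in range(len(state))]

def bare_circuit(x, weights):
    """Map 4 real inputs to 4 expectation values <Z_k>.

    `weights` is a list of layers, each holding 4 rotation angles.
    """
    state = [0.0] * (2 ** N_QUBITS)
    state[0] = 1.0  # reference state |0000>
    for k in range(N_QUBITS):  # embedding: Hadamard, then RY(x_k * pi / 2)
        state = apply_1q(state, HADAMARD, k)
        state = apply_1q(state, ry(x[k] * math.pi / 2), k)
    for layer in weights:  # variational layers: rotations, then entangler
        for k in range(N_QUBITS):
            state = apply_1q(state, ry(layer[k]), k)
        for control, target in ((1, 2), (2, 3), (0, 1)):  # assumed ordering
            state = apply_cnot(state, control, target)
    expectations = []
    for k in range(N_QUBITS):  # measurement: <Z> on each qubit
        bit = 1 << (N_QUBITS - 1 - k)
        expectations.append(sum((-1 if i & bit else 1) * a * a
                                for i, a in enumerate(state)))
    return expectations

# Hypothetical weights for a depth-5 circuit and an arbitrary input point.
weights = [[0.1 * (layer + k) for k in range(N_QUBITS)] for layer in range(5)]
y = bare_circuit([0.3, -0.2, 0.5, 0.0], weights)
```

Wrapping `bare_circuit` between a 2-to-4 layer and a 4-to-2 layer gives the dressed model of this example; the state vector has only 16 entries, so the simulation is instantaneous.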

For training and testing the model, the dataset has been divided into 2000 training points (pale-colored in Fig. 2) and 200 test points (sharp-colored in Fig. 2). As is typical in classification problems, the cross entropy (implicitly preceded by a LogSoftMax layer) was used as a loss function and minimized via the Adam optimizer Kingma and Ba (2014). A total of 1000 training iterations were performed, each with a batch size of 10 input samples. The numerical simulation was done through the PennyLane software platform Bergholm et al. (2018).
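The loss mentioned above, a cross entropy implicitly preceded by a LogSoftMax layer, can be written out explicitly. The snippet below is a plain-Python sketch with hypothetical function names; in practice a deep learning framework would provide both the loss and the Adam optimizer, which is not reproduced here.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax of a list of raw output scores."""
    shift = max(scores)
    log_norm = shift + math.log(sum(math.exp(s - shift) for s in scores))
    return [s - log_norm for s in scores]

def cross_entropy(scores, target):
    """Negative log-probability assigned to the correct class index."""
    return -log_softmax(scores)[target]

def batch_loss(batch_scores, batch_targets):
    """Average cross entropy over a batch of (scores, target) pairs."""
    losses = [cross_entropy(s, t) for s, t in zip(batch_scores, batch_targets)]
    return sum(losses) / len(losses)

# Two-class example: raw outputs of the model for a batch of two points.
example = batch_loss([[2.0, -1.0], [0.0, 0.0]], [0, 1])
```

With two output variables, an undecided model (equal scores) pays a loss of $\ln 2$ per sample, and the loss approaches zero as the score of the correct class dominates.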

The results are reported in Fig. 2, where the dressed quantum network is also compared with an entirely classical counterpart in which the quantum circuit is replaced by a classical layer, i.e., $C = L_{4 \to 2} \circ L_{4 \to 4} \circ L_{2 \to 4}$. The corresponding accuracy of each model, i.e., the fraction of test points correctly classified, is reported in the bottom-right corner of the corresponding plot in Fig. 2.

The presented results suggest that a dressed quantum circuit is a very flexible quantum machine learning model which is capable of classifying highly non-linear datasets. We would like to remark that the classical counterpart has been presented just as a qualitative benchmark: even if in this particular example the quantum model outperforms the classical one, any general and rigorous comparison would require a much more complex and detailed analysis which is beyond the aim of this work.

Example 2 - CQ transfer learning for image classification (ants / bees)

In this second example we apply the classical-to-quantum transfer learning scheme for solving an image classification problem. We first numerically trained and tested the model, using PennyLane with the PyTorch Paszke et al. (2017) interface. Successively, we have also run it on two real quantum devices provided by IBM and Rigetti. To our knowledge, this is the first time that high-resolution images have been successfully classified by a quantum computer.

Our example is a quantum model inspired by the official PyTorch tutorial on classical transfer learning [34]. The model can be defined in terms of the general CQ scheme proposed in Section III and represented in Fig. 1, with the following specific settings:


$D_A$ = ImageNet: a public image dataset with 1000 classes Deng et al. (2009).

$A$ = ResNet18: a pre-trained residual neural network introduced by Microsoft in 2016 He et al. (2016).

$T_A$ = Classification (1000 labels).

$A'$ = ResNet18 without the final linear layer, obtaining a pre-trained extractor of 512 features.

$D_B$ = Images of two classes: ants and bees (Hymenoptera subset of ImageNet), separated into a training set of 245 images and a testing set of 153 images.

$B$ = $\tilde{\mathcal{Q}} = L_{4 \to 2} \circ \mathcal{Q} \circ L_{512 \to 4}$, i.e., a 4-qubit dressed quantum circuit (9) with 512 input features and 2 real outputs.

$T_B$ = Classification (2 labels).

The bare variational circuit is essentially the same as the one used in the previous example (see Eqs. (11,12,13,14)), with the only difference that in this case the quantum depth is set to $q = 6$. The cross entropy is used as a loss function and minimized via the Adam optimizer Kingma and Ba (2014). We trained the variational parameters of the model for 30 epochs over the training dataset, with a batch size of 4 and an initial learning rate of 0.0004 (see Table 3), which was successively reduced by a constant factor every 10 epochs. After each epoch, the model was validated with respect to the test dataset, obtaining a maximum accuracy of 0.976. A visual representation of a random batch of images sampled from the test dataset and the corresponding predictions is given in Fig. 3.
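The step-wise learning rate decay described above can be sketched as follows. The function name `step_lr` is hypothetical, the decay factor `gamma` is left as a parameter, and its default value of 0.1 is an assumption of this sketch rather than a value taken from the text; the initial learning rate, the 10-epoch step and the 30 epochs are those listed for the ants/bees dataset in Table 3.

```python
def step_lr(initial_lr, epoch, step_size=10, gamma=0.1):
    """Learning rate at a given epoch under step-wise decay.

    The rate is multiplied by `gamma` once every `step_size` epochs.
    The default gamma=0.1 is an assumed placeholder value.
    """
    return initial_lr * gamma ** (epoch // step_size)

# Schedule over 30 epochs starting from a learning rate of 0.0004.
schedule = [step_lr(0.0004, epoch) for epoch in range(30)]
```

With these settings the schedule is piecewise constant, with one plateau for each block of 10 epochs.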

Figure 3: Random batch of 4 images sampled from the test dataset and classified by our classical-quantum model (numerically simulated). Predictions are reported in square brackets above each image.

We also tested the model (with the same pre-trained parameters), on two different real quantum computers: the ibmqx4 processor by IBM and the Aspen-4-4Q-A processor by Rigetti (see Fig. 4). The corresponding classification accuracies, evaluated on the same test dataset, are reported in Table 2.

QPU Accuracy

Table 2: Image classification accuracy associated to different quantum processing units (QPU): exact simulator, ibmqx4 (IBM) and Aspen-4-4Q-A (Rigetti).

Figure 4: Two high-resolution images sampled from the dataset and experimentally classified with two different quantum processors: ibmqx4 by IBM (first line) and Aspen-4-4Q-A by Rigetti (second line). In both cases the same pre-trained classical network (ResNet18 by Microsoft He et al. (2016)) was used to pre-process the input image, extracting 512 highly informative features. The rest of the computation was performed by a trainable variational quantum circuit “dressed” by two classical encoding and decoding layers, as described in Eq. (9).

Our results demonstrate the promising potential of the CQ transfer learning scheme applied to current NISQ devices, especially in the context of high-resolution image processing.

Example 3 - CQ transfer learning for image classification (CIFAR)

We now apply the same CQ transfer learning scheme of the previous example but with a different dataset $D_B$. Instead of classifying images of ants and bees, we use the standard CIFAR-10 dataset Krizhevsky et al. (2009) restricted to the classes of cats and dogs. We then repeat the training and testing phases with the CIFAR-10 dataset restricted to the classes of planes and cars (see Fig. 5).

We remark that, in both cases, the feature extractor ResNet18 is pre-trained on ImageNet. Despite CIFAR-10 and ImageNet being quite different datasets (they also have very different resolutions), the CQ transfer learning method achieves nonetheless relatively good results.

|                  | ants/bees | dogs/cats | planes/cars |
|------------------|-----------|-----------|-------------|
| Quantum depth    | 6         | 5         | 4           |
| Number of epochs | 30        | 3         | 3           |
| Batch size       | 4         | 8         | 8           |
| Learning rate    | 0.0004    | 0.001     | 0.0007      |
| Accuracy         | 0.976     | 0.8270    | 0.9605      |

Table 3: Hyper-parameters used for the classification of different image datasets $D_B$. The last line reports the corresponding accuracy achieved by our model (numerically simulated).

The hyper-parameters used for all the different datasets (including the previous example) are summarized in Table 3. The corresponding test accuracies are also reported in Table 3, while some of the predictions for random samples of the restricted CIFAR datasets are visualized in Fig. 5.

(a)                                     (b)

Figure 5: Random batches of 4 images sampled from the CIFAR-10 test dataset restricted to the classes of cats and dogs (a), and to the classes of planes and cars (b). In both cases the binary classification problem is solved by our hybrid classical-quantum model (numerically simulated). Predictions are reported in square brackets above each image.

Example 4 - QC transfer learning for quantum state classification

Quantum-to-classical (QC) transfer learning consists of using a pre-trained quantum circuit as a feature extractor and post-processing its output variables with a classical neural network. In this case only the final classical part is trained for the specific problem of interest.

The starting point of our example is the pre-trained continuous-variable quantum network presented in Ref. Killoran et al. (2018), Section IV.D, Experiment C. The original aim of this network was to encode different images, representing the (L,O,T,I,S,J,Z) tetrominos (popularized by the video game Tetris [43]), in the Fock basis of two-mode quantum states. The expected input of the quantum network is one of the following combinations of two-mode coherent states:


where the parameter $\alpha$ is a fixed constant. In Ref. Killoran et al. (2018) the network was successfully trained to generate an optimal unitary operation $U$, such that the probability of finding $n_1$ photons in the first mode and $n_2$ photons in the second mode is proportional to the amplitude of the image pixel $(n_1, n_2)$. More precisely, the network was trained to reproduce the tetromino images after projecting the quantum state on the subspace of up to 3 photons (see Fig. 6).

Figure 6: Images of the 7 different tetrominos encoded in the photon number probability distribution of two optical modes, after projecting on the subspace of up to 3 photons. These images are extracted from Fig. 10 of Ref. Killoran et al. (2018).

For the purposes of our example, we now assume that the previous input states (15) are subject to random Gaussian displacements in phase space:

$$|\tilde{\psi}_j\rangle = \hat{D}(\delta_1, \delta_2)\,|\psi_j\rangle, \qquad (16)$$

where $|\psi_j\rangle$ is one of the input states (15), $\hat{D}(\delta_1, \delta_2)$ is a two-mode displacement operator Weedbrook et al. (2012), the values of the complex displacements $\delta_1$ and $\delta_2$ are sampled from a symmetric Gaussian distribution with zero mean and a given quadrature variance, and $j$ is the label associated to the input states (15). The noise is similar to a Gaussian additive channel Weedbrook et al. (2012); however, to simplify the numerical simulation, here we assume that the unknown displacements remain constant during the estimation of expectation values. Physically, this situation might represent a slow phase-space drift of the input light mode.
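The noise model can be illustrated with a short sketch. It relies on the standard fact that displacing a coherent state produces, up to a global phase, another coherent state with a shifted amplitude, so the noisy inputs can be described by their amplitudes alone. The function names, the example amplitudes and the choice of drawing the real and imaginary parts of each displacement with the same standard deviation are assumptions made for illustration; the exact relation between this standard deviation and the quadrature variance depends on the convention adopted for $\hbar$.

```python
import math
import random

def sample_displacement(variance, rng):
    """Draw one complex displacement with zero-mean Gaussian real and imaginary parts."""
    sigma = math.sqrt(variance)  # assumed: same spread for both components
    return complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

def displace_two_mode(amplitudes, variance, rng):
    """Shift the amplitudes of a two-mode coherent state by independent random displacements."""
    a1, a2 = amplitudes
    return (a1 + sample_displacement(variance, rng),
            a2 + sample_displacement(variance, rng))

def photon_number_distribution(alpha, cutoff):
    """Poisson photon-number probabilities of a coherent state, truncated at `cutoff`."""
    mean = abs(alpha) ** 2
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(cutoff)]

# Usage: one noisy copy of a hypothetical two-mode coherent state.
rng = random.Random(0)
clean_state = (1.0 + 0j, -1.0 + 0j)          # illustrative amplitudes only
noisy_state = displace_two_mode(clean_state, 0.6, rng)
```

Keeping the sampled displacement fixed while all expectation values of a given input are estimated corresponds to the slow-drift assumption described above.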

We also assume that, unlike the original image encoding problem studied in Ref. Killoran et al. (2018), our new task is to classify the noisy input states. In other words, the network should take the states defined in (16) as inputs, and should ideally produce the correct label as output. In order to tackle this problem, we apply a QC transfer learning approach: we pre-process our random input states with the quantum network of Ref. Killoran et al. (2018) and we consider the corresponding images as features, which we then post-process with a classical layer to predict the state label. In simple terms, the QC transfer learning method allows us to convert a quantum state classification problem into an image classification problem.

Also in this case we can summarize the transfer learning scheme according to the notation introduced in Section III and represented in Fig. 1:

$D_A$ = two-mode coherent states defined in Eq. (15).

$A$ = photonic neural network introduced in Ref. Killoran et al. (2018), consisting of an encoding layer, 25 variational layers, and a final (Fock) measurement layer.

$T_A$ = Fock basis encoding of tetromino images (see Fig. 6).

$A'$ = pre-trained network $A$, truncated up to a quantum depth of $q$ variational layers.

$D_B$ = same states of the original dataset $D_A$, but subject to random phase-space displacements as described in Eq. (16).

$B$ = a classical linear layer having the structure of Eq. (1), without activation.

Also in this case we used the Adam optimizer Kingma and Ba (2014) to minimize a cross-entropy loss function associated to our classification problem. For each optimization step we sampled independent random displacements with variance 0.6, which we applied to a batch of 7 states defined in Eq. (15). We optimized the model over 1000 training batches with a learning rate of 0.01, obtaining a classification accuracy of 0.803. The numerical simulation was performed with the Strawberry Fields software platform Killoran et al. (2019), combined with the TensorFlow Abadi et al. (2016) optimization back-end.
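The classical part of this training procedure can be sketched as follows: a single linear layer maps the extracted features to 7 logits and is trained by minimizing the cross-entropy loss. To keep the example self-contained we replace Adam with plain full-batch gradient descent and we use synthetic features instead of the output of the photonic network; both substitutions, as well as all function names, are assumptions made only for illustration.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(w, b, x):
    """Linear layer without activation: logits = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def train_linear_classifier(features, labels, n_classes, lr=0.5, steps=200, seed=0):
    """Minimize the mean cross-entropy with full-batch gradient descent (not Adam)."""
    rng = random.Random(seed)
    dim, n = len(features[0]), len(features)
    w = [[rng.gauss(0, 0.01) for _ in range(dim)] for _ in range(n_classes)]
    b = [0.0] * n_classes
    losses = []
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for x, y in zip(features, labels):
            p = softmax(forward(w, b, x))
            loss -= math.log(p[y])
            for c in range(n_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # d(loss)/d(logit_c)
                grad_b[c] += err
                for k in range(dim):
                    grad_w[c][k] += err * x[k]
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / n
            for k in range(dim):
                w[c][k] -= lr * grad_w[c][k] / n
        losses.append(loss / n)
    return w, b, losses

def predict(w, b, x):
    """Return the index of the largest logit."""
    logits = forward(w, b, x)
    return max(range(len(logits)), key=logits.__getitem__)

def make_toy_features(n_classes=7, dim=16, per_class=10, noise=0.05, seed=1):
    """Synthetic stand-in for the quantum features: noisy one-hot vectors."""
    rng = random.Random(seed)
    xs, ys = [], []
    for label in range(n_classes):
        for _ in range(per_class):
            x = [rng.gauss(0.0, noise) for _ in range(dim)]
            x[label] += 1.0
            xs.append(x)
            ys.append(label)
    return xs, ys
```

A typical call is `w, b, losses = train_linear_classifier(*make_toy_features(), n_classes=7)`. In the actual experiment the features would be the measured photon-number statistics of the truncated network, re-sampled with fresh random displacements at every optimization step.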

A summary of the hyper-parameters and of the corresponding accuracy is given in Table 4.

Hyper-parameter      QC Classifier
Quantum depth        15
Classical depth      1
Noise variance       0.6
Training batches     1000
Batch size           7
Learning rate        0.01
Fock-space cutoff    11
Accuracy             0.803

Table 4: Hyper-parameters used for our quantum state classifier based on the QC transfer learning scheme. The last line reports the corresponding accuracy achieved by the model, simulated on Strawberry Fields with a fixed cutoff in the Fock basis.

Finally, the predictions for a sample of 7 noisy states are graphically visualized in Fig. 7, where the features extracted by the pre-trained quantum network are represented as gray-scale images. The features of Fig. 7 are quite different from the original tetromino images shown in Fig. 6. This is due to the truncation of the network and to the presence of input noise. However, as long as the images of Fig. 7 are distinguishable, this is not a relevant issue, since the final classical layer is still able to correctly classify the input states with high accuracy.

Figure 7: Features of a batch of 7 noisy states extracted by the quantum network $A'$ and represented as gray-scale images. The associated classes and the predictions made by our quantum-classical model are reported above each image.

Figure 8: Accuracy of the hybrid QC classifier with respect to the quantum depth of the pre-trained network $A'$, evaluated for three different classical networks of depth 1, 2 and 3, respectively. The existence of an intermediate optimal value for the quantum depth is a characteristic signature of the transfer learning method.

We conclude this example with an analysis of the model performance with respect to the values of the quantum and classical depths. Since the original pre-trained network $A$ has 25 quantum layers, for the truncated network $A'$ we can choose a quantum depth within the interval 0-25. For the classical network $B$ we consider the cases of 1, 2 and 3 layers.

The results are shown in Fig. 8. By direct inspection we can see that increasing the classical depth is helpful, but the accuracy already saturates after two layers. On the other hand, it is evident that the quantum depth has an intermediate optimal value, beyond which the accuracy is reduced. This is a paradigmatic phenomenon well known in classical transfer learning: better features are usually extracted after removing some of the final layers of the pre-trained network. Notice that, because of the quantum nature of the system, the quantum state produced by the truncated variational circuit could be entangled and/or not aligned with the measurement basis. The numerical evidence that truncating a quantum network does not always reduce the quality of the measured features, and can actually be a convenient strategy for transfer learning, is therefore a notable result.
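The depth analysis amounts to a simple sweep: truncate the pre-trained network at every possible depth, evaluate the resulting model, and keep the best depth. The sketch below shows this logic in a generic form. Layers are represented by ordinary Python callables and `evaluate` is a hypothetical user-supplied function (in our setting it would train the classical layers on top of the truncated network and return the validation accuracy); none of these names come from an existing library.

```python
from functools import reduce

def truncate(layers, depth):
    """Keep only the first `depth` layers of a pre-trained network."""
    if not 0 <= depth <= len(layers):
        raise ValueError("depth must be between 0 and the number of layers")
    return layers[:depth]

def run(layers, x):
    """Apply the layers in sequence to the input x."""
    return reduce(lambda value, layer: layer(value), layers, x)

def best_depth(layers, evaluate):
    """Score every truncation depth and return the best depth with all scores."""
    scores = {q: evaluate(truncate(layers, q)) for q in range(len(layers) + 1)}
    return max(scores, key=scores.get), scores

# Toy usage: five "layers" that each add 1; the score is highest when the
# truncated network maps 0 to 3, so the selected depth is 3.
toy_layers = [lambda v: v + 1] * 5
depth, all_scores = best_depth(toy_layers, lambda net: -abs(run(net, 0) - 3))
```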

Example 5 - QQ transfer learning for quantum state classification (Gaussian / non-Gaussian)

Finally, our last example is a proof-of-principle demonstration of QQ transfer learning. In this case we train an optical network $A$ to classify a particular dataset $D_A$ of Gaussian and non-Gaussian quantum states. We then use it as a pre-trained block for a dataset $D_B$ consisting of Gaussian and non-Gaussian states which are different from those of $D_A$. The pre-trained block is followed by some quantum variational layers that will be trained to classify the quantum states of $D_B$.

Before presenting our model we need to define a continuous-variable single-mode variational layer, the analog of Eq. (12). We follow the general structure proposed in Ref. Killoran et al. (2018):


where $R$ is a phase space rotation, $S$ is a squeezing operation, $D$ is a displacement and $\Phi$ is a cubic phase gate. All operations depend on variational parameters and, for sufficiently many layer applications, the model can generate any single-mode unitary operation. Moreover, by simply removing the last non-Gaussian gate from (17), we obtain a Gaussian layer which can generate all Gaussian unitary operations.
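The Gaussian version of the layer admits a compact classical description, since Gaussian unitaries act as affine symplectic maps on the mean vector and covariance matrix of the quadratures. The following sketch applies a rotation, a squeezing and a displacement to the vacuum state in this representation. The gate ordering, the sign convention of the rotation, the choice $\hbar = 2$ and all function names are assumptions for illustration; the cubic phase gate is not Gaussian and cannot be represented in this way.

```python
import math

HBAR = 2.0  # assumed convention: the vacuum quadrature variance is HBAR / 2

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

def rotation(phi):
    """Symplectic matrix of a phase space rotation (sign convention assumed)."""
    return [[math.cos(phi), -math.sin(phi)], [math.sin(phi), math.cos(phi)]]

def squeezing(r):
    """Symplectic matrix of a squeezing operation along the x quadrature."""
    return [[math.exp(-r), 0.0], [0.0, math.exp(r)]]

def apply_symplectic(s, mean, cov):
    """Transform the moments as mean -> S mean and cov -> S cov S^T."""
    new_mean = [s[0][0] * mean[0] + s[0][1] * mean[1],
                s[1][0] * mean[0] + s[1][1] * mean[1]]
    return new_mean, matmul(matmul(s, cov), transpose(s))

def vacuum():
    """Mean vector and covariance matrix of the vacuum state."""
    return [0.0, 0.0], [[HBAR / 2, 0.0], [0.0, HBAR / 2]]

def gaussian_layer(mean, cov, phi, r, dx, dp):
    """Rotation, then squeezing, then displacement (assumed ordering)."""
    mean, cov = apply_symplectic(rotation(phi), mean, cov)
    mean, cov = apply_symplectic(squeezing(r), mean, cov)
    return [mean[0] + dx, mean[1] + dp], cov
```

For example, `gaussian_layer(*vacuum(), phi=0.3, r=0.5, dx=1.0, dp=-2.0)` returns a displaced squeezed state whose covariance matrix has the same determinant as that of the vacuum, as expected for a pure Gaussian state.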

Figure 9: Evolution of the loss function (cross entropy) with respect to the number of training iterations. The top plot represents a network of total depth 3 optimized with a QQ transfer learning scheme (orange) compared with a network trained from scratch (blue). In the bottom plot instead the total depth is fixed to 4 layers.

Hyper-parameter      QQ Classifier
Depth of $A'$        1
Depth of $B$         3
Training batches     500
Batch size           8
Learning rate        0.01
Fock-space cutoff    15
Accuracy             0.869

Table 5: Hyper-parameters used for our quantum state classifier based on the QQ transfer learning scheme. The last line reports the corresponding accuracy achieved by the model (numerically simulated).

We can express the QQ transfer learning model of this example according to the notation introduced in Section III and represented in Fig. 1:

$D_A$ = two classes, 0 and 1, of quantum states generated by two different variational random circuits. States of class 0 are generated by a random single-mode Gaussian layer applied to the vacuum. States of class 1 are generated by a random non-Gaussian layer applied to the vacuum.

$A$ = single-mode variational quantum layer followed by an on/off threshold detector.

$T_A$ = classification (labels: 0 and 1).

$A'$ = network $A$ without the measurement layer.

$D_B$ = two classes, 0 and 1, of quantum states. States of class 0 are generated by a random single-mode Gaussian layer applied to a coherent state. States of class 1 are generated by a random Gaussian layer applied to a Fock state.

$B$ = single-mode variational quantum circuit of a given depth, followed by an on/off threshold detector.
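The role of the on/off threshold detector can be illustrated with a few lines of code. Such a detector clicks whenever at least one photon is present, so its click probability is one minus the vacuum probability of the measured state. In the sketch below we interpret the click probability as the predicted probability of class 1 and score it with a binary cross-entropy; this interpretation, the helper names and the example states are assumptions made for illustration.

```python
import math

def coherent_probs(alpha, cutoff):
    """Photon-number distribution of a coherent state, truncated at `cutoff`."""
    mean = abs(alpha) ** 2
    return [math.exp(-mean) * mean ** n / math.factorial(n) for n in range(cutoff)]

def fock_probs(n, cutoff):
    """Photon-number distribution of the Fock state with n photons."""
    return [1.0 if k == n else 0.0 for k in range(cutoff)]

def click_probability(photon_probs):
    """Probability that an on/off detector clicks: 1 - P(0 photons)."""
    return 1.0 - photon_probs[0]

def binary_cross_entropy(p_click, label, eps=1e-12):
    """Cross-entropy between the click probability and a label in {0, 1}."""
    p = min(max(p_click, eps), 1.0 - eps)  # avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

# Usage: loss of a hypothetical output state for a sample of class 1.
p = click_probability(coherent_probs(1.0, 15))
loss = binary_cross_entropy(p, label=1)
```

In the full model the photon-number distribution would be that of the state produced by the variational circuit, and the loss would be averaged over a batch of input states.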

A summary of the hyper-parameters used for defining and training this QQ model is reported in Table 5, together with the associated accuracy. In Fig. 9 we plot the loss function (cross entropy) of our quantum variational classifier with respect to the number of training iterations. We compare the results obtained with and without the pre-trained layer (i.e., with and without transfer learning), for a fixed total depth of 3 or 4 layers. It is clear that the QQ transfer learning approach offers a strong advantage in terms of training efficiency.

For a sufficiently long training time, however, the network optimized from scratch achieves the same or better results with respect to the network with a fixed initial layer $A'$. This effect is well known also in the classical setting and it is not surprising: the network trained from scratch is in principle more powerful by construction, because it has more variational parameters. However, there are many practical situations in which the training resources are limited (especially when dealing with real NISQ devices) or in which the dataset $D_B$ is experimentally much more expensive than $D_A$. In all such practically constrained situations, QQ transfer learning could represent a very convenient strategy.

V Conclusions

We have outlined a framework of transfer learning which is applicable to hybrid computational models where variational quantum circuits can be connected to classical neural networks. With respect to the well-studied classical scenario, in hybrid systems several new and promising opportunities naturally emerge as, for example, the possibility of transferring some pre-acquired knowledge at the classical-quantum interface (CQ and QC transfer learning) or between two quantum networks (QQ transfer learning). As an additional contribution, we have also introduced the notion of “dressed quantum circuits”, i.e., variational quantum circuits augmented with two trainable classical layers which improve and simplify the data encoding and decoding phases.

Each theoretical idea proposed in this work is supported with a proof-of-concept example, numerically demonstrating the validity of our models for practical applications such as image recognition or quantum state classification. Particular focus has been dedicated to the CQ transfer learning scheme because of its promising potential with currently available quantum computers. In particular we have used the CQ transfer learning method to successfully classify high resolution images with two real quantum processors (by IBM and Rigetti).

From our theoretical and experimental analysis, we can conclude that transfer learning is a promising approach, yielding performance which can already compete with classical algorithms, despite the early stage of current quantum technology. In the hybrid classical-quantum scenario considered in this work, transfer learning could be a key tool to help observe evidence of a quantum advantage in the near future.

We thank Christian Weedbrook for helpful discussions. The authors would like to thank Rigetti for access to their resources: Forest Smith et al. (2016), QCS and the Aspen-4-4Q-A backend. We also acknowledge the use of the IBM Q Experience, Qiskit H. A. et al. (2019) and the IBM Q 5 Tenerife v1.0.0 (ibmqx4) backend.


  • [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. (2016) TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467. Cited by: §IV.
  • [2] F. Arute et al. (2019) Quantum supremacy using a programmable superconducting processor. Nature 574 (7779), pp. 505–510. Cited by: §III.1.
  • [3] M. Benedetti, J. Realpe-Gómez, and A. Perdomo-Ortiz (2018) Quantum-assisted Helmholtz machines: a quantum–classical deep learning framework for industrial datasets in near-term devices. Quantum Science and Technology 3 (3), pp. 034007. Cited by: §II.3.
  • [4] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, and N. Killoran (2018) PennyLane: automatic differentiation of hybrid quantum-classical computations. arXiv preprint arXiv:1811.04968. Cited by: §IV.
  • [5] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd (2017) Quantum machine learning. Nature 549 (7671), pp. 195. Cited by: §I.
  • [6] A. Canziani, A. Paszke, and E. Culurciello (2016) An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678. Cited by: §III.1.
  • [7] K. Ch'ng, J. Carrasquilla, R. G. Melko, and E. Khatami (2017) Machine learning phases of strongly correlated fermions. Physical Review X 7 (3), pp. 031038. Cited by: §I.
  • [8] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: item =.
  • [9] J. Devlin, M. Chang, K. Lee, and K. Toutanova (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Cited by: §I.
  • [10] V. Dunjko, J. M. Taylor, and H. J. Briegel (2016) Quantum-enhanced machine learning. Physical Review Letters 117 (13), pp. 130501. Cited by: §I.
  • [11] H. A. et al. (2019) Qiskit: an open-source framework for quantum computing. Note: doi:10.5281/zenodo.2562110 Cited by: §V.
  • [12] E. Farhi and H. Neven (2018) Classification with quantum neural networks on near term processors. arXiv preprint arXiv:1802.06002. Cited by: §I, §II.2.
  • [13] I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT press. Cited by: §II.1.
  • [14] A. W. Harrow and A. Montanaro (2017) Quantum computational supremacy. Nature 549 (7671), pp. 203. Cited by: §III.1.
  • [15] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §I, Figure 4, item =.
  • [16] M. Henderson, S. Shakya, S. Pradhan, and T. Cook (2019) Quanvolutional neural networks: powering image recognition with quantum circuits. arXiv preprint arXiv:1904.04767. Cited by: §III.1.
  • [17] J. Howard and S. Ruder (2018) Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146. Cited by: §I.
  • [18] P. Huembeli, A. Dauphin, and P. Wittek (2018) Identifying quantum phase transitions with adversarial neural networks. Physical Review B 97 (13), pp. 134109. Cited by: §I.
  • [19] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd (2018) Continuous-variable quantum neural networks. arXiv preprint arXiv:1806.06871. Cited by: §I, §II.2, §III.2, Figure 6, item =, §IV, §IV, §IV.
  • [20] N. Killoran, J. Izaac, N. Quesada, V. Bergholm, M. Amy, and C. Weedbrook (2019) Strawberry Fields: a software platform for photonic quantum computing. Quantum 3, pp. 129. Cited by: §IV.
  • [21] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §IV, §IV, §IV.
  • [22] A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Technical report University of Toronto. Cited by: §IV.
  • [23] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105. Cited by: §I.
  • [24] D. Liu, S. Ran, P. Wittek, C. Peng, R. B. García, G. Su, and M. Lewenstein (2019) Machine learning by unitary tensor network of hierarchical tree structure. New Journal of Physics 21 (7), pp. 073059. Cited by: §III.1.
  • [25] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik (2016) The theory of variational hybrid quantum-classical algorithms. New Journal of Physics 18 (2), pp. 023023. Cited by: §I, §II.2.
  • [26] S. J. Pan and Q. Yang (2009) A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), pp. 1345–1359. Cited by: §I, Table 1, §III.
  • [27] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, Cited by: §IV.
  • [28] A. Perdomo-Ortiz, M. Benedetti, J. Realpe-Gómez, and R. Biswas (2018) Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers. Quantum Science and Technology 3 (3), pp. 030502. Cited by: §I, §II.2.
  • [29] A. Peruzzo, J. McClean, P. Shadbolt, M. Yung, X. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’Brien (2014) A variational eigenvalue solver on a photonic quantum processor. Nature Communications 5, pp. 4213. Cited by: §I, §II.2.
  • [30] S. Piat, N. Usher, S. Severini, M. Herbster, T. Mansi, and P. Mountney (2018) Image classification with quantum pre-training and auto-encoders. International Journal of Quantum Information 16 (08), pp. 1840009. Cited by: §I, §III.1.
  • [31] L. Y. Pratt (1993) Discriminability-based transfer between neural networks. In Advances in Neural Information Processing Systems, pp. 204–211. Cited by: §I, Table 1, §III.
  • [32] J. Preskill (2018) Quantum computing in the NISQ era and beyond. Quantum 2, pp. 79. Cited by: §I.
  • [33] R. Raina, A. Battle, H. Lee, B. Packer, and A. Y. Ng (2007) Self-taught learning: transfer learning from unlabeled data. In Proceedings of the 24th International Conference on Machine Learning, pp. 759–766. Cited by: §I, Table 1, §III.
  • [34] Sasank Chilamkurthy, PyTorch transfer learning tutorial. Note: 2019-08-08 Cited by: §IV.
  • [35] M. Schuld, A. Bocharov, K. Svore, and N. Wiebe (2018) Circuit-centric quantum classifiers. arXiv preprint arXiv:1804.00633. Cited by: §I, §II.2.
  • [36] M. Schuld and N. Killoran (2019) Quantum machine learning in feature Hilbert spaces. Physical Review Letters 122 (4), pp. 040504. Cited by: §I, §II.2.
  • [37] M. Schuld, I. Sinayskiy, and F. Petruccione (2015) An introduction to quantum machine learning. Contemporary Physics 56 (2), pp. 172–185. Cited by: §I.
  • [38] K. Shiba, K. Sakamoto, K. Yamaguchi, D. B. Malla, and T. Sogabe (2019) Convolution filter embedded quantum gate autoencoder. arXiv preprint arXiv:1906.01196. Cited by: §III.1.
  • [39] S. Sim, P. D. Johnson, and A. Aspuru-Guzik (2019) Expressibility and entangling capability of parameterized quantum circuits for hybrid quantum-classical algorithms. arXiv preprint arXiv:1905.10876. Cited by: §I, §II.2.
  • [40] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §I.
  • [41] R. S. Smith, M. J. Curtis, and W. J. Zeng (2016) A practical quantum instruction set architecture. arXiv preprint arXiv:1608.03355. Cited by: §V.
  • [42] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich (2015) Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9. Cited by: §I.
  • [43] Tetris, Wikipedia, 2019. Note: 2019-08-08 Cited by: §IV.
  • [44] L. Torrey and J. Shavlik (2010) Transfer learning. In Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, pp. 242–264. Cited by: §I, Table 1, §III.
  • [45] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008. Cited by: §I.
  • [46] G. Verdon, M. Broughton, J. R. McClean, K. J. Sung, R. Babbush, Z. Jiang, H. Neven, and M. Mohseni (2019) Learning to learn with quantum neural networks via classical neural networks. arXiv preprint arXiv:1907.05415. Cited by: §I.
  • [47] C. Weedbrook, S. Pirandola, R. García-Patrón, N. J. Cerf, T. C. Ralph, J. H. Shapiro, and S. Lloyd (2012) Gaussian quantum information. Reviews of Modern Physics 84 (2), pp. 621. Cited by: §IV.
  • [48] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson (2014) How transferable are features in deep neural networks?. In Advances in Neural Information Processing Systems, pp. 3320–3328. Cited by: §I, Table 1, §III.
  • [49] R. Zen, L. My, R. Tan, F. Hebert, M. Gattobigio, C. Miniatura, D. Poletti, and S. Bressan (2019) Transfer learning for scalability of neural-network quantum states. arXiv preprint arXiv:1908.09883. Cited by: §I.