Representation Learning via Quantum Neural Tangent Kernels

by Junyu Liu, et al.
The University of Chicago

Variational quantum circuits are used in quantum machine learning and variational quantum simulation tasks. How to design good variational circuits, or how to predict how well they will perform on given learning or optimization tasks, is still unclear. Here we discuss these problems, analyzing variational quantum circuits using the theory of neural tangent kernels. We define quantum neural tangent kernels, and derive dynamical equations for their associated loss function in optimization and learning tasks. We analytically solve the dynamics in the frozen limit, or lazy training regime, where variational angles change slowly and a linear perturbation suffices. We extend the analysis to a dynamical setting, including quadratic corrections in the variational angles. We then consider hybrid quantum-classical architectures and define a large-width limit for hybrid kernels, showing that a hybrid quantum-classical neural network can be approximately Gaussian. The results presented here identify limits in which an analytical understanding of the training dynamics of variational quantum circuits, used for quantum machine learning and optimization problems, is possible. These analytical results are supported by numerical simulations of quantum machine learning experiments.




I Introduction

The idea of using quantum computers for machine learning has recently received attention both in academia and industry harrow2009quantum; wiebe2012quantum; lloyd2014quantum; wittek2014quantum; wiebe2014quantum; rebentrost2014quantum; biamonte2017quantum; mcclean2018barren; schuld2019quantum; tang2019quantum; havlivcek2019supervised; huang2021power; liu2021rigorous. While proof-of-principle studies have shown that quantum computers are useful for some problems of mathematical interest liu2021rigorous, quantum advantage in machine learning algorithms for practical applications remains unclear huang2021information. On classical architectures, a first-principles theory of machine learning, especially of so-called deep learning, which uses a large number of layers, is still under development. Early developments in statistical learning theory provide rigorous guarantees on the learning capability of generic learning algorithms, but theoretical bounds obtained from information theory are sometimes weak in practical settings.

The theory of the neural tangent kernel (NTK) has been deemed an important tool for understanding deep neural networks lee2017deep; jacot2018neural; lee2019wide; arora2019exact; sohl2020infinite; yang2020feature; yaida2020non. In the large-width limit, a generic neural network becomes nearly Gaussian when averaging over the initial weights and biases, and its learning capabilities become predictable. NTK theory allows one to derive an analytical understanding of neural network dynamics, improving on statistical learning theory and shedding light on the underlying principles of deep learning dyer2019asymptotics; halverson2021neural; roberts2021ai; roberts2021principles; Liu:2021ohs. In the quantum machine learning community, a similar first-principles theory would help in understanding the training dynamics and in selecting appropriate variational quantum circuits for specific problems. A step in this direction has been considered recently for quantum-classical neural networks nakaji2021quantumenhanced. However, the quantum circuits in that framework contain no variational parameters, so the problem of understanding and designing the training dynamics of variational quantum circuits was left open.

In this paper, we address this problem, focusing on the limit where the learning rate is sufficiently small, inspired by the classical theory of the NTK. Following the framework and results of roberts2021ai; roberts2021principles; summer, we first define a quantum analogue of the classical NTK. In the limit where the variational angles do not change much, the so-called lazy training chizat2018lazy, the frozen QNTK leads to an exponential decay of the loss function on the training set. We furthermore compute the leading-order perturbation beyond the static limit, where we define a quantum version of the classical meta-kernel. We derive closed-form formulas for the training dynamics in terms of the parameters of the variational quantum circuits (see Fig. 1).

Figure 1: A cartoon illustration of the QNTK theory. (a) The QNTK characterizes the gradient descent dynamics in the variational quantum circuit, and the quantum state evolves according to the QNTK prediction. (b) Near the end of training, the QNTK is frozen and approximately constant. (c) The gradient descent dynamics can be highly nonlinear, and the QNTK evolves during gradient descent, which is a property of representation learning.

We then move to a hybrid quantum-classical neural network framework, and find that it becomes approximately Gaussian as long as the quantum outputs are sufficiently orthogonal. We present an analytic derivation of the large-width limit, in which the non-Gaussian contribution to the neuron correlations is suppressed by the large width. Interestingly, the width is now defined by the number of independent Hermitian operators in the variational ansatz, which is upper-bounded by (a polynomial in) the dimension of the Hilbert space. Thus, a large Hilbert space naturally brings our neural network to the large-width limit. Moreover, the orthogonality assumption on the variational ansatz could be satisfied statistically under randomization assumptions. If it is not satisfied, hybrid quantum-classical neural networks could still learn features even at large width, a significant difference from classical neural networks.

We test the analytical derivations of our theory against numerical experiments with the IBM quantum device simulator aleksandrowicz2019qiskit on a classification problem in the supervised learning setting, finding good agreement with the theory. The structure of this paper and the ideas presented are summarized in Fig. 2.


Figure 2: Structure of our paper. In Section II we establish the theory of the QNTK in the context of optimization without data for generic variational quantum ansätze, which is the typical task in quantum simulation. In Section III, we establish the theory of quantum machine learning with the help of the QNTK. In Section IV, we define the hybrid quantum-classical neural network model, and we prove that in the large-width limit the model is approximated by a Gaussian process. In Section V, we give numerical examples to demonstrate our quantum representation theory. In Section VI, we discuss the implications of this work and outline open problems for future work. More technical details are given in the Supplementary Material (SM).

II Theory of quantum optimization

II.1 QNTK for optimization

We start with a relatively simple example: the optimization of a quantum cost function, with no model to be learned from associated data. Let a variational quantum wavefunction peruzzo2014variational; farhi2014quantum; mcclean2016theory; kandala2017hardware; mcardle2020quantum; cerezo2021variational be given as

$$|\psi(\theta)\rangle = U(\theta)|\psi_0\rangle = W_L\,U_L(\theta_L)\cdots W_1\,U_1(\theta_1)\,|\psi_0\rangle . \tag{1}$$

Here we have defined unitary operators of the type $U_\ell(\theta_\ell) = \exp(i\theta_\ell X_\ell)$, with a variational parameter $\theta_\ell$ and an associated Hermitian operator $X_\ell$. We denote the collection of all variational parameters as $\theta = (\theta_1, \dots, \theta_L)$ and the initial state as $|\psi_0\rangle$. Moreover, our ansatz also includes constant gates $W_\ell$ that do not depend on the variational angles.
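To make the ansatz concrete, here is a minimal NumPy sketch that builds $|\psi(\theta)\rangle$ for a small example. It assumes a hypothetical two-qubit circuit with four Pauli-string generators (so that $X_\ell^2 = 1$ and $e^{i\theta X_\ell} = \cos\theta + i\sin\theta\,X_\ell$), a fixed CNOT standing in for every $W_\ell$, and $|\psi_0\rangle = |00\rangle$. None of these choices come from the paper, and the helper names `rotation` and `ansatz_state` are illustrative.

```python
import numpy as np

# Single-qubit Pauli matrices used to build Pauli-string generators X_l (X_l @ X_l = identity).
I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)

# Hypothetical toy ansatz: L = 4 rotations on two qubits, each followed by a fixed CNOT (the W_l).
GENERATORS = [np.kron(PX, I2), np.kron(I2, PY), np.kron(PZ, PZ), np.kron(PY, PX)]
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)
FIXED = [CNOT] * len(GENERATORS)
PSI0 = np.array([1, 0, 0, 0], dtype=complex)  # |00>


def rotation(theta, X):
    """U(theta) = exp(i theta X) = cos(theta) 1 + i sin(theta) X, valid because X @ X = 1."""
    return np.cos(theta) * np.eye(X.shape[0]) + 1j * np.sin(theta) * X


def ansatz_state(thetas):
    """|psi(theta)> = W_L U_L(theta_L) ... W_1 U_1(theta_1) |psi_0>."""
    psi = PSI0.copy()
    for theta, X, W in zip(thetas, GENERATORS, FIXED):
        psi = W @ (rotation(theta, X) @ psi)
    return psi


thetas = np.array([0.3, -1.1, 0.7, 2.0])
psi = ansatz_state(thetas)
print("norm:", np.vdot(psi, psi).real)  # unitarity keeps the state normalized
```

Any generators can be swapped in as long as each squares to the identity; for general Hermitian $X_\ell$, `rotation` would have to be replaced by a full matrix exponential.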

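We introduce the following mean squared error (MSE) loss function when we wish to optimize the expectation value of a Hermitian operator $O$ toward its minimal eigenvalue $O_0$, which is assumed to be known here, over the class of states $|\psi(\theta)\rangle$:

$$\mathcal{L}(\theta) = \frac{1}{2}\left(\langle\psi(\theta)|O|\psi(\theta)\rangle - O_0\right)^2 \equiv \frac{1}{2}\varepsilon^2 . \tag{2}$$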
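Here we have defined the residual optimization error $\varepsilon \equiv \langle\psi(\theta)|O|\psi(\theta)\rangle - O_0$. When using gradient descent to optimize Eq. (2), the difference equation for the dynamics of the training parameters is

$$\delta\theta_\ell = -\eta\,\frac{\partial\mathcal{L}}{\partial\theta_\ell} = -\eta\,\varepsilon\,\frac{\partial\varepsilon}{\partial\theta_\ell} . \tag{3}$$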
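We use the notation $\delta A \equiv A(t+1) - A(t)$ to denote the difference of a quantity $A$ between steps $t+1$ and $t$ of gradient descent with learning rate $\eta$. Then, to linear order in $\eta$, we also have

$$\delta\varepsilon = \sum_\ell \frac{\partial\varepsilon}{\partial\theta_\ell}\,\delta\theta_\ell = -\eta\,\varepsilon\sum_\ell\left(\frac{\partial\varepsilon}{\partial\theta_\ell}\right)^2 \equiv -\eta K\varepsilon . \tag{4}$$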
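The linear-order relation between one gradient step and the kernel can be checked numerically. The minimal sketch below assumes a hypothetical two-qubit circuit ($U_\ell = e^{i\theta_\ell X_\ell}$ with Pauli-string generators, each rotation followed by a fixed CNOT), the hypothetical observable $O = Z\otimes 1$ with $O_0 = -1$, and computes $\partial\varepsilon/\partial\theta_\ell$ with the parameter-shift rule, which is exact for generators with eigenvalues $\pm 1$. It takes one gradient-descent step and compares the actual change of $\varepsilon$ with $-\eta K\varepsilon$, where $K = \sum_\ell(\partial\varepsilon/\partial\theta_\ell)^2$; the helper names are ours.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

# Hypothetical toy problem (assumptions for illustration only).
GENERATORS = [np.kron(PX, I2), np.kron(I2, PY), np.kron(PZ, PZ), np.kron(PY, PX)]
PSI0 = np.array([1, 0, 0, 0], dtype=complex)
O = np.kron(PZ, I2)
O0 = -1.0  # minimal eigenvalue of O, assumed known


def rotation(theta, X):
    return np.cos(theta) * np.eye(X.shape[0]) + 1j * np.sin(theta) * X


def residual(thetas):
    """epsilon(theta) = <psi(theta)|O|psi(theta)> - O_0."""
    psi = PSI0.copy()
    for theta, X in zip(thetas, GENERATORS):
        psi = CNOT @ (rotation(theta, X) @ psi)
    return np.vdot(psi, O @ psi).real - O0


def gradient(thetas):
    """Parameter-shift rule: d eps / d theta_l = eps(theta_l + pi/4) - eps(theta_l - pi/4)."""
    grad = np.zeros(len(thetas))
    for l in range(len(thetas)):
        plus, minus = thetas.copy(), thetas.copy()
        plus[l] += np.pi / 4
        minus[l] -= np.pi / 4
        grad[l] = residual(plus) - residual(minus)
    return grad


thetas = np.array([0.3, -1.1, 0.7, 2.0])
eta = 1e-3
eps = residual(thetas)
grad = gradient(thetas)
K = float(grad @ grad)                    # QNTK for optimization: sum of squared derivatives
new_thetas = thetas - eta * eps * grad    # one gradient-descent step on L = eps**2 / 2
delta_eps = residual(new_thetas) - eps
print(delta_eps, -eta * K * eps)          # agree up to O(eta**2)
```

For this toy model $0 \le \varepsilon \le 2$ and every first and second derivative of $\varepsilon$ is bounded, so the mismatch is provably at most of order $\eta^2$.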


The object $K$ serves to construct a toy version of the NTK in the quantum setup, in the sense that it can be seen as a $1\times 1$ kernel matrix for a single training residual $\varepsilon$. We can make our definition of a QNTK associated with an optimization problem more precise as follows:

Definition 1 (QNTK for optimization).

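The quantum neural tangent kernel (QNTK) associated with the optimization problem of Eq. (2) is given by

$$K \equiv \sum_\ell \left(\frac{\partial\varepsilon}{\partial\theta_\ell}\right)^2 = -\sum_\ell \left(\langle\psi_0|\,U_{+,\ell}^\dagger\,\big[X_\ell,\; U_{-,\ell}^\dagger\, O\, U_{-,\ell}\big]\,U_{+,\ell}\,|\psi_0\rangle\right)^2 ,$$

where we have split $U(\theta) = U_{-,\ell}\,U_{+,\ell}$, with $U_{+,\ell}$ containing all gates up to and including $U_\ell(\theta_\ell)$ and $U_{-,\ell}$ containing the remaining ones.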
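It is easy to show that the expectation value being squared in this expression is purely imaginary, because the commutator of two Hermitian operators is anti-Hermitian; hence $K$ is always non-negative, $K \ge 0$. A derivation of this expression can be found in the SM.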
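The commutator form of the QNTK and its non-negativity can be checked on a small example. The minimal sketch below assumes a hypothetical two-qubit circuit with $U_\ell = e^{i\theta_\ell X_\ell}$ (Pauli-string $X_\ell$), each rotation followed by a fixed CNOT, and the hypothetical observable $O = Z\otimes 1$. For each $\ell$ it splits the circuit as $U = U_{-,\ell}U_{+,\ell}$ with $U_{+,\ell}$ ending at $U_\ell$, evaluates $-\langle[X_\ell, U_{-,\ell}^\dagger O U_{-,\ell}]\rangle^2$, and compares the sum with $\sum_\ell(\partial\varepsilon/\partial\theta_\ell)^2$ from the parameter-shift rule. Helper names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
GENERATORS = [np.kron(PX, I2), np.kron(I2, PY), np.kron(PZ, PZ), np.kron(PY, PX)]
PSI0 = np.array([1, 0, 0, 0], dtype=complex)
O = np.kron(PZ, I2)  # hypothetical observable


def rotation(theta, X):
    return np.cos(theta) * np.eye(X.shape[0]) + 1j * np.sin(theta) * X


def expval(thetas):
    psi = PSI0.copy()
    for theta, X in zip(thetas, GENERATORS):
        psi = CNOT @ (rotation(theta, X) @ psi)
    return np.vdot(psi, O @ psi).real


def kernel_commutator(thetas):
    """K = -sum_l <psi0|U_+^dag [X_l, U_-^dag O U_-] U_+|psi0>^2 with U = U_- U_+."""
    L = len(thetas)
    K, comm_vals = 0.0, []
    for l in range(L):
        phi = PSI0.copy()                      # phi = U_{+,l}|psi0>, ending with U_l(theta_l)
        for k in range(l):
            phi = CNOT @ (rotation(thetas[k], GENERATORS[k]) @ phi)
        phi = rotation(thetas[l], GENERATORS[l]) @ phi
        U_minus = CNOT.copy()                  # U_{-,l} = W_L U_L ... W_{l+1} U_{l+1} W_l
        for k in range(l + 1, L):
            U_minus = CNOT @ rotation(thetas[k], GENERATORS[k]) @ U_minus
        O_tilde = U_minus.conj().T @ O @ U_minus
        X = GENERATORS[l]
        c = np.vdot(phi, (X @ O_tilde - O_tilde @ X) @ phi)  # purely imaginary, c = i y
        comm_vals.append(c)
        K += -(c * c).real                     # -(i y)^2 = y^2 >= 0
    return K, np.array(comm_vals)


def kernel_shift(thetas):
    """Sum of squared parameter-shift derivatives of <O>."""
    grad = []
    for l in range(len(thetas)):
        plus, minus = thetas.copy(), thetas.copy()
        plus[l] += np.pi / 4
        minus[l] -= np.pi / 4
        grad.append(expval(plus) - expval(minus))
    return float(np.sum(np.square(grad)))


thetas = np.array([0.3, -1.1, 0.7, 2.0])
K_comm, comm_vals = kernel_commutator(thetas)
K_shift = kernel_shift(thetas)
print(K_comm, K_shift)
```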

II.2 Frozen QNTK limit for optimization

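An analytic theory of the NTK can be established when the learning rate is sufficiently small. It is obtained by solving the coupled difference equations (3) and (4), which we repeat here:

$$\delta\theta_\ell = -\eta\,\varepsilon\,\frac{\partial\varepsilon}{\partial\theta_\ell}, \qquad \delta\varepsilon = -\eta K\varepsilon . \tag{7}$$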
In the continuum limit of small learning rate, $\eta \to 0$, Eqs. (7) become coupled nonlinear ordinary differential equations, which are hard to solve in general. Note that this system of equations stems from a quantum optimization problem, and in general it is classically hard even to instantiate.

Nevertheless, in what follows we build an analytic model for a quantum version of the frozen NTK (frozen QNTK) in the regime of lazy training, where the variational angles do not change much. To be more precise, we assume that around a certain value $\theta^*$ our variational angles change only by a small amount, $\theta_\ell = \theta^*_\ell + \mathcal{O}(\lambda)$. A typical scenario is a Taylor expansion around such values in the convergence regime. Here $\lambda$ is a small scaling parameter, and we call the limit $\lambda \to 0$ the frozen QNTK limit.

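In this limit, one can write $U_\ell(\theta_\ell) = U_\ell(\theta^*_\ell)\,U_\ell(\theta_\ell - \theta^*_\ell)$, so that the $\theta^*$ dependence is absorbed into the non-variational part of the unitary by defining $W_\ell \to W_\ell\,U_\ell(\theta^*_\ell)$, and we have $U_\ell(\theta_\ell) \to U_\ell(\theta_\ell - \theta^*_\ell)$. In what follows, we drop the explicit reference to $\theta^*$ and understand the variational angles $\theta_\ell$ as small parameters that change by $\mathcal{O}(\lambda)$ around this reference value. Then, expanding linearly in small $\theta$, we can define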

Definition 2 (Frozen QNTK for quantum optimization).

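In the optimization problem Eq. (2), the QNTK in the frozen limit is

$$\bar K \equiv K\big|_{\theta=0} = -\sum_\ell \left(\langle\psi_0|\,U_{+,\ell}^\dagger\,\big[X_\ell,\; U_{-,\ell}^\dagger\, O\, U_{-,\ell}\big]\,U_{+,\ell}\,|\psi_0\rangle\right)^2\Big|_{\theta=0} ,$$

that is, the QNTK evaluated with only the fixed gates $W_\ell$, which now include the reference angles $\theta^*$.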
In the frozen kernel limit, we can state the following result about the time dependence of the residual error $\varepsilon$, obtained by solving Eq. (7) to linear order in small $\theta$.

Theorem 1 (Performance guarantee of optimization within the frozen QNTK approximation).

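When using standard gradient descent for the optimization problem Eq. (2) within the frozen QNTK limit, the residual optimization error decays exponentially as

$$\varepsilon(t) = \left(1-\eta\bar K\right)^t\varepsilon(0), \qquad |\varepsilon(t)| = e^{-\gamma t}\,|\varepsilon(0)| ,$$

with a convergence rate defined as

$$\gamma \equiv -\log\left|1-\eta\bar K\right| ,$$

where $|\cdot|$ denotes the absolute value.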

The derivation is given in the SM. An immediate consequence is that, whenever $0 < \eta\bar K < 2$ (so that $\gamma > 0$), the residual error converges to zero,

$$\lim_{t\to\infty}\varepsilon(t) = 0 .$$
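In the frozen limit the kernel is constant, so the recursion $\delta\varepsilon = -\eta K\varepsilon$ can be iterated in closed form, giving $\varepsilon(t) = (1-\eta K)^t\varepsilon(0)$. The minimal sketch below illustrates this with a hypothetical linearized residual $\varepsilon(\theta) = \varepsilon(0) + g\cdot\theta$, where the frozen gradient $g$ is an arbitrary assumed vector rather than one computed from a circuit; gradient descent on $\varepsilon^2/2$ then reproduces the closed form exactly, with $K = g\cdot g$. The function name `frozen_gd` is ours.

```python
import numpy as np


def frozen_gd(eps0, grad, eta, steps):
    """Gradient descent on eps**2 / 2 for the linearized residual eps(theta) = eps0 + grad . theta."""
    theta = np.zeros_like(grad)
    history = []
    for _ in range(steps):
        eps = eps0 + grad @ theta
        history.append(eps)
        theta = theta - eta * eps * grad      # gradient-descent update with a frozen derivative
    return np.array(history)


grad = np.array([0.8, -0.3, 0.5, 0.1])        # hypothetical frozen gradient d eps / d theta
K_frozen = float(grad @ grad)                 # frozen QNTK: sum of squared derivatives
eta, eps0, steps = 0.1, 1.5, 200

eps_t = frozen_gd(eps0, grad, eta, steps)
predicted = eps0 * (1.0 - eta * K_frozen) ** np.arange(steps)
rate = -np.log(abs(1.0 - eta * K_frozen))     # |eps(t)| = exp(-rate * t) |eps(0)|
print(K_frozen, rate, eps_t[-1])
```

Choosing `eta` so that `eta * K_frozen` exceeds 2 makes $|1-\eta K| > 1$, and the same iteration diverges instead of converging.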


II.3 dQNTK

The frozen QNTK limit corresponds to a linear approximation of the nonlinear dynamics. Therefore, the frozen QNTK cannot capture the nonlinear nature of variational quantum algorithms. To formulate an analytical model of the nonlinearities, we now analyze the leading-order correction in the expansion in the learning rate $\eta$ and in the size of the variational angles $\theta$. We expand $\varepsilon$ to second order in $\theta$,


This time, $\delta\varepsilon$ during gradient descent follows the equation roberts2021principles:


With this expansion at second order, we have two contributing terms in Eq. (13). We call the first term of Eq. (13) the quantum effective kernel, $K$; we use $K$, with its full $\theta$ dependence, to distinguish it from the frozen QNTK $\bar K$, which appears when only a first-order expansion is considered in the description of the dynamics. It is dynamical in the sense that it depends on the values of the training parameters as they evolve under gradient descent. We call the variable part of the second term in Eq. (14) the quantum meta-kernel, or dQNTK (differential of the QNTK).

Definition 3 (Quantum meta-kernel for optimization).

The quantum meta-kernel associated with the optimization problem in Eq. (2) is defined via


In the limit of small changes in $\theta$, for the optimization problem Eq. (2), the quantum meta-kernel is given at leading order in perturbation theory in $\theta$ by


The residual error in the optimization problem of Eq. (2) can then be computed as


We are now ready to make a statement about the residual error at the dQNTK order.

Theorem 2 (Performance guarantee of optimization from dQNTK).

In the optimization problem Eq. (2) at the dQNTK order, we split the residual optimization error into two pieces, a free part and an interacting part,


The free part follows the exponentially decaying dynamics


and the interacting part is given by


Here we have


Hence, the residual optimization error always eventually approaches zero,

$$\lim_{t\to\infty}\varepsilon(t) = 0 .$$

The leading-order perturbative correction is thus contained in the interacting part of the residual error.
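The second-order expansion behind the dQNTK can also be checked numerically. The minimal sketch below assumes a hypothetical two-qubit circuit ($U_\ell = e^{i\theta_\ell X_\ell}$ with Pauli-string generators, each followed by a fixed CNOT) and the hypothetical observable $O = Z\otimes 1$ with $O_0 = -1$. It computes the Hessian of $\varepsilon$ by applying the parameter-shift rule twice, forms the shorthand $\mu = \sum_{\ell_1,\ell_2}\partial^2_{\ell_1\ell_2}\varepsilon\,\partial_{\ell_1}\varepsilon\,\partial_{\ell_2}\varepsilon$, and checks that one gradient step changes $\varepsilon$ by $-\eta K\varepsilon + \tfrac12\eta^2\mu\varepsilon^2$ up to $\mathcal{O}(\eta^3)$, since $\delta\theta_\ell = -\eta\varepsilon\,\partial_\ell\varepsilon$. The name $\mu$ is ours, and we do not claim it matches the normalization of the quantum meta-kernel defined above.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
GENERATORS = [np.kron(PX, I2), np.kron(I2, PY), np.kron(PZ, PZ), np.kron(PY, PX)]
PSI0 = np.array([1, 0, 0, 0], dtype=complex)
O, O0 = np.kron(PZ, I2), -1.0  # hypothetical observable and its minimal eigenvalue
SHIFT = np.pi / 4


def rotation(theta, X):
    return np.cos(theta) * np.eye(X.shape[0]) + 1j * np.sin(theta) * X


def residual(thetas):
    psi = PSI0.copy()
    for theta, X in zip(thetas, GENERATORS):
        psi = CNOT @ (rotation(theta, X) @ psi)
    return np.vdot(psi, O @ psi).real - O0


def gradient(thetas):
    grad = np.zeros(len(thetas))
    for a in range(len(thetas)):
        e = np.zeros(len(thetas))
        e[a] = SHIFT
        grad[a] = residual(thetas + e) - residual(thetas - e)
    return grad


def hessian(thetas):
    """Parameter-shift rule applied twice (exact for generators with eigenvalues +-1)."""
    L = len(thetas)
    H = np.zeros((L, L))
    for a in range(L):
        for b in range(L):
            for sa in (1, -1):
                for sb in (1, -1):
                    shifted = thetas.copy()
                    shifted[a] += sa * SHIFT
                    shifted[b] += sb * SHIFT
                    H[a, b] += sa * sb * residual(shifted)
    return H


thetas = np.array([0.3, -1.1, 0.7, 2.0])
eta = 1e-3
eps, g, H = residual(thetas), gradient(thetas), hessian(thetas)
K = float(g @ g)
mu = float(g @ H @ g)                       # shorthand for the second-order coefficient
delta_eps = residual(thetas - eta * eps * g) - eps
first_order = -eta * K * eps
second_order = first_order + 0.5 * eta**2 * mu * eps**2
print(abs(delta_eps - first_order), abs(delta_eps - second_order))
```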

III Theory of learning

III.1 General theory

The results outlined in Section II can be extended to supervised learning on a data space $\mathcal{D}$. In particular, we are given a training set $\mathcal{A}$ contained in the data space $\mathcal{D}$. The data $x \in \mathcal{D}$ can be loaded into quantum states $|\psi(x)\rangle$ through a quantum feature map schuld2019quantum; havlivcek2019supervised. We define the variational quantum ansatz with a single layer by regarding the output of a quantum neural network as

$$z_i(\theta, x) = \langle\psi(x)|\,U^\dagger(\theta)\,O_i\,U(\theta)\,|\psi(x)\rangle .$$
Here, we assume that $O_i$ is taken from $\mathcal{O}$, a subset of the space of Hermitian operators on the Hilbert space $\mathcal{H}$, and the index $i$ labels the $i$-th component of the output, associated with the $i$-th operator $O_i$. This model, based on expectation values of Hermitian operators, is a common definition of a quantum neural network. One could also measure the real and imaginary parts directly to define a complexified version of the quantum neural network, useful in the context of amplitude encoding, as discussed in the Supplementary Material. We are now in a position to introduce the loss function

$$\mathcal{L}(\theta) = \frac{1}{2}\sum_{\alpha\in\mathcal{A}}\sum_i\left(z_i(\theta, x_\alpha) - y_{i,\alpha}\right)^2 \equiv \frac{1}{2}\sum_{\alpha\in\mathcal{A}}\sum_i \varepsilon_{i,\alpha}^2 .$$
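Here, we call $\varepsilon_{i,\alpha}$ the residual training error, and we assume that the label $y_{i,\alpha}$ is associated with the encoded data $x_\alpha$. Similarly to Section II.1, to linear order in $\eta$ we have the gradient descent equation

$$\delta\varepsilon_{i,\alpha} = -\eta\sum_{\beta\in\mathcal{A}}\sum_j K_{i\alpha,j\beta}\,\varepsilon_{j,\beta} ,$$

with an associated kernel

$$K_{i\alpha,j\beta} = \sum_\ell \frac{\partial\varepsilon_{i,\alpha}}{\partial\theta_\ell}\,\frac{\partial\varepsilon_{j,\beta}}{\partial\theta_\ell} .$$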
To ease the notation, we combine the output index and the data index into a joint index, whose two components run over the output components and the data space $\mathcal{D}$, respectively (we mark the data index with a tilde, $\tilde\alpha$, when the corresponding data point is in the sample set $\mathcal{A}$, and leave it unmarked for a general data point), and our gradient descent equations are


One can show that this kernel is always positive semidefinite and Hermitian (see the Supplementary Material for a proof). Recalling Eq. (1), we are now in a position to give an analytical expression for the QNTK of a supervised learning problem, as follows. Details of the derivation can be found in the Supplementary Material.

Definition 4 (QNTK for quantum machine learning).

The QNTK for the quantum learning model Eq. (24) is given by


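For supervised learning, the kernel can be written as a Gram matrix of Jacobians, $K = JJ^{T}$, where the rows of $J$ are labeled by the pair (data point, output component) and the columns by $\theta_\ell$; for real-valued outputs this makes positive semidefiniteness immediate. The minimal sketch below assumes a hypothetical angle-encoding feature map $|\psi(x)\rangle = R_y(x_1)\otimes R_y(x_2)|00\rangle$, two hypothetical observables $O_1 = Z\otimes 1$ and $O_2 = 1\otimes Z$, made-up training inputs, and a toy variational circuit of Pauli-string rotations $e^{i\theta_\ell X_\ell}$ interleaved with fixed CNOT gates. Helper names such as `feature_state` and `jacobian` are ours.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
GENERATORS = [np.kron(PX, I2), np.kron(I2, PY), np.kron(PZ, PZ), np.kron(PY, PX)]
OBSERVABLES = [np.kron(PZ, I2), np.kron(I2, PZ)]  # hypothetical O_1, O_2


def rotation(theta, X):
    """exp(i theta X) for a Pauli string X (X @ X = 1)."""
    return np.cos(theta) * np.eye(X.shape[0]) + 1j * np.sin(theta) * X


def ry(x):
    """Single-qubit R_y(x) = exp(-i x Y / 2)."""
    c, s = np.cos(x / 2), np.sin(x / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def feature_state(x):
    """Hypothetical angle-encoding feature map |psi(x)> = R_y(x_1) (x) R_y(x_2) |00>."""
    return np.kron(ry(x[0]), ry(x[1])) @ np.array([1, 0, 0, 0], dtype=complex)


def outputs(thetas, x):
    """z_i(theta, x) = <psi(x)|U(theta)^dag O_i U(theta)|psi(x)> for every observable O_i."""
    psi = feature_state(x)
    for theta, X in zip(thetas, GENERATORS):
        psi = CNOT @ (rotation(theta, X) @ psi)
    return np.array([np.vdot(psi, Oi @ psi).real for Oi in OBSERVABLES])


def jacobian(thetas, xs):
    """Rows: joint index (data point, output component); columns: theta_l (parameter shift)."""
    blocks = []
    for x in xs:
        cols = []
        for l in range(len(thetas)):
            shift = np.zeros(len(thetas))
            shift[l] = np.pi / 4
            cols.append(outputs(thetas + shift, x) - outputs(thetas - shift, x))
        blocks.append(np.array(cols).T)       # shape (n_outputs, n_params)
    return np.vstack(blocks)


xs = np.array([[0.1, 0.5], [1.2, -0.3], [-0.7, 0.9]])  # hypothetical training inputs
thetas = np.array([0.3, -1.1, 0.7, 2.0])
J = jacobian(thetas, xs)
K = J @ J.T                                            # 6 x 6 kernel over (data point, output) pairs
print(np.round(np.linalg.eigvalsh(K), 6))
```

Because $\partial\varepsilon_{i,\alpha}/\partial\theta_\ell = \partial z_i(\theta, x_\alpha)/\partial\theta_\ell$, the labels do not enter the kernel at all.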
III.2 Absence of representation learning in the frozen limit

In the frozen QNTK case, the kernel is static, and the learning algorithm cannot learn features from the data. In the same fashion as Section II.2, we take the frozen QNTK limit, in which the variational angles change only slightly. Using the previous notation, we can define the QNTK for quantum machine learning in the frozen limit, and a performance guarantee on the training error in this regime, as follows.

Definition 5 (Frozen QNTK for quantum machine learning).

In the quantum learning model Eq. (24) with the frozen QNTK limit,

Theorem 3 (Performance guarantee of quantum machine learning in the frozen QNTK limit).

In the quantum learning model Eq. (24) with the frozen QNTK limit, the residual training error decays exponentially during gradient descent as


The convergence rate is defined as


For the quantum learning model Eq. (24) in the frozen QNTK limit, we then obtain the asymptotic dynamics for a general data point:


Here the kernel appearing in the inverse is restricted to the training set (note that the inverse of this restricted kernel is in general different from the kernel inverse defined on the whole space), and we denote this inverse as


Specifically, for a data point in the training set, the asymptotic residual training error vanishes, as in the optimization problem. Proofs and details of these results are given in the SM. For a general data point, however, the asymptotic value differs from the frozen QNTK result of the optimization problem, because the training set differs from the whole data space.
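In the frozen limit the model is effectively linear in the variational angles, so the asymptotic predictions reduce to kernel regression with the frozen kernel. The minimal sketch below assumes a hypothetical linearized model $z(x;\theta) = z(x;0) + J(x)\cdot\theta$ with made-up Jacobians, outputs and labels, and a single output component for brevity. It runs gradient descent and compares the result with $\varepsilon_{\mathcal A}(t) = (1-\eta K_{\mathcal{A}\mathcal{A}})^t\varepsilon_{\mathcal A}(0)$ on the training set and with $z(x;0) - K_{x\mathcal A}K_{\mathcal{A}\mathcal{A}}^{-1}\varepsilon_{\mathcal A}(0)$ at a general point, where $K_{\mathcal{A}\mathcal{A}}$ is the kernel restricted to the training set. These are the standard frozen-NTK expressions written in our own notation, not a transcription of the paper's formulas.

```python
import numpy as np

# Hypothetical frozen Jacobians d z / d theta: rows are training points, columns are angles.
J_train = np.array([[1.0, 0.5, 0.0, 0.2],
                    [0.0, 1.0, 0.3, 0.0],
                    [0.4, 0.0, 1.0, 0.5]])
J_test = np.array([0.3, -0.2, 0.6, 1.0])   # Jacobian at a general (non-training) point
z0_train = np.array([0.1, -0.4, 0.7])      # outputs at the reference angles
z0_test = 0.2
y_train = np.array([1.0, 0.0, -1.0])       # hypothetical labels

K_AA = J_train @ J_train.T                 # frozen kernel restricted to the training set
K_xA = J_test @ J_train.T                  # frozen kernel between the general point and the training set
eta, steps = 0.5, 500

theta = np.zeros(J_train.shape[1])
eps0 = z0_train - y_train
for _ in range(steps):
    eps = z0_train + J_train @ theta - y_train   # residual training errors of the linearized model
    theta = theta - eta * J_train.T @ eps        # gradient descent on 0.5 * sum(eps**2)

eps_final = z0_train + J_train @ theta - y_train
z_test_gd = z0_test + J_test @ theta

# Closed-form frozen-kernel predictions
eps_pred = np.linalg.matrix_power(np.eye(3) - eta * K_AA, steps) @ eps0
z_test_pred = z0_test - K_xA @ np.linalg.solve(K_AA, eps0)
print(eps_final, z_test_gd, z_test_pred)
```

The training residuals go to zero, while the prediction at the general point converges to a nonzero correction of its initial value, fixed entirely by the frozen kernel.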

III.3 Representation learning in the dynamical setting

In the dynamical case, the kernel changes during gradient descent, owing to the nonlinear dependence of the unitary operations on the variational angles. In this case, variational quantum circuits can naturally serve as architectures for representation learning in the classical sense.
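Away from the frozen limit, the kernel of the full nonlinear circuit is not constant. The minimal sketch below assumes a hypothetical angle-encoding feature map $R_y(x_1)\otimes R_y(x_2)|00\rangle$, a toy circuit of Pauli-string rotations $e^{i\theta_\ell X_\ell}$ interleaved with fixed CNOT gates, a single hypothetical observable $O = Z\otimes 1$, and made-up inputs and labels. It trains with plain gradient descent at a small step size and reports the relative change of the kernel restricted to the training set between initialization and the end of training. A nonzero change is the signature of representation learning.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]], dtype=complex)
PZ = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
GENERATORS = [np.kron(PX, I2), np.kron(I2, PY), np.kron(PZ, PZ), np.kron(PY, PX)]
O = np.kron(PZ, I2)  # hypothetical single observable


def rotation(theta, X):
    return np.cos(theta) * np.eye(X.shape[0]) + 1j * np.sin(theta) * X


def ry(x):
    c, s = np.cos(x / 2), np.sin(x / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def model(thetas, x):
    """Hypothetical scalar QNN output z(theta, x) = <psi(x)|U^dag O U|psi(x)>."""
    psi = np.kron(ry(x[0]), ry(x[1])) @ np.array([1, 0, 0, 0], dtype=complex)
    for theta, X in zip(thetas, GENERATORS):
        psi = CNOT @ (rotation(theta, X) @ psi)
    return np.vdot(psi, O @ psi).real


def jacobian(thetas, xs):
    """J[alpha, l] = d z(theta, x_alpha) / d theta_l via the parameter-shift rule."""
    J = np.zeros((len(xs), len(thetas)))
    for a, x in enumerate(xs):
        for l in range(len(thetas)):
            shift = np.zeros(len(thetas))
            shift[l] = np.pi / 4
            J[a, l] = model(thetas + shift, x) - model(thetas - shift, x)
    return J


def loss(thetas, xs, ys):
    return 0.5 * sum((model(thetas, x) - y) ** 2 for x, y in zip(xs, ys))


xs = np.array([[0.1, 0.5], [1.2, -0.3], [-0.7, 0.9]])  # hypothetical inputs
ys = np.array([0.8, -0.5, 0.2])                          # hypothetical labels
thetas = np.array([0.3, -1.1, 0.7, 2.0])
eta, steps = 0.01, 300

J_init = jacobian(thetas, xs)
K_init = J_init @ J_init.T                 # kernel restricted to the training set at t = 0
loss_init = loss(thetas, xs, ys)
for _ in range(steps):
    eps = np.array([model(thetas, x) for x in xs]) - ys    # residual training errors
    thetas = thetas - eta * jacobian(thetas, xs).T @ eps   # gradient of 0.5 * sum(eps**2)
J_final = jacobian(thetas, xs)
K_final = J_final @ J_final.T
loss_final = loss(thetas, xs, ys)
rel_change = np.linalg.norm(K_final - K_init) / np.linalg.norm(K_init)
print(f"loss {loss_init:.4f} -> {loss_final:.4f}, relative kernel change {rel_change:.3f}")
```

The small step size is chosen so that, for this bounded toy model, each gradient step cannot increase the loss.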

We naturally generalize the leading-order perturbation theory of optimization to the learning case, and state the main theorems here. First, we have

Theorem 4 (Performance guarantee of quantum machine learning in the dQNTK limit).

In the quantum learning model Eq. (24) at the dQNTK order, the training error is given by two contributions, a free part and an interacting part, as follows






Here is the frozen (linear) part of the QNTK. Using a matrix notation for the compact indices , in the space , we have


where is defined as




For the quantum learning model Eq. (24) at the dQNTK order, the gradient descent dynamics for a general data point is given by


where s are called the quantum algorithm projectors (see roberts2021ai; summer for their original framework),


and is defined as




Finally, is the quantum meta-kernel in the quantum machine learning context,