## I Introduction

The idea of using quantum computers for machine learning has recently received attention both in academia and industry harrow2009quantum; wiebe2012quantum; lloyd2014quantum; wittek2014quantum; wiebe2014quantum; rebentrost2014quantum; biamonte2017quantum; mcclean2018barren; schuld2019quantum; tang2019quantum; havlivcek2019supervised; huang2021power; liu2021rigorous. While proof-of-principle studies have shown that quantum computers are useful for some problems of mathematical interest liu2021rigorous, a quantum advantage in machine learning algorithms for practical applications remains unclear huang2021information. On classical architectures, a first-principles theory of machine learning, especially of so-called deep learning that uses a large number of layers, is still in development. Early developments of statistical learning theory provide rigorous guarantees on the learning capability of generic learning algorithms, but the theoretical bounds obtained from information theory are often weak in practical settings.

The theory of the neural tangent kernel (NTK) has become an important tool for understanding deep neural networks lee2017deep; jacot2018neural; lee2019wide; arora2019exact; sohl2020infinite; yang2020feature; yaida2020non. In the large-width limit, a generic neural network becomes nearly Gaussian when averaged over the initial weights and biases, and its learning capabilities become predictable. The NTK theory allows one to derive an analytical understanding of neural network dynamics, improving on statistical learning theory and shedding light on the underlying principles of deep learning dyer2019asymptotics; halverson2021neural; roberts2021ai; roberts2021principles; Liu:2021ohs. In the quantum machine learning community, a similar first-principles theory would help in understanding the training dynamics and in selecting appropriate variational quantum circuits for specific problems. A step in this direction has been considered recently for quantum-classical neural networks nakaji2021quantumenhanced. However, the framework considered there includes no variational parameters in the quantum circuits, leaving the problem of understanding and designing the quantum training dynamics unaddressed.

In this paper, we address this problem, focusing on the limit where the learning rate is sufficiently small, inspired by the classical theory of the NTK. Following the framework and results of roberts2021ai; roberts2021principles; summer, we first define a quantum analogue of the classical NTK. In the limit where the variational angles do not change much, the so-called *lazy training* chizat2018lazy, the *frozen* QNTK leads to an exponential decay of the loss function on the training set. We furthermore compute the leading-order perturbation above the static limit, where we define a quantum version of the classical *meta-kernel*. We derive closed-form formulas for the training dynamics in terms of the parameters of variational quantum circuits (see Fig. 1).

We then move to a hybrid quantum-classical neural network framework, and find that it becomes approximately Gaussian as long as the quantum outputs are sufficiently orthogonal. We present an analytic derivation of the large-width limit, in which the non-Gaussian contribution to the neuron correlations is suppressed by the large width. Interestingly, we observe that now the *width* is set by the number of independent Hermitian operators in the variational ansatz, which is upper-bounded by (a polynomial of) the dimension of the Hilbert space. Thus, a large Hilbert space will naturally bring our neural network to the large-width limit. Moreover, the orthogonality assumption on the variational ansatz can be achieved statistically using randomization. If it does not hold, the hybrid quantum-classical neural networks can still learn features even at large width, marking a significant difference compared to classical neural networks.

We test the analytical derivations of our theory against numerical experiments with the IBM quantum device simulator aleksandrowicz2019qiskit, on a classification problem in the supervised learning setting, finding good agreement with the theory. The structure of this paper and the ideas presented are summarized in Fig. 2.

## II Theory of quantum optimization

### II.1 QNTK for optimization

We start from a relatively simple example: the optimization of a quantum cost function, without a model to be learned from data associated with it. Let a variational quantum wavefunction peruzzo2014variational; farhi2014quantum; mcclean2016theory; kandala2017hardware; mcardle2020quantum; cerezo2021variational be given as

$$|\psi(\theta)\rangle = U(\theta)\,|\psi_0\rangle, \qquad U(\theta) = \prod_{\ell=1}^{L} W_\ell\, e^{i\theta_\ell X_\ell}. \qquad (1)$$

Here we have defined unitary operators of the type $U_\ell(\theta_\ell) = e^{i\theta_\ell X_\ell}$, with a variational parameter $\theta_\ell$ and a Hermitian operator $X_\ell$ associated to them. We denote the collection of all variational parameters as $\theta$ and the initial state as $|\psi_0\rangle$. Moreover, our ansatz also includes constant gates $W_\ell$ that do not depend on the variational angles.

We introduce the following mean squared error (MSE) loss function when we wish to optimize the expectation value of a Hermitian operator $O$ toward its minimal eigenvalue $O_0$, which is assumed to be known here, over the class of states of Eq. (1):

$$\mathcal{L}(\theta) = \frac{1}{2}\,\varepsilon^2, \qquad \varepsilon \equiv \langle\psi(\theta)|\,O\,|\psi(\theta)\rangle - O_0. \qquad (2)$$

Here we have defined the *residual optimization error* $\varepsilon$. When using gradient descent to optimize Eq. (2), the difference equation for the dynamics of the training parameters is given by

$$\delta\theta_\ell \equiv \theta_\ell(t+1) - \theta_\ell(t) = -\eta\,\frac{\partial\mathcal{L}}{\partial\theta_\ell} = -\eta\,\varepsilon\,\frac{\partial\varepsilon}{\partial\theta_\ell}. \qquad (3)$$

We use the notation $\delta$ to denote the difference between step $t+1$ and step $t$ of gradient descent for a given quantity, and $\eta$ to denote the learning rate. Then we also have, to linear order in $\delta\theta$,

$$\delta\varepsilon = \sum_\ell \frac{\partial\varepsilon}{\partial\theta_\ell}\,\delta\theta_\ell = -\eta\,\varepsilon \sum_\ell \left(\frac{\partial\varepsilon}{\partial\theta_\ell}\right)^2 \equiv -\eta\, K\,\varepsilon. \qquad (4)$$

The object $K$ serves to construct a toy version of the NTK in the quantum setup, in the sense that it can be seen as a one-dimensional kernel matrix with a single training datum. We can make our definition of a QNTK associated to an optimization problem more precise as follows:

###### Definition 1 (QNTK for optimization).

The quantum neural tangent kernel (QNTK) associated to the optimization problem of Eq. (2) is given by

$$K = \sum_\ell \left(\frac{\partial\varepsilon}{\partial\theta_\ell}\right)^2, \qquad (5)$$

where

$$\frac{\partial\varepsilon}{\partial\theta_\ell} = i\,\langle\psi_0|\,U_{\ell,-}^\dagger\,\big[\,U_{\ell,+}^\dagger\, O\, U_{\ell,+},\; X_\ell\,\big]\,U_{\ell,-}\,|\psi_0\rangle, \qquad (6)$$

with $U_{\ell,-}$ the part of the circuit up to and including gate $\ell$ and $U_{\ell,+}$ the part after it.
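As an illustration, the scalar QNTK of Definition 1 can be evaluated numerically for a small circuit. The following sketch is our own toy example in numpy, not the paper's code: the random Pauli-string ansatz, the choice of observable ($Z$ on the first qubit, with minimal eigenvalue $O_0 = -1$), and the finite-difference gradients are all illustrative assumptions. It builds $K$ directly as the sum of squared derivatives of the residual, as in Eq. (5):

```python
import numpy as np

# Toy QNTK of Definition 1 (a sketch, not the paper's code).
# Ansatz: |psi(theta)> = prod_l exp(i theta_l X_l) |psi0>, with X_l random
# Pauli strings on n qubits; observable O = Z on qubit 1, so O0 = -1.

rng = np.random.default_rng(0)
n, L = 3, 8
dim = 2 ** n
I2 = np.eye(2)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]])
PZ = np.diag([1.0, -1.0]).astype(complex)
paulis = [I2, PX, PY, PZ]

def kron_all(ops):
    M = ops[0]
    for op in ops[1:]:
        M = np.kron(M, op)
    return M

Xs = [kron_all([paulis[i] for i in rng.integers(1, 4, size=n)]) for _ in range(L)]
O = kron_all([PZ, I2, I2])            # observable; minimal eigenvalue O0 = -1
psi0 = np.zeros(dim, dtype=complex); psi0[0] = 1.0

def residual(theta, O0=-1.0):
    psi = psi0.copy()
    for X, t in zip(Xs, theta):
        # exp(i t X) = cos(t) I + i sin(t) X, since X^2 = I for Pauli strings
        psi = np.cos(t) * psi + 1j * np.sin(t) * (X @ psi)
    return np.real(np.vdot(psi, O @ psi)) - O0

def qntk(theta, h=1e-5):
    grad = np.array([(residual(theta + h * e) - residual(theta - h * e)) / (2 * h)
                     for e in np.eye(L)])
    return np.sum(grad ** 2)          # Eq. (5): sum of squared derivatives

theta = rng.uniform(0, 2 * np.pi, L)
print("eps =", residual(theta), " K =", qntk(theta))
```

By construction $K \ge 0$, and the residual lies in $[0, 2]$ for this choice of observable.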

### II.2 Frozen QNTK limit for optimization

An analytic theory of the NTK is established when the learning rate is sufficiently small. It is obtained by solving the coupled difference equations (3) and (4), which we report here:

$$\delta\theta_\ell = -\eta\,\varepsilon\,\frac{\partial\varepsilon}{\partial\theta_\ell}, \qquad \delta\varepsilon = -\eta\, K\,\varepsilon. \qquad (7)$$

In the continuum learning rate limit $\eta \to 0$, Eqs. (7) become coupled non-linear ordinary differential equations, which are hard to solve in general. Note that this system of equations stems from a quantum optimization problem, and in general it is classically hard even to instantiate.

Nevertheless, in the following we build an analytic model for a quantum version of the *frozen NTK* (frozen QNTK) in the regime of lazy training, where the variational angles do not change much. To be more precise, we assume that around a certain value $\theta^*$ our variational angles change only by a small amount, $\theta = \theta^* + \lambda\,\vartheta$. A typical scenario is a Taylor expansion around such values during the convergence regime, for instance. Here $\lambda$ is a small scaling parameter. We will call the limit $\lambda \to 0$ the *frozen QNTK limit*.

In this limit, one can write $e^{i\theta_\ell X_\ell} = e^{i\lambda\vartheta_\ell X_\ell}\, e^{i\theta^*_\ell X_\ell}$, so that the $\theta^*$ dependence is absorbed into the non-variational part of the unitary by defining $\widetilde W_\ell = e^{i\theta^*_\ell X_\ell}\, W_\ell$, and we have $U(\theta) = \prod_\ell \widetilde W_\ell\, e^{i\lambda\vartheta_\ell X_\ell}$. In what follows, we drop the $\lambda$ notation and understand the variational angles as small parameters that change by $\delta\theta$ around the value $\theta^*$. Then, expanding linearly for small $\delta\theta$ we can define

###### Definition 2 (Frozen QNTK for quantum optimization).

In the frozen QNTK limit, the kernel is evaluated at the reference angles $\theta^*$ and held constant during gradient descent,

$$\bar K \equiv K\big|_{\theta=\theta^*} = \sum_\ell \left(\frac{\partial\varepsilon}{\partial\theta_\ell}\right)^2\bigg|_{\theta=\theta^*}, \qquad (8)$$

so that the residual error obeys the linear difference equation

$$\delta\varepsilon = -\eta\,\bar K\,\varepsilon. \qquad (9)$$

In the frozen kernel limit, we can state the following result about the $t$ dependency of the residual error $\varepsilon(t)$, obtained by solving Eq. (7) linearly for small $\delta\theta$.

###### Theorem 1 (Performance guarantee of optimization within the frozen QNTK approximation).

When using standard gradient descent for the optimization problem of Eq. (2) within the frozen QNTK limit, the residual optimization error decays exponentially as

$$\varepsilon(t) = \left(1 - \eta\,\bar K\right)^t\,\varepsilon(0) \approx e^{-\eta \bar K t}\,\varepsilon(0), \qquad (10)$$

with a convergence rate defined as

$$\gamma = \eta\,\bar K, \qquad (11)$$

valid for a learning rate small enough that $\eta\,\bar K < 1$.

The derivation is given in the SM. An immediate consequence is that the residual error will converge to zero,

$$\lim_{t\to\infty} \varepsilon(t) = 0. \qquad (12)$$
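Theorem 1 can be probed numerically. The sketch below is an illustrative setup of our own (a random Pauli-string ansatz, finite-difference gradients, and toy hyperparameters, none of which come from the paper): it runs gradient descent on $\mathcal{L} = \varepsilon^2/2$ and compares the trajectory with the frozen-QNTK prediction of Eq. (10). With many parameters the training should be close to lazy, so the two curves should track each other at early times:

```python
import numpy as np

# Numerical sketch of Theorem 1 (our illustration, not the paper's code):
# compare eps(t) under gradient descent with (1 - eta*Kbar)^t * eps(0).

rng = np.random.default_rng(1)
n, L, eta, T = 3, 20, 0.01, 300
dim = 2 ** n
I2 = np.eye(2)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PY = np.array([[0, -1j], [1j, 0]])
PZ = np.diag([1.0, -1.0]).astype(complex)
paulis = [PX, PY, PZ]

def kron_all(ops):
    M = ops[0]
    for op in ops[1:]:
        M = np.kron(M, op)
    return M

Xs = [kron_all([paulis[i] for i in rng.integers(0, 3, size=n)]) for _ in range(L)]
O = kron_all([PZ, I2, I2])                  # minimize <Z_1> toward O0 = -1

def residual(theta):
    psi = np.zeros(dim, dtype=complex); psi[0] = 1.0
    for X, t in zip(Xs, theta):
        psi = np.cos(t) * psi + 1j * np.sin(t) * (X @ psi)  # exp(i t X)
    return np.real(np.vdot(psi, O @ psi)) + 1.0

def grad(theta, h=1e-6):
    return np.array([(residual(theta + h * e) - residual(theta - h * e)) / (2 * h)
                     for e in np.eye(L)])

theta = rng.uniform(0, 2 * np.pi, L)
eps0, Kbar = residual(theta), np.sum(grad(theta) ** 2)  # frozen QNTK at t = 0
traj = []
for t in range(T):
    traj.append(residual(theta))
    theta = theta - eta * residual(theta) * grad(theta)  # Eq. (3)

pred = eps0 * (1 - eta * Kbar) ** np.arange(T)           # Eq. (10)
print(traj[-1], pred[-1])
```

Since the observable's minimal eigenvalue is $-1$, the residual stays non-negative throughout, and both the measured and predicted errors decrease.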

### II.3 dQNTK

The frozen QNTK limit describes the regime in which the non-linearities are treated in linear approximation. Therefore, the frozen QNTK cannot reflect the non-linear nature of variational quantum algorithms. In order to formulate an analytical model of the non-linearities, we now analyze the leading-order correction in an expansion in the learning rate and in the size of the variational angle updates $\delta\theta$. We expand $\delta\varepsilon$ to second order in $\delta\theta$,

$$\delta\varepsilon = \sum_\ell \frac{\partial\varepsilon}{\partial\theta_\ell}\,\delta\theta_\ell + \frac{1}{2}\sum_{\ell_1,\ell_2} \frac{\partial^2\varepsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}\,\delta\theta_{\ell_1}\,\delta\theta_{\ell_2}. \qquad (13)$$

This time, the kernel $K$ will follow, during gradient descent, the equation roberts2021principles:

$$\delta K = \sum_\ell \frac{\partial K}{\partial\theta_\ell}\,\delta\theta_\ell = -2\,\eta\,\varepsilon \sum_{\ell_1,\ell_2} \frac{\partial\varepsilon}{\partial\theta_{\ell_1}}\,\frac{\partial\varepsilon}{\partial\theta_{\ell_2}}\,\frac{\partial^2\varepsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}. \qquad (14)$$

With this expansion at second order, we have two contributing terms in Eq. (13). We label the kernel appearing in the first term of Eq. (13) the quantum *effective* kernel, $\widetilde K$. We use $\widetilde K$ to distinguish it from $\bar K$, for which only a first-order expansion is considered in the description of the dynamics. It is dynamical in the sense that it depends on the value of the training parameters during the dynamics regulated by gradient descent.
We label the variable part of the second term in Eq. (14) the quantum *meta-kernel*, or dQNTK (differential of the QNTK):

###### Definition 3 (Quantum meta-kernel for optimization).

The quantum meta-kernel associated with the optimization problem in Eq. (2) is defined via

$$\mu = \sum_{\ell_1,\ell_2} \frac{\partial\varepsilon}{\partial\theta_{\ell_1}}\,\frac{\partial\varepsilon}{\partial\theta_{\ell_2}}\,\frac{\partial^2\varepsilon}{\partial\theta_{\ell_1}\partial\theta_{\ell_2}}, \qquad (15)$$

so that Eq. (14) reads $\delta K = -2\,\eta\,\varepsilon\,\mu$.

In the limit of small changes in $\theta$ for the optimization problem of Eq. (2), the quantum meta-kernel is given at leading order in perturbation theory in $\delta\theta$ by its frozen value,

$$\bar\mu \equiv \mu\big|_{\theta=\theta^*}. \qquad (16)$$

The residual error in the optimization problem of Eq. (2) can then be computed from

$$\delta\varepsilon = -\eta\,\bar K\,\varepsilon + \frac{\eta^2}{2}\,\bar\mu\,\varepsilon^2. \qquad (17)$$

We are now ready to make a statement about the residual error at the dQNTK order.

###### Theorem 2 (Performance guarantee of optimization from dQNTK).

In the optimization problem of Eq. (2) at the dQNTK order, we split the residual optimization error into two pieces, a free part and an interacting part,

$$\varepsilon(t) = \varepsilon_F(t) + \varepsilon_I(t). \qquad (18)$$

The free part follows the exponentially decaying dynamics

$$\varepsilon_F(t) = \left(1-\eta\,\bar K\right)^t\,\varepsilon(0), \qquad (19)$$

and the interacting part is given by

$$\varepsilon_I(t) = C\left[\left(1-\eta\,\bar K\right)^t - \left(1-\eta\,\bar K\right)^{2t}\right]. \qquad (20)$$

Here we have

$$C = \frac{\eta\,\bar\mu\,\varepsilon(0)^2}{2\,\bar K\,\left(1-\eta\,\bar K\right)}. \qquad (21)$$

Thus, the residual optimization error will always finally approach zero,

$$\lim_{t\to\infty} \varepsilon(t) = 0. \qquad (22)$$

Thus, the leading-order perturbative correction gives a contribution of relative size $O(\eta\,\bar\mu\,\varepsilon(0)/\bar K)$ with respect to the free part.
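The decomposition of Theorem 2 can be checked directly on the scalar difference equation of Eq. (17). In the sketch below (with toy values of $\eta$, $\bar K$, $\bar\mu$, and $\varepsilon(0)$ of our own choosing), we iterate Eq. (17) exactly and compare against the free solution of Eq. (19), with and without the interacting correction of Eqs. (20) and (21):

```python
import numpy as np

# Check of the dQNTK decomposition, Eqs. (18)-(21), for the scalar recursion
# delta(eps) = -eta*Kbar*eps + (eta^2/2)*mubar*eps^2  (toy numbers, a sketch).

eta, Kbar, mubar, eps0, T = 0.1, 1.0, 0.2, 0.5, 100
q = 1 - eta * Kbar

# Exact iteration of the second-order difference equation, Eq. (17)
eps, exact = eps0, []
for _ in range(T):
    exact.append(eps)
    eps = eps - eta * Kbar * eps + 0.5 * eta ** 2 * mubar * eps ** 2
exact = np.array(exact)

t = np.arange(T)
eps_free = eps0 * q ** t                                   # Eq. (19)
C = eta * mubar * eps0 ** 2 / (2 * Kbar * q)               # Eq. (21)
eps_int = C * (q ** t - q ** (2 * t))                      # Eq. (20)

err_free = np.max(np.abs(exact - eps_free))                # free part alone
err_full = np.max(np.abs(exact - (eps_free + eps_int)))    # free + interacting
print(err_free, err_full)
```

Adding the interacting part removes the leading deviation: the remaining mismatch is higher order in $\bar\mu$.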

## III Theory of learning

### III.1 General theory

The results outlined in Section II can be extended to the context of supervised learning from a data space. In particular, we are given a training set $\mathcal{A}$ contained in the data space. The data can be loaded into quantum states through a quantum feature map schuld2019quantum; havlivcek2019supervised.
We define the variational quantum ansatz with a single *layer* by regarding the output of a quantum neural network as

$$z_i(x;\theta) = \langle\psi(x)|\,U^\dagger(\theta)\,O_i\,U(\theta)\,|\psi(x)\rangle. \qquad (23)$$

Here, we assume that $O_i$ is taken from a subset of the space of Hermitian operators on the Hilbert space, the index $i$ describes the $i$-th component of the output, associated to the $i$-th operator $O_i$, and $|\psi(x)\rangle$ denotes the encoded datum $x$. The above *Hermitian operator expectation value evaluation* model is a common definition of the quantum neural network. One could also measure the real and imaginary parts of amplitudes directly to define a complexified version of the quantum neural network, useful in the context of amplitude encoding for the data, as discussed in the Supplementary Material.
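A minimal sketch of the expectation-value model of Eq. (23) in numpy; the single-qubit angle-encoding feature map, the variational layer, and the measured operators below are our own illustrative assumptions, not the paper's construction:

```python
import numpy as np

# Toy instance of Eq. (23): z_i = <psi(x; theta)| O_i |psi(x; theta)>,
# with x encoded by X-rotations and theta acting through X-rotations too.

I2 = np.eye(2)
PX = np.array([[0, 1], [1, 0]], dtype=complex)
PZ = np.diag([1.0, -1.0]).astype(complex)

def kron_all(ops):
    M = ops[0]
    for op in ops[1:]:
        M = np.kron(M, op)
    return M

n = 2
O_list = [kron_all([PZ, I2]), kron_all([I2, PZ])]   # measured Hermitian operators

def rot(angle, P, psi):
    # exp(i * angle * P) for P with P^2 = I
    return np.cos(angle) * psi + 1j * np.sin(angle) * (P @ psi)

def qnn_output(x, theta):
    psi = np.zeros(2 ** n, dtype=complex); psi[0] = 1.0
    for q in range(n):                               # feature map: encode x
        P = kron_all([PX if k == q else I2 for k in range(n)])
        psi = rot(x[q], P, psi)
    for q in range(n):                               # variational layer: theta
        P = kron_all([PX if k == q else I2 for k in range(n)])
        psi = rot(theta[q], P, psi)
    return np.array([np.real(np.vdot(psi, O @ psi)) for O in O_list])

z = qnn_output(np.array([0.1, 0.7]), np.array([0.2, -0.3]))
print(z)    # each component is a bounded expectation value in [-1, 1]
```

Here the two rotations on each qubit share a generator, so the outputs reduce to $\cos(2(x_q+\theta_q))$, which makes the toy model easy to verify by hand.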
We are now in the position of introducing the loss function

$$\mathcal{L}(\theta) = \frac{1}{2}\sum_{x\in\mathcal{A}}\sum_i \varepsilon_i(x)^2, \qquad \varepsilon_i(x) \equiv z_i(x;\theta) - y_i(x). \qquad (24)$$

Here, we call $\varepsilon_i(x)$ the residual training error, $\mathcal{A}$ denotes the training set, and we assume the label $y_i(x)$ is associated with the encoded data $x$. Now, similarly to what is described in Section II.1, we have the gradient descent equation

$$\delta\theta_\ell = -\eta \sum_{x\in\mathcal{A}}\sum_i \varepsilon_i(x)\,\frac{\partial\varepsilon_i(x)}{\partial\theta_\ell}, \qquad (25)$$

with an associated kernel

$$K_{i_1 i_2}(x_1, x_2) = \sum_\ell \frac{\partial\varepsilon_{i_1}(x_1)}{\partial\theta_\ell}\,\frac{\partial\varepsilon_{i_2}(x_2)}{\partial\theta_\ell}. \qquad (26)$$

To ease the notation, we shall define the joint index

$$\alpha \equiv (x, i), \qquad (27)$$

whose components run over the data space and the set of measured operators, respectively (we use a tilde, $\tilde\alpha$, to indicate that the corresponding data component is in the sample set $\mathcal{A}$, while a general data point is denoted by a plain $\alpha$), and our gradient descent equations are

$$\delta\varepsilon_\alpha = -\eta \sum_{\tilde\beta} K_{\alpha\tilde\beta}\,\varepsilon_{\tilde\beta}. \qquad (28)$$

It is possible to show that this kernel is always positive semidefinite and Hermitian; see the Supplementary Material for a proof. Now, recalling Eq. (1), we are in the position to give an analytical expression for the QNTK for a supervised learning problem as follows. Details on the derivation can be found in the Supplementary Material.

###### Definition 4 (QNTK for quantum machine learning).

The QNTK for the quantum learning model of Eq. (24) is given by

$$K_{\alpha\tilde\beta} = \sum_\ell \frac{\partial\varepsilon_\alpha}{\partial\theta_\ell}\,\frac{\partial\varepsilon_{\tilde\beta}}{\partial\theta_\ell}. \qquad (29)$$
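The positive semidefiniteness of the kernel is immediate from its Gram form: writing $J_{\alpha\ell} = \partial\varepsilon_\alpha/\partial\theta_\ell$, the kernel of Eq. (29) is $K = J J^{\mathsf T}$. A minimal numerical illustration, with a random Jacobian standing in for the true circuit derivatives (an assumption for demonstration only):

```python
import numpy as np

# The QNTK has the Gram form K = J J^T with J[alpha, l] = d eps_alpha / d theta_l,
# so it is Hermitian and positive semidefinite regardless of the circuit.

rng = np.random.default_rng(7)
n_outputs, n_params = 6, 12
J = rng.normal(size=(n_outputs, n_params))   # stand-in for the derivative matrix
K = J @ J.T                                  # kernel of Eq. (29)

eigs = np.linalg.eigvalsh(K)
print(eigs.min())                            # non-negative up to rounding
```

The same argument shows that the rank of $K$ is at most the number of variational parameters, which is why overparameterization tends to make the kernel well-conditioned on the training set.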

### III.2 Absence of representation learning in the frozen limit

In the frozen QNTK case, the kernel is static, and the learning algorithm cannot learn *features* from the data. In the same fashion as Section II.2, we take the *frozen QNTK limit*, where the changes of the variational angles are small. Using the previous notation, we can define the QNTK for quantum machine learning in the frozen limit, and a performance guarantee for the training error in this regime, as follows.

###### Definition 5 (Frozen QNTK for quantum machine learning).

In the quantum learning model of Eq. (24) with the frozen QNTK limit,

$$\bar K_{\alpha\tilde\beta} = K_{\alpha\tilde\beta}\big|_{\theta=\theta^*}. \qquad (30)$$

###### Theorem 3 (Performance guarantee of quantum machine learning in the frozen QNTK limit).

In the quantum learning model of Eq. (24) with the frozen QNTK limit, the residual training error decays exponentially during gradient descent as

$$\varepsilon_{\tilde\alpha}(t) = \sum_{\tilde\beta}\left[\left(1-\eta\,\bar K\right)^t\right]_{\tilde\alpha\tilde\beta}\,\varepsilon_{\tilde\beta}(0). \qquad (31)$$

The convergence rate is defined as

$$\gamma = \eta\,\lambda_{\min}(\bar K), \qquad (32)$$

where $\lambda_{\min}(\bar K)$ is the smallest eigenvalue of the frozen kernel restricted to the training set.

Then we obtain for the quantum learning model of Eq. (24) with the frozen QNTK limit that the asymptotic dynamics on a general index $\alpha$ is given by

$$\lim_{t\to\infty} \varepsilon_\alpha(t) = \varepsilon_\alpha(0) - \sum_{\tilde\beta,\tilde\gamma} \bar K_{\alpha\tilde\beta}\,\bar K^{\tilde\beta\tilde\gamma}\,\varepsilon_{\tilde\gamma}(0). \qquad (33)$$

Here $\bar K^{\tilde\beta\tilde\gamma}$ (with raised indices) means the inverse of the kernel restricted to the sample space (note that it is different from the kernel inverse defined on the whole space in general), and we denote this kernel inverse through

$$\sum_{\tilde\gamma} \bar K^{\tilde\alpha\tilde\gamma}\,\bar K_{\tilde\gamma\tilde\beta} = \delta^{\tilde\alpha}_{\ \tilde\beta}. \qquad (34)$$

Specifically, if we take $\alpha$ to indicate a data point in the sample set, we will have $\lim_{t\to\infty}\varepsilon_{\tilde\alpha}(t) = 0$. Proofs and details of these results are given in the SM. Moreover, the asymptotic value is different from the frozen QNTK case of the optimization problem, because of the difference between the training set and the whole data space.
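The structure of Eq. (33) is that of kernel regression, and it is exact for a model that is linear in its parameters, where the kernel does not drift at all. The sketch below (our own toy linear-feature model, not the quantum circuit of the paper) trains by gradient descent and compares the residual on held-out points with the prediction $\varepsilon_\alpha(0) - \sum \bar K_{\alpha\tilde\beta}\,\bar K^{\tilde\beta\tilde\gamma}\,\varepsilon_{\tilde\gamma}(0)$:

```python
import numpy as np

# Frozen-limit asymptotics, Eqs. (33)-(34), checked on a linear model where
# the frozen kernel is exact (toy features; an illustration, not the paper's setup).

rng = np.random.default_rng(3)
n_train, n_test, n_params = 5, 3, 20
Phi = rng.normal(size=(n_train, n_params))       # training features
Phi_star = rng.normal(size=(n_test, n_params))   # general (held-out) features
y = rng.normal(size=n_train)
y_star = rng.normal(size=n_test)

theta = rng.normal(size=n_params)
eps0_train = Phi @ theta - y
eps0_test = Phi_star @ theta - y_star

K = Phi @ Phi.T                                  # kernel on the training set
K_cross = Phi_star @ Phi.T                       # kernel between general and train
predicted = eps0_test - K_cross @ np.linalg.solve(K, eps0_train)   # Eq. (33)

eta = 0.01
for _ in range(20000):                           # gradient descent on the MSE loss
    eps_train = Phi @ theta - y
    theta = theta - eta * Phi.T @ eps_train

observed = Phi_star @ theta - y_star
print(np.max(np.abs(observed - predicted)))      # agreement to numerical precision
```

For the quantum model the same formula holds only in the lazy regime, where the drift of the kernel during training is negligible.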

### III.3 Representation learning in the dynamical setting

In the dynamical case, the kernel changes during the gradient descent optimization, due to the non-linearity of the unitary operations. In this case, the variational quantum circuits can naturally serve as architectures for representation learning in the classical sense.

We generalize the leading-order perturbation theory of the optimization setting naturally to the learning case, and we state the main theorems here. First, we have

###### Theorem 4 (Performance guarantee of quantum machine learning in the dQNTK limit).

In the quantum learning model of Eq. (24) at the dQNTK order, the training error is given by two contributions, a free and an interacting part, as follows:

$$\varepsilon_{\tilde\alpha}(t) = \varepsilon_{F,\tilde\alpha}(t) + \varepsilon_{I,\tilde\alpha}(t), \qquad (35)$$

where

$$\varepsilon_{F,\tilde\alpha}(t) = \sum_{\tilde\beta}\left[\left(1-\eta\,\bar K\right)^t\right]_{\tilde\alpha\tilde\beta}\,\varepsilon_{\tilde\beta}(0), \qquad (36)$$

and

(37) |

Here $\bar K$ is the frozen (linear) part of the QNTK. Using a matrix notation for the compact indices $\tilde\alpha$ in the sample space, we have

(38) |

where is defined as

(39) |

and

(40) |

For the quantum learning model of Eq. (24) at the dQNTK order, the dynamics generated by gradient descent on a general data point is given by

(41) |

where the coefficients are called the quantum algorithm projectors (see roberts2021ai; summer for the original framework),

(42) |

and is defined as

(43) |

or

(44) |

Finally, the last ingredient is the quantum meta-kernel in the quantum machine learning context,
