Continuous-variable quantum neural networks

06/18/2018
by Nathan Killoran, et al.

We introduce a general method for building neural networks on quantum computers. The quantum neural network is a variational quantum circuit built in the continuous-variable (CV) architecture, which encodes quantum information in continuous degrees of freedom such as the amplitudes of the electromagnetic field. This circuit contains a layered structure of continuously parameterized gates which is universal for CV quantum computation. Affine transformations and nonlinear activation functions, two key elements in neural networks, are enacted in the quantum network using Gaussian and non-Gaussian gates, respectively. The non-Gaussian gates provide both the nonlinearity and the universality of the model. Due to the structure of the CV model, the CV quantum neural network can encode highly nonlinear transformations while remaining completely unitary. We show how a classical network can be embedded into the quantum formalism and propose quantum versions of various specialized models such as convolutional, recurrent, and residual networks. Finally, we present numerous modeling experiments built with the Strawberry Fields software library. These experiments, including a classifier for fraud detection, a network which generates Tetris images, and a hybrid classical-quantum autoencoder, demonstrate the capability and adaptability of CV quantum neural networks.


I Introduction

After many years of scientific development, quantum computers are now beginning to move out of the lab and into the mainstream. Over those years of research, many powerful algorithms and applications for quantum hardware have been established. In particular, the potential for quantum computers to enhance machine learning is truly exciting biamonte2017quantum ; wittek2014quantum ; schuld2018book . Sufficiently powerful quantum computers can in principle provide computational speedups for key machine learning algorithms and subroutines such as data fitting wiebe2012quantum , principal component analysis lloyd2014quantum , Bayesian inference low2014quantum ; wiebe2015can , Monte Carlo methods montanaro2015quantum , support vector machines rebentrost2014quantum ; schuld2018quantum , Boltzmann machines amin2018quantum ; kieferova2016tomography , and recommendation systems kerenidis2016quantum .

On the classical computing side, there has recently been a renaissance in machine learning techniques based on neural networks, forming the new field of deep learning lecun2015deep ; schmidhuber2015deep ; goodfellow2016deep . This breakthrough is being fueled by a number of technical factors, including new software libraries bergstra2010theano ; jia2014caffe ; maclaurin2015autograd ; abadi2016tensorflow ; paszke2017automatic and powerful special-purpose computational hardware chetlur2014cudnn ; jouppi2017datacenter . Rather than the conventional bit registers found in digital computing, the fundamental computational units in deep learning are continuous vectors and tensors which are transformed in high-dimensional spaces. At the moment, these continuous computations are still approximated using conventional digital computers. However, new specialized computational hardware is currently being engineered which is fundamentally analog in nature mead1990neuromorphic ; poon2011neuromorphic ; appeltant2011information ; tait2014broadcast ; monroe2014neuromorphic ; tait2014photonic ; vandoorne2014experimental ; shen2017deep .

Quantum computation is a paradigm that furthermore includes nonclassical effects such as superposition, interference, and entanglement, giving it potential advantages over classical computing models. Together, these ingredients make quantum computers an intriguing platform for exploring new types of neural networks, in particular hybrid classical-quantum schemes romero2017quantum ; wan2017quantum ; verdon2017quantum ; farhi2018classification ; mitarai2018quantum ; schuld2018circuit ; grant2018hierarchical ; chen2018universal . Yet the familiar qubit-based quantum computer has the drawback that it is not wholly continuous, since the measurement outputs of qubit-based circuits are generally discrete. Rather, it can be thought of as a type of digital quantum hardware adesso2014continuous , only partially suited to continuous-valued problems perdomo2017opportunities ; benedetti2018quantum .

The quantum computing architecture which is most naturally continuous is the continuous-variable (CV) model. Intuitively, the CV model leverages the wave-like properties of nature. Quantum information is encoded not in qubits, but in the quantum states of fields, such as the electromagnetic field, making it ideally suited to photonic hardware. The standard observables in the CV picture, e.g., position x̂ or momentum p̂, have continuous outcomes. Importantly, qubit computations can be embedded into the quantum field picture gottesman2001encoding ; knill2001scheme , so there is no loss in computational power by taking the CV approach. Recently, the first steps towards using the CV model for machine learning have begun to be explored, showing how several basic machine learning primitives can be built in the CV setting lau2017quantum ; das2018continuous . As well, a kernel-based classifier using a CV quantum circuit was trained in schuld2018quantum . Beyond these early forays, the CV model remains largely unexplored territory as a setting for machine learning.

In this work, we show that the CV model gives a native architecture for building neural network models on quantum computers. We propose a variational quantum circuit which straightforwardly extends the notion of a fully connected layer structure from classical neural networks to the quantum realm. This quantum circuit contains a continuously parameterized set of operations which are universal for CV quantum computation. By stacking multiple building blocks of this type, we can create multilayer quantum networks which are increasingly expressive. Since the network is made from a universal set of gates, this architecture can also provide a quantum advantage: for certain problems, a classical neural network would require exponentially many resources to approximate the quantum network. Furthermore, we show how to embed classical neural networks into a CV quantum network by restricting to the special case where the gates and parameters of the network do not create any superposition or entanglement.

This paper is organized as follows. In Sec. II, we review the key concepts from deep learning and from quantum computing which set up the remainder of the paper. We then introduce our basic continuous-variable quantum neural network model in Sec. III and explore it in detail theoretically. In Sec. IV, we validate and showcase the CV quantum neural network architecture through several machine learning modeling experiments. We conclude with some final thoughts in Sec. V.

II Overview

In this section, we give a high-level synopsis of both deep learning and the CV model. To make this work more accessible to practitioners from diverse backgrounds, we will defer the more technical points to later sections. Both deep learning and CV quantum computation are rich fields; further details can be found in various review papers and textbooks lecun2015deep ; goodfellow2016deep ; ferraro2005gaussian ; weedbrook2012gaussian ; adesso2014continuous ; serafini2017quantum .

II.1 Neural networks and deep learning

The fundamental construct in deep learning is the feedforward neural network (also known as the multilayer perceptron) goodfellow2016deep . Over time, this key element has been augmented with additional structure – such as convolutional feature maps lecun1989backpropagation , recurrent connections rumelhart1986learning , attention mechanisms ba2014multiple , or external memory graves2014neural – for more specialized or advanced use cases. Yet the basic recipe remains largely the same: a multilayer structure, where each layer consists of a linear transformation followed by a nonlinear ‘activation’ function. Mathematically, for an input vector x ∈ ℝⁿ, a single layer L performs the transformation

(1)   L(x) = φ(Wx + b),

where W is a matrix, b is a vector, and φ is the nonlinear function. The objects W and b – called the weight matrix and the bias vector, respectively – are made up of free parameters w_ij and b_i. Typically, the activation function φ contains no free parameters and acts element-wise on its inputs.

The ‘deep’ in deep learning comes from stacking multiple layers of this type together, so that the output of one layer is used as an input for the next. In general, each layer will have its own independent weight and bias parameters. Summarizing all model parameters by the parameter set θ, an N-layer neural network model is given by

(2)   f(x) = L_N ∘ ⋯ ∘ L_1(x),

and maps an input x to a final output y = f(x).
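Eqs. (1)-(2) translate into a few lines of NumPy (a minimal sketch; the layer sizes, random initialization, and tanh activation are illustrative choices, not prescriptions from the text):

```python
import numpy as np

def layer(x, W, b, phi=np.tanh):
    # Eq. (1): affine transformation followed by an elementwise nonlinearity
    return phi(W @ x + b)

def network(x, params, phi=np.tanh):
    # Eq. (2): compose N layers, feeding each output into the next layer
    for W, b in params:
        x = layer(x, W, b, phi)
    return x

rng = np.random.default_rng(0)
# A toy 3 -> 4 -> 2 network with randomly initialized weights and biases
params = [(rng.normal(size=(4, 3)), rng.normal(size=4)),
          (rng.normal(size=(2, 4)), rng.normal(size=2))]
y = network(rng.normal(size=3), params)
print(y.shape)  # (2,)
```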

Building machine learning models with multilayer neural networks is well-motivated because of various universality theorems hornik1989multilayer ; cybenko1989approximation ; leshno1993multilayer . These theorems guarantee that, provided enough free parameters, feedforward neural networks can approximate any continuous function on a closed and bounded subset of ℝⁿ to an arbitrary degree of accuracy. While the original theorems showed that two layers were sufficient for universal function approximation, deeper networks can be more powerful and more efficient than shallower networks with the same number of parameters maass1994comparison ; montufar2014universal ; lin2017does .

The universality theorems prove the power of the neural network model for approximating functions, but those theorems do not say anything about how to actually find this approximation. Typically, the function to be fitted is not explicitly known, but rather its input-output relation is to be inferred from data. How can we adjust the network parameters so that it fits the given data? For this task, the workhorse is the stochastic gradient descent algorithm bottou1998online , which fits a neural network model to data by estimating derivatives of an objective function with respect to the model’s parameters – the weights and biases – and using gradient descent to minimize it. Combined with a sufficiently large dataset, neural networks trained via stochastic gradient descent have shown remarkable performance for a variety of tasks across many application areas lecun2015deep ; goodfellow2016deep .

II.2 Quantum computing and the CV model

The quantum analogue of the classical bit is the qubit. The quantum states of a many-qubit system are normalized vectors in a complex Hilbert space. Various attempts have been made over the years to encode neural networks and neural-network-like structures into qubit systems, with varying degrees of success schuld2014quest . One can roughly distinguish two strategies. There are approaches that encode inputs into the amplitude vector of a multiqubit state and interpret unitary transformations as neural network layers. These models require indirect techniques to introduce the crucial nonlinearity of the activation function, which often lead to a nonnegligible probability for the algorithm to fail torrontegui18 ; cao17 ; schuld18cc . Other approaches, which encode each input bit into a separate qubit farhi18 ; wan17 , have an overhead stemming from the need to binarize the continuous values. Furthermore, the typical neural network structure of matrix multiplication and nonlinear activations becomes cumbersome to translate into a quantum algorithm, and the advantages of doing so are not always apparent. Due to these constraints, qubit architectures are arguably not the most flexible quantum frameworks for encoding neural networks, which have continuous real-valued inputs and outputs.

Fortunately, qubits are not the sole medium available for quantum information processing. An alternate quantum computing architecture, the CV model lloyd1999quantum , is a much better fit with the continuous picture of computation underlying neural networks. The CV formalism has a long history, and can be physically realized using optical systems andersen2015hybrid ; yoshikawa2016invited , in the microwave regime moon2005theory ; peropadre2016proposal ; girvin2017schrodinger , and using ion traps shen2014scalable ; meekhof1996generation ; monroe1996schrodinger . In the CV model, information is carried in the quantum states of bosonic modes, often called qumodes, which form the ‘wires’ of a quantum circuit. Continuous-variable quantum information can be encoded using two related pictures: the wavefunction representation schrodinger1926quantisierung ; schrodinger1926undulatory and the phase space formulation of quantum mechanics weyl1927quantenmechanik ; wigner1932quantum ; groenewold1946principles ; moyal1949quantum . In the former, we specify a single continuous variable, say the position x, and represent the state of the qumode through a complex-valued function of this variable called the wavefunction ψ(x). Concretely, we can interpret x as a position coordinate, and |ψ(x)|² as the probability density of a particle being located at x. From elementary quantum theory, we can also use a wavefunction based on a conjugate momentum variable, ψ(p). Instead of position and momentum, x and p can equivalently be pictured as the real and imaginary parts of a quantum field, such as light.

In the phase space picture, we treat the conjugate variables x and p on equal footing, giving a connection to classical Hamiltonian mechanics. Thus, the state of a single qumode is encoded with two real-valued variables (x, p) ∈ ℝ². For N qumodes, the phase space employs 2N real variables. Qumode states are represented as real-valued functions F(x, p) in phase space called quasiprobability distributions. ‘Quasi’ refers to the fact that these functions share some, but not all, properties with classical probability distributions. Specifically, quasiprobability functions can be negative. While normalization forces qubit systems to have a unitary geometry, normalization gives a much looser constraint in the CV picture, namely that the function F(x, p) has unit integral over the phase space. Qumode states also have a representation as vectors or density matrices in the countably infinite Hilbert space spanned by the Fock states |n⟩, which are the eigenstates of the photon number operator n̂. These basis states represent the particle-like nature of qumode systems, with n denoting the number of particles. This is analogous to how square-integrable functions can be expanded using a countable basis set like sines or cosines.

The phase space and Hilbert space formulations give equivalent predictions. Thus, CV quantum systems can be explored from both a wave-like and a particle-like perspective. We will mainly concentrate on the former.

Gaussian operations

There is a key distinction in the CV model between the quantum gates which are Gaussian and those which are not. In many ways, the Gaussian gates are the “easy” operations for a CV quantum computer. The simplest single-mode Gaussian gates are rotation R(φ), displacement D(α), and squeezing S(r). The basic two-mode Gaussian gate is the (phaseless) beamsplitter BS(θ), which can be understood as a rotation between two qumodes. More explicitly, these Gaussian gates produce the following transformations on phase space:

(3)   R(φ): (x, p) ↦ (x cos φ − p sin φ, x sin φ + p cos φ),
(4)   D(α): (x, p) ↦ (x + √2 Re(α), p + √2 Im(α)),
(5)   S(r): (x, p) ↦ (e⁻ʳ x, eʳ p),
(6)   BS(θ): (x₁, x₂, p₁, p₂) ↦ (x₁ cos θ − x₂ sin θ, x₁ sin θ + x₂ cos θ, p₁ cos θ − p₂ sin θ, p₁ sin θ + p₂ cos θ).

The ranges for the parameter values are φ, θ ∈ [0, 2π), α ∈ ℂ, and r ∈ ℝ.
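For concreteness, the phase-space maps (3)-(6) can be written out directly (a NumPy sketch following the conventions above):

```python
import numpy as np

def rotation(x, p, phi):
    # Eq. (3): rotate the phase space point (x, p) by angle phi
    return x * np.cos(phi) - p * np.sin(phi), x * np.sin(phi) + p * np.cos(phi)

def displacement(x, p, alpha):
    # Eq. (4): shift by the real and imaginary parts of the complex alpha
    return x + np.sqrt(2) * alpha.real, p + np.sqrt(2) * alpha.imag

def squeezing(x, p, r):
    # Eq. (5): compress x and stretch p (or vice versa for negative r)
    return np.exp(-r) * x, np.exp(r) * p

def beamsplitter(x1, p1, x2, p2, theta):
    # Eq. (6): rotate between two qumodes, acting identically on x and p
    c, s = np.cos(theta), np.sin(theta)
    return c * x1 - s * x2, c * p1 - s * p2, s * x1 + c * x2, s * p1 + c * p2

# A quarter rotation maps (x, p) = (1, 0) to (0, 1)
print(rotation(1.0, 0.0, np.pi / 2))
```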

Notice that most of these Gaussian operations have names suggestive of a linear character. Indeed, there is a natural correspondence between Gaussian operations and affine transformations on phase space. For a system of N modes, the most general Gaussian transformation has the effect

(7)   (x, p)ᵀ ↦ M (x, p)ᵀ + √2 (α_r, α_i)ᵀ,

where M is a real-valued symplectic matrix and α = α_r + iα_i ∈ ℂᴺ is a complex vector with real/imaginary parts α_r, α_i ∈ ℝᴺ. This native affine structure will be our key for building quantum neural networks.

A matrix M is symplectic if it satisfies the relation MΩMᵀ = Ω, where

(8)   Ω = [ 0  I_N ; −I_N  0 ]

is the symplectic form. A generic symplectic matrix M can be split into a type of singular-value decomposition – known as the Euler or Bloch-Messiah decomposition weedbrook2012gaussian ; serafini2017quantum – of the form

(9)   M = K₂ [ Σ  0 ; 0  Σ⁻¹ ] K₁,

where Σ = diag(c₁, …, c_N) with cᵢ > 0, and K₁ and K₂ are real-valued matrices which are symplectic and orthogonal. A matrix K with these two properties must have the form

(10)   K = [ C  D ; −D  C ],

with

(11)   CCᵀ + DDᵀ = I,
(12)   CDᵀ − DCᵀ = 0.
We will also need later the fact that if C is an arbitrary orthogonal matrix, then K = [ C  0 ; 0  C ] is both orthogonal and symplectic. Importantly, the intersection of the symplectic and orthogonal groups on 2N dimensions is isomorphic to the unitary group on N dimensions: every such K has the form of Eq. (10) with U = C + iD unitary. This isomorphism allows us to perform the transformations K₁ and K₂ via the unitary action of passive linear optical interferometers.
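This correspondence is easy to verify numerically: writing a unitary as U = C + iD and embedding it as in Eq. (10) yields a real matrix that is both orthogonal and symplectic (a NumPy check; the random unitary is generated via a QR decomposition):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4
# Random N x N unitary via QR decomposition of a complex Gaussian matrix
U, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
C, D = U.real, U.imag

# Embed U = C + iD as a real 2N x 2N matrix, Eq. (10)
K = np.block([[C, D], [-D, C]])

# Symplectic form, Eq. (8)
Omega = np.block([[np.zeros((N, N)), np.eye(N)],
                  [-np.eye(N), np.zeros((N, N))]])

print(np.allclose(K @ K.T, np.eye(2 * N)))  # True: K is orthogonal
print(np.allclose(K @ Omega @ K.T, Omega))  # True: K is symplectic
```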

Every Gaussian transformation on N modes (Eq. (7)) can be decomposed into a CV circuit containing only the basic gates mentioned above. Looking back to Eqs. (3)-(6), we can recognize that interferometers made up of R and BS gates are sufficient to generate the orthogonal transformations K₁ and K₂, while S gates are sufficient to give the scaling transformation Σ ⊕ Σ⁻¹. Finally, displacement gates complete the full affine transformation. Alternatively, we could have defined the Gaussian transformations as those quantum circuits which contain only the gates given above. The Gaussian transformations are so-named because they map the set of Gaussian distributions in phase space to itself.

Classical                     CV quantum computing
--------------------------    ----------------------
feedforward neural network    CV variational circuit
weight matrix                 symplectic matrix
bias vector                   displacement vector
affine transformations        Gaussian gates
nonlinear function            non-Gaussian gate
weight/bias parameters        gate parameters
variable x                    operator x̂
derivative ∂/∂x               conjugate operator p̂
no classical analogue         superposition
no classical analogue         entanglement

Table 1: Conceptual correspondences between classical neural networks and CV quantum computing. Some concepts from the quantum side have no classical analogue.

Universality in the CV model

Similar to neural networks, quantum computing comes with its own inherent notions of ‘universality.’ To define universality in the CV model, we need to first introduce operator versions of the phase space variables, namely x̂ and p̂. The operator x̂ has a spectrum consisting of the entire real line:

(13)   x̂ = ∫ x |x⟩⟨x| dx,

where the vectors |x⟩ are orthogonal, ⟨x|x′⟩ = δ(x − x′). This operator is not trace-class, and the vectors |x⟩ are not normalizable. In the phase space representation, the eigenstates |x⟩ correspond to ellipses centered at the position value x which are infinitely squeezed, i.e., infinitesimal along the x-axis and correspondingly infinite in extent on the p-axis. The conjugate operator p̂ has a similar structure:

(14)   p̂ = ∫ p |p⟩⟨p| dp,

where ⟨p|p′⟩ = δ(p − p′) and ⟨x|p⟩ ∝ e^{ixp}. Each qumode of a CV quantum computer is associated with a pair of operators (x̂ᵢ, p̂ᵢ). For multiple modes, we combine the associated operators together into the vectors x̂ = (x̂₁, …, x̂_N) and p̂ = (p̂₁, …, p̂_N).

These operators have the commutator [x̂ᵢ, p̂ⱼ] = iδᵢⱼ, which leads to the famous uncertainty relation for simultaneous measurements of x̂ and p̂. Connecting to Eq. (3), we can associate p̂ with a π/2-rotation of the operator x̂; more concretely, the eigenstate |p⟩ is the Fourier transform of |x⟩. Indeed, we can transform between x̂ and p̂ with the special rotation gate F := R(π/2). Using a functional representation, the operator x̂ has the effect of multiplication, (x̂ψ)(x) = xψ(x). In this same representation, p̂ is proportional to the derivative operator, p̂ = −i ∂/∂x, as expected from the theory of Fourier transforms.
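The statement that p̂ acts as −i ∂/∂x in the position representation can be checked numerically with a spectral derivative (a NumPy sketch; the grid and the Gaussian test state are arbitrary illustrative choices):

```python
import numpy as np

# Discretize a wavefunction on a grid and apply p = -i d/dx in Fourier space
N = 256
x = np.linspace(-10, 10, N, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(N, d=dx)   # momentum-space grid

psi = np.exp(-x**2 / 2)                   # Gaussian test state (unnormalized)

# p psi = -i psi'(x): multiply by k in Fourier space, then transform back
p_psi = np.fft.ifft(k * np.fft.fft(psi))

# Analytically, -i d/dx exp(-x^2/2) = i x exp(-x^2/2)
print(np.allclose(p_psi, 1j * x * psi, atol=1e-6))  # True
```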

Universality of the CV model is defined as the ability to approximate arbitrary transformations of the form

(15)   U = exp(−itH),

where the generator H = H(x̂, p̂) is a polynomial function of (x̂, p̂) with arbitrary but fixed degree lloyd1999quantum . Crucially, such transformations are unitary in the Hilbert space picture, but can have complex nonlinear effects in the phase space picture, a fact that we later make use of for designing quantum neural networks. A set of gates is universal if it can be used to build any such U through a polynomial-depth quantum circuit. In fact, a universal gate set for CV quantum computing consists of the following ingredients: all the Gaussian transformations from Eqs. (3)-(6), combined with any single non-Gaussian transformation, which corresponds to a nonlinear function on the phase space variables (x, p). This is analogous to classical neural networks, where affine transformations combined with a single class of nonlinearity are sufficient to universally approximate functions. Commonly encountered non-Gaussian gates are the cubic phase gate V(γ) = exp(iγx̂³) and the Kerr gate K(κ) = exp(iκn̂²).

III Continuous-variable quantum neural networks

Figure 1: The circuit structure for a single layer of a CV quantum neural network: an interferometer, local squeeze gates, a second interferometer, local displacements, and finally local non-Gaussian gates. The first four components carry out an affine transformation, followed by a final nonlinear transformation.

In this section, we present a scheme for quantum neural networks using the CV framework. It is inspired by two sources. First, the structure of classical neural networks, which are universal function approximators and have demonstrated impressive performance on many practical problems. Second, variational quantum circuits, which have recently become the predominant way of thinking about algorithms on near-term quantum devices peruzzo2014variational ; moll2017quantum ; verdon2017quantum ; farhi2018classification ; schuld2018quantum ; schuld2018circuit ; dallaire2018quantum ; benedetti2018adversarial ; havlicek2018supervised . The main idea is the following: the fully connected neural network architecture provides a powerful and intuitive ansatz for designing variational circuits in the CV model.

We will first introduce the most general form of the quantum neural network, which is the analogue of a classical fully connected network. We then show how a classical neural network can be embedded into the quantum formalism as a special case (where no superposition or entanglement is created), and discuss the universality and computational complexity of the fully quantum network. As modern deep learning has moved beyond the basic feedforward architecture, considering ever more specialized models, we will also discuss how to extend or specialize the quantum neural network to various other cases, specifically recurrent, convolutional, and residual networks. In Table 1, we give a high-level matching between neural network concepts and their CV analogues.

Figure 2: An example multilayer continuous-variable quantum neural network. In this example, the later layers are progressively decreased in size. Qumodes can be removed either by explicitly measuring them or by tracing them out. The network input can be classical, e.g., by displacing each qumode according to data, or quantum. The network output is retrieved via measurements on the final qumode(s).

III.1 Fully connected quantum layers

A general CV quantum neural network is built up as a sequence of layers, with each layer containing every gate from the universal gate set. Specifically, a layer L consists of the successive gate sequence shown in Fig. 1:

(16)   L := Φ ∘ D ∘ U₂ ∘ S ∘ U₁,

where U₁ and U₂ are general N-port linear optical interferometers containing beamsplitter and rotation gates, D = ⊗ᵢ D(αᵢ) and S = ⊗ᵢ S(rᵢ) are collective displacement and squeezing operators (acting independently on each mode), and Φ = Φ(λ) is some non-Gaussian gate, e.g., a cubic phase or Kerr gate. The collective gate variables (θ, φ, r, α, λ) form the free parameters of the network, where λ can be optionally kept fixed.

The sequence of Gaussian transformations D ∘ U₂ ∘ S ∘ U₁ is sufficient to parameterize every possible unitary affine transformation on N qumodes. In the phase space picture, this corresponds to the transformation of Eq. (7). This sequence thus has the role of a ‘fully connected’ matrix transformation. Interestingly, adding a nonlinearity uses the same component that adds universality: a non-Gaussian gate Φ. Writing the phase space variables collectively as x̄, we can write the combined transformation in a form reminiscent of Eq. (1), namely

(17)   L(x̄) = Φ(Mx̄ + ᾱ).

Thanks to the CV encoding, we get a nonlinear functional transformation while still keeping the quantum circuit unitary.

Similar to the classical setup, we can stack multiple layers of this type end-to-end to form a deeper network (Fig. 2). The quantum state output from one layer is used as the input for the next. Different layers can be made to have different widths by adding or removing qumodes between layers. Removal can be accomplished by measuring or tracing out the extra qumodes. In fact, conditioning on measurements of the removed qumodes is another method for performing non-Gaussian transformations andersen2015hybrid . This architecture can also accept classical inputs. We can do this by fixing some of the gate arguments to be set by classical data rather than free parameters, for example by applying a displacement D(α), with α determined by the data, to the vacuum state to prepare the coherent state D(α)|0⟩. This scheme can be thought of as an embedding of classical data into a quantum feature space schuld2018quantum . The output of the network can be obtained by performing measurements and/or computing expectation values. The choice of measurement operators is flexible; different choices (homodyne, heterodyne, photon-counting, etc.) may be better suited for different situations.
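The Gaussian block D ∘ U₂ ∘ S ∘ U₁ of a layer can be sandboxed classically in phase space: per Eq. (9), composing two orthogonal-symplectic interferometer blocks with a squeezing block yields a general symplectic M, and the displacement supplies the bias (a NumPy sketch with randomly chosen parameters; the non-Gaussian gate Φ is omitted, since it has no linear phase-space action):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3

def ortho_symplectic(rng, N):
    # Interferometer block: unitary U = C + iD embedded as in Eq. (10)
    U, _ = np.linalg.qr(rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N)))
    return np.block([[U.real, U.imag], [-U.imag, U.real]])

# Eq. (9): M = K2 (Sigma ⊕ Sigma^{-1}) K1, with Sigma = diag(e^{-r_i})
K1, K2 = ortho_symplectic(rng, N), ortho_symplectic(rng, N)
r = rng.normal(size=N)
S = np.diag(np.concatenate([np.exp(-r), np.exp(r)]))
M = K2 @ S @ K1

alpha = rng.normal(size=2 * N)   # displacement (bias) vector

def gaussian_block(xp):
    # Affine phase-space action of the Gaussian part of one layer, Eq. (7)
    return M @ xp + alpha

# Sanity check: M is symplectic
Omega = np.block([[np.zeros((N, N)), np.eye(N)],
                  [-np.eye(N), np.zeros((N, N))]])
print(np.allclose(M @ Omega @ M.T, Omega))  # True
```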

III.2 Embedding classical neural networks

The above scheme for a CV quantum neural network is quite flexible and general. In fact, it includes classical neural networks as a special case, where we don’t create any superposition or entanglement. We now present a mathematical recipe for embedding a classical neural network into the quantum CV formalism. We give the recipe for a single feedforward layer; multilayer networks follow straightforwardly. Throughout this part, we will represent N-dimensional real-valued vectors x using N-mode quantum optical states built from the eigenstates |xᵢ⟩ of the operators x̂ᵢ:

(18)   |x⟩ := |x₁⟩ ⊗ ⋯ ⊗ |x_N⟩.

For the first layer in a network, we create the input |x⟩ by applying the displacement operator D(x/√2) to the eigenstate |x = 0⟩. Subsequent layers will use the output of the previous layer as input. To read out the output from the final layer, we can use ideal homodyne detection in each qumode, which projects onto the eigenstates |xᵢ⟩ serafini2017quantum .

We would like to enact a fully connected layer (Eq. (1)) completely within this encoding, i.e.,

(19)   |x⟩ ↦ |φ(Wx + b)⟩.

This transformation will take place entirely within the x coordinates; we will not use the momentum variables. We thus want to restrict our quantum network to never mix between x̂ and p̂. To proceed, we will break the overall computation into separate pieces. Specifically, we split up the weight matrix using a singular value decomposition, W = O₂ Σ O₁, where the O_k are orthogonal matrices and Σ is a positive diagonal matrix. For simplicity, we assume that W is full rank. Rank-deficient matrices form a measure-zero subset in the space of weight matrices, which we can approximate arbitrarily closely with full-rank matrices.

Multiplication by an orthogonal matrix.

The first step in Eq. (16) is to apply an interferometer U₁, which corresponds to the rightmost orthogonal matrix K₁ in Eq. (9). In order not to mix x̂ and p̂, we must restrict to block-diagonal K₁. With respect to Eqs. (10)-(12), this means that C is an orthogonal matrix and D = 0. This choice corresponds to an interferometer which only contains phaseless beamsplitters. With this restriction, we have

(20)   U₁ |x⟩ = |Cx⟩.

The full derivation of this expression can be found in Appendix A. Thus, the phaseless linear interferometer is equivalent to multiplying the encoded data by an orthogonal matrix C. To connect to the weight matrix W, we choose the interferometer which has C = O₁. A similar result holds for the other interferometer U₂.

Multiplication by a diagonal matrix.

For our next element, consider the squeezing gate. The effect of squeezing on the x̂ eigenstates is kok2010introduction

(21)   S(r)|x⟩ = √c |cx⟩,

where c = e⁻ʳ. An arbitrary positive scaling c can thus be achieved by taking r = −ln(c). Note that squeezing leads to compression (positive r, c < 1), while antisqueezing gives expansion (negative r, c > 1), matching with Eq. (5). A collection of local squeezing transformations thus corresponds to an elementwise scaling of the encoded vector,

(22)   ⊗ᵢ S(rᵢ)|x⟩ = √(∏ᵢ cᵢ) |Σx⟩,

where Σ = diag(c₁, …, c_N) and cᵢ = e^{−rᵢ}. We note that since the eigenstates |x⟩ are not normalizable, the prefactor has limited formal consequence.

Addition of bias.

Finally, it is well-known that the displacement operator acting locally on quadrature eigenstates has the effect

(23)   D(αᵢ)|xᵢ⟩ = |xᵢ + √2 αᵢ⟩

for αᵢ ∈ ℝ, which collectively gives

(24)   ⊗ᵢ D(αᵢ)|x⟩ = |x + √2 α⟩.

Thus, to achieve a bias translation of b, we can simply displace by α = b/√2.

Affine transformation.

Putting these ingredients together, we have

(25)   D U₂ S U₁ |x⟩ ∝ |O₂ Σ O₁ x + b⟩ = |Wx + b⟩,

where we have omitted the gate parameters for clarity. Hence, using only Gaussian operations which do not mix x̂ and p̂, we can effectively perform arbitrary full-rank affine transformations amongst the vectors |x⟩.

Nonlinear function.

To complete the picture, we need to find a non-Gaussian transformation Φ which has the following effect:

(26)   Φ|x⟩ = |φ(x)⟩,

where φ is some nonlinear function. We will restrict to an element-wise function, i.e., Φ acts locally on each mode, similar to the activation function of a classical neural network. For simplicity, we will consider φ to be a polynomial of fixed degree. By allowing the degree of φ to be arbitrarily high, we can approximate any function which has a convergent Taylor series. The most general form of a quantum channel consists of appending an ancilla system, performing a unitary transformation on the combined system, and tracing out the ancilla. For qumode i, we will append an ancilla i′ in the x̂ = 0 eigenstate, i.e.,

(27)   |x⟩ᵢ ⊗ |0⟩ᵢ′,

where, for clarity, we have made the temporary notational change x := xᵢ for the value encoded in qumode i.

Consider now the unitary U = exp(−iφ(x̂ᵢ) ⊗ p̂ᵢ′), where φ(x̂ᵢ) is understood as a Taylor series using powers of x̂ᵢ. Applying this to the above two-mode system, we get

(28)   exp(−iφ(x̂ᵢ) ⊗ p̂ᵢ′) |x⟩ᵢ |0⟩ᵢ′ = |x⟩ᵢ |φ(x)⟩ᵢ′,

where we have recognized that p̂ is the generator of displacements in x. We can now swap modes i and i′ (using a perfectly reflective beamsplitter) and trace out the ancilla. The combined action of these operations leads to the overall transformation

(29)   |x⟩ᵢ ↦ |φ(x)⟩ᵢ.

Alternatively, we are free to keep the system in the form |x⟩ᵢ |φ(x)⟩ᵢ′; this can be useful for creating residual quantum neural networks.

Together, the above sequence of Gaussian operations, followed by a non-Gaussian operation, leads to the desired transformation |x⟩ ↦ |φ(Wx + b)⟩, which is the same as a single-layer classical neural network. We remark finally that the states |x⟩ were used in order to provide a convenient mathematical embedding; in a practical CV device, we would need to approximate these states via finitely squeezed states. In practice, the general quantum neural network framework does not require any particular choice of basis or encoding. Because of this additional flexibility, the full quantum network has larger representational capacity than a conventional neural network and cannot be efficiently simulated by classical models, as we now discuss.
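The classical action of the full embedding can be checked end to end in NumPy (a sketch that simulates only the transformation of the encoded position values x, not the quantum states themselves; tanh stands in as an illustrative nonlinearity φ):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
W = rng.normal(size=(n, n))       # weight matrix (full rank with probability 1)
b = rng.normal(size=n)
x = rng.normal(size=n)

# Split W into two orthogonal matrices and a positive diagonal, as in the text
O2, sigma, O1 = np.linalg.svd(W)  # W = O2 @ diag(sigma) @ O1

out = O1 @ x          # first interferometer: orthogonal mixing
out = sigma * out     # local squeezing: elementwise scaling c_i = e^{-r_i}
out = O2 @ out        # second interferometer: orthogonal mixing
out = out + b         # displacements: bias translation
out = np.tanh(out)    # non-Gaussian gate: elementwise nonlinearity

print(np.allclose(out, np.tanh(W @ x + b)))  # True
```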

III.3 The power of CV neural networks

None of the transformations considered in the previous section ever generate superpositions or entanglement. A distinguishing feature of quantum physics is that we can act not only on some fixed basis states, e.g., the states |x⟩, but also on superpositions – that is, linear combinations – of those basis states, |ψ⟩ = ∫ ψ(x)|x⟩ dx, where ψ(x) is a multimode wavefunction. The general CV neural network provides greater freedom in the allowed operations by leveraging the power of universal quantum computation. Indeed, the quantum gates in a single layer form a universal gate set, which implies that a CV quantum neural network shares all the capabilities of a universal CV quantum computer.

To see this, consider an arbitrary quantum computation and its decomposition in terms of a circuit consisting of a sequence of gates from a universal gate set. We assign a quantum neural network to this circuit by replacing each gate in the circuit by a single layer. Since each layer contains all gates from the universal set, it can reproduce the action of the single selected gate by setting the parameters of all other gates to zero. Therefore, the full network can also replicate the complete quantum circuit.

Since CV quantum neural networks are capable of universal CV quantum computation, in general we do not expect that they can be efficiently simulated on a classical computer. This statement can be put on firmer ground by considering a simple modification to the classical neural network embedding from Sec. III.2. Specifically, we carry out a Fourier transform on all modes at the beginning and end of the network. The result is that input states are replaced by momentum eigenstates and the position homodyne measurements are replaced with momentum homodyne measurements. A momentum eigenstate is an equal superposition over all position eigenstates and thus this circuit can be interpreted as acting on an equal superposition of all classical inputs.

The resulting circuits, consisting of input momentum eigenstates, a unitary transformation that is diagonal in the position basis, and momentum homodyne measurements, are known as continuous-variable instantaneous quantum polynomial (CV-IQP) circuits. It was proven in Ref. douce2017iqp that efficient exact classical simulation of CV-IQP circuits would imply a collapse of the polynomial hierarchy to the third level. This result was extended in Ref. arrazola2017quantum to the case of approximate classical simulation, under the validity of a plausible conjecture concerning the computational complexity of evaluating high-dimensional integrals. Thus, even a simple modification of the classical embedding presented above gives quantum neural networks the ability to perform tasks that would require exponentially many resources to replicate on classical devices.

iii.4 Beyond the fully connected architecture

Figure 3: Quantum adaptations of the convolutional layer, recurrent layer, and residual layer. The convolutional layer is enacted using a Gaussian unitary with translationally invariant Hamiltonian, resulting in a corresponding symplectic matrix that has a block Toeplitz structure. The recurrent layer combines an internal signal from previous layers with an external source, while the residual layer combines its input and output signals using a controlled-X gate.

Modern deep learning techniques have expanded beyond the basic fully connected architecture. Powerful deep learning software packages bergstra2010theano ; jia2014caffe ; maclaurin2015autograd ; abadi2016tensorflow ; paszke2017automatic have allowed researchers to explore more specialized networks or complicated architectures. For the quantum case, we should also not feel restricted to the basic network structure presented above. Indeed, the CV model gives us flexibility to encode problems in a variety of representations. For example, we can use the phase space picture, the wavefunction picture, the Hilbert space picture, or some hybrid of these. We can also encode information in coherent states, squeezed states, Fock states, or superpositions of these states. Furthermore, by choosing the gates and parameters to have particular structure, we can specialize our network ansatz to more closely match a particular class of problems. This can often lead to more efficient use of parameters and better overall models. In the rest of this section, we will highlight potential quantum versions of various special neural network architectures; see Fig. 3 for a visualization.

Convolutional network.

A common architecture in classical neural networks is the convolutional network, or convnet lecun1989backpropagation. Convnets are particularly well-suited for computer vision and image recognition problems because they reflect a simple yet powerful observation: since the task of detecting an object is largely independent of where the object appears in an image, the network should be equivariant to translations goodfellow2016deep. Consequently, the linear transformation in a convnet is not fully connected; rather, it is a specialized sparse linear transformation, namely a convolution. In particular, for one-dimensional convolutions, the weight matrix has a Toeplitz structure, with entries repeated along each diagonal. This is similar to the well-known principle in physics that symmetries in a physical system can lead to simplifications of our physical model for that system (e.g., Bloch's theorem bloch1929quantenmechanik or Noether's theorem noether1918invariante).

We can directly enforce translation symmetry on a quantum neural network model by making each layer in the quantum circuit translationally invariant. Concretely, consider the Hamiltonian $H$ that generates a Gaussian unitary. Suppose that this generator is translationally invariant, i.e., $H$ does not change if we shift each quadrature to that of the neighbouring mode, $\hat{x}_i \to \hat{x}_{i+1}$ and $\hat{p}_i \to \hat{p}_{i+1}$. Then the symplectic matrix $M$ that results from this Gaussian unitary will have the form

$$M = \begin{bmatrix} M_1 & M_2 \\ M_3 & M_4 \end{bmatrix}, \qquad (30)$$

where each $M_i$ is itself a Toeplitz matrix, i.e., a one-dimensional convolution (see Appendix B). The matrix $M$ can be seen as a special kind of convolution that respects the uncertainty principle: performing a convolution on the $\hat{x}$ coordinates naturally leads to a conjugate convolution involving $\hat{p}$. The connection between translationally invariant Hamiltonians and convolutional networks was also noted in lin2017does.
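As a quick sanity check of this correspondence (with illustrative helper names, not code from the paper), a matrix whose entries are constant along each diagonal reproduces a one-dimensional convolution:

```python
import numpy as np

def toeplitz_conv_matrix(kernel, n):
    """Build the n x n Toeplitz matrix whose action on a vector equals a
    zero-padded 1-D convolution with `kernel` ('same'-size output)."""
    k = len(kernel)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            d = i - j + k // 2  # offset into the kernel
            if 0 <= d < k:
                M[i, j] = kernel[d]
    return M

kernel = np.array([0.25, 0.5, 0.25])
x = np.arange(6, dtype=float)
M = toeplitz_conv_matrix(kernel, len(x))
```

Matrix multiplication by `M` then agrees with `np.convolve(x, kernel, mode='same')`, and the entries of `M` are constant along every diagonal.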

Recurrent network.

This is a special-purpose neural network which is used widely for problems involving sequences graves2012supervised, e.g., time series or natural language. A recurrent network can be pictured as a model which takes two inputs at every time step $t$. One of these inputs is external, coming from a data source or another model. The other input is an internal state, which comes from the same network at the previous time step (hence the name recurrent). These inputs are processed through a neural network, and an output is (optionally) returned. Similar to a convolutional network, the recurrent architecture encodes translation symmetry into the weights of the model. However, instead of spatial translation symmetry, recurrent models have time translation symmetry. In terms of the network architecture, this means that the model reuses the same weight matrix and bias vector in every layer. In general, the weights and biases are otherwise unrestricted, though more specialized architectures could further restrict them.

This architecture generalizes straightforwardly to quantum neural networks, with the inputs, outputs, and internal states employing any of the data-encoding schemes discussed earlier. It is particularly well-suited to an optical implementation, since we can connect the output modes of a quantum circuit back to the input using optical fibres. This allows the same quantum optical circuit to be reused several times for the same model. We can reserve a subset of the modes for the data input and output channels, with the remainder used to carry forward the internal state of the network between time steps.
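A minimal classical sketch of this weight sharing, with all names and sizes chosen purely for illustration, applies one and the same weight matrix and bias at every time step:

```python
import numpy as np

rng = np.random.default_rng(0)
# one shared weight matrix and bias, reused at every time step
W = rng.normal(size=(4, 3 + 4))   # acts on [external input; internal state]
b = rng.normal(size=4)

def rnn_step(h, x):
    """One recurrent step: combine external input x with internal state h
    using the SAME (W, b) as every other step."""
    return np.tanh(W @ np.concatenate([x, h]) + b)

h = np.zeros(4)                         # initial internal state
for x_t in rng.normal(size=(5, 3)):     # a length-5 input sequence
    h = rnn_step(h, x_t)
```

In the quantum version, the role of `h` is played by the modes that carry the network's internal state forward between time steps.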

Figure 4: Machine learning problems and architectures explored in this work: A. curve fitting of functions is achieved through a multilayer network, with the input $x$ encoded through a position displacement on the vacuum and the prediction read out through a position homodyne measurement at the output; B. credit card fraud detection using a hybrid classical-quantum classifier, with the classical network controlling the parameters of an input layer; C. image generation of the Tetris dataset from input displacements to the vacuum, with the output image encoded in photon number measurements at the output modes; D. hybrid classical-quantum autoencoder for finding a continuous phase-space encoding for the first three Fock states.
Residual network.

The residual network he2016deep, or resnet, is a more recent innovation than the convolutional and recurrent networks. While these other models are special cases of feedforward networks, the resnet uses a modified network topology. Specifically, 'shortcut connections,' which perform a simple identity transformation, are introduced between layers. Using these shortcuts, the output of a layer can be added to its input. If a layer by itself would perform the transformation $x \mapsto f(x)$, then the corresponding residual network performs the transformation

$$x \mapsto x + f(x). \qquad (31)$$
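In code, the classical residual rule of Eq. (31) is a one-liner; the sketch below uses tanh as a stand-in nonlinearity, purely for illustration:

```python
import numpy as np

def residual_layer(f, x):
    """Residual rule: add the layer's input back to its output,
    x -> x + f(x)."""
    return x + f(x)

phi = np.tanh                       # illustrative choice of layer function
x = np.array([0.0, 1.0, -1.0])
y = residual_layer(phi, x)
```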

To perform residual-type computation in a quantum neural network, we look back to Eq. (28), where a two-mode unitary was given which carries out the transformation

$$|x\rangle|0\rangle \mapsto |x\rangle|\phi(x)\rangle, \qquad (32)$$

where $\phi$ is some desired non-Gaussian function. To complete the residual computation, we need to sum these two values together. This can be accomplished using the controlled-X (or CX) gate gottesman2001encoding, which can be carried out with purely Gaussian operations, namely squeezing and beamsplitters strawberryfields_cxgate. Adding a CX gate after the transformation in Eq. (32), we obtain

$$|x\rangle|0\rangle \mapsto |x\rangle|x + \phi(x)\rangle, \qquad (33)$$

which is a residual transformation. This residual transformation can also be carried out on arbitrary wavefunctions in superposition, giving the general mapping

$$\int \psi(x)\,|x\rangle|0\rangle\,dx \mapsto \int \psi(x)\,|x\rangle|x + \phi(x)\rangle\,dx. \qquad (34)$$

Iv Numerical Experiments

We showcase the power and versatility of CV quantum neural networks by employing them in a range of machine learning tasks. The networks are numerically simulated using the Strawberry Fields software platform killoran2018strawberry and the Quantum Machine Learning Toolbox app built on top of it. We use both automatic differentiation with respect to the quantum gate parameters, which is built into Strawberry Fields' TensorFlow abadi2016tensorflow quantum circuit simulator, and numerical algorithms to train these networks. Automatic differentiation allows the direct use of established optimization algorithms based on stochastic gradient descent, while numerical techniques such as the finite-difference method or Nelder-Mead allow training of hardware-based implementations of quantum neural networks.
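A hardware-friendly training loop of the kind alluded to here can be sketched with a central finite-difference gradient; the quadratic toy cost below merely stands in for a circuit's measured loss, and the helper name is ours:

```python
import numpy as np

def finite_diff_grad(cost, theta, eps=1e-4):
    """Central-difference gradient estimate; each parameter costs two
    extra evaluations of the (possibly hardware-based) cost function."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        e = np.zeros_like(theta)
        e[i] = eps
        grad[i] = (cost(theta + e) - cost(theta - e)) / (2 * eps)
    return grad

# toy quadratic cost standing in for a circuit's loss, minimized at theta = 1
cost = lambda t: np.sum((t - 1.0) ** 2)
theta = np.zeros(3)
for _ in range(200):                  # plain gradient descent
    theta -= 0.1 * finite_diff_grad(cost, theta)
```

The same loop works whether `cost` is evaluated by a simulator or by repeated measurements on a physical device.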

Figure 5: Experiment A. Curve fitting with continuous-variable quantum neural networks. The networks consist of six layers and were trained for 2000 steps with a Hilbert-space cutoff dimension of 10. As examples, we consider noisy versions of three functions, displayed from left to right, with noise of standard deviation $\sigma$. The training data is shown as red circles. The outputs of the quantum neural network for the test inputs are shown as blue crosses. The outputs of the circuit very closely resemble the noiseless ground-truth curves, shown in green.

We study several tasks in both supervised and unsupervised settings, with varying degrees of hybridization between quantum and classical neural networks. Some cases employ both classical and quantum networks whereas others are fully quantum. The architectures used are illustrated in Fig. 4. Unless otherwise stated, we employ the Adam optimizer kingma2014adam to train the networks and we choose the Kerr gate as the non-Gaussian gate in the quantum networks. Our results highlight the wide range of potential applications of CV quantum neural networks, which will be further enhanced when deployed on dedicated hardware which exceeds the current limitations imposed by classical simulations.

iv.1 Training quantum neural networks

A prototypical problem in machine learning is curve fitting: learning a given relationship between inputs and outputs. We will use this simple setting to analyze the behaviour of CV quantum neural networks with respect to different choices for the model architecture, cost function, and optimization algorithm. We consider the simple case of training a quantum neural network to reproduce the action of a function $f$ on one-dimensional inputs $x$, when given a training set of noisy data. This is summarized in Fig. 4(a). We encode the classical inputs as position-displaced vacuum states $D(x)|0\rangle$, where $D(x)$ is the displacement operator and $|0\rangle$ is the single-mode vacuum. Let $|\psi(x)\rangle$ be the output state of the circuit given input $x$. The goal is to train the network to produce output states whose expectation value for the quadrature operator $\hat{x}$ is equal to $f(x)$, i.e., to satisfy the relation $\langle\psi(x)|\hat{x}|\psi(x)\rangle = f(x)$ for all $x$.

To train the circuits, we use a supervised learning setting where the training and test data are tuples $(x_i, f(x_i))$ for values of $x_i$ chosen uniformly at random in some interval. We define the loss function as the mean square error (MSE) between the circuit outputs and the desired function values,

$$C = \frac{1}{N}\sum_{i=1}^{N}\left[f(x_i) - \langle\psi(x_i)|\hat{x}|\psi(x_i)\rangle\right]^2. \qquad (35)$$

To test this approach in the presence of noise in the data, we consider functions of the form $f(x_i) + \Delta$, where $\Delta$ is drawn from a normal distribution with zero mean and standard deviation $\sigma$. The results of curve fitting on three noisy functions are illustrated in Fig. 5.

Avoiding overfitting.

Ideally, the circuits will produce outputs that are smooth and do not overfit the noise in the data. CV quantum neural networks are inherently adept at achieving smoothness because quantum states that are close to each other cannot differ significantly in their expectation values with respect to observables. Quantitatively, Hölder's inequality states that for any two states $\rho$ and $\sigma$ it holds that

$$\left|\mathrm{Tr}\!\left[(\rho - \sigma)\,\hat{O}\right]\right| \le \|\rho - \sigma\|_1 \, \|\hat{O}\|_\infty \qquad (36)$$

for any operator $\hat{O}$. This smoothness property of quantum neural networks is clearly seen in Fig. 5, where the input/output relationship of quantum circuits gives rise to smooth functions that are largely immune to the presence of noise, while still being able to generalize from training to test data. We found that no regularization mechanism was needed to prevent overfitting of the problems explored here.

Improvement with depth.

The circuit architecture is defined by the number of layers, i.e., the circuit depth. Fig. 6 (top) studies the effect of the number of layers on the final value of the MSE. A clear improvement for the curve fitting task is seen for up to six layers, at which point the improvements saturate. The MSE approaches the square of the standard deviation of the noise, $\sigma^2$, as expected when the circuit is in fact reproducing the input-output relationship of the noiseless curve.

Figure 6: MSE as a function of the number of layers and as a function of photon loss. The plots correspond to the task of fitting the sine function in the chosen interval. (Top) Increasing the number of layers is helpful until a saturation point is reached with six layers, after which little improvement is observed. (Bottom) The networks can be resilient to imperfections, as seen by the fact that only a slight deviation in the mean square error appears for losses of 10% in each layer. The fits with photon loss coefficients of 10% and 30% are shown in the inset.
Quantum device imperfections.

We also study the effect of imperfections in the circuit, which for photonic quantum computers is dominated by photon loss. We model this using a lossy bosonic channel with a transmission parameter $\eta$, where $\eta = 1$ stands for perfect transmission (no photon loss). The lossy channel acts at the end of each individual layer, ensuring that the effect of photon loss increases with circuit depth. For example, a circuit with six layers and a per-layer loss of 10% ($\eta = 0.9$) retains a total transmission of only $0.9^6 \approx 53\%$. The effect of loss is illustrated in Fig. 6 (bottom), where we plot the MSE as a function of the loss. The quality of the fit exhibits resilience to this imperfection, indicating that the circuit learns to compensate for the effect of losses.
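The compounding of per-layer loss can be checked directly; the numbers below assume 10% loss per layer over six layers, matching the example in the text:

```python
# Per-layer transmission eta compounds with depth: after L layers the
# overall transmission is eta**L.
eta, layers = 0.9, 6                    # 10% photon loss per layer, six layers
total_transmission = eta ** layers      # about 0.53
total_loss = 1 - total_transmission     # about 0.47
```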

Optimization methods.

We also analyze different optimization algorithms for the sine curve-fitting problem. Fig. 7 compares three numerical methods and two methods based on automatic differentiation. Numerical SGD approximates the gradients with a finite-difference estimate. Nelder-Mead is a gradient-free technique, while the sequential least-squares programming (SLSQP) method solves quadratic subproblems with approximate gradients. These latter two converge significantly more slowly, but can have advantages in smoothness and speed per iteration. The Adam optimizer with adaptive learning rate performed better than vanilla SGD in this experiment.

Figure 7: Loss function for the different optimizers mentioned in the text.
Penalties and regularization.

In the numerical simulations of quantum circuits, each qumode is truncated to a given cutoff dimension in the infinite-dimensional Hilbert space of Fock states. During training, it is possible for the gate parameters to reach values such that the output states have significant support outside of the truncated Hilbert space. In the simulation, this results in unnormalized output states and unreliable computations. To address this issue, we add a penalty to the loss function that penalizes unnormalized quantum states. Given a set of output states $\{|\psi_i\rangle\}$, we define the penalty function

$$P(\{|\psi_i\rangle\}) = \sum_i \left(\langle\psi_i|\Pi|\psi_i\rangle - 1\right)^2, \qquad (37)$$

where $\Pi$ is a projector onto the truncated Hilbert space of the simulation. This function penalizes unnormalized states whose trace differs from one. The overall cost function to be minimized is then

$$C_{\mathrm{tot}} = C + \gamma P, \qquad (38)$$

where $\gamma$ is a user-defined hyperparameter.
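A minimal numpy sketch of the trace penalty of Eq. (37) and the combined cost of Eq. (38), for truncated state vectors (helper names are ours, for illustration):

```python
import numpy as np

def trace_penalty(states):
    """Eq. (37): punish simulated states whose squared norm inside the
    truncated Fock space differs from one."""
    return sum((np.vdot(psi, psi).real - 1.0) ** 2 for psi in states)

def total_cost(loss, states, gamma=1.0):
    """Eq. (38): task loss plus gamma times the trace penalty."""
    return loss + gamma * trace_penalty(states)

# two toy truncated state vectors: one normalized, one leaking probability
psi_ok   = np.array([1.0, 0.0, 0.0])
psi_leak = np.array([0.8, 0.4, 0.0])      # squared norm 0.8 inside the cutoff
penalty = trace_penalty([psi_ok, psi_leak])
```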

An alternative approach to the trace penalty is to regularize the circuit parameters that can alter the energy of the state, which we refer to as the active parameters. Fig. 8 compares optimizing the loss without any penalty (first column from the left), imposing an L2 regularizer (second column), using an L1 regularizer (third column), and using the trace penalty (fourth column). Without any strategy to keep the parameters small, learning fails due to unstable simulations: the trace of the state in fact drops far below one. Both regularization strategies, as well as the trace penalty, manage to bring the loss function to almost zero within a few steps while maintaining the unit trace of the state. However, there are interesting differences. While L2 regularization decreases the magnitude of the active parameters, L1 regularization dampens all but two of them. The undamped parameters turn out to be the circuit parameters of the nonlinear gates in two of the layers, a hint that these nonlinearities are most essential for the task. The trace penalty induces heavy fluctuations in the loss function for the first steps, but finds parameters that are larger in absolute value than those found by L2 regularization, with a lower final loss.

Figure 8: Cost function and circuit parameters during 60 steps of stochastic gradient descent training for the task of fitting the sine function from Fig. 6. The active parameters are plotted in orange, while all others are plotted in purple. As hyperparameters, we used an initial learning rate with inverse decay, fixed penalty and regularization strengths, a small batch size, a cutoff of 10 for the Hilbert-space dimension, and randomly chosen but fixed initial circuit parameters.

iv.2 Supervised learning with hybrid networks

Classification of data is a canonical problem in machine learning. We construct a hybrid classical-quantum neural network as a classifier to detect fraudulent transactions in credit card purchases. In this hybrid approach, a classical neural network is used to control the gate parameters of the quantum network, the output of which determines whether the transactions are classified as genuine or fraudulent. This is illustrated in Fig. 4(b).

Data preparation.

For the experiment, data was taken from a publicly available database of labelled historical credit card transactions, flagged as either fraudulent or genuine dal2015calibrating. The data is composed of features derived through a principal component analysis of the raw data, providing an anonymization of the transactions. Only a small fraction of the provided transactions are fraudulent. We create training and test datasets by splitting the fraudulent transactions in two and combining each subset with genuine transactions. For the training dataset, we undersample the genuine transactions by randomly selecting them so that they outnumber the fraudulent transactions by only a fixed ratio. This undersampling is used to address the notable asymmetry in the number of fraudulent and genuine transactions in the original dataset. The test dataset is then completed by adding all the remaining genuine transactions.
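The undersampling step can be sketched as follows; the feature dimensions, counts, and the 3:1 ratio below are illustrative placeholders, not the paper's values:

```python
import numpy as np

rng = np.random.default_rng(42)

def undersample(genuine, fraudulent, ratio=3):
    """Randomly keep only `ratio` genuine transactions per fraudulent one,
    mirroring the undersampling used to balance the training set."""
    keep = rng.choice(len(genuine), size=ratio * len(fraudulent), replace=False)
    return genuine[keep], fraudulent

genuine = rng.normal(size=(1000, 10))   # toy anonymized feature vectors
fraud   = rng.normal(size=(20, 10))
g_train, f_train = undersample(genuine, fraud)
```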

Hybrid network architecture.

The first section of the network is composed of a series of classical fully connected feedforward layers. Here, an input layer accepts the transaction features. This is followed by two hidden layers of the same size, and the result is output on a final layer whose size matches the number of free parameters of the quantum input layer. An exponential linear unit (ELU) was used as the nonlinearity. The second section of our architecture is a quantum neural network consisting of two modes initially in the vacuum. An input layer first operates on the two modes. This input layer omits the first interferometer, as an interferometer has no effect on vacuum qumodes; the layer is therefore described by a reduced number of free parameters, which are set to be directly controlled by the output layer of the classical neural network. The input layer then feeds into four hidden layers with fully controllable parameters, followed by an output layer in the form of a photon number measurement. An output encoding is fixed in the Fock basis by post-selecting on single-photon outputs and associating a photon in the first mode with a genuine transaction and a photon in the second mode with a fraudulent transaction.

Training.

To train the hybrid network, we perform SGD with a batch size of 24. Let $p_i$ be the probability that a single photon is observed in the mode corresponding to the correct label for input transaction $i$. The cost function to minimize is

(39)

where $p_i$ is the probability of the single photon being detected in the correct mode on input $i$. The probability included in the cost function is not post-selected on single-photon outputs, meaning that training learns to output a useful classification as often as possible. We perform training with a cutoff dimension of 10 in each mode. Once trained, we use the probabilities post-selected on single-photon events for classification; these could be estimated experimentally by averaging the number of single-photon events occurring across a sequence of runs.

Figure 9: Experiment B. (Left) Confusion matrix for the test dataset at the chosen threshold probability. (Right) Receiver operating characteristic (ROC) curve for the test dataset, showing the true negative rate against the false negative rate as a parametric plot of the threshold probability. Here, the ideal point is given by the circle in the top-left corner, while the triangle denotes the closest point to optimal among the chosen thresholds. This point corresponds to the confusion matrix shown on the left.
Model performance.

We test the model by choosing a threshold probability required for transactions to be classified as genuine. The confusion matrix for the chosen threshold is given in Fig. 9. By varying the classification threshold, a receiver operating characteristic (ROC) curve can be constructed, where each point in the curve is parametrized by a value of the threshold. This is shown in Fig. 9, where the true negative rate is plotted against the false negative rate. An ideal classifier has a true negative rate of 1 and a false negative rate of 0, as illustrated by the circle in the figure. Conversely, randomly guessing at a given threshold probability results in the dashed line in the figure. Our classifier attains an area under the ROC curve close to the optimal value of 1.
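The parametric ROC construction described here can be sketched directly; the scores below are toy values for illustration, and the helper name is ours:

```python
import numpy as np

def roc_points(p_genuine, p_fraud, thresholds):
    """For each threshold t, classify a transaction as genuine when its
    score is at least t, and record (true negative rate, false negative
    rate) -- the parametric ROC curve used in the text."""
    pts = []
    for t in thresholds:
        tnr = np.mean(p_fraud < t)     # fraud correctly classed fraudulent
        fnr = np.mean(p_genuine < t)   # genuine wrongly classed fraudulent
        pts.append((tnr, fnr))
    return pts

# toy scores: genuine transactions tend to score higher than fraudulent ones
p_genuine = np.array([0.9, 0.8, 0.7, 0.4])
p_fraud   = np.array([0.3, 0.2, 0.6])
curve = roc_points(p_genuine, p_fraud, thresholds=[0.0, 0.5, 1.0])
```

Sweeping a fine grid of thresholds traces out the full curve, whose area can then be integrated numerically.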

For detection of fraudulent credit card transactions, it is imperative to minimize the false negative rate (bottom left square in the confusion matrix of Fig. 9), i.e., the rate of misclassifying a fraudulent transaction as genuine. Conversely, it is less important to minimize the false positive rate (top right square) – these are the cases of genuine transactions being classed as fraudulent. Such cases can typically be addressed by sending verification messages to cardholders. The larger false positive rate in Fig. 9 can also be attributed to the large asymmetry between the number of genuine and fraudulent data points.

The results here illustrate a proof-of-principle hybrid classical-quantum neural network able to perform classification for a problem of genuine practical interest. While it is simple to construct a classical neural network to outperform this hybrid model, our network is restricted in both width and depth due to the need to simulate the quantum network on a classical device. It would be interesting to further explore the performance of hybrid networks in conjunction with a physical quantum computer.

iv.3 Generating images from labeled data

Figure 10: Experiment C. Output images for the 'LOTISJZ' tetromino image data. The top row shows the output two-mode states, where the intensity of the pixel in the $n$th row and $m$th column is proportional to the probability of finding $n$ photons in the first mode and $m$ photons in the second mode. The bottom row is a close-up in the image Hilbert space of up to 3 photons, renormalized with respect to the probability of projecting the state onto that subspace. In other words, this row illustrates the states of Eq. (44). The fidelities of the output states with respect to the desired image states are respectively 99.0%, 98.6%, 98.6%, 98.1%, 98.0%, 97.8%, and 98.8%, for an average fidelity of 98.4%. The probabilities of projecting the state onto the image space of at most three photons are respectively 5.8%, 36.0%, 21.7%, 62.1%, 40.7%, 71.3%, and 5.6%.

Next, we study the problem of training a quantum neural network to generate quantum states that encode grayscale images. We consider images of $4\times 4$ pixels specified by a matrix $A$, whose entries $A_{nm}$ indicate the intensity of the pixel on the $n$th row and $m$th column of the picture. These images can be encoded into two-mode quantum states by associating each entry of the matrix with the coefficients of the state in the Fock basis:

$$|A\rangle = \frac{1}{\mathcal{N}}\sum_{n,m} A_{nm}\,|n\rangle|m\rangle, \qquad (40)$$

where $\mathcal{N}$ is a normalization constant. We refer to these as image states. The normalized matrix coefficients $A_{nm}/\mathcal{N}$ are the probability amplitudes of observing $n$ photons in the first mode and $m$ photons in the second mode. Therefore, given many copies of a state $|A\rangle$, the image can be statistically reconstructed by averaging photon detection events at the output modes. This architecture is illustrated in Fig. 4(c).
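A small numpy sketch of this encoding and its statistical readout (helper names ours): the image is normalized into Fock-basis amplitudes, and photon-counting probabilities recover it up to overall scale:

```python
import numpy as np

def image_state(A):
    """Flatten an image into two-mode Fock amplitudes: pixel A[n, m] becomes
    the amplitude for n photons in mode 1 and m photons in mode 2."""
    psi = A.astype(float)
    return psi / np.linalg.norm(psi)   # Frobenius norm gives the constant N

def reconstruct(psi):
    """Photon-counting statistics recover the image up to overall scale:
    the probability of outcome (n, m) is |psi[n, m]|**2."""
    return np.abs(psi) ** 2

A = np.array([[4.0, 0.0], [0.0, 3.0]])   # toy 2x2 'image'
psi = image_state(A)
probs = reconstruct(psi)
```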

Image encoding strategy.

Given a collection of images $\{A_i\}$, we fix a set of input two-mode coherent states $\{|\phi_i\rangle\}$. The goal is to train the quantum neural network to perform the transformation $|\phi_i\rangle \to |A_i\rangle$ for all $i$. Since the transformation is unitary, the Gram matrices of input and output states must be equal, i.e., it must hold that

$$\langle \phi_i|\phi_j\rangle = \langle A_i|A_j\rangle \qquad (41)$$

for all $i, j$.

In general, it is not possible to find coherent states that satisfy this condition for arbitrary collections of output states. To address this, we consider output states with support in regions of larger photon number and demand that their projection onto the image Hilbert space of at most three photons in each mode coincides, modulo normalization, with the desired image states. Mathematically, if $U$ is the unitary transformation performed by the quantum neural network, the goal is to train the circuit to produce output states $U|\phi_i\rangle$ such that

$$\Pi\, U|\phi_i\rangle = \sqrt{p_i}\,|A_i\rangle, \qquad (42)$$

where $\Pi$ is a projector onto the Hilbert space of at most three photons in each mode and $p_i$ is the probability of observing the state in the subspace defined by this projector. The quantum neural network therefore needs to learn not only how to transform input coherent states into image states; it must also learn to employ the additional dimensions in Hilbert space to satisfy the constraints imposed by unitarity. This approach still allows us to retrieve the encoded image by performing photon counting, albeit with a penalty of $p_i$ in the sampling rate.
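The projection and its sampling-rate penalty can be sketched as follows, using a toy truncated state with deliberate support above three photons per mode (all names illustrative):

```python
import numpy as np

def project_to_image_space(psi, n_max):
    """Project a two-mode state (indexed by photon numbers) onto the image
    space of at most n_max photons per mode; return the renormalized state
    and the success probability p of that projection."""
    block = psi[: n_max + 1, : n_max + 1]
    p = np.sum(np.abs(block) ** 2)
    return block / np.sqrt(p), p

# toy 5x5 truncated state with some support above 3 photons per mode
psi = np.zeros((5, 5))
psi[0, 0], psi[4, 4] = np.sqrt(0.64), np.sqrt(0.36)
proj, p = project_to_image_space(psi, n_max=3)
```

Here `p` plays the role of the sampling-rate penalty: only a fraction `p` of photon-counting runs land inside the image space.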

As an example problem, we select a database of images corresponding to the seven standard configurations of four blocks used in the digital game Tetris. These configurations are known as tetrominos. For a fixed value of the displacement parameter, the seven input states are chosen as distinct two-mode coherent states, each of which must be mapped to the image state of a corresponding tetromino.

Figure 11: Experiment D. (Left) Learning a continuous phase-space encoding of the Fock states. The quantum decoder element of a trained classical-quantum autoencoder can be investigated by varying the displacement on the vacuum, which represents the chosen encoding method. The hybrid network has learned to encode the Fock states in different regions of phase space. This is illustrated by a contour plot showing, for each point in phase space, the largest fidelity between the output state for that displacement and the first three Fock states. The thin white circle represents a clipping applied to input displacements during training, so that no displacement can ever reach outside of the circle. The small white circles represent the input displacements leading to optimal fidelities with the $|0\rangle$, $|1\rangle$, and $|2\rangle$ Fock states, the white lines represent the lines interpolating these optimal displacements, and the white squares represent the halfway points. (Right) Visualizing the wavefunctions of output states. The top row shows the position wavefunctions of the states with highest fidelity to $|0\rangle$, $|1\rangle$, and $|2\rangle$, respectively. The bottom row shows the wavefunctions of the states whose displacements lie halfway between the optimal displacements for each pair of Fock states. Each wavefunction is rescaled so that its maximum absolute value is one, with the horizontal axis denoting position.
Training.

We define the states

$$|\Psi_i\rangle = U|\phi_i\rangle, \qquad (43)$$
$$|\psi_i\rangle = \frac{\Pi|\Psi_i\rangle}{\sqrt{p_i}}, \qquad (44)$$

i.e., $|\Psi_i\rangle$ is the output state of the network and $|\psi_i\rangle$ is the normalized projection of the output state onto the image Hilbert space of at most 3 photons in each mode. To train the quantum neural network, we define the cost function

(45)

where the $|A_i\rangle$ are the image states of the seven tetrominos, $P$ is the trace penalty as in Eq. (37), and $\gamma$ is a fixed penalty strength. By choosing this cost function we are forcing each input to be mapped to a specific image of our choice. In this sense, we can view the images as labeled data, where the label specifies which input state each image corresponds to. We employed a network with 25 layers (see Fig. 4(c)) and fixed a cutoff of 11 photons in the numerical simulation, with a fixed displacement parameter for the input states.

Model performance.

The resulting image states are illustrated in Fig. 10, where we plot the absolute value squared of the coefficients in the Fock basis as grayscale pixels in an image. Tetrominos are referred to in terms of the letter of the alphabet they resemble. We fixed the desired output images according to the sequence 'LOTISJZ', such that the first input state is mapped to the tetromino 'L', the second to 'O', and so forth.

Fig. 10 clearly illustrates the role of the higher-dimensional components of the output states in satisfying the constraints imposed by unitarity: the network learns not only how to reproduce the images in the smaller Hilbert space but also how to populate the remaining regions in order to preserve the pairwise overlaps between states. For instance, the first two input states are nearly orthogonal, but the images of the 'L' and 'O' tetrominos have a significant overlap. Consequently, the network learns to assign a relatively small probability of projecting onto the image space while populating the higher photon sectors in orthogonal subspaces. Overall, the network is successful in reproducing the images in the space of a few photons, precisely as it was intended to do.

iv.4 Hybrid quantum-classical autoencoder

In this example, we build a joint quantum-classical autoencoder (see Fig. 4(d)). Conventional autoencoders are neural networks consisting of an encoder network followed by a decoder network. The objective is to train the network to act as an identity operation on input data. During training, the network learns a restricted encoding of the input data, which can be found by inspecting the small middle layer which links the encoder and decoder. For the hybrid autoencoder, our goal is to find a continuous phase-space encoding of the first three Fock states $|0\rangle$, $|1\rangle$, and $|2\rangle$. Each of these states will be encoded into the form of displaced vacuum states, then decoded back to the correct Fock state form.

Model architecture.

For the hybrid autoencoder, we fix a classical feedforward architecture as an encoder and a sequence of layers on one qumode as a decoder, as shown in Fig. 4(d). The classical encoder begins with an input layer with three dimensions, allowing for any real linear combination in the subspace to be input into the network. The input layer is followed by six hidden layers of dimension five and a two-dimensional output layer. We use a fully connected model with an ELU nonlinearity.

The two output units of the classical network are used to set the two components of a displacement gate acting on the vacuum in one qumode. This serves as a continuous encoding of the Fock states as displaced vacuum states. In fact, displaced vacuum states have Gaussian distributions in phase space, so the network has a resemblance to a variational autoencoder kingma2013auto . We employ a total of layers with controllable parameters. The goal of the composite autoencoder is to physically generate the Fock state originally input into the network. Once the autoencoder has been trained, by removing the classical encoder we are left with a method to generate Fock states by varying the displacement of the vacuum. Notably, there is no need to specify which displacement should be mapped to each Fock state: this is automatically taken care of by the autoencoder.
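A displaced vacuum is just a coherent state, whose Fock-basis expansion is the textbook result c_n = e^{-|α|²/2} α^n/√(n!); the sketch below (independent of any particular software library) shows how the encoder's two real outputs determine one such state.

```python
import numpy as np
from math import factorial

def coherent_fock_amplitudes(alpha, cutoff):
    """Fock-basis amplitudes of the displaced vacuum D(alpha)|0>,
    i.e. the coherent state c_n = exp(-|alpha|^2/2) alpha^n / sqrt(n!)."""
    n = np.arange(cutoff)
    facts = np.array([float(factorial(k)) for k in n])
    return np.exp(-abs(alpha) ** 2 / 2) * alpha ** n / np.sqrt(facts)

# The two encoder outputs set the two components of the displacement
# (the precise pairing with the gate's parameters is convention-dependent).
enc_out = (0.5, 0.3)  # hypothetical encoder output
alpha = enc_out[0] + 1j * enc_out[1]

amps = coherent_fock_amplitudes(alpha, cutoff=20)
print(np.sum(np.abs(amps) ** 2))  # ~1 for a sufficiently large cutoff
```

For small |α| the amplitudes decay rapidly with photon number, which is why a modest Fock cutoff suffices in simulation.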

Training.

Our hybrid network is trained in the following way. For each of the Fock states , , and , we input the corresponding one-hot vectors , and into the classical encoder. Suppose that for an input the encoder outputs the vector . This is used to displace the vacuum in one mode, i.e., enacting with . The output of the quantum decoder is the quantum state , with the unitary resulting from the layers. We define the normalized projection

(46)

onto the subspace of the first three Fock states, with being the corresponding projector. As we have discussed previously, this allows the network to output the state probabilistically upon a successful projection onto the subspace. The objective is to train the network so that is close to , where closeness is measured using the fidelity . As before, we introduce a trace penalty and set a cost function given by
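The normalized projection of Eq. (46) can be sketched as follows: keep only the first three Fock amplitudes, renormalize, and record the norm squared as the success probability of the projection.

```python
import numpy as np

def project_low_fock(state, k=3):
    """Project a Fock-cutoff state vector onto the first k Fock states
    and renormalize (a sketch of the normalized projection of Eq. (46)).

    Returns the projected, renormalized state together with the
    probability of a successful projection onto that subspace.
    """
    proj = np.zeros_like(state)
    proj[:k] = state[:k]
    p_success = np.vdot(proj, proj).real
    return proj / np.sqrt(p_success), p_success

# A toy decoder output with some amplitude above the |0>,|1>,|2> subspace.
state = np.array([0.6, 0.0, 0.6, 0.52915026], dtype=complex)
out, p = project_low_fock(state)
print(p)  # probability of landing in the first three Fock states
```

The success probability is exactly the quantity penalized by the trace term in the cost function below: the network is rewarded for keeping it close to one.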

(47)

with a fixed value of the regularization parameter. Additionally, we constrain the displacements in the input phase space to a circle of radius to make sure the encoding is as compact as possible.

Model performance.

After training, the classical encoder element can be removed and we can analyze the quantum decoder by varying the displacements applied to the vacuum. Fig. 11 illustrates the resulting performance by showing the maximum fidelity between the output of the network and each of the three Fock states used for training. For the three Fock states , , and , the best matching input displacements each lead to a decoder output state with fidelity of .

The hybrid network has learned to associate different areas of phase space with each of the three Fock states used for training. It is interesting to investigate the output states produced by the quantum network when the vacuum is displaced to intermediate points between the three areas. These displacements can result in states that exhibit a transition between the Fock states. We use the wavefunction of the output states to visualize this transition. We plot on the right-hand side of Fig. 11 the output wavefunctions which give the best fidelity to each of the three Fock states , , , respectively. Wavefunctions are also plotted for displacements at the intermediate points between each pair of these optimal displacements. These plots illustrate a smooth transition between the encoded Fock states in phase space.

V Conclusions

We have presented a quantum neural network architecture which leverages the continuous-variable formalism of quantum computing, and explored it in detail through both theoretical exposition and numerical experiments. This scheme can be considered as an analogue of recent proposals for neural networks encoded using classical light shen2017deep , with the additional ingredient that we leverage the quantum properties of the electromagnetic field. Interestingly, as light-based systems are already used in communication networks (both classical and quantum), an optical CV neural network could be wired up directly to communication channels, allowing us to avoid the costly interconversion of classical and quantum information.

We have proposed variants for several well-known classical neural networks, specifically fully connected, convolutional, recurrent, and residual networks. We envision that in future work specialized neural networks will also be inspired purely from the quantum side. We have numerically analyzed the performance of quantum neural network models and demonstrated that they show promise in the tasks we considered. In several of these examples, we employed joint architectures, where classical and quantum networks are used together. This is another promising direction for future exploration, in particular given the current technological lead of classical computers and the expectation that near-term quantum hardware will be limited in size. The quantum part of the model can be specialized to process the classically difficult parts of a larger computation to which it is naturally suited. In the longer term, as larger-scale quantum computers are built, the quantum component could take a larger role in hybrid models. Finally, it would be a fruitful research direction to explore more deeply the role that fundamental quantum physics concepts – such as symmetry, interference, entanglement, and the uncertainty principle – play in quantum neural networks.

Acknowledgements.
We thank Krishna Kumar Sabapathy, Haoyu Qi, Timjan Kalajdzievski, and Josh Izaac for helpful discussions. SL was supported by the ARO under the Blue Sky program.

Appendix A Linear interferometers

In this section, we derive Eq. (III.2) for the effect of a passive interferometer on the eigenstates . A simple expression for an eigenstate of the quadrature with eigenvalue can be found in Appendix 4 of Ref. barnett2002methods :

(48)

where is the bosonic annihilation operator, and is the single-mode vacuum state. The last expression is independent of any prefactors used to define the quadrature operator in terms of and .
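For concreteness, in the common convention $\hat{x} = (\hat{a} + \hat{a}^\dagger)/\sqrt{2}$ (which need not match the prefactor convention used elsewhere in this paper), the standard expression from Barnett and Radmore takes the form

```latex
\begin{equation}
|x\rangle = \pi^{-1/4}
\exp\!\left(-\tfrac{1}{2}x^{2}
+ \sqrt{2}\,x\,\hat{a}^{\dagger}
- \tfrac{1}{2}\hat{a}^{\dagger 2}\right)|0\rangle ,
\end{equation}
```

which can be checked against the ground-state wavefunction: projecting onto the vacuum gives $\langle 0 | x \rangle = \pi^{-1/4} e^{-x^{2}/2}$, as expected.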

This can be easily generalized to modes:

(49)

where now

(50)
(51)

and is the multimode vacuum state. Now consider a (passive) linear optical transformation

(52)
(53)

In general, is an arbitrary unitary matrix, . We will, however, restrict to have real entries and thus to be orthogonal. In this case, and hence .

We can now examine how the multimode state transforms under such a linear interferometer :

(54)

We can use the transformation in Eq. (52) to write

(55)

Now we use that to write the last expression as

(56)

Let us define the vector and, to match the notation of Eq. (III.2), the orthogonal matrix , in terms of which we find

(57)

Note that the output state is also a product state. This simple product transformation is a corollary of the elegant results of Ref. jiang2013mixing : “Given a nonclassical pure-product-state input to an -port linear-optical network, the output is almost always mode entangled; the only exception is a product of squeezed states, all with the same squeezing strength, input to a network that does not mix the squeezed and antisqueezed quadratures.” In our context, the eigenstates are nothing but infinitely squeezed states, and the fact that our passive linear optical transformation is orthogonal immediately implies that squeezed and antisqueezed quadratures are not mixed.
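The no-mixing property is easy to verify numerically. In the quadrature ordering (x_1,…,x_N, p_1,…,p_N) assumed here (the paper's Eq. (8) convention may differ), a passive transformation with a real orthogonal matrix U acts in phase space as the block-diagonal S = diag(U, U), which is symplectic and never couples x to p:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 4

# A real orthogonal interferometer matrix U (via QR of a real matrix).
U, _ = np.linalg.qr(rng.normal(size=(N, N)))

# Block-diagonal phase-space action: x and p transform separately.
S = np.block([[U, np.zeros((N, N))],
              [np.zeros((N, N)), U]])

# Symplectic form in the (x, p) block ordering.
Omega = np.block([[np.zeros((N, N)), np.eye(N)],
                  [-np.eye(N), np.zeros((N, N))]])

print(np.allclose(S.T @ Omega @ S, Omega))  # True: S is symplectic
print(np.allclose(S[:N, N:], 0))            # True: no x-p mixing
```

A complex U would instead produce off-diagonal blocks, mixing squeezed and antisqueezed quadratures and generically entangling the modes, in line with the quoted result.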

Appendix B Convolutional networks

In this section, we derive the connection between a translationally-invariant Hamiltonian and a block Toeplitz symplectic transformation. The notions of translation symmetry and Toeplitz structure are both connected to one-dimensional convolutions. Two-dimensional convolutions, naturally appearing in image processing applications, are connected not with Toeplitz matrices, but with doubly block circulant matrices goodfellow2016deep . We will not consider this extension here, but the basic ideas are the same.

Suppose we have a Hamiltonian operator which generates a Gaussian unitary on modes. We are interested only in the matrix multiplication part of an affine transformation, i.e., does not generate displacements. Under these conditions, has to be quadratic in the operators ,

(58)

where each is an matrix. We will call the inner matrix in this equation . In the phase space picture, the symplectic transformation generated by is obtained via the rule serafini2017quantum

(59)

where is the symplectic form from Eq. (8).
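One common form of this exponentiation rule (conventions vary between references, so treat the exact expression as an assumption) maps the symmetric matrix of the quadratic Hamiltonian to the symplectic transformation S = e^{ΩH}. Whatever the convention, the resulting S must satisfy the symplectic condition, as the sketch below verifies:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
N = 3

# A random real symmetric 2N x 2N matrix standing in for the inner
# Hamiltonian matrix of Eq. (58), in the (x, p) ordering assumed here.
A = rng.normal(size=(2 * N, 2 * N))
H = (A + A.T) / 2

# Symplectic form in the same ordering.
Omega = np.block([[np.zeros((N, N)), np.eye(N)],
                  [-np.eye(N), np.zeros((N, N))]])

# Exponentiation rule (one common convention): S = exp(Omega H).
S = expm(Omega @ H)

# The generated S is symplectic: S^T Omega S = Omega.
print(np.allclose(S.T @ Omega @ S, Omega))  # True
```

This holds because ΩH lies in the symplectic Lie algebra whenever H is symmetric: (ΩH)ᵀΩ + Ω(ΩH) = 0, using Ω² = −I.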

We now fix to be translationally invariant, i.e., does not change under the transformation

(60)

where we have introduced the shift operator which maps and . We assume periodic boundary conditions on the modes, and , which allows us to represent translation as an orthogonal matrix: