Learning to Utilize Correlated Auxiliary Classical or Quantum Noise

06/08/2020 ∙ by Aida Ahmadzadegan, et al. ∙ University of Waterloo ∙ Perimeter Institute for Theoretical Physics

This paper has two messages. First, we demonstrate that neural networks can learn to exploit correlations between noisy data and suitable auxiliary noise. In effect, the network learns to use the correlated auxiliary noise as an approximate key to decipher its noisy input data. Second, we show that the scaling behavior with increasing noise is such that future quantum machines should possess an advantage. As a concrete example, we reduce the image classification performance of convolutional neural networks (CNNs) by adding noise of different amounts and quality to the input images. We then demonstrate that the CNNs are able to partly recover their performance if, along with each noisy image, they are given auxiliary noise that is correlated with the image noise. We analyze the scaling of a CNN's ability to learn and utilize these noise correlations as the level, dimensionality, or complexity of the noise is increased. We thereby find numerical and theoretical indications that quantum machines, due to their efficiency in representing complex correlations, could possess a significant advantage over classical machines.




1 Introduction

We consider neural networks that are simultaneously fed noisy data as well as separate noise that is correlated with the noise in the data. We demonstrate that networks can learn to exploit these correlations to better tolerate the noise in the data, i.e., to improve their performance, for example, in classification tasks. In effect, the network learns to use the correlations to subtract some of the noise from the data. This new approach of ‘Utilizing Correlated Auxiliary Noise’ (UCAN) has potential applications in scenarios where the noise on the data and the auxiliary noise are of unintended origins, such as measurement errors, but also in scenarios where the noise is added intentionally, for example, for cryptographic purposes. In the latter case, the UCAN setup is essentially a generalized one-time-pad protocol [1, 2, 3]. The auxiliary noise then plays the role of an approximate key that is correlated with the exact key that is represented by the noise on the data. In effect, the network uses the approximate key represented by the auxiliary noise to partially decipher the noisy data, in order to thereby improve its performance, for example, in classifying the data. The novel UCAN approach is, therefore, not primarily concerned with traditional denoising, see, e.g., [4, 5, 6, 7, 8, 9, 10, 11], but is instead concerned with new opportunities for neural networks that arise in the event of the availability of correlated auxiliary noise.

To this end, we numerically and theoretically investigate, in particular, the scaling of the efficiency of the UCAN method as the level, the dimensionality, or the complexity of the noise is increased. We find that as the magnitude of the noise is increased, the efficiency of the UCAN approach increases. The efficiency becomes optimal in the regime where the magnitude of the noise is close to the threshold where the noise starts to overwhelm the network, i.e., where the performance of the network without UCAN would drop steeply. Further, we find that the efficiency of the UCAN approach also generally increases as the dimensionality of the function space from which the noise is drawn is increased. Crucially, we also find that as the complexity of the noise is increased, the capacity of a neural network to use UCAN can easily be exhausted on classical computers.

As we will discuss on theoretical grounds, this offers a potential advantage for quantum computers over classical computers in UCAN-type applications. The advantage could arise from the ability of quantum computers to store, and quickly draw from, extraordinarily complex probability distributions, even when operating only on a relatively small number of qubits. As to the availability of correlated auxiliary quantum noise in practice, we note that such noise is generated, for example, in the process of decoherence. From this perspective, Quantum UCAN methods may, therefore, eventually yield a path from traditional, scripted quantum error correction to machine-learned quantum error correction. On quantum computing, communication and cryptography, see, e.g., [13, 2, 3].

2 Application of the UCAN approach to CNNs

We begin with a concrete demonstration of UCAN on classical computers. While UCAN should be applicable to most neural network architectures, we here demonstrate the UCAN approach by applying it to image classification by convolutional neural networks (CNNs).

To this end, we choose the standard Fashion-MNIST data set of $28\times 28$ pixel grey level images and we add around each image, by zero-padding, a rectangular rim of black pixels which we refer to as a ‘bezel’. We choose the bezel to be 6 pixels wide so that the number of pixels in the bezel around the image roughly matches the number of pixels in the image itself. We will refer to an image together with its bezel as a ‘panel’, which has $40\times 40$ pixels. First, we add noise only to the image part of the panels. The image classification performance of a CNN trained on these noisy panels correspondingly diminishes. We then examine to what extent the CNN can recover part of the noise-induced drop of its image classification performance when trained and tested with panels that possess noise on the image as well as noise on the bezel that is correlated with the noise on the image.

Concretely, we generate three sets of labeled data. One set, A, consists of the original set of labeled Fashion-MNIST images, with the black bezel added. The second set of labeled data, B, consists of the same set of labeled images with noise added only to the images. The third set of labeled data, C, consists of the same set of labeled images but with noise added to both the images and their bezels, with the image noise and the bezel noise generated so as to be correlated.
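The construction of the three data sets can be sketched as follows. This is a minimal illustration, assuming $28\times 28$ input images and a noise panel that is already available; the helper names `make_panel`, `bezel_mask` and `make_abc` are hypothetical, the random arrays are stand-ins for real data, and the brightness rescaling discussed in Sec. 2.1 is omitted here.

```python
import numpy as np

BEZEL = 6  # bezel width in pixels, as stated in the text


def make_panel(image):
    """Zero-pad an image with a black bezel (one element of data set A)."""
    return np.pad(image, BEZEL, mode="constant", constant_values=0.0)


def bezel_mask(panel_shape):
    """Boolean mask that is True on the bezel and False on the image region."""
    mask = np.ones(panel_shape, dtype=bool)
    mask[BEZEL:-BEZEL, BEZEL:-BEZEL] = False
    return mask


def make_abc(image, noise_panel):
    """Return the A, B and C variants of one image for a given noise panel."""
    panel_a = make_panel(image)
    mask = bezel_mask(panel_a.shape)
    # B: noise on the image only, the bezel stays noiseless
    panel_b = panel_a + np.where(mask, 0.0, noise_panel)
    # C: the same noise panel covers both image and bezel, hence correlated
    panel_c = panel_a + noise_panel
    return panel_a, panel_b, panel_c


rng = np.random.default_rng(0)
image = rng.random((28, 28))                       # stand-in for a Fashion-MNIST image
noise_panel = 0.1 * rng.standard_normal((40, 40))  # placeholder noise, not the paper's
panel_a, panel_b, panel_c = make_abc(image, noise_panel)
```

Because B and C share the same noise on the image region, any difference in classification performance between them can be attributed to the extra information on the bezel.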

We then train CNNs of identical architecture with the three sets of data and compare their image classification performance on the noisy images. We find that after the image classification performance drops from A to B, as expected, it increases again with C. This means that a CNN trained with noisy images with noisy bezel can outperform a CNN with the same architecture but trained on the noisy images with a noiseless bezel. This demonstrates that CNNs can be trained to use access to correlated noise on the bezel to improve their image classification performance by implicitly subtracting some of the noise from the image.

The amount of performance recovery from B to C, as a fraction of the initial performance drop from A to B, may be called the efficiency of the UCAN method in the case at hand. In our experiments we explored how this efficiency depends on the level of the noise as well as on the dimensionality and the complexity of the noise. We will now discuss how we generate these varying types of noise.
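As a small worked illustration of this definition, the efficiency can be written as the recovered accuracy divided by the lost accuracy. The function name `ucan_efficiency` and the accuracies in the usage line are hypothetical and are not results from the paper.

```python
def ucan_efficiency(acc_a, acc_b, acc_c):
    """Fraction of the A-to-B performance drop that is recovered from B to C."""
    drop = acc_a - acc_b
    if drop <= 0:
        raise ValueError("efficiency is only defined when performance drops from A to B")
    return (acc_c - acc_b) / drop


# Hypothetical accuracies, chosen only to show the arithmetic
example = ucan_efficiency(acc_a=0.90, acc_b=0.10, acc_c=0.50)
```

An efficiency of 0 means the bezel noise did not help at all, while an efficiency of 1 means the network fully recovered its noiseless performance.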

2.1 Method to generate correlated noise of varying level, dimensionality and complexity

Noise-to-Signal ratio.  We increase the noise level, i.e., the noise-to-signal ratio, by increasing the noise amplitude range relative to the amplitude range of the pixels of the clear image. The brightness values of the clear image range from zero (black) to one (white). We therefore lift and compress the brightness values of the clear image (and bezel) pixels to a suitable smaller range so that after the noise (whose amplitudes are allowed to take positive and negative values) is added, the brightness values of the noisy image and bezel again range between zero and one.
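One way to realize this lifting and compressing is sketched below. It assumes that a fraction `r` of the brightness range is reserved for the noise, split symmetrically around the signal, and that each noise panel is normalized by its own peak amplitude; both choices, and the name `add_noise_at_level`, are assumptions of this sketch rather than details confirmed by the text.

```python
import numpy as np


def add_noise_at_level(panel, noise, r):
    """Mix a clean panel with values in [0, 1] and a noise panel so that the
    result stays in [0, 1]; r is the fraction of the range given to the noise."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    peak = np.max(np.abs(noise))
    unit_noise = noise / peak if peak > 0 else noise  # now within [-1, 1]
    signal = r / 2 + (1 - r) * panel                  # lifted and compressed to [r/2, 1 - r/2]
    return signal + (r / 2) * unit_noise              # noise amplitudes within [-r/2, r/2]


rng = np.random.default_rng(0)
panel = rng.random((40, 40))
noise = rng.standard_normal((40, 40))
noisy = add_noise_at_level(panel, noise, r=0.7)
```

With this convention a black pixel is lifted to the grey value `r / 2`, and `r = 0` returns the clean panel unchanged.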

Dimension of the noise space.  In addition to varying the noise-to-signal ratio, we also vary the dimension of the space from which the noise is drawn. The dimension of the space of panels of size $40\times 40$ is $1600$. We choose a set of $N$ basis vectors in that space and we then generate the noise as a linear combination of these noise basis vectors with coefficients drawn from a Gaussian probability distribution. In order to explore the scaling of the efficiency of the UCAN approach when increasing the dimensionality of the vector space from which the noise is drawn, we find that choosing the number, $N$, of noise basis functions from three increasing square numbers suffices to show the trend. (As shown in Sec.3.2, the squares arise when constructing the basis functions as the product of an equal number of Fourier modes in the $x$ and $y$ directions.)

Noise complexity.  In order to vary the complexity of the noise, we choose the noise basis vectors such that the pixel pattern that they represent is either of low or high algorithmic complexity, i.e., such that it is either relatively easy or relatively hard to learn for a machine such as a neural network. On the notion of algorithmic complexity, see, e.g., [14]. In order to generate relatively low complexity noise, we choose as the basis vectors those pixel patterns that correspond to the first $N$ sine functions of the discrete Fourier sine transform of the full image with bezel, for each of the three values of $N$. Recall that sine functions are of low algorithmic complexity as they can be generated by a short program. In order to generate relatively high complexity noise, we span the noise vector space using the same numbers, $N$, of basis vectors that correspond to pixel patterns that approximate white noise. Recall that white noise is algorithmically complex. Correspondingly, it should become harder for a CNN to learn and utilize the more complex noise. Indeed, as the experimental results discussed in Sec.3 show, the level of noise complexity that we can achieve by the above noise generating method is sufficient to reach the limit of noise complexity that the network architecture which we use in our experiments can accommodate for the purpose of UCAN.

Fig.1 shows examples of panels of relatively low complexity noise drawn from noise spaces of increasing dimension. Fig.2 shows panels of a noisy image and bezel with increasing noise-to-signal ratios. The noise-to-signal ratio is increased until the image is no longer classifiable by human perception. In the experiments, we increase the noise-to-signal ratio until the networks classify no better than chance.

Quantum perspective.  In Sec.4 we will show that the above method for generating noise can also be viewed as a classical simulation of the quantum noise generated by a quantum system such as a quantum field in a suitable quantum state. For example, the generating of low-complexity noise as a linear combination of Fourier sine functions with Gaussian distributed coefficients can be viewed as accurately simulating the vacuum fluctuations of a bandlimited Klein-Gordon (KG) quantum field in two dimensions. On quantum field fluctuations, see, e.g., [15, 16, 17, 18, 19, 20, 21, 22, 23]. In Sec.4, we will discuss the scaling of the noise complexity that could be achieved in the UCAN context through the use of generic highly entangled states on a quantum machine.

Figure 1: Examples of low complexity noise panels drawn from noise spaces of three increasing dimensions.
Figure 2: a) Processed image from the Fashion-MNIST dataset with an added, initially black bezel of 6 pixel width. All pixels are rescaled to a smaller interval (so that the bezel becomes grey). Finally, noise is added to the image. Panels b) to f) show examples where 30%, 50%, 70%, 85%, and 95% of the panel is noise, respectively.

3 Experiments

3.1 Experimental setup

In this section, we detail our implementation of the new UCAN scheme in convolutional neural networks. We use a slightly modified version of the CNN architecture given in [24], which contains three convolutional layers and two fully connected layers. Full details regarding our network architecture, training, and evaluation are provided in the supplementary materials.

In brief, our training data sets are generated from the set of labeled $28\times 28$ pixel Fashion-MNIST images [25]. The data set contains 10 different types of fashion items, i.e., if a CNN performs at the level of 10% accuracy then it classifies no better than chance. The data set consists of 10k test images and 60k training images which we divided into a 50k training and a 10k validation set. The Fashion-MNIST images are in grey scale with the pixel values originally ranging between 0 and 255, including both bounding values. We re-scale these values to the interval $[0,1]$.

To obtain our data sets of type A, we add to the images a black bezel of 6 pixel width by zero-padding. We obtain data sets of type B by adding noise only to the image, as, e.g., in Fig.2a, and data sets of type C by adding noise to both the image and the bezel, as, e.g., in Figs.2b-f.

3.2 Generating low and high complexity noise

In order to generate noise with relatively low algorithmic complexity, we construct a basis of the noise space by using the orthogonal sine functions of the Fourier sine transform of functions defined on a square, i.e., products of one sine function in the $x$ direction and one sine function in the $y$ direction. A pair of positive integers labels the choice of basis function. Each basis function yields a panel by evaluating the basis function on the grid of integer pixel positions. Each such panel serves as a basis vector in the space of panels from which we draw the noise. In order to avoid needlessly small amplitudes near the boundary (due to the vanishing of all sines there), we choose the side length of the square to be slightly larger than the panel width. We then generate each noise panel as a random linear combination of the basis panels obtained from the first $\sqrt{N}$ sine functions in the $x$ and $y$ directions, with coefficients drawn from Gaussian probability distributions (see footnote 1). Examples of noise panels drawn from noise spaces of different dimensions, $N$, are shown in Fig.1. Since we have 60k training and 10k test images, we need 70k such noise panels to add to our total 70k Fashion-MNIST dataset. In order to study the effect of increasing the dimension of the noise space on the performance of the network, we create three such data sets of 70k panels each, one for each of the three noise space dimensions.

Footnote 1: The chosen width of the Gaussians lessens the probability of large amplitudes of the coefficients of sines of short wavelength, leading to a pink noise spectrum. We choose these Gaussian distributions since, as discussed in detail in the supplementary materials, this choice also happens to exactly match the statistics of the quantum vacuum fluctuations of a neutral scalar Klein-Gordon quantum field.
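A form of the basis functions and noise panels that is consistent with the description above can be sketched as follows. The symbol names $L$ (side length of the square), $k, l$ (mode labels), $f_{k,l}$ (basis function), $c_{k,l}$ (coefficients), $\sigma_{k,l}$ (Gaussian widths) and $n$ (noise panel) are labels assumed for this sketch, as is the use of the first $\sqrt{N}$ modes in each direction; the specific widths are not assumed here beyond their decreasing for shorter wavelengths.

```latex
\begin{align}
  f_{k,l}(x,y) &= \sin\!\left(\frac{k\pi x}{L}\right)\,\sin\!\left(\frac{l\pi y}{L}\right),
  \qquad k,l \in \{1,\dots,\sqrt{N}\}, \\
  n(x,y) &= \sum_{k=1}^{\sqrt{N}} \sum_{l=1}^{\sqrt{N}} c_{k,l}\, f_{k,l}(x,y),
  \qquad c_{k,l} \sim \mathcal{N}\!\left(0,\sigma_{k,l}^{2}\right).
\end{align}
```

Evaluating $f_{k,l}$ and $n$ at the integer pixel positions $(x,y)$ of the panel then yields the basis panels and the noise panels, respectively.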

In order to generate noise with high algorithmic complexity, we proceed exactly as above, except that we use as the basis of the space of noise panels not sine functions but instead panels of fixed approximate white noise. Each of the basis noise panels is generated by drawing for each pixel its grey level from a normal distribution. For later reference, let us note here that the so-obtained basis noise panels are generally not orthogonal, unlike the sine-based basis noise panels. The 70k noise panels are then each generated as a linear combination of these basis noise panels, with coefficients drawn from a Gaussian probability distribution, and truncated so that the grey levels of the noise panel stay within a fixed range. Analogous to the case of relatively low complexity noise, we generate the sets of relatively high complexity noise panels by linearly combining, with Gaussian-distributed random coefficients, the same three numbers of basis noise panels.
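Both kinds of noise can be generated by one routine that differs only in the basis it is given. The sketch below assumes $40\times 40$ panels, an integer pixel grid running from 1 to 40, and a hypothetical side length `side=42.0` for the sine basis; the function names and the optional `widths` argument are illustrative, and the truncation step mentioned above is left out.

```python
import numpy as np


def sine_basis(m, size=40, side=42.0):
    """Return m*m basis panels: products of the first m sine modes in x and y."""
    grid = np.arange(1, size + 1)                          # assumed integer pixel grid
    modes = np.arange(1, m + 1)
    sines = np.sin(np.pi * np.outer(modes, grid) / side)   # shape (m, size)
    # basis[k, l] is the outer product of mode k (first axis) and mode l (second axis)
    basis = np.einsum("kx,ly->klxy", sines, sines)
    return basis.reshape(m * m, size, size)


def white_noise_basis(n, size=40, rng=None):
    """Return n fixed basis panels whose pixels are drawn from a normal distribution."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal((n, size, size))


def draw_noise_panels(basis, count, widths=None, rng=None):
    """Draw `count` noise panels as Gaussian linear combinations of the basis panels."""
    rng = np.random.default_rng() if rng is None else rng
    n = basis.shape[0]
    widths = np.ones(n) if widths is None else np.asarray(widths, dtype=float)
    coeffs = rng.standard_normal((count, n)) * widths      # one row of coefficients per panel
    return np.tensordot(coeffs, basis, axes=1)             # shape (count, size, size)


rng = np.random.default_rng(0)
low_complexity = draw_noise_panels(sine_basis(3), count=20, rng=rng)
high_complexity = draw_noise_panels(white_noise_basis(9, rng=rng), count=20, rng=rng)
```

In both cases the panels lie in a 9-dimensional subspace of the 1600-dimensional panel space; only the algorithmic complexity of the basis panels differs.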

3.3 Experimental results

The experimental results, i.e., the performances of our convolutional neural networks as a function of the level, dimensionality and complexity of the noise are shown in Fig.3. The $y$-axis indicates the performance of the CNN and the $x$-axis denotes increasing levels of noise. The left panel, Fig.3a, shows the performance for noise of relatively low computational complexity, i.e., noise arising as linear combinations of basis noise panels that represent sine functions. The right panel, Fig.3b, shows the performance for noise of high computational complexity, i.e., for the noise that arises as linear combinations of basis noise panels that each represent approximate white noise. The blue, green, and red curves in Fig.3a,b represent the three choices, in increasing order, of the dimension of the space of noise panels.

The dashed lines represent the performance of the CNN on the data sets with the noise only on the image while the solid lines represent the performance of the CNN with the noise both on the image and on the bezel. Each data point has been calculated multiple times and the mean value together with its standard deviation in the form of error bars is plotted. The error bars on Fig.3b are there but they are small, as we will discuss below.

Figure 3: The test accuracy, i.e., the performance of the network on the test dataset, as a function of the noise-to-signal ratio. Notice that, since there are 10 different fashion items, a success rate of 10% indicates that the network classifies at a rate that is equal to pure chance.

We begin our analysis of the experimental data with the observation that the curves show that, as the level of noise increases, the performance generally drops. In addition, we notice that on the noise-to-signal ratio axis, there are well-defined ‘cliffs’ where the performance sharply drops to the level of 10% and the network is no longer able to learn to classify better than chance. We also see that the performance drops as the dimensionality of the noise space is increased, i.e., from blue to green to red. As the complexity of the noise is increased, namely from Fig.3a to 3b, the performance also drops - except for the red curves, i.e., except if the dimension of the noise space is highest. We will discuss this exception further below.

The most crucial observation, however, is that all the solid lines are above the dashed lines. This means that the CNNs were able to improve their performance due to UCAN, i.e., when they are given access to correlated noise on the bezel. In particular, we see in Fig.3a that the efficiency of UCAN, i.e., the gap between the dashed and solid lines of equal color, increases with increasing noise level. Most importantly, we observe that the cliff at which the performance of the network drops sharply is at a higher noise level for the solid lines, with UCAN, than it is for the dashed lines, without UCAN, i.e., without noise on the bezel. Concretely, we observe in Fig.3a that there exists a special regime of noise-to-signal levels. At that level of noise, a CNN without UCAN (dashed lines) cannot learn at all, i.e., its performance drops to 10%, which is the performance level of pure chance. At the same level of noise, however, a CNN of the same architecture but with access to correlated auxiliary noise on the bezel (solid lines) learns to perform considerably better, here with performance levels from about 40% to about 90%, depending on the dimension of the noise space. The upshot is that UCAN possesses its highest efficiency in the regime of such high noise-to-signal ratio, where the network without UCAN starts to fail to learn at all.

It is intuitive that the efficiency of UCAN is best in regimes of high levels of noise. This is because UCAN in effect reduces the network’s rate of those misclassifications that are due to noise while, at low noise levels, most misclassifications of a CNN are not primarily due to noise. However, we can only expect the efficiency of UCAN to increase with increasing noise as long as the capacity of the network suffices to learn to utilize the correlations in the noise. Indeed, our experiments showed that the networks struggled to achieve UCAN efficiency in the regime of high noise complexity: in Fig.3b, the solid lines are barely above the dashed lines. This demonstrates that the UCAN approach can quickly exhaust a classical network’s capacity. In Sec.4, we will come back to this point in our discussion of the prospect of UCAN on quantum machines, which should possess a much higher capacity to represent complex correlations.

Let us now also discuss why the error bars on Fig.3b are smaller than those on Fig.3a. Superficially, the reason is that the performance of the CNNs was more uniform in the case of the high complexity noise on the right. We conjecture the reason to be that the network, when trained on the low complexity noise data, succeeded in learning, to a varying extent, the algorithmically relatively simple long distance correlations between bezel and image noise that are due to the algorithmically relatively simple nature of the sine functions. In contrast, the CNNs appear to have consistently struggled to learn any correlations between the bezel and image noise in the case of relatively high noise complexity.

Finally, let us discuss why the red curves are higher in Fig.3b than in Fig.3a. We expect the reason to be that the highest-dimensional noise space on the left is spanned by sine functions that are orthogonal, while the noise space of the same dimension on the right is spanned by random white noise panels that are at random angles to one another. This means that the noise space is more uniformly sampled for the red curves on the left than for those on the right, which leads to more predictability of the noise on the right and therefore to an advantage for the CNNs on the right. This phenomenon arises only for high-dimensional noise spaces where the directions of random basis vectors start crowding together.

3.4 Correlation analysis

So far, we discussed the efficiency of the UCAN method as a function of the noise-to-signal ratio, the noise dimensionality and the algorithmic complexity of the noise. We are now ready to discuss the performance of the UCAN method in terms of the correlations between the noise on the bezel and the noise on the image.

We begin by noting that, since uncorrelated noise is of no use for UCAN, we chose all of the noise in our experiments to be perfectly correlated between the bezel and the image. If the noise on the bezel were known then, in principle, the noise on the image could be perfectly inferred. To see this, let us consider the simple case where the noise space is one dimensional, i.e., where all noise panels are a multiple of just one basis noise panel. In this case, knowing one pixel value anywhere, for example on the bezel, would imply knowing the noise everywhere. More generally, if the noise space is chosen to be $N$-dimensional, then knowledge of the grey level values of any $N$ pixels, e.g., bezel pixels, if the bezel has enough pixels, allows one to infer the noise everywhere, namely by solving a linear system of equations. Since even the largest dimension of the noise space that we considered is smaller than the number of pixels of the bezel, namely $40^2 - 28^2 = 816$, it is always possible to determine the noise on the image from the noise on the bezel, in principle. However, for a network to infer the image noise from the bezel noise, it would first need to determine the exact noise space. One challenge for the network is that while it is trained with a clear view of the noise on the bezel, its view of the noise on the images is obscured by the presence of the images.
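The linear-system argument can be checked numerically. The sketch below assumes that the basis panels are known exactly, which is precisely what the network does not have; the noise space dimension `n_basis = 25`, the random stand-in basis, and all variable names are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
size, bezel, n_basis = 40, 6, 25   # n_basis is a hypothetical noise space dimension

basis = rng.standard_normal((n_basis, size, size))   # stand-in basis noise panels
coeffs = rng.standard_normal(n_basis)
noise = np.tensordot(coeffs, basis, axes=1)          # one noise panel, shape (40, 40)

mask = np.ones((size, size), dtype=bool)
mask[bezel:-bezel, bezel:-bezel] = False             # True on the bezel pixels

# Each bezel pixel gives one linear equation in the n_basis unknown coefficients.
design = basis[:, mask].T                            # shape (816, n_basis)
recovered, *_ = np.linalg.lstsq(design, noise[mask], rcond=None)

# Use the recovered coefficients to predict the noise on the image region.
image_noise_estimate = np.tensordot(recovered, basis, axes=1)[~mask]
max_error = np.max(np.abs(image_noise_estimate - noise[~mask]))
```

With a known basis the image noise is recovered up to floating-point error from the bezel alone, which is why the difficulty for a network lies in learning the noise space rather than in the inference step itself.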

More importantly, some noise spaces are easier for a network to learn than others. For example, if the noise space is one dimensional, then the network needs to learn only one noise basis panel. If this panel is simple, e.g., if the grey level values follow a sine wave, then the panel is easier to learn than when the noise basis panel is of high algorithmic complexity, such as a panel of white noise. The challenge to the network increases as the dimensionality of the noise space is increased. Experimentally, as is clear when comparing Fig.3a and Fig.3b, it is indeed easy to overwhelm the network’s limited capacity to benefit from UCAN by using basis noise panels of high algorithmic complexity.

Our experiments have been limited, so far, to UCAN applied to CNNs for image classification. It should be very interesting to apply UCAN to other network architectures whenever auxiliary correlated noise is naturally available or can usefully be added, e.g., as the case may be, with RNNs for signal processing, or autoencoders for denoising.

Independently of which suitable neural network architecture the UCAN method is applied to, we are led to conjecture that the efficiency of the UCAN method tends to increase as the amount of noise increases, and that the efficiency of UCAN is highest when the noise reaches the level at which the network without UCAN would start to fail to learn. We are also led to conjecture that when UCAN is applied to any neural network architecture, then even perfect correlations between the noise on the input signal and the auxiliary noise can easily be made sufficiently complex to exhaust the capacity of the network to learn to utilize these correlations.

To support this conjecture, let us discuss to what extent the complexity of the noise could be increased. For example, in the case of the CNNs that we studied here, the noise panels do not need to be generated in the way we did, i.e., by linearly combining noise basis panels with independently distributed coefficients. Instead, in principle, the noise panels could be drawn from any probability distribution over the manifold $[0,1]^{1600}$, i.e., over the $1600$-dimensional unit cube. Even if the pixel values are restricted to be $0$ or $1$, a generic and therefore highly complex probability distribution would require the specification of $2^{1600}$ coefficients. This confirms, in this example, that even if the noise that occurs in practical applications of UCAN is manageable for a suitable network, the complexity of the noise could easily be increased to exceed any network’s ability to learn or store or draw from its probability distribution, at least if running on a classical machine.
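The size of this number can be made explicit with a short count. The sketch assumes $28\times 28$ images with a 6 pixel bezel, i.e., panels of $n = 40 \times 40 = 1600$ pixels, each restricted to two values; the symbol $n$ is a label introduced here.

```latex
\begin{align}
  \#\{\text{binary panels}\} &= 2^{n} = 2^{1600}, \\
  \#\{\text{independent probabilities}\} &= 2^{1600} - 1
  \quad \text{(one is fixed by normalization)}, \\
  \log_{10} 2^{1600} &= 1600 \,\log_{10} 2 \approx 481.6
  \quad\Longrightarrow\quad 2^{1600} \approx 4.4 \times 10^{481}.
\end{align}
```

For comparison, the noise spaces used in the experiments are specified by $N$ basis panels of 1600 pixels each, i.e., by $1600\,N$ numbers.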

4 Highly complex noise correlations and quantum UCAN

Quantum machines may offer advantages for neural networks using UCAN, especially in the regime of highly complex noise, as a quantum machine with $n$ qubits can store probability distributions described by a $2^n$-dimensional Hilbert space. Also, while classically it is generally prohibitively expensive to draw from high-dimensional probability distributions, quantum machines allow one to easily draw from the probability distribution of any of their states, through measurement. Further, the quantumness provides a source of true randomness rather than approximate randomness, as the violation of Bell inequalities shows, see, e.g., [26, 27]. In addition, the quantum mechanical Hilbert space of probability distributions is richer than that of classical distributions, due to the additional dependence on the choice of measured observables. This in turn allows entangled states that violate Bell-type inequalities to describe correlations, say between noise on data and auxiliary noise, that could not arise from localized classical dynamics.
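The classical cost of this storage and sampling can be illustrated by simulating a measurement in the computational basis. This is a classical simulation sketch, not a quantum implementation; the function names, the choice of a random state vector, and the 10 qubits in the usage lines are assumptions made for illustration.

```python
import numpy as np


def random_state(n_qubits, rng):
    """Return a normalized random state vector with 2**n_qubits complex amplitudes."""
    dim = 2 ** n_qubits
    amps = rng.standard_normal(dim) + 1j * rng.standard_normal(dim)
    return amps / np.linalg.norm(amps)


def sample_measurements(state, shots, rng):
    """Simulate computational-basis measurements: outcome i occurs with
    probability |amplitude_i|**2 (Born rule)."""
    probs = np.abs(state) ** 2
    probs = probs / probs.sum()          # guard against rounding drift
    return rng.choice(len(state), size=shots, p=probs)


rng = np.random.default_rng(2)
state = random_state(10, rng)            # 10 qubits already need 1024 amplitudes
outcomes = sample_measurements(state, shots=1000, rng=rng)
```

The classical simulation must hold all $2^n$ amplitudes in memory, which becomes infeasible well before $n$ reaches the 1600 pixels of a panel, whereas a quantum machine would obtain each sample by a single measurement of its $n$ qubits.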

Illustrating the generality of the quantum perspective is the fact that the noise that we used for our experiments could also be generated by a quantum system, namely by a neutral massless Klein-Gordon quantum field in two dimensions (which is similar to one polarization of the quantized electromagnetic field) discretized to a grid. As we describe in detail in the supplementary materials, the low complexity noise that is generated using sinusoidal noise panels statistically exactly matches the quantum fluctuations of the amplitudes of the Klein-Gordon field in the vacuum state with a bandlimit [19, 20, 21, 22, 23] determined by the dimension of the noise space. The entanglement entropy in the vacuum state obeys an area law and is correspondingly low, consistent with the fact that the noise here is of low algorithmic complexity. For the relationship between Shannon entropy (here in the form of von Neumann entropy) and algorithmic complexity, see, e.g., [14]. The statistics of the high complexity noise generated using white noise panels also matches the field’s fluctuations, namely if the field’s state is a suitable superposition of field-amplitude eigenstates.

While a quantum field can, therefore, generate the noise that we considered in our experiments, it is also capable of generating noise of far higher complexity. In fact, it is known [28] that any generic pure state is close to being maximally entangled and therefore possesses close to maximum entanglement entropy between two equal size partitions of the system, such as here the bezel and the image. The almost maximal von Neumann entropies of the noise on the data and the auxiliary noise then imply a correspondingly almost maximal algorithmic complexity of the noise, illustrating the ability of quantum systems to efficiently store and draw from truly highly complex probability distributions.

5 Outlook

We now address the question of the potential availability of correlated classical or quantum auxiliary noise for future applications of UCAN, with a view especially to quantum technologies. In the literature, there are indeed a few examples of uses of auxiliary classical and quantum noise, although so far we know of none that brings the power of machine learning to bear, as is our proposal here.

For example, in the quantum energy teleportation (QET) protocol [29, 30], an agent invests energy into a local measurement of quantum noise and communicates the outcome to a distant agent who, on the basis of entanglement in the underlying medium or vacuum, uses this information to correspondingly interact with the local quantum noise, enabling that agent to locally extract energy. Quantum energy teleportation has been generalized to aid in algorithmic cooling in quantum processors [31, 32, 33]. Also, for example, in quantum optics, see, e.g., [34, 35], the technique of ghost imaging is based on utilizing what is effectively correlated auxiliary classical or quantum noise, see, e.g., [36, 37].

Further, it was shown in [30] that in communication through a quantum field, access to correlated auxiliary quantum noise is always available to the receiver, due to the ubiquitous entanglement in quantum fields and that, in principle, this auxiliary noise is usable to increase the channel capacity. This suggests that classical or quantum implementations of UCAN on quantum machines could be useful, for example, to improve the classical or quantum channel capacity within or between quantum processors or quantum memory. In this case, the quantum noise on the data and the correlated auxiliary quantum noise would arise from the quantum fluctuations of the quantum field that is used for the communication, such as the electromagnetic field, or a quantum field of collective excitations such as the effective phononic field of ion traps [38]. In the context of superconducting qubits, see in particular also, e.g., [39].

Generally, any process of decoherence can be viewed as a process that creates correlated auxiliary quantum noise in the environment. This suggests, as we indicated in the introduction, that classical or quantum or hybrid neural networks may be able to learn to undo some of the deleterious effects of decoherence. If we can use our limited experimental results above as guidance, we expect that a UCAN approach to machine-learned quantum error correction (as compared to the traditional, scripted approach to quantum error correction) is most efficient in the nonperturbative regime of relatively strong noise or strong decoherence.

While the implementation of a UCAN approach in classical neural networks should be relatively straightforward whenever correlated auxiliary noise is available or useful, the field of quantum machine learning (QML) is still relatively new, see, e.g., [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]. It should be very interesting to explore how some of these exciting new QML architectures could be applied in order to develop applications of UCAN with quantum neural networks.

Broader impact

We believe that the novel UCAN method presented here is not intrinsically prone to biases and therefore we do not anticipate negative societal impacts. The novel UCAN method for machine learning could help the performance of suitable quantum computing technologies surpass that of classical computers.

This research was enabled in part by support provided by Compute Canada (www.computecanada.ca). AK acknowledges support through the Discovery Program of the Natural Sciences and Engineering Research Council of Canada (NSERC), through the Discovery Project Program of the Australian Research Council (ARC), and through two Google Faculty Research Awards. PS acknowledges the support of the NSERC Canada Graduate Scholarship. ML acknowledges support through the Canada Research Chair and Discovery programs of NSERC.


  • [1] Steven M. Bellovin. Frank Miller: Inventor of the one-time pad. Cryptologia, 35(3):203–222, 2011.
  • [2] Stefano Pirandola, Ulrik L. Andersen, Leonardo Banchi, Mario Berta, et al. Advances in quantum cryptography. arXiv:1906.01645, 2019.
  • [3] Alexander V. Sergienko. Quantum communications and cryptography. CRC Press, 2018.
  • [4] Viren Jain and Sebastian Seung. Natural image denoising with convolutional networks. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 769–776. Curran Associates, Inc., 2009.
  • [5] Marco Perez-Cisneros, Cătălina Cocianu, and Alexandru Stan. Neural architectures for correlated noise removal in image processing. Mathematical Problems in Engineering, 2016:6153749, 2016.
  • [6] Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. ImageNet classification with deep convolutional neural networks. Neural Information Processing Systems, 25, 2012.
  • [7] Christopher M. Bishop. Neural networks for pattern recognition. Oxford University Press, Inc., USA, 1995.
  • [8] Sudipta S. Roy, Mahtab Ahmed, and M. A. H. Akhand. Classification of massive noisy image using auto-encoders and convolutional neural network. In 2017 8th International Conference on Information Technology (ICIT), pages 971–979, 2017.
  • [9] Tiago Nazaré, Gabriel De Barros Paranhos da Costa, Welinton Contato, and Moacir Ponti. Deep Convolutional Neural Networks and Noisy Images, pages 416–424. 2018.
  • [10] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Extracting and composing robust features with denoising autoencoders. pages 1096–1103, 2008.
  • [11] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res., 11:3371–3408, 2010.
  • [12] Simon J Devitt, William J Munro, and Kae Nemoto. Quantum error correction for beginners. Rep. Prog. Phys., 76(7):076001, 2013.
  • [13] Laszlo Gyongyosi and Sandor Imre. A survey on quantum computing technology. Computer Science Review, 31:51–71, 2019.
  • [14] Ming Li and Paul Vitányi. An introduction to Kolmogorov complexity and its applications. Texts in Computer Science. Springer International Publishing, 4th edition, 2019.
  • [15] Andrew R. Liddle and David H. Lyth. Cosmological inflation and large-scale structure. Cambridge University Press, 2000.
  • [16] Viatcheslav Mukhanov. Physical foundations of cosmology. Cambridge University Press, 2005.
  • [17] Viatcheslav Mukhanov and Sergei Winitzki. Introduction to quantum effects in gravity. Cambridge University Press, 2007.
  • [18] Nicholas D. Birrell and Paul C. W. Davies. Quantum fields in curved space. Cambridge Monographs on Mathematical Physics. Cambridge University Press, 1982.
  • [19] Achim Kempf, Gianpiero Mangano, and Robert B. Mann. Hilbert space representation of the minimal length uncertainty relation. Phys. Rev. D, 52:1108–1118, 1995.
  • [20] Jason Pye, William Donnelly, and Achim Kempf. Locality and entanglement in bandlimited quantum field theory. Phys. Rev. D, 92:105022, 2015.
  • [21] Achim Kempf. Fields over unsharp coordinates. Phys. Rev. Lett., 85:2873–2876, 2000.
  • [22] Aidan Chatwin-Davies, Achim Kempf, and Robert T. W. Martin. Natural covariant Planck scale cutoffs and the cosmic microwave background spectrum. Phys. Rev. Lett., 119:031301, 2017.
  • [23] Achim Kempf, Aidan Chatwin-Davies, and Robert T. W. Martin. A fully covariant information-theoretic ultraviolet cutoff for scalar fields in expanding Friedmann Robertson Walker spacetimes. J. Math. Phys., 54(2):022301, 2013.
  • [24] James Le. Fashion-MNIST. https://github.com/khanhnamle1994/fashion-mnist/commits?author=khanhnamle1994, 2018.
  • [25] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv:1708.07747, 2017.
  • [26] Gregor Weihs, Thomas Jennewein, Christoph Simon, Harald Weinfurter, and Anton Zeilinger. Violation of Bell’s inequality under strict Einstein locality conditions. Phys. Rev. Lett., 81:5039–5043, 1998.
  • [27] Wenjamin Rosenfeld, Daniel Burchardt, Robert Garthoff, Kai Redeker, Norbert Ortegel, Markus Rau, and Harald Weinfurter. Event-ready Bell test using entangled atoms simultaneously closing detection and locality loopholes. Phys. Rev. Lett., 119:010402, 2017.
  • [28] Don N. Page. Information in black hole radiation. Phys. Rev. Lett., 71:3743–3746, 1993.
  • [29] Masahiro Hotta. A protocol for quantum energy distribution. Phys. Lett. A, 372(35):5671–5676, 2008.
  • [30] Koji Yamaguchi, Aida Ahmadzadegan, Petar Simidzija, Achim Kempf, and Eduardo Martín-Martínez. Superadditivity of channel capacity through quantum fields. Phys. Rev. D, 101:105009, 2020.
  • [31] P. Oscar Boykin, Tal Mor, Vwani Roychowdhury, Farrokh Vatan, and Rutger Vrijen. Algorithmic cooling and scalable NMR quantum computers. Proceedings of the National Academy of Sciences, 99(6):3388–3393, 2002.
  • [32] Nayeli A. Rodríguez-Briones, Jun Li, Xinhua Peng, Tal Mor, Yossi Weinstein, and Raymond Laflamme. Heat-bath algorithmic cooling with correlated qubit-environment interactions. New J. Phys., 19(11):113047, 2017.
  • [33] Nayeli A. Rodríguez-Briones, Eduardo Martín-Martínez, Achim Kempf, and Raymond Laflamme. Correlation-enhanced algorithmic cooling. Phys. Rev. Lett., 119:050502, 2017.
  • [34] Daniel F. Walls and Gerard J. Milburn. Quantum optics. Springer Science & Business Media, 2007.
  • [35] Hans-Albert Bachor and Timothy C. Ralph. A guide to experiments in quantum optics, volume 1. Wiley Online Library, 2004.
  • [36] Todd B. Pittman, Y. H. Shih, D. V. Strekalov, and Alexander V. Sergienko. Optical imaging by means of two-photon quantum entanglement. Phys. Rev. A, 52:R3429–R3432, 1995.
  • [37] Nicholas Bornman, Megan Agnew, Feng Zhu, Adam Vallés, Andrew Forbes, and Jonathan Leach. Ghost imaging using entanglement-swapped photons. npj Quantum Inf., 5(1):63, 2019.
  • [38] Colin D. Bruzewicz, John Chiaverini, Robert McConnell, and Jeremy M. Sage. Trapped-ion quantum computing: Progress and challenges. Appl. Phys. Rev., 6(2):021314, 2019.
  • [39] Murphy Yuezhen Niu, Vadim Smelyanskyi, Paul Klimov, Sergio Boixo, Rami Barends, Julian Kelly, et al. Learning non-Markovian quantum noise from moiré-enhanced swap spectroscopy with deep evolutionary algorithm. arXiv:1912.04368, 2019.
  • [40] Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini. Parameterized quantum circuits as machine learning models. Quantum Sci. Technol., 4(4):043001, 2019.
  • [41] Vedran Dunjko and Hans J. Briegel. Machine learning & artificial intelligence in the quantum domain: a review of recent progress. Rep. Prog. Phys., 81(7):074001, 2018.
  • [42] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Quantum machine learning. Nature, 549(7671):195–202, 2017.
  • [43] Carlo Ciliberto, Mark Herbster, Alessandro Davide Ialongo, Massimiliano Pontil, Andrea Rocchetto, Simone Severini, and Leonard Wossnig. Quantum machine learning: a classical perspective. Proceedings. Mathematical, Physical, and Engineering Sciences, 474, 2018.
  • [44] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. The quest for a quantum neural network. Quantum Inf. Process., 13:2567–2586, 2014.
  • [45] Zhimin Yang and Xiangdong Zhang. Entanglement-based quantum deep learning. New J. Phys., 22:033041, 2020.
  • [46] Michael Broughton, Guillaume Verdon, Trevor McCourt, Antonio J. Martinez, Jae Hyeon Yoo, et al. TensorFlow Quantum: A software framework for quantum machine learning. arXiv:2003.02989, 2020.
  • [47] Guillaume Verdon, Jason Pye, and Michael Broughton. A universal training algorithm for quantum deep learning. arXiv:1806.09729, 2018.
  • [48] Guillaume Verdon, Trevor McCourt, Enxhell Luzhnica, Vikash Singh, Stefan Leichenauer, and Jack Hidary. Quantum graph neural networks. arXiv:1909.12264, 2019.
  • [49] Guillaume Verdon, Michael Broughton, Jarrod R. McClean, Kevin J. Sung, Ryan Babbush, Zhang Jiang, Hartmut Neven, and Masoud Mohseni. Learning to learn with quantum neural networks via classical neural networks. arXiv:1907.05415, 2019.