- Murray et al.  M. M. Murray, D. J. Lewkowicz, A. Amedi, and M. T. Wallace, Multisensory Processes: A Balancing Act across the Lifespan, Trends Neurosci 39, 567 (2016).
- Zenke et al.  F. Zenke, W. Gerstner, and S. Ganguli, The temporal paradox of Hebbian learning and homeostatic plasticity, Curr Opin Neurobiol 43, 166 (2017).
- Legg and Hutter  S. Legg and M. Hutter, Universal Intelligence: A Definition of Machine Intelligence, Minds & Machines 17, 391 (2007).
- Legg and Veness  S. Legg and J. Veness, An Approximation of the Universal Intelligence Measure, arXiv:1109.5951 (2011).
- Hernández-Orallo and Dowe  J. Hernández-Orallo and D. L. Dowe, Measuring universal intelligence: Towards an anytime intelligence test, Artificial Intelligence 174, 1508 (2010).
- Parisi et al.  G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, and S. Wermter, Continual lifelong learning with neural networks: A review, Neural Networks 113, 54 (2019).
- Parisi et al.  G. I. Parisi, J. Tani, C. Weber, and S. Wermter, Lifelong learning of human actions with deep neural network self-organization, Neural Networks 96, 137 (2017).
- Chen et al.  Z. Chen, B. Liu, R. Brachman, P. Stone, and F. Rossi, Lifelong Machine Learning: Second Edition, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan & Claypool Publishers, 2018).
- Mnih et al.  V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing Atari with Deep Reinforcement Learning, arXiv:1312.5602 (2013).
- Mnih et al.  V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Human-level control through deep reinforcement learning, Nature 518, 529 (2015).
- Silver et al.  D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., Mastering the game of Go with deep neural networks and tree search, Nature 529, 484 (2016).
- Silver et al.  D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al., Mastering the game of Go without human knowledge, Nature 550, 354 (2017).
- Krizhevsky et al.  A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60, 84 (2017).
- Silver et al.  D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, et al., A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science 362, 1140 (2018).
- McCloskey and Cohen  M. McCloskey and N. J. Cohen, Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, in Psychology of Learning and Motivation, Vol. 24, edited by G. H. Bower (Academic Press, 1989) pp. 109–165.
- Robins  A. Robins, Catastrophic forgetting, rehearsal and pseudorehearsal, Connect. Sci. 7, 123 (1995).
- French  R. M. French, Catastrophic forgetting in connectionist networks, Trends Cognit. Sci. 3, 128 (1999).
- Goodfellow et al.  I. J. Goodfellow, M. Mirza, D. Xiao, A. Courville, and Y. Bengio, An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks, arXiv:1312.6211 (2015).
- Kemker et al.  R. Kemker, M. McClure, A. Abitino, T. Hayes, and C. Kanan, Measuring Catastrophic Forgetting in Neural Networks, arXiv:1708.02072 (2017).
- Lloyd et al.  S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum algorithms for supervised and unsupervised machine learning, arXiv:1307.0411 (2013).
- Lloyd and Weedbrook  S. Lloyd and C. Weedbrook, Quantum generative adversarial learning, Phys. Rev. Lett. 121, 040502 (2018).
- Amin et al.  M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, Quantum Boltzmann Machine, Phys. Rev. X 8, 021050 (2018).
- Cong et al.  I. Cong, S. Choi, and M. D. Lukin, Quantum convolutional neural networks, Nat. Phys. 15, 1273 (2019).
- Lamata  L. Lamata, Basic protocols in quantum reinforcement learning with superconducting circuits, Sci. Rep. 7, 1609 (2017).
- Du et al.  Y. Du, M.-H. Hsieh, T. Liu, and D. Tao, Implementable Quantum Classifier for Nonlinear Data, arXiv:1809.06056 (2018).
- Hu et al.  L. Hu, S.-H. Wu, W. Cai, Y. Ma, X. Mu, Y. Xu, H. Wang, Y. Song, D.-L. Deng, C.-L. Zou, et al., Quantum generative adversarial learning in a superconducting quantum circuit, Sci. Adv. 5, eaav2761 (2019).
- Saggio et al.  V. Saggio, B. E. Asenbeck, A. Hamann, T. Strömberg, P. Schiansky, V. Dunjko, N. Friis, N. C. Harris, M. Hochberg, D. Englund, et al., Experimental quantum speed-up in reinforcement learning agents, Nature 591, 229 (2021).
- Cong and Duan  I. Cong and L. Duan, Quantum discriminant analysis for dimensionality reduction and classification, New J. Phys. 18, 073011 (2016).
- Biamonte et al.  J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
- Gao et al.  X. Gao, Z.-Y. Zhang, and L.-M. Duan, A quantum machine learning algorithm based on generative models, Sci. Adv. 4, eaat9004 (2018).
- Sarma et al.  S. D. Sarma, D.-L. Deng, and L.-M. Duan, Machine learning meets quantum physics, Phys. Today 72, 48 (2019).
- Aaronson  S. Aaronson, Read the fine print, Nat. Phys. 11, 291 (2015).
- Carleo et al.  G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, and L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys. 91, 045002 (2019).
- Liu et al.  Y. Liu, S. Arunachalam, and K. Temme, A rigorous and robust quantum speed-up in supervised machine learning, Nat. Phys. 17, 1013 (2021).
- Alexeev et al.  Y. Alexeev, D. Bacon, K. R. Brown, R. Calderbank, L. D. Carr, F. T. Chong, B. DeMarco, D. Englund, E. Farhi, B. Fefferman, et al., Quantum Computer Systems for Scientific Discovery, PRX Quantum 2, 017001 (2021).
- Awschalom et al.  D. Awschalom, K. K. Berggren, H. Bernien, S. Bhave, L. D. Carr, P. Davids, S. E. Economou, D. Englund, A. Faraon, M. Fejer, et al., Development of Quantum Interconnects (QuICs) for Next-Generation Information Technologies, PRX Quantum 2, 017002 (2021).
- Altman et al.  E. Altman, K. R. Brown, G. Carleo, L. D. Carr, E. Demler, C. Chin, B. DeMarco, S. E. Economou, M. A. Eriksson, K.-M. C. Fu, et al., Quantum Simulators: Architectures and Opportunities, PRX Quantum 2, 017003 (2021).
- Dunjko and Briegel  V. Dunjko and H. J. Briegel, Machine learning & artificial intelligence in the quantum domain: A review of recent progress, Rep. Prog. Phys. 81, 074001 (2018).
- Lu et al.  S. Lu, L.-M. Duan, and D.-L. Deng, Quantum adversarial machine learning, Phys. Rev. Research 2, 033212 (2020).
- Liu and Wittek  N. Liu and P. Wittek, Vulnerability of quantum classification to adversarial perturbations, Phys. Rev. A 101, 062331 (2020).
- Gong and Deng  W. Gong and D.-L. Deng, Universal adversarial examples and perturbations for quantum classifiers, arXiv:2102.07788 (2021).
- Schuld and Killoran  M. Schuld and N. Killoran, Quantum machine learning in feature Hilbert spaces, Phys. Rev. Lett. 122, 040504 (2019).
- Grant et al.  E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, Hierarchical quantum classifiers, npj Quantum Inf. 4, 65 (2018).
- Blank et al.  C. Blank, D. K. Park, J.-K. K. Rhee, and F. Petruccione, Quantum classifier with tailored quantum kernel, npj Quantum Inf. 6, 41 (2020).
- Du et al.  Y. Du, M.-H. Hsieh, T. Liu, D. Tao, and N. Liu, Quantum noise protects quantum classifiers against adversaries, arXiv:2003.09416 (2020).
- Russell and Norvig  S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, 4th ed. (Pearson, Hoboken, 2020).
- Carroll and Seppi  J. L. Carroll and K. Seppi, Task similarity measures for transfer in reinforcement learning task libraries, in Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Vol. 2 (IEEE, 2005) pp. 803–808.
- LeCun et al.  Y. LeCun, C. Cortes, and C. J. Burges, THE MNIST DATABASE of handwritten digits, http://yann.lecun.com/exdb/mnist/ (1998).
-  See Supplemental Material at [URL will be inserted by publisher] for details on the setting of continual learning, the elastic weight consolidation method and more numerical results.
- Chang et al.  C.-Z. Chang, J. Zhang, X. Feng, J. Shen, Z. Zhang, M. Guo, K. Li, Y. Ou, P. Wei, L.-L. Wang, et al., Experimental Observation of the Quantum Anomalous Hall Effect in a Magnetic Topological Insulator, Science 340, 167 (2013).
- Yang et al.  G. Yang, F. Pan, and W.-B. Gan, Stably maintained dendritic spines are associated with lifelong memories, Nature 462, 920 (2009).
- Kirkpatrick et al.  J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, et al., Overcoming catastrophic forgetting in neural networks, arXiv:1612.00796 (2017).
- Bottou  L. Bottou, Stochastic Learning, in Advanced Lectures on Machine Learning: ML Summer Schools 2003, Canberra, Australia, February 2 - 14, 2003, Tübingen, Germany, August 4 - 16, 2003, Revised Lectures, edited by O. Bousquet, U. von Luxburg, and G. Rätsch (Springer Berlin Heidelberg, Berlin, Heidelberg, 2004) pp. 146–168.
- Kingma and Ba  D. P. Kingma and J. Ba, Adam: A Method for Stochastic Optimization, arXiv:1412.6980 (2017).
- Goodfellow et al.  I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
- Garipov et al.  T. Garipov, P. Izmailov, D. Podoprikhin, D. P. Vetrov, and A. G. Wilson, Loss surfaces, mode connectivity, and fast ensembling of dnns, in Advances in Neural Information Processing Systems, Vol. 31, edited by S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Curran Associates, Inc., 2018).
- Scott  W. A. Scott, Maximum likelihood estimation using the empirical fisher information matrix, J. Stat. Comput. Simul. 72, 599 (2002).
- Ly et al.  A. Ly, M. Marsman, J. Verhagen, R. Grasman, and E.-J. Wagenmakers, A Tutorial on Fisher Information, arXiv:1705.01064 (2017).
- Kunstner et al.  F. Kunstner, L. Balles, and P. Hennig, Limitations of the Empirical Fisher Approximation for Natural Gradient Descent, arXiv:1905.12558 (2020).
- Frieden  B. R. Frieden, Physics from Fisher Information: A Unification (Cambridge University Press, 1998).
- Petz and Ghinea  D. Petz and C. Ghinea, Introduction to quantum Fisher information, Quantum Probab. Relat. Top., 261 (2011), arXiv:1008.2417.
- Liu et al.  J. Liu, H. Yuan, X.-M. Lu, and X. Wang, Quantum Fisher information matrix and multiparameter estimation, J. Phys. A: Math. Theor. 53, 023001 (2019).
- Smacchia et al.  P. Smacchia, L. Amico, P. Facchi, R. Fazio, G. Florio, S. Pascazio, and V. Vedral, Statistical mechanics of the cluster Ising model, Phys. Rev. A 84, 022304 (2011).
- Rao et al.  D. Rao, F. Visin, A. A. Rusu, Y. W. Teh, R. Pascanu, and R. Hadsell, Continual unsupervised representation learning, arXiv:1910.14481 (2019).
- Luo et al.  X.-Z. Luo, J.-G. Liu, P. Zhang, and L. Wang, Yao.jl: Extensible, Efficient Framework for Quantum Algorithm Design, Quantum 4, 341 (2020).
- Kandala et al.  A. Kandala, A. Mezzacapo, K. Temme, M. Takita, M. Brink, J. M. Chow, and J. M. Gambetta, Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets, Nature 549, 242 (2017).
- LeCun et al.  Y. LeCun, C. Cortes, and C. J. Burges, MNIST handwritten digit database, ATT Labs [Online], http://yann.lecun.com/exdb/mnist (2010).
- Aljundi  R. Aljundi, Continual Learning in Neural Networks, arXiv:1910.02718 (2019).
- van den Bos  A. van den Bos, Precise and accurate estimation, in Parameter Estimation for Scientists and Engineers (John Wiley & Sons, Ltd, 2007) Chap. 5, pp. 99–162.
- Chi et al.  C.-Y. Chi, C.-H. Chen, C.-C. Feng, and C.-Y. Chen, Fundamentals of statistical signal processing, in Blind Equalization and System Identification: Batch Processing Algorithms, Performance and Applications (Springer London, London, 2006) pp. 83–182.
I The Setting
Our numerical simulations are based on the open-source package Yao.jl. To illustrate the catastrophic forgetting phenomenon, we randomly initialize an eight-qubit variational quantum circuit (as shown in Fig. S4) as the ansatz for our quantum classifier. Its rotation angles are variational parameters, updated during the training process and held fixed during inference, while the CNOT gates are necessary to entangle all qubits, since entanglement in quantum circuits is a key resource for potential quantum advantages. This variational architecture is hardware-efficient and is capable of achieving satisfactory performance on our classification tasks (see Fig. S5). In addition, this architecture does not exploit any structural information specific to the datasets.
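As an illustrative aside, the layered structure described above, parameterized single-qubit rotations followed by a chain of CNOT entanglers, can be sketched with a toy state-vector simulator in NumPy. This is a hypothetical minimal sketch, not the Yao.jl code used for the actual simulations; Ry rotations and a three-qubit register are chosen only for concreteness.

```python
import numpy as np

def ry(theta):
    """Single-qubit rotation about the y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit, n):
    """Apply a 2x2 gate to one qubit of an n-qubit state vector."""
    ops = [np.eye(2)] * n
    ops[qubit] = gate
    full = ops[0]
    for op in ops[1:]:
        full = np.kron(full, op)
    return full @ state

def cnot(control, target, n):
    """Full-space CNOT matrix (toy helper, not Yao.jl's API)."""
    dim = 2 ** n
    mat = np.zeros((dim, dim))
    for i in range(dim):
        bits = [(i >> (n - 1 - q)) & 1 for q in range(n)]
        if bits[control] == 1:
            bits[target] ^= 1
        j = sum(b << (n - 1 - q) for q, b in enumerate(bits))
        mat[j, i] = 1.0
    return mat

def hardware_efficient_layer(state, thetas, n):
    """One layer: a parameterized Ry on every qubit, then a CNOT chain."""
    for q in range(n):
        state = apply_single(state, ry(thetas[q]), q, n)
    for q in range(n - 1):
        state = cnot(q, q + 1, n) @ state
    return state

n = 3
rng = np.random.default_rng(0)
state = np.zeros(2 ** n)
state[0] = 1.0
for _ in range(2):  # two layers with random rotation angles
    state = hardware_efficient_layer(state, rng.uniform(0, 2 * np.pi, n), n)
print(np.isclose(np.linalg.norm(state), 1.0))  # unitary layers preserve the norm
```

The rotation angles play the role of the variational parameters; in training they would be updated by gradient descent, whereas the CNOT chain is fixed.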
All data encountered in our numerical simulations consist of 256 features and can therefore be represented by eight qubits using amplitude encoding. The original 28×28-pixel MNIST hand-written digit images are downsampled to 16×16-pixel images (see Fig. S5(a)), so that we can simulate this quantum learning process with moderate classical computational resources. We then randomly choose a permutation of the 256 pixels and apply it to all images, which produces a new dataset of pixel-permuted images (see Fig. S5(b)). For the time-of-flight (TOF) images, we diagonalize the Hamiltonian of the quantum anomalous Hall model with open boundary conditions and calculate the atomic density distributions in different spin bases for the lower band in momentum space to obtain the input data. We vary the strength of the spin-orbit coupling and the strength of the on-site Zeeman interaction in both the topological and the topologically trivial regions to generate several thousand data samples (see Fig. S5(c)). For the symmetry-protected topological (SPT) states, we consider a model of eight spins and exactly diagonalize its Hamiltonian to obtain the ground state, which can be naturally represented by eight qubits (see Fig. S5(d)). Throughout this work, we use amplitude encoding to convert the data of our classification tasks into the input quantum states for the quantum classifier.
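The amplitude encoding used above amounts to normalizing a length-2^n feature vector into an n-qubit state vector. A minimal sketch follows; `amplitude_encode` is an illustrative helper name, and the purely classical normalization shown here glosses over the state-preparation circuit a real device would need.

```python
import numpy as np

def amplitude_encode(features):
    """Map a length-2^n feature vector to an n-qubit state by L2 normalization."""
    features = np.asarray(features, dtype=float)
    norm = np.linalg.norm(features)
    if norm == 0:
        raise ValueError("cannot encode the all-zero vector")
    return features / norm

# A 16x16 image (256 features) fits exactly into 8 qubits: 2**8 == 256.
image = np.random.default_rng(1).random((16, 16))
state = amplitude_encode(image.ravel())
print(state.shape)                     # (256,)
print(np.isclose(state @ state, 1.0))  # True: a valid (normalized) quantum state
```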
The process of sequential learning is divided into different phases, and our quantum classifier is trained on only one specific dataset in each training phase. For example, to illustrate the catastrophic forgetting phenomenon, we first use the randomly initialized quantum classifier to learn to classify the original MNIST images. After a satisfactory performance is obtained, this classifier is trained to distinguish the permuted MNIST images. The results of the different learning phases are shown in the main text, where the forgetting phenomenon is revealed. As for continual learning via the EWC method, the Fisher information matrix for each task is computed after the corresponding training phase and stored for the following training phases.
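The construction of a pixel-permuted task can be sketched as follows; a toy array stands in for the MNIST images, and `make_permuted_task` is a hypothetical helper, the key point being that a single fixed permutation is applied identically to every image in the new dataset.

```python
import numpy as np

def make_permuted_task(images, seed):
    """Apply one fixed random pixel permutation to every image in a dataset."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(images.shape[1])  # one permutation for the whole task
    return images[:, perm]

# toy "dataset": 5 flattened 256-pixel images
images = np.arange(5 * 256).reshape(5, 256).astype(float)
task_b = make_permuted_task(images, seed=42)
# same pixel values as before, rearranged identically in every image
print(sorted(task_b[0]) == sorted(images[0]))  # True
print(task_b.shape == images.shape)            # True
```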
II Elastic Weight Consolidation
From a high-level perspective, overcoming catastrophic forgetting in quantum continual learning requires protecting the knowledge learned from previous tasks while acquiring the new knowledge of subsequent tasks [64, 68]. Our quantum learning model should therefore have enough capacity to store this information. In addition, appropriate management of the model's capacity is required to achieve quantum continual learning in practice. The EWC method offers a practical way to perform this capacity management: it estimates the capacity necessary for previous tasks and frees up the remaining part, which contains little information about the previously trained tasks. To do this, the EWC method evaluates the importance of each variational parameter in the quantum classifier and only allows significant changes to the relatively unimportant ones.
We now give a detailed mathematical derivation of the EWC method. For simplicity, we focus on the two-task scenario here and apply the same philosophy to explicitly write down the result for the multi-task scenario. From the perspective of maximum likelihood estimation, we explore all possible parameters $\theta$ of the quantum classifier to maximize the log-likelihood function $\log p(\theta|\mathcal{D})$, where $\mathcal{D} = \mathcal{D}_A \cup \mathcal{D}_B$ is the total dataset ($\mathcal{D}_A$ and $\mathcal{D}_B$ are the datasets for task $A$ and task $B$, respectively, and we assume that the two tasks are independent of each other). So we have the expression
$$\log p(\theta|\mathcal{D}) = \log p(\mathcal{D}|\theta) + \log p(\theta) - \log p(\mathcal{D}) = \log p(\mathcal{D}_A|\theta) + \log p(\mathcal{D}_B|\theta) + \log p(\theta) - \log p(\mathcal{D}) = \log p(\mathcal{D}_B|\theta) + \log p(\theta|\mathcal{D}_A) + \log p(\mathcal{D}_A) - \log p(\mathcal{D}),$$
where the first and third equalities use Bayes' rule and the second uses the independence condition. As shown in the main text, we Taylor-expand the second term $\log p(\theta|\mathcal{D}_A)$ around the optimal solution $\theta^*_A$ for task $A$:
$$\log p(\theta|\mathcal{D}_A) \approx \log p(\theta^*_A|\mathcal{D}_A) + \frac{1}{2}(\theta - \theta^*_A)^T H_A (\theta - \theta^*_A),$$
where the first-order term vanishes because $\theta^*_A$ is a maximum and $H_A$ denotes the Hessian matrix at $\theta^*_A$.
It is worthwhile to mention that, from the perspective of parameter estimation, this treatment means that we sample the parameters from a multivariate normal distribution:
$$\theta \sim \mathcal{N}\big(\theta^*_A, \Lambda_A^{-1}\big),$$
where the optimal solution $\theta^*_A$ for task $A$ is the mean value of this normal distribution and $\Lambda_A = -H_A$ is the precision matrix ($H_A$ is the Hessian matrix at the optimal solution for task $A$ and is equal to minus the Fisher information matrix $F_A$ under some specific conditions). We can rewrite the quadratic term using the Fisher information matrix and absorb it into the likelihood function of the sequential tasks. This leads to the loss function for the second task in our scenario:
$$L(\theta) = L_B(\theta) + \frac{\lambda}{2}(\theta - \theta^*_A)^T F_A (\theta - \theta^*_A),$$
where $L_B(\theta)$ is the original loss function for task $B$ and $\lambda$ is a hyper-parameter.
To reduce the potential storage and computation overhead for possibly large quantum models, we use the diagonal elements of the Fisher matrix as the weights of the variational parameters and neglect the off-diagonal entries, which will be discussed later. Thus, we can add the regularization term shown in the main text to the loss function of the second task in order to maximize the likelihood function of the joint tasks.
For continual learning of more than two tasks, we can compute the regularization term for each trained task and add them together to overcome catastrophic forgetting:
$$L(\theta) = L_T(\theta) + \sum_{t<T} \frac{\lambda_t}{2} \sum_i F^{(t)}_{ii}\,(\theta_i - \theta^*_{t,i})^2,$$
where $L_T(\theta)$ is the original loss function for the current task $T$ given the current parameters $\theta$, $F^{(t)}_{ii}$ is the $i$-th diagonal element of the Fisher information matrix at the optimal point $\theta^*_t$ for previous task $t$, and $\lambda_t$ is a hyper-parameter controlling the strength of the EWC restriction for task $t$.
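The regularized loss above can be sketched in NumPy as follows; `ewc_loss` and its argument layout are illustrative choices, not the code used for the simulations.

```python
import numpy as np

def ewc_loss(current_loss, theta, tasks):
    """EWC-regularized loss: the current-task loss plus one diagonal-Fisher
    quadratic penalty per previously learned task.

    `tasks` is a list of (lambda_t, fisher_diag_t, theta_star_t) tuples,
    one per previous task.
    """
    penalty = 0.0
    for lam, fisher_diag, theta_star in tasks:
        penalty += 0.5 * lam * np.sum(fisher_diag * (theta - theta_star) ** 2)
    return current_loss + penalty

theta = np.array([1.0, 2.0])
# one previous task: the second parameter is "unimportant" (zero Fisher weight)
tasks = [(1.0, np.array([4.0, 0.0]), np.array([0.0, 2.0]))]
# penalty = 0.5 * 1.0 * (4 * (1-0)**2 + 0 * (2-2)**2) = 2.0
print(ewc_loss(10.0, theta, tasks))  # 12.0
```

Note that a parameter with a zero Fisher weight contributes nothing to the penalty, which is exactly how EWC permits large changes to unimportant parameters.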
III Reasons for neglecting off-diagonal elements
In our numerical simulations, the quantum classifier consists of 248 variational parameters, for which computing and storing the full Fisher matrix is not very hard. Nevertheless, if the number of parameters grows to match the exponentially increasing dimensionality of the Hilbert space, computing and storing the full Fisher matrix can become quite challenging. From a more practical perspective, we therefore use the diagonal elements of the Fisher matrix, which can be estimated from the first-order derivatives as
$$F_{ii} \approx \frac{1}{|\mathcal{D}|} \sum_{x \in \mathcal{D}} \left( \frac{\partial \log p(x|\theta)}{\partial \theta_i} \right)^2.$$
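A minimal sketch of this diagonal (empirical) Fisher estimate, assuming per-sample gradients of the log-likelihood are already available as an array:

```python
import numpy as np

def empirical_fisher_diag(grad_log_likelihoods):
    """Diagonal of the empirical Fisher matrix: the per-sample squared
    gradients of the log-likelihood, averaged over the dataset."""
    grads = np.asarray(grad_log_likelihoods)  # shape (num_samples, num_params)
    return np.mean(grads ** 2, axis=0)

# two samples, two parameters (made-up gradients for illustration)
grads = np.array([[1.0, 2.0],
                  [3.0, 0.0]])
print(empirical_fisher_diag(grads))  # [5. 2.]
```

Only a vector of length equal to the number of parameters needs to be stored per task, instead of a full matrix, which is the storage saving discussed above.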
To compare the learning results obtained with the diagonal elements of the Fisher matrix and with the full Fisher matrix, we train our quantum classifier on the original MNIST images and the permuted MNIST images sequentially. In this simulation, the diagonal elements of the Fisher matrix and the full Fisher matrix are adopted, respectively, as the metric quantifying distances in the parameter space. The results in Fig. S6 show that the performances of both metric choices are at the same level. We remark that, to account for the summation over the off-diagonal elements, we manually lower the strength hyper-parameter in the simulation using the full Fisher matrix. The similar performance of the two learning scenarios indicates that neglecting the off-diagonal elements of the Fisher matrix has no significant influence on the results of quantum continual learning. Thus, we use the diagonal elements as our distance metric in all other numerical simulations.
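The difference between the two metric choices amounts to which quadratic penalty is added to the loss. A toy comparison, using a made-up 2×2 Fisher matrix purely for illustration, can be sketched as:

```python
import numpy as np

def penalty_full(theta, theta_star, fisher, lam):
    """Quadratic EWC penalty using the full Fisher matrix."""
    d = theta - theta_star
    return 0.5 * lam * d @ fisher @ d

def penalty_diag(theta, theta_star, fisher, lam):
    """The same penalty keeping only the diagonal of the Fisher matrix."""
    d = theta - theta_star
    return 0.5 * lam * np.sum(np.diag(fisher) * d ** 2)

fisher = np.array([[2.0, 0.5],
                   [0.5, 1.0]])
theta, theta_star = np.array([1.0, 1.0]), np.array([0.0, 0.0])
print(penalty_full(theta, theta_star, fisher, lam=1.0))  # 2.0
print(penalty_diag(theta, theta_star, fisher, lam=1.0))  # 1.5
```

The full-Fisher penalty picks up extra contributions from the off-diagonal terms (here 0.5 of the 2.0), which is why the strength hyper-parameter is lowered in the full-Fisher simulation to keep the overall regularization comparable.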
IV More numerical results
In this section, we present more results on quantum continual learning. The performance of learning each single task is shown in Fig. S5, where one sample image from each dataset is also plotted. These results indicate that our quantum classifier is capable of achieving satisfactory performance on the chosen classification tasks.
In the main text, we show that quantum continual learning in the two-task case can be accomplished whether the two problems are similar or dissimilar to each other. As a complementary example, we also simulate quantum continual learning of two related problems. We use MNIST images of different digits to construct several classification tasks and find that continual learning of such tasks can also be accomplished (see Fig. S7).
We group MNIST hand-written images of different digits into several binary classification tasks and use them to train our quantum classifier. For the multi-task case, we choose three pairs of digits and use our quantum classifier to classify their hand-written images. We first train the quantum classifier on images of the digits 2 and 8, which ends with a high classification accuracy. Then, we train this quantum classifier to identify the digits 1 and 4. With the aid of the EWC method, our quantum classifier behaves reasonably well on both tasks after the second training phase. Subsequently, we train the circuit to classify the digits 0 and 9, and find that our quantum classifier performs relatively well on all three classification tasks after these training processes.
We also notice that in the continual learning scenario, the performance of our quantum classifier on each task is slightly reduced compared with the single-task learning scenario. Intuitively, this is caused by a small but inevitable deviation from the optimal solution of each single task to the optimal solution of the joint task.