Information Scrambling in Quantum Neural Networks

09/26/2019 ∙ by Huitao Shen, et al. ∙ Tsinghua University

Quantum neural networks are among the promising applications of near-term noisy intermediate-scale quantum computers. A quantum neural network distills the information from the input wavefunction into the output qubits. In this Letter, we show that this process can also be viewed from the opposite direction: the quantum information in the output qubits is scrambled into the input. This observation motivates us to use the tripartite information, a quantity recently developed to characterize information scrambling, to diagnose the training dynamics of quantum neural networks. We empirically find a strong correlation between the dynamical behavior of the tripartite information and the loss function during training, from which we identify two stages in the training of randomly initialized networks. In the early stage, the network performance improves rapidly and the tripartite information increases linearly with a universal slope, meaning that the neural network becomes less scrambled than a random unitary. In the later stage, the network performance improves slowly while the tripartite information decreases. We present evidence that the network constructs local correlations in the early stage and learns large-scale structures in the later stage. We believe this two-stage training dynamics is universal and applicable to a wide range of problems. Our work builds a bridge between the two research subjects of quantum neural networks and information scrambling, opening up a new perspective for understanding quantum neural networks.
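
For readers unfamiliar with the diagnostic, the sketch below states the standard definitions of mutual and tripartite information from the scrambling literature (e.g., Hosur et al.) in generic partition labels A, B, C; the precise choice of input/output subsystems follows the paper's setup rather than this sketch.

    % A minimal sketch, assuming the conventions of Hosur et al.:
    \begin{align}
      I(A{:}B)       &= S_A + S_B - S_{AB}, & &\text{(mutual information)} \\
      I_3(A{:}B{:}C) &= I(A{:}B) + I(A{:}C) - I(A{:}BC). & &\text{(tripartite information)}
    \end{align}
    % A more negative I_3 indicates stronger scrambling, so an increase of I_3 toward zero
    % corresponds to the network becoming less scrambled than a random unitary.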

