- Jurafsky and Martin (2000) Daniel Jurafsky and James H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 1st ed. (Prentice Hall PTR, USA, 2000).
- Blackburn and Bos (2005) Patrick Blackburn and Johan Bos, Representation and Inference for Natural Language: A First Course in Computational Semantics (Center for the Study of Language and Information, Stanford, CA, 2005).
- Brown et al. (2020) Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei, “Language models are few-shot learners,” (2020), arXiv:2005.14165 [cs.CL] .
- Turing (1950) A. M. Turing, “I.—Computing machinery and intelligence,” Mind LIX, 433–460 (1950).
- Searls (2002) David B. Searls, “The language of genes,” Nature 420, 211–217 (2002).
- Zeng et al. (2015) Zhiqiang Zeng, Hua Shi, Yun Wu, and Zhiling Hong, “Survey of natural language processing techniques in bioinformatics,” Computational and Mathematical Methods in Medicine 2015, 674296 (2015).
- Buhrmester et al. (2019) Vanessa Buhrmester, David Münch, and Michael Arens, “Analysis of explainers of black box deep neural networks for computer vision: A survey,” (2019), arXiv:1911.12116 [cs.AI] .
- Lambek (1958) Joachim Lambek, “The mathematics of sentence structure,” American Mathematical Monthly 65, 154–170 (1958).
- Montague (2008) Richard Montague, “Universal grammar,” Theoria 36, 373–398 (2008).
- Chomsky (1957) Noam Chomsky, Syntactic Structures (Mouton, 1957).
- Coecke et al. (2010) Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark, “Mathematical foundations for a compositional distributional model of meaning,” (2010), arXiv:1003.4394 [cs.CL] .
- Grefenstette and Sadrzadeh (2011) E. Grefenstette and M. Sadrzadeh, “Experimental support for a categorical compositional distributional model of meaning,” in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (2011) pp. 1394–1404, arXiv:1106.4058.
- Kartsaklis and Sadrzadeh (2013) D. Kartsaklis and M. Sadrzadeh, “Prior disambiguation of word tensors for constructing sentence vectors,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (ACL, 2013) pp. 1590–1601.
- Sadrzadeh et al. (2013) M. Sadrzadeh, S. Clark, and B. Coecke, “The Frobenius anatomy of word meanings I: subject and object relative pronouns,” Journal of Logic and Computation 23, 1293–1317 (2013).
- Sadrzadeh et al. (2014) Mehrnoosh Sadrzadeh, Stephen Clark, and Bob Coecke, “The Frobenius anatomy of word meanings II: possessive relative pronouns,” Journal of Logic and Computation 26, 785–815 (2014).
- Lewis (2020) Martha Lewis, “Towards logical negation for compositional distributional semantics,” (2020), arXiv:2005.04929 [cs.CL] .
- Pestun and Vlassopoulos (2017) Vasily Pestun and Yiannis Vlassopoulos, “Tensor network language model,” (2017), arXiv:1710.10248 [cs.CL] .
- Gallego and Orus (2019) Angel J. Gallego and Roman Orus, “Language design as information renormalization,” (2019), arXiv:1708.01525 [cs.CL] .
- Bradley et al. (2019) Tai-Danae Bradley, E. Miles Stoudenmire, and John Terilla, “Modeling sequences with quantum states: A look under the hood,” (2019), arXiv:1910.07425 [quant-ph] .
- Efthymiou et al. (2019) Stavros Efthymiou, Jack Hidary, and Stefan Leichenauer, “Tensornetwork for machine learning,” (2019), arXiv:1906.06329 [cs.LG] .
- Socher et al. (2013) Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts, “Recursive deep models for semantic compositionality over a sentiment treebank,” in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics, Seattle, Washington, USA, 2013) pp. 1631–1642.
- Arute et al. (2019) Frank Arute, Kunal Arya, Ryan Babbush, Dave Bacon, Joseph C. Bardin, Rami Barends, Rupak Biswas, Sergio Boixo, Fernando G. S. L. Brandao, David A. Buell, Brian Burkett, Yu Chen, Zijun Chen, Ben Chiaro, Roberto Collins, William Courtney, Andrew Dunsworth, Edward Farhi, Brooks Foxen, Austin Fowler, Craig Gidney, Marissa Giustina, Rob Graff, Keith Guerin, Steve Habegger, Matthew P. Harrigan, Michael J. Hartmann, Alan Ho, Markus Hoffmann, Trent Huang, Travis S. Humble, Sergei V. Isakov, Evan Jeffrey, Zhang Jiang, Dvir Kafri, Kostyantyn Kechedzhi, Julian Kelly, Paul V. Klimov, Sergey Knysh, Alexander Korotkov, Fedor Kostritsa, David Landhuis, Mike Lindmark, Erik Lucero, Dmitry Lyakh, Salvatore Mandrà, Jarrod R. McClean, Matthew McEwen, Anthony Megrant, Xiao Mi, Kristel Michielsen, Masoud Mohseni, Josh Mutus, Ofer Naaman, Matthew Neeley, Charles Neill, Murphy Yuezhen Niu, Eric Ostby, Andre Petukhov, John C. Platt, Chris Quintana, Eleanor G. Rieffel, Pedram Roushan, Nicholas C. Rubin, Daniel Sank, Kevin J. Satzinger, Vadim Smelyanskiy, Kevin J. Sung, Matthew D. Trevithick, Amit Vainsencher, Benjamin Villalonga, Theodore White, Z. Jamie Yao, Ping Yeh, Adam Zalcman, Hartmut Neven, and John M. Martinis, “Quantum supremacy using a programmable superconducting processor,” Nature 574, 505–510 (2019).
- Harrow et al. (2009) Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd, “Quantum algorithm for linear systems of equations,” Physical Review Letters 103 (2009), 10.1103/physrevlett.103.150502.
- Beer et al. (2020) Kerstin Beer, Dmytro Bondarenko, Terry Farrelly, Tobias J. Osborne, Robert Salzmann, Daniel Scheiermann, and Ramona Wolf, “Training deep quantum neural networks,” Nature Communications 11 (2020), 10.1038/s41467-020-14454-2.
- Kerenidis et al. (2019) Iordanis Kerenidis, Jonas Landman, Alessandro Luongo, and Anupam Prakash, “q-means: A quantum algorithm for unsupervised machine learning,” in Advances in Neural Information Processing Systems, Vol. 32, edited by H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett (Curran Associates, Inc., 2019) pp. 4134–4144.
- Dunjko et al. (2016) Vedran Dunjko, Jacob M. Taylor, and Hans J. Briegel, “Quantum-enhanced machine learning,” Physical Review Letters 117 (2016), 10.1103/physrevlett.117.130501.
- Chia et al. (2020) Nai-Hui Chia, András Gilyén, Tongyang Li, Han-Hsuan Lin, Ewin Tang, and Chunhao Wang, “Sampling-based sublinear low-rank matrix arithmetic framework for dequantizing quantum machine learning,” Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (2020), 10.1145/3357713.3384314.
- Havlíček et al. (2019) Vojtěch Havlíček, Antonio D. Córcoles, Kristan Temme, Aram W. Harrow, Abhinav Kandala, Jerry M. Chow, and Jay M. Gambetta, “Supervised learning with quantum-enhanced feature spaces,” Nature 567, 209–212 (2019).
- Li et al. (2015) Zhaokai Li, Xiaomei Liu, Nanyang Xu, and Jiangfeng Du, “Experimental realization of a quantum support vector machine,” Physical Review Letters 114 (2015), 10.1103/physrevlett.114.140504.
- Zeng and Coecke (2016) William Zeng and Bob Coecke, “Quantum algorithms for compositional natural language processing,” Electronic Proceedings in Theoretical Computer Science 221, 67–75 (2016).
- O’Riordan et al. (2020) Lee James O’Riordan, Myles Doyle, Fabio Baruffa, and Venkatesh Kannan, “A hybrid classical-quantum workflow for natural language processing,” Machine Learning: Science and Technology (2020), 10.1088/2632-2153/abbd2e.
- Wiebe et al. (2019) Nathan Wiebe, Alex Bocharov, Paul Smolensky, Matthias Troyer, and Krysta M Svore, “Quantum language processing,” (2019), arXiv:1902.05162 [quant-ph] .
- Bausch et al. (2020) Johannes Bausch, Sathyawageeswar Subramanian, and Stephen Piddock, “A quantum search decoder for natural language processing,” (2020), arXiv:1909.05023 [quant-ph] .
- Chen (2002) Joseph CH Chen, “Quantum computation and natural language processing,” (2002).
- Abramsky and Coecke (2004) S. Abramsky and B. Coecke, “A categorical semantics of quantum protocols,” in Proceedings of the 19th Annual IEEE Symposium on Logic in Computer Science, 2004. (2004) pp. 415–425.
- Coecke and Kissinger (2017) B. Coecke and A. Kissinger, Picturing Quantum Processes. A First Course in Quantum Theory and Diagrammatic Reasoning (Cambridge University Press, 2017).
- Meichanetzidis et al. (2020) Konstantinos Meichanetzidis, Stefano Gogioso, Giovanni De Felice, Nicolò Chiappori, Alexis Toumi, and Bob Coecke, “Quantum natural language processing on near-term quantum computers,” (2020), arXiv:2005.04147 [cs.CL] .
- Schuld et al. (2020) Maria Schuld, Alex Bocharov, Krysta M. Svore, and Nathan Wiebe, “Circuit-centric quantum classifiers,” Physical Review A 101 (2020), 10.1103/physreva.101.032308.
- Benedetti et al. (2019) Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini, “Parameterized quantum circuits as machine learning models,” Quantum Science and Technology 4, 043001 (2019).
- Lambek (2008) Joachim Lambek, From Word to Sentence: A Computational Algebraic Approach to Grammar (Polimetrica, Milan, 2008).
- Preller (2007) Anne Preller, “Linear processing with pregroups,” Studia Logica: An International Journal for Symbolic Logic 87, 171–197 (2007).
- Baez and Stay (2009) John C. Baez and Mike Stay, “Physics, topology, logic and computation: A rosetta stone,” (2009), arXiv:0903.0340 [quant-ph] .
- Selinger (2010) P. Selinger, “A survey of graphical languages for monoidal categories,” Lecture Notes in Physics 813, 289–355 (2010).
- Schuld and Killoran (2019) Maria Schuld and Nathan Killoran, “Quantum machine learning in feature hilbert spaces,” Physical Review Letters 122 (2019), 10.1103/physrevlett.122.040504.
- Lloyd et al. (2020) Seth Lloyd, Maria Schuld, Aroosa Ijaz, Josh Izaac, and Nathan Killoran, “Quantum embeddings for machine learning,” (2020), arXiv:2001.03622 [quant-ph] .
- Mikolov et al. (2013) Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, “Efficient estimation of word representations in vector space,” (2013), arXiv:1301.3781 [cs.CL] .
- Spall (1998) J. C. Spall, “Implementation of the simultaneous perturbation algorithm for stochastic optimization,” IEEE Transactions on Aerospace and Electronic Systems 34, 817–823 (1998).
- de Felice et al. (2020) Giovanni de Felice, Konstantinos Meichanetzidis, and Alexis Toumi, “Functorial question answering,” Electronic Proceedings in Theoretical Computer Science 323, 84–94 (2020).
- Chen et al. (2020) Yiwei Chen, Yu Pan, and Daoyi Dong, “Quantum language model with entanglement embedding for question answering,” (2020), arXiv:2008.09943 [cs.CL] .
- Zhao et al. (2020) Qin Zhao, Chenguang Hou, Changjian Liu, Peng Zhang, and Ruifeng Xu, “A quantum expectation value based language model with application to question answering,” Entropy 22, 533 (2020).
- Sivarajah et al. (2020) Seyon Sivarajah, Silas Dilkes, Alexander Cowtan, Will Simmons, Alec Edgington, and Ross Duncan, “t|ket⟩: A retargetable compiler for NISQ devices,” Quantum Science and Technology (2020), 10.1088/2058-9565/ab8e92.
- Aharonov et al. (2008) Dorit Aharonov, Vaughan Jones, and Zeph Landau, “A polynomial quantum algorithm for approximating the Jones polynomial,” Algorithmica 55, 395–421 (2008).
- Mitarai and Fujii (2019) Kosuke Mitarai and Keisuke Fujii, “Methodology for replacing indirect measurements with direct measurements,” Physical Review Research 1 (2019), 10.1103/physrevresearch.1.013006.
- Benedetti et al. (2020) Marcello Benedetti, Mattia Fiorentini, and Michael Lubasch, “Hardware-efficient variational quantum algorithms for time evolution,” (2020), arXiv:2009.12361 [quant-ph] .
- Piedeleu et al. (2015) Robin Piedeleu, Dimitri Kartsaklis, Bob Coecke, and Mehrnoosh Sadrzadeh, “Open system categorical quantum semantics in natural language processing,” (2015), arXiv:1502.00831 [cs.CL] .
- Bankova et al. (2016) Desislava Bankova, Bob Coecke, Martha Lewis, and Daniel Marsden, “Graded entailment for compositional distributional semantics,” (2016), arXiv:1601.04908 [cs.CL] .
- Coecke (2020) Bob Coecke, “The mathematics of text structure,” (2020), arXiv:1904.03478 [cs.CL] .
- Coecke et al. (2020) Bob Coecke, Giovanni De Felice, Konstantinos Meichanetzidis, and Alexis Toumi, “Foundations for near-term quantum natural language processing (unpublished),” (2020).
- Buszkowski and Moroz Wojciech Buszkowski and Katarzyna Moroz, “Pregroup grammars and context-free grammars.”
- Pentus (1993) M. Pentus, “Lambek grammars are context free,” in Proceedings Eighth Annual IEEE Symposium on Logic in Computer Science (1993) pp. 429–433.
- noisyopt: https://github.com/andim/noisyopt.
- Bradbury et al. (2018) James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, and Skye Wanderman-Milne, “JAX: composable transformations of Python+NumPy programs,” (2018).
- Olson et al. (2012) Brian Olson, Irina Hashmi, Kevin Molloy, and Amarda Shehu, “Basin hopping as a general and versatile optimization framework for the characterization of biological macromolecules,” Advances in Artificial Intelligence 2012, 1–19 (2012).
- Gao and Han (2010) Fuchang Gao and Lixing Han, “Implementing the Nelder-Mead simplex algorithm with adaptive parameters,” Computational Optimization and Applications 51, 259–277 (2010).
- SciPy: https://pypi.org/project/scipy.
- Cowtan et al. (2019) Alexander Cowtan, Silas Dilkes, Ross Duncan, Alexandre Krajenbrink, Will Simmons, and Seyon Sivarajah, “On the Qubit Routing Problem,” in 14th Conference on the Theory of Quantum Computation, Communication and Cryptography (TQC 2019), Leibniz International Proceedings in Informatics (LIPIcs), Vol. 135, edited by Wim van Dam and Laura Mancinska (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 2019) pp. 5:1–5:32.
- pytket: https://github.com/CQCL/pytket.
Appendix A Pregroup Grammar
Pregroup grammars were introduced by Lambek (2008) as an algebraic model for grammar.
A pregroup grammar is freely generated by the basic types in a finite set $B$. Basic types $b \in B$ are decorated by an integer $n \in \mathbb{Z}$, which signifies their adjoint order. Negative integers $-n$, with $n \in \mathbb{N}$, are called left adjoints of order $n$, and positive integers are called right adjoints. We shall refer to a basic type raised to some adjoint order (including the zeroth order) simply as a ‘type’. The zeroth order signifies no adjoint action on the basic type, and so we often omit it in notation, $b^0 = b$.
The pregroup algebra is such that the two kinds of adjoint (left and right) act as left and right inverses under multiplication of basic types,
$$b^n b^{n+1} \to \epsilon \to b^{n+1} b^n,$$
where $\epsilon$ is the trivial or unit type. The left-hand side of this reduction is called a contraction and the right-hand side an expansion. Pregroup grammar also accommodates induced steps $a \to b$ for $a, b \in B$. The symbol ‘$\to$’ is to be read as ‘type-reduction’, and the pregroup grammar sets the rules for which reductions are valid.
Now, to go from word to sentence, we consider a finite set of words called the vocabulary $V$. We call the dictionary (or lexicon) the finite set of entries $D \subseteq V \times (B \times \mathbb{Z})^*$. The star symbol denotes the set of finite strings that can be generated by the elements of the set it acts on. Each dictionary entry assigns a product (or string) of types to a word $w \in V$, i.e. $w \mapsto t_w = \prod_i b_i^{n_i}$.
Finally, a pregroup grammar $G$ generates a language $L_G \subseteq V^*$ as follows. A sentence is a sequence (or list) of words $\sigma = (w_1, w_2, \ldots, w_k) \in V^*$. The type of a sentence is the product of the types of its words, $t_\sigma = \prod_i t_{w_i}$. A sentence is grammatical, i.e. it belongs to the language generated by the grammar $G$, if and only if there exists a sequence of reductions so that the type of the sentence reduces to the special sentence-type $s \in B$, i.e. $t_\sigma \to s$. Note that it is in fact possible to type-reduce grammatical sentences using only contractions.
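As an illustration of contraction-only reduction, consider the following minimal Python sketch (ours, not the parser used for the experiments): types are encoded as (basic type, adjoint order) pairs, and a backtracking search decides whether a string of types contracts to the sentence type.

```python
def reduces_to_sentence(types, sentence_type=("s", 0)):
    """True if some sequence of contractions b^n b^(n+1) -> unit
    reduces `types` to just the sentence type."""
    if list(types) == [sentence_type]:
        return True
    for i in range(len(types) - 1):
        (b1, n1), (b2, n2) = types[i], types[i + 1]
        if b1 == b2 and n2 == n1 + 1:  # adjacent pair forms a contraction
            if reduces_to_sentence(types[:i] + types[i + 2:], sentence_type):
                return True
    return False

# 'Dude loves Walter': n . (n^1 s n^-1) . n  ->  s
print(reduces_to_sentence([("n", 0), ("n", 1), ("s", 0), ("n", -1), ("n", 0)]))  # True
```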
Appendix B String Diagrams
String diagrams describing process theories are generated by states, effects, and processes. In Fig.6 we comprehensively show these generators along with constraining equations on them. String diagrams for process theories formally describe process networks where only connectivity matters, i.e. which outputs are connected to which inputs. In other words, the length of the wires carries no meaning and the wires are freely deformable as long as the topology of the network is respected.
It is beyond the scope of this work to provide a comprehensive exposition of diagrammatic languages; we provide only the elements necessary for the implementation of our QNLP experiments.
Appendix C Random Sentence Generation with CFG
A context-free grammar generates a language from a set of production (or rewrite) rules applied on symbols. Symbols belong to a finite set $\Sigma$, and there is a special symbol $S \in \Sigma$ called initial. Production rules belong to a finite set $P$ and are of the form $T \to T_1 \cdots T_k$, where $T, T_i \in \Sigma$. The application of a production rule results in substituting a symbol with a product (or string) of symbols. Randomly generating a sentence amounts to starting from $S$ and randomly applying production rules uniformly sampled from the set $P$. The production ends when all symbols produced are terminal, which are none other than words in the finite vocabulary $V$.
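For concreteness, here is a minimal Python sketch of this sampling procedure; the production rules and vocabulary below are illustrative stand-ins, not the exact ones used to generate our corpora.

```python
import random

# Illustrative CFG: non-terminal symbols expand via production rules;
# part-of-speech symbols expand to words of the vocabulary.
RULES = {
    "S": [["N", "IV"], ["N", "TV", "N"]],   # sentence -> subject + verb (+ object)
    "N": [["N", "RPRON", "IV"], ["NOUN"]],  # a noun may carry a relative clause
}
WORDS = {
    "NOUN": ["Dude", "Walter"],
    "TV": ["loves", "annoys"],
    "IV": ["bowls", "abides"],
    "RPRON": ["who"],
}

def generate(symbol="S"):
    """Expand `symbol` with uniformly sampled productions until only words remain."""
    if symbol in WORDS:
        return [random.choice(WORDS[symbol])]
    production = random.choice(RULES[symbol])
    return [word for sym in production for word in generate(sym)]

print(" ".join(generate()))  # e.g. 'Dude who bowls loves Walter'
```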
From a process-theoretic point of view, we represent symbols as types carried by wires. Production rules are represented as boxes with input and output wires labelled by the appropriate types. The process network (or string diagram) describing the production of a sentence ends with a production rule whose output is the $S$-type. We then randomly pick boxes and compose them backwards, always respecting type-matching when inputs of production rules are fed into outputs of other production rules. The generation terminates when production rules with no inputs (i.e. states) are applied; these correspond to the words in the finite vocabulary.
In Fig.7 (on the left-hand side of the arrows) we show the string-diagram generators we use to randomly produce sentences from a vocabulary of words composed of nouns, transitive verbs, intransitive verbs, and relative pronouns. Each of these parts of speech corresponds to a symbol of the CFG, and the vocabulary is the union of the words of each type.
Having randomly generated a sentence from the CFG, its string diagram can be translated into a pregroup sentence diagram. To do so we use the translation rules shown in Fig.7. Note that a cup labelled by the basic type $b$ is used to represent a contraction $b^n b^{n+1} \to \epsilon$. Pregroup grammars are weakly equivalent to context-free grammars, in the sense that they generate the same languages Buszkowski and Moroz; Pentus (1993).
Appendix D Corpora
Here we present the sentences and their labels used in the experiments presented in the main text.
The types assigned to the words of these corpora are as follows. Nouns get typed as $n$, transitive verbs are given type $n^1 s n^{-1}$, intransitive verbs are typed $n^1 s$, and the relative pronoun is typed $n^1 n s^{-1} n$.
Corpus of 30 labeled sentences over the vocabulary of nouns {’Dude’, ’Walter’}, transitive verbs {’loves’, ’annoys’}, intransitive verbs {’bowls’, ’abides’}, and relative pronoun {’who’}:
[(’Dude who loves Walter bowls’, 1),
(’Dude bowls’, 1),
(’Dude annoys Walter’, 0),
(’Walter who abides bowls’, 0),
(’Walter loves Walter’, 1),
(’Walter annoys Dude’, 1),
(’Walter bowls’, 1),
(’Walter abides’, 0),
(’Dude loves Walter’, 1),
(’Dude who bowls abides’, 1),
(’Walter who bowls annoys Dude’, 1),
(’Dude who bowls bowls’, 1),
(’Dude who abides abides’, 1),
(’Dude annoys Dude who bowls’, 0),
(’Walter annoys Walter’, 0),
(’Dude who abides bowls’, 1),
(’Walter who abides loves Walter’, 0),
(’Walter who bowls bowls’, 1),
(’Walter loves Walter who abides’, 0),
(’Walter annoys Walter who bowls’, 0),
(’Dude abides’, 1),
(’Dude loves Walter who bowls’, 1),
(’Walter who loves Dude bowls’, 1),
(’Dude loves Dude who abides’, 1),
(’Walter who abides loves Dude’, 0),
(’Dude annoys Dude’, 0),
(’Walter who annoys Dude bowls’, 1),
(’Walter who annoys Dude abides’, 0),
(’Walter loves Dude’, 1),
(’Dude who bowls loves Walter’, 1)]
Corpus of 6 labeled sentences over the vocabulary of nouns {’Romeo’, ’Juliet’}, transitive verb {’loves’}, intransitive verb {’dies’}, and relative pronoun {’who’}:
[(’Romeo dies’, 1.0),
(’Romeo loves Juliet’, 0.0),
(’Juliet who dies dies’, 1.0),
(’Romeo loves Romeo’, 0.0),
(’Juliet loves Romeo’, 0.0),
(’Juliet dies’, 1.0)]
Corpus of 16 labeled sentences over the vocabulary of nouns {’Romeo’, ’Juliet’}, transitive verbs {’loves’, ’kills’}, intransitive verb {’dies’}, and relative pronoun {’who’}:
[(’Juliet kills Romeo who dies’, 0),
(’Juliet dies’, 1),
(’Romeo who loves Juliet dies’, 1),
(’Romeo dies’, 1),
(’Juliet who dies dies’, 1),
(’Romeo loves Juliet’, 1),
(’Juliet who dies loves Juliet’, 0),
(’Romeo kills Juliet who dies’, 0),
(’Romeo who kills Romeo dies’, 1),
(’Romeo who dies dies’, 1),
(’Romeo who loves Romeo dies’, 0),
(’Romeo kills Juliet’, 0),
(’Romeo who dies kills Romeo’, 1),
(’Juliet who dies kills Romeo’, 0),
(’Romeo loves Romeo’, 0),
(’Romeo who dies kills Juliet’, 0)]
Appendix E Sentence to Circuit mapping
Quantum theory has formally been shown to be a process theory, and so it enjoys a diagrammatic language in terms of string diagrams. Specifically, in the context of the quantum circuits we construct in our experiments, we use pure quantum theory, in which processes are unitary operations, or quantum gates in the context of circuits. The monoidal structure allowing for parallel processes is instantiated by the tensor product, and sequential composition is instantiated by the composition of quantum gates.
In Fig.8 we show the generic construction of the mapping from sentence diagrams to parameterised quantum circuits, for the hyperparameters (the number of qubits assigned to each basic pregroup type and the word-circuit depth) and the parameterised word-circuits we use in this work.
A wire carrying basic pregroup type $b$ is given $q_b$ qubits. A word-state with only one output wire becomes a one-qubit state prepared from $|0\rangle$. For the preparation of such unary states we choose the sequence of gates defining an Euler decomposition of one-qubit unitaries, $R_z(\theta_1) R_x(\theta_2) R_z(\theta_3)$. Word-states with more than one output wire become multiqubit states on $k$ qubits, prepared by an IQP-style circuit acting on $|0\rangle^{\otimes k}$. Such a word-circuit is composed of $d$-many layers, each consisting of a layer of Hadamard gates followed by a layer in which every neighbouring pair of qubit wires is connected by a controlled-$R_z$ gate, $CR_z(\theta)$. Since all $CR_z$ gates commute with each other, it is justified to consider this as a single layer, at least abstractly. The Kronecker tensor with $k$-many output wires of type $b$ is mapped to a GHZ state on $k q_b$ qubits; specifically, GHZ here denotes the circuit that prepares the state $2^{-q_b/2} \sum_{i=0}^{2^{q_b}-1} |i\rangle^{\otimes k}$, where $|i\rangle$ is the computational-basis state labelled by the binary expression of the integer $i$. The cup of pregroup type $b$ is mapped to $q_b$-many nested Bell effects, each of which is implemented as a CNOT followed by a Hadamard gate on the control qubit and postselection on $\langle 00|$.
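To make the mapping concrete, here is a sketch of these circuit primitives in pytket (gate angles in half-turns; the function names and the packaging of parameters are ours, not the code used for the experiments):

```python
from pytket.circuit import Circuit

def unary_word_state(t1, t2, t3):
    """One-qubit word-state: Euler decomposition Rz-Rx-Rz applied to |0>."""
    c = Circuit(1)
    c.Rz(t1, 0)
    c.Rx(t2, 0)
    c.Rz(t3, 0)
    return c

def iqp_word_state(n_qubits, layers):
    """Multiqubit word-state: per layer, Hadamards on all qubits,
    then CRz gates on every neighbouring pair (one angle per pair)."""
    c = Circuit(n_qubits)
    for angles in layers:  # `layers` is a list of lists of n_qubits - 1 angles
        for q in range(n_qubits):
            c.H(q)
        for q, angle in enumerate(angles):
            c.CRz(angle, q, q + 1)
    return c

def ghz_state(n_qubits):
    """GHZ preparation: Hadamard on the first qubit, then a CNOT ladder."""
    c = Circuit(n_qubits)
    c.H(0)
    for q in range(n_qubits - 1):
        c.CX(q, q + 1)
    return c

def bell_effect(c, q1, q2):
    """Cup on one qubit pair: CNOT, then Hadamard on the control;
    both qubits are subsequently postselected on <0|."""
    c.CX(q1, q2)
    c.H(q1)
    return c
```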
Appendix F Optimisation Method
The gradient-free optimisation method we use, Simultaneous Perturbation Stochastic Approximation (SPSA), works as follows. Start from a random point in parameter space. At every iteration, pick a random direction and estimate the derivative along it by a finite difference whose step size decays with the iteration number. This requires only two cost-function evaluations per iteration, which provides a significant speed-up over estimating the full gradient coordinate by coordinate. Then take a step, whose size also decays with the iteration number, along that direction if the derivative is negative, or opposite to it if the derivative is positive. In our experiments we use minimizeSPSA from the Python package noisyopt, fixing the calibration hyperparameters $a$ and $c$, with a separate choice of values for the experiment run on IBMQ hardware.
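A minimal usage sketch of minimizeSPSA (the cost function and hyperparameter values below are placeholders, not those of our runs):

```python
import numpy as np
from noisyopt import minimizeSPSA

def cost(theta):
    """Stand-in for L(theta): noisy, as if estimated from measurement shots."""
    return float(np.sum(theta ** 2) + 0.01 * np.random.randn())

theta0 = np.random.uniform(0, 2 * np.pi, size=8)
# a and c are placeholder calibration values for the step-size schedules.
res = minimizeSPSA(cost, x0=theta0, a=0.1, c=0.1, niter=200, paired=False)
print(res.x, res.fun)
```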
Note that for classical simulations, we use just-in-time compilation of the cost function by invoking jit from jax Bradbury et al. (2018). In addition, the choice of the squares-of-differences cost we defined in Eq.2 is not unique. One can just as well use the binary cross entropy
$$L_{\mathrm{BCE}}(\theta) = -\frac{1}{N} \sum_{\sigma} \left[ l_\sigma \log \bar{l}_\sigma(\theta) + (1 - l_\sigma) \log\!\left(1 - \bar{l}_\sigma(\theta)\right) \right],$$
where the sum ranges over the $N$ training sentences $\sigma$ with labels $l_\sigma$ and predicted labels $\bar{l}_\sigma(\theta)$; this cost function can be minimised as well, as shown in Fig.9.
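As an illustration, a jit-compiled binary cross-entropy cost might look as follows (a sketch; `predict` is a hypothetical stand-in for the classical simulation returning the predicted labels):

```python
import jax
import jax.numpy as jnp

def predict(theta):
    """Hypothetical stand-in for the circuit simulation returning labels in (0, 1)."""
    return jax.nn.sigmoid(theta)

@jax.jit
def bce_cost(theta, labels):
    eps = 1e-7  # clip predictions away from 0 and 1 to avoid log(0)
    preds = jnp.clip(predict(theta), eps, 1 - eps)
    return -jnp.mean(labels * jnp.log(preds) + (1 - labels) * jnp.log(1 - preds))

theta = jnp.zeros(6)
labels = jnp.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
print(bce_cost(theta, labels))
```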
In our classical simulation of the experiment we also used basinhopping Olson et al. (2012) in combination with Nelder-Mead Gao and Han (2010) from the Python package SciPy. Nelder-Mead is a gradient-free local optimisation method. basinhopping hops (or jumps) between basins (local minima) and returns the minimum over the local minima of the cost function, where each local minimum is found by Nelder-Mead. The hop direction is random, and the hop is accepted according to a Metropolis criterion depending on the cost function to be minimised and a temperature. We used the default temperature value ($T = 1.0$) and the default number of basin hops (100).
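A sketch of this combination with SciPy (the cost function is a stand-in with many local minima):

```python
import numpy as np
from scipy.optimize import basinhopping

def cost(theta):
    """Stand-in cost with many local minima."""
    return np.sum(np.sin(3 * theta) ** 2) + 0.1 * np.sum(theta ** 2)

theta0 = np.random.uniform(0, 2 * np.pi, size=8)
# Defaults apply: temperature T=1.0 and niter=100 basin hops;
# Nelder-Mead serves as the local minimiser in each basin.
res = basinhopping(cost, theta0, minimizer_kwargs={"method": "Nelder-Mead"})
print(res.x, res.fun)
```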
F.1 Error Decay
F.2 On the influence of noise on the cost function landscape
Regarding optimisation on a quantum computer, we comment on the effect of noise on the optimal parameters. Consider a successful optimisation of the cost function performed on a NISQ device, returning an optimal point in parameter space. If we instantiate the circuits at this point and evaluate them on a classical computer to obtain the predicted labels, we observe that these can in general differ from the labels predicted by evaluating the circuits on the quantum computer. On a fault-tolerant quantum computer this would not be the case. However, since there is a non-trivial coherent-noise channel that our circuits undergo, it is expected that the optimiser’s results are affected in this way.
Appendix G Quantum Compilation
In order to perform quantum compilation we use pytket Sivarajah et al. (2020). It is a Python module for interfacing with CQC’s t|ket⟩, a toolset for quantum programming. From this toolbox, we need to make use of compilation passes.
At a high level, quantum compilation can be described as follows. Given a circuit and a device, quantum operations are decomposed in terms of the device’s native gate set. Furthermore, the quantum circuit is reshaped in order to make it compatible with the device’s topology Cowtan et al. (2019). Specifically, the compilation pass that we use is default_compilation_pass(2), where the integer option is set to 2 for maximum optimisation under compilation.
Circuits written in pytket can be run on other devices by simply changing the backend being called, regardless of whether the hardware is fundamentally different in terms of which physical systems are used as qubits. This makes pytket platform-agnostic. We stress that on IBMQ machines specifically, the native gates are arbitrary single-qubit unitaries (‘U3’ gates) and entangling controlled-not gates (‘CNOT’ or ‘CX’). Importantly, CNOT gates show error rates that are one or even two orders of magnitude larger than those of U3 gates. We therefore measure the depth of our circuits in terms of the CNOT-depth, which in pytket can be obtained by invoking the command depth_by_type(OpType.CX).
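A sketch of this compilation flow (the backend import path and compilation API vary across pytket versions; the circuit is a toy stand-in for a sentence circuit):

```python
from pytket.circuit import Circuit, OpType
from pytket.extensions.qiskit import IBMQBackend  # import path depends on pytket version

# Toy circuit standing in for a postselected sentence circuit.
circ = Circuit(3)
circ.H(0)
circ.CX(0, 1)
circ.CX(1, 2)
circ.measure_all()

backend = IBMQBackend("ibmq_toronto")  # requires stored IBMQ credentials
compiled = backend.get_compiled_circuit(circ, optimisation_level=2)

print(compiled.depth_by_type(OpType.CX))  # CNOT-depth after rebasing and routing
```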
For both backends used in this work, ibmq_montreal and ibmq_toronto, the reported quantum volume is 32 and the maximum allowed number of shots is 8192.
Appendix H Hadamard Test
In our binary classification NLP task, the predicted label is the norm squared of the zero-to-zero transition amplitude $\langle 0 \cdots 0 | U | 0 \cdots 0 \rangle$, where the unitary $U$ includes the word-circuits and the circuits that implement the Bell effects dictated by the grammatical structure. Estimating these amplitudes can be done by postselecting on $\langle 0 \cdots 0 |$. However, postselection costs time exponential in the number of postselected qubits; in our case, one needs to discard all bitstrings sampled from the quantum computer that have Hamming weight other than zero. This is the procedure we follow in this proof-of-concept experiment, as we can afford to do so due to the small circuit sizes.
In such a setting, postselection can be avoided by using the Hadamard test Aharonov et al. (2008). See Fig.11 for the circuit allowing for the estimation of the real and imaginary parts of an amplitude. In Fig.12 we show how the Hadamard test can be used to estimate the amplitude represented by the postselected quantum circuit of Fig.3.
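To make the construction concrete, here is a small statevector sketch of the Hadamard test in numpy (an exact calculation, not a sampled hardware run; the imaginary part is obtained analogously by inserting an $S^\dagger$ gate on the ancilla before the second Hadamard):

```python
import numpy as np

def hadamard_test_real(U):
    """Return Re<0...0|U|0...0> via the ancilla statistics P(0) - P(1)."""
    dim = U.shape[0]
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    state = np.zeros((2, dim), dtype=complex)  # rows: ancilla basis, columns: system basis
    state[0, 0] = 1.0                          # ancilla and system both start in |0>
    state = H @ state                          # Hadamard on the ancilla
    state[1] = U @ state[1]                    # controlled-U acts on the ancilla-|1> branch
    state = H @ state                          # second Hadamard on the ancilla
    p0 = np.sum(np.abs(state[0]) ** 2)         # probability of ancilla outcome 0
    p1 = np.sum(np.abs(state[1]) ** 2)
    return p0 - p1

theta = 0.7
Rz = np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])
print(hadamard_test_real(Rz), np.cos(theta / 2))  # both give ~0.939
```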