# Group-Invariant Quantum Machine Learning

Quantum Machine Learning (QML) models are aimed at learning from data encoded in quantum states. Recently, it has been shown that models with little to no inductive biases (i.e., with no assumptions about the problem embedded in the model) are likely to have trainability and generalization issues, especially for large problem sizes. As such, it is fundamental to develop schemes that encode as much information as available about the problem at hand. In this work we present a simple, yet powerful, framework where the underlying invariances in the data are used to build QML models that, by construction, respect those symmetries. These so-called group-invariant models produce outputs that remain invariant under the action of any element of the symmetry group 𝔊 associated with the dataset. We present theoretical results underpinning the design of 𝔊-invariant models, and exemplify their application through several paradigmatic QML classification tasks, including cases where 𝔊 is a continuous Lie group and where it is a discrete symmetry group. Notably, our framework allows us to recover, in an elegant way, several well-known algorithms from the literature, as well as to discover new ones. Taken together, we expect that our results will help pave the way towards a more geometric and group-theoretic approach to QML model design.

## I Introduction

Symmetries have always held a special place in the imaginarium of scientists seeking to understand the universe through physical theories. As such, it is not surprising that scientists often equate a theory's beauty and elegance with its symmetry and harmony [gellmann2007beauty]. Still, the role of symmetries in science is more than simply aesthetic, as in many cases they constitute the underlying force behind a theory. For instance, Galilean invariance is pivotal in Newton's laws of motion [newton1999principia], while Lorentz and gauge invariances underlie the unification of electricity and magnetism in Maxwell's theory of electromagnetism [maxwell1865viii]. In the 20th century, symmetries took center stage as Einstein's theory of general relativity provided the first geometrization of symmetries [einstein1922general, gross1996role]. Soon after, Noether's theorem established a connection between differentiable symmetries and conserved quantities [noether1918invariante], showing that symmetries have defining implications in nature.

More recently, the importance of symmetries has been explored in the context of machine learning, and is core to the development of the field of geometric deep learning [bronstein2021geometric]. Here, the key insight was to note that the most successful neural network architectures can be viewed as models with inductive biases that respect the underlying structure and symmetries of the domain over which they act. The inductive bias refers to the fact that the model explores only a subset of the space of functions due to the assumptions imposed on its definition. Geometric deep learning not only constitutes a unifying mathematical framework for studying neural network architectures, but also provides guidelines to incorporate prior physical (and geometrical) knowledge into new architectures with better generalization performance, more efficient data requirements, and favorable optimization landscapes [cohen2016group, kondor2018generalization, bronstein2021geometric, bogatskiy2022symmetry].

In this work, we propose to import ideas from the field of geometric deep learning to the realm of Quantum Machine Learning (QML). QML has recently emerged as a leading candidate to make practical use of near-term quantum devices [biamonte2017quantum, cerezo2020variationalreview]. QML formally generalizes classical machine learning by embedding it into the formalism of quantum mechanics and quantum computation. This formal generalization can lead to practical speedups with the potential to significantly outperform classical machine learning [huang2021provably], since quantum computers can efficiently manipulate information stored in quantum states living in exponentially large Hilbert spaces.

Similar to their classical counterparts, the ability of QML models to solve a given task hinges on several factors, one of the most important being the choice of the model itself. If the inductive biases of a model are uninformed (in the language of Bayesian theory, this is also known as the choice and parameterization of the prior [battaglia2018relational]), its expressibility is large, leading to issues such as barren plateaus in the training landscape [holmes2021connecting, mcclean2018barren, cerezo2021cost, sharma2020trainability, thanasilp2021subtleties, arrasmith2021equivalence]. Adding sharp priors to the model narrows the effective search space and can improve its performance [cong2019quantum, pesah2020absence, volkoff2021large]. As such, a great deal of effort has recently been devoted to designing more problem-specific schemes with strong inductive biases [larocca2021diagnosing, larocca2021theory, wecker2015progress, gard2020efficient, lee2021towards, tang2019qubit, glick2021covariant, verdon2019quantumgraph, verdon2019quantum].

The main goal of this work is to show that by studying the symmetries in a given problem one can construct architectures with sharp geometric priors (see Fig. 1). Our contribution is a simple, yet extremely powerful, framework to build QML models that are, by design, invariant under a given symmetry group 𝔊 associated with the quantum dataset. For illustration, we apply our framework to classifying datasets based on purity, time-reversal dynamics, multipartite entanglement, and graph isomorphism. We also discuss the extension of our framework to the design of equivariant quantum neural networks. Finally, we highlight the exciting outlook for the new field of geometric quantum machine learning, for which our article lays some of the groundwork.

## II Preliminaries

Here we provide the background and definitions needed for our group-invariant QML framework.

### II.1 Symmetry groups in supervised QML

In this work we consider supervised binary classification tasks on quantum data. We remark, however, that the methods derived here can be readily applied to more general supervised learning scenarios or to unsupervised learning tasks. In addition, our work applies to classical data that has been encoded into quantum states.

For our purposes, we consider the case where one is given repeated access to a set of labeled training data from a dataset of the form $\mathcal{S}=\{(\rho_i,y_i)\}_{i=1}^{M}$. Here, $\rho_i$ are $n$-qubit quantum states from a data domain $\mathcal{R}$ in a $d$-dimensional Hilbert space (with $d=2^n$), while $y_i$ are binary labels from a label domain $\mathcal{Y}=\{0,1\}$. The data in $\mathcal{S}$ are drawn i.i.d. from a distribution defined over $\mathcal{R}\times\mathcal{Y}$, and we assume that the labels associated with each quantum state are assigned according to some (unknown) function $f:\mathcal{R}\to\mathcal{Y}$, so that $f(\rho_i)=y_i$. As shown in Fig. 1(a), the goal is to train a parameterized model (or hypothesis) $h_\theta$ to produce labels that match those of the target function $f$ with high probability. Here $\theta$ denotes the set of trainable parameters in the model.

For a given dataset, a fundamental question to ask is: what is the set of unitary operations on the states that leave their respective labels unchanged? Such a set of operations forms a group $\mathfrak{G}$, a subset of the unitary group $\mathbb{U}(d)$ of degree $d$. (Two remarks are in order. First, $\mathfrak{G}$ must form a group, as composing two symmetries leads to a new symmetry; similarly, symmetric transformations are always invertible and their inverse is itself a symmetry. Second, in more precise terms, $\mathfrak{G}$ is a unitary representation of a group.) In the following, $\mathfrak{G}$ is referred to as the symmetry group of the dataset. Explicitly, for every element $V$ in $\mathfrak{G}$, the label associated with any transformed state $V\rho_i V^\dagger$ is exactly the same as the label associated with the original $\rho_i$. Hence, it is natural to require that the model we are training should produce labels that also remain invariant under the action of $\mathfrak{G}$ on the data. To capture such invariance, we introduce the following definition.

###### Definition 1 (G-invariance).

A function $h$ is $\mathfrak{G}$-invariant iff

$$h(V\rho V^\dagger) = h(\rho) \quad \text{for all } V\in\mathfrak{G},\ \rho\in\mathcal{R}. \tag{1}$$

In principle, such $\mathfrak{G}$-invariance could be heuristically learnt by $h_\theta$ via data augmentation [bekkers2018roto], i.e., by including additional training instances of the form $(V\rho_i V^\dagger, y_i)$. However, this approach is undesirable for two main reasons. First, it increases the algorithmic run-time cost. Second, and most importantly, such invariance learning is not guaranteed to be completely successful (especially when $\mathfrak{G}$ is large or continuous) [bronstein2021geometric]. Instead, as shown in Fig. 2, our main approach here is to design QML models that are, by construction, $\mathfrak{G}$-invariant for all $\theta$. To achieve this, we introduce biases in the structure of $h_\theta$, for instance, by carefully choosing the architecture of the quantum neural network employed and the physical observable measured.
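Invariance in the sense of Eq. (1) can be probed numerically by sampling elements $V$ of $\mathfrak{G}$ and comparing $h(V\rho V^\dagger)$ with $h(\rho)$. The NumPy sketch below does this for $\mathfrak{G}=\mathbb{U}(d)$, using Haar-random unitaries and random mixed states; all helper names are illustrative and not taken from the paper. As with data augmentation, a finite sample can only reveal violations, never certify invariance, which is one motivation for building invariance in by construction.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Gaussian matrix with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))  # multiply column j by the phase of r[j, j]


def random_density_matrix(d, rng):
    """Random full-rank mixed state rho = G G^dagger / Tr[G G^dagger]."""
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def max_invariance_violation(h, states, sample_group_element, n_samples, rng):
    """Largest observed |h(V rho V^dagger) - h(rho)| over sampled V and the given states."""
    worst = 0.0
    for rho in states:
        for _ in range(n_samples):
            v = sample_group_element(rng)
            worst = max(worst, abs(h(v @ rho @ v.conj().T) - h(rho)))
    return worst


rng = np.random.default_rng(1)
d = 4  # two qubits
states = [random_density_matrix(d, rng) for _ in range(5)]
z_first = np.kron(np.diag([1.0, -1.0]), np.eye(2))  # Pauli Z on the first qubit


def purity(rho):
    return np.trace(rho @ rho).real  # invariant under every unitary


def z_expectation(rho):
    return np.trace(rho @ z_first).real  # not invariant under U(d)


def sample_u(r):
    return haar_unitary(d, r)


print(max_invariance_violation(purity, states, sample_u, 20, rng))         # roundoff level
print(max_invariance_violation(z_expectation, states, sample_u, 20, rng))  # clearly non-zero
```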

In the context of binary classification, there are two main scenarios to consider. In the first scenario, the data in both classes are invariant under the same symmetry group $\mathfrak{G}$, and thus $h_\theta$ needs to be invariant under this single symmetry group. In the second scenario, the data in different classes have different symmetries, and we denote by $\mathfrak{G}_0$ and $\mathfrak{G}_1$ the symmetry groups corresponding to data with labels $0$ and $1$, respectively. In this case, we have the freedom to consider QML models that are $\mathfrak{G}_0$-invariant, $\mathfrak{G}_1$-invariant, or both. As shown below, it is often convenient to build models that are invariant only under the action of one of the symmetry groups, as this is sufficient for data classification.

### II.2 Conventional and quantum-enhanced experiments

Thus far, we have not defined what constitutes the parameterized model $h_\theta$. This choice is tied to the physical resources one may have access to, i.e., the way quantum data can be stored, accessed and measured. Since there is a large amount of freedom in this regard, we restrict ourselves to two settings, following the demarcation proposed in [huang2021quantum, aharonov2022quantum]: a conventional one and a quantum-enhanced one, defined as follows.

###### Definition 2 (Conventional experiment).

In conventional experiments, each data instance is processed in a quantum computer and measured individually.

###### Definition 3 (Quantum-enhanced experiment).

In quantum-enhanced experiments, multiple copies of each data instance can be stored in a quantum memory, and later simultaneously processed and measured in a quantum computer.

In both settings, the model predictions are obtained from the outcomes of experiments on the quantum device. However, as illustrated in Fig. 3, the key difference between the conventional and quantum-enhanced settings is that, in the latter, the QML model is allowed to act coherently on multiple copies of $\rho_i$. This is in contrast with the conventional setting, where the QML model can only operate on a single copy of $\rho_i$ at a time.

### II.3 QML model structure

Throughout this work we consider models consisting of a quantum neural network $U(\theta)$ (i.e., a parameterized unitary that can be realized on a quantum computer) operating on $k$ copies of an input state $\rho$, followed by a measurement of an observable $O$ on the resulting state. In other words, we work with models belonging to the following hypothesis class.

###### Hypothesis Class 1.

We define Hypothesis Class 1 as composed of functions of the form

$$h^{(k)}_\theta(\rho) = \operatorname{Tr}\big[U(\theta)\,\rho^{\otimes k}\,U^\dagger(\theta)\,O\big], \tag{2}$$

where $k$ is the number of copies of the data state $\rho$, $U(\theta)$ is a quantum neural network, and $O$ is a Hermitian operator.
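For a handful of qubits, Eq. (2) can be evaluated directly with dense linear algebra. In the NumPy sketch below, the quantum neural network is passed in as an explicit unitary matrix (on hardware it would be a parameterized circuit), and the function names are illustrative. The sketch also checks that evolving the state and evolving the observable, $\tilde{O}(\theta)=U^\dagger(\theta)\,O\,U(\theta)$, give the same output, which is the form used in Sec. III. Matrices grow as $d^k\times d^k$, so this is meant only for small examples.

```python
import numpy as np
from functools import reduce


def tensor_power(rho, k):
    """Return the k-fold tensor product rho ⊗ rho ⊗ ... ⊗ rho."""
    return reduce(np.kron, [rho] * k)


def model_output(rho, k, unitary, observable):
    """Schrödinger picture of Eq. (2): Tr[U rho^{⊗k} U^dagger O]."""
    evolved = unitary @ tensor_power(rho, k) @ unitary.conj().T
    return np.trace(evolved @ observable).real


def effective_observable(unitary, observable):
    """Heisenberg picture: O~ = U^dagger O U, so the output is Tr[rho^{⊗k} O~]."""
    return unitary.conj().T @ observable @ unitary


# Single-qubit example (k = 1): a Hadamard maps |+> to |0>, so <Z> = 1.
hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
pauli_z = np.diag([1.0, -1.0])
plus = np.full((2, 2), 0.5)  # |+><+|
print(model_output(plus, 1, hadamard, pauli_z))  # 1.0 up to roundoff

# Two-copy example (k = 2) with a random (not Haar-distributed) unitary.
rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])  # a valid single-qubit state
o = np.kron(pauli_z, pauli_z)
schrodinger = model_output(rho, 2, u, o)
heisenberg = np.trace(tensor_power(rho, 2) @ effective_observable(u, o)).real
print(schrodinger, heisenberg)  # equal up to roundoff
```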

For $k=1$ copy, the models belonging to Hypothesis Class 1 correspond to those that can be computed in a conventional experiment according to Definition 2. On the other hand, $k\geq 2$ copies lead to models computable in quantum-enhanced experiments according to Definition 3.

Arguably, models from Hypothesis Class 1 are not of the most general form. For instance, they could be extended to allow for non-trivial classical post-processing of the measurement outcomes, or to involve more than one circuit or observable. Still, Hypothesis Class 1 already encompasses most current QML frameworks [cerezo2020variational] and can serve as a basis for more expressive QML models. In the following we restrict our attention to models in Hypothesis Class 1 and leave the study of more general models for future work.

### II.4 Classification accuracy

Let us define some terminology that will allow us to assess the accuracy of a model's classification. First, we remark that we do not consider precision issues when discussing classification accuracy. Recall that the model's predictions in Eq. (2) are expectation values, which in practice need to be estimated via measurements on a quantum computer. Hence, given a finite number of shots (measurement repetitions), these can only be resolved up to some additive error. However, for the sake of simplicity, we here assume the limit of zero shot noise (i.e., infinite precision), and we will revisit this assumption when appropriate in the results section. With this remark in hand, consider the following definitions of different degrees of classification accuracy.

###### Definition 4 (Classification Accuracy).

i) We say that a model provides no information that can classify the data if its outputs are always the same irrespective of the label associated to the input quantum state. ii) We say that a model performs noisy classification if its outputs are the same for some, but not all, data in different classes. iii) We say that a model perfectly classifies the data if its outputs are never the same for data in different classes.

We note that in some cases a model can, at best, only perform noisy classification, as its accuracy is fundamentally limited by the distinguishability of the quantum states in the dataset. This is typically not an issue for classical datasets, although it does arise for noisy classical data. In some cases, for example the time-reversal dataset considered below, the quantum data states associated with different output labels are non-orthogonal. In this case, perfect classification cannot be achieved, regardless of the form of the model.

### II.5 Useful definitions

In this, rather mathematical, section we present definitions that will be used throughout the main text. For further reading we refer to [kirillov2008introduction, zeier2011symmetry].

While $\mathfrak{G}$ describes the symmetries in the data, it will be crucial to characterize the symmetries of $\mathfrak{G}$ itself. The symmetries of a group are captured by its commutant

$$\mathcal{C}(\mathfrak{G}) = \{W\in\mathbb{C}^{d\times d} \mid [W,V]=0,\ \forall V\in\mathfrak{G}\}, \tag{3}$$

which is the vector space of all $d\times d$ complex matrices that commute with every element of $\mathfrak{G}$. More generally, one can also consider the space of matrices in $\mathbb{C}^{d^k\times d^k}$ commuting with the $k$-th tensor power of the elements in $\mathfrak{G}$ [zimboras2015symmetry].

###### Definition 5 (k-th order symmetries).

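Given a unitary representation $\mathfrak{G}$ of a group, its $k$-th order symmetries are

$$\mathcal{C}^{(k)}(\mathfrak{G}) = \{W\in\mathbb{C}^{d^k\times d^k} \mid [W,V^{\otimes k}]=0,\ \forall V\in\mathfrak{G}\}, \tag{4}$$

for all positive integers $k$.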

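First-order symmetries ($k=1$) are known as linear symmetries, while second-order ones ($k=2$) are known as quadratic symmetries of $\mathfrak{G}$; note that $\mathcal{C}^{(1)}(\mathfrak{G})=\mathcal{C}(\mathfrak{G})$. In general, there may be $k$-th order symmetries that are not Hermitian (and thus not physical observables). However, as proved in Appendix A, any matrix in $\mathcal{C}^{(k)}(\mathfrak{G})$ has non-zero projection into the Hermitian subspace of $\mathcal{C}^{(k)}(\mathfrak{G})$. Hence, one can always associate any non-Hermitian element in $\mathcal{C}^{(k)}(\mathfrak{G})$ to a Hermitian one that also belongs to $\mathcal{C}^{(k)}(\mathfrak{G})$.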
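The commutant of Eq. (3) and the $k$-th order symmetries of Eq. (4) are null spaces of linear maps, so for small systems they can be computed directly. The NumPy sketch below assumes the group is specified by a few generators (for $\mathbb{U}(d)$, a handful of Haar-random elements generate a dense subgroup, so the commutants coincide), and the function names are illustrative. It also illustrates the Hermiticity remark above: for $W$ in the commutant of a unitary group, both $(W+W^\dagger)/2$ and $(W-W^\dagger)/(2i)$ are Hermitian and remain in the commutant.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Gaussian matrix with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


def commutant_basis(generators, tol=1e-8):
    """Basis of {W : [W, G] = 0 for every generator G}.

    With row-major vectorization, vec(W G) = (I ⊗ G^T) vec(W) and
    vec(G W) = (G ⊗ I) vec(W), so the commutant is the null space of the
    stacked matrices (I ⊗ G^T - G ⊗ I), read off from the SVD.
    """
    dim = generators[0].shape[0]
    eye = np.eye(dim)
    m = np.vstack([np.kron(eye, g.T) - np.kron(g, eye) for g in generators])
    _, s, vh = np.linalg.svd(m)
    rank = int(np.sum(s > tol))
    return [row.conj().reshape(dim, dim) for row in vh[rank:]]


def hermitian_parts(w):
    """Write W = A + iB with A = (W + W^dagger)/2 and B = (W - W^dagger)/(2i), both Hermitian."""
    return (w + w.conj().T) / 2, (w - w.conj().T) / 2j


rng = np.random.default_rng(7)
d = 2
gens = [haar_unitary(d, rng) for _ in range(3)]
first_order = commutant_basis(gens)                            # C(U(2)), Eq. (3)
second_order = commutant_basis([np.kron(v, v) for v in gens])  # C^(2)(U(2)), Eq. (4)
print(len(first_order), len(second_order))  # dimensions of span{1} and span{1 ⊗ 1, SWAP}
```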

While the $k$-th order symmetries can be defined for any group, in the case when $\mathfrak{G}$ is a Lie group there exists additional structure that one can exploit. In particular, there exists an associated Lie algebra $\mathfrak{g}\subseteq\mathfrak{u}(d)$ such that $\mathfrak{G}=e^{\mathfrak{g}}$. Here, $\mathfrak{u}(d)$ denotes the set of $d\times d$ skew-Hermitian matrices, i.e., the Lie algebra of $\mathbb{U}(d)$. We also find it convenient to introduce the following definition:

###### Definition 6 (Orthogonal complement).

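Given a Lie algebra $\mathfrak{g}$, its orthogonal complement with respect to the Hilbert-Schmidt inner product is defined as

$$\mathfrak{g}^\perp = \{h\in\mathfrak{u}(d) \mid \operatorname{Tr}[h^\dagger g]=0,\ \forall g\in\mathfrak{g}\}. \tag{5}$$

Note that $\mathfrak{g}^\perp$ is, in general, not a Lie algebra.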

## III General Results for G-invariance

In this section we determine conditions leading to models that are $\mathfrak{G}$-invariant by design. These results are stated in a general, problem-agnostic way and will be applied to specific datasets in Secs. IV and V.

### III.1 A single symmetry group

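Let us first consider the case when there is a single symmetry group $\mathfrak{G}$ associated with all the instances in the dataset. We aim to find models from Hypothesis Class 1 that are $\mathfrak{G}$-invariant, i.e., models such that $h^{(k)}_\theta(V\rho V^\dagger)=h^{(k)}_\theta(\rho)$ for all $V\in\mathfrak{G}$ and every choice of parameters $\theta$. Defining

$$\tilde{O}(\theta) = U^\dagger(\theta)\,O\,U(\theta), \tag{6}$$

so that $h^{(k)}_\theta(\rho)=\operatorname{Tr}[\rho^{\otimes k}\tilde{O}(\theta)]$, we explicitly have

$$h^{(k)}_\theta(V\rho V^\dagger) = \operatorname{Tr}\big[V^{\otimes k}\rho^{\otimes k}(V^\dagger)^{\otimes k}\tilde{O}(\theta)\big]. \tag{7}$$

Evidently, the model will be invariant under $\mathfrak{G}$ if $[\tilde{O}(\theta),V^{\otimes k}]=0$ for all $V\in\mathfrak{G}$. Thus, the following proposition holds: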

###### Proposition 1.

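Let $h^{(k)}_\theta$ be a model in Hypothesis Class 1, and $\mathfrak{G}$ be the symmetry group associated with the dataset. The model will be $\mathfrak{G}$-invariant if $\tilde{O}(\theta)\in\mathcal{C}^{(k)}(\mathfrak{G})$.

###### Proof.

The proof of this proposition follows from Definition 5. If $\tilde{O}(\theta)$ belongs to the vector space of the $k$-th order symmetries, then $(V^\dagger)^{\otimes k}\tilde{O}(\theta)V^{\otimes k}=\tilde{O}(\theta)$, and thus $h^{(k)}_\theta(V\rho V^\dagger)=h^{(k)}_\theta(\rho)$ for all $V\in\mathfrak{G}$. ∎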
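Proposition 1 constrains only the effective operator $\tilde{O}(\theta)$, not $U(\theta)$ and $O$ separately. The NumPy sketch below illustrates this for $\mathfrak{G}=\mathbb{U}(d)$ and $k=2$ under a hypothetical choice of network, $U(\theta)=W\otimes W$ for an arbitrary single-copy unitary $W$, together with $O=\mathrm{SWAP}$. Then $\tilde{O}(\theta)=\mathrm{SWAP}$, which commutes with every $V\otimes V$, so the outputs on $\rho$ and $V\rho V^\dagger$ agree. An operator outside $\mathcal{C}^{(2)}(\mathbb{U}(d))$, Pauli $Z$ on the first copy, is included for contrast.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Gaussian matrix with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


def swap_operator(d):
    """SWAP on C^d ⊗ C^d, mapping |i, j> to |j, i>."""
    perm = [i * d + j for j in range(d) for i in range(d)]
    return np.eye(d * d)[perm]


def h2(rho, unitary, observable):
    """Two-copy model of Eq. (2): Tr[U (rho ⊗ rho) U^dagger O]."""
    evolved = unitary @ np.kron(rho, rho) @ unitary.conj().T
    return np.trace(evolved @ observable).real


rng = np.random.default_rng(3)
d = 2
swap = swap_operator(d)
w = haar_unitary(d, rng)
qnn = np.kron(w, w)                  # hypothetical copy-symmetric network U = W ⊗ W
o_tilde = qnn.conj().T @ swap @ qnn  # effective operator of Eq. (6); equals SWAP

psi = haar_unitary(d, rng)[:, 0]
rho = 0.9 * np.outer(psi, psi.conj()) + 0.1 * np.eye(d) / d  # an arbitrary mixed state
v = haar_unitary(d, rng)
rho_v = v @ rho @ v.conj().T

# O~ lies in C^(2)(U(d)): the two outputs agree up to roundoff.
print(h2(rho, qnn, swap), h2(rho_v, qnn, swap))

# Contrast: Z on the first copy is not a quadratic symmetry, so the outputs generally differ.
z_first = np.kron(np.diag([1.0, -1.0]), np.eye(d))
print(h2(rho, np.eye(d * d), z_first), h2(rho_v, np.eye(d * d), z_first))
```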

Furthermore, as previously discussed, we can guarantee that $\tilde{O}(\theta)$ in Proposition 1 can always be taken as a Hermitian operator, and thus as an observable.

Complementary to Proposition 1, we now prescribe a second way of ensuring $\mathfrak{G}$-invariance of the model when $\mathfrak{G}$ forms a Lie group. This is achieved when the operator $\tilde{O}(\theta)$ can be taken orthogonal to $(V\rho V^\dagger)^{\otimes k}-\rho^{\otimes k}$ for all $V$ in $\mathfrak{G}$ and for all $\rho$ in $\mathcal{R}$. We formalize this statement in the following proposition, proved in Appendix B.

###### Proposition 2.

Let be a model in Hypothesis Class 1. Then, let be the symmetry Lie group associated with the dataset, and let be its Lie algebra with . The model will be -invariant when and . Here, , an element of the orthogonal complement of , is a Hermitian operator acting on the -th copy of , and is an operator acting on all copies of but the -th one.

Note that in Proposition 2 we have assumed that , where denotes the identity matrix. However, it could happen that is in instead. In this case, the proposition will hold if and , as needs to have support on a vector space containing the identity.

### III.2 Multiple symmetry groups

Let us now consider the case when each of the two classes in the dataset has a different symmetry group associated with it, which we denote as $\mathfrak{G}_0$ and $\mathfrak{G}_1$. The concepts used in the previous section to obtain group-invariant models, i.e., the commutant and the orthogonal complement, can also be leveraged to derive conditions under which a model is $\mathfrak{G}_0$-invariant, $\mathfrak{G}_1$-invariant, or both. The following proposition, proved in Appendix C, generalizes Proposition 1 to the case of two symmetry groups.

###### Proposition 3.

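Let $h^{(k)}_\theta$ be a model in Hypothesis Class 1, and let $\mathfrak{G}_0$ and $\mathfrak{G}_1$ be the symmetry groups associated with the dataset. The model will be $\mathfrak{G}_0$- and $\mathfrak{G}_1$-invariant if $\tilde{O}(\theta)\in\mathcal{C}^{(k)}(\mathfrak{G}_0)$ and $\tilde{O}(\theta)\in\mathcal{C}^{(k)}(\mathfrak{G}_1)$. In addition, the model will be $\mathfrak{G}_1$-invariant but not necessarily $\mathfrak{G}_0$-invariant if $\tilde{O}(\theta)\in\mathcal{C}^{(k)}(\mathfrak{G}_1)$ but $\tilde{O}(\theta)\notin\mathcal{C}^{(k)}(\mathfrak{G}_0)$.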

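Conversely, while not stated explicitly in Proposition 3, the model will be $\mathfrak{G}_0$-invariant but not necessarily $\mathfrak{G}_1$-invariant if $\tilde{O}(\theta)\in\mathcal{C}^{(k)}(\mathfrak{G}_0)$ but $\tilde{O}(\theta)\notin\mathcal{C}^{(k)}(\mathfrak{G}_1)$.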

Additionally, when $\mathfrak{G}_0$ and $\mathfrak{G}_1$ are Lie groups, with associated Lie algebras $\mathfrak{g}_0$ and $\mathfrak{g}_1$, we can generalize Proposition 2 to the case of two symmetry groups. Then, the following proposition, proved in Appendix C, holds.

###### Proposition 4.

Let be a model in Hypothesis Class 1, and let and be the symmetry Lie groups associated with the dataset, with and their respective Lie algebras with . The model will be -invariant and -invariant when and when . Here, is a Hermitian operator acting on the -th copy of , is an operator acting on all copies of but the -th one . In addition, the model will be -invariant but not necessarily -invariant when and but .

In Proposition 4 we have assumed that belongs to and . However, if instead belongs to (with ), then the proposition will hold by replacing by , and conversely.

Propositions 1–4 provide conditions under which one can guarantee that a QML model in Hypothesis Class 1 is $\mathfrak{G}$-invariant. While the results presented in this section are valid for the case of two symmetry groups, the previous propositions can be readily extended to more general scenarios (such as multi-class classification) where one has a set of symmetry groups $\{\mathfrak{G}_j\}_j$. For instance, one could generalize Proposition 3 to show that a model will be invariant under all symmetry groups if $\tilde{O}(\theta)\in\mathcal{C}^{(k)}(\mathfrak{G}_j)$ for all $j$.

## IV Lie Group-Invariant Models

We now apply the general results presented in the previous section to identify $\mathfrak{G}$-invariant models that can classify states originating from several paradigmatic quantum datasets whose invariances are captured by Lie groups. These include the purity dataset (Sec. IV.1), the time-reversal dataset (Sec. IV.2), and the multipartite entanglement dataset (Sec. IV.3). Our results are stated in the form of theorems. For pedagogical reasons, we include in the main text the proofs for most of these theorems, as they provide a constructive introduction to our framework.

### IV.1 Purity dataset

As a first application, we consider the QML task of classifying $n$-qubit states according to their purity [huang2021quantumadvantage]. Given a dataset of states $\rho_i$, we want to discriminate those that are pure from those that are not. That is, we assign labels

$$y_i = \begin{cases} 1 & \text{if } \operatorname{Tr}[\rho_i^2]=1,\\ 0 & \text{if } \operatorname{Tr}[\rho_i^2]=b<1, \end{cases} \tag{8}$$

to states according to their purities $\operatorname{Tr}[\rho_i^2]$.

The symmetry group associated with the data in both classes is the group of unitaries $\mathfrak{G}=\mathbb{U}(d)$. This follows from the fact that unitaries preserve the spectrum of quantum states, and thus their purity remains unchanged under the action of $\mathbb{U}(d)$.

#### IV.1.1 Conventional experiments

Let us first consider the case of conventional experiments (see Definition 2), i.e., when the model in Eq. (2) has access to $k=1$ copy of each data state at a time. For this case, we can derive the following theorem.

###### Theorem 1.

Let $h^{(1)}_\theta$ be a model in Hypothesis Class 1, computable in a conventional experiment. There exist no quantum neural network $U(\theta)$ and operator $O$ such that $h^{(1)}_\theta$ is invariant under the action of $\mathbb{U}(d)$ and can classify (i.e., provide any relevant information about) the data in the purity dataset.

###### Proof.

The strategy of this proof is as follows. First, we identify the possible $\mathbb{U}(d)$-invariant models arising from Propositions 1 and 2. Then, we show that these models cannot be used to classify the purity dataset. Finally, we prove that no other $\mathbb{U}(d)$-invariant model within Hypothesis Class 1 (with $k=1$ copy) exists.

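Recall from Proposition 1 that a model is invariant under $\mathbb{U}(d)$ if $\tilde{O}(\theta)$ is in the commutant of $\mathbb{U}(d)$. Since the representation of $\mathbb{U}(d)$ on $\mathbb{C}^d$ is irreducible (a representation is irreducible if it cannot be further decomposed into a direct sum of representations), we know from Schur's Lemma [kirillov2008introduction] that

$$\mathcal{C}(\mathbb{U}(d)) = \operatorname{span}(\{\mathbb{1}\}). \tag{9}$$

It follows that if $\tilde{O}(\theta)$ is in $\mathcal{C}(\mathbb{U}(d))$, it takes the form $\tilde{O}(\theta)=c\,\mathbb{1}$. Moreover, we impose $c\in\mathbb{R}$ to ensure the Hermiticity of $\tilde{O}(\theta)$. This yields a constant model prediction $h^{(1)}_\theta(\rho)=c$ for any $\rho$. Hence, the $\mathbb{U}(d)$-invariant models of Proposition 1 do not provide any information about the purity of a state, and thus cannot classify the data.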

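Let us now analyze the models arising from Proposition 2. Since $\mathfrak{g}=\mathfrak{u}(d)$, we have

$$\mathfrak{g}^\perp = \{0\}, \tag{10}$$

where $0$ denotes the null matrix. Hence, any operator $\tilde{O}(\theta)$ built as prescribed by Proposition 2 yields a model output $h^{(1)}_\theta(\rho)$ that does not depend on $\rho$. This shows that the $\mathbb{U}(d)$-invariant models arising from Proposition 2 do not provide any information about the purity of a state and cannot classify the data.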

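So far, we have seen that $\mathbb{U}(d)$-invariant models obtained by applying Propositions 1 and 2 do not allow for classification of the purity dataset. Still, this does not preclude the existence of other $\mathbb{U}(d)$-invariant models within Hypothesis Class 1 that may be adequate for classification. However, we now prove that no other $\mathbb{U}(d)$-invariant models exist beyond those already considered. Given that $h^{(1)}_\theta(V\rho V^\dagger)=h^{(1)}_\theta(\rho)$ should be true for any unitary $V$ in $\mathbb{U}(d)$, it must also hold when uniformly averaging over all possible $V$ in $\mathbb{U}(d)$. Namely, we require that $\mathbb{E}_{\mathbb{U}(d)}[h^{(1)}_\theta(V\rho V^\dagger)]=h^{(1)}_\theta(\rho)$. The left-hand side of this equality is evaluated as

$$
\begin{aligned}
\mathbb{E}_{\mathbb{U}(d)}\big[h^{(1)}_\theta(V\rho V^\dagger)\big]
&= \int_{\mathbb{U}(d)} d\mu(V)\,\operatorname{Tr}\big[U(\theta)V\rho V^\dagger U^\dagger(\theta)\,O\big]\\
&= \operatorname{Tr}\Big[\Big(\int_{\mathbb{U}(d)} d\mu(V)\,U(\theta)V\rho\,\big(U(\theta)V\big)^\dagger\Big)O\Big]\\
&= \operatorname{Tr}\Big[\Big(\int_{\mathbb{U}(d)} d\mu(V)\,V\rho V^\dagger\Big)O\Big]\\
&= \frac{\operatorname{Tr}[\rho]\operatorname{Tr}[O]}{d},
\end{aligned}
\tag{11}
$$

where the integral denotes the Haar average over the unitary group. In the second equality, we have used the linearity of the trace and of the integral. In the third equality, we have used the left-invariance of the Haar measure. Finally, in the fourth equality, we have explicitly computed the integral via the Weingarten calculus [collins2006integration, puchala2017symbolic]. From Eq. (11) we see that the only way for $\mathbb{E}_{\mathbb{U}(d)}[h^{(1)}_\theta(V\rho V^\dagger)]$ to equal $h^{(1)}_\theta(\rho)$ for general quantum states is to have $\tilde{O}(\theta)=\frac{\operatorname{Tr}[O]}{d}\mathbb{1}$, that is, either $\tilde{O}(\theta)\propto\mathbb{1}$ (when $\operatorname{Tr}[O]\neq 0$) or $\tilde{O}(\theta)=0$ (when $\operatorname{Tr}[O]=0$), which leads to the solutions given by Propositions 1 and 2. Hence, we have shown that there are no models in Hypothesis Class 1 that are invariant under the action of $\mathbb{U}(d)$ and that can classify the data in the purity dataset, as they all provide no information according to Definition 4. ∎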
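The last equality of Eq. (11) states that the Haar twirl of any state is $\operatorname{Tr}[\rho]\,\mathbb{1}/d$. A Monte Carlo estimate, sketched below with Haar-random unitaries sampled via a QR decomposition, approaches this value as the number of samples grows; it only approximates the integral, which the Weingarten calculus evaluates exactly. The state, observable, and sample count are arbitrary illustrative choices.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Gaussian matrix with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


def haar_twirl_estimate(rho, n_samples, rng):
    """Monte Carlo estimate of the Haar integral of V rho V^dagger over U(d)."""
    d = rho.shape[0]
    acc = np.zeros((d, d), dtype=complex)
    for _ in range(n_samples):
        v = haar_unitary(d, rng)
        acc += v @ rho @ v.conj().T
    return acc / n_samples


rng = np.random.default_rng(11)
d = 2
rho = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)  # pure state |0><0|
o = np.array([[2.0, 1.0], [1.0, -0.5]])                  # arbitrary Hermitian observable

twirled = haar_twirl_estimate(rho, 20000, rng)
print(np.round(twirled.real, 2))                     # approaches identity / d
print(np.trace(twirled @ o).real, np.trace(o) / d)   # both approach Tr[rho] Tr[O] / d
```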

The previous theorem shows that one cannot classify the data in the purity dataset with a model in Hypothesis Class 1 operating in a conventional experiment. In hindsight, this result could have been foreseen. Indeed, computing the purity requires evaluating a polynomial of order two in the matrix elements of $\rho$, and thus linear functions such as the ones considered here are bound to fail. From a QML perspective, $h^{(1)}_\theta$ is ultimately a linear classifier, where the parameterized quantum neural network $U(\theta)$ defines a hyperplane such that the expectation value of $O$ is positive for one class and negative for the other. However, the manifolds of quantum states with different purities are not linearly separable in the state space. This is best illustrated by single-qubit states in the Bloch sphere, where no plane can be drawn across the sphere that separates pure states from mixed ones.
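A short calculation makes the Bloch-sphere argument explicit. Write a single-qubit state as $\rho_{\vec r}=\tfrac12(\mathbb{1}+\vec r\cdot\vec\sigma)$ with $|\vec r|\le 1$, and expand the effective observable as $\tilde{O}(\theta)=o_0\mathbb{1}+\vec o\cdot\vec\sigma$, where the real coefficients $o_0$ and $\vec o$ are introduced here only for illustration. Since the Pauli matrices are traceless and $\operatorname{Tr}[\sigma_a\sigma_b]=2\delta_{ab}$,

$$h^{(1)}_\theta(\rho_{\vec r}) = \operatorname{Tr}\big[\rho_{\vec r}\,\tilde{O}(\theta)\big] = o_0 + \vec o\cdot\vec r .$$

For any pure state ($|\vec r|=1$), the maximally mixed state therefore satisfies

$$h^{(1)}_\theta\big(\tfrac{\mathbb{1}}{2}\big) = o_0 = \tfrac12\Big[h^{(1)}_\theta(\rho_{\vec r}) + h^{(1)}_\theta(\rho_{-\vec r})\Big],$$

so its output lies in the interval spanned by the outputs of two pure states, and no threshold on $h^{(1)}_\theta$ can place every pure state on one side and $\mathbb{1}/2$ on the other. The same holds for any mixed state, since it is a convex combination of pure states and $h^{(1)}_\theta$ is affine in $\rho$.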

In the spirit of kernel tricks [cristianini2000introduction], one can introduce non-linearities by allowing the models to coherently access multiple copies of $\rho$. This is precisely the setting of quantum-enhanced experiments, which we now explore.

#### IV.1.2 Quantum-enhanced experiments

We now consider the case of quantum-enhanced experiments (see Definition 3), where multiple copies of a state in the dataset can be operated on coherently. As we now show, $k=2$ copies are already enough to classify states according to their purity.

###### Theorem 2.

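Let $h^{(2)}_\theta$ be a model in Hypothesis Class 1, computable in a quantum-enhanced experiment. There always exist quantum neural networks $U(\theta)$ and operators $O$, resulting in $\tilde{O}(\theta)\in\mathcal{C}^{(2)}(\mathbb{U}(d))$, such that $h^{(2)}_\theta$ is invariant under the action of $\mathbb{U}(d)$. If $\tilde{O}(\theta)$ has a non-zero component along $\mathrm{SWAP}$, the model can perfectly classify the data in the purity dataset. The special choice $\tilde{O}(\theta)=\mathrm{SWAP}$ leads to $h^{(2)}_\theta(\rho)=\operatorname{Tr}[\rho^2]$.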

###### Proof.

Recall from Proposition 1 that a model $h^{(2)}_\theta$ in Hypothesis Class 1 is $\mathbb{U}(d)$-invariant if $\tilde{O}(\theta)$ is a quadratic symmetry of $\mathbb{U}(d)$, i.e., if for all $V\in\mathbb{U}(d)$

$$(V^\dagger)^{\otimes 2}\,\tilde{O}(\theta)\,V^{\otimes 2} = \tilde{O}(\theta). \tag{12}$$

From the Schur-Weyl duality [goodman2009symmetry] we know that the $k$-th order symmetries of $\mathbb{U}(d)$ are given by

$$\mathcal{C}^{(k)}(\mathbb{U}(d)) = \operatorname{span}(S_k), \tag{13}$$

with $S_k$ the representation of the symmetric group that acts by permuting the subsystems of the $k$-fold tensor product of the input state (depicted in Fig. 4).

As shown in Fig. 4, for the case of $k=2$ copies, this group contains only two elements (we refer the reader to [zeier2011symmetry, zimboras2015symmetry] for an in-depth discussion of the quadratic symmetries of $\mathbb{U}(d)$):

$$S_2 = \{\mathbb{1}\otimes\mathbb{1},\ \mathrm{SWAP}\}, \tag{14}$$

with $\mathbb{1}$ the identity acting on each of the two copies of $\rho$, and $\mathrm{SWAP}$ the operator swapping these copies. As a consequence, $h^{(2)}_\theta$ can be made invariant under the action of $\mathbb{U}(d)$ when $\tilde{O}(\theta)=a_1\,\mathbb{1}\otimes\mathbb{1}+a_2\,\mathrm{SWAP}$ with $a_1,a_2\in\mathbb{R}$. The latter is diagrammatically presented in Fig. 5. This yields predictions

$$h^{(2)}_\theta(\rho) = a_1 + a_2\operatorname{Tr}[\rho^2], \tag{15}$$

showing that the model will be able to perfectly classify the states in the purity dataset (according to Definition 4) for any choice of $a_2\neq 0$. ∎
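As a numerical sketch of Eq. (15), the code below evaluates $h^{(2)}_\theta(\rho)=\operatorname{Tr}[\rho^{\otimes 2}\tilde{O}]$ with $\tilde{O}=a_1\,\mathbb{1}\otimes\mathbb{1}+a_2\,\mathrm{SWAP}$ on Haar-random pure states and on mixed states obtained by mixing them with the maximally mixed state. The mixing weight $0.8$ and the two-qubit dimension are illustrative assumptions (they give $b=0.73$); with $a_1=0$, $a_2=1$ the outputs are exactly the purities, so any threshold between $b$ and $1$ separates the two classes.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random d x d unitary: QR of a complex Gaussian matrix with phase correction."""
    z = (rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    diag = np.diag(r)
    return q * (diag / np.abs(diag))


def swap_operator(d):
    """SWAP on C^d ⊗ C^d, mapping |i, j> to |j, i>."""
    perm = [i * d + j for j in range(d) for i in range(d)]
    return np.eye(d * d)[perm]


def purity_model(rho, a1, a2):
    """h^(2)(rho) = Tr[(rho ⊗ rho) O~] with O~ = a1 (1 ⊗ 1) + a2 SWAP, cf. Eq. (15)."""
    d = rho.shape[0]
    o_tilde = a1 * np.eye(d * d) + a2 * swap_operator(d)
    return np.trace(np.kron(rho, rho) @ o_tilde).real


rng = np.random.default_rng(5)
d = 4  # two qubits
pure, mixed = [], []
for _ in range(10):
    psi = haar_unitary(d, rng)[:, 0]
    proj = np.outer(psi, psi.conj())
    pure.append(proj)                               # label 1: Tr[rho^2] = 1
    mixed.append(0.8 * proj + 0.2 * np.eye(d) / d)  # label 0: Tr[rho^2] = 0.73

pure_scores = [purity_model(r, 0.0, 1.0) for r in pure]  # special choice O~ = SWAP
mixed_scores = [purity_model(r, 0.0, 1.0) for r in mixed]
print(min(pure_scores), max(mixed_scores))  # 1.0 and 0.73 up to roundoff
```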

We now make several remarks regarding the results in Theorem 2, and regarding our framework in general. First, we note that while Proposition 1 provides a straightforward guideline to obtain $\mathfrak{G}$-invariant models, it does not prescribe how to actually build the quantum neural networks and the measurement operators ensuring that $\tilde{O}(\theta)$ is a $k$-th order symmetry. All that we know is the specific form that the resulting $\tilde{O}(\theta)$ needs to have. Thus, it is still necessary to find an adequate ansatz for $U(\theta)$ and an appropriate observable $O$ that can be efficiently measured. For instance, simply choosing $U(\theta)=\mathbb{1}$ and $O=\mathrm{SWAP}$ satisfies the conditions of Theorem 2. However, one cannot efficiently estimate the expectation value of the SWAP operator with a model of this form, since its Pauli decomposition has a number of terms that scales exponentially with $n$. Instead, it is well known that by adding an ancilla qubit and using the Hadamard test [buhrman2001quantum] one can efficiently estimate the expectation value of the SWAP operator. In Appendix D, we show how our present formalism can be applied to models that include an ancillary qubit. Surprisingly, this allows us to discover a new connection between the Swap Test [buhrman2001quantum] and the ancilla-based algorithm of Ref. [cincio2018learning].
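For small systems, the ancilla-based estimate can be checked by dense simulation. The sketch below applies the Swap Test circuit (Hadamard on the ancilla, controlled-SWAP between the two copies, Hadamard again) to $|0\rangle\langle 0|\otimes\rho\otimes\rho$; the ancilla is found in $|0\rangle$ with probability $(1+\operatorname{Tr}[\rho^2])/2$, so $2P(0)-1$ estimates the purity. On hardware $P(0)$ would be estimated from a finite number of shots; the density-matrix simulation here scales exponentially and is only an illustration, with illustrative function names.

```python
import numpy as np


def swap_operator(d):
    """SWAP on C^d ⊗ C^d, mapping |i, j> to |j, i>."""
    perm = [i * d + j for j in range(d) for i in range(d)]
    return np.eye(d * d)[perm]


def swap_test_p0(rho, sigma):
    """Probability that the ancilla is measured in |0> in the Swap Test on rho ⊗ sigma."""
    d = rho.shape[0]
    dim = d * d
    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
    p0 = np.diag([1.0, 0.0])  # |0><0| on the ancilla
    p1 = np.diag([0.0, 1.0])  # |1><1| on the ancilla
    h_anc = np.kron(hadamard, np.eye(dim))
    cswap = np.kron(p0, np.eye(dim)) + np.kron(p1, swap_operator(d))
    circuit = h_anc @ cswap @ h_anc           # H, then controlled-SWAP, then H
    state = np.kron(p0, np.kron(rho, sigma))  # ancilla initialized in |0>
    out = circuit @ state @ circuit.conj().T
    return np.trace(np.kron(p0, np.eye(dim)) @ out).real


rho = np.diag([0.8, 0.2])  # single-qubit state with Tr[rho^2] = 0.68
p0 = swap_test_p0(rho, rho)
print(p0, 2 * p0 - 1)      # 0.84 and 0.68, up to roundoff
```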

### IV.2 Time-reversal dataset

In this section we are interested in classifying states according to whether they are obtained from time-reversal-symmetric [sachs1987physics] dynamics or from arbitrary dynamics. That is, the states in the corresponding dataset now have labels

$$y_i = \begin{cases} 1 & \text{if } \rho_i \text{ is real valued},\\ 0 & \text{if } \rho_i \text{ is Haar random}. \end{cases} \tag{16}$$

Specifically, the states in the dataset have label $y_i=1$ if they are generated by evolving some (fixed) real-valued fiducial state with a time-reversal-symmetric unitary (and thus are real valued too), and label $y_i=0$ if they are generated by evolving the same fiducial state with a Haar-random unitary.

In contrast to the purity dataset considered previously, one can now associate a distinct symmetry group with each of the two classes. On the one hand, the states with label $0$ have $\mathfrak{G}_0=\mathbb{U}(d)$ as a symmetry group. On the other hand, the states with label $1$ have, as a symmetry group, $\mathfrak{G}_1=\mathbb{O}(d)$, the orthogonal Lie group of degree $d$. This is because the unitaries in $\mathbb{O}(d)$ preserve the time-reversal symmetry of the states (and thus their label).

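For convenience, we recall that $\mathbb{O}(d)$ is the group of orthogonal matrices. That is, $V\in\mathbb{O}(d)$ iff $V\in\mathbb{R}^{d\times d}$ and $VV^T=\mathbb{1}$. The connected component of this group containing the identity, $\mathbb{SO}(d)$, can be obtained by exponentiation of the orthogonal Lie algebra, which consists of real skew-symmetric matrices, $\mathfrak{o}(d)=\{A\in\mathbb{R}^{d\times d}\mid A^T=-A\}$, i.e., $\mathbb{SO}(d)=e^{\mathfrak{o}(d)}$. Moreover, the unitary Lie algebra can be split as $\mathfrak{u}(d)=\mathfrak{o}(d)\oplus\mathfrak{u}_{\mathbb{C}}(d)$. Here, note that $\mathfrak{g}_1=\mathfrak{o}(d)$ corresponds to the purely real-valued subspace of the unitary algebra. Its orthogonal complement

$$\mathfrak{g}_1^\perp = \mathfrak{u}_{\mathbb{C}}(d) \tag{17}$$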

corresponds to the purely imaginary subspace of the unitary Lie algebra.
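The decomposition $\mathfrak{u}(d)=\mathfrak{o}(d)\oplus\mathfrak{u}_{\mathbb{C}}(d)$ is simply the split of a skew-Hermitian matrix $X$ into its real part (real antisymmetric, in $\mathfrak{o}(d)$) and $i$ times its imaginary part (imaginary symmetric, in $\mathfrak{u}_{\mathbb{C}}(d)$). The NumPy sketch below checks that the two parts are orthogonal in the Hilbert-Schmidt inner product of Eq. (5), and that the commutator of two elements of $\mathfrak{u}_{\mathbb{C}}(d)$ falls back into $\mathfrak{o}(d)$, one way to see that an orthogonal complement such as $\mathfrak{g}_1^\perp$ need not be a Lie algebra. Function names are illustrative.

```python
import numpy as np


def random_skew_hermitian(d, rng):
    """Random element of u(d): X = (A - A^dagger) / 2."""
    a = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (a - a.conj().T) / 2


def split_unitary_algebra(x):
    """Split X in u(d) into its o(d) part (real antisymmetric) and u_C(d) part (imaginary symmetric)."""
    return x.real.astype(complex), 1j * x.imag


def hs_inner(a, b):
    """Hilbert-Schmidt inner product Tr[A^dagger B], as in Eq. (5)."""
    return np.trace(a.conj().T @ b)


rng = np.random.default_rng(2)
d = 3
x_o, x_c = split_unitary_algebra(random_skew_hermitian(d, rng))
y_o, y_c = split_unitary_algebra(random_skew_hermitian(d, rng))

print(abs(hs_inner(x_o, x_c)))   # o(d) and u_C(d) are orthogonal: zero up to roundoff
comm = x_c @ y_c - y_c @ x_c     # commutator of two elements of u_C(d)
print(np.allclose(comm.imag, 0), np.allclose(comm, -comm.T))  # True True: it lies in o(d)
```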

Having two symmetry classes allows for the design of a new classification strategy. Namely, one can classify the data using a $\mathfrak{G}_1$-invariant (but not $\mathfrak{G}_0$-invariant) model such that

$$
h^{(k)}_\theta(\rho_i) = c \quad \text{if } y_i = 1, \qquad h^{(k)}_\theta(\rho_i) \in [b_1, b_2] \quad \text{if } y_i = 0, \tag{18}
$$

with $c$, $b_1$, and $b_2$ real values determined by the measurement operator and the states in the dataset. If $c \notin [b_1, b_2]$, then Eq. (18) suffices for perfect classification according to Definition 4. If $c \in [b_1, b_2]$, we can still use Eq. (18) for noisy classification (see Definition 4), but there is a chance of misclassification, since states from different classes yielding the same prediction cannot be perfectly distinguished. Such misclassification events remain unlikely as long as the probability that $h^{(k)}_\theta(\rho_i) = c$ (up to some additive error) is small for states with label $y_i = 0$. In Appendix E, we present a Lemma that formalizes this statement. For now, we assume that a model satisfying Eq. (18) can classify the data in the dataset with sufficiently high probability; we will revisit this assumption in due course.

#### iv.2.1 Conventional experiments

For the case of conventional experiments, i.e., $k=1$ copies in Eq. (2), the results in Proposition 3 cannot be used to find $\mathfrak{G}_1$-invariant models classifying the time-reversal dataset. Indeed, since the representation of $\mathfrak{G}_1 = O(d)$ is irreducible, using Schur's Lemma [kirillov2008introduction] we know that

$$
C(\mathfrak{G}_1) = \operatorname{span}(\{\mathbb{1}\}), \tag{19}
$$

i.e., $\mathfrak{G}_1$ has no non-trivial linear symmetries that could be exploited for the purpose of classification.
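Eq. (19) can be confirmed numerically by computing the commutant as a null space. In the sketch below (our own construction, with hypothetical helper names and $d=4$ chosen for illustration), we impose $[G, X] = 0$ for every basis element $G$ of $\mathfrak{o}(d)$, which fixes the commutant of $SO(d)$, and add one reflection to cover all of $O(d)$. The constraints are linearized with the column-stacking identity $\operatorname{vec}(GX - XG) = (\mathbb{1}\otimes G - G^T\otimes\mathbb{1})\operatorname{vec}(X)$.

```python
import numpy as np
from itertools import combinations


def orthogonal_generators(d):
    """Basis of o(d): the real skew-symmetric matrices E_ab - E_ba for a < b."""
    gens = []
    for a, b in combinations(range(d), 2):
        A = np.zeros((d, d))
        A[a, b], A[b, a] = 1.0, -1.0
        gens.append(A)
    return gens


def commutant_dimension(ops):
    """Dimension of {X : G X = X G for every G in ops}, via a stacked linear system."""
    D = ops[0].shape[0]
    eye = np.eye(D)
    constraints = np.vstack([np.kron(eye, G) - np.kron(G.T, eye) for G in ops])
    return D * D - np.linalg.matrix_rank(constraints)


d = 4
# The Lie algebra generates SO(d); one reflection (det = -1) extends this to O(d).
reflection = np.diag([-1.0] + [1.0] * (d - 1))
ops = orthogonal_generators(d) + [reflection]
print(commutant_dimension(ops))  # 1: only multiples of the identity, as in Eq. (19)
```

As a sanity check of the helper, the identity alone commutes with every $d\times d$ matrix, so `commutant_dimension([np.eye(d)])` returns $d^2$.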

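Still, we can use Proposition 4 to find group-invariant models. First, we note that the input states are real-valued when they are time-reversal-symmetric, but generically complex-valued when they are Haar random. Hence, $h^{(1)}_\theta$ will be invariant under the action of $\mathfrak{G}_1$, but not necessarily invariant under the action of $\mathfrak{G}_0$, if $U^\dagger(\theta) O U(\theta)$ is Hilbert-Schmidt orthogonal to all real-valued states but not to all complex-valued ones. This is formalized below.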

###### Theorem 3.

Let $h^{(1)}_\theta$ be a model in Hypothesis Class 1, computable in a conventional experiment. There always exist real-valued quantum neural networks $U(\theta)$ and operators $O$, resulting in a purely imaginary operator $U^\dagger(\theta) O U(\theta)$, such that $h^{(1)}_\theta$ is invariant under the action of $\mathfrak{G}_1 = O(d)$ and can perform noisy classification of the data in the time-reversal dataset.

###### Proof.

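We aim at finding models that are $\mathfrak{G}_1$-invariant (with $\mathfrak{G}_1 = O(d)$) but not necessarily $\mathfrak{G}_0$-invariant (with $\mathfrak{G}_0 = U(d)$), so that they distinguish time-reversal-symmetric states from Haar random ones. According to Proposition 4, the model will be $\mathfrak{G}_1$-invariant but not $\mathfrak{G}_0$-invariant if $U^\dagger(\theta) O U(\theta)$ is orthogonal to every real-valued state but not to every complex-valued one, e.g., if $U^\dagger(\theta) O U(\theta)$ is a purely imaginary operator.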

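Now, let's show that there is a choice of $U(\theta)$ and $O$ allowing for classification. Taking $U(\theta) \in O(d)$ and $O \in i\,\mathfrak{o}(d)$ (i.e., $O$ Hermitian and purely imaginary), the resulting $U^\dagger(\theta) O U(\theta)$ is also contained in $i\,\mathfrak{o}(d)$, since a Lie algebra is closed under the adjoint action of its associated Lie group. Because time-reversal-symmetric states are real symmetric matrices, and hence Hilbert-Schmidt orthogonal to $i\,\mathfrak{o}(d)$, it follows from Eq. (5) that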

$$
h^{(1)}_\theta(\rho_i) = 0, \quad \forall\, \rho_i \text{ with label } y_i = 1. \tag{20}
$$

Moreover, Eq. (20) is in general not satisfied for Haar random states, as these will generally have both real and imaginary parts. As such, $h^{(1)}_\theta(\rho_i)$ will not necessarily be zero for states with label $y_i = 0$. Hence, the model satisfies Eq. (18) with $c = 0$, such that it can perform noisy classification (according to Definition 4) of the states in the time-reversal dataset. ∎
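The construction in the proof of Theorem 3 can be sketched in a few lines of NumPy. As illustrative assumptions of ours (not choices made in the text), we take $O = Y \otimes \mathbb{1}^{\otimes(n-1)}$, which is Hermitian and purely imaginary, and a Haar-random orthogonal matrix as a stand-in for a real-valued quantum neural network $U(\theta)$. The model then returns $0$ for a real-valued state and a generic nonzero value for a Haar random state.

```python
import numpy as np


def haar_orthogonal(d, rng):
    """Haar-random orthogonal matrix (QR of a real Gaussian matrix with sign fix)."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))


def haar_unitary(d, rng):
    """Haar-random unitary matrix (QR of a complex Gaussian matrix with phase fix)."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))


rng = np.random.default_rng(seed=2)
n = 3
d = 2 ** n
Y = np.array([[0, -1j], [1j, 0]])
O = np.kron(Y, np.eye(d // 2))      # Hermitian and purely imaginary: O lies in i*o(d)
U_theta = haar_orthogonal(d, rng)   # stand-in for a real-valued QNN (an assumption)
O_theta = U_theta.T @ O @ U_theta   # U^dagger O U (U is real, so U^dagger = U^T)


def model(rho):
    """h^(1)(rho) = Tr[U rho U^dagger O] for the real-valued U_theta above."""
    return np.trace(U_theta @ rho @ U_theta.T @ O).real


psi0 = np.zeros(d)
psi0[0] = 1.0                                  # fiducial state |0...0>
psi_real = haar_orthogonal(d, rng) @ psi0      # label y = 1: real-valued state
psi_haar = haar_unitary(d, rng) @ psi0         # label y = 0: Haar random state
rho_real = np.outer(psi_real, psi_real)
rho_haar = np.outer(psi_haar, psi_haar.conj())

print(np.allclose(O_theta.real, 0))            # True: O_theta stays purely imaginary
print(model(rho_real), model(rho_haar))        # ~0 versus a generic nonzero value
```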

So far, we have identified models that yield predicted values of $0$ for time-reversal-symmetric states, but values in a continuous range for states drawn from the Haar distribution. As such, when accounting for noise in the model's predictions, any state that is not time-reversal-symmetric but has a prediction value close to zero may be misclassified. In fact, as proven in Appendix F, Haar random states lead to prediction values that (with probability close to one) lie in a range that becomes exponentially concentrated around zero with the number of qubits $n$. In turn, it can be shown that to classify the states in the dataset with high success probability, one would need to repeat the experiment a number of times that scales exponentially with $n$ [chen2021exponential, aharonov2022quantum, huang2021quantum].
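The concentration effect can be observed with a short Monte Carlo sketch. Assuming, for illustration only, the purely imaginary observable $O = Y\otimes\mathbb{1}^{\otimes(n-1)}$, the prediction for a Haar random pure state has mean $0$ and variance $\operatorname{Tr}[O^2]/(d(d+1)) = 1/(d+1)$ (a standard Haar second-moment formula for traceless $O$). Heuristically, resolving a signal of typical size $1/\sqrt{d+1}$ from zero with $\pm 1$-valued single-shot outcomes then takes on the order of $d = 2^n$ repetitions. The helper names below are hypothetical.

```python
import numpy as np


def haar_state(d, rng):
    """Haar-random pure state in C^d: a normalized complex Gaussian vector."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)


def expect_y_first_qubit(psi, d):
    """<psi|(Y (x) 1)|psi>, using Y|0> = i|1> and Y|1> = -i|0> on the first qubit."""
    amps = psi.reshape(2, d // 2)   # row index = state of the first qubit
    y_amps = np.stack([-1j * amps[1], 1j * amps[0]])
    return np.vdot(amps, y_amps).real


rng = np.random.default_rng(seed=3)
samples = 3000
results = {}
for n in range(1, 7):
    d = 2 ** n
    vals = [expect_y_first_qubit(haar_state(d, rng), d) for _ in range(samples)]
    results[n] = np.var(vals)
    # Empirical variance versus the exact value 1 / (d + 1).
    print(n, results[n], 1 / (d + 1))
```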

This draws attention to a practical aspect of QML model design that we have not previously considered: the scaling of the number of experiment repetitions required for accurate classification. Our framework allows us to identify $\mathfrak{G}_1$-invariant models, but it does not guarantee that such models are practical for large system sizes $n$. In fact, we have seen that an exponential number of repetitions is needed to make practical use of the models in Theorem 3. This motivates us to continue the search for $\mathfrak{G}_1$-invariant models in quantum-enhanced experiments, in the hope that these might avoid the exponential scaling present in conventional experiments.

#### iv.2.2 Quantum-enhanced experiments

For quantum-enhanced experiments, i.e., $k=2$ copies in Eq. (2), we can show that the following theorem holds.

###### Theorem 4.

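Let $h^{(2)}_\theta$ be a model in Hypothesis Class 1, computable in a quantum-enhanced experiment. There always exist quantum neural networks $U(\theta)$ and operators $O$, resulting in $U^\dagger(\theta) O U(\theta) = |\Phi^+\rangle\langle\Phi^+|$, with $|\Phi^+\rangle$ the Bell state on $2n$ qubits, such that $h^{(2)}_\theta$ is invariant under the action of $\mathfrak{G}_1 = O(d)$ and can perform noisy classification of the data in the time-reversal dataset.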

###### Proof.

We aim at finding models that are invariant under $\mathfrak{G}_1$, but not under $\mathfrak{G}_0$. According to Proposition 3, this can be achieved by ensuring that $U^\dagger(\theta) O U(\theta)$ is a quadratic symmetry of $\mathfrak{G}_1$ but not of $\mathfrak{G}_0$. From Schur-Weyl duality for the orthogonal group, we know that the $k$-th order symmetries of $O(d)$ are given by the Brauer algebra $B_k$ [brown1954algebra],

$$
C^{(k)}(O(d)) = B_k. \tag{21}
$$

The elements of $B_k$ are depicted in Fig. 6.

As shown in Fig. 7, for $k=2$ the Brauer algebra is spanned by three elements

$$
B_2 = \operatorname{span}(\{\mathbb{1}\otimes\mathbb{1},\ \text{SWAP},\ |\Phi^+\rangle\langle\Phi^+|\}), \tag{22}
$$
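The dimension count behind Eq. (22) can be checked with the same null-space approach used for linear symmetries, now applied to the two-copy action $G\otimes\mathbb{1} + \mathbb{1}\otimes G$ of the Lie algebra (plus a reflection $R\otimes R$ for the orthogonal group). In this sketch, with $d=3$ chosen by us to keep the linear system small and all helper names hypothetical, the quadratic symmetries of $O(d)$ form a 3-dimensional space, those of $U(d)$ a 2-dimensional one, and the Bell-state projector belongs to the former only.

```python
import numpy as np
from itertools import combinations


def orthogonal_generators(d):
    """Basis of o(d): real skew-symmetric matrices E_ab - E_ba for a < b."""
    gens = []
    for a, b in combinations(range(d), 2):
        A = np.zeros((d, d), dtype=complex)
        A[a, b], A[b, a] = 1.0, -1.0
        gens.append(A)
    return gens


def unitary_generators(d):
    """Basis of u(d) (skew-Hermitian matrices): d diagonal and d(d-1) off-diagonal elements."""
    gens = []
    for a in range(d):
        E = np.zeros((d, d), dtype=complex)
        E[a, a] = 1j
        gens.append(E)
    for a, b in combinations(range(d), 2):
        E = np.zeros((d, d), dtype=complex)
        E[a, b], E[b, a] = 1.0, -1.0
        F = np.zeros((d, d), dtype=complex)
        F[a, b] = F[b, a] = 1j
        gens += [E, F]
    return gens


def two_copy(G):
    """Action of a Lie algebra element on two copies: G (x) 1 + 1 (x) G."""
    eye = np.eye(G.shape[0])
    return np.kron(G, eye) + np.kron(eye, G)


def commutant_dimension(ops):
    """Dimension of {X : G X = X G for every G in ops}, via a stacked linear system."""
    D = ops[0].shape[0]
    eye = np.eye(D)
    constraints = np.vstack([np.kron(eye, G) - np.kron(G.T, eye) for G in ops])
    return D * D - np.linalg.matrix_rank(constraints)


d = 3
reflection = np.diag([-1.0] + [1.0] * (d - 1))
ops_orth = [two_copy(G) for G in orthogonal_generators(d)] + [np.kron(reflection, reflection)]
ops_unit = [two_copy(G) for G in unitary_generators(d)]

phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # |Phi+> of Eq. (23)
P = np.outer(phi, phi)                         # Bell-state projector

dim_orth = commutant_dimension(ops_orth)
dim_unit = commutant_dimension(ops_unit)
bell_in_orth = all(np.allclose(G @ P, P @ G) for G in ops_orth)
bell_in_unit = all(np.allclose(G @ P, P @ G) for G in ops_unit)
print(dim_orth, dim_unit, bell_in_orth, bell_in_unit)  # 3 2 True False
```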

where $|\Phi^+\rangle$ denotes the Bell state on $2n$ qubits

$$
|\Phi^+\rangle = \frac{1}{\sqrt{d}}\sum_{j=1}^{d} |j\rangle|j\rangle. \tag{23}
$$

It can be verified that $|\Phi^+\rangle\langle\Phi^+|$ is indeed a quadratic symmetry of $O(d)$. To see this, recall the ricochet property (also called the transpose trick), which states that, for any linear operator $A$ acting on a $d$-dimensional Hilbert space,

$$
(A\otimes\mathbb{1})|\Phi^+\rangle = (\mathbb{1}\otimes A^T)|\Phi^+\rangle. \tag{24}
$$

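Using Eq. (24), one can assert that $(V\otimes V)|\Phi^+\rangle = (VV^T\otimes\mathbb{1})|\Phi^+\rangle = |\Phi^+\rangle$ for all $V \in O(d)$, and hence that $(V\otimes V)|\Phi^+\rangle\langle\Phi^+|(V\otimes V)^\dagger = |\Phi^+\rangle\langle\Phi^+|$ (see also Fig. 7(b) for a diagrammatic proof).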
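The ricochet property and its consequence can be verified directly on state vectors. The following NumPy sketch (sampling helpers and $d=4$ are our own illustrative choices) checks Eq. (24) for a random complex operator, which requires a plain transpose rather than a conjugate transpose, and compares the overlap $|\langle\Phi^+|(V\otimes V)|\Phi^+\rangle|^2$ for an orthogonal $V$ with that for a Haar-random unitary $U$.

```python
import numpy as np


def haar_orthogonal(d, rng):
    """Haar-random orthogonal matrix (QR of a real Gaussian matrix with sign fix)."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))


def haar_unitary(d, rng):
    """Haar-random unitary matrix (QR of a complex Gaussian matrix with phase fix)."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))


rng = np.random.default_rng(seed=11)
d = 4
eye = np.eye(d)
phi = eye.reshape(d * d) / np.sqrt(d)   # |Phi+> = sum_j |j>|j> / sqrt(d), Eq. (23)

# Ricochet property, Eq. (24): plain transpose, no complex conjugation.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
lhs = np.kron(A, eye) @ phi
rhs = np.kron(eye, A.T) @ phi

# (V (x) V)|Phi+> = (V V^T (x) 1)|Phi+> = |Phi+> for orthogonal V, but not for generic unitaries.
V = haar_orthogonal(d, rng)
U = haar_unitary(d, rng)
overlap_V = abs(np.vdot(phi, np.kron(V, V) @ phi)) ** 2
overlap_U = abs(np.vdot(phi, np.kron(U, U) @ phi)) ** 2
print(np.allclose(lhs, rhs), overlap_V, overlap_U)   # True, 1.0 (up to round-off), < 1
```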

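The only basis element of $B_2$ that is not in $\operatorname{span}(\{\mathbb{1}\otimes\mathbb{1}, \text{SWAP}\})$, the quadratic symmetries of $U(d)$ by Schur-Weyl duality, is the projector onto the Bell state $|\Phi^+\rangle\langle\Phi^+|$. Hence, according to Proposition 3, $h^{(2)}_\theta$ is $\mathfrak{G}_1$-invariant (but not $\mathfrak{G}_0$-invariant) if $U^\dagger(\theta) O U(\theta) = |\Phi^+\rangle\langle\Phi^+|$, e.g., with $U(\theta) = \mathbb{1}$ and $O = |\Phi^+\rangle\langle\Phi^+|$. Now, the model is such that $h^{(2)}_\theta(\rho_i) = \langle\Phi^+|\rho_i\otimes\rho_i|\Phi^+\rangle$. In Fig. 8(a) we show a circuit that could be used to measure this overlap.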

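Recall that the time-reversal-symmetric states are obtained by evolving a real-valued fiducial state, taken to be $|0\rangle^{\otimes n}$ without loss of generality, under a unitary $W_i$ in $O(d)$. One can verify that, for $U(\theta) = \mathbb{1}$ and $O = |\Phi^+\rangle\langle\Phi^+|$,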

$$
h^{(2)}_\theta(\rho_i) = \left|\langle\Phi^+|0\rangle^{\otimes 2n}\right|^2 = \frac{1}{d}, \tag{25}
$$

for all $\rho_i$ with label $y_i = 1$. On the other hand, the model output will not be constant for states with label $y_i = 0$, i.e., for states obtained by evolving $|0\rangle^{\otimes n}$ under a Haar random unitary $W_i$. In this case, one has

$$
h^{(2)}_\theta(\rho_i) = \left|\langle\Phi^+|(W_i\otimes W_i)|0\rangle^{\otimes 2n}\right|^2, \tag{26}
$$

which depends on the choice of $W_i$. Overall, we have shown that choosing $U^\dagger(\theta) O U(\theta) = |\Phi^+\rangle\langle\Phi^+|$ leads to a model invariant under $\mathfrak{G}_1$ that satisfies Eq. (18), and hence that can perform noisy classification (according to Definition 4) of the states in the time-reversal dataset. ∎
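Eqs. (25) and (26) can be compared numerically. In the sketch below (sampling helpers, $n=3$, and the sample sizes are our own choices), the model returns exactly $1/d$ for states prepared with orthogonal $W_i$, while for Haar random $W_i$ the Monte Carlo mean is close to $2/(d(d+1))$. That reference value is our own short calculation: writing $\psi = W_i|0\rangle$, Eq. (26) equals $|\sum_j \psi_j^2|^2/d$ and $\mathbb{E}|\sum_j\psi_j^2|^2 = 2/(d+1)$ for Haar random $\psi$. Both class values shrink exponentially with $n$.

```python
import numpy as np


def haar_orthogonal(d, rng):
    """Haar-random orthogonal matrix (QR of a real Gaussian matrix with sign fix)."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))


def haar_unitary(d, rng):
    """Haar-random unitary matrix (QR of a complex Gaussian matrix with phase fix)."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))


rng = np.random.default_rng(seed=5)
n = 3
d = 2 ** n
phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # |Phi+> on 2n qubits
psi0 = np.zeros(d)
psi0[0] = 1.0                                  # fiducial state |0>^{(x) n}


def h2(W):
    """Eq. (26): |<Phi+|(W (x) W)|0>^{(x) 2n}|^2 = <Phi+|rho (x) rho|Phi+> with rho = W|0><0|W^dagger."""
    out = np.kron(W @ psi0, W @ psi0)
    return abs(np.vdot(phi, out)) ** 2


real_vals = np.array([h2(haar_orthogonal(d, rng)) for _ in range(50)])
haar_vals = np.array([h2(haar_unitary(d, rng)) for _ in range(4000)])
print(real_vals.min(), real_vals.max(), 1 / d)   # constant and equal to 1/d, as in Eq. (25)
print(haar_vals.mean(), 2 / (d * (d + 1)))       # Monte Carlo mean versus 2/(d(d+1))
```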

Theorem 4 shows that measuring the overlap with the Bell state allows us to perform classification. However, this does not solve the scaling issue discussed earlier. Indeed, as proven in Appendix F, the prediction values of the model given in Eq. (26) still concentrate exponentially close to zero as a function of the number of qubits, and the value in Eq. (25) is itself exponentially small in $n$. This implies that we still need an exponential number of experiment repetitions to accurately classify the data. However, as we now show, this problem can be overcome if we slightly modify the task at hand, from the classification of time-reversal-symmetric states to the classification of time-reversal-symmetric dynamics.

For this new task, rather than being given states, we instead assume access to the unitaries used to produce these states. The corresponding dataset has the form $\{(W_i, y_i)\}_i$, with

$$
y_i = \begin{cases} 1 & \text{if } W_i \in O(d),\\ 0 & \text{if } W_i \in U(d) \text{ is Haar random}, \end{cases} \tag{27}
$$

which has the same two symmetry groups, $\mathfrak{G}_0 = U(d)$ and $\mathfrak{G}_1 = O(d)$, as before.

As shown in Fig. 8(b), the main advantage of this scenario is that we are now allowed to initialize the $2n$-qubit register in any global state $|\Psi_{\rm in}\rangle$, and to simultaneously evolve the first and the last $n$ qubits with the same unitary $W_i$. To capture this additional freedom, we consider models in a new hypothesis class, defined as follows:

###### Hypothesis Class 2.

We define Hypothesis Class 2, computable in a quantum-enhanced experiment, as composed of functions of the form

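$$
h_\theta(W) = \operatorname{Tr}\!\left[U(\theta)\,(W^{\otimes 2})|\Psi_{\rm in}\rangle\langle\Psi_{\rm in}|(W^{\otimes 2})^\dagger\,U^\dagger(\theta)\,O\right], \tag{28}
$$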

where $U(\theta)$ is a quantum neural network acting on the $2n$ qubits, $O$ is a Hermitian operator, and $|\Psi_{\rm in}\rangle$ is an initial state on $2n$ qubits.

In this context, we can still use Proposition 3 to show that the following theorem holds.

###### Theorem 5.

Let $h_\theta$ be a model in Hypothesis Class 2, computable in a quantum-enhanced experiment. There always exist quantum neural networks $U(\theta)$ and operators $O$, resulting in $U^\dagger(\theta) O U(\theta) = |\Phi^+\rangle\langle\Phi^+|$, with $|\Phi^+\rangle$ the Bell state on $2n$ qubits, such that $h_\theta$ is $\mathfrak{G}_1$-invariant, but not $\mathfrak{G}_0$-invariant, and can perfectly classify the dynamics in the time-reversal dataset. The special choice $|\Psi_{\rm in}\rangle = |\Phi^+\rangle$ recovers the algorithm for classifying time-reversal-symmetric dynamics presented in [huang2021quantum].

###### Proof.

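Recall from Proposition 3 that $h_\theta$ is invariant under $\mathfrak{G}_1$, but not under $\mathfrak{G}_0$, if $U^\dagger(\theta) O U(\theta)$ is a quadratic symmetry of $\mathfrak{G}_1$ but not of $\mathfrak{G}_0$. Following the proof of Theorem 4, we know that this can be achieved with the choice $U^\dagger(\theta) O U(\theta) = |\Phi^+\rangle\langle\Phi^+|$. Moreover, if we choose $|\Psi_{\rm in}\rangle = |\Phi^+\rangle$, a straightforward calculation using $(W_i\otimes W_i)|\Phi^+\rangle = (W_iW_i^T\otimes\mathbb{1})|\Phi^+\rangle = |\Phi^+\rangle$ for $W_i \in O(d)$ shows that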

$$
h_\theta(W_i) = 1, \quad \forall\, W_i \text{ with label } y_i = 1, \tag{29}
$$

recovering the algorithm in [huang2021quantum]. On the other hand, $h_\theta(W_i)$ is $W_i$-dependent and will concentrate around zero if $y_i = 0$ (see Appendix). This means that the model outputs a value of $1$ if the unitary has label $y_i = 1$, and a value close to $0$ (with high probability) if the unitary has label $y_i = 0$. Thus, the models in Hypothesis Class 2 can perform perfect classification (according to Definition 4) of time-reversal-symmetric dynamics. ∎
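The gap that makes Theorem 5 useful can be seen in a short simulation. With $|\Psi_{\rm in}\rangle = |\Phi^+\rangle$, $U(\theta) = \mathbb{1}$ and $O = |\Phi^+\rangle\langle\Phi^+|$, Eq. (28) reduces to $|\langle\Phi^+|(W\otimes W)|\Phi^+\rangle|^2 = |\operatorname{Tr}[WW^T]|^2/d^2$. The sketch below (our own sampling helpers and sample sizes) shows that this equals $1$ for orthogonal $W$, while for Haar random $W$ the Monte Carlo mean is close to $2/(d(d+1))$, a value we obtained from a standard fourth-moment Haar calculation. Since the model is the probability of the projective outcome $|\Phi^+\rangle\langle\Phi^+|$, a few shots already separate the two classes.

```python
import numpy as np


def haar_orthogonal(d, rng):
    """Haar-random orthogonal matrix (QR of a real Gaussian matrix with sign fix)."""
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    return Q * np.sign(np.diag(R))


def haar_unitary(d, rng):
    """Haar-random unitary matrix (QR of a complex Gaussian matrix with phase fix)."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))


rng = np.random.default_rng(seed=6)
n = 3
d = 2 ** n
phi = np.eye(d).reshape(d * d) / np.sqrt(d)   # |Phi+> on 2n qubits


def h_dyn(W):
    """Eq. (28) with |Psi_in> = |Phi+>, U(theta) = 1 and O = |Phi+><Phi+|."""
    return abs(np.vdot(phi, np.kron(W, W) @ phi)) ** 2


real_vals = np.array([h_dyn(haar_orthogonal(d, rng)) for _ in range(50)])
haar_vals = np.array([h_dyn(haar_unitary(d, rng)) for _ in range(3000)])
print(real_vals.min(), real_vals.max())          # both 1 (up to round-off), as in Eq. (29)
print(haar_vals.mean(), 2 / (d * (d + 1)))       # small, and shrinking with d = 2^n
```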

As shown in the proof of Theorem 5, the model now gives non-overlapping predictions for the data in different classes, meaning that we can perform classification with a constant number of experiment repetitions. This is in contrast to the model defined in Theorem 4, which requires an exponential number of experiment repetitions for accurate classification. This illustrates how QML models capable of achieving a quantum advantage naturally emerge in our framework as group-invariant models.

### iv.3 Multipartite entanglement dataset

In this section, we consider the more involved task of classifying pure quantum states according to the amount of multipartite entanglement they possess. Entanglement has been shown to be a fundamental resource [horodecki2009quantum, gigena2020one] in quantum information, quantum computation, and quantum sensing [barrett2002nonsequential, gigena2017bipartite, ekert1991quantum, gisin2002quantum, ekert1998quantum, datta2005entanglement, chalopin2018quantum, beckey2020variational, cerezo2021sub]. Hence, its study and characterization are central to the quantum sciences.

Here, we recall that entanglement is relatively well understood for bipartite pure quantum states (e.g., via the Schmidt decomposition [nielsen2000quantum]), and that group-invariance arguments have previously been used to characterize entanglement in bipartite mixed states [terhal2000entanglement, vollbrecht2001entanglement]. However, the same cannot be said for multipartite entanglement [walter2016multipartite]. In this case, the complexity of entanglement scales exponentially with the number of parties, and there is no unique measure to quantify it. Thus, we employ our framework not only to obtain $\mathfrak{G}$-invariant QML models that can accurately classify multipartite entangled states, but also to better understand the unique nature of multipartite entanglement. In this context, we also recall that the availability of public datasets, such as the NTangled dataset [schatzki2021entangled], composed of quantum states with varying amounts of multipartite entanglement, makes this an extremely rich application for our framework and for benchmarking QML models.

Let $E$ be a multipartite entanglement measure satisfying $E(\rho) \geq 0$, with $E(\rho) = 0$ if the state is separable, and $E(\rho) > 0$ if the state contains multipartite entanglement between the $n$ qubits (for instance, see the entanglement measures in Refs. [walter2013entanglement, prove2021extending, foulds2020controlled, beckey2021computable]). The multipartite entanglement dataset is of the form $\{(\rho_i, y_i)\}_i$, where

$$
y_i = \begin{cases} 1 & \text{if } E(\rho_i) = b > 0,\\ 0 & \text{if } E(\rho_i) = 0. \end{cases} \tag{30}
$$

Here, the symmetry group associated with the data in both classes is the Lie group of local unitaries, $\mathfrak{G} = U(2)^{\otimes n}$, with associated Lie algebra $\mathfrak{g}$. This is because local unitaries do not change the multipartite entanglement of a quantum state.
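To make the local-unitary symmetry concrete, the sketch below uses one illustrative choice of $E$ that is our assumption and not necessarily a measure adopted in the text: the Meyer-Wallach measure $Q = 2\,(1 - \frac{1}{n}\sum_k \operatorname{Tr}[\rho_k^2])$, with $\rho_k$ the single-qubit marginals, which vanishes exactly on product pure states. It checks that $Q(|000\rangle) = 0$, $Q(\text{GHZ}) = 1$, and that both values are unchanged by random local unitaries $V_1\otimes V_2\otimes V_3$. Helper names are hypothetical.

```python
import numpy as np


def haar_unitary(d, rng):
    """Haar-random unitary matrix (QR of a complex Gaussian matrix with phase fix)."""
    Z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))


def single_qubit_purities(psi, n):
    """Purities Tr[rho_k^2] of the one-qubit marginals of an n-qubit pure state."""
    t = psi.reshape([2] * n)
    purities = []
    for k in range(n):
        m = np.moveaxis(t, k, 0).reshape(2, -1)   # qubit k versus the rest
        rho_k = m @ m.conj().T
        purities.append(np.trace(rho_k @ rho_k).real)
    return purities


def meyer_wallach(psi, n):
    """Q = 2 (1 - mean_k Tr[rho_k^2]); zero if and only if psi is a product state."""
    return 2.0 * (1.0 - np.mean(single_qubit_purities(psi, n)))


def apply_local(psi, local_unitaries):
    """Apply V_1 (x) ... (x) V_n to psi (first qubit = most significant index)."""
    V = np.array([[1.0]])
    for v in local_unitaries:
        V = np.kron(V, v)
    return V @ psi


rng = np.random.default_rng(seed=4)
n = 3
d = 2 ** n
product_state = np.zeros(d)
product_state[0] = 1.0                              # |000>
ghz = np.zeros(d)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)                   # (|000> + |111>) / sqrt(2)
local_ops = [haar_unitary(2, rng) for _ in range(n)]

for name, psi in [("product", product_state), ("GHZ", ghz)]:
    before = meyer_wallach(psi, n)
    after = meyer_wallach(apply_local(psi, local_ops), n)
    print(name, before, after)   # product: 0 and 0; GHZ: 1 and 1 (up to round-off)
```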

#### iv.3.1 Conventional experiments

Since computing the entanglement typically requires evaluating a non-linear function of the quantum state [horodecki2009quantum], it is expected that models in conventional experiments will not be able to classify the states in this dataset. This intuition can be confirmed with the following theorem:

###### Theorem 6.

Let $h^{(1)}_\theta$ be a model in Hypothesis Class 1, computable in a conventional experiment. There exist no quantum neural network $U(\theta)$ and operator $O$ such that $h^{(1)}_\theta$ is invariant under the action of $\mathfrak{G}$ and can classify (i.e., provide any relevant information about) the data in the multipartite entanglement dataset.

###### Proof.

First, let us verify that Propositions 1 and 2 do not yield any adequate model for classification purposes. To identify the linear symmetries of $\mathfrak{G}$ required for the application of Proposition 1, we apply the Commutation Theorem for tensor products [rieffel1975commutation, mendl2009unital], which states that the commutant of a tensor product of operator algebras is the tensor product of their commutants. Since the commutant of $U(2)$ acting on a single qubit is spanned by the identity, we obtain

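$$
C(\mathfrak{G}) = \operatorname{span}(\{\mathbb{1}_2^{\otimes n}\}), \tag{31}
$$

where $\mathbb{1}_2$ denotes the $2\times 2$ identity. This forces $U^\dagger(\theta) O U(\theta) \propto \mathbb{1}_2^{\otimes n}$ and hence constant model predictions ($h^{(1)}_\theta(\rho_i)$ takes the same value for every state $\rho_i$), which cannot distinguish between states. Additionally, one can verify that the orthogonal complement of $\mathfrak{g}$ is trivial:

$$
\mathfrak{g}^\perp = \{0_2^{\otimes n}\}, \tag{32}
$$

with $0_2$ the $2\times 2$ null matrix, such that models designed under Proposition 2 would also result in uninformative constant predictions.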

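Hence, using Propositions 1 and 2 to obtain $\mathfrak{G}$-invariant models from Hypothesis Class 1 (with $k=1$) leads to trivial models that cannot classify the states in the multipartite entanglement dataset. Following an argument similar to the one developed in the last part of the proof of Theorem 1, one can also verify that no other $\mathfrak{G}$-invariant models exist with $k=1$. Indeed, if $h^{(1)}_\theta(V\rho V^\dagger) = h^{(1)}_\theta(\rho)$ for every $V \in \mathfrak{G}$, then $h^{(1)}_\theta(\rho)$ must also equal its uniform average over all $V$ in $\mathfrak{G}$. Performing this averaging, we obtain

$$
\mathbb{E}_{\mathfrak{G}}\!\left[h^{(1)}_\theta(V\rho V^\dagger)\right] = \frac{\operatorname{Tr}[\rho]\operatorname{Tr}[O]}{d}, \tag{33}
$$

for $d = 2^n$. The only way for $h^{(1)}_\theta(\rho)$ to equal this value for every state $\rho$ is to have $U^\dagger(\theta) O U(\theta) = \frac{\operatorname{Tr}[O]}{d}\mathbb{1}$ (which includes the case $O = 0$), that is, solutions already covered by Propositions 1 and 2. ∎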

#### iv.3.2 Quantum-enhanced experiments

Let us first consider the case of $k=2$ copies in Eq. (2). We can show that the following theorem holds.

###### Theorem 7.

Let be a model in Hypothesis Class 1, computable in a quantum-enhanced experiment. There always exist quantum neural networks and operators , resulting in , such that