1 Introduction
We consider in this study universal approximations of symmetric and antisymmetric functions. A function $f: \mathbb{R}^{d\times N} \to \mathbb{R}$ is (totally) symmetric if
(1.1) $$f(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = f(x_1, \ldots, x_N)$$
for any permutation $\sigma \in S_N$ and elements $x_1, \ldots, x_N \in \mathbb{R}^d$. Similarly $f$ is (totally) antisymmetric if
(1.2) $$f(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = \mathrm{sgn}(\sigma)\, f(x_1, \ldots, x_N)$$
for any permutation $\sigma$, where $\mathrm{sgn}(\sigma)$ is the signature of $\sigma$. Note that the permutation is applied only to the particle indices $i = 1, \ldots, N$, but not to the Cartesian indices of each $x_i \in \mathbb{R}^d$. In other words, $f$ is not totally symmetric / antisymmetric when viewed as a function on $\mathbb{R}^{dN}$. This is the relevant setup in many applications in scientific and engineering computation. A totally symmetric function is also called a permutation invariant function. A closely related concept is the permutation equivariant mapping, which is of the form $y = (y_1, \ldots, y_N): \mathbb{R}^{d\times N} \to \mathbb{R}^{d'\times N}$ and satisfies
(1.3) $$y_i(x_{\sigma(1)}, \ldots, x_{\sigma(N)}) = y_{\sigma(i)}(x_1, \ldots, x_N)$$
for any permutation $\sigma$ and $x_1, \ldots, x_N \in \mathbb{R}^d$. Here each component $y_i: \mathbb{R}^{d\times N} \to \mathbb{R}^{d'}$, and $d'$ can be different from $d$.
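As a concrete illustration of these definitions, the three properties can be checked numerically. The following sketch (our own illustration, not part of the text) tests a symmetric and an antisymmetric toy function with $N = 3$ particles in $d = 2$; the Vandermonde determinant in the first Cartesian components serves as the antisymmetric example:

```python
import itertools
import numpy as np

def perm_sign(p):
    # signature of a permutation via inversion count
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def is_symmetric(f, X, tol=1e-12):
    # check f(x_sigma(1),...,x_sigma(N)) == f(x_1,...,x_N) for all sigma
    return all(abs(f(X[list(p)]) - f(X)) < tol
               for p in itertools.permutations(range(len(X))))

def is_antisymmetric(f, X, tol=1e-12):
    # check f(x_sigma(1),...) == sgn(sigma) * f(x_1,...) for all sigma
    return all(abs(f(X[list(p)]) - perm_sign(p) * f(X)) < tol
               for p in itertools.permutations(range(len(X))))

X = np.array([[0.1, 0.2], [0.5, 0.9], [0.3, 0.7]])   # N = 3 particles in d = 2
f_sym = lambda X: float(np.sum(X[:, 0] * X[:, 1]))
# Vandermonde determinant in the first Cartesian components: antisymmetric
f_anti = lambda X: float(np.linalg.det(np.vander(X[:, 0], increasing=True)))

print(is_symmetric(f_sym, X), is_antisymmetric(f_anti, X))  # True True
```

Note that `f_anti` is antisymmetric only under permutations of the particle rows, not of the $dN$ scalar variables, which is exactly the distinction made above.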
Perhaps the most important example of totally symmetric and antisymmetric functions is the wavefunction of identical particles in quantum mechanics. The indistinguishability of identical particles implies that their wavefunctions should be either totally symmetric or totally antisymmetric upon exchanging the variables associated with any two of the particles, corresponding to two categories of particles: bosons and fermions. The former can share quantum states, giving rise to, e.g., the celebrated Bose-Einstein condensate, while the latter cannot share quantum states, as described by the famous Pauli exclusion principle. Such exchange/permutation symmetry also arises in applications other than identical particles in quantum mechanics, mostly in the form of symmetric functions. For instance, in chemistry and materials science, the interatomic potential energy should be invariant under the permutation of atoms of the same chemical species. Another example is in computer vision, where the classification of point clouds should not depend on the ordering of the points.
The dimension of the domain of symmetric and antisymmetric functions is usually large in practice, because it is proportional to the number of elements considered. This means that the notorious "curse of dimensionality" is often encountered when dealing with such functions computationally. Recent years have witnessed compelling success of neural networks in representing high-dimensional symmetric functions with great accuracy and efficiency; see, e.g., [2, 18, 20, 21] for interatomic potential energy, and [13, 14] for 3D classification and segmentation of point sets. For antisymmetric functions, some recent work in the past year [4, 7, 8, 12] has shown exciting potential for solving the many-electron Schrödinger equation with neural networks. Within the framework of variational Monte Carlo (VMC), for some benchmark systems, the antisymmetric wavefunction parameterized by neural networks can be on a par with state-of-the-art wavefunctions constructed based on chemical or physical knowledge.
Despite the empirical success of neural networks in approximating symmetric and antisymmetric functions, theoretical understanding of these approximations is still limited. There are numerous results (see e.g. [1, 5, 9]) concerning the universal approximation of general continuous functions on compact domains. Nevertheless, if the target function is symmetric or antisymmetric, it is much less investigated whether one can achieve universal approximation with a class of functions satisfying the same symmetry constraints. Explicitly guaranteeing the symmetric or antisymmetric property of an ansatz is often mandatory. For example, for electronic systems, if the wavefunction is not constrained within the space of antisymmetric functions, the resulting variational energy could be lower than the exact ground-state energy and would no longer be physically meaningful. From a machine learning perspective, symmetries can also significantly reduce the number of effective degrees of freedom, improve the efficiency of training, and enhance the generalizability of the model. However, one needs to first make sure that the constrained function class is still universal and sufficiently expressive. Moreover, in many scientific applications, besides being symmetric or antisymmetric, the target function of interest should be at least continuous. For example, a many-body wavefunction should be continuous to ensure that the local energy, defined through the second-order derivatives, is finite everywhere; an interatomic potential energy should be continuous to guarantee that the total energy is conserved during molecular dynamics simulations. Therefore we wish the universal ansatz for symmetric/antisymmetric functions to be continuous as well.
The universal approximation of symmetric functions was partially studied in [19]. However, as will be illustrated in Section 2, the proof of [19] only holds for the case $d = 1$. Moreover, no error estimate is provided for the proposed approximation. The more recent work [17] considered the universal approximation of permutation invariant functions and equivariant mappings for $d = 1$ as well. By respecting the permutation symmetry, the resulting neural network involves much fewer parameters than the corresponding dense neural networks. Similarly, when $d = 1$ any antisymmetric polynomial can be factorized as the product of a Vandermonde determinant and a totally symmetric polynomial. Such a universal representation has been known since Cauchy [3]. However, to our knowledge there is no such simple factorization for $d \ge 2$ to guide the design of neural network architectures.

In this paper we aim to study the universal approximation of general symmetric and antisymmetric functions for any $d \ge 1$. We now summarize the main results of the paper. First, for symmetric functions, we give two different proofs of the universality of the ansatz proposed in [19], both with explicit error bounds. The first proof is based on the Ryser formula [15] for permanents, and the second is based on a partition of the state space, as elaborated in Section 2 and Section 3, respectively. Moreover, we also show in Section 4 that, for general antisymmetric functions with elements in any dimension, a simple ansatz combining Vandermonde determinants with the ansatz for symmetric functions is universal, with explicit error bounds similar to those in the symmetric case. For the readers' convenience, we summarize below in two theorems the ansatzes for which we prove universal approximation of symmetric and antisymmetric functions. The approximation rate relies only on the weak condition that the gradient of the target function is uniformly bounded. Note that neither ansatz requires a procedure of sorting elements, so both can be made continuous functions, in favor of many scientific applications. We conclude in Section 5 with some practical considerations and future directions for further investigation. The proofs of Theorems 1 and 2 are given in Sections 3 and 4, respectively.
Theorem 1 (Approximation to symmetric functions). Let $f: \Omega^N \to \mathbb{R}$ be a continuously differentiable, totally symmetric function, where $\Omega$ is a compact subset of $\mathbb{R}^d$. Let $\epsilon > 0$. Then there exist $\phi = (\phi_1, \ldots, \phi_M): \mathbb{R}^d \to \mathbb{R}^M$ and $g: \mathbb{R}^M \to \mathbb{R}$, such that for any $x = (x_1, \ldots, x_N) \in \Omega^N$,
$$\Big| f(x_1, \ldots, x_N) - g\Big( \sum_{j=1}^N \phi(x_j) \Big) \Big| < \epsilon,$$
where $M$, the number of feature variables, is bounded from above by
(1.4) $$M \le \frac{2^N}{N!} \left( \frac{C}{\epsilon} \right)^{Nd},$$
where the constant $C$ depends only on $\max_{x \in \Omega^N} |\nabla f(x)|$, $N$, and $d$.
Theorem 2 (Approximation to antisymmetric functions). Let $f: \Omega^N \to \mathbb{R}$ be a continuously differentiable, totally antisymmetric function, where $\Omega$ is a compact subset of $\mathbb{R}^d$. Then there exist permutation equivariant mappings $y^{(k)} = (y^{(k)}_1, \ldots, y^{(k)}_N): \mathbb{R}^{d\times N} \to \mathbb{R}^N$ and permutation invariant functions $f^{(k)}: \mathbb{R}^{d\times N} \to \mathbb{R}$, $k = 1, \ldots, K$, such that for any $x = (x_1, \ldots, x_N) \in \Omega^N$,
$$\Big| f(x_1, \ldots, x_N) - \sum_{k=1}^K f^{(k)}(x) \prod_{1 \le i < j \le N} \big( y^{(k)}_i(x) - y^{(k)}_j(x) \big) \Big| < \epsilon,$$
where $K$ is bounded from above by a quantity of the same form as (1.4). For each $k$, there exist $g^{(k)}: \mathbb{R}^M \to \mathbb{R}$ and $\phi^{(k)}: \mathbb{R}^d \to \mathbb{R}^M$, with $M$ bounded as in (1.4), such that for any $x \in \Omega^N$,
$$f^{(k)}(x) = g^{(k)}\Big( \sum_{j=1}^N \phi^{(k)}(x_j) \Big).$$
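The ansatz of Theorem 1 has a "sum-pooling" structure: a shared per-particle feature map $\phi$, summation over particles, then an outer function $g$. A minimal sketch of this shape (random weights stand in for a trained $\phi$ and $g$; the two-layer forms here are purely illustrative) shows that permutation invariance is enforced exactly by construction, whatever the weights are:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)   # phi: R^2 -> R^16
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)    # g: R^16 -> R

def phi(x):
    # shared per-particle feature map
    return np.tanh(W1 @ x + b1)

def g(xi):
    # outer function acting on the pooled feature vector
    return float(W2 @ np.tanh(xi) + b2)

def f_hat(X):
    # f_hat(x_1,...,x_N) = g(sum_j phi(x_j)): invariant by construction
    return g(sum(phi(x) for x in X))

X = rng.random((5, 2))
perm = rng.permutation(5)
print(np.isclose(f_hat(X), f_hat(X[perm])))  # True
```

The content of Theorem 1 is not this invariance (which is automatic) but the universality of the form, with an explicit bound on the feature dimension $M$.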
2 Totally symmetric functions
Let $x = (x_1, \ldots, x_N)$, with each $x_i \in \Omega \subset \mathbb{R}^d$. Consider a totally symmetric function $f(x_1, \ldots, x_N)$. It is proved in [19] that when $d = 1$ (therefore $x_i \in \mathbb{R}$), the following universal approximation representation holds:
(2.1) $$f(x_1, \ldots, x_N) = g\Big( \sum_{j=1}^N \phi(x_j) \Big)$$
for continuous functions $g: \mathbb{R}^{N+1} \to \mathbb{R}$ and $\phi: \mathbb{R} \to \mathbb{R}^{N+1}$. For completeness we briefly recall the proof.
Let $\Omega = [0, 1]$. Define the mapping $\phi: [0,1] \to \mathbb{R}^{N+1}$, with each component function defined as
$$\phi_k(t) = t^k, \quad k = 0, 1, \ldots, N.$$
It can be shown that the mapping $E(x) = \sum_{j=1}^N \phi(x_j)$, whose components are the power sums, is a homeomorphism between the ordered set $\{x \in [0,1]^N : x_1 \le \cdots \le x_N\}$ and its image in $\mathbb{R}^{N+1}$ [19]. Hence, if we let $g = f \circ E^{-1}$ on this image and extend $g$ continuously, then we have
$$f(x_1, \ldots, x_N) = g\Big( \sum_{j=1}^N \phi(x_j) \Big).$$
Here the number of feature variables is $M = N + 1$ by construction. The main difficulty associated with this construction is that the inverse mapping $E^{-1}$, and hence $g$, can be arbitrarily complex to approximate in practice. In fact the construction is similar in flavor to the Kolmogorov-Arnold representation theorem [10], which provides a universal representation for multivariable continuous functions, but without any a priori guarantee of the accuracy with respect to the number of parameters.
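To see why $E$ is invertible on ordered tuples: the power sums determine the elementary symmetric polynomials via Newton's identities, hence the coefficients of the monic polynomial whose roots are the $x_j$, hence the multiset itself. A numerical sketch of this inverse (our own illustration; root-finding is not a stable algorithm for large $N$, which reflects the pathology of $E^{-1}$ noted above):

```python
import numpy as np

def phi(t, N):
    # phi_k(t) = t**k, k = 0,...,N
    return np.array([t**k for k in range(N + 1)])

def power_sum_features(xs):
    # E(x) = sum_j phi(x_j): component k is the power sum p_k (p_0 = N)
    N = len(xs)
    return sum(phi(t, N) for t in xs)

def recover_multiset(p):
    # Newton's identities give e_k from p_1..p_N; the x_j are then the roots of
    # z^N - e_1 z^{N-1} + e_2 z^{N-2} - ... + (-1)^N e_N
    N = len(p) - 1
    e = [1.0]
    for k in range(1, N + 1):
        e.append(sum((-1)**(i - 1) * e[k - i] * p[i] for i in range(1, k + 1)) / k)
    coeffs = [(-1)**k * e[k] for k in range(N + 1)]
    return np.sort(np.roots(coeffs).real)

print(recover_multiset(power_sum_features([0.2, 0.8, 0.5])))  # ~ [0.2 0.5 0.8]
```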
In order to generalize to the case $d > 1$, the proof of [19, Theorem 9] in fact suggested an alternative proof for the case $d = 1$ as follows. Using the Stone-Weierstrass theorem, a totally symmetric function can be approximated by a polynomial of high degree. After symmetrization, this polynomial becomes a totally symmetric polynomial. By the fundamental theorem of symmetric polynomials [11], any symmetric polynomial can be represented as a polynomial of the elementary symmetric polynomials. In other words, for any symmetric polynomial $f_p$, we have
$$f_p(x_1, \ldots, x_N) = q(e_1(x), \ldots, e_N(x)),$$
where $q$ is some polynomial, and the elementary symmetric polynomials are defined as
$$e_k(x) = \sum_{1 \le i_1 < \cdots < i_k \le N} x_{i_1} \cdots x_{i_k}, \quad k = 1, \ldots, N.$$
Using the Newton-Girard formula, an elementary symmetric polynomial can be represented in terms of the power sums $p_k(x) = \sum_{j=1}^N x_j^k$ by
(2.2) $$e_k = \frac{1}{k!} \det \begin{pmatrix} p_1 & 1 & 0 & \cdots & 0 \\ p_2 & p_1 & 2 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \vdots \\ p_{k-1} & p_{k-2} & \cdots & p_1 & k-1 \\ p_k & p_{k-1} & \cdots & p_2 & p_1 \end{pmatrix}.$$
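The Newton-Girard determinant identity can be verified numerically against the defining sum of $e_k$ (a quick sanity check of our own, not part of the proof):

```python
from itertools import combinations
from math import factorial
import numpy as np

def e_k_direct(xs, k):
    # elementary symmetric polynomial: sum over k-subsets of the variables
    return sum(np.prod(c) for c in combinations(xs, k))

def e_k_det(xs, k):
    # Newton-Girard in determinant form: e_k = det(M) / k!, with M built
    # from the power sums p_1..p_k and the superdiagonal 1, 2, ..., k-1
    p = [sum(t**m for t in xs) for m in range(k + 1)]
    M = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if j <= i:
                M[i, j] = p[i - j + 1]
            elif j == i + 1:
                M[i, j] = i + 1
    return np.linalg.det(M) / factorial(k)

xs = [0.2, 0.8, 0.5, 1.1]
for k in range(1, 5):
    print(k, e_k_direct(xs, k), e_k_det(xs, k))  # pairs agree
```

For instance $k = 2$ reduces to the familiar $e_2 = (p_1^2 - p_2)/2$.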
Now let $\phi$ be the same function defined in the previous proof, so that $\sum_j \phi(x_j)$ collects the power sums $p_0, p_1, \ldots, p_N$, and define $g$ in terms of $q$ and the determinant computation (2.2). We can obtain a polynomial approximation in the form of
$$\Big| f(x_1, \ldots, x_N) - g\Big( \sum_{j=1}^N \phi(x_j) \Big) \Big| < \epsilon.$$
Here the error is entirely due to the Stone-Weierstrass approximation. Letting $M = N + 1$ we obtain the desired representation.
However, it is in fact not straightforward to extend the two proofs above to the case $d > 1$. For the first proof, when each $x_i \in \mathbb{R}^d$ there is no natural total order on the elements, so we cannot define the ordered set needed to construct the homeomorphism $E$. For the second proof, in the case $d > 1$ a monomial (before symmetrization) takes the form
$$\prod_{i=1}^N \prod_{\alpha=1}^d x_{i\alpha}^{k_{i\alpha}}.$$
Note that the symmetrization is only with respect to the particle index $i$, but not the component index $\alpha$. Hence the symmetrized monomial is not a totally symmetric function with respect to all $dN$ variables. Therefore the fundamental theorem of symmetric polynomials does not apply.
Below we prove that the representation (2.1) indeed holds for any $d \ge 1$, and therefore we complete the proof of [19]. For technical reasons to be illustrated below, and without loss of generality, we shift the domain and assume $\Omega \subset [1, 2]^d$, so that every coordinate satisfies $x_{j\alpha} \ge 1 > 0$. Following the Stone-Weierstrass theorem and after symmetrization, $f$ can be approximated by a symmetric polynomial. Every symmetric polynomial can be written as a linear combination of symmetrized monomials of the form
$$\sum_{\sigma \in S_N} \prod_{i=1}^N \prod_{\alpha=1}^d x_{\sigma(i)\alpha}^{k_{i\alpha}} = \mathrm{perm}(A), \quad A_{ij} = \prod_{\alpha=1}^d x_{j\alpha}^{k_{i\alpha}}.$$
Here $A \in \mathbb{R}^{N \times N}$, and $\mathrm{perm}(A)$ stands for the permanent of $A$.
Following the Ryser formula [15] for representing a permanent (noting that the permanent is invariant under transposition), we have
$$\mathrm{perm}(A) = (-1)^N \sum_{\emptyset \ne S \subseteq \{1,\ldots,N\}} (-1)^{|S|} \prod_{j=1}^N \sum_{i \in S} A_{ij} = (-1)^N \sum_{\emptyset \ne S} (-1)^{|S|} \exp\Big( \sum_{j=1}^N \log \sum_{i \in S} A_{ij} \Big).$$
Here we have used that $A_{ij} > 0$ for all $i, j$, which follows from the assumption $x_{j\alpha} \ge 1$. Now we write down the approximation using a symmetric polynomial, which is a linear combination of symmetrized monomials:
$$\Big| f(x) - \sum_{t=1}^{N_p} c_t \, \mathrm{perm}\big(A^{(t)}\big) \Big| < \epsilon, \quad A^{(t)}_{ij} = \prod_{\alpha=1}^d x_{j\alpha}^{k^{(t)}_{i\alpha}}.$$
Define $\phi: \mathbb{R}^d \to \mathbb{R}^M$ with each component function, indexed by a monomial $t$ and a nonempty subset $S \subseteq \{1, \ldots, N\}$, given by
$$\phi_{t,S}(x_j) = \log \sum_{i \in S} \prod_{\alpha=1}^d x_{j\alpha}^{k^{(t)}_{i\alpha}}.$$
Then we define $g: \mathbb{R}^M \to \mathbb{R}$ given by
$$g(\xi) = \sum_{t=1}^{N_p} c_t\, (-1)^N \sum_{\emptyset \ne S} (-1)^{|S|} \exp(\xi_{t,S}),$$
where $\xi_{t,S}$ is the $(t,S)$-th component of $\xi = \sum_{j=1}^N \phi(x_j)$. We now have an approximation of the target totally symmetric function in the desired form
$$\Big| f(x) - g\Big( \sum_{j=1}^N \phi(x_j) \Big) \Big| < \epsilon,$$
and we finish the proof. Here the number of feature variables is $M = (2^N - 1) N_p$, where $N_p$ is the number of symmetrized monomials used in the approximation.
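The construction can be checked numerically for a single symmetrized monomial: brute-force symmetrization agrees with the Ryser/log-exp feature form. (A sketch of our own with $N = 2$, $d = 2$; the exponents and coordinates are arbitrary, with coordinates placed in $[1,2]^d$ so all logarithms are finite, as in the domain shift above.)

```python
import itertools
import math
import numpy as np

def symmetrized_monomial(X, K):
    # brute force: sum over permutations s of prod_i prod_a x_{s(i),a}^{K_{i,a}}
    N = len(X)
    return sum(np.prod([np.prod(X[s[i]]**K[i]) for i in range(N)])
               for s in itertools.permutations(range(N)))

def via_features(X, K):
    # Ryser form: (-1)^N sum_{S nonempty} (-1)^{|S|} exp(sum_j log(sum_{i in S} A_ij))
    N = len(X)
    A = np.array([[np.prod(X[j]**K[i]) for j in range(N)] for i in range(N)])
    total = 0.0
    for r in range(1, N + 1):
        for S in itertools.combinations(range(N), r):
            xi = sum(math.log(A[list(S), j].sum()) for j in range(N))  # = sum_j phi_S(x_j)
            total += (-1)**r * math.exp(xi)
    return (-1)**N * total

X = np.array([[1.2, 1.5], [1.9, 1.1]])   # two particles in [1,2]^2
K = np.array([[1, 0], [0, 2]])           # exponents k_{i,alpha}
print(symmetrized_monomial(X, K), via_features(X, K))  # both ~ 5.727
```

Note that the inner quantity `xi` depends on $x$ only through the sum over particles, which is exactly what makes the $g(\sum_j \phi(x_j))$ form possible.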
3 Totally symmetric function, revisited
In this section, we prove Theorem 1 for any $d \ge 1$. In particular, our proof is more explicit and does not rely on the Stone-Weierstrass theorem. The main idea is to partition the space into a lattice and use a piecewise-constant function to approximate the target permutation invariant function.
Again without loss of generality we assume $\Omega = [0,1]^d$. We then partition the domain into a lattice $X$ with grid size $\delta$ along each direction. Due to symmetry, we can assign a lexicographical order to all lattice points $a = (a^1, \ldots, a^d) \in X$. That is, $a < b$ if $a^\alpha < b^\alpha$ for the first index $\alpha$ where $a$ and $b$ differ
. We define the tensor product of the $N$ copies of the lattice as $X^\otimes = X \times \cdots \times X$, and a wedge of $X^\otimes$ is defined accordingly as
$$X^\wedge = \{ a = (a_1, \ldots, a_N) \in X^\otimes : a_1 \le a_2 \le \cdots \le a_N \}.$$
For each $a \in X^\wedge$, a corresponding union of boxes in $\Omega^N$ can be written as
$$B_a = \bigcup_{\sigma \in S_N} C_{a_{\sigma(1)}} \times \cdots \times C_{a_{\sigma(N)}},$$
where $C_b = b + [0, \delta)^d$ denotes the box with side length $\delta$ anchored at the lattice point $b$.
By construction, the piecewise-constant approximation to the target permutation invariant function is then
$$f_\delta(x) = \sum_{a \in X^\wedge} f(a)\, \mathbb{1}_{B_a}(x), \quad |f(x) - f_\delta(x)| \le \sqrt{Nd}\, \delta \max_{x \in \Omega^N} |\nabla f(x)|.$$
Here we have assumed that the derivative $\nabla f$ is uniformly bounded on $\Omega^N$. Note that the indicator function $\mathbb{1}_{B_a}$ is permutation invariant and can be rewritten as
$$\mathbb{1}_{B_a}(x) = \frac{1}{C_a} \mathrm{perm}(B), \quad B_{ij} = \mathbb{1}_{C_{a_i}}(x_j),$$
where $C_a$ is a normalization constant. The constant $C_a$ takes care of the repetition that can happen depending on $a$. When all elements in $a$ are distinct, the box that $x$ lives in corresponds to only one permutation, so $C_a = 1$ in this case. If, say, $a_1 = a_2$ with all other elements distinct, then the box that $x$ lives in may have two corresponding permutations that differ by a swap of the first two elements. In this case, $C_a = 2$ accounts for the arising repetition. Next we apply the Ryser formula to the permanent,
$$\mathrm{perm}(B) = (-1)^N \sum_{\emptyset \ne S \subseteq \{1,\ldots,N\}} (-1)^{|S|} \prod_{j=1}^N \sum_{i \in S} B_{ij}.$$
We can now define $\phi: \mathbb{R}^d \to \mathbb{R}^M$, where each component function, indexed by $a \in X^\wedge$ and a nonempty subset $S \subseteq \{1, \ldots, N\}$, is given by
$$\phi_{a,S}(x_j) = \log \Big( \sum_{i \in S} \mathbb{1}_{C_{a_i}}(x_j) \Big),$$
with the convention $\log 0 = -\infty$, and we define $g: \mathbb{R}^M \to \mathbb{R}$ as
(3.1) $$g(\xi) = \sum_{a \in X^\wedge} \frac{f(a)}{C_a} (-1)^N \sum_{\emptyset \ne S} (-1)^{|S|} \exp(\xi_{a,S}), \quad \xi_{a,S} = \sum_{j=1}^N \phi_{a,S}(x_j),$$
with the corresponding convention $\exp(-\infty) = 0$. Since the $\mathbb{1}_{C_b}$'s are indicator functions, we naturally have $\exp(\xi_{a,S}) = \prod_{j=1}^N \sum_{i \in S} \mathbb{1}_{C_{a_i}}(x_j)$. In the case when some $x_j$ lies outside $\bigcup_{i \in S} C_{a_i}$, we have $\phi_{a,S}(x_j) = -\infty$. In this case, $\exp(\xi_{a,S}) = 0$, and therefore its contribution to $g$ vanishes as desired. In summary, we arrive at the universal approximation
(3.2) $$\Big| f(x) - g\Big( \sum_{j=1}^N \phi(x_j) \Big) \Big| \le \sqrt{Nd}\, \delta \max_{x \in \Omega^N} |\nabla f(x)|.$$
Due to the explicit tabulation strategy, the number of terms needed in the approximation (3.2) can be counted as follows. The number of points in $X^\wedge$ is approximately $\delta^{-Nd}/N!$, where the factor $N!$ comes from the lexicographic ordering. Note that formally as $N \to \infty$, $\delta^{-Nd}/N!$ can vanish for fixed $\delta$. However, this regime means that the number of elements $N$ has exceeded the number of grid points $\delta^{-d}$ in $X$ and is unreasonable. So we should at least have $\delta^{-d} \ge N$. In order to obtain an $\epsilon$-close approximation of $f$, we require $\delta \lesssim \epsilon$. When $\delta \sim \epsilon$, the number of points in $X^\wedge$ becomes $\sim \epsilon^{-Nd}/N!$. For each $a \in X^\wedge$, the number of terms to be summed over in Eq. (3.1) is $2^N - 1$. Therefore, in order to obtain an $\epsilon$-approximation, the number of feature variables $M$ is given by Eq. (1.4). This proves Theorem 1.
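The tabulation strategy can be sketched in code: the permutation-invariant vector of box occupation counts determines which $B_a$ contains $x$, and evaluating $f$ at a lattice representative gives the piecewise-constant approximation. (Our own illustration with a made-up smooth symmetric test function of Lipschitz constant $1$ per particle; the error indeed scales like $N\sqrt{d}\,\delta$.)

```python
import numpy as np
from itertools import permutations

def box_counts(X, delta):
    # permutation-invariant features: number of particles in each box of the grid
    n = int(np.ceil(1.0 / delta))
    idx = np.minimum((X / delta).astype(int), n - 1)
    flat = np.ravel_multi_index(idx.T, (n,) * X.shape[1])
    return np.bincount(flat, minlength=n**X.shape[1]), n

def f_delta(f, X, delta):
    # evaluate f at the lattice representative (lower corner) of the box of X
    counts, n = box_counts(X, delta)
    d = X.shape[1]
    rep = []
    for b in np.nonzero(counts)[0]:
        corner = np.array(np.unravel_index(b, (n,) * d)) * delta
        rep += [corner] * counts[b]
    return f(np.array(rep))

f = lambda X: float(np.sum(np.linalg.norm(X, axis=1)))  # smooth, symmetric
X = np.random.default_rng(0).random((4, 2))             # N = 4, d = 2
for p in permutations(range(4)):                        # features are invariant
    assert np.array_equal(box_counts(X[list(p)], 0.25)[0], box_counts(X, 0.25)[0])
err = abs(f(X) - f_delta(f, X, 1e-3))
print(err < 4 * np.sqrt(2) * 1e-3)  # error <= N * sqrt(d) * delta here
```

Note that this direct lookup uses only $\delta^{-d}$ counting features; the much larger count in Eq. (1.4) comes from making $g$ explicit through the Ryser expansion rather than through a table lookup.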
This is of course a very pessimistic bound, and we will discuss the practical implications for designing neural network architectures in Section 5. We remark that one may expect that, following the same tabulation strategy, we may also provide a quantitative bound for the function $g$ constructed via the homeomorphism mapping $E$ as discussed in Section 2. However, the difference is that our bound relies only on the smoothness of the original function $f$, and hence yields the bound on $M$. On the other hand, the inverse mapping $E^{-1}$, and hence $g$, can be arbitrarily pathological, and therefore it is not even clear how to obtain a double-exponential type of bound as discussed above. We also remark that if the indicator functions $\mathbb{1}_{C_b}$ and $\mathbb{1}_{B_a}$ in the proof are replaced by proper smooth cutoff functions with respect to the corresponding domains, the ansatz in Eq. (3.2) can be made continuous to accommodate the applications that require continuity.
4 Totally antisymmetric functions
Now we consider an antisymmetric function $f$. Similar to the symmetric case, when $d = 1$ and $f$ is an antisymmetric polynomial of $x_1, \ldots, x_N$, it is known that
$$f(x_1, \ldots, x_N) = f_s(x_1, \ldots, x_N) \prod_{1 \le i < j \le N} (x_i - x_j),$$
where $f_s$ is a symmetric polynomial and the second factor is a Vandermonde determinant. This was first proved by Cauchy [3], who of course also first introduced the concept of the determinant in its modern sense.
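For example, the antisymmetric polynomial $\det(x_j^{p_i})$ with powers $(0, 1, 3)$, divided by the Vandermonde factor, yields a symmetric polynomial (in fact $\pm(x_1 + x_2 + x_3)$, a Schur polynomial), which can be confirmed numerically (our own sanity check, not from the text):

```python
import numpy as np
from itertools import permutations

def vandermonde(xs):
    # prod_{i<j} (x_i - x_j)
    return np.prod([xs[i] - xs[j]
                    for i in range(len(xs)) for j in range(i + 1, len(xs))])

def antisym_poly(xs):
    # generalized Vandermonde determinant with powers (0, 1, 3): antisymmetric
    return np.linalg.det(np.array([[t**p for t in xs] for p in (0, 1, 3)]))

xs = np.array([0.3, 1.1, 0.7])
ratio = antisym_poly(xs) / vandermonde(xs)
for p in permutations(range(3)):
    xp = xs[list(p)]
    assert np.isclose(antisym_poly(xp) / vandermonde(xp), ratio)  # symmetric quotient
print(np.isclose(abs(ratio), xs.sum()))  # quotient is +-(x_1 + x_2 + x_3): True
```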
Our aim is to generalize this to $d > 1$. Without loss of generality we again assume $\Omega = [0,1]^d$. The construction of the ansatz is parallel to the totally symmetric case. Recall the lattice $X$, the wedge $X^\wedge$, and the corresponding union of boxes $B_a$. In the totally symmetric case, we made a piecewise-constant approximation over the union of boxes. For the antisymmetric case, we need to insert an antisymmetric factor (with respect to $x$):
(4.1) $$f_\delta(x) = \sum_{a \in X^\wedge} \frac{f(a)}{U^{(a)}(a)}\, \mathbb{1}_{B_a}(x)\, U^{(a)}(x),$$
where $U^{(a)}$ is a totally antisymmetric function which might be chosen depending on $a$. Note that in principle any antisymmetric function can be chosen as long as $U^{(a)}(a)$ is bounded away from $0$. Motivated by the one-dimensional result, we focus on constructions of $U^{(a)}$ given by the Vandermonde determinant. Given a permutation equivariant map $y^{(a)} = (y^{(a)}_1, \ldots, y^{(a)}_N): \mathbb{R}^{d \times N} \to \mathbb{R}^N$, we consider
(4.2) $$U^{(a)}(x) = \prod_{1 \le i < j \le N} \big( y^{(a)}_i(x) - y^{(a)}_j(x) \big).$$
It is clear that due to the permutation equivariance of $y^{(a)}$, the $U^{(a)}$ so defined is antisymmetric. It thus suffices to choose the map $y^{(a)}$ such that $U^{(a)}(a) \ne 0$. Observe that $f(a) = 0$ due to antisymmetry whenever $a_i = a_j$ for some $i \ne j$. In particular, this means that we only need to consider those $a \in X^\wedge$ whose elements $a_i$ are all distinct.
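That the Vandermonde factor of an equivariant map is antisymmetric can be verified directly. Here we use the simple linear equivariant map $y_i(x) = \eta^\top x_i$ with an arbitrary fixed direction $\eta$ (a sketch of our own, anticipating Construction 2 below):

```python
import numpy as np
from itertools import permutations

def perm_sign(p):
    # signature of a permutation via inversion count
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def U(y):
    # Vandermonde factor prod_{i<j} (y_i - y_j) of per-particle scalars y
    N = len(y)
    return np.prod([y[i] - y[j] for i in range(N) for j in range(i + 1, N)])

eta = np.array([1.0, np.pi])     # a generic direction
y = lambda X: X @ eta            # equivariant: permuting rows permutes outputs

X = np.random.default_rng(1).random((3, 2))
for p in permutations(range(3)):
    assert np.isclose(U(y(X[list(p)])), perm_sign(p) * U(y(X)))
print("antisymmetry verified")
```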
We will consider two specific constructions below, corresponding to different choices of the equivariant map. The first achieves equivariance through sorting in an intuitive way, for the purpose of illustration. The second is a linear transformation, showing that the ansatz proved in Theorem 2 can be made continuous after replacing the indicator function with a proper smooth cutoff function.

Construction 1. As the $a_i$'s are distinct, for $x \in B_a$ there exists a unique $\sigma \in S_N$ such that
(4.3) $$x_{\sigma(i)} \in C_{a_i}, \quad i = 1, \ldots, N.$$
Denote this unique permutation as $\sigma_x$, and take the permutation equivariant map $y^{(a)}$ such that
(4.4) $$y^{(a)}_{\sigma_x(i)}(x) = i, \quad i = 1, \ldots, N.$$
Since $\sigma_x$ gives the sorting of $x$ according to $a$, it is easy to see that the above $y^{(a)}$ is equivariant, and
(4.5) $$U^{(a)}(x) = \prod_{1 \le i < j \le N} \big( y^{(a)}_i(x) - y^{(a)}_j(x) \big) = \mathrm{sgn}(\sigma_x) \prod_{1 \le i < j \le N} (i - j).$$
Note that $\sigma_x$ is the identity when $x = a$, so that $U^{(a)}(a) = \prod_{1 \le i < j \le N} (i - j) \ne 0$, and we arrive at
(4.6) $$f_\delta(x) = \sum_{a \in X^\wedge} c^{(a)} f(a)\, \mathbb{1}_{B_a}(x)\, U^{(a)}(x),$$
with
$$c^{(a)} = \Big( \prod_{1 \le i < j \le N} (i - j) \Big)^{-1} = \frac{1}{U^{(a)}(a)}.$$
Construction 2. Our second construction is based on the choice of a linear permutation equivariant map $y$ given by
$$y_i(x) = \eta^\top x_i, \quad i = 1, \ldots, N,$$
for some fixed vector $\eta \in \mathbb{R}^d$. The corresponding Vandermonde determinant is then given by
(4.7) $$U(x) = \prod_{1 \le i < j \le N} \eta^\top (x_i - x_j).$$
The resulting approximation to $f$ is
(4.8) $$f_\delta(x) = f_s(x) \prod_{1 \le i < j \le N} \eta^\top (x_i - x_j),$$
where we denote the symmetric part
(4.9) $$f_s(x) = \sum_{a \in X^\wedge} \frac{f(a)}{U(a)}\, \mathbb{1}_{B_a}(x).$$
It thus suffices to choose $\eta$ such that $\eta^\top (a_i - a_j) \ne 0$ for all relevant $a$ and all $i \ne j$. As the set
$$\{ a_i - a_j : a \in X^\wedge,\ i \ne j,\ a_i \ne a_j \}$$
consists of a discrete set of nonzero vectors, such an $\eta$ exists, since the set of directions orthogonal to any of these vectors has measure zero in $\mathbb{R}^d$.
In both constructions, the symmetric prefactor $f(a)\, \mathbb{1}_{B_a}(x) / U^{(a)}(a)$ is a scaled version of the characteristic function, and can thus be treated similarly as in the totally symmetric case based on Ryser's formula. As both constructions depend on the same tabulation strategy as the totally symmetric case, the number of terms involved in the sum over $a \in X^\wedge$ is the same too, which we will not repeat. The total number of feature variables is the same as that in Eq. (1.4) due to the use of the indicator functions. This proves Theorem 2.
5 Practical considerations and discussion
In this paper we studied the universal approximation of symmetric and antisymmetric functions. Following the line of learning theory, many questions remain open. For instance, the impact of symmetry on the generalization error remains unclear. This requires an in-depth understanding of the suitable function class for symmetric and antisymmetric functions, such as some adapted Barron space [6]. Note that a recent work [16] investigates the approximation and generalization bounds of permutation invariant deep neural networks in the general case $d \ge 1$, however with two limitations. The first is a rather strong assumption that the target function is Lipschitz with respect to a particular norm (not the usual Euclidean norm). This can be a severe limitation as the dimension (both $N$ and $d$) increases. Indeed, under the same Lipschitz assumption as [16], the number of feature variables in our Theorem 1 can be improved accordingly. The second limitation is that the ansatz proposed in [16] introduces sorting layers to represent the sorting procedure at the first step. The sorting procedure brings discontinuity, which leads to serious problems in some scientific applications, as explained in the introduction.
For antisymmetric functions, an ansatz suitable for practical computation is of particular interest, since one main motivation for studying antisymmetric functions is to integrate neural network-based wavefunctions into VMC. The Vandermonde determinant considered here is proved to provide a simple but universal ansatz. Its universality suggests considering the following trial wavefunction in VMC:
$$\Psi(x_1, \ldots, x_N) = f_s(x_1, \ldots, x_N) \prod_{1 \le i < j \le N} \big( y_i(x) - y_j(x) \big),$$
where $f_s$ is symmetric and $y$ is a permutation equivariant map. The ansatz for each of $f_s$ and $y$ can still be quite flexible. Another more general yet more complicated ansatz is based on replacing the above Vandermonde determinant with a Slater determinant (see e.g. [12, 8]),
$$\Psi(x_1, \ldots, x_N) = f_s(x_1, \ldots, x_N) \det \big( \varphi_i(x_j) \big)_{i,j=1}^N.$$
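A minimal sketch of the Vandermonde-type trial wavefunction, with simple closed-form placeholders for $f_s$ and $y$ (in practice both would be parameterized by neural networks; the specific forms here are hypothetical), confirms antisymmetry under particle exchange:

```python
import numpy as np

def f_s(X):
    # any permutation-invariant factor (placeholder for a symmetric network)
    return float(np.exp(-np.sum(X**2)))

def y(X):
    # permutation-equivariant map (placeholder for an equivariant network):
    # a projection of each particle plus a symmetric mean-field shift
    eta = np.array([0.3, 1.7])
    return X @ eta + np.sum(X)

def psi(X):
    # Psi = f_s * prod_{i<j} (y_i - y_j)
    v = y(X)
    N = len(v)
    vdm = np.prod([v[i] - v[j] for i in range(N) for j in range(i + 1, N)])
    return f_s(X) * vdm

X = np.random.default_rng(2).random((3, 2))
Xswap = X[[1, 0, 2]]
print(np.isclose(psi(Xswap), -psi(X)))  # antisymmetric under exchange: True
```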
The Slater determinant is widely used in quantum chemistry. It is known to be universal under a complete basis set and, indeed, the basis set derived from the Hartree-Fock approximation provides a fairly good starting point for most modern quantum chemistry methods. However, the complexity of computing a Slater determinant is $O(N^3)$, while it is only $O(N^2)$ for a Vandermonde determinant. This may become a more severe issue when one calculates the local energy, which involves the evaluation of the Laplacian of the trial wavefunction. Therefore, it remains interesting to spend more effort on the Vandermonde ansatz and its variants, hoping to find ones that strike a good balance between accuracy and efficiency. It would also be interesting to learn from the second quantized representation of quantum systems, which lifts the symmetry requirement from functions to linear operators, and leads to powerful representations such as matrix product states.
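The two evaluation routines make the cost gap concrete; moreover, for the special case of monomial orbitals applied to scalar projections, the Slater determinant reduces to the Vandermonde determinant up to sign (our own sanity check):

```python
import numpy as np

def slater(Phi):
    # Slater determinant of the orbital matrix Phi_ij = phi_i(x_j): O(N^3)
    return np.linalg.det(Phi)

def vandermonde(y):
    # prod_{i<j} (y_i - y_j): O(N^2) multiplications
    out = 1.0
    for i in range(len(y)):
        for j in range(i + 1, len(y)):
            out *= y[i] - y[j]
    return out

y = np.array([0.4, 1.2, 2.5])
Phi = np.vander(y, increasing=True).T    # monomial "orbitals" phi_i(t) = t**i
print(np.isclose(abs(slater(Phi)), abs(vandermonde(y))))  # True
```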
Acknowledgement
We thank the hospitality of the American Institute of Mathematics (AIM) for the workshop "Deep learning and partial differential equation" in October 2019, which led to this collaborative effort. The work of L.L. and J.Z. was supported in part by the Department of Energy under Grant No. DE-SC0017867 and No. DE-AC02-05CH11231. The work of Y.L. and J.L. was also supported in part by the National Science Foundation via grants DMS-1454939 and ACI-1450280.
References

[1] Andrew R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Transactions on Information Theory, 39 (1993), pp. 930–945.
[2] Jörg Behler and Michele Parrinello, Generalized neural-network representation of high-dimensional potential-energy surfaces, Physical Review Letters, 98 (2007), p. 146401.
[3] Augustin-Louis Cauchy, Mémoire sur les fonctions qui ne peuvent obtenir que deux valeurs égales et de signes contraires par suite des transpositions opérées entre les variables qu'elles renferment, Journal de l'École polytechnique, X (1815).
[4] Kenny Choo, Antonio Mezzacapo, and Giuseppe Carleo, Fermionic neural-network states for ab-initio electronic structure, arXiv preprint arXiv:1909.12852, (2019).
[5] George Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems, 2 (1989), pp. 303–314.
[6] Weinan E, Chao Ma, and Lei Wu, Barron spaces and the compositional function spaces for neural network models, arXiv preprint arXiv:1906.08039, (2019).
[7] Jiequn Han, Linfeng Zhang, and Weinan E, Solving many-electron Schrödinger equation using deep neural networks, Journal of Computational Physics, 399 (2019), p. 108929.
[8] Jan Hermann, Zeno Schätzle, and Frank Noé, Deep neural network solution of the electronic Schrödinger equation, arXiv preprint arXiv:1909.08423, (2019).
[9] Kurt Hornik, Maxwell Stinchcombe, and Halbert White, Multilayer feedforward networks are universal approximators, Neural Networks, 2 (1989), pp. 359–366.
[10] A. N. Kolmogorov, On the representation of continuous functions of many variables by superposition of continuous functions of one variable and addition, in Doklady Akademii Nauk, vol. 114, 1957, pp. 953–956.
[11] Ian Grant Macdonald, Symmetric functions and Hall polynomials, Oxford University Press, 1998.
[12] David Pfau, James S. Spencer, Alexander G. de G. Matthews, and W. M. C. Foulkes, Ab-initio solution of the many-electron Schrödinger equation with deep neural networks, arXiv preprint arXiv:1909.02487, (2019).

[13] Charles R. Qi, Hao Su, Kaichun Mo, and Leonidas J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 652–660.
[14] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J. Guibas, PointNet++: Deep hierarchical feature learning on point sets in a metric space, in Advances in Neural Information Processing Systems, 2017, pp. 5099–5108.
[15] Herbert John Ryser, Combinatorial Mathematics, vol. 14 of The Carus Mathematical Monographs, Mathematical Association of America, 1963.
[16] Akiyoshi Sannai and Masaaki Imaizumi, Improved generalization bound of permutation invariant deep neural networks, arXiv preprint arXiv:1910.06552, (2019).
[17] Akiyoshi Sannai, Yuuki Takai, and Matthieu Cordonnier, Universal approximations of permutation invariant/equivariant functions by deep neural networks, arXiv preprint arXiv:1903.01939, (2019).

[18] Kristof Schütt, Pieter-Jan Kindermans, Huziel Enoc Sauceda Felix, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Müller, SchNet: A continuous-filter convolutional neural network for modeling quantum interactions, in Advances in Neural Information Processing Systems, 2017, pp. 992–1002.
[19] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan R. Salakhutdinov, and Alexander J. Smola, Deep sets, in Advances in Neural Information Processing Systems, 2017, pp. 3391–3401.
[20] Linfeng Zhang, Jiequn Han, Han Wang, Roberto Car, and Weinan E, Deep potential molecular dynamics: a scalable model with the accuracy of quantum mechanics, Physical Review Letters, 120 (2018), p. 143001.
[21] Linfeng Zhang, Jiequn Han, Han Wang, Wissam Saidi, Roberto Car, and Weinan E, End-to-end symmetry preserving interatomic potential energy model for finite and extended systems, in Advances in Neural Information Processing Systems, 2018, pp. 4436–4446.