1 Introduction
With recent developments in data science and computational tools, machine learning algorithms have been increasingly applied in engineering and science to model physical phenomena. Data from physical experiments and numerical simulations are a source of knowledge about the physical world, and data-driven methods can be applied to them to extract new physical laws [1, 2, 3, 4, 5, 6]. For example, in Reynolds-averaged Navier-Stokes (RANS) turbulence modeling in fluid mechanics, traditional modeling methods have failed in many flow scenarios. To the authors' knowledge, a unified RANS model that successfully describes complex flows, including boundary layers, strong rotation, and separation, still does not exist [7, 8]. On the other hand, advanced measurement techniques and direct numerical simulations provide plenty of data that can be used to establish and validate new models. Data-driven methods are therefore particularly suitable for turbulence modeling and for some other areas in physics and engineering. There have been many attempts to discover new turbulence models using machine learning methods. Milano and Koumoutsakos [9] reconstructed near-wall flow using neural networks and compared their results with linear methods (POD). Zhang and Duraisamy [10] used Gaussian process regression combined with an artificial neural network to predict turbulent channel flow and bypass transition. Beck, Flad, and Munz [11] applied residual neural networks to large eddy simulation. Chen et al. proposed an ODE network to learn differential equations in general [12].

Physical laws often appear in the form of tensorial equalities, which inherently obey certain types of symmetry. For example, the constitutive laws in fluid and solid mechanics should obey translation and rotation invariance [13]. A RANS turbulence model is a local tensorial equality between the mean velocity gradient and the Reynolds stress, and it should also be rotation invariant [14, 15]. However, machine learning methods for RANS modeling do not automatically guarantee rotation invariance if the Cartesian components of tensors are used as the input and output of the training data. This problem has been addressed in [16, 5]. In [16, 17], the Reynolds stress is expressed as a general expansion in a nonlinear integrity basis multiplied by scalar functions of the invariants of the strain rate and rotation rate tensors, and machine learning is used to find these scalar functions. Mathematically, this expansion comes from an application of the Cayley-Hamilton theorem. The special case used in [16, 17] was derived by S. B. Pope in [15]. Although such a construction is general and possible for higher-order tensors and for tensor tuples containing multiple tensors, the number of basis elements and the complexity of the derivation grow exponentially and become prohibitive for real applications [18, 19].
Why is the problem of rotation-equivariance hard to solve? At first glance, a system with the property of rotation-equivariance gives us more information about the system. In practice, however, this added property can lower the performance of a learner, because the rule of rotation symmetry requires the machine learning algorithm to extract more rules from the existing data [20]. In this case, rotation-equivariance can be considered as a continuous group action. There is limited research in deep learning that considers the preservation of symmetries under continuous group actions for physical systems. Moreover, continuous information is hard to absorb: if we consider a machine learning algorithm as an information compression process from input to output [21], a continuous transformation such as rotation is difficult for learning algorithms to absorb.

Given the universal approximation theorem [22], it would seem that neural networks, especially deep neural networks, could solve any problem. As formulated in [23, 24, 25, 26], advanced machine learning methods, especially deep neural networks [27], seem to provide a new opportunity for approximating physical equations. However, in the case of rotation symmetry, if we use a multilayer perceptron to learn the input-output relation, then the learned function most likely does not preserve rotation-equivariance. In general, neural network function classes do not satisfy rotation equivariance.

Previous works have considered group-equivariance with convolutional neural networks in image recognition. A general method has been proposed using group convolution [28, 29, 30]. Building on the idea of convolution, several methods construct steerable filters for rotation-equivariance in convolutional neural networks [31, 32, 33, 34]. However, these works cannot be applied directly to physical systems. One of the most important reasons is that a rotation of an image is different from a rotation of a physical system. A rotation of an image is a transformation in polar coordinates centered at a certain point [35], which is different from a rotation acting on tensors. Additionally, these methods carry the strong restriction that the model must be built on convolutional neural networks. Yet for physical systems, convolutional neural networks might not be the best choice, since they are designed for image processing.

The problem of rotation-equivariance also cannot be solved simply by data augmentation or preprocessing. As mentioned in previous work [16], a typical solution is to apply data augmentation. However, data augmentation has no theoretical guarantee of rotation-equivariance with a finite sample set: it reaches rotation equivariance only asymptotically, in the limit of infinitely many samples. Such a dataset is not only difficult to obtain but also requires much more computation to train the model. For naive preprocessing methods, the problem is that there are limited theoretical tools for high-order tensors and only a few methods for low-order tensors. It is hard to apply specific techniques, such as diagonalization, to high-order tensors. Since naive data preprocessing is not applicable, a more elaborate method with a theoretical guarantee is needed to solve this problem.
In this paper, we establish the Rotation-Equivariant Network (RotEqNet), a new data-driven framework that guarantees rotation-equivariance at a theoretical level. Different from previous methods, we first find a way to preserve the rotation operation under tensor contraction. Our proposed position standardization algorithm links a high-order tensor to a low-order tensor with the same rotation operation. By applying mathematical tools for low-order tensors (diagonalization and QR factorization), a standard position is derived using the rotation matrix from the previous step. The position standardization algorithm is proven to be rotation-invariant in Theorem 3.4, i.e. two tensors that differ by a rotation have the same standard position. Therefore, the learning rules based on the standard position form a quotient space of the original rules on randomly rotated positions [31, 36]. In this way, RotEqNet lowers the training difficulty of a randomly positioned dataset. Further, RotEqNet is proven to be rotation-equivariant in Theorem 3.5. These advantages result in an observable error reduction compared to previously introduced data-driven methods. We apply RotEqNet to four case studies covering second-order, third-order, and fourth-order tensors. These case studies are based on Newtonian fluids, large eddy simulation, and electrostriction. Improved performance is observed with RotEqNet: the error is reduced by 99.6%, 15.62%, and 54.63% for the second-, third-, and fourth-order case studies, respectively. Our contribution in this paper is threefold:

We show an important property of the contraction operation on tensors: contraction preserves the rotation operation on tensors of arbitrary order. This is stated in Lemma 2.3.

We propose RotEqNet, designed around a position standardization algorithm, to guarantee rotation-equivariance. We rigorously prove the rotation-invariance of the position standardization algorithm in Theorem 3.4 and the rotation-equivariance of RotEqNet in Theorem 3.5.

We implement the proposed algorithm and the architecture of RotEqNet. We further conduct case studies to show the soundness of its design and its advantage over baseline methods.
The paper is organized as follows. In Section 2 we introduce basic definitions of rotation for tensors (and tensor tuples) of arbitrary order and related concepts. In Section 2.3 we formulate rotation invariance (equivariance) for supervised learning methods. RotEqNet and the main algorithm are presented in Section 3, and numerical results are shown in Section 4.
2 Preliminaries and Problem Description
2.1 Tensor and its operations
In this section, we first introduce an abstract way of defining tensors. One reason for introducing this more abstract view is that it provides a convenient formalism for the operations we will perform on the tensorial data discussed in the previous section. The operations are

Linear transformation (rotation)

Contraction
The formalism helps us to prove that these two operations commute, which lays the theoretical groundwork for computing a representative of rotationally related tensors. We will call this representative the standard position.
2.1.1 Abstract definition of tensors
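Following [37], fix a vector space $V$ of dimension $d$ over $\mathbb{R}$. A tensor product $V \otimes V$ is a vector space with the property that bilinear maps $V \times V \to \mathbb{R}$ are in natural one-to-one correspondence with linear maps $V \otimes V \to \mathbb{R}$.
The tensor product $V \otimes V$ can be constructed as the quotient vector space $F(V \times V)/N$, where $F(V \times V)$ is the free vector space on $V \times V$ and $N$ is the subspace generated by vectors of the following types
(2.1) $(v_1 + v_2, w) - (v_1, w) - (v_2, w), \quad (v, w_1 + w_2) - (v, w_1) - (v, w_2), \quad (cv, w) - c(v, w), \quad (v, cw) - c(v, w)$
where $v, v_1, v_2$ and $w, w_1, w_2$ are vectors in $V$ and $c$ is a scalar in $\mathbb{R}$. This means any element in $N$ can be written as a linear combination of vectors of the above form. $F(V \times V)$ is not a vector space of finite dimension, but the quotient space $V \otimes V$ is. Let $\pi: F(V \times V) \to V \otimes V$ be the natural projection map; we use $v \otimes w$ to denote the image of $(v, w)$ under $\pi$.
Let $e_1, \dots, e_d$ be a basis of $V$; then $e_i \otimes e_j$ for $1 \le i, j \le d$ form a basis of $V \otimes V$. This means any vector $T \in V \otimes V$ can be written as
(2.2) $T = \sum_{i,j=1}^{d} T_{ij}\, e_i \otimes e_j$
for some $T_{ij} \in \mathbb{R}$.
Here are some relations of tensors which come directly as a consequence of the relations generating $N$:
(2.3) $(v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w, \quad v \otimes (w_1 + w_2) = v \otimes w_1 + v \otimes w_2$
(2.4) $(cv) \otimes w = v \otimes (cw) = c\,(v \otimes w)$
The representation of a tensor in $V \otimes V$ is similar to the representation of a linear map $V \to V$, i.e. a matrix. In fact, there is a natural way to think of a tensor as a linear map:
For each element $e_i \otimes e_j$ in the basis of $V \otimes V$, we can think of it as a linear map $V \to V$ by defining $(e_i \otimes e_j)(v) = \langle e_j, v \rangle\, e_i$, where $\langle \cdot, \cdot \rangle$ is the natural inner product on $V$. Extending the definition linearly to every element of $V \otimes V$, we obtain a way to identify $V \otimes V$ with the space of linear maps $V \to V$. In fact, the tensor $\sum_{i,j} T_{ij}\, e_i \otimes e_j$ corresponds to the linear map represented by the matrix $(T_{ij})$.
We have defined the tensor product $V \otimes V$ over $\mathbb{R}$. The definition and construction of an order-$n$ tensor follow the same course. We denote the space of order-$n$ tensors by $V^{\otimes n}$.
A basis of $V^{\otimes n}$ is given by $e_{i_1} \otimes \cdots \otimes e_{i_n}$, where $1 \le i_k \le d$ and $1 \le k \le n$. With respect to this basis, any order-$n$ tensor can be written as $T = \sum_{i_1, \dots, i_n} T_{i_1 \cdots i_n}\, e_{i_1} \otimes \cdots \otimes e_{i_n}$. Analogous to the order-2 case, we can think of an order-$n$ tensor as an $n$-dimensional array, the typical way tensors from physical experiments are represented.
We will use $T$ to denote a tensor of order $n$, i.e. a vector in $V^{\otimes n}$. $n$ is called the rank of the tensor.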
2.1.2 Rotation on tensors: a linear transformation
A linear transformation on a higher-order tensor is a generalization of a linear transformation on a first-order tensor, i.e. a vector.
Let $R: V \to V$ be a linear transformation. Using the basis $\{e_i\}$ of $V$, we can represent it by the equation
(2.5) $R(e_i) = \sum_{j} R_{ij}\, e_j$
Let $M_R$ denote the matrix representation of $R$ with respect to the basis $\{e_i\}$. Then
(2.6) $M_R = (R_{ij})^{T}$
i.e. the transpose of the matrix $(R_{ij})$.
The map $R$ naturally induces a map on $V^{\otimes n}$. On the basis element $e_{i_1} \otimes \cdots \otimes e_{i_n}$, the action of $R$ is defined as
(2.7) $R(e_{i_1} \otimes \cdots \otimes e_{i_n}) = R(e_{i_1}) \otimes \cdots \otimes R(e_{i_n})$
For any tensor $T \in V^{\otimes n}$, we will use $R(T)$ to denote the linear extension of $R$ to $T$.
There is a convenient way to represent a linear transformation of a 2-tensor as matrix multiplication.
For a 2-tensor $T = \sum_{i,j} T_{ij}\, e_i \otimes e_j$, let $M_T$ be the matrix whose $(i, j)$ term is $T_{ij}$.
Lemma 2.1.
A rotation by the matrix $M_R$ acting on a second-order tensor (matrix) is a change of basis operation:
(2.8) $M_{R(T)} = M_R \cdot M_T \cdot M_R^{T}$
where $\cdot$ here means the usual matrix multiplication.
Remark 2.2.
A rotation by the matrix $M_R$ acting on a first-order tensor (vector) $v$ can be viewed as
(2.9) $R(v) = M_R \cdot v$
Lemma 2.1 and Remark 2.2 are used in the proof of Theorem 3.4. As shown in this subsection, a rotation of a tensor can be carried out in matrix form using the rules of matrix multiplication. In the proofs in this paper, we use this idea to perform rotations on tensors via matrix multiplication.
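The induced action of a rotation on a tensor of arbitrary order can be sketched numerically. The following is a minimal NumPy illustration, not the authors' implementation: the helper names `random_rotation` and `rotate_tensor` are hypothetical, tensors are assumed to be stored as dense arrays in an orthonormal basis, and the rotation is assumed to be given directly by its matrix. For orders one and two the result should agree with the matrix forms of Remark 2.2 and Lemma 2.1.

```python
import numpy as np

def random_rotation(dim, rng):
    """Random rotation matrix (orthogonal, det = +1) from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))   # fix column signs so the factorization is unique
    if np.linalg.det(q) < 0:      # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q

def rotate_tensor(tensor, rot):
    """Apply the matrix `rot` to every axis of `tensor` (induced action on an order-n tensor)."""
    out = tensor
    for axis in range(tensor.ndim):
        # contract the second index of rot with the current axis, then restore the axis order
        out = np.moveaxis(np.tensordot(rot, out, axes=(1, axis)), 0, axis)
    return out

rng = np.random.default_rng(0)
R = random_rotation(3, rng)
v = rng.standard_normal(3)          # first-order tensor
T2 = rng.standard_normal((3, 3))    # second-order tensor

rotated_v = rotate_tensor(v, R)     # matrix form: R @ v
rotated_T2 = rotate_tensor(T2, R)   # matrix form: R @ T2 @ R.T
```

The same `rotate_tensor` call works unchanged for third- and fourth-order arrays, which is the case the matrix forms above do not cover.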
2.1.3 Contraction on tensors: reduction of order
Let $\langle \cdot, \cdot \rangle$ be the standard inner product on $V$. Using this inner product, we can define the contraction of a tensor. It "merges" vectors on the specified axes using the inner product and reduces the rank of the tensor by 2. Formally, let $C(a, b)$ denote the contraction along the $a$-th and $b$-th axes. Here, the $a$-th axis means the $a$-th copy of $V$ in $V^{\otimes n}$. For example, the first axis refers to the first copy of $V$ in $V^{\otimes n}$.
On the element $v_1 \otimes \cdots \otimes v_n$, $C(a, b)$ acts by pairing $v_a$ and $v_b$ via the inner product $\langle \cdot, \cdot \rangle$, i.e.
(2.10) $C(a, b)(v_1 \otimes \cdots \otimes v_n) = \langle v_a, v_b \rangle\; v_1 \otimes \cdots \otimes \hat{v}_a \otimes \cdots \otimes \hat{v}_b \otimes \cdots \otimes v_n$
where $\hat{v}_a$ means $v_a$ is not present.
We can then define $C(a, b)$ on $V^{\otimes n}$ by extending linearly. When $n = 2$, contraction is nothing other than taking the trace of the corresponding matrix.
Lemma 2.3.
Let $R$ be a rotation. Let $T \in V^{\otimes n}$, then
(2.11) $C(a, b)(R(T)) = R(C(a, b)(T))$
Lemma 2.3 shows a connection between rotation and contraction: the contraction of a tensor is compatible with a linear transformation if that transformation is a rotation. This lemma is the foundation of the entire analysis in this paper. We use it to extract the rotation operation from tensors of higher (arbitrary) order. The proof is given in Appendix 5.
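The statement of Lemma 2.3 can be checked numerically on a random fourth-order tensor. The sketch below is only an illustration under stated assumptions: the helpers `random_rotation`, `rotate_tensor`, and `contract` are hypothetical names, axes are 0-based, and the contraction is implemented with `np.trace`. It also shows that the two operations stop commuting for a linear map that is not a rotation (here a uniform scaling by 2).

```python
import numpy as np

def random_rotation(dim, rng):
    """Random rotation matrix (orthogonal, det = +1)."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def rotate_tensor(tensor, mat):
    """Apply the matrix `mat` to every axis of `tensor`."""
    out = tensor
    for axis in range(tensor.ndim):
        out = np.moveaxis(np.tensordot(mat, out, axes=(1, axis)), 0, axis)
    return out

def contract(tensor, a, b):
    """Contract axes a and b (0-based) using the standard inner product."""
    return np.trace(tensor, axis1=a, axis2=b)

rng = np.random.default_rng(1)
R = random_rotation(3, rng)
T4 = rng.standard_normal((3, 3, 3, 3))

# Rotation: contract-then-rotate equals rotate-then-contract.
lhs = contract(rotate_tensor(T4, R), 0, 1)
rhs = rotate_tensor(contract(T4, 0, 1), R)
gap_rotation = np.max(np.abs(lhs - rhs))

# Non-rotation (scaling by 2): the two orders of operation differ.
A = 2.0 * np.eye(3)
gap_scaling = np.max(np.abs(contract(rotate_tensor(T4, A), 0, 1)
                            - rotate_tensor(contract(T4, 0, 1), A)))
```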
2.2 Supervised learning setup
In our problem, we are given a data set $D = \{(X_i, y_i)\}_{i=1}^{N}$. The data set contains $N$ input-output pairs $(X_i, y_i)$. The input here is a tensor tuple:
(2.12) $X = (T_1, T_2, \dots, T_m)$
where $m$ is the length of $X$. Normally, we only have one output.
Generally speaking, following the definition of [38, 39], parametric supervised learning can be viewed as a model composed of two parts. The first part is a predictor. Given a parameter $\theta$, we have:
(2.13) $\hat{y} = \mathcal{M}(X; \theta)$
where $\hat{y}$ is the prediction output of the learning model $\mathcal{M}$ and $\theta$ is the parameter of $\mathcal{M}$. As stated, it predicts a value based on the input $X$.
The second part is an optimizer, which updates the parameter $\theta$ based on a loss function. For a regression model, a typical loss function would be defined as:
(2.14) $L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \| y_i - \mathcal{M}(X_i; \theta) \|_2^2$
where $\| \cdot \|_2$ represents the 2-norm.
We usually hope to minimize this loss function. This is formulated as:
(2.15) $\theta^{*} = \arg\min_{\theta} L(\theta)$
where $\mathcal{M}$ is a learning model and $\theta^{*}$ is the optimal solution. Specifically, in this work, we apply neural networks [40] and random forests [41] in the case studies.
2.3 Obtaining rotation-equivariance properties in systems using supervised learning
Group equivariance is an important property of most physical systems. Typical examples are rotation group equivariance, scaling group equivariance, and translation group equivariance. Mathematically, group equivariance is the property that a mapping $f: X \to Y$ commutes with the group actions on $X$ and $Y$; here we consider rotation group actions. Specifically, let $R$ be a rotation action. $f$ is rotation-equivariant if
(2.16) $f(R(x)) = R(f(x))$
As a special case of rotation-equivariance, a function $f$ is rotation-invariant if:
(2.17) $f(R(x)) = f(x)$
Since supervised learning models can be considered as functions, denote a machine learning model by $\mathcal{M}$. For a rotation operation $R$, we hope to obtain the property that:
(2.18) $\mathcal{M}(R(X)) = R(\mathcal{M}(X))$
In the analysis in Sec. 3.3, we prove the rotation-equivariance property following the definition stated here in Equ. 2.18. In other words, if a system satisfies the property in Equ. 2.18, then this system is rotation-equivariant.
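The definition of rotation-equivariance can be turned into a simple numerical check for any candidate model. The sketch below is an assumption-laden illustration: `equivariance_gap`, `matrix_square`, and `entrywise_square` are hypothetical names introduced here, and the two toy "models" act on second-order tensors. The matrix square commutes with rotations, whereas the entrywise square, which works on Cartesian components, does not.

```python
import numpy as np

def random_rotation(dim, rng):
    """Random rotation matrix (orthogonal, det = +1)."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def rotate_tensor(tensor, rot):
    """Apply the rotation matrix `rot` to every axis of `tensor`."""
    out = tensor
    for axis in range(tensor.ndim):
        out = np.moveaxis(np.tensordot(rot, out, axes=(1, axis)), 0, axis)
    return out

def equivariance_gap(model, x, rotations):
    """Largest value of ||model(R(x)) - R(model(x))|| over the given rotations."""
    return max(
        np.linalg.norm(model(rotate_tensor(x, r)) - rotate_tensor(model(x), r))
        for r in rotations
    )

def matrix_square(m):
    return m @ m      # commutes with rotations

def entrywise_square(m):
    return m * m      # acts on Cartesian components only

rng = np.random.default_rng(2)
rotations = [random_rotation(3, rng) for _ in range(20)]
x = rng.standard_normal((3, 3))

gap_equivariant = equivariance_gap(matrix_square, x, rotations)
gap_componentwise = equivariance_gap(entrywise_square, x, rotations)
```

A gap at the level of round-off indicates equivariance on the tested samples only; it is a diagnostic, not a proof.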
2.4 Modeling symmetric fluid systems via supervised learning
The machine learning approach to the fluid dynamics modeling involves training a supervised learning model using as features and as label.
In our case, the underlying space of the fluid dynamic system is complete with respect to rotation. This means for all rotation , for all . The objects we want to model via machine learning are rotationequivariant tensorial fields on .
Let be a tensorial field on , for any point , we use to denote the tensor at (for example, pressure at a particular point in a fluid dynamics system). is said to be rotationequivariant if for all point and all rotation
Suppose one has tensorial fields on such that and are related by some unknown physical law such that
Supervised machine learning methods can be used here to learn a function that approximates such that generalizes well on new data.
Suppose those tensorial fields are rotationequivariant, then naturally the model as well its proxy
3 Rotation Equivariant Network
In this section, we propose the Rotation-Equivariant Network (RotEqNet) to solve rotation problems for high-order tensors in fluid systems. RotEqNet is based on the position standardization algorithm, which we discuss in Section 3.2. We first provide a general description of the whole architecture in Section 3.1.
3.1 Architecture
As shown in Figure 1, RotEqNet goes through three steps: position standardization, prediction by the kernel predictor, and position resetting. Specifically, position standardization is an algorithm that transfers an incoming tensor to its standard position. In Figure 1, the 'even order standardization' and 'odd order standardization' blocks denote this algorithm. Then, $X^{s}$ is considered as the standard position of the input tensor $X$, and $R$ is the extracted rotation operation that transfers between the standard position and the original position. The kernel predictor $\mathcal{M}$ deals only with standard positions, so its output is in standard position as well. Finally, applying $R$ to this output gives our final prediction. This process can be described mathematically as:
(3.1) $\hat{Y} = R\big(\mathcal{M}(X^{s})\big), \quad \text{where } X = R(X^{s})$
How does this process help to solve rotation problems for high-order tensors?
An important reason is the reduced function space for learning. When a learning model is trained only on standard positions, it no longer has to deal with the entire group action and only needs to focus on the pattern given by the underlying physical equation. Denote the rotation group by $G$ and consider the full function space. As mentioned in [31], instead of performing regression on the full space, RotEqNet essentially explores a much smaller quotient space under $G$. The reduction of input-output dimensionality makes training easier: with the same number of samples, the pattern to be learned lives in a far smaller space. The second reason is that RotEqNet provides a theoretical guarantee of rotation-equivariance. Using rotation symmetry as a strong prior for most physical systems, RotEqNet generalizes better when learning from a limited amount of data.
3.2 Position standardization algorithm for High Order Tensors
Let $D$ denote our data set. The first stage of RotEqNet is to find a good representative of all tensors that are related to each other by rotation. We call this representative the sample in "standard position" and denote it by $X^{s}$. We use $S$ to denote the position standardization algorithm and $S(X)$ to mean reducing $X$ to its standard position.
$S$ has the property that for all $X$ and all rotation operations $R$,
(3.2) $S(R(X)) = S(X)$
This means $S$ produces exactly the same output no matter how $X$ is rotated, i.e. it is rotation-invariant.
Intuitively, for a tensor $T$, we select a representative on the orbit $\{R(T)\}$ (where $R$ ranges over all rotations) as the rotation invariant of $T$ [42]. In our algorithm, we first apply tensor contraction to a higher-order tensor to obtain a lower-order tensor. Then, using diagonalization in the even case and QR factorization in the odd case, the algorithm obtains a rotation operation acting on $T$. Finally, it obtains a tensor in standard position by rotating the original tensor with the inverse of the obtained rotation matrix.
This operation is compatible with the theoretical result shown in Lemma 2.3.
3.2.1 Tensor of even order
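Let $T$ be a symmetric tensor of even order $n$ ($n$ is even). Let $C$ denote a sequence of contractions along the first two axes until we reach a second-order tensor. Applying $C$ to $T$ we get:
(3.3) $C(T) = \underbrace{C(1,2) \circ \cdots \circ C(1,2)}_{(n-2)/2}(T)$
(3.4) $A = M_{C(T)}$
where $A$ is the matrix of the second-order tensor $C(T)$. Then we find the orthonormal eigenvectors of $A$ and use them to form the orthonormal matrix $R$ that diagonalizes $A$:
(3.5) $A = R\, D\, R^{-1}$
Since $R$ is an orthonormal matrix, we have
(3.6) $D = R^{T} A\, R$
We will call $D$ the standard position of $A$. We write $A^{s} = D$ to shorten the notation.
Since contraction and rotation are compatible by Lemma 2.3, we can apply $R^{-1}$ to $T$ before we apply the contraction, and we will have
(3.7) $C(R^{-1}(T)) = R^{-1}(C(T)) = D$
For the even-order tensor $T$, we define
(3.8) $S(T) = R^{-1}(T)$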
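The even-order procedure above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' Algorithm 1: the names `symmetrize` and `standardize_even` are hypothetical, the eigenvalues of the contracted matrix are assumed to be distinct, and eigenvectors are taken as returned by `np.linalg.eigh` (ascending eigenvalues). Because a diagonalization fixes each eigenvector only up to sign, this sketch checks the standard positions of a tensor and of its rotated copy for agreement up to those signs.

```python
import itertools
import numpy as np

def random_rotation(dim, rng):
    """Random rotation matrix (orthogonal, det = +1)."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def rotate_tensor(tensor, mat):
    """Apply the matrix `mat` to every axis of `tensor`."""
    out = tensor
    for axis in range(tensor.ndim):
        out = np.moveaxis(np.tensordot(mat, out, axes=(1, axis)), 0, axis)
    return out

def symmetrize(tensor):
    """Average over all axis permutations to obtain a fully symmetric tensor."""
    perms = list(itertools.permutations(range(tensor.ndim)))
    return sum(np.transpose(tensor, p) for p in perms) / len(perms)

def standardize_even(tensor):
    """Contract to order 2, diagonalize, and rotate the tensor by the inverse eigenvector matrix."""
    reduced = tensor
    while reduced.ndim > 2:
        reduced = np.trace(reduced, axis1=0, axis2=1)   # contraction along the first two axes
    reduced = 0.5 * (reduced + reduced.T)               # guard against round-off asymmetry
    _, rot = np.linalg.eigh(reduced)                    # columns are orthonormal eigenvectors
    return rotate_tensor(tensor, rot.T), rot            # rot.T is the inverse of rot

rng = np.random.default_rng(3)
T = symmetrize(rng.standard_normal((3, 3, 3, 3)))
Q = random_rotation(3, rng)

T_std, R1 = standardize_even(T)
TQ_std, R2 = standardize_even(rotate_tensor(T, Q))
contracted_std = np.trace(T_std, axis1=0, axis2=1)      # expected to be diagonal
```

Rotating `T_std` back with `R1` recovers the original tensor, which is the position resetting step used later in the architecture.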
3.2.2 Tensor of odd order
Why is the odd-order case different from the even-order case? An odd-order tensor cannot be reduced to order 2 by contraction: each contraction reduces the order by 2, so the reduced order is always odd and can never be 2. Consequently, the rotation matrix cannot be extracted by diagonalization, and the tensor cannot be rotated into a standard position in this way. The method we use to solve this problem is described below.
Let $T$ be a symmetric tensor of odd order $n$ ($n$ is odd). Let $C$ denote a sequence of contractions along the first two axes until we reach a third-order tensor. Applying $C$ to $T$ we get:
(3.9) $T_3 = C(T) \in V^{\otimes 3}$
After we obtain $T_3$, we can obtain three different first-order tensors by contracting it. Denote the contracted results, which are first-order tensors, i.e. vectors, by $v_1, v_2, v_3$. We get a second-order tensor by concatenating them:
(3.10) $A = [v_1, v_2, v_3]$
Then we proceed in a similar way as before. We perform a QR factorization to obtain the rotation matrix $R$:
(3.11) $A = R\, U$
where $U$ is upper triangular. For an odd-order tensor, we define:
(3.12) $S(T) = R^{-1}(T)$
The pseudocode of our proposed algorithm is documented in Algorithm 1. We evaluate the position standardization algorithm in Section 4.
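The odd-order procedure can be sketched in the same style. The following is an illustration under explicit assumptions, not the authors' Algorithm 1: `standardize_odd` is a hypothetical name, the three vectors are taken to be the contractions of the third-order tensor over the axis pairs (0, 1), (0, 2), and (1, 2), and the stacked matrix is assumed to have full rank so that the QR factorization with positive diagonal is unique. The demonstration uses a generic (non-symmetric) third-order tensor, for which the full-rank assumption holds; the orthogonal factor may be a reflection rather than a proper rotation, which does not affect the invariance check.

```python
import numpy as np

def random_rotation(dim, rng):
    """Random rotation matrix (orthogonal, det = +1)."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def rotate_tensor(tensor, mat):
    """Apply the matrix `mat` to every axis of `tensor`."""
    out = tensor
    for axis in range(tensor.ndim):
        out = np.moveaxis(np.tensordot(mat, out, axes=(1, axis)), 0, axis)
    return out

def standardize_odd(tensor):
    """Contract to order 3, build a matrix from three contracted vectors, and use its QR factor."""
    reduced = tensor
    while reduced.ndim > 3:
        reduced = np.trace(reduced, axis1=0, axis2=1)        # contraction along the first two axes
    vectors = [np.trace(reduced, axis1=a, axis2=b) for a, b in ((0, 1), (0, 2), (1, 2))]
    stacked = np.stack(vectors, axis=1)                      # columns are the three vectors
    q, r = np.linalg.qr(stacked)
    q = q * np.sign(np.diag(r))                              # positive diagonal makes Q unique
    return rotate_tensor(tensor, q.T), q                     # q.T is the inverse of q

rng = np.random.default_rng(4)
T = rng.standard_normal((3, 3, 3))       # generic order-3 tensor (assumption: full-rank case)
Q = random_rotation(3, rng)

T_std, R1 = standardize_odd(T)
TQ_std, R2 = standardize_odd(rotate_tensor(T, Q))
```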
3.3 Theoretical Analysis of Rotationequivariant property
In our analysis, we aim to show the rotation-equivariance of RotEqNet. As an important first step, we need to analyze the rotation invariance of the standard position derived by the position standardization algorithm. We hope to show that Equ. 3.2 is true. Once Equ. 3.2 is proved, RotEqNet is automatically rotation-equivariant by its architecture.
To outline the theorems below: Lemma 2.3 serves as the key fact for preserving rotation information after contraction, and our algorithm is analyzed in Theorem 3.4.
3.3.1 Main theorems and proofs
Theorem 3.4.
$S$ is rotation invariant, i.e. for all rotations $R$ and symmetric tensors $T$
(3.13) $S(R(T)) = S(T)$
We provide a proof in Appendix 5.4. We call $S(T)$ the standard position of $T$.
By Theorem 3.4, the standard position derived by the position standardization algorithm is rotation invariant for tensors of arbitrary order.
Theorem 3.5.
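RotEqNet, $\mathcal{N}$, is rotation-equivariant, i.e. for all rotations $Q$ and tensors $X$
(3.14) $\mathcal{N}(Q(X)) = Q(\mathcal{N}(X))$
Proof.
Denote RotEqNet by $\mathcal{N}$ and the kernel predictor by $\mathcal{M}$. Consider an input pair $(X, Y)$. Suppose the position standardization algorithm gives $X = R(X^{s})$, where $R$ denotes a rotation operation.
First, the process of RotEqNet on $X$ is defined as:
(3.15) $\mathcal{N}(X) = R(\mathcal{M}(X^{s}))$
Consider another rotation operation $Q$ in matrix form acting on $X$. Using Theorem 3.4 we know that:
(3.16) $S(Q(X)) = S(X) = X^{s} = R^{-1}(X) = R^{-1} I\,(X) = R^{-1} Q^{-1}\,(Q(X)) = (QR)^{-1}(Q(X))$
where $I = Q^{-1} Q$ is an identity matrix.
Then, the process of RotEqNet on $Q(X)$ is defined as:
(3.17) $\mathcal{N}(Q(X)) = R'\big(\mathcal{M}(S(Q(X)))\big), \quad \text{where } Q(X) = R'(S(Q(X)))$
We know that $S(Q(X)) = X^{s}$ from Equ. 3.16. Therefore, we have $R' = QR$. Substituting back into the previous equation,
(3.18) $\mathcal{N}(Q(X)) = QR\,(\mathcal{M}(X^{s}))$
To simplify, for a rotation operation $Q$ on the input $X$, we have
(3.19) $\mathcal{N}(Q(X)) = Q\big(R(\mathcal{M}(X^{s}))\big) = Q(\mathcal{N}(X))$
This shows that $\mathcal{N}$ is rotation-equivariant by definition. Therefore, RotEqNet is rotation-equivariant. ∎
It has been shown that Algorithm 1 is able to preserve rotation information in low dimension and to extract it using diagonalization for matrices. This part is the theoretical guarantee of our position standardization algorithm.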
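The three-step architecture (standardize, predict, reset) can be illustrated for the simplest case of a symmetric second-order input. The sketch below is a toy under stated assumptions and not the authors' implementation: `roteqnet_predict` and `toy_kernel` are hypothetical names, the eigenvalues are assumed distinct, and the kernel predictor is replaced by a fixed function that maps a diagonal standard position to a diagonal output, so that the sign ambiguity of eigenvectors does not affect the result.

```python
import numpy as np

def random_rotation(dim, rng):
    """Random rotation matrix (orthogonal, det = +1)."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def roteqnet_predict(kernel, x):
    """Standardize a symmetric matrix, apply the kernel predictor, and reset the position."""
    _, rot = np.linalg.eigh(0.5 * (x + x.T))   # extracted orthogonal matrix
    x_std = rot.T @ x @ rot                    # standard position (diagonal up to round-off)
    y_std = kernel(x_std)                      # prediction in standard position
    return rot @ y_std @ rot.T                 # rotate the prediction back

def toy_kernel(x_std):
    """Stand-in for a trained kernel predictor: acts on the diagonal entries only (assumption)."""
    return np.diag(np.tanh(np.diag(x_std)))

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))
X = 0.5 * (A + A.T)                            # symmetric second-order input
Q = random_rotation(3, rng)

pred = roteqnet_predict(toy_kernel, X)
pred_rotated_input = roteqnet_predict(toy_kernel, Q @ X @ Q.T)
```

Under these assumptions, the prediction for the rotated input coincides with the rotated prediction, which is the property stated in Theorem 3.5.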
4 Case studies
In this section, a series of case studies is provided to show the performance of RotEqNet. The following subsections cover second-order, third-order, and fourth-order cases. We investigate the second-order case in detail, with both linear and nonlinear test equations, since second-order physical systems are widely used in current applications. In every case study we report two properties of RotEqNet. The first is error reduction: we apply RotEqNet to each test physical equation and compare it to the baseline models (neural networks and random forests). The second is rotation-invariance: we examine this property by letting RotEqNet and the baseline models predict on rotations of randomly selected data. Detailed information and an interpretation of the experimental results are given in each subsection below.
4.1 Case study from Newtonian fluid: a secondorder linear case
4.1.1 Problem statement
A Newtonian fluid is a fluid whose viscous stress depends linearly on its local strain rate. In this experiment, we use simulation data to demonstrate this rule. This serves as a case study with second-order linear equations. Let $\sigma$ be the stress tensor, $p$ the pressure, and $\epsilon$ the strain rate. The rule of a Newtonian fluid is a second-order physical equation that satisfies the following condition [43]:
(4.1) 
An equivalent form of the equation for a Newtonian fluid uses the velocity vector field $v$, whose gradient $\nabla v$ can be expressed as a matrix. Using this definition, the equation of a Newtonian fluid can also be written as:
(4.2) 
This can also be considered as the definition of the strain rate. Based on this definition, the strain rate is a square matrix, and it is symmetric since it equals its own transpose. Since the strain rate is symmetric and $I$ is an identity matrix, the stress is also symmetric. Therefore, for an arbitrary rotation matrix, this system has the property of rotation-equivariance: rotating the strain rate rotates the stress in the same way.
To quantify the stress in a Newtonian fluid simulation, it is useful to be able to predict the stress given the simulated pressure and velocity vector field. Based on this scenario, in this subsection we provide a case study in which the machine learning model takes the shear of a Newtonian fluid as input and predicts the stress.
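The rotation-equivariance of the Newtonian constitutive law can be checked on synthetic data. The sketch below assumes the standard incompressible form, stress = -p I + 2 mu (strain rate), with the strain rate taken as the symmetric part of the velocity gradient; the function name `newtonian_stress`, the constant viscosity `mu = 1.0`, and the Gaussian sampling are assumptions made for illustration and are not taken from the case study's actual configuration.

```python
import numpy as np

def random_rotation(dim, rng):
    """Random rotation matrix (orthogonal, det = +1)."""
    q, r = np.linalg.qr(rng.standard_normal((dim, dim)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def newtonian_stress(grad_v, pressure, mu=1.0):
    """Stress of a Newtonian fluid; `mu` is an assumed constant viscosity."""
    strain_rate = 0.5 * (grad_v + grad_v.T)            # symmetric part of the velocity gradient
    return -pressure * np.eye(3) + 2.0 * mu * strain_rate

rng = np.random.default_rng(6)
grad_v = rng.standard_normal((3, 3))                    # synthetic velocity gradient
p = rng.standard_normal()                               # synthetic pressure (a scalar)
Q = random_rotation(3, rng)

sigma = newtonian_stress(grad_v, p)
sigma_of_rotated_input = newtonian_stress(Q @ grad_v @ Q.T, p)

# One flattened training sample: pressure and strain rate as features, stress as label.
features = np.concatenate(([p], (0.5 * (grad_v + grad_v.T)).ravel()))
label = sigma.ravel()
```

The pressure is a scalar and is left unchanged by the rotation, while the velocity gradient and the stress transform as second-order tensors.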
4.1.2 Data generation and model description
Based on Eqn. 4.2, we first generate random data to obtain and . The generation of random numbers in follows a normal distribution from range . Derived from generated and , we could obtain from Eqn. 4.2. Denote the dataset as . To form a proper dataset with elements for a machine learning model for Newtonian fluid, the input is set up to be a vector where . Specifically, is composed by and flattened in Eqn. 4.1. The output is a vector which is the flattened result of matrix derived by and . The dataset is set up as described above. To compare our method with the baseline method, we trained two models with the same hyperparameters using different amounts of training data, ranging from . of generated data is used for training and of data is used for testing. A rotation set with 10,000 random rotation matrices is also generated for evaluating the property of rotation-equivariance, denoted by .

The machine learning models we apply here are neural networks and random forests, because of the ability of these two models to approximate arbitrary functions. For neural networks, in our implementation, the logistic function is used as the activation function for every neuron. The numbers of neurons in the two layers are 512 and 4, respectively. The Adam optimizer [44] is applied as the optimization algorithm, and the learning rate is set up to be . We also set the batch size to 64. For random forests, 100 estimators are set up with mean squared error as the criterion. The maximum depth of the random forests is 3 to lower the chance of overfitting. We used Sklearn for implementation [45]. The computation environment of this experiment is CPU.
4.1.3 Results
There are two properties to evaluate: the error reduction and the rotation-equivariance of RotEqNet. Error reduction is evaluated first. A kernel predictor is trained on the standard positions derived from the training data. Then the prediction algorithm is applied to both the training and testing sets to obtain the training and testing performance. The validation error is defined as the mean squared loss:
(4.3) $\mathrm{MSE}(\mathcal{M}, D) = \frac{1}{N} \sum_{i=1}^{N} \| \mathcal{M}(x_i; \theta) - y_i \|_2^2$
In Eqn. 4.3, $N$ is the number of samples in the dataset $D$, $\mathcal{M}$ is the trained machine learning model, $\theta$ is the parameter derived for model $\mathcal{M}$, and $(x_i, y_i)$ is an input-label pair of the dataset. This quantity represents the expected error of model $\mathcal{M}$ on dataset $D$.
Kernel predictor  Training Error Reduction  Testing Error Reduction 

Neural Networks  99.56%  99.60% 
Random Forests  99.56%  99.72% 
Fig. 4(a) shows the error reduction property of RotEqNet. This plot consists of three experimental groups. The first group focuses on the accuracy of the baseline model, a single feed-forward neural network, on raw data with randomly rotated positions. It is shown in Fig. 4(a) with blue curves, where the triangle curve represents training error and the circle curve represents testing error. The second group is RotEqNet with a neural network as the kernel predictor. It is shown in Fig. 4(a) with red curves, where the triangle curve represents training error and the circle curve represents testing error. For 100,000 training samples, the testing error of RotEqNet is 0.0037, and the testing error of the baseline method is 1.333. We observe a large error reduction, 99.56% in training and 99.60% in testing, for RotEqNet compared to the baseline model. The last group, marked with black curves in the figure, reports the performance of the kernel predictor on standard positions only. This experiment explains why RotEqNet improves performance: training on standard positions is an easier task than training on raw data.

Further, Fig. 4(b) shows the error reduction effect of RotEqNet using a random forest as the kernel predictor. Similarly, the blue curves in Fig. 4(b) represent the performance of the baseline method (random forests). The second group is RotEqNet with random forests as the kernel predictor. It is shown in Fig. 4(b) with red curves, where the triangle curve represents training error and the circle curve represents testing error. We observe a large error reduction, 99.56% in training and 99.72% in testing, for RotEqNet compared to using only the random forest predictor. The last group, marked with black curves in the figure, trains the kernel predictor on standard positions only. As stated before, this also explains the error reduction effect of RotEqNet with random forests.
According to the reported results, RotEqNet generalizes well without overfitting. For baseline models trained with raw data, the testing error is typically higher than the training error. For example, the difference between training and testing errors is 0.44 for Neural Networks, for Random Forests when . This indicates that both Neural Networks and Random Forests easily overfit this task on Newtonian fluid. By contrast, RotEqNet reduces this gap between training and testing error. As we can observe from the training and testing errors of RotEqNet, the errors are much lower. When , there are only for Neural Networks and for Random Forests. In the case of linear second-order equations, applying RotEqNet results in better generalization.
Another important property to evaluate for RotEqNet is rotation-equivariance. The experiment is designed around the definition of rotation-equivariance in Eqn. 2.18. First, we pick a randomly generated data point . Then we apply the rotation set with 10,000 random rotation matrices to . To investigate the property of rotation-equivariance, we evaluate the error with respect to the real data, which is defined as:
(4.4) 
Model  Baseline (NN)  RotEqNet (NN)  Baseline (RF)  RotEqNet (RF) 

0.6362  0.0013  3.1334  1.5513  
This error evaluation method () focuses on the model’s error on real data over all rotations. As shown in Tab. 2, both baseline methods, using Neural Networks and Random Forests, have large errors for and . The baseline methods have no theoretical guarantee of rotation equivariance. However, there is an error reduction for both machine learning models when they are used within RotEqNet’s architecture. In particular, for RotEqNet with Neural Networks as the kernel predictor, we observe an error reduction of . This is consistent with the rotation-equivariance property of RotEqNet.
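The rotation-error evaluation described above can be illustrated with a minimal sketch. The exact form of Eqn. 4.4 is not reproduced here, so the mean squared error over rotations used below is an assumption, as are the toy law `y = 2x`, the two stand-in models, and the helper names `random_rotation` and `rotation_error`.

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # normalize column signs
    if np.linalg.det(q) < 0:         # flip one column to obtain det = +1
        q[:, 0] = -q[:, 0]
    return q

def rotation_error(predict, x, y, rotations):
    """Assumed metric: mean squared error between predict(R x R^T) and R y R^T."""
    errs = [np.mean((predict(R @ x @ R.T) - R @ y @ R.T) ** 2) for R in rotations]
    return float(np.mean(errs))

rng = np.random.default_rng(0)
rotations = [random_rotation(rng) for _ in range(1000)]

x = rng.normal(size=(3, 3))
x = 0.5 * (x + x.T)                  # symmetric second-order input
y = 2.0 * x                          # hypothetical rotation-equivariant law

def equivariant_model(t):
    return 2.0 * t                   # respects the symmetry

def biased_model(t):
    # A fixed anisotropic offset breaks rotation equivariance.
    return 2.0 * t + 0.1 * np.diag([1.0, 0.0, 0.0])

err_eq = rotation_error(equivariant_model, x, y, rotations)
err_biased = rotation_error(biased_model, x, y, rotations)
print(err_eq, err_biased)
```

In this toy setting the equivariant model has an error at the level of floating-point round-off, while the biased model keeps a finite error for every rotation.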
4.2 Case study from large eddy simulation: a second-order nonlinear case
4.2.1 Problem statement
In this case, we consider the subgrid model of large eddy simulation (LES) of turbulent flow by Kosovic [46]. As formulated previously in [47, 48], we aim to learn this model from LES simulation data. This serves as a case study with second-order nonlinear equations. The subgrid model is defined as:
(4.5) 
Here is the subgrid stress, which is a symmetric traceless second-order tensor. and are the symmetric and antisymmetric parts of the velocity gradient tensor , where . Further, , , , are all constants. The configuration of these constants is reported in the next subsection.
To quantify the subgrid stress for LES, this study aims to predict the subgrid stress given the simulated velocity gradient tensor. The machine learning model in this case study therefore takes the velocity gradient tensor as input.
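As a small illustration of the quantities in this problem statement, the sketch below builds a traceless random velocity gradient and splits it into its symmetric and antisymmetric parts. Removing one third of the trace from each diagonal entry is an assumption about how tracelessness is enforced, and the names `A`, `G`, `S`, and `W` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample: a random 3x3 matrix standing in for the raw data.
A = rng.normal(size=(3, 3))

# Assumed trace removal: subtract trace/3 from each diagonal entry so that
# the velocity gradient G is traceless.
G = A - np.trace(A) / 3.0 * np.eye(3)

S = 0.5 * (G + G.T)   # symmetric part
W = 0.5 * (G - G.T)   # antisymmetric part

print(np.trace(G), np.allclose(S + W, G))
```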
4.2.2 Data generation and model description
Based on Eqn. 4.5, we first generate random data to obtain a simulated velocity gradient tensor . The random numbers follow a normal distribution from range , and is obtained from a random matrix by subtracting from diagonal position. This would keep . and could be obtained from by taking its symmetric and antisymmetric parts. For the setup of constants, . is computed from the above setting with Eqn. 4.5. Denote the dataset as, . To form a proper dataset with elements for a machine learning model of the LES subgrid stress, the input is set up to be a vector where . Specifically, is composed of the flattened . The output is a vector, which is the flattened result of the matrix derived from and the other constants. To compare our method with the baseline method, we trained two models with the same hyperparameters using different amounts of training data, ranging from . of generated data is used for training, and of data is used for testing. A rotation set with 10,000 random rotation matrices is also generated for evaluating rotation equivariance, denoted by . The model setup is the same as in Sec. 4.1.2.
4.2.3 Results
The effect of error reduction is evaluated first. The validation error is defined as the mean squared loss using the formulation in Eqn. 4.3. This evaluation represents the expected error of model with dataset .
Fig. 5(a) shows the error reduction effect of RotEqNet with a Neural Network as the kernel predictor for the second-order nonlinear case, with three experimental groups. The first experiment group focuses on the accuracy of the baseline method on raw data with randomly rotated positions. It is shown in Fig. 5(a) with blue curves, where the triangle curve represents the training error and the circle curve the testing error. The second experiment group is RotEqNet with a Neural Network as the kernel predictor, shown in Fig. 5(a) with red curves. For 100,000 training samples, the testing error of RotEqNet is 0.1391, and the testing error of the baseline method is 0.2946, an error reduction of 52.77%. The last experiment group, marked with black curves in the figure, trains the kernel predictor on standard positions only.
Based on the experimental results, RotEqNet first of all reaches a better learning performance than simply applying Neural Networks (the baseline method). Training with standard positions lowers the training difficulty, and therefore RotEqNet obtains better performance. Further, Fig. 5(b) shows the error reduction effect of RotEqNet using Random Forests as the kernel predictor. The general performance of Random Forests as a kernel predictor is worse than that of Neural Networks. In Fig. 5(b), blue curves represent the performance of training with raw data by Random Forests (baseline method); red curves represent the performance of RotEqNet; black curves represent the performance of the kernel predictor trained with standard positions. We observe an error reduction of 36.63% in training and 57.58% in testing for RotEqNet with Random Forests.
Moreover, RotEqNet generalizes well without overfitting. When raw data are used directly to train the baseline models, the testing error is much higher than the training error. For example, the differences between training and testing errors are for Neural Networks, for Random Forests when . It is also observable in Fig. 5(a) that the training error of the baseline model with raw data is the lowest, while the testing error of the baseline model is the highest. In this case study, Neural Networks overfit more severely than Random Forests. By contrast, introducing the RotEqNet architecture reduces this gap between training and testing error. As the training and testing errors of RotEqNet show, the errors are much lower. When , there are only for Neural Networks and for Random Forests. In this case study of LES, applying RotEqNet leads to better generalization.
Kernel predictor  Training Error Reduction  Testing Error Reduction 

Neural Networks  98.44%  52.77% 
Random Forests  36.63%  57.58% 
To evaluate the rotation-equivariance property of RotEqNet for the second-order nonlinear case, our experimental process is close to the one stated in Sec. 4.1.3. First, we pick a randomly generated data point . Then we apply the rotation set with 10,000 random rotation matrices to . This error evaluation method (), as defined in Eqn. 4.4, focuses on the model’s error on real data over all rotations. As shown in Tab. 4, both baseline methods, using Neural Networks and Random Forests, have a large error for . The baseline methods have no theoretical guarantee of rotation equivariance. However, there is an error reduction for both machine learning models when they are used within RotEqNet’s architecture. In particular, for RotEqNet with Neural Networks as the kernel predictor, we observe an error reduction of . This is consistent with the rotation-equivariance property of RotEqNet for nonlinear second-order cases.
Model  Baseline (NN)  RotEqNet (NN)  Baseline (RF)  RotEqNet (RF) 

0.0945  0.0025  0.1912  0.0084  
4.3 Case study from testing Newtonian Fluid equation: a third-order case
4.3.1 Problem statement
In this section, we study the performance of RotEqNet for tensors of odd order. Specifically, we set up a third-order test equation, revised from the Newtonian fluid equation in Eqn. 4.2. For simplicity, we call it the ‘testing Newtonian fluid equation’. It can be described as:
(4.6) 
where is the testing stress, is the testing pressure, and is the testing velocity field. is the identity third-order tensor.
Based on this testing equation, we observe that . Since is symmetric and is symmetric, is also symmetric. Therefore, for an arbitrary rotation matrix , this system is rotation-equivariant, that is, .
To quantify the stress for the testing Newtonian fluid equation, this study aims to predict the stress given the simulated pressure and velocity gradient tensor. The machine learning model in this case study therefore takes the pressure and the velocity gradient tensor as input.
4.3.2 Data generation and model description
Based on Eqn. 4.6, we first generate random data to obtain and . The random numbers in follow a normal distribution from range . could be obtained using Eqn. 4.6 from the generated and . Denote the dataset as , . To form a proper dataset with elements for a machine learning model of the testing Newtonian fluid equation, the input is set up to be a vector where . Specifically, is composed of and the flattened in Eqn. 4.6. The output is a vector which is the flattened result of the matrix . To compare our method with the baseline method, we trained two models with the same hyperparameters using different amounts of training data, ranging from . of generated data is used for training and of data is used for testing. A rotation set with 10,000 random rotation matrices is also generated for evaluating rotation equivariance, denoted by . The model setup is the same as in Sec. 4.1.2.
4.3.3 Results
Fig. 6(a) shows the error reduction effect of RotEqNet with a Neural Network as the kernel predictor for the third-order case, with three experimental groups. The first experiment group focuses on the accuracy of the baseline model (Neural Network) on raw data with randomly rotated positions, shown in Fig. 6(a) with blue curves. The second experiment group is RotEqNet with a Neural Network as the kernel predictor, shown in Fig. 6(a) with red curves. For 100,000 training samples, the testing error of RotEqNet is 1.8759 and the testing error of the baseline method is 2.2232, an error reduction of 15.62%. The last experiment group, marked with black curves in the figure, trains the kernel predictor on standard positions only.
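The error reduction quoted above can be checked with a short calculation. The formula below, the relative decrease of the testing error with respect to the baseline, is an assumption about how the percentage is defined; it reproduces the quoted figure. The function name `error_reduction` is hypothetical.

```python
def error_reduction(baseline_error, model_error):
    """Relative error reduction in percent: 100 * (baseline - model) / baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Testing errors quoted in the text for 100,000 training samples.
reduction = error_reduction(2.2232, 1.8759)
print(f"{reduction:.2f}%")   # 15.62%
```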
Based on the experimental results, for the third-order testing equation, RotEqNet reaches a better learning performance than the baseline method. Training with RotEqNet lowers the training difficulty, and therefore RotEqNet obtains better performance. Moreover, RotEqNet has good generalization capability without overfitting. As shown by the blue curves of Fig. 6, if raw data are used directly to train the baseline models, the testing error is much higher than the training error. In this case study, introducing the RotEqNet architecture reduces this gap between training and testing error. As the training and testing errors of RotEqNet show, the errors are much lower. When , there are only for Neural Networks and for Random Forests. In this case study of the testing Newtonian fluid equation, applying RotEqNet leads to better generalization.
Further, Fig. 6(b) shows the error reduction effect of RotEqNet using Random Forests as the kernel predictor. The general performance of Random Forests as a kernel predictor is worse than that of Neural Networks. In Fig. 6(b), blue curves represent the performance of training with raw data by Random Forests (baseline method); red curves represent the performance of RotEqNet; black curves represent the performance of Random Forests trained with standard positions. First, we observe an error reduction of 0.90% in training and 6.84% in testing for RotEqNet with Random Forests. Second, RotEqNet also yields a model that generalizes better. The testing error of the baseline method is observably higher than its training error, while the training and testing performance of RotEqNet is approximately the same. As suggested in Fig. 6, in the third-order case RotEqNet reaches a generalized learning result with lower error than the baseline methods.
Kernel predictor  Training Error Reduction  Testing Error Reduction 

Neural Networks  9.42%  15.62% 
Random Forests  0.90%  6.84% 
Model  Baseline (NN)  RotEqNet (NN)  Baseline (RF)  RotEqNet (RF) 

2.8454  2.6992  3.0788  3.1068  
To evaluate the rotation-equivariance property of RotEqNet for this third-order case, we designed an experimental process that is close to the one stated in Sec. 4.1.3. As shown in Tab. 6, for the baseline method using Neural Networks, the error is relatively large for compared to RotEqNet. In our experiment, we reached an error reduction of . We further discuss this result in Section 4.4.3.
4.4 Case study from Electrostriction: a fourth-order case
4.4.1 Problem statement
This case study focuses on a linear relationship involving a fourth-order tensor. Nye [49] introduced a fourth-order tensor for modeling elastic compliances and stiffnesses, which has been investigated using machine learning methods [50, 51]. Generally, in the study of the properties of a crystalline and anisotropic elastic medium, a fourth-order tensor coefficient is typically used to model the relationship between two symmetric second-order tensors [52]. In this case, we study electrostriction, a property that causes all electrical nonconductors to change their shape under the application of an electric field. The relationship is described as:
(4.7) 
Here is a symmetric traceless second-order strain tensor. where and are first-order electric polarization densities. Note that is symmetric. is the electrostriction coefficient.
Based on the formulation above, this system is symmetric. Since is symmetric, . This guarantees that is also symmetric. Because the system is symmetric, applying a random rotation matrix , .
To quantify the strain in the study of electrostriction, we aim to predict the strain given the simulated electrostriction coefficient and electric polarization density.
4.4.2 Data generation and model description
Based on Eqn. 4.7, we first generate random data to obtain a simulated electrostriction coefficient tensor and electric polarization density tensor . The random numbers follow a normal distribution. is computed from the above setting using and . Denote the dataset as, . To form a proper dataset with elements for a machine learning model of electrostriction, the input is set up to be a vector where . Specifically, is composed of the flattened and . The output is a vector, which is the flattened result of the second-order tensor . To compare our method with the baseline method, we trained two models with the same hyperparameters using different amounts of training data, ranging from . of generated data is used for training, and of data is used for testing. A rotation set with 10,000 random rotation matrices is also generated for evaluating rotation equivariance, denoted by . The model setup is the same as in Sec. 4.1.2. We use NumPy to generate this simulated dataset by generating a random symmetric fourth-order tensor , and second-order tensor . is computed from the generated and by Eqn. 4.7.
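A minimal NumPy sketch of such a data sample is given below. It assumes the common index form of electrostriction, eps_ij = Q_ijkl P_k P_l, which may differ in detail from Eqn. 4.7; the helper names `rotate` and `strain` are hypothetical, and tracelessness of the strain is not enforced here. The sketch also checks numerically that rotating the inputs rotates the output.

```python
import numpy as np

rng = np.random.default_rng(0)

def rotate(tensor, R):
    """Apply the rotation R to every index of a tensor of arbitrary order."""
    for axis in range(tensor.ndim):
        tensor = np.moveaxis(np.tensordot(R, tensor, axes=([1], [axis])), 0, axis)
    return tensor

def strain(Q, P):
    """Assumed electrostriction law: eps_ij = Q_ijkl P_k P_l."""
    return np.einsum("ijkl,k,l->ij", Q, P, P)

# Random fourth-order coefficient, symmetrized in (i, j) and in (k, l).
Q = rng.normal(size=(3, 3, 3, 3))
Q = 0.5 * (Q + Q.transpose(1, 0, 2, 3))
Q = 0.5 * (Q + Q.transpose(0, 1, 3, 2))
P = rng.normal(size=3)                     # polarization density vector

# Random rotation from the QR factorization of a Gaussian matrix.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] = -R[:, 0]

eps = strain(Q, P)
eps_rot = strain(rotate(Q, R), rotate(P, R))
print(np.allclose(eps_rot, R @ eps @ R.T))
```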
4.4.3 Results
The effect of error reduction is evaluated first. The validation error is defined as the mean squared loss using the formulation in Eqn. 4.3. This evaluation represents the expected error of model with dataset . Fig. 7 shows the performance of Neural Networks and Random Forests as kernel predictors separately. In this high-order case, Neural Networks clearly outperform Random Forests. Details are given in the following paragraphs.
Starting with Neural Networks, Fig. 7(a) shows the error reduction effect of RotEqNet with a Neural Network as the kernel predictor. The first experiment group, shown with blue curves, focuses on the accuracy of the baseline model on raw data with randomly rotated positions. The second experiment group is RotEqNet, marked with red curves. The black curves show the performance of the kernel predictor trained on standard positions. For 10,000 training samples, the testing error of RotEqNet is 4.0106 and the testing error of the baseline model is 8.6458, an error reduction of 53.61%. The testing performance of the kernel predictor is evaluated on a testing set with only standard positions. This helps explain the improved performance of RotEqNet.
To interpret the experimental results, first, RotEqNet reaches a better learning performance than simply applying Neural Networks (the baseline method). A dataset with only standard positions is easier to learn than one with random positions. This claim is supported by the black curves in Fig. 7(a): the performance of the kernel predictor is much better than that of the baseline model. RotEqNet obtains better performance by using rotation symmetry as a prior and training the kernel predictor with only standard positions. Moreover, RotEqNet generalizes well without clear overfitting. The training error and testing error of RotEqNet are close to each other, and sometimes the testing error of RotEqNet is even slightly lower than its training error. By contrast, applying raw data in learning directly on would result in an overfitted model. The testing error is much higher than the training error. To demonstrate the improved generalization, for example, when , the difference between training and testing errors for RotEqNet is only while the difference of the baseline method is . In short, with Neural Networks as the kernel predictor, RotEqNet performs better than the baseline method.
Kernel predictor  Training Error Reduction  Testing Error Reduction 

Neural Networks  18.93%  54.63% 
Random Forests  0.58%  2.96% 
Further, Fig. 7(b) shows the error reduction effect of RotEqNet using Random Forests as the kernel predictor. At first glance, the curves for Random Forests are noisy and lack the clear pattern seen in Fig. 7(a). The general performance of Random Forests as a kernel predictor is worse in both accuracy and generalization. In Tab. 7, we observe a training error reduction of 0.58% and a testing error reduction of 2.96%. Although the overall error of RotEqNet is still slightly lower than that of the baseline method, this result is not comparable to the error reduction obtained with Neural Networks as the kernel predictor. In addition, Random Forests as a kernel predictor fail to learn the underlying rule from the standard positions: the black curves in Fig. 7(b) do not show the improvement obtained with Neural Networks. Finally, RotEqNet with Random Forests does not yield better generalization capability. There is no significant reduction of overfitting compared to the baseline method.
Model  Baseline (NN)  RotEqNet (NN)  Baseline (RF)  RotEqNet (RF) 

3.9290  2.7960  4.8976  4.8740  
To evaluate the rotation-equivariance property of RotEqNet for this fourth-order case, we followed the experimental process stated in Sec. 4.1.3. The error measure (), as defined in Eqn. 4.4, focuses on the model’s error on real data over all rotations. As shown in Tab. 8, when using Neural Networks, the baseline method has a large error for . RotEqNet helps preserve rotation equivariance, as shown by the error reduction in for . With Random Forests as the kernel predictor, as discussed in the previous paragraph, Random Forests perform relatively poorly in learning fourth-order data, so the performance of is still affected, which results in a large error in the prediction of RotEqNet with Random Forests.
The large error reduction observed in the case studies opens new opportunities for learning physical systems with rotation symmetry. Most physical systems have rotation symmetry, and currently few works provide a theoretical guarantee of this property for machine learning methods. A key point is to design a properly defined algorithm that obtains a rotation invariant for high-order tensors. This paper has presented RotEqNet, with theoretical and experimental results, to address the problem of rotation symmetry.
We first define a standard position as a rotation invariant that is compatible with high-order tensors. It allows us to extract the rotation invariant of high-order tensors using contraction, diagonalization, and QR factorization. The theoretical guarantee is given in Thm. 3.5, and the algorithm is given in Alg. 3.2.2. RotEqNet is built on Alg. 3.2.2 with a kernel predictor that only deals with standard positions (rotation invariants). With Neural Networks and Random Forests as kernel predictors, these two methods are compared with baseline methods in four case studies covering second-order linear, second-order nonlinear, third-order linear, and fourth-order linear cases. There are three important points to draw from the case studies.
First, the definition of the standard position is successful. The definition of the standard position is not unique. We aim to define a proper version of the standard position that simplifies the learning task by removing the effect of rotation symmetry. In our case, the standard position satisfies the definition of a rotation invariant, which selects a representative point from the orbit of an element via diagonalization (or QR factorization). The experimental results are compatible with this definition of the standard position. In most cases, training kernel predictors with only rotation invariants reaches the lowest error. The reduced error means that the rotation invariant in our definition lowers the difficulty of the learning task, for the reason discussed in Sec. 1.
Second, RotEqNet is rotation-equivariant. As the results of the case studies show, the rotation error is typically low compared to baseline methods. The preservation of rotation equivariance shows the successful design of RotEqNet and supports the correctness of Thm. 3.5. Operating with Alg. 3.2.2, RotEqNet is rotation-equivariant if and only if Thm. 3.5 is correct. Further, this fact contributes to the error reduction of RotEqNet. As stated in the previous paragraph, training with rotation invariants results in a lower error. Combined with rotation equivariance, this allows RotEqNet to process the system under any rotation.
The two reasons above are the main causes of the error reduction of RotEqNet. One further point is the selection of the kernel predictor. The choice of kernel predictor affects the learning results significantly, since the kernel predictor is essential in learning the physical system without the effect of rotation symmetry. Neural Networks are a strong choice in the design of data-driven methods for physical systems because of their flexibility in approximating functions. We only reported the performance of Neural Networks and Random Forests, following previous work by Ling [16]. As described in Sec. 4.4.3, the performance of Random Forests is limited compared to Neural Networks. Also, as a general trend in the previous experiments, Neural Networks usually reach better performance than Random Forests. In short, we believe that Neural Networks as a kernel predictor have a number of advantages over other machine learning models.
We wish to further discuss another error evaluation method for the rotation-equivariance property that is not mentioned in the case studies. Consider an error evaluation that measures the rotation error of the model itself, defined as:
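The idea of a standard position as a rotation invariant can be illustrated for the simplest case, a symmetric second-order tensor. The sketch below is not Alg. 3.2.2; it is a simplified instance that assumes the standard position is the diagonal matrix of eigenvalues in descending order (the ordering convention and the name `standard_position` are assumptions).

```python
import numpy as np

def standard_position(T):
    """Diagonal form of a symmetric second-order tensor, eigenvalues sorted descending."""
    eigvals = np.linalg.eigvalsh(T)      # ascending eigenvalues of a symmetric matrix
    return np.diag(eigvals[::-1])

rng = np.random.default_rng(0)
T = rng.normal(size=(3, 3))
T = 0.5 * (T + T.T)                      # symmetric test tensor

R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthogonal matrix
if np.linalg.det(R) < 0:
    R[:, 0] = -R[:, 0]                          # make it a proper rotation

# Every element of the orbit {R T R^T} maps to the same representative.
same = np.allclose(standard_position(T), standard_position(R @ T @ R.T))
print(same)
```

A kernel predictor trained on such representatives only sees one point per orbit, which is the sense in which the learning task is simplified.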
(4.8) 
The evaluation of this error is essentially trivial, since Theorem 3.5 already provides a rigorous proof of the rotation-equivariance property of RotEqNet. We applied this evaluation in the first two case studies, and the estimated error is around for all these cases.
For future work, there are three directions: a better definition of the standard position, application to other groups, and generalization to nonsymmetric systems. First, with the current definition, the rotation invariant of odd-order tensors does not reach the same performance as that of even-order tensors. Revising the definition of the standard position for odd-order tensors would be worthwhile. Second, besides rotation symmetries, there are also physical systems with other group-equivariant properties such as scaling and translation. This work could provide a method for solving problems with other groups, but the detailed design of the algorithm will differ from case to case. Third, the current work can only deal with symmetric systems. However, in the general case where the system is not symmetric, there are ways to use RotEqNet on a symmetric system in order to solve a nonsymmetric system. A useful trick to consider, for example, is to deal with , where is a matrix. This suggests a way to extend our current work to nonsymmetric physical systems.
Acknowledgments
G. Lin would like to acknowledge the support from the National Science Foundation (DMS-1555072 and DMS-1736364).
5 Appendix
5.1 Lemma 2.1
Proof.
We will use column vector convention to represent vectors in . Let and be vectors in . Then
(A1) 
Then,
(A2)  
(A3)  
(A4)  
(A5) 
Therefore,
(A6) 
∎
5.2 Lemma 2.2
Proof.
We will use column vector convention to represent vectors in . Let be vector in . Then
(A7) 
Therefore,
(A8) 
∎
5.3 Lemma 2.3
Proof.
Since both and are linear, we may assume that is of the form .
(A9)  
(A10) 
Since is a rotation, it preserves the inner product i.e.
(A11) 
So
(A12)  
(A13)  
(A14) 
∎
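The statement used later in the proof of the theorem, that rotation commutes with contraction, can be checked numerically. The sketch below is an illustration for a generic fourth-order tensor contracted over its first two indices; the helper names `contract` and `rotate4` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(size=(3, 3, 3, 3))              # generic fourth-order tensor

R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthogonal matrix from QR
if np.linalg.det(R) < 0:
    R[:, 0] = -R[:, 0]                         # proper rotation

def contract(T):
    """Contract the first two indices: C_kl = T_iikl."""
    return np.einsum("iikl->kl", T)

def rotate4(T, R):
    """Rotate all four indices: T'_abcd = R_ai R_bj R_ck R_dl T_ijkl."""
    return np.einsum("ai,bj,ck,dl,ijkl->abcd", R, R, R, R, T)

lhs = contract(rotate4(T, R))      # rotate, then contract
rhs = R @ contract(T) @ R.T        # contract, then rotate
print(np.allclose(lhs, rhs))
```

The check succeeds because R_ai R_aj equals the Kronecker delta, which is the inner-product preservation used in the proof.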
5.4 Proof of Theorem 1
Proof.
Since the position standardization algorithm defines the standard position differently for even and odd orders, we treat the even and odd cases separately.
Suppose has even order.
Let be the sequence of contraction along the first two axes such that , where is a secondorder tensor as described in the algorithm.
Given an arbitrary even-order tensor , we can contract it to a second-order tensor via the first two indices:
(A15) 
For , using Lemma 2.1, there exists a rotation such that:
(A16) 
where . is diagonalizable because it is symmetric. Since is represented by an orthogonal matrix, .
Based on Lemma 2.3, we know that rotation commutes with contraction. Therefore, based on , the standard position is defined as
(A17) 
Consider a rotation operation in its matrix form. When we act with on we obtain a new tensor . Applying contraction to this new tensor gives:
(A18) 
For , by Eqn. A16 and Lemma 2.1,
(A19) 
For its standard position we have:
(A20) 
In summary, for a rotation operation acting on an even-order tensor ,
(A21) 
This satisfies the definition of a rotation invariant. Therefore, in the even case, the standard position is a rotation invariant.
Suppose has odd order.
Let be the sequence of contraction along the first two axes such that , where is a third-order tensor as described in the algorithm.
(A22) 
Let be vectors of contraction operation on via different axes, i.e.,
(A23) 
Based on , we have
(A24) 
In this case,
(A25) 
Consider any rotation operation acting on . We have,
(A26) 
Using QR factorization,
(A27) 
The standard position of will be defined as:
(A28) 
Using Remark 2.2, we obtain
(A29) 
Considering and , for the same reason, we have
(A30) 
By reorganizing A24, A27, and A30,
(A31) 
Since the QR factorization is unique [53], we have that . Therefore,
(A32) 
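The role of QR uniqueness in this step can be illustrated numerically. The sketch below assumes an invertible 3x3 matrix `A` (standing in for the stacked contraction vectors, which is an assumption about the setup) and normalizes the factorization so that the triangular factor has a positive diagonal, the convention under which the factorization is unique. The helper name `qr_positive` is hypothetical.

```python
import numpy as np

def qr_positive(A):
    """QR factorization normalized so that R has a positive diagonal."""
    Q, R = np.linalg.qr(A)
    signs = np.sign(np.diag(R))
    return Q * signs, signs[:, None] * R   # (Q D)(D R) = Q R because D @ D = I

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))                # hypothetical invertible matrix

G, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(G) < 0:
    G[:, 0] = -G[:, 0]                     # proper rotation

Q1, R1 = qr_positive(A)
Q2, R2 = qr_positive(G @ A)

# The rotation is absorbed by the orthogonal factor; the triangular factor is unchanged.
print(np.allclose(Q2, G @ Q1), np.allclose(R1, R2))
```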
Plugging A32 into A28, comparing the result of A25 we have: