# Dynamically Stable Poincaré Embeddings for Neural Manifolds

In a Riemannian manifold, the Ricci flow is a partial differential equation that evolves the metric to become more regular. We expect that topological structures derived from such evolving metrics can assist machine learning tasks, but this connection has so far been missing. In this paper, we bridge the gap between the Ricci flow and deep neural networks through dynamically stable Poincaré embeddings for neural manifolds. We prove that, if an initial metric deviates from the Hyperbolic metric on the Poincaré ball by an L²-norm perturbation, the scaled Ricci-DeTurck flow of such a metric converges smoothly and exponentially to the Hyperbolic metric. Specifically, the role of the Ricci flow is to evolve the metric naturally toward the stable Poincaré ball, which is then mapped back to the Euclidean space. For neural manifolds that are dynamically stable under the Ricci flow, the convergence of the neural networks embedded with them is not susceptible to perturbations. We also show that such Ricci flow assisted neural networks outperform their all-Euclidean counterparts on image classification tasks (CIFAR datasets).

11/16/2021


## 1 Introduction

In the field of machine learning, Euclidean embeddings are the universal and successful method for representation learning, benefiting from the simple, convenient, closed-form formulas available in the Euclidean space endowed with the Euclidean metric. Moreover, Euclidean embeddings dominate the broad applications of neural networks, including image classification (Krizhevsky et al., 2012; Simonyan and Zisserman, 2014), semantic segmentation (Long et al., 2015; Chen et al., 2014), object detection (Girshick, 2015), etc.

Recently, some studies have shown that the latent non-Euclidean structure in many datasets limits the quality of Euclidean embeddings (Bronstein et al., 2017). In such cases, the advantages of Hyperbolic embeddings are highlighted. Firstly, the Hyperbolic space provides more powerful and meaningful geometrical representations than the Euclidean space. Secondly, for the weight initialization of a neural network, the distribution is generally Gaussian (Glorot and Bengio, 2010; He et al., 2015), and the family of Gaussian distributions forms a manifold of constant negative curvature (Amari, 2016), i.e., the Hyperbolic space (Amari et al., 1987). Clearly, the adoption of Hyperbolic embeddings in neural networks and deep learning has become very attractive.

For hierarchical, taxonomic or entailment data, Hyperbolic embeddings outperform Euclidean embeddings in machine learning (Sala et al., 2018; Ganea et al., 2018). For tree structures, no Euclidean space, even of infinite dimension, admits an embedding with arbitrarily low distortion, whereas the Hyperbolic space with only 2 dimensions can preserve their metric (Sala et al., 2018). As for basic operations (e.g., matrix addition, matrix-vector multiplication, etc.) in the Hyperbolic space, Ganea et al. gave appropriate deep learning tools (Ganea et al., 2019).

Despite the successful application of Hyperbolic embeddings in deep neural networks, two questions have not yet been addressed: how to avoid the perturbation of the parameter update on the stability of the Hyperbolic space, and how to combine the respective advantages of the Euclidean space and the Hyperbolic space. In this paper, we propose to use the Ricci flow to assist the training of neural networks in a dynamically stable Poincaré ball, because, in certain cases, the Ricci flow can naturally make a Riemannian manifold converge by evolving its metric.

Ye has considered the stability of compact negatively curved manifolds (Ye, 1993). Suneeta has investigated the linearised stability of the Hyperbolic space under the Ricci flow (Suneeta, 2009). Li et al. have proven stability of the Hyperbolic space when the deviation of the curvature of the initial metric from the Hyperbolic space decays exponentially (Li and Yin, 2010). Building on this work, Schnürer et al. yielded stability of the Hyperbolic space when the initial metric is close to the Hyperbolic metric (Schnürer et al., 2010). This series of works shows that, in certain cases, the Ricci flow can be used to eliminate the influence of a perturbation in the Hyperbolic space.

There are two main contributions in this paper. On the one hand, we prove that, when the initial metric is close to the Hyperbolic metric (Definition 3.2), the Poincaré ball under the scaled Ricci-DeTurck flow converges exponentially under an L²-norm perturbation (Lemma 3.7). This result is meaningful because it shows that the Ricci flow can help Poincaré embedded neural network manifolds eliminate an L²-norm perturbation of the metric. The Ricci flow guarantees the dynamical stability of neural manifolds during the training of neural networks with respect to transformed input data.

On the other hand, we propose a Ricci flow assisted Eucl2Hyp2Eucl neural network that alternates between the Euclidean space and the dynamically stable Poincaré ball. The illustration is shown in Figure 1. Specifically, we map the neural network (the output before the softmax) from the Euclidean space to the Poincaré ball with an L²-norm perturbation. Then, the Ricci flow is used to converge exponentially to the Poincaré ball without an L²-norm perturbation. Finally, we map the neural network from the Poincaré ball back to the Euclidean space. In general, Eucl2Hyp2Eucl neural networks take into account the respective advantages of Euclidean and Poincaré embeddings, i.e., broad optimization strategies and meaningful geometrical representations. The algorithm is shown in Algorithm 1.

The rest of this paper is organized as follows. Section 2 summarizes basic results on the Ricci flow. The proofs of the convergence of the Poincaré ball under the Ricci flow are presented in Section 3. In Section 4, we present Ricci flow assisted Eucl2Hyp2Eucl neural networks. For the performance on CIFAR datasets, we compare Ricci flow assisted neural networks with their all-Euclidean counterparts in Section 5. The conclusion is given in Section 6.

## 2 Ricci Flow

For a Riemannian manifold M with metric g, the Ricci flow was introduced by Hamilton (Hamilton and others, 1982) and became the central tool in the proof of Thurston’s Geometrization Conjecture and consequently the Poincaré Conjecture. The Ricci flow is a partial differential equation that evolves the metric:

 ∂/∂t g(t) = −2 Ric(g(t)),  g(0) = g₀. (1)

where g(t) is a time-dependent Riemannian metric and Ric denotes the Ricci curvature tensor, whose definition can be found in Appendix A.

The idea is to evolve the Riemannian metric in some way that makes the manifold become more regular, which can also be understood as becoming rounder from the topological point of view. This continuous process is known as manifold “surgery”.
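As a concrete toy instance of Eq. 1 (our illustration, not an example from the paper): on the round sphere Sⁿ, writing g(t) = ρ(t)² g_unit gives Ric = (n − 1) g_unit, so the flow reduces to the ODE d(ρ²)/dt = −2(n − 1), whose solution ρ(t)² = ρ(0)² − 2(n − 1)t shrinks the sphere to a point in finite time. A minimal numeric sketch:

```python
# Ricci flow on the round sphere S^n, reduced to d(rho^2)/dt = -2 (n - 1).
def shrink(rho0_sq, n, t, steps=1000):
    """Explicit Euler integration of the reduced flow up to time t."""
    rho_sq, dt = rho0_sq, t / steps
    for _ in range(steps):
        rho_sq -= 2.0 * (n - 1) * dt
    return rho_sq

numeric = shrink(4.0, n=2, t=1.0)          # rho^2 on S^2 after t = 1
closed_form = 4.0 - 2.0 * (2 - 1) * 1.0    # rho(t)^2 = rho(0)^2 - 2(n-1) t
```

The singular time here is T = ρ(0)²/(2(n − 1)) = 2, at which the curvature (n − 1)/ρ² blows up, foreshadowing Theorem 2.7 below.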

### 2.1 Short Time Existence

If the Ricci flow is strongly parabolic, there exists a unique solution for a short time.

###### Theorem 2.1.

When g(t) is a time-dependent section of the vector bundle of symmetric 2-tensors over some Riemannian manifold M, if the system of the Ricci flow is strongly parabolic at g₀, then there exists a solution on some time interval [0, T), and the solution is unique for as long as it exists.

###### Proof.

The proofs can be found in (Ladyzhenskaia et al., 1988). ∎

###### Definition 2.2.

The Ricci flow is strongly parabolic if there exists a δ > 0 such that, for all covectors φ and all symmetric tensors h, the principal symbol of −2Ric satisfies

 [−2Ric](φ)(h)_{ij} h^{ij} = g^{pq}(φ_p φ_q h_{ij} + φ_i φ_j h_{pq} − φ_q φ_i h_{jp} − φ_q φ_j h_{ip}) h^{ij} > δ φ_k φ^k h_{rs} h^{rs}.

Since the above inequality cannot always be satisfied, the Ricci flow is not strongly parabolic. Hence, one cannot use Theorem 2.1 directly to prove the existence of the solution.

To understand which terms cause the failure of parabolicity, one linearizes the Ricci curvature tensor.

###### Lemma 2.3.

The linearization of −2Ric can be rewritten as

 D[−2Ric](h)_{ij} = g^{pq} ∇_p ∇_q h_{ij} + ∇_i V_j + ∇_j V_i + O(h_{ij}), (2)  where V_i = g^{pq} (½ ∇_i h_{pq} − ∇_q h_{pi}).
###### Proof.

The proofs can be found in Appendix B.1. ∎

In particular, the term O(h_{ij}) makes no contribution to the principal symbol of D[−2Ric]; for convenience of our discussion, we simply ignore it. By carefully observing the above equation, one finds that the failure of parabolicity of the Ricci flow comes from the terms in V_i, not from the term g^{pq} ∇_p ∇_q h_{ij}. The solution is the DeTurck Trick (DeTurck, 1983), which introduces a time-dependent reparameterization of the manifold:

 ∂/∂t ḡ(t) = −2 Ric(ḡ(t)) − L_{∂φ(t)/∂t} ḡ(t),  ḡ(0) = ḡ₀. (3)

See Appendix B.2 for details. By choosing φ(t) to cancel the effect of the terms in V_i, the reparameterized Ricci flow is strongly parabolic. Thus, one gets that the Ricci-DeTurck flow (see (Sheridan and Rubinstein, 2006) for an equivalent expression) has a unique solution for a short time.

### 2.2 Curvature Explosion at Singularity

Subsequently, we will present the behavior of the Ricci flow in finite time and show that the evolution of the curvature tends to develop singularities. Before giving the core result, Theorem 2.7, some preparatory results need to be established.

###### Theorem 2.4.

Given a smooth Riemannian metric g₀ on a closed manifold M, there exists a maximal time interval [0, T) such that a solution g(t) of the Ricci flow, with g(0) = g₀, exists and is smooth on [0, T), and this solution is unique.

###### Proof.

The proofs can be found in (Sheridan and Rubinstein, 2006). ∎

###### Theorem 2.5.

Let M be a closed manifold and g(t) a smooth time-dependent metric on M, defined for t ∈ [0, T). If there exists a constant C such that, for all x ∈ M,

 ∫₀ᵀ |∂/∂t g_x(t)|_{g(t)} dt ≤ C, (4)

then the metrics g(t) converge uniformly, as t approaches T, to a continuous metric g(T) that is uniformly equivalent to g(0) and satisfies

 e^{−C} g_x(0) ≤ g_x(T) ≤ e^{C} g_x(0).
###### Proof.

The proofs can be found in Appendix B.3. ∎
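A toy numeric check of Theorem 2.5 for a one-dimensional family of metrics (our illustration; the family g(t) = e^{a(t)} g(0) with a(t) = 0.5 sin t is an arbitrary choice). Here |∂g/∂t|_g = |a′(t)|, so C = ∫₀ᵀ |a′(t)| dt, and g(T) must lie between e^{−C} g(0) and e^{C} g(0):

```python
import numpy as np

# Scalar metric family g(t) = exp(a(t)) * g0 with a(t) = 0.5*sin(t):
# |d/dt g|_g = |a'(t)|, so the constant of Eq. (4) is C = int_0^T |a'| dt.
g0, T, steps = 3.0, 2.0, 100000
t = np.linspace(0.0, T, steps)
dt = T / (steps - 1)
C = np.sum(np.abs(0.5 * np.cos(t))) * dt   # Riemann sum for Eq. (4)
gT = np.exp(0.5 * np.sin(T)) * g0
lower, upper = np.exp(-C) * g0, np.exp(C) * g0   # the bounds of Theorem 2.5
```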

###### Corollary 2.6.

Let g(t) be a solution of the Ricci flow on a closed manifold. If |Rm| is bounded on a finite time interval [0, T), then g(t) converges uniformly as t approaches T to a continuous metric which is uniformly equivalent to g(0).

###### Proof.

The bound on |Rm| implies one on |Ric|. Based on Eq. 1, this extends to a bound on |∂g/∂t|. The integral of a bounded quantity over a finite interval is itself bounded, so the claim follows by Theorem 2.5. ∎

###### Theorem 2.7.

If g₀ is a smooth metric on a compact manifold M, the Ricci flow with g(0) = g₀ has a unique solution g(t) on a maximal time interval t ∈ [0, T). If T < ∞, then

 limt→T(supx∈M|Rmx(t)|)=∞. (5)
###### Proof.

For a contradiction, we assume that |Rm| is bounded by a constant. It follows from Corollary 2.6 that the metrics g(t) converge uniformly, in the norm induced by g(0), to a smooth metric g(T). Based on Theorem 2.4, it is possible to find a solution to the Ricci flow starting at g(T), because the smooth metric g(T) is uniformly equivalent to the initial metric g(0).

Hence, one can extend the solution of the Ricci flow past the time point T, with continuous derivatives at T. But then the time of existence of the Ricci flow was not maximal, which contradicts our assumption. In other words, |Rm| is unbounded. ∎

According to Theorem 2.7, the Riemann curvature diverges and tends to explode as t approaches the singular time T.

## 3 The Poincaré Ball under Ricci Flow

### 3.1 Basics of Hyperbolic Space and The Poincaré Ball

The hyperbolic space has several isometric models (Anderson, 2006). In this paper, similarly to (Nickel and Kiela, 2017) and (Ganea et al., 2018), we choose the n-dimensional Poincaré ball D^n_r.

Empirically, the Poincaré ball can be defined as the background manifold D^n_r endowed with the Hyperbolic metric:

 g^H_x = λ_x² g^E,  where λ_x := 2/(1 − r‖x‖²). (6)

Note that the Euclidean metric g^E is equal to the identity matrix. For r > 0, D^n_r denotes the open ball of radius 1/√r. In particular, if r → 0, then one recovers the Euclidean space ℝⁿ.

By Corollary 3.1, the Riemannian gradient with respect to g^H at any point x ∈ D^n_r is given by (note that the Riemannian gradient is analogous to the natural gradient (Martens and Grosse, 2015; Martens, 2020) on the Riemannian manifold defined by the KL divergence):

 ∂^H_x = (1/λ_x²) ∂^E,  where λ_x := 2/(1 − r‖x‖²). (7)
###### Corollary 3.1.

In view of information geometry (Amari, 2016), the steepest descent direction in a Riemannian manifold endowed with a metric g satisfies

 ∂^g = g^{−1} ∂^E, (8)

where ∂^E is the steepest descent direction in the Euclidean space.

###### Proof.

The proofs can be found in Appendix D. ∎
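A small NumPy sketch of Eqs. 6–8 (the point x and gradient values are illustrative assumptions):

```python
import numpy as np

def lam(x, r=1.0):
    """Conformal factor lambda_x = 2 / (1 - r ||x||^2) of Eq. (6)."""
    return 2.0 / (1.0 - r * np.dot(x, x))

def hyperbolic_metric(x, r=1.0):
    """g^H_x = lambda_x^2 * g^E, with g^E the identity matrix."""
    return lam(x, r) ** 2 * np.eye(len(x))

def riemannian_grad(x, eucl_grad, r=1.0):
    """Eq. (7)/(8): the Riemannian gradient (g^H)^{-1} dE = dE / lambda_x^2."""
    return eucl_grad / lam(x, r) ** 2

x = np.array([0.1, 0.2])
dE = np.array([1.0, -1.0])   # hypothetical Euclidean gradient
dH = riemannian_grad(x, dE)
```

Near the boundary (r‖x‖² → 1) the conformal factor blows up, and the Riemannian gradient shrinks accordingly.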

### 3.2 The Hyperbolic Metric

As the Hyperbolic space evolves under the Ricci flow, it is convenient to consider the rescaled Ricci-DeTurck flow (Schnürer et al., 2010), derived from Eq. 3:

 ∂/∂t ḡ(t) = −2 Ric(ḡ(t)) + ∇_i W_j + ∇_j W_i − 2(n − 1) ḡ(t),  ḡ(0) = ḡ₀, (9)  where W_i = g^{pq} g_{ij} (Γ(ḡ)^j_{pq} − Γ(g^H)^j_{pq}).

The Hyperbolic metric g^H on D^n_r, of sectional curvature −1, is a stationary point of Eq. 9.

Subsequently, we will discuss whether the perturbed Poincaré ball still uniformly converges to the Poincaré ball under the rescaled Ricci-DeTurck flow when the given perturbation satisfies Definition 3.2 (we assume that the perturbation caused by each iteration of the training of neural networks satisfies Definition 3.2). Obviously, the introduction of the Ricci flow can ensure dynamically stable neural manifolds.

###### Definition 3.2.

Let ḡ be a metric on D^n_r. If there exists an ϵ > 0 such that

 (1 + ϵ)^{−1} g^H ≤ ḡ ≤ (1 + ϵ) g^H,

then ḡ is said to be ϵ-close to g^H.
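For positive-definite metrics, the two-sided bound in Definition 3.2 is equivalent to the generalized eigenvalues of (g^H)^{-1} ḡ lying in [(1 + ϵ)^{-1}, 1 + ϵ]. A small sketch of this check (the test matrices are illustrative):

```python
import numpy as np

def is_eps_close(g_bar, g_hyp, eps):
    """Check (1+eps)^{-1} g^H <= g_bar <= (1+eps) g^H as quadratic forms,
    via the eigenvalues of (g^H)^{-1} g_bar."""
    evals = np.linalg.eigvals(np.linalg.solve(g_hyp, g_bar)).real
    return bool(np.all(evals >= 1.0 / (1.0 + eps)) and np.all(evals <= 1.0 + eps))

g_hyp = 4.0 * np.eye(2)          # stand-in for g^H at some point
g_bar = 4.2 * np.eye(2)          # a 5% conformal perturbation of it
```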

### 3.3 Finite Time Existence

We denote the norm of a tensor T as |T|. Then:

###### Lemma 3.3.

Given a Riemannian metric ḡ₀ on D^n_r, there exists a maximal time interval [0, T) such that a solution ḡ(t) to Eq. 9 exists and is smooth (i.e., C^∞: any derivative is continuous) on [0, T). Specifically, ḡ₀ is ϵ-close to g^H of sectional curvature −1. If ϵ is small enough, then

 ∂/∂t |ḡ − g^H|² ≤ Δ|ḡ − g^H|² − 2|∇(ḡ − g^H)|² + 4|ḡ − g^H|², (10)

where Δ is the Laplacian defined by ḡ.

###### Proof.

The proofs can be found in Appendix C. ∎

###### Corollary 3.4.

Given the Poincaré ball D^n_r with boundary ∂D^n_r, there exists a maximal interval [0, T) such that a solution ḡ(t) on D^n_r to Eq. 9 exists and is smooth on [0, T). Specifically, ḡ(0) = ḡ₀. There exists a constant C such that

 sup_{D^n_r} |ḡ − g^H| ≤ C. (11)
###### Proof.

As long as Definition 3.2 is satisfied, Lemma 3.3 gives the proofs. ∎

We denote the L^∞-norm with respect to the Hyperbolic metric g^H as ‖·‖_{L^∞}; then:

###### Theorem 3.5.

For a solution ḡ(t) on D^n_r to Eq. 9 that exists and is smooth on a maximal time interval [0, T), if ḡ(0) is a metric on D^n_r satisfying ‖ḡ(0) − g^H‖_{L^∞} ≤ ϵ, where ϵ is small enough, then there exists a constant C such that

 ‖ḡ(t) − g^H‖_{L^∞} ≤ C ‖ḡ(0) − g^H‖_{L^∞} ≤ ϵ·C. (12)
###### Proof.

The proof follows similar statements in (Simon, 2002; Bamler, 2010). ∎

Empirically, Theorem 3.5 yields that the solution to the rescaled Ricci-DeTurck flow exists in finite time. Moreover, Corollary 3.4 gives an upper bound on |ḡ − g^H|, which allows us to integrate it.

### 3.4 Exponential Convergence

###### Theorem 3.6.

Based on Corollary 3.4, we further have

 ∫_{D^n_r} |ḡ(t) − g^H|² dΩ ≤ e^{−(A(n,r)−4)t} ∫_{D^n_r} |ḡ(0) − g^H|² dΩ, (13)

where dΩ is the volume element with respect to g^H.

###### Proof.

Using Lemma 3.3, we yield

 ∂/∂t ∫_{D^n_r} |ḡ − g^H|² dΩ ≤ 4 ∫_{D^n_r} |ḡ − g^H|² dΩ + ∫_{D^n_r} (ḡ^{ij} ∇_i ∇_j |ḡ − g^H|² − 2|∇(ḡ − g^H)|²) dΩ = 4 ∫_{D^n_r} |ḡ − g^H|² dΩ − ∫_{D^n_r} (2|∇(ḡ − g^H)|² + ∇_i ḡ^{ij} ∇_j |ḡ − g^H|²) dΩ.

In the second step, we write ḡ^{ij} ∇_i ∇_j = ∇_i(ḡ^{ij} ∇_j) − (∇_i ḡ^{ij}) ∇_j and discard the divergence term by Eq. 14.

As ḡ − g^H vanishes on ∂D^n_r, we compute, using the Stokes theorem (Wald, 2010),

 ∫_{D^n_r} ∇_i(ḡ^{ij} ∇_j) |ḡ − g^H|² dΩ = ∫_{∂D^n_r} n_i ḡ^{ij} ∇_j |ḡ − g^H|² dS = 0, (14)

where dS is the area element and n_i is the outer normal vector with respect to ∂D^n_r. We define

 A(n, r) = inf [ ∫_{D^n_r} (2|∇(ḡ − g^H)|² + ∇_i ḡ^{ij} ∇_j |ḡ − g^H|²) dΩ / ∫_{D^n_r} |ḡ − g^H|² dΩ ], (15)

then

 ∂/∂t ∫_{D^n_r} |ḡ − g^H|² dΩ ≤ (4 − A(n, r)) ∫_{D^n_r} |ḡ − g^H|² dΩ. (16)

Viewing this as a differential inequality and writing F(t) = ∫_{D^n_r} |ḡ(t) − g^H|² dΩ, we integrate:

 ∫ (∂F(t)/∂t) / F(t) dt ≤ (4 − A(n, r)) ∫ dt
 ⟹ log F(t) ≤ (4 − A(n, r)) t + log F(0)
 ⟹ F(t) ≤ e^{−(A(n,r)−4)t} F(0).

Based on Theorem 3.5, we have A(n, r) > 4 because ‖ḡ(t) − g^H‖ decays. ∎
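The Grönwall step in the proof can be checked numerically on the saturated inequality ∂F/∂t = (4 − A)F (our toy check; A = 6 is an arbitrary value with A > 4):

```python
import numpy as np

A, F0, t_end, steps = 6.0, 1.0, 2.0, 200000
dt, F = t_end / steps, F0
for _ in range(steps):
    F += (4.0 - A) * F * dt              # explicit Euler on dF/dt = (4 - A) F
bound = np.exp(-(A - 4.0) * t_end) * F0  # the claimed exponential bound
```

The Euler iterates satisfy F ≤ bound since (1 + x) ≤ eˣ at every step.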

###### Lemma 3.7.

Based on Theorem 3.6, we yield the estimate

 ‖ḡ(t) − g^H‖²_{L²(D^n_r)} ≤ e^{−(A(n,r)−4)t} ‖ḡ(0) − g^H‖²_{L²(D^n_r)}, (17)

where ‖·‖_{L²(D^n_r)} is the L²-norm with respect to the Hyperbolic metric g^H.

###### Proof.

The proofs follow directly from Theorem 3.5 and Corollary 3.4. ∎

Consequently, we see that the scaled Ricci-DeTurck flow converges exponentially under an L²-norm perturbation.

## 4 Ricci Flow Assisted Neural Networks

### 4.1 Poincaré Embeddings

On the one hand, a neural network is trained on the given dataset and its representation will gradually become regular. On the other hand, the Ricci flow is a process of “surgery” on a manifold, which will make the manifold also become regular. In this way, we can embed a Riemannian manifold into the neural network, and utilize the Ricci flow to assist in the training of neural networks on dynamically stable manifolds (Chen et al., 2021).

Based on the previous section, we have embedded the Poincaré ball into a neural network. Initial metrics with an L²-norm perturbation deviating from the Hyperbolic metric g^H will converge to g^H under the Ricci flow. This is valuable because we embed a dynamically stable Poincaré ball into neural networks, which does not affect the convergence of the neural networks.

Empirically, for each input, we can embed an n-dimensional Poincaré ball into the output of the neural network before the softmax. Time-dependent metrics ḡ(t) corresponding to the output can then be well-defined.

Now, let us examine the Ricci curvature of neural networks. According to Eq. 1, the tensor ∂ḡ/∂t approaches zero as Ric(ḡ) approaches zero. Referring to Appendix A, we can yield

 −2Ric(ḡ)_{jk} = −2R^i_{ijk} = ḡ^{ip}(∂_i ∂_j ḡ_{pk} − ∂_i ∂_k ḡ_{pj} + ∂_p ∂_k ḡ_{ij} + ∂_p ∂_j ḡ_{ik}). (18)

Inspired by (Kaul and Lall, 2019), we treat the terms ∂_i ∂_j ḡ_{pk} − ∂_i ∂_k ḡ_{pj} and ∂_p ∂_k ḡ_{ij} + ∂_p ∂_j ḡ_{ik} as the translation and rotation parts, respectively, considering translation invariance instead of rotation invariance. As for rotations, the standard data augmentation does not include such transformations. For the fairness of ablation studies, we simply exclude rotations, i.e., ∂_p ∂_k ḡ_{ij} + ∂_p ∂_j ḡ_{ik} = 0. Therefore, ∂_j and ∂_k can be taken as the row and column translations, respectively, of the input data. Consequently, we have

 −2Ric(ḡ) = ḡ^{ip} ∂_i (∂_j ḡ_{pk} − ∂_k ḡ_{pj}). (19)

For a convenient form, we approximate the partial derivatives with finite differences with respect to the input translation dimensions k and j:

 ∂_k ḡ = (ḡ|_{k1} − ḡ|_{k2}) / (k1 − k2), (20)
 ∂_j ḡ = (ḡ|_{j1} − ḡ|_{j2}) / (j1 − j2). (21)

In general, k1 − k2 and j1 − j2 are translations of fewer than 4 pixels, which is consistent with data augmentation.

### 4.2 Mutual Mapping of Euclidean Space and The Poincaré Ball

We consider alternating neural manifolds between Euclidean embeddings and Poincaré embeddings in back-propagation, which greatly retains the common advantages of the Euclidean space and Hyperbolic space.

Firstly, we map the neural manifold from the Euclidean space to the Poincaré ball, where the metric ḡ is a solution to the Ricci flow. By adding the regularization to the neural network, the deviation ḡ − g^H is driven toward zero so that ḡ satisfies Definition 3.2. Secondly, we perform the Ricci flow to evolve the metric toward the Hyperbolic metric g^H. Thirdly, we map the neural manifold from the Poincaré ball back to the Euclidean space. Fourthly, we complete the backpropagation of the gradient for the neural network.

Since the Poincaré ball is conformal to the Euclidean space, we give the exponential and logarithmic maps (in (Ganea et al., 2019), the exponential and logarithm maps are also used for the mutual mapping between the Euclidean space and the Poincaré ball):

###### Lemma 4.1.

With respect to the origin 0, the exponential map exp_r and the logarithmic map log_r are given, for μ ∈ ℝⁿ and ν ∈ D^n_r, by:

 exp_r(μ) = tanh(√r ‖μ‖) μ / (√r ‖μ‖), (22)
 log_r(ν) = tanh⁻¹(√r ‖ν‖) ν / (√r ‖ν‖). (23)
###### Proof.

The proofs follow (Ganea et al., 2018). An algebraic check confirms the identity log_r(exp_r(μ)) = μ. ∎
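A minimal NumPy sketch of the maps in Lemma 4.1 together with the Eucl2Hyp2Eucl round trip (the input vector is an illustrative stand-in for a pre-softmax output; the actual metric evolution between the two maps is Eq. 9):

```python
import numpy as np

def exp_r(mu, r=1.0):
    """Exponential map of Eq. (22) at the origin: Euclid -> Poincare ball."""
    s = np.sqrt(r) * np.linalg.norm(mu)
    return mu if s == 0 else np.tanh(s) * mu / s

def log_r(nu, r=1.0):
    """Logarithmic map of Eq. (23): Poincare ball -> Euclid."""
    s = np.sqrt(r) * np.linalg.norm(nu)
    return nu if s == 0 else np.arctanh(s) * nu / s

x = np.array([0.7, -1.2, 0.4])   # hypothetical pre-softmax features
z = exp_r(x)                     # into the ball: r * ||z||^2 < 1
x_back = log_r(z)                # back to Euclidean space
```

The round trip log_r(exp_r(μ)) = μ is exactly the algebraic identity used in the proof.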

The overall process is illustrated in Figure 1, where the maps exp_r and log_r are the exponential and logarithm maps.

### 4.3 Eucl2Hyp2Eucl Neural Networks

The above discussion lays the groundwork for designing neural networks. The precise computation of a neural network with l layers is performed as follows:

 x = f(a; θ, b), (24)
 f(a; θ, b) = σ_l[⋯ σ₂(σ₁(a θ₁ + b₁) θ₂ + b₂) ⋯ + b_l],
 y = softmax(x),

where a is the input, σ is a nonlinear “activation” function, θ is the weight and b is the bias.
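Eq. 24 as a tiny NumPy forward pass (the layer sizes, tanh as σ, and random weights are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def forward(a, thetas, biases, sigma=np.tanh):
    """x = sigma_l(... sigma_1(a theta_1 + b_1) ...) of Eq. (24)."""
    x = a
    for theta, b in zip(thetas, biases):
        x = sigma(x @ theta + b)
    return x

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
thetas = [rng.standard_normal((4, 8)), rng.standard_normal((8, 3))]
biases = [rng.standard_normal(8), rng.standard_normal(3)]
y = softmax(forward(rng.standard_normal(4), thetas, biases))
```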

In addition to requiring the neural networks to converge, we also require that ḡ is ϵ-close to g^H based on Definition 3.2, which is the necessary condition for the evolution of the Ricci flow. Obviously, we may achieve this goal by adding a regularization N into the loss function. Following Eq. 19, we yield the regularization

 N(ḡ) = ‖ (ḡ|_{k1} − ḡ|_{k2})/(k1 − k2) − (ḡ|_{l1} − ḡ|_{l2})/(l1 − l2) ‖²_{L²}. (25)

Combined with Definition 3.2, the upper bound of Eq. 25 is estimated by

 N(ḡ) ≤ ‖ ((1+ϵ)² g^H|_{k1} − g^H|_{k2}) / ((1+ϵ)(k1 − k2)) − (g^H|_{l1} − (1+ϵ)² g^H|_{l2}) / ((1+ϵ)(l1 − l2)) ‖²_{L²} (26)
  = (g^E/(1+ϵ)) ‖ ((1+ϵ)² λ²_{x_{k1}} − λ²_{x_{k2}})/(k1 − k2) − (λ²_{x_{l1}} − (1+ϵ)² λ²_{x_{l2}})/(l1 − l2) ‖²_{L²},

and the lower bound of Eq. 25 is estimated by

 N(ḡ) ≥ ‖ (g^H|_{k1} − (1+ϵ)² g^H|_{k2}) / ((1+ϵ)(k1 − k2)) − ((1+ϵ)² g^H|_{l1} − g^H|_{l2}) / ((1+ϵ)(l1 − l2)) ‖²_{L²} (27)
  = (g^E/(1+ϵ)) ‖ (λ²_{x_{k1}} − (1+ϵ)² λ²_{x_{k2}})/(k1 − k2) − ((1+ϵ)² λ²_{x_{l1}} − λ²_{x_{l2}})/(l1 − l2) ‖²_{L²}.

As the evolution of the Ricci flow approaches convergence, the estimate of Eq. 25 tends to be stable:

 N(ḡ) ⟶(Ricci flow) N = ‖ (λ²_{x_{k1}} − λ²_{x_{k2}})/(k1 − k2) − (λ²_{x_{l1}} − λ²_{x_{l2}})/(l1 − l2) ‖²_{L²}. (28)
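A sketch of the regularization of Eq. 25 built from the finite differences of Eqs. 20–21 (the sampled metric values and offsets are illustrative; in practice ḡ|_{k} denotes the metric evaluated under a k-pixel translation of the input):

```python
import numpy as np

def reg_N(g_k1, g_k2, g_l1, g_l2, k1, k2, l1, l2):
    """Eq. (25): squared L2 norm of the mismatch between the two
    finite-difference derivatives of the metric."""
    dk = (g_k1 - g_k2) / (k1 - k2)   # Eq. (20): row-translation difference
    dl = (g_l1 - g_l2) / (l1 - l2)   # Eq. (21): column-translation difference
    return float(np.sum((dk - dl) ** 2))

g = np.array([2.0, 1.5])             # toy metric entries at translated inputs
zero = reg_N(g, g - 1.0, g, g - 1.0, 2, 0, 2, 0)   # symmetric case vanishes
```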

Consequently, we divide the Ricci flow assisted Eucl2Hyp2Eucl neural manifold evolution into two stages: coarse convergence and fine convergence. With the help of the regularization N, the metric of the neural manifold converges to a neighbourhood of g^H, and then the Ricci flow completes the final convergence. Each training iteration includes these two stages; therefore, Ricci flow assisted Eucl2Hyp2Eucl neural networks are trained on dynamically stable neural manifolds, as shown in Algorithm 1.

## 5 Experiment

CIFAR datasets. The two CIFAR datasets (Krizhevsky et al., 2009) consist of natural color images with 32×32 pixels, with 50,000 training and 10,000 test images each, and we hold out 5,000 training images as a validation set. CIFAR10 consists of images organized into 10 classes and CIFAR100 into 100 classes. We adopt a standard data augmentation scheme (random corner cropping and random flipping) that is widely used for these two datasets. We normalize the images using the channel means and standard deviations in preprocessing.
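The augmentation scheme can be sketched in NumPy as pad-and-crop plus a random horizontal flip (the 4-pixel padding is the common CIFAR recipe and an assumption about the exact setup here):

```python
import numpy as np

def augment(img, pad=4, rng=None):
    """Random crop after zero-padding, then random horizontal flip.
    img: H x W x C array (32 x 32 x 3 for CIFAR)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    top = int(rng.integers(0, 2 * pad + 1))    # crop offsets 0 .. 2*pad
    left = int(rng.integers(0, 2 * pad + 1))
    out = padded[top:top + h, left:left + w]
    if rng.random() < 0.5:
        out = out[:, ::-1]                     # horizontal flip
    return out
```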

Settings. We set the total number of training epochs to 200, with the learning rate of each parameter group following a cosine annealing schedule. The learning strategy uses a weight decay of 0.0005, a batch size of 128, and SGD optimization. On the CIFAR10 and CIFAR100 datasets, we apply ResNet18 (He et al., 2016), ResNet50 (He et al., 2016), VGG11 (Simonyan and Zisserman, 2014) and MobileNetV2 (Sandler et al., 2018) to test classification accuracy. All experiments are conducted 5 times, and the statistics of the last 10/5 epochs’ test accuracy are reported for a fair comparison.

Details. For Ricci flow assisted Eucl2Hyp2Eucl neural networks and all-Euclidean neural networks, we use the same training strategy and network structure. Note that both are trained from scratch with Xavier initialization (Glorot and Bengio, 2010).

### 5.1 Classification Tasks

In this experiment, we compare the classification accuracy of Ricci flow assisted Eucl2Hyp2Eucl neural networks and all-Euclidean neural networks on the CIFAR datasets. As Table 1 shows (note that Ricci flow assisted Eucl2Hyp2Eucl neural networks are only used in training; for offline inference we use the all-Euclidean networks), our proposed Ricci flow assisted Eucl2Hyp2Eucl neural networks perform better than all-Euclidean neural networks. Meanwhile, compared to CIFAR10, the improvement on CIFAR100 seems more remarkable. We conjecture that more complex classification tasks bring more meaningful geometric structures to the neural network, and Ricci flow assisted Eucl2Hyp2Eucl neural networks can mine these geometric representations.

### 5.2 Metrics Evolution Analysis

For the training of Ricci flow assisted Eucl2Hyp2Eucl neural networks, we hope to observe the evolution of neural manifolds through the change of metrics. Since we have defined the metric ḡ, we can use a length r to intuitively reflect the change of metrics. Specifically, we define a ball whose radius is equal to r:

 B_r(t) := { r = √( Σ_{i,j} ḡ_{ij}(t) dξ^i dξ^j ) }. (29)

By observing the change of the ball in Figure 2, we can track the change of the metric. The metrics converge rapidly at the beginning of training and then flatten out. The convergence behavior of the metrics appears to be affected by the network structure rather than the depth (ResNet18 and ResNet50 show similar evolution behavior). In training, experiments show that all metrics for Ricci flow assisted Eucl2Hyp2Eucl neural networks converge stably, consistent with the evolution of the scaled Ricci-DeTurck flow in Section 3.

### 5.3 Training Time Analysis

Our experiments run on an Intel(R) Xeon(R) E5-2650 v4 CPU (2.20 GHz) and a GeForce GTX 1080Ti GPU. We test the training time per iteration for Ricci flow assisted Eucl2Hyp2Eucl neural networks and all-Euclidean neural networks with ResNet18 on CIFAR10. For one training iteration, the Ricci flow assisted Eucl2Hyp2Eucl neural network costs 81.06 s and the all-Euclidean neural network costs 39.76 s.

## 6 Conclusion

Ricci flow assisted Eucl2Hyp2Eucl neural networks not only provide the convenience and closed-form formulas of the Euclidean space (for offline inference), but also take into account the geometric representation of the Hyperbolic space. This provides a new way for neural networks to obtain meaningful geometric representations. Empirically, we found that Ricci flow assisted neural networks outperform their all-Euclidean counterparts on CIFAR datasets.

The Ricci flow plays a vital role in Eucl2Hyp2Eucl neural networks, not only eliminating an L²-norm perturbation of the Hyperbolic metric, but also acting as a smooth evolution from the Euclidean space to the Poincaré ball. In fact, Eucl2Hyp2Eucl neural networks without the Ricci flow reduce to all-Euclidean neural networks. We hope that this paper opens an exciting direction of using the Ricci flow to assist neural network training on dynamically stable manifolds.

## References

• S. Amari (2016) Information geometry and its applications. Vol. 194, Springer. Cited by: §1, Corollary 3.1.
• S. Amari, O. E. Barndorff-Nielsen, R. E. Kass, S. L. Lauritzen, and C. Rao (1987) Differential geometry in statistical inference. Cited by: §1.
• J. W. Anderson (2006) Hyperbolic geometry. Springer Science & Business Media. Cited by: §3.1.
• R. H. Bamler (2010) Stability of hyperbolic manifolds with cusps under ricci flow. arXiv preprint arXiv:1004.2058. Cited by: §3.3.
• M. M. Bronstein, J. Bruna, Y. LeCun, A. Szlam, and P. Vandergheynst (2017) Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34 (4), pp. 18–42. Cited by: §1.
• J. Chen, T. Huang, W. Chen, and Y. Liu (2021) Thoughts on the consistency between ricci flow and neural network behavior. arXiv preprint arXiv:2111.08410. Cited by: §4.1.
• L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv preprint arXiv:1412.7062. Cited by: §1.
• D. M. DeTurck (1983) Deforming metrics in the direction of their ricci tensors. Journal of Differential Geometry 18 (1), pp. 157–162. Cited by: §2.1.
• O. Ganea, G. Bécigneul, and T. Hofmann (2018) Hyperbolic entailment cones for learning hierarchical embeddings. In International Conference on Machine Learning, pp. 1646–1655. Cited by: §1, §3.1, §4.2.
• O. Ganea, G. Bécigneul, and T. Hofmann (2019) Hyperbolic neural networks. Advances in Neural Information Processing Systems 31, pp. 5345–5355. Cited by: §1, footnote 5.
• R. Girshick (2015) Fast r-cnn. In Proceedings of the IEEE international conference on computer vision, pp. 1440–1448. Cited by: §1.
• X. Glorot and Y. Bengio (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the thirteenth international conference on artificial intelligence and statistics, pp. 249–256. Cited by: §1, §5.
• R. S. Hamilton et al. (1982) Three-manifolds with positive ricci curvature. J. Differential geom 17 (2), pp. 255–306. Cited by: Appendix C, §2.
• K. He, X. Zhang, S. Ren, and J. Sun (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In Proceedings of the IEEE international conference on computer vision, pp. 1026–1034. Cited by: §1.
• K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §5.
• P. Kaul and B. Lall (2019) Riemannian curvature of deep neural networks. IEEE transactions on neural networks and learning systems 31 (4), pp. 1410–1416. Cited by: §4.1.
• A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Cited by: §5.
• A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems 25, pp. 1097–1105. Cited by: §1.
• O. A. Ladyzhenskaia, V. A. Solonnikov, and N. N. Ural’tseva (1988) Linear and quasi-linear equations of parabolic type. Vol. 23, American Mathematical Soc.. Cited by: §2.1.
• H. Li and H. Yin (2010) On stability of the hyperbolic space form under the normalized ricci flow. International Mathematics Research Notices 2010 (15), pp. 2903–2924. Cited by: §1.
• J. Long, E. Shelhamer, and T. Darrell (2015) Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §1.
• J. Martens and R. Grosse (2015) Optimizing neural networks with kronecker-factored approximate curvature. In International conference on machine learning, pp. 2408–2417. Cited by: footnote 2.
• J. Martens (2020) New insights and perspectives on the natural gradient method. Journal of Machine Learning Research 21, pp. 1–76. Cited by: footnote 2.
• M. Nickel and D. Kiela (2017) Poincaré embeddings for learning hierarchical representations. Advances in neural information processing systems 30, pp. 6338–6347. Cited by: §3.1.
• F. Sala, C. De Sa, A. Gu, and C. Ré (2018) Representation tradeoffs for hyperbolic embeddings. In International conference on machine learning, pp. 4460–4469. Cited by: §1.
• M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. Chen (2018) Mobilenetv2: inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4510–4520. Cited by: §5.
• O. C. Schnürer, F. Schulze, and M. Simon (2010) Stability of hyperbolic space under ricci flow. arXiv preprint arXiv:1003.2107. Cited by: §1, §3.2.
• N. Sheridan and H. Rubinstein (2006) Hamilton’s ricci flow. Honour thesis. Cited by: §2.2, footnote 1.
• W. Shi (1989) Deforming the metric on complete riemannian manifolds. Journal of Differential Geometry 30 (1), pp. 223–301. Cited by: Appendix C.
• M. Simon (2002) Deformation of riemannian metrics in the direction of their ricci curvature. Communications in Analysis and Geometry 10 (5), pp. 1033–1074. Cited by: §3.3.
• K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §1, §5.
• V. Suneeta (2009) Investigating the off-shell stability of anti-de sitter space in string theory. Classical and Quantum Gravity 26 (3), pp. 035023. Cited by: §1.
• R. M. Wald (2010) General relativity. University of Chicago press. Cited by: §3.4.
• R. Ye (1993) Ricci flow, einstein metrics and space forms. Transactions of the american mathematical society 338 (2), pp. 871–896. Cited by: §1.

## Appendix A Differential Geometry

1. Riemann curvature tensor (Rm) is a (1,3)-tensor defined for a 1-form ω by:

 Rlijkωl=∇i∇jωk−∇j∇iωk

where the covariant derivative of a tensor F satisfies

 ∇_p F^{j₁…j_l}_{i₁…i_k} = ∂_p F^{j₁…j_l}_{i₁…i_k} + Σ_{s=1}^{l} F^{j₁…q…j_l}_{i₁…i_k} Γ^{j_s}_{pq} − Σ_{s=1}^{k} F^{j₁…j_l}_{i₁…q…i_k} Γ^{q}_{p i_s}.

In particular, the coordinate form of the Riemann curvature tensor is:

 R^l_{ijk} = ∂_i Γ^l_{jk} − ∂_j Γ^l_{ik} + Γ^p_{jk} Γ^l_{ip} − Γ^p_{ik} Γ^l_{jp}.

2. Christoffel symbol in terms of an ordinary derivative operator is:

 Γ^k_{ij} = ½ g^{kl} (∂_i g_{jl} + ∂_j g_{il} − ∂_l g_{ij}).

3. Ricci curvature tensor (Ric) is a (0,2)-tensor:

 Rij=Rppij.

4. Scalar curvature is the trace of the Ricci curvature tensor:

 R=gijRij.

5. Lie derivative of a tensor F in the direction dφ(t)/dt:

 L_{dφ(t)/dt} F = (d/dt φ*(t) F)|_{t=0}

where φ(t): M → M, for t ≥ 0, is a time-dependent diffeomorphism of M to M.
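The Christoffel formula in item 2 can be checked numerically with central differences; as a test metric we use the polar-coordinate form g = diag(1, x₁²) (an illustrative choice, not a metric from the paper), for which Γ¹₂₂ = −x₁ and Γ²₁₂ = 1/x₁ in closed form:

```python
import numpy as np

def christoffel(metric, x, eps=1e-5):
    """Gamma^k_ij = 1/2 g^{kl} (d_i g_jl + d_j g_il - d_l g_ij),
    with partial derivatives approximated by central differences at x."""
    n = len(x)
    dg = np.zeros((n, n, n))          # dg[p] = d_p g, an n x n matrix
    for p in range(n):
        e = np.zeros(n)
        e[p] = eps
        dg[p] = (metric(x + e) - metric(x - e)) / (2.0 * eps)
    g_inv = np.linalg.inv(metric(x))
    gamma = np.zeros((n, n, n))       # gamma[k, i, j] = Gamma^k_ij
    for k in range(n):
        for i in range(n):
            for j in range(n):
                gamma[k, i, j] = 0.5 * sum(
                    g_inv[k, l] * (dg[i][j, l] + dg[j][i, l] - dg[l][i, j])
                    for l in range(n))
    return gamma

polar = lambda x: np.diag([1.0, x[0] ** 2])   # g = diag(1, r^2)
G = christoffel(polar, np.array([2.0, 0.5]))
```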

## Appendix B Proof for the Ricci Flow

### b.1 Proof for Lemma 2.3

###### Definition B.1.

The linearization of the Ricci curvature tensor is given by

 D[Ric](h)_{ij} = −½ g^{pq} (∇_p ∇_q h_{ij} + ∇_i ∇_j h_{pq} − ∇_q ∇_i h_{jp} − ∇_q ∇_j h_{ip}).
###### Proof.

Based on Appendix A, we have

 ∇_q ∇_i h_{jp} = ∇_i ∇_q h_{jp} − R^r_{qij} h_{rp} − R^r_{qip} h_{jr}.

Combining with Definition B.1, and absorbing the curvature terms into O(h_{ij}), we can obtain the deformation equation

 D[−2Ric](h)_{ij} = g^{pq} [∇_p ∇_q h_{ij} + ∇_i (½ ∇_j h_{pq} − ∇_q h_{jp}) + ∇_j (½ ∇_i h_{pq} − ∇_q h_{ip})] + O(h_{ij}) = g^{pq} ∇_p ∇_q h_{ij} + ∇_i V_j + ∇_j V_i + O(h_{ij}).

### b.2 Description of the DeTurck Trick

Using a time-dependent diffeomorphism φ(t), we express the pullback metric

 g(t) = φ*(t) ḡ(t),

which is a solution of the Ricci flow. Based on the chain rule for the Lie derivative in Appendix A, we can calculate

 ∂∂tg(t) =∂(φ∗(t)¯g(