Unsupervised learning attempts to learn to encode data living in a high dimensional space in a more useful low dimensional representation. This is often reasonable assuming the manifold hypothesis, which states that real world high dimensional data lives on a low dimensional manifold.
Popular methods for representation learning, such as Auto-Encoders, learn an encoder which represents the data in Euclidean spaces as well as a decoder that reconstructs the original representation from the latent space. However, when the true manifold of the data has a non-trivial topology, encoding the data in a Euclidean space leads to poor representations.
In such scenarios, one of two things can happen. When the dimensionality of the latent space is sufficiently larger than the intrinsic dimensionality of the true manifold, the true manifold is again embedded in the latent space, which leads to low-density areas in the latent space. In the other case, when the dimensionality of the Euclidean latent space is comparable to the dimensionality of the true manifold, but the true manifold cannot be embedded in the latent space, the encoder must be discontinuous. This problem was described in Davidson et al. (2018) and Falorsi et al. (2018).
In many situations, the true manifold is a known non-trivial manifold, for example, when inferring the pose of an object from images in an unsupervised fashion or when inferring joint angles of images of a robot arm. Three natural requirements of an ideal Auto-Encoder can then be formulated. Firstly, we desire the encoder to be bijective, so that every element of the latent space corresponds to exactly one point on the data manifold. Secondly, we desire continuous trajectories on the data manifold, such as motion of robot joint angles, to correspond to continuous trajectories in the latent space. Lastly, we desire conversely that continuous paths in the latent space correspond to continuous paths on the data manifold, which is to say that latent space interpolations are correct. An encoder that satisfies these requirements is referred to in topology as a homeomorphism. When combined with a suitable decoder, we call it a Homeomorphic Auto-Encoder. However, the concept translates to popular representation learning methods like the Variational Auto-Encoder, BiGAN and InfoGAN as well.
One immediate consequence of having a homeomorphic encoder is that the latent space is homeomorphic to the true data manifold. This rules out Euclidean spaces for non-trivial true data manifolds. Since neural networks only map to Euclidean spaces, they must be combined with specifically designed functions. Topological arguments allow us to derive necessary and sufficient conditions for such a construction to be a Homeomorphic Auto-Encoder.
2 The encoder is globally discontinuous
We assume the data lives on a path-connected manifold topologically embedded in the data space for some with embedding . The image of the embedding is a submanifold . We desire the encoder to be such that its restriction is a homeomorphism. By the following theorem, this immediately leads to a somewhat surprising observation: when is non-trivial, the encoder must be globally discontinuous, even though it is continuous when restricted to .
Let be defined as above. Assume additionally that is continuous, in which case is a retract. Then for all , the -th homotopy group of , , is isomorphic to a subgroup of .
and induce -th homotopy group homomorphisms: and . By construction, the function is a homeomorphism, so is a group automorphism. Thus must be injective, which implies the thesis.
Let be defined as above. Assume has a non-trivial -th homotopy group for some . Then no such exists that is additionally continuous.
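As a worked illustration of this obstruction, consider the unit circle in the plane. The symbols below ($\iota$ for the inclusion of the circle, $f$ for a hypothetical continuous encoder) are notation assumed here for the example, not necessarily the notation of the statements above.

```latex
% Assumed notation: \iota : S^1 \to \mathbb{R}^2 is the inclusion of the unit circle,
% f : \mathbb{R}^2 \to S^1 is a hypothetical continuous encoder with f \circ \iota a homeomorphism.
\begin{align*}
\pi_1(S^1) \xrightarrow{\;\iota_*\;} \pi_1(\mathbb{R}^2) \xrightarrow{\;f_*\;} \pi_1(S^1),
\qquad \pi_1(S^1) \cong \mathbb{Z}, \qquad \pi_1(\mathbb{R}^2) = 0 .
\end{align*}
% Since the middle group is trivial, f_* \circ \iota_* = (f \circ \iota)_* is the zero map.
% But f \circ \iota is a homeomorphism, so (f \circ \iota)_* must be an automorphism of \mathbb{Z}.
% The zero map is not an automorphism of \mathbb{Z}, so no such continuous f exists.
```

In words: a continuous encoder from the plane onto the circle that is a homeomorphism on the circle itself would force the integers to inject into the trivial group, which is impossible.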
3 Existence of the encoder
We now turn to the question of whether a homeomorphic auto-encoder can always be constructed. Since Neural Networks only map Euclidean spaces to Euclidean spaces, we decompose our encoder , for to be learned by a Neural Network, with a Euclidean space, and a designed function . In the following subsections, we discuss the existence of and .
3.1 Existence and continuity of the map
Assuming the dimensionality of is sufficiently large, an embedding can be constructed for any manifold , by the Whitney Embedding Theorem (Theorem 2.6 in Adachi et al., 1993), which states that any manifold of dimension can be embedded in as a closed subset. We take this Euclidean space to be and call the closed embedding .
Then the metric projection can be defined in the following way.
For any , we can define the distance to :
where the last equality holds since is closed. Then the metric projection is:
The projection map may be multi-valued and may be discontinuous, but the following lemma shows that it is uniquely defined almost everywhere in and that the projection map is continuous almost everywhere.
Thus it is a potential candidate for the desired mapping.
For a closed set , for almost all points , there is a unique point , such that . Additionally, the metric projection is continuous almost everywhere.
First, we show that is 1-Lipschitz. Let . Without loss of generality, choose and let s.t. . Then:
where in the last step we used the triangle inequality.
Therefore, by Rademacher’s theorem (Federer, 1996), is differentiable almost everywhere.
The envelope theorem in Milgrom and Segal (2002) states that if is differentiable in and s.t. , then , so that:
From this expression of the gradient, we can derive uniqueness of the minimising point. Let be differentiable at and let and have that , we have:
Furthermore, we have, for all such that is differentiable:
so that is continuous almost everywhere.
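The steps of this argument can be written out explicitly. The following is a sketch in assumed notation: the symbols $A$ for the closed set, $d_A$ for the distance function, $\pi_A$ for the metric projection and $p_y$ for a nearest point of $y$ are labels chosen here for illustration and need not match the original formulas.

```latex
% Assumed notation: A \subset \mathbb{R}^n closed, d_A the distance to A, \pi_A the metric projection.
\begin{align*}
d_A(x) &= \inf_{a \in A} \|x - a\| = \min_{a \in A} \|x - a\|,
\qquad
\pi_A(x) = \operatorname*{arg\,min}_{a \in A} \|x - a\| .
\end{align*}
% 1-Lipschitz: take x, y with d_A(x) \ge d_A(y) and p_y \in A with \|y - p_y\| = d_A(y).
\begin{align*}
0 \le d_A(x) - d_A(y) \le \|x - p_y\| - \|y - p_y\| \le \|x - y\| .
\end{align*}
% Gradient (envelope theorem), at x \notin A where d_A is differentiable, for any nearest point p of x:
\begin{align*}
\nabla d_A(x) = \frac{x - p}{\|x - p\|} = \frac{x - p}{d_A(x)} .
\end{align*}
% Uniqueness: if p and p' are both nearest points of x, then
\begin{align*}
x - p = d_A(x)\, \nabla d_A(x) = x - p' \quad\Longrightarrow\quad p = p' .
\end{align*}
% Hence, wherever d_A is differentiable, the projection is single-valued:
\begin{align*}
\pi_A(x) = x - d_A(x)\, \nabla d_A(x) .
\end{align*}
```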
The case of the hypersphere nicely illustrates these properties. The manifold $S^n$ can be embedded in $\mathbb{R}^{n+1}$ and the projection map is the normalisation: $x \mapsto x / \|x\|$. This map is well-defined and continuous everywhere, except on the set containing just the origin, which is of measure 0. In this particular case the projection map is easily computable and smooth almost everywhere, but this is not the case for every manifold.
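The behaviour of the hypersphere projection near the origin can be checked numerically. The following is a minimal sketch in plain Python; the function name `project_to_sphere` is a hypothetical helper introduced here for illustration.

```python
import math

def project_to_sphere(x):
    """Metric projection of a point of R^(n+1) onto the unit sphere: x / ||x||."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm == 0.0:
        # Every point of the sphere is equally close to the origin,
        # so the nearest point is not unique there.
        raise ValueError("the projection of the origin is not uniquely defined")
    return [c / norm for c in x]

# Two inputs that are very close to each other, on opposite sides of the origin.
eps = 1e-9
p_plus = project_to_sphere([eps, 0.0, 0.0])
p_minus = project_to_sphere([-eps, 0.0, 0.0])

# The inputs are 2e-9 apart, yet their projections are antipodal points.
gap = math.dist(p_plus, p_minus)
print(gap)
```

Away from the origin the map is smooth, so in practice the single bad point can only be hit if the network outputs exactly zero.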
3.2 Existence and continuity of the map
By the above argument we can construct an embedding and a map such that and we assumed an embedding exists. Then the function is a homeomorphism.
Since both and are Euclidean, by the Tietze extension theorem (Munkres, 2000), this map can be extended from the closed set to the entire space , making a continuous map . Since neural networks can approximate continuous functions arbitrarily well (Hornik, 1991), we can then try to learn a neural network to approximate .
4 Constraints on a practical encoder
We can therefore always construct a homeomorphic auto-encoder that is continuous almost everywhere, but the projection map may not be differentiable, nor easily computable, in which case it is impractical for training the neural network. For many manifolds a practical alternative can instead be hand-crafted. For such a construction we can derive several necessary conditions.
4.1 Decomposing the encoder
Knowing that the encoder must be discontinuous for non-trivial manifolds, it can always be decomposed into the composition of a continuous function , a discontinuous function and a continuous function , where and are possibly identity maps. We can then define spaces as in Figure 1.
We additionally define subsets: and restricted maps so that .
4.2 Necessary conditions
When designing such an encoder, care has to be taken to ensure that the intermediate spaces and functions are such that it is possible to express a homeomorphic encoder. This leads to the following necessary conditions.
Let , , , , and be defined as above. If is additionally compact and Hausdorff, then is a homeomorphism and is embeddable in .
As defined above, is continuous and bijective, so is continuous and bijective. Since is compact, is also compact. Let be any closed subset of ; then is compact. Now, since continuous maps map compact sets to compact sets, is compact and, since a subspace of a Hausdorff space is Hausdorff, is closed. is thus closed. Then is a homeomorphism and is an embedding of in .
Let , , , and be defined as above. If additionally is compact then is a homeomorphism and is embeddable in .
As defined above, is continuous and bijective, so is continuous and bijective. Since is compact, is closed, following an argument similar to the proof of Lemma 3, thus is a homeomorphism. is an embedding of in .
Let , , , and be defined as above. Alternatively if we additionally assume is compact, then the continuity of implies that is compact, since is closed and thus compact in .
Since we want to use neural networks to learn a homeomorphic mapping of to , we need to be able to apply gradient descent methods. For this to be possible, we need the discontinuity points of to be negligible (for example, sets of measure 0), or to add constraints on to encourage to lie outside of the discontinuity regions. If we make this latter modelling choice, we then see that needs to be embeddable in all intermediate spaces:
Let , and be as defined above. If additionally is assumed to be continuous, then , , are all homeomorphisms and must be embeddable in , and .
Notice that by construction , , are bijective functions. We prove that is open: Take open in , since , are continuous then is open in . Now since is a homeomorphism is open. Now using that is a homeomorphism we can prove that is open. Finally using that is a homeomorphism we can prove that is open. From this the thesis follows.
Let , and be defined as above. If additionally is a homeomorphism, by Theorem 1 using the fact that is continuous, is a retract, so for all , the -th homotopy group of , , is isomorphic to a subgroup of .
5 Application to SO(3)
To show the utility of the proposed theory, we analyse four sensible candidate architectures for a homeomorphic encoder to the compact manifold of 3D rotations SO(3). The candidates are taken from Falorsi et al. (2018). Throughout the next lines we will indicate with a neural network and we will use the fact that .
where is the exponential map from the Lie algebra to the Lie group, which is always continuous and surjective for SO(3). The encoder is globally continuous, which contradicts the necessary condition following from Theorem 1.
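The exponential map can be made concrete with Rodrigues' formula. The following is a minimal sketch in plain Python, assuming the standard identification of a vector in R^3 with a skew-symmetric matrix; the helper names `hat`, `matmul3` and `exp_so3` are hypothetical and chosen here for illustration. It also shows that the map, while surjective, is not injective: rotation vectors whose lengths differ by a full turn give the same rotation.

```python
import math

def hat(v):
    """Map a vector of R^3 to the corresponding 3x3 skew-symmetric matrix."""
    x, y, z = v
    return [[0.0, -z, y],
            [z, 0.0, -x],
            [-y, x, 0.0]]

def matmul3(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def exp_so3(v):
    """Rodrigues' formula: exp(K) = I + (sin t / t) K + ((1 - cos t) / t^2) K^2,
    with K = hat(v) and t = ||v||."""
    theta = math.sqrt(sum(c * c for c in v))
    K = hat(v)
    K2 = matmul3(K, K)
    if theta < 1e-12:
        a, b = 1.0, 0.5  # limits of the two coefficients as theta -> 0
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]

# Same axis, angles 1 and 1 + 2*pi: two different inputs, one rotation.
R_small = exp_so3([0.0, 0.0, 1.0])
R_large = exp_so3([0.0, 0.0, 1.0 + 2.0 * math.pi])
max_diff = max(abs(R_small[i][j] - R_large[i][j])
               for i in range(3) for j in range(3))
print(max_diff)
```

Because the composition of a continuous network with this continuous map is continuous on the whole data space, it falls under the obstruction discussed above.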
Axis-Angle [footnote 1]
[Footnote 1] Here, indicate the components of , and the vector space isomorphism that maps to the Lie algebra of SO(3), i.e. the space of skew-symmetric real matrices:
where correspond to the axis and the angle of the rotation. Since is compact, then by Lemma 5, must be a homeomorphism, and thus embeddable , which is false. Thus the necessary conditions are not met.
This follows from the fact that can be seen as the (oriented) bases of a three dimensional space. This construction satisfies the sufficient condition following from Lemma 3. We see this by defining and , then satisfies the hypothesis of Lemma 3. In fact is a homeomorphism since for the matrices, the last column is automatically determined by the first two.
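One way such a 'Basis' construction could be realised is with a Gram-Schmidt step followed by a cross product. The sketch below is an assumption made for illustration, not the exact map of Falorsi et al. (2018); the helper names `basis_to_rotation` and `det3` are hypothetical. It takes two vectors of R^3, orthonormalises them into the first two columns, and lets the third column be determined by the first two.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalise(a):
    norm = math.sqrt(dot(a, a))
    if norm < 1e-12:
        # The construction is undefined (and discontinuous) at these inputs.
        raise ValueError("cannot normalise a (near-)zero vector")
    return [x / norm for x in a]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def basis_to_rotation(u, v):
    """Map two linearly independent vectors of R^3 to a rotation matrix."""
    e1 = normalise(u)
    overlap = dot(e1, v)
    # Remove the component of v along e1, then normalise the remainder.
    e2 = normalise([vi - overlap * ei for vi, ei in zip(v, e1)])
    # The last column is fixed by the first two and the orientation.
    e3 = cross(e1, e2)
    # Return the matrix whose columns are e1, e2, e3.
    return [[e1[i], e2[i], e3[i]] for i in range(3)]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

R = basis_to_rotation([1.0, 1.0, 0.0], [0.0, 1.0, 1.0])
print(det3(R))
```

The inputs where this sketch fails (a zero first vector, or a second vector parallel to the first) form a set of measure zero in R^6.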
Note that these theoretical results are consistent with the experimental results in Falorsi et al. (2018), where it was found that only the ‘Basis’ method learns a homeomorphic encoder.
6 Conclusions and future work
We developed a theoretical framework for the analysis of architectures of Auto-Encoders for non-trivial manifolds. Several necessary conditions were stated for the Auto-Encoder to be homeomorphic, as well as a sufficient condition, which can guide the development of homeomorphic Auto-Encoders for various manifolds. For the manifold SO(3) it was found that the theoretical results were consistent with experimental results in prior work.
In future work we will try to generalise the developed theory using other topological invariants besides homotopy and to situations where the manifold is not embedded, but immersed or embedded with symmetries. In addition we would like to analyse what happens when the embedding is noisy in a probabilistic framework. Closer attention will be given to investigating to what extent these results are relevant for supervised problems, such as pose estimation. Finally we will try to apply the principles of the homeomorphic Auto-Encoders to practical problems and existing architectures, such as a version of Eslami et al. (2018) without supervised pose labels.
We would like to thank Patrick Forré for his suggestions on improving an earlier draft and Tim Davidson and Taco Cohen for their helpful discussions.
- Adachi et al. (1993) Adachi, M., Hudson, K., and Society, A. M. (1993). Embeddings and Immersions. Translations of mathematical monographs. American Mathematical Society.
- Davidson et al. (2018) Davidson, T. R., Falorsi, L., De Cao, N., Kipf, T., and Tomczak, J. M. (2018). Hyperspherical variational auto-encoders. UAI.
- Eslami et al. (2018) Eslami, S. M. A., Jimenez Rezende, D., Besse, F., Viola, F., Morcos, A. S., Garnelo, M., Ruderman, A., Rusu, A. A., Danihelka, I., Gregor, K., Reichert, D. P., Buesing, L., Weber, T., Vinyals, O., Rosenbaum, D., Rabinowitz, N., King, H., Hillier, C., Botvinick, M., Wierstra, D., Kavukcuoglu, K., and Hassabis, D. (2018). Neural scene representation and rendering. Science, 360(6394):1204–1210.
- Falorsi et al. (2018) Falorsi, L., de Haan, P., Davidson, T. R., De Cao, N., Weiler, M., Forré, P., and Cohen, T. S. (2018). Explorations in homeomorphic variational auto-encoding. ICML Workshop on Theoretical Foundations and Applications of Deep Generative Models.
- Federer (1996) Federer, H. (1996). Geometric Measure Theory (Classics in Mathematics). Springer.
- Hornik (1991) Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural networks, 4(2):251–257.
- Milgrom and Segal (2002) Milgrom, P. and Segal, I. (2002). Envelope theorems for arbitrary choice sets. Econometrica, 70(2):583–601.
- Munkres (2000) Munkres, J. (2000). Topology. Featured Titles for Topology Series. Prentice Hall, Incorporated.