CO-SNE: Dimensionality Reduction and Visualization for Hyperbolic Data
Hyperbolic space can embed tree metrics with little distortion, a desirable property for modeling hierarchical structures of real-world data and semantics. While high-dimensional embeddings often lead to better representations, most hyperbolic models use low-dimensional embeddings, due to non-trivial optimization as well as the lack of a visualization tool for high-dimensional hyperbolic data. We propose CO-SNE, which extends the Euclidean visualization tool t-SNE to hyperbolic space. Like t-SNE, it converts distances between data points to joint probabilities and minimizes the Kullback-Leibler divergence between the joint probabilities of high-dimensional data X and low-dimensional embeddings Y. However, unlike Euclidean space, hyperbolic space is inhomogeneous: a volume can contain far more points at a location far from the origin. CO-SNE thus uses hyperbolic normal distributions for X and a hyperbolic Cauchy distribution, instead of t-SNE's Student's t-distribution, for Y, and it additionally attempts to preserve the distance of each point in X to the origin in Y. We apply CO-SNE to high-dimensional hyperbolic biological data as well as unsupervisedly learned hyperbolic representations. Our results demonstrate that CO-SNE deflates high-dimensional hyperbolic data into a low-dimensional space without losing their hyperbolic characteristics, significantly outperforming popular visualization tools such as PCA, t-SNE, UMAP, and HoroPCA, the last of which is specifically designed for hyperbolic data.
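The objective described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the Poincaré ball model, stands in for the hyperbolic normal with a Gaussian kernel on hyperbolic distance, uses a Cauchy kernel on hyperbolic distance for Y, and measures distance to the origin as hyperbolic distance. The function names and the parameters `sigma`, `gamma`, and `lam` are hypothetical choices made for illustration.

```python
import numpy as np


def poincare_dist(U, V, eps=1e-9):
    """Pairwise distances between rows of U and rows of V in the Poincare ball."""
    sq = np.sum((U[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    du = 1.0 - np.sum(U ** 2, axis=1)
    dv = 1.0 - np.sum(V ** 2, axis=1)
    arg = 1.0 + 2.0 * sq / np.maximum(du[:, None] * dv[None, :], eps)
    # Clamp to the domain of arccosh to guard against rounding below 1.
    return np.arccosh(np.maximum(arg, 1.0))


def joint_probs(kernel):
    """Zero the diagonal and normalize a similarity matrix to sum to 1."""
    K = kernel.copy()
    np.fill_diagonal(K, 0.0)
    return K / K.sum()


def co_sne_like_loss(X, Y, sigma=1.0, gamma=0.1, lam=1.0, eps=1e-12):
    """KL(P || Q) plus a penalty on mismatched distances to the origin.

    X: (n, D) high-dimensional points, Y: (n, d) low-dimensional points,
    all strictly inside the unit ball. The kernels and the penalty are
    illustrative assumptions, not the paper's exact formulation.
    """
    dx = poincare_dist(X, X)
    dy = poincare_dist(Y, Y)
    # Assumed stand-in for the hyperbolic normal: Gaussian kernel on distance.
    P = joint_probs(np.exp(-dx ** 2 / (2.0 * sigma ** 2)))
    # Assumed Cauchy kernel on hyperbolic distance for the embedding.
    Q = joint_probs(gamma / (dy ** 2 + gamma ** 2))
    mask = P > 0
    kl = np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps)))
    # Hyperbolic distance from the origin to a point x is 2 * artanh(||x||).
    rx = 2.0 * np.arctanh(np.linalg.norm(X, axis=1))
    ry = 2.0 * np.arctanh(np.linalg.norm(Y, axis=1))
    return kl + lam * np.mean((rx - ry) ** 2)
```

In practice Y would be updated by gradient descent on this loss while keeping every point inside the unit ball; the sketch only evaluates the objective for a given X and Y.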