Estimating Mutual Information via Geodesic kNN
Estimating the mutual information (MI) between two continuous random variables X and Y makes it possible to capture non-linear dependencies between them non-parametrically. As such, MI estimation lies at the core of many data science applications. Yet robustly estimating MI for high-dimensional X and Y remains an open research question. In this paper, we formulate the problem through the lens of manifold learning. That is, we take the common assumption that the information in X and Y is captured by a low-dimensional manifold embedded in the observed high-dimensional space, and transfer it to MI estimation. As an extension of state-of-the-art kNN estimators, we propose to determine the k nearest neighbours via geodesic distances on this manifold rather than from distances in the ambient space, which allows us to estimate MI even in the high-dimensional setting. An empirical evaluation of our method, G-KSG, against the state of the art shows that it yields good estimates of the MI on classical benchmarks and manifold tasks, even for high-dimensional datasets, which none of the existing methods can provide.
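To make the idea concrete, the following is a minimal sketch of a KSG-style kNN estimator in which Euclidean distances are swapped for graph-based geodesic distances. It is not the authors' G-KSG implementation: the abstract does not spell out the construction, so the sketch assumes that geodesics are approximated Isomap-style by shortest paths on a kNN graph, computed separately for X and Y and combined with the max norm as in the classical KSG estimator. The function names and the `n_neighbors` parameter are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at positive integers: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    n = np.asarray(n, dtype=int)
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n.max() + 1))))
    return harmonic[n - 1] - EULER_GAMMA


def geodesic_distances(points, n_neighbors):
    """Approximate geodesic distances by shortest paths on a kNN graph.

    Assumption: Isomap-style construction; the paper may differ.
    Disconnected graph components end up at distance inf.
    """
    points = np.asarray(points, dtype=float)
    if points.ndim == 1:
        points = points[:, None]
    diff = points[:, None, :] - points[None, :, :]
    euclid = np.sqrt((diff ** 2).sum(axis=-1))
    n = len(points)
    # Keep only edges to the n_neighbors nearest points (column 0 is the point itself).
    nearest = np.argsort(euclid, axis=1)[:, : n_neighbors + 1]
    rows = np.repeat(np.arange(n), nearest.shape[1])
    graph = np.full((n, n), np.inf)
    graph[rows, nearest.ravel()] = euclid[rows, nearest.ravel()]
    graph = np.minimum(graph, graph.T)  # make the graph undirected
    np.fill_diagonal(graph, 0.0)
    # Floyd-Warshall all-pairs shortest paths, vectorised over rows and columns.
    for m in range(n):
        graph = np.minimum(graph, graph[:, m, None] + graph[None, m, :])
    return graph


def geodesic_ksg_mi(x, y, k=5, n_neighbors=10):
    """KSG (algorithm 1) MI estimate in nats, using geodesic distances."""
    dx = geodesic_distances(x, n_neighbors)
    dy = geodesic_distances(y, n_neighbors)
    n = len(dx)
    off_diag = ~np.eye(n, dtype=bool)
    # Joint-space distance is the max norm over the two marginal distances.
    joint = np.where(off_diag, np.maximum(dx, dy), np.inf)
    eps = np.sort(joint, axis=1)[:, k - 1]  # distance to the k-th joint neighbour
    # Count marginal neighbours strictly closer than eps, excluding the point itself.
    nx = ((dx < eps[:, None]) & off_diag).sum(axis=1)
    ny = ((dy < eps[:, None]) & off_diag).sum(axis=1)
    mi = digamma_int(k) + digamma_int(n) - np.mean(
        digamma_int(nx + 1) + digamma_int(ny + 1)
    )
    return float(mi)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    u = rng.standard_normal(300)
    v = 0.9 * u + np.sqrt(1 - 0.9 ** 2) * rng.standard_normal(300)
    # Embed the latent variable u on a circular arc in 2D (a toy 1-D manifold).
    x = np.column_stack([np.cos(0.8 * u), np.sin(0.8 * u)])
    print("dependent pair:  ", geodesic_ksg_mi(x, v))
    print("independent pair:", geodesic_ksg_mi(x, rng.standard_normal(300)))
```

Because MI is invariant under invertible transformations of each variable, the arc embedding in the demo leaves the true MI between the latent pair unchanged, so the dependent pair should score clearly above the independent one. The dense Floyd-Warshall step costs O(N^3) time and is only meant for small samples; a sparse shortest-path routine would be the natural replacement at scale.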