Probabilistic methods for approximate archetypal analysis

08/12/2021
by Ruijian Han, et al.

Archetypal analysis is an unsupervised learning method for exploratory data analysis. One major challenge that limits the applicability of archetypal analysis in practice is the inherent computational complexity of the existing algorithms. In this paper, we provide a novel approximation approach to partially address this issue. Utilizing probabilistic ideas from high-dimensional geometry, we introduce two preprocessing techniques to reduce the dimension and representation cardinality of the data, respectively. We prove that, provided the data is approximately embedded in a low-dimensional linear subspace and the convex hull of the corresponding representations is well approximated by a polytope with a few vertices, our method can effectively reduce the scaling of archetypal analysis. Moreover, the solution of the reduced problem is near-optimal in terms of prediction errors. Our approach can be combined with other acceleration techniques to further mitigate the intrinsic complexity of archetypal analysis. We demonstrate the usefulness of our results by applying our method to summarize several moderately large-scale datasets.
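
The two preprocessing ideas named in the abstract, reducing the dimension and reducing the number of points that represent the convex hull, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes a plain Gaussian random projection for the first step and a simple "extreme point in a random direction" selection for the second, and the function names `random_project` and `extreme_point_indices` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def random_project(X, k, seed=0):
    """Map the rows of X (n x d) to k dimensions with a Gaussian random matrix.

    Assumption: a generic random projection, used here only as a stand-in
    for a dimension-reduction preprocessing step.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    G = rng.standard_normal((d, k)) / np.sqrt(k)  # scaling keeps norms comparable on average
    return X @ G


def extreme_point_indices(Y, num_directions, seed=0):
    """Return sorted indices of rows of Y that maximize <y, u> for random directions u.

    A point that maximizes a linear functional over a finite set lies on the
    boundary of the set's convex hull, so the selected rows are candidate
    hull vertices. Assumption: a generic heuristic for illustration.
    """
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((Y.shape[1], num_directions))
    return np.unique(np.argmax(Y @ U, axis=0))


# Synthetic data: convex combinations of 6 hypothetical archetypes in 500 dimensions.
rng = np.random.default_rng(42)
archetypes = rng.standard_normal((6, 500))
weights = rng.dirichlet(np.ones(6), size=2000)      # each row sums to 1
X = weights @ archetypes + 0.01 * rng.standard_normal((2000, 500))

Y = random_project(X, k=20)                         # 2000 x 20 instead of 2000 x 500
idx = extreme_point_indices(Y, num_directions=100)  # at most 100 candidate points
X_reduced = Y[idx]                                  # smaller input for a downstream solver
print(Y.shape, X_reduced.shape)
```

Under these assumptions, an archetypal analysis solver would then be run on `X_reduced` rather than on the full data; how well the reduced solution approximates the original one is exactly what the paper's guarantees address, and this sketch makes no such claim.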

research
07/12/2018

Turning Big data into tiny data: Constant-size coresets for k-means, PCA and projective clustering

We develop and analyze a method to reduce the size of a very large set o...
research
05/28/2021

Privately Learning Subspaces

Private data analysis suffers a costly curse of dimensionality. However,...
research
03/27/2018

Fast Computation of Robust Subspace Estimators

Dimension reduction is often an important step in the analysis of high-d...
research
09/08/2021

Functional Principal Subspace Sampling for Large Scale Functional Data Analysis

Functional data analysis (FDA) methods have computational and theoretica...
research
10/27/2014

Maximally Informative Hierarchical Representations of High-Dimensional Data

We consider a set of probabilistic functions of some input variables as ...
research
07/26/2021

Global optimization using random embeddings

We propose a random-subspace algorithmic framework for global optimizati...
research
10/16/2020

Consistency of archetypal analysis

Archetypal analysis is an unsupervised learning method that uses a conve...
