Archetypal Analysis++: Rethinking the Initialization Strategy

01/31/2023
by Sebastian Mair, et al.

Archetypal analysis is a matrix factorization method with convexity constraints. Because its objective has local minima, a good initialization is essential. Frequently used initialization methods either yield sub-optimal starting points or are prone to getting stuck in poor local minima. In this paper, we propose archetypal analysis++ (AA++), a probabilistic initialization strategy for archetypal analysis that sequentially samples points based on their influence on the objective, similar to k-means++. In fact, we argue that k-means++ already approximates the proposed initialization method. Furthermore, we propose adapting an efficient Monte Carlo approximation of k-means++ to AA++. In an extensive empirical evaluation on 13 real-world data sets of varying sizes and dimensionalities, considering two pre-processing strategies, we show that AA++ almost consistently outperforms all baselines, including the most frequently used ones.
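
To make the comparison with k-means++ concrete, the following is a minimal sketch of standard k-means++ seeding (D² sampling), which the abstract says already approximates the proposed initialization. It is not the AA++ procedure itself, whose sampling weights are defined in the full text. The function name kmeanspp_seeding and its parameters are illustrative assumptions.

```python
import numpy as np


def kmeanspp_seeding(X, k, rng=None):
    """Standard k-means++ seeding (D^2 sampling); NOT the AA++ rule.

    X   : (n, d) array of data points
    k   : number of seed points to select
    rng : seed or numpy Generator (illustrative parameter)
    Returns the indices of the k selected points.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[0]

    # First seed: chosen uniformly at random.
    first = int(rng.integers(n))
    indices = [first]

    # Squared distance of every point to its closest selected seed.
    d2 = np.sum((X - X[first]) ** 2, axis=1)

    for _ in range(1, k):
        total = d2.sum()
        if total == 0.0:
            # Every point coincides with a seed; fall back to uniform.
            probs = np.full(n, 1.0 / n)
        else:
            # Sample proportionally to the squared distance.
            probs = d2 / total
        nxt = int(rng.choice(n, p=probs))
        indices.append(nxt)

        # Update the closest-seed distances with the new seed.
        d2 = np.minimum(d2, np.sum((X - X[nxt]) ** 2, axis=1))

    return np.array(indices)
```

Each new point is drawn with probability proportional to its current contribution to the k-means objective, which is the sequential, objective-driven sampling pattern the abstract refers to. Per the abstract, AA++ follows the same pattern but weights points by their influence on the archetypal analysis objective.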


