Model Selection for Nonnegative Matrix Factorization by Support Union Recovery
Nonnegative matrix factorization (NMF) has been widely used in machine learning and signal processing because of its non-subtractive, part-based property which enhances interpretability. It is often assumed that the latent dimensionality (or the number of components) is given. Despite the large amount of algorithms designed for NMF, there is little literature about automatic model selection for NMF with theoretical guarantees. In this paper, we propose an algorithm that first calculates an empirical second-order moment from the empirical fourth-order cumulant tensor, and then estimates the latent dimensionality by recovering the support union (the index set of non-zero rows) of a matrix related to the empirical second-order moment. By assuming a generative model of the data with additional mild conditions, our algorithm provably detects the true latent dimensionality. We show on synthetic examples that our proposed algorithm is able to find an approximately correct number of components.
READ FULL TEXT