Dimension reduction as an optimization problem over a set of generalized functions
The classical dimension reduction problem can be loosely formulated as the problem of finding a k-dimensional affine subspace of R^n onto which data points x_1, ..., x_N can be projected without loss of valuable information. We reformulate this problem in the language of tempered distributions, i.e. as the problem of approximating an empirical probability density function p_emp(x) = (1/N) ∑_{i=1}^N δ^n(x − x_i), where δ^n is the n-dimensional Dirac delta function, by another tempered distribution q(x) supported on some k-dimensional subspace. Thus, our problem reduces to minimizing a certain loss function I(q), measuring the distance from q to p_emp, over a pertinent set of generalized functions, denoted G_k. Another classical problem of data analysis is the sufficient dimension reduction problem. We show that it can be reduced to the following problem: given a function f: R^n → R and a probability density function p(x), find a function of the form g(w_1^T x, ..., w_k^T x) that minimizes the loss E_{x∼p} |f(x) − g(w_1^T x, ..., w_k^T x)|^2. We first show that the search spaces of the latter two problems are in one-to-one correspondence, the correspondence being defined by the Fourier transform. We then introduce a nonnegative penalty function R(f) and a set of ordinary functions Ω_ε = {f | R(f) ≤ ε} in such a way that Ω_ε `approximates' the space G_k as ε → 0. Finally, we present an algorithm for minimizing I(f) + λ R(f), based on the idea of a two-step iterative computation.
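The sufficient dimension reduction loss E_{x∼p} |f(x) − g(w_1^T x, ..., w_k^T x)|^2 and the two-step (alternating) idea can be illustrated with a small toy sketch. Everything below is an assumption made for illustration only and is not the algorithm of the paper: g is modeled by a crude componentwise polynomial fit, W = (w_1, ..., w_k)^T is updated by a finite-difference gradient step followed by QR re-orthonormalization, the penalty R(f) is omitted, and all function names are hypothetical.

```python
import numpy as np

def sdr_loss(f, g, W, X):
    """Monte Carlo estimate of E |f(x) - g(W x)|^2 over samples X (N x n)."""
    Z = X @ W.T                     # projected coordinates, shape (N, k)
    return np.mean((f(X) - g(Z)) ** 2)

def fit_g(Z, y, degree=3):
    """Step 1 (W fixed): fit g by least squares on componentwise powers of z.
    A toy model class for g, chosen only for simplicity."""
    Phi = np.hstack([Z ** d for d in range(degree + 1)])
    coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return lambda Zq: np.hstack([Zq ** d for d in range(degree + 1)]) @ coef

def update_W(f, W, X, lr=0.5, eps=1e-3):
    """Step 2 (g refit inside the loss): finite-difference gradient step on W,
    then re-orthonormalize its rows. Purely illustrative."""
    y = f(X)

    def loss_of(Wc):
        g = fit_g(X @ Wc.T, y)
        return sdr_loss(f, g, Wc, X)

    base = loss_of(W)
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        Wp = W.copy()
        Wp[idx] += eps
        grad[idx] = (loss_of(Wp) - base) / eps
    q, _ = np.linalg.qr((W - lr * grad).T)   # rows of W span a k-dim subspace
    return q.T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k, N = 5, 1, 200
    X = rng.normal(size=(N, n))
    w_true = np.zeros(n); w_true[0] = 1.0
    f = lambda X: np.sin(X @ w_true)         # f depends on a single direction
    W = rng.normal(size=(k, n))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    print("initial loss:", sdr_loss(f, fit_g(X @ W.T, f(X)), W, X))
    for _ in range(20):                      # alternate the two steps
        W = update_W(f, W, X)
    print("final loss:  ", sdr_loss(f, fit_g(X @ W.T, f(X)), W, X))
```

In this toy setup the loss typically decreases as W aligns with the direction w_true on which f actually depends; the paper's penalized objective I(f) + λ R(f) and its specific two-step scheme are not reproduced here.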