Median K-flats for hybrid linear modeling with many outliers
We describe the Median K-Flats (MKF) algorithm, a simple online method for hybrid linear modeling, i.e., for approximating data by a mixture of flats. This algorithm simultaneously partitions the data into clusters while finding their corresponding best approximating l1 d-flats, so that the cumulative l1 error is minimized. The current implementation restricts d-flats to be d-dimensional linear subspaces. It requires a negligible amount of storage, and its complexity, when modeling data consisting of N points in D-dimensional Euclidean space with K d-dimensional linear subspaces, is of order O(n K d D+n d^2 D), where n is the number of iterations required for convergence (empirically on the order of 10^4). Since it is an online algorithm, data can be supplied to it incrementally and it can incrementally produce the corresponding output. The performance of the algorithm is carefully evaluated using synthetic and real data.
READ FULL TEXT