
Subspace Clustering using Ensembles of K-Subspaces
We present a novel approach to the subspace clustering problem that leverages ensembles of the K-subspaces (KSS) algorithm via the evidence accumulation clustering framework. Our algorithm forms a co-association matrix whose (i, j)th entry is the number of times points i and j are clustered together over several runs of KSS with random initializations. We analyze the entries of this co-association matrix and show that a naive version of our algorithm can recover subspaces for points drawn under the same conditions as those assumed by the Thresholded Subspace Clustering algorithm. We show on synthetic data that our method performs well for subspaces with large intersection, subspaces with small principal angles, and noisy data. Finally, we provide a variant of our algorithm that achieves state-of-the-art performance across several benchmark datasets, including an error on the COIL-20 database that is less than half that achieved by existing algorithms.
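As a concrete illustration of the evidence-accumulation idea, here is a minimal NumPy sketch (not the authors' implementation; the KSS inner loop, iteration counts, and function names are simplified assumptions): each randomly initialized KSS run produces a labeling, and the co-association matrix counts how often each pair of points lands in the same cluster.

```python
import numpy as np

def kss(X, K, d, n_iters=10, rng=None):
    """One run of K-subspaces: alternate between assigning each column of X
    to the subspace with smallest projection residual and refitting each
    subspace via a rank-d SVD of its assigned points."""
    rng = np.random.default_rng(rng)
    D, N = X.shape
    # random orthonormal bases for the K candidate subspaces
    U = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(K)]
    labels = np.zeros(N, dtype=int)
    for _ in range(n_iters):
        # residual norm of each point with respect to each subspace
        res = np.stack([np.linalg.norm(X - Uk @ (Uk.T @ X), axis=0) for Uk in U])
        labels = res.argmin(axis=0)
        for k in range(K):
            Xk = X[:, labels == k]
            if Xk.shape[1] >= d:
                U[k] = np.linalg.svd(Xk, full_matrices=False)[0][:, :d]
    return labels

def coassociation(X, K, d, n_runs=20, seed=0):
    """Evidence-accumulation affinity: entry (i, j) counts how many of the
    n_runs random restarts of KSS place points i and j in the same cluster."""
    N = X.shape[1]
    A = np.zeros((N, N))
    for b in range(n_runs):
        labels = kss(X, K, d, rng=seed + b)
        A += (labels[:, None] == labels[None, :])
    return A
```

The resulting affinity matrix can then be fed to any affinity-based clustering step (e.g. spectral clustering), which is where the abstract's full algorithm goes beyond this sketch.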
09/14/2017 ∙ by John Lipor, et al.

Algebraic Variety Models for High-Rank Matrix Completion
We consider a generalization of low-rank matrix completion to the case where the data belongs to an algebraic variety, i.e., each data point is a solution to a system of polynomial equations. In this case the original matrix is possibly high-rank, but it becomes low-rank after mapping each column to a higher-dimensional space of monomial features. Many well-studied extensions of linear models, including affine subspaces and their unions, can be described by a variety model. In addition, varieties can be used to model a richer class of nonlinear quadratic and higher-degree curves and surfaces. We study the sampling requirements for matrix completion under a variety model, with a focus on a union of affine subspaces. We also propose an efficient matrix completion algorithm that minimizes a convex or nonconvex surrogate of the rank of the matrix of monomial features. Our algorithm uses the well-known "kernel trick" to avoid working directly with the high-dimensional monomial matrix. We show that the proposed algorithm is able to recover synthetically generated data up to the predicted sampling complexity bounds. The proposed algorithm also outperforms standard low-rank matrix completion and subspace clustering techniques in experiments with real data.
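The rank phenomenon described above is easy to see numerically. In this hedged sketch (a toy instance with the degree hard-coded to 2; not the paper's algorithm), columns drawn from a union of two planes in ℝ³ form a full-rank matrix, yet their degree-2 monomial features are rank-deficient, and the polynomial kernel Gram matrix exposes the same rank deficiency without forming the features explicitly:

```python
import numpy as np

def monomial_features(X, ):
    """Map each column of X to all monomials of total degree <= 2:
    the constant, the coordinates, and all pairwise products."""
    D, N = X.shape
    feats = [np.ones(N)]
    feats += [X[i] for i in range(D)]
    feats += [X[i] * X[j] for i in range(D) for j in range(i, D)]
    return np.vstack(feats)

rng = np.random.default_rng(0)
# sample 40 points from a union of two random planes (2-d subspaces) in R^3
planes = [np.linalg.qr(rng.standard_normal((3, 2)))[0] for _ in range(2)]
X = np.hstack([P @ rng.standard_normal((2, 20)) for P in planes])

Phi = monomial_features(X)   # 10 x 40 monomial feature matrix
G = (1.0 + X.T @ X) ** 2     # degree-2 polynomial kernel Gram matrix

# X is full rank (3), but the lifted matrix Phi has rank at most 9 of 10,
# because the product of the two planes' defining linear forms is a
# quadratic polynomial vanishing on all of the data.
```

The Gram matrix G has the same rank as a (positively weighted) version of Phi, which is why the kernel trick suffices for the rank-minimization surrogate.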
03/28/2017 ∙ by Greg Ongie, et al.

Real-Time Energy Disaggregation of a Distribution Feeder's Demand Using Online Learning
Though distribution system operators have been adding more sensors to their networks, they still often lack an accurate real-time picture of the behavior of distributed energy resources such as demand-responsive electric loads and residential solar generation. Such information could improve system reliability, economic efficiency, and environmental impact. Rather than installing additional, costly sensing and communication infrastructure to obtain additional real-time information, it may be possible to use existing sensing capabilities and leverage knowledge about the system to reduce the need for new infrastructure. In this paper, we disaggregate a distribution feeder's demand measurements into two components: 1) the demand of a population of air conditioners, and 2) the demand of the remaining loads connected to the feeder. We use an online learning algorithm, Dynamic Fixed Share (DFS), that uses real-time distribution feeder measurements as well as models generated from historical building- and device-level data. We develop two implementations of the algorithm and conduct simulations using real demand data from households and commercial buildings to investigate its effectiveness. Case studies demonstrate that DFS can effectively perform online disaggregation, and that the choice and construction of the models included in the algorithm affect its accuracy, which is comparable to that of a set of Kalman filters.
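DFS builds on the classical Fixed Share algorithm of Herbster and Warmuth. The following is a hedged NumPy sketch of that base update on a toy disaggregation problem; the two candidate models, the learning rate, and the sharing parameter are illustrative assumptions, not the paper's.

```python
import numpy as np

def fixed_share_step(w, losses, eta=5.0, alpha=0.02):
    """One round of Fixed Share: an exponential-weights update followed by a
    sharing step that keeps some weight on every expert, so the mixture can
    track which model is best when the underlying behavior switches."""
    v = w * np.exp(-eta * losses)   # exponential weighting by loss
    v /= v.sum()
    return (1 - alpha) * v + alpha / len(w)   # share a small uniform mass

# toy disaggregation: two candidate models predict the AC share of total
# feeder demand; the better model changes halfway through the horizon
truth = np.concatenate([np.full(50, 0.3), np.full(50, 0.7)])
preds = np.stack([np.full(100, 0.3), np.full(100, 0.7)])  # expert predictions
w = np.full(2, 0.5)
estimates = []
for t in range(100):
    estimates.append(w @ preds[:, t])          # weighted prediction
    losses = (preds[:, t] - truth[t]) ** 2     # squared loss per expert
    w = fixed_share_step(w, losses)
```

The sharing step is what distinguishes this from plain exponential weighting: without it, an expert whose weight collapses early can never recover after a regime switch.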
01/16/2017 ∙ by Gregory S. Ledva, et al.

Towards a Theoretical Analysis of PCA for Heteroscedastic Data
Principal Component Analysis (PCA) is a method for estimating a subspace given noisy samples. It is useful in a variety of problems ranging from dimensionality reduction to anomaly detection and the visualization of high-dimensional data. PCA performs well in the presence of moderate noise and even with missing data, but it is also sensitive to outliers. PCA is also known to have a phase transition when the noise is independent and identically distributed; recovery of the subspace declines sharply at a threshold noise variance. Effective use of PCA requires a rigorous understanding of these behaviors. This paper provides a step towards an analysis of PCA for samples with heteroscedastic noise, that is, samples that have non-uniform noise variances and so are no longer identically distributed. In particular, we provide a simple asymptotic prediction of the recovery of a one-dimensional subspace from noisy heteroscedastic samples. The prediction enables: a) easy and efficient calculation of the asymptotic performance, and b) qualitative reasoning to understand how PCA is impacted by heteroscedasticity (such as outliers).
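The phase transition mentioned above is easy to reproduce for a one-dimensional subspace. Below is a hedged sketch with homoscedastic noise and illustrative problem sizes (the paper's asymptotic formulas, and its heteroscedastic extension where each sample gets its own noise variance, are not reproduced here):

```python
import numpy as np

def pca_recovery(u, noise_std, n=500, rng=None):
    """Estimate span(u) by PCA from samples x_i = u*z_i + eps_i and return
    the squared inner product |<u_hat, u>|^2 (1 means perfect recovery)."""
    rng = np.random.default_rng(rng)
    d = len(u)
    X = np.outer(u, rng.standard_normal(n)) + noise_std * rng.standard_normal((d, n))
    u_hat = np.linalg.svd(X, full_matrices=False)[0][:, 0]  # top left sing. vec.
    return float((u_hat @ u) ** 2)

d = 100
u = np.zeros(d)
u[0] = 1.0
r_low = pca_recovery(u, noise_std=np.sqrt(0.1), rng=0)    # well below threshold
r_high = pca_recovery(u, noise_std=np.sqrt(10.0), rng=0)  # past the transition
```

With unit signal variance and aspect ratio d/n = 0.2, the low-noise run recovers the subspace almost perfectly, while at noise variance 10 the top singular vector is essentially uninformative.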
10/12/2016 ∙ by David Hong, et al.

Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation From Undersampled Data
Subspace learning and matrix factorization problems have a great many applications in science and engineering, and efficient algorithms are critical as dataset sizes continue to grow. Many relevant problem formulations are nonconvex, and in a variety of contexts it has been observed that solving the nonconvex problem directly is not only efficient but also reliably accurate. We discuss convergence theory for a particular method: first-order incremental gradient descent constrained to the Grassmannian. The output of the algorithm is an orthonormal basis for the d-dimensional subspace spanned by an input streaming data matrix. We study two sampling cases: where each data vector of the streaming matrix is fully sampled, and where it is undersampled by a sampling matrix A_t ∈ ℝ^{m × n} with m ≪ n. We propose an adaptive step-size scheme that depends only on the sampled data and algorithm outputs. We prove that with fully sampled data, the step-size scheme maximizes the improvement of our convergence metric at each iteration, and that this method converges from any random initialization to the true subspace, despite the nonconvex formulation and orthogonality constraints. For the case of undersampled data, we establish monotonic improvement of the defined convergence metric at each iteration with high probability.
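A hedged sketch of the fully sampled case of this incremental Grassmannian method (GROUSE-style), using the greedy noiseless step size θ = arctan(‖r‖/‖p‖) described in the abstract; the variable names and iteration count are illustrative assumptions:

```python
import numpy as np

def grouse_step(U, v):
    """One incremental gradient update on the Grassmannian with the greedy
    noiseless step size theta = arctan(||r||/||p||), rotating span(U) toward v.
    The update preserves orthonormality of U exactly in exact arithmetic."""
    w = U.T @ v                          # least-squares weights (full sampling)
    p = U @ w                            # projection of v onto span(U)
    r = v - p                            # residual, orthogonal to span(U)
    pn, rn, wn = np.linalg.norm(p), np.linalg.norm(r), np.linalg.norm(w)
    if rn < 1e-12 or pn < 1e-12:
        return U                         # v already in (or orthogonal to) span(U)
    theta = np.arctan(rn / pn)
    step = (np.cos(theta) - 1.0) * p / pn + np.sin(theta) * r / rn
    return U + np.outer(step, w / wn)

rng = np.random.default_rng(0)
D, d = 20, 3
U_true = np.linalg.qr(rng.standard_normal((D, d)))[0]   # true subspace basis
U = np.linalg.qr(rng.standard_normal((D, d)))[0]        # random initialization
for _ in range(500):
    v = U_true @ rng.standard_normal(d)                 # noiseless streaming vector
    U = grouse_step(U, v)
err = np.linalg.norm(U_true - U @ (U.T @ U_true))       # subspace estimation error
```

In the undersampled case of the abstract, v is observed only through A_t v, and the weights w are instead obtained from a least-squares fit on the observed coordinates; that variant is omitted here.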
10/01/2016 ∙ by Dejiao Zhang, et al.

Deep Unsupervised Clustering Using Mixture of Autoencoders
Unsupervised clustering is one of the most fundamental challenges in machine learning. A popular hypothesis is that data are generated from a union of low-dimensional nonlinear manifolds; thus one approach to clustering is to identify and separate these manifolds. In this paper, we present a novel approach to this problem using a mixture of autoencoders. Our model consists of two parts: 1) a collection of autoencoders, where each autoencoder learns the underlying manifold of a group of similar objects, and 2) a mixture assignment neural network, which takes the concatenated latent vectors from the autoencoders as input and infers the distribution over clusters. By jointly optimizing the two parts, we simultaneously assign data to clusters and learn the underlying manifold of each cluster.
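A structural sketch of the two-part model's forward pass, heavily hedged: linear maps stand in for the deep encoder/decoder networks, all sizes are illustrative, and the joint training objective is omitted entirely.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MixtureOfAutoencoders:
    """Sketch of the architecture: K autoencoders plus a mixture assignment
    network that maps the concatenated latent vectors to a distribution over
    the K clusters. Linear maps stand in for the deep networks."""
    def __init__(self, D, latent_dim, K, rng=None):
        rng = np.random.default_rng(rng)
        self.enc = [0.1 * rng.standard_normal((latent_dim, D)) for _ in range(K)]
        self.dec = [0.1 * rng.standard_normal((D, latent_dim)) for _ in range(K)]
        self.W = 0.1 * rng.standard_normal((K, K * latent_dim))  # assignment net

    def forward(self, x):
        latents = [E @ x for E in self.enc]                 # per-autoencoder codes
        recons = [Dk @ z for Dk, z in zip(self.dec, latents)]
        probs = softmax(self.W @ np.concatenate(latents))   # cluster distribution
        return recons, probs
```

In the full model each autoencoder is trained to reconstruct the points its cluster is responsible for, while the assignment network is trained jointly to route points to the autoencoder that reconstructs them best.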
12/21/2017 ∙ by Dejiao Zhang, et al.

On Learning High Dimensional Structured Single Index Models
Single Index Models (SIMs) are simple yet flexible semiparametric models for machine learning, in which the response variable is modeled as a monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function that relates features to observations. While methods have been described to learn SIMs in the low-dimensional regime, a method that can efficiently learn SIMs in high dimensions, and under general structural assumptions, has not been forthcoming. In this paper, we propose computationally efficient algorithms for SIM inference in high dimensions with structural constraints. Our general approach specializes to sparsity, group sparsity, and low-rank assumptions, among others. Experiments show that the proposed method enjoys superior predictive performance when compared to generalized linear models and achieves results comparable to or better than single-layer feedforward neural networks with significantly less computational cost.
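To make the model concrete: a SIM generates responses y = g(⟨w, x⟩) for an unknown monotone link g and weight vector w. The sketch below is not the paper's structured high-dimensional estimator; it only illustrates the model and a classical moment-based direction estimate that works for Gaussian features (by Stein's lemma, E[x·y] is proportional to w when g is differentiable and increasing).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 20
w = rng.standard_normal(d)
w /= np.linalg.norm(w)            # unknown unit-norm index direction
X = rng.standard_normal((n, d))   # Gaussian features
y = np.tanh(X @ w)                # monotone link g = tanh

# For Gaussian x, Stein's lemma gives E[x y] = E[g'(w.x)] * w, so the sample
# cross-moment recovers the index direction up to scale.
w_hat = X.T @ y / n
w_hat /= np.linalg.norm(w_hat)
cos_sim = abs(w_hat @ w)          # alignment with the true direction
```

The paper's contribution is precisely the setting this shortcut does not cover: high dimensions with structural constraints (sparsity, group sparsity, low rank), where the link function must also be estimated.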
03/13/2016 ∙ by Nikhil Rao, et al.

Matrix Completion Under Monotonic Single Index Models
Most recent results in matrix completion assume that the matrix under consideration is low-rank or that its columns lie in a union of low-rank subspaces. In real-world settings, however, the linear structure underlying these models is distorted by a (typically unknown) nonlinear transformation. This paper addresses the challenge of matrix completion in the face of such nonlinearities. Given a few observations of a matrix that are obtained by applying a Lipschitz, monotonic function to a low-rank matrix, our task is to estimate the remaining unobserved entries. We propose a novel matrix completion method that alternates between low-rank matrix estimation and monotonic function estimation to estimate the missing matrix elements. Mean squared error bounds provide insight into how well the matrix can be estimated based on the size and rank of the matrix and the properties of the nonlinear transformation. Empirical results on synthetic and real-world datasets demonstrate the competitiveness of the proposed approach.
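A hedged sketch of the alternation described above, using pool-adjacent-violators (PAVA) for the monotone step. The toy transformation g(t) = exp(t), the rank, the iteration count, and the interpolation-based inversion of the fitted link are all illustrative simplifications, not the paper's estimator.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: the best nondecreasing fit to y in squared error."""
    level, weight = [], []
    for v in y:
        level.append(float(v))
        weight.append(1)
        while len(level) > 1 and level[-2] > level[-1]:
            w = weight[-2] + weight[-1]
            level[-2] = (level[-2] * weight[-2] + level[-1] * weight[-1]) / w
            weight[-2] = w
            level.pop()
            weight.pop()
    return np.repeat(level, weight)

# toy instance: a rank-1 latent matrix observed through g(t) = exp(t)
rng = np.random.default_rng(0)
M = np.outer(rng.standard_normal(20), rng.standard_normal(15))
Y = np.exp(M)
mask = rng.random(M.shape) < 0.6          # observed entries

Z = np.where(mask, Y, 0.0)                # crude initialization
for _ in range(15):
    # low-rank step: project the current latent estimate onto rank-1 matrices
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    L = s[0] * np.outer(U[:, 0], Vt[0])
    # monotone step: isotonic fit of the observed y against the latent values
    order = np.argsort(L[mask])
    z_sorted = L[mask][order]
    g_fit = pava(Y[mask][order])          # nondecreasing estimate of g
    # refresh observed latent entries by (approximately) inverting g_fit;
    # ties in g_fit make this interpolation-based inversion approximate
    Z = L.copy()
    Z[mask] = np.interp(Y[mask], g_fit, z_sorted)
```

Unobserved entries would then be predicted by mapping the low-rank estimate through the fitted monotone function.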
12/29/2015 ∙ by Ravi Ganti, et al.

Distance-Penalized Active Learning Using Quantile Search
Adaptive sampling theory has shown that, under proper assumptions on the signal class, algorithms exist to reconstruct a signal in ℝ^d with an optimal number of samples. We generalize this problem to the case of spatial signals, where the sampling cost is a function of both the number of samples taken and the distance traveled during estimation. This is motivated by our work studying regions of low oxygen concentration in the Great Lakes. We show that, for one-dimensional threshold classifiers, a tradeoff between the number of samples taken and the distance traveled can be achieved using a generalization of binary search, which we refer to as quantile search. We characterize both the estimation error after a fixed number of samples and the distance traveled in the noiseless case, as well as the estimation error in the case of noisy measurements. We illustrate our results in both simulations and experiments and show that our method outperforms existing algorithms in the majority of practical scenarios.
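A minimal noiseless sketch of the idea, assuming a step signal on [0, 1] (the sampling rule and travel accounting here are simplifications of the paper's procedure): sampling at fraction m of the current feasible interval recovers binary search at m = 1/2, while other choices of m trade extra samples against shorter moves between sample locations.

```python
def quantile_search(f, m=0.5, n=40):
    """Locate the change point of a step function f on [0, 1] by repeatedly
    sampling at fraction m of the current feasible interval, while tracking
    the total distance traveled between sample locations."""
    lo, hi, prev, travel = 0.0, 1.0, 0.0, 0.0
    for _ in range(n):
        x = lo + m * (hi - lo)
        travel += abs(x - prev)   # movement cost of taking this sample
        prev = x
        if f(x):                  # sample indicates the threshold is to our left
            hi = x
        else:
            lo = x
    return 0.5 * (lo + hi), travel

theta = 0.37
f = lambda x: x >= theta
est_half, travel_half = quantile_search(f, m=0.5)          # binary search
est_skew, travel_skew = quantile_search(f, m=0.3, n=60)    # skewed quantile
```

The skewed variant shrinks the interval more slowly in the worst case (factor max(m, 1 − m) per sample), which is why it is given more samples here; the paper characterizes this sample/travel tradeoff precisely.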
09/28/2015 ∙ by John Lipor, et al.

Leveraging Union of Subspace Structure to Improve Constrained Clustering
Many clustering problems in computer vision and other contexts are also classification problems, where each cluster shares a meaningful label. Subspace clustering algorithms in particular are often applied to problems that fit this description, for example with face images or handwritten digits. While it is straightforward to request human input on these datasets, our goal is to reduce this input as much as possible. We present a pairwise-constrained clustering algorithm that actively selects queries based on the union-of-subspaces model. The central step of the algorithm is querying points of minimum margin between estimated subspaces; analogous to classifier margin, these points lie near the decision boundary. We prove that points lying near the intersection of subspaces are points with low margin. Our procedure can be used after any subspace clustering algorithm that outputs an affinity matrix. We demonstrate on several datasets that our algorithm drives the clustering error down considerably faster than state-of-the-art active query algorithms on datasets with subspace structure and is competitive on other datasets.
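One concrete reading of the margin criterion (a hedged sketch; the paper's exact margin definition and query loop may differ): compute each point's residual to every estimated subspace, and query the point whose two smallest residuals are closest to each other, since such points sit near the boundary between subspaces.

```python
import numpy as np

def min_margin_query(X, bases):
    """Return the index of the minimum-margin point, where the margin of a
    point is the gap between its residual norms to the two closest estimated
    subspaces (each given by an orthonormal basis in `bases`)."""
    res = np.stack([np.linalg.norm(X - U @ (U.T @ X), axis=0) for U in bases])
    res.sort(axis=0)
    margins = res[1] - res[0]       # second-closest minus closest subspace
    return int(np.argmin(margins)), margins
```

On a toy example with two planes in ℝ⁴ intersecting along a line, a point near the intersection has a much smaller margin than a point deep inside one plane, matching the abstract's claim that near-intersection points have low margin.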
08/06/2016 ∙ by John Lipor, et al.

Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation
It has been observed in a variety of contexts that gradient descent methods have great success in solving low-rank matrix factorization problems, despite the relevant problem formulations being nonconvex. We tackle a particular instance of this scenario, in which we seek the d-dimensional subspace spanned by a streaming data matrix. We apply the natural first-order incremental gradient descent method, constraining the gradient method to the Grassmannian. In this paper, we propose an adaptive step-size scheme that is greedy for the noiseless case, maximizing the improvement of our convergence metric at each data index t, and that yields an expected improvement for the noisy case. We show that, with noise-free data, this method converges from any random initialization to the global minimum of the problem. For noisy data, we provide the expected per-iteration convergence rate of the proposed algorithm.
06/24/2015 ∙ by Dejiao Zhang, et al.
Laura Balzano
Assistant Professor in Electrical Engineering and Computer Science at the University of Michigan