Jarvis Haupt



Associate Professor at University of Minnesota

  • On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond

    Our paper proposes a generalization error bound for a general family of deep neural networks based on the spectral norm of weight matrices. By introducing a novel characterization of the Lipschitz properties of the neural network family, we achieve a tighter generalization error bound for ultra-deep neural networks, whose depth is much larger than the square root of their width. Beyond general deep neural networks, our results can be applied to derive new bounds for several popular architectures, including convolutional neural networks (CNNs), residual networks (ResNets), and hyperspherical networks (SphereNets). In the regime where the depth of these architectures dominates, our bounds allow for the choice of much larger parameter spaces of weight matrices, inducing potentially stronger expressive power.

    06/13/2018 ∙ by Xingguo Li, et al.

  • Near Optimal Sketching of Low-Rank Tensor Regression

    We study the least squares regression problem min_{Θ ∈ S_{D,R}} ‖AΘ - b‖_2, where S_{D,R} is the set of Θ for which Θ = ∑_{r=1}^{R} θ_1^{(r)} ∘ ⋯ ∘ θ_D^{(r)} for vectors θ_d^{(r)} ∈ R^{p_d} for all r ∈ [R] and d ∈ [D], and ∘ denotes the outer product of vectors. That is, Θ is a low-dimensional, low-rank tensor. This is motivated by the fact that the number of parameters in Θ is only R · ∑_{d=1}^{D} p_d, which is significantly smaller than the ∏_{d=1}^{D} p_d parameters of ordinary least squares regression. We consider the above CP decomposition model of tensors Θ, as well as the Tucker decomposition. For both models we show how to apply data dimensionality reduction techniques based on sparse random projections Φ ∈ R^{m × n}, with m ≪ n, to reduce the problem to a much smaller problem min_Θ ‖ΦAΘ - Φb‖_2, for which, if Θ' is a near-optimum for the smaller problem, then it is also a near-optimum for the original problem. We obtain significantly smaller dimension and sparsity in Φ than is possible for ordinary least squares regression, and we also provide a number of numerical simulations supporting our theory.
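    The sketch-and-solve reduction at the heart of this approach can be illustrated on a plain least squares instance; a minimal sketch, with invented toy dimensions and a CountSketch-style sparse projection standing in for the paper's projections:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 2000, 20, 400  # data size, parameters, sketch size (toy values)

A = rng.standard_normal((n, p))
b = A @ rng.standard_normal(p) + 0.01 * rng.standard_normal(n)

# Sparse random projection Phi: one +-1 entry per column (CountSketch-style),
# so applying Phi costs O(nnz) rather than O(m n) per column
Phi = np.zeros((m, n))
Phi[rng.integers(0, m, size=n), np.arange(n)] = rng.choice([-1.0, 1.0], size=n)

# Solve the much smaller sketched problem min ||Phi A Theta - Phi b||_2
theta_sketch, *_ = np.linalg.lstsq(Phi @ A, Phi @ b, rcond=None)
theta_full, *_ = np.linalg.lstsq(A, b, rcond=None)

# Near-optimality is measured on the ORIGINAL objective
res_sketch = np.linalg.norm(A @ theta_sketch - b)
res_full = np.linalg.norm(A @ theta_full - b)
```

The sketched solution's residual on the original problem is close to the optimal residual, while the solve itself involves an m × p rather than an n × p system.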

    09/20/2017 ∙ by Jarvis Haupt, et al.

  • Communication-efficient Algorithm for Distributed Sparse Learning via Two-way Truncation

    We propose a communication- and computation-efficient algorithm for high-dimensional distributed sparse learning. At each iteration, local machines compute the gradient on local data and the master machine solves one shifted l_1-regularized minimization problem. A two-way truncation procedure reduces the communication cost from a constant multiple of the dimension, as in the state-of-the-art algorithm, to a constant multiple of the sparsity level. Theoretically, we prove that the estimation error of the proposed algorithm decreases exponentially and matches that of the centralized method under mild assumptions. Extensive experiments on both simulated and real data verify that the proposed algorithm is efficient and performs comparably to the centralized method on high-dimensional sparse learning problems.
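    A minimal sketch of the truncation step that caps per-round communication; dimensions are invented for illustration, and the master's shifted l_1 step is omitted:

```python
import numpy as np

def truncate(v, s):
    """Keep only the s largest-magnitude entries of v (the truncation step)."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-s:]
    out[keep] = v[keep]
    return out

rng = np.random.default_rng(1)
d, s = 10_000, 50  # ambient dimension vs. sparsity level (toy values)

# A local machine's gradient is dense in general, but for a sparse model only
# its largest entries matter, so it is truncated before being communicated
local_grad = rng.standard_normal(d)
message = truncate(local_grad, s)

# Communication drops from O(d) floats to O(s) (index, value) pairs
nonzeros = np.count_nonzero(message)
```

The surviving entries are transmitted exactly; everything else is dropped, which is what reduces the per-iteration message size from the dimension to the sparsity level.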

    09/02/2017 ∙ by Jineng Ren, et al.

  • Improved Support Recovery Guarantees for the Group Lasso With Applications to Structural Health Monitoring

    This paper considers the problem of estimating an unknown high-dimensional signal from noisy linear measurements, when the signal is assumed to possess a group-sparse structure in a known, fixed dictionary. We consider signals generated according to a natural probabilistic model, and establish new conditions under which the set of indices of the non-zero groups of the signal (called the group-level support) may be accurately estimated via the group Lasso. Our results strengthen existing coherence-based analyses that exhibit the well-known "square root" bottleneck, allowing for the number of recoverable nonzero groups to be nearly as large as the total number of groups. We also establish a sufficient recovery condition relating the number of nonzero groups and the signal-to-noise ratio (quantified in terms of the ratio of the squared Euclidean norms of nonzero groups and the variance of the random additive measurement noise), and validate this trend empirically. Finally, we examine the implications of our results in the context of a structural health monitoring application, where the group Lasso approach facilitates demixing of a propagating acoustic wavefield, acquired on the material surface by a scanning laser Doppler vibrometer, into antithetical components, one of which indicates the locations of internal material defects.
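    The group-level support selection comes from the block soft-thresholding proximal operator of the group Lasso penalty; a small illustrative sketch with toy groups and values:

```python
import numpy as np

def group_soft_threshold(x, groups, lam):
    """Prox of lam * sum_g ||x_g||_2: shrink each group's norm by lam,
    zeroing out entire groups whose norm falls below lam."""
    out = np.zeros_like(x)
    for g in groups:
        norm = np.linalg.norm(x[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * x[g]
    return out

x = np.array([3.0, 4.0, 0.1, -0.1])
groups = [np.array([0, 1]), np.array([2, 3])]
z = group_soft_threshold(x, groups, lam=1.0)
# Group 1 (norm 5) survives, shrunk by a factor 0.8; group 2 (norm ~0.14)
# is set exactly to zero, i.e. dropped from the estimated group support
```

Entire groups are kept or killed together, which is why the estimator recovers the group-level support rather than individual coordinates.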

    08/29/2017 ∙ by Mojtaba Kadkhodaie Elyaderani, et al.

  • On Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Sparse Learning in High Dimensions

    We propose a DC proximal Newton algorithm for solving nonconvex regularized sparse learning problems in high dimensions. Our proposed algorithm integrates the proximal Newton algorithm with multi-stage convex relaxation based on the difference of convex (DC) programming, and enjoys both strong computational and statistical guarantees. Specifically, by leveraging a sophisticated characterization of sparse modeling structures/assumptions (i.e., local restricted strong convexity and Hessian smoothness), we prove that within each stage of convex relaxation, our proposed algorithm achieves (local) quadratic convergence, and eventually obtains a sparse approximate local optimum with optimal statistical properties after only a few convex relaxations. Numerical experiments are provided to support our theory.
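    The multi-stage DC idea can be seen in one dimension: each stage solves a convex weighted-l_1 problem whose weights come from the previous stage's estimate. Below is a rough sketch using an MCP-style reweighting, with a scalar proximal problem standing in for the full proximal Newton step; all numbers are invented:

```python
import numpy as np

def soft(y, w):
    """Prox of w*|x|: the solution of min_x 0.5*(x - y)^2 + w*|x|."""
    return np.sign(y) * np.maximum(np.abs(y) - w, 0.0)

def mcp_weight(x, lam, gamma):
    """Weighted-l1 surrogate arising from a DC decomposition of the MCP penalty."""
    return np.maximum(lam - np.abs(x) / gamma, 0.0)

y, lam, gamma = 3.0, 1.0, 2.0
x = soft(y, lam)            # stage 1: plain Lasso estimate, biased toward zero
for _ in range(3):          # later stages: reweight, then solve again
    x = soft(y, mcp_weight(x, lam, gamma))
# Large coefficients escape the shrinkage bias entirely within a few stages
```

Stage 1 returns the shrunken value 2.0; once the estimate exceeds gamma*lam, its weight drops to zero and subsequent stages return the unbiased value 3.0, illustrating why only a few convex relaxations are needed.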

    06/19/2017 ∙ by Xingguo Li, et al.

  • Noisy Tensor Completion for Tensors with a Sparse Canonical Polyadic Factor

    In this paper, we study the problem of noisy tensor completion for tensors that admit a canonical polyadic or CANDECOMP/PARAFAC (CP) decomposition with one of the factors being sparse. We present general theoretical error bounds for an estimate obtained by using a complexity-regularized maximum likelihood principle and then instantiate these bounds for the case of additive white Gaussian noise. We also provide an ADMM-type algorithm for solving the complexity-regularized maximum likelihood problem and validate the theoretical findings via experiments on synthetic data sets.

    04/08/2017 ∙ by Swayambhoo Jain, et al.

  • Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization

    We propose a general theory for studying the geometry of nonconvex objective functions with underlying symmetric structures. Specifically, we characterize the locations of stationary points and the null space of the associated Hessian matrices via the lens of invariant groups. As a major motivating example, we apply the proposed general theory to characterize the global geometry of the low-rank matrix factorization problem. In particular, we illustrate how the rotational symmetry group gives rise to infinitely many non-isolated strict saddle points and equivalent global minima of the objective function. By explicitly identifying all stationary points, we divide the entire parameter space into three regions: (1) the region containing the neighborhoods of all strict saddle points, where the objective has negative curvature; (2) the region containing neighborhoods of all global minima, where the objective enjoys strong convexity along certain directions; and (3) the complement of the above regions, where the gradient has sufficiently large magnitude. We further extend our result to the matrix sensing problem. This allows us to establish strong global convergence guarantees for popular iterative algorithms with arbitrary initial solutions.
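    The rotational symmetry behind the non-isolated saddles and minima is easy to check numerically; a small sketch of the invariance f(UR, VR) = f(U, V) for the factorization objective, with toy sizes:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r target

def f(U, V):
    """Low-rank matrix factorization objective."""
    return 0.5 * np.linalg.norm(U @ V.T - M, "fro") ** 2

U = rng.standard_normal((n, r))
V = rng.standard_normal((n, r))

# Any rotation R (here in SO(2)) maps (U, V) to an equivalent point (UR, VR),
# since (U R)(V R)^T = U R R^T V^T = U V^T
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
gap = abs(f(U @ R, V @ R) - f(U, V))
```

Because theta is arbitrary, every stationary point sits on a continuum of equivalent points, which is exactly why the saddles and minima are non-isolated.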

    12/29/2016 ∙ by Xingguo Li, et al.

  • A First Order Free Lunch for SQRT-Lasso

    Many statistical machine learning techniques sacrifice convenient computational structures to gain estimation robustness and modeling flexibility. In this paper, we study this fundamental tradeoff through a SQRT-Lasso problem for sparse linear regression and sparse precision matrix estimation in high dimensions. We explain how novel optimization techniques help address these computational challenges. Namely, we propose a pathwise iterative smoothing shrinkage thresholding algorithm for solving the SQRT-Lasso optimization problem, and provide a novel model-based perspective for analyzing the smoothing optimization framework, which allows us to establish a near linear convergence (R-linear convergence) guarantee for our proposed algorithm, without sacrificing statistical accuracy. This implies that solving the SQRT-Lasso optimization problem is almost as easy as solving the Lasso optimization problem, while the former requires much less parameter tuning effort. Moreover, we show that our proposed algorithm can also be applied to sparse precision matrix estimation, and enjoys desirable computational as well as statistical properties. Numerical experiments are provided to support our theory.
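    A rough sketch of the smoothing idea: replace the nonsmooth loss ‖Ax - b‖_2 with a smooth surrogate and run proximal gradient (ISTA) steps. The data, smoothing parameter, step size, and iteration count below are all invented for illustration, and the actual pathwise scheme is not shown:

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding: prox of t*||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

rng = np.random.default_rng(5)
n, d = 200, 40
x_star = np.zeros(d)
x_star[:3] = [2.0, -1.5, 1.0]
A = rng.standard_normal((n, d))
b = A @ x_star + 0.1 * rng.standard_normal(n)

mu, lam, eta = 1e-4, 0.2, 0.02  # smoothing, regularization, step size (toy)
x = np.zeros(d)
for _ in range(2000):
    r = A @ x - b
    smooth_loss = np.sqrt(r @ r / n + mu)   # smooth surrogate of ||r||/sqrt(n)
    grad = A.T @ r / (n * smooth_loss)      # gradient of the smoothed loss
    x = soft(x - eta * grad, eta * lam)     # proximal (ISTA) step
```

The smoothed gradient is well defined even at a perfect fit, so plain proximal gradient steps apply; on this toy instance the iterate recovers the true support.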

    05/25/2016 ∙ by Xingguo Li, et al.

  • Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction

    We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints. Sufficient conditions are provided, under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the proposed algorithm to an asynchronous parallel variant with a near linear speedup. Numerical experiments demonstrate the efficiency of our algorithm in terms of both parameter estimation and computational performance.
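    A toy sketch of a variance-reduced (SVRG-style) update combined with a hard-thresholding step for the cardinality constraint; the problem sizes, step size, and epoch count are invented, and the asynchronous parallel variant is not shown:

```python
import numpy as np

def hard_threshold(x, s):
    """Project onto the cardinality constraint: keep the s largest entries."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]
    out[keep] = x[keep]
    return out

rng = np.random.default_rng(3)
n, d, s = 400, 50, 5
x_star = np.zeros(d)
x_star[:s] = rng.standard_normal(s)
A = rng.standard_normal((n, d))
b = A @ x_star  # noiseless toy instance

eta, x = 0.01, np.zeros(d)
for epoch in range(30):
    x_ref = x.copy()
    full_grad = A.T @ (A @ x_ref - b) / n   # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        # variance-reduced stochastic gradient at x, corrected via the snapshot
        g = A[i] * (A[i] @ x - b[i]) - A[i] * (A[i] @ x_ref - b[i]) + full_grad
        x = hard_threshold(x - eta * g, s)

err = np.linalg.norm(x - x_star)
```

The snapshot correction keeps the stochastic gradient's variance shrinking as the iterate approaches the snapshot, which is what enables the linear convergence the abstract refers to.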

    05/09/2016 ∙ by Xingguo Li, et al.

  • A Compressed Sensing Based Decomposition of Electrodermal Activity Signals

    The measurement and analysis of Electrodermal Activity (EDA) offers applications in diverse areas ranging from market research, to seizure detection, to human stress analysis. Unfortunately, the analysis of EDA signals is made difficult by the superposition of numerous components which can obscure the signal information related to a user's response to a stimulus. We show how simple pre-processing followed by a novel compressed sensing based decomposition can mitigate the effects of the undesired noise components and help reveal the underlying physiological signal. The proposed framework allows for decomposition of EDA signals with provable bounds on the recovery of user responses. We test our procedure on both synthetic and real-world EDA signals from wearable sensors and demonstrate that our approach allows for more accurate recovery of user responses as compared to the existing techniques.

    02/24/2016 ∙ by Swayambhoo Jain, et al.

  • On Convolutional Approximations to Linear Dimensionality Reduction Operators for Large Scale Data Processing

    In this paper, we examine the problem of approximating a general linear dimensionality reduction (LDR) operator, represented as a matrix A ∈ R^{m × n} with m < n, by a partial circulant matrix with rows related by circular shifts. Partial circulant matrices admit fast implementations via Fourier transform methods and subsampling operations; our investigation here is motivated by a desire to leverage these potential computational improvements in large-scale data processing tasks. We establish a fundamental result: most large LDR matrices (whose row spaces are uniformly distributed) in fact cannot be well approximated by partial circulant matrices. Then, we propose a natural generalization of the partial circulant approximation framework that entails approximating the range space of a given LDR operator A over a restricted domain of inputs, using a matrix formed as a product of a partial circulant matrix having m' > m rows and an m × k 'post-processing' matrix. We introduce a novel algorithmic technique, based on sparse matrix factorization, for identifying the factors comprising such approximations, and provide preliminary evidence to demonstrate the potential of this approach.
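    The computational appeal of partial circulant matrices comes from FFT-based multiplication; a small sketch with a generic first column and toy sizes:

```python
import numpy as np

def partial_circulant_matvec(c, x, m):
    """Multiply x by the first m rows of the circulant matrix with first
    column c, via FFTs: O(n log n) work instead of O(m n)."""
    y = np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real  # circular convolution
    return y[:m]

rng = np.random.default_rng(4)
n, m = 16, 5
c = rng.standard_normal(n)
x = rng.standard_normal(n)

# Explicit partial circulant matrix for comparison: entry (i, j) is c[(i-j) mod n],
# so each row is a circular shift of the previous one
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(m)])
dense_result = C @ x
fft_result = partial_circulant_matvec(c, x, m)
```

Subsampling the rows after an FFT-based full circulant multiply is exactly the "fast implementation via Fourier transform methods and subsampling" the abstract refers to.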

    02/25/2015 ∙ by Swayambhoo Jain, et al.