
Probabilistic Permutation Synchronization using the Riemannian Structure of the Birkhoff Polytope
We present an entirely new geometric and probabilistic approach to synchronization of correspondences across multiple sets of objects or images. In particular, we present two algorithms: (1) Birkhoff-Riemannian L-BFGS for optimizing the relaxed version of the combinatorially intractable cycle consistency loss in a principled manner, and (2) Birkhoff-Riemannian Langevin Monte Carlo for generating samples on the Birkhoff polytope and estimating the confidence of the found solutions. To this end, we first introduce the recently developed Riemannian geometry of the Birkhoff polytope. Next, we introduce a new probabilistic synchronization model in the form of a Markov Random Field (MRF). Finally, based on first-order retraction operators, we formulate our problem as simulating a stochastic differential equation and devise new integrators. We show on both synthetic and real datasets that we achieve high-quality multi-graph matching results with faster convergence and reliable confidence/uncertainty estimates.
04/11/2019 ∙ by Tolga Birdal, et al.
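The relaxed permutations in this work live on the Birkhoff polytope, the set of doubly stochastic matrices. As a minimal illustration of that search space (classical Sinkhorn-Knopp normalization, not the paper's Riemannian retraction or sampler):

```python
import numpy as np

def sinkhorn(M, n_iters=100):
    """Alternately normalize rows and columns (Sinkhorn-Knopp) to push a
    positive matrix toward the Birkhoff polytope, i.e. the set of doubly
    stochastic matrices that relaxes the set of permutation matrices."""
    P = np.asarray(M, dtype=float).copy()
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True)  # row sums -> 1
        P /= P.sum(axis=0, keepdims=True)  # column sums -> 1
    return P

rng = np.random.default_rng(0)
P = sinkhorn(rng.uniform(0.1, 1.0, size=(4, 4)))
```

Rounding such a doubly stochastic matrix back to a hard permutation (e.g. via the Hungarian algorithm) is a separate step not shown here.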

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise
Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using α-stable distributions, a family of heavy-tailed distributions that appear in the generalized central limit theorem. In this context, SGD can be viewed as a discretization of a stochastic differential equation (SDE) driven by a Lévy motion, and the metastability results for this SDE can then be used for illuminating the behavior of SGD, especially in terms of 'preferring wide minima'. While this approach brings a new perspective for analyzing SGD, it is limited in the sense that, due to the time discretization, SGD might admit a significantly different behavior than its continuous-time limit. Intuitively, the behaviors of the two systems are expected to be similar only when the discretization step is sufficiently small; however, to the best of our knowledge, there is no theoretical understanding of how small the step-size should be chosen in order to guarantee that the discretized system inherits the properties of the continuous-time system. In this study, we provide a formal theoretical analysis where we derive explicit conditions for the step-size such that the metastability behavior of the discrete-time system is similar to its continuous-time limit. We show that the behaviors of the two systems are indeed similar for small step-sizes and we identify how the error depends on the algorithm and problem parameters. We illustrate our results with simulations on a synthetic model and neural networks.
06/21/2019 ∙ by Thanh Huy Nguyen, et al.
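A Lévy-driven SDE of the kind discussed above can be simulated with an Euler scheme once α-stable increments are available. A hedged sketch using the Chambers-Mallows-Stuck sampler on a toy quadratic objective (the objective and step-sizes are illustrative, not the paper's setting):

```python
import numpy as np

def symmetric_alpha_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for symmetric alpha-stable noise
    (valid for alpha != 1); alpha = 2 recovers a Gaussian up to scaling."""
    V = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * V) / np.cos(V) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * V) / W) ** ((1.0 - alpha) / alpha))

def levy_sde_step(x, grad_f, eta, alpha, rng):
    """One Euler step of dX_t = -grad f(X_t) dt + dL_t, with L_t an
    alpha-stable Levy motion (the continuous-time model SGD is compared
    against). Note the eta**(1/alpha) noise scaling, not sqrt(eta)."""
    noise = symmetric_alpha_stable(alpha, x.shape, rng)
    return x - eta * grad_f(x) + eta ** (1.0 / alpha) * noise

rng = np.random.default_rng(1)
x = np.array([3.0])
for _ in range(1000):
    x = levy_sde_step(x, lambda z: z, eta=0.01, alpha=1.8, rng=rng)  # f(x) = x^2/2
```

Unlike the Gaussian case, the increments occasionally take very large values ('jumps'), which is exactly the mechanism behind the metastability behavior studied in the paper.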

Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC
We introduce the Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in scenarios such as SfM (structure from motion) or SLAM (simultaneous localization and mapping). TG-MCMC is the first of its kind, as it unites asymptotically global non-convex optimization on the spherical manifold of quaternions with posterior sampling, in order to provide both reliable initial poses and uncertainty estimates that are informative about the quality of individual solutions. We devise rigorous theoretical convergence guarantees for our method and extensively evaluate it on synthetic and real benchmark datasets. Besides its elegance in formulation and theory, we show that our method is robust to missing data and noise, and that the estimated uncertainties capture intuitive properties of the data.
05/31/2018 ∙ by Tolga Birdal, et al.
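TG-MCMC operates on the spherical manifold of unit quaternions. As a minimal sketch of the geodesic step that such samplers rely on (the exponential map on the unit sphere, not the full tempered integrator):

```python
import numpy as np

def sphere_exp(q, v):
    """Exponential map on the unit quaternion sphere S^3: follow the
    geodesic (great circle) from q in the direction of a tangent
    vector v, where v is assumed orthogonal to q."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return q.copy()
    return np.cos(nv) * q + np.sin(nv) * (v / nv)

q = np.array([1.0, 0.0, 0.0, 0.0])          # identity quaternion
v = np.array([0.0, np.pi / 2, 0.0, 0.0])    # tangent vector at q
q_new = sphere_exp(q, v)                     # quarter-turn along the geodesic
```

Because the update stays exactly on the sphere, no re-normalization of the quaternion is needed after each step.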

Fractional Langevin Monte Carlo: Exploring Lévy-Driven Stochastic Differential Equations for Markov Chain Monte Carlo
Along with the recent advances in scalable Markov Chain Monte Carlo methods, sampling techniques that are based on Langevin diffusions have started receiving increasing attention. These so-called Langevin Monte Carlo (LMC) methods are based on diffusions driven by a Brownian motion, which gives rise to Gaussian proposal distributions in the resulting algorithms. Even though these approaches have proven successful in many applications, their performance can be limited by the light-tailed nature of the Gaussian proposals. In this study, we extend classical LMC and develop a novel Fractional LMC (FLMC) framework that is based on a family of heavy-tailed distributions, called α-stable Lévy distributions. As opposed to classical approaches, the proposed approach can possess large jumps while targeting the correct distribution, which would be beneficial for efficient exploration of the state space. We develop novel computational methods that can scale up to large-scale problems and we provide formal convergence analysis of the proposed scheme. Our experiments support our theory: FLMC can provide superior performance in multimodal settings, improved convergence rates, and robustness to algorithm parameters.
06/12/2017 ∙ by Umut Şimşekli, et al.
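For contrast with FLMC, the classical Brownian-driven LMC iteration (the unadjusted Langevin algorithm) can be sketched as follows; FLMC replaces the Gaussian increments with heavy-tailed α-stable ones and adjusts the drift accordingly. This is a toy Gaussian-target sketch, not the paper's scheme:

```python
import numpy as np

def ula_step(x, grad_log_p, eta, rng):
    """One step of the unadjusted Langevin algorithm (ULA): the Euler
    discretization of dX_t = grad log p(X_t) dt + sqrt(2) dB_t.
    The sqrt(2*eta) Gaussian term is the light-tailed proposal that
    FLMC generalizes to alpha-stable increments."""
    return x + eta * grad_log_p(x) + np.sqrt(2.0 * eta) * rng.normal(size=x.shape)

# Sample from a standard Gaussian target, where grad log p(x) = -x.
rng = np.random.default_rng(0)
x = np.zeros(1)
samples = []
for i in range(20000):
    x = ula_step(x, lambda z: -z, eta=0.05, rng=rng)
    if i >= 2000:  # discard burn-in
        samples.append(x[0])
samples = np.array(samples)
```

ULA incurs a small discretization bias that shrinks with the step-size; the empirical mean and standard deviation of the chain should be close to 0 and 1 for this target.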

Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding
Neural time-series data contain a wide variety of prototypical signal waveforms (atoms) that are of significant importance in clinical and cognitive research. One of the goals of analyzing such data is hence to extract such 'shift-invariant' atoms. Even though some success has been reported with existing algorithms, they are limited in applicability due to their heuristic nature. Moreover, they are often vulnerable to artifacts and impulsive noise, which are typically present in raw neural recordings. In this study, we address these issues and propose a novel probabilistic convolutional sparse coding (CSC) model for learning shift-invariant atoms from raw neural signals containing potentially severe artifacts. At the core of our model, which we call αCSC, lies a family of heavy-tailed distributions called α-stable distributions. We develop a novel, computationally efficient Monte Carlo expectation-maximization algorithm for inference. The maximization step boils down to a weighted CSC problem, for which we develop a computationally efficient optimization algorithm. Our results show that the proposed algorithm achieves state-of-the-art convergence speeds. In addition, αCSC is significantly more robust to artifacts when compared to three competing algorithms: it can extract spike bursts, oscillations, and even reveal more subtle phenomena such as cross-frequency coupling when applied to noisy neural time series.
05/22/2017 ∙ by Mainak Jas, et al.
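The maximization step above reduces to a weighted sparse coding problem. As a rough, non-convolutional illustration of how per-sample weights (downweighting artifact-corrupted samples) enter such a solver, here is a weighted ISTA sketch; the dictionary, weights, and solver are illustrative stand-ins, not the paper's algorithm:

```python
import numpy as np

def weighted_ista(D, y, w, lam, n_iters=200):
    """Weighted ISTA sketch: minimize
        0.5 * sum_i w_i * (y_i - (D z)_i)**2 + lam * ||z||_1.
    The weights w_i play the role of E-step weights in an EM-style
    robust sparse coding scheme (illustration only)."""
    L = np.linalg.norm(D.T @ (w[:, None] * D), 2)  # Lipschitz const. of the gradient
    step = 1.0 / L
    z = np.zeros(D.shape[1])
    for _ in range(n_iters):
        g = D.T @ (w * (D @ z - y))                # weighted least-squares gradient
        z = z - step * g
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return z

D = np.eye(5)                                      # trivial dictionary for illustration
z_hat = weighted_ista(D, np.array([2.0, 0.0, 0.0, 0.0, 0.0]), np.ones(5), lam=0.5)
```

With an identity dictionary and uniform weights, the solution is the familiar soft-thresholding of the data.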

Stochastic Quasi-Newton Langevin Monte Carlo
Recently, Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods have been proposed for scaling up Monte Carlo computations to large data problems. Whilst these approaches have proven useful in many applications, vanilla SG-MCMC might suffer from poor mixing rates when random variables exhibit strong couplings or large scale differences under the target density. In this study, we propose a novel SG-MCMC method that takes the local geometry into account by using ideas from Quasi-Newton optimization methods. These second-order methods directly approximate the inverse Hessian by using a limited history of samples and their gradients. Our method uses dense approximations of the inverse Hessian while keeping the time and memory complexities linear with the dimension of the problem. We provide a formal theoretical analysis where we show that the proposed method is asymptotically unbiased and consistent with the posterior expectations. We illustrate the effectiveness of the approach on both synthetic and real datasets. Our experiments on two challenging applications show that our method achieves fast convergence rates similar to Riemannian approaches while at the same time having low computational requirements similar to diagonal preconditioning approaches.
02/10/2016 ∙ by Umut Şimşekli, et al.
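As a simplified stand-in for the quasi-Newton preconditioning described above (a fixed diagonal preconditioner rather than the paper's dense L-BFGS-style approximation), a preconditioned SGLD step on a badly scaled target can be sketched as:

```python
import numpy as np

def precond_sgld_step(theta, grad_log_post, precond, eta, rng):
    """SGLD with a fixed diagonal preconditioner: the drift and the
    injected Gaussian noise are both rescaled by the (approximate)
    inverse Hessian, so all coordinates mix at a similar rate."""
    noise = rng.normal(size=theta.shape)
    return (theta + eta * precond * grad_log_post(theta)
            + np.sqrt(2.0 * eta * precond) * noise)

# Badly scaled Gaussian target with variances (100, 1).
grad_log_post = lambda t: np.array([-t[0] / 100.0, -t[1]])
precond = np.array([100.0, 1.0])  # approximate inverse Hessian diagonal
rng = np.random.default_rng(0)
theta = np.zeros(2)
chain = []
for i in range(20000):
    theta = precond_sgld_step(theta, grad_log_post, precond, eta=0.05, rng=rng)
    if i >= 2000:  # discard burn-in
        chain.append(theta.copy())
chain = np.array(chain)
```

Without the preconditioner, the first coordinate would mix roughly a hundred times more slowly than the second; with it, the chain recovers both posterior scales.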

HAMSI: A Parallel Incremental Optimization Algorithm Using Quadratic Approximations for Solving Partially Separable Problems
We propose HAMSI (Hessian Approximated Multiple Subsets Iteration), a provably convergent, second-order incremental algorithm for solving large-scale partially separable optimization problems. The algorithm is based on a local quadratic approximation and hence allows incorporating curvature information to speed up convergence. HAMSI is inherently parallel and scales nicely with the number of processors. Combined with techniques for effectively utilizing modern parallel computer architectures, we illustrate that the proposed method converges more rapidly than a parallel stochastic gradient descent when both methods are used to solve large-scale matrix factorization problems. This performance gain comes only at the expense of using memory that scales linearly with the total size of the optimization variables. We conclude that HAMSI may be considered a viable alternative in many large-scale problems where first-order methods based on variants of stochastic gradient descent are applicable.
09/05/2015 ∙ by Kamer Kaya, et al.

Parallel Stochastic Gradient Markov Chain Monte Carlo for Matrix Factorisation Models
For large matrix factorisation problems, we develop a distributed Markov Chain Monte Carlo (MCMC) method based on stochastic gradient Langevin dynamics (SGLD) that we call Parallel SGLD (PSGLD). PSGLD has very favourable scaling properties with increasing data size and is comparable in terms of computational requirements to optimisation methods based on stochastic gradient descent. PSGLD achieves high performance by exploiting the conditional independence structure of MF models to subsample data in a systematic manner so as to allow parallelisation and distributed computation. We provide a convergence proof of the algorithm and verify its superior performance on various architectures such as graphics processing units, shared-memory multicore systems, and multi-computer clusters.
06/03/2015 ∙ by Umut Şimşekli, et al.
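The conditional independence that PSGLD exploits can be sketched as follows: an SGLD update restricted to one data block only touches the corresponding rows of W and columns of H, so disjoint blocks can be processed in parallel. This is a schematic, serial toy version, not the distributed implementation:

```python
import numpy as np

def sgld_block_update(W, H, X, rows, cols, eta, lam, rng):
    """SGLD update for a factorization X ≈ W @ H restricted to the data
    block X[rows, cols]. Only W[rows] and H[:, cols] are read or
    written, which is the independence structure PSGLD parallelizes."""
    Wb, Hb = W[rows], H[:, cols]
    R = X[np.ix_(rows, cols)] - Wb @ Hb              # block residual
    gW = R @ Hb.T - lam * Wb                         # grad log-posterior wrt W[rows]
    gH = Wb.T @ R - lam * Hb                         # grad log-posterior wrt H[:, cols]
    W[rows] = Wb + eta * gW + np.sqrt(2.0 * eta) * rng.normal(size=Wb.shape)
    H[:, cols] = Hb + eta * gH + np.sqrt(2.0 * eta) * rng.normal(size=Hb.shape)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 8))     # rank-2 toy data
W, H = 0.1 * rng.normal(size=(8, 2)), 0.1 * rng.normal(size=(2, 8))
blocks = [(np.arange(0, 4), np.arange(0, 4)), (np.arange(4, 8), np.arange(4, 8)),
          (np.arange(0, 4), np.arange(4, 8)), (np.arange(4, 8), np.arange(0, 4))]
err0 = np.linalg.norm(X - W @ H)
for _ in range(1000):
    for rows, cols in blocks:
        sgld_block_update(W, H, X, rows, cols, 1e-3, 0.01, rng)
err1 = np.linalg.norm(X - W @ H)
```

In the first sweep position of each iteration, the two diagonal blocks touch disjoint rows of W and columns of H, so they could be updated concurrently without conflicts.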

Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization
Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. Building on this recent theory, in this study we develop an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization. The proposed algorithm is suitable for both distributed and shared-memory settings. We provide formal theoretical analysis and show that the proposed method achieves an ergodic convergence rate of O(1/√(N)) (N being the total number of iterations) and can achieve a linear speedup under certain conditions. We perform several experiments on both synthetic and real datasets. The results support our theory and show that the proposed algorithm provides a significant speedup over the recently proposed synchronous distributed L-BFGS algorithm.
06/07/2018 ∙ by Umut Şimşekli, et al.

Sliced-Wasserstein Flows: Nonparametric Generative Modeling via Optimal Transport and Diffusions
Building on the recent theory that established the connection between implicit generative modeling and optimal transport, in this study we propose a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is as close to the data distribution as possible while also being expressive enough for generative modeling purposes. We formulate the problem as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations let us develop a computationally efficient algorithm for solving the optimization problem, where the resulting algorithm resembles the recent dynamics-based Markov Chain Monte Carlo algorithms. We provide formal theoretical analysis where we prove finite-time error guarantees for the proposed algorithm. Our experimental results support our theory and show that our algorithm is able to capture the structure of challenging distributions.
06/21/2018 ∙ by Umut Şimşekli, et al.
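The sliced-Wasserstein distance underlying these flows reduces optimal transport to sorted one-dimensional projections. A minimal Monte Carlo estimator (illustrative of the distance only, not the paper's flow algorithm):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, rng=None):
    """Monte Carlo estimate of the sliced 2-Wasserstein distance:
    average the closed-form 1-D Wasserstein-2 distance (computed via
    sorted samples) over random projection directions. Assumes X and Y
    contain the same number of samples."""
    if rng is None:
        rng = np.random.default_rng(0)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)            # uniform direction on the sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)          # 1-D W2^2 between the projections
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
Y = rng.normal(size=(500, 2)) + np.array([3.0, 0.0])
d_same, d_shift = sliced_wasserstein(X, X), sliced_wasserstein(X, Y)
```

Because the 1-D distances are closed-form, the estimator costs only sorting per projection, which is what makes the gradient-flow formulation computationally attractive.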

A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
The gradient noise (GN) in the stochastic gradient descent (SGD) algorithm is often considered to be Gaussian in the large-data regime by assuming that the classical central limit theorem (CLT) kicks in. This assumption is often made for mathematical convenience, since it enables SGD to be analyzed as a stochastic differential equation (SDE) driven by a Brownian motion. We argue that the Gaussianity assumption might fail to hold in deep learning settings and hence render the Brownian motion-based analyses inappropriate. Inspired by non-Gaussian natural phenomena, we consider the GN in a more general context and invoke the generalized CLT (GCLT), which suggests that the GN converges to a heavy-tailed α-stable random variable. Accordingly, we propose to analyze SGD as an SDE driven by a Lévy motion. Such SDEs can incur 'jumps', which force the SDE to transition from narrow minima to wider minima, as proven by existing metastability theory. To validate the α-stable assumption, we conduct extensive experiments on common deep learning architectures and show that in all settings, the GN is highly non-Gaussian and exhibits heavy tails. We further investigate the tail behavior in varying network architectures and sizes, loss functions, and datasets. Our results open up a different perspective and shed more light on the belief that SGD prefers wide minima.
01/18/2019 ∙ by Umut Şimşekli, et al.
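Estimating the tail index from samples is the measurement at the heart of this analysis. As a generic illustration (the classical Hill estimator on synthetic heavy-tailed data; the paper itself uses a different, moment-based estimator tailored to α-stable laws):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimator of the tail index alpha from the k largest |x|
    values: the reciprocal mean of log-spacings in the upper order
    statistics. Smaller alpha means heavier tails."""
    a = np.sort(np.abs(x))[::-1]                 # order statistics, descending
    return 1.0 / np.mean(np.log(a[:k]) - np.log(a[k]))

# Pareto samples with tail index 1.5, i.e. P(X > t) = t**(-1.5) for t >= 1.
rng = np.random.default_rng(0)
x = rng.pareto(1.5, size=100_000) + 1.0
alpha_hat = hill_estimator(x, k=1000)
```

Applied to stochastic gradient noise, an estimate well below 2 is the heavy-tailed, non-Gaussian signature the paper reports; the choice of k trades bias against variance.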