
Structured second-order methods via natural gradient descent
In this paper, we propose new structured second-order methods and struct...

SVRG Meets AdaGrad: Painless Variance Reduction
Variance reduction (VR) methods for finite-sum minimization typically re...

Tractable structured natural gradient descent using local parameterizations
Natural-gradient descent on structured parameter spaces (e.g., low-rank ...

The pathway elaboration method for mean first passage time estimation in large continuous-time Markov chains with applications to nucleic acid kinetics
Continuous-time Markov chains (CTMCs) are widely used in many applicatio...

Robust Asymmetric Learning in POMDPs
Policies for partially observed Markov decision processes can be efficie...

Homeomorphic-Invariance of EM: Non-Asymptotic Convergence in KL Divergence for Exponential Families via Mirror Descent
Expectation maximization (EM) is the default algorithm for fitting proba...

Regret Bounds without Lipschitz Continuity: Online Learning with Relative-Lipschitz Losses
In online convex optimization (OCO), Lipschitz continuity of the functio...

Variance-Reduced Methods for Machine Learning
Stochastic optimization lies at the heart of machine learning, and its c...

Adaptive Gradient Methods Converge Faster with Over-Parameterization (and you can do a line-search)
As adaptive gradient methods are typically used for training over-parame...

Handling the Positive-Definite Constraint in the Bayesian Learning Rule
The Bayesian learning rule is a recently proposed variational inference meth...

Stein's Lemma for the Reparameterization Trick with Exponential Family Mixtures
Stein's method (Stein, 1973; 1981) is a powerful tool for statistical ap...

Fast and Furious Convergence: Stochastic Second Order Methods under Interpolation
We consider stochastic second-order methods for minimizing strongly-conv...

P4-IPsec: Implementation of IPsec Gateways in P4 with SDN Control for Host-to-Site Scenarios
In this paper, we propose P4-IPsec, which follows the software-defined net...

xRAC: Execution and Access Control for Restricted Application Containers on Managed Hosts
We propose xRAC to permit users to run special applications on managed h...

Where are the Masks: Instance Segmentation with Image-level Supervision
A major obstacle in instance segmentation is that existing methods often...

Instance Segmentation with Point Supervision
Instance segmentation methods often require costly per-pixel labels. We ...

Fast and Simple Natural-Gradient Variational Inference with Mixture of Exponential-family Approximations
Natural-gradient methods enable fast and simple algorithms for variation...

Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates
Recent works have shown that stochastic gradient descent (SGD) achieves ...
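This entry's title names a line search. The sketch below illustrates the idea and is not the paper's code. It runs SGD on a hypothetical least-squares problem and chooses each step size by backtracking until the Armijo sufficient-decrease condition f_i(w - eta g_i) <= f_i(w) - c eta ||g_i||^2 holds on the sampled example. The function name sgd_armijo, the least-squares objective, and the constants eta_max, c and beta are illustrative assumptions.

```python
import numpy as np


def sgd_armijo(A, b, epochs=50, eta_max=1.0, c=0.1, beta=0.5, seed=0):
    """SGD on f(w) = (1/n) * sum_i 0.5 * (a_i . w - b_i)^2 with a stochastic Armijo line search.

    For each sampled example i, the step size starts at eta_max and is multiplied by
    beta until f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2 (hypothetical constants).
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            a_i, b_i = A[i], b[i]

            def f_i(v):
                # Loss on the sampled example only.
                return 0.5 * float(a_i @ v - b_i) ** 2

            g = (a_i @ w - b_i) * a_i  # gradient of f_i at w
            f_w, gg = f_i(w), float(g @ g)
            eta = eta_max
            # Backtrack until the sufficient-decrease (Armijo) condition holds on f_i.
            while f_i(w - eta * g) > f_w - c * eta * gg:
                eta *= beta
            w = w - eta * g
    return w
```

When the data satisfy interpolation (for example b = A @ w_true, so that every f_i is minimized at the same point), the iterates should approach that shared minimizer. With noisy targets, this plain scheme would not be expected to converge exactly.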

Efficient Deep Gaussian Process Models for Variable-Sized Input
Deep Gaussian processes (DGP) have appealing Bayesian properties, can ha...

P4-MACsec: Dynamic Topology Monitoring and Data Layer Protection with MACsec in P4-SDN
We propose P4-MACsec to protect network links between P4 switches throug...

Distributed Maximization of "Submodular plus Diversity" Functions for Multi-label Feature Selection on Huge Datasets
There are many problems in machine learning and data mining which are eq...

SLANG: Fast Structured Covariance Approximations for Bayesian Deep Learning with Natural Gradient
Uncertainty estimation in large deep-learning models is a computationall...

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
Modern machine learning focuses on highly expressive models that are abl...

Combining Bayesian Optimization and Lipschitz Optimization
Bayesian optimization and Lipschitz optimization have developed alternat...

Does Your Model Know the Digit 6 Is Not a Cat? A Less Biased Evaluation of "Outlier" Detectors
In the real world, a learning system could receive an input that looks n...

Where are the Blobs: Counting by Localization with Point Supervision
Object counting is an important task in computer vision due to its growi...

New Insights into Bootstrapping for Bandits
We investigate the use of bootstrapping in the bandit setting. We first ...

Online Learning Rate Adaptation with Hypergradient Descent
We introduce a general method for improving the convergence rate of grad...
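The snippet is cut off before the method is described. The sketch below assumes the reading of the title's hypergradient idea in which the learning rate is itself adapted by gradient descent. Because theta_t = theta_{t-1} - alpha g_{t-1}, the derivative of f(theta_t) with respect to alpha is -g_t . g_{t-1}, so plain gradient descent gains the rule alpha <- alpha + beta g_t . g_{t-1}. This is a minimal deterministic version under that assumption. The function name gd_hypergradient and the values of alpha0 and beta are hypothetical.

```python
import numpy as np


def gd_hypergradient(grad, theta0, alpha0=0.001, beta=1e-4, steps=500):
    """Gradient descent whose learning rate is adapted by a hypergradient step.

    Since theta_t = theta_{t-1} - alpha * g_{t-1}, the derivative of f(theta_t) with
    respect to alpha is -g_t . g_{t-1}; a gradient step on alpha therefore adds
    beta * g_t . g_{t-1}. Returns the final iterate and learning rate.
    """
    theta = np.asarray(theta0, dtype=float)
    alpha = alpha0
    prev_g = np.zeros_like(theta)  # no previous gradient before the first step
    for _ in range(steps):
        g = np.asarray(grad(theta), dtype=float)
        alpha = alpha + beta * float(g @ prev_g)  # hypergradient update of the learning rate
        theta = theta - alpha * g  # ordinary gradient step with the updated rate
        prev_g = g
    return theta, alpha


# Hypothetical example: f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
theta, alpha = gd_hypergradient(lambda th: th, np.array([1.0]))
```

On a positive definite quadratic, consecutive gradients point the same way while the step is too small, so alpha grows. Once steps overshoot along a high-curvature direction, the dot product tends to turn negative and alpha shrinks.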

Fast Patch-based Style Transfer of Arbitrary Style
Artistic style transfer is an image synthesis problem where the content ...

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
In 1963, Polyak proposed a simple condition that is sufficient to show a...
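The snippet stops before stating the condition. The derivation below assumes its standard form: f has minimum value f^* and satisfies the Polyak-Łojasiewicz (PL) inequality with constant mu > 0. Combined with an L-Lipschitz-continuous gradient, this gives a linear rate for gradient descent with step size 1/L:

```latex
\begin{aligned}
&\tfrac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x,
  \qquad x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k),\\
&f(x_{k+1}) \le f(x_k) - \tfrac{1}{2L}\|\nabla f(x_k)\|^2
  \le f(x_k) - \tfrac{\mu}{L}\bigl(f(x_k) - f^*\bigr),\\
&f(x_k) - f^* \le \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\bigl(f(x_0) - f^*\bigr).
\end{aligned}
```

On the second line, the first inequality is the usual descent lemma for an L-smooth f with step 1/L, and the second applies the PL inequality. Subtracting f^* and unrolling the recursion gives the last line. The argument never uses convexity.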

Play and Learn: Using Video Games to Train Computer Vision Models
Video games are a compelling source of annotated data as they can readil...

Stop Wasting My Gradients: Practical SVRG
We present and analyze several strategies for improving the performance ...
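The snippet does not list the strategies. For orientation, the sketch below shows the basic SVRG loop that such strategies modify; it is not the paper's practical variant. Each outer iteration stores a snapshot and its full gradient. Each inner step then uses grad f_i(w) - grad f_i(snapshot) + full_grad, an unbiased gradient estimate whose variance shrinks as w and the snapshot approach the solution. The least-squares objective, the function name svrg_least_squares, the step size 0.2 / L_max and the inner-loop length 2n are assumptions chosen for illustration.

```python
import numpy as np


def svrg_least_squares(A, b, step=None, epochs=100, inner=None, seed=0):
    """Basic SVRG for f(w) = (1/n) * sum_i 0.5 * (a_i . w - b_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if step is None:
        # 0.2 / L_max, where L_max = max_i ||a_i||^2 (an assumed, conservative choice).
        step = 0.2 / np.max(np.sum(A * A, axis=1))
    if inner is None:
        inner = 2 * n  # assumed inner-loop length
    w = np.zeros(d)
    for _ in range(epochs):
        snapshot = w.copy()
        full_grad = A.T @ (A @ snapshot - b) / n  # one full pass per outer iteration
        for i in rng.integers(0, n, size=inner):
            a_i = A[i]
            g_w = (a_i @ w - b[i]) * a_i  # gradient of f_i at the current iterate
            g_snap = (a_i @ snapshot - b[i]) * a_i  # gradient of f_i at the snapshot
            w = w - step * (g_w - g_snap + full_grad)  # variance-reduced step
    return w
```

The next snapshot is simply the last inner iterate. Each outer iteration costs one full gradient plus 2 * inner per-example gradients, which is the overhead the paper's strategies aim to reduce.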

Faster Stochastic Variational Inference using Proximal-Gradient Methods with General Divergence Functions
Several recent works have explored stochastic gradient methods for varia...

Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection
There has been significant recent work on the theory and application of ...
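The sketch below illustrates the selection rule in the title and is not code from the paper. It runs coordinate descent on a hypothetical strongly convex quadratic f(x) = 0.5 x^T Q x - c^T x. The Gauss-Southwell rule picks the coordinate whose partial derivative has the largest magnitude, whereas random selection would draw the coordinate uniformly. Each step minimizes f exactly along the chosen coordinate. The function name gauss_southwell_cd and the stopping tolerance are assumptions.

```python
import numpy as np


def gauss_southwell_cd(Q, c, steps=5000, tol=1e-12):
    """Minimize f(x) = 0.5 x^T Q x - c^T x (Q symmetric positive definite) by coordinate
    descent, choosing the coordinate with the largest |partial derivative| each step."""
    x = np.zeros(len(c))
    g = Q @ x - c  # gradient of f at x
    for _ in range(steps):
        i = int(np.argmax(np.abs(g)))  # Gauss-Southwell (greedy) selection
        if abs(g[i]) < tol:
            break
        delta = -g[i] / Q[i, i]  # exact minimization of f along coordinate i
        x[i] += delta
        g += delta * Q[:, i]  # only column i of Q changes the gradient: O(n) update
    return x
```

Here the gradient can be kept current in O(n) per step because a single coordinate change touches only one column of Q. That cheap update is what makes greedy selection affordable for this quadratic; not every problem admits such an update.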

Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields
We apply stochastic average gradient (SAG) algorithms for training condi...

Influence Maximization with Bandits
We consider the problem of influence maximization, the problem of maximi...

Hierarchical Maximum-Margin Clustering
We present a hierarchical maximum-margin clustering method for unsupervi...

Convex Optimization for Big Data
This article reviews recent advances in convex optimization algorithms f...

Minimizing Finite Sums with the Stochastic Average Gradient
We propose the stochastic average gradient (SAG) method for optimizing t...
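The sketch below applies SAG, as we understand it, to a hypothetical least-squares finite sum. A table holds the most recently computed gradient of each term. One randomly chosen entry is refreshed per iteration, and the step follows the average of the whole table. The function name sag_least_squares, the zero initialization of the table, and the conservative step size 1/(16 L_max) are assumptions for illustration.

```python
import numpy as np


def sag_least_squares(A, b, step=None, iters=20000, seed=0):
    """SAG for f(w) = (1/n) * sum_i 0.5 * (a_i . w - b_i)^2.

    Keeps the most recent gradient seen for each example and steps along the
    average of that table, refreshing one entry per iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    if step is None:
        # Conservative step 1 / (16 * L_max), L_max = max_i ||a_i||^2 (assumed choice).
        step = 1.0 / (16.0 * np.max(np.sum(A * A, axis=1)))
    w = np.zeros(d)
    table = np.zeros((n, d))  # stored gradient of each f_i (zero until i is first sampled)
    total = np.zeros(d)  # running sum of the table rows
    for _ in range(iters):
        i = rng.integers(n)
        g_i = (A[i] @ w - b[i]) * A[i]  # fresh gradient of the sampled term
        total += g_i - table[i]  # swap the old entry for the new one in the sum
        table[i] = g_i
        w = w - (step / n) * total  # step along the average stored gradient
    return w
```

Each iteration evaluates a single gradient, as SGD does, but the table costs O(nd) memory. For linear models, storing one scalar residual per example would suffice, which this sketch does not exploit.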

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
In this note, we present a new averaging technique for the projected sto...

Block-Coordinate Frank-Wolfe Optimization for Structural SVMs
We propose a randomized block-coordinate variant of the classic Frank-Wo...

Modeling Discrete Interventional Data using Directed Cyclic Graphical Models
We outline a representation for discrete multivariate distributions in t...

Group Sparse Priors for Covariance Estimation
Recently it has become popular to learn sparse Gaussian graphical models...

Generalized Fast Approximate Energy Minimization via Graph Cuts: Alpha-Expansion Beta-Shrink Moves
We present alpha-expansion beta-shrink moves, a simple generalization of...

Hybrid Deterministic-Stochastic Methods for Data Fitting
Many structured data-fitting applications require the solution of an opt...
Mark Schmidt
Assistant Professor of Computer Science at the University of British Columbia; Alfred P. Sloan Research Fellow since 2017; Canadian Institute for Advanced Research (CIFAR) Senior Fellow, Learning in Machines and Brains, since 2017; Canada Research Chair in Large-Scale Machine Learning since 2016; Assistant Professor (University of British Columbia), Laboratory for Computational Intelligence, since 2014; Postdoc (Simon Fraser University), Natural Language Lab, 2013-2014; Postdoc (Ecole Normale Superieure), INRIA SIERRA project, 2011-2013; Postdoc (University of British Columbia), Scientific Computing Lab, 2010; Ph.D. student (University of British Columbia), Laboratory for Computational Intelligence, 2005-2010