
On the Periodic Behavior of Neural Network Training with Batch Normalization and Weight Decay
Despite the conventional wisdom that using batch normalization with weig...
Mean Embeddings with Test-Time Data Augmentation for Ensembling of Representations
Averaging predictions over a set of models – an ensemble – is widely use...
Towards Practical Credit Assignment for Deep Reinforcement Learning
Credit assignment is a fundamental problem in reinforcement learning, th...
On Power Laws in Deep Ensembles
Ensembles of deep neural networks are known to achieve state-of-the-art ...
Involutive MCMC: a Unifying Framework
Markov Chain Monte Carlo (MCMC) is a computational approach to fundament...
MARS: Masked Automatic Ranks Selection in Tensor Decompositions
Tensor decomposition methods have recently proven to be efficient for co...
Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks
Training neural networks with binary weights and activations is a challe...
Deep Ensembles on a Fixed Memory Budget: One Wide Network or Several Thinner Ones?
One of the generally accepted views of modern deep learning is that incr...
Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
The overestimation bias is one of the major impediments to accurate off...
Deterministic Decoding for Discrete Data in Variational Autoencoders
Variational autoencoders are prominent generative models for modeling di...
Stochasticity in Neural ODEs: An Empirical Study
Stochastic regularization of neural networks (e.g. dropout) is a widesp...
Greedy Policy Search: A Simple Baseline for Learnable Test-Time Augmentation
Test-time data augmentation – averaging the predictions of a machine learn...
Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning
Uncertainty estimation and ensembling methods go hand-in-hand. Uncertain...
MLRG Deep Curvature
We present MLRG Deep Curvature suite, a PyTorch-based, open-source packa...
Low-variance Black-box Gradient Estimates for the Plackett-Luce Distribution
Learning models with discrete latent variables using stochastic gradient...
Structured Sparsification of Gated Recurrent Neural Networks
Recently, a lot of techniques were developed to sparsify the weights of ...
A Prior of a Googol Gaussians: a Tensor Ring Induced Prior for Generative Models
Generative models produce realistic objects in many domains, including t...
Subspace Inference for Bayesian Deep Learning
Bayesian inference was once a gold standard for learning with neural net...
The Implicit Metropolis-Hastings Algorithm
Recent works propose using the discriminator of a GAN to filter out unre...
Importance Weighted Hierarchical Variational Inference
Variational Inference is a powerful tool in the Bayesian modeling toolki...
Semi-Conditional Normalizing Flows for Semi-Supervised Learning
This paper proposes a semi-conditional normalizing flow model for semi-s...
User-Controllable Multi-Texture Synthesis with Generative Adversarial Networks
We propose a novel multi-texture synthesis model based on generative adv...
A Simple Baseline for Bayesian Uncertainty in Deep Learning
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose ...
Bayesian Sparsification of Gated Recurrent Neural Networks
Bayesian methods have been successfully applied to sparsify weights of n...
ReSet: Learning Recurrent Dynamic Routing in ResNet-like Neural Networks
Neural Network is a powerful Machine Learning tool that shows outstandin...
Variational Dropout via Empirical Bayes
We study the Automatic Relevance Determination procedure applied to deep...
Bayesian Compression for Natural Language Processing
In natural language processing, a lot of the tasks are successfully solv...
Metropolis-Hastings view on variational inference and adversarial training
In this paper we propose to view the acceptance rate of the Metropolis-H...
The Deep Weight Prior. Modeling a prior distribution for CNNs using generative models
Bayesian inference is known to provide a general framework for incorpora...
Pairwise Augmented GANs with Adversarial Reconstruction Loss
We propose a novel autoencoding model called Pairwise Augmented GANs. We...
Doubly Semi-Implicit Variational Inference
We extend the existing framework of semi-implicit variational inference ...
Conditional Generators of Words Definitions
We explore recently introduced definition modeling technique that provid...
Universal Conditional Machine
We propose a single neural probabilistic model based on variational auto...
Averaging Weights Leads to Wider Optima and Better Generalization
Deep neural networks are typically trained by optimizing a loss function...
Bayesian Incremental Learning for Deep Neural Networks
In industrial machine learning pipelines, data often arrive in parts. Pa...
Uncertainty Estimation via Stochastic Batch Normalization
In this work, we investigate Batch Normalization technique and propose i...
Probabilistic Adaptive Computation Time
We present a probabilistic model with discrete latent variables that con...
Bayesian Sparsification of Recurrent Neural Networks
Recurrent neural networks show state-of-the-art results in many text ana...
Structured Bayesian Pruning via Log-Normal Multiplicative Noise
Dropout-based regularization methods can be regarded as injecting random...
Variational Dropout Sparsifies Deep Neural Networks
We explore a recently proposed Variational Dropout technique that provid...
Spatially Adaptive Computation Time for Residual Networks
This paper proposes a deep learning architecture based on Residual Netwo...
GTApprox: surrogate modeling for industrial design
We describe GTApprox - a new tool for medium-scale surrogate modeling in...
Tensorizing Neural Networks
Deep neural networks currently demonstrate state-of-the-art performance ...
PerforatedCNNs: Acceleration through Elimination of Redundant Convolutions
We propose a novel approach to reduce the computational cost of evaluati...
Breaking Sticks and Ambiguities with Adaptive Skip-gram
Recently proposed Skip-gram model is a powerful method for learning high...
Submodular relaxation for inference in Markov random fields
In this paper we address the problem of finding the most probable state ...
Multi-utility Learning: Structured-output Learning with Multiple Annotation-specific Loss Functions
Structured-output learning is a challenging problem; particularly so bec...
Submodular Decomposition Framework for Inference in Associative Markov Networks with Global Constraints
In the paper we address the problem of finding the most probable state o...
Dmitry Vetrov
Research Professor, Head of the Centre: Faculty of Computer Science, Laboratory Head: Faculty of Computer Science at Higher School of Economics, Leading Researcher at Yandex