
Does Knowledge Distillation Really Work?
Knowledge distillation is a popular technique for training a small stude...

VIB is Half Bayes
In discriminative settings such as regression and classification there a...

PAC^m-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime
While the decision-theoretic optimality of the Bayesian formalism under ...

Density of States Estimation for Out-of-Distribution Detection
Perhaps surprisingly, recent studies have shown probabilistic model like...

CEB Improves Model Robustness
We demonstrate that the Conditional Entropy Bottleneck (CEB) can improve...

Neural Tangents: Fast and Easy Infinite Neural Networks in Python
Neural Tangents is a library designed to enable research into infinite-w...

Information in Infinite Ensembles of Infinitely-Wide Neural Networks
In this preliminary work, we study the generalization properties of infi...

Variational Predictive Information Bottleneck
In classic papers, Zellner demonstrated that Bayesian inference could be...

On Predictive Information Suboptimality of RNNs
Certain biological neurons demonstrate a remarkable capability to optima...

On Variational Bounds of Mutual Information
Estimating and optimizing Mutual Information (MI) is core to many proble...

On the Use of ArXiv as a Dataset
The arXiv has collected 1.5 million preprint articles over 28 years, ho...

β-VAEs can retain label information even at high compression
In this paper, we investigate the degree to which the encoding of a β-VA...

TherML: Thermodynamics of Machine Learning
In this work we offer a framework for reasoning about a wide class of ex...

Uncertainty in the Variational Information Bottleneck
We present a simple case study, demonstrating that Variational Informati...

GILBO: One Metric to Measure Them All
We propose a simple, tractable lower bound on the mutual information con...

An Information-Theoretic Analysis of Deep Latent-Variable Models
We present an information-theoretic framework for understanding tradeof...

Jeffrey's prior sampling of deep sigmoidal networks
Neural networks have been shown to have a remarkable ability to uncover ...

Clustering via Content-Augmented Stochastic Blockmodels
Much of the data being created on the web contains interactions between ...

Text Segmentation based on Semantic Word Embeddings
We explore the use of semantic word embeddings in text segmentation algo...
Alexander A. Alemi