
The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders
Training and using modern neural-network based latent-variable generativ...

Universal Approximation for Log-concave Distributions using Well-conditioned Normalizing Flows
Normalizing flows are a widely used class of latent-variable generative ...

Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments
Domain generalization aims at performing well on unseen test environment...

The Limitations of Limited Context for Constituency Parsing
Incorporating syntax into neural approaches in NLP has a multitude of pr...

Contrastive learning of strong-mixing continuous-time stochastic processes
Contrastive learning is a family of self-supervised methods where a mode...

Parametric Complexity Bounds for Approximating PDEs with Neural Networks
Recent empirical results show that deep networks can approximate solutio...

An Online Learning Approach to Interpolation and Extrapolation in Domain Generalization
A popular assumption for out-of-distribution generalization is that the ...

The Risks of Invariant Risk Minimization
Invariant Causal Prediction (Peters et al., 2016) is a technique for out...

Representational aspects of depth and conditioning in normalizing flows
Normalizing flows are among the most popular paradigms in generative mod...

Efficient sampling from the Bingham distribution
We give an algorithm for exact sampling from the Bingham distribution p(x...

On Learning Language-Invariant Representations for Universal Machine Translation
The goal of universal machine translation is to learn to translate betwe...

Fast Convergence for Langevin Diffusion with Matrix Manifold Structure
In this paper, we study the problem of sampling from distributions of th...

Benefits of Overparameterization in Single-Layer Latent Variable Generative Models
One of the most surprising and exciting discoveries in supervised learn...

Sum-of-squares meets square loss: Fast rates for agnostic tensor completion
We study tensor completion in the agnostic setting. In the classical ten...

Simulated Tempering Langevin Monte Carlo II: An Improved Proof using Soft Markov Chain Decomposition
A key task in Bayesian machine learning is sampling from distributions t...

Mean-field approximation, convex hierarchies, and the optimality of correlation rounding: a unified perspective
The free energy is a key quantity of interest in Ising models, but unfor...

Approximability of Discriminators Implies Diversity in GANs
While Generative Adversarial Networks (GANs) have empirically produced i...

Representational Power of ReLU Networks and Polynomial Kernels: Beyond Worst-Case Analysis
There has been a large amount of interest, both in the past and particul...

Theoretical limitations of Encoder-Decoder GAN architectures
Encoder-decoder GAN architectures (e.g., BiGAN and ALI) seek to add an ...

Beyond Log-concavity: Provable Guarantees for Sampling Multimodal Distributions using Simulated Tempering Langevin Monte Carlo
A key task in Bayesian statistics is sampling from distributions that ar...

Provable benefits of representation learning
There is general consensus that learning representations is useful for a...

Extending and Improving Wordnet via Unsupervised Word Embeddings
This work presents an unsupervised approach for improving WordNet that b...

Provable learning of Noisy-or Networks
Many machine learning applications use latent variable models to explain...

Recovery Guarantee of Nonnegative Matrix Factorization via Alternating Updates
Nonnegative matrix factorization is a popular tool for decomposing data...

Approximate maximum entropy principles via Goemans-Williamson with applications to provable variational methods
The well-known maximum-entropy principle due to Jaynes, which states tha...

How to calculate partition functions using convex programming hierarchies: provable bounds for variational methods
We consider the problem of approximating partition functions for Ising m...

Recovery guarantee of weighted low-rank approximation via alternating minimization
Many applications require recovering a ground truth low-rank matrix from...

Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Word embeddings are ubiquitous in NLP and information retrieval, but it'...

On some provably correct cases of variational inference for topic models
Variational inference is a very efficient and popular heuristic used in ...

RAND-WALK: A Latent Variable Model Approach to Word Embeddings
Semantic word embeddings represent the meaning of a word via a vector, a...
Andrej Risteski
Machine Learning and Theoretical Computer Science Researcher at Massachusetts Institute of Technology