Diffusion models have demonstrated excellent potential for generating di...
Autoregressive models for text sometimes generate repetitive and low-qua...
Diffusion models have recently become the de facto approach for generati...
Training stability is of great importance to Transformers. In this work,...
Denoising diffusion models have demonstrated their proficiency for gener...
Diffusion models (DMs) have recently emerged as SoTA tools for generativ...
We introduce GAUDI, a generative model capable of capturing the distribu...
Transformers have gained increasing popularity in a wide range of applic...
The grokking phenomenon as reported by Power et al. (arXiv:2201.02177)...
In this paper, we study the representation of neural networks from the v...
Modeling the world can benefit robot learning by providing a rich traini...
Including memory banks in a natural language processing architecture inc...
We analyze the learning dynamics of infinitely wide neural networks with...
We introduce Attention Free Transformer (AFT), an efficient variant of T...
Offline Reinforcement Learning promises to learn effective policies from...
We study the problem of directly optimizing arbitrary non-differentiable...
State-of-the-art learning-based monocular 3D reconstruction methods lear...
Images with shared characteristics naturally form sets. For example, in ...
Modern neural network performance typically improves as model size incre...
We examine Generative Adversarial Networks (GANs) through the lens of de...
Deep neural networks require collecting and annotating large amounts of ...
In most machine learning training paradigms a fixed, often handcrafted, ...
In this paper, we describe an effective convolutional neural network fra...
The rapid growth of Electronic Health Records (EHRs), as well as the acc...
Structural correspondence learning (SCL) is an effective method for cros...
Multi-task learning aims to improve generalization performance of multip...
Feature pooling layers (e.g., max pooling) in convolutional neural netwo...
In this paper, we attack the anomaly detection problem by directly model...