For computer vision tasks, Vision Transformers (ViTs) have become one of...
Mapping high-fidelity 3D geometry to a representation that allows for in...
A diffusion model learns to predict a vector field of gradients. We prop...
We propose learnable polyphase sampling (LPS), a pair of learnable down/...
We present TetGAN, a convolutional neural network designed to generate t...
We propose Fast text2StyleGAN, a natural language interface that adapts ...
Supervised or weakly supervised methods for phrase localization (textual...
Optimization within a layer of a deep-net has emerged as a new direction...
Designing equivariance as an inductive bias into deep-nets has been a pr...
Solving complex real-world tasks, e.g., autonomous fleet control, often ...
Exploration is critical for good results in deep reinforcement learning ...
Recent research in adversarially robust classifiers suggests their repre...
Extracting detailed 3D information of objects from video data is an impo...
Deep reinforcement learning (RL) is computationally demanding and requir...
Existing semi-supervised learning (SSL) algorithms use a single weight t...
We propose Chirality Nets, a family of deep nets that is equivariant to ...
Sample efficiency and scalability to a large number of agents are two im...
Fine-grained action detection is an important task with numerous applica...
Textual grounding is an important but challenging task for human-compute...
Textual grounding, i.e., linking words to objects in images, is a challe...
We address the problem of synthesizing new video frames in an existing v...
Semantic image inpainting is a challenging task where large missing regi...