Trevor Darrell

research

∙ 08/28/2023

VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Existing approaches to unsupervised video instance segmentation typicall...

0 Xudong Wang, et al. ∙

research

∙ 08/21/2023

Can Language Models Learn to Listen?

We present a framework for generating appropriate facial responses from ...

0 Evonne Ng, et al. ∙

research

∙ 07/31/2023

Predicting masked tokens in stochastic locations improves masked image modeling

Self-supervised learning is a promising paradigm in deep learning that e...

0 Amir Bar, et al. ∙

research

∙ 07/03/2023

Hierarchical Open-vocabulary Universal Image Segmentation

Open-vocabulary image segmentation aims to partition an image into seman...

0 Xudong Wang, et al. ∙

research

∙ 06/16/2023

Robot Learning with Sensorimotor Pre-training

We present a self-supervised sensorimotor pre-training approach for robo...

8 Ilija Radosavovic, et al. ∙

research

∙ 06/15/2023

Neural Relighting with Subsurface Scattering by Learning the Radiance Transfer Gradient

Reconstructing and relighting objects and scenes under varying lighting ...

3 Shizhan Zhu, et al. ∙

research

∙ 06/08/2023

Modular Visual Question Answering via Code Generation

We present a framework that formulates visual question answering as modu...

6 Sanjay Subramanian, et al. ∙

research

∙ 05/25/2023

Diversify Your Vision Datasets with Automatic Diffusion-Based Augmentation

Many fine-grained classification tasks, like rare animal identification,...

0 Lisa Dunlap, et al. ∙

research

∙ 05/24/2023

Refocusing Is Key to Transfer Learning

Transfer learning involves adapting a pre-trained model to novel downstr...

0 Baifeng Shi, et al. ∙

research

∙ 05/24/2023

Flan-MoE: Scaling Instruction-Finetuned Language Models with Sparse Mixture of Experts

The explosive growth of language models and their applications have led ...

0 Sheng Shen, et al. ∙

research

∙ 05/23/2023

Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence

Diffusion models have been shown to be capable of generating high-qualit...

0 Grace Luo, et al. ∙

research

∙ 05/11/2023

Simple Token-Level Confidence Improves Caption Correctness

The ability to judge whether a caption correctly describes an image is a...

0 Suzanne Petryk, et al. ∙

research

∙ 05/10/2023

Incorporating Structured Representations into Pretrained Vision Language Models Using Scene Graphs

Vision and Language (VL) models have demonstrated remarkable zero-shot p...

4 Roei Herzig, et al. ∙

research

∙ 03/30/2023

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models

Image editing using diffusion models has witnessed extremely fast-paced ...

0 Vidit Goel, et al. ∙

research

∙ 03/23/2023

Learning and Verification of Task Structure in Instructional Videos

Given the enormous number of instructional videos available online, lear...

0 Medhini Narasimhan, et al. ∙

research

∙ 03/23/2023

Top-Down Visual Attention from Analysis by Synthesis

Current attention algorithms (e.g., self-attention) are stimulus-driven ...

0 Baifeng Shi, et al. ∙

research

∙ 03/13/2023

Scaling Vision-Language Models with Sparse Mixture of Experts

The field of natural language processing (NLP) has made significant stri...

0 Sheng Shen, et al. ∙

research

∙ 03/06/2023

Learning Humanoid Locomotion with Transformers

We present a sim-to-real learning-based approach for real-world humanoid...

0 Ilija Radosavovic, et al. ∙

research

∙ 03/02/2023

Dropout Reduces Underfitting

Introduced by Hinton et al. in 2012, dropout has stood the test of time ...

0 Zhuang Liu, et al. ∙

research

∙ 02/13/2023

Guiding Pretraining in Reinforcement Learning with Large Language Models

Reinforcement learning algorithms typically struggle in the absence of a...

0 Yuqing Du, et al. ∙

research

∙ 12/30/2022

Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning

Remote sensing imagery provides comprehensive views of the Earth, where ...

0 Colorado J. Reed, et al. ∙

research

∙ 12/08/2022

PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data

Action recognition models have achieved impressive results by incorporat...

7 Roei Herzig, et al. ∙

research

∙ 12/01/2022

Shape-Guided Diffusion with Inside-Outside Attention

Shape can specify key object constraints, yet existing text-to-image dif...

0 Dong Huk Park, et al. ∙

research

∙ 11/28/2022

G^3: Geolocation via Guidebook Grounding

We demonstrate how language can improve geolocation: the task of predict...

10 Grace Luo, et al. ∙

research

∙ 11/21/2022

Multitask Vision-Language Prompt Tuning

Prompt Tuning, conditioning on task-specific learned prompt vectors, has...

0 Sheng Shen, et al. ∙

research

∙ 10/18/2022

Using Language to Extend to Unseen Domains

It is expensive to collect training data for every possible domain that ...

3 Lisa Dunlap, et al. ∙

research

∙ 10/06/2022

Real-World Robot Learning with Masked Visual Pre-training

In this work, we explore self-supervised visual pre-training on images f...

15 Ilija Radosavovic, et al. ∙

research

∙ 09/19/2022

Decentralized Vehicle Coordination: The Berkeley DeepDrive Drone Dataset

Decentralized multiagent planning has been an important field of researc...

0 Fangyu Wu, et al. ∙

research

∙ 09/07/2022

Prior Knowledge-Guided Attention in Self-Supervised Vision Transformers

Recent trends in self-supervised representation learning have focused on...

0 Kevin Miao, et al. ∙

research

∙ 09/06/2022

Studying Bias in GANs through the Lens of Race

In this work, we study how the performance and evaluation of generative ...

24 Vongani H. Maluleke, et al. ∙

research

∙ 09/01/2022

Visual Prompting via Image Inpainting

How does one adapt a pre-trained visual model to novel downstream tasks ...

10 Amir Bar, et al. ∙

research

∙ 08/25/2022

Refine and Represent: Region-to-Object Representation Learning

Recent works in self-supervised learning have demonstrated strong perfor...

0 Akash Gokul, et al. ∙

research

∙ 08/14/2022

TL;DW? Summarizing Instructional Videos with Task Relevance Cross-Modal Saliency

YouTube users looking for instructions for a specific task may spend a l...

2 Medhini Narasimhan, et al. ∙

research

∙ 07/07/2022

Back to the Source: Diffusion-Driven Test-Time Adaptation

Test-time adaptation harnesses test inputs to improve the accuracy of a ...

0 Jin Gao, et al. ∙

research

∙ 07/04/2022

Disentangled Action Recognition with Knowledge Bases

Action in video usually involves the interaction of human with objects. ...

10 Zhekun Luo, et al. ∙

research

∙ 06/15/2022

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

This technical report describes the SViT approach for the Ego4D Point of...

9 Elad Ben-Avraham, et al. ∙

research

∙ 06/13/2022

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

Recent action recognition models have achieved impressive results by int...

15 Elad Ben-Avraham, et al. ∙

research

∙ 05/19/2022

Voxel-informed Language Grounding

Natural language applied to natural 2D images describes a fundamentally ...

1 Rodolfo Corona, et al. ∙

research

∙ 04/28/2022

Reliable Visual Question Answering: Abstain Rather Than Answer Incorrectly

Machine learning has advanced dramatically, narrowing the accuracy gap t...

0 Spencer Whitehead, et al. ∙

research

∙ 04/23/2022

Visual Attention Emerges from Recurrent Sparse Reconstruction

Visual attention helps achieve robust perception under noise, corruption...

6 Baifeng Shi, et al. ∙

research

∙ 04/21/2022

Contrastive Test-Time Adaptation

Test-time adaptation is a special setting of unsupervised domain adaptat...

0 Dian Chen, et al. ∙

research

∙ 04/20/2022

K-LITE: Learning Transferable Visual Models with External Knowledge

Recent state-of-the-art computer vision systems are trained from natural...

3 Sheng Shen, et al. ∙

research

∙ 04/18/2022

Learning to Listen: Modeling Non-Deterministic Dyadic Facial Motion

We present a framework for modeling interactional communication in dyadi...

4 Evonne Ng, et al. ∙

research

∙ 04/12/2022

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

Training a referring expression comprehension (ReC) model for a new visu...

4 Sanjay Subramanian, et al. ∙

research

∙ 03/19/2022

Teachable Reinforcement Learning via Advice Distillation

Training automated agents to complete complex tasks in interactive envir...

0 Olivia Watkins, et al. ∙

research

∙ 03/11/2022

Masked Visual Pre-training for Motor Control

This paper shows that self-supervised visual pre-training from real-worl...

8 Tete Xiao, et al. ∙

research

∙ 02/17/2022

On Guiding Visual Attention with Language Specification

While real world challenges typically define visual categories with lang...

3 Suzanne Petryk, et al. ∙

research

∙ 01/29/2022

Explaining Reinforcement Learning Policies through Counterfactual Trajectories

In order for humans to confidently decide where to employ RL agents for ...

0 Julius Frost, et al. ∙

research

∙ 01/10/2022

A ConvNet for the 2020s

The "Roaring 20s" of visual recognition began with the introduction of V...

12 Zhuang Liu, et al. ∙

research

∙ 12/21/2021

Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion

In today's era of digital misinformation, we are increasingly faced with...

10 Shruti Agarwal, et al. ∙

