Jitendra Malik

research

∙ 09/18/2023

General In-Hand Object Rotation with Vision and Touch

We introduce RotateIt, a system that enables fingertip-based object rota...

0 Haozhi Qi, et al. ∙

research

∙ 08/17/2023

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

We introduce EgoSchema, a very long-form video question-answering datase...

0 Karttikeya Mangalam, et al. ∙

research

∙ 06/16/2023

Learning Space-Time Semantic Correspondences

We propose a new task of space-time semantic correspondence prediction i...

0 Du Tran, et al. ∙

research

∙ 06/16/2023

Robot Learning with Sensorimotor Pre-training

We present a self-supervised sensorimotor pre-training approach for robo...

8 Ilija Radosavovic, et al. ∙

research

∙ 06/01/2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Modern hierarchical vision transformers have added several vision-specif...

0 Chaitanya Ryali, et al. ∙

research

∙ 05/31/2023

Humans in 4D: Reconstructing and Tracking Humans with Transformers

We present an approach to reconstruct humans and track them over time. A...

0 Shubham Goel, et al. ∙

research

∙ 04/03/2023

On the Benefits of 3D Pose and Tracking for Human Action Recognition

In this work we study the benefits of using tracking and 3D poses for ac...

6 Jathushan Rajasegaran, et al. ∙

research

∙ 04/03/2023

Navigating to Objects Specified by Images

Images are a convenient way to specify which particular object instance ...

0 Jacob Krantz, et al. ∙

research

∙ 03/31/2023

Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?

We present the largest and most comprehensive empirical study of pre-tra...

0 Arjun Majumdar, et al. ∙

research

∙ 03/06/2023

Learning Humanoid Locomotion with Transformers

We present a sim-to-real learning-based approach for real-world humanoid...

0 Ilija Radosavovic, et al. ∙

research

∙ 02/24/2023

Decoupling Human and Camera Motion from Videos in the Wild

We propose a method to reconstruct global human trajectories from videos...

0 Vickie Ye, et al. ∙

research

∙ 02/15/2023

Big Little Transformer Decoder

The recent emergence of Large Language Models based on the Transformer a...

0 Sehoon Kim, et al. ∙

research

∙ 01/19/2023

Multiview Compressive Coding for 3D Reconstruction

A central goal of visual recognition is to understand objects and scenes...

11 Chao-Yuan Wu, et al. ∙

research

∙ 12/15/2022

MAViL: Masked Audio-Video Learners

We present Masked Audio-Video Learners (MAViL) to train audio-visual rep...

0 Po-Yao Huang, et al. ∙

research

∙ 12/02/2022

Navigating to Objects in the Real World

Semantic navigation is necessary to deploy mobile robots in uncontrolled...

0 Théophile Gervet, et al. ∙

research

∙ 11/29/2022

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

We consider the problem of embodied visual navigation given an image-goa...

0 Jacob Krantz, et al. ∙

research

∙ 11/23/2022

Learning to Imitate Object Interactions from Internet Videos

We study the problem of imitating object interactions from Internet vide...

0 Austin Patel, et al. ∙

research

∙ 10/06/2022

Real-World Robot Learning with Masked Visual Pre-training

In this work, we explore self-supervised visual pre-training on images f...

15 Ilija Radosavovic, et al. ∙

research

∙ 09/26/2022

Learning to Learn with Generative Models of Neural Network Checkpoints

We explore a data-driven approach for learning to optimize neural networ...

44 William Peebles, et al. ∙

research

∙ 09/06/2022

Multi-skill Mobile Manipulation for Object Rearrangement

We study a modular approach to tackle long-horizon mobile manipulation t...

5 Jiayuan Gu, et al. ∙

research

∙ 06/02/2022

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

The recently proposed Conformer model has become the de facto backbone m...

29 Sehoon Kim, et al. ∙

research

∙ 04/12/2022

Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity

Open-world instance segmentation is the task of grouping pixels into obj...

2 Weiyao Wang, et al. ∙

research

∙ 03/11/2022

Masked Visual Pre-training for Motor Control

This paper shows that self-supervised visual pre-training from real-worl...

8 Tete Xiao, et al. ∙

research

∙ 02/10/2022

Image-to-Image Regression with Distribution-Free Uncertainty Quantification and Applications in Imaging

Image-to-image regression is an important learning task, used frequently...

4 Anastasios N. Angelopoulos, et al. ∙

research

∙ 01/25/2022

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

State-of-the-art approaches to ObjectGoal navigation rely on reinforceme...

1 Santhosh Kumar Ramakrishnan, et al. ∙

research

∙ 01/20/2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

While today's video recognition systems parse snapshots or short clips a...

10 Chao-Yuan Wu, et al. ∙

research

∙ 12/08/2021

Tracking People by Predicting 3D Appearance, Location & Pose

In this paper, we present an approach for tracking people in monocular v...

10 Jathushan Rajasegaran, et al. ∙

research

∙ 12/03/2021

Coupling Vision and Proprioception for Navigation of Legged Robots

We exploit the complementary strengths of vision and proprioception to a...

4 Zipeng Fu, et al. ∙

research

∙ 12/02/2021

Improved Multiscale Vision Transformers for Classification and Detection

In this paper, we study Multiscale Vision Transformers (MViT) as a unifi...

21 Yanghao Li, et al. ∙

research

∙ 12/02/2021

Differentiable Spatial Planning using Transformers

We consider the problem of spatial path planning. In contrast to the cla...

12 Devendra Singh Chaplot, et al. ∙

research

∙ 12/02/2021

SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency

In this paper, we explore how we can build upon the data and models of I...

9 Devendra Singh Chaplot, et al. ∙

research

∙ 11/18/2021

PyTorchVideo: A Deep Learning Library for Video Understanding

We introduce PyTorchVideo, an open-source deep-learning library that pro...

295 Haoqi Fan, et al. ∙

research

∙ 11/15/2021

Tracking People with 3D Representations

We present a novel approach for tracking multiple people in video. Unlik...

5 Jathushan Rajasegaran, et al. ∙

research

∙ 10/25/2021

Minimizing Energy Consumption Leads to the Emergence of Gaits in Legged Robots

Legged locomotion is commonly studied and expressed as a discrete set of...

1 Zipeng Fu, et al. ∙

research

∙ 10/12/2021

ABO: Dataset and Benchmarks for Real-World 3D Object Understanding

We introduce Amazon-Berkeley Objects (ABO), a new large-scale dataset of...

10 Jasmine Collins, et al. ∙

research

∙ 10/11/2021

Differentiable Stereopsis: Meshes from multiple views using differentiable rendering

We propose Differentiable Stereopsis, a multi-view stereo approach that ...

12 Shubham Goel, et al. ∙

research

∙ 10/11/2021

Omnidata: A Scalable Pipeline for Making Multi-Task Mid-Level Vision Datasets from 3D Scans

This paper introduces a pipeline to parametrically sample and render mul...

21 Ainaz Eftekhar, et al. ∙

research

∙ 07/20/2021

Active 3D Shape Reconstruction from Vision and Touch

Humans build 3D understandings of the world through active object explor...

6 Edward J. Smith, et al. ∙

research

∙ 07/08/2021

RMA: Rapid Motor Adaptation for Legged Robots

Successful real-world deployment of legged robots would require them to ...

11 Ashish Kumar, et al. ∙

research

∙ 06/28/2021

Habitat 2.0: Training Home Assistants to Rearrange their Habitat

We introduce Habitat 2.0 (H2.0), a simulation platform for training virt...

25 Andrew Szot, et al. ∙

research

∙ 04/22/2021

Multiscale Vision Transformers

We present Multiscale Vision Transformers (MViT) for video and image rec...

9 Haoqi Fan, et al. ∙

research

∙ 01/07/2021

Distribution-Free, Risk-Controlling Prediction Sets

While improving prediction accuracy has been the focus of machine learni...

11 Stephen Bates, et al. ∙

research

∙ 12/17/2020

Reconstructing Hand-Object Interactions in the Wild

In this work we explore reconstructing hand-object interactions in the w...

2 Zhe Cao, et al. ∙

research

∙ 12/17/2020

Human Mesh Recovery from Multiple Shots

Videos from edited media like movies are a useful, yet under-explored so...

2 Georgios Pavlakos, et al. ∙

research

∙ 12/02/2020

From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

Human trajectory forecasting is an inherently multi-modal problem. Uncer...

16 Karttikeya Mangalam, et al. ∙

research

∙ 11/26/2020

Better Knowledge Retention through Metric Learning

In continual learning, new categories may be introduced over time, and a...

4 Ke Li, et al. ∙

research

∙ 11/13/2020

Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation

Vision-based robotics often separates the control loop into one module f...

0 Bryan Chen, et al. ∙

research

∙ 11/03/2020

Rearrangement: A Challenge for Embodied AI

We describe a framework for research and evaluation in Embodied AI. Our ...

2 Dhruv Batra, et al. ∙

research

∙ 10/07/2020

Shape, Illumination, and Reflectance from Shading

A fundamental problem in computer vision is that of inferring the intrin...

0 Jonathan T. Barron, et al. ∙

research

∙ 09/29/2020

Uncertainty Sets for Image Classifiers using Conformal Prediction

Convolutional image classifiers can achieve high predictive accuracy, bu...

8 Anastasios Angelopoulos, et al. ∙
