Christoph Feichtenhofer

research

∙ 06/01/2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Modern hierarchical vision transformers have added several vision-specif...

0 Chaitanya Ryali, et al. ∙

research

∙ 04/06/2023

Diffusion Models as Masked Autoencoders

There has been a longstanding belief that generation can facilitate a tr...

0 Chen Wei, et al. ∙

research

∙ 04/03/2023

On the Benefits of 3D Pose and Tracking for Human Action Recognition

In this work we study the benefits of using tracking and 3D poses for ac...

6 Jathushan Rajasegaran, et al. ∙

research

∙ 03/23/2023

The effectiveness of MAE pre-pretraining for billion-scale pretraining

This paper revisits the standard pretrain-then-finetune paradigm used in...

0 Mannat Singh, et al. ∙

research

∙ 01/19/2023

Multiview Compressive Coding for 3D Reconstruction

A central goal of visual recognition is to understand objects and scenes...

11 Chao-Yuan Wu, et al. ∙

research

∙ 01/05/2023

CiT: Curation in Training for Effective Vision-Language Data

Large vision-language models are generally applicable to many downstream...

0 Hu Xu, et al. ∙

research

∙ 12/15/2022

MAViL: Masked Audio-Video Learners

We present Masked Audio-Video Learners (MAViL) to train audio-visual rep...

0 Po-Yao Huang, et al. ∙

research

∙ 12/01/2022

Scaling Language-Image Pre-training via Masking

We present Fast Language-Image Pre-training (FLIP), a simple and more ef...

0 Yanghao Li, et al. ∙

research

∙ 10/17/2022

Token Merging: Your ViT But Faster

We introduce Token Merging (ToMe), a simple method to increase the throu...

0 Daniel Bolya, et al. ∙

research

∙ 07/13/2022

Masked Autoencoders that Listen

This paper studies a simple extension of image-based Masked Autoencoders...

3 Po-Yao Huang, et al. ∙

research

∙ 05/18/2022

Masked Autoencoders As Spatiotemporal Learners

This paper studies a conceptually simple extension of Masked Autoencoder...

0 Christoph Feichtenhofer, et al. ∙

research

∙ 01/20/2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

While today's video recognition systems parse snapshots or short clips a...

10 Chao-Yuan Wu, et al. ∙

research

∙ 01/10/2022

A ConvNet for the 2020s

The "Roaring 20s" of visual recognition began with the introduction of V...

12 Zhuang Liu, et al. ∙

research

∙ 12/16/2021

Masked Feature Prediction for Self-Supervised Visual Pre-Training

We present Masked Feature Prediction (MaskFeat) for self-supervised pre-...

37 Chen Wei, et al. ∙

research

∙ 12/02/2021

Improved Multiscale Vision Transformers for Classification and Detection

In this paper, we study Multiscale Vision Transformers (MViT) as a unifi...

21 Yanghao Li, et al. ∙

research

∙ 11/18/2021

PyTorchVideo: A Deep Learning Library for Video Understanding

We introduce PyTorchVideo, an open-source deep-learning library that pro...

295 Haoqi Fan, et al. ∙

research

∙ 09/28/2021

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding

We present VideoCLIP, a contrastive approach to pre-train a unified mode...

0 Hu Xu, et al. ∙

research

∙ 06/09/2021

Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers

In video transformers, the time dimension is often treated in the same w...

5 Mandela Patrick, et al. ∙

research

∙ 05/20/2021

VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding

We present a simplified, task-agnostic multi-modal pre-training approach...

0 Hu Xu, et al. ∙

research

∙ 04/29/2021

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

We present a large-scale study on unsupervised spatiotemporal representa...

0 Christoph Feichtenhofer, et al. ∙

research

∙ 04/22/2021

Multiscale Vision Transformers

We present Multiscale Vision Transformers (MViT) for video and image rec...

9 Haoqi Fan, et al. ∙

research

∙ 04/01/2021

Multiview Pseudo-Labeling for Semi-supervised Learning from Video

We present a multiview pseudo-labeling approach to video learning, a nov...

0 Bo Xiong, et al. ∙

research

∙ 01/07/2021

TrackFormer: Multi-Object Tracking with Transformers

We present TrackFormer, an end-to-end multi-object tracking and segmenta...

16 Tim Meinhardt, et al. ∙

research

∙ 04/09/2020

X3D: Expanding Architectures for Efficient Video Recognition

This paper presents X3D, a family of efficient video networks that progr...

0 Christoph Feichtenhofer, et al. ∙

research

∙ 04/07/2020

Feature Pyramid Grids

Feature pyramid networks have been widely adopted in the object detectio...

5 Kai Chen, et al. ∙

research

∙ 01/23/2020

Audiovisual SlowFast Networks for Video Recognition

We present Audiovisual SlowFast Networks, an architecture for integrated...

23 Fanyi Xiao, et al. ∙

research

∙ 01/14/2020

EGO-TOPO: Environment Affordances from Egocentric Video

First-person video naturally brings the use of a physical environment to...

13 Tushar Nagarajan, et al. ∙

research

∙ 12/02/2019

A Multigrid Method for Efficiently Training Video Models

Training competitive deep video models is an order of magnitude slower t...

7 Chao-Yuan Wu, et al. ∙

research

∙ 06/06/2019

Learning Temporal Pose Estimation from Sparsely-Labeled Videos

Modern approaches for multi-person pose estimation in video require larg...

0 Gedas Bertasius, et al. ∙

research

∙ 06/03/2019

Grounded Human-Object Interaction Hotspots from Video (Extended Abstract)

Learning how to interact with objects is an important step towards embod...

0 Tushar Nagarajan, et al. ∙

research

∙ 01/21/2019

Modeling Human Motion with Quaternion-based Neural Networks

Previous work on predicting or generating 3D human pose sequences regres...

0 Dario Pavllo, et al. ∙

research

∙ 12/12/2018

Long-Term Feature Banks for Detailed Video Understanding

To understand the world, we humans constantly need to relate the present...

4 Chao-Yuan Wu, et al. ∙

research

∙ 12/11/2018

Grounded Human-Object Interaction Hotspots from Video

Learning how to interact with objects is an important step towards embod...

6 Tushar Nagarajan, et al. ∙

research

∙ 12/11/2018

Learning Discriminative Motion Features Through Detection

Despite huge success in the image domain, modern detection models such a...

0 Gedas Bertasius, et al. ∙

research

∙ 12/10/2018

SlowFast Networks for Video Recognition

We present SlowFast networks for video recognition. Our model involves (...

0 Christoph Feichtenhofer, et al. ∙

research

∙ 11/28/2018

3D human pose estimation in video with temporal convolutions and semi-supervised training

In this work, we demonstrate that 3D poses in video can be effectively e...

0 Dario Pavllo, et al. ∙

research

∙ 02/20/2018

Camera-based vehicle velocity estimation from monocular video

This paper documents the winning entry at the CVPR2017 vehicle velocity ...

0 Moritz Kampelmühler, et al. ∙

research

∙ 01/04/2018

What have we learned from deep representations for action recognition?

As the success of deep models has led to their deployment in all areas o...

0 Christoph Feichtenhofer, et al. ∙

research

∙ 10/11/2017

Detect to Track and Track to Detect

Recent approaches for high accuracy detection and tracking of object cat...

0 Christoph Feichtenhofer, et al. ∙

research

∙ 11/07/2016

Spatiotemporal Residual Networks for Video Action Recognition

Two-stream Convolutional Networks (ConvNets) have shown strong performan...

0 Christoph Feichtenhofer, et al. ∙

research

∙ 04/22/2016

Convolutional Two-Stream Network Fusion for Video Action Recognition

Recent applications of Convolutional Neural Networks (ConvNets) for huma...

0 Christoph Feichtenhofer, et al. ∙
