Karttikeya Mangalam

research

∙ 08/17/2023

EgoSchema: A Diagnostic Benchmark for Very Long-form Video Language Understanding

We introduce EgoSchema, a very long-form video question-answering datase...

Karttikeya Mangalam, et al.

research

∙ 06/15/2023

PaReprop: Fast Parallelized Reversible Backpropagation

The growing size of datasets and deep learning models has made faster an...

Tyler Lixuan Zhu, et al.

research

∙ 04/06/2023

Diffusion Models as Masked Autoencoders

There has been a longstanding belief that generation can facilitate a tr...

Chen Wei, et al.

research

∙ 02/15/2023

Big Little Transformer Decoder

The recent emergence of Large Language Models based on the Transformer a...

Sehoon Kim, et al.

research

∙ 11/25/2022

Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Temporal action localization (TAL) requires long-form reasoning to predi...

Chen Zhao, et al.

research

∙ 06/15/2022

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

This technical report describes the SViT approach for the Ego4D Point of...

Elad Ben-Avraham, et al.

research

∙ 06/13/2022

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

Recent action recognition models have achieved impressive results by int...

Elad Ben-Avraham, et al.

research

∙ 06/02/2022

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

The recently proposed Conformer model has become the de facto backbone m...

Sehoon Kim, et al.

research

∙ 01/20/2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

While today's video recognition systems parse snapshots or short clips a...

Chao-Yuan Wu, et al.

research

∙ 12/29/2021

Overcoming Mode Collapse with Adaptive Multi Adversarial Training

Generative Adversarial Networks (GANs) are a class of generative models ...

Karttikeya Mangalam, et al.

research

∙ 12/02/2021

Improved Multiscale Vision Transformers for Classification and Detection

In this paper, we study Multiscale Vision Transformers (MViT) as a unifi...

Yanghao Li, et al.

research

∙ 10/13/2021

Object-Region Video Transformers

Evidence from cognitive psychology suggests that understanding spatio-te...

Roei Herzig, et al.

research

∙ 04/22/2021

Multiscale Vision Transformers

We present Multiscale Vision Transformers (MViT) for video and image rec...

Haoqi Fan, et al.

research

∙ 12/02/2020

From Goals, Waypoints & Paths To Long Term Human Trajectory Forecasting

Human trajectory forecasting is an inherently multi-modal problem. Uncer...

Karttikeya Mangalam, et al.

research

∙ 07/07/2020

Long-term Human Motion Prediction with Scene Context

Human movement is goal-directed and influenced by the spatial layout of ...

Zhe Cao, et al.

research

∙ 11/04/2019

Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision

We tackle the problem of Human Locomotion Forecasting, a task for jointl...

Karttikeya Mangalam, et al.

research

∙ 12/01/2018

On Compressing U-net Using Knowledge Distillation

We study the use of knowledge distillation to compress the U-net archite...

Karttikeya Mangalam, et al.

research

∙ 12/12/2017

Learning Spontaneity to Improve Emotion Recognition In Speech

We investigate the effect and usefulness of spontaneity in speech (i.e. ...

Karttikeya Mangalam, et al.

research

∙ 11/30/2017

Future Person Localization in First-Person Videos

We present a new task that predicts future locations of people observed ...

Takuma Yagi, et al.

research

∙ 05/19/2017

Bitwise Operations of Cellular Automaton on Gray-scale Images

Cellular Automata (CA) theory is a discrete model that represents the st...

Karttikeya Mangalam, et al.
