Exploring Modulated Detection Transformer as a Tool for Action Recognition in Videos

09/21/2022
by Tomás Crisol, et al.

In recent years, transformer architectures have grown in popularity. Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model that performs tasks such as phrase grounding, referring expression comprehension, referring expression segmentation, and visual question answering. One remarkable aspect of the model is its capacity to perform inference on classes it was not trained on. In this work we explore the use of MDETR on a new task, action detection, without any prior training on that task. We obtain quantitative results using the Atomic Visual Actions (AVA) dataset. Although the model does not achieve the best performance on the task, we believe the finding is interesting: it shows that a multi-modal model can be used to tackle a task it was not designed for. Finally, we believe that this line of research may lead to the generalization of MDETR to additional downstream tasks.

