Ensemble Modeling for Multimodal Visual Action Recognition

08/10/2023
by   jyoti-kini, et al.
0

In this work, we propose an ensemble modeling approach for multimodal action recognition. We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset. Based on the underlying principle of focal loss, which captures the relationship between tail (scarce) classes and their prediction difficulties, we propose an exponentially decaying variant of focal loss for our current task. It initially emphasizes learning from the hard misclassified examples and gradually adapts to the entire range of examples in the dataset. This annealing process encourages the model to strike a balance between focusing on the sparse set of hard samples, while still leveraging the information provided by the easier ones. Additionally, we opt for the late fusion strategy to combine the resultant probability distributions from RGB and Depth modalities for final action prediction. Experimental evaluations on the MECCANO dataset demonstrate the effectiveness of our approach.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/20/2022

M M Mix: A Multimodal Multiview Transformer Ensemble

This report describes the approach behind our winning solution to the 20...
research
10/15/2019

Seeing and Hearing Egocentric Actions: How Much Can We Learn?

Our interaction with the world is an inherently multimodal experience. H...
research
03/23/2016

Deep Multimodal Feature Analysis for Action Recognition in RGB+D Videos

Single modality action recognition on RGB or depth sequences has been ex...
research
06/19/2018

Modality Distillation with Multiple Stream Networks for Action Recognition

Diverse input data modalities can provide complementary cues for several...
research
03/13/2020

Gimme Signals: Discriminative signal encoding for multimodal activity recognition

We present a simple, yet effective and flexible method for action recogn...
research
10/20/2020

Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition

Humans can easily recognize actions with only a few examples given, whil...
research
07/31/2015

Multimodal Multipart Learning for Action Recognition in Depth Videos

The articulated and complex nature of human actions makes the task of ac...

Please sign up or login with your details

Forgot password? Click here to reset