Egocentric Video Task Translation

12/13/2022
by Zihui Xue, et al.

Different video understanding tasks are typically treated in isolation, and even with distinct types of curated data (e.g., classifying sports in one dataset, tracking animals in another). However, in wearable cameras, the immersive egocentric perspective of a person engaging with the world around them presents an interconnected web of video understanding tasks – hand-object manipulations, navigation in the space, or human-human interactions – that unfold continuously, driven by the person's goals. We argue that this calls for a much more unified approach. We propose EgoTask Translation (EgoT2), which takes a collection of models optimized on separate tasks and learns to translate their outputs for improved performance on any or all of them at once. Unlike traditional transfer or multi-task learning, EgoT2's flipped design entails separate task-specific backbones and a task translator shared across all tasks, which captures synergies between even heterogeneous tasks and mitigates task competition. Demonstrating our model on a wide array of video tasks from Ego4D, we show its advantages over existing transfer paradigms and achieve top-ranked results on four of the Ego4D 2022 benchmark challenges.
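
To make the "flipped" layout concrete, here is a minimal sketch, assuming each task-specific model is frozen and emits one feature vector per clip, and that the shared translator fuses those vectors with dot-product attention. The attention mechanism, the task names, the dimensions, and every function name below are illustrative assumptions; the abstract does not specify them, and this is not the authors' implementation.

```python
import math
import random


def make_frozen_backbone(seed, in_dim, feat_dim):
    """Hypothetical stand-in for one pretrained task-specific model.

    A fixed random linear map plays the role of a frozen backbone.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(feat_dim)]

    def backbone(x):
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return backbone


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def translate(task_tokens, query):
    """Shared translator sketch: dot-product attention over per-task tokens."""
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, token)) / scale
              for token in task_tokens]
    attn = softmax(scores)
    fused = [sum(a * token[i] for a, token in zip(attn, task_tokens))
             for i in range(len(query))]
    return fused, attn


# Assumed task names, loosely following the examples in the abstract.
TASKS = ["hand_object", "navigation", "interaction"]
backbones = {name: make_frozen_backbone(seed, in_dim=8, feat_dim=4)
             for seed, name in enumerate(TASKS)}

# A made-up clip representation shared by all backbones.
clip_rng = random.Random(42)
clip_features = [clip_rng.uniform(-1.0, 1.0) for _ in range(8)]

# Separate backbones produce one token each; one translator fuses them
# for the task of interest (here the first task supplies the query).
tokens = [backbones[name](clip_features) for name in TASKS]
fused, attn = translate(tokens, query=tokens[0])
print(dict(zip(TASKS, attn)))
```

The point of the sketch is the division of labor: the backbones stay separate per task, while the single `translate` function is the only component shared across tasks, the reverse of a shared-backbone, per-task-head multi-task design.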

research
05/02/2020

Understanding and Improving Information Transfer in Multi-Task Learning

We investigate multi-task learning approaches that use a shared feature ...
research
02/16/2023

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

Video understanding tasks take many forms, from action detection to visu...
research
02/03/2023

Egocentric Video Task Translation @ Ego4D Challenge 2022

This technical report describes the EgoTask Translation approach that ex...
research
01/06/2023

TarViS: A Unified Approach for Target-based Video Segmentation

The general domain of video segmentation is currently fragmented into di...
research
06/20/2021

Heterogeneous Multi-task Learning with Expert Diversity

Predicting multiple heterogeneous biological and medical targets is a ch...
research
06/12/2018

Multi-Task Neural Models for Translating Between Styles Within and Across Languages

Generating natural language requires conveying content in an appropriate...
research
04/15/2022

In-BoXBART: Get Instructions into Biomedical Multi-Task Learning

Single-task models have proven pivotal in solving specific tasks; howeve...
