SUSiNet: See, Understand and Summarize it

12/03/2018
by   Petros Koutras, et al.
0

In this work we propose a multi-task spatio-temporal network, called SUSiNet, that can jointly tackle the spatio-temporal problems of saliency estimation, action recognition and video summarization. Our approach employs a single network that is jointly end-to-end trained for all tasks with multiple and diverse datasets related to the exploring tasks. The proposed network uses a unified architecture that includes global and task specific layer and produces multiple output types, i.e., saliency maps or classification labels, by employing the same video input. Moreover, one additional contribution is that the proposed network can be deeply supervised through an attention module that is related to human attention as it is expressed by eye-tracking data. From the extensive evaluation, on seven different datasets, we have observed that the multi-task network performs as well as the state-of-the-art single-task methods (or in some cases better), while it requires less computational budget than having one independent network per each task.

READ FULL TEXT

page 1

page 3

page 8

research
01/09/2020

STAViS: Spatio-Temporal AudioVisual Saliency Network

We introduce STAViS, a spatio-temporal audiovisual saliency network that...
research
04/25/2019

DynamoNet: Dynamic Action and Motion Network

In this paper, we are interested in self-supervised learning the motion ...
research
10/25/2022

Multi-task Video Enhancement for Dental Interventions

A microcamera firmly attached to a dental handpiece allows dentists to c...
research
12/15/2019

Multi-task Deep Learning for Real-Time 3D Human Pose Estimation and Action Recognition

Human pose estimation and action recognition are related tasks since bot...
research
12/10/2021

ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath While Tracking Instruments in Robotic Surgery

Representation learning of the task-oriented attention while tracking in...
research
07/14/2021

FetalNet: Multi-task deep learning framework for fetal ultrasound biometric measurements

In this paper, we propose an end-to-end multi-task neural network called...
research
07/17/2019

OmniNet: A unified architecture for multi-modal multi-task learning

Transformer is a popularly used neural network architecture, especially ...

Please sign up or login with your details

Forgot password? Click here to reset