OS-MSL: One Stage Multimodal Sequential Link Framework for Scene Segmentation and Classification

07/04/2022
by   Ye Liu, et al.
0

Scene segmentation and classification (SSC) serve as a critical step towards the field of video structuring analysis. Intuitively, jointly learning of these two tasks can promote each other by sharing common information. However, scene segmentation concerns more on the local difference between adjacent shots while classification needs the global representation of scene segments, which probably leads to the model dominated by one of the two tasks in the training phase. In this paper, from an alternate perspective to overcome the above challenges, we unite these two tasks into one task by a new form of predicting shots link: a link connects two adjacent shots, indicating that they belong to the same scene or category. To the end, we propose a general One Stage Multimodal Sequential Link Framework (OS-MSL) to both distinguish and leverage the two-fold semantics by reforming the two learning tasks into a unified one. Furthermore, we tailor a specific module called DiffCorrNet to explicitly extract the information of differences and correlations among shots. Extensive experiments on a brand-new large scale dataset collected from real-world applications, and MovieScenes are conducted. Both the results demonstrate the effectiveness of our proposed method against strong baselines.

READ FULL TEXT

page 1

page 3

research
06/29/2021

IMENet: Joint 3D Semantic Scene Completion and 2D Semantic Segmentation through Iterative Mutual Enhancement

3D semantic scene completion and 2D semantic segmentation are two tightl...
research
01/06/2023

Task Aware Feature Extraction Framework for Sequential Dependence Multi-Task Learning

Multi-task learning (MTL) has been successfully implemented in many real...
research
06/14/2023

Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation

Zero-Shot Learning (ZSL), which aims at automatically recognizing unseen...
research
08/05/2023

Learning Unified Decompositional and Compositional NeRF for Editable Novel View Synthesis

Implicit neural representations have shown powerful capacity in modeling...
research
12/12/2021

Multimodal-based Scene-Aware Framework for Aquatic Animal Segmentation

Recent years have witnessed great advances in object segmentation resear...
research
06/09/2023

Embodied Executable Policy Learning with Language-based Scene Summarization

Large Language models (LLMs) have shown remarkable success in assisting ...

Please sign up or login with your details

Forgot password? Click here to reset