Contrastive Graph Multimodal Model for Text Classification in Videos

06/06/2022
by Ye Liu, et al.

The extraction of text information in videos serves as a critical step towards semantic understanding of videos. It usually involves two steps: (1) text recognition and (2) text classification. To localize texts in videos, we can resort to a large number of text recognition methods based on OCR technology. However, to our knowledge, no existing work has focused on the second step, video text classification, which limits the guidance available to downstream tasks such as video indexing and browsing. In this paper, we are the first to address this new task of video text classification, fusing multimodal information to handle the challenging scenario in which different types of video text may be confused owing to varied colors, unknown fonts, and complex layouts. In addition, we tailor a specific module called CorrelationNet to reinforce feature representation by explicitly extracting layout information. Furthermore, contrastive learning is utilized to explore inherent connections between samples using plentiful unlabeled videos. Finally, we construct a new well-defined industrial dataset from the news domain, called TI-News, which is dedicated to building and evaluating video text recognition and classification applications. Extensive experiments on TI-News demonstrate the effectiveness of our method.
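The abstract does not say which contrastive objective the authors use on unlabeled videos. As an illustration only, the sketch below assumes the widely used InfoNCE loss: two views of the same sample (for example, two augmentations of one unlabeled clip) form a positive pair, and every other sample in the batch acts as a negative. The function name `info_nce_loss` and the `temperature` default are hypothetical choices, not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(anchors: List[Vector], positives: List[Vector],
                  temperature: float = 0.1) -> float:
    """Average InfoNCE loss over a batch (assumed objective, not the paper's).

    anchors[i] and positives[i] are embeddings of two views of sample i;
    every positives[j] with j != i is treated as a negative for anchors[i].
    """
    if not anchors or len(anchors) != len(positives):
        raise ValueError("need equally sized, non-empty batches")
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity to every candidate in the batch.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching view as the correct "class".
        total += log_denom - logits[i]
    return total / len(a)


if __name__ == "__main__":
    views_a = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce_loss(views_a, aligned))  # small: each anchor matches its own view
    print(info_nce_loss(views_a, swapped))  # large: each anchor matches a negative
```

Minimizing this loss pulls the two views of each sample together and pushes other samples apart, which is one plausible way to "explore inherent connections between samples" without labels. The paper may pair samples or choose negatives differently.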
