MM-AU: Towards Multimodal Understanding of Advertisement Videos

08/27/2023
by Digbalay Bose, et al.

Advertisement videos (ads) play an integral part in Internet e-commerce: they amplify the reach of particular products to a broad audience and can also serve as a medium for raising awareness about specific issues through concise narrative structures. Understanding these narrative structures involves several elements, such as reasoning about the broad content (the topic and the underlying message) and examining fine-grained details, including transitions in perceived tone that arise from the specific sequence of events and the interactions among characters. In this work, to facilitate the understanding of advertisements along three important dimensions (topic categorization, perceived tone transition, and social message detection), we introduce a multimodal multilingual benchmark called MM-AU, composed of over 8.4K videos (147 hours) curated from multiple web sources. We explore multiple zero-shot reasoning baselines by applying large language models to the ad transcripts. Further, we demonstrate that leveraging signals from multiple modalities, including audio, video, and text, in multimodal transformer-based supervised models leads to improved performance compared to unimodal approaches.


