Attract me to Buy: Advertisement Copywriting Generation with Multimodal Multi-structured Information

05/07/2022
by   Zhipeng Zhang, et al.

Recently, online shopping has become a common way of shopping for people all over the world, and compelling merchandise advertisements attract more buyers. Such advertisements integrate multimodal, multi-structured information about commodities, such as visual spatial information and fine-grained structured attributes. However, traditional multimodal text generation focuses on conventional descriptions of what exists and what happens, which does not match the requirements of real-world advertisement copywriting: copywriting demands a vivid language style and a higher standard of faithfulness. Moreover, the field lacks reusable evaluation frameworks and suffers from a scarcity of datasets. We therefore present E-MMAD (e-commercial multimodal multi-structured advertisement copywriting), a dataset that both requires and supports much more detailed information in text generation; notably, it is one of the largest video captioning datasets in this field. On this dataset, we propose a baseline method and a faithfulness evaluation metric built on structured-information reasoning to meet this real-world demand. Our method surpasses previous methods by a large margin on all metrics. The dataset and method will be released at <https://e-mmad.github.io/e-mmad.net/index.html>.
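The abstract does not specify how the faithfulness metric is computed, but the core idea of grounding generated copy in structured product information can be illustrated with a minimal sketch. The function below is a hypothetical attribute-coverage score (not the paper's actual metric): it checks what fraction of the structured attribute values appear in the generated copy.

```python
def attribute_coverage(structured_info: dict, copy_text: str) -> float:
    """Fraction of structured attribute values mentioned in the copy.

    A higher score suggests the copy stays faithful to the
    product's structured information; 0.0 means none of the
    attribute values are mentioned.
    """
    values = [str(v).lower() for v in structured_info.values()]
    text = copy_text.lower()
    hits = sum(1 for v in values if v in text)
    return hits / len(values) if values else 0.0


# Hypothetical product attributes and generated ad copy.
info = {"material": "cotton", "color": "navy blue", "sleeve": "long sleeve"}
copy = "Stay warm in this navy blue long sleeve shirt, made of soft cotton."
print(attribute_coverage(info, copy))  # 1.0 (all three attributes mentioned)
```

A real metric would likely also need fuzzy matching (synonyms, paraphrases) and a penalty for hallucinated attributes that the structured record does not contain; this sketch only captures the coverage direction.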


