ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles

06/29/2023
by Haoqin Tu, et al.

Automatically generating text with desired attributes is a long-standing and ambitious goal. Existing work has made considerable progress in incorporating unimodal controls into language models (LMs), but how to generate controllable sentences from multimodal signals efficiently remains an open question. To address this, we propose ZeroGen, a new paradigm for zero-shot controllable text generation with multimodal signals. Specifically, ZeroGen applies text and image controls successively, from the token level to the sentence level, and maps them into a unified probability space at decoding time, where the LM output is customized by weighted addition without any extra training. To achieve better trade-offs between modalities, we further introduce an effective dynamic weighting mechanism that regulates all control weights. Moreover, we conduct extensive experiments to probe the in-depth versus in-width relationship between signals from distinct modalities. Encouraging empirical results on three downstream tasks show that ZeroGen not only outperforms its counterparts on captioning tasks by a large margin but also shows great potential for multimodal news generation with a higher degree of control. Our code will be released at https://github.com/ImKeTT/ZeroGen.
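To make "weighted addition in a unified probability space" concrete, here is a minimal token-level sketch. It is an illustration under stated assumptions, not the authors' implementation: the vocabulary, the LM distribution and the two control distributions are hypothetical toy values (in practice they would come from an LM, a keyword/text scorer and an image-text scorer), the helper names `softmax`, `combine_controls` and `greedy_pick` are invented for this example, the weights are fixed rather than set by the paper's dynamic weighting mechanism, and sentence-level control is not modeled.

```python
import math
from typing import Dict, List

Dist = Dict[str, float]  # token -> probability

def softmax(scores: Dist) -> Dist:
    """Turn raw per-token scores into a probability distribution."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def combine_controls(p_lm: Dist, controls: List[Dist], weights: List[float]) -> Dist:
    """Add weighted control distributions onto the LM distribution, then renormalize.

    Hypothetical sketch: weights are fixed inputs here, not dynamically adjusted.
    """
    if len(controls) != len(weights):
        raise ValueError("need exactly one weight per control distribution")
    combined = {
        tok: p + sum(w * c.get(tok, 0.0) for c, w in zip(controls, weights))
        for tok, p in p_lm.items()
    }
    total = sum(combined.values())
    return {tok: v / total for tok, v in combined.items()}

def greedy_pick(dist: Dist) -> str:
    """Choose the most probable next token."""
    return max(dist, key=dist.get)

# Toy, made-up inputs over a three-token vocabulary.
P_LM = {"a": 0.5, "dog": 0.2, "cat": 0.3}                   # LM next-token probabilities
TEXT_CONTROL = softmax({"a": 0.0, "dog": 2.0, "cat": 0.0})  # e.g. keyword "dog" is favored
IMAGE_CONTROL = softmax({"a": 0.0, "dog": 1.0, "cat": 0.5}) # e.g. image resembles a dog

if __name__ == "__main__":
    plain = combine_controls(P_LM, [TEXT_CONTROL, IMAGE_CONTROL], [0.0, 0.0])
    steered = combine_controls(P_LM, [TEXT_CONTROL, IMAGE_CONTROL], [0.5, 0.5])
    print("without controls:", greedy_pick(plain))
    print("with controls:", greedy_pick(steered))
```

With both weights at zero the LM's own preference ("a") wins; with the two controls weighted at 0.5 each, the combined distribution shifts toward "dog". Because the combination happens only at decoding time, no model parameters are updated, which is what makes this style of control training-free.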
