Multi-Modal Experience Inspired AI Creation

09/02/2022
by Qian Cao, et al.

AI creation, such as poem or lyrics generation, has attracted increasing attention from both industry and the academic community, with many promising models proposed in the past few years. Existing methods usually generate outputs from a single, independent piece of visual or textual information. In reality, however, humans usually create according to their experiences, which may involve different modalities and be sequentially correlated. To model such human capabilities, in this paper we define and solve a novel AI creation problem based on human experiences. More specifically, we study how to generate texts from sequential multi-modal information. Compared with previous work, this task is much more difficult because the model has to understand and align the semantics across different modalities and convert them into the output in a sequential manner. To address these difficulties, we first design a multi-channel sequence-to-sequence architecture equipped with a multi-modal attention network. For more effective optimization, we then propose a curriculum negative sampling strategy tailored to the sequential inputs. To benchmark this problem and demonstrate the effectiveness of our model, we manually label a new multi-modal experience dataset. On this dataset, we conduct extensive experiments comparing our model with a series of representative baselines, and demonstrate significant improvements on both automatic and human-centered metrics. The code and data are available at: <https://github.com/Aman-4-Real/MMTG>.
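The abstract names a multi-channel sequence-to-sequence architecture with a multi-modal attention network but gives no implementation details. As a rough illustration only, the following PyTorch sketch shows one way such an encoder could look: per-modality channels, attention-based fusion over the sequence of experience steps, and a recurrent layer for the sequential correlation. All module choices, dimensions, and names here are assumptions, not the authors' MMTG code; see the linked repository for the real implementation.

```python
# Minimal sketch (assumptions, not the authors' code): encode a sequence of
# paired (image embedding, text embedding) experience steps.
import torch
import torch.nn as nn

class MultiModalAttentionEncoder(nn.Module):
    def __init__(self, img_dim=512, txt_dim=768, hidden=512, heads=8):
        super().__init__()
        # Separate "channels" project each modality into a shared space.
        self.img_proj = nn.Linear(img_dim, hidden)
        self.txt_proj = nn.Linear(txt_dim, hidden)
        # Multi-modal attention: each step attends over all steps of both modalities.
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # A GRU models the sequential correlation among experience steps.
        self.seq = nn.GRU(hidden, hidden, batch_first=True)

    def forward(self, img_seq, txt_seq):
        # img_seq: (B, T, img_dim); txt_seq: (B, T, txt_dim)
        tokens = torch.cat(
            [self.img_proj(img_seq), self.txt_proj(txt_seq)], dim=1
        )                                             # (B, 2T, hidden)
        fused, _ = self.attn(tokens, tokens, tokens)  # attention-based fusion
        memory, state = self.seq(fused)               # sequential encoding
        return memory, state
```

A text decoder (e.g. a Transformer or GRU language model) would then cross-attend to `memory`, with `state` optionally initializing its hidden state, to generate the output text step by step.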
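The curriculum negative sampling strategy is likewise underspecified in the abstract. One plausible reading, sketched below purely as an assumption, is a sampler that serves easy negatives (unrelated experience sequences) early in training and gradually shifts toward hard negatives (order-perturbed copies of the positive), which would match the claim that the strategy is tailored to sequential inputs.

```python
# Hypothetical curriculum negative sampler; not the paper's exact strategy.
import random

def sample_negative(positive_seq, corpus, progress):
    """`progress` is the fraction of training completed, in [0, 1]."""
    if random.random() > progress:
        # Easy negative: an unrelated experience sequence from the corpus.
        return random.choice(corpus)
    # Hard negative: swap two steps of the positive sequence, preserving its
    # content but breaking the sequential order the model must learn.
    # (Assumes the sequence has at least two steps.)
    neg = list(positive_seq)
    i, j = random.sample(range(len(neg)), 2)
    neg[i], neg[j] = neg[j], neg[i]
    return neg
```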
