MultiQG-TI: Towards Question Generation from Multi-modal Sources

07/07/2023
by Zichao Wang et al.

We study the new problem of automatic question generation (QG) from multi-modal sources containing images and texts, significantly expanding the scope of existing work, most of which focuses exclusively on QG from textual sources. We propose a simple solution to this new problem, called MultiQG-TI, which enables a text-only question generator to process visual input in addition to textual input. Specifically, we leverage an image-to-text model and an optical character recognition model to obtain a textual description of the image and to extract any text in the image, respectively, and then feed both together with the input texts to the question generator. We fine-tune only the question generator while keeping the other components fixed. On the challenging ScienceQA dataset, we demonstrate that MultiQG-TI significantly outperforms ChatGPT with few-shot prompting, despite having hundreds of times fewer trainable parameters. Additional analyses empirically confirm the necessity of both visual and textual signals for QG and show the impact of various modeling choices.
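The abstract describes a simple fusion pipeline: a captioning model and an OCR model translate the image into text, and their outputs are concatenated with the textual context before being passed to a text-only question generator. The sketch below illustrates only that text-fusion step; the prompt format and field labels are illustrative assumptions, not the paper's exact specification, and the upstream caption/OCR models are left abstract.

```python
def build_qg_input(caption: str, ocr_text: str, context: str) -> str:
    """Fuse visual signals (an image caption and OCR-extracted text)
    with the textual context into a single prompt string for a
    text-only question generator.

    Note: the field labels below are hypothetical; the paper does not
    specify the exact serialization format.
    """
    parts = []
    if caption:
        parts.append(f"image description: {caption}")
    if ocr_text:
        parts.append(f"image text: {ocr_text}")
    if context:
        parts.append(f"context: {context}")
    return " ".join(parts)


# Example: combine a (hypothetical) caption and OCR output with a
# lesson context before handing the string to the fine-tuned generator.
prompt = build_qg_input(
    caption="a diagram of the water cycle",
    ocr_text="evaporation condensation precipitation",
    context="Lesson on weather patterns.",
)
```

Because only the question generator is fine-tuned, the captioning and OCR components can be swapped for any off-the-shelf models without retraining the rest of the pipeline.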


