Table Caption Generation in Scholarly Documents Leveraging Pre-trained Language Models

08/18/2021
by Junjie H. Xu, et al.

This paper addresses the problem of generating table captions for scholarly documents, which often requires information from outside the table. To this end, we propose a method that retrieves relevant sentences from the paper body and feeds the table content, together with the retrieved sentences, into pre-trained language models (e.g., T5 and GPT-2) to generate table captions. The contributions of this paper are: (1) a discussion of the challenges in table captioning for scholarly documents; (2) the development of a dataset, DocBank-TB, which is publicly available; and (3) a comparison of caption generation methods for scholarly documents that use different strategies to retrieve relevant sentences from the paper body. Our experimental results show that T5 is the better generation model for this task: it outperformed GPT-2 in BLEU and METEOR, which suggests that its generated text is clearer and more precise. Moreover, inputting relevant sentences that match the row header or the whole table is effective.
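The retrieve-then-generate idea described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify the retrieval scoring, the table linearization, or the input format, so token overlap stands in for the retrieval step, and the names `retrieve_sentences`, `linearize_table`, and `build_model_input` are hypothetical. The resulting string is the kind of input that could be passed to a sequence-to-sequence model such as T5.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_sentences(sentences, query_terms, top_k=2):
    """Rank body sentences by token overlap with the query terms.

    Token overlap is an assumed stand-in for the retrieval step; the
    paper's actual matching strategy is not given in the abstract.
    """
    query = set()
    for term in query_terms:
        query |= tokenize(term)
    scored = []
    for idx, sent in enumerate(sentences):
        overlap = len(tokenize(sent) & query)
        if overlap > 0:
            scored.append((overlap, idx, sent))
    # Highest overlap first; ties keep the original sentence order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sent for _, _, sent in scored[:top_k]]


def linearize_table(header, rows):
    """Flatten a table into one string (hypothetical format)."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return " ; ".join(lines)


def build_model_input(header, rows, body_sentences, top_k=2):
    """Combine table content with sentences matching the row headers."""
    row_headers = [str(row[0]) for row in rows]
    context = retrieve_sentences(body_sentences, row_headers, top_k)
    return ("table: " + linearize_table(header, rows)
            + " context: " + " ".join(context))


if __name__ == "__main__":
    # Placeholder table and sentences, not data from the paper.
    header = ["Model", "Score"]
    rows = [["T5", "x"], ["GPT-2", "y"]]
    body = [
        "We fine-tune T5 on the dataset.",
        "The weather is nice.",
        "GPT-2 and T5 are compared.",
    ]
    print(build_model_input(header, rows, body))
```

Matching against the whole table rather than the row headers would, under the same assumptions, only change which strings are passed as `query_terms`.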


research
02/06/2023

LoFT: Enhancing Faithfulness and Diversity for Table-to-Text Generation via Logic Form Control

Logical Table-to-Text (LT2T) generation is tasked with generating logica...
research
03/01/2022

Attend, Memorize and Generate: Towards Faithful Table-to-Text Generation in Few Shots

Few-shot table-to-text generation is a task of composing fluent and fait...
research
06/03/2023

Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models

In this paper, we propose a table and image generation task to verify ho...
research
09/06/2021

Text-to-Table: A New Way of Information Extraction

We study a new problem setting of information extraction (IE), referred ...
research
05/31/2023

A Sequence-to-Sequence Set Model for Text-to-Table Generation

Recently, the text-to-table generation task has attracted increasing att...
research
05/29/2018

Table-to-Text: Describing Table Region with Natural Language

In this paper, we present a generative model to generate a natural langu...
research
06/07/2023

Privately generating tabular data using language models

Privately generating synthetic data from a table is an important brick o...
