Learning from Multiple Sources for Data-to-Text and Text-to-Data

02/22/2023
by Song Duong, et al.

Data-to-text (D2T) and text-to-data (T2D) are dual tasks: D2T converts structured data, such as graphs or tables, into fluent text, and T2D does the reverse. These tasks are usually handled separately, using corpora extracted from a single source. Current systems rely on pre-trained language models fine-tuned on either D2T or T2D. This approach has two main limitations: first, a separate system has to be tuned for each task and source; second, learning is limited by the scarcity of available corpora. This paper considers a more general scenario in which data are available from multiple heterogeneous sources. Each source, with its specific data format and semantic domain, provides a non-parallel corpus of text and structured data. We introduce a variational auto-encoder model with disentangled style and content variables that allows us to represent the diversity that stems from multiple sources of text and data. Our model is designed to handle the D2T and T2D tasks jointly. We evaluate our model on several datasets and show that, by learning from multiple sources, it closes the performance gap with its supervised single-source counterpart and outperforms it in some cases.
