Enhancing Biomedical Text Summarization and Question-Answering: On the Utility of Domain-Specific Pre-Training

07/10/2023
by   Dima Galat, et al.

Biomedical text summarization requires large datasets for training text-generation models. We show that while transfer learning offers a viable option for addressing this challenge, in-domain pre-training does not always confer an advantage on the BioASQ summarization task. We identify a suitable model architecture and use it to demonstrate the benefit of general-domain pre-training followed by task-specific fine-tuning on BioASQ summarization, leading to a novel three-step fine-tuning approach that works with only about a thousand in-domain examples. Our results indicate that a large language model without domain-specific pre-training can have a significant edge in some domain-specific biomedical text-generation tasks.
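As an illustration of the final, task-specific fine-tuning step described above, the sketch below fine-tunes a general-domain sequence-to-sequence model on a small set of in-domain summarization pairs using the Hugging Face Transformers API. The checkpoint name, data columns, and hyperparameters are assumptions for illustration, not the paper's exact configuration.

# Minimal sketch: fine-tune a general-domain seq2seq model on ~1k in-domain
# summarization examples. Model name, column names, and hyperparameters are
# illustrative assumptions, not the setup reported in the paper.
from datasets import Dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "facebook/bart-large"          # general-domain pre-training only
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hypothetical in-domain pairs: question plus snippets -> ideal answer/summary.
pairs = [
    {"source": "Question: ... Snippets: ...", "target": "Ideal answer ..."},
    # ... roughly a thousand such examples
]
dataset = Dataset.from_list(pairs)

def preprocess(batch):
    # Tokenize inputs and targets; truncate long snippet concatenations.
    inputs = tokenizer(batch["source"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["target"], max_length=256, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=["source", "target"])

args = Seq2SeqTrainingArguments(
    output_dir="bioasq-summarizer",
    per_device_train_batch_size=4,
    learning_rate=3e-5,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()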

