A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

04/02/2023
by Iva Bojic, et al.

Low-quality data can cause downstream problems in high-stakes applications. The data-centric approach emphasizes improving dataset quality to enhance model performance. High-quality datasets are needed both for training general-purpose Large Language Models (LLMs) and for domain-specific models. Domain-specific datasets are usually small because it is costly to engage a large number of domain experts to create them, so it is vital to ensure that domain-specific training data are of high quality. In this paper, we propose a framework for enhancing the data quality of original datasets. We applied the proposed framework to four biomedical datasets and showed a relative improvement of up to 33% when using back translation to enhance the original dataset quality.
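
To make the two technical terms in the abstract concrete, here is a minimal sketch of back translation used as data augmentation, together with the usual formula for relative improvement. It is not the authors' implementation. The function names, the choice to augment questions, and the pluggable `to_pivot` / `from_pivot` translators are all assumptions made for illustration; in practice the translators would wrap a real machine translation model.

```python
from typing import Callable, List

# A translator is any function mapping text to text (hypothetical interface;
# a real one would wrap a machine translation model or service).
Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into a pivot language and back to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def augment(questions: List[str], to_pivot: Translator, from_pivot: Translator) -> List[str]:
    """Return the original questions plus any back-translated paraphrases
    that differ from their source (ignoring case and surrounding spaces)."""
    augmented = list(questions)
    for question in questions:
        paraphrase = back_translate(question, to_pivot, from_pivot)
        if paraphrase.strip().lower() != question.strip().lower():
            augmented.append(paraphrase)
    return augmented


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative improvement in percent: (improved - baseline) / baseline * 100."""
    return (improved - baseline) / baseline * 100.0
```

Under this definition, a metric that rises from 0.30 to 0.40 is a relative improvement of about 33%, even though the absolute gain is only 0.10. These numbers are illustrative and are not taken from the paper.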


