UVA Resources for the Biomedical Vocabulary Alignment at Scale in the UMLS Metathesaurus

05/21/2022
by Vinh Nguyen, et al.

Constructing and maintaining the UMLS (Unified Medical Language System) Metathesaurus is time-consuming, costly, and error-prone because it relies on (1) lexical and semantic processing to suggest synonymous terms and (2) the expertise of UMLS editors to curate those suggestions. To improve the Metathesaurus construction process, our research group has defined a new task called UVA (UMLS Vocabulary Alignment) and generated a dataset for evaluating it. Our group has also developed baselines for this task using logical rules (RBA) and neural networks (LexLM and ConLM). In this paper, we present a set of reusable and reproducible resources comprising (1) a dataset generator, (2) three datasets produced with the generator, and (3) three baseline approaches. We describe the UVA dataset generator and its implementation, which is generalized to work with any given UMLS release. We demonstrate the generator by producing datasets for three UMLS releases: 2020AA, 2021AA, and 2021AB. We provide three UVA baselines based on the three existing approaches (LexLM, ConLM, and RBA). The code, the datasets, and the experiments are publicly available, reusable, and reproducible with any UMLS release (a no-cost license agreement is required to download the UMLS).
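To make the task concrete, the following is a minimal sketch of how labeled term pairs might be derived from a UMLS release. It is not the authors' generator. It assumes that two atoms count as synonymous when they share a concept identifier (CUI), and it reads the pipe-delimited MRCONSO.RRF layout, where CUI, language, AUI, and the term string are fields 0, 1, 7, and 14. The names `Atom`, `parse_mrconso`, and `label_pairs` are hypothetical, and the sample rows use made-up identifiers.

```python
from itertools import combinations
from typing import Iterable, Iterator, NamedTuple, Tuple


class Atom(NamedTuple):
    """One term occurrence from a source vocabulary (hypothetical helper type)."""
    aui: str   # atom unique identifier
    cui: str   # concept unique identifier
    text: str  # term string


def parse_mrconso(lines: Iterable[str], language: str = "ENG") -> Iterator[Atom]:
    """Yield atoms from MRCONSO.RRF-style rows, keeping one language only."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Skip malformed rows and rows in other languages.
        if len(fields) < 15 or fields[1] != language:
            continue
        yield Atom(aui=fields[7], cui=fields[0], text=fields[14])


def label_pairs(atoms: Iterable[Atom]) -> Iterator[Tuple[str, str, int]]:
    """Label every atom pair: 1 if both share a CUI (assumed synonymous), else 0."""
    for a, b in combinations(list(atoms), 2):
        yield a.aui, b.aui, int(a.cui == b.cui)


# Made-up rows in the MRCONSO.RRF column order; real rows come from a licensed release.
SAMPLE = [
    "C0000001|ENG|P|L1|PF|S1|Y|A0000001||||SRC1|PT|X1|Headache|0|N||",
    "C0000001|ENG|S|L2|PF|S2|Y|A0000002||||SRC2|PT|X2|Cephalalgia|0|N||",
    "C0000002|ENG|P|L3|PF|S3|Y|A0000003||||SRC1|PT|X3|Fever|0|N||",
]

pairs = list(label_pairs(parse_mrconso(SAMPLE)))
for left, right, label in pairs:
    print(left, right, label)
```

Enumerating all pairs grows quadratically with the number of atoms, so a sketch like this is only practical on small samples. A generator working at the scale of a full release would need some strategy for selecting negative pairs, which is not shown here.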


