Detecting Idiomatic Multiword Expressions in Clinical Terminology using Definition-Based Representation Learning

05/11/2023
by François Remy, et al.

This paper highlights the potential of definition-based semantic models for detecting idiomatic and semi-idiomatic multiword expressions (MWEs) in clinical terminology. Our study focuses on biomedical entities defined in the UMLS ontology and aims to help prioritize the translation efforts for these entities. In particular, we develop an effective tool for scoring the idiomaticity of biomedical MWEs, based on the degree of similarity between the semantic representation of each MWE and a weighted average of the representations of its constituents. We achieve this using BioLORD, a biomedical language model trained to produce similar representations for entity names and their definitions. The importance of this definition-based approach is highlighted by comparing the BioLORD model to two other state-of-the-art Transformer-based biomedical language models: SapBERT and CODER. Our results show that the BioLORD model has a strong ability to identify idiomatic MWEs, which is not replicated by the other models. Our corpus-free idiomaticity estimation helps ontology translators focus on the more challenging MWEs.
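The scoring idea described in the abstract can be illustrated with a minimal sketch. It assumes that embeddings for the full expression and for each constituent are already available as vectors; the function names `cosine` and `idiomaticity_score`, the uniform default weights, and the choice of "1 minus cosine similarity" as the score are assumptions made for illustration, not the authors' exact formulation. The toy vectors are made up and do not come from BioLORD.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def idiomaticity_score(phrase_vec, constituent_vecs, weights=None):
    """Hypothetical score: 1 - cos(phrase, weighted average of constituents).

    A score near 0 suggests a compositional expression; a higher score
    suggests the meaning is not recoverable from the parts.
    """
    vecs = np.asarray(constituent_vecs, dtype=float)
    if weights is None:
        # Assumption: uniform weights when none are given.
        weights = np.ones(len(vecs))
    composed = np.average(vecs, axis=0, weights=np.asarray(weights, dtype=float))
    return 1.0 - cosine(np.asarray(phrase_vec, dtype=float), composed)

# Toy, made-up embeddings (a real setup would obtain these from a language model).
constituents = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
compositional_phrase = [1.0, 1.0, 0.0]  # lies along the average of its parts
idiomatic_phrase = [0.0, 0.0, 1.0]      # orthogonal to its parts

low = idiomaticity_score(compositional_phrase, constituents)
high = idiomaticity_score(idiomatic_phrase, constituents)
```

In this toy setup `low` is close to 0 and `high` is close to 1, so ranking expressions by the score would surface the second phrase first for a translator's attention.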

research
10/21/2022

BioLORD: Learning Ontological Representations from Definitions (for Biomedical Concepts and their Textual Descriptions)

This work introduces BioLORD, a new pre-training strategy for producing ...
research
08/23/2021

Expressing and Executing Informed Consent Permissions Using SWRL: The All of Us Use Case

The informed consent process is a complicated procedure involving permis...
research
06/01/2023

Automatic Glossary of Clinical Terminology: a Large-Scale Dictionary of Biomedical Definitions Generated from Ontological Knowledge

Background: More than 400,000 biomedical concepts and some of their rela...
research
06/17/2021

Biomedical Interpretable Entity Representations

Pre-trained language models induce dense entity representations that off...
research
04/14/2020

Multi-Ontology Refined Embeddings (MORE): A Hybrid Multi-Ontology and Corpus-based Semantic Representation for Biomedical Concepts

Objective: Currently, a major limitation for natural language processing...
research
05/28/2023

Large Language Models, scientific knowledge and factuality: A systematic analysis in antibiotic discovery

Inferring over and extracting information from Large Language Models (LL...
research
01/31/2018

Onto2Vec: joint vector-based representation of biological entities and their ontology-based annotations

We propose the Onto2Vec method, an approach to learn feature vectors for...
