GLADIS: A General and Large Acronym Disambiguation Benchmark

02/03/2023
by Lihu Chen et al.

Acronym Disambiguation (AD) is crucial for natural language understanding across sources such as biomedical reports, scientific papers, and search engine queries. However, existing acronym disambiguation benchmarks and tools are limited to specific domains, and prior benchmarks are rather small. To accelerate research on acronym disambiguation, we construct a new benchmark named GLADIS with three components: (1) a much larger acronym dictionary with 1.5M acronyms and 6.4M long forms; (2) a pre-training corpus with 160 million sentences; (3) three datasets covering the general, scientific, and biomedical domains. We then pre-train a language model, AcroBERT, on the constructed corpus for general acronym disambiguation, and demonstrate the challenges and value of the new benchmark.
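To make the task concrete, the sketch below shows one common way acronym disambiguation can be framed: given a sentence and an acronym, look up candidate long forms in a dictionary (as GLADIS provides at much larger scale) and rank them by how well each candidate fits the context under a language model. The dictionary contents, the `disambiguate`/`score` helpers, and the use of `bert-base-uncased` as the scorer are illustrative assumptions, not the paper's AcroBERT method or API.

```python
# Hypothetical sketch: acronym disambiguation as ranking dictionary candidates.
# ACRONYM_DICT, score(), and disambiguate() are illustrative stand-ins,
# not the GLADIS/AcroBERT interface described in the abstract.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Tiny stand-in for an acronym dictionary (acronym -> candidate long forms).
ACRONYM_DICT = {
    "AD": ["acronym disambiguation", "Alzheimer's disease", "anno Domini"],
}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def score(sentence: str) -> float:
    """Heuristic fluency score: mean log-probability the masked language model
    assigns to each observed token (no masking; a rough proxy, not a true
    pseudo-log-likelihood)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = torch.log_softmax(logits, dim=-1)
    token_ids = inputs["input_ids"][0]
    total = sum(log_probs[0, i, t].item() for i, t in enumerate(token_ids))
    return total / len(token_ids)  # length-normalize so candidates are comparable

def disambiguate(sentence: str, acronym: str) -> str:
    """Pick the long form whose substitution best fits the context."""
    candidates = ACRONYM_DICT[acronym]
    return max(candidates, key=lambda lf: score(sentence.replace(acronym, lf)))

print(disambiguate("AD systems resolve short forms found in biomedical reports.", "AD"))
```

In this framing, the value of a larger dictionary and pre-training corpus is that the candidate lists are far more complete and the scoring model sees many more acronym contexts, which is what GLADIS and AcroBERT aim to provide.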

