Ensembling Transformers for Cross-domain Automatic Term Extraction

12/12/2022
by   Hanh Thi Hong Tran, et al.

Automatic term extraction plays an essential role in domain language understanding and in several downstream natural language processing tasks. In this paper, we present a comparative study of the predictive power of Transformer-based pretrained language models for term extraction in a multi-language, cross-domain setting. Besides evaluating the ability of monolingual models to extract single- and multi-word terms, we also experiment with ensembles of mono- and multilingual models, taking the intersection or union of the term sets output by different language models. Our experiments are conducted on the ACTER corpus, covering four specialized domains (Corruption, Wind energy, Equitation, and Heart failure) and three languages (English, French, and Dutch), and on the RSDO5 Slovenian corpus, covering four additional domains (Biomechanics, Chemistry, Veterinary, and Linguistics). The results show that the strategy of employing monolingual models outperforms state-of-the-art approaches from related work that leverage multilingual models, for all languages except Dutch and French when the term extraction task excludes named entity terms. Furthermore, by combining the outputs of the two best-performing models, we achieve significant improvements.
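The ensembling strategy described in the abstract, combining the term sets produced by different models via set union or intersection, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name and example terms are hypothetical.

```python
def ensemble_terms(terms_a, terms_b, strategy="union"):
    """Combine the term sets extracted by two models.

    "union" merges all predictions (favors recall);
    "intersection" keeps only terms both models agree on (favors precision).
    """
    a, b = set(terms_a), set(terms_b)
    if strategy == "union":
        return a | b
    if strategy == "intersection":
        return a & b
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical term sets from a monolingual and a multilingual model
# on a Wind energy text:
mono = {"wind turbine", "rotor blade", "nacelle"}
multi = {"wind turbine", "nacelle", "gearbox"}

print(sorted(ensemble_terms(mono, multi, "union")))
# ['gearbox', 'nacelle', 'rotor blade', 'wind turbine']
print(sorted(ensemble_terms(mono, multi, "intersection")))
# ['nacelle', 'wind turbine']
```

In practice, the union variant trades precision for recall, while the intersection variant does the opposite; which is preferable depends on how the extracted term list is used downstream.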


