ATR4S: Toolkit with State-of-the-art Automatic Terms Recognition Methods in Scala

11/23/2016
by N. Astrakhantsev, et al.

Automatically recognized terminology is widely used in domain-specific text processing tasks such as machine translation, information retrieval, and sentiment analysis. However, there is still no agreement on which methods are best suited to particular settings, and there is no reliable comparison of the methods already developed. We believe that one of the main reasons is the lack of implementations of state-of-the-art methods, which are usually non-trivial to recreate. To address these issues, we present ATR4S, open-source software written in Scala that comprises more than 15 methods for automatic terminology recognition (ATR) and implements the whole pipeline: text document preprocessing, term candidate collection, term candidate scoring, and term candidate ranking. It is a highly scalable, modular, and configurable tool with support for automatic caching. We also compare 10 state-of-the-art methods on 7 open datasets by average precision and processing time. The experimental comparison reveals that no single method achieves the best average precision on all datasets and that other available ATR tools do not contain the best methods.
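To make the four pipeline stages and the evaluation metric concrete, here is a minimal Python sketch. It is not the ATR4S Scala API: all function names are hypothetical, the preprocessing is a plain regex tokenizer, and raw frequency stands in for the scoring methods the toolkit provides. The `average_precision` function uses one common definition (the mean of precision@i over the ranks that hold a true term among the top k); the exact normalisation used in the paper may differ.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase and split into word tokens (a stand-in for real NLP preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def collect_candidates(tokens, max_len=2):
    """Collect all n-grams of up to max_len words as term candidates."""
    candidates = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            candidates.append(" ".join(tokens[i:i + n]))
    return candidates


def score_candidates(candidates):
    """Score each distinct candidate by raw frequency (toy scoring, an assumption)."""
    return Counter(candidates)


def rank_candidates(scores):
    """Sort by descending score, breaking ties alphabetically for determinism."""
    return [term for term, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


def average_precision(ranked, gold, k):
    """Mean of precision@i over the ranks i <= k whose candidate is a gold term."""
    hits = 0
    precision_sum = 0.0
    for i, term in enumerate(ranked[:k], start=1):
        if term in gold:
            hits += 1
            precision_sum += hits / i
    return precision_sum / hits if hits else 0.0


def run_pipeline(text):
    """Chain the four stages: preprocess, collect, score, rank."""
    return rank_candidates(score_candidates(collect_candidates(preprocess(text))))
```

A ranked list produced by `run_pipeline` can be passed to `average_precision` together with a set of expert-annotated terms, which mirrors how the compared methods are evaluated against gold-standard term lists.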


