FlexiTerm: A more efficient implementation of flexible multi-word term recognition

by Irena Spasić, et al.

Terms are linguistic signifiers of domain-specific concepts. Automated recognition of multi-word terms (MWTs) in free text is a sequence labelling problem, which is commonly addressed using supervised machine learning methods. Because these methods require manually annotated training data, they are difficult to port across domains. FlexiTerm, on the other hand, is a fully unsupervised method for MWT recognition from domain-specific corpora. Originally implemented in Java as a proof of concept, it did not scale well and therefore offered little practical value in the context of big data. In this paper, we describe its re-implementation in Python and compare the performance of the two implementations. The results demonstrate major improvements in efficiency, which allow FlexiTerm to transition from a proof of concept to a production-grade application.
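To make the "sequence labelling" framing in the abstract concrete, the following is a minimal sketch of how MWT occurrences can be read off a tagged token sequence. It assumes a simple B/I/O tagging scheme (B = first token of a term, I = continuation, O = outside any term). That scheme, the function name `bio_to_spans`, and the example sentence are illustrative assumptions and are not taken from the paper. The sketch only illustrates the problem formulation used by supervised taggers; it is not FlexiTerm's method, which is unsupervised.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, text) spans from B/I/O tags; end is exclusive.

    Hypothetical helper for illustration only.
    """
    spans = []
    start = None  # index of the first token of the span currently open
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new span begins; close the open one first, if any.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "O" and start is not None:
            # An outside tag closes the open span.
            spans.append((start, i, " ".join(tokens[start:i])))
            start = None
    if start is not None:
        # The sequence ended while a span was still open.
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Invented example sentence with hand-assigned tags.
tokens = ["The", "patient", "reported", "anterior", "knee", "pain",
          "after", "knee", "surgery", "."]
tags = ["O", "O", "O", "B", "I", "I", "O", "B", "I", "O"]

# Keep only multi-word spans (two or more tokens).
mwt_spans = [s for s in bio_to_spans(tokens, tags) if s[1] - s[0] >= 2]
print(mwt_spans)
```

In a supervised setting the tags would be predicted by a model trained on manually annotated text, which is the porting cost the abstract refers to.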



Code Repositories


Repository for FlexiTerm: a software tool to automatically recognise multi-word terms in text documents.
