Highly Fast Text Segmentation With Pairwise Markov Chains

02/17/2021
by Elie Azeraf, et al.

The current trend in Natural Language Processing (NLP) is to use ever more extra data to build the best possible models. This trend implies higher computational costs and longer training times, makes deployment harder, and raises concerns about these models' carbon footprint, all of which point to a critical problem for the future. Against this trend, our goal is to develop NLP models that require no extra data and minimize training time. To do so, in this paper we explore two Markov chain models, the Hidden Markov Chain (HMC) and the Pairwise Markov Chain (PMC), for NLP segmentation tasks. We apply these models to three classic applications: POS Tagging, Named-Entity Recognition, and Chunking. We develop an original method that adapts these models to the specific challenges of text segmentation and obtains relevant performance with very short training and execution times. PMC achieves results equivalent to those obtained by Conditional Random Fields (CRF), one of the most widely applied models for these tasks when no extra data are used. Moreover, PMC has training times 30 times shorter than those of CRF, which validates this model given our objectives.
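
To make the appeal of Markov chain models for tagging more concrete, here is a minimal sketch of a supervised HMC tagger. It is not the authors' method and does not reproduce their PMC model or their adaptation to text segmentation. It assumes a toy hand-written corpus, add-alpha smoothing, and Viterbi decoding, and the names `train_hmc` and `viterbi` are hypothetical. Under these assumptions, training is a single counting pass over the data, which hints at why such models can train quickly. A PMC differs in that the pair (tag, word) is itself assumed to be a Markov chain, which is a more general model than the HMC.

```python
import math
from collections import Counter


def train_hmc(tagged_sentences, alpha=1.0):
    """Estimate HMC parameters by counting (assumption: add-alpha smoothing)."""
    tags = sorted({t for sent in tagged_sentences for _, t in sent})
    vocab = {w for sent in tagged_sentences for w, _ in sent}
    init, trans, emit = Counter(), Counter(), Counter()
    tag_count, from_count = Counter(), Counter()
    for sent in tagged_sentences:
        prev = None
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            if prev is None:
                init[tag] += 1
            else:
                trans[(prev, tag)] += 1
                from_count[prev] += 1
            prev = tag
    n_sent = len(tagged_sentences)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words

    def log_init(t):
        return math.log((init[t] + alpha) / (n_sent + alpha * n_tags))

    def log_trans(p, t):
        return math.log((trans[(p, t)] + alpha) / (from_count[p] + alpha * n_tags))

    def log_emit(t, w):
        return math.log((emit[(t, w)] + alpha) / (tag_count[t] + alpha * n_words))

    return {"tags": tags, "init": log_init, "trans": log_trans, "emit": log_emit}


def viterbi(model, words):
    """Return the most probable tag sequence for `words` under the HMC."""
    if not words:
        return []
    tags = model["tags"]
    log_init, log_trans, log_emit = model["init"], model["trans"], model["emit"]
    score = {t: log_init(t) + log_emit(t, words[0]) for t in tags}
    back = []  # back[i][t] = best previous tag when position i + 1 has tag t
    for w in words[1:]:
        new_score, pointers = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: score[p] + log_trans(p, t))
            new_score[t] = score[best_prev] + log_trans(best_prev, t) + log_emit(t, w)
            pointers[t] = best_prev
        score = new_score
        back.append(pointers)
    path = [max(tags, key=score.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy corpus, invented for illustration only.
TOY_CORPUS = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]

model = train_hmc(TOY_CORPUS)
print(viterbi(model, ["the", "cat", "barks"]))
```

On this toy corpus the decoder tags "the cat barks" as DET, NOUN, VERB. Real POS Tagging, Named-Entity Recognition, or Chunking data would need a more careful treatment of unknown words than the single smoothing slot assumed here.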

research
01/03/2023

Linear chain conditional random fields, hidden Markov models, and related classifiers

Practitioners use Hidden Markov Models (HMMs) in different problems for ...
research
11/14/2021

On equivalence between linear-chain conditional random fields and hidden Markov chains

Practitioners successfully use hidden Markov chains (HMCs) in different ...
research
02/17/2021

Introducing the Hidden Neural Markov Chain framework

Nowadays, neural network models achieve state-of-the-art results in many...
research
05/26/2018

Connecting Distant Entities with Induction through Conditional Random Fields for Named Entity Recognition: Precursor-Induced CRF

This paper presents a method of designing specific high-order dependency...
research
01/11/2017

Decoding with Finite-State Transducers on GPUs

Weighted finite automata and transducers (including hidden Markov models...
research
05/21/2020

Hidden Markov Chains, Entropic Forward-Backward, and Part-Of-Speech Tagging

The ability to take into account the characteristics - also called featu...
research
10/06/2016

Sequence-based Sleep Stage Classification using Conditional Neural Fields

Sleep signals from a polysomnographic database are sequences in nature. ...
