Morfessor EM+Prune: Improved Subword Segmentation with Expectation Maximization and Pruning

03/06/2020
by Stig-Arne Grönroos, et al.

Data-driven segmentation of words into subword units has been used in various natural language processing applications, such as automatic speech recognition and statistical machine translation, for almost 20 years. Recently it has become more widely adopted, as models based on deep neural networks often benefit from subword units even for morphologically simpler languages. In this paper, we discuss and compare training algorithms for a unigram subword model, based on the Expectation Maximization algorithm and lexicon pruning. Using English, Finnish, North Sami, and Turkish data sets, we show that this approach finds better solutions to the optimization problem defined by the Morfessor Baseline model than the model's original recursive training algorithm does. The improved optimization also leads to higher morphological segmentation accuracy when compared to a linguistic gold standard. We publish implementations of the new algorithms in the widely used Morfessor software package.
