Language-Agnostic Syllabification with Neural Sequence Labeling

09/29/2019
by Jacob Krantz, et al.

The identification of syllables within phonetic sequences is known as syllabification. This task is thought to play an important role in natural language understanding, speech production, and the development of speech recognition systems. The concept of the syllable is cross-linguistic, though formal definitions are rarely agreed upon, even within a single language. In response, data-driven syllabification methods have been developed that learn from syllabified examples. These methods often employ classical machine learning sequence labeling models. In recent years, recurrence-based neural networks have been shown to perform increasingly well on sequence labeling tasks such as named entity recognition (NER), part-of-speech (POS) tagging, and chunking. We present a novel approach to the syllabification problem that leverages modern neural network techniques. Our network is constructed with long short-term memory (LSTM) cells, a convolutional component, and a conditional random field (CRF) output layer. Existing syllabification approaches are rarely evaluated across multiple language families. To demonstrate cross-linguistic generalizability, we show that the network is competitive with state-of-the-art systems in syllabifying English, Dutch, Italian, French, Manipuri, and Basque datasets.
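To make the sequence labeling framing concrete, the sketch below shows one way a syllabified word could be turned into per-phone labels and recovered from them. It assumes a binary scheme in which a phone is labeled 1 when it ends a syllable and 0 otherwise; that scheme, the function names, and the example transcription are illustrative assumptions, not details taken from the abstract. The neural network itself (LSTM, convolutional component, CRF) is not reproduced here; it would sit between the two functions, predicting the label sequence from the phones.

```python
from typing import List, Tuple


def syllables_to_labels(syllables: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Flatten a syllabified word into phones plus one label per phone.

    Assumed label scheme (hypothetical): 1 if the phone is the last phone
    of its syllable, 0 otherwise.
    """
    phones: List[str] = []
    labels: List[int] = []
    for syllable in syllables:
        for i, phone in enumerate(syllable):
            phones.append(phone)
            labels.append(1 if i == len(syllable) - 1 else 0)
    return phones, labels


def labels_to_syllables(phones: List[str], labels: List[int]) -> List[List[str]]:
    """Rebuild syllables from phones and (predicted) boundary labels."""
    if len(phones) != len(labels):
        raise ValueError("phones and labels must have the same length")
    syllables: List[List[str]] = []
    current: List[str] = []
    for phone, label in zip(phones, labels):
        current.append(phone)
        if label == 1:
            # A label of 1 closes the current syllable.
            syllables.append(current)
            current = []
    if current:
        # Trailing phones with no closing label still form a final syllable.
        syllables.append(current)
    return syllables


# Illustrative transcription only; not drawn from any of the paper's datasets.
example = [["s", "ɪ"], ["l", "ə"], ["b", "ə", "l"]]
example_phones, example_labels = syllables_to_labels(example)
round_trip = labels_to_syllables(example_phones, example_labels)
```

Under this framing, a sequence labeler only has to emit one label per input phone, which is the same input-output shape as NER or POS tagging and is why models built for those tasks can be carried over.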


