A Subword Guided Neural Word Segmentation Model for Sindhi

by Wazir Ali, et al.

Deep neural networks employ multiple processing layers to learn text representations, alleviating the burden of manual feature engineering in Natural Language Processing (NLP). Such text representations are widely used to extract features from unlabeled data. Word segmentation is a fundamental and unavoidable prerequisite for many languages. Sindhi is an under-resourced language whose segmentation is challenging: it exhibits space omission and space insertion issues, and it lacks a labeled corpus for segmentation. In this paper, we investigate supervised Sindhi Word Segmentation (SWS) using unlabeled data with a Subword Guided Neural Word Segmenter (SGNWS). To learn text representations, we incorporate subword representations into a recurrent neural architecture to capture word information at the morphemic level; the architecture combines a Bidirectional Long Short-Term Memory (BiLSTM) network, a self-attention mechanism, and a Conditional Random Field (CRF). Our proposed SGNWS model achieves an F1 value of 98.51% without relying on feature engineering. The empirical results demonstrate the benefits of the proposed model over the existing Sindhi word segmenters.
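
To make the task framing concrete, the sketch below treats word segmentation as per-character tagging and scores a prediction with word-level F1, the metric reported in the abstract. It is only an illustration under stated assumptions: the B/I/E/S tag set, the helper names (`words_to_tags`, `tags_to_words`, `segmentation_f1`), and the English toy string are hypothetical and not taken from the paper, and the BiLSTM, self-attention, and CRF layers that would actually predict the tags are not modeled here.

```python
# Minimal sketch: segmentation as character tagging, plus word-level F1.
# Assumed tag set (not confirmed by the abstract):
#   B = word begin, I = word inside, E = word end, S = single-character word.


def words_to_tags(words):
    """Convert a gold segmentation into one boundary tag per character."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from an unsegmented character sequence and its tags."""
    words, current = [], []
    for ch, tag in zip(chars, tags):
        current.append(ch)
        if tag in ("E", "S"):  # a word closes on E or S
            words.append("".join(current))
            current = []
    if current:  # tolerate a sequence that ends without E or S
        words.append("".join(current))
    return words


def word_spans(words):
    """Map a segmentation to a set of (start, end) character offsets."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f1(gold_words, pred_words):
    """Word-level F1: a predicted word counts only if both boundaries match."""
    gold, pred = word_spans(gold_words), word_spans(pred_words)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["the", "cat", "sat"]
    tags = words_to_tags(gold)
    print(tags_to_words("thecatsat", tags))
    print(segmentation_f1(gold, ["the", "cats", "at"]))
```

In this framing, a space-omission error corresponds to a missed E or S tag and a space-insertion error to a spurious one, so both problems reduce to the same tagging decision.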




