A Subword Guided Neural Word Segmentation Model for Sindhi

12/30/2020
by   Wazir Ali, et al.

Deep neural networks employ multiple processing layers to learn text representations, alleviating the burden of manual feature engineering in Natural Language Processing (NLP). Such text representations are widely used to extract features from unlabeled data. Word segmentation is a fundamental and unavoidable prerequisite for many languages. Sindhi is an under-resourced language whose segmentation is challenging: it exhibits space omission and space insertion issues, and it lacks a labeled corpus for segmentation. In this paper, we investigate supervised Sindhi Word Segmentation (SWS) using unlabeled data with a Subword Guided Neural Word Segmenter (SGNWS). To learn text representations, we incorporate subword representations into a recurrent neural architecture to capture word information at the morphemic level; the architecture takes advantage of Bidirectional Long Short-Term Memory (BiLSTM), a self-attention mechanism, and a Conditional Random Field (CRF). Our proposed SGNWS model achieves an F1 value of 98.51% without relying on feature engineering. The empirical results demonstrate the benefits of the proposed model over the existing Sindhi word segmenters.
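The abstract reports an F1 value but does not say how it is computed. The sketch below shows one common way to score word segmentation, as an assumption rather than the authors' evaluation script: each word becomes a character span, and a predicted word counts as correct only if both of its boundaries match the gold segmentation. The function names `word_spans` and `segmentation_f1` are hypothetical, and the Latin-letter strings are placeholders, not Sindhi data.

```python
from typing import List, Set, Tuple


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented sentence to the (start, end) character spans of its words."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold: List[str], predicted: List[str]) -> float:
    """Span-level F1: a predicted word is correct only if both boundaries match gold."""
    if "".join(gold) != "".join(predicted):
        raise ValueError("gold and predicted must cover the same characters")
    gold_spans, pred_spans = word_spans(gold), word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Placeholder example: the same five characters segmented two ways.
gold = ["ab", "cd", "e"]
predicted = ["ab", "c", "de"]
print(round(segmentation_f1(gold, predicted), 4))
```

Here only the first word has matching boundaries, so precision and recall are both 1/3 and the F1 is 1/3. A misplaced boundary is penalized twice under this metric, because it breaks the word on either side of it.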


