Semi-supervised Thai Sentence Segmentation Using Local and Distant Word Representations

08/04/2019
by Chanatip Saetia, et al.

A sentence is typically treated as the minimal syntactic unit used for extracting valuable information from a longer piece of text. However, written Thai has no explicit sentence markers. We propose a deep learning model for sentence segmentation with three main contributions. First, we integrate n-gram embeddings as a local representation to capture word groups near sentence boundaries. Second, to focus on the keywords of dependent clauses, we combine the model with a distant representation obtained from self-attention modules. Finally, because labeled data is scarce and annotation is difficult and time-consuming, we investigate and adapt Cross-View Training (CVT) as a semi-supervised learning technique, allowing us to use unlabeled data to improve the model representations. In the Thai sentence segmentation experiments, our model reduced the relative error by 7.4% [...] on the Orchid and UGWC datasets, respectively. We also applied our model to the task of pronunciation recovery on the IWSLT English dataset. Our model outperformed the prior sequence tagging models, achieving a relative error reduction of 2.5% [...] presentations was the main contributing factor for Thai, while the semi-supervised training helped the most for English.
