Real-time Automatic Word Segmentation for User-generated Text

10/31/2018
by Won Ik Cho, et al.

Appropriate word segmentation makes written text easier to read and can help resolve ambiguity. In this paper, we propose a real-time assistive technology built on automatic segmentation. The language primarily investigated is Korean, a head-final language whose character set consists of various morpho-syllabic blocks. The training scheme is fully neural network-based and extensible to other languages, as demonstrated in this study for English. We also show how the proposed system can be used in web-based fine-tuning for user-generated text. Through qualitative and quantitative comparisons with widely used text processing toolkits, we show the reliability of the proposed system and its suitability for conversation-style and non-canonical texts. A demonstration for both languages is freely available online.
