
Real-time Automatic Word Segmentation for User-generated Text

by Won Ik Cho, et al.
Seoul National University

Appropriate word segmentation improves the readability of written text and can help resolve ambiguity. In this paper, we propose a real-time assistive technology based on automatic segmentation. The primary language investigated is Korean, a head-final language whose character set consists of morpho-syllabic blocks. The training scheme is fully neural network-based and extensible to other languages, as demonstrated in this study for English. We also show how the proposed system can be used for web-based fine-tuning on user-generated text. Through qualitative and quantitative comparisons with widely used text processing toolkits, we show the reliability of the proposed system and its suitability for conversation-style and non-canonical text. Demonstrations for both languages are freely available online.
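To make the task concrete, automatic segmentation is often framed as labeling each character with whether a space should follow it. The abstract does not state which formulation the proposed system uses, so the sketch below is only an illustration under that assumption; the function names `to_labels` and `apply_labels` are hypothetical and not taken from the paper.

```python
def to_labels(spaced: str) -> tuple[str, list[int]]:
    """Strip spaces from a correctly spaced string and return the unspaced
    text plus one label per character (1 = a space follows, 0 = none).
    Assumed formulation for illustration only."""
    chars: list[str] = []
    labels: list[int] = []
    for ch in spaced.strip():
        if ch == " ":
            # Mark the most recent non-space character as followed by a space.
            if labels:
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def apply_labels(unspaced: str, labels: list[int]) -> str:
    """Rebuild a spaced string from unspaced text and per-character labels,
    such as labels predicted by a trained model."""
    if len(unspaced) != len(labels):
        raise ValueError("need exactly one label per character")
    out: list[str] = []
    for ch, label in zip(unspaced, labels):
        out.append(ch)
        if label == 1:
            out.append(" ")
    return "".join(out).rstrip()


# Round trip on an English and a Korean example.
text, labels = to_labels("I am here")
restored = apply_labels(text, labels)
ko_text, ko_labels = to_labels("아버지가 방에")
```

Under this framing, a model only has to predict the label sequence for the unspaced input, and the same encoding applies to Korean syllable blocks and English letters alike.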

