
Real-time Automatic Word Segmentation for User-generated Text

10/31/2018
by Won Ik Cho, et al.
Seoul National University

Appropriate word segmentation makes written text easier to read and can help resolve ambiguity. In this paper, we propose a real-time assistive technology based on automatic segmentation. The language primarily investigated is Korean, a head-final language that uses various morpho-syllabic blocks as its character set. The training scheme is fully neural network-based and extensible to other languages, as we demonstrate in this study for English. In addition, we show how the proposed system can be used for web-based fine-tuning on user-generated text. Through qualitative and quantitative comparison with widely used text processing toolkits, we show the reliability of the proposed system and how well it suits conversation-style and non-canonical text. Demonstrations for both languages are freely available online.
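As an illustration only, one common way to frame automatic segmentation for a neural model is per-character labeling: each non-space character gets a 1 if a space should follow it and a 0 otherwise. The abstract does not spell out the formulation used in the paper, so the sketch below is an assumption, and the helper names `to_labels` and `apply_labels` are hypothetical. It shows only how training pairs could be derived from spaced text and how predicted labels could be turned back into segmented text; the network itself is omitted.

```python
def to_labels(spaced_text):
    """Split spaced text into (unspaced string, per-character labels).

    A label of 1 means "a space follows this character" (assumed scheme).
    """
    chars, labels = [], []
    for ch in spaced_text:
        if ch == " ":
            # Mark the previous character; ignore leading or repeated spaces.
            if labels:
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels


def apply_labels(unspaced_text, labels):
    """Rebuild segmented text from an unspaced string and its labels."""
    out = []
    for ch, label in zip(unspaced_text, labels):
        out.append(ch)
        if label:
            out.append(" ")
    # Drop a trailing space if the last character was labeled 1.
    return "".join(out).rstrip()


text, labels = to_labels("i am here")
restored = apply_labels(text, labels)
```

Because the labels align one-to-one with characters, the same scheme applies to Korean syllable blocks and to English letters, which is one reason a character-level formulation extends across languages.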
