VSEC: Transformer-based Model for Vietnamese Spelling Correction

11/01/2021
by Dinh-Truong Do, et al.

Spelling error correction is a topic with a long history in natural language processing. Although previous studies have achieved remarkable results, challenges remain. For Vietnamese, a state-of-the-art method for the task infers a syllable's context from its adjacent syllables. The method's accuracy can be unsatisfactory, however, because the model may lose the context if two or more spelling mistakes stand near each other. In this paper, we propose a novel method to correct Vietnamese spelling errors. We tackle both mistyped errors and misspelled errors using a deep learning model. The embedding layer, in particular, is powered by the byte pair encoding technique. The sequence-to-sequence model based on the Transformer architecture distinguishes our approach from previous work on the same problem. In the experiment, we train the model on a large synthetic dataset into which spelling errors are randomly introduced. We test the performance of the proposed method on a realistic dataset that contains 11,202 human-made misspellings in 9,341 different Vietnamese sentences. The experimental results show that our method achieves encouraging performance with 86.8 ... state-of-the-art approach 5.6 ...
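
The abstract says the training data is synthetic, with spelling errors introduced at random, but it does not describe the procedure. The following is a minimal sketch of one way such (noisy, clean) sentence pairs could be generated. The edit operations, the error rate, the ASCII-only replacement alphabet, and the names `corrupt_syllable` and `make_pair` are all assumptions made for illustration, not the authors' method.

```python
import random

# Assumed replacement alphabet; a realistic generator would likely
# also cover Vietnamese diacritics and keyboard-specific typos.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def corrupt_syllable(syllable, rng):
    """Apply one random character-level edit to a syllable."""
    if len(syllable) < 2:
        return syllable  # too short to edit without risking an empty token
    op = rng.choice(["swap", "delete", "insert", "replace"])
    i = rng.randrange(len(syllable) - 1)
    if op == "swap":
        # Transpose the characters at positions i and i + 1.
        return syllable[:i] + syllable[i + 1] + syllable[i] + syllable[i + 2:]
    if op == "delete":
        return syllable[:i] + syllable[i + 1:]
    ch = rng.choice(ALPHABET)
    if op == "insert":
        return syllable[:i] + ch + syllable[i:]
    # Remaining case: replace the character at position i.
    return syllable[:i] + ch + syllable[i + 1:]


def make_pair(sentence, error_rate=0.15, seed=None):
    """Return (noisy, clean): the model input and its correction target."""
    rng = random.Random(seed)
    noisy = [
        corrupt_syllable(s, rng) if rng.random() < error_rate else s
        for s in sentence.split()
    ]
    return " ".join(noisy), sentence


if __name__ == "__main__":
    noisy, clean = make_pair("tôi đang học tiếng Việt", error_rate=0.5, seed=0)
    print(noisy)
    print(clean)
```

Because every edit stays inside a single syllable, the noisy and clean sentences keep the same number of whitespace-separated tokens, which makes it straightforward to count detected and corrected errors per syllable. A sequence-to-sequence model does not strictly require this alignment, so it is a convenience of the sketch rather than a constraint taken from the paper.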


