Neural text normalization leveraging similarities of strings and sounds

11/04/2020
by   Riku Kawamura, et al.

We propose neural models that normalize text by considering the similarities of word strings and sounds. We experimentally compared a model that considers both similarities, models that consider only the string similarity or only the sound similarity, and a baseline model that uses neither. The results showed that leveraging word string similarity helped with misspellings and abbreviations, while taking sound similarity into account helped with phonetic substitutions and emphasized characters. As a result, the proposed models achieved higher F_1 scores than the baseline.
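The abstract does not say how the two similarities are computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes string similarity is a normalized edit distance and sound similarity is the same measure applied to a crude phonetic key. The names `string_similarity`, `sound_similarity`, `phonetic_key`, and the tiny `SOUND_RULES` table are all hypothetical, and the English examples are chosen only for readability.

```python
from itertools import groupby


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def string_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


# Hypothetical, deliberately tiny spelling-to-sound rules (an assumption,
# not taken from the paper).
SOUND_RULES = [("ph", "f"), ("ck", "k"), ("z", "s")]


def phonetic_key(word: str) -> str:
    """Map a word to a rough sound key and collapse repeated characters."""
    key = word.lower()
    for src, dst in SOUND_RULES:
        key = key.replace(src, dst)
    # Collapsing runs makes emphasized forms such as "coooool" match "cool".
    return "".join(ch for ch, _ in groupby(key))


def sound_similarity(a: str, b: str) -> float:
    """String similarity computed on the phonetic keys instead of spellings."""
    return string_similarity(phonetic_key(a), phonetic_key(b))


if __name__ == "__main__":
    for noisy, canonical in [("tomorow", "tomorrow"),
                             ("fone", "phone"),
                             ("coooool", "cool")]:
        print(noisy, canonical,
              round(string_similarity(noisy, canonical), 3),
              round(sound_similarity(noisy, canonical), 3))
```

In this toy setting a misspelling such as "tomorow" already scores high on string similarity, whereas a phonetic substitution ("fone") or an emphasized form ("coooool") only scores high once the sound key is used. This mirrors the division of labor the abstract reports, although the actual models learn from these signals rather than apply fixed rules.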


