Good Neighbors Are All You Need for Chinese Grapheme-to-Phoneme Conversion

03/14/2023
by Jungjun Kim, et al.

Most Chinese Grapheme-to-Phoneme (G2P) systems employ a three-stage framework. They first transform the input sequence into character embeddings, then extract linguistic information with a language model, and finally predict phonemes from global context over the entire input sequence. However, linguistic knowledge alone is often inadequate. Language models frequently encode overly general sentence structures and miss the specific cases in which phonetic knowledge is required. In addition, a handcrafted post-processing system is needed to handle tone-related issues of the characters, but this system segments word boundaries inconsistently, which in turn degrades the performance of the G2P system. To address these issues, we propose the Reinforcer, which provides a strong inductive bias for language models by emphasizing the phonological information between neighboring characters to help disambiguate pronunciations. Experimental results show that the Reinforcer improves state-of-the-art architectures by a large margin. We also combine the Reinforcer with a large-scale pre-trained model and demonstrate the value of neighboring context in knowledge-transfer scenarios.
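The abstract's central idea is that the characters immediately around a polyphonic character often decide its reading. Below is a minimal Python sketch of one generic way to expose that local context to a downstream phoneme predictor. It assumes per-character embeddings are plain lists of floats and concatenates a fixed window of neighboring embeddings at each position. The function name `neighbor_windows` and the window size `k` are hypothetical. This is not the Reinforcer itself, whose internal design the abstract does not describe.

```python
from typing import List, Sequence

Vector = List[float]


def neighbor_windows(embeddings: Sequence[Vector], k: int = 1) -> List[Vector]:
    """Hypothetical helper: for each position i, concatenate the embeddings
    at positions i-k .. i+k into one vector.

    Positions that fall outside the sequence are zero-padded, so every
    output vector has length (2k + 1) * dim.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if not embeddings:
        return []
    dim = len(embeddings[0])
    pad = [0.0] * dim  # stands in for missing neighbors at the edges
    n = len(embeddings)
    out: List[Vector] = []
    for i in range(n):
        window: Vector = []
        for j in range(i - k, i + k + 1):
            # Use the real embedding when j is in range, otherwise padding.
            window.extend(embeddings[j] if 0 <= j < n else pad)
        out.append(window)
    return out


if __name__ == "__main__":
    # Toy 1-dimensional embeddings for the two characters of 银行.
    print(neighbor_windows([[1.0], [2.0]], k=1))
```

With `k=1`, the vector for 行 in 银行 (yín háng, "bank") already contains the embedding of 银, and in 行走 (xíng zǒu, "to walk") it contains 走. Those are exactly the neighbors that decide between the readings háng and xíng. A learned model would replace the fixed concatenation with something trainable. The sketch only shows why a local window carries the disambiguating signal.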
