Extracting linguistic speech patterns of Japanese fictional characters using subword units

03/05/2022
by Mika Kishino, et al.

This study extracts and analyzes the linguistic speech patterns that characterize Japanese anime and game characters. Conventional morphological analyzers, such as MeCab, segment words with high accuracy, but they cannot segment broken expressions or utterance endings that are not listed in the dictionary, and such expressions often appear in the lines of anime and game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime and game characters into subword units, which were proposed mainly for deep learning, and extracting frequently occurring strings to obtain the expressions that characterize their utterances. We analyze the subword units weighted by TF-IDF according to gender, age, and individual anime character, and show that they are linguistic speech patterns specific to each of these attributes. Additionally, a classification experiment shows that the model using subword units outperforms the model using the conventional method.
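The pipeline outlined in the abstract (subword segmentation followed by TF-IDF weighting per attribute) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a toy byte-pair-encoding style merge loop stands in for the subword tokenizer, since the abstract does not say which one was used, and each speaker is treated as one "document" for TF-IDF. The speakers `A` and `B`, their romanized lines, and all function names are invented for the example.

```python
import math
from collections import Counter


def merge_pair(seq, a, b):
    """Replace every adjacent (a, b) in seq with the single symbol a + b."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def segment_bpe(texts, num_merges):
    """Toy BPE-style segmentation (an assumption, not the paper's tokenizer).

    Starts from single characters and repeatedly merges the most frequent
    adjacent pair, so no dictionary is needed.
    """
    seqs = [list(text) for text in texts]
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:  # nothing repeats any more
            break
        seqs = [merge_pair(seq, a, b) for seq in seqs]
    return seqs


def tfidf_by_group(units_by_group):
    """TF-IDF of each unit, treating every group (e.g. speaker) as one document."""
    n_groups = len(units_by_group)
    counts = {g: Counter(units) for g, units in units_by_group.items()}
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: number of groups containing the unit
    weights = {}
    for g, c in counts.items():
        total = sum(c.values())
        weights[g] = {u: (n / total) * math.log(n_groups / df[u]) for u, n in c.items()}
    return weights


# Invented toy data: (hypothetical speaker, romanized line without spaces).
lines = [
    ("A", "ikuzonyan"),
    ("A", "makenainyan"),
    ("B", "ikimasudesuwa"),
    ("B", "makemasendesuwa"),
]

seqs = segment_bpe([text for _, text in lines], num_merges=20)

units_by_group = {}
for (speaker, _), seq in zip(lines, seqs):
    units_by_group.setdefault(speaker, []).extend(seq)

weights = tfidf_by_group(units_by_group)
for speaker, w in weights.items():
    top = sorted(w.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(speaker, top)
```

With this plain `log(N / df)` weighting, any unit shared by every group gets weight zero, so only units exclusive to some groups can rank highly. A real experiment would presumably use a trained subword tokenizer and far more lines per speaker.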

07/13/2018

Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units

In this paper, we present an end-to-end automatic speech recognition sys...
06/28/2018

Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging

Due to the fact that Korean is a highly agglutinative, character-rich la...
02/01/2021

Inducing Meaningful Units from Character Sequences with Slot Attention

Characters do not convey meaning, but sequences of characters do. We pro...
12/20/2021

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

What are the units of text that we want to model? From bytes to multi-wo...
08/23/2019

A BLSTM Network for Printed Bengali OCR System with High Accuracy

This paper presents a printed Bengali and English text OCR system develo...
05/24/2022

Multi-Level Modeling Units for End-to-End Mandarin Speech Recognition

The choice of modeling units affects the performance of the acoustic mod...
11/15/2014

Definition of Visual Speech Element and Research on a Method of Extracting Feature Vector for Korean Lip-Reading

In this paper, we defined the viseme (visual speech element) and describ...