Extracting linguistic speech patterns of Japanese fictional characters using subword units

03/05/2022
by Mika Kishino, et al.

This study extracts and analyzes the linguistic speech patterns that characterize Japanese anime and game characters. Conventional morphological analyzers, such as MeCab, segment words with high accuracy, but they cannot segment broken expressions or utterance endings that are not listed in their dictionaries, and such forms often appear in the lines of anime and game characters. To overcome this challenge, we propose segmenting the lines of Japanese anime and game characters into subword units, a technique proposed mainly for deep learning, and extracting frequently occurring strings to obtain the expressions that characterize their utterances. We analyze the subword units weighted by TF-IDF according to gender, age, and individual anime character, and show that they are linguistic speech patterns specific to each of these attributes. In addition, a classification experiment shows that the model using subword units outperformed the model using the conventional method.
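The weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Character n-grams stand in for a trained subword segmenter, the TF-IDF variant (relative frequency times log(N/df)) is one common choice, and the function names and the two toy "characters" with their lines are invented for illustration only.

```python
import math
from collections import Counter


def char_ngrams(line, n_min=1, n_max=3):
    """Assumed stand-in for a subword segmenter: all character n-grams."""
    return [line[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(line) - n + 1)]


def tfidf_by_group(lines_by_group, segment=char_ngrams):
    """Score each unit per group; each group (e.g. one character) is one document."""
    # Term counts per group.
    tf = {group: Counter(u for line in lines for u in segment(line))
          for group, lines in lines_by_group.items()}
    n_groups = len(tf)
    # Document frequency: in how many groups does each unit occur?
    df = Counter(u for counts in tf.values() for u in counts)
    scores = {}
    for group, counts in tf.items():
        total = sum(counts.values())
        scores[group] = {u: (c / total) * math.log(n_groups / df[u])
                         for u, c in counts.items()}
    return scores


def top_units(scores, group, k=5):
    """Highest-scoring units for a group, ties broken by the unit string."""
    return sorted(scores[group].items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Invented toy data: two hypothetical characters with distinct sentence endings.
toy_lines = {
    "A": ["行くのだ", "食べるのだ"],
    "B": ["行くですわ", "食べるですわ"],
}
toy_scores = tfidf_by_group(toy_lines)
for unit, score in top_units(toy_scores, "A", k=3):
    print(unit, round(score, 3))
```

Units shared by both groups, such as the verb stems, receive a weight of zero under this variant, so the ranking surfaces the utterance endings that distinguish each group. A trained subword model could be passed in through the `segment` parameter instead of the n-gram stand-in.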


