End-to-End Mandarin Tone Classification with Short Term Context Information

04/12/2021
by Jiyang Tang, et al.

In this paper, we propose an end-to-end method for classifying Mandarin tones in continuous speech utterances, using both the spectrogram and short-term context information as inputs. Both Mel-spectrograms and context segment features are used to train the tone classifier. We first divide the spectrogram frames into syllable segments using forced alignment results produced by an ASR model. We then extract short-term segment features to capture context information across multiple syllables. Feeding both the Mel-spectrogram and the short-term context segment features into an end-to-end model significantly improves performance. Experiments are performed on a large-scale open-source Mandarin speech dataset to evaluate the proposed method. Results show that this method improves the classification accuracy from 79.5% to 88.7% on the AISHELL3 database.
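The segmentation and context steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the alignment format or the segment features, so the `(start, end)` boundary tuples in seconds, the 10 ms hop, the per-band time average used as a segment summary, and the function names `split_into_syllables` and `context_features` are all assumptions made for illustration.

```python
import numpy as np


def split_into_syllables(mel, boundaries_sec, hop_sec=0.01):
    """Slice a (n_mels, n_frames) Mel-spectrogram into per-syllable segments.

    boundaries_sec is a list of (start, end) times in seconds, one pair per
    syllable. This format is hypothetical; a real forced aligner may differ.
    """
    n_frames = mel.shape[1]
    segments = []
    for start, end in boundaries_sec:
        # Convert times to frame indices and keep every segment non-empty.
        lo = min(int(round(start / hop_sec)), n_frames - 1)
        hi = min(max(int(round(end / hop_sec)), lo + 1), n_frames)
        segments.append(mel[:, lo:hi])
    return segments


def context_features(segments, k=1):
    """Build one context vector per syllable from its k neighbours on each side.

    Assumption: each segment is summarised by its per-band mean over time,
    and utterance edges are zero-padded. The paper's features may differ.
    """
    summaries = np.stack([seg.mean(axis=1) for seg in segments])  # (n_syll, n_mels)
    n_syll, n_mels = summaries.shape
    pad = np.zeros((k, n_mels))
    padded = np.vstack([pad, summaries, pad])
    # Row i concatenates the summaries of syllables i-k .. i+k.
    return np.stack([padded[i:i + 2 * k + 1].reshape(-1) for i in range(n_syll)])


# Toy input: 4 Mel bands, 30 frames, three syllables of 0.1 s each.
mel = np.ones((4, 30))
boundaries = [(0.0, 0.1), (0.1, 0.2), (0.2, 0.3)]
segments = split_into_syllables(mel, boundaries)
ctx = context_features(segments, k=1)
```

In a full system, each syllable's spectrogram segment and its context vector would both be passed to the tone classifier; the classifier itself is not sketched here because the abstract does not describe its architecture.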

research
09/18/2018

Language Identification with Deep Bottleneck Features

In this paper we proposed an end-to-end short utterances speech language...
research
04/08/2023

StillFast: An End-to-End Approach for Short-Term Object Interaction Anticipation

Anticipation problem has been studied considering different aspects such...
research
04/16/2020

Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network

We propose a non-invasive and cost-effective method to automatically det...
research
01/21/2016

On Structured Sparsity of Phonological Posteriors for Linguistic Parsing

The speech signal conveys information on different time scales from shor...
research
11/20/2016

Fast Video Classification via Adaptive Cascading of Deep Models

Recent advances have enabled "oracle" classifiers that can classify acro...
research
08/15/2023

Preliminary investigation of the short-term in situ performance of an automatic masker selection system

Soundscape augmentation or "masking" introduces wanted sounds into the a...
research
10/05/2022

Toward Knowledge-Driven Speech-Based Models of Depression: Leveraging Spectrotemporal Variations in Speech Vowels

Psychomotor retardation associated with depression has been linked with ...