End-to-End Mandarin Tone Classification with Short Term Context Information

by Jiyang Tang, et al.

In this paper, we propose an end-to-end method for classifying Mandarin tones in continuous speech utterances. The tone classifier is trained on two inputs: the Mel-spectrogram and short-term context segment features. We first divide the spectrogram frames into syllable segments using forced-alignment results produced by an ASR model. We then extract short-term segment features to capture context information across multiple syllables. Feeding both the Mel-spectrogram and the short-term context segment features into an end-to-end model significantly improves performance. We evaluate the proposed method on a large-scale open-source Mandarin speech dataset. Results show that this method improves classification accuracy from 79.5% to 88.7% on the AISHELL-3 database.
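
The segmentation and context steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `split_into_syllables` and `context_windows`, the 10 ms hop size, the 80 Mel bins, the one-neighbour context width, and the toy alignment boundaries are all assumptions made for illustration, and random numbers stand in for a real Mel-spectrogram.

```python
import numpy as np

def split_into_syllables(mel, boundaries, hop_s=0.01):
    """Slice a (frames, n_mels) spectrogram into per-syllable segments.

    boundaries: list of (start_s, end_s) pairs in seconds, assumed to come
    from a forced aligner. hop_s is the assumed frame hop in seconds.
    """
    segments = []
    for start_s, end_s in boundaries:
        lo = int(round(start_s / hop_s))
        hi = max(lo + 1, int(round(end_s / hop_s)))  # keep at least one frame
        segments.append(mel[lo:hi])
    return segments

def context_windows(segments, k=1):
    """For each syllable, concatenate its frames with those of up to k
    neighbouring syllables on each side (fewer at the utterance edges)."""
    windows = []
    for i in range(len(segments)):
        lo, hi = max(0, i - k), min(len(segments), i + k + 1)
        windows.append(np.concatenate(segments[lo:hi], axis=0))
    return windows

# Toy input: 100 frames x 80 Mel bins of random values (hypothetical data).
mel = np.random.default_rng(0).normal(size=(100, 80))
# Hypothetical forced-alignment output for three syllables, in seconds.
boundaries = [(0.00, 0.25), (0.25, 0.60), (0.60, 1.00)]

segments = split_into_syllables(mel, boundaries)
windows = context_windows(segments, k=1)

print([s.shape for s in segments])
print([w.shape for w in windows])
```

In this sketch each syllable yields two views, its own frames and a wider window spanning its neighbours, which is one simple way a classifier could receive both the local spectrogram and short-term context. How the paper actually encodes the context features is not specified in the abstract.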

