A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect

05/07/2021
by Binbin Xu, et al.

This study presents a large-scale benchmark of cloud-based Speech-To-Text (STT) systems: Google Cloud Speech-To-Text, Microsoft Azure Cognitive Services, Amazon Transcribe, and IBM Watson Speech to Text. For each system, 40,158 clean and noisy speech files (about 101 hours) are tested. The effect of background noise on STT quality is also evaluated at 5 different signal-to-noise ratios, from 40 dB to 0 dB. Results showed that Microsoft Azure provided the lowest transcription error rate, 9.09% on clean speech, with high robustness to noisy environments. Google Cloud and Amazon Transcribe gave similar performance, but the latter is very limited for time-constrained usage. Though IBM Watson could work correctly in quiet conditions, it is highly sensitive to noisy speech, which could strongly limit its application in real-life situations.
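The two quantities the abstract relies on, a transcription error rate and a signal-to-noise ratio, can be illustrated with a minimal sketch. It assumes the error rate is a word error rate in the usual sense (word-level edit distance divided by the number of reference words) and that noisy files are produced by scaling a noise signal to reach a target SNR before adding it to clean speech. The abstract confirms neither detail, and the function names `word_error_rate` and `mix_at_snr` are hypothetical, not taken from the study.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech after scaling it to the requested SNR in dB.

    Assumption: both signals are equal-length sequences of float samples.
    """
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(speech_power / (gain^2 * noise_power))
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


if __name__ == "__main__":
    wer = word_error_rate("le chat dort sur le canapé", "le chat dors sur canapé")
    print(f"WER: {wer:.2%}")
    noisy = mix_at_snr([1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5], snr_db=20)
    print(noisy)
```

Under these assumptions, lowering `snr_db` toward 0 raises the noise gain until the noise carries as much power as the speech, which is the hardest condition the abstract mentions.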

