Using Deep Learning Techniques and Inferential Speech Statistics for AI Synthesised Speech Recognition

by   Arun Kumar Singh, et al.

The recent developments in technology have rewarded us with amazing audio synthesis models like TACOTRON and WAVENETS. On the other hand, they pose greater threats, such as speech clones and deep fakes, that may go undetected. To tackle these alarming situations, there is an urgent need for models that can discriminate synthesized speech from actual human speech and also identify the source of such a synthesis. Here, we propose a model based on a Convolutional Neural Network (CNN) and a Bidirectional Recurrent Neural Network (BiRNN) that achieves both of the aforementioned objectives. The temporal dependencies present in AI-synthesized speech are exploited using the Bidirectional RNN and CNN. The model outperforms the state-of-the-art approaches, classifying AI-synthesized audio versus real human speech with an error rate of 1.9
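The abstract only names the architecture, not its details. As an illustrative sketch (not the authors' implementation), the CNN-plus-bidirectional-RNN pipeline can be mocked up in plain NumPy with random, untrained weights: a 1-D convolution over spectral frames, a forward and a backward recurrent pass whose final states are concatenated, and a softmax head for the real-vs-synthesized decision. All layer sizes, feature dimensions, and parameter names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    # x: (T, C_in) frame features; w: (k, C_in, C_out); b: (C_out,)
    k = w.shape[0]
    T = x.shape[0] - k + 1
    out = np.stack([
        np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
        for t in range(T)
    ])
    return np.maximum(out, 0.0)  # ReLU

def rnn_pass(x, Wx, Wh, b):
    # simple tanh RNN over the time axis; returns the final hidden state
    h = np.zeros(Wh.shape[0])
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh + b)
    return h

def birnn_classify(x, params):
    # forward and reversed-time passes, concatenated, then a linear head
    h_f = rnn_pass(x, *params["fwd"])
    h_b = rnn_pass(x[::-1], *params["bwd"])
    logits = np.concatenate([h_f, h_b]) @ params["W_out"] + params["b_out"]
    e = np.exp(logits - logits.max())
    return e / e.sum()  # softmax over {real, synthesized}

# Hypothetical sizes: 100 frames of 40 spectral features each.
T, F, C, H, n_classes = 100, 40, 16, 32, 2
params = {
    "conv_w": rng.normal(0, 0.1, (5, F, C)),
    "conv_b": np.zeros(C),
    "fwd": (rng.normal(0, 0.1, (C, H)), rng.normal(0, 0.1, (H, H)), np.zeros(H)),
    "bwd": (rng.normal(0, 0.1, (C, H)), rng.normal(0, 0.1, (H, H)), np.zeros(H)),
    "W_out": rng.normal(0, 0.1, (2 * H, n_classes)),
    "b_out": np.zeros(n_classes),
}

features = rng.normal(size=(T, F))  # stand-in for a spectrogram
conv_out = conv1d(features, params["conv_w"], params["conv_b"])
probs = birnn_classify(conv_out, params)
print(probs.shape)
```

In a real system the weights would of course be trained, and the input would be actual spectral features extracted from audio; the sketch only shows how the convolutional front end and the two recurrent directions fit together.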


Related research:

- Detection of AI Synthesized Hindi Speech
- Deep Speech Synthesis from Articulatory Representations
- Detection of AI-Synthesized Speech Using Cepstral Bispectral Statistics
- Vid2speech: Speech Reconstruction from Silent Video
- AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
- Deep Feed-forward Sequential Memory Networks for Speech Synthesis
- High-dimensional sequence transduction