Speaker Clustering With Neural Networks And Audio Processing

03/22/2018
by Maxime Jumelle, et al.

Speaker clustering is the task of differentiating speakers in a recording; in effect, the aim is to answer "who spoke when" in audio recordings. A common approach in industry is to extract MFCC features directly from the recording and model them with well-known techniques such as Gaussian Mixture Models (GMM) and Hidden Markov Models (HMM). In this paper, we study neural networks (especially CNNs) followed by clustering and audio processing, with the aim of reaching accuracy similar to that of state-of-the-art methods.
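
To make the "who spoke when" framing concrete, here is a minimal sketch of the clustering stage alone. It assumes each fixed-length audio segment has already been mapped to an embedding vector (for instance by a CNN); the toy 2-D embeddings, the 1-second segment length, and the choice of k-means with a known speaker count are illustrative assumptions, not the method described in the paper.

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-point initialization."""
    # Seed with the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def who_spoke_when(labels, segment_seconds):
    """Merge consecutive same-label segments into (start, end, speaker) turns."""
    turns = []
    for i, label in enumerate(labels):
        start, end = i * segment_seconds, (i + 1) * segment_seconds
        if turns and turns[-1][2] == label:
            turns[-1] = (turns[-1][0], end, label)
        else:
            turns.append((start, end, label))
    return turns

# Hypothetical per-segment embeddings (one vector per 1-second segment).
embeddings = [(0.1, 0.2), (0.0, 0.1), (0.9, 1.0), (1.0, 0.9), (0.2, 0.1)]
labels = kmeans(embeddings, k=2)
turns = who_spoke_when(labels, segment_seconds=1.0)
print(turns)
```

The cluster indices are arbitrary identifiers rather than speaker identities; in practice the number of speakers is usually unknown, so a real system would need a way to estimate `k` or use a clustering method that does not require it.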

research
09/10/2020

Speaker Diarization Using Stereo Audio Channels: Preliminary Study on Utterance Clustering

Speaker diarization is one of the actively researched topics in audio si...
research
03/17/2020

High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features

Speaker counting is the task of estimating the number of people that are...
research
12/30/2021

Feature extraction with mel scale separation method on noise audio recordings

This paper focuses on improving the accuracy of noise audio recordings. ...
research
11/15/2021

Machine Learning for Genomic Data

This report explores the application of machine learning techniques on s...
research
08/30/2019

Enhancements for Audio-only Diarization Systems

In this paper two different approaches to enhance the performance of the...
research
02/23/2022

Speech watermarking: an approach for the forensic analysis of digital telephonic recordings

In this article, the authors discuss the problem of forensic authenticat...
research
07/01/2022

Speaker Diarization and Identification from Single-Channel Classroom Audio Recording Using Virtual Microphones

Speaker identification in noisy audio recordings, specifically those fro...
