pyannote.audio: neural building blocks for speaker diarization

11/04/2019
by Hervé Bredin, et al.

We introduce pyannote.audio, an open-source Python toolkit for speaker diarization. Built on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also ships with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.
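To make the idea of "combining building blocks" concrete, here is a minimal sketch in plain Python. It is not pyannote.audio's API: the helper names `binarize` and `split_at_changes`, the hysteresis thresholds (`onset`, `offset`), and the local-maximum peak picking are all illustrative assumptions. It shows how frame-level scores from a voice activity detection block can be turned into speech regions, which are then cut into speaker turns wherever a speaker change detection block fires.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def binarize(scores: List[float], step: float,
             onset: float = 0.5, offset: float = 0.5) -> List[Segment]:
    """Hypothetical helper: frame-level speech scores -> speech regions.

    Hysteresis (assumed): a region opens when a score rises above `onset`
    and closes at the first frame whose score falls below `offset`.
    """
    regions: List[Segment] = []
    start = None
    for i, s in enumerate(scores):
        t = round(i * step, 3)  # frame index -> time, rounded for readability
        if start is None and s > onset:
            start = t
        elif start is not None and s < offset:
            regions.append((start, t))
            start = None
    if start is not None:  # close a region still open at the end of the file
        regions.append((start, round(len(scores) * step, 3)))
    return regions


def split_at_changes(regions: List[Segment], change_scores: List[float],
                     step: float, threshold: float = 0.5) -> List[Segment]:
    """Hypothetical helper: cut speech regions at speaker change peaks.

    A peak (assumed definition) is a local maximum of the change score
    that exceeds `threshold`.
    """
    peaks = [
        round(i * step, 3)
        for i in range(1, len(change_scores) - 1)
        if change_scores[i] > threshold
        and change_scores[i] >= change_scores[i - 1]
        and change_scores[i] > change_scores[i + 1]
    ]
    turns: List[Segment] = []
    for start, end in regions:
        cuts = [p for p in peaks if start < p < end]
        bounds = [start] + cuts + [end]
        turns.extend(zip(bounds[:-1], bounds[1:]))
    return turns


if __name__ == "__main__":
    step = 0.1  # assumed 100 ms frame step
    speech = [0.1, 0.2, 0.8, 0.9, 0.9, 0.9, 0.9, 0.3, 0.1, 0.1]
    change = [0.0, 0.0, 0.0, 0.0, 0.1, 0.9, 0.2, 0.0, 0.0, 0.0]
    regions = binarize(speech, step, onset=0.6, offset=0.4)
    print(regions)                                        # [(0.2, 0.7)]
    print(split_at_changes(regions, change, step))        # [(0.2, 0.5), (0.5, 0.7)]
```

In a real pipeline the resulting speech turns would then be passed to a speaker embedding block and clustered; that stage is omitted here to keep the sketch short.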
