TRILLsson: Distilled Universal Paralinguistic Speech Representations

03/01/2022
by Joel Shor, et al.

Recent advances in self-supervision have dramatically improved the quality of speech representations. However, deployment of state-of-the-art embedding models on devices has been restricted due to their limited public availability and large resource footprint. Our work addresses these issues by publicly releasing a collection of paralinguistic speech models that are small and near state-of-the-art in performance. Our approach is based on knowledge distillation, and our models are distilled on public data only. We explore different architectures and thoroughly evaluate our models on the Non-Semantic Speech (NOSS) benchmark. Our largest distilled model is less than 15% the size of the original model (314MB vs 2.2GB), achieves over 96% the accuracy on 6 of 7 tasks, and is trained on 6.5% of the data. The smallest model is 1% in size (22MB) and achieves over 90% the accuracy on 6 of 7 tasks. Our models outperform the open source Wav2Vec 2.0 model on 6 of 7 tasks, and our smallest model outperforms the open source Wav2Vec 2.0 on both emotion recognition tasks despite being 7% the size.
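The abstract's core technique is knowledge distillation: a small student model is trained to reproduce the embeddings of a large, frozen teacher. As a minimal sketch (not the authors' implementation; the linear student, toy dimensions, and variable names here are all hypothetical), the student can simply be regressed onto the teacher's embeddings with a mean-squared-error loss:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: a real teacher (e.g. a large conformer)
# emits high-dimensional embeddings; we use small sizes for illustration.
feat_dim, emb_dim, batch = 16, 8, 32

def distillation_loss(student_emb, teacher_emb):
    """MSE distillation target: the student's embedding is regressed
    directly onto the frozen teacher's embedding for the same audio."""
    return float(np.mean((student_emb - teacher_emb) ** 2))

X = rng.normal(size=(batch, feat_dim))            # stand-in audio features
teacher_emb = rng.normal(size=(batch, emb_dim))   # frozen teacher outputs
W = np.zeros((feat_dim, emb_dim))                 # toy linear student

# A few plain gradient-descent steps on the distillation loss.
lr = 0.05
for _ in range(200):
    residual = X @ W - teacher_emb
    grad = 2 * X.T @ residual / batch             # d(MSE)/dW
    W -= lr * grad

final_loss = distillation_loss(X @ W, teacher_emb)
```

In practice the student would be a neural network trained on unlabeled public audio, but the objective has the same shape: no task labels are needed, only teacher embeddings.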


Related research:

- 11/09/2020: FUN! Fast, Universal, Non-Semantic Speech Embeddings. Learned speech representations can drastically improve performance on ta...
- 07/12/2022: Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices. This work introduces BRILLsson, a novel binary neural network-based repr...
- 09/09/2023: Speech Emotion Recognition with Distilled Prosodic and Linguistic Affect Representations. We propose EmoDistill, a novel speech emotion recognition (SER) framewor...
- 03/01/2022: Towards a Common Speech Analysis Engine. Recent innovations in self-supervised representation learning have led t...
- 10/26/2022: Fast Yet Effective Speech Emotion Recognition with Self-distillation. Speech emotion recognition (SER) is the task of recognising human's emot...
- 03/22/2023: Open-source Frame Semantic Parsing. While the state-of-the-art for frame semantic parsing has progressed dra...
- 03/29/2021: Shrinking Bigfoot: Reducing wav2vec 2.0 footprint. Wav2vec 2.0 is a state-of-the-art speech recognition model which maps sp...
