What all do audio transformer models hear? Probing Acoustic Representations for Language Delivery and its Structure

01/02/2021
by   Jui Shah, et al.

In recent times, BERT-based transformer models have become an inseparable part of the 'tech stack' of text processing models. Similar progress is being observed in the speech domain, with a multitude of models achieving state-of-the-art results by using audio transformer models to encode speech. This raises the question of what these audio transformer models are learning. Moreover, although the standard methodology is to choose the last layer's embedding for any downstream task, is it the optimal choice? We try to answer these questions for two recent audio transformer models, Mockingjay and wav2vec 2.0. We compare them on a comprehensive set of language delivery and structure features, including audio, fluency, and pronunciation features. Additionally, we probe the audio models' understanding of textual surface, syntax, and semantic features and compare them to BERT. We do this over exhaustive settings covering native, non-native, synthetic, read, and spontaneous speech datasets.
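The layer-wise probing protocol the abstract alludes to can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the embeddings here are synthetic stand-ins for mean-pooled hidden states from each layer of an audio transformer, and the task, dimensions, and signal injection are all invented for the sake of a runnable example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical setup: 200 utterances, 12 transformer layers, 768-dim embeddings.
n_utts, n_layers, dim = 200, 12, 768
labels = rng.integers(0, 2, size=n_utts)  # e.g. a binary delivery feature

# Stand-in for mean-pooled hidden states from each layer of an audio model.
embeddings = rng.normal(size=(n_layers, n_utts, dim))
# Make the upper layers slightly predictive so the probe has signal to find.
embeddings[6:] += labels[None, :, None] * 0.5

def probe_layer(X, y):
    """Fit a linear probe on one layer's embeddings; return held-out accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

# Probe every layer separately; the last layer is not necessarily the best.
scores = [probe_layer(embeddings[layer], labels) for layer in range(n_layers)]
best = int(np.argmax(scores))
print(f"best layer: {best}, accuracy: {scores[best]:.2f}")
```

Comparing the per-layer probe accuracies, rather than defaulting to the final layer, is exactly the kind of analysis that can reveal which depth of the network encodes a given feature.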


