Towards Robust Family-Infant Audio Analysis Based on Unsupervised Pretraining of Wav2vec 2.0 on Large-Scale Unlabeled Family Audio

05/21/2023
by Jialu Li, et al.

To perform automatic family audio analysis, past studies have collected recordings using phone, video, or audio-only recording devices like LENA, investigated supervised learning methods, and used or fine-tuned general-purpose embeddings learned from large pretrained models. In this study, we advance the audio component of a new infant wearable multi-modal device called LittleBeats (LB) by learning family audio representations via wav2vec 2.0 (W2V2) pretraining. We show that, given a limited number of labeled LB home recordings, W2V2 pretrained on 1k hours of unlabeled home recordings outperforms oracle W2V2 pretrained on 52k hours of unlabeled audio in terms of parent/infant speaker diarization (SD) and vocalization classification (VC) at home. Additional relevant external unlabeled and labeled data further benefit W2V2 pretraining and fine-tuning. With SpecAug and environmental speech corruptions, we obtain 12 [...] weights are available.
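As a rough illustration of the SpecAug-style augmentation mentioned in the abstract, the sketch below zeroes out random frequency bands and time spans of a feature matrix. It is a generic, dependency-free example and not the authors' implementation: the function name `spec_augment`, the mask counts and widths, the zero fill value, and the `seed` parameter are all assumptions chosen for illustration.

```python
import random


def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, seed=None):
    """Return a masked copy of `spec`, a list of rows (freq bins x frames).

    Hypothetical helper: mask counts, widths, and the 0.0 fill value
    are illustrative assumptions, not values taken from the paper.
    """
    rng = random.Random(seed)
    out = [row[:] for row in spec]          # copy so the input is untouched
    n_freq = len(out)
    n_time = len(out[0]) if out else 0

    # Frequency masking: zero a random band of consecutive rows.
    for _ in range(num_freq_masks):
        width = rng.randint(0, min(max_freq_width, n_freq))
        start = rng.randint(0, n_freq - width)
        for f in range(start, start + width):
            out[f] = [0.0] * n_time

    # Time masking: zero a random span of consecutive frames in every row.
    for _ in range(num_time_masks):
        width = rng.randint(0, min(max_time_width, n_time))
        start = rng.randint(0, n_time - width)
        for row in out:
            for t in range(start, start + width):
                row[t] = 0.0

    return out


# Usage: mask an 80-bin x 200-frame dummy feature matrix.
features = [[1.0] * 200 for _ in range(80)]
augmented = spec_augment(features, seed=0)
```

In a training loop, a fresh random mask would typically be drawn for every example at every epoch, so the model never sees the same corrupted input twice.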


