Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content

06/13/2023
by Tiantian Feng, et al.

Automatic Speech Understanding (ASU) leverages deep learning models to interpret human speech accurately, enabling a wide range of speech applications that enrich the human experience. However, training a robust ASU model requires curating a large number of speech samples, which creates risks of privacy breaches. In this work, we investigate using foundation models to assist privacy-enhancing speech computing. Unlike conventional approaches that focus primarily on data perturbation or distributed algorithms, our work studies the possibility of using pre-trained generative models to synthesize speech content as training data, guided only by the training labels. We show that zero-shot learning with label-guided synthetic speech content remains a challenging task. On the other hand, our results demonstrate that a model trained on synthetic speech samples provides an effective initialization point for low-resource ASU training. This result reveals the potential to enhance privacy by reducing user data collection and relying instead on label-guided synthetic speech content.
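The abstract does not include code; the following is a minimal sketch of the two-stage idea described above, under stated assumptions: a pre-trained text-to-speech foundation model stands behind the placeholder synthesize_speech helper, and AsuClassifier, LABELS, and make_synthetic_dataset are illustrative names, not the authors' implementation. Stage 1 pretrains a small classifier on utterances synthesized from label-derived prompts; stage 2 fine-tunes that initialization on a small, consented real dataset.

# Sketch: pretrain an ASU classifier on label-guided synthetic speech,
# then fine-tune on a small set of real recordings (low-resource setting).
# synthesize_speech, AsuClassifier, LABELS, etc. are illustrative assumptions.
import torch
import torch.nn as nn
import torchaudio

LABELS = ["play_music", "set_alarm", "check_weather"]  # example intent labels

def synthesize_speech(text: str) -> torch.Tensor:
    """Placeholder for a pre-trained TTS foundation model.
    Expected to return a mono 16 kHz waveform tensor of shape (time,)."""
    raise NotImplementedError("plug in a pre-trained TTS model here")

def make_synthetic_dataset(n_per_label: int = 50):
    """Generate label-guided synthetic utterances from simple label prompts."""
    data = []
    for label_id, label in enumerate(LABELS):
        prompt = label.replace("_", " ")          # e.g. "set alarm"
        for _ in range(n_per_label):
            wav = synthesize_speech(prompt)       # vary voices/noise in practice
            data.append((wav, label_id))
    return data

class AsuClassifier(nn.Module):
    """Log-mel frontend followed by a small recurrent classifier."""
    def __init__(self, n_labels: int, n_mels: int = 64):
        super().__init__()
        self.mel = torchaudio.transforms.MelSpectrogram(16000, n_mels=n_mels)
        self.rnn = nn.GRU(n_mels, 128, batch_first=True)
        self.head = nn.Linear(128, n_labels)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:
        feats = self.mel(wav).clamp(min=1e-5).log().transpose(1, 2)  # (B, T, n_mels)
        _, h = self.rnn(feats)
        return self.head(h[-1])

def train(model, dataset, epochs: int, lr: float):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for wav, label in dataset:
            logits = model(wav.unsqueeze(0))
            loss = loss_fn(logits, torch.tensor([label]))
            opt.zero_grad()
            loss.backward()
            opt.step()

model = AsuClassifier(n_labels=len(LABELS))
train(model, make_synthetic_dataset(), epochs=5, lr=1e-3)  # stage 1: synthetic pretraining
# Stage 2: fine-tune the same model on a small real dataset (same (wav, label) format),
# typically with a lower learning rate, e.g. train(model, real_data, epochs=3, lr=1e-4).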

