Efficient Adapters for Giant Speech Models

06/13/2023
by Nanxin Chen, et al.

Large pre-trained speech models have become the de-facto paradigm, especially in scenarios where only a limited amount of labeled data is available. However, finetuning all parameters of a self-supervised model is computationally expensive and becomes infeasible as the model size and the number of downstream tasks scale. In this paper, we propose a novel approach called Two Parallel Adapter (TPA), which is inserted into the pre-trained Conformer-based model instead. TPA is based on systematic studies of the residual adapter, a popular approach for finetuning a subset of parameters. We evaluate TPA on various public benchmarks, and the experimental results demonstrate its superior performance, which is close to full finetuning across different datasets and speech tasks. These results show that TPA is an effective and efficient approach for serving large pre-trained speech models. Ablation studies show that TPA can also be pruned, especially in the lower blocks.
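The abstract does not spell out TPA's exact layout, but it is grounded in the residual-adapter family: a small bottleneck module attached to each block of a frozen backbone, so that only the adapter parameters are trained. Below is a minimal PyTorch sketch of such a residual adapter; the module name, bottleneck width, and insertion point are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Bottleneck residual adapter (illustrative sketch, not the paper's TPA).

    A small trainable module inserted into a frozen pre-trained backbone:
    normalize, project down to a narrow bottleneck, apply a nonlinearity,
    project back up, and add the result to the input (residual connection).
    """

    def __init__(self, d_model: int, bottleneck: int = 64):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)  # project to bottleneck dim
        self.up = nn.Linear(bottleneck, d_model)    # project back to model dim
        self.act = nn.ReLU()
        # Zero-init the up-projection so the adapter starts as an identity
        # mapping and training begins from the pre-trained model's behavior.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(self.norm(x))))


# Hypothetical usage: apply the adapter to a frozen Conformer block's output.
if __name__ == "__main__":
    frozen_block_out = torch.randn(2, 100, 512)  # (batch, frames, d_model)
    adapter = ResidualAdapter(d_model=512)
    out = adapter(frozen_block_out)
    print(out.shape)  # torch.Size([2, 100, 512])
```

Because each adapter is a pure residual branch starting from identity, dropping one simply restores the frozen block's original output, which is consistent with the ablation finding that adapters in lower blocks can be pruned.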
