Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search

by Yukun Liu, et al.

Recently, neural architecture search (NAS) has been successfully used in image classification, natural language processing, and automatic speech recognition (ASR) tasks to find state-of-the-art (SOTA) architectures that outperform human-designed ones. NAS can derive a SOTA, data-specific architecture over validation data from a pre-defined search space with a search algorithm. Inspired by the success of NAS in ASR tasks, we propose a NAS-based ASR framework containing one search space and one differentiable search algorithm, Differentiable Architecture Search (DARTS). Our search space follows the convolution-augmented transformer (Conformer) backbone, which is a more expressive ASR architecture than those used in existing NAS-based ASR frameworks. To improve the performance of our method, a regulation method called Dynamic Search Schedule (DSS) is employed. On the widely used Mandarin benchmark AISHELL-1, our best-searched architecture significantly outperforms the baseline Conformer model, with about 11% improvement, and search cost comparisons show the method to be efficient.
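The core idea behind a differentiable search algorithm such as DARTS can be illustrated with a small sketch: each choice among candidate operations is relaxed into a softmax-weighted mixture governed by learnable architecture parameters, and the final architecture keeps the operation with the largest parameter. The code below is only an illustration under stated assumptions, not the authors' implementation: the candidate operations (`identity`, `avg_pool3`, `max_pool3`), the 1-D list features, and the names `mixed_op` and `derive_op` are all hypothetical and are not taken from the paper's Conformer search space. DSS is not sketched because the abstract does not describe how it works.

```python
import math

def softmax(alphas):
    """Numerically stable softmax over a list of architecture parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidate operations acting on a 1-D feature vector.
def identity(x):
    return list(x)

def avg_pool3(x):
    # Average over a window of up to 3 neighbours (edges use a shorter window).
    n = len(x)
    return [sum(x[max(0, i - 1):min(n, i + 2)]) / len(x[max(0, i - 1):min(n, i + 2)])
            for i in range(n)]

def max_pool3(x):
    # Maximum over a window of up to 3 neighbours.
    n = len(x)
    return [max(x[max(0, i - 1):min(n, i + 2)]) for i in range(n)]

# Assumed toy search space: one edge with three candidate operations.
CANDIDATE_OPS = {"identity": identity, "avg_pool3": avg_pool3, "max_pool3": max_pool3}

def mixed_op(x, alphas):
    """Continuous relaxation: softmax-weighted sum of all candidate outputs."""
    weights = softmax(alphas)
    outputs = [op(x) for op in CANDIDATE_OPS.values()]
    return [sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(len(x))]

def derive_op(alphas):
    """Discretization: keep the candidate with the largest architecture parameter."""
    names = list(CANDIDATE_OPS)
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return names[best]
```

In a full search, the architecture parameters would be updated by gradient descent on validation data while the operation weights are trained on training data; here `alphas` is simply passed in, so calling `derive_op([0.1, 2.0, -1.0])` selects the second candidate.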






