Conformer-based End-to-end Speech Recognition With Rotary Position Embedding

07/13/2021
by Shengqiang Li, et al.

Transformer-based end-to-end speech recognition models have received considerable attention in recent years due to their high training speed and ability to model a long-range global context. Position embedding in the transformer architecture is indispensable because it provides supervision for dependency modeling between elements at different positions in the input sequence. To make use of the time order of the input sequence, many works inject information about the relative or absolute position of each element into the input sequence. In this work, we investigate various position embedding methods in the convolution-augmented transformer (conformer) and adopt a novel implementation named rotary position embedding (RoPE). RoPE encodes absolute positional information into the input sequence by a rotation matrix, and then naturally incorporates explicit relative position information into the self-attention module. To evaluate the effectiveness of the RoPE method, we conduct experiments on the AISHELL-1 and LibriSpeech corpora. Results show that the conformer enhanced with RoPE achieves superior performance on the speech recognition task. Specifically, our model achieves a relative word error rate (WER) reduction of 8.70% on the test-other set of the LibriSpeech corpus.
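For intuition, below is a minimal NumPy sketch of rotary position embedding as described above: each pair of feature dimensions in a query or key vector is rotated by a position-dependent angle before the attention dot product, so that the resulting scores depend only on the relative offset between positions. This is an illustrative sketch, not the paper's implementation; the function name rotary_embed and the frequency base of 10000 are assumptions.

import numpy as np

def rotary_embed(x, base=10000.0):
    """Apply rotary position embedding (RoPE) to a sequence of vectors.

    x: array of shape (seq_len, dim), with dim even. The feature pair
    (2i, 2i+1) at position m is rotated by the angle m * base**(-2i/dim),
    so the dot product between a rotated query at position m and a
    rotated key at position n depends only on the offset m - n.
    """
    seq_len, dim = x.shape
    assert dim % 2 == 0, "feature dimension must be even"
    # Per-pair rotation frequencies: theta_i = base^(-2i/dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)           # (dim/2,)
    # Rotation angle for every (position, pair) combination: m * theta_i
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]  # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    # 2-D rotation of each feature pair
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

# Sanity check of the relative-position property: with identical content
# at every position, the attention score depends only on the offset m - n,
# so all entries on the same diagonal of the score matrix are equal.
rng = np.random.default_rng(0)
q = np.tile(rng.standard_normal(64), (8, 1))
k = np.tile(rng.standard_normal(64), (8, 1))
scores = rotary_embed(q) @ rotary_embed(k).T  # used in place of plain q @ k.T
assert np.allclose(np.diag(scores, 1), scores[0, 1])

Note that the rotation is applied only to the queries and keys; the value vectors and the rest of the conformer block are left unchanged.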
