Transformer-based encoder-encoder architecture for Spoken Term Detection

11/02/2022
by Jan Švec, et al.

This paper presents a method for spoken term detection (STD) based on the Transformer architecture. We propose an encoder-encoder architecture that employs two BERT-like encoders with additional modifications, including convolutional and upsampling layers, attention masking, and shared parameters. The encoders project a recognized hypothesis and a searched term into a shared embedding space, where the score of a putative hit is computed using a calibrated dot product. In the experiments, we used the Wav2Vec 2.0 speech recognizer, and the proposed system outperformed a baseline method based on deep LSTMs on the English and Czech STD datasets derived from the USC Shoah Foundation Visual History Archive (MALACH).
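The scoring step can be illustrated with a minimal sketch. The abstract does not specify the form of the calibration, so the code below assumes an affine transform of the dot product followed by a sigmoid. The function name `calibrated_dot_scores`, the parameters `scale` and `bias`, and the idea of one hypothesis embedding per position are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence


def calibrated_dot_scores(
    hypothesis_embeddings: Sequence[Sequence[float]],
    term_embedding: Sequence[float],
    scale: float = 1.0,
    bias: float = 0.0,
) -> List[float]:
    """Score each hypothesis position against a searched-term embedding.

    Assumed calibration (not taken from the paper):
        score = sigmoid(scale * <hypothesis, term> + bias)
    """
    scores = []
    for position in hypothesis_embeddings:
        # Dot product in the shared embedding space.
        dot = sum(h * t for h, t in zip(position, term_embedding))
        # Affine calibration followed by a sigmoid maps the value into (0, 1).
        scores.append(1.0 / (1.0 + math.exp(-(scale * dot + bias))))
    return scores


# Toy vectors standing in for encoder outputs (purely illustrative).
hypothesis = [[0.0, 0.0], [2.0, 1.0], [-2.0, -1.0]]
term = [1.0, 0.5]
scores = calibrated_dot_scores(hypothesis, term, scale=2.0, bias=-1.0)
```

In this toy setup the second position, whose embedding points in the same direction as the term embedding, receives the highest score, and a threshold on the score would decide whether it is reported as a hit.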

research
10/21/2022

Spoken Term Detection and Relevance Score Estimation using Dot-Product of Pronunciation Embeddings

The paper describes a novel approach to Spoken Term Detection (STD) in l...
research
10/21/2022

Deep LSTM Spoken Term Detection using Wav2Vec 2.0 Recognizer

In recent years, the standard hybrid DNN-HMM speech recognizers are outp...
research
03/07/2021

CNN-based Spoken Term Detection and Localization without Dynamic Programming

In this paper, we propose a spoken term detection algorithm for simultan...
research
08/18/2023

Is context all you need? Scaling Neural Sign Language Translation to Large Domains of Discourse

Sign Language Translation (SLT) is a challenging task that aims to gener...
research
10/16/2022

RedApt: An Adaptor for wav2vec 2 Encoding Faster and Smaller Speech Translation without Quality Compromise

Pre-trained speech Transformers in speech translation (ST) have facilita...
research
02/07/2022

HeadPosr: End-to-end Trainable Head Pose Estimation using Transformer Encoders

In this paper, HeadPosr is proposed to predict the head poses using a si...
research
09/01/2020

Object Detection-Based Variable Quantization Processing

In this paper, we propose a preprocessing method for conventional image ...