End-to-end speech translation (ST) for conversation recordings involves
...
In this paper, we explore the zero-shot capability of the Segment Anythi...
In real-world applications, users often require both translations and
tr...
We propose gated language experts to improve multilingual transformer
tr...
This paper proposes a novel application system for the generation of
thr...
The neural transducer is now the most popular end-to-end model for speech
re...
Automatic Speech Recognition (ASR) systems typically yield output in lex...
End-to-end formulation of automatic speech recognition (ASR) and speech
...
In this paper, we introduce our work on building a Streaming Multilingua...
Arbitrary-oriented object representations contain the oriented bounding ...
Neural transducers have been widely used in automatic speech recognition...
Nearly all existing Facial Action Coding System-based datasets that incl...
In this paper, several works are proposed to address practical challenge...
Facial expression analysis based on machine learning requires large numb...