Jian Xue

research

∙ 09/14/2023

DiariST: Streaming Speech Translation with Speaker Diarization

End-to-end speech translation (ST) for conversation recordings involves ...

Mu Yang, et al.

research

∙ 08/11/2023

FoodSAM: Any Food Segmentation

In this paper, we explore the zero-shot capability of the Segment Anythi...

Xing Lan, et al.

research

∙ 07/07/2023

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

In real-world applications, users often require both translations and tr...

Sara Papi, et al.

research

∙ 03/01/2023

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

We propose gated language experts to improve multilingual transformer tr...

Eric Sun, et al.

research

∙ 12/12/2022

Markerless Body Motion Capturing for 3D Character Animation based on Multi-view Cameras

This paper proposes a novel application system for the generation of thr...

Jinbao Wang, et al.

research

∙ 12/05/2022

Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

Neural transducer is now the most popular end-to-end model for speech re...

Rui Zhao, et al.

research

∙ 11/07/2022

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

Automatic Speech Recognition (ASR) systems typically yield output in lex...

Yashesh Gaur, et al.

research

∙ 11/05/2022

LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers

End-to-end formulation of automatic speech recognition (ASR) and speech ...

Peidong Wang, et al.

research

∙ 11/04/2022

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

In this paper, we introduce our work of building a Streaming Multilingua...

Jian Xue, et al.

research

∙ 05/24/2022

G-Rep: Gaussian Representation for Arbitrary-Oriented Object Detection

Arbitrary-oriented object representations contain the oriented bounding ...

Liping Hou, et al.

research

∙ 04/11/2022

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

Neural transducers have been widely used in automatic speech recognition...

Jian Xue, et al.

research

∙ 11/04/2021

FEAFA+: An Extended Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

Nearly all existing Facial Action Coding System-based datasets that incl...

Wei Gan, et al.

research

∙ 04/27/2021

On Addressing Practical Challenges for RNN-Transducer

In this paper, several works are proposed to address practical challenge...

Rui Zhao, et al.

research

∙ 04/02/2019

FEAFA: A Well-Annotated Dataset for Facial Expression Analysis and 3D Facial Animation

Facial expression analysis based on machine learning requires large numb...

Yanfu Yan, et al.
