Zhuo Chen

research

∙ 01/05/2023

Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

We introduce a language modeling approach for text to speech synthesis (...

4 Chengyi Wang, et al. ∙

research

∙ 12/29/2022

MEAformer: Multi-modal Entity Alignment Transformer for Meta Modality Hybrid

As an important variant of entity alignment (EA), multi-modal entity ali...

10 Zhuo Chen, et al. ∙

research

∙ 12/18/2022

BEATs: Audio Pre-Training with Acoustic Tokenizers

The massive growth of self-supervised learning (SSL) has been witnessed ...

0 Sanyuan Chen, et al. ∙

research

∙ 12/14/2022

Simulating 2+1D Lattice Quantum Electrodynamics at Finite Density with Neural Flow Wavefunctions

We present a neural flow wavefunction, Gauge-Fermion FlowNet, and use it...

0 Zhuo Chen, et al. ∙

research

∙ 11/18/2022

Exploring WavLM on Speech Enhancement

There is a surge in interest in self-supervised learning approaches for ...

0 Hyungchan Song, et al. ∙

research

∙ 11/11/2022

Breaking trade-offs in speech separation with sparsely-gated mixture of experts

Several trade-offs need to be balanced when employing monaural speech se...

0 Xiaofei Wang, et al. ∙

research

∙ 11/09/2022

Speech separation with large-scale self-supervised learning

Self-supervised learning (SSL) methods such as WavLM have shown promisin...

0 Zhuo Chen, et al. ∙

research

∙ 10/27/2022

Simulating realistic speech overlaps improves multi-talker ASR

Multi-talker automatic speech recognition (ASR) has been studied to gene...

0 Muqiao Yang, et al. ∙

research

∙ 10/24/2022

Real-time Speech Interruption Analysis: From Cloud to Client Deployment

Meetings are an essential form of communication for all types of organiz...

0 Quchen Fu, et al. ∙

research

∙ 10/20/2022

Tele-Knowledge Pre-training for Fault Analysis

In this work, we share our experience on tele-knowledge pre-training for...

3 Zhuo Chen, et al. ∙

research

∙ 09/22/2022

The Microsoft System for VoxCeleb Speaker Recognition Challenge 2022

In this report, we describe our submitted system for track 2 of the VoxC...

0 Gang Liu, et al. ∙

research

∙ 09/12/2022

VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition

This paper presents a novel streaming automatic speech recognition (ASR)...

6 Naoyuki Kanda, et al. ∙

research

∙ 09/07/2022

Cooperative trajectory planning algorithm of USV-UAV with hull dynamic constraints

Efficient trajectory generation in complex dynamic environment still re...

0 Tao Huang, et al. ∙

research

∙ 09/03/2022

Illegal But Not Malware: An Underground Economy App Detection System Based on Usage Scenario

This paper focuses on mobile apps serving the underground economy by pro...

0 Zhuo Chen, et al. ∙

research

∙ 08/19/2022

Aspect-based Sentiment Classification with Sequential Cross-modal Semantic Graph

Multi-modal aspect-based sentiment classification (MABSC) is an emerging...

14 Yufeng Huang, et al. ∙

research

∙ 07/26/2022

LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection

Visual question answering (VQA) often requires an understanding of visua...

1 Zhuo Chen, et al. ∙

research

∙ 07/04/2022

DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Zero-shot learning (ZSL) aims to predict unseen classes whose samples ha...

0 Zhuo Chen, et al. ∙

research

∙ 06/08/2022

Disentangled Ontology Embedding for Zero-shot Learning

Knowledge Graph (KG) and its variant of ontology have been widely used f...

12 Yuxia Geng, et al. ∙

research

∙ 04/27/2022

Ultra Fast Speech Separation Model with Teacher Student Learning

Transformer has been successfully applied to speech separation recently ...

0 Sanyuan Chen, et al. ∙

research

∙ 04/27/2022

Why does Self-Supervised Learning for Speech Recognition Benefit Speaker Recognition?

Recently, self-supervised learning (SSL) has demonstrated strong perform...

0 Sanyuan Chen, et al. ∙

research

∙ 03/30/2022

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

This paper presents a streaming speaker-attributed automatic speech reco...

0 Naoyuki Kanda, et al. ∙

research

∙ 02/17/2022

Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer

Machine learning, especially deep learning, has greatly advanced molecul...

0 Yin Fang, et al. ∙

research

∙ 02/02/2022

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

This paper proposes a token-level serialized output training (t-SOT), a ...

0 Naoyuki Kanda, et al. ∙

research

∙ 12/19/2021

A New Image Codec Paradigm for Human and Machine Uses

With the AI of Things (AIoT) development, a huge amount of visual data, ...

5 Sien Chen, et al. ∙

research

∙ 12/01/2021

Molecular Contrastive Learning with Chemical Element Knowledge Graph

Molecular representation learning contributes to multiple downstream tas...

0 Yin Fang, et al. ∙

research

∙ 10/28/2021

Continuous Speech Separation with Recurrent Selective Attention Network

While permutation invariant training (PIT) based continuous speech separ...

0 Yixuan Zhang, et al. ∙

research

∙ 10/27/2021

Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Multi-talker conversational speech processing has drawn many interests f...

0 Wangyou Zhang, et al. ∙

research

∙ 10/26/2021

WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Self-supervised learning (SSL) achieves great success in speech recognit...

0 Sanyuan Chen, et al. ∙

research

∙ 10/20/2021

One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

With the recent surge of video conferencing tools usage, providing high-...

0 Hassan Taherian, et al. ∙

research

∙ 10/18/2021

Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Personalized speech enhancement (PSE) models utilize additional cues, su...

0 Sefik Emre Eskimez, et al. ∙

research

∙ 10/13/2021

All-neural beamformer for continuous speech separation

Continuous speech separation (CSS) aims to separate overlapping voices f...

0 Zhuohuang Zhang, et al. ∙

research

∙ 10/12/2021

UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

Self-supervised learning (SSL) is a long-standing goal for speech proces...

0 Sanyuan Chen, et al. ∙

research

∙ 10/12/2021

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Continuous speech separation using a microphone array was shown to be pr...

0 Takuya Yoshioka, et al. ∙

research

∙ 10/07/2021

Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

This paper presents Transcribe-to-Diarize, a new approach for neural spe...

0 Naoyuki Kanda, et al. ∙

research

∙ 09/17/2021

Continuous Streaming Multi-Talker ASR with Dual-path Transducers

Streaming recognition of multi-talker conversations has so far been eval...

0 Desh Raj, et al. ∙

research

∙ 08/04/2021

Spacetime Neural Network for High Dimensional Quantum Dynamics

We develop a spacetime neural network method with second order optimizat...

0 Jiangran Wang, et al. ∙

research

∙ 07/12/2021

Zero-shot Visual Question Answering using Knowledge Graph

Incorporating external knowledge to Visual Question Answering (VQA) has ...

13 Zhuo Chen, et al. ∙

research

∙ 07/08/2021

Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

In this paper, we propose a Collaboration of Experts (CoE) framework to ...

0 Yikang Zhang, et al. ∙

research

∙ 07/06/2021

A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio

Speaker-attributed automatic speech recognition (SA-ASR) is a task to re...

0 Naoyuki Kanda, et al. ∙

research

∙ 07/05/2021

Investigation of Practical Aspects of Single Channel Speech Separation for ASR

Speech separation has been successfully applied as a frontend processing...

0 Jian Wu, et al. ∙

research

∙ 06/29/2021

K-ZSL: Resources for Knowledge-driven Zero-shot Learning

External knowledge (a.k.a. side information) plays a critical role in zer...

27 Yuxia Geng, et al. ∙

research

∙ 06/28/2021

Modeling and Reasoning in Event Calculus using Goal-Directed Constraint Answer Set Programming

Automated commonsense reasoning is essential for building human-like AI ...

0 Joaquín Arias, et al. ∙

research

∙ 06/10/2021

Lifting The Grey Curtain: A First Look at the Ecosystem of CULPRITWARE

Mobile apps are extensively involved in cyber-crimes. Some apps are malw...

0 Zhuo Chen, et al. ∙

research

∙ 04/08/2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

In this paper, we present AISHELL-4, a sizable real-recorded Mandarin sp...

0 Yihui Fu, et al. ∙

research

∙ 04/05/2021

End-to-End Speaker-Attributed ASR with Transformer

This paper presents our recent effort on end-to-end speaker-attributed a...

0 Naoyuki Kanda, et al. ∙

research

∙ 03/31/2021

Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone

Transcribing meetings containing overlapped speech with only a single di...

0 Naoyuki Kanda, et al. ∙

research

∙ 03/03/2021

Continuous Speech Separation with Ad Hoc Microphone Arrays

Speech separation has been shown effective for multi-talker speech recog...

0 Dongmei Wang, et al. ∙

research

∙ 02/26/2021

Knowledge-aware Zero-Shot Learning: Survey and Perspective

Zero-shot learning (ZSL) which aims at predicting classes that have neve...

21 Jiaoyan Chen, et al. ∙

research

∙ 02/23/2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings

The continuous speech separation (CSS) is a task to separate the speech ...

0 Chenda Li, et al. ∙

research

∙ 02/15/2021

OntoZSL: Ontology-enhanced Zero-shot Learning

Zero-shot Learning (ZSL), which aims to predict for those classes that h...

11 Yuxia Geng, et al. ∙
