Tom Ko

research

∙ 08/31/2023

RepCodec: A Speech Representation Codec for Speech Tokenization

With recent rapid growth of large language models (LLMs), discrete speec...

Zhichao Huang, et al.

research

∙ 06/18/2023

MOSPC: MOS Prediction Based on Pairwise Comparison

As a subjective metric to evaluate the quality of synthesized speech, Me...

Kexin Wang, et al.

research

∙ 05/19/2023

DUB: Discrete Unit Back-translation for Speech Translation

How can speech-to-text translation (ST) perform as well as machine trans...

Dong Zhang, et al.

research

∙ 03/30/2023

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

The advancement of audio-language (AL) multimodal learning tasks has bee...

Xinhao Mei, et al.

research

∙ 12/07/2022

M3ST: Mix at Three Levels for Speech Translation

How to solve the data scarcity problem for end-to-end speech-to-text tra...

Xuxin Cheng, et al.

research

∙ 10/28/2022

Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention

Audio captioning is the task of generating captions that describe the co...

Xubo Liu, et al.

research

∙ 10/08/2022

CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Speech is the surface form of a finite set of phonetic units, which can ...

Chutong Meng, et al.

research

∙ 08/03/2022

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

In human speech, the attitude of a speaker cannot be fully expressed onl...

Qibing Bai, et al.

research

∙ 05/18/2022

Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation

Direct Speech-to-speech translation (S2ST) has drawn more and more atten...

Qianqian Dong, et al.

research

∙ 04/08/2022

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

This paper introduces GigaST, a large-scale pseudo speech translation (S...

Rong Ye, et al.

research

∙ 03/31/2022

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

This paper studies a novel pre-training technique with unpaired speech d...

Junyi Ao, et al.

research

∙ 08/05/2021

An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning

Automated audio captioning aims to use natural language to describe the ...

Xinhao Mei, et al.

research

∙ 07/21/2021

CL4AC: A Contrastive Loss for Audio Captioning

Automated Audio captioning (AAC) is a cross-modal translation task that ...

Xubo Liu, et al.

research

∙ 04/08/2021

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation

Machine Speech Chain, which integrates both end-to-end (E2E) automatic s...

Fengpeng Yue, et al.

research

∙ 03/31/2021

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

Auto-KWS 2021 challenge calls for automated machine learning (AutoML) so...

Jingsong Wang, et al.

research

∙ 10/25/2020

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification

The AutoSpeech challenge calls for automated machine learning (AutoML) s...

Jingsong Wang, et al.

research

∙ 09/29/2020

MetaMix: Improved Meta-Learning with Interpolation-based Consistency Regularization

Model-Agnostic Meta-Learning (MAML) and its variants are popular few-sho...

Yangbin Chen, et al.

research

∙ 12/26/2018

Meta Learning for Few-shot Keyword Spotting

Keyword spotting with limited training data is a challenging task which ...

Yangbin Chen, et al.
