Filler words like “um” or “uh” are common in spontaneous speech. It is
d...
Video understanding tasks have traditionally been modeled by two separat...
Exploring dense matching between the current frame and past frames for
l...
In this paper, we present TridentSE, a novel architecture for speech
enh...
Continuous Speech Keyword Spotting (CSKWS) is a task to detect predefine...
This paper proposes a new "decompose-and-edit" paradigm for the text-bas...
The attention mechanism has been widely believed to be the key to the success of vi...
Given a piece of speech and its transcript text, text-based speech editi...
Transformers have sprung up in the field of computer vision. In this wor...
Convolutional neural networks (CNNs) are the dominant deep neural network...
This paper presents a self-supervised learning framework, named MGF, for...