Sign language, which conveys meaning through gestures, is the chief mean...
Recent work has shown that it is possible to resynthesize high-quality s...
Recent 2D-to-3D human pose estimation (HPE) utilizes temporal consistenc...
Representation learning has been evolving from traditional supervised tr...
Large-scale generative models such as GPT and DALL-E have revolutionized...
Artificial intelligence has made significant progress in natural languag...
Expanding the language coverage of speech technology has the potential t...
In contrastive learning, the choice of “view” controls the information t...
Prompt learning has achieved great success in efficiently exploiting lar...
We introduce MuAViC, a multilingual audio-visual corpus for robust speec...
There has been a recent surge of interest in introducing transformers to...
Automated visual story generation aims to produce stories with correspon...
Prior works on improving speech quality with visual input typically stud...
Many self-supervised speech models, varying in their pre-training object...
While audio-visual speech models can yield superior performance and robu...
Existing work on sign language translation–that is, translation from sig...
This paper investigates self-supervised pre-training for audio-visual sp...
Natural language processing for sign language video - including tasks li...
Audio-based automatic speech recognition (ASR) degrades significantly in...
Video recordings of speech contain correlated audio and visual informati...
Recent 2D-to-3D human pose estimation works tend to utilize the graph st...
Collecting annotated data for semantic segmentation is time-consuming an...
Fingerspelling, in which words are signed letter by letter, is an import...
This paper proposes a network architecture mainly designed for audio tag...
Segmental models are sequence prediction models in which scores of hypot...
Many natural language processing (NLP) tasks involve reasoning with text...
The boom in social media with regard to producing and consuming informat...
As a mixed result of intensive dependency on third-party libraries, flex...
Third-party libraries are a central building block to develop software s...
We study few-shot acoustic event detection (AED) in this paper. Few-shot...
Differentiable neural architecture search methods became popular in auto...
Sign language recognition is a challenging gesture sequence recognition ...
Acoustic Event Detection (AED), aiming at detecting categories of events...
In this paper, we present a compression approach based on the combinatio...
This paper presents our work of training acoustic event detection (AED) ...
Recent work has shown that speech paired with images can be used to lear...
We address the problem of American Sign Language fingerspelling recognit...
We address the problem of automatic American Sign Language fingerspellin...