Bowen Shi

research

∙ 08/23/2023

Toward American Sign Language Processing in the Real World: Data, Tasks, and Methods

Sign language, which conveys meaning through gestures, is the chief mean...

0 Bowen Shi, et al. ∙

research

∙ 08/10/2023

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

Recent work has shown that it is possible to resynthesize high-quality s...

0 Tu Anh Nguyen, et al. ∙

research

∙ 07/18/2023

ActionPrompt: Action-Guided 3D Human Pose Estimation With Text and Pose Prompting

Recent 2D-to-3D human pose estimation (HPE) utilizes temporal consistenc...

0 Hongwei Zheng, et al. ∙

research

∙ 06/28/2023

Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners

Representation learning has been evolving from traditional supervised tr...

0 Bowen Shi, et al. ∙

research

∙ 06/23/2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale

Large-scale generative models such as GPT and DALL-E have revolutionized...

0 Matthew Le, et al. ∙

research

∙ 06/22/2023

Prompt to GPT-3: Step-by-Step Thinking Instructions for Humor Generation

Artificial intelligence has made significant progress in natural languag...

0 Yuetian Chen, et al. ∙

research

∙ 05/22/2023

Scaling Speech Technology to 1,000+ Languages

Expanding the language coverage of speech technology has the potential t...

0 Vineel Pratap, et al. ∙

research

∙ 05/08/2023

SEGA: Structural Entropy Guided Anchor View for Graph Contrastive Learning

In contrastive learning, the choice of “view” controls the information t...

0 Junran Wu, et al. ∙

research

∙ 03/09/2023

Rethinking Visual Prompt Learning as Masked Visual Token Modeling

Prompt learning has achieved great success in efficiently exploiting lar...

0 Ning Liao, et al. ∙

research

∙ 03/01/2023

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

We introduce MuAViC, a multilingual audio-visual corpus for robust speec...

0 Mohamed Anwar, et al. ∙

research

∙ 02/15/2023

Pose-Oriented Transformer with Uncertainty-Guided Refinement for 2D-to-3D Human Pose Estimation

There has been a recent surge of interest in introducing transformers to...

0 Han Li, et al. ∙

research

∙ 01/07/2023

Visual Story Generation Based on Emotion and Keywords

Automated visual story generation aims to produce stories with correspon...

0 Yuetian Chen, et al. ∙

research

∙ 12/21/2022

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

Prior works on improving speech quality with visual input typically stud...

0 Wei-Ning Hsu, et al. ∙

research

∙ 11/08/2022

Comparative layer-wise analysis of self-supervised speech models

Many self-supervised speech models, varying in their pre-training object...

0 Ankita Pasad, et al. ∙

research

∙ 07/14/2022

A Single Self-Supervised Model for Many Speech Modalities Enables Zero-Shot Modality Transfer

While audio-visual speech models can yield superior performance and robu...

0 Wei-Ning Hsu, et al. ∙

research

∙ 05/25/2022

Open-Domain Sign Language Translation Learned from Online Video

Existing work on sign language translation–that is, translation from sig...

0 Bowen Shi, et al. ∙

research

∙ 05/15/2022

Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT

This paper investigates self-supervised pre-training for audio-visual sp...

0 Bowen Shi, et al. ∙

research

∙ 03/24/2022

Searching for fingerspelled content in American Sign Language

Natural language processing for sign language video - including tasks li...

0 Bowen Shi, et al. ∙

research

∙ 01/05/2022

Robust Self-Supervised Audio-Visual Speech Recognition

Audio-based automatic speech recognition (ASR) degrades significantly in...

0 Bowen Shi, et al. ∙

research

∙ 01/05/2022

Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction

Video recordings of speech contain correlated audio and visual informati...

0 Bowen Shi, et al. ∙

research

∙ 11/23/2021

Hierarchical Graph Networks for 3D Human Pose Estimation

Recent 2D-to-3D human pose estimation works tend to utilize the graph st...

4 Han Li, et al. ∙

research

∙ 06/08/2021

Multi-dataset Pretraining: A Unified Model for Semantic Segmentation

Collecting annotated data for semantic segmentation is time-consuming an...

0 Bowen Shi, et al. ∙

research

∙ 04/03/2021

Fingerspelling Detection in American Sign Language

Fingerspelling, in which words are signed letter by letter, is an import...

1 Bowen Shi, et al. ∙

research

∙ 08/07/2020

A Joint Framework for Audio Tagging and Weakly Supervised Acoustic Event Detection Using DenseNet with Global Average Pooling

This paper proposes a network architecture mainly designed for audio tag...

0 Chieh-Chi Kao, et al. ∙

research

∙ 07/01/2020

Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

Segmental models are sequence prediction models in which scores of hypot...

0 Bowen Shi, et al. ∙

research

∙ 06/06/2020

A Cross-Task Analysis of Text Span Representations

Many natural language processing (NLP) tasks involve reasoning with text...

0 Shubham Toshniwal, et al. ∙

research

∙ 04/12/2020

Behavior variations and their implications for popularity promotions: From elites to mass in Weibo

The boom in social media with regard to producing and consuming informat...

0 Bowen Shi, et al. ∙

research

∙ 02/25/2020

Interactive, Effort-Aware Library Version Harmonization

As a mixed result of intensive dependency on third-party libraries, flex...

0 Kaifeng Huang, et al. ∙

research

∙ 02/25/2020

An Empirical Study of Usages, Updates and Risks of Third-Party Libraries in Java Projects

Third-party libraries are a central building block to develop software s...

0 Ying Wang, et al. ∙

research

∙ 02/21/2020

Few-shot acoustic event detection via meta-learning

We study few-shot acoustic event detection (AED) in this paper. Few-shot...

0 Bowen Shi, et al. ∙

research

∙ 01/17/2020

Latency-Aware Differentiable Neural Architecture Search

Differentiable neural architecture search methods became popular in auto...

3 Yuhui Xu, et al. ∙

research

∙ 08/28/2019

Fingerspelling recognition in the wild with iterative visual attention

Sign language recognition is a challenging gesture sequence recognition ...

10 Bowen Shi, et al. ∙

research

∙ 07/01/2019

Compression of Acoustic Event Detection Models With Quantized Distillation

Acoustic Event Detection (AED), aiming at detecting categories of events...

0 Bowen Shi, et al. ∙

research

∙ 05/02/2019

Compression of Acoustic Event Detection Models with Low-rank Matrix Factorization and Quantization Training

In this paper, we present a compression approach based on the combinatio...

0 Bowen Shi, et al. ∙

research

∙ 04/29/2019

Semi-supervised Acoustic Event Detection based on tri-training

This paper presents our work of training acoustic event detection (AED) ...

0 Bowen Shi, et al. ∙

research

∙ 04/24/2019

On the Contributions of Visual and Textual Supervision in Low-resource Semantic Speech Retrieval

Recent work has shown that speech paired with images can be used to lear...

0 Ankita Pasad, et al. ∙

research

∙ 10/26/2018

American Sign Language fingerspelling recognition in the wild

We address the problem of American Sign Language fingerspelling recognit...

0 Bowen Shi, et al. ∙

research

∙ 10/09/2017

Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition

We address the problem of automatic American Sign Language fingerspellin...

0 Bowen Shi, et al. ∙

