Phone-to-audio alignment without text: A Semi-supervised Approach

10/08/2021
by Jian Zhu, et al.

The task of phone-to-audio alignment has many applications in speech research. Here we introduce two Wav2Vec2-based models for text-dependent and text-independent phone-to-audio alignment. The first, Wav2Vec2-FS, is a semi-supervised model that learns phone-to-audio alignment directly through contrastive learning and a forward sum loss; it can be coupled with a pretrained phone recognizer to achieve text-independent alignment. The second, Wav2Vec2-FC, is a frame classification model trained on force-aligned labels that can perform both forced alignment and text-independent segmentation. Evaluation results suggest that both methods, even when transcriptions are not available, produce results close to those of existing forced alignment tools. Our work presents a fully automated neural pipeline for phone-to-audio alignment. Code and pretrained models are available at https://github.com/lingjzhu/charsiu.
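
To make the idea of text-independent segmentation by frame classification concrete, here is a minimal sketch of the post-processing step only: merging per-frame phone labels into timed segments. It is not the authors' implementation. The function name `frames_to_segments`, the example labels, and the 20 ms frame shift are all assumptions chosen for illustration.

```python
from itertools import groupby


def frames_to_segments(frame_labels, frame_shift=0.02):
    """Merge runs of identical frame labels into (phone, start, end) segments.

    frame_labels: one predicted phone label per frame (hypothetical input).
    frame_shift:  seconds per frame; 0.02 is an assumed value, not taken
                  from the paper.
    """
    segments = []
    index = 0  # index of the first frame in the current run
    for phone, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        start = round(index * frame_shift, 4)
        end = round((index + length) * frame_shift, 4)
        segments.append((phone, start, end))
        index += length
    return segments


# Hypothetical frame-level predictions for a short utterance.
example = ["[SIL]", "[SIL]", "HH", "AH", "AH", "AH", "L", "L", "OW", "[SIL]"]
segments = frames_to_segments(example)
for phone, start, end in segments:
    print(f"{phone}\t{start:.2f}\t{end:.2f}")
```

In a forced alignment setting the sequence of phones is fixed by the transcription, so a simple merge like this would have to be replaced by a constrained decoding step; the sketch covers only the unconstrained, text-independent case.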

