Estimating speech from lip dynamics

08/03/2017
by Jithin Donny George, et al.

The goal of this project is to develop a limited lip-reading algorithm for a subset of the English language. We consider a scenario in which no audio information is available. The raw video is processed and the position of the lips in each frame is extracted. We then prepare the lip data for processing and classify the lip shapes into visemes and phonemes. Hidden Markov Models are used to predict the words the speaker is saying from the sequences of classified phonemes and visemes. The GRID audiovisual sentence corpus [10][11] is used for our study.
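
The word-prediction step can be illustrated with a minimal sketch. It assumes one small discrete-emission HMM per word and scores a sequence of viseme class labels under each model with the forward algorithm, picking the word with the highest likelihood. The word names (`word_a`, `word_b`), the three viseme classes, and every probability below are hypothetical values chosen for illustration; they are not taken from the paper, and the paper's actual model structure may differ.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, via the forward algorithm.

    obs   : list of observation indices (here, viseme class labels)
    start : start[s]    = P(first state is s)
    trans : trans[r][s] = P(next state is s | current state is r)
    emit  : emit[s][o]  = P(observing o | state s)
    """
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[r] + safe_log(trans[r][s]) for r in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)


def classify(obs, models):
    """Return the best-scoring word and the per-word log-likelihoods."""
    scores = {w: forward_log_likelihood(obs, *m) for w, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-state, left-to-right word models over 3 viseme classes.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "word_a": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "word_b": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

if __name__ == "__main__":
    best, scores = classify([0, 0, 1, 1], MODELS)
    print(best, scores)
```

Working in log space keeps the scores stable for long frame sequences, where raw probabilities would underflow. In practice the transition and emission tables would be estimated from labelled training data rather than written by hand.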
