Auditory Representation Effective for Estimating Vocal Tract Information

06/02/2023
by Toshio Irino, et al.

We can estimate the size of a speaker solely from their speech sounds. To explain this observation, we previously proposed an auditory computational theory, the stabilised wavelet-Mellin transform (SWMT), which segregates information about the size and shape of the vocal tract and the glottal vibration. It was demonstrated that the auditory representation, or excitation pattern (EP), combined with a weighting function based on SWMT, referred to as the "SSI weight", made it possible to explain the psychometric functions of size perception. In this study, we investigated whether the EP with the SSI weight can precisely estimate vocal tract lengths (VTLs) measured from MRI data of male and female speakers. It was found that the use of the SSI weight significantly improved the VTL estimation. Moreover, the estimation errors were significantly smaller for the EP with the SSI weight than for the commonly used spectra derived from the Fourier transform, Mel filterbank, and WORLD vocoder. It was also shown that the SSI weight can be easily introduced into these spectra to improve their performance.
