Affective social anthropomorphic intelligent system

04/19/2023
by Md. Adyelullahil Mamun, et al.

Human conversational style is characterized by sense of humor, personality, and tone of voice, and these traits have become essential for conversational intelligent virtual assistants. However, most state-of-the-art intelligent virtual assistants (IVAs) fail to interpret the affective semantics of human voices. This research proposes an anthropomorphic intelligent system that can hold a proper human-like conversation with emotion and personality. A voice style transfer method is also proposed to map the attributes of a specific emotion. Initially, the temporal audio waveform is converted into frequency-domain data (a Mel-spectrogram), which contains discrete patterns for audio features such as notes, pitch, rhythm, and melody. A collateral CNN-Transformer-Encoder is used to predict seven different affective states from the voice. The voice is also fed in parallel to DeepSpeech, an RNN model that generates the text transcription from the spectrogram. The transcribed text is then passed to a multi-domain conversational agent that uses Blended Skill Talk, a transformer-based retrieve-and-generate strategy, and beam-search decoding to produce an appropriate textual response. For voice synthesis and style transfer, the system learns an invertible mapping of the data to a latent space that can be manipulated, and it generates each Mel-spectrogram frame conditioned on previous frames. Finally, WaveGlow generates the waveform from the spectrogram. The outcomes of the studies we conducted on the individual models were promising, and users who interacted with the system provided positive feedback, demonstrating its effectiveness.
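The first stage described above, converting the temporal waveform into a Mel-spectrogram, can be illustrated with a short sketch. The example below uses librosa; the sample rate, FFT size, hop length, and number of Mel bands are illustrative assumptions, not the configuration reported in the paper.

    # Minimal sketch of the waveform-to-Mel-spectrogram stage.
    # All parameter values are illustrative assumptions, not the paper's settings.
    import librosa
    import numpy as np

    def waveform_to_log_mel(path, sr=22050, n_fft=1024, hop_length=256, n_mels=80):
        """Load an audio file and return a log-scaled Mel-spectrogram (n_mels x frames)."""
        wav, _ = librosa.load(path, sr=sr)           # time-domain samples
        mel = librosa.feature.melspectrogram(
            y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
        )                                            # Mel-scaled power spectrogram
        return librosa.power_to_db(mel, ref=np.max)  # log compression for model input

    # Example usage (hypothetical file name):
    # log_mel = waveform_to_log_mel("utterance.wav")

The resulting log Mel-spectrogram is the shared input representation: it is fed to the emotion classifier and, in parallel, to the speech-recognition model described in the abstract.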

research
10/26/2019

Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens

Mellotron is a multispeaker voice synthesis model based on Tacotron 2 GS...
research
07/25/2022

Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis

Sequence-to-Sequence Text-to-Speech architectures that directly generate...
research
02/17/2021

End-to-end lyrics Recognition with Voice to Singing Style Transfer

Automatic transcription of monophonic/polyphonic music is a challenging ...
research
10/08/2019

MelGAN-VC: Voice Conversion and Audio Style Transfer on arbitrarily long samples using Spectrograms

Traditional voice conversion methods rely on parallel recordings of mult...
research
09/06/2022

Read it to me: An emotionally aware Speech Narration Application

In this work we try to perform emotional style transfer on audios. In pa...
research
07/21/2021

Digital Einstein Experience: Fast Text-to-Speech for Conversational AI

We describe our approach to create and deliver a custom voice for a conv...
research
04/08/2023

VOICE: Visual Oracle for Interaction, Conversation, and Explanation

We present VOICE, a novel approach for connecting large language models'...
