ViDA-MAN: Visual Dialog with Digital Humans

10/26/2021
by Tong Shen, et al.

We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction that offers real-time audio-visual responses to instant speech inquiries. Compared to traditional text- or voice-based systems, ViDA-MAN offers human-like interactions (e.g., vivid voice, natural facial expressions, and body gestures). Given a speech request, the demonstration is able to respond with high-quality videos at sub-second latency. To deliver an immersive user experience, ViDA-MAN seamlessly integrates multi-modal techniques including Automatic Speech Recognition (ASR), multi-turn dialog, Text-to-Speech (TTS), and talking-head video generation. Backed by a large knowledge base, ViDA-MAN is able to chat with users on a number of topics, including chit-chat, weather, device control, news recommendations, and hotel booking, as well as answer questions via structured knowledge.
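The abstract names four chained stages (ASR, multi-turn dialog, TTS, talking-head video generation) but does not describe their interfaces. The following is a minimal sketch, assuming a strictly sequential hand-off between stages; the functions `asr`, `dialog`, `tts`, `talking_head` and the `Session` class are hypothetical stubs standing in for real models, not ViDA-MAN's implementation. It shows how a turn flows through the stages, how dialog history carries across turns, and how per-stage latency could be measured against a sub-second budget.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stage stubs: placeholders only, NOT ViDA-MAN's actual models.
def asr(audio: bytes) -> str:
    """Speech -> text (stub: pretends the audio bytes are UTF-8 text)."""
    return audio.decode("utf-8")

def dialog(text: str, history: List[Tuple[str, str]]) -> str:
    """Multi-turn dialog (stub: echoes the query and numbers the turn)."""
    return f"You said '{text}' (turn {len(history) + 1})"

def tts(text: str) -> bytes:
    """Text -> speech waveform (stub: just encodes the text as bytes)."""
    return text.encode("utf-8")

def talking_head(speech: bytes) -> List[bytes]:
    """Speech -> video frames (stub: one 'frame' per 4 bytes of audio)."""
    return [speech[i:i + 4] for i in range(0, len(speech), 4)]

@dataclass
class Session:
    # (user_text, agent_reply) pairs shared across turns of the dialog.
    history: List[Tuple[str, str]] = field(default_factory=list)
    # Assumed end-to-end target, mirroring the "sub-second latency" claim.
    latency_budget_s: float = 1.0

    def respond(self, audio: bytes) -> Tuple[List[bytes], Dict[str, float]]:
        timings: Dict[str, float] = {}

        def timed(name: str, fn: Callable[..., Any], *args: Any) -> Any:
            # Record the wall-clock time spent in one stage.
            start = time.perf_counter()
            out = fn(*args)
            timings[name] = time.perf_counter() - start
            return out

        text = timed("asr", asr, audio)
        reply = timed("dialog", dialog, text, self.history)
        speech = timed("tts", tts, reply)
        frames = timed("video", talking_head, speech)
        self.history.append((text, reply))
        timings["total"] = sum(timings.values())
        return frames, timings

if __name__ == "__main__":
    session = Session()
    frames, timings = session.respond(b"what's the weather?")
    print(session.history[-1][1], timings["total"] < session.latency_budget_s)
```

The stages run one after another here purely for readability. A real system aiming at sub-second responses would likely need to stream or overlap stages (for example, starting video generation on partial speech), which this sketch does not attempt.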
