SGGNet^2: Speech-Scene Graph Grounding Network for Speech-guided Navigation

07/14/2023
by Dohyun Kim, et al.

Spoken language serves as an accessible and efficient interface, enabling non-experts and users with disabilities to interact with complex assistant robots. However, accurately grounding spoken utterances poses a significant challenge due to the acoustic variability in speakers' voices and environmental noise. In this work, we propose a novel speech-scene graph grounding network (SGGNet^2) that robustly grounds spoken utterances by leveraging the acoustic similarity between correctly recognized and misrecognized words obtained from automatic speech recognition (ASR) systems. To incorporate this acoustic similarity, we extend our previous grounding model, the scene-graph-based grounding network (SGGNet), with the ASR model from NVIDIA NeMo. We accomplish this by feeding the latent vectors of speech pronunciations into the BERT-based grounding network within SGGNet. We evaluate the effectiveness of using latent vectors of speech commands for grounding through qualitative and quantitative studies. We also demonstrate the capability of SGGNet^2 in a speech-based navigation task using a real quadruped robot, RBQ-3, from Rainbow Robotics.
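
The central intuition above is that a misrecognized word can still be grounded because its pronunciation sounds like the intended one. A toy nearest-neighbor lookup can illustrate that idea. This is a minimal sketch under stated assumptions, not SGGNet^2 itself. The model described here feeds pronunciation latent vectors into a BERT-based grounding network, while the sketch only compares vectors by cosine similarity. The three-dimensional vectors and the names `cosine_similarity`, `ground_word`, and `object_latents` are hypothetical. Real latents would come from the NeMo ASR model and would have far more dimensions.

```python
import math

# Hypothetical sketch: illustrates grounding by acoustic similarity only.
# SGGNet^2 itself passes pronunciation latents to a BERT-based network;
# it does not perform this nearest-neighbor lookup.

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)

def ground_word(heard_latent, object_latents):
    """Return the scene-object label whose latent is most similar to the heard word."""
    return max(object_latents,
               key=lambda label: cosine_similarity(heard_latent, object_latents[label]))

# Made-up toy latents standing in for ASR pronunciation embeddings.
object_latents = {
    "chair": [0.9, 0.1, 0.3],
    "table": [0.1, 0.8, 0.2],
    "sofa": [0.2, 0.3, 0.9],
}
# Latent of a spoken word that the ASR transcribed as "share" (hypothetical).
heard_latent = [0.85, 0.15, 0.35]

print(ground_word(heard_latent, object_latents))  # prints "chair"
```

In this toy case, the transcript text "share" matches no object label. However, the pronunciation latent still lies closest to "chair", which is the kind of acoustic cue the abstract says SGGNet^2 exploits.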
