3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video

10/05/2021
by   Justin Wilson, et al.
6

3D object reconstructions of transparent and concave structured objects, with inferred material properties, remains an open research problem for robot navigation in unstructured environments. In this paper, we propose a multimodal single- and multi-frame neural network for 3D reconstructions using audio-visual inputs. Our trained reconstruction LSTM autoencoder 3D-MOV accepts multiple inputs to account for a variety of surface types and views. Our neural network produces high-quality 3D reconstructions using voxel representation. Based on Intersection-over-Union (IoU), we evaluate against other baseline methods using synthetic audio-visual datasets ShapeNet and Sound20K with impact sounds and bounding box annotations. To the best of our knowledge, our single- and multi-frame model is the first audio-visual reconstruction neural network for 3D geometry and material representation.

READ FULL TEXT

page 2

page 3

page 4

page 5

page 6

research
10/05/2021

Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction

Reflective and textureless surfaces such as windows, mirrors, and walls ...
research
12/06/2020

Shape From Tracing: Towards Reconstructing 3D Object Geometry and SVBRDF Material from Images via Differentiable Path Tracing

Reconstructing object geometry and material from multiple views typicall...
research
01/30/2020

NAViDAd: A No-Reference Audio-Visual Quality Metric Based on a Deep Autoencoder

The development of models for quality prediction of both audio and video...
research
12/21/2020

From Points to Multi-Object 3D Reconstruction

We propose a method to detect and reconstruct multiple 3D objects from a...
research
09/28/2020

Amodal 3D Reconstruction for Robotic Manipulation via Stability and Connectivity

Learning-based 3D object reconstruction enables single- or few-shot esti...
research
10/20/2020

Object Permanence Through Audio-Visual Representations

As robots perform manipulation tasks and interact with objects, it is pr...
research
01/23/2021

Next-best-view Regression using a 3D Convolutional Neural Network

Automated three-dimensional (3D) object reconstruction is the task of bu...

Please sign up or login with your details

Forgot password? Click here to reset