Y^2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences

11/07/2018
by Zhizhong Han, et al.

A recent method employs 3D voxels to represent 3D shapes, but the cubic complexity of voxel grids makes computation expensive, which limits the approach to low resolutions. As a result, the method lacks detailed geometry. To resolve this issue, we propose Y^2Seq2Seq, a view-based model that learns cross-modal representations by joint reconstruction and prediction of view and word sequences. Specifically, the network architecture of Y^2Seq2Seq bridges the semantic meaning embedded in the two modalities through two coupled "Y"-like sequence-to-sequence (Seq2Seq) structures. In addition, our novel hierarchical constraints further increase the discriminability of the cross-modal representations by employing more detailed discriminative information. Experimental results on cross-modal retrieval and 3D shape captioning show that Y^2Seq2Seq outperforms the state-of-the-art methods.
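To make the cubic-complexity argument concrete, the following minimal sketch compares how the number of cells in a voxel grid grows with resolution against the total pixel count of a set of rendered views. The function names and the choice of 12 views are illustrative assumptions, not values taken from the paper.

```python
def voxel_cells(resolution: int) -> int:
    """Cells in a cubic voxel grid: grows as resolution ** 3."""
    return resolution ** 3


def view_pixels(resolution: int, num_views: int) -> int:
    """Total pixels across a set of square views: grows as resolution ** 2."""
    return num_views * resolution ** 2


if __name__ == "__main__":
    NUM_VIEWS = 12  # assumed view count, for illustration only
    for r in (32, 64, 128, 256):
        print(r, voxel_cells(r), view_pixels(r, NUM_VIEWS))
```

Doubling the resolution multiplies the voxel count by eight but the view pixel count by only four, which is one way to read the motivation for a view-based representation.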


