ShapeCaptioner: Generative Caption Network for 3D Shapes by Learning a Mapping from Parts Detected in Multiple Views to Sentences

07/31/2019
by Zhizhong Han, et al.

3D shape captioning is a challenging application in 3D shape understanding. Captions produced by recent multi-view based methods reveal that these methods cannot capture part-level characteristics of 3D shapes. As a result, their captions lack the detailed part-level descriptions that humans tend to focus on. To resolve this issue, we propose ShapeCaptioner, a generative caption network that performs 3D shape captioning from semantic parts detected in multiple views. Our novelty lies in learning the knowledge of part detection in multiple views from 3D shape segmentations and transferring this knowledge to facilitate learning the mapping from 3D shapes to sentences. Specifically, ShapeCaptioner represents a 3D shape by aggregating the parts detected in multiple colored views with our novel part class specific aggregation, and then employs a sequence to sequence model to generate the caption. Our results outperform previous work and show that ShapeCaptioner learns 3D shape features with more detailed part characteristics, which leads to better 3D shape captioning.
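The abstract does not specify how part class specific aggregation is computed. As a hypothetical illustration only, one plausible reading is sketched below: part features detected in all views are pooled separately within each part class, and the per-class vectors are concatenated into one fixed-length shape feature that a sequence to sequence captioner could take as input. The function name `part_class_specific_aggregation`, the `(part_class, feature)` detection format, the use of element-wise max pooling, and the zero vector for undetected classes are all assumptions, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical detection format: (part class index, part feature vector).
Detection = Tuple[int, List[float]]


def part_class_specific_aggregation(
    views: Sequence[Sequence[Detection]],
    num_classes: int,
    feature_dim: int,
) -> List[float]:
    """Sketch of a per-class aggregation over parts detected in multiple views.

    Assumptions (not from the paper): features of the same part class are
    merged with element-wise max pooling across all views, and a class with
    no detections contributes a zero vector. The output length is always
    num_classes * feature_dim, regardless of how many parts were detected.
    """
    pooled: Dict[int, List[float]] = {}
    for detections in views:
        for part_class, feature in detections:
            if not 0 <= part_class < num_classes:
                raise ValueError(f"part class {part_class} out of range")
            if len(feature) != feature_dim:
                raise ValueError("feature dimension mismatch")
            if part_class in pooled:
                # Element-wise max keeps the strongest response per dimension.
                pooled[part_class] = [
                    max(a, b) for a, b in zip(pooled[part_class], feature)
                ]
            else:
                pooled[part_class] = list(feature)

    # Concatenate per-class vectors in a fixed class order.
    shape_feature: List[float] = []
    for part_class in range(num_classes):
        shape_feature.extend(pooled.get(part_class, [0.0] * feature_dim))
    return shape_feature


if __name__ == "__main__":
    # Two views; class 0 is seen in both, class 2 in one, class 1 never.
    views = [
        [(0, [0.1, 0.5]), (2, [0.3, 0.2])],
        [(0, [0.4, 0.0])],
    ]
    print(part_class_specific_aggregation(views, num_classes=3, feature_dim=2))
    # prints [0.4, 0.5, 0.0, 0.0, 0.3, 0.2]
```

Pooling within each class, rather than across all parts at once, keeps the slot for each part class separate in the final vector, so part-level information is not mixed together before captioning. This is a design choice of the sketch, stated here as an assumption.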
