Understanding Pixel-level 2D Image Semantics with 3D Keypoint Knowledge Engine

by Yang You, et al.

Pixel-level 2D object semantic understanding is an important topic in computer vision and could help machines deeply understand objects (e.g. functionality and affordance) in our daily life. However, most previous methods train directly on correspondences in 2D images, which is end-to-end but discards much of the information available in 3D space. In this paper, we propose a new method that predicts the semantics corresponding to an image in the 3D domain and then projects them back onto the 2D image to achieve pixel-level understanding. To obtain reliable 3D semantic labels, which are absent from current image datasets, we build a large-scale keypoint knowledge engine called KeypointNet, which contains 103,450 keypoints and 8,234 3D models from 16 object categories. Our method leverages the advantages of 3D vision and can explicitly reason about object self-occlusion and visibility. We show that our method gives comparable and even superior results on standard semantic benchmarks.
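
The idea of projecting 3D keypoints back onto the image while accounting for visibility can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera, and the function names, the intrinsics `K`, the pose `R`, `t`, and the rendered `depth_map` used for the occlusion test are all hypothetical and introduced only for illustration.

```python
import numpy as np

def project_keypoints(points_3d, K, R, t):
    """Project Nx3 object-space points to pixel coordinates (pinhole model).

    Assumes every point ends up in front of the camera (positive depth).
    """
    cam = points_3d @ R.T + t        # object frame -> camera frame
    depth = cam[:, 2]                # distance along the optical axis
    uvw = cam @ K.T                  # apply camera intrinsics
    uv = uvw[:, :2] / uvw[:, 2:3]    # perspective divide -> (u, v) pixels
    return uv, depth

def visible_mask(uv, depth, depth_map, tol=1e-2):
    """Hypothetical visibility test against a rendered depth map.

    A keypoint counts as visible if it falls inside the image and is not
    behind the closest surface recorded at its pixel (within `tol`).
    """
    h, w = depth_map.shape
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    inside = (cols >= 0) & (cols < w) & (rows >= 0) & (rows < h) & (depth > 0)
    vis = np.zeros(len(uv), dtype=bool)
    vis[inside] = depth[inside] <= depth_map[rows[inside], cols[inside]] + tol
    return vis
```

Keypoints that fail the depth test are treated as self-occluded, which is one simple way to make the visibility reasoning mentioned above explicit; the actual method may handle this differently.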



Semantic Correspondence via 2D-3D-2D Cycle

Visual semantic correspondence is an important topic in computer vision ...

Superpixelizing Binary MRF for Image Labeling Problems

Superpixels have become prevalent in computer vision. They have been use...

A Split Semantic Detection Algorithm for Psychological Sandplay Image

Psychological sandplay, as an important psychological analysis tool, is ...

Pixel Invisibility: Detecting Objects Invisible in Color Images

Despite recent success of object detectors using deep neural networks, t...

Toward Parts-Based Scene Understanding with Pixel-Support Parts-Sparse Pictorial Structures

Scene understanding remains a significant challenge in the computer visi...

Classifying Suspicious Content in Tor Darknet

One of the tasks of law enforcement agencies is to find evidence of crim...

Benchmarking recognition results on word image datasets

We have benchmarked the maximum obtainable recognition accuracy on vario...