'Part'ly first among equals: Semantic part-based benchmarking for state-of-the-art object recognition systems

An examination of object recognition challenge leaderboards (ILSVRC, PASCAL VOC) reveals that the top-performing classifiers typically differ only slightly from one another in error rate or mAP. Additional criteria are therefore required to differentiate the top performers. Moreover, the test images on which the performance scores are based predominantly contain fully visible objects. Benchmarking therefore needs 'harder' test images that mimic the challenging conditions (e.g. occlusion) under which humans routinely recognize objects. To address these concerns, we make two contributions. First, we systematically vary the level of local object-part content, global detail and spatial context in images from PASCAL VOC 2010 to create a new benchmarking dataset dubbed PPSS-12. Second, we propose an object-part based benchmarking procedure which quantifies classifiers' robustness to a range of visibility and contextual settings. The benchmarking procedure relies on a semantic similarity measure that naturally addresses potential semantic granularity differences between the category labels in the training and test datasets, thus eliminating the need for manual mapping. We use our procedure on the PPSS-12 dataset to benchmark top-performing classifiers trained on the ILSVRC-2012 dataset. Our results show that the proposed benchmarking procedure enables additional differentiation among state-of-the-art object classifiers in terms of their ability to handle missing content and insufficient object detail. Given this capability, our approach can potentially supplement the existing benchmarking procedures used in object recognition challenge leaderboards.
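To illustrate how a semantic similarity measure can bridge label-granularity differences, here is a minimal sketch. The abstract does not name the measure it uses, so everything below is an assumption made for illustration: the Wu-Palmer formula is a stand-in, and the `PARENT` taxonomy and the function names are hypothetical. The point is that a fine-grained prediction (such as a dog breed) can be scored against a coarse test label (such as "dog") without a hand-written label mapping.

```python
# Toy hypernym tree (child -> parent). Hypothetical; not taken from the paper.
PARENT = {
    "entity": None,
    "animal": "entity",
    "dog": "animal",
    "beagle": "dog",
    "cat": "animal",
    "tabby": "cat",
    "vehicle": "entity",
    "car": "vehicle",
    "sports_car": "car",
}


def path_to_root(label):
    """Return [label, parent, ..., root] for a label in the toy tree."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def depth(label):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(label))


def wu_palmer(predicted, target):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b)).

    Assumed stand-in for the paper's (unspecified) similarity measure.
    """
    ancestors = set(path_to_root(predicted))
    # The first node on target's path that is also an ancestor of
    # predicted is the deepest common subsumer (LCS).
    lcs = next(node for node in path_to_root(target) if node in ancestors)
    return 2 * depth(lcs) / (depth(predicted) + depth(target))


if __name__ == "__main__":
    # Fine-grained predictions scored against the coarse test label "dog".
    for predicted in ("beagle", "tabby", "sports_car"):
        print(predicted, round(wu_palmer(predicted, "dog"), 3))
```

In this sketch, a prediction that is a subtype of the test label scores higher than a sibling category, which in turn scores higher than an unrelated one, so predictions receive graded credit rather than a binary hit or miss.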




