Rehabilitating the Color Checker Dataset for Illuminant Estimation

by Ghalia Hemrit, et al.

In a previous work, it was shown that there is a curious problem with the benchmark Color Checker dataset for illuminant estimation. To wit, this dataset has at least three different sets of ground-truths. Typically, a single ground-truth is used to evaluate a single algorithm. But then different algorithms, whose performance is measured with respect to different ground-truths, are compared against each other and ranked. This makes no sense. In fact it is nonsense. We show in this paper that there are also errors in how each ground-truth set was calculated. As a result, all performance rankings based on the Color Checker dataset (and there are scores of these) are ill-founded. In this paper, we generate a new 'recommended' set of ground-truths based on the calculation methodology described by Shi and Funt. We then review the performance evaluation of a range of illuminant estimation algorithms. Compared with the legacy ground-truths, we find that the difference in how algorithms perform can be large, with many local rankings of algorithms being reversed. Finally, we draw the reader's attention to our new 'open' data repository which, we hope, will allow the Color Checker set to be rehabilitated and, once again, to become a useful benchmark for illuminant estimation algorithms.
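To make the ranking problem concrete, here is a minimal sketch of how the choice of ground-truth can reverse a ranking. It assumes the angular error between RGB illuminant vectors as the performance measure, which is a common choice in illuminant estimation but is not stated in the abstract. The two ground-truth vectors, the algorithm names, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the Color Checker dataset.

```python
import math

def angular_error_deg(estimate, ground_truth):
    """Angle in degrees between two RGB illuminant vectors (assumed metric)."""
    dot = sum(e * g for e, g in zip(estimate, ground_truth))
    norm_e = math.sqrt(sum(e * e for e in estimate))
    norm_g = math.sqrt(sum(g * g for g in ground_truth))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_e * norm_g)))
    return math.degrees(math.acos(cosine))

def rank_algorithms(estimates, ground_truth):
    """Return algorithm names ordered from lowest to highest angular error."""
    return sorted(
        estimates,
        key=lambda name: angular_error_deg(estimates[name], ground_truth),
    )

# Hypothetical data: two candidate ground-truths for the same image,
# and the illuminant estimates of two hypothetical algorithms.
ground_truth_a = (0.50, 1.00, 0.70)
ground_truth_b = (0.60, 1.00, 0.60)
estimates = {
    "algo_1": (0.52, 1.00, 0.70),
    "algo_2": (0.60, 1.00, 0.62),
}

ranking_a = rank_algorithms(estimates, ground_truth_a)
ranking_b = rank_algorithms(estimates, ground_truth_b)
print("Ranking under ground-truth A:", ranking_a)
print("Ranking under ground-truth B:", ranking_b)
```

With these illustrative values the two rankings come out in opposite order, which is the kind of local reversal described above. Comparing an error measured against one ground-truth with an error measured against another therefore says little about which algorithm is better.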




The Inconvenient Truths of Ground Truth for Binary Analysis

The effectiveness of binary analysis tools and techniques is often measu...

From Appearance to Essence: Comparing Truth Discovery Methods without Using Ground Truth

Truth discovery has been widely studied in recent years as a fundamental...

Estimation of Muscle Fascicle Orientation in Ultrasonic Images

We compare four different algorithms for automatically estimating the mu...

Shift If You Can: Counting and Visualising Correction Operations for Beat Tracking Evaluation

In this late-breaking abstract we propose a modified approach for beat t...

ECG Feature Importance Rankings: Cardiologists vs. Algorithms

Feature importance methods promise to provide a ranking of features acco...

On The Usage Of Average Hausdorff Distance For Segmentation Performance Assessment: Hidden Bias When Used For Ranking

Average Hausdorff Distance (AVD) is a widely used performance measure to...

A Framework for Evaluating Motion Segmentation Algorithms

There have been many proposals for algorithms segmenting human whole-bod...
