Learning a Saliency Evaluation Metric Using Crowdsourced Perceptual Judgments
In the area of human fixation prediction, dozens of computational saliency models are proposed to reveal certain saliency characteristics under different assumptions and definitions. As a result, saliency model benchmarking often requires several evaluation metrics to simultaneously assess saliency models from multiple perspectives. However, most computational metrics are not designed to directly measure the perceptual similarity of saliency maps so that the evaluation results may be sometimes inconsistent with the subjective impression. To address this problem, this paper first conducts extensive subjective tests to find out how the visual similarities between saliency maps are perceived by humans. Based on the crowdsourced data collected in these tests, we conclude several key factors in assessing saliency maps and quantize the performance of existing metrics. Inspired by these factors, we propose to learn a saliency evaluation metric based on a two-stream convolutional neural network using crowdsourced perceptual judgements. Specifically, the relative saliency score of each pair from the crowdsourced data is utilized to regularize the network during the training process. By capturing the key factors shared by various subjects in comparing saliency maps, the learned metric better aligns with human perception of saliency maps, making it a good complement to the existing metrics. Experimental results validate that the learned metric can be generalized to the comparisons of saliency maps from new images, new datasets, new models and synthetic data. Due to the effectiveness of the learned metric, it also can be used to facilitate the development of new models for fixation prediction.
READ FULL TEXT