Feature representations useful for predicting image memorability
Predicting image memorability has attracted interest in various fields. As a result, the prediction accuracy of convolutional neural network (CNN) models has been approaching the empirical upper bound estimated from human consistency. However, which feature representations embedded in CNN models are responsible for this high memorability prediction accuracy remains an open question. To tackle this problem, this study sought to identify memorability-related feature representations in CNN models using brain similarity. Specifically, memorability prediction accuracy and brain similarity, the latter assessed with Brain-Score, were examined across 16,860 layers in 64 CNN models pretrained for object recognition. This comprehensive analysis showed a clear tendency: layers with high memorability prediction accuracy had higher brain similarity with the inferior temporal (IT) cortex, the highest stage in the ventral visual pathway. Furthermore, fine-tuning the 64 CNN models revealed that brain similarity with the IT cortex at the penultimate layer was positively correlated with memorability prediction accuracy. This analysis also showed that the best fine-tuned model achieved accuracy comparable to that of state-of-the-art CNN models developed specifically for memorability prediction. Overall, the results indicate that the success of CNN models in predicting memorability relies on acquiring feature representations similar to those of the IT cortex. This study advances our understanding of feature representations and their use for predicting image memorability.
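The layer-wise comparison described above can be illustrated with a minimal sketch. It assumes that each layer has already been assigned two scores, a memorability prediction accuracy and an IT-cortex similarity, and it relates them with a Spearman rank correlation. The choice of Spearman correlation, the function names, and the numbers are assumptions made for illustration; the abstract does not specify the statistic used, and the values below are not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-layer scores (made-up numbers, for illustration only).
layer_memorability_accuracy = [0.31, 0.42, 0.50, 0.58, 0.61]
layer_it_similarity = [0.12, 0.20, 0.27, 0.26, 0.35]

rho = spearman(layer_memorability_accuracy, layer_it_similarity)
print(f"Spearman correlation across layers: {rho:.2f}")
```

A positive value of `rho` would correspond to the tendency reported in the abstract, where layers that predict memorability more accurately also resemble the IT cortex more closely.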