Object detection can be improved using human-derived contextual expectations
Each object in the world occurs in a specific context: cars are seen on highways but not in forests. Contextual information is generally thought to facilitate computation by constraining the locations to search. But can knowing context yield tangible benefits in object detection? For it to do so, scene context needs to be learned independently from target features. However, this is impossible in traditional object detection, where classifiers are trained on images containing both target features and surrounding coarse scene features. In contrast, we humans have the opportunity to learn context and target features separately, such as when we see highways without cars. Here we show for the first time that human-derived scene expectations can be used to improve object detection performance in machines. To measure these expectations, we asked human subjects to indicate the scale, location and likelihood at which targets may occur in scenes containing no targets. Humans showed highly systematic expectations that we could accurately predict using scene features. We then augmented state-of-the-art object detectors (based on deep neural networks) with these human-derived expectations on novel scenes to produce a significant (1-3%) improvement in detection performance. This improvement was due to low-confidence detector matches being correctly relabeled as targets when they occurred in likely scenes.
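The abstract does not state the exact fusion rule used to combine detector outputs with the human-derived expectations, but a minimal sketch of one plausible scheme is shown below: raw detector confidences are modulated by a scene-level context likelihood, so that borderline detections in highly likely scenes are boosted. The function, its signature, and the `weight` parameter are hypothetical illustrations, not the authors' method.

```python
def rescore_with_context(detections, context_likelihood, weight=0.5):
    """Re-rank detector outputs using a scene-level context prior.

    detections: list of (box, score) pairs from a standard detector,
        where score is the detector's confidence for the target class.
    context_likelihood: probability in [0, 1] that the target occurs in
        this scene, e.g. predicted from human-derived expectations.
    weight: strength of the context modulation (hypothetical parameter,
        not taken from the paper).
    """
    rescored = []
    for box, score in detections:
        # Blend the raw score with a context-modulated score; detections
        # in likely scenes keep or gain confidence, while detections in
        # unlikely scenes are suppressed.
        new_score = (1 - weight) * score + weight * score * context_likelihood
        rescored.append((box, new_score))
    # Sort so that the most confident (context-adjusted) detections come first.
    return sorted(rescored, key=lambda d: d[1], reverse=True)


# Example: a weak detection (score 0.35) in a scene judged very likely to
# contain the target (likelihood 0.9) versus an unlikely scene (0.1).
likely = rescore_with_context([((10, 20, 50, 80), 0.35)], context_likelihood=0.9)
unlikely = rescore_with_context([((10, 20, 50, 80), 0.35)], context_likelihood=0.1)
print(likely, unlikely)
```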