CAR-Net: Clairvoyant Attentive Recurrent Network
We present an interpretable framework for path prediction that learns the scene-specific causes behind agents' behaviors. We exploit two sources of information: the past motion trajectory of the agent of interest and a wide top-down view of the scene. We propose a Clairvoyant Attentive Recurrent Network (CAR-Net) that learns "where to look" in the large image when solving the path prediction task. Whereas previous works on trajectory prediction are constrained to either semantic information or hand-crafted regions centered around the agent, our method can select any region within the image, e.g., a far-away curve when predicting a vehicle's change of speed. To evaluate our goal of learning the observable causes behind agents' behaviors, we have built a new dataset of top-view images of hundreds of scenes (e.g., F1 racing circuits) in which vehicle behavior is governed by known, specific regions within the images (e.g., upcoming curves). Our algorithm successfully selects these regions, learns navigation patterns that generalize to unseen maps, outperforms previous works in prediction accuracy on publicly available datasets, and provides human-interpretable, static, scene-specific dependencies.
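To make the described architecture concrete, the snippet below is a minimal, illustrative PyTorch sketch of the core idea: a recurrent predictor that, at every step, applies soft attention over a top-down scene feature map ("where to look") and combines the attended context with the agent's past motion to roll out future positions. The toy convolutional encoder, layer sizes, and the single additive attention layer are assumptions for illustration, not the authors' actual CAR-Net implementation.

```python
# Minimal sketch (not the authors' code) of a CAR-Net-style predictor:
# an LSTM that attends over a top-down scene feature map at every step
# while autoregressively predicting future positions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CarNetSketch(nn.Module):
    def __init__(self, feat_dim=64, hidden_dim=128):
        super().__init__()
        # Toy convolutional encoder standing in for the paper's visual backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, 5, stride=4, padding=2), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn_score = nn.Linear(feat_dim + hidden_dim, 1)
        self.lstm = nn.LSTMCell(2 + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, 2)  # predicted (dx, dy) per step

    def forward(self, scene, past, future_len=12):
        # scene: (B, 3, H, W) top-down image; past: (B, T, 2) observed positions.
        B = scene.size(0)
        feats = self.encoder(scene).flatten(2).transpose(1, 2)   # (B, h*w, C)
        h = scene.new_zeros(B, self.lstm.hidden_size)
        c = scene.new_zeros(B, self.lstm.hidden_size)
        deltas = (past[:, 1:] - past[:, :-1]).unbind(1)          # observed motion
        pos, delta = past[:, -1], deltas[-1]
        preds = []
        for t in range(len(deltas) + future_len):
            # Soft visual attention: score every spatial cell against the state.
            h_rep = h.unsqueeze(1).expand(-1, feats.size(1), -1)
            scores = self.attn_score(torch.cat([feats, h_rep], -1)).squeeze(-1)
            alpha = F.softmax(scores, dim=1)                     # "where to look"
            context = (alpha.unsqueeze(-1) * feats).sum(1)       # attended scene feature
            step_in = deltas[t] if t < len(deltas) else delta
            h, c = self.lstm(torch.cat([step_in, context], -1), (h, c))
            if t >= len(deltas):                                 # prediction phase
                delta = self.out(h)
                pos = pos + delta
                preds.append(pos)
        return torch.stack(preds, 1)                             # (B, future_len, 2)
```

In this sketch the attention weights `alpha` can be reshaped back to the spatial grid and visualized, which is the property the abstract refers to as human-interpretable, scene-specific dependencies (e.g., mass concentrating on an upcoming curve).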