Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing
Neuromorphic vision sensing (NVS) allows for significantly higher event sampling rates at substantially increased energy efficiency and robustness to illumination changes. However, feature representation for NVS is far behind that of its APS-based counterparts, resulting in lower performance in high-level computer vision tasks. To fully utilize the sparse and asynchronous nature of NVS data, we propose a compact graph representation for NVS, which allows for end-to-end learning with graph convolutional neural networks. We couple this with a novel end-to-end feature learning framework that accommodates both appearance-based and motion-based tasks. The core of the framework comprises a spatial feature learning module, which utilizes our proposed residual graph CNN (RG-CNN) for end-to-end learning of appearance-based features directly from graphs. We extend this with our proposed Graph2Grid block and a temporal feature learning module for efficiently modelling temporal dependencies over multiple graphs and a long temporal extent. We show that this framework generalizes to both object classification and action recognition and, importantly, preserves the spatial and temporal coherence of spike events while requiring less computation and memory. Experimental validation shows that our proposed framework outperforms all recent methods on standard datasets. Finally, to address the absence of large real-world NVS datasets for complex recognition tasks, we introduce, evaluate and make available a 100k dataset of NVS recordings of the American Sign Language letters (ASL_DVS) acquired with an iniLabs DAVIS240c device under real-world conditions, as well as neuromorphic action recognition datasets (UCF101_DVS and HMDB51_DVS) recorded from a monitor.
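To make the graph representation concrete, below is a minimal sketch of how NVS events might be turned into a graph and processed by a residual graph-convolution layer. It assumes events are given as (x, y, t, polarity) tuples and uses a simple nearest-neighbor rule within a spatio-temporal radius; the radius, time scaling, neighbor cap and the dense normalized-adjacency layer are illustrative assumptions, not the exact construction or RG-CNN layer used in the paper.

```python
import numpy as np
import torch
import torch.nn as nn


def events_to_graph(events, radius=3.0, time_scale=1e-3, max_neighbors=16):
    """Build a radius-neighborhood graph from NVS events.

    events: (N, 4) array of (x, y, t, polarity). Each event becomes a node;
    edges connect events whose scaled spatio-temporal distance is below
    `radius`. All parameters here are illustrative choices.
    """
    coords = np.stack(
        [events[:, 0], events[:, 1], events[:, 2] * time_scale], axis=1)
    edges = []
    for i in range(len(coords)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        # Skip index 0 of the sort (the event itself), keep nearest neighbors.
        for j in np.argsort(dist)[1:max_neighbors + 1]:
            if dist[j] < radius:
                edges.append((i, j))
    edge_index = np.array(edges, dtype=np.int64).T   # shape (2, E)
    node_feat = events[:, 3:4].astype(np.float32)    # polarity as node feature
    return torch.from_numpy(node_feat), torch.from_numpy(edge_index)


class ResidualGraphConv(nn.Module):
    """Generic residual graph convolution: x' = x + ReLU(A_hat @ (x W))."""

    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, edge_index, num_nodes):
        # Symmetrically normalized adjacency with self-loops (dense for clarity).
        adj = torch.zeros(num_nodes, num_nodes)
        adj[edge_index[0], edge_index[1]] = 1.0
        adj = adj + torch.eye(num_nodes)
        d_inv_sqrt = adj.sum(1).pow(-0.5)
        a_hat = d_inv_sqrt.unsqueeze(1) * adj * d_inv_sqrt.unsqueeze(0)
        return x + torch.relu(a_hat @ self.lin(x))


# Example usage on a handful of synthetic events (x, y, t, polarity).
events = np.array([[10, 12, 100, 1],
                   [11, 12, 150, -1],
                   [30, 40, 200, 1]], dtype=np.float64)
x, edge_index = events_to_graph(events)
layer = ResidualGraphConv(dim=1)
out = layer(x, edge_index, num_nodes=x.shape[0])
```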