Lintao Zheng



  • Active Scene Understanding via Online Semantic Reconstruction

    We propose a novel approach to robot-operated active understanding of unknown indoor scenes, based on online RGBD reconstruction with semantic segmentation. In our method, the exploratory robot scanning is both driven by and targeted at the recognition and segmentation of semantic objects in the scene. Our algorithm is built on top of a volumetric depth fusion framework (e.g., KinectFusion) and performs real-time voxel-based semantic labeling over the online reconstructed volume. The robot is guided by an online estimated discrete viewing score field (VSF) parameterized over the 3D space of 2D location and azimuth rotation. The VSF stores, for each grid cell, the score of the corresponding view, which measures how much that view reduces the uncertainty (entropy) of both geometric reconstruction and semantic labeling. Based on the VSF, we select the next best view (NBV) as the target for each time step. We then jointly optimize the traverse path and camera trajectory between two adjacent NBVs by maximizing the integral viewing score (information gain) along the path and trajectory. Through extensive evaluation, we show that our method achieves efficient and accurate online scene parsing during exploratory scanning.

    06/18/2019 ∙ by Lintao Zheng, et al.
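
    The viewing score field described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical boolean visibility table `visible[x, y, azimuth, voxel]` and approximates each view's score by the total label entropy of the voxels it sees, which is only a proxy for the entropy reduction the abstract refers to. All function names are illustrative.

```python
import numpy as np


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) along the last axis of a probability array."""
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)


def viewing_score_field(voxel_probs, visible):
    """Score every (x, y, azimuth) view by the entropy of the voxels it sees.

    voxel_probs: (V, C) per-voxel class probabilities.
    visible:     (X, Y, A, V) boolean table, an assumed stand-in for ray casting.
    """
    voxel_entropy = entropy(voxel_probs)            # shape (V,)
    return visible.astype(float) @ voxel_entropy    # shape (X, Y, A)


def next_best_view(score_field):
    """Return the (x, y, azimuth) grid index with the highest viewing score."""
    idx = np.unravel_index(np.argmax(score_field), score_field.shape)
    return tuple(int(i) for i in idx)


# Toy example: three voxels, two classes; voxel 0 is the most uncertain.
voxel_probs = np.array([[0.5, 0.5], [1.0, 0.0], [0.9, 0.1]])
visible = np.zeros((2, 2, 4, 3), dtype=bool)
visible[1, 0, 2, 0] = True   # this view sees the uncertain voxel
visible[0, 1, 3, 1] = True   # this view sees an already certain voxel
visible[0, 0, 0, 2] = True   # this view sees a mildly uncertain voxel

vsf = viewing_score_field(voxel_probs, visible)
nbv = next_best_view(vsf)
```

    In this toy setup the selected view is the one that observes the voxel with the most ambiguous label distribution.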


  • Recurrent 3D Attentional Networks for End-to-End Active Object Recognition in Cluttered Scenes

    Active vision is inherently attention-driven: the agent selects views of observation to best approach the vision task while improving its internal representation of the scene being observed. Inspired by the recent success of attention-based models in 2D vision tasks based on single RGB images, we propose to address multi-view depth-based active object recognition using an attention mechanism, by developing an end-to-end recurrent 3D attentional network. The architecture comprises a recurrent neural network (RNN), which stores and updates an internal representation, and two levels of spatial transformer units, which guide two-level attention. Our model, trained with a 3D shape database, is able to iteratively attend to the best views of an object of interest in order to recognize it, and to focus on the object in each view in order to remove background clutter. To realize 3D view selection, we derive a 3D spatial transformer network that is differentiable for training with back-propagation, achieving much faster convergence than the reinforcement learning employed by most existing attention-based models. Experiments show that our method outperforms state-of-the-art methods in cluttered scenes.

    10/14/2016 ∙ by Min Liu, et al.


  • Shortest Paths in HSI Space for Color Texture Classification

    Color texture representation is an important step in the task of texture classification. Shortest paths have been used to extract color texture features from the RGB and HSV color spaces. In this paper, we propose to use shortest paths in the HSI space to build a texture representation for classification. In particular, two undirected graphs are used to model the H channel and the S and I channels, respectively, in order to represent a color texture image. Moreover, the shortest paths are constructed using four pairs of pixels according to different scales and directions of the texture image. Experimental results on the colored Brodatz and USPTex databases show that our proposed method is effective, and the highest classification accuracy rate is 96.93% on the Brodatz database.

    04/16/2019 ∙ by Mingxin Jin, et al.
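
    To make the idea of a shortest path between two pixels concrete, here is a small sketch using Dijkstra's algorithm on a single image channel. It is an illustration under assumptions, not the paper's graph construction: the 4-connected grid and the edge weight (one plus the absolute intensity difference between neighbors) are choices made here for the example, and `shortest_path_cost` is a hypothetical name.

```python
import heapq


def shortest_path_cost(channel, start, goal):
    """Cost of the cheapest 4-connected path between two pixels.

    channel: 2D list of intensities for one channel (e.g., H, S, or I).
    Edge weight (an assumption of this sketch): 1 + |difference in intensity|.
    """
    rows, cols = len(channel), len(channel[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return cost
        if cost > best.get((r, c), float("inf")):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                step = 1.0 + abs(channel[nr][nc] - channel[r][c])
                new_cost = cost + step
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    return float("inf")


# Toy channel: a bright ridge in the middle row forces a detour.
channel = [
    [0, 0, 0],
    [9, 9, 0],
    [0, 0, 0],
]
detour_cost = shortest_path_cost(channel, (0, 0), (2, 0))
```

    Because crossing the ridge is expensive, the cheapest path goes around it; path costs of this kind, measured between pixel pairs at several scales and directions, are the sort of quantity a texture descriptor could be built from.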


  • VERAM: View-Enhanced Recurrent Attention Model for 3D Shape Classification

    The multi-view deep neural network is perhaps the most successful approach to 3D shape classification. However, the fusion of multi-view features based on max or average pooling lacks a view selection mechanism, limiting its application in, e.g., multi-view active object recognition by a robot. This paper presents VERAM, a recurrent attention model capable of actively selecting a sequence of views for highly accurate 3D shape classification. VERAM addresses an important issue commonly found in existing attention-based models, i.e., the unbalanced training of the subnetworks corresponding to next view estimation and shape classification. The classification subnetwork is easily overfitted while the view estimation subnetwork is usually poorly trained, leading to suboptimal classification performance. This is overcome by three essential view-enhancement strategies: 1) enhancing the information flow of gradient backpropagation for the view estimation subnetwork, 2) devising a highly informative reward function for the reinforcement training of view estimation, and 3) formulating a novel loss function that explicitly circumvents view duplication. Taking grayscale images as input and AlexNet as the CNN architecture, VERAM with 9 views achieves instance-level and class-level accuracy of 95.5% and 95.3%, respectively, which is state-of-the-art performance under the same number of views.

    08/20/2018 ∙ by Songle Chen, et al.
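
    The pooling-based fusion that this abstract contrasts VERAM with can be sketched in a few lines. This shows only the baseline idea of max or average pooling over per-view feature vectors, not VERAM itself; `fuse_views` and the toy features are hypothetical.

```python
import numpy as np


def fuse_views(view_features, mode="max"):
    """Fuse per-view feature vectors of shape (num_views, dim) into one vector."""
    feats = np.asarray(view_features, dtype=float)
    if mode == "max":
        return feats.max(axis=0)
    if mode == "mean":
        return feats.mean(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")


# Two toy views with two-dimensional features.
views = [[1.0, 4.0], [3.0, 2.0]]
fused_max = fuse_views(views, "max")
fused_mean = fuse_views(views, "mean")

# Pooling ignores view order, so it carries no notion of which view to take next.
fused_reversed = fuse_views(views[::-1], "max")
```

    Since the fused vector is the same for any ordering of the views, pooling alone gives no signal for choosing the next view, which is the gap a view selection mechanism is meant to fill.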
