SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling

by   Vijay Badrinarayanan, et al.

We propose a novel deep architecture, SegNet, for semantic pixel wise image labelling. SegNet has several attractive properties; (i) it only requires forward evaluation of a fully learnt function to obtain smooth label predictions, (ii) with increasing depth, a larger context is considered for pixel labelling which improves accuracy, and (iii) it is easy to visualise the effect of feature activation(s) in the pixel label space at any depth. SegNet is composed of a stack of encoders followed by a corresponding decoder stack which feeds into a soft-max classification layer. The decoders help map low resolution feature maps at the output of the encoder stack to full input image size feature maps. This addresses an important drawback of recent deep learning approaches which have adopted networks designed for object categorization for pixel wise labelling. These methods lack a mechanism to map deep layer feature maps to input dimensions. They resort to ad hoc methods to upsample features, e.g. by replication. This results in noisy predictions and also restricts the number of pooling layers in order to avoid too much upsampling and thus reduces spatial context. SegNet overcomes these problems by learning to map encoder outputs to image pixel labels. We test the performance of SegNet on outdoor RGB scenes from CamVid, KITTI and indoor scenes from the NYU dataset. Our results show that SegNet achieves state-of-the-art performance even without use of additional cues such as depth, video frames or post-processing with CRF models.


page 5

page 7

page 8


SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

We present a novel and practical deep fully convolutional neural network...

A Novel Upsampling and Context Convolution for Image Semantic Segmentation

Semantic segmentation, which refers to pixel-wise classification of an i...

Pixel-wise Attentional Gating for Parsimonious Pixel Labeling

To achieve parsimonious inference in per-pixel labeling tasks with a lim...

The Devil is in the Decoder

Many machine vision applications require predictions for every pixel of ...

Unsupervised Superpixel Generation using Edge-Sparse Embedding

Partitioning an image into superpixels based on the similarity of pixels...

Dynamic Approach for Lane Detection using Google Street View and CNN

Lane detection algorithms have been the key enablers for a fully-assisti...

TemplateNet for Depth-Based Object Instance Recognition

We present a novel deep architecture termed templateNet for depth based ...

Please sign up or login with your details

Forgot password? Click here to reset