An Accurate and Real-time Self-blast Glass Insulator Location Method Based On Faster R-CNN and U-net with Aerial Images

01/16/2018 ∙ by Zenan Ling, et al. ∙ Shanghai Jiao Tong University ∙ NetEase, Inc

The location of broken insulators in aerial images is a challenging task. This paper, focusing on the self-blast glass insulator, proposes a deep learning solution. We address the broken insulator location problem as a low signal-noise-ratio image location framework with two modules: 1) object detection based on Faster R-CNN, and 2) classification of pixels based on U-net. A diverse set of aerial images from a power grid in China is used to validate the proposed approach. Furthermore, a comparison is made among different methods, and the result shows that our approach is accurate and real-time.




I Introduction

I-A Background

Insulators are widely used in high-voltage transmission lines and play a significant role in electrical insulation and conductor connection. Insulator faults, e.g., the self-blast of glass insulators, pose a grave threat to power systems because they can cause cascading failures. Periodic manual inspection of insulators under extremely high voltage is a great safety risk, because thousands of insulators are deployed on transmission lines that are usually long and high above the ground. The common approach is to capture aerial images (see Fig. 5) of insulators with manned helicopters or Unmanned Aerial Vehicles (UAVs). As huge numbers of aerial images become ever more accessible, an accurate and real-time broken insulator location method is urgently needed.

Fig. 5: The aerial images captured by helicopters or UAV.

Generally speaking, the following factors, shown in Fig. 8, make the broken insulator location problem challenging:

  1. Complicated background: the backgrounds of aerial images often include various scenes such as forests, rivers, farmlands and so on.

  2. Dynamic view changing: sometimes the camera is out of focus, producing features that are indistinct even to human observers.

  3. Low signal-noise-ratio (SNR): compared to the whole image, only a few pixels contain the broken insulator information. In some extreme cases, the broken position can hardly be located even by eye.

Fig. 8: The self-blast glass insulators: the broken parts are marked with red circles.

In the last decade, deep learning has developed remarkably, especially in computer vision. AlexNet [1], proposed by Krizhevsky, Sutskever and Hinton, led to a new wave of deep learning. With the invention of new network architectures such as VGG [2], InceptionNet [3] and ResNet [4], deep learning has beaten the classical methods in various image classification competitions. Furthermore, Faster R-CNN [5], proposed by Ren et al., realizes end-to-end trained, real-time object detection with high accuracy. Besides, the fully convolutional network (FCN) proposed in [6] produces accurate and detailed pixel-to-pixel segmentations and improves on the previous best results in semantic segmentation.

I-B Related Work

Recently, much work has been done on insulator detection and status classification. Zhao et al. [7] propose a deep CNN model with multi-patch feature extraction and a Support Vector Machine (SVM) for insulator status classification. Liao et al. [8] propose a robust insulator detection algorithm based on local features and spatial orders for aerial images. This feature extraction method is model-based and able to locate the insulator string. Wu et al. [9] use the global minimization active contour model (GMAC) for insulator segmentation. Reddy et al. [10] use SVM and the Discrete Orthogonal S Transform (DOST) to carry out condition monitoring of insulators in a complex background. This method needs to transform the raw image into the complex frequency domain, which may add computational complexity.

Little work, however, has been done on broken insulator location based on deep learning, which underlies the state-of-the-art architectures for computer vision, even though broken insulator location in aerial images is a typical computer vision problem.

I-C Contributions of Our Paper

This paper focuses on the self-blast glass insulator location problem. The contributions are summarized as follows:

  1. The self-blast glass insulator location problem is formulated as two computer vision problems: object detection and semantic segmentation.

  2. Two state-of-the-art deep learning architectures are introduced: Faster R-CNN [5] and U-net [11]. To the best of our knowledge, no similar work has been reported in this domain.

  3. The object detection module realizes the insulator string location as a by-product of our method.

  4. The proposed method is accurate and real-time.

The rest of this paper is organized as follows. Section II proposes the self-blast glass insulator location method. Section III and IV briefly introduce the architectures of Faster R-CNN and U-net. Section V gives the location results in our experiment, and makes a comparison with different methods. Section VI concludes the paper.

II Self-Blast Glass Insulator Location Method

II-A Self-blast Glass Insulator Location Using Aerial Images

The challenges in broken insulator location, mentioned in Section I-A, drive us to focus on the following three aspects:

II-A1 The efficient feature extraction of the aerial images

We choose CNN-based methods because they have been shown to perform much better than other methods in multiple image tasks. If the CNN model is trained from scratch, however, it is hard to achieve the desired results because of the limited number of samples. Borrowing the idea of transfer learning, we adopt convolutional layers pre-trained on large-scale image libraries.

II-A2 The enhancement of SNR

It is natural to crop the parts containing the insulator strings to improve SNR. Thus, the problem is transformed into the location of insulator strings. Faster R-CNN is suitable for this task. It has state-of-the-art performance in object detection and location tasks; moreover, pre-trained convolutional layers are adopted in its architecture (see Section III-A1).

II-A3 The judgement of insulators’ states

The cropped images may have different sizes, because the sizes of the insulator strings and their shooting angles differ. Standard CNN architectures with fully connected layers are not able to process images of arbitrary sizes. As a result, U-net, a specific FCN, is introduced to handle this problem. Besides, U-net is widely used in semantic segmentation tasks, especially in biomedical image segmentation [11], which is often a low-SNR task. For these reasons, U-net is adopted as a part of our method to conduct the binary classification of the pixels in the aerial images.

Based on the above analysis, the self-blast glass insulator location method is designed. The proposed method contains two modules: 1) the object detection module based on Faster R-CNN, and 2) the segmentation module based on U-net. The procedure is shown in Fig. 9.


Fig. 9: Overall flow chart of the proposed method

First, Faster R-CNN is adopted to locate the insulator strings. The insulator strings, in the form of rectangular boxes, are cropped from the raw image, and the other parts are dropped. In this way, the redundant information in the aerial images is discarded to enhance SNR.

Second, the cropped images are taken as the input of U-net, which outputs a binary classification of the pixels: pixels of broken parts are labelled as false and pixels of normal parts as true.

The final output of the proposed method contains two parts: the locations of the insulator strings, and the coordinates of the broken insulators.
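The two-stage procedure can be sketched as follows. This is only an illustrative outline, not the authors' implementation: `detect_strings` and `segment` are hypothetical callables standing in for the trained Faster R-CNN and U-net, and the image is a toy nested list. The sketch shows how pixel labels from a cropped box are mapped back to coordinates in the raw image.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), max exclusive
Image = List[List[float]]         # toy grayscale image as nested lists
Mask = List[List[bool]]           # True = normal pixel, False = broken pixel


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangular box out of the image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def locate_broken_pixels(
    image: Image,
    detect_strings: Callable[[Image], List[Box]],  # hypothetical stand-in for Faster R-CNN
    segment: Callable[[Image], Mask],              # hypothetical stand-in for U-net
) -> Tuple[List[Box], List[Tuple[int, int]]]:
    """Return the insulator string boxes and the (x, y) coordinates of broken pixels."""
    boxes = detect_strings(image)
    broken: List[Tuple[int, int]] = []
    for box in boxes:
        x0, y0, _, _ = box
        mask = segment(crop(image, box))
        for dy, row in enumerate(mask):
            for dx, is_normal in enumerate(row):
                if not is_normal:
                    # Shift crop-local coordinates back into the raw image frame.
                    broken.append((x0 + dx, y0 + dy))
    return boxes, broken
```

Keeping the two models behind plain callables mirrors the modular design described above: either stage can be swapped without touching the coordinate bookkeeping.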

II-B Real-time Performance of Proposed Method

On the one hand, Faster R-CNN enables a unified, deep-learning-based object detection system to run at near real-time frame rates by letting the region proposal network (RPN) share convolutional features with the downstream detection network.

On the other hand, the architecture of U-net contains no fully connected layer, so the number of parameters is reduced dramatically. In practice, classifying a single image with U-net takes very little time.

Besides, the proposed method can be easily expanded into a distributed system by using Graphics Processing Units (GPUs). Multiple images can be processed simultaneously provided that sufficient GPUs are available.

For completeness, a brief introduction of Faster R-CNN and U-net is given in the next two sections.

III Insulator String Location Method: Faster R-CNN

III-A Structure of Faster R-CNN

Faster R-CNN is composed of two networks: RPN and Fast R-CNN. RPN is a fully convolutional network for generating the proposal regions, and Fast R-CNN conducts detection and classification based on the proposal regions. The architecture of Faster R-CNN is illustrated in Fig. 10.


Fig. 10: The network structure of Faster R-CNN

III-A1 Feature Extraction

As shown in Fig. 10, efficient and rich features of the aerial images are extracted by a commonly used CNN model pre-trained on ImageNet, such as ResNet or VGG. The pre-trained convolutional layers are shared by RPN and Fast R-CNN.

III-A2 RPN

RPN is the core part of Faster R-CNN. As shown in Fig. 11, RPN aims to identify all possible candidate boxes. 3*3 convolution kernels connected to the last shared convolutional layer are used to generate a feature map of 256 channels. A sliding window is moved over the feature map, and a 256-dimensional feature vector is obtained at each location. At each sliding-window location, multiple region proposals are predicted simultaneously. The proposals are parameterized relative to reference boxes called “anchors”. Nine anchors, composed of 3 different scales and 3 different aspect ratios, are adopted to improve the accuracy of the proposal regions. The candidate boxes can be relatively sparse because a location refinement procedure follows.
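As a rough illustration of how nine anchors arise at one sliding-window location, the sketch below enumerates 3 scales times 3 aspect ratios. The concrete values (scales 128, 256, 512 pixels and height-to-width ratios 0.5, 1, 2) are assumptions taken from the defaults reported in the Faster R-CNN paper [5]; the settings used in this work are not stated.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def make_anchors(
    cx: float,
    cy: float,
    scales: Tuple[float, ...] = (128.0, 256.0, 512.0),  # assumed defaults from [5]
    ratios: Tuple[float, ...] = (0.5, 1.0, 2.0),        # assumed height / width ratios
) -> List[Box]:
    """Enumerate anchors centred at (cx, cy): one per (scale, ratio) pair."""
    anchors: List[Box] = []
    for s in scales:
        area = s * s                 # every ratio keeps the same area s*s
        for r in ratios:
            w = math.sqrt(area / r)
            h = w * r
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors
```

For example, `make_anchors(100.0, 100.0)` yields nine boxes; the square anchor at the smallest scale spans 128 pixels on each side.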


Fig. 11: The network structure of RPN

III-A3 Fast R-CNN

The proposal regions generated by RPN are passed to the Fast R-CNN network for detection. The feature map of each region is reshaped into a high-dimensional feature vector, which is fully connected to another feature layer of the same length. The following parameters are then predicted: the probability that the candidate box belongs to each class, and the parameters for the bounding box’s translation and scaling.
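To make the translation and scaling parameters concrete, the sketch below decodes a predicted box from a reference box using the parameterization given in the Faster R-CNN paper [5]: the centre is shifted in units of the reference width and height, and the size is scaled exponentially. The function name and the centre-size box format are illustrative choices, not part of the original method description.

```python
import math
from typing import Tuple

CenterBox = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def decode_box(reference: CenterBox, deltas: Tuple[float, float, float, float]) -> CenterBox:
    """Apply predicted (t_x, t_y, t_w, t_h) to a reference box.

    x = x_a + t_x * w_a,  y = y_a + t_y * h_a   (translation)
    w = w_a * exp(t_w),   h = h_a * exp(t_h)    (scaling)
    """
    xa, ya, wa, ha = reference
    tx, ty, tw, th = deltas
    return (xa + tx * wa, ya + ty * ha, wa * math.exp(tw), ha * math.exp(th))
```

With all four deltas equal to zero the reference box is returned unchanged, which is why a well-placed reference needs only small corrections.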

IV Self-Blast Insulator Location Method: U-net

U-net is a kind of FCN architecture. An FCN contains two core operations: 1) up-sampling and 2) skip layers. FCN uses up-sampling to ensure that the output has the same size as the input image. However, the feature map obtained by direct up-sampling often leads to very rough segmentation results. Thus, a second structure, skip layers, is adopted to overcome this problem. The skip-layer method up-samples the outputs of different pooling layers and refines the results by feature fusion operations such as copying and cropping.

IV-A The Structure of U-net


Fig. 12: The network structure of U-net

The structure is shown in Fig. 12; its shape resembles the letter U. Note that the input of U-net can be an image of arbitrary size, e.g. 572*572 in Fig. 12.

U-net is composed of compression layers and recovery layers. The compression layers follow a classical VGGNet-16 [2] design: a block of two 3*3 convolution kernels and a 2*2 max-pooling layer is applied repeatedly, and increasingly high-level features are extracted layer by layer through the down-sampling effect of the pooling layers. The structure of the recovery layers is quite different from that of VGGNet-16: a block of one 2*2 deconvolution kernel and two 3*3 convolution kernels is applied repeatedly.

Finally, a 1*1 convolutional kernel maps the dimension of the feature map to 2, and softmax is used for classification in the output layer.

A noteworthy detail is that the input of each pooling layer must be a feature map with even height and width, so the hyper-parameters, such as the input size, need to be set carefully.
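The even-size constraint can be checked with a little arithmetic. The sketch below tracks the side length of a square feature map through the network, assuming unpadded 3*3 convolutions (each removes 2 pixels) and four pooling levels as in the original U-net paper [11]; the implementation used in this work may differ, for instance if padded convolutions are used.

```python
def unet_output_size(size: int, depth: int = 4) -> int:
    """Return the output side length of an unpadded U-net for a square input.

    Raises ValueError if any 2*2 pooling layer would receive an odd-sized
    (or empty) feature map. `depth` is the assumed number of pooling levels.
    """
    for level in range(depth):
        size -= 4                      # two unpadded 3*3 convolutions
        if size <= 0 or size % 2 != 0:
            raise ValueError(f"pooling level {level + 1} gets invalid size {size}")
        size //= 2                     # 2*2 max-pooling
    size -= 4                          # two convolutions at the bottom of the U
    for _ in range(depth):
        size = size * 2 - 4            # 2*2 up-convolution, then two 3*3 convolutions
    if size <= 0:
        raise ValueError("input too small for this depth")
    return size
```

Under these assumptions a 572*572 input produces a 388*388 output, whereas a 570*570 input is rejected because the second pooling layer would receive an odd-sized map.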

V Experiment

In this section, we present our results on the aerial images provided by China South Grid. First, the data set, the computation resources, the preprocessing method and the construction of the training and test sets are introduced. Second, details of the training and testing of the two network architectures are reported. The accuracy and speed of the proposed method are evaluated and compared with other methods. Some of the results of the proposed method are presented as images. Finally, the advantage of combining the two models and the influence of the number of samples are discussed.

V-A Experiment Description

V-A1 Data and Computation Source

The experimental data set consists of 620 high-resolution images collected by helicopters: 400 positive samples and 220 negative samples (see Fig. 15). The raw images are 7360*4912 or 4512*3008 pixels in size. All the experiments are conducted on a computer with an Intel Core i7-7700 (3.60 GHz), 32 GB RAM and two NVIDIA GTX 1080 GPUs with 8 GB of memory each. Our implementation uses TensorFlow 1.2.0 [12].


(a) a positive sample
(b) a negative sample
Fig. 15: The examples of training samples

V-A2 Data Preprocessing

Due to limited computation resources, the raw images are resized to 1024*1024. Note that no other preprocessing operation, such as cropping out the target subjects as is often done by other methods, is conducted in our experiment.

V-A3 Construction of Training Sets and Test Sets

Because the number of samples is not large enough, 3-fold cross-validation, i.e. 412 images for training and 208 images for testing, is carried out in the experiment for insulator string location. In the experiment for classification, 4-fold cross-validation is conducted on the cropped images. Each insulator string is labelled “insulator” with a box determined by four coordinates. Note that only glass insulators in the images are labelled. The pixels that belong to broken insulators are labelled “break” and the others are labelled “normal”.

V-A4 Evaluation Method

The performance of the proposed method is evaluated by the values of recall and precision defined as follows:

$$\mathrm{precision} = \frac{N_{s}}{N_{l}}, \qquad \mathrm{recall} = \frac{N_{s}}{N_{a}}$$

where $N_{s}$ is the number of successfully located targets, $N_{l}$ is the total number of located objects, and $N_{a}$ is the total number of actual targets.
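The two criteria are simple ratios of counts: precision divides the number of successfully located targets by the total number of located objects, and recall divides it by the total number of actual targets. A minimal sketch, with argument names chosen here for illustration and made-up counts in the usage note:

```python
from typing import Tuple


def precision_recall(n_success: int, n_located: int, n_actual: int) -> Tuple[float, float]:
    """Compute (precision, recall) from detection counts.

    n_success: successfully located targets
    n_located: all located objects (correct or not)
    n_actual:  all actual targets in the ground truth
    """
    if n_located <= 0 or n_actual <= 0:
        raise ValueError("n_located and n_actual must be positive")
    return n_success / n_located, n_success / n_actual
```

For instance, 8 correct detections out of 10 reported boxes, with 16 targets actually present, would give a precision of 0.8 and a recall of 0.5.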

V-B The Insulator String Location

The location of glass insulator strings is conducted through Faster R-CNN. The best performance of Faster R-CNN is obtained with the pre-trained model inception-resnet-v2 [13]. The training of Faster R-CNN is carried out using Momentum method [14]. The parameter settings are shown in Tab. I and the training loss is plotted in Fig. 16.

| Parameter name | Value |
|---|---|
| batch size | 2 |
| max step | 23000 |
| dropout keep probability | 1.0 |
| the score threshold for NMS | 0.0 |
| the IoU threshold for NMS | 0.7 |
| the initial learning rate | 0.0003 |
| the momentum value | 0.9 |

TABLE I: The parameter settings of Faster R-CNN
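The two NMS entries in Tab. I control non-maximum suppression, which discards boxes that overlap a higher-scoring box too strongly. The sketch below is a generic greedy NMS in plain Python, not the routine used by the TensorFlow implementation; its default thresholds simply mirror the values in the table.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.7,    # value from Tab. I
    score_threshold: float = 0.0,  # value from Tab. I
) -> List[int]:
    """Return indices of kept boxes, highest score first."""
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

With a score threshold of 0.0 no box is filtered by score alone, so suppression is driven entirely by the 0.7 IoU threshold.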

Fig. 21 shows four location results. The detected insulator strings are bounded by red boxes, and the scores and class names are shown above these boxes. As shown in Fig. 21, the location method is fairly effective in detecting glass insulator strings in complex scenes. If the overlapping portion of the result and the ground truth is greater than 0.95, we regard the detection as successful. The precision is 0.966 and the recall is 0.971. Besides, the average time cost for a single image is 793 ms. These results suggest that the method enables the real-time location of insulator strings with high accuracy.


Fig. 16: The training loss of Faster R-CNN
Fig. 21: Four insulator string location results. Each detected insulator string is bounded by a red box. Each output box is associated with a category label and a softmax score in [0, 1].

We also test three other widely used object detection architectures on our data set: single shot multibox detector (SSD) [15], region-based fully convolutional networks (RFCN) [16] and discriminatively trained part-based models (DPM) [17]. The comparison of different detection architectures with different pre-trained CNN models is presented in Tab. II.

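| Model name | Pre-trained model | Precision | Recall | Speed (ms) |
|---|---|---|---|---|
| Faster R-CNN | inception-resnet-v2 | 0.966 | 0.971 | 793 |
| Faster R-CNN | resnet-101 | 0.96 | 0.96 | 236 |
| Faster R-CNN | resnet-50 | 0.948 | 0.942 | 120 |
| Faster R-CNN | inception-v2 | 0.947 | 0.931 | 69 |
| Faster R-CNN | vgg-16 | 0.916 | 0.922 | 25 |
| SSD | inception-resnet-v2 | 0.953 | 0.942 | 403 |
| SSD | resnet-101 | 0.947 | 0.925 | 111 |
| SSD | resnet-50 | 0.941 | 0.914 | 75 |
| SSD | inception-v2 | 0.941 | 0.908 | 42 |
| SSD | vgg-16 | 0.882 | 0.857 | 15 |
| RFCN | inception-resnet-v2 | 0.953 | 0.942 | 467 |
| RFCN | resnet-101 | 0.947 | 0.941 | 178 |
| RFCN | resnet-50 | 0.931 | 0.936 | 96 |
| RFCN | inception-v2 | 0.924 | 0.910 | 60 |
| RFCN | vgg-16 | 0.909 | 0.857 | 19 |
| DPM | none | 0.771 | 0.731 | 900 |

TABLE II: Performance of different object detection models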

Generally speaking, DPM, which uses no pre-trained model, has the worst performance. The trade-off between speed and accuracy is obvious: compared to Faster R-CNN, SSD and RFCN are faster but less accurate. Because its time cost (793 ms) is still tolerable, we choose Faster R-CNN with inception-resnet-v2 to pursue better accuracy.

V-C Self-Blast Insulator Location Results

First, the images are cropped according to the bounding boxes and used as the input of U-net. The architecture used in our experiment is shown in Fig. 12. Note that batch normalization layers are added after the convolutional layers to avoid the vanishing gradient problem and to accelerate deep network training [18]. Meanwhile, neither dropout nor weight regularization is adopted. The optimization algorithm adopted in this experiment is Adam [19]. In Adam, the parameters ε, β1 and β2 are set to 1e-8, 0.9 and 0.999 respectively (as recommended in [19]). The batch size is set to 1 and the initial learning rate is set to 1e-3. Data augmentation methods such as rotation and flipping are carried out. As shown in Fig. 22, the training loss converges after about 12000 steps.
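For readers unfamiliar with Adam, the sketch below performs one update of a single scalar parameter following the algorithm as published in [19], with the settings quoted above as defaults (learning rate 1e-3, decay rates 0.9 and 0.999, stabilizing constant 1e-8). It is a didactic sketch; the TensorFlow optimizer used in the experiments may arrange the bias correction slightly differently.

```python
import math
from typing import Tuple


def adam_step(
    param: float,
    grad: float,
    m: float,
    v: float,
    t: int,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[float, float, float]:
    """One Adam update for a scalar; t is the 1-based step counter.

    Returns the updated (param, m, v).
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.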


Fig. 22: The training loss of U-net

Four classification results are shown in Fig. 31. The pixels at the failure positions are covered by semitransparent purple masks. The precision is 0.951 and the recall is 0.955. The time cost of U-net on a single image is almost negligible. This is a fairly satisfactory result for broken glass insulator location.

Fig. 31: The self-blast insulator location results. The magnified images of the corresponding self-blast glass insulators are shown in the second row for readers’ convenience.

V-D Advantage of the Combination of Faster R-CNN and U-net

To validate the advantage of combining these two architectures, the self-blast insulator location is also carried out with Faster R-CNN alone and with U-net alone.

The values of the criteria are shown in Tab. III. U-net alone performs poorly, and Faster R-CNN alone performs better but still falls well short of the proposed method.

| Method | Precision | Recall |
|---|---|---|
| U-net | 0.567 | 0.627 |
| Faster R-CNN | 0.882 | 0.896 |
| Faster R-CNN plus U-net | 0.951 | 0.955 |

TABLE III: The precision and recall of different methods

As shown in Fig. 35, an intuitive comparison is presented among the three methods. Fig. 35(a) shows that U-net alone fails to locate the broken insulators. Fig. 35(b) shows that Faster R-CNN alone locates the self-blast insulator successfully; however, the redundant information seems to interfere with Faster R-CNN, and an extra false result is obtained. This indicates the importance of enhancing SNR. Fig. 35(c) shows that the proposed method, which combines Faster R-CNN and U-net, performs well.

(a) The result based on U-net alone
(b) The result based on Faster R-CNN alone
(c) The result based on the proposed method
Fig. 35: The comparison of three methods

V-E Influence of the Number of Training Samples

At the end of this section, the proposed method is evaluated with respect to the number of training samples. We vary the number of samples from 100 to 620 and keep the ratio of positive to negative samples at about 2:1. The result is shown in Fig. 36.

Fig. 36 shows that the method performs better as the number of training samples increases. This suggests that the method will achieve more accurate results if more training samples are available.


Fig. 36: The criteria with different numbers of training samples

VI Conclusion

Processing aerial images is a bottleneck for broken insulator fault location. This paper proposes a novel self-blast glass insulator location method for aerial images based on deep learning architectures. The self-blast glass insulator location problem is formulated as a location problem and a segmentation problem in computer vision. Two state-of-the-art CNN-based models, Faster R-CNN and U-net, are introduced. The former is responsible for the location of glass insulator strings, so that SNR is greatly enhanced; the latter enables the precise classification of the pixels in cropped images of different sizes. The advantages of these two architectures are combined naturally. Based on extensive experimental comparisons, Faster R-CNN with the inception-resnet-v2 pre-trained model is chosen as the object detection architecture, so that the best accuracy is obtained while real-time detection is still realized: the precision is 0.951, the recall is 0.955, and the time cost for a single image is about 800 ms. The experimental results also indicate that the proposed method performs better as the number of training samples increases. To make our method reproducible, the details of our experiments, such as the preprocessing operation, the image labelling method and the parameter settings, are described explicitly. Besides, the proposed method can be viewed as a general framework for broken insulator location using aerial images, and it is easy to transfer to other similar detection problems in this domain.

It should be pointed out that there is still room for improvement in the proposed method. In the method, the two networks are trained and run as separate stages, so it is not truly an end-to-end learning method. Recently, Mask R-CNN [20] has been proposed; it combines detection and segmentation in one architecture by adding an FCN to Faster R-CNN. Besides, the results of [21] indicate that pixel labels may not be necessary and that the coordinates of bounding boxes may be enough for Mask R-CNN. In practice, this would dramatically reduce the time and labor spent on pixel labelling. Thus, we will improve our method based on Mask R-CNN in future work.


  • [1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Communications of the ACM, vol. 60, no. 2, p. 2012, 2012.
  • [2] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Computer Science, 2014.
  • [3] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” pp. 1–9, 2014.
  • [4] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” pp. 770–778, 2015.
  • [5] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: towards real-time object detection with region proposal networks,” in International Conference on Neural Information Processing Systems, 2015, pp. 91–99.
  • [6] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 4, pp. 640–651, 2017.
  • [7] Z. Zhao, G. Xu, Y. Qi, N. Liu, and T. Zhang, “Multi-patch deep features for power line insulator status classification from aerial images,” in International Joint Conference on Neural Networks, 2016, pp. 3187–3194.
  • [8] S. Liao and J. An, “A robust insulator detection algorithm based on local features and spatial orders for aerial images,” IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 5, pp. 963–967, 2017.
  • [9] Q. Wu, J. An, and B. Lin, “A texture segmentation algorithm based on pca and global minimization active contour model for aerial insulator images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 5, pp. 1509–1518, 2012.
  • [10] M. J. B. Reddy, C. B. Karthik, and D. K. Mohanta, “Condition monitoring of 11 kv distribution system insulators incorporating complex imagery using combined dost-svm approach,” IEEE Transactions on Dielectrics and Electrical Insulation, vol. 20, no. 2, pp. 664–674, 2013.
  • [11] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241.
  • [12] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, “TensorFlow: Large-scale machine learning on heterogeneous systems,” 2015, software available from [Online]. Available:
  • [13] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” 2016.
  • [14] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in International Conference on International Conference on Machine Learning, 2013, pp. III–1139.
  • [15] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” pp. 21–37, 2015.
  • [16] J. Dai, Y. Li, K. He, and J. Sun, “R-fcn: Object detection via region-based fully convolutional networks,” 2016.
  • [17] C. Szegedy, A. Toshev, and D. Erhan, “Deep neural networks for object detection,” Advances in Neural Information Processing Systems, vol. 26, pp. 2553–2561, 2013.
  • [18] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” pp. 448–456, 2015.
  • [19] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” Computer Science, 2014.
  • [20] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” 2017.
  • [21] R. Hu, P. Dollár, K. He, T. Darrell, and R. Girshick, “Learning to segment every thing,” arXiv preprint arXiv:1711.10370, 2017.