Complete object contours extracted from an image contain relevant information about the shape of the photographed objects. They are used in many areas of computer vision, such as object detection [1, 2], object recognition [3, 4] and object tracking [5, 6]. For this reason, there has always been great demand for methods that can extract object contours [7, 8, 9]. Recently, driven by the success of deep learning methods, object contour detection has made great progress [10, 11, 12, 13]. This progress can also be found in the closely related task of semantic edge detection [14, 15, 16], in which the contours are additionally assigned to object classes such as person, car or dog.
The outputs of all these detectors are so-called soft contour maps, which provide for each pixel the probability of being an object contour. The most commonly used post-processing method, Non-Maximum Suppression (NMS), breaks the soft contour when thinning it out, especially where the contour changes direction. Recently, the NMS has been integrated into the end-to-end training of deep networks for improvement, but we still see that the properties of the NMS result in an unconnected object contour. A quick way to synthesise and analyse such contours with methods like Fourier descriptors (FDs) is missing, because these normally require complete contours [19, 20], with few exceptions that come with a loss of performance. Easy utilization would be very useful, because FDs and similar methods are extensively researched and used for active contours, shape description, shape matching and identification. There is work on contour completion [27, 28], but it should be even more effective not to let the problem arise in the first place and to use another post-processing method instead of the NMS.
For this, we need to improve the soft contour map in such a way that a perfectly closed, 1 pixel wide and detailed binary object contour can be obtained with standard image processing tools in a final step. For the improvement, we develop a new contour tracing algorithm based on a Convolutional Neural Network (CNN), because we assume that CNNs are tailor-made for this task. There are some line following approaches based on CNNs or Artificial Neural Networks (ANNs) that support specific robotic tasks [29, 30], but overall, CNNs are rarely used for contour tracing. Most contour tracing methods do not make use of CNNs and are often intended for binary images [32, 33, 34]. We therefore present the Walk the Lines (WtL) algorithm, a standard regression CNN trained to follow object contours in RGB images.
To take the first step, we train the WtL only on ship contours, but the principle is applicable to other objects.
The main advantage of our approach over the NMS is that the WtL contour is closed, connected and slightly more detailed, so that further processing can lead to a complete and detailed binary object contour.
Section 2 describes the new contour tracing algorithm and its training, Section 3 applies it to generate a complete object contour, Section 4 briefly summarizes the image processing steps for the object contour binarization, Section 5 evaluates our results and Section 6 gives a conclusion.
II Walk the Lines Algorithm
We propose our method to bridge the gap between promising object contour detector results and a final, highly detailed and closed binary contour. To elaborate details and simultaneously create a connected contour, it is designed as a contour tracing method.
Algorithm 1 describes our approach in pseudocode. The original image contains useful information, such as edges and orientations, and is one part of the input. To follow the object contour, the predictions of an object contour detector are very helpful and therefore form the other part of the input. Here, we use the recently published RefineContourNet (RCN). The image and the soft contour map are concatenated into an input tensor whose spatial size equals the height and width of the image and whose channels are obtained through the concatenation. Our contour tracer needs only small patches from this tensor to operate. To find and prepare these patches, a center pixel and a direction angle are defined. For initialization, the tracer is placed either on or directly next to the object contour. This starting pixel is selected by choosing a high probability value from the soft contour map, and its coordinates become the center pixel. The direction angle describes the direction of the contour at this point. The center pixel is used to crop the image and the soft contour map at the location in which the CNN operates. The first crop is made quite generously, so that enough surrounding pixels are available for the rotation. Then, the direction angle is used to rotate this tensor in such a way that the CNN input is always oriented roughly along the direction of the contour itself. This should benefit valid predictions, because it gives the CNN a similar view of upcoming contour lines. A second crop around the center pixel brings the tensor to the specified CNN input size. This size was chosen because it covers approximately the area that allows meaningful interpretations without processing too much information. We assume that CNNs are very appropriate for contour tracing, because their convolutional filters are able to learn orientations. Their adaptability and generalization are also well suited to this task. Hence, we have chosen a standard regression CNN as the centrepiece. Its architecture is shown in Table I.
|Id.||Layers||Kernel Size||Output Size|
|2||Conv1, BN, ReLU|| || |
|3||Conv2, BN, ReLU, Pool||3x3x64x128||6x6x128|
|4||Conv3, BN, ReLU, Pool||3x3x128x256||3x3x256|
|5||Conv4, BN, ReLU||3x3x256x512||3x3x512|
|6||Conv5, BN, ReLU||3x3x512x1024||1x1x1024|
An input layer feeds the CNN with the cropped and rotated input. The following layers, namely Convolution (Conv), Batch Norm (BN), Rectified Linear Unit (ReLU), Pooling (Pool) and Fully Connected (FC), estimate a final prediction of one single value. All convolutional layers have a stride of 1 and use padding to preserve the input resolution, except for the last convolutional layer, which does not use padding. For pooling, we use max pooling. The Mean Squared Error (MSE) loss is minimized by standard Stochastic Gradient Descent (SGD) with momentum. The CNN output predicts the upcoming course of the contour as an angle measured clockwise from the current direction angle and lies within a bounded range. With this result, a new center pixel and a new input can be found and prepared. This input leads to another direction estimate, and the procedure can be repeated for as many steps as required. A possible course of the WtL algorithm is shown in Fig. 1 for a ship image.
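The tracing loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_turn` is a hypothetical stand-in for the trained regression CNN together with the crop, rotate and crop preparation, and the coordinate convention (x to the right, y downwards, angles in degrees, so that increasing angles turn clockwise on screen) is assumed.

```python
import math

def step_pixel(center, angle_deg, step=1):
    """Move `step` pixels (Chebyshev distance) from `center` along `angle_deg`.

    Assumption: x grows to the right and y grows downwards, so increasing
    angles turn clockwise on screen.
    """
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    scale = step / max(abs(dx), abs(dy))  # project onto the square ring
    x, y = center
    return (x + round(dx * scale), y + round(dy * scale))

def walk_the_lines(predict_turn, start, angle_deg, n_steps):
    """Follow a contour for `n_steps` steps and return the visited pixels.

    `predict_turn(center, angle_deg)` is a hypothetical stand-in for the
    regression CNN; it returns the predicted change of direction in degrees.
    """
    path = [start]
    center = start
    for _ in range(n_steps):
        angle_deg = (angle_deg + predict_turn(center, angle_deg)) % 360.0
        center = step_pixel(center, angle_deg)
        path.append(center)
    return path

# Toy predictor: a constant 45-degree clockwise turn at every step.
demo_path = walk_the_lines(lambda c, a: 45.0, start=(0, 0), angle_deg=-45.0, n_steps=4)
```

With the toy predictor, the tracer bends steadily to one side. In the real algorithm, each turn would come from the CNN evaluated on the patch around the current center pixel.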
In order to train a regression CNN to follow contours, training data with very detailed contours is required. Since we were not able to find anything at this level of detail, we have created our own dataset. To keep the required number of ground truths manageable and to test the principle first, we focus on one object category, here ships. We create 100 ground truths (gt) with the help of the Pixel Annotation Tool and call this internal collection the Detailed Ship Contour (DSC) dataset. An example is given in Fig. 2. It shows that great importance is attached to antennas and ship superstructures. The format is exactly the same as that of the image segmentation dataset from PASCAL VOC and contains an unlabelled region around the ship segmentation mask. Here, however, this region is a line that has a width of 1 pixel and is identical to the object contour of the ship.
In the following, we describe in detail how we generate training labels for the CNN from the DSC. To have an orientation on the contour gt, we define that the tracer moves around the object in a clockwise direction. A random point on or directly beside the contour gt is chosen. From there, the contour gt is followed for three pixels. This yields the direction of the contour gt at this position, which is stored as the angle for the rotation. The pixel reached is defined as the center pixel. It in turn leads to a generous crop of the stacked input, which consists of the image and its matching soft contour map from the RCN. This crop is rotated according to the angle and cropped again to the fixed CNN input size. Afterwards, the contour gt is followed for three more pixels, which gives us the upcoming change in direction on the contour itself. This angle is saved as the label value for the corresponding crop that has already been stored. In total, we create a fixed number of labels per image and split the data into a training part and a validation part.
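The label generation can be illustrated with a short sketch. It assumes that the contour gt is available as an ordered, closed list of pixel coordinates in tracing order and that the starting point lies on the contour itself. The function names, the coordinate convention (x right, y down) and the wrapping of the label to [-180, 180) degrees are assumptions made for illustration.

```python
import math

def direction_deg(p, q):
    """Angle of the vector p -> q in degrees (assumed convention: x right, y down)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def wrap_deg(angle):
    """Wrap an angle to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def make_label(contour, i, lookahead=3):
    """Return (center pixel, direction angle, label) for start index `i`.

    `contour` is an ordered, closed list of (x, y) ground-truth pixels.
    The label is the upcoming change of direction after the center pixel.
    """
    n = len(contour)
    start = contour[i % n]
    center = contour[(i + lookahead) % n]
    ahead = contour[(i + 2 * lookahead) % n]
    angle = direction_deg(start, center)
    label = wrap_deg(direction_deg(center, ahead) - angle)
    return center, angle, label

# Toy ground truth: the outline of a 4x4 square, traced clockwise on screen.
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (3, 3), (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
center, angle, label = make_label(square, 0)
```

For the toy square, the tracer first moves along the top edge and the label describes the right-angle turn at the corner. The crop around `center`, rotated by `angle`, would be the matching CNN input.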
III Object Contour Completion with WtL
The example in Fig. 1 shows that the tracer can already follow the object contour for a certain distance. But a single tracer has many possibilities to deviate from the correct object contour and so it is quite rare for a tracer to circle the entire object on its own. Therefore, we want to use many tracers at different image locations, which together should be able to draw a complete object contour.
The implementation of the contour completion is described in pseudocode in Algorithm 2. The input data is the same as for Algorithm 1. For initialization, the soft contour map is binarized. A relatively high threshold ensures that only regions that contain the actual object contour are taken into account. The resulting coarse object contour is thinned and then fragmented by multiplying it with a checkerboard pattern, which creates many small, 1 pixel wide lines. The end points of these lines serve as starting center pixels for the WtL tracers. A starting direction angle can also be determined for every starting point. A list of tracers is created, and a loop runs as long as the list contains at least one tracer. For every listed tracer, the WtL is performed. To save computation time, the CNN is fed batchwise and returns a vector with all new direction angles. A pixel step is chosen randomly to move the tracers 1, 2 or 3 pixels at once. This offers two advantages:
A larger pixel step can represent more directions, as illustrated in Fig. 3. With a step size of 1, only 8 other pixels can be reached from the center pixel and therefore only 8 directions are available. A step of 2 pixels can represent 16 different directions, and a step of 3 pixels has 24 directions. Combined, there are 32 different directions. This normally allows each tracer to follow the object contour more smoothly.
The tracers randomly take different paths, which increases the robustness of the overall result, because with a fixed step many tracers took the same wrong turn.
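The direction counts quoted above can be checked by enumerating the pixels on the square ring at each step size and reducing every offset to its primitive direction. This is a small verification sketch; the function names are chosen for illustration.

```python
from math import gcd

def ring_offsets(step):
    """Pixel offsets at Chebyshev distance `step` from the center pixel."""
    return [(dx, dy)
            for dx in range(-step, step + 1)
            for dy in range(-step, step + 1)
            if max(abs(dx), abs(dy)) == step]

def direction(offset):
    """Reduce an offset to its primitive direction, e.g. (2, 2) -> (1, 1)."""
    dx, dy = offset
    g = gcd(abs(dx), abs(dy))
    return (dx // g, dy // g)

# Number of distinct directions reachable with each single step size ...
per_step = {s: len({direction(o) for o in ring_offsets(s)}) for s in (1, 2, 3)}
# ... and with all three step sizes together (collinear offsets count once).
combined = len({direction(o) for s in (1, 2, 3) for o in ring_offsets(s)})
```

The combined count is smaller than the sum of the three rings because offsets such as (1, 1), (2, 2) and (3, 3) point in the same direction.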
Further robustness is achieved when the pixel step of 1 is used most often, the step of 2 pixels less often and the step of 3 pixels quite rarely. The reason is that a larger pixel step is less forgiving, because a wrong prediction moves the tracer further away from the contour. The chosen step size and the predicted direction angle move the tracer to the new center pixel. If a tracer reaches a bad location, it is deleted from the list. A location is bad when the tracer leaves the image or when the values of the soft contour map there are too low to indicate an object contour. We also delete a tracer when it crosses its own previous path, because we then assume that it is looping or has successfully walked around the object. When no tracer is left, we repeat the whole procedure with the flipped image in order to trace in the anti-clockwise direction. The complete run takes several minutes per image, which results from the pixelwise tracing of the contour. Finally, we sum up all walked lines and return the WtL contour as shown in Fig. 4.
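The bookkeeping of the tracer list can be sketched as follows. This is a simplified illustration, not the authors' implementation. `predict_turns` is a hypothetical stand-in for the batched CNN call, and the step weights and `min_prob` are placeholder values. The pixel step is drawn per tracer here, which is an assumption, and only the landing pixel of each step is recorded and checked against the tracer's own path.

```python
import math
import random

def move(center, angle_deg, step):
    """Move `step` pixels (Chebyshev distance) along `angle_deg` (x right, y down)."""
    dx, dy = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    scale = step / max(abs(dx), abs(dy))
    return (center[0] + round(dx * scale), center[1] + round(dy * scale))

def complete_contour(soft_map, seeds, predict_turns, rng,
                     step_weights=(0.6, 0.3, 0.1), min_prob=0.2):
    """Run all tracers until none is left and return the summed walked lines.

    soft_map: 2D list of contour probabilities, indexed as soft_map[y][x].
    seeds: list of ((x, y), angle_deg) starting points.
    step_weights and min_prob are hypothetical placeholder values.
    """
    height, width = len(soft_map), len(soft_map[0])
    walked = [[0] * width for _ in range(height)]
    tracers = [{"pos": p, "angle": a, "path": {p}} for p, a in seeds]
    while tracers:
        turns = predict_turns(tracers)  # one batched call per iteration
        survivors = []
        for tracer, turn in zip(tracers, turns):
            step = rng.choices((1, 2, 3), weights=step_weights)[0]
            tracer["angle"] = (tracer["angle"] + turn) % 360.0
            x, y = move(tracer["pos"], tracer["angle"], step)
            outside = not (0 <= x < width and 0 <= y < height)
            if outside or soft_map[y][x] < min_prob or (x, y) in tracer["path"]:
                continue  # bad location or own path reached: drop the tracer
            walked[y][x] += 1
            tracer["pos"] = (x, y)
            tracer["path"].add((x, y))
            survivors.append(tracer)
        tracers = survivors
    return walked

# Toy example: one horizontal contour line and a predictor that never turns.
soft = [[1.0 if y == 2 else 0.0 for x in range(5)] for y in range(5)]
walked = complete_contour(soft, [((0, 2), 0.0)],
                          lambda ts: [0.0] * len(ts), random.Random(0))
```

The loop always terminates, because a surviving tracer must land on a pixel that it has not visited before. A second pass on the flipped image, as described in the text, would call the same function again.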
IV Object Contour Binarization
The previously proposed WtL contour is a grayscale image and must now be converted into a binary, 1 pixel wide and detailed object contour. For this, a third procedure is implemented and described in Algorithm 3. To retain as much detail as possible, the procedure searches for the highest threshold that closes the object contour. For this purpose, the WtL contour first has to be opened at a location where it is easy to interrupt, i.e. where it is certain that no other parts of the object contour will be modified. This is not the case, for instance, in image areas where the contour is not particularly straight. Since only ship images are used, we can use the waterline of the ship. A low threshold is applied to the WtL contour to create a binary image that contains broad contours. In this image, the longest line is detected with the Hough transform, and it is very often the waterline. A vertical cut is then made in the middle of this line by setting the pixels on the cut to zero and copying their former values into a small column buffer. The cut is performed on the WtL contour, and two pixels adjacent to the cut, one on each side, are chosen automatically. The maximum value in this open WtL contour is defined as the start threshold. With this threshold, a binary image of the open WtL contour is formed. In a while loop, the threshold is decreased by a fixed decrement until the binary image contains a closed object contour. We assume that the object contour has been closed when the two separated pixels belong to one common segment again. Then the buffered column is added back to the WtL contour and a binary image is created with the determined threshold. The resulting contour is too wide and has branches. Thinning and cleaning lead to the targeted 1 pixel wide binary object contour, visible in Fig. 5. All algorithms are implemented with MATLAB or MatConvNet.
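The threshold search at the core of this procedure can be sketched as follows. This is a minimal sketch under assumptions: the WtL contour is given as a 2D list of non-negative integers, the cut and the two adjacent pixels are already known, segments are 8-connected, and `delta` is a hypothetical decrement. Line detection, thinning and cleaning are left out.

```python
from collections import deque

def connected(binary, a, b):
    """True if pixels a and b, given as (x, y), lie in one 8-connected segment."""
    height, width = len(binary), len(binary[0])
    if not (binary[a[1]][a[0]] and binary[b[1]][b[0]]):
        return False
    seen, queue = {a}, deque([a])
    while queue:
        x, y = queue.popleft()
        if (x, y) == b:
            return True
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = x + dx, y + dy
                if (0 <= nx < width and 0 <= ny < height
                        and binary[ny][nx] and (nx, ny) not in seen):
                    seen.add((nx, ny))
                    queue.append((nx, ny))
    return False

def closing_threshold(contour, cut, a, b, delta=1):
    """Highest threshold at which a and b are connected in the opened contour.

    `cut` lists the (x, y) pixels that are set to zero to open the contour.
    Returns None if no positive threshold closes the contour.
    """
    opened = [row[:] for row in contour]
    for x, y in cut:
        opened[y][x] = 0
    threshold = max(max(row) for row in opened)  # start threshold
    while threshold > 0:
        binary = [[v >= threshold for v in row] for row in opened]
        if connected(binary, a, b):
            return threshold
        threshold -= delta
    return None

# Toy WtL contour: a small ring with one weak pixel at the top.
ring = [[0, 0, 0, 0, 0],
        [0, 9, 4, 9, 0],
        [0, 9, 0, 9, 0],
        [0, 9, 9, 9, 0],
        [0, 0, 0, 0, 0]]
best = closing_threshold(ring, cut=[(2, 3)], a=(1, 3), b=(3, 3))
```

In the toy ring, the two pixels next to the cut are only connected around the object once the threshold has dropped to the value of the weakest contour pixel. This is why a single weak spot can force a low threshold and cost detail elsewhere.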
V-A WtL Contours
Figure 6 shows the soft object contour map of the RCN and its further processing with the NMS compared to our WtL. Some details are suppressed by the WtL, for instance the second flag at the lifeboat in row 3. Overall, however, more details are worked out. This is clearly visible in the 4th row, where the chimney of the passenger ship is closer to its original shape. For each individual image, it is visible that our algorithm, in contrast to the NMS, does not break the object contour but creates a connected one instead. This is also true for the yacht in the top image, where the NMS separates the antenna from the ship, while it remains connected when using WtL. The NMS thins the contours to the desired width of 1 pixel, while the WtL sometimes creates wider or parallel contours, clearly visible at the waterlines, which require further processing.
V-B Binary WtL Contours
For comparison, we trained RefineNet (RN) with parameters comparable to those used in the RCN. This RefineNet is specialized in ship scene segmentation and has strong results on PASCAL val2012, with an Intersection over Union (IoU) for ships that is comparable with state-of-the-art deep learning segmentation methods. For evaluation, we run the algorithms on the validation images from the DSC. With the closed contour, we can cut out a segmentation mask, compare it with the ship segmentation mask from the ground truth and calculate the Precision (P), Recall (R) and IoU for each image in Table II. The advantages and disadvantages are visible there, and we have sorted the images by IoU. On the right are the images where the WtL did not work so well, for example because only a relatively low threshold produced a closed contour, resulting in a poor IoU. Another difference between the approaches lies in the distribution of precision and recall. While the RN has a higher precision and therefore fewer false positives, the WtL achieves very high recall values, which shows that the WtL has fewer false negatives for some images. This is because the WtL tends to form an outer object contour as it circles the object, which encloses many fine details of the ship, but also false positives. The advantage of our proposed approach is visible on the left side. Here, the contour is closed quickly at a relatively high threshold, resulting in an outstanding IoU and a very detailed binary object contour. These columns are shaded grey in Table II. Because our DSC dataset remains internal, visual results are shown on publicly available ship images in Fig. 7. The images in the last three rows show three typical examples where the binary contour does not give satisfactory results:
"Wrong Lines" add unwanted pixels to the segmentation, visible at the stranded ship, where the stone in the foreground is wrongly segmented.
"Doubled Lines" appear if the contour closing does not pass over the desired object contour. This results in a poor segmentation, visible at the warship.
"Open Lines" lead to a very low threshold, because a closed contour is found very late. The result is a high loss of detail, as can be seen on the passenger ship in the image at the bottom.
Extracting the binary object contour is not guaranteed to succeed, because it relies on a chain of previous procedures, such as an accurate RCN prediction and a successful object contour completion. In addition, assumptions are made for the binarization step that do not necessarily apply. This includes the assumption that the longest line is really the waterline of the ship. Therefore, we can only give an approximate range for how often an acceptable binary WtL contour is created, based on our experience and always depending on the individual image. The advantage of our method is only visible for those images on which the whole procedure works as desired. Then, excellent segmentations with a very high level of detail become possible, as visible in the upper four images in Fig. 7(d). Compared to the RN in Fig. 7(c), more details of the ship superstructures and thin antennas are visible in the ship segmentation.
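The per-image scores used in this comparison follow directly from the pixel counts of the two segmentation masks. A minimal sketch, assuming both masks are given as 2D lists of 0/1 values of equal size (the function name is chosen for illustration):

```python
def mask_scores(pred, gt):
    """Precision, recall and IoU of two binary masks given as 2D lists."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            tp += bool(p and g)      # predicted and in the ground truth
            fp += bool(p and not g)  # predicted but not in the ground truth
            fn += bool(g and not p)  # missed ground-truth pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou

# Toy masks that overlap in the middle column only.
pred = [[1, 1, 0],
        [1, 1, 0]]
gt = [[0, 1, 1],
      [0, 1, 1]]
precision, recall, iou = mask_scores(pred, gt)
```

A mask that encloses the whole object plus some background, as the outer WtL contour tends to do, raises recall at the cost of precision, which matches the distribution discussed above.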
VI Conclusion
The WtL shows that contour tracing is a new and unexplored application for CNNs. The object contour completion by WtL draws excellent object contour maps on which many specific details of the original object can be seen. Compared to the results from the NMS, the contour is a bit more detailed and wider. The main advantage, however, is that WtL contours are connected and can be converted into a closed binary contour without losing many details. The binarization is completely automatic, but so far it only works for a limited number of images. For these, we produce excellent segmentations with very high IoUs and reveal details that are easily omitted, such as antennas and ship superstructures.
References
-  W. Bi, Y. Zhang, W. Huang, and G. Gao, “Salient contour matching for object detection,” in 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), vol. 01, Aug 2016, pp. 525–529.
-  J.-H. Lee, F. Hua, and J. W. Jang, “An improved object detection and contour tracking algorithm based on local curvature,” in Signal Processing, Image Processing and Pattern Recognition, D. Ślęzak, S. K. Pal, B.-H. Kang, J. Gu, H. Kuroda, and T.-h. Kim, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 25–32.
-  C. Phanikrishna and A. V. N. Reddy, “Contour tracking based knowledge extraction and object recognition using deep learning neural networks,” in 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), Oct 2016, pp. 352–354.
-  S. G. Salve and K. C. Jondhale, “Shape matching and object recognition using shape contexts,” in 2010 3rd International Conference on Computer Science and Information Technology, vol. 9, July 2010, pp. 471–474.
-  A. Yilmaz, Xin Li, and M. Shah, “Contour-based object tracking with occlusion handling in video acquired using mobile cameras,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1531–1536, Nov 2004.
-  H. Rajabi and M. Nahvi, “Modified contour-based algorithm for multiple objects tracking and detection,” in ICCKE 2013, Oct 2013, pp. 235–239.
-  J. Zhan and B. Hu, “Salient object contour detection based on boundary similar region,” in 2012 Fourth International Conference on Digital Home, Nov 2012, pp. 335–339.
-  S. Kim and J. W. Jang, “An improved snake-based method for object contour detection,” in 2007 IEEE International Conference on Image Processing, vol. 1, Sep. 2007, pp. I-249–I-252.
-  H. Jiang, Y. Wu, and Z. Yuan, “Probabilistic salient object contour detection based on superpixels,” in 2013 IEEE International Conference on Image Processing, Sep. 2013, pp. 3069–3072.
-  J. Yang, B. Price, S. Cohen, H. Lee, and M. Yang, “Object contour detection with a fully convolutional encoder-decoder network,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 193–202.
-  A. Khoreva, R. Benenson, M. Omran, M. Hein, and B. Schiele, “Weakly supervised object boundaries,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016, pp. 183–192.
-  K. Maninis, J. Pont-Tuset, P. Arbeláez, and L. V. Gool, “Convolutional oriented boundaries: From image segmentation to high-level tasks,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 40, no. 4, pp. 819 – 833, 2017.
-  A. P. Kelm, V. S. Rao, and U. Zölzer, “Object contour and edge detection with refinecontournet,” in Computer Analysis of Images and Patterns, M. Vento and G. Percannella, Eds. Cham: Springer International Publishing, 2019, pp. 246–258.
-  Z. Yu, C. Feng, M. Liu, and S. Ramalingam, “Casenet: Deep category-aware semantic edge detection,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 1761–1770.
-  Z. Yu, W. Liu, Y. Zou, C. Feng, S. Ramalingam, B. V. K. V. Kumar, and J. Kautz, “Simultaneous edge alignment and learning,” in European Conference on Computer Vision (ECCV), 2018.
-  D. Acuna, A. Kar, and S. Fidler, “Devil is in the edges: Learning semantic boundaries from noisy annotations,” 2019.
-  J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679–698, Nov 1986.
-  Y. Hu, Y. Chen, X. Li, and J. Feng, “Dynamic feature fusion for semantic edge detection,” in Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. International Joint Conferences on Artificial Intelligence Organization, 7 2019, pp. 782–788. [Online]. Available: https://doi.org/10.24963/ijcai.2019/110
-  C. T. Zahn and R. Z. Roskies, “Fourier descriptors for plane closed curves,” IEEE Transactions on Computers, vol. C-21, no. 3, pp. 269–281, March 1972.
-  L. J. Latecki, R. Lakämper, and U. Eckhardt, “Shape descriptors for non-rigid shapes with a single closed contour,” in Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662), vol. 1, 2000, pp. 424–429.
-  J. Ding, W. Chao, J. Huang, and C. Kuo, “Asymmetric fourier descriptor of non-closed segments,” in 2010 IEEE International Conference on Image Processing, Sep. 2010, pp. 1613–1616.
-  L. Legrand, K. Khalil, and A. Dipanda, “Representing plane closed curves with hartley descriptors,” in Proceedings, International Conference on Image Processing, vol. 3, Oct 1995, pp. 344–347.
-  T. Li, A. Krupa, and C. Collewet, “A robust parametric active contour based on fourier descriptors,” in 2011 18th IEEE International Conference on Image Processing, Sep. 2011, pp. 1037–1040.
-  E. Sokic and S. Konjicija, “Shape description using phase-preserving fourier descriptor,” in 2015 IEEE International Conference on Multimedia and Expo (ICME), June 2015, pp. 1–6.
-  I. Bartolini, P. Ciaccia, and M. Patella, “Warp: accurate retrieval of shapes using phase of fourier descriptors and time warping distance,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 1, pp. 142–147, Jan 2005.
-  Z. Liu, J. Watson, and A. Allen, “Efficient affine-invariant fourier descriptors for identification of marine plankton,” in OCEANS 2017 - Aberdeen, June 2017, pp. 1–9.
-  T. Guan, D. Zhou, K. Peng, and Y. Liu, “A novel contour closure method using ending point restrained gradient vector flow field,” J. Inf. Sci. Eng., vol. 31, pp. 43–58, 2015.
-  Y. Ming, H. Li, and X. He, “Connected contours: A new contour completion model that respects the closure effect,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition, June 2012, pp. 829–836.
-  R. B. Hellman, C. Tekin, M. van der Schaar, and V. J. Santos, “Functional contour-following via haptic perception and reinforcement learning,” IEEE Transactions on Haptics, vol. 11, no. 1, pp. 61–72, Jan 2018.
-  S. H. Rao, V. Kalaichelvi, and R. Karthikeyan, “Real-time tracing of a weld line using artificial neural networks,” in 2018 IEEE/ACIS 17th International Conference on Computer and Information Science (ICIS), June 2018, pp. 275–280.
-  P. Zamperoni, “Contour tracing of grey-scale images based on 2-d histograms,” Pattern Recognition, vol. 15, no. 3, pp. 161 – 165, 1982. [Online]. Available: http://www.sciencedirect.com/science/article/pii/003132038290067X
-  D. Sun and Y. Liu, “A new contour tracing algorithm in eight-connected binary images,” in 2010 Third International Joint Conference on Computational Science and Optimization, vol. 1, May 2010, pp. 249–253.
-  J. Seo, S. Chae, J. Shim, D. Kim, C. Cheong, and T.-D. Han, “Fast contour-tracing algorithm based on a pixel-following method for image sensors,” Sensors, vol. 16, no. 3, 2016. [Online]. Available: https://www.mdpi.com/1424-8220/16/3/353
-  S. Suzuki and K. Abe, “Topological structural analysis of digitized binary images by border following,” Computer Vision, Graphics, and Image Processing, vol. 30, no. 1, pp. 32–46, 1985. [Online]. Available: http://www.sciencedirect.com/science/article/pii/0734189X85900167
-  G. Durr, A. Buchanan, E. McLean, and T. Park, “Boat images.” [Online]. Available: https://unsplash.com
-  A. Bréhéret, “Pixel annotation tool.” [Online]. Available: https://github.com/abreheret/PixelAnnotationTool
-  M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results.” [Online]. Available: http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html
-  R. O. Duda and P. E. Hart, “Use of the hough transformation to detect lines and curves in pictures.” Commun. ACM, vol. 15, no. 1, pp. 11–15, 1972.
-  MATLAB, version 9.1.0.441655 (R2016b). Natick, Massachusetts: The MathWorks Inc., 2016.
-  A. Vedaldi and K. Lenc, “Matconvnet – convolutional neural networks for matlab,” in Proceeding of the ACM Int. Conf. on Multimedia, 2015.
-  G. Lin, A. Milan, C. Shen, and I. Reid, “Refinenet: Multi-path refinement networks for high-resolution semantic segmentation,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017, pp. 5168–5177.
-  FreeImages, “Boat images.” [Online]. Available: https://de.freeimages.com