Intelligent technologies such as object detection in road networks have advanced considerably. Alongside this technological progress, the demands for driving safety, efficiency, and automated maintenance systems have also increased significantly. Crucial challenges remain, such as the capability to cope with temporary and sudden circumstances like accidents and road construction. Among the numerous on-road objects, traffic cones need to be recognized because their visual appearance exhibits spatio-temporal periodicity and because they are constantly replaced and moved within the road network. A traffic cone is a temporary sign that indicates traffic redirection, a cordoned-off area, a road accident, or a lane shift.
The present paper outlines a deep learning approach for effectively recognizing traffic cones in roadwork images collected from multiple sources. The approach was implemented with the YOLOv5 algorithm, which is widely used for object detection problems. We created a dataset of RGB roadwork images that were annotated by expert engineers within the framework of the HERON project. The traffic cone identification task can be addressed as an on-road object detection problem. The aim of our research is to broaden current studies of object detection and adapt them to the requirements of contemporary road networks.
The aforementioned implementation can contribute to various studies related to road traffic efficiency and safety. Firstly, it can support the observation of the pre- and post-intervention phases of a roadwork project, including visual inspections. Additionally, it can be beneficial in intelligent transportation applications such as autonomous driving. In an automotive scenario, it can be used for avoiding obstacles while driving, predicting decisions, and limiting traffic accidents.
To this end, in this study we present a novel cone detection framework that includes: (i) formulating the identification as an object detection task, (ii) creating a roadwork image dataset, and (iii) utilizing the YOLOv5 algorithm for object recognition. The remainder of this paper is organized as follows. Section 2 briefly presents object detection frameworks that are related to road networks. Section 3 describes the proposed architecture. Section 4 discusses the dataset description and the experimental results.
2 Related work
The literature presents various noteworthy studies that apply deep learning object detection methods to road images in order to address road issues. Object detection methods can apply to different aspects of these issues. One work presents single-shot detection and classification of road users based on the real-time object detection system YOLO; the method is applied to the pre-processed radar range-Doppler-angle power spectrum. Another study suggests on-road object detection using SSD, a detection mechanism based on a deep neural network. A novel anchor-free deep learning approach based on CenterNet has also been proposed for road object detection. A further paper focuses on the YOLO object detection system in order to enhance autonomous driving and other types of automation in transportation systems. Object detection is essential for automated driving and vehicle safety systems. For this purpose, one article compares five algorithms for inspecting the contents of images: Region-based Fully Convolutional Network (R-FCN), Mask Region-based Convolutional Neural Network (Mask R-CNN), Single Shot Multi-Box Detector (SSD), RetinaNet, and YOLOv4.
Obstacle recognition in road images is another aspect of object detection. One work implemented obstacle detection and avoidance for a driverless car using Convolutional Neural Networks. In another paper, a deep learning system based on a Faster Region-based Convolutional Neural Network was employed for the detection and classification of on-road obstacles such as vehicles, pedestrians, and animals. Tsung-Ming Hsu et al. presented a deep learning model that mimics driving behaviors by learning the dynamic information of the vehicle along with image information in order to improve the performance of a self-driving vehicle. To implement the model, they placed traffic cones on the road to collect scenes of obstacle avoidance.
Little work has been presented in the literature on cone detection with deep learning techniques. One work utilized a machine vision system with two monochrome cameras and two color cameras in order to recognize the color and position of traffic cones. Another study presents an overview of the object detection methods, sensors, and datasets used in autonomous driving applications. A further study focuses on the detection of a construction barrel, which includes a construction cone, a looper cone, a barricade, and four types of signs, via a collection of road images. Ankit Dhall et al. presented accurate real-time traffic cone detection together with estimation of the cones' positions in the 3D world. Another work presents an implementation of a robust autonomous driving algorithm that uses the Viola-Jones object detection method for traffic cone recognition. A separate study proposes a lightweight neural network that performs cone detection from a racing car in order to research autonomous driving. Finally, one work presents a deep architecture called ChangeNet for detecting changes between pairs of images and expressing them semantically. Its dataset has 11 different classes of structural changes, including traffic cones on the road.
3 Proposed System Architecture
The presented system utilizes the roadwork image dataset to identify traffic cones. Each image is fed into the YOLOv5 algorithm. YOLO is an acronym for 'You Only Look Once' and is a regression-based target detection algorithm that uses neural networks to provide real-time object detection. Its usefulness stems from the fact that it predicts the class and location of objects through the calculation of a loss function, thereby transforming the target detection problem into a regression problem. The algorithm incorporates the most advanced detection technologies available at the time and optimizes the implementation for best practice. In this implementation, we utilize YOLOv5, which is reported to hold the best performance among YOLO algorithms. It is based on the PyTorch framework, and its practicality stems from being a lightweight detector that can balance detection accuracy and model complexity under the constraints of processing platforms with limited memory and computation resources.
The architecture of the YOLOv5 model consists of three parts: (i) Backbone: CSPDarknet, (ii) Neck: PANet, and (iii) Head: YOLO Layer. The data are initially input to CSPDarknet for feature extraction and subsequently fed to PANet for feature fusion. Lastly, the YOLO Layer outputs the object detection results (i.e., class, score, location, size). The architecture of the model can be seen in Fig. 1.
4 Experimental Evaluation
4.1 Dataset description
The data used in this paper were collected and manually annotated under the framework of the H2020 HERON project. HERON aims to develop an integrated automated system to perform maintenance and upgrading roadwork tasks, such as sealing cracks, patching potholes, asphalt rejuvenation, autonomous replacement of CUD (removable urban pavement) elements, and painting road markings, while also supporting the pre- and post-intervention phases, including visual inspections and the dispensing and removal of traffic cones in an automated and controlled manner.
More specifically, to train and evaluate the deep learning object detector, a dataset of RGB images was collected and manually annotated using labelImg, a graphical image annotation tool. labelImg is written in Python and uses Qt for its graphical interface. The produced annotations are saved as .txt files that store the information of the annotated bounding boxes. In particular, for each RGB image (see Fig. 2a) a corresponding text file was generated (see Fig. 2b) that contains a number of rows equal to the number of bounding boxes (i.e., traffic cones) in the specific image. As one can observe in Fig. 2b, each row consists of five numbers: (i) an integer, starting at 0, that represents the class ID, which in our case always equals 0 since the cone detection task is a single-class problem; (ii) the horizontal coordinate x of the central pixel of the bounding box; (iii) the vertical coordinate y of the central pixel of the bounding box; (iv) the width w of the bounding box; and (v) the height h of the bounding box. It is noted that the central position of the bounding box (ii-iii), as well as its dimensions (iv-v), are real numbers on a scale of 0 to 1 and therefore represent the relative location and size of the bounding box with respect to the whole image.
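As an illustration of the annotation format described above, the following minimal sketch converts the normalized rows of one label file into pixel-space corner coordinates. The function name `parse_yolo_labels` and the example values are hypothetical and are not part of the HERON tooling.

```python
from typing import List, Tuple

# (class_id, x_min, y_min, x_max, y_max) in pixels
Box = Tuple[int, float, float, float, float]


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> List[Box]:
    """Convert normalized 'class x y w h' rows into pixel-space corner boxes."""
    boxes: List[Box] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:  # skip blank or malformed rows
            continue
        class_id = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:])
        boxes.append((
            class_id,
            (xc - w / 2) * img_w,  # left edge
            (yc - h / 2) * img_h,  # top edge
            (xc + w / 2) * img_w,  # right edge
            (yc + h / 2) * img_h,  # bottom edge
        ))
    return boxes


# Hypothetical annotation: one cone centered in a 100x200 image.
example = parse_yolo_labels("0 0.5 0.5 0.25 0.5", img_w=100, img_h=200)
```

An empty label file, which corresponds to an image without cones, yields an empty list.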
The dataset contains RGB data from heterogeneous sources and sensors (e.g., DSLR cameras, smartphones, UAVs). Furthermore, the images vary in terms of illumination conditions (e.g., overexposure, underexposure), environmental landscapes (e.g., highways, bridges, cities, countryside), and weather conditions (e.g., cold, hot, sunny, windy, cloudy, rainy, and snowy). In addition, several images include various types of occlusions, thus making the traffic cone detection task more challenging.
The dataset contains 540 RGB images in total, with resolutions ranging from 114×170 to 2,100×1,400, and 947 traffic cones overall. Representative samples of the dataset are shown in Fig. 3. Of the images in the whole dataset, 92.5% were used for training the deep model and 7.5% for testing its effectiveness. Of the training data, 80% were used for training and the remaining 20% for validation. The traffic cone detection dataset is available online at: https://github.com/ikatsamenis/Cone-Detection/ (accessed 8 May 2022).
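The split can be sketched as follows. Since 7.5% of 540 is 40.5, this sketch assumes that 40 images are held out for testing, which is consistent with the 500 images (400 for training, 100 for validation) reported for the training process. The file names, the seed, and the function `split_dataset` are hypothetical; the paper does not state how the images were assigned to each subset.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(items: Sequence[str], n_test: int, val_fraction: float,
                  seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items, hold out n_test for testing, then split the rest into train/val."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    test = shuffled[:n_test]
    trainval = shuffled[n_test:]
    n_val = round(val_fraction * len(trainval))
    val = trainval[:n_val]
    train = trainval[n_val:]
    return train, val, test


# Hypothetical file names standing in for the 540 dataset images.
names = [f"img_{i:03d}.jpg" for i in range(540)]
train, val, test = split_dataset(names, n_test=40, val_fraction=0.2)
print(len(train), len(val), len(test))
```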
4.2 Experimental setup - Model training
Hence, for the training process, we utilized 500 images, 400 of which were included in the train set and 100 in the validation set. The training data should include images without labeled objects (i.e., with empty .txt files); in particular, the number of negative samples without bounding boxes should equal the number of positive images containing objects. To this end, 50% of the data in both the train and validation sets (i.e., 200 and 50 images, respectively) are negative samples, while the rest contain at least one traffic cone. Lastly, to further generalize the learning process, we augmented the training data by horizontally flipping the corresponding images, thus increasing the train set size from 400 to 800.
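When an image is flipped horizontally, its annotations must be mirrored as well. With normalized coordinates, only the horizontal center changes, from x to 1 - x, while y, w, and h are preserved. The sketch below shows this label transformation only; the helper name `flip_labels_horizontally` is hypothetical, and the flipping of the image pixels themselves is assumed to be handled by an image library.

```python
from typing import List, Tuple

# (class_id, x_center, y_center, width, height), all normalized to [0, 1]
Label = Tuple[int, float, float, float, float]


def flip_labels_horizontally(labels: List[Label]) -> List[Label]:
    """Mirror normalized bounding boxes about the vertical axis of the image."""
    return [(c, 1.0 - x, y, w, h) for (c, x, y, w, h) in labels]


# Hypothetical cone in the left quarter of the image moves to the right quarter.
flipped = flip_labels_horizontally([(0, 0.25, 0.5, 0.1, 0.2)])
```

Negative samples need no label changes, since their annotation files are empty before and after the flip.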
The YOLO object detector was trained and evaluated using an NVIDIA Tesla K80 GPU with 12 GB of memory, provided by Google Colab. We trained the network for 200 epochs, using batches of size 32, and set the input image resolution to 448×448 pixels. This work is based on the YOLOv5 small model in order to reduce the computational cost of the detection task. As a result, the network takes up less than 15 MB of storage and thus can be easily embedded in smartphone applications and various low-memory digital devices or systems, including drones and microcontrollers.
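For orientation, the reported hyperparameters could be passed to the training script of the public Ultralytics YOLOv5 repository roughly as sketched below. This is an assumption rather than the exact command used in this work: the flag names follow that repository's `train.py` and may differ between versions, the dataset file `cones.yaml` is hypothetical, and starting from the `yolov5s.pt` checkpoint is assumed, since the text only states that the small model was used.

```python
from typing import List


def build_train_command(data_yaml: str) -> List[str]:
    """Assemble a train.py invocation from the hyperparameters reported in the text."""
    return [
        "python", "train.py",
        "--img", "448",             # input resolution of 448x448 pixels
        "--batch-size", "32",       # batches of size 32
        "--epochs", "200",          # 200 training epochs
        "--data", data_yaml,        # hypothetical dataset description file
        "--weights", "yolov5s.pt",  # YOLOv5 small model (assumed starting checkpoint)
    ]


command = build_train_command("cones.yaml")
print(" ".join(command))
```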
4.3 Evaluation metrics
The Intersection over Union (IoU) metric was employed to evaluate the performance of the proposed method. IoU is the most popular evaluation metric used in object detection benchmarks. To apply IoU, both the ground-truth bounding boxes and the bounding boxes predicted by our model are needed. The metric evaluates how close the predicted bounding boxes are to the ground-truth bounding boxes and is computed as the area of their intersection divided by the area of their union. The greater the region of overlap, the greater the IoU and, therefore, the detection accuracy, as shown in Fig. 4. Consequently, IoU is a number from 0 to 1 that quantifies the degree of overlap between prediction and ground truth.
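A minimal sketch of the IoU computation for two axis-aligned boxes is given below. It assumes boxes in corner format (x_min, y_min, x_max, y_max); the function name `iou` and the example boxes are illustrative only.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes in corner format."""
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 2x2 boxes that share a 1x1 corner region: intersection 1, union 7.
score = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

Identical boxes give an IoU of 1, and boxes that do not overlap give 0.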
4.4 Experimental Validation
The proposed algorithm reached an excellent average IoU score of 91.31% ± 5.42% with a confidence level of 95% over the data of the test set. Moreover, the network demonstrated an average prediction time of 0.065±0.029 seconds per image.
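One common way to summarize per-image scores as a mean with a margin at the 95% confidence level is the normal-approximation interval sketched below. This is an assumption for illustration: the text does not state how the reported margin was computed, the multiplier z = 1.96 is the usual choice for 95%, and the sample scores are hypothetical rather than taken from the test set.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_ci(values: Sequence[float], z: float = 1.96) -> Tuple[float, float]:
    """Return (mean, half-width) of a normal-approximation confidence interval."""
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Hypothetical per-image IoU scores, not the paper's data.
mean, half = mean_with_ci([0.8, 0.9, 1.0])
print(f"{mean:.4f} ± {half:.4f}")
```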
The experimental results using the YOLOv5 architecture are shown in Fig. 5. The first column corresponds to the original RGB images followed by their ground truth bounding boxes in the second column. Finally, the last column illustrates the predicted bounding boxes with their corresponding confidence scores.
In this paper, we presented and evaluated a YOLOv5-based approach for traffic cone recognition over a multisource roadwork image dataset. The technique relies on a deep learning framework and formulates traffic cone identification as an object detection task. The model achieved high scores and successfully handled the identification task. Future work should include the detection of additional ephemeral objects that are related to road network maintenance and development.
This work has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under grant agreement No 955356 (Improved Robotic Platform to perform Maintenance and Upgrading Roadworks: The HERON Approach).
- A case of study on traffic cone detection for autonomous racing on a Jetson platform. Iberian Conference on Pattern Recognition and Image Analysis, pp. 629–641.
- (2019) A survey on 3D object detection methods for autonomous driving applications. IEEE Transactions on Intelligent Transportation Systems 20 (10), pp. 3782–3795.
- (2019) Real-time 3D traffic cone detection for autonomous driving. In 2019 IEEE Intelligent Vehicles Symposium (IV), pp. 494–501.
- (2021) YOLOX: exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430.
- (2021) Road object detection: a comparative study of deep learning-based algorithms. Electronics 10 (16), pp. 1932.
- End-to-end deep learning for autonomous longitudinal and lateral control based on vehicle dynamics. Proceedings of the 2018 International Conference on Artificial Intelligence and Virtual Reality, pp. 111–114.
- Ultralytics/yolov5: initial release.
- (2022) Robotic maintenance of road infrastructures: the HERON project. arXiv preprint arXiv:2205.04164.
- (2022) Simultaneous precise localization and classification of metal rust defects for robotic-driven maintenance and prefabrication using residual attention U-Net. Automation in Construction 137, pp. 104182.
- (2020) Pixel-level corrosion detection on metal constructions by fusion of deep learning semantic and contour segmentation. In International Symposium on Visual Computing, pp. 160–169.
- (2020) Man overboard event detection from RGB and thermal imagery: possibilities and limitations. In Proceedings of the 13th ACM International Conference on PErvasive Technologies Related to Assistive Environments, pp. 1–6.
- (2016) On-road object detection using deep neural network. In 2016 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), pp. 1–4.
- (2020) Detection of road objects with small appearance in images for autonomous driving in various traffic situations using a deep learning based approach. IEEE Access 8, pp. 211164–211172.
- (2022) A two-stage industrial defect detection framework based on improved-YOLOv5 and optimized-Inception-ResNetV2 models. Applied Sciences 12 (2), pp. 834.
- (2018) Object detection with neural models, deep learning and common sense to aid smart mobility. In 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 859–863.
- (2019) Deep learning radar object detection and classification for urban automotive scenarios. In 2019 Kleinheubach Conference, pp. 1–4.
- (2017) Obstacle detection and classification using deep learning for tracking in high-speed autonomous driving. In 2017 IEEE Region 10 Symposium (TENSYMP), pp. 1–6.
- (2020) Multi-label deep learning models for continuous monitoring of road infrastructures. In Proceedings of the 13th ACM International Conference on PErvasive Technologies Related to Assistive Environments, pp. 1–7.
- (2019) Generalized intersection over union: a metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666.
- (2020) Deep learning techniques for obstacle detection and avoidance in driverless cars. In 2020 International Conference on Artificial Intelligence and Signal Processing (AISP), pp. 1–4.
- (2022) Temporary traffic control device detection for road construction projects using deep learning application. In Construction Research Congress (CRC), Arlington VA.
- Tzutalin/labelImg. Free software: MIT license. Available online: https://github.com/tzutalin/labelImg (accessed: 2022-05-05).
- (2018) ChangeNet: a deep learning architecture for visual change detection. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pp. 0–0.
- (2018) Deep learning for computer vision: a brief review. Computational Intelligence and Neuroscience 2018.
- (2020) Advanced driver-assistance system (ADAS) for intelligent transportation based on the recognition of traffic cones. Advances in Civil Engineering 2020.
- (2022) Lite-YOLOv5: a lightweight deep learning detector for on-board ship detection in large-scene Sentinel-1 SAR images. Remote Sensing 14 (4), pp. 1018.
- (2015) Real-time traffic cone detection for autonomous vehicle. In 2015 34th Chinese Control Conference (CCC), pp. 3718–3722.