Line following robots have a variety of use cases in education, entertainment, health care, factory/warehouse settings, and more [19, 4, 1, 8, 17]. However, the effectiveness of these types of mobile robots in realistic deployments is dependent upon their ability to negotiate environments in which obstacles are dynamic, trajectories are occluded, and lighting conditions vary. Existing control methods for robot line following utilize knowledge-based approaches that rely upon structured environments. While effective in simplistic scenarios, these methods are severely limited by their surroundings. Moreover, they do not generalize for real-world use due to the inherent uncertainty in states encountered during operation.
Analog sensors have historically been the most common means of providing control for line following robots. These sensors can provide inference in high-contrast binary environments, but they often require dedicated circuits and fail when encountering discontinuous trajectories. Vision-based approaches such as sensor array matrices use the locations of image pixel values to generate pulse-width modulation outputs for motor control. However, the assumption that information will be found in a certain region of interest does not always hold. In comparison, a neural network can be trained to produce control outputs from noisy images in which information is contained in different areas and trajectories are discontinuous.
Neural network-based methods for line following often discretize the output space into steering commands by using classification models that only allow for lateral control. Although this enables the robot to steer correctly, it does not make use of the trajectory information to provide control longitudinally. In addition, discretizing the output action space results in a loss of precision. A continuous output space over both lateral and longitudinal velocities allows for reactive control based on the steepness of the trajectory, hence mimicking human behavior.
To address these issues, we first propose a decision tree thresholding algorithm that can adapt to active and cluttered environments using a minimal amount of surveyed data. Next, we design and compare the performance of two neural network architectures: a multilayer perceptron (MLP) and a 1D convolutional neural network (CNN). Our networks produce continuous outputs over both linear and angular velocities using a regression-based model. Finally, we demonstrate the usability of the system by comparing the decisions made by the neural networks against those of a human operator with a line following robot (Figure 1).
2 Related Work
Line following robots have traditionally used comparator circuits with analog sensors to detect the presence of a trajectory. For example, Punetha et al. implement an array of light dependent resistors and IR proximity sensors to follow a line in a binary environment with sharp contrast differences. Their approach works in controlled environments, yet real-world conditions rarely offer such simple scenarios and trajectories may not always be continuous.
Ismail et al. take a vision-based approach to line following. The authors generate proportional pulse-width modulated outputs directly from images using a score calculated from a sensor array matrix, under the assumption that line pixels lie in the center of the image. However, if pixels denoting a straight line appear toward the periphery of the image, the system will fail to predict accurate control outputs. Similarly, Rahman et al. propose a line following vision system that assumes objects in the environment are static. Although this approach may work in constrained environments, the expectation of a static environment contradicts most real-world scenarios.
Pomerleau uses a camera feed and laser rangefinder measurements with a single-layer neural network to steer a vehicle. The activations of the network's input layer are proportional to the blue channel of the image. Nevertheless, the activations are not dependent on color, since the implementation discards this information before passing the input to the network. In another implementation using a CNN for line following, Borne and Lowrance discretize the output space to produce steering commands. Likewise, Tai et al. implement a CNN whose fully connected end layers, activated with a softmax function, predict probabilities over a discrete steering action space that are multiplied with preset velocities.
3 Methodology
In this section, we describe our approach to obtaining visual trajectories in dynamic environments from a streaming camera feed. We use these trajectories with two different neural network architectures to provide acceleration and steering control for a mobile robot over a continuous space in real time. The performance of the networks is compared to human teleoperation of a line following robot.
3.1 Image Preprocessing
To accurately segment the trajectories from noisy backgrounds and from obstacles that remain in motion even after the field of view is restricted, and to be tolerant of fluctuating lighting conditions, an HSV (hue, saturation, and value) threshold was trained. Since the robot may follow lines of different colors, individual HSV thresholds must be learned. First, the robot's operational environment was surveyed and approximately 20 images were collected for a single track color in diverse conditions (e.g., various lighting conditions, backgrounds, and objects in the frame).
After collecting the data, each pixel was labeled as either line or non-line. The labeled data was then passed to a decision tree (DT) classifier using the CART algorithm with a Gini impurity measure for making the splits. The maximum depth of the DT was set to 2. This resulted in a 98% mean accuracy and the ability to threshold the image with conditionals on the hue and value parameters of the HSV color space (Figure 2). Separate DTs were trained for different colors, and the appropriate threshold is selected according to user input.
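The threshold-learning step above can be sketched with scikit-learn's CART-based decision tree (Gini impurity, maximum depth 2). The pixel data here is synthetic and purely illustrative; the real classifier is trained on labeled pixels from the surveyed images.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic HSV pixels: "line" pixels cluster around a hue of ~30 with a high
# value channel; "non-line" pixels are spread across the whole space.
line = np.column_stack([rng.normal(30, 5, 500),      # hue
                        rng.uniform(80, 255, 500),   # saturation
                        rng.normal(200, 20, 500)])   # value
non_line = rng.uniform(0, 255, (500, 3))
X = np.vstack([line, non_line])
y = np.array([1] * 500 + [0] * 500)  # 1 = line, 0 = non-line

# A depth-2 tree yields at most a few split conditions, i.e. simple
# conditionals on individual HSV channels that act as the threshold.
dt = DecisionTreeClassifier(criterion="gini", max_depth=2).fit(X, y)
print(f"training accuracy: {dt.score(X, y):.2f}")
```

A shallow tree is attractive here because its splits translate directly into cheap per-pixel conditionals (e.g. on hue and value) that can run in the real-time preprocessing pipeline.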
All images are initially filtered through an image preprocessing pipeline before being given to the neural networks (Figure 3). First, a Gaussian blur is applied to smooth the image. Then, the RGB image is converted to the HSV color space. After the conversion, the line-color threshold learned by the DT is applied and a segmented image of the trajectory is obtained. The segmented trajectory is downsampled to reduce computational complexity. Afterwards, the images are binarized and subsequently flattened for input to the neural networks.
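A simplified NumPy-only sketch of the later pipeline stages is shown below. It assumes the blur and RGB-to-HSV conversion (done with a library such as OpenCV in practice) have already been applied, and the 32×32 target resolution is an assumption chosen only because it matches the MLP's 1,024-node input layer described later.

```python
import numpy as np

def preprocess(hsv_image, lower, upper, size=32):
    """Threshold an HSV image, downsample, binarize, and flatten."""
    # Apply the learned per-channel threshold (conditionals on H, S, V).
    mask = np.all((hsv_image >= lower) & (hsv_image <= upper), axis=-1)
    # Naive downsampling by striding; production code would use proper resizing.
    h, w = mask.shape
    small = mask[:: max(h // size, 1), :: max(w // size, 1)][:size, :size]
    # Binarize (mask is already boolean) and flatten for the networks' input.
    return small.astype(np.float32).ravel()

hsv = np.random.default_rng(1).integers(0, 256, (480, 640, 3))
x = preprocess(hsv, lower=np.array([20, 80, 150]), upper=np.array([40, 255, 255]))
print(x.shape)  # (1024,)
```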
3.2 Trajectory Data Collection
We take a learning by demonstration approach to train the neural networks. A track was laid out in a lighting-controlled environment and the dataset was manually collected. Multiple rounds of data collection were performed, each with a varied camera orientation, frame rate, and track color. The robot was teleoperated from a base station, with the operator judging movement purely from the incoming images. The processed images and velocity outputs were recorded to a CSV file. To avoid unintentionally learning user-specific speeds, the velocities were normalized to a unit vector.
The dataset was augmented by mirroring the images and negating the corresponding angular velocities. The final dataset consisted of 122,576 labeled images with a 72/20/8 training/test/validation split. Note that the distribution of velocities in the dataset is not uniform, as shown by the heatmaps in Figure 4.
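The mirroring augmentation can be sketched as follows: each flattened binary image is flipped horizontally and its angular velocity label is negated while the linear velocity is preserved, doubling the dataset. Shapes and variable names here are illustrative.

```python
import numpy as np

def augment(images, velocities, width=32):
    """images: (N, width*width) flattened binary images.
    velocities: (N, 2) array of (linear, angular) labels."""
    frames = images.reshape(-1, width, width)
    mirrored = frames[:, :, ::-1].reshape(images.shape)  # flip left-right
    mirrored_vel = velocities * np.array([1.0, -1.0])    # negate angular only
    return (np.vstack([images, mirrored]),
            np.vstack([velocities, mirrored_vel]))

X = np.zeros((4, 1024))
y = np.array([[0.5, 0.2]] * 4)
X_aug, y_aug = augment(X, y)
print(X_aug.shape, y_aug.shape)  # (8, 1024) (8, 2)
```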
3.3 Neural Networks
In this subsection, we propose two neural network architectures for predicting the linear and angular steering velocities of a line following robot. A detailed justification is provided for the choice of the networks along with the network hyperparameters. In addition, we compare the validation versus training loss for each network.
| Layer Type | Hyperparameters | Output Shape |
|---|---|---|
| Convolution1D (Filter Size - 3) | 307 | (307, 1022) |
| Max Pooling | Pool Size - 3 | (102, 1022) |
| Convolution1D (Filter Size - 1) | 100 | (102, 100) |
3.3.1 1D CNN
2D CNNs are commonly used to detect local features in images. By using an overcomplete set of filters, variations of patterns can be learned for specific local features, producing accurate local feature maps. Higher-level feature maps in 2D CNNs correspond to larger input regions. Thus, generating abstractions by combining lower-level features can lead to good performance in tasks where spatial relationships are important. In this work, most spatial relationships are not significant, since the actions depend more on global features than on local features. Although many architectural variants were tested, a 2D CNN failed to converge for our line following application. 1D CNNs have performed well in signal and image processing applications. Due to the sparsity of our data, a 1D CNN architecture was implemented with several fully connected layers for regression over the output velocities (Table I).
The preprocessed input image was flattened and fed into a 1D convolution layer, which generated 307 feature maps. The activation used after each convolution layer is the softsign function, f(x) = x / (1 + |x|).
In the final layer the activation is linear. The Adam optimizer is used with a learning rate of 0.0001, an exponential decay rate of 0.9, and a mean squared error loss function. Max pooling, dropout, and batch normalization are used regularly throughout the network to improve generalization and prevent overfitting. The validation and training loss graphs for the network trained on the datasets are presented in Figure 5.
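A minimal Keras (TensorFlow) sketch of this architecture is given below. The convolution and pooling settings follow the text and Table I; the head after the second convolution, the dropout rate, and the hidden width are assumptions since the full layer table is not reproduced here, so the parameter count will not match the paper's.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(1024, 1)),                  # flattened binary image
    layers.Conv1D(307, 3, activation="softsign"),   # 307 feature maps, filter size 3
    layers.MaxPooling1D(3),
    layers.BatchNormalization(),
    layers.Conv1D(100, 1, activation="softsign"),   # filter size 1
    layers.GlobalAveragePooling1D(),                # assumed; full head not shown
    layers.Dropout(0.5),                            # assumed rate
    layers.Dense(64, activation="softsign"),        # assumed hidden width
    layers.Dense(2, activation="linear"),           # linear and angular velocity
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.9),
              loss="mse")
print(model.output_shape)  # (None, 2)
```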
3.3.2 Multilayer Perceptron
| Layer Type | Hyperparameters | Input Shape |
|---|---|---|
The MLP architecture consists of three fully connected hidden layers with varying numbers of nodes (Table II). The input layer consists of 1,024 nodes. Each node corresponds to a pixel of the preprocessed image, and the input is forward propagated to produce two outputs at the last layer. These outputs correspond to the angular and linear velocities. Due to its additional parameters and the suitability of MLPs for regression-based problems, the accuracy metrics and performance achieved were slightly better than those of the 1D CNN model.
For the activation functions, ReLU was used for all layers except the output layer. The optimizer and loss function are identical to those used in the 1D CNN architecture. Due to the larger dimensionality of the MLP, the network is prone to overfitting; dropout and batch normalization were used to help alleviate this. To check for overfitting, the training and validation losses were monitored throughout training. The absence of an inflection point and the steady decrease in both validation and training losses indicate that overfitting was minimized, as shown in Figure 6.
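The MLP's forward propagation can be illustrated with a toy NumPy pass: a 1,024-node input, three fully connected ReLU hidden layers, and a 2-node linear output for the linear and angular velocities. The hidden widths and random weights are illustrative assumptions, not the paper's trained values.

```python
import numpy as np

rng = np.random.default_rng(42)
sizes = [1024, 128, 64, 32, 2]   # input, three hidden layers, output (widths assumed)
weights = [rng.normal(0, 0.05, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)       # ReLU hidden layers
    return x @ weights[-1] + biases[-1]      # linear output layer

# One flattened 32x32 binary image in, (linear, angular) velocity pair out.
velocities = forward(rng.integers(0, 2, 1024).astype(np.float32))
print(velocities.shape)  # (2,)
```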
4 Experimental Results
In this section, we present the experimental results of our methods. The experiments were conducted at the University of Texas at Arlington Robotic Vision Laboratory. All experiments were conducted on a line following robot under realistic and challenging environmental conditions.
4.1 Robot Description
A robotic research platform was used for data collection, training, and experimental testing of the camera-based adaptive trajectory guidance system (Figure 1). The robot consists of a custom-built 1.5 ft × 1.5 ft chassis and is powered by two wheelchair motors in a differential drive configuration through a PID-enabled Roboteq motor controller. The on-board computer is an NVIDIA Jetson AGX Xavier running the Robot Operating System (ROS). ROS is used to communicate between the various systems of the robot, including the Intel RealSense D415 stereo camera. The camera's frame rate was set to 6 fps, and the camera was angled downward to provide a better view of the ground.
4.2 Model Metrics
The model metrics obtained during training are displayed in Tables III and IV. The MLP has a lower validation loss and higher validation accuracy than the 1D CNN. To test the performance of the models, the root mean square error and mean absolute error were calculated on 24,516 test samples. The 1D CNN performed as well as the MLP while using 35% fewer parameters (265,519 vs. 409,102).
4.3 Human Operator Versus Neural Network Decision Comparison
To test the viability of our system in changing conditions, a comparison between human teleoperation velocities and neural network predicted velocities was performed (Figure 7). Both the 1D CNN and the MLP architectures were tested on a new test track with a different color. To enable the neural network predictions, a threshold was learned for the new color and the image processing pipeline was updated accordingly. It is worth noting that the mean absolute error differs with each teleoperation, as no two human-operated runs will be exactly the same. Although the mean absolute error is slightly higher than the test values, performance does not degrade much even under low-light conditions, and the system follows the human decisions with a moderate amount of error (Table V). We also observe from the 1D CNN and MLP linear performance graphs that the robot is able to adjust its linear velocity based on the trajectory ahead, similar to a human operator. To portray the information clearly, the linear velocity is scaled down and has not been normalized.
| Model | Angular MAE | Linear MAE |
|---|---|---|
| MLP (Low Light) | 0.244 | 0.177 |
| CNN (Low Light) | 0.283 | 0.259 |
| MLP (Full Light) | 0.266 | 0.163 |
| CNN (Full Light) | 0.333 | 0.196 |
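The mean absolute errors in the table above can be computed as the per-frame absolute differences between the human operator's commanded velocities and the network's predictions, averaged over a run. The values below are synthetic, for illustration only.

```python
import numpy as np

# Each row is one frame's (linear, angular) velocity pair.
human = np.array([[0.8, 0.1], [0.7, -0.2], [0.9, 0.0]])
predicted = np.array([[0.7, 0.2], [0.75, -0.1], [0.8, 0.1]])

# Mean absolute error per output channel, averaged over frames.
linear_mae, angular_mae = np.mean(np.abs(human - predicted), axis=0)
print(round(linear_mae, 3), round(angular_mae, 3))  # 0.083 0.1
```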
4.4 Occlusion Scenarios
In this experiment, we test how well the robot performs when the track is occluded. To do this, multiple sections of the track were blocked to show only small portions of the curved trajectory and the predicted velocities were recorded (Figure 8). In these tests, the robot correctly predicts the movement under the condition that at least one section of trajectory is consistently present in each camera frame.
5 Conclusion
In summary, we have presented a system for adapting a line following robot to a noisy, dynamic, and non-binary environment using two occlusion- and lighting-tolerant neural network architectures. The 1D CNN model achieved performance close to the MLP model in all metrics while using 35% fewer parameters. In addition, the performance of both architectures in low-light conditions, as well as their adjustment of linear velocities, was demonstrated through the comparison of human actions with network predictions. The experimental results showed that the networks perform similarly to a remote human operator.
Our system can be expanded to work with more track colors through the training of multiple threshold DTs. For example, multiple track colors could be used simultaneously to allow live switching of thresholds based on the most prevalent color. Additionally, individual neural network models could be trained for each color, potentially eliminating the need for a separately trained threshold. The performance of such a multi-model system could then be compared to that of a single model trained on the whole multi-color dataset to determine whether dynamic switching truly provides a benefit.
-  (2013) Automated waste sorter with mobile robot delivery waste system. In De La Salle University Research Congress, pp. 7–9. Cited by: §1.
-  (2009) A survey of robot learning from demonstration. Robotics and autonomous systems 57 (5), pp. 469–483. Cited by: §3.2.
-  (2018) Application of convolutional neural network image classification for a path-following robot. Cited by: §2.
-  (2009) Evolving a line following robot to use in shopping centers for entertainment. In 2009 35th Annual Conference of IEEE Industrial Electronics, pp. 3803–3807. Cited by: §1.
-  (1998) A bayesian cart algorithm. Biometrika 85 (2), pp. 363–377. Cited by: §3.1.
- Ioffe, S. and Szegedy, C. (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. Cited by: §3.3.1.
-  (2009) Vision-based system for line following mobile robot. In 2009 IEEE Symposium on Industrial Electronics & Applications, Vol. 2, pp. 642–645. Cited by: §2.
-  (2014) Applications of line follower robot in medical field. International Journal of Research 1 (11), pp. 409–412. Cited by: §1.
-  (2018) Comparing multilayer perceptron and multiple regression models for predicting energy use in the balkans. CoRR abs/1810.11333. Cited by: §3.3.2.
- Kingma, D. P. and Ba, J. (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §3.3.1.
-  (2019) 1-D convolutional neural networks for signal processing applications. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8360–8364. Cited by: §3.3.1.
- Lin, M., Chen, Q., and Yan, S. (2013) Network in network. arXiv preprint arXiv:1312.4400. Cited by: §3.3.1.
- Pomerleau, D. A. (1989) ALVINN: an autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems, pp. 305–313. Cited by: §2.
-  (2013) Development and applications of line following robot based health care management system. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) 2 (8), pp. 2446–2450. Cited by: §2.
- Quigley, M. et al. (2009) ROS: an open-source robot operating system. In ICRA Workshop on Open Source Software, Vol. 3, p. 5. Cited by: §4.1.
-  (2005) Architecture of the vision system of a line following mobile robot operating in static environment. In 2005 Pakistan Section Multitopic Conference, pp. 1–8. Cited by: §2.
-  (2016) Serving robot: new generation electronic waiter. International Journal of Engineering Science 3763. Cited by: §1.
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958. Cited by: §3.3.1.
-  (2010) An intelligent line-following robot project for introductory robot courses. Science World Transactions on Engineering and Technology Education, Lunghwa University of Science and Technology, Taoyuan County, Taiwan 8 (4), pp. 1–7. Cited by: §1.
-  (2016) A deep-network solution towards model-less obstacle avoidance. In 2016 IEEE/RSJ international conference on intelligent robots and systems (IROS), pp. 2759–2764. Cited by: §2.