Camera-Based Adaptive Trajectory Guidance via Neural Networks

In this paper, we introduce a novel method to capture visual trajectories for navigating an indoor robot in dynamic settings using streaming image data. First, an image processing pipeline is proposed to accurately segment trajectories from noisy backgrounds. Next, the captured trajectories are used to design, train, and compare two neural network architectures for predicting acceleration and steering commands for a line following robot over a continuous space in real time. Lastly, experimental results demonstrate the performance of the neural networks versus human teleoperation of the robot and the viability of the system in environments with occlusions and/or low-light conditions.



1 Introduction

Line following robots have a variety of use cases in education, entertainment, health care, factory/warehouse settings, and more [19, 4, 1, 8, 17]. However, the effectiveness of these types of mobile robots in realistic deployments is dependent upon their ability to negotiate environments in which obstacles are dynamic, trajectories are occluded, and lighting conditions vary. Existing control methods for robot line following utilize knowledge-based approaches that rely upon structured environments. While effective in simplistic scenarios, these methods are severely limited by their surroundings. Moreover, they do not generalize for real-world use due to the inherent uncertainty in states encountered during operation.

Fig. 1: A line following robot equipped with a stereo camera.

Analog sensors have historically been the most common approach to provide control for a line following robot. These sensors can provide inference in high-contrast binary environments, but often require dedicated circuits and fail when encountering discontinuous trajectories. Vision-based approaches such as sensor array matrices use the location of image pixel values to generate pulse-width modulation outputs for motor control. However, the assumption that information will be found in a certain region of interest is not always true. In comparison, a neural network can be trained to produce control outputs based on noisy images where information is contained in different areas and trajectories are discontinuous.

Neural network-based methods for line following often discretize the output space into steering commands by using classification models that only allow for lateral control. Although this enables the robot to steer correctly, it does not make use of the trajectory information to provide control longitudinally. In addition, discretizing the output action space results in a loss of precision. A continuous output space over both lateral and longitudinal velocities allows for reactive control based on the steepness of the trajectory, hence mimicking human behavior.

To address these issues, we first propose a decision tree thresholding algorithm that can adapt to active and cluttered environments using a minimal amount of surveyed data. Next, we design and compare the performance of two neural network architectures: a multilayer perceptron (MLP) and a 1D convolutional neural network (CNN). Our networks produce continuous output over both linear and angular velocities using a regression-based model. Finally, we demonstrate the usability of the system by comparing the decisions made by the neural networks against those of a human operator with a line following robot (Figure 1).


The remainder of this paper is organized as follows. Related research work is reviewed in Section 2. Section 3 provides the details of our proposed approach. Experimental results are presented and discussed in Section 4. We conclude in Section 5.

2 Related Work

Line following robots have traditionally used comparator circuits with analog sensors to detect the presence of a trajectory. For example, Punetha et al. implement an array of light dependent resistors and IR proximity sensors to follow a line in a binary environment with sharp contrast differences [14]. Their approach works in controlled environments, yet real-world conditions usually do not offer such simple scenarios and trajectories may not always be continuous.

Ismail et al. take a vision-based approach to line following [7]. The authors generate proportional pulse-width modulated outputs directly from images using a score calculated from a sensor array matrix, under the assumption that the line pixels lie in the center of the image. However, if the pixels denoting a straight line lie towards the periphery of the image, the system will fail to predict accurate control outputs. Similarly, Rahman et al. propose a line following vision system assuming that objects in the environment are static [16]. Although this approach may work in constrained environments, the expectation of a static environment contradicts most real-world scenarios.

Pomerleau uses a camera feed and laser rangefinder measurements with a single-layer neural network to steer a vehicle [13]. The activations of the network’s input layer are proportional to the blue channel of the image. Nevertheless, the activations of the network are not dependent on color since the implementation discards this information before passing the input to the network. In another implementation using a CNN for line following, Born and Lowrance discretize the output space to produce steering commands [3]. Likewise, Tai et al. implement a CNN with fully connected end layers activated using a softmax function to predict probabilities of members in a discrete steering action space multiplied with preset velocities [20].


3 Approach

In this section, we describe our approach to obtain visual trajectories in dynamic environments from a streaming camera feed. We use these trajectories with two different neural network architectures to provide acceleration and steering control for a mobile robot over a continuous space in real time. The performance of the networks is compared to human teleoperation of a line following robot.

3.1 Image Preprocessing

Fig. 2: The decision tree used for colored line segmentation.

To accurately segment the trajectories from noisy backgrounds, cope with obstacles that still move within the restricted field of view, and tolerate fluctuating lighting conditions, an HSV (hue, saturation, and value) threshold was trained. Since the robot may follow different colored lines, individual HSV thresholds must be learned. First, the robot’s operational environment was surveyed and approximately 20 images were collected for a single color of track in diverse conditions (e.g., under various lighting conditions, backgrounds, and objects in the frame).

After collecting the data, each pixel was labeled as either line or non-line. The labeled data was then passed to a decision tree (DT) classifier using the CART algorithm [5] with a Gini impurity measure for making the splits. The maximum depth of the DT was 2. This resulted in a 98% mean accuracy and the ability to threshold the image with conditionals on the hue and value parameters of the HSV color space (Figure 2). Separate DTs were trained for different colors, and the thresholds to apply are chosen according to the user input.
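The split criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `gini` and `best_threshold`, and the six hue samples, are hypothetical and serve only to show how a CART-style split on a single HSV channel is scored with Gini impurity.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split 'value <= threshold'."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(values)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical hue values (degrees) for six labeled pixels: 1 = line, 0 = non-line.
hues = [20, 25, 30, 100, 110, 120]
labels = [0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(hues, labels)
print(threshold, impurity)
```

A depth-2 tree repeats this search once more inside each of the two resulting partitions, which is how conditionals on both hue and value can arise.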

[Figure 3 stages: Raw RGB Image → Gaussian Blur → HSV Conversion → Thresholding with DT → Resize → Image Binarization]

Fig. 3: The image preprocessing pipeline for robot line following.

All images are filtered through an image preprocessing pipeline before being given to the neural networks (Figure 3). First, a Gaussian blur is applied to smooth the image. Then, the RGB image is converted to the HSV color space. After the conversion, the threshold learned by the DT for the specific line color is applied and a segmented image of the trajectory is obtained. The segmented trajectory is downsampled to 1,024 pixels, matching the network input size, to reduce computational complexity. Afterwards, the images are binarized and subsequently flattened for input to the neural networks.
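As a rough illustration of the thresholding, binarization, and flattening stages, the sketch below uses only the Python standard library. The bounds `HUE_MIN`, `HUE_MAX`, and `VALUE_MIN` are hypothetical stand-ins for the conditionals learned by the DT, and the blur and resize stages are omitted for brevity.

```python
import colorsys

# Hypothetical bounds standing in for the learned decision tree conditionals.
HUE_MIN, HUE_MAX = 0.55, 0.75  # hue on a 0..1 scale, roughly blue
VALUE_MIN = 0.30               # reject very dark pixels


def segment_and_flatten(rgb_image):
    """Threshold rows of (r, g, b) pixels in 0..255 into a flat list of 0/1."""
    flat = []
    for row in rgb_image:
        for r, g, b in row:
            h, _s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            is_line = HUE_MIN <= h <= HUE_MAX and v >= VALUE_MIN
            flat.append(1 if is_line else 0)
    return flat


# A 2x2 toy image: bright blue, light grey, very dark blue, and red.
image = [
    [(0, 0, 255), (200, 200, 200)],
    [(0, 0, 40), (255, 0, 0)],
]
mask = segment_and_flatten(image)
print(mask)
```

Only the bright blue pixel passes both conditions; the dark blue pixel has the right hue but fails the value test, which mirrors the use of both hue and value in the learned threshold.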

3.2 Trajectory Data Collection

We take a learning by demonstration approach to train the neural networks [2]. A track was laid out in a lighting-controlled environment and the dataset was manually collected. Multiple rounds of data collection were performed, each with a varied camera orientation, frame rate, and track color. The robot was teleoperated from a base station with the operator judging movement purely on the incoming images. The processed images and velocity outputs were recorded to a CSV file. To avoid unintentional learning of user speeds, the velocities were normalized to a unit vector.
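The unit-vector normalization can be sketched as follows. The function name `normalize_velocity` and the handling of a stationary sample (returning zeros) are assumptions made for illustration; the source does not say how zero-velocity samples were treated.

```python
import math


def normalize_velocity(linear, angular):
    """Scale a (linear, angular) velocity pair to unit Euclidean length."""
    norm = math.hypot(linear, angular)
    if norm == 0.0:
        # Assumption: a stationary sample is left as zeros.
        return 0.0, 0.0
    return linear / norm, angular / norm


# A fast and a slow operator issuing the same direction of motion
# map to the same normalized label.
lin, ang = normalize_velocity(3.0, 4.0)
slow_lin, slow_ang = normalize_velocity(0.3, 0.4)
print(lin, ang)
```

After normalization only the ratio between linear and angular velocity remains, so the absolute speed preferred by a particular operator is removed from the labels.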

The dataset was augmented by mirroring the images and negating the corresponding angular velocity. The final dataset consisted of 122,576 labeled images with a 72/20/8 training/test/validation split. Note that the velocities in the dataset are not uniformly distributed, as shown by the heatmaps in Figure 4.


(a) Angular velocities.
(b) Linear velocities.
Fig. 4: The trajectory dataset distribution heatmaps.
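The mirroring augmentation described in this subsection can be sketched on a flattened image as below. The function name `mirror_sample` and the explicit `width` argument are hypothetical; a tiny 2x3 image is used so the effect is easy to inspect.

```python
def mirror_sample(flat_image, width, linear, angular):
    """Flip a flattened image left to right and negate the angular velocity."""
    if len(flat_image) % width != 0:
        raise ValueError("image length must be a multiple of width")
    mirrored = []
    for start in range(0, len(flat_image), width):
        # Reverse each row independently to mirror about the vertical axis.
        mirrored.extend(reversed(flat_image[start:start + width]))
    return mirrored, linear, -angular


# A 2x3 binary image with the line on the left side of the frame.
image = [1, 0, 0,
         1, 1, 0]
mirrored, lin, ang = mirror_sample(image, width=3, linear=0.8, angular=0.6)
print(mirrored, lin, ang)
```

The linear velocity is unchanged because a mirrored curve is equally steep; only the turning direction flips.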

3.3 Neural Networks

In this subsection, we propose two neural network architectures for predicting the linear and angular steering velocities of a line following robot. A detailed justification is provided for the choice of the networks along with the network hyperparameters. In addition, we compare the validation versus training loss for each network.

| Layer Type | Hyperparameters | Output Shape |
|---|---|---|
| Input | 1024 | (1, 1024) |
| Convolution1D (filter size 3) | 307 | (307, 1022) |
| Dropout | 20% dropout | (307, 1022) |
| Max Pooling | Pool size 3 | (102, 1022) |
| Dense | 207 | (102, 207) |
| Batch Normalization | None | (102, 207) |
| Dropout | 10% dropout | (102, 207) |
| Convolution1D (filter size 1) | 100 | (102, 100) |
| Dense | 100 | (102, 100) |
| Batch Normalization | None | (102, 100) |
| Dropout | 20% dropout | (102, 100) |
| Dropout | 20% dropout | (102, 100) |
| Flatten | None | (1, 10200) |
| Dense | 2 | (1, 2) |

TABLE I: 1D CNN architecture.
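Several of the output shapes in Table I can be checked with simple arithmetic. The sketch below assumes a convolution with no padding and a stride of 1, and non-overlapping pooling applied along the 307-sized axis; none of these settings is stated in the text, so they are assumptions chosen because they reproduce the listed shapes.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Output length of a 1D convolution without padding."""
    return (input_length - kernel_size) // stride + 1


def pool_output_length(input_length, pool_size):
    """Output length of non-overlapping pooling (stride equal to pool size)."""
    return input_length // pool_size


conv_len = conv1d_output_length(1024, 3)  # 1024 inputs, filter size 3
pooled = pool_output_length(307, 3)       # pooling over the 307-sized axis
flattened = pooled * 100                  # 100 features per position before Flatten
print(conv_len, pooled, flattened)
```

Under these assumptions the values 1022, 102, and 10200 agree with the Convolution1D, Max Pooling, and Flatten rows of the table.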

3.3.1 1D CNN

2D CNNs are commonly used to detect local features in images. By using an overcomplete set of filters, variations of patterns can be learned for specific local features, which produces accurate local feature maps. Higher level feature maps in 2D CNNs correspond to larger regions of the input. Thus, generating abstractions by combining the lower level features can lead to good performance in tasks where spatial relationships are important [12]. In this work, most spatial relationships are not significant since the actions depend more on global features than on local features. Although many architectures were tested, a 2D CNN failed to converge for our line following application. 1D CNNs have performed well in signal and image processing applications [11]. Due to the sparsity of our data, a 1D CNN architecture was implemented with several fully connected layers for regression over the output velocities (Table I).

The preprocessed input image was flattened and fed into a 1D convolution layer, which generated 307 feature maps. The activation used after each convolution layer is the softsign function,

$$f(x) = \frac{x}{1 + |x|}.$$

In the final layer the activation is linear. The Adam optimizer [10] is used with a learning rate of 0.0001, an exponential decay rate of 0.9, and a mean squared error loss function. Max pooling, dropout [18], and batch normalization [6] are used throughout the network to improve generalization and prevent overfitting. The validation and training loss graphs for the network trained on the datasets are presented in Figure 5.
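For reference, the softsign activation is commonly defined as x / (1 + |x|). The short sketch below evaluates it at a few points; it is a generic illustration of the function, not code from the described system.

```python
def softsign(x):
    """Softsign activation: x / (1 + |x|), bounded in (-1, 1)."""
    return x / (1.0 + abs(x))


samples = [-3.0, -1.0, 0.0, 1.0, 3.0]
outputs = [softsign(x) for x in samples]
print(outputs)
```

The function is odd-symmetric and saturates smoothly, so positive and negative pre-activations are treated alike, while the linear final layer leaves the predicted velocities unconstrained by the activation.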

Fig. 5: Validation versus training loss using a 1D CNN.

3.3.2 Multilayer Perceptron

| Layer Type | Hyperparameters | Input Shape |
|---|---|---|
| Input | 1024 | (1, 1024) |
| Dense-1 | 300 | (1024, 300) |
| Dropout | 20% dropout | (1024, 300) |
| Dense-2 | 200 | (300, 200) |
| BatchNorm | None | (300, 200) |
| Dropout | 10% dropout | (300, 200) |
| Dense-3 | 200 | (300, 200) |
| Output | 2 | (200, 2) |

TABLE II: The MLP architecture.

The MLP architecture consists of three fully connected hidden layers with a varying number of nodes (Table II). The input layer consists of 1,024 nodes. Each node corresponds to a pixel of the preprocessed image and is forward propagated to produce two outputs at the last layer. These outputs correspond to the angular and linear velocities. Due to its additional parameters and the suitability of the MLP for regression-based problems [9], the accuracy metrics and performance achieved were slightly better than those of the 1D CNN model.
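The size of the MLP can be estimated from the layer widths. The sketch below assumes that Dense-3 maps 200 inputs to 200 outputs, that every dense layer has a bias term, and that batch normalization contributes four parameters per feature (scale, shift, moving mean, moving variance), as is typical in framework summaries. These are assumptions, but under them the total equals the 409,102 parameters reported for the MLP in Section 4.2.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """Scale, shift, moving mean, and moving variance per feature."""
    return 4 * n_features


# (inputs, outputs) for Dense-1, Dense-2, Dense-3, and the output layer.
layers = [(1024, 300), (300, 200), (200, 200), (200, 2)]
total = sum(dense_params(n_in, n_out) for n_in, n_out in layers)
total += batchnorm_params(200)  # batch normalization after Dense-2
print(total)
```

Most of the parameters (307,500) sit in the first dense layer, which connects every input pixel to every hidden node.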

For the activation functions, the ReLU function was used for all layers except the output layer. The optimizer and the loss function are identical to the ones used in the 1D CNN architecture. Due to the larger dimensionality of the MLP, the network is prone to overfitting. Dropout and batch normalization were used in the network to help alleviate this issue. To check for overfitting, the training and validation losses were calculated and observed throughout training. The absence of an inflection point and the steady decrease in both validation and training losses indicate that overfitting was minimized, as shown in Figure 6.


Fig. 6: Validation versus training loss using an MLP.

4 Experimental Results

In this section, we present the experimental results of our methods. The experiments were conducted at the University of Texas at Arlington Robotic Vision Laboratory. All experiments were conducted on a line following robot under realistic and challenging environmental conditions.

4.1 Robot Description

A robotic research platform was used for data collection, training, and experimental testing of the camera-based adaptive trajectory guidance system (Figure 1). The robot consists of a custom-built 1.5 ft × 1.5 ft chassis and is powered by two wheelchair motors in a differential drive configuration through a PID-enabled Roboteq motor controller. The on-board computer is an NVIDIA Jetson AGX Xavier running the Robot Operating System (ROS) [15]. ROS is used to communicate between the various systems of the robot, including the Intel RealSense D415 stereo camera. The camera has a resolution of and a frame rate set to 6 fps. In addition, the camera was angled downward to provide a better view of the ground.

4.2 Model Metrics

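| Metric | 1D CNN (Train) | 1D CNN (Validation) | MLP (Train) | MLP (Validation) |
|---|---|---|---|---|
| Accuracy | 0.924 | 0.884 | 0.925 | 0.907 |
| Loss | 0.025 | 0.038 | 0.020 | 0.025 |

TABLE III: The accuracy and loss for the neural networks on the training and validation datasets.

| Metric | 1D CNN (Linear) | 1D CNN (Angular) | MLP (Linear) | MLP (Angular) |
|---|---|---|---|---|
| MAE | 0.010 | 0.183 | 0.090 | 0.162 |
| RMSE | 0.148 | 0.258 | 0.142 | 0.247 |

TABLE IV: The mean absolute error (MAE) and root mean square error (RMSE) for the neural networks.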

The model metrics obtained while training are displayed in Tables III and IV. The MLP has a lower validation loss and higher validation accuracy than the 1D CNN. To test the performance of the models, the root mean square error and mean absolute error were calculated on 24,516 test samples. The two models performed similarly, with the 1D CNN using 35% fewer parameters than the MLP (265,519 vs. 409,102).
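The error metrics and the parameter comparison can be reproduced with a few lines of code. In the sketch below the `human` and `network` velocity lists are made-up values for illustration only; the parameter counts are the ones quoted in the text.

```python
import math


def mae(targets, predictions):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def rmse(targets, predictions):
    """Root mean square error between two equal-length sequences."""
    squared = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    return math.sqrt(squared / len(targets))


# Hypothetical angular velocities: operator labels versus network predictions.
human = [0.0, 0.5, -0.5, 1.0]
network = [0.1, 0.3, -0.5, 0.6]
print(mae(human, network), rmse(human, network))

# Relative parameter reduction of the 1D CNN with respect to the MLP.
reduction = 1.0 - 265_519 / 409_102
print(f"{reduction:.1%}")
```

Because RMSE squares each error before averaging, it is never smaller than MAE and reacts more strongly to occasional large deviations.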

4.3 Human Operator Versus Neural Network Decision Comparison

(a) MLP linear performance.
(b) MLP angular performance.
(c) CNN linear performance.
(d) CNN angular performance.
Fig. 7: The velocity decisions of the human operator versus the neural network.

To test the viability of our system in changing conditions, a comparison between human teleoperation velocities and neural network predicted velocities was performed (Figure 7). Both the 1D CNN and the MLP architectures were tested on a new test track with a different color. To enable the neural network predictions, a threshold was learned for the new color and the image processing pipeline was updated accordingly. It is worth noting that the mean absolute error differs with each teleoperation, as no two human-operated runs are exactly the same. Although the mean absolute error is higher than on the test set, performance does not degrade much even under low-light conditions, and the networks follow the human decisions with a moderate amount of error (Table V). We also observe from the 1D CNN and MLP linear performance graphs that the robot is able to adjust its linear velocity based on the trajectory ahead, similar to a human operator. To portray the information clearly, the linear velocity is scaled down and has not been normalized.

| Model | Angular MAE | Linear MAE |
|---|---|---|
| MLP (Low Light) | 0.244 | 0.177 |
| CNN (Low Light) | 0.283 | 0.259 |
| MLP (Full Light) | 0.266 | 0.163 |
| CNN (Full Light) | 0.333 | 0.196 |

TABLE V: The mean absolute error in human versus neural network decision.

4.4 Occlusion Scenarios

Fig. 8: An example of a trajectory produced (right) by an occluded track (left).

In this experiment, we test how well the robot performs when the track is occluded. To do this, multiple sections of the track were blocked to show only small portions of the curved trajectory and the predicted velocities were recorded (Figure 8). In these tests, the robot correctly predicts the movement under the condition that at least one section of trajectory is consistently present in each camera frame.

5 Conclusion

In summary, we have presented a system for adapting a line following robot to a noisy, dynamic, and non-binary environment using two occlusion- and lighting-tolerant neural network architectures. The 1D CNN model achieved performance close to that of the MLP model in all metrics while using 35% fewer parameters. In addition, the performance of both architectures in low-light conditions as well as the adjustment of linear velocities was demonstrated through the comparison of human actions with network predictions. The experimental results showed that the networks perform similarly to a remote human operator.

Our system can be expanded to work with more track colors through the training of multiple threshold DTs. For example, multiple colors of track could be used simultaneously to allow for live switching of thresholds based on the most prevalent color. Additionally, individual neural network models could be trained for each color possibly negating the use of a separately trained threshold. The performance of the multi-model system may then be compared to that of a single model trained on the whole multi-color dataset to determine if dynamic switching truly provides a benefit.


  • [1] F. Ang, M. K. A. R. Gabriel, J. Sy, J. J. O. Tan, and A. C. Abad (2013) Automated waste sorter with mobile robot delivery waste system. In De La Salle University Research Congress, pp. 7–9.
  • [2] B. D. Argall, S. Chernova, M. Veloso, and B. Browning (2009) A survey of robot learning from demonstration. Robotics and Autonomous Systems 57 (5), pp. 469–483.
  • [3] W. Born and C. J. Lowrance (2018) Application of convolutional neural network image classification for a path-following robot.
  • [4] I. Colak and D. Yildirim (2009) Evolving a line following robot to use in shopping centers for entertainment. In 2009 35th Annual Conference of IEEE Industrial Electronics, pp. 3803–3807.
  • [5] D. G. Denison, B. K. Mallick, and A. F. Smith (1998) A Bayesian CART algorithm. Biometrika 85 (2), pp. 363–377.
  • [6] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
  • [7] A. H. Ismail, H. R. Ramli, M. Ahmad, and M. H. Marhaban (2009) Vision-based system for line following mobile robot. In 2009 IEEE Symposium on Industrial Electronics & Applications, Vol. 2, pp. 642–645.
  • [8] T. Jain, R. Sharma, and S. Chauhan (2014) Applications of line follower robot in medical field. International Journal of Research 1 (11), pp. 409–412.
  • [9] R. Jankovic and A. Amelio (2018) Comparing multilayer perceptron and multiple regression models for predicting energy use in the Balkans. CoRR abs/1810.11333.
  • [10] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980.
  • [11] S. Kiranyaz, T. Ince, O. Abdeljaber, O. Avci, and M. Gabbouj (2019) 1-D convolutional neural networks for signal processing applications. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8360–8364.
  • [12] M. Lin, Q. Chen, and S. Yan (2013) Network in network. arXiv preprint arXiv:1312.4400.
  • [13] D. A. Pomerleau (1989) ALVINN: an autonomous land vehicle in a neural network. In Advances in Neural Information Processing Systems, pp. 305–313.
  • [14] D. Punetha, N. Kumar, and V. Mehta (2013) Development and applications of line following robot based health care management system. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) 2 (8), pp. 2446–2450.
  • [15] M. Quigley, K. Conley, B. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng (2009) ROS: an open-source robot operating system. In ICRA Workshop on Open Source Software, Vol. 3, pp. 5.
  • [16] M. Rahman, M. H. R. Rahman, A. L. Haque, and M. T. Islam (2005) Architecture of the vision system of a line following mobile robot operating in static environment. In 2005 Pakistan Section Multitopic Conference, pp. 1–8.
  • [17] U. Scholar (2016) Serving robot: new generation electronic waiter. International Journal of Engineering Science 3763.
  • [18] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov (2014) Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15 (1), pp. 1929–1958.
  • [19] J. Su, C. Lee, H. Huang, S. Chuang, and C. Lin (2010) An intelligent line-following robot project for introductory robot courses. Science World Transactions on Engineering and Technology Education, Lunghwa University of Science and Technology, Taoyuan County, Taiwan 8 (4), pp. 1–7.
  • [20] L. Tai, S. Li, and M. Liu (2016) A deep-network solution towards model-less obstacle avoidance. In 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2759–2764.