Machine learning is defined as a set of methods that can
automatically detect patterns in data, and then use the uncovered patterns to predict future
data and perform decision making under uncertainty [Murphy2012].
In the field of robotics, machine learning has been widely used to accurately approximate models of robots without the need for analytical models, which may be hard to obtain due to the complexity of the system [Nguyen-Tuong2011].
Many algorithms have been developed for solving the regression problem in robot modelling [Sigaud2011]. However, in order to build good models, the data must be carefully analysed, since outliers and noise may affect the results. Most of the proposed methods do not address the problem of outlier rejection and usually assume that data points follow a known distribution (typically Gaussian).
An outlier is defined as a data point significantly different from the others [Hawkins1980] and the presence of unwanted data points may lead to a wrong model describing the relationship between the input and the output values [Rousseeuw2003].
Different approaches exist to detect outliers in datasets [Aggarwal2013]; yet, all these outlier detection methods rely either on one particular technique or on knowledge of the statistical distribution of the data. Moreover, they can be regarded as preprocessing approaches: outliers first have to be detected with one of these methods, and only afterwards can regression methods be applied to the good data points to find the appropriate model. Robust regression approaches such as iteratively reweighted least squares [Street1988] or random sample consensus (RANSAC) [Fischler1981] for linear regression, instead, allow outliers to be neglected while the regression process takes place.
In this work, a novel approach for robust data modelling and input/output mapping is presented. The method does not require any preprocessing to identify outliers, since they are automatically found while learning the model. Moreover, no assumption on the data distribution is required. The proposed method has been validated using neural networks for regression, yet it can be generalized to any other regression method, such as linear regression or Gaussian process regression.
The paper is thus structured as follows.
Section II presents the proposed method. Section III shows the results on simulated and real data. For the simulated data, two cases are analyzed: linear and nonlinear regression. For the real application, the method is applied to model the dynamics of a tendon-driven surgical robot. Conclusions are then drawn in Section IV.
Given a dataset of input points $\mathbf{x}_i$ and output points $\mathbf{y}_i$, the goal of regression is to find the best relationship between the two, meaning
$\mathbf{y} = f(\mathbf{x})$,
where $f$ can be any linear or nonlinear function. Artificial neural networks (ANNs) can model any suitably smooth function, given enough hidden units, to any desired level of accuracy [Hornik1991]. They are thus capable of representing complicated behaviours without the need of knowing any mathematical or physical model. Nevertheless, it has been shown that NN behaviour is influenced by outliers [Khamis2005, Liano1996].
In order to build a model which is not affected by bad data, outliers must be detected and neglected. Given $N$ data points, let $\hat{y}_{i,j}$ denote the estimated value of output component $j$ for sample $i$. For each output dimension $j$, the vector of residuals $r_{i,j} = y_{i,j} - \hat{y}_{i,j}$ is computed. The median $m_j$ and the median absolute deviation $\mathrm{MAD}_j$ of each residual vector are calculated, and the threshold is then set to $t_j = \lambda\,\mathrm{MAD}_j$, with $\lambda$ being a positive constant.
Once the medians and the thresholds have been retrieved, each data sample is assigned an output weight as follows:
$w_{i,j} = \begin{cases} 1 & \text{if } |r_{i,j} - m_j| \le t_j \\ 0 & \text{otherwise} \end{cases}$
A data point for a certain output component is thus an outlier if its residual is too far from the median of the residuals.
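The outlier-weighting rule can be sketched as follows. This is a minimal illustration of a median/MAD threshold; the function name and the default constant `c` are illustrative choices, not taken from the paper.

```python
from statistics import median

def mad_weights(residuals, c=3.0):
    """Assign weight 0 to residuals lying farther than c * MAD from the
    median residual (outliers), and weight 1 otherwise (inliers)."""
    m = median(residuals)
    mad = median(abs(r - m) for r in residuals)
    threshold = c * mad
    return [1.0 if abs(r - m) <= threshold else 0.0 for r in residuals]
```

Because both the centre (median) and the scale (MAD) are order statistics, a few gross outliers barely move the threshold, unlike a mean/standard-deviation rule.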
In order to build a robust model from a given dataset, an iterative re-weighting process is performed. At first, each sample for each output component is assigned a unitary weight and a first model is built. Then the inliers and their weights are computed, and the model is refined with the new weights. The process continues until a desired number of refinements is reached.
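The re-weighting loop above can be sketched with a stand-in base learner. The sketch below uses weighted linear least squares on a scalar input instead of a neural network, purely to keep the example self-contained; all names and the constant `c` are illustrative.

```python
from statistics import median

def weighted_line_fit(xs, ys, ws):
    """Closed-form weighted least squares for y = a*x + b."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    num = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    den = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    a = num / den
    return a, my - a * mx

def robust_fit(xs, ys, c=3.0, refinements=5):
    """Iteratively re-weighted fit: start with unitary weights, then
    zero out samples whose residual is too far from the median."""
    ws = [1.0] * len(xs)
    for _ in range(refinements):
        a, b = weighted_line_fit(xs, ys, ws)
        res = [y - (a * x + b) for x, y in zip(xs, ys)]
        m = median(res)
        mad = median(abs(r - m) for r in res)
        t = c * mad
        ws = [1.0 if abs(r - m) <= t else 0.0 for r in res]
    return weighted_line_fit(xs, ys, ws)
```

Swapping `weighted_line_fit` for a weighted neural-network training step recovers the structure of the proposed method.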
In this section, the proposed method is tested on simulated and real data. In simulation, it is applied to linear and nonlinear regression and compared to a traditional ANN, to RANSAC for the linear case, and to Gaussian Process Regression for the nonlinear case. For the real experiment, the robust method is applied to model the dynamics of a tendon-driven robot.
III-A Simulation Data
|Linear Dataset 1|
|Method|Architecture|$R^2$|RMSE|
|Robust NN|10 nodes|0.9927|0|
|Traditional NN|10 nodes|0.9957|0.2337|
|Linear Dataset 2|
|Method|Architecture|$R^2$|RMSE|
|Robust NN|10 nodes|0.9706|0.0105|
|Traditional NN|10 nodes|0.9706|0.0199|
For the linear regression example, two different datasets are used. In both cases, the same desired linear mapping is sought.
In the first dataset, noisy data is added only in a certain input region. The noisy data is generated from a Gaussian distribution with mean $\mu = 1$ and standard deviation $\sigma = 0.5$. In total 1100 data points are used, with 100 of them being noisy. In the second dataset, instead, all 1100 points are corrupted by noise. For building the robust model, a fixed threshold constant is used and 5 refinements are executed. Figure 1 shows the results, comparing the proposed robust method, the traditional ANN, and RANSAC [Fischler1981].
For both the robust NN and the traditional NN, the dataset is divided randomly into training and test sets. Two network architectures are used: one single hidden layer with 10 nodes and one with 20 nodes. In both cases, linear activation functions are used for each node. For RANSAC, minimal samples of 2 points are used, a fixed maximum inlier distance is set, and the distance function is the Euclidean distance between the computed output and the expected output value. Table I shows the goodness of fit, expressed in terms of $R^2$ [Allen1997] and the RMSE between the computed models and the desired one.
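For reference, the RANSAC baseline used in the comparison can be sketched as follows. This is a generic line-fitting RANSAC under the stated settings (minimal samples of 2 points, a fixed inlier distance); the parameter values and function name are illustrative, not the paper's exact configuration.

```python
import random

def ransac_line(xs, ys, n_iters=200, inlier_dist=0.3, seed=0):
    """Fit y = a*x + b by repeatedly sampling 2 points, keeping the
    hypothesis with the largest consensus set, then refitting on it."""
    rng = random.Random(seed)
    best_inliers = []
    n = len(xs)
    for _ in range(n_iters):
        i, j = rng.sample(range(n), 2)  # minimal sample of 2 points
        if xs[i] == xs[j]:
            continue
        a = (ys[j] - ys[i]) / (xs[j] - xs[i])
        b = ys[i] - a * xs[i]
        inliers = [k for k in range(n)
                   if abs(ys[k] - (a * xs[k] + b)) <= inlier_dist]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit with ordinary least squares on the consensus set.
    mx = sum(xs[k] for k in best_inliers) / len(best_inliers)
    my = sum(ys[k] for k in best_inliers) / len(best_inliers)
    num = sum((xs[k] - mx) * (ys[k] - my) for k in best_inliers)
    den = sum((xs[k] - mx) ** 2 for k in best_inliers)
    a = num / den
    return a, my - a * mx
```

Unlike the iterative re-weighting scheme, RANSAC relies on the hypothesis that a minimal sample of inliers fully determines the model, which is why it is restricted here to the linear experiments.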
On the first dataset, RANSAC and the robust method perform very close to each other, both finding the correct mapping. The traditional NN, instead, does not manage to obtain good results, with the model being biased by the noisy data. On the second dataset, RANSAC is the method that performs worst, while the proposed robust method performs best.
For the nonlinear case, two different mappings are sought, with different desired functions and Gaussian noise levels. In both cases 2000 data points are used. Two different network architectures are used: one single hidden layer with 20 nodes and a two-hidden-layer structure with 20 and 10 nodes. The robust and traditional ANN methods are here compared to Gaussian Process Regression (GPR). Figure 2 shows the results. The same threshold and number of refinements as in the linear case are used. In order to observe the behaviour when points outside a Gaussian distribution are present, 150 outliers are also added (Figure 3). Table II reports the error metrics for the different methods on the different datasets.
When no outliers are present, all methods perform well and the resulting models are close to one another. On the second dataset, however, the traditional NN performs the worst. When outliers are included, the models from GPR and the traditional NN are compromised: the two methods perform very similarly, with large errors. The robust approach, conversely, manages to keep errors small, even if slightly higher than in the case with only Gaussian noise.
|Nonlinear Dataset 1|
|Method|Architecture|No Outliers: $R^2$|No Outliers: RMSE|With Outliers: $R^2$|With Outliers: RMSE|
|Robust NN|10 nodes|0.9361|0.0276|0.1722|0.0521|
|Traditional NN|10 nodes|0.9365|0.0215|0.2302|0.3825|
|Nonlinear Dataset 2|
|Method|Architecture|No Outliers: $R^2$|No Outliers: RMSE|With Outliers: $R^2$|With Outliers: RMSE|
|Robust NN|10 nodes|0.9549|0.0123|0.2165|0.0632|
|Traditional NN|10 nodes|0.9510|0.0309|0.2720|0.2147|
III-B Robot Dynamic Modelling
As an example of a real-life application, we were interested in collecting data for learning the inverse dynamics and nonlinearities of the Micro-IGES robotic surgical tool [Shang2017]. This robot is tendon-driven, and the major causes of nonlinearities are the routing and elasticity of the tendons, and friction in the joints and along the tendons. Because of the motor-to-joint mapping, the inverse dynamics of the system can be expressed in terms of the motor positions $q_m$, velocities $\dot{q}_m$, accelerations $\ddot{q}_m$, and torques $\tau_m$ as:
$\tau_m = f(q_m, \dot{q}_m, \ddot{q}_m)$
Only the 4 DOFs of the articulated part of the robot are considered and, given the motor values $(q_m, \dot{q}_m, \ddot{q}_m)$ as inputs, the corresponding torque $\tau_m$ is sought.
For learning the model, a single neural network with two hidden layers of 20 and 10 neurons was used. Each motor was excited with a sinusoidal wave of linearly increasing frequency. Figure 4 shows the results obtained with the traditional and robust approaches on a subset of the data, and Table III reports the $R^2$ and the RMSE between the two models and the measured values over the whole dataset.
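A sinusoid of linearly increasing frequency is a linear chirp, which can be generated by integrating the instantaneous frequency into the phase. The sketch below illustrates this; the start/end frequencies, duration, and amplitude are illustrative values, not the paper's excitation parameters.

```python
import math

def linear_chirp(t, f0=0.1, f1=1.0, duration=10.0, amplitude=1.0):
    """Sample at time t of a sinusoid whose frequency rises linearly
    from f0 to f1 (Hz) over `duration` seconds. The phase is the
    integral of the instantaneous frequency, not f(t) * t."""
    k = (f1 - f0) / duration                  # sweep rate (Hz/s)
    phase = 2.0 * math.pi * (f0 * t + 0.5 * k * t * t)
    return amplitude * math.sin(phase)
```

Such a sweep excites the system across a band of frequencies with a single trajectory, which is why chirp-like signals are a common choice for collecting dynamics training data.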
Both methods perform well, with good fits and small errors. However, it can be noted that the robust method yields a smoother mapping, being less influenced by unwanted data points.
|Robust NN Method|Traditional NN Method|
In conclusion, a novel learning algorithm has been presented in this work. The algorithm makes no assumption on the data distribution and identifies outliers while performing the regression, without the need for any preprocessing. Results show that the method is able to neglect undesired data points and, in turn, to produce robust models that are minimally influenced by outliers. However, higher robustness comes at the price of higher computational effort: in all tests the robust method took more time than the traditional NN to build the model. In the nonlinear case, the computational time was comparable to that of GPR.
In future work, the algorithm will be improved in order to make it more robust and to reduce the computational time, so as to apply it to online learning.
This is a preprint of a work which will be published in IROS 2019, Nov 4-8, Macau by the same authors.