Towards Autonomous Driving of Personal Mobility with Small and Noisy Dataset using Tsallis-statistics-based Behavioral Cloning

11/29/2021
by   Taisuke Kobayashi, et al.

Autonomous driving has made great progress and been introduced in practical use step by step. On the other hand, the concept of personal mobility is also getting popular, and its autonomous driving specialized for individual drivers is expected for a new step. However, it is difficult to collect a large driving dataset, which is basically required for the learning of autonomous driving, from the individual driver of the personal mobility. In addition, when the driver is not familiar with the operation of the personal mobility, the dataset will contain non-optimal data. This study therefore focuses on an autonomous driving method for the personal mobility with such a small and noisy, so-called personal, dataset. Specifically, we introduce a new loss function based on Tsallis statistics that weights gradients depending on the original loss function and allows us to exclude noisy data in the optimization phase. In addition, we improve the visualization technique to verify whether the driver and the controller have the same region of interest. From the experimental results, we found that the conventional autonomous driving failed to drive properly due to the wrong operations in the personal dataset, and the region of interest was different from that of the driver. In contrast, the proposed method learned robustly against the errors and successfully drove automatically while paying attention to the similar region to the driver. Attached video is also uploaded on youtube: https://youtu.be/KEq8-bOxYQA

1 Introduction

Recently, autonomous driving of vehicles has reached the stage of practical application. Autonomous driving is a control problem: for example, classical control techniques are often employed for keeping the lane and the distance between vehicles Suryanarayanan and Tomizuka (2007); Klančar et al. (2009), and planning by model predictive control with vehicle models is employed for more extensive autonomous driving Levinson et al. (2011); Williams et al. (2018). On the other hand, machine learning is a methodology that can handle the cases where an accurate vehicle model is not available and/or where there is uncertainty in the surrounding environment. As one of the machine learning technologies, imitation learning, which learns an end-to-end mapping from observations to actions (i.e. steering, accelerating, and braking), is mainly utilized with a huge driving dataset Codevilla et al. (2018); Onishi et al. (2019); Hawke et al. (2020). In this study, we focus on such an imitation learning technology, which is simpler and more versatile, although limitations on its scalability have been reported Codevilla et al. (2019).

While most of the above-mentioned autonomous driving technologies target general vehicles, the development and widespread use of personal mobility, such as electric wheelchairs Nakajima (2017) and Segways Nguyen et al. (2004), will be accelerated as a next-generation mobility. The personal mobility is basically intended for short-distance travel and requires the ability to travel in a wide range of situations, not limited to well-developed roads. In addition, since the personal mobility is developed for personal use, the situations encountered by each driver differ greatly. That is, it is desirable to tune a controller specialized for each driver rather than acquiring generalized performance by learning from a huge dataset in advance.

The issue that arises from this problem setting is the quality of the dataset. Naturally, the dataset will be small because it is constructed for each driver. If the driver is not familiar with the operation of the personal mobility, wrong operations will inevitably be included as noise. Imitation learning on such a small and noisy dataset, called a personal dataset in this paper, is known to suffer significant performance degradation Argall et al. (2009); Hussein et al. (2017). For this reason, we have to make imitation learning robust to noise.

Here, we briefly introduce the related work on noise-robust imitation learning. A research group of Sugiyama has developed quality-aware imitation learning methods Wu et al. (2019); Tangkaratt et al. (2020), which estimate the quality of each data to select the ones to be optimized. However, unlike behavioral cloning Bain and Sammut (1995), which is often used in autonomous driving to learn the direct mapping from observations to actions Codevilla et al. (2018); Onishi et al. (2019); Hawke et al. (2020), these methods are classified as inverse reinforcement learning Ng and Russell (2000), which uses reinforcement learning Sutton and Barto (2018) in combination and requires some trial and error by a non-optimal controller. R-MaxEnt also estimates the quality of each data through the maximum entropy principle Hussein et al. (2021). Although this method is capable of learning the optimal policy from only a given dataset, the controller is assumed to be for a discrete system; hence, it is not suitable for autonomous driving, where continuous control commands are required. Sasaki and Yamashina have modified the standard behavioral cloning to seek one of the modes of the expert behaviors Sasaki and Yamashina (2021). Although there is no such restriction on the controller, the controller should be trained as an ensemble to improve the performance, which increases the computational cost. Ilboudo et al. have proposed a noise-robust optimizer for the standard behavioral cloning problem Ilboudo et al. (2020, 2021). It checks the gradients used to update the neural networks that approximate the controller, and empirically filters out anomalies, but it is only a safety net and is less effective if there is a lot of noise in the dataset.

Therefore, this paper proposes a simple yet noise-robust behavioral cloning for the personal mobility. Specifically, we focus on the fact that the standard behavioral cloning is the minimization problem of the negative log likelihood of the stochastic controller. By replacing the log likelihood with the $q$-log likelihood introduced in Tsallis statistics Tsallis (1988); Suyari and Tsukada (2005); Kobayashi (2020), the behavioral cloning can easily adjust its noise robustness in accordance with a real parameter, $q$. This replacement can be interpreted as a nonlinear transformation of the log likelihood, and the gradient naturally vanishes for noisy data where the log likelihood becomes small. As a result, each data is implicitly weighted so that only high-quality data are imitated, which yields the noise robustness.

In order to validate the proposed method, we employ a visualization technique for the inputs (more specifically, the region of interest in the input image) that are strongly involved in the controller, so-called VisualBackProp Bojarski et al. (2018). This allows us to qualitatively assess whether the driver and the learned controller have a common region of interest. However, we empirically found that the original VisualBackProp sometimes fails to extract the region of interest appropriately due to noise, which causes extreme feature values. In addition, although conventional techniques are for convolutional neural networks (CNNs) Krizhevsky et al. (2012); LeCun et al. (2015), in many cases, the features are further shaped by multiple fully connected networks (FCNs) after the CNNs. These FCNs are ignored in the original VisualBackProp, which therefore ignores the features that contribute more directly to the controller. Therefore, as an additional minor contribution, we modify the implementation of VisualBackProp to address these shortcomings.

Experiments using an electric wheelchair as one of the personal mobilities are conducted for the verification of the proposed method. The personal dataset contains driving corners, stopping in front of the stop sign, and zigzagging and/or non-stopping as noise. Although the standard behavioral cloning fails to imitate the stopping operation due to the adverse effects by the noisy data, the proposed method successfully imitates all the operations by excluding the noisy data. In addition, the modified VisualBackProp is able to properly extract the stop sign (and objects in a shelf to guide driving a corner) as the region of interest, which is naturally similar to that of the driver. As a consequence, the proposed method achieves the autonomous driving of the personal mobility even with the small and noisy personal dataset, while extracting the driver-like region of interest.

2 Conventional methods and their problems

2.1 Behavioral cloning

Figure 1: Overview of behavioral cloning: at first, the dataset $D$ with the pairs of $(s, a)$ is collected by an expert (often a human); the parameter set $\theta$ is optimized using the dataset in an offline manner; after optimization, $\pi$ is deployed to imitate the expert operations in the real environment.

Behavioral cloning is one of the most popular imitation learning methods Bain and Sammut (1995). Under a Markov process, a dataset $D$ including $N$ pairs of expert operations $a$ over observed states $s$, $D = \{(s_n, a_n)\}_{n=1}^{N}$, is built. We consider learning a stochastic controller $\pi(a \mid s; \theta)$ with the parameter set $\theta$, assuming that $D$ includes stochastic operations, especially when the expert is human(s). Since $\pi$ is a distribution model parameterized by $\theta$ (e.g. normal distribution), the following minimization of the negative log likelihood with $D$ is employed for optimization of $\theta$.

$$\theta^* = \arg\min_\theta \mathbb{E}_{(s, a) \sim D}\left[ -\ln \pi(a \mid s; \theta) \right] \tag{1}$$

This optimization problem is basically solved by stochastic gradient descent (e.g. Adam Kingma and Ba (2014)) when $\pi$ is approximated by neural networks, i.e. $\theta$ contains network weights and biases. The above process is illustrated in Fig. 1.

$\pi$ obtained through this problem is optimized to represent all the data in $D$ equally well. If $D$ is ideal and huge, $\pi$ after deployment should imitate the expert properly. However, if $D$ contains incorrect operations, as this paper deals with, the risk of interference, such as requiring different $a$ for the same $s$, increases, leading to failure of proper expert imitation. In addition, the smaller $D$ is, the more apparent the effect of such noise becomes.
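As a minimal sketch of the objective described in this subsection, the following computes the average negative log likelihood of expert actions under a diagonal normal controller. The function names, the toy policy, and the toy batch are hypothetical and only serve as illustration; the actual controller in this paper is a neural network.

```python
import math

def gaussian_log_likelihood(action, mean, std):
    """Log density of a diagonal normal distribution evaluated at `action`."""
    total = 0.0
    for a, m, s in zip(action, mean, std):
        total += -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * ((a - m) / s) ** 2
    return total

def behavioral_cloning_loss(batch, policy):
    """Average negative log likelihood over (state, action) pairs.

    `policy(state)` is assumed to return (mean, std) of the action distribution.
    """
    losses = []
    for state, action in batch:
        mean, std = policy(state)
        losses.append(-gaussian_log_likelihood(action, mean, std))
    return sum(losses) / len(losses)

# Hypothetical toy policy: a scalar state mapped to a 2D action mean, unit std.
def toy_policy(state):
    return [0.5 * state, 0.0], [1.0, 1.0]

# Toy expert data that the toy policy reproduces exactly.
batch = [(1.0, [0.5, 0.0]), (2.0, [1.0, 0.0])]
loss = behavioral_cloning_loss(batch, toy_policy)
print(loss)
```

In practice the loss would be minimized over the network parameters by a stochastic gradient method; here it is only evaluated for a fixed toy policy.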

2.2 VisualBackProp

Figure 2: Flow of VisualBackProp: by backpropagating the features at the respective CNN layers, the attention map (or mask image) is extracted to reveal the region of interest for the output.

VisualBackProp is one of the methods to enhance the interpretability of the outputs obtained through CNNs Bojarski et al. (2018). Following the process below (illustrated in Fig. 2), an attention map (or a mask image) is generated from features obtained in the respective CNNs corresponding to the input image. This attention map identifies the region of interest that contributes significantly to the output.

  1. Get the feature averaged in the channel direction, $f_L$, from the $L$-th CNN layer closest to the output layer, and set it as the mask $m_L$.

  2. Pass $m_l$ through the deconvolution layer Zeiler et al. (2011) with weights of one and a bias of zero, as $\hat{m}_l$, to match the feature size of the $(l-1)$-th layer.

  3. Compute the element product of $\hat{m}_l$ and $f_{l-1}$ as $m_{l-1}$.

  4. Decrement $l$ and repeat the steps 2 and 3 until reaching the first CNN layer.

  5. Normalize $m_1$ to make all the components within $[0, 1]$.
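The steps above can be sketched in plain Python on nested lists. This is a toy under stated assumptions, not the original implementation: every deconvolution is assumed to have kernel size equal to stride (so that all-one weights simply copy each value into a block), and all function names are hypothetical.

```python
def channel_mean(feature):
    """Average a feature of shape [C][H][W] over channels, giving [H][W]."""
    c = len(feature)
    h, w = len(feature[0]), len(feature[0][0])
    return [[sum(feature[k][i][j] for k in range(c)) / c for j in range(w)]
            for i in range(h)]

def upsample_ones(mask, factor=2):
    """Deconvolution with all-one weights and zero bias, assuming
    kernel == stride == factor: each value is copied into a factor x factor block."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def normalize(mask):
    """Min-max normalization of all components into [0, 1]."""
    lo = min(min(r) for r in mask)
    hi = max(max(r) for r in mask)
    if hi == lo:
        return [[0.0 for _ in r] for r in mask]
    return [[(v - lo) / (hi - lo) for v in r] for r in mask]

def visual_backprop(features, factor=2):
    """`features` lists CNN feature maps ordered from the first layer to the last."""
    averaged = [channel_mean(f) for f in features]
    mask = averaged[-1]                      # step 1
    for avg in reversed(averaged[:-1]):      # step 4: walk back to the first layer
        up = upsample_ones(mask, factor)     # step 2
        mask = [[u * a for u, a in zip(ur, ar)] for ur, ar in zip(up, avg)]  # step 3
    return normalize(mask)                   # step 5

# Toy input: a 2x2 first-layer feature and a 1x1 last-layer feature (one channel each).
layer1 = [[[1.0, 2.0], [3.0, 4.0]]]
layer2 = [[[5.0]]]
attention = visual_backprop([layer1, layer2])
print(attention)
```

Because the mask is built only from element products, a single very large feature value can dominate it, which is the weakness discussed in the following paragraph.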

When ReLU functions are employed as the activation functions for the respective CNNs, their features become non-negative, and it can be expected that the mask image, in which only pixels with high contribution have non-zero values only by the element product, is extracted. However, if some of the features are with excessive values, their effects will remain unless the counterpart of the element product is perfectly zero, and they may overwrite other features. Excessive values of some features are prone to occur when noise is mixed in with the input, and therefore, we have to consider this problem in this paper. In addition, the feature obtained by passing through CNNs is not directly converted to the output, but may be further formatted by FCNs. Since VisualBackProp ignores the effects of FCNs, it is difficult to say that it truly generates the region of interest that contributes to the output.

3 Noise-robust behavioral cloning

Figure 3: Analyses of noise robustness in the proposed method: (a) since the monotonically increasing function $\ln_q(x)$ with $q < 1$ is larger than or equal to $\ln(x)$ (see eq. (4)), the gradient of $\ln_q(x)$ at $x < 1$ is smaller than that of $\ln(x)$ (see eq. (5)); (b) $\ln_q(x)$ can be defined as the function of $\ln(x)$ (see eq. (6)), and its gradient with $q < 1$ converges to zero at $\ln(x) \to -\infty$; (c) since the ratio of the gradients of $\ln_q(x)$ and $\ln(x)$, $w$, is expressed as an exponential function (see eq. (7)), data can be exponentially weighted according to their losses and the noisy data with larger losses would be ignored.

3.1 Tsallis statistics

Tsallis statistics refers to the organization of mathematical functions and associated probability distributions proposed by Tsallis Tsallis (1988); Suyari and Tsukada (2005). This concept is organized based on $q$-deformed exponential and logarithmic functions, which are extensions of the general exponential and logarithmic functions by a real number $q$. Tsallis statistics has various properties, and machine learning methods that take advantage of these properties have been proposed, such as Kobayashi (2020). We introduce the following $q$-logarithm for our method.

The $q$-logarithm, $\ln_q(x)$ with $x > 0$, is given as follows:

$$\ln_q(x) = \begin{cases} \ln(x) & q = 1 \\ \dfrac{x^{1-q} - 1}{1 - q} & \text{otherwise} \end{cases} \tag{2}$$

where $q$ gives its shape. Regardless of $q$, $\ln_q(x)$ is a monotonically increasing function.
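For reference, the standard $q$-logarithm of Tsallis statistics, $\ln_q(x) = (x^{1-q} - 1)/(1-q)$ with $\ln_1(x) = \ln(x)$, can be written as a short Python function. The function name is hypothetical; the sketch only illustrates the definition and its limit toward the natural logarithm.

```python
import math

def q_log(x, q):
    """q-logarithm for x > 0; reduces to the natural logarithm at q = 1."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

# The function passes through zero at x = 1 for any q.
print(q_log(1.0, 0.5))
# With q = 0 it is the linear function x - 1.
print(q_log(3.0, 0.0))
# As q approaches 1 it approaches ln(x).
print(q_log(2.0, 1.0 - 1e-6), math.log(2.0))
```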

3.2 Formulation with $q$-log likelihood

The proposed noise-robust behavioral cloning can easily be derived with eqs. (1) and (2). Specifically, given $q$, the log likelihood in eq. (1) is replaced by the $q$-log likelihood as follows:

$$\theta^* = \arg\min_\theta \mathbb{E}_{(s, a) \sim D}\left[ -\ln_q \pi(a \mid s; \theta) \right] \tag{3}$$

When $q = 1$, this reverts to the standard behavioral cloning. Note that since $\ln_q$, including $\ln$, is a monotonically increasing function as mentioned before, the direction of learning itself is invariant under this replacement.
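A minimal sketch of such a loss, assuming the controller already provides the log likelihood of each expert action, is given below. Computing the $q$-logarithm from the log likelihood via `math.expm1` is a design choice of this sketch (it avoids underflow of very small likelihoods), and the function names are hypothetical.

```python
import math

def q_log_from_log(log_p, q):
    """q-logarithm of p, computed from ln(p) as (exp((1 - q) ln p) - 1) / (1 - q)."""
    if q == 1.0:
        return log_p
    return math.expm1((1.0 - q) * log_p) / (1.0 - q)

def q_bc_loss(log_likelihoods, q):
    """Average negative q-log likelihood over a batch of expert actions."""
    return -sum(q_log_from_log(lp, q) for lp in log_likelihoods) / len(log_likelihoods)

log_likelihoods = [-1.0, -3.0]
print(q_bc_loss(log_likelihoods, 1.0))   # standard negative log likelihood
print(q_bc_loss(log_likelihoods, 0.5))   # q-deformed loss
print(q_bc_loss([-100.0], 0.5))          # a very unlikely sample
```

For $q < 1$ and likelihoods not exceeding one, the per-sample loss is bounded above by $1/(1-q)$, so a single very unlikely sample cannot dominate the batch average as it can with the standard negative log likelihood.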

3.3 Analysis of noise robustness

We show from two analyses why this formulation is robust to noise. We notice again that when is represented by neural networks, the behavioral cloning problem is solved by stochastic gradient descent (e.g. Adam Kingma and Ba (2014)), hence, the gradient property is important for the analyses. The following analyses can be illustrated in Fig. 3.

Before the analyses, we assume that the noisy data are few compared to the normal data. In addition, the loss (i.e. the negative ($q$-)log likelihood) for the noisy data is assumed to be larger than that for the others. This is a natural assumption: the limited resources ($\theta$) are mostly allocated to representing the normal, majority data, and the remainder does not have enough capability to represent the noisy data, although the noisy data still inhibit learning of the normal data.

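First, for $q < 1$, the following inequality is satisfied with $x = \pi(a \mid s; \theta)$.

$$\ln_q(x) \geq \ln(x) \tag{4}$$

The equality holds only when $x = 1$. A special case of this inequality is $x - 1 \geq \ln(x)$ with $q = 0$. In order to satisfy this inequality while matching at $x = 1$, the following two inequalities must be satisfied since $\ln_q$ is monotonic.

$$\frac{d \ln_q(x)}{dx} \leq \frac{d \ln(x)}{dx} \quad (x < 1), \qquad \frac{d \ln_q(x)}{dx} \geq \frac{d \ln(x)}{dx} \quad (x > 1) \tag{5}$$

Although $x < 1$ is often the case for the noisy data with a large loss, the gradient of the proposed method becomes small there. That is, it does not try to reduce their loss relative to the other normal data with larger likelihood. The proposed method therefore achieves learning with priority on the normal data.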

For a more precise analysis, we derive the ratio of the gradients for $\ln_q(x)$ and $\ln(x)$ as $w$. This can be easily obtained by representing $\ln_q(x)$ as a function of $\ln(x)$.

$$\ln_q(x) = \frac{\exp\{(1 - q) \ln(x)\} - 1}{1 - q} \tag{6}$$

where $x = \exp(\ln(x))$ is utilized. Its gradient with respect to $\ln(x)$ (i.e. the gradient ratio or weight $w$) can be analytically given as follows:

$$w = \frac{d \ln_q(x)}{d \ln(x)} = \exp\{(1 - q) \ln(x)\} \tag{7}$$

Note that this equation also covers the case with $q = 1$. This means that each data is exponentially weighted according to its own loss. That is, with $\ln(x) \to -\infty$ for the noisy data, $w$ converges to zero, hence the noisy data would be ignored. In addition, the smaller $q$ is, the faster $w$ converges. However, if $q$ is too small, even the normal data will be ignored, and therefore, we have to tune $q$ appropriately by checking test data.
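The exponential weighting can be checked numerically. The sketch below assumes the weight takes the form $w = \exp\{(1-q)\ln x\}$, i.e. the derivative of the $q$-logarithm with respect to the natural logarithm, and compares it against a central finite difference. All names are hypothetical.

```python
import math

def q_log_from_log(log_p, q):
    """q-logarithm expressed as a function of ln(p)."""
    if q == 1.0:
        return log_p
    return math.expm1((1.0 - q) * log_p) / (1.0 - q)

def gradient_weight(log_p, q):
    """Analytic ratio of the gradients of the q-log and the log likelihood."""
    return math.exp((1.0 - q) * log_p)

def finite_difference_weight(log_p, q, eps=1e-6):
    """Numerical derivative of the q-logarithm with respect to ln(p)."""
    upper = q_log_from_log(log_p + eps, q)
    lower = q_log_from_log(log_p - eps, q)
    return (upper - lower) / (2.0 * eps)

# Samples with lower log likelihood (larger loss) receive smaller weights.
for log_p in (-0.5, -2.0, -20.0):
    print(log_p, gradient_weight(log_p, 0.5), finite_difference_weight(log_p, 0.5))
```

With $q = 1$ the weight is one for every sample, which corresponds to the standard behavioral cloning.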

4 Modified VisualBackProp

Figure 4: Modified version of VisualBackProp: to alleviate the effects of large features, all the features are normalized before the element product; to visualize the effects of FCNs, the features in FCNs are also backpropagated through the sparse connection matrices.

4.1 Normalization of intermediate features

This section implements a minor fix to VisualBackProp Bojarski et al. (2018). The modified VisualBackProp is shown in Fig. 4.

In the original VisualBackProp, the features with large values are back-propagated to the input mask, and the resulting region of interest may look like a blurred image of the entire input image that cannot be limited to a specific region. Although it is a naive approach, we can alleviate this problem by normalizing all components of each feature to be within $[0, 1]$. This process is expected to make all components of the mask in each layer also fall within $[0, 1]$, so that unnecessary information is removed as zero and important information remains as one.
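The effect of normalizing before the element product can be illustrated on a toy one-dimensional feature. Min-max scaling is an assumption of this sketch (the text only states that all components are normalized), and the numbers are invented for illustration.

```python
def min_max_normalize(values):
    """Scale all components into [0, 1]; a constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical layer feature with one excessive value, and an upsampled mask
# that assigns almost no importance to that position.
feature = [0.5, 1.0, 100.0]
mask = [2.0, 4.0, 0.1]

raw_product = [f * m for f, m in zip(feature, mask)]
normalized_product = [
    f * m for f, m in zip(min_max_normalize(feature), min_max_normalize(mask))
]

print(argmax(raw_product))         # position of the excessive feature
print(argmax(normalized_product))  # position favored by the mask
```

Without normalization, the excessive value survives the product even though its mask value is close to zero; with normalization, the smallest mask value becomes exactly zero and removes it.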

4.2 Backpropagation from fully connected networks

As another issue in the original VisualBackProp, we consider the effects of FCNs after CNNs. The main difference between FCNs and CNNs lies in the deconvolution process; apart from that, the features of FCNs can be backpropagated by the same procedure as for CNNs.

The deconvolution process for FCNs can be represented by transposing the weight matrix. However, if all the weights are set to 1 as in the case of CNNs, all the components of the expanded feature will be the sum of the feature components before the expansion because they are all combined, unlike CNNs. To avoid this problem, we introduce a sparse connection matrix where only the top 10% of the forward weight matrix is 1 and the rest is 0. This allows us to backpropagate the FCN features with high importance (i.e. with large weights).
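A possible sketch of the sparse connection matrix is shown below. Ranking the weights by absolute value is an assumption of this sketch (the text only says "top 10%" and "large weights"), ties at the threshold may keep slightly more than the requested ratio, and the function names and the toy weights are hypothetical.

```python
def sparse_connection(weight, top_ratio=0.1):
    """Return a 0/1 matrix with ones at the largest-magnitude entries of `weight`.

    `weight` has shape [out][in], as in the forward pass of a fully connected layer.
    """
    flat = sorted((abs(w) for row in weight for w in row), reverse=True)
    k = max(1, int(len(flat) * top_ratio))
    threshold = flat[k - 1]
    return [[1.0 if abs(w) >= threshold else 0.0 for w in row] for row in weight]

def backpropagate_fcn(feature_out, connection):
    """Apply the transposed connection matrix to the output-side feature."""
    n_out, n_in = len(connection), len(connection[0])
    return [sum(connection[o][i] * feature_out[o] for o in range(n_out))
            for i in range(n_in)]

# Toy layer with 5 inputs and 2 outputs: 10 weights, so the top 10% is one entry.
weight = [
    [0.10, -0.90, 0.20, 0.00, 0.30],
    [0.05, 0.40, -0.10, 0.20, 0.15],
]
connection = sparse_connection(weight)
back = backpropagate_fcn([2.0, 3.0], connection)
print(connection)
print(back)
```

With an all-one matrix instead, every input-side component would receive the same sum of the output-side feature, which is the problem the sparse matrix avoids.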

5 Experiment

Figure 5: Electric wheelchair as one of the personal mobility: the joystick commands can be sent from ROS2, so that the same control system can be shared between the driver and the system; the front camera is positioned and angled so that it overlaps with the driver field of view as much as possible.
Figure 6: Traveled course: the driver goes around a rectangular course with everyday objects placed in a messy manner around to make it easy to extract the visual features; a stop sign is placed diagonally opposite the starting point, and the wheelchair needs to stop before it.
Figure 7: Network architecture to output the controller (policy): a compressed image is fed into this and passes through five CNNs until the height and width are one; the 128 channel features are considered as firings of the neurons and fed into three FCNs; finally the parameters (mean, variance) of a two-dimensional diagonal normal distribution are outputted.

Figure 8: Evaluation of the effect of $q$: when $q$ is too small, the proposed method collapses the mapping from state to action; with moderate $q$, the proposed method outperforms the standard behavioral cloning.
Figure 9: Learning curves for the best $q$ as the proposed method and $q = 1$ as the conventional method: the update speed of the proposed method is rather slow in the early stage, probably because the overall loss is large and many data have small weights (i.e. gradient ratios $w$); thanks to the noise robustness, the proposed method outperforms the conventional method from the middle stage and succeeds in reducing the loss steadily thereafter.
Figure 10: Snapshots and the corresponding regions of interest (panels: at corner; in front of stop sign; when stopped) when demonstrated by the conventional method: the wheelchair succeeded in turning left at the corner, albeit noisily, but it completely missed the stop sign and eventually crossed the stop line.
Figure 11: Snapshots and the corresponding regions of interest (panels: at corner; in front of stop sign; when stopped) when demonstrated by the proposed method: the wheelchair succeeded in turning left at the corner smoothly, and it found the stop sign from a distance and successfully stopped just on the stop line.
Figure 12: Regions of interest extracted by the original and modified VisualBackProps: (a) original: the region is blurred overall; (b) with normalization: the region of interest is clearer and emphasizes the stop sign more; (c) with consideration of FCNs: the attention to the right shelf is concentrated on the upper right; (d) with both: the region of interest is focused on the stop sign, which brings about stopping behavior, and on the upper right, which brings about normal operation in this corner, and we can interpret that the stronger stop sign yields stopping behavior.

5.1 Experimental setup

5.1.1 Dataset

The proposed method is validated through an autonomous driving task of an electric wheelchair. Our wheelchair is based on the Whill Model CR with two cameras (Intel RealSense D435i) mounted overhead, as shown in Fig. 5. This wheelchair controls its translational and turning speeds by tilting a hand joystick back and forth, left and right. The observed state $s$ is the RGB image acquired by the front camera and compressed to 96×96 pixels, and the action $a$ is the two-dimensional operated values of the joystick, which can be given from ROS2 Maruyama et al. (2016) without operating the joystick. Note that although the two cameras were mounted on the front and back sides, only the front camera was used in the experiment for simplicity.

When collecting the dataset, one driver attempted to drive the wheelchair clockwise and counterclockwise around a rectangular course. A stop sign was placed at the second corner from the starting point, and one trajectory was defined as lasting until the wheelchair had stopped in front of it for three seconds (see Fig. 6). State and action were stored at 50 fps, and in total, we collected 90 trajectories with eight patterns. They were divided into the training dataset, 82 trajectories with 21137 state-action pairs, and the test dataset, the remaining eight trajectories with 1149 state-action pairs. Here, to show robustness to noisy data, we intentionally mixed into the training data two trajectories that zigzagged and/or did not stop before the stop sign.

5.1.2 Architecture

For approximating the stochastic controller $\pi$, we combine CNNs and FCNs as described in Fig. 7. This architecture is implemented in PyTorch Paszke et al. (2017). The activation function for each layer is the ReLU function, and we introduce InstanceNorm Ulyanov et al. (2016) for CNNs and LayerNorm Ba et al. (2016) for FCNs to stabilize learning. Since the controller is modeled as a multivariate diagonal normal distribution, the specific outputs from the architecture are the mean and variance parameters.

To train this architecture, we use Adam Kingma and Ba (2014), which is the most popular stochastic gradient descent optimizer, with a batch size of 512 and a learning rate of

. One epoch of training is to use all the training data once randomly, and the training is forcibly terminated after 100 epochs. In order to take into account the randomness of the initialization, the training is performed three times for each condition and the mean of the results is used for comparison.

5.2 Scores for test dataset

First, we investigate the effect of $q$, a hyperparameter added in the proposed method. The scores of training with $q$ varied in 0.1 decrements are shown in Fig. 8. Note that since the loss function is modified in the proposed method, the original negative log likelihood for the test dataset was employed as the score. As expected, we found that too small $q$ resulted in an extremely poor score, since it excludes most of the data as noise and fits the remaining few. On the other hand, for larger $q$, the scores were roughly the same as that of the conventional method ($q = 1$), with a minimum at a moderate $q$. In fact, comparing the learning curves of the best $q$ and $q = 1$ in Fig. 9, we can see that the learning curve of the proposed method was noticeably lower than that of the conventional method.

From these results, we conclude that by specifying an appropriate $q$, the proposed method can achieve a higher likelihood of the controller for the test dataset than the conventional method. This fact indicates that while the conventional method updates the controller in the wrong direction to represent even the noisy data contained in the training dataset, the proposed method can properly exclude them and preferentially fit the data similar to the test dataset. As a remark, $q$ needs to be adjusted according to the problem, but the best result can be obtained without a large burden by various efficient meta-optimization methods Srinivas et al. (2010); Salinas et al. (2020); Aotani et al. (2021) or even by the grid search as in this paper.
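The grid search over $q$ scored by the standard negative log likelihood on test data can be mimicked on a toy problem. The sketch below is not the experiment of this paper: it fits only the mean of a unit-variance normal distribution to invented data containing two outliers, using plain gradient descent on the negative $q$-log likelihood. All names, data, and hyperparameters are hypothetical.

```python
import math

def normal_log_pdf(x, mu):
    """Log density of a unit-variance normal distribution."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (x - mu) ** 2

def fit_mean(data, q, steps=2000, lr=0.1):
    """Gradient descent on the average negative q-log likelihood, starting from a median."""
    mu = sorted(data)[len(data) // 2]
    for _ in range(steps):
        grad = 0.0
        for x in data:
            # Each sample's standard gradient (mu - x) is scaled by its weight.
            weight = math.exp((1.0 - q) * normal_log_pdf(x, mu))
            grad += weight * (mu - x)
        mu -= lr * grad / len(data)
    return mu

def score_on(data, mu):
    """Standard negative log likelihood, used as the score regardless of q."""
    return -sum(normal_log_pdf(x, mu) for x in data) / len(data)

# Invented data: eight normal samples around zero and two outliers as noise.
train = [-0.2, -0.1, 0.0, 0.0, 0.1, 0.2, 0.1, -0.1, 8.0, 9.0]
test = [-0.1, 0.0, 0.1]

scores = {q: score_on(test, fit_mean(train, q)) for q in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5)}
best_q = min(scores, key=scores.get)
print(scores)
print(best_q)
```

In this toy, $q = 1$ converges to the sample mean, which is pulled toward the outliers, whereas $q < 1$ stays close to the majority of the data. The collapse for too small $q$ reported above does not appear in such a one-parameter model, so the toy only illustrates the selection procedure.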

5.3 Demonstrations

We show examples of autonomous driving in which the controller learned by the conventional method (i.e. with $q = 1$) or by the proposed method with the best $q$ is deployed. Details of the demonstrations can be found in the attached video. Note that the region of interest here was visualized using the conventional VisualBackProp, where blue/yellow regions have low/high attention.

First, the demonstration using the conventional method is shown in Fig. 10. It is easy to see in the video, but the joystick command moved noisily to the left and right even when going straight due to the effects of zigzagging. In addition, the wheelchair did not pay attention to the stop sign when finding it, and failed to stop at an appropriate distance, which can be confirmed by the white line on the floor. When the wheelchair stopped, it still paid attention to the shelf on the right (probably as a guide for going straight and/or turning left) instead of paying attention to the stop sign. Therefore we can say that the imitation failed and the region of interest was clearly wrong, indicating that the influence of noisy data was strong.

In contrast, the demonstration using the proposed method was successfully completed, as shown in Fig. 11. The joystick command was hardly noisy even when going straight ahead. In addition, it can be seen that the wheelchair started to pay attention to the stop sign when finding it. When the wheelchair finally stopped at the appropriate distance, it paid the most attention to the stop sign, indicating that it used this as a landmark for its stopping motion. Thus, we can conclude that the proposed method did not get confused by noisy data, but relied on the optimal other data for successful imitation.

5.4 Analysis by modified VisualBackProp

As can be seen in Fig. 11, although VisualBackProp extracted the region of interest that seems to be natural, the whole image was blurred. Therefore, we judged that its visualization would have room for improvements. We examine the effects of the two proposed modifications, i.e. normalization and consideration of FCNs, on Fig. 11.

Fig. 12 shows the regions of interest by the respective modifications. First, it is noticeable that the region of interest was clearer with the normalization. This allows us to judge with more confidence that the wheelchair was stopped by checking the stop sign. Although the consideration of FCNs made the region of interest slightly more blurred (probably due to the insufficient sparseness), it can be seen that the emphasis on the entire right shelf was reduced to focus only on the upper right. In fact, objects with unique colors in the course are placed in the upper right, suggesting that they can be easily used as landmarks for going straight and/or turning left. By integrating these modifications, the region of interest could be limited to the stop sign and the objects in the upper right, while paying stronger attention to the stop sign. This implies that the driver and the learned controller make a decision between stopping and going straight and/or turning left (in this corner) based on these two characteristic regions of interest, and that they shift to the stopping behavior when the stop sign is close enough.

As a consequence, the region of interest was clearer than that of the conventional VisualBackProp, and its contents were natural enough to be interpreted. These results suggest that the proposed modifications improve the visualization performance.

6 Discussion

6.1 Limitations of the proposed method

We experimentally confirmed that the proposed method can indeed achieve robust imitation against noise. However, unless $q$ is appropriately tuned, the proposed method may collapse the controller, although meta-optimization is possible at relatively low cost, as mentioned before Srinivas et al. (2010); Salinas et al. (2020); Aotani et al. (2021). In addition, it is not obvious whether there always exists $q$ that outperforms the conventional method. Especially when the variance of the expert controller is large, or when the action space is discrete and there are many choices, $\pi(a \mid s; \theta) < 1$ is basically satisfied, and almost all data can have weights of less than 1 (ultimately 0). In such cases, $q$ closer to 1 would be better to keep the weights non-zero, but that would bring back the noise sensitivity. Therefore, although the proposed method is effective for autonomous driving tasks in which the control command is continuous and relatively deterministic, it has to be used carefully when imitating more general tasks.

Since the proposed method is formulated based on the standard behavioral cloning, it inherits the problems of behavioral cloning (except for the noise sensitivity). For example, the open issues about covariate shift and compounding error are often discussed in the literature of imitation learning Laskey et al. (2017); Brantley et al. (2020); Ho and Ermon (2016). In the near future, the proposed method should be properly integrated with methods that mitigate these problems.

6.2 Alternative interpretations of behavioral cloning

Behavioral cloning is formulated by eq. (1), but new optimization problems have been reported by reinterpreting it as another optimization problem Sasaki and Yamashina (2021); Ghasemipour et al. (2020). Specifically, eq. (1) is equivalent to minimizing the following Kullback-Leibler divergence.

$$\theta^* = \arg\min_\theta \mathbb{E}_{p_e}\left[ \mathrm{KL}\left( \pi_e(a \mid s) \,\|\, \pi(a \mid s; \theta) \right) \right] = \arg\min_\theta \mathbb{E}_{p_e, \pi_e}\left[ \ln \pi_e(a \mid s) - \ln \pi(a \mid s; \theta) \right] \tag{8}$$

where $\pi_e$ denotes the expert controller and $p_e$ is the stochastic dynamics of the environment. The expectation operation for these two distributions can be replaced by that for the dataset $D$ by Monte Carlo approximation, and the entropy of $\pi_e$ can be excluded since it is irrelevant to the optimization problem. As a result, the optimization problem for the standard behavioral cloning is obtained.

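The proposed method with eq. (3) can be reinterpreted in the same way. In Tsallis statistics, the $q$-deformed Kullback-Leibler divergence (or Tsallis divergence) is also defined with a similar but different form Nielsen and Nock (2011); Gil et al. (2013).

$$\mathrm{KL}_q(p_1 \,\|\, p_2) = -\mathbb{E}_{p_1}\left[\ln_q \frac{p_2}{p_1}\right] \tag{9}$$

In addition, the $q$-logarithm of a ratio is decomposed as follows:

$$\ln_q \frac{x}{y} = y^{q-1}\left(\ln_q x - \ln_q y\right) \tag{10}$$

With these two definitions, we derive the following minimization problem.

$$\mathrm{KL}_q\big(p_e \pi_e \,\|\, p_e \pi\big) = \mathbb{E}_{p_e, \pi_e}\left[-\pi_e(a \mid s)^{q-1} \ln_q \pi(a \mid s)\right] - S_q(\pi_e) \tag{11}$$

where $\pi_e$ is the expert controller, $p_e$ is the stochastic dynamics of the environment, and $S_q(\pi_e) = -\mathbb{E}_{p_e, \pi_e}[\pi_e^{q-1} \ln_q \pi_e]$ denotes Tsallis entropy, which can be excluded. This is consistent with eq. (3), except that the $q$-log likelihood is multiplied by $\pi_e^{q-1}$. However, since $\pi_e$ is unknown with some exceptions (see later), it must be removed somehow.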
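The decomposition of Tsallis divergence into a weighted $q$-log likelihood term and a Tsallis entropy term can be checked numerically. The following is a minimal sketch on discrete distributions, assuming the standard definitions $\ln_q x = (x^{1-q} - 1)/(1-q)$, $\mathrm{KL}_q(p \| r) = -\sum p \ln_q(r/p)$ and $S_q(p) = -\sum p^q \ln_q p$. The two probability vectors are hypothetical.

```python
import numpy as np


def q_log(x, q):
    """Tsallis q-logarithm; it reduces to the natural logarithm at q = 1."""
    x = np.asarray(x, dtype=float)
    if np.isclose(q, 1.0):
        return np.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def tsallis_divergence(p, r, q):
    """q-deformed Kullback-Leibler divergence between discrete p and r."""
    return -np.sum(p * q_log(r / p, q))


def tsallis_entropy(p, q):
    """Tsallis entropy of a discrete distribution p."""
    return -np.sum(p ** q * q_log(p, q))


q = 0.5
expert = np.array([0.7, 0.2, 0.1])    # hypothetical expert action probabilities
learner = np.array([0.5, 0.3, 0.2])   # hypothetical learned action probabilities

# Left-hand side: the divergence itself.
lhs = tsallis_divergence(expert, learner, q)

# Right-hand side: expectation under the expert of -pi_e^(q-1) ln_q(pi),
# minus the Tsallis entropy of the expert.
weighted_nll = -np.sum(expert * expert ** (q - 1.0) * q_log(learner, q))
rhs = weighted_nll - tsallis_entropy(expert, q)

print(lhs, rhs)
```

Both printed values agree up to floating-point error, and only the first term of the right-hand side depends on the learned controller.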

As the first removal method, we assume that $\pi_e = \pi$, i.e. the expert controller is substituted by the controller being learned. Note that the gradient is not propagated through this substituted factor. The multiplier then becomes $\pi^{q-1}$, which cancels out the gradient ratio $\pi^{1-q}$ that arises when considering the gradient of $\ln_q \pi$, as defined in eq. (7). Hence, under this assumption, the above optimization problem is perfectly consistent with the standard behavioral cloning.

As the second removal method, we assume that $\pi_e$ is constant, i.e. the expert took all the actions with a constant likelihood to collect the dataset. In this case, the multiplier is a constant that does not change the minimizer, and the above optimization problem is consistent with eq. (3). Hence, we can conclude that the proposed method is equivalent to the minimization problem of Tsallis divergence under the assumption of a constant expert likelihood.
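Both removal methods can be verified with finite differences. The following is a minimal sketch, assuming a one-dimensional Gaussian controller whose only parameter is its mean, and assuming that the multiplier on the $q$-log likelihood is the expert likelihood raised to the power $q - 1$. All names and numerical values are hypothetical.

```python
import numpy as np


def q_log(x, q):
    """Tsallis q-logarithm for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def gaussian_pdf(a, mu, sigma=1.0):
    z = (a - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))


def central_difference(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


q, action, mu0 = 0.5, 0.3, 1.0

# First assumption: the expert likelihood is replaced by the current controller's
# likelihood, evaluated once at mu0 and then held fixed (no gradient through it).
frozen_factor = gaussian_pdf(action, mu0) ** (q - 1.0)

grad_first = central_difference(
    lambda mu: -frozen_factor * q_log(gaussian_pdf(action, mu), q), mu0)
grad_bc = central_difference(
    lambda mu: -np.log(gaussian_pdf(action, mu)), mu0)

# Second assumption: a constant expert likelihood only rescales the q-log loss.
constant_factor = 0.2 ** (q - 1.0)
grad_q_loss = central_difference(
    lambda mu: -q_log(gaussian_pdf(action, mu), q), mu0)
grad_second = central_difference(
    lambda mu: -constant_factor * q_log(gaussian_pdf(action, mu), q), mu0)

print(grad_first, grad_bc)                          # first assumption: same gradient as log likelihood
print(grad_second, constant_factor * grad_q_loss)   # second assumption: rescaled q-log gradient
```

Under the first assumption the gradient matches that of the negative log likelihood, and under the second it is the gradient of the negative $q$-log likelihood up to a positive constant.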

This interpretation can be exploited, for example, to utilize Rényi divergence Nielsen and Nock (2011); Gil et al. (2013) as a new minimization problem. Rényi divergence can be transformed invertibly to Tsallis divergence, and the gradient generated through the invertible transformation may provide learning properties different from those of the proposed method.
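The invertible transformation between the two divergences can be written out explicitly. The following is a minimal sketch for discrete distributions, assuming the common definitions in which both divergences are functions of $\sum p^q r^{1-q}$: Tsallis divergence is $(\sum p^q r^{1-q} - 1)/(q-1)$ and Rényi divergence is $\ln(\sum p^q r^{1-q})/(q-1)$. The probability vectors are hypothetical.

```python
import numpy as np


def tsallis_divergence(p, r, q):
    return (np.sum(p ** q * r ** (1.0 - q)) - 1.0) / (q - 1.0)


def renyi_divergence(p, r, q):
    return np.log(np.sum(p ** q * r ** (1.0 - q))) / (q - 1.0)


def renyi_from_tsallis(d_tsallis, q):
    """Map a Tsallis divergence value to the Renyi divergence of the same order."""
    return np.log1p((q - 1.0) * d_tsallis) / (q - 1.0)


def tsallis_from_renyi(d_renyi, q):
    """Inverse map: Renyi divergence value back to Tsallis divergence."""
    return np.expm1((q - 1.0) * d_renyi) / (q - 1.0)


q = 0.5
expert = np.array([0.7, 0.2, 0.1])    # hypothetical expert action probabilities
learner = np.array([0.5, 0.3, 0.2])   # hypothetical learned action probabilities

d_t = tsallis_divergence(expert, learner, q)
d_r = renyi_divergence(expert, learner, q)
print(d_t, d_r)
print(renyi_from_tsallis(d_t, q), tsallis_from_renyi(d_r, q))
```

Because the map is monotonic, both divergences share the same minimizer, but the logarithm rescales the gradient by a factor that depends on the current divergence value.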

6.3 Other applications of the proposed method

While this paper utilized the property for $q < 1$ to obtain the noise robustness, other applications can be discussed. For example, if the task to be imitated has multiple correct solutions, the dataset will contain a wide variety of trajectories, and imitating all of them will require a very high approximation capability of the CNNs and FCNs (and of the model for the stochastic controller). In such a case, the proposed method limits the number of trajectories to be imitated by excluding some of the various trajectories as noise, and thus the controller can be trained with a standard implementation. This can be interpreted as implicit dataset distillation Wang et al. (2018) at the loss-function stage.

According to this interpretation, the proposed method should also be effective in distilling the model Rusu et al. (2015); Gou et al. (2021). Ideally, the distilled model should have the same level of performance as the original, but depending on its size, some performance degradation is inevitable. In such a case, the proposed method would be able to achieve a distillation that selectively excludes some of the features but retains the rest, rather than degrading the overall performance.

As a remark, in the above minimization problem of Tsallis divergence, the expert controller $\pi_e$ was assumed to be unknown in general, but it can be revealed in model distillation, where the expert is the original model. In this case, relative to the standard behavioral cloning, the gradient ratio is given by $(\pi / \pi_e)^{1-q}$. That is, the weighting is relative in this form, whereas it was absolute in the proposed method. Although the expected behavior is similar, it will be possible to prioritize relatively important data by appropriately weighting cases where the variance of $\pi_e$ is large and the entire data tends to be ignored.
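The difference between absolute and relative weighting can be illustrated as follows. This is a minimal sketch, assuming one-dimensional Gaussian teacher and student controllers and assuming that the absolute weight is the student likelihood raised to the power $1 - q$ while the relative weight uses the ratio of student to teacher likelihood. All names and numerical values are hypothetical.

```python
import numpy as np


def gaussian_pdf(a, mu, sigma):
    z = (a - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))


q = 0.5
# Hypothetical actions sampled from the teacher, including one far from its mean.
actions = np.array([-10.0, 0.0, 10.0, 40.0])

# Hypothetical teacher with a large variance, and a student that roughly matches it.
teacher = gaussian_pdf(actions, mu=0.0, sigma=10.0)
student = gaussian_pdf(actions, mu=1.0, sigma=10.0)

absolute = student ** (1.0 - q)               # weighting without access to the teacher
relative = (student / teacher) ** (1.0 - q)   # weighting when the teacher is known

print("absolute:", absolute)
print("relative:", relative)
```

With a high-variance teacher, all absolute weights are small because every density is small, whereas the relative weights stay close to 1 wherever the student agrees with the teacher.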

7 Conclusion

In this paper, we proposed a novel behavioral cloning method based on Tsallis statistics that is robust to small and noisy personal datasets, especially in the automated personal mobility task. Specifically, we focused on the fact that the standard behavioral cloning utilizes the log likelihood of the stochastic controller, and replaced it with the $q$-log likelihood. We showed analytically that this replacement provides the noise robustness. We also identified minor issues with VisualBackProp, which is useful for visually verifying task performance, and implemented ad-hoc solutions, i.e. the normalization of all the features and the consideration of FCNs. From the experimental results, it can be concluded that the proposed method can learn correctly even from a dataset that the conventional method fails to imitate, and that it attends to a region of interest similar to that of the driver.

In the future, we aim to conduct larger-scale experiments and further improve imitation learning based on Tsallis statistics. In particular, we would like to investigate and analyze whether this concept can be successfully used to solve covariate shift and compounding error, which are open issues in behavioral cloning.

Acknowledgements

This work was supported by The Support Center for Advanced Telecommunications Technology Research Foundation (SCAT) Research Grant.

References

  • T. Aotani, T. Kobayashi, and K. Sugimoto (2021) Meta-optimization of bias-variance trade-off in stochastic model learning. IEEE Access 9, pp. 148783–148799. Cited by: §5.2, §6.1.
  • B. D. Argall, S. Chernova, M. Veloso, and B. Browning (2009) A survey of robot learning from demonstration. Robotics and autonomous systems 57 (5), pp. 469–483. Cited by: §1.
  • J. L. Ba, J. R. Kiros, and G. E. Hinton (2016) Layer normalization. arXiv preprint arXiv:1607.06450. Cited by: §5.1.2.
  • M. Bain and C. Sammut (1995) A framework for behavioural cloning.. In Machine Intelligence 15, pp. 103–129. Cited by: §1, §2.1.
  • M. Bojarski, A. Choromanska, K. Choromanski, B. Firner, L. J. Ackel, U. Muller, P. Yeres, and K. Zieba (2018) Visualbackprop: efficient visualization of cnns for autonomous driving. In IEEE International Conference on Robotics and Automation, pp. 4701–4708. Cited by: §1, §2.2, §4.1.
  • K. Brantley, W. Sun, and M. Henaff (2020) Disagreement-regularized imitation learning. In International Conference on Learning Representations, Cited by: §6.1.
  • F. Codevilla, M. Müller, A. López, V. Koltun, and A. Dosovitskiy (2018) End-to-end driving via conditional imitation learning. In IEEE International Conference on Robotics and Automation, pp. 4693–4700. Cited by: §1, §1.
  • F. Codevilla, E. Santana, A. M. López, and A. Gaidon (2019) Exploring the limitations of behavior cloning for autonomous driving. In IEEE/CVF International Conference on Computer Vision, pp. 9329–9338. Cited by: §1.
  • S. K. S. Ghasemipour, R. Zemel, and S. Gu (2020) A divergence minimization perspective on imitation learning methods. In Conference on Robot Learning, pp. 1259–1277. Cited by: §6.2.
  • M. Gil, F. Alajaji, and T. Linder (2013) Rényi divergence measures for commonly used univariate continuous distributions. Information Sciences 249, pp. 124–131. Cited by: §6.2, §6.2.
  • J. Gou, B. Yu, S. J. Maybank, and D. Tao (2021) Knowledge distillation: a survey. International Journal of Computer Vision 129 (6), pp. 1789–1819. Cited by: §6.3.
  • J. Hawke, R. Shen, C. Gurau, S. Sharma, D. Reda, N. Nikolov, P. Mazur, S. Micklethwaite, N. Griffiths, A. Shah, et al. (2020) Urban driving with conditional imitation learning. In IEEE International Conference on Robotics and Automation, pp. 251–257. Cited by: §1, §1.
  • J. Ho and S. Ermon (2016) Generative adversarial imitation learning. Advances in neural information processing systems 29, pp. 4565–4573. Cited by: §6.1.
  • A. Hussein, M. M. Gaber, E. Elyan, and C. Jayne (2017) Imitation learning: a survey of learning methods. ACM Computing Surveys 50 (2), pp. 1–35. Cited by: §1.
  • M. Hussein, B. Crowe, M. Clark-Turner, P. Gesel, M. Petrik, and M. Begum (2021) Robust behavior cloning with adversarial demonstration detection. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 7835–7841. Cited by: §1.
  • W. E. L. Ilboudo, T. Kobayashi, and K. Sugimoto (2020) Robust stochastic gradient descent with student-t distribution based first-order momentum. IEEE Transactions on Neural Networks and Learning Systems. Cited by: §1.
  • W. E. L. Ilboudo, T. Kobayashi, and K. Sugimoto (2021) Adaptive t-momentum-based optimization for unknown ratio of outliers in amateur data in imitation learning. In IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 7828–7834. Cited by: §1.
  • D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §2.1, §3.3, §5.1.2.
  • G. Klančar, D. Matko, and S. Blažič (2009) Wheeled mobile robots control in a linear platoon. Journal of Intelligent and Robotic Systems 54 (5), pp. 709–731. Cited by: §1.
  • T. Kobayashi (2020) Q-vae for disentangled representation learning and latent dynamical systems. IEEE Robotics and Automation Letters 5 (4), pp. 5669–5676. Cited by: §1, §3.1.
  • A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §1.
  • M. Laskey, J. Lee, R. Fox, A. Dragan, and K. Goldberg (2017) Dart: noise injection for robust imitation learning. In Conference on robot learning, pp. 143–156. Cited by: §6.1.
  • Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. nature 521 (7553), pp. 436. Cited by: §1.
  • J. Levinson, J. Askeland, J. Becker, J. Dolson, D. Held, S. Kammel, J. Z. Kolter, D. Langer, O. Pink, V. Pratt, et al. (2011) Towards fully autonomous driving: systems and algorithms. In 2011 IEEE intelligent vehicles symposium (IV), pp. 163–168. Cited by: §1.
  • Y. Maruyama, S. Kato, and T. Azumi (2016) Exploring the performance of ros2. In International Conference on Embedded Software, pp. 1–10. Cited by: §5.1.1.
  • S. Nakajima (2017) A new personal mobility vehicle for daily life: improvements on a new rt-mover that enable greater mobility are showcased at the cybathlon. IEEE Robotics & Automation Magazine 24 (4), pp. 37–48. Cited by: §1.
  • A. Y. Ng and S. J. Russell (2000) Algorithms for inverse reinforcement learning. In International Conference on Machine Learning, pp. 663–670. Cited by: §1.
  • H. G. Nguyen, J. Morrell, K. D. Mullens, A. B. Burmeister, S. Miles, N. Farrington, K. M. Thomas, and D. W. Gage (2004) Segway robotic mobility platform. In Mobile Robots XVII, Vol. 5609, pp. 207–220. Cited by: §1.
  • F. Nielsen and R. Nock (2011) A closed-form expression for the sharma–mittal entropy of exponential families. Journal of Physics A: Mathematical and Theoretical 45 (3), pp. 032003. Cited by: §6.2, §6.2.
  • T. Onishi, T. Motoyoshi, Y. Suga, H. Mori, and T. Ogata (2019) End-to-end learning method for self-driving cars with trajectory recovery using a path-following function. In International Joint Conference on Neural Networks, pp. 1–8. Cited by: §1, §1.
  • A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in pytorch. In Advances in Neural Information Processing Systems Workshop, Cited by: §5.1.2.
  • A. A. Rusu, S. G. Colmenarejo, C. Gulcehre, G. Desjardins, J. Kirkpatrick, R. Pascanu, V. Mnih, K. Kavukcuoglu, and R. Hadsell (2015) Policy distillation. arXiv preprint arXiv:1511.06295. Cited by: §6.3.
  • D. Salinas, H. Shen, and V. Perrone (2020) A quantile-based approach for hyperparameter transfer learning. In International Conference on Machine Learning, pp. 8438–8448. Cited by: §5.2, §6.1.
  • F. Sasaki and R. Yamashina (2021) Behavioral cloning from noisy demonstrations. In International Conference on Learning Representations, Cited by: §1, §6.2.
  • N. Srinivas, A. Krause, S. Kakade, and M. Seeger (2010) Gaussian process optimization in the bandit setting: no regret and experimental design. In International Conference on Machine Learning, pp. 1015–1022. Cited by: §5.2, §6.1.
  • S. Suryanarayanan and M. Tomizuka (2007) Appropriate sensor placement for fault-tolerant lane-keeping control of automated vehicles. IEEE/ASME Transactions on mechatronics 12 (4), pp. 465–471. Cited by: §1.
  • R. S. Sutton and A. G. Barto (2018) Reinforcement learning: an introduction. MIT press. Cited by: §1.
  • H. Suyari and M. Tsukada (2005) Law of error in tsallis statistics. IEEE Transactions on Information Theory 51 (2), pp. 753–757. Cited by: §1, §3.1.
  • V. Tangkaratt, B. Han, M. E. Khan, and M. Sugiyama (2020) Variational imitation learning with diverse-quality demonstrations. In International Conference on Machine Learning, pp. 9407–9417. Cited by: §1.
  • C. Tsallis (1988) Possible generalization of boltzmann-gibbs statistics. Journal of statistical physics 52 (1-2), pp. 479–487. Cited by: §1, §3.1.
  • D. Ulyanov, A. Vedaldi, and V. Lempitsky (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022. Cited by: §5.1.2.
  • T. Wang, J. Zhu, A. Torralba, and A. A. Efros (2018) Dataset distillation. arXiv preprint arXiv:1811.10959. Cited by: §6.3.
  • G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou (2018) Information-theoretic model predictive control: theory and applications to autonomous driving. IEEE Transactions on Robotics 34 (6), pp. 1603–1622. Cited by: §1.
  • Y. Wu, N. Charoenphakdee, H. Bao, V. Tangkaratt, and M. Sugiyama (2019) Imitation learning from imperfect demonstration. In International Conference on Machine Learning, pp. 6818–6827. Cited by: §1.
  • M. D. Zeiler, G. W. Taylor, and R. Fergus (2011) Adaptive deconvolutional networks for mid and high level feature learning. In 2011 International Conference on Computer Vision, pp. 2018–2025. Cited by: item 2.