Federated Imitation Learning: A Privacy Considered Imitation Learning Framework for Cloud Robotic Systems with Heterogeneous Sensor Data

Humans are capable of learning a new behavior by observing others perform the skill. Robots can also implement this by imitation learning. Furthermore, if with external guidance, humans will master the new behavior more efficiently. So how can robots implement this? To address the issue, we present Federated Imitation Learning (FIL) in the paper. Firstly, a knowledge fusion algorithm deployed on the cloud for fusing knowledge from local robots is presented. Then, effective transfer learning methods in FIL are introduced. With FIL, a robot is capable of utilizing knowledge from other robots to increase its imitation learning. FIL considers information privacy and data heterogeneity when robots share knowledge. It is suitable to be deployed in cloud robotic systems. Finally, we conduct experiments of a simplified self-driving task for robots (cars). The experimental results demonstrate that FIL is capable of increasing imitation learning of local robots in cloud robotic systems.


page 1

page 4

page 5

page 6

page 7

page 8


Federated Imitation Learning: A Novel Framework for Cloud Robotic Systems with Heterogeneous Sensor Data

Humans are capable of learning a new behavior by observing others to per...

Lifelong Federated Reinforcement Learning: A Learning Architecture for Navigation in Cloud Robotic Systems

This paper was motivated by the problem of how to make robots fuse and t...

Peer-Assisted Robotic Learning: A Data-Driven Collaborative Learning Approach for Cloud Robotic Systems

A technological revolution is occurring in the field of robotics with th...

Learning to Gather Information via Imitation

The budgeted information gathering problem - where a robot with a fixed ...

Cross Domain Robot Imitation with Invariant Representation

Animals are able to imitate each others' behavior, despite their differe...

The Virtuous Machine - Old Ethics for New Technology?

Modern AI and robotic systems are characterized by a high and ever-incre...

SA-Net: Deep Neural Network for Robot Trajectory Recognition from RGB-D Streams

Learning from demonstration (LfD) and imitation learning offer new parad...

I Introduction

Demonstrations provide a descriptive medium for specifying robotic tasks. Prior work has shown that robots can acquire a range of complex skills through demonstration, such as table tennis [1], drawer opening [2], and multi-stage manipulation tasks [3]. Nevertheless, there exists a number of problems in the application of imitation learning. For example, large amounts of data are required and sometimes isomorphic. Particularly, data of one robot cannot be shared with other robots. These drawbacks result in long training time for the robot and limited generalization performance.

In the paper, we address the problem of how to improve imitation learning of robots by taking advantage of knowledge from other robots. Moreover, we consider the premise of privacy protection and heterogeneous data.

Fig. 1: The child on the upper left acquires the ability to ride a bicycle by observing an adult. This is the process of imitation learning in humans. Correspondingly, the upper right robot acquires skills by training data. This is the process of imitation learning in robots. The bottom left child not only acquires bicycling skills by observing an adult, but also gets helps from an adult. This makes his learning more efficient. Inspired by this, in this work, FIL enable the bottom right robot not only acquires skills by training data, but also gets knowledge from other robots through the cloud robot system. So FIL increases imitation learning of local robots.

In order to realize the goal, we focus on the framework of cloud robotic system [4], which can enhance robotic systems by facilitating the process of sharing trajectories, control policies and outcomes of collective robot learning. Federated Imitation Learning (FIL) is presented in the work. As shown in Fig.1, It is inspired by the case that humans will learn more effectively if they have external guidance in imitation learning. With FIL, a robot is capable of taking advantage of knowledge from other robots to increase its learning. Even more, FIL considers information privacy and data heterogeneity when robots share knowlege. So it is suitable to be deployed in cloud robot systems with heterogeneous sensor data. To demonstrate it, we test FIL in a autonomous driving task. Experimental result indicates that FIL enables robots absorb knowledge from other robots and apply the knowledge to increase imitation learning. Overall, this paper makes the following contributions:

  • We present a FIL framework. It increases imitation learning of local robots in cloud robotic systems which may have heterogeneous sensor data and consider privacy protection.

  • We propose a knowledge fusion algorithm in FIL. It enables the cloud to fuse knowledge from other robots and generate guide models for robots with service requests.

  • An effective transfer learning method in FIL is introduced to facilitate local robots to acquire prior knowledge from the cloud.

Ii Related Theory

Ii-a Imitation learning

Imitation learning is a problem in machine learning that autonomous agents face when attempting to learn tasks from another, more-expert agent. The expert provides demonstrations of task execution, from which the imitator attempts to mimic the expert’s behavior [5]. At present, imitation learning has been applied to a variety of tasks, including autonomous flight [6], articulated motion [7], and end-to-end driving [8, 9]. Therefore, FIL is an important branch of artificial intelligence. Imitation learning is mainly divided into two methods: behavioral cloning and inverse reinforcement learning. This work is based on behavioral cloning. In order to learn a skill, it is necessary to determine the most effective strategy representation. The most effective representation of a strategy is often the direct mapping of state features to trajectory actions, which is called behavioral cloning (BC). BC is a method of using supervised learning from state to action with given state-action teaching data set. For robotic systems, BC mainly focuses on behavior trajectory planning. System dynamics is not a key factor in it, so BC is a good choice.

Early imitation learning studies interpreted the model-free BC approach as supervised learning. BC has been used in various applications. For instance, it has recently been used in the context of autonomous driving [10] and autonomous control of grasping [11]. BC is powerful when the agent requires only demonstration data to directly learn an imitation policy and does not require any further interaction between the agent and the environment. In [12], a neural network was trained in 1989 for an automated driving system. The work built a model from camera image to steering angle mapping. But it is not successful in practice. Reasons for failure include the constraints of the computing power and the size of the training set. However, [9,13] accomplished similar works and achieved success in a physical environment. These work has improved the teaching data set after using simulation platforms. At present, the BC approach can be rather brittle due to some well-known problems, such as limited data set of teaching, various state of apprentices’ distribution of teaching data set. Supervised learning assumes that the training dataset samples are independent and identically distributed. So supervised learning is difficult to generalize for unknown scenarios. Some work has improved this issue, for example, one-shot imitation is achieved in [14]. However, it is still limited to a single robot and does not take syetem-level applications into account. Our work improved this issue, the teaching data set can be expanded infinitely with the cloud robotic framework. FIL compensates for distribution differences and increases training speed. FIL accomplishes it by enanbling robot to acquire knowledge from other robots in cloud robotic system.

Ii-B Cloud robotic system

FIL fits well with cloud robotic system. Cloud robotic system usually relies on many other resources from a network to support its operation. Since the concept of the cloud robot was proposed by Dr. Kuffner in 2010 [15], the research on cloud robots is rising gradually. Cloud robotics is an emerging field of robots embedded in networking, cloud computing, cloud storage [16]. While providing advantages of powerful computation, storage, and communication resources of modern data centers, it also allows robots to benefit from the platform which includes infrastructure and shared services [17]. Concretely, this approach has been used in mapping [18] and localization [19], perception [20], grasping [21], visuomotor control [22], speech processing [23], and other applications. Cloud robots have high requirements for communication. Wang et al. present a framework targeting near real-time MSDR, which grants asynchronous access to the cloud from the robots [24]. Sandeep Chinchali et al. present network offloading policies for cloud robotics [15]. It is worth noting that with the rapid development of artificial intelligence, knowledge sharing has become a prominent advantage of cloud robots. However, no specific privacy-considered knowledge sharing approach for cloud robots has been proposed up to now. We believe that the work present in this paper is the first privacy-considered knowledge sharing approach for cloud robotic systems with heterogeneous sensor data.

Ii-C Federated learning

Federated learning was first proposed in [26], which showed its effectiveness through experiments on various datasets. In federated learning systems, the raw data is collected and stored at multiple edge nodes, and a machine learning model is trained from the distributed data without sending the raw data from the nodes to a central place [27, 28]. Federated learning is a machine learning method that is capable of protecting the privacy of users. It solves the problem of data silos. In practice, island data has different distributions. And it is reflected by different type of sensor data in cloud robotic systems. To address the issue, we present a knowledge fusing algorithm in FIL. FIL cleverly utilizes the simulation platform to achieve simultaneous acquisition of heterogeneous data. It enables federated learning to be applied to cloud robot systems with heterogeneous data.

Generally, this paper focuses on developing an imitation learning framework for cloud robotic systems, which is capable of protecting privacy and acclimatizing heterogeneous sensor data condition. This framwork is well fit in cloud robot systems.

Iii Methodology

The objects in the FIL framework include local robots, cloud servers, and requesting service robots. In FIL, local robots acquire skills through imitation learning. The cloud integrates the skills of the local robots and provides a corresponding cloud model. The cloud model serves as a guide model for the requesting service robot. Throughout the process, FIL considers the privacy of local robots so the cloud does not allow access to raw training data of local robots. FIL considers the heterogeneity of local robots’ training data. It realizes the fusion of heterogeneous knowledge with the algorithm presented in the paper. FIL is capable of increasing imitation learning without sacrificing privacy of robots in cloud robotic systems. It enables robots to learn skills with Cloud-Robot-Environment setup. Communication devices and a cloud server are supporting facilities of FIL. Unlike distributed algorithms such as A3C or UNREAL, FIL can be executed synchronously or asynchronously. Because cloud servers in FIL only need to get network parameters rather than gradients, It avoids the confinement of communication in traditional cloud robotic systems. In summary, FIL has the following advantages:

  1. FIL has the ability to fuse heterogeneous knowledge;

  2. FIL is capable of protecting the privacy of local robots;

  3. FIL can be executed synchronously or asynchronously, which reduces the requirement of communication.

Iii-a Framework of FIL

The framework of FIL is performed in Cloud-Robot-Environment setup. There are local robots, cloud servers, communication services and computing device. Local robots learn skills through imitation learning and the cloud server fuses knowledge. We develop a federated learning algorithm to fuse private models into the shared model in the cloud. Then the cloud server is capable of generating guide models corresponding to requests of local robots. After that, the local robots perform transfer learning based on the guide model. After that, the final policy will be quickly obtained. As illustrated in Fig.2, we explain the methodology in FIL with the example of a simplified self-driving task.

Fig. 2: FIL framework. We assume that there are three agents to fulfil the task. They perform imitation learning with heterogeneous data: RGB images, depth images and semantic segmentation images. Raw training data is not allowed to be shared between agents. Cloud does not have the right to obtain the original training data from agents either. Agents perform imitation learning to acquire the policy models. Only the parameters of the policy models are uploaded to the cloud. The cloud fuses the knowledge and then provide guide models to robots that request services. It facilitates the process of imitation learning.

In order to accomplish the simplified self-driving task, the three agents use three different types of training data separately. The training data is labeled and can be obtained by manually or simulation platform. Agent A trains RGB images and obtains a private policy model named private model A. The model takes RGB images as input. It outputs actions or the parameters that can determine actions. Similar processes occur in agent A and agent B. Outputs of the three private models are with same types. The inputs to the three models are different: RGB images, depth images, and semantic segmentation images. Then the parameters of the three models are uploaded to the cloud. The cloud fuses these models. Different from local data base, the cloud has its own data base. The data base in cloud has no labels. The cloud server lables the data with the uploaded private models. That is the process of knowledge fusion. Then it is capable of generating guide models with different types of input. When a local robot requests a service, the cloud provides a guide model corresponding to the type of sensor data. FIL can be performed online or offline. The framework of offline FIL can be summarized into four steps:

  • Step1: Local robots perform imitation learning

  • Step2: Transmit parameters of private models

  • Step3: Fuse knowledge in the cloud

  • Step4: Provide guide models when there are requests from local robots

When online, step1 and step 2 are simultaneous. Labels of the cloud data will be updated simultaneously, as presented in algorithm 1.

Initialize action-value Q-network with random weights ; Input: : number of local robots ; : update frequency.
while cloud server is running do
       performs imitation learning if  then
             for  do
                   Send to the cloud;
             end for
            labels=fuse(, ,,)
       end if
      if service_request=True then
             Generate base on labels; Send to local robots; =transfer().
       end if
end while
Algorithm 1 Processing Algorithm in FIL
Fig. 3: Knowledge fusion algorithm deployed in the cloud in a self-driving case. Agent obtain private models by performing imitation learning. The cloud stored different types of sensor data of every scenes. Input corresponding sensor data to private models. Then calculate the numerical characteristics of private models outputs to label scenes. With multi types of sensor data, the cloud is capable of generating guide models corresponding to the sensor type of the local robot.

Key algorithms in this framework includs the knowledge function algorithm and transferring approaches. We present them in the following.

Iii-B Knowledge fusion algorithm in cloud

In traditional federated learning algorithms, heterogeneous data training is still difficult to achieve. But in the cloud robot system, we ingeniously utilize the function that the robot simulation system can collect different sensor data for a same scene. Base on this, we develop a knowledge fusion algorithm to obtain the shared model in cloud. The algorithm is efficient to fuse parameters of networks trained from different robots that may have heterogeneous training data. The algorithm receives local network parameters and fuses them into the sharing network.

Fig.3 presents the knowledge fusion algorithm deployed in the cloud. The cloud server’s primary responsibility is to label scenes. Before this, it is required to collect different types of sensor data that depends on the types of sensor data in local robots. For example, in the simplified self-driving task presented in the paper, the cloud needs to collect RGB images, depth images and semantically segmented images. So one scene has three types of images. Every private model has a corresponding type of images. Based on this, private models provide its own decision suggestion to the cloud. The cloud will label these scenes with suggestions from the private models. The calculation approach of labels can draw on some methods from ensemble learning. It can be defined according to different application scenarios. For instance, in the self-driving case presented in the paper, we take the median of outputs. Because in this task, private models output the steer of the agent. Agents usually make decisions of along the road, turn, obstacle avoidance and so on. Generally speaking, the output of the model includes two extreme cases: turning and no turning. Errors in decisions often occur in the two extreme cases. Median can avoid extreme values in the evaluation, so we take the median of private models outputs as the label of the current scene. With data and labels, cloud models will be trained immediately. The process can be summarized to Formula (1) to Formula (5):


In the Formular (1),

is the data set of the local robot i. L represents the loss function.

repensents parameters of models. are original data and are labels. N is the number of samples of the data set.


The Formular (2) presents training targets of local robots. It is the structure risk minimization criteria.


In the above formular, is the training data in the cloud and i represents sensor types.


In Formular (4) and (5), is the median of , is the training targets in the cloud. M represents the number of sample data. is the regularization term of L2 norm, which is used to reduce the parameter space and avoid over-fitting and lambda is used to control the intensity of regularization.

It should be explained that the shared model in the cloud is not the final policy model of the local robot. We only use the shared model in the cloud as a guide model for local robots. The shared model maintained in the cloud is a cautious policy model but not the optimal for every robot. That is to say, the shared model in the cloud will not make serious mistakes in some private unstructured environments but the action is not the best. It is necessary for the robot to train its own policy model based on the shared model from the cloud, otherwise the error rate will be high in its private unstructured environment. As the saying goes, the older the person, the smaller the courage. In the process of lifelong learning, the cloud model will become more and more “timid”. In order to reduce the error rate, we should transfer the shared model and train a new private model through imitation learning in a new environment. This responded to the transfer learning process in FIL.

Iii-C Transfer the shared model

There have been a lot of valuable studies on transfer learning. Current research mainly focuses on transferring through the relationship between source domain distribution and target domain distribution. This method is theoretically infeasible in cloud robotic systems. If local robots upload their training data to the cloud, the communication resources they need are unbearable. Therefore, cloud training data and local training data are not permitted to be calculated at the same time. Transfer learning from cloud to local belongs to supervised transfer learning. Moreover, the local robot can get the model parameters but without training data in the cloud.

Under the constraints of the above conditions, Layer Transfer is the transferable learning method that can be implemented. Layer Transfer means that some layers in the model trained by source data are copied directly and the remaining layers are trained by target data. The advantage of this is that target data only needs fewer parameters to be trained, thus avoiding over-fitting. It has faster training speed and higher accuracy. On different tasks, the layers that need to be transferred are often different. For example, in speech recognition, we usually copy the last layers and retrain the first layers. This is because the first layers of speech recognition neural network are the way to recognize the speaker’s pronunciation, and the last layers are the recognition. The latter layers have nothing to do with the speaker. In image recognition, we usually copy the front layers and retrain the back layers. This is because the first layers of the image recognition neural network are to identify whether there is a basic geometric structure, so it can be transferred. The latter layers are often abstract and cannot be transferred. So, which layers to be transferred are case by case.

Fig. 4: A transfer learning method of FIL

As shown in Figure 4, we use front layers as feature extractors in the case of imitation learning. The decision model in the cloud can be used as the initial model for local training. In this way, cloud model can play a guiding role. It can speed up local training and increase the accuracy of local robots. In the training of local robots, the feature extraction layer is frozen and only the full connection layers are trained. If necessary, it can also adjust the relevant parameters in the process of back propagation. For example, some work may increase learning rate. After transfer learning, local robots successfully utilize knowledge from other robots in cloud robotic systems.

Iv Experiments

In this section, we describe our experimental setup and present the results of FIL. To verify the effectiveness of FIL, we need to answer two questions: 1) Is FIL capable of generating an effective shared model based on shared knowledge in cloud robotic systems? 2) Does the shared model of FIL improve the learning process or result of local robot with the transferring method? To answer the former question, we conduct experiments to generate a shared model and compare its performance with general models. To answer the latter question, we conduct experiments to compare the learning process and result of local robots with shared model and without sharing model.

Iv-a Experimental setup

Self-driving car is regarded as an advanced robot. So there are sufficient reasons to use cars to verify robot control algorithms. In the work, we use Microsoft AirSim and CARLA as our simulator to evaluate the presented approach. In addition to having high-quality environments with realistic vehicle physics, AirSim and CARLA have a python API which allows for easy data collection and control.

(a) Data collection for local
(b) Data collection for the cloud
Fig. 5: As present in subfigure a, we collected training data for local robots in three different environments corresponding to three different types of data. As present in subfigure b, we collected different types but simultaneous data in many different environments.

In order to collect training data, a human driver is presented with a first-person view of the environment (central camera). The driver controls the simulated vehicle using the keyboard. The driver keeps the car at a speed around 6m/s and strives to avoid collisions with cars or pedestrians, but ignores traffic lights and stop signs. As Fig.5 presents, we use three different types of sensor data: RGB images, depth images and semantic segmentation images. Because any of these three types of sensor data can make the agent to perform obstacle avoidance tasks within tolerable errors. The policy network mainly used convolution layers and fully connected layers. Then the output is used to produce the discrete action probabilities by a linear layer, followed by a softmax, and the value function by a linear layer. The architecture of the policy network similar to VGG-16 is presented in Fig. 6.

Fig. 6: Network architecture of imitation learning in experiments

As for the cloud robotic systems, the server and agent communicate via HTTP requests, with the data server utilizing the Django framework to respond to asynchronous requests. Our cloud programs run on the Microsoft Azure cloud. We conduct local robot experiments with a single NVIDIA Quadro RTX 6000, which allows us to run our simulator to receive photo-realistic images for training.

Fig. 7: Performance comparison of local models and cloud generated models. The table on the right present quantization error of every image. Numbers from left to right correspond to figures from left to right.

Iv-B Evaluation for the shared-model generating method in FIL

Fig. 8: Challenges in the self-driving task

In this section, the work conducts an experiment to verify the effectiveness of the method for generating shared model in the cloud and present the performance of the shared model. As presented in Fig. 7, we compare the six models in a neighborhood environment. Fig.8 presents this environment. The robot (car) should challenge the tasks such as avoiding collisions and making timely turns. The observations (images) are recorded by one central camera. The recorded control signal is the steering angle. The steering angle is scaled between -1 and 1, with extreme values corresponding to full left and full right, respectively. Considering the actual driving, we transfer the steering angle between -0.69 radians and 0.69 radians.

After training the three private policy networks, the cloud will work to fuse their knowledge. As mentioned before, we assume that there are three companies train their policies by imitation learning with heterogeneous sensor data. We are not allowed to share the training data between these agents or send the training data to the cloud. So the cloud server only gets the parameters of the three local networks and performs the Algorithm 2. Before that, we should collect different types of sensor data simultaneously from different environments in CARLA and Airsim. Then we will get the labels of every scene in cloud server. We trained local policy network 1 for company 1 based on RGB images, local policy network 2 for company 2 based on semantic segmentation images, and local policy network 3 for company 3 based on infrared images. Then the parameters of these three private networks without training data are uploaded to the cloud server. Finally, the cloud server gets shared labels of scenes in the cloud. It is capable of generating policy networks corresponding to different sensor requests. In the experiment of this work, we used three types of scene data. So there were three policy networks generated. We named them cloud policies. The process in the cloud is unsupervised learning. There is no manual labels but cloud generation. However, in order to compare the performance between local policies and cloud policies, we labeled some scenes to mark the result. In the simulation environment, the controller uses the policy network to control the robot.

Controller Hit the obstacle Miss turns Mistakes in straight
Local controller for RGB images 3.45% 12% 16.67%
Cloud controller for RGB images 0.69% 0 0
Local controller for depth images 0 20% 0
Cloud controller for depth images 0 4% 0
Local controller for segmentation images 0 12% 6.67%
Cloud controller for segmentation images 0 4% 0
TABLE I: Results of local policies and cloud policies.
Fig. 9: Procedure of the experiment for evaluating knowledge-transfer ability. The part of the left, we performed imitation learning in three different environments with three types of sensor data (RGB images, depth images and semantic segmentation images) and got three policies firstly. Secondly, the parameter of three private policies were sent to the cloud. The cloud generated labels of its own data base. Here is something to explained is that there are semantic segmentation difference between CARLA and AIRSIM, so there is a unifying process for semantic segmentation images. Thirdly, three policies corresponding to different sensor data were generated. After that, we performed transferring learning with corresponded type of images and obtained transferred policies. For comparison, we also conduct the general imitation learning experiment on the right part of the image. Finally, we compare these policies in a new environment (coastline in Airsim).
Fig. 10: Performance comparison of standard policies and transferred policies. The table on the right present quantization error of every image. Numbers from left to right correspond to figures from left to right.
Controller Error rate in normal Error rate in rain Error rate in snow Error rate in fog Error rate in dust
Standard controller for RGB images 17.39% 26.09% 30.43% 34.78% 52.17%
Transferred controller for RGB images 4.35% 8.70% 17.39% 31.82% 39.13%
Standard controller for depth images 13.04% 4.35% 8.70% 8.70% 17.39%
Transferred controller for depth images 10.87% 4.35% 8.70% 8.70% 15.22%
Standard controller for segmentation images 2.17% 2.17% 6.52% 21.74% 36.36%
Transferred controller for segmentation images 2.17% 2.17% 5.43% 17.39% 31.82%
TABLE II: Add caption

Fig. 7 presents results of these six policy networks in some main challenging scenes to the car. From the Fig.7, we can see that compared with local models, the cloud-based models present higher accuracy. The cloud-based model acquired knowledge from different sensors and scenes. So that it can avoid errors of local models training from single training set collected by one type of sensors. We conducted 5 experiments, each one sets a different starting point. Then we evaluated the performance of robots in obstacle avoidance, turning and straight forward tasks. The results are summarized in Table 1. It can be seen from the experimental results that the cloud knowlege improves the local controller that is trained using standard imitation learning. The controller based on cloud policies performes better. Especially the controller for RGB images.

Iv-C Evaluation for the knowledge-transfer ability of FIL

We conducted the experiment as illustrated in Fig. 9 to evaluate the knowledge-transfer ability in FIL. Corresponding to every type of training data, we obtained a pair of policies: a transferred policy and a standard policy. So there are three pairs of policies generated in the experiment. The performance of controllers based on these policies in key challenging tasks are presented in Fig. 10. The results are summarized in Table 2. From the results, we can see that the imitation learning models obtained in cloud robotic system perform significantly better in accuracy, compared with standard models that trained by traditional imitation learning without shared knowledge.

FIL improves the training process of imitation learning with the help of shared knowledge. There is a pre-trained model from the cloud for transfer in local imitation learning. So there is no need for local robots to learn from scratch. We present the comparison of train process in Fig 11. From the figure, we can see that the transferred policies have lower error starting point and the error value.

Fig. 11: Training process comparision of standard learning and transfer learning in FIL
(a) Rain
(b) Snow
(c) Fog
(d) Dust
Fig. 12: Bad weather conditions, rain, snow, fog, sand and dust.We conducted data collection and comparison experiments in these environments

The local policy model FIL also has better generalization. Controllers based on transferred policies from FIL perform better in different weather compared with policies trained by standard imitation learning. As presented in Fig.12, we conducted the experiments for controllers in different weathers. The results are presented in the last three rows of the table 2. The results showed that the model from FIL could improve the accuracy of the controller from standard imitation learning in bad weather. Hence, FIL is valuable in practical applications.

V conclusion

We presented an imitation learning framework FIL for cloud robotic systems. The framework is able to improve imitation learning of robots by taking advantage of knowledge from other robots. Additionally, we presented a knowleged fusion algorithm in FIL and introduced transfer methods. Our approach is able to fuse heterogeneous knowledge and protect privacy of local robots. Finally, we validated our architecture and algorithms in policy-learning experiments.

The framework only inproves the imitation learning approach in cloud robotic systems. We leave it as future work to make FIL flexible to deal with different learning approaches. Considering more types of sensor data is also one of the future work. A more flexible FIL will offer a wider range of services in cloud robotic systems.


  • [1] K. Mülling, J. Kober, O. Kroemer, and J. Peters, “Learning to select and generalize striking movements in robot table tennis,” The International Journal of Robotics Research, vol. 32, no. 3, pp. 263-279, 2013.
  • [2] M. Rana, M. Mukadam, S. R. Ahmadzadeh, S. Chernova, and B. Boots, “Towards robust skill generalization: Unifying learning from demonstration and motion planning,” in International Conference on Robot Learning, 2018.
  • [3] T. Zhang et al., “Deep imitation learning for complex manipulation tasks from virtual reality teleoperation,” in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1–8.
  • [4] B. Liu, L. Wang, M. Liu, and C. Xu, “Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems,” arXiv preprint arXiv:1901.06455, 2019.
  • [5] F. Torabi, G. Warnell, and P. Stone, “Recent Advances in Imitation Learning from Observation,” arXiv preprint arXiv:1905.13566, 2019.
  • [6] A. Giusti et al., “A machine learning approach to visual perception of forest trails for mobile robots,” IEEE Robotics and Automation Letters (RA-L), vol. 1, no. 2, pp. 661–667, 2015.
  • [7] P. Englert, A. Paraschos, J. Peters, and M. P. Deisenroth, “Model-based imitation learning by probabilistic trajectory matching,” in IEEE International Conference on Robotics and Automation, 2013, pp. 1922–1927.
  • [8] J. Zhang and K. Cho, “Query-efficient imitation learning for end-to-end simulated driving,” in AAAI, 2017.
  • [9] F. Codevilla, M. Miiller, A. López, V. Koltun, and A. Dosovitskiy, “End-to-end driving via conditional imitation learning,” in IEEE International Conference on Robotics and Automation (ICRA), 2018, pp. 1–9.
  • [10] M. Bojarski et al., “End to end learning for self-driving cars,” arXiv preprint arXiv:1604.07316, 2016.
  • [11] Jonatan S. Dyrstad, et al. “Teaching a Robot to Grasp Real Fish by Imitation Learning from a Human Supervisor in Virtual Reality,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2018: 7185-7192.
  • [12] D. A. Pomerleau, “Alvinn: An autonomous land vehicle in a neural network,” in Advances in neural information processing systems, 1989, pp. 305–313.
  • [13] L. T, P. Yun, Y. Chen, C. Liu, and H. Y. M. Liu, “Visual-based Autonomous Driving Deployment from a Stochastic and Uncertainty-aware Perspective.” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2019
  • [14] T. Yu et al., “One-shot imitation from observing humans via domain-adaptive meta-learning,” arXiv Prepr. arXiv1802.01557, 2018.
  • [15] J. J. Kuffner, “Cloud-enabled robots,” in IEEE-RAS International Conference on Humanoid Robotics, Nashville, TN, 2010.
  • [16] L. Wang, M. Liu, Q. Meng. “A hierarchical auction-based mechanism for real-time resource allocation in cloud robotic systems,” IEEE transactions on cybernetics, vol. 47, no. 2, pp.473-484.
  • [17] S. A. Miratabzadeh et al., “Cloud robotics: A software architecture: For heterogeneous large-scale autonomous robots,” in 2016 World Automation Congress (WAC), 2016, pp. 1–6.
  • [18] G. Mohanarajah, V. Usenko, M. Singh, et al., “Cloud-based collaborative 3D mapping in real-time with low-cost robots,” IEEE Trans. Autom. Sci. Eng., vol. 12, no. 2, pp. 423–431, 2015.
  • [19] Luis Riazuelo, Javier Civera, et al. “C2tam: A cloud framework for cooperative tracking and mapping. Robotics and Autonomous Systems,” vol. 62, no.4, pp.401– 413, 2014.
  • [20] Javier Salmeron-Garcı, Pablo Inigo-Blasco, et al. “A tradeoff analysis of a cloud-based robot navigation assistant using stereo image processing,” IEEE Transactions on Automation Science and Engineering, Vol. 12, No. 2, pp. 444–454, 2015.
  • [21] Ben Kehoe, Akihiro Matsukawa, Sal Candido, James Kuffner, and Ken Goldberg. “Cloud-based robot grasping with the google object recognition engine,” In IEEE International Conference on Robotics and Automation (ICRA), pp. 4263–4270, 2013.
  • [22] Haiyan Wu, Lei Lou, Chih-Chung Chen, Sandra Hirche, and Kolja Kuhnlenz. “Cloud-based networked visual servo control,” in IEEE Transactions on Industrial Electronics, Vol. 60, No. 2, pp. 554 - 566, 2013.
  • [23] Komei Sugiura and Koji Zettsu. “Rospeex: A cloud robotics platform for human-robot spoken dialogues,” in IEEE International Conference on Intelligent Robots and Systems (IROS), pp.6155-6160, 2015.
  • [24] Wang L , Liu M , Meng Q H. “Real-Time Multisensor Data Retrieval for Cloud Robotic Systems,” IEEE Transactions on Automation Science and Engineering, vol. 12, no. 2, pp. 507-518, 2015.
  • [25] Chinchali S, Sharma A, Harrison J, et al. “Network offloading policies for cloud robotics: a learning-based approach,” arXiv preprint arXiv:1902.05703, 2019.
  • [26]

    M. Nasr, R. Shokri, and A. Houmansadr, “Comprehensive Privacy Analysis of Deep Learning: Stand-alone and Federated Learning under Passive and Active White-box Inference Attacks,” in IEEE Symposium on Security and Privacy, 2018, pp. 1-15.

  • [27] A. Hard et al., “Federated Learning for Mobile Keyboard Prediction,” arXiv preprint arXiv:1811.03604, 2018.
  • [28] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen, “In-Edge AI: Intelligentizing Mobile Edge Computing, Caching and Communication by Federated Learning,” arXiv preprint arXiv:1809.07857, 2018.