For decades, robots have been used to automate tasks in the industry sector. Conventional industrial robots are taught to perform one task at a time, are competent at executing this single task and can perform thousands of repetitions accurately. While the automation of such tasks has led to an increase in efficiency and a decrease in manufacturing costs for mass production, it is less applicable for the individualized consumer expectations of today’s economy. Globalization, digitalization and the resulting growth of markets have led to an increasing number of product variants and shorter product life cycles [brettel2014]. Customers now demand highly individualized products that are designed specifically for them. This change is observable in a wide range of industrial fields [outon2019].
The key challenge for individualized products is to avoid an increase in costs compared to established manufacturing approaches like mass production, even if factories are located in high-cost countries [lanz2017]. The advantages of individualized production are higher flexibility, fast response rates to customer decisions and a more efficient use of resources. An example is the health-care sector, where personalized medicine is becoming increasingly important, and additive manufacturing is being used for the production of biomaterials, implants and prosthetics [zadpoor2017]. To enable individualized manufacturing, traditional programming of machines with repetitive tasks is no longer applicable [outon2019]. When environments and objects change, conventional robots are unable to perform assembly tasks with similar success rates.
We therefore propose to use reinforcement learning (RL) algorithms capable of meta-learning (ML) to enable robots to accomplish highly individualized pick and place tasks as an important part of the product assembly process.
2 State of the Art: Machine Learning and Robotics
Recently, RL has achieved great success in a wide range of tasks and complex games (e.g. the strategy board game Go [Silver2017]). Combining RL and ML seems promising for enabling a robot to perform a pick and place task for unknown objects and destinations. In RL, an agent interacts with an environment and observes its state. Based on this state, the agent takes an action and receives a new state and a reward for the chosen action. Each RL algorithm is designed specifically for a certain task in terms of its architecture and training. A major drawback of this approach is the necessity to train the RL agent from scratch for each task.
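The interaction loop described above can be sketched in a few lines of Python. This is only an illustrative toy and not part of the proposed system: the `LineWorld` environment, its reward and the two policies are hypothetical names chosen for this example.

```python
import random


class LineWorld:
    """Hypothetical toy environment: the agent moves along a line of cells."""

    def __init__(self, size=5, target=4):
        self.size = size
        self.target = target
        self.position = 0

    def reset(self):
        # Start every episode in the leftmost cell and return the state.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the line.
        self.position = min(max(self.position + action, 0), self.size - 1)
        done = self.position == self.target
        reward = 1.0 if done else 0.0  # reward only when the target is reached
        return self.position, reward, done


def run_episode(env, policy, max_steps=50):
    """Run one episode: observe state, act, receive new state and reward."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    # Untrained behaviour: ignore the state and act at random.
    return random.choice([-1, 1])


def always_right(state):
    # Hand-written policy that solves this particular task.
    return 1
```

A learning algorithm would replace `random_policy` by a policy that is improved from the collected states, actions and rewards.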
ML is an approach to overcome this shortcoming by designing an algorithm in such a way that the agent learns how to learn from a broad distribution of similar tasks. Similar to human learning, an ML agent can apply knowledge it has gained from previously solved corresponding tasks to learn a new task with only a small amount of data.
Recent ML algorithms suitable for RL can be divided into two categories depending on their architecture and optimization goals:
Model-based meta-learning approaches [Duan2016, Mishra2017] generalize to a wide range of learning scenarios, seeking to recognize the task identity from a few data samples and adapting to the tasks by adjusting a model’s state (e.g. long short-term memory (LSTM) internal states).
Model-Agnostic Meta-Learning (MAML) [Finn2017] seeks an initialization of model parameters such that a small number of gradient updates will lead to fast learning on a new task, offering flexibility in the choice of models.
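The idea behind MAML can be illustrated with a deliberately small example. The following sketch assumes a single scalar parameter `theta` and hypothetical quadratic task losses L_i(theta) = 0.5 (theta - c_i)^2, so that all gradients can be written by hand. The function names, step sizes and task centers are assumptions made for illustration, not values from [Finn2017].

```python
def inner_update(theta, center, alpha):
    """One gradient step on the task loss 0.5 * (theta - center) ** 2."""
    grad = theta - center
    return theta - alpha * grad


def meta_gradient(theta, center, alpha):
    """Gradient of the post-adaptation loss with respect to the initial theta."""
    adapted = inner_update(theta, center, alpha)
    # Chain rule: d(adapted)/d(theta) = 1 - alpha for this quadratic loss.
    return (adapted - center) * (1.0 - alpha)


def maml_train(task_centers, alpha=0.1, beta=0.5, steps=200, theta=0.0):
    """Meta-train the initialization theta over a set of toy tasks."""
    for _ in range(steps):
        grads = [meta_gradient(theta, c, alpha) for c in task_centers]
        # Outer update: move the initialization so that one inner step
        # performs well on average over all tasks.
        theta -= beta * sum(grads) / len(grads)
    return theta
```

For these symmetric toy losses, the learned initialization approaches the mean of the task centers, from which every task can be reached with few gradient steps.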
A promising approach in which machine learning without RL and ML is used for pick and place tasks is described in [zakka2019]: a six degrees of freedom (DoF) UR5e robot (Universal Robots, Odense, Denmark) with a suction module is used to perform the tasks, and a camera generates a 3D heightmap as input data. Using convolutional neural networks, a correspondence between an object surface and the related placement location is generated.
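To make the notion of a heightmap concrete, the following sketch projects a list of 3D points onto a regular grid and keeps the highest point per cell. It is a simplified assumption of how such an input could be built, not the pipeline of [zakka2019]; the function name, the workspace bounds and the cell size are hypothetical.

```python
def build_heightmap(points, x_range, y_range, cell_size):
    """Project (x, y, z) points onto a grid, storing the maximum z per cell."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    cols = int(round((x_max - x_min) / cell_size))
    rows = int(round((y_max - y_min) / cell_size))
    # Cells without any point keep the height 0.0 (the table surface).
    heightmap = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Points outside the workspace are ignored.
        if 0 <= row < rows and 0 <= col < cols:
            heightmap[row][col] = max(heightmap[row][col], z)
    return heightmap
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes it a convenient input for convolutional neural networks.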
Another relevant RL approach is introduced by OpenAI, who have trained a robotic hand to solve a Rubik’s Cube despite external perturbations [openai2019]. The main points of this approach are:
An actor-critic consisting of an artificial neural network (ANN) equipped with LSTM cells to install internal memory.
Automatic domain randomization (ADR) to generate diverse environments with randomized physics and dynamics (e.g. weight and size of the manipulated object).
This results in a system with high robustness and high success rates in the transfer from simulation to testing in the real-world environment. Due to the combination of internal memory and ADR, this approach also shows signs of emerging ML.
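The principle of ADR can be sketched as follows. This is a strongly simplified assumption of the idea and not the implementation of [openai2019]: the original method evaluates the agent at the boundaries of each parameter range, whereas this sketch uses one overall success rate. All names, thresholds and step sizes are hypothetical.

```python
import random


class RandomizedParameter:
    """Hypothetical physics parameter whose sampling range can grow or shrink."""

    def __init__(self, nominal, step, max_width):
        self.nominal = nominal
        self.step = step
        self.max_width = max_width
        # Training starts without randomization: the range is a single value.
        self.low = nominal
        self.high = nominal

    def sample(self, rng):
        return rng.uniform(self.low, self.high)

    def widen(self):
        half = self.max_width / 2.0
        self.low = max(self.low - self.step, self.nominal - half)
        self.high = min(self.high + self.step, self.nominal + half)

    def narrow(self):
        self.low = min(self.low + self.step, self.nominal)
        self.high = max(self.high - self.step, self.nominal)


def update_ranges(parameters, success_rate, upper=0.8, lower=0.4):
    """Widen all ranges if the agent performs well, narrow them if it fails."""
    for parameter in parameters.values():
        if success_rate >= upper:
            parameter.widen()
        elif success_rate < lower:
            parameter.narrow()


def sample_environment(parameters, rng):
    """Draw one randomized environment configuration."""
    return {name: p.sample(rng) for name, p in parameters.items()}
```

In a training loop, `sample_environment` would be called before each simulated episode and `update_ranges` after each evaluation phase, so that the diversity of environments grows with the competence of the agent.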
In RL, it is necessary to provide the learning agent with an extrinsic reward signal. This enables the agent to determine whether the actions applied to the environment have a positive effect in the long run. Extrinsic reward signals are called sparse if the reward is temporally disentangled from the action that caused it, e.g. a positive reward is given only after a successfully completed task. Approaches in the literature that tackle the problem of sparse extrinsic rewards can be divided into two classes. The first class changes the reward function, e.g. by using curiosity-driven exploration [PathakAED17], which introduces an intrinsic reward function that encourages the agent to experience novel states. The second class comprises hierarchical RL methods, which try to divide the main task into a sequence of sub-goals. While the main goal is to successfully perform the task, the agent first learns to find a policy for the sub-goals. One popular candidate is FeUdal Networks [VezhnevetsOSHJS17], in which the agent is split into two parts: the manager learns to formulate goals, and the worker is intrinsically rewarded for following them. A similar approach is Hierarchical Actor-Critic [levy2017learning], in which the agent learns to set sub-goals to reach the main goal. This is achieved by extending the idea of Hindsight Experience Replay [AndrychowiczWRS17] to the hierarchical setting by establishing goals a fixed number of low-level actions away from the previous state. Multiple policies can be learned independently. On the sub-goal level, the focus is on learning the sequences of sub-goal states that reach the main goal state. To achieve these sub-goal states, the lower-level policies learn the low-level action sequences.
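The core idea of Hindsight Experience Replay, on which Hierarchical Actor-Critic builds, can be illustrated with a short sketch. It assumes a hypothetical grid world with a sparse reward and uses the simplest relabeling strategy, in which the finally achieved state replaces the original goal. The `Transition` type and the function names are assumptions for this example and do not stem from the cited works.

```python
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: Tuple[int, int]
    action: str
    next_state: Tuple[int, int]
    goal: Tuple[int, int]
    reward: float


def sparse_reward(achieved: Tuple[int, int], goal: Tuple[int, int]) -> float:
    # Sparse signal: a reward is only given when the goal is reached exactly.
    return 1.0 if achieved == goal else 0.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Pretend the state reached at the end of the episode was the goal."""
    final = episode[-1].next_state
    return [
        t._replace(goal=final, reward=sparse_reward(t.next_state, final))
        for t in episode
    ]


# A failed episode: the goal (3, 3) is never reached, so all rewards are zero.
goal = (3, 3)
episode = [
    Transition((0, 0), "right", (1, 0), goal, sparse_reward((1, 0), goal)),
    Transition((1, 0), "up", (1, 1), goal, sparse_reward((1, 1), goal)),
]
relabeled = relabel_with_final_state(episode)
```

Storing both the original and the relabeled transitions in the replay buffer gives the agent a learning signal even from episodes that failed with respect to the original goal.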
3 Concept for Manufacturing Scenario
The development of intelligent pick and place tasks for the assembly of products using RL is an integral part of the manufacturing scenario we are setting up in the Cologne Cobots Lab. It combines individualized production using additive manufacturing, autonomous mobile systems that transport components, as well as collaborative and social robotics. The complete scenario is shown in Fig. 1.
As described in section 2, we aim to perform object manipulation tasks using RL in a real-world scenario, in which a specific and useful product is manufactured and assembled. Our goal is to assemble individualized sensor cases for different health care sensors (e.g. to measure body temperature, heart rate or blood oxygen saturation), as shown in Fig. 2. These cases will be used in our research concerned with social robots conducting health assessments [richert2019]. This is a practical product for research in highly flexible areas, such as the manufacturing and health care sectors. It is also a well-suited demonstrator for the application of RL algorithms. The assembly of individualized products is desirable, as different users with different health conditions require different kinds of information. Therefore, the combination of sensors can be adapted for each individual user and the assembly process will differ for each new product. The long-term goal regarding these sensors is to create individualized wearable devices with different kinds and numbers of health care sensors. These parameters offer a promising approach to a hybrid job shop scheduling or action planning system in which human and robot actions are combined in an optimized way.
4 Approach: Hardware and Machine Learning
In our manufacturing scenario, we will be using various hardware and software/machine learning components, described in the following.
The individual sensor cases will be manufactured using fused deposition modeling (FDM)/fused filament fabrication (FFF). For the assembly, we will use a collaborative robotic arm that meets the safety standards defined in ISO/TS 15066:2016 [standard2016iso]. The arm also has a high pose repeatability, at least six DoF, and a payload of for fine and for gross manipulation tasks. To receive additional feedback during object manipulation tasks, its gripper will be outfitted with tactile sensors. This will improve the efficiency and robustness of the grasping task. The sensors provide feedback regarding the grip quality confidence and enable slip detection of the grasped object, which can then be counteracted by adjusting the force the gripper applies to the object. For the object detection, we will use a 3D scanner to create heightmaps of the objects. Several area scan cameras will be installed to provide vision-based information from various angles and to determine the orientation of the objects.
In order to successfully teach the robot to perform the pick and place task, the trained machine learning algorithm needs to recognize which produced element belongs to the corresponding case. To accomplish this, the tools presented in section 2 will be implemented and combined. The heightmaps applied by Google [zakka2019] will be used as input for the RL agent. By implementing ADR [openai2019], the agent will be trained to realize a robust system with a high success rate in the transfer from simulation to the real world. Additionally, due to the implementation of ML and solutions for sparse reward, the learning time of the agent will be decreased.
5 Conclusion and Outlook
In this paper, we propose an approach to successfully perform intelligent pick and place tasks for the assembly of individualized products using RL algorithms capable of ML. A combination of the algorithms presented in this work will be implemented to develop an autonomous robotic system capable of performing these tasks. With a combination of RL, ML, ADR and other machine learning tools, problems like sparse reward or transfer learning can be solved. The pick and place tasks are first performed in simulation, then in a real-world environment using a collaborative robot equipped with tactile sensors, area scan cameras and 3D cameras. This demonstrator will then be used to study and answer the following research questions, which are of both a technical and a socio-technical nature:
How can we apply (a combination of) machine learning algorithms to generalize pick and place assembly tasks (i.e. various weights, sizes, geometries, quantities) for individualized products?
Which algorithms have which impact on the robustness of the system? How can we assure that the robustness reached in simulation can be transferred to the real world environment?
How can we implement a dynamic work space for the robot when working collaboratively with a human? How can a human be integrated into the collaborative assembly process in a way that is both sensible and effective?
In the future, we plan to fully integrate the developed assembly process into our manufacturing scenario described in section 3. The manufacturing scenario includes the transportation of individual parts using automated guided vehicles (AGVs) and the presentation of the final product to the customer. A further goal is to study the collaborative assembly process between humans and robots. This is the focus of another research project in our lab, which aims to achieve adaptive human-robot collaboration through the implementation of sensors to detect the user’s status (e.g. focus, stress). The combined results of these projects will contribute to an optimal collaborative working process.