Towards Intelligent Pick and Place Assembly of Individualized Products Using Reinforcement Learning

02/11/2020 ∙ by Caterina Neef, et al. ∙ TH Köln

Individualized manufacturing is becoming an important means of fulfilling increasingly diverse and specific consumer requirements and expectations. While there are various solutions for implementing the manufacturing process, such as additive manufacturing, the subsequent automated assembly remains a challenging task. As an approach to this problem, we aim to teach a collaborative robot to successfully perform pick and place tasks by implementing reinforcement learning. For the assembly of an individualized product in a constantly changing manufacturing environment, the simulated geometric and dynamic parameters will be varied. Using reinforcement learning algorithms capable of meta-learning, the tasks will first be trained in simulation. They will then be performed in a real-world environment where new factors are introduced that were not simulated in training, to confirm the robustness of the algorithms. The robot will gain its input data from tactile sensors, area scan cameras, and 3D cameras used to generate heightmaps of the environment and the objects. The results of the presented work are the selection of machine learning algorithms and hardware components, as well as further research questions for realizing the outlined production scenario.




1 Introduction

For decades, robots have been used to automate tasks in the industry sector. Conventional industrial robots are taught to perform one task at a time, are competent at executing this single task and can perform thousands of repetitions accurately. While the automation of such tasks has led to an increase in efficiency and a decrease in manufacturing costs for mass production, it is less applicable for the individualized consumer expectations of today’s economy. Globalization, digitalization and the resulting growth of markets have led to an increasing number of product variants and shorter product life cycles [brettel2014]. Customers now demand highly individualized products that are designed specifically for them. This change is observable in a wide range of industrial fields [outon2019].

The key challenge for individualized products is to avoid an increase in costs compared to established manufacturing approaches like mass production, even if factories are located in high-cost countries [lanz2017]. The advantages of individualized production are higher flexibility, fast response rates to customer decisions and a more efficient use of resources. An example is the health-care sector, where personalized medicine is becoming increasingly important, and additive manufacturing is being used for the production of biomaterials, implants and prosthetics [zadpoor2017]. To enable individualized manufacturing, traditional programming of machines with repetitive tasks is no longer applicable [outon2019]. When environments and objects change, conventional robots are unable to perform assembly tasks with similar success rates.

We therefore propose to use reinforcement learning (RL) algorithms capable of meta-learning (ML) to enable robots to accomplish highly individualized pick and place tasks as an important part of the product assembly process.

2 State of the Art: Machine Learning and Robotics

Recently, RL has achieved great success in a wide range of different tasks and complex games (e.g. the strategy board game Go [Silver2017]). The implementation of RL and ML seems promising to enable a robot to perform a pick and place task for unknown objectives and destinations. In RL, an agent interacts with the environment and receives its state. Based on this state, the agent takes an action and receives a new state and reward for the chosen action. Each RL algorithm is designed specifically for a certain task in terms of its architecture and training. A major drawback of this approach is the necessity to train the RL agent from scratch for each task.
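The interaction cycle described above (state, action, new state, reward) can be illustrated with a minimal sketch. The environment `ToyPickPlaceEnv`, its one-dimensional layout, and both policies are hypothetical stand-ins chosen for brevity; they are not part of the proposed system.

```python
import random


class ToyPickPlaceEnv:
    """Hypothetical 1-D environment: move a gripper along a rail to a target cell."""

    def __init__(self, size=5, target=4, max_steps=20):
        self.size = size
        self.target = target
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.pos = 0
        self.steps = 0
        return self.pos

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        delta = 1 if action == 1 else -1
        self.pos = min(self.size - 1, max(0, self.pos + delta))
        self.steps += 1
        reached = self.pos == self.target
        reward = 1.0 if reached else 0.0  # reward only on reaching the target
        done = reached or self.steps >= self.max_steps
        return self.pos, reward, done


def run_episode(env, policy):
    """Run one episode: observe state, act, receive new state and reward."""
    state = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


def random_policy(state):
    """Untrained baseline: ignore the state and act at random."""
    return random.choice([0, 1])


def always_right(state):
    """Hand-written policy that happens to solve this toy task."""
    return 1
```

A learning algorithm would replace `random_policy` with a policy whose parameters are updated from the collected rewards; the loop in `run_episode` stays the same.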

ML is an approach to overcome this shortcoming by designing an algorithm in such a way that the agent learns how to learn from a broad distribution of similar tasks. Similar to human learning, an ML agent can apply knowledge it has gained from previously solved corresponding tasks to learn a new task with only a small amount of data.

Recent ML algorithms suitable for RL can be divided into two categories depending on their architecture and optimization goals:

  1. Model-based meta-learning approaches [Duan2016, Mishra2017] generalize to a wide range of learning scenarios, seeking to recognize the task identity from a few data samples and adapting to the tasks by adjusting a model’s state (e.g. long short-term memory (LSTM) internal states)

  2. Model-Agnostic Meta-Learning (MAML) [Finn2017] seeks an initialization of model parameters such that a small number of gradient updates will lead to fast learning on a new task, offering flexibility in the choice of models
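To make the second category concrete, the following is a minimal sketch of the MAML idea on a toy supervised problem rather than on an RL task. Everything here is an illustrative assumption: each task is a line y = a * x with a random slope, the model has a single parameter w, and the names `sample_task`, `adapt` and `maml_train` are hypothetical. Because the model is scalar, the derivative of the adapted parameter with respect to the initialization can be written in closed form.

```python
import random


def loss(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loss_grad(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def sample_task(rng):
    """A task is a random slope a; it returns a function drawing n points on y = a * x."""
    a = rng.uniform(-2.0, 2.0)

    def draw(n):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return xs, [a * x for x in xs]

    return draw


def adapt(w, xs, ys, inner_lr):
    """Inner loop: one gradient step on the task's support data."""
    return w - inner_lr * loss_grad(w, xs, ys)


def maml_train(meta_steps=300, tasks_per_step=5, inner_lr=0.5, outer_lr=0.05, seed=0):
    """Outer loop: move the initialization w so that one inner step works well."""
    rng = random.Random(seed)
    w = 5.0  # deliberately poor starting initialization
    for _ in range(meta_steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            draw = sample_task(rng)
            xs_s, ys_s = draw(10)  # support set, used for adaptation
            xs_q, ys_q = draw(10)  # query set, used to score the adapted model
            w_adapted = adapt(w, xs_s, ys_s, inner_lr)
            # Chain rule through the inner step: d(w_adapted)/dw for this scalar model.
            d_adapted = 1.0 - inner_lr * 2.0 * sum(x * x for x in xs_s) / len(xs_s)
            meta_grad += loss_grad(w_adapted, xs_q, ys_q) * d_adapted
        w -= outer_lr * meta_grad / tasks_per_step
    return w
```

Since the task slopes are drawn symmetrically around zero, the learned initialization should drift towards zero, from which a single inner step moves the model closer to any sampled slope. In the RL setting of [Finn2017], the squared-error loss would be replaced by a policy-gradient objective.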

A promising approach in which machine learning without RL and ML is used for pick and place tasks is described in [zakka2019]: a six degrees of freedom (DoF) UR5e robot (Universal Robots, Odense, Denmark) with a suction module is used to perform the tasks, and a camera generates a 3D heightmap as input data. Using convolutional neural networks, a correspondence between an object surface and the related placement location is generated.
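As an illustration of what a heightmap is, the sketch below projects a set of 3D points onto a regular grid and keeps the highest point per cell. This is a generic construction under assumed conventions (points given as (x, y, z) in metres in the workspace frame, empty cells set to 0.0); the function name and parameters are hypothetical and not taken from [zakka2019].

```python
import math


def heightmap_from_points(points, x_range, y_range, cell_size):
    """Project (x, y, z) points onto a top-down grid, keeping the max height per cell."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    cols = int(round((x_max - x_min) / cell_size))
    rows = int(round((y_max - y_min) / cell_size))
    grid = [[0.0] * cols for _ in range(rows)]  # assumption: empty cells read as 0.0
    for x, y, z in points:
        col = math.floor((x - x_min) / cell_size)
        row = math.floor((y - y_min) / cell_size)
        # Points outside the workspace bounds are discarded.
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] = max(grid[row][col], z)
    return grid
```

The resulting 2D array can be treated like a single-channel image, which is what makes it a convenient input for convolutional networks.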

Another relevant RL approach is introduced by OpenAI, who have trained a robotic hand to solve a Rubik’s Cube despite external perturbations [openai2019]. The main points of this approach are:

  • An actor-critic consisting of an artificial neural network (ANN) equipped with LSTM cells to install internal memory

  • Automatic domain randomization (ADR) to generate diverse environments with randomized physics and dynamics (e.g. weight and size of the manipulated object)

This results in a system with high robustness and high success rates in the transfer from simulation to testing in the real-world environment. Due to the combination of internal memory and ADR, this approach also shows signs of emerging ML.
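The core idea of ADR, widening the randomization ranges as the agent becomes more successful, can be sketched as follows. This is a strongly simplified illustration and not the procedure of [openai2019]: the class `RandomizedParameter`, the function `adr_update`, the thresholds and the symmetric widening rule are all assumptions made for this example.

```python
import random


class RandomizedParameter:
    """One simulated quantity (e.g. object mass) with an adjustable sampling range."""

    def __init__(self, name, nominal, step, hard_min, hard_max):
        self.name = name
        self.nominal = nominal
        self.step = step
        self.hard_min = hard_min
        self.hard_max = hard_max
        # Training starts without randomization: the range is a single point.
        self.low = nominal
        self.high = nominal

    def sample(self, rng):
        """Draw a value for the next simulated episode."""
        return rng.uniform(self.low, self.high)

    def widen(self):
        """Make the environment distribution more diverse, within hard limits."""
        self.low = max(self.hard_min, self.low - self.step)
        self.high = min(self.hard_max, self.high + self.step)

    def narrow(self):
        """Shrink the range back towards the nominal value."""
        self.low = min(self.nominal, self.low + self.step)
        self.high = max(self.nominal, self.high - self.step)


def adr_update(param, success_rate, widen_above=0.8, narrow_below=0.4):
    """Adjust the range from the recent success rate (thresholds are assumptions)."""
    if success_rate >= widen_above:
        param.widen()
    elif success_rate <= narrow_below:
        param.narrow()
    return param.low, param.high
```

In use, one such object would exist per randomized quantity, `sample` would be called when resetting the simulation, and `adr_update` after each evaluation batch.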

In RL, it is necessary to provide the learning agent with an extrinsic reward signal. This enables the agent to determine whether the actions applied to the environment have a positive effect in the long run. Extrinsic reward signals are called sparse if the reward is temporally disentangled from the action that earned it, e.g. a positive reward is given only after a successfully completed task. The approaches in the literature that tackle this problem of sparse extrinsic rewards can be divided into two classes. The first class changes the reward function, e.g. by using curiosity-driven exploration [PathakAED17], which introduces an intrinsic reward function. This function encourages the agent to experience novel states. The second class comprises hierarchical RL methods, which try to divide the main task into a sequence of sub-goals. While the main goal is to successfully perform the task, the agent first learns to find a policy for the sub-goals. One popular candidate for this is FeUdal Networks [VezhnevetsOSHJS17], in which the agent is split into two parts: the manager learns to formulate goals, and the worker is intrinsically rewarded for following them. A similar approach is Hierarchical Actor-Critic [levy2017learning], in which the agent learns to set sub-goals to reach the main goal. This is achieved by extending the idea of Hindsight Experience Replay [AndrychowiczWRS17] to the hierarchical setting by establishing goals a fixed number of low-level actions away from the previous state. Multiple policies can be learned independently. On the sub-goal level, the focus is on learning the sequences of sub-goal states that can reach the main goal state. To achieve these sub-goal states, the lower-level policies learn the low-level action sequences.
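The hindsight idea mentioned above can be illustrated with a minimal sketch: a failed episode under a sparse reward is stored a second time with the goal replaced by the state that was actually reached, so that it contains a positive reward. The transition format, the 0/1 reward and the "final state" relabeling strategy are assumptions made for this example and do not reproduce the exact formulation of [AndrychowiczWRS17].

```python
def sparse_reward(achieved, goal):
    """Sparse extrinsic reward: 1.0 only when the goal is reached, otherwise 0.0."""
    return 1.0 if achieved == goal else 0.0


def label_episode(transitions, goal):
    """Attach the goal and the sparse reward to (state, action, next_state) tuples."""
    return [
        {
            "state": s,
            "action": a,
            "next_state": s_next,
            "goal": goal,
            "reward": sparse_reward(s_next, goal),
        }
        for s, a, s_next in transitions
    ]


def relabel_with_final_state(episode):
    """Hindsight relabeling: pretend the finally reached state was the goal."""
    hindsight_goal = episode[-1]["next_state"]
    return [
        {
            **t,
            "goal": hindsight_goal,
            "reward": sparse_reward(t["next_state"], hindsight_goal),
        }
        for t in episode
    ]
```

Both the original and the relabeled episode would be added to the replay buffer, so the agent also receives a learning signal from attempts that missed the intended goal.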

3 Concept for Manufacturing Scenario

[Figure 1 panels: A Ordering system (customer places order, system splits orders); B 3D printing of individual components; C Two-arm-robot on mobile platform (transfer to collaborative assembly cell); D Collaborative assembly of ordered product; E Social robot]
Figure 1: A customer places an order for an individualized product using the ordering system (A). This system splits the order into the individual components, which are then manufactured using additive manufacturing (B). Once the manufacturing process is completed, a two-arm-robot, mounted on an AGV (C), removes the components from the 3D printers and transports them to the collaborative assembly cell (D). This assembly cell consists of one or more cobots and one or more human workers, collaboratively assembling the individual components into the final product. The number of cobots and human workers can be adjusted based on need for the specific task. Once the assembly process is completed, the two-arm-robot transports the finished product and hands it over to a social robot (E), which presents and hands over the product to the customer.
[Figure 2 labels: Sensor 1, Sensor 2, Sensor 3, ..., Sensor n; HR, T, SpO2; assembly of individual sensors into cases; assembly of individual numbers and kinds of sensor cases into "sensor station"]
Figure 2: The goal of the assembly task is to pick individual health care sensors (e.g. heart rate (HR), body temperature (T), blood oxygen saturation (SpO2)), place them into their cases, then place the sensor cases in a health care sensor station for the monitoring of health conditions. As the sensor combination changes with each user, RL will be used to train an agent to perform each individual assembly task.

The development of intelligent pick and place tasks for the assembly of products using RL is an integral part of the manufacturing scenario we are setting up in the Cologne Cobots Lab. It combines individualized production using additive manufacturing, autonomous mobile systems that transport components, as well as collaborative and social robotics. The complete scenario is shown in Fig. 1.

As described in section 2, we aim to perform object manipulation tasks using RL in a real-world scenario, in which a specific and useful product is manufactured and assembled. Our goal is to assemble individualized sensor cases for different health care sensors (e.g. to measure body temperature, heart rate or blood oxygen saturation), as shown in Fig. 2. These cases will be used in our research concerned with social robots conducting health assessments [richert2019]. This is a practical product for research in highly flexible areas, such as the manufacturing and health care sectors. It is also a well-suited demonstrator for the application of RL algorithms. The assembly of individualized products is desirable, as different users with different health conditions require different kinds of information. Therefore, the combination of sensors can be adapted for each individual user and the assembly process will differ for each new product. The long-term goal regarding these sensors is to create individualized wearable devices with different kinds and numbers of health care sensors. These parameters offer a promising approach to a hybrid job shop scheduling or action planning system in which human and robot actions are combined in an optimized way.

4 Approach: Hardware and Machine Learning

In our manufacturing scenario, we will be using various hardware and software/machine learning components, described in the following.

The individual sensor cases will be manufactured using fused deposition modeling (FDM), also known as fused filament fabrication (FFF). For the assembly, we will use a collaborative robotic arm that meets the safety standards defined in ISO/TS 15066:2016 [standard2016iso]. The arm also has a high pose repeatability, at least six DoF, and a payload suitable for both fine and gross manipulation tasks. To receive additional feedback during object manipulation tasks, its gripper will be outfitted with tactile sensors. This will improve the efficiency and robustness of the grasping task. The sensor provides feedback regarding the grip quality confidence and enables slip detection of the grasped object, which can then be counteracted by adjusting the force the gripper applies to the object. For object detection, we will use a 3D scanner to create heightmaps of the objects. Several area scan cameras will provide vision-based information from various angles and will be used to determine the orientation of the objects.
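How slip feedback could be turned into a force correction is sketched below. This is a deliberately simple rule written for illustration only: the boolean slip signal, the force values in newtons, the step size and the force limit are all assumptions and do not describe a specific sensor or gripper interface.

```python
def adjust_grip_force(force, slip_detected, step=0.5, max_force=20.0):
    """Raise the grip force by one step when slip is reported, up to a safety limit."""
    if slip_detected:
        return min(max_force, force + step)
    return force


def hold_object(slip_readings, initial_force=2.0, step=0.5, max_force=20.0):
    """Apply the rule over a sequence of tactile readings; return the force history."""
    force = initial_force
    history = []
    for slip in slip_readings:
        force = adjust_grip_force(force, slip, step, max_force)
        history.append(force)
    return history
```

A learned policy could replace this fixed rule, with the slip signal and grip quality confidence becoming part of the agent's state.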

In order to successfully teach the robot to perform the pick and place task, the trained machine learning algorithm needs to recognize which produced element belongs to the corresponding case. To accomplish this, the tools presented in section 2 will be implemented and combined. The heightmaps applied by Google [zakka2019] will be used as input for the RL agent. By implementing ADR [openai2019], the agent will be trained to realize a robust system with a high success rate in the transfer from simulation to the real world. Additionally, due to the implementation of ML and solutions for sparse reward, the learning time of the agent will be decreased.

5 Conclusion and Outlook

In this paper, we propose an approach to successfully perform intelligent pick and place tasks for the assembly of individualized products using RL algorithms capable of ML. A combination of the algorithms presented in this work will be implemented to develop an autonomous robotic system capable of performing these tasks. With a combination of RL, ML, ADR and other machine learning tools, problems like sparse reward or transfer learning can be solved. The pick and place tasks are first performed in simulation, then in a real-world environment using a collaborative robot equipped with tactile sensors, area scan cameras and 3D cameras. This demonstrator will then be used to study and answer the following research questions, which are of both technical and socio-technical nature:

  • How can we apply (a combination of) machine learning algorithms to generalize pick and place assembly tasks (i.e. various weights, sizes, geometries, quantities) for individualized products?

  • Which algorithms have which impact on the robustness of the system? How can we assure that the robustness reached in simulation can be transferred to the real world environment?

  • How can we implement a dynamic work space for the robot when working collaboratively with a human? How can a human be integrated into the collaborative assembly process in a way that is both sensible and effective?

In the future, we plan on fully implementing the developed assembly process into our manufacturing scenario described in section 3. The manufacturing scenario includes the transportation of individual parts using AGVs and presenting the final product to the customer. A further goal is to study the collaborative assembly process between humans and robots. This is the focus of another research project in our lab, which aims to achieve adaptive human-robot collaboration through the implementation of sensors to detect the user’s status (e.g. focus, stress). The combined results of these projects will contribute to an optimal collaborative working process.