Human Intention Recognition in Flexible Robotized Warehouses based on Markov Decision Processes

by Tomislav Petković, et al.

The rapid growth of e-commerce increases the need for larger warehouses and their automation, so using robots as assistants to human workers becomes a priority. In order to operate efficiently and safely, robot assistants or the supervising system should recognize human intentions. Theory of mind (ToM) is an intuitive conception of other agents' mental states, i.e., beliefs and desires, and how they cause behavior. In this paper we present a ToM-based algorithm for human intention recognition in flexible robotized warehouses. We place the warehouse worker in a simulated 2D environment with three potential goals. We observe the agent's actions and validate them with respect to the goal locations using a Markov decision process framework. Those observations are then processed by the proposed hidden Markov model framework, which estimates the agent's desires. We demonstrate that the proposed framework predicts the human warehouse worker's desires in an intuitive manner, and we conclude by discussing the simulation results.





1 Introduction

European e-commerce turnover increased to €455.3 billion in 2015, outpacing the growth of general retail in Europe [1]. With the internationalization of distribution chains, the key to success lies in efficient logistics, which in turn increases the need for larger warehouses and their automation. There are many fully automated warehouse systems, such as Swisslog’s CarryPick Mobile system and Amazon’s Kiva system [2]. They use movable racks that can be lifted by small, autonomous robots. By bringing the product to the worker, productivity is increased by a factor of two or more, while accountability and flexibility improve at the same time [3]. However, current automation solutions based on strict separation of humans and robots limit the operation efficiency of large warehouses. Therefore, a new integrated paradigm arises in which humans and robots work closely together, and these integrated warehouse models will fundamentally change the way we use mobile robots in modern warehouses. Besides immediate safety issues, one challenge such models face is estimating the worker’s intentions so that the worker may be assisted and not impeded in his work. Furthermore, if the robot is not intelligent but controlled by a supervisory system, the supervisory system needs to be able to estimate the worker’s intentions correctly and control the robots accordingly, so that warehouse operation efficiency is ensured.

There exists a plethora of challenges in human intention recognition because of the subtlety and diversity of human behaviors [4]. Contrary to more common quantities, such as position and velocity, human intention is not directly observable and needs to be estimated from human actions. Furthermore, the intentions should be estimated in real time, so overly complicated models should be avoided. Having that in mind, only the actions with the greatest influence on intention perception should be considered. For example, in the warehouse domain, the worker’s orientation and motion have a large effect on goal intention recognition. On the other hand, observing, e.g., the worker’s heart rate or perspiration would provide very little, if any, information on the worker’s intentions. Therefore, such measurements should be avoided in order to reduce model complexity and ensure real-time operation [4].

Many models addressing the problem of human intention recognition successfully emulate human social intelligence using Markov decision processes (MDPs). Examples of such models can be found in [4], where the authors propose a framework for estimating a pedestrian’s intention to cross the road, and in [5], where the authors propose a framework for gesture recognition and robot-assisted coffee serving. There are multiple papers from the gaming industry perspective proposing methods for improving the assisting efficiency of non-playable characters [6, 7]. An interesting approach is the Bayesian Theory of Mind (BToM) [8], where beliefs and desires generate actions via abstract causal laws. The BToM framework in [8] observes the agent’s actions and estimates the agent’s desire to eat at a particular food truck. However, though impressive, the BToM model does not account for the possibility of the agent changing its mind during the simulation [9].

In this paper, we propose an algorithm for warehouse worker intention recognition motivated by the BToM approach. We expand the BToM to accommodate the warehouse scenario and we present the formal and mathematical details of the constructed MDP framework. The warehouse worker is placed in a simulated 2D warehouse environment with multiple potential goals, i.e., warehouse items that need to be picked. The worker’s actions, moving and turning, are validated based on the proposed MDP framework. Actions resulting in motion towards the goal yield greater values than those resulting in motion away from the goal. We introduce a worker intention recognition algorithm based on the hidden Markov model (HMM) framework, similar to those presented in [10, 11]. The proposed intention recognition algorithm observes the action values generated by the MDP and estimates the potential goal desires. We consider the worker’s tendency to change its mind during the simulation, as well as the worker’s intention of leaving the rack. In the end, we demonstrate that the proposed algorithm predicts the human warehouse worker’s desires in an intuitive manner and we discuss the simulation results.

2 Human Action Validation

In the integrated warehouse environment the worker’s position and orientation need to be precisely estimated, and in the present paper we assume that these quantities are readily available. Furthermore, we would like to emphasize that most warehouse worker duties, such as sorting and placing materials or items on racks, unpacking and packing, include a lot of motion, which is considered to be inherently stochastic. Therefore, we model the worker’s perception model, which relates observations to states, as deterministic, and the action model, which relates actions to state transitions, as stochastic. A paradigm that encompasses uncertainty in the agent’s motion and is suitable for the problem at hand is the MDP [12]. The MDP framework is based on the idea that transiting to a state yields an immediate reward. Desirable states, such as warehouse items the worker needs to pick up, have high immediate reward values, while undesirable states, such as warehouse parts that are of no immediate interest, have low immediate reward values. The rational worker will always take actions that lead to the highest total expected reward, and that value needs to be calculated for each state.

Figure 1: Agent (green tile) in simulation environment with three potential agent’s goals (colored tiles). Unoccupied space is labeled with yellow tiles and occupied space (i.e. warehouse racks) is labeled with black tiles. The optimal path to each goal is shown with red dots and red solid lines denote agent’s vision field. Visible tiles are colored white. Red dashed line denotes agent’s current orientation and colored lines indicate direction of the average orientation of the visible optimal path to each goal calculated using (2).

Before approaching that problem, we define the MDP framework applicable to the warehouse domain. To accomplish that, we have placed the worker (hereafter referred to as the agent) in a simulated 2D warehouse environment shown in Fig. 1. The environment is constructed using the MATLAB® GUI development environment without a predefined physical interpretation of the map tiles, and the map size is chosen arbitrarily. There are three potential goals and the shortest path to each goal is calculated. There are many off-the-shelf graph search algorithms we can use to find the optimal path to each goal, such as Dijkstra’s algorithm and A*. However, if there exist multiple optimal paths to the goal, there are no predefined rules for which one to select, and the selection depends on the implementation details. Consider the example scenario with the warehouse worker in Fig. 2. It is intuitive that the rational worker will tend to follow the green path, because the orange path would require the worker either to take the additional action of turning or to walk unnaturally without looking forward.

Figure 2: Warehouse worker’s (blue circle) shortest path to the red goal is ambiguous because both orange dashed and green dotted paths are optimal. The black arrow denotes worker’s orientation.
Figure 5: (a), (b) The proposed A* modification yields different optimal paths as the agent’s orientation changes.

Having this in mind, we modify the A* search algorithm so that it selects the optimal path that the agent currently sees the most. This is done by introducing a heuristic matrix based on the Manhattan distance heuristic, in which a small value is subtracted from the entries of the visible tiles. Subtracting a small value from the visible tiles directs the search in their direction and does not disrupt the heuristic’s admissibility. The cost of each movement is modified in a similar way, by subtracting a small value from the base movement cost if the tile is visible. An example of the modified A* search algorithm’s results can be seen in Fig. 5.

The average orientation of the visible path to each goal, referred to in the sequel as (2), is computed from the number of visible optimal path tiles, the agent’s orientation, and the relative orientation of each visible optimal path tile with respect to the agent’s position, referred to as (3).

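The visibility-biased search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (the paper's environment was built in MATLAB): the function name `astar_visible`, the grid encoding, the 4-connected neighbourhood and the value of `eps` are all hypothetical choices made for the example.

```python
import heapq


def astar_visible(grid, start, goal, visible, eps=1e-3):
    """A* on a 4-connected grid that prefers tiles the agent can see.

    grid: list of strings, '#' marks an occupied tile.
    visible: set of (row, col) tiles currently in the agent's vision field.
    eps: small bias (assumed value) subtracted from both the heuristic
         and the step cost of visible tiles.
    """
    rows, cols = len(grid), len(grid[0])

    def h(tile):
        d = abs(tile[0] - goal[0]) + abs(tile[1] - goal[1])  # Manhattan distance
        return d - eps if tile in visible else d

    open_heap = [(h(start), 0.0, start)]
    best_cost = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, tile = heapq.heappop(open_heap)
        if tile == goal:
            path = []
            while tile is not None:  # walk the parent links back to the start
                path.append(tile)
                tile = parent[tile]
            return path[::-1]
        if g > best_cost[tile]:
            continue  # stale heap entry, a cheaper route was found later
        r, c = tile
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == "#":
                continue
            step = 1.0 - eps if nxt in visible else 1.0
            new_g = g + step
            if new_g < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_g
                parent[nxt] = tile
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None  # goal unreachable
```

On an empty 3×3 grid with the start in one corner and the goal in the opposite corner, all monotone paths have the same length; passing the tiles along the top edge as `visible` makes the search return the path along that edge, while passing the tiles along the left edge returns the other one. The bias `eps` should be kept much smaller than one over the longest path length so that it only breaks ties among equally short paths.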

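The relative orientation of a tile with respect to the agent can be computed with a two-argument arctangent, and one common way to average several such angles is the circular mean. The sketch below assumes exactly that; the paper's own averaging formula also involves the agent's orientation and may differ, and the function names and the (x, y) coordinate convention are hypothetical.

```python
import math


def relative_orientation(agent_xy, tile_xy):
    """Bearing of a tile as seen from the agent, in radians in [-pi, pi]."""
    return math.atan2(tile_xy[1] - agent_xy[1], tile_xy[0] - agent_xy[0])


def mean_path_orientation(agent_xy, visible_path_tiles):
    """Circular mean of the bearings of the visible optimal-path tiles.

    Averaging sines and cosines avoids the wrap-around problem that a
    plain arithmetic mean of angles has near +/- pi.
    """
    s = sum(math.sin(relative_orientation(agent_xy, t)) for t in visible_path_tiles)
    c = sum(math.cos(relative_orientation(agent_xy, t)) for t in visible_path_tiles)
    return math.atan2(s, c)
```

For instance, two tiles at bearings of 135° and −135° average to a bearing of magnitude 180° (straight behind the agent along the negative x axis), whereas an arithmetic mean of the two angles would give 0°.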
We propose a mathematical model for validating the agent’s actions based on the assumption that the rational agent tends to (i) move towards the goal it desires most by taking the shortest possible path, and (ii) orient itself so as to minimize the difference between its orientation and the average orientation of the visible optimal path to its most desired goal, calculated in (2). The goal of the proposed model is to assign large values to the actions compatible with these assumptions, and small values to the actions deviating from them. These values will be used to develop the agent’s intention recognition algorithm in the sequel. We can notice that the introduced validation problem is actually a path planning optimization problem. A perfectly rational agent will always choose the action with the greatest value and consequently move towards the goal. We approach the calculation of the agent’s action values by introducing the agent’s action validation MDP framework. We assume that the agent’s position and orientation are fully observable and define each MDP state as a combination of the agent’s position and orientation.


The agent’s orientation space must be discrete because the MDP framework assumes a finite number of states. We have modeled it as a finite set of orientations that are integer multiples of a fixed angular step, and it can be arbitrarily expanded.


The action space consists of seven actions, labeled in the following order: ‘Up’, ‘Down’, ‘Left’, ‘Right’, ‘Turn Clockwise’, ‘Turn Counterclockwise’ and ‘Stay’.


It has already been stated that the agent’s actions are fully observable but stochastic. In order to capture the stochastic nature of the agent’s movement, we define the transition matrix of the agent’s movement, whose elements denote the probability that a given action is realized when another action is the wanted one. Moving actions have a small probability of resulting in lateral movement, and turning actions have a small probability of failing. The value of this small probability constant is obtained experimentally. If the agent’s action cannot be completed because occupied space blocks the way, the column corresponding to the impossible action is added to the last column (the ‘Stay’ action) and is then set to the zero vector.

We define three hypotheses, one for each goal state, as follows: “The agent wants to go to goal i and the other potential goals are treated as unoccupied tiles”. The immediate reward values for each hypothesis and state are given by (8), which involves a small number and the absolute difference between the average orientation of the visible path to goal i and the agent’s orientation. Note that we have taken angle periodicity into account while calculating the angle difference in (8). The goal state is rewarded and the other states are punished proportionally to the orientation difference. If the agent does not see the path to goal i, the reward is set to the lowest value, which is derived from (3).

One of the most commonly used algorithms for solving the MDP optimal policy problem is the value iteration algorithm [4], which assigns a calculated value to each state. The optimal policy is derived by choosing the actions with the largest expected value gain. The value iteration algorithm iteratively solves Bellman’s equation [13] for each hypothesis.


Bellman’s equation relates the value of the current state to the values of the adjacent states, weighted by the elements of the transition matrix row that would cause the transition from the current state to each adjacent state. The algorithm stops once the stopping criterion is met for a preset threshold. The state values for the case in which the goal state is the dark yellow (southern) goal are shown in Fig. 6 for a fixed agent orientation.
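For reference, a common textbook form of the value iteration update and its stopping rule is given below. The notation is assumed here rather than taken from the paper: $V_i$ is the state value under hypothesis $i$, $R_i$ the immediate reward, $P(s' \mid s, a)$ the movement transition probability, $\gamma$ an assumed discount factor and $\eta$ the stopping threshold.

```latex
V_i^{(k+1)}(s) = \max_{a \in A} \sum_{s'} P(s' \mid s, a)
    \left[ R_i(s') + \gamma \, V_i^{(k)}(s') \right],
\qquad
\text{stop when } \max_{s} \left| V_i^{(k+1)}(s) - V_i^{(k)}(s) \right| < \eta .
```

The reward is written as a function of the state being entered, matching the statement that transiting to a state yields an immediate reward; whether the original formulation uses a discount factor is not recoverable from the text.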

Figure 6: State values for a fixed agent orientation if the goal state is the southern goal, labeled with the red circle.
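A minimal Python sketch of value iteration over position and orientation states follows. It is an illustration under several assumptions that are not taken from the paper: four headings, a discount factor `gamma`, a goal reward of π, the constants `eps` and `small`, and a reward that uses the straight-line bearing to the goal as a stand-in for the average orientation of the visible optimal path.

```python
import math

ACTIONS = ["Up", "Down", "Left", "Right", "TurnCW", "TurnCCW", "Stay"]
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}
LATERAL = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
           "Left": ("Up", "Down"), "Right": ("Up", "Down")}
HEADINGS = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]  # assumed: 4 headings
GOAL_REWARD = math.pi  # assumed value


def angle_diff(a, b):
    """Absolute angular difference wrapped to [0, pi] (angle periodicity)."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def outcomes(state, action, grid, eps):
    """Return {next_state: probability} when `action` is the wanted action."""
    r, c, k = state
    if action == "Stay":
        return {state: 1.0}
    if action in ("TurnCW", "TurnCCW"):
        turn = 1 if action == "TurnCCW" else -1
        return {(r, c, (k + turn) % len(HEADINGS)): 1.0 - eps, state: eps}
    result = {}
    for act, p in ((action, 1.0 - 2 * eps),
                   (LATERAL[action][0], eps), (LATERAL[action][1], eps)):
        nr, nc = r + MOVES[act][0], c + MOVES[act][1]
        free = 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] != "#"
        nxt = (nr, nc, k) if free else state  # blocked moves collapse into "Stay"
        result[nxt] = result.get(nxt, 0.0) + p
    return result


def reward(state, goal, small):
    r, c, k = state
    if (r, c) == goal:
        return GOAL_REWARD
    bearing = math.atan2(-(goal[0] - r), goal[1] - c)  # rows grow downwards
    return -(small + angle_diff(HEADINGS[k], bearing))


def value_iteration(grid, goal, eps=0.1, small=0.1, gamma=0.9, tol=1e-6):
    states = [(r, c, k) for r, row in enumerate(grid) for c, ch in enumerate(row)
              if ch != "#" for k in range(len(HEADINGS))]
    V = {s: 0.0 for s in states}
    while True:
        new_V = {s: max(sum(p * (reward(n, goal, small) + gamma * V[n])
                            for n, p in outcomes(s, a, grid, eps).items())
                        for a in ACTIONS)
                 for s in states}
        delta = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if delta < tol:  # stopping criterion on the largest value change
            return V
```

Calling `value_iteration(["...", ".#.", "..."], (2, 2))` returns one value per (row, column, heading) state. The values are largest at the goal tile and decrease with distance from it, which is the qualitative behavior the state value map in Fig. 6 illustrates.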

The agent’s behavior consistent with a hypothesis is defined as follows. (Consistent behavior) If the agent in a given state takes an action whose expected value gain under the hypothesis is greater than or equal to the expected value gain of the action “Stay”, its behavior is considered consistent with that hypothesis. Otherwise, its behavior is considered inconsistent with that hypothesis. Behavior consistency is an important factor in determining the agent’s rationality, which will be further discussed in the next section. While calculating the immediate rewards and state values can be time consuming, it can be done offline, before the simulation starts. The optimal action for each state is the action that maximizes the expected value gain and, on the other hand, the worst possible action is the action that minimizes the expected value gain.
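The consistency test above can be illustrated with a short Python sketch. It assumes that the expected value gain of an action is the expected value of the resulting state minus the value of the current state; the paper's exact definition is not recoverable from the text. The corridor, its state values and the failure probability `eps` are made up for the example.

```python
def expected_gain(state, action, V, outcomes):
    """Expected change in state value when `action` is attempted in `state`."""
    return sum(p * V[nxt] for nxt, p in outcomes(state, action).items()) - V[state]


def is_consistent(state, action, V, outcomes):
    """An action is consistent if it gains at least as much as 'Stay'."""
    return (expected_gain(state, action, V, outcomes)
            >= expected_gain(state, "Stay", V, outcomes))


# Toy corridor with positions 0..3 and the goal at position 3.
# The state values are illustrative only.
V = {0: 1.0, 1: 2.0, 2: 4.0, 3: 8.0}


def corridor_outcomes(state, action, eps=0.1):
    """Moves succeed with probability 1 - eps, otherwise the agent stays."""
    if action == "Stay":
        return {state: 1.0}
    step = 1 if action == "Right" else -1
    target = min(max(state + step, 0), 3)
    if target == state:
        return {state: 1.0}  # bumping into the corridor end
    return {target: 1.0 - eps, state: eps}
```

In this toy setting, moving right from position 1 has a positive expected gain and is consistent with the goal hypothesis, while moving left has a negative expected gain and is inconsistent.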

3 Human Intention Recognition

Once the state values are obtained, a model for solving the agent’s intention recognition problem is introduced. While the agent’s actions are fully observable, they depend on the agent’s inner states (desires), which cannot be observed and need to be estimated. We propose a framework based on the hidden Markov model for solving the agent’s desire estimation problem. HMMs are especially known for their application in temporal pattern recognition, such as speech, handwriting and gesture recognition [14], and force analysis [15]. They extend Markov models to include the case where the observation (the agent’s action) is a probabilistic function of the hidden state (the agent’s desires), which cannot be directly observed. We propose a model with five hidden states, which is shown in Fig. 7 and listed in Table 1.

Name              Description
Goal 1            Agent wants to go to the northern goal
Goal 2            Agent wants to go to the eastern goal
Goal 3            Agent wants to go to the southern goal
Unknown goal      Agent’s desires are not certain
Irrational agent  Agent behaves irrationally
Table 1: HMM framework components
Figure 7: Hidden states and transition probabilities.

In order to avoid confusion caused by the MDP and HMM frameworks both having similar or identical element names, MDP states will be referred to as states and HMM states will be referred to as hidden states, or simply desires. All of the other similarly named elements in this chapter refer to the HMM unless stated otherwise. The hidden state “Unknown goal” indicates that the agent behaves consistently with multiple goal hypotheses and the model cannot decide between them with enough certainty. On the other hand, the hidden state “Irrational agent” indicates that the agent behaves inconsistently with every goal hypothesis. This hidden state includes both the case of the agent being irrational and the case of the agent desiring to go to an a priori unknown goal. The proposed model cannot distinguish between these cases. The agent’s change of mind during the simulation is allowed, but with very low probability. The constant values in Fig. 7 are obtained experimentally and define the HMM transition matrix.


During the simulation, each agent’s action generates a three-element observation vector, with each element belonging to one hypothesis. Each observation vector element is calculated from the expected value gain of the action under the corresponding hypothesis. The calculated observations are used to generate the HMM emission matrix. The emission matrix is expanded with each agent’s action (simulation step) by one row, whose elements store the probability of observing the current observation vector from each hidden state. The last three observations are averaged and the maximum average value is selected. It is used as an indicator of whether the agent is behaving irrationally. Each expansion row is scaled by a normalizing constant. The initial probabilities of the agent’s desires determine the initial hidden state. After each agent’s action, the agent’s desires are estimated using the Viterbi algorithm [16], which is often used for solving HMM human intention recognition models [17]. The Viterbi algorithm outputs the most probable hidden state sequence and the probabilities of each hidden state in each step. These probabilities are the agent’s desire estimates.
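The decoding step can be sketched in Python with a plain Viterbi recursion that accepts one emission row per simulation step. This is an illustrative sketch, not the authors' code: the function name, the three-state example and all numeric values are hypothetical, and the per-step scores returned here are normalised best-path scores rather than exact posterior marginals.

```python
def viterbi(initial, transition, emission_rows):
    """Most probable hidden-state path for per-step emission likelihoods.

    initial[j]: prior probability of hidden state j.
    transition[i][j]: probability of moving from hidden state i to j.
    emission_rows[t][j]: likelihood of the step-t observation under state j.
    Returns (path, scores); scores[t] holds the normalised best-path
    scores of all hidden states at step t.
    """
    n = len(initial)
    first = [initial[j] * emission_rows[0][j] for j in range(n)]
    total = sum(first)
    scores = [[v / total for v in first]]
    back = []
    for row in emission_rows[1:]:
        prev = scores[-1]
        pointers, new = [], []
        for j in range(n):
            # best predecessor of state j at this step
            best_i = max(range(n), key=lambda i: prev[i] * transition[i][j])
            pointers.append(best_i)
            new.append(prev[best_i] * transition[best_i][j] * row[j])
        total = sum(new)
        scores.append([v / total for v in new])  # rescale to avoid underflow
        back.append(pointers)
    state = max(range(n), key=lambda j: scores[-1][j])
    path = [state]
    for pointers in reversed(back):  # follow back-pointers to the first step
        state = pointers[state]
        path.append(state)
    return path[::-1], scores
```

With hidden states ordered as (Goal 1, Goal 2, Unknown goal), a prior concentrated on the unknown goal, a transition matrix with a strong diagonal, and emission rows that increasingly favour Goal 1, the decoded path starts in the unknown goal state and then switches to Goal 1, while the Goal 1 score grows with each step.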

4 Simulation Results

In the previous sections, we have introduced the MDP and HMM frameworks for modeling human action validation and intention recognition. We have obtained the model parameters empirically and conducted multiple simulations evaluating the proposed human intention recognition algorithm. The proposed algorithm is tested in a scenario whose most important simulation steps are shown in Fig. 17, and the corresponding desire estimates are shown in Fig. 18.

Figure 17: Representative simulation steps (best viewed in color): (a) step 2, (b) step 3, (c) step 6, (d) step 7, (e) step 12, (f) step 13, (g) step 18, (h) step 25, (i) step 31.

From the starting position, the agent initially behaves consistently with all the hypotheses. Because of this hypothesis consistency, the desire estimates for all of the goal states increase. The actions from simulation step 7 to step 12 are consistent with only one hypothesis, which manifests as a steep rise of the corresponding desire estimate and a fall of the probabilities related to the other goal hypotheses. In step 13, the action “Stay” is the only action consistent with that hypothesis and, because the agent chooses the action “Right”, the corresponding desire estimate instantly falls towards zero while other estimates rise. While it might seem obvious that the agent now actually wants to go to Goal 2, it has previously chosen actions inconsistent with that hypothesis, and the model initially gives a greater probability to another desire than to Goal 2. The next few steps are consistent with the Goal 2 hypothesis and the corresponding desire estimate rises until simulation step 18, when it enters a steady state of approximately 0.85. The goal desires will never reach the value of 1 because the corresponding transition matrix element is never zero, thus allowing the agent’s change of mind. Later in the simulation, the agent reaches a state from which it can decide to go to Goal 1 or Goal 2. However, it chooses to take the turn towards the dead end in simulation step 31. The proposed model recognizes that this behavior is inconsistent with all of the hypotheses, and the probability of the irrational agent state steeply rises to a value slightly smaller than 1, declaring the agent irrational.

Figure 18: Hidden state (desires) probabilities. Probabilities of the goal states are colored according to the goal tile’s color. The unknown goal state probability is colored black and irrational agent state probability is colored red.

5 Conclusion

In this paper we have proposed a feasible human intention recognition algorithm. Our goal was to estimate the intention of a human worker, i.e., the agent, inside a robotized warehouse, where we assumed that the agent’s position and orientation are known, as well as the potential goals. The proposed approach is based on the Markov decision process, where we first run the value iteration algorithm offline for the known agent goals and discretized possible agent states. The resulting state values are then used within the hidden Markov model framework to generate observations and estimate the final probabilities of the agent’s intentions. Simulations have been carried out within a simulated 2D warehouse with three potential goals, modeling a situation where the human worker needs to enter the robotized part of the warehouse and pick an item from a rack. The results suggest that the proposed framework predicts the human warehouse worker’s desires in an intuitive manner and within reasonable expectations.


This work has been supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688117 “Safe human-robot interaction in logistic applications for highly flexible warehouses (SafeLog)” and has been carried out within the activities of the Centre of Research Excellence for Data Science and Cooperative Systems supported by the Ministry of Science, Education and Sports of the Republic of Croatia.


  • [1] E-commerce Europe. European B2C E-commerce Report 2016 - Facts, Figures, Infographic & Trends of 2015 and the 2016 Forecast of the European B2C E-commerce Market of Goods and Services. 2016.
  • [2] Raffaello D’Andrea. Guest editorial: A revolution in the warehouse: A retrospective on kiva systems and the grand challenges ahead. IEEE Transactions on Automation Science and Engineering, 9(4):638–639, 2012.
  • [3] Peter R. Wurman, Raffaello D’Andrea, and Mick Mountz. Coordinating Hundreds of Cooperative, Autonomous Vehicles in Warehouses. AI Magazine, 29(1):9, 2008.
  • [4] Tirthankar Bandyopadhyay, Kok Sung Won, Emilio Frazzoli, David Hsu, Wee Sun Lee, and Daniela Rus. Intention-aware motion planning. In Springer Tracts in Advanced Robotics, volume 86, pages 475–491, 2013.
  • [5] Hsien-i Lin and Wei-kai Chen. Human Intention Recognition using Markov Decision Processes. CACS International Automatic Control Conference, pages 340–343, 2014.
  • [6] Truong-Huy Dinh Nguyen, David Hsu, Wee-Sun Lee, Tze-Yun Leong, Leslie Pack Kaelbling, Tomas Lozano-Perez, and Andrew Haydn Grant. CAPIR: Collaborative Action Planning with Intention Recognition. pages 61–66, 2012.
  • [7] Alan Fern and P. Tadepalli. A computational decision theory for interactive assistants. Advances in Neural Information Processing Systems 23 (NIPS), pages 577–585, 2011.
  • [8] Chris L. Baker and Joshua B. Tenenbaum. Modeling Human Plan Recognition using Bayesian Theory of Mind. Plan, Activity, and Intent Recognition, pages 1–24, 2014.
  • [9] Nick Chater, Mike Oaksford, Ulrike Hahn, and Evan Heit. Bayesian models of cognition. Wiley Interdisciplinary Reviews: Cognitive Science, 1(6):811–823, 2010.
  • [10] Zheng Wang, Angelika Peer, and Martin Buss. An HMM approach to realistic haptic Human-Robot interaction. Proceedings - 3rd Joint EuroHaptics Conference and Symposium on Haptic Interfaces for Virtual Environment and Teleoperator Systems, World Haptics 2009, pages 374–379, 2009.
  • [11] Lei He, Chang-fu Zong, and Chang Wang. Driving intention recognition and behaviour prediction based on a double-layer hidden Markov model. Journal of Zhejiang University SCIENCE C, 13(3):208–217, 2012.
  • [12] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. 1999.
  • [13] Richard Bellman. A Markovian decision process. Journal Of Mathematics And Mechanics, 6:679–684, 1957.
  • [14] Lawrence R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
  • [15] C.S. Chen, Y Xu, and Jie Yang. Human action learning via Hidden Markov Model. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 27(1):34–44, 1997.
  • [16] G. David Forney Jr. The Viterbi Algorithm. Proceedings of the IEEE, 61(3):268–278, 1973.
  • [17] Chun Zhu, Qi Cheng, and Weihua Sheng. Human intention recognition in Smart Assisted Living Systems using a Hierarchical Hidden Markov Model. IEEE International Conference on Automation Science and Engineering, pages 253–258, 2008.