I Introduction
Portable vision sensors, parallelizable perception algorithms [1], and general-purpose GPU-based computational architectures make simultaneous decision-making and scene understanding in complex domains an increasingly viable goal in robotics. Consider the problem of multi-robot perception-based decision-making in noisy environments, where observations may be low in frame rate or where semantic labeling is a time-durative process. Each robot may observe an object, infer its underlying class, change its viewpoint, and relabel the object as a different class based on new observations (Fig. 1). Robots must infer underlying object classes based on histories of past classifications, then use this information to execute tasks in a team-based decision-making setting.

For autonomous execution of complex missions using perception-based sensors, robots need access to high-level information extending beyond the topological data typically used for navigation tasks. Use of semantic maps (qualitative environment representations) has recently been explored for intelligent task execution [2, 3, 4]
. Yet, limited work has been conducted on semantic-level multi-robot decision-making in stochastic domains. Heuristic labeling rules [5] and rigid, hand-tuned observation models are failure-prone, as they do not capture the underlying environment stochasticity needed for robust decision-making. As real-world robot observation processes are notoriously noisy, semantic-level decision-making can benefit from principled consideration of probabilistic observations.

Cooperative multi-agent decision-making under uncertainty, in its most general form, can be posed as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) [6]. Yet, infinite-horizon Dec-POMDPs are undecidable and finite-horizon Dec-POMDPs are NEXP-complete, severely limiting application to real-world robotics [7, 8]. Recent efforts have improved Dec-POMDP scalability by introducing macro-actions (temporally-extended actions) into the framework, resulting in Decentralized Partially Observable Semi-Markov Decision Processes (Dec-POSMDPs) [9, 10, 11]. Use of durative macro-actions significantly improves planner scalability by abstracting low-level actions from high-level tasks.
So far, research focus has been on action-space scalability; no similar work targeting observation-space scalability has been conducted. Further, the large body of work on Dec-POMDPs has primarily been conducted from the artificial intelligence perspective, with limited focus on traditional robotics applications [6]. While the strength of Dec-POMDPs and Dec-POSMDPs comes from principled treatment of stochasticity, they have primarily been applied to benchmark domains with simple or hand-crafted observation models [6]. Derivation of data-driven, robust observation processes usable for Dec-POSMDP policy search remains a challenge. As planning complexity is exponential in the number of observations, abstraction to meaningful high-level macro-observations (appropriate for the tasks being completed) is desired. Thus, major research gaps exist in leveraging the Dec-POSMDP's full potential for real-world robotics. This paper addresses these issues, providing a high-level abstraction of observation processes and scalability improvements in a manner similar to previous work on macro-actions.

This paper's primary contribution is a formalization of macro-observation processes within Dec-POSMDPs, with a focus on the ubiquitous perception-based decision-making problem encountered in robotics. A hierarchical Bayesian macro-observation framework is introduced, using statistical modeling of observation noise for probabilistic classification in settings where noise-agnostic methods are shown to fail. The resulting data-driven approach avoids hand-tuning of observation models and produces the statistical information necessary for Dec-POSMDP solvers to compute a policy. Hardware results for real-time semantic labeling on a moving quadrotor are presented, with accurate inference in settings with high perception noise. The entire processing pipeline is executed onboard a quadrotor at approximately 20 frames per second. The macro-observation process is then integrated into a Dec-POSMDP planner, with demonstration of semantic-level decision-making executed on a quadrotor team performing a perception-based health-aware disaster relief mission.
II Decentralized Multi-Robot Decision-Making
This section summarizes the Dec-POSMDP framework, a decentralized decision-making process targeting large-scale multi-agent problems in stochastic domains. The Dec-POSMDP addresses scalability issues of Dec-POMDPs by incorporating belief-space macro-actions, or temporally-extended actions. For details on Dec-POSMDP fundamentals, we refer readers to our previous work [9, 10, 11].
Robots involved in Dec-POSMDPs operate in belief space, the space of probability distributions over states, as they only perceive noisy observations of the underlying state. Solving a Dec-POSMDP results in a hierarchical decision-making policy, where a macro-action (MA) is first selected by each robot, and low-level (primitive) actions are conducted within the MA until an ε-neighborhood of the MA's belief milestone is reached.¹ This neighborhood defines a goal belief node for the MA. Each MA encapsulates a low-level POMDP involving primitive actions and observations.

¹Generic per-robot parameters, joint team parameters, and joint parameters at a given timestep are distinguished notationally.

Definition 1
The Dec-POSMDP is defined by the following components:

- the set of heterogeneous robots;
- the underlying belief space, composed of the belief milestones of each robot's MAs and the environment state space;
- the joint independent MA space, the product of each robot's set of MAs; a joint MA assigns one MA to each robot;
- the set of all joint MA-observations;
- the high-level transition probability model between belief milestones under MAs;
- the generalized reward of taking a joint MA at a joint belief;
- the joint observation likelihood model;
- the reward discount factor γ.
The high-level or macro-environment state space is a finite set describing the state space extraneous to robot states (e.g., an object in the domain). An observation of the macro-environment state is denoted the macro-observation. Upon completion of its MA, each robot makes a macro-observation and calculates its final belief state. This macro-observation and final belief are jointly denoted the robot's high-level observation.
The history of executed MAs and received high-level observations is denoted the MA-history,
(1) 
The transition probability between consecutive belief milestones under a joint MA, completed over a given number of timesteps, is [11],
(2)  
The generalized team reward for a discrete-time Dec-POSMDP during execution of a joint MA is defined as [11],
(3) 
where the reward is accumulated until the timestep at which a robot completes its current MA. Note that this completion time is itself a random variable, since MA completion times are nondeterministic. Thus, the expectation in (3) is taken over MA completion durations as well. In practice, sampling-based approaches are used to estimate this expectation.
MA selection is dictated by the joint high-level policy. Each robot's high-level policy maps its MA-history to a subsequent MA to be executed. The joint value under a given policy is,
(4)  
The optimal joint high-level policy is then,
(5) 
To summarize, the Dec-POSMDP is a hierarchical decision-making process which involves finding a joint high-level policy dictating the MA each robot conducts based on its history of executed MAs and received high-level observations. Within each MA, the robot executes low-level actions and perceives low-level observations. The Dec-POSMDP is therefore an abstraction of the Dec-POMDP which treats the problem at the macro-action level to significantly increase planning scalability.
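To make the hierarchy concrete, the loop below sketches (in illustrative Python, not the paper's implementation) how a joint high-level policy, durative MAs, and macro-observation histories interact; the `policy` and `execute_ma` callables, the synchronous timing, and all names are assumptions for the sketch:

```python
def rollout(n_robots, policy, execute_ma, horizon, gamma=0.95):
    """Sketch of hierarchical Dec-POSMDP execution: each robot maps its
    MA-history to a macro-action via the high-level policy, runs the MA to
    completion (low-level actions/observations hidden inside execute_ma),
    then appends the resulting macro-observation to its history."""
    histories = [[] for _ in range(n_robots)]
    t, total_reward = 0, 0.0
    while t < horizon:
        duration = 1
        for i in range(n_robots):
            ma = policy(i, histories[i])                     # high-level policy
            duration, reward, macro_obs = execute_ma(i, ma)  # durative MA
            total_reward += (gamma ** t) * reward
            histories[i].append((ma, macro_obs))
        t += duration  # synchronous timing: a simplification for this sketch
    return total_reward, histories
```

In the full framework, MA completion is asynchronous and rewards accumulate until a robot finishes its MA; the synchronous loop above is only meant to expose the policy/MA/macro-observation interfaces.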
III Semantic Macro-Observations
This section formalizes Dec-POSMDP macro-observations. It also outlines the sequential-observation classification problem for macro-observation models and introduces a hierarchical Bayesian scheme for semantic-level macro-observations.
III-A Macro-Observation Processes
Dec-POSMDPs naturally embed state and macro-action uncertainty into a high-level decision-making process. In a similar manner, task planning can benefit from the robot's high-level understanding of the environment state. Previous research has focused on formal definitions of MAs in terms of low-level POMDPs and on algorithms for automatically generating them [10]. Yet, no formal work on automatic macro-observation generation has been done to date. Benchmark domains used to test Dec-POSMDP search algorithms use simplistic or hand-coded high-level observation processes, which are subsequently sampled during the evaluation phase of policy search algorithms [11, 10]. In contrast, this paper provides a foundation for deriving meaningful, data-driven macro-observations. We formally define macro-observations herein by distinguishing them from low-level observations:
Definition 2
Macro-observations are durative, generative probabilistic processes within which sequences of low-level observations are filtered, resulting in a semantic-level observation of the environment.
Macro-observations allow each robot's noisy semantic perception of the world to affect its task selection. Just as MAs provide an abstraction of low-level actions to a high-level task (e.g., "Open the door"), macro-observations abstract low-level observations to a high-level, meaningful understanding of the environment state (e.g., "Am I in an office?").
For uncertainty-aware planning, Dec-POSMDP policy search algorithms require sampling of the domain transition and observation model distributions discussed in Section II. Thus, the following distributions must be calculable for any robot's derived macro-observation process:

- a semantic output distribution over the underlying macro-environment state;
- a distribution over computation time.
While low-level observation processes can be treated as instantaneous for simplicity, observations related to scene semantics require non-negligible computation time which must be accounted for in the planner. Dec-POSMDPs seamlessly take this computation time into account. Definition 2 provides a natural representation for real-world high-level observation processes, as they are durative (i.e., take multiple timesteps to process low-level data). Further, this computation time is nondeterministic (e.g., the amount of time needed to answer "Am I in an office?" is conditioned on scene lighting). The existing Dec-POSMDP transition dynamics in (3) take an expectation over MA completion times. As every macro-observation is perceived following an MA, the time distribution in (3) can seamlessly include macro-observation computation time.
The result is a particularly powerful semantic-level decision-making framework, as MAs targeting desired macro-observations can be embedded in the Dec-POSMDP (e.g., "Track object until its class is inferred with 95% confidence"). The next sections focus on development of an automatic process which provides Dec-POSMDP solvers with the two necessary macro-observation distributions (semantic output distribution and computation time distribution).
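As a toy illustration of the two required distributions, the sketch below implements a durative, confidence-threshold macro-observation under an assumed symmetric confusion model; the class names, `p_correct`, and the stopping rule are hypothetical, not the paper's:

```python
import math
import random

def macro_observe(true_class, classes=("office", "corridor", "lab"),
                  p_correct=0.7, conf=0.95, max_frames=100, rng=random):
    """Toy durative macro-observation: filter noisy per-frame labels under a
    known symmetric confusion model until one class's posterior reaches
    `conf`. Returns (semantic label, number of frames consumed), i.e. one
    sample each from the semantic output and computation time distributions."""
    logp = {c: 0.0 for c in classes}  # uniform prior over classes
    p_wrong = (1.0 - p_correct) / (len(classes) - 1)
    for t in range(1, max_frames + 1):
        # simulate one noisy frame-level classification
        if rng.random() < p_correct:
            obs = true_class
        else:
            obs = rng.choice([c for c in classes if c != true_class])
        # Bayes update under the confusion model
        for c in classes:
            logp[c] += math.log(p_correct if obs == c else p_wrong)
        z = max(logp.values())
        total = sum(math.exp(v - z) for v in logp.values())
        best = max(classes, key=logp.get)
        if math.exp(logp[best] - z) / total >= conf:
            return best, t
    return max(classes, key=logp.get), max_frames
```

Sampling this function many times yields empirical versions of the two distributions a Dec-POSMDP solver needs.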
III-B Sequential Classification Filtering Problem
We now detail generation of macro-observations in the context of probabilistic object classification. Specifically, consider a ubiquitous decision-making scenario where a robot observes a sequence of low-level classifier outputs and must determine its surrounding environment state, or the class of an object, in order to choose a subsequent task to execute. A unique trait of robotic platforms is locomotion, allowing observations of an object or scene from a variety of viewpoints (Fig. 1). This motivates the need for a sequential macro-observation process using the history of classification observations made by the robot throughout its mission. In contrast to naïve reliance on frame-by-frame observations, sequential filtering offers increased robustness against domain uncertainty (e.g., camera noise, lighting conditions, or occlusion).
In settings with high observation noise, or where training data is not representative of mission data, statistical analysis of low-level classifier outputs both improves accuracy of macro-observations and provides useful measures of perception uncertainty. As a motivating example, consider the 3-class scenario in Fig. 2. A low-level classifier predicts the probability of a single image belonging to each class. A sequence of images results in a corresponding sequence of observed class probabilities, as in Fig. 1(a) for a 4-image sequence. This makes inference of the underlying class nontrivial.
Let us formalize the problem of constructing semantic macro-observations using streaming classification outputs. Given an input feature vector at each timestep, an M-class probabilistic classifier outputs a low-level probability observation whose elements are the raw probabilities of the input belonging to each class (e.g., Fig. 1(a)). Thus, each observation is a member of the probability simplex. In object classification, the input may be an image or a feature representation thereof, and each output element represents the probability of the object belonging to the corresponding class. This probabilistic classification is conducted over a sequence of images, resulting in a stream of class probability observations. In robotics, this macro-observation process is inherently durative, as multiple low-level observations of the object need to be perceived to counter domain noise. Simply labeling the object as belonging to the class with maximal probability can lead to highly sporadic outputs as the image sequence progresses. A filtering scheme using the history of classifications is desired, along with the two aforementioned characterizing macro-observation distributions necessary for Dec-POSMDP search algorithms.
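A quick way to see why per-frame argmax labeling is sporadic is to simulate a noisy observation stream on the simplex; the Dirichlet parameterization below (concentration added to the true class) is an assumption borrowed from Section III-C:

```python
import numpy as np

rng = np.random.default_rng(0)
M, T = 3, 30
true_class, lam = 0, 0.8  # low concentration => a very noisy classifier

# assumed generative model: y ~ Dir(lam * e_c + 1_M), so each frame's
# output is a point on the probability simplex
alpha = np.ones(M)
alpha[true_class] += lam
y_seq = rng.dirichlet(alpha, size=T)

# naive frame-by-frame decisions flip between classes as the sequence runs
frame_labels = y_seq.argmax(axis=1)
```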
Prior work on aggregation of multiple classifiers' predictions can be extended to single-classifier, multi-observation filtering, where in each case the posterior outputs become the macro-observation. Fixed classifier combination rules offer simplicity of implementation at the cost of suboptimality. One example is the max-of-mean approach [12], where each class's posterior probability is the mean of its observed probabilities throughout the image sequence,

(6)
Another strategy is voting-based consensus [13], with the posterior class chosen based on the highest number of votes across all individual prediction observations,
(7) 
where votes are tallied using the Kronecker delta function.
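Both fixed combination rules can be written in a few lines of NumPy; this is a hedged sketch of the rules in (6) and (7), with `y_seq` assumed to be a (frames × classes) array of simplex-valued classifier outputs:

```python
import numpy as np

def max_of_mean(y_seq):
    """Rule in the style of (6): posterior class is the argmax of the
    per-class mean of observed probabilities over the image sequence."""
    return int(np.asarray(y_seq).mean(axis=0).argmax())

def vote(y_seq):
    """Rule in the style of (7): each frame votes for its argmax class and
    the class with the most votes wins (a Kronecker-delta tally)."""
    arr = np.asarray(y_seq)
    votes = arr.argmax(axis=1)
    return int(np.bincount(votes, minlength=arr.shape[1]).argmax())
```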
The above approaches do not exploit the probabilistic nature of the underlying classifier outputs. A Bayes filter offers a more principled treatment of the problem. For example, binary Bayes filters are a popular approach for occupancy grid filtering and object detection [14, 15], where repeated observations are filtered to determine occupancy probability or presence of an object (both are binary 2-class cases, with classes 'occupied/present' or 'empty/absent'). Binary Bayes filters can be extended to M-class recursive classification by applying Bayes' rule and a Markovian observation assumption,
(8) $P(c = m \mid y_{0:t}) \propto P(y_t \mid c = m)\, P(c = m \mid y_{0:t-1})$

where the recursion is initialized with the prior class distribution. This Bayes filter assumes a fixed underlying class, henceforth called the Static State Bayes Filter (SSBF).
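A minimal SSBF step, assuming the per-frame class probabilities are used directly as likelihoods (an assumption for this sketch), looks like:

```python
import numpy as np

def logsumexp(x):
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def ssbf_update(log_belief, y_t, eps=1e-12):
    """One Static State Bayes Filter step: treat the classifier's per-frame
    class probabilities y_t as likelihoods, multiply into the running
    belief, and renormalize (all in log space for numerical stability)."""
    log_post = log_belief + np.log(np.asarray(y_t) + eps)
    return log_post - logsumexp(log_post)
```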
Though SSBF allows probabilistic filtering of classifier outputs, it assigns equal confidence to each observation in its update: it takes an equal amount of evidence for a class to "cancel out" evidence against it, an issue also encountered in Bayes-based occupancy mapping [16]. In settings with heterogeneous classifier performance, this approach performs poorly. One class may be particularly difficult to infer in a given domain, increasing the probability of misclassification compared to other classes. In our motivating example, Figs. 1(b)-1(d) illustrate noisy classification samples for the 3 underlying object classes. The class in Fig. 1(b) is particularly difficult to classify, with a near-uniform distribution of observations throughout the simplex, in contrast to the high-accuracy classifications of the class in Fig. 1(d). In this case, given uniform observations throughout the simplex and knowledge of the underlying classifier noise, the filter update weight on the noisier underlying class should be higher, since its classifier outputs are the most sporadic.

The critical drawback of the above approaches is that they simply filter, but do not model, the underlying observation process. As discussed in Section III-A, generative high-accuracy macro-observation models are necessary for Dec-POSMDP policy search algorithms [11, 9]. Perception-based observations are highly complex and involve images/video sequences generated from the domain, making them (currently) impossible to replicate in these offline search algorithms. While it may be tempting to use hand-coded generative distributions for the above filter-based macro-observation processes during policy search, such an approach fails to exploit the primary benefit of POMDP-based frameworks: the use of data-driven noise models which result in policies that are robust in the real world.
III-C Hierarchical Approach for Semantic Macro-Observations
This section introduces a generative macro-observation model titled Hierarchical Bayesian Noise Inference (HBNI), which infers inherent heterogeneous classifier noise. HBNI provides a compact, accurate, generative perception-based observation model, which is subsequently used to sample the two macro-observation distributions in Dec-POSMDP solvers. The combination of Dec-POSMDPs with HBNI macro-observations allows robust, probabilistic semantic-level decision-making in settings with limited, noisy observations.
To ensure robustness against misclassifications, HBNI involves both noise modeling and classification filtering, making it a multi-level inference approach. Given a collection of image class probability observations (Fig. 1(a)), the underlying class for each image is inferred while modeling the classifier noise distributions.
Hierarchical Bayesian models allow multi-level abstraction of uncertainty sources [17]. This is especially beneficial in the stochastic settings targeted by Dec-POSMDPs, which have layered sources of uncertainty. In semantic labeling, for instance, the classifier confidence for each class can be modeled using a set of noise parameters. Moreover, it is beneficial to model the relationship between noise parameters through a shared prior (Fig. 3). Consider, for instance, a robot performing object classification using a low-quality camera or in a domain with poor visibility. In this setting, observations may be noisier than expected a priori, indicating the presence of a high-level, class-independent uncertainty source. This information should be shared amongst all class models, allowing more accurate modeling of domain uncertainty through the noise parameters. Layered sharing of statistical information between related parameters is a strength of hierarchical Bayesian models, and has been demonstrated to increase robustness in posterior inference compared to non-hierarchical counterparts [18].
Fig. 3 illustrates the graphical model of HBNI. A categorical prior is used for the underlying classes,
(9) 
This allows integration of prior domain knowledge into HBNI. A Dirichlet observation model is used for raw classifier outputs,
(10) 
where a scalar noise parameter for the associated class scales a one-hot categorical indicator vector (the element for that class equal to 1, the rest zero), which is added to a vector of ones to form the Dirichlet concentration. Each class observation has an associated class label, which in turn links to the appropriate noise parameter. This choice of parameterization offers two advantages. First, the noise parameter provides a direct, intuitive measure of noise for the classifier observations. As in Figs. 1(b)-1(d), it acts as the Dirichlet concentration parameter and is related to the variance of the classification distribution. Low values imply high levels of observation noise, and vice versa. A second advantage is that it simplifies the posterior probability calculations used within Markov chain Monte Carlo (MCMC) inference, as discussed below.
A gamma prior is used for each noise parameter,
(11) 
where the gamma shape and scale parameters are themselves treated as unknown hyperparameters. Their role is to capture high-level sources of domain uncertainty, allowing sharing of cross-class noise statistics. Gamma priors were also used for these hyperparameters in our experiments, although results showed low sensitivity to this choice of prior.
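The generative side of this model is easy to sample; the sketch below assumes the concentration parameterization described above (noise parameter added to the true-class element of a vector of ones) and shows how it controls output noise:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_class_observation(c, lam, M, rng):
    """Draw one raw classifier output under the assumed model
    y ~ Dir(lam * e_c + 1_M): concentration lam on the true class c,
    ones elsewhere."""
    alpha = np.ones(M)
    alpha[c] += lam
    return rng.dirichlet(alpha)

# low lam -> sporadic, near-uniform outputs; high lam -> confident outputs
noisy = np.array([sample_class_observation(0, 0.5, 3, rng) for _ in range(500)])
confident = np.array([sample_class_observation(0, 50.0, 3, rng) for _ in range(500)])
```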
Given the raw class probability observations, the posterior probability of the noise parameters and associated classes is,
(12)  
(13) 
This allows inference of the noise parameters and hyperparameters using the collection of observed data. The computational complexity of Equation 13 can be further reduced. The log of the prior (Equation 9) is simply the log of the corresponding categorical weight. To efficiently compute the Dirichlet likelihood, consider a notation change for the concentration parameters,
(14) 
where the normalizing constant is the multivariate Beta function. Based on the definition of the concentration parameters,
(15) 
Combining Equation 15 with Equation 14 and taking the log,
(16)  
(17) 
where Γ is the gamma function. Note that, as per Equation 15,
(18) 
Thus, the Dirichlet log-posterior is,
(19) 
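Under this parameterization, the Dirichlet log-density collapses to a particularly cheap closed form; the following is a hedged reconstruction consistent with the parameterization described above (concentration $\lambda_c$ added to the one-hot vector $e_c$ plus a vector of ones), using $y_c$ for the observed probability of the true class:

```latex
\begin{align}
\sum_j \alpha_j &= M + \lambda_c,
&
\prod_j \Gamma(\alpha_j) &= \Gamma(1+\lambda_c)\,\Gamma(1)^{M-1} = \Gamma(1+\lambda_c),\\
\log \mathrm{Dir}\!\left(y \mid \lambda_c e_c + \mathbf{1}_M\right)
  &= \log\Gamma(M + \lambda_c) - \log\Gamma(1 + \lambda_c) + \lambda_c \log y_c .
\end{align}
```

Only a single gamma-function difference and one log term remain, which is what makes the MCMC posterior evaluations inexpensive.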
Finally, the log-probability of each noise parameter (and similarly of the hyperparameters) is,
(20) 
To summarize, the log of Equation 13 is efficiently computed by combining (19) and (20). An MCMC approach is used to calculate the posterior distribution over the noise parameters and hyperparameters. This allows a history of observations to be filtered using the noise distributions, resulting in posterior class probabilities,
(21)  
(22) 
where the latest observation is conditionally independent of the hyperparameters given the noise parameters, allowing the hyperparameter terms to be dropped, and the likelihood term is the Dirichlet density of Equation 10. Thus, Equation 13 provides a generative distribution for low-level observations (after noise parameter inference), and Equation 22 provides a recursive filtering rule for macro-observations given each new observation. Combined, these equations provide a macro-observation model and filtering scheme which can be used in Dec-POSMDP search algorithms.
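A point-estimate sketch of the resulting recursive filter (using fixed inferred noise parameters rather than full posterior samples, an assumption for brevity) is:

```python
import numpy as np
from math import lgamma

def dirichlet_loglik(y, c, lam, eps=1e-9):
    """log Dir(y | lam * e_c + 1_M), using the simplified closed form for
    the assumed parameterization (only the true-class coordinate matters)."""
    M = len(y)
    return lgamma(M + lam) - lgamma(1.0 + lam) + lam * np.log(max(y[c], eps))

def hbni_update(log_belief, y_t, lams):
    """Recursive class filtering in the style of Eq. (22): each class's
    update is weighted by its own inferred noise level lams[m] (point
    estimates stand in for posterior samples in this sketch)."""
    log_post = np.array([log_belief[m] + dirichlet_loglik(y_t, m, lams[m])
                         for m in range(len(log_belief))])
    z = log_post.max()
    return log_post - (z + np.log(np.exp(log_post - z).sum()))
```

Note how, given a near-uniform observation, the update correctly favors the class whose inferred noise parameter is low, matching the discussion of heterogeneous classifier performance above.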
To summarize, the proposed HBNI approach uses the collection of classification observations to calculate a posterior distribution over the noise parameters for each object class, as well as the shared hyperparameters. These noise distributions are then used for online streaming of class probability macro-observations. While HBNI noise inference is computationally efficient and can be conducted online, the complexity of Dec-POSMDPs means that existing sampling-based policy search algorithms are run offline. Thus, integration of HBNI macro-observations into Dec-POSMDPs is a threefold process. First, domain data is collected and HBNI noise inference of the parameters and hyperparameters is conducted, resulting in a generative observation distribution. This distribution is then used for domain sampling and policy search in Dec-POSMDP search algorithms. The resulting policy is then executed online, with HBNI-based filtering used to output macro-observations. The generative nature of HBNI allows usage of complex, durative macro-observation processes, which can filter the observation stream and output a macro-observation only when a desired confidence level is reached.
IV Simulated Experiments
This section validates HBNI's performance in comparison to noise-agnostic filtering schemes, before integration into Dec-POSMDPs. As stated earlier, an MCMC approach is used to compute the posterior over the noise parameters and hyperparameters. Specifically, the experiments use a Metropolis-Hastings (MH) [19] sampler with an asymmetric categorical proposal distribution for the underlying classes, with high weight on the previously-proposed class and low weight on the remaining classes (given uniform random initialization). Gaussian MH proposals are used for the transformed noise parameters and hyperparameters.
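A minimal MH sampler for a single class's noise parameter (hyperpriors fixed rather than sampled, Gaussian proposals on the log-transformed parameter; all constants below are illustrative assumptions, not the paper's settings) can be sketched as:

```python
import numpy as np
from math import lgamma, log

rng = np.random.default_rng(2)

def dirichlet_loglik(y, c, lam):
    # simplified log Dir(y | lam * e_c + 1_M) under the assumed model
    M = len(y)
    return lgamma(M + lam) - lgamma(1.0 + lam) + lam * log(max(y[c], 1e-12))

def mh_lambda(y_seq, c, n_iter=3000, k0=2.0, theta0=5.0, step=0.3, rng=rng):
    """Metropolis-Hastings over one class's noise parameter lam, with a
    Gamma(k0, theta0) prior and Gaussian proposals on log(lam); the extra
    log(lam) term is the Jacobian of the log-transform."""
    def log_target(lam_):
        lp = (k0 - 1.0) * log(lam_) - lam_ / theta0           # gamma prior
        lp += sum(dirichlet_loglik(y, c, lam_) for y in y_seq)
        return lp + log(lam_)                                 # Jacobian
    lam = 1.0
    cur = log_target(lam)
    samples = []
    for _ in range(n_iter):
        prop = lam * np.exp(rng.normal(0.0, step))
        lt = log_target(prop)
        if log(rng.random()) < lt - cur:
            lam, cur = prop, lt
        samples.append(lam)
    return np.array(samples[n_iter // 2:])  # discard burn-in

# synthetic data: 30 observations of a class with true noise parameter 20
true_lam, M = 20.0, 3
alpha = np.ones(M)
alpha[0] += true_lam
data = rng.dirichlet(alpha, size=30)
post = mh_lambda(data, c=0)
```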
Fig. 4 shows noise parameter posterior distributions for the problem outlined in Fig. 2. Parameter inference was conducted using only 15 classification observations (5 from each class). Despite the very limited number of observations, the posterior distributions provide reasonable inferences of the true underlying noise parameters.
Hyperparameter posteriors are shown in Fig. 4(a). Recall that these shared parameters capture trends in outputs which indicate shifts in classification confidence levels (for all classes) due to domain-level uncertainty. To test the sensitivity of inference to the hyperparameters, their priors were chosen such that (on average) they indicate very high concentration values (Fig. 4(b), top). This sets a prior expectation of near-perfect outputs from the classifiers. However, given only the 15 classifier observations, the hyperparameter posteriors shift to indicate much lower overall classification confidence (Fig. 4(b), bottom), better capturing the range of noise parameters expected in the domain. This sharing of high-level noise statistics improves filtering of subsequent observations (even if from an entirely new class).
HBNI classification error is evaluated against the voting, max-of-mean, and SSBF methods discussed in Section III-B. Fig. 6 shows results for a varying number of class observations, with 2000 trials used to calculate error for each case. Voting performs poorly as it disregards class probabilities altogether. HBNI significantly outperforms the other methods, requiring 5-10 observations to converge to the true object class for all trials. The other methods need 4-5 times the number of observations to match HBNI's performance. One interesting result is that for a single observation, predictions for voting, max-of-mean, and SSBF are equivalent. However, due to noise modeling, HBNI makes an informed decision regarding the underlying class, leading to lower classification error.
V Hardware Experiments
This section evaluates HBNI on a robotics platform to ascertain the benefits of noise modeling in real-world settings. It then showcases multi-robot Dec-POSMDP decision-making in hardware using HBNI-based macro-observations.
V-A Underlying (Low-Level) Classification Framework
Low-level classifier training is conducted on a dataset of 3 target vehicle classes (iRobot, Quadrotor, Racecar) in a well-lit room, using a QVGA-resolution webcam (Fig. 8). 100 snapshots of each object type are used for training, including crops and mirror images for increased translational and rotational invariance. Feature extraction is done using a Convolutional Neural Network (CNN) implemented in Caffe [20] (though the proposed HBNI approach is agnostic to the underlying classifier type). Images are center-cropped with 10% padding and resized to 227x227 resolution. Features are extracted from the 8th fully connected layer of an AlexNet [21] trained on the ILSVRC-2012 dataset [1]. These features are used to train a set of Support Vector Machines (SVMs), with a one-vs-one approach for multi-class classification. As SVMs are inherently discriminative classifiers, class probabilities for each image are calculated using Platt scaling, by fitting a sigmoid function to the SVM scores [22]. These probabilities are then processed using HBNI-based macro-observations.

V-B Hardware Platform
DJI F330 quadrotors with custom autopilots are used for the majority of experiments (Fig. 7), with a Logitech C615 webcam for image capture. The macro-observation pipeline is executed on an onboard NVIDIA Jetson TX1, powered by a dedicated 3-cell 1350 mAh LiPo battery. Runtime for the underlying classifier is 49.5 ms per frame, and the entire pipeline (including communication and filtering) executes fully onboard at approximately 20 frames per second.
V-C Results: HBNI-Based Macro-Observations
Classification robustness is verified using an augmented reality testbed [23] to change domain lighting conditions. In contrast to the well-lit images used to train the underlying classifier (Figs. 7(a)-7(c)), test images have textured backgrounds and dim lighting which reduce camera shutter speed, increasing blur (Fig. 7(d)). The experiments are designed to simulate typical scenarios in robotics where the training dataset is not fully representative of mission test data.
Filtered classification results for the test dataset are shown in Fig. 9. In the new lighting conditions, classification of the Quadrotor object class is particularly difficult, resulting in nearly equal raw probabilities amongst all three classes (raw data in Fig. 9). Noise-agnostic filters such as SSBF fail to correctly classify the object as a Quadrotor, instead classifying it as an iRobot with high confidence (filtered output in Fig. 8(a)). Moreover, the probability of the Quadrotor class asymptotically approaches zero as more observations are made. In contrast, HBNI infers the underlying noise, leading to robust classification of the Quadrotor object after only 7 frames (Fig. 8(b)). Later in the sequence, due to improved lighting, raw classifier probabilities increase for the Quadrotor class. SSBF only slightly lowers its probability of the object being an iRobot, whereas the HBNI approach significantly increases the probability of the true Quadrotor class. Fig. 10 shows HBNI macro-observations on a quadrotor exploring an environment with multiple objects. The results indicate that HBNI accurately classifies objects onboard a moving robot in noisy domains. For additional HBNI results and analysis, readers can refer to our technical report [24].
V-D Results: Multi-Robot Decision-Making
HBNI-based macro-observations were integrated into the Dec-POSMDP framework (as described in Section III) and evaluated on a multi-robot health-aware disaster relief domain (Fig. 11). This is an extension of the Dec-POSMDP package delivery domain [10] involving a team of quadrotors. Disaster relief objects of 6 types (ambulance, police_car, medical_copter, news_copter, food_crate, medical_crate) are randomly generated at 2 bases, each type with an associated delivery destination (hospital, airport, or crate_zone). Nine MAs are available for execution by each robot: Go to each of the bases or delivery destinations, Go to repair station for maintenance, Infer object class with 95% confidence, Pick up disaster relief object, and Put down disaster relief object. Quadrotors are outfitted with the hardware discussed in Section V-B and use HBNI to infer the underlying disaster relief object class during policy execution. The team receives a reward for each object delivered to the correct destination. Quadrotors also receive noisy observations from onboard health sensors and maintain a belief distribution over their underlying health state (high, medium, and low health), indicated by colored rings in Fig. 11. Robots with low health take longer to complete MAs, thereby reducing overall team reward due to the discount factor in Equation 4. Perception data is collected and used to train the HBNI-based macro-observation process, which is then used for Dec-POSMDP policy search via the Graph-based Direct Cross-Entropy algorithm [11].
MAs in this domain have probabilistic success rates and completion times. An augmented reality system is used to display bases, disaster relief objects, and delivery destinations in real time in the domain. The domain includes shadows and camera noise, but perception uncertainty is further increased by projecting a dynamic day-night cycle and a moving backdrop of clouds onto the domain.
Our video attachment shows this multi-robot mission executed on a team of quadrotors. HBNI inference occurs onboard, with the necessary number of low-level observations processed to achieve high confidence. Mission performance matches that of previous (simpler) results for this domain which simulated all observations [11]. To the best of our knowledge, this is the first demonstration of real-time, CNN-based classification running onboard quadrotors in a team setting. It is also the first demonstration of data-driven multi-robot semantic-level decision-making using Dec-POSMDPs.
VI Conclusion
This paper presented a formalization of macro-observation processes within Dec-POSMDPs, targeting scalability improvements for real-world robotics. A hierarchical Bayesian approach was used to model semantic-level macro-observations. This approach, HBNI, infers underlying noise distributions to increase classification accuracy, resulting in a generative macro-observation model. This is especially useful in robotics, where perception sensors are notoriously noisy. The approach was demonstrated in real time on moving quadrotors, with classification and filtering performed onboard at approximately 20 frames per second. The novel macro-observation process was then integrated into a Dec-POSMDP and demonstrated in a probabilistic multi-robot health-aware disaster relief domain. Future work includes extension of existing Dec-POSMDP algorithms to online settings to leverage the computational efficiency of HBNI.
References
[1] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," Int. Journal of Computer Vision (IJCV), pp. 1–42, April 2015.
[2] O. M. Mozos, P. Jensfelt, H. Zender, G.-J. Kruijff, and W. Burgard, "From labels to semantics: An integrated system for conceptual spatial representations of indoor environments for mobile robots," in Proc. of the Workshop "Semantic Info. in Robotics" at IEEE ICRA, April 2007.
[3] C. Galindo, J.-A. Fernández-Madrigal, J. González, and A. Saffiotti, "Robot task planning using semantic maps," Robot. Auton. Syst., vol. 56, no. 11, pp. 955–966, November 2008.
[4] C. Wu, I. Lenz, and A. Saxena, "Hierarchical semantic labeling for task-relevant RGB-D perception," in Robotics: Science and Systems, D. Fox, L. E. Kavraki, and H. Kurniawati, Eds., 2014.
[5] C. Chanel, F. Teichteil-Königsbuch, and C. Lesire, "Planning for perception and perceiving for decision: POMDP-like online target detection and recognition for autonomous UAVs," in Proc. of the 6th Int. Scheduling and Planning Applications Workshop, 2012.
[6] F. A. Oliehoek and C. Amato, A Concise Introduction to Decentralized POMDPs. Springer, 2016.
[7] D. S. Bernstein, R. Givan, N. Immerman, and S. Zilberstein, "The complexity of decentralized control of Markov decision processes," Math. of Oper. Research, vol. 27, no. 4, pp. 819–840, 2002.
[8] D. S. Bernstein, C. Amato, E. A. Hansen, and S. Zilberstein, "Policy iteration for decentralized control of Markov decision processes," Journal of Artificial Intelligence Research, vol. 34, pp. 89–132, 2009.
[9] C. Amato, G. Konidaris, A. Anders, G. Cruz, J. How, and L. Kaelbling, "Policy search for multi-robot coordination under uncertainty," in Robotics: Science and Systems XI (RSS), 2015.
[10] S. Omidshafiei, A.-A. Agha-Mohammadi, C. Amato, and J. P. How, "Decentralized control of partially observable Markov decision processes using belief space macro-actions," in Robotics and Automation (ICRA), 2015 IEEE International Conference on. IEEE, 2015, pp. 5962–5969.
[11] S. Omidshafiei, A.-A. Agha-Mohammadi, C. Amato, S.-Y. Liu, J. P. How, and J. Vian, "Graph-based cross entropy method for solving multi-robot decentralized POMDPs," in Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE, 2016, pp. 5395–5402.
[12] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Trans. on Systems, Man, and Cybern., vol. 22, no. 3, 1992.
[13] R. Florian and D. Yarowsky, "Modeling consensus: Classifier combination for word sense disambiguation," in Proc. of the ACL-02 Conf. on Empir. Methods in Nat. Lang. Proc., vol. 10. Stroudsburg, PA, USA: Association for Computational Linguistics, 2002, pp. 25–32.
[14] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics, 2001.
[15] A. Coates and A. Y. Ng, "Multi-camera object detection for robotics," in ICRA. IEEE, 2010, pp. 412–419.
[16] M. Yguel, O. Aycard, and C. Laugier, "Update policy of dense maps: Efficient algorithms and sparse representation," in FSR, ser. Springer Tracts in Advanced Robotics, C. Laugier and R. Siegwart, Eds., vol. 42. Springer, 2007, pp. 23–33.
[17] I. J. Good, "Some history of the hierarchical Bayesian methodology," in Bayesian Stat., J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, Eds. Valencia University Press, 1980, pp. 489–519.
[18] J. Huggins and J. Tenenbaum, "Risk and regret of hierarchical Bayesian learners," in ICML, ser. JMLR Proc., F. R. Bach and D. M. Blei, Eds., vol. 37, 2015, pp. 1442–1451.
[19] W. K. Hastings, "Monte Carlo sampling methods using Markov chains and their applications," Biometrika, vol. 57, pp. 97–109, 1970.
[20] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv preprint arXiv:1408.5093, 2014.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[22] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in Advances in Large Margin Classifiers. MIT Press, 1999, pp. 61–74.
[23] S. Omidshafiei, A.-A. Agha-Mohammadi, Y. F. Chen, N. K. Ure, S.-Y. Liu, B. Lopez, J. How, J. Vian, and R. Surati, "MAR-CPS: Measurable Augmented Reality for Prototyping Cyber-Physical Systems," in IEEE CSM, 2016.
[24] S. Omidshafiei, B. T. Lopez, J. P. How, and J. Vian, "Hierarchical Bayesian noise inference for robust real-time probabilistic object classification," Tech. Rep., 2016, http://arxiv.org/abs/1605.01042.