1 Introduction
We introduce a new approach to the adaptive sensing problem, in which a robot must traverse an environment to identify locations or items of interest. Adaptive sensing encompasses many well-studied problems in robotics, including the rapid identification of accidental contamination leaks and radioactive sources [1, 2], and finding individuals in search and rescue missions [3]. In such settings, it is often critical to devise a sensing trajectory that returns a correct solution as quickly as possible. We focus on the problem of radioactive source-seeking (RSS), in which a UAV (Fig. 1) must identify the $k$ largest radioactive emitters in its environment, where $k$ is a user-defined parameter. RSS is a particularly interesting instance of the adaptive sensing problem, due both to the challenges posed by the highly heterogeneous background noise [4] and to the existence of a well-characterized sensor model amenable to the construction of statistical confidence intervals. We emphasize, however, that our main contribution, AdaSearch, generalizes to other settings with multiple signal sources and a heterogeneous background.
The current state of the art for this type of problem is information maximization, in which measurements are collected in the most promising locations, following a receding planning horizon. Information maximization is appealing because it favors measuring regions that are likely to contain the highest emitters, and avoids wasting time elsewhere. However, when operating in real time, computational constraints necessitate approximations such as limits on planning horizon and trajectory parameterization. These limitations become more severe as the size of the search region and the complexity of the sensor model grow, and they may cause the algorithm to be excessively greedy and spend too much time tracking down false leads.
We introduce AdaSearch, a successive-elimination framework for general adaptive sensing problems, and demonstrate it within the context of RSS. AdaSearch explicitly maintains confidence intervals over the emissions rate at each point in the environment. Using these confidence intervals, the algorithm identifies a set of candidate points likely to be among the top $k$ emitters, and eliminates points which are not.
Rather than iteratively planning for short, receding time horizons, AdaSearch repeats a fixed, globally planned path, adjusting the robot's speed in real time to focus measurements on promising regions. This approach offers coverage of the full search space while affording an adaptive measurement allocation in the spirit of information maximization. By maintaining a single fixed, global path, we reduce the online computational overhead, yielding an algorithm easily amenable to real-time implementation.
In simulations (Sec. 6), we find that AdaSearch significantly outperforms an implementation of information maximization based on [5]. While initially surprising, this suggests that either our confidence interval approach yields a more efficient measurement allocation, or that the receding horizon in information maximization causes it to be misled by spurious background signals. To disambiguate between these two effects, we compare against a uniform search algorithm which follows the same global path as AdaSearch but at a uniform speed. Simulations and theoretical results demonstrate that AdaSearch excels in situations where background signals are heterogeneous. Surprisingly, this uniform procedure with global planning also outperforms the information maximization method, but by a smaller margin than AdaSearch. This suggests that the global path is an important factor in AdaSearch's success. Finally, we corroborate these findings with a hardware demonstration, using a quadrotor helicopter in a motion capture arena.
2 Related Work
Adaptive, or active, robotic sensing is central to problems in environmental monitoring [6, 1, 2], and encompasses many different modeling choices and algorithmic methods.
Information maximization methods. One of the most popular approaches to robotic source-seeking, notably in active localization [7] and target localization [8], is to choose trajectories that maximize some measure of information [9, 10, 11, 7]. Planning for information maximization-based methods typically proceeds with a receding horizon [9, 12, 5, 13, 14]. In the specific case of linear Gaussian measurements, Atanasov et al. [15] formulate the informative path planning problem as an optimal control problem which affords an offline solution.
Marchant et al. [12] combine upper confidence bounds (UCBs) at potential source locations with a penalization term for travel distance to define a greedy acquisition function for Bayesian optimization. Their follow-up work [5] reasons at the path level to find longer, more informative trajectories. Noting the limitations of a greedy receding horizon approach, [16] incentivizes exploration by using a look-ahead step in planning. Recently, [10] encodes a notion of spatial hierarchy in designing informative trajectories, based on a multi-armed bandit formulation. While [10] and AdaSearch are similarly motivated, hierarchical planning is inefficient for many sensing models, e.g. for short-range sensors, or signals that decay quickly with distance from the source.
Gaussian processes for information maximization. Information maximization methods require a prior distribution on the underlying signals. Many works in active sensing model this prior as being drawn from a Gaussian process (GP) over an underlying space of possible functions [8, 9, 12]. The GP prior tacitly enforces the assumption that the sensed signal is smooth [12]. In certain applications, this is well motivated by physical laws, e.g. diffusion [16]. However, GP priors may not reflect the sparse, heterogeneous emissions encountered in radiation detection and similar problem settings.
Multi-armed bandit approaches. AdaSearch draws heavily on confidence-bound based algorithms from the pure exploration bandit literature [17, 18, 19]. In contrast to these works, our method allows for efficient measurement allocation despite the spatial constraints of movement inherent to mobile robotic sensing, and explicitly incorporates a realistic sensor model. Other works have studied spatial constraints in the online, “adversarial” reward setting [20, 21]. One can also study bandit algorithms from a Bayesian perspective, where a prior is placed over underlying rewards. For example, [22] provides an interpretation of GP-UCB in terms of information maximization. AdaSearch is similar to the lower and upper confidence bound (LUCB) algorithm [23], but opts for successive elimination over the more aggressive LUCB sampling strategy to afford efficient traversal of the search space.
Other source-seeking methods. Modeling emissions as a continuous field, gradient-based approaches estimate and follow the gradient of the measured signal toward local maxima [24, 25, 26]. One of the key drawbacks of gradient-based methods is their susceptibility to finding local, rather than global, extrema. Moreover, the error margin on the noise of gradient estimators for large-gain sensors measuring noisy signals can be prohibitively large [27], as is the case in RSS.
3 AdaSearch Planning Strategy
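Problem Statement. We consider signals (e.g. radiation) which emanate from a finite set of environment points $\mathcal{S}$. Each point $x \in \mathcal{S}$ emits signals $\{X_t(x)\}$ indexed by time $t$ with means $\mu(x)$, independent and identically distributed over time. Our aim is to correctly and exactly discern the set of the $k$ points in the environment which emit the maximal signals:
$$\mathcal{S}^*(k) = \arg\max_{\mathcal{S}' \subseteq \mathcal{S},\ |\mathcal{S}'| = k} \ \sum_{x \in \mathcal{S}'} \mu(x) \tag{1}$$
for a pre-specified integer $k$ with $1 \le k \le |\mathcal{S}|$. Throughout, we assume that $\mathcal{S}^*(k)$ is unique.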
In order to decide which points are maximal emitters, the robot takes sensor measurements along a fixed path $\mathcal{Z}$ in the robot's configuration space. Measurements are determined by a known sensitivity function $h(x, z)$ that describes the contribution of environment point $x \in \mathcal{S}$ to a sensor measurement collected from sensing configuration $z \in \mathcal{Z}$. We consider a linear sensing model in which the total observed measurement at time $t$, $Y_t(z)$, taken from sensing configuration $z$, is the weighted sum of the contributions from all environment points:
$$Y_t(z) = \sum_{x \in \mathcal{S}} h(x, z)\, X_t(x). \tag{2}$$
Note that while $h(x, z)$ is known, the means $\mu(x)$ are unknown, and must be estimated via the observations $Y_t(z)$.
The path of sensing configurations, $\mathcal{Z}$, should be as short as possible, yet also provide sufficient information about the entire environment. This may be expressed as a condition on the minimum aggregate sensitivity $\alpha$ to any given environment point $x$ over the sensing path $\mathcal{Z}$:
$$\sum_{z \in \mathcal{Z}} h(x, z) \ge \alpha \quad \text{for all } x \in \mathcal{S}. \tag{3}$$
Moreover, we must be able to disambiguate between contributions from different environment points $x \in \mathcal{S}$. Consider the sensitivity matrix $H \in \mathbb{R}^{|\mathcal{S}| \times |\mathcal{Z}|}$ that encodes the sensitivity of each sensing configuration $z_j \in \mathcal{Z}$ to each point $x_i \in \mathcal{S}$, so that $H_{ij} = h(x_i, z_j)$. The disambiguation constraint then translates to a rank constraint: $\mathrm{rank}(H) \ge |\mathcal{S}|$. Sections 4.1 and 4.2 define two different sensitivity functions which we will study in the context of the RSS problem. In Section 7, we discuss sensitivity functions that may arise in other application domains.
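As a rough illustration of the linear sensing model (2) and the coverage condition (3), the sketch below tabulates a sensitivity function over a small set of points and configurations and evaluates the expected reading at each configuration. The function names, the list-of-lists layout, and the toy one-to-one sensitivity are assumptions made for this example; they are not taken from the original implementation.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sensitivity_table(points: Sequence[Point],
                      configs: Sequence[Point],
                      h: Callable[[Point, Point], float]) -> List[List[float]]:
    """Entry [i][j] is the sensitivity of configuration j to environment point i."""
    return [[h(x, z) for z in configs] for x in points]


def expected_measurements(table: List[List[float]],
                          means: Sequence[float]) -> List[float]:
    """Expected reading at each configuration: sensitivity-weighted sum of the means."""
    n_configs = len(table[0])
    return [sum(table[i][j] * means[i] for i in range(len(means)))
            for j in range(n_configs)]


def min_aggregate_sensitivity(table: List[List[float]]) -> float:
    """Smallest total sensitivity, over the whole path, to any single point."""
    return min(sum(row) for row in table)


# Toy usage with hypothetical values: one sensing configuration per point.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
configs = list(points)
table = sensitivity_table(points, configs, lambda x, z: 1.0 if x == z else 0.0)
```

In this toy case the table is the identity, so every point is covered with aggregate sensitivity 1 and the contributions of different points are trivially distinguishable.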
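The AdaSearch algorithm. AdaSearch (Alg. 1) proceeds by concentrating measurements in regions of uncertainty, until it is confident which points belong to $\mathcal{S}^*(k)$. At each round $i$, we maintain a set $\mathcal{S}^{\mathrm{top}}_i$ of environment points which we are confident are among the top $k$, and a set $\mathcal{S}_i$ of candidate points about which we are still uncertain. As the robot traverses the environment, new sensor measurements allow us to update the confidence intervals $[\mathrm{LCB}_i(x), \mathrm{UCB}_i(x)]$ and prune the uncertainty set $\mathcal{S}_i$. The procedure for constructing these intervals from observations should ensure that, for every $x \in \mathcal{S}_i$, $\mathrm{LCB}_i(x) \le \mu(x) \le \mathrm{UCB}_i(x)$ with high probability. Sections 4.1 and 4.2 detail the definition of these confidence intervals under different sensing models.
Using the updated confidence intervals, we can expand the set $\mathcal{S}^{\mathrm{top}}_i$ and prune the set $\mathcal{S}_i$. The new top set $\mathcal{S}^{\mathrm{top}}_{i+1}$ comprises the old top set $\mathcal{S}^{\mathrm{top}}_i$ together with all points $x \in \mathcal{S}_i$ whose lower confidence bounds exceed the upper confidence bounds of all but $k - |\mathcal{S}^{\mathrm{top}}_i|$ points in $\mathcal{S}_i$; formally,
$$\mathcal{S}^{\mathrm{top}}_{i+1} = \mathcal{S}^{\mathrm{top}}_i \cup \left\{ x \in \mathcal{S}_i : \mathrm{LCB}_i(x) > \left(k - |\mathcal{S}^{\mathrm{top}}_i| + 1\right)\text{-th largest } \mathrm{UCB}_i(x'),\ x' \in \mathcal{S}_i \right\}. \tag{4}$$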
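Next, the points added to $\mathcal{S}^{\mathrm{top}}_{i+1}$ are removed from $\mathcal{S}_i$, since we are now certain about them. Additionally, we remove all points in $\mathcal{S}_i$ whose upper confidence bound is lower than the lower confidence bounds of at least $k - |\mathcal{S}^{\mathrm{top}}_{i+1}|$ points in $\mathcal{S}_i \setminus \mathcal{S}^{\mathrm{top}}_{i+1}$. The set $\mathcal{S}_{i+1}$ is defined constructively as:
$$\mathcal{S}_{i+1} = \left\{ x \in \mathcal{S}_i : x \notin \mathcal{S}^{\mathrm{top}}_{i+1} \text{ and } \mathrm{UCB}_i(x) \ge \left(k - |\mathcal{S}^{\mathrm{top}}_{i+1}|\right)\text{-th largest } \mathrm{LCB}_i(x'),\ x' \in \mathcal{S}_i \setminus \mathcal{S}^{\mathrm{top}}_{i+1} \right\}. \tag{5}$$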
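The two update rules (4) and (5) can be sketched in a few lines, under the reading that a candidate is certified once its lower bound beats all but the remaining open slots' worth of upper bounds, and is discarded once enough remaining candidates have lower bounds above its upper bound. The function name `update_sets`, the dictionary representation of the bounds, and the handling of degenerate cases are assumptions of this example rather than the authors' code.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple


def update_sets(top: Set[Hashable],
                candidates: Iterable[Hashable],
                lcb: Dict[Hashable, float],
                ucb: Dict[Hashable, float],
                k: int) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Return (next top set, next candidate set) after one round."""
    cand = list(candidates)
    slots = k - len(top)  # top-k positions not yet certified
    if slots <= 0:
        return set(top), set()

    # Certify points whose LCB beats the (slots + 1)-th largest candidate UCB.
    ucbs = sorted((ucb[x] for x in cand), reverse=True)
    threshold = ucbs[slots] if len(ucbs) > slots else float("-inf")
    newly_top = {x for x in cand if lcb[x] > threshold}
    top_next = set(top) | newly_top

    remaining = [x for x in cand if x not in newly_top]
    slots_next = k - len(top_next)
    if slots_next <= 0:
        return top_next, set()

    # Drop points whose UCB falls below the slots_next-th largest remaining LCB.
    lcbs = sorted((lcb[x] for x in remaining), reverse=True)
    cutoff = lcbs[slots_next - 1] if len(lcbs) >= slots_next else float("-inf")
    return top_next, {x for x in remaining if ucb[x] >= cutoff}
```

Ranking the lower bounds over the remaining candidates only (rather than over all of the previous candidates) keeps a newly certified point from wrongly eliminating the runner-up when more than one source is sought.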
Trajectory planning for AdaSearch. Observe that the update rules (4) and (5) only depend on confidence intervals for points $x \in \mathcal{S}_i$. Rather than wasting time measuring points $x \notin \mathcal{S}_i$, at each round AdaSearch chooses a subset $\mathcal{Z}_i \subseteq \mathcal{Z}$ of the sensing configurations which are informative about $\mathcal{S}_i$. For omnidirectional sensors, choosing $\mathcal{Z}_i$ is relatively straightforward (see Sec. 4.3). We discuss generalizing to more sophisticated sensors in Sec. 7.
AdaSearch defines a trajectory through the informative configurations $\mathcal{Z}_i$ by following the fixed path $\mathcal{Z}$ over all sensing configurations, spending a minimal time $\tau_0$ at each uninformative configuration in $\mathcal{Z} \setminus \mathcal{Z}_i$, and slowing down to spend time $2^i \tau_0$ at each informative configuration in $\mathcal{Z}_i$. Doubling the time spent at each $z \in \mathcal{Z}_i$ in each round amortizes the time spent traversing the entire path $\mathcal{Z}$. Changing only the speed, rather than the entire path, makes AdaSearch amenable to real-time operation. For omnidirectional sensors, a simple raster pattern (Fig. 1(a)) suffices for $\mathcal{Z}$; other cases are discussed in Sec. 7.
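The speed-adjustment idea can be illustrated with a small sketch: the path stays fixed, and only the time spent at each configuration changes from round to round. The back-and-forth raster ordering, the names `raster_path` and `dwell_times`, and the specific doubling schedule (base time multiplied by two to the power of the round index) are assumptions for illustration.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def raster_path(rows: int, cols: int) -> List[Cell]:
    """Back-and-forth sweep over a rows x cols grid, one cell per step."""
    path: List[Cell] = []
    for r in range(rows):
        col_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in col_order)
    return path


def dwell_times(path: List[Cell],
                informative: Set[Cell],
                round_index: int,
                tau0: float) -> List[float]:
    """Time spent at each cell: tau0 if uninformative, doubled each round otherwise."""
    slow = tau0 * (2 ** round_index)
    return [slow if z in informative else tau0 for z in path]
```

Summing the returned list gives the duration of one pass; because informative cells dominate that sum in later rounds, the cost of flying over uninformative cells becomes a small fraction of the total.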
In Appendix A, we establish the following lemma, which states that the two update rules above guarantee the overall correctness of AdaSearch, whenever the confidence intervals actually contain the correct mean $\mu(x)$:
Lemma 1 (Sufficient Condition for Correctness)
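For each round $i$, $\mathcal{S}^{\mathrm{top}}_i \cap \mathcal{S}_i = \emptyset$. Moreover, whenever the confidence intervals satisfy the coverage property:
$$\mathrm{LCB}_i(x) \le \mu(x) \le \mathrm{UCB}_i(x) \quad \text{for all } x \in \mathcal{S}_i, \tag{6}$$
then $\mathcal{S}^{\mathrm{top}}_{i+1} \subseteq \mathcal{S}^*(k) \subseteq \mathcal{S}^{\mathrm{top}}_{i+1} \cup \mathcal{S}_{i+1}$. If (6) holds for all rounds $i$, then AdaSearch terminates and correctly returns $\mathcal{S}^*(k)$.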
4 Radioactive Source-Seeking with Poisson Emissions
While AdaSearch may be used to solve a range of adaptive sensing problems, here we refine our focus to the application of AdaSearch to the radioactive source-seeking (RSS) problem with an omnidirectional sensor onboard a quadrotor helicopter. The environment is defined by potential emitter locations which lie on the ground plane, $x \in \mathbb{R}^2$, and sensing configurations encode spatial position, $z \in \mathbb{R}^3$. Environment points emit gamma rays according to a Poisson process, i.e. $X_t(x) \sim \mathrm{Poisson}(\mu(x))$. Here, $\mu(x)$ corresponds to the rate or intensity of emissions from point $x$.
Thus, the number of gamma rays observed over a time interval of length $\tau$ from configuration $z$ has distribution
$$\mathrm{Poisson}\Big(\tau \sum_{x \in \mathcal{S}} h(x, z)\, \mu(x)\Big), \tag{7}$$
where $h(x, z)$ is specified by the sensing model. In the following sections, we introduce two sensing models: a pointwise sensing model amenable to theoretical analysis (Sec. 4.1), and a more physically realistic sensing model for the RSS problem (Sec. 4.2).
In both settings, we develop appropriate confidence intervals for use in the AdaSearch algorithm. We introduce the specific path used for global trajectory planning in Sec. 4.3. Finally, we conclude with two benchmark algorithms to which we compare AdaSearch (Sec. 4.4) for the RSS problem.
4.1 Pointwise Sensing Model
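First, we consider a simplified sensing model, where the set of sensing locations coincides with the set of all emitters, i.e. each $z \in \mathcal{Z}$ corresponds to precisely one $x \in \mathcal{S}$ and vice versa. The sensitivity function is defined as
$$h(x, z) = \begin{cases} 1 & \text{if } z \text{ is the sensing location corresponding to } x, \\ 0 & \text{otherwise.} \end{cases}$$
Now we derive confidence intervals for Poisson counts observed according to this sensing model. Define $N(x)$ to be the total number of gamma rays observed during the time interval of length $\tau$ spent at $x$. The maximum likelihood estimator (MLE) of the emission rate for point $x$ is $\hat{\mu}(x) = N(x)/\tau$. In Appendix B, we introduce the bounding functions $U_-(\cdot,\cdot)$ and $U_+(\cdot,\cdot)$, and show that for any $\mu \ge 0$, $\delta \in (0,1)$, and $N \sim \mathrm{Poisson}(\mu)$, the interval $[U_-(N, \delta),\, U_+(N, \delta)]$ contains $\mu$ with a probability controlled by $\delta$.
Let $N_i(x)$ denote the number of gamma rays observed from emitter $x$ during round $i$, so that $N_i(x) \sim \mathrm{Poisson}(\tau_i\, \mu(x))$. For any point $x \in \mathcal{S}_i$, the corresponding duration of measurement would be $\tau_i = 2^i \tau_0$. The bounding functions above provide the desired confidence intervals for signals $\mu(x)$:
$$\mathrm{LCB}_i(x) = \frac{1}{\tau_i}\, U_-(N_i(x), \delta_i), \qquad \mathrm{UCB}_i(x) = \frac{1}{\tau_i}\, U_+(N_i(x), \delta_i). \tag{8}$$
This bound implies that the inequality $U_-(N_i(x), \delta_i) \le \tau_i\, \mu(x) \le U_+(N_i(x), \delta_i)$ holds with high probability. Dividing by $\tau_i$, we see that $\mathrm{LCB}_i(x)$ and $\mathrm{UCB}_i(x)$ are valid confidence bounds for $\mu(x)$.
The term $\delta_i$ can be thought of as an “effective confidence” for each interval that we construct during round $i$. In order to achieve the correctness in Lemma 1 with overall probability $1 - \delta$, we set the effective confidence $\delta_i$ at each round as described in Appendix C.3. As the number of confidence intervals we reason about increases, we must be more confident about each individual interval in order to retain a total confidence of $1 - \delta$. Thus, $\delta_i$ is decreasing in both the round number $i$ and the environment size $|\mathcal{S}|$.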
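The bounding functions themselves are defined in Appendix B and are not reproduced here. As a stand-in, the sketch below computes the classical exact (Garwood-style) interval for a Poisson mean by bisection on the Poisson CDF, and then divides by the dwell time to obtain bounds on the emission rate. This is a swapped-in textbook construction, not the paper's bounding functions; the names `poisson_cdf`, `exact_poisson_interval`, and `rate_interval` are hypothetical.

```python
import math
from typing import Tuple


def poisson_cdf(n: int, lam: float) -> float:
    """P[N <= n] for N ~ Poisson(lam), with each term computed in log space."""
    if n < 0:
        return 0.0
    if lam <= 0.0:
        return 1.0
    log_lam = math.log(lam)
    total = sum(math.exp(j * log_lam - lam - math.lgamma(j + 1)) for j in range(n + 1))
    return min(1.0, total)


def exact_poisson_interval(n: int, delta: float, iters: int = 200) -> Tuple[float, float]:
    """Interval for a Poisson mean given count n; each tail has probability delta."""
    if not 0.0 < delta < 0.5:
        raise ValueError("delta must lie in (0, 0.5)")

    # Upper end: the mean at which P[N <= n] drops to delta.
    lo, hi = 0.0, float(n + 1)
    while poisson_cdf(n, hi) > delta:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n, mid) > delta:
            lo = mid
        else:
            hi = mid
    upper = hi

    # Lower end: the mean at which P[N >= n] rises to delta (zero when n == 0).
    lower = 0.0
    if n > 0:
        lo, hi = 0.0, float(n)
        for _ in range(iters):
            mid = 0.5 * (lo + hi)
            if 1.0 - poisson_cdf(n - 1, mid) < delta:
                lo = mid
            else:
                hi = mid
        lower = lo
    return lower, upper


def rate_interval(n: int, tau: float, delta: float) -> Tuple[float, float]:
    """Bounds on the emission rate after observing n counts over dwell time tau."""
    lower, upper = exact_poisson_interval(n, delta)
    return lower / tau, upper / tau
```

Because each tail is allotted probability `delta`, the interval misses the true mean with probability at most `2 * delta`; a per-round schedule for `delta` would then be chosen so that the failure probabilities sum to the desired overall error.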
4.2 Physical Sensing Model
A more physically accurate sensing model for RSS reflects that, in general, the gamma ray count at each location is a sensitivity-weighted combination of the emissions from each environment point. Conservation of energy allows us to approximate the sensitivity function with an inverse-square law $h(x, z) = c / \|x - z\|_2^2$, where $c$ is a known, sensor-dependent constant.
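A minimal sketch of an inverse-square sensitivity, combined with the Poisson rate in (7), is given below. It assumes emitters and the sensor are expressed as 3D coordinates in the same frame and that the constant `c` defaults to 1; the function names are illustrative only.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def inverse_square_sensitivity(x: Vec3, z: Vec3, c: float = 1.0) -> float:
    """Sensitivity of a sensor at z to an emitter at x under an inverse-square law."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, z))
    if dist_sq == 0.0:
        raise ValueError("sensor and emitter positions coincide")
    return c / dist_sq


def expected_counts(z: Vec3,
                    emitters: Sequence[Vec3],
                    rates: Sequence[float],
                    tau: float,
                    c: float = 1.0) -> float:
    """Mean of the Poisson count observed from z over a dwell time tau."""
    rate = sum(inverse_square_sensitivity(x, z, c) * mu
               for x, mu in zip(emitters, rates))
    return tau * rate
```

For example, a sensor hovering one unit above a source sees it at full strength `c`, while a source three units away horizontally contributes at one tenth of that.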
Because multiple environment points contribute to the counts observed from any sensor position $z$, the MLE for the emission rates at all $x \in \mathcal{S}$ is difficult to compute efficiently. However, we can approximate it in the limit of large counts, where a Poisson distribution with rate $\lambda$ approaches a Gaussian with mean and variance $\lambda$. Thus, we may compute the estimate as the least squares solution:
(9)
where the unknown is a vector representing the mean emissions from each environment point, the data are a vector representing the observed number of counts at each of the consecutive time intervals, and the rescaled sensitivity matrix gives the measurement-adjusted sensitivity of each environment point to the sensor at each sensing position. (Footnote 1: The rescaling term is a plug-in estimator for the variance of each measurement, with small bias introduced for numerical stability, which serves to downweight measurements with higher variance.) The resulting confidence bounds are given by the standard Gaussian confidence bounds:
(10)
where a round-dependent parameter controls the round-wise effective confidence widths in equation (10) as a function of the desired maximum probability of overall error. We use a Kalman filter to solve the least squares problem (9) and compute the confidence intervals (10).
4.3 Design and Planning for AdaSearch
Pointwise sensing model. In the pointwise sensing model, informative measurements about the signal value $\mu(x)$ can only be obtained from the location $x$ itself. Hence, the informative sensing locations at round $i$ are precisely the candidate points $\mathcal{S}_i$. We therefore choose the path $\mathcal{Z}$ to be a simple space-filling curve over a raster grid, depicted in Fig. 1(a), which provides coverage of all of $\mathcal{S}$. We adopt a simple dynamical model of the quadrotor in which it can fly at up to a pre-specified top speed, and where acceleration and deceleration times are negligible. This model is reasonably accurate for large outdoor environments where travel times are dominated by movement at maximum speed. We let $\tau_0$ denote the time required for the quadrotor to traverse a given grid pixel at top speed. Figure 1(a) shows an example environment with a raster pattern trajectory overlaid, while Fig. 1(b) illustrates the trajectory followed during a later round, with desired measurement locations shown in green.
Physical sensing model. Because the physical sensitivity follows an inverse-square law, the most informative measurements about $\mu(x)$ are those taken at locations near $x$. Hence, we restrict measurement locations to the points one meter vertically offset above points on the ground plane. We use the same design and planning strategy as in the pointwise measurement model, following the raster pattern depicted in Fig. 1(a).
4.4 Baselines
To demonstrate the effectiveness of AdaSearch, we compare it to two baselines: a uniform-sampling based algorithm, NaiveSearch, and a spatially-greedy information maximization algorithm, InfoMax.
NaiveSearch algorithm. As a non-adaptive baseline, we consider a uniform sampling scheme that follows the raster pattern in Fig. 1(a) at constant speed. This global trajectory results in measurements uniformly spread over the grid, and avoids redundant movements between sensing locations. The only difference between NaiveSearch and AdaSearch is that NaiveSearch flies at a constant speed, while AdaSearch varies its speed adaptively in response to measurements it has collected so far. For $k = 1$ emitter, NaiveSearch terminates at the first round $i$ in which $\mathrm{LCB}_i(x) > \max_{x' \ne x} \mathrm{UCB}_i(x')$ for some environment point $x$. The general termination criterion for NaiveSearch is described in Appendix B. Comparing AdaSearch to NaiveSearch serves to measure the advantages of AdaSearch's adaptive measurement allocation separately from the effects of its global trajectory heuristic.
InfoMax algorithm. As discussed in Sec. 2, one of the most successful methods for active search in robotics is receding horizon informative path planning, e.g. [5, 13]. We implement InfoMax, a version of this approach based on [5] and specifically adapted for RSS. Each planning invocation solves an information maximization problem over the space of trajectories $\xi$ mapping from time in the next $T$ seconds to a box in $\mathbb{R}^3$.
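We measure the information content of a candidate trajectory $\xi$ by accumulating the sensitivity-weighted variance at each grid point $x \in \mathcal{S}$ at $N$ evenly spaced times $t_1, \dots, t_N$ along $\xi$, i.e.
$$\xi^* = \arg\max_{\xi} \ \sum_{j=1}^{N} \sum_{x \in \mathcal{S}} h\big(x, \xi(t_j)\big)\, \mathrm{Var}\big[\hat{\mu}(x)\big]. \tag{11}$$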
This objective favors taking measurements sensitive to regions with high uncertainty. As a consequence of the Poisson emissions model, these regions will also generally have high expected intensity $\mu(x)$; therefore we expect this algorithm to perform well for the RSS task. We parameterize trajectories as Bezier curves in $\mathbb{R}^3$, and use Bayesian optimization (see [28]) to solve (11) because of the high computational cost of evaluating the objective function. Empirically, we found that Bayesian optimization outperformed both naive random search and a finite difference gradient method. We set $T$ to 10 s and used second-order Bezier curves. Longer time horizons and higher-order Bezier curves quickly become intractable in real time.
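To make the objective concrete, the sketch below evaluates a sensitivity-weighted variance score along a second-order Bezier curve. It covers only the scoring of one candidate trajectory; the Bayesian optimization over control points is not shown. The names `bezier2`, `information_score`, and `inverse_square`, the dictionary of per-point variances, and the even spacing in the curve parameter (rather than in time) are assumptions for this illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def bezier2(p0: Vec3, p1: Vec3, p2: Vec3, s: float) -> Vec3:
    """Point on a second-order Bezier curve at parameter s in [0, 1]."""
    return tuple((1 - s) ** 2 * a + 2 * (1 - s) * s * b + s ** 2 * c
                 for a, b, c in zip(p0, p1, p2))


def information_score(control_points: Tuple[Vec3, Vec3, Vec3],
                      grid: Sequence[Vec3],
                      variance: Dict[Vec3, float],
                      h: Callable[[Vec3, Vec3], float],
                      n_samples: int) -> float:
    """Sum of sensitivity-weighted variances at evenly spaced curve parameters."""
    p0, p1, p2 = control_points
    score = 0.0
    for j in range(n_samples):
        s = j / (n_samples - 1) if n_samples > 1 else 0.0
        z = bezier2(p0, p1, p2, s)
        score += sum(h(x, z) * variance[x] for x in grid)
    return score


def inverse_square(x: Vec3, z: Vec3) -> float:
    """Inverse-square sensitivity with unit constant (assumed for the example)."""
    return 1.0 / sum((a - b) ** 2 for a, b in zip(x, z))
```

A trajectory that passes over a high-variance grid point scores higher than an otherwise identical one over a low-variance point, which is the behavior the objective is meant to reward.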
Stopping criteria and metrics. All three algorithms use the same stopping criterion, which is satisfied when the $k$-th highest LCB exceeds the $(k+1)$-th highest UCB. For sufficiently small probability of error $\delta$, this ensures that the top-$k$ sources are almost always correctly identified by all algorithms; they are always correctly identified in all experiments in Sec. 6.
5 Theoretical Runtime Analysis
We now present a theoretical runtime analysis for AdaSearch and NaiveSearch, under the pointwise sensing model from Sec. 4.1. For simplicity, we present our result for $k = 1$. Proofs, along with general results for arbitrary $k$ and complementary lower bounds, are presented in Appendix B.
Our analysis shows that both algorithms correctly identify the maximal source with high probability, and sharply quantifies the relative advantage of AdaSearch over NaiveSearch in terms of the distribution of the background radiation. The empirical results presented in Sec. 6 validate the correctness of both algorithms, and show that our theoretical results are predictive of the relative performance of AdaSearch and NaiveSearch in simulation.
We analyze AdaSearch with the trajectory planning strategy outlined in Sec. 4.3. For NaiveSearch, the robot spends the same amount of time at each point in each round until termination, which is determined by the same confidence intervals and the termination criterion described in Sec. 4.4.
We will be concerned with the total runtime, defined as
where denotes the round at which the algorithm terminates. Since , we shall replace with . Our bounds will be stated in terms of the divergences
which approximate the divergence between the distribution and (Lemma 4 in Appendix B), and therefore the sample complexity of distinguishing between the maximal source and a source emitting photons at rate . Our main theorem is as follows:
Proposition 2
Define the adaptive and uniform sample complexity terms and :
and  (12) 
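No matter the distribution of emitters in the environment, the adaptive sample complexity term is never larger than the uniform one. Then for any $\delta \in (0, 1)$, AdaSearch executed with confidence parameter $\delta$ returns the correct maximal source and satisfies the following guarantee with probability at least $1 - \delta$:
Moreover, with probability at least $1 - \delta$, NaiveSearch returns the correct maximal source and satisfies the guarantee: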
The accounts for travel times of transitioning between measurement configurations. The second term accounts for the travel time of traversing the uninformative points in the global path at a high speed. Observe that this term is never larger than the term , and is typically dominated by . With a uniform strategy, the number of measurements (and hence the total time) scales with the largest per-point sample complexity over the environment, because that quantity alone determines the number of rounds required. In contrast, AdaSearch scales with the average per-point sample complexity, because it dynamically chooses which regions to sample more precisely. In experiments, we validate that when the per-point sample complexities are heterogeneous, AdaSearch yields significant speedups over NaiveSearch.
6 Experiments
We compare the performance of AdaSearch with the baselines defined in Sec. 4.4 in simulation for the RSS problem, and validate AdaSearch in a hardware demonstration.
Simulation methodology. We evaluate AdaSearch, NaiveSearch, and InfoMax in simulation using the Robot Operating System (ROS) framework [29]. The environment points lie in a planar grid, spread evenly over the search area. Radioactive emissions are detected by a simulated sensor following the physical sensing model in Sec. 4.2. We set $k = 1$, so that the set $\mathcal{S}^*(k)$ is a single point in the environment. We then fix a maximum emission rate (in photons/s), as well as a parameter (in photons/s) governing the maximum rate of emissions for background radiation.
For each setting of the background parameter, we test all three algorithms on grids randomly generated as follows. For each point, we draw the mean emissions uniformly at random in an interval bounded above by the background maximum. Then, we choose one point uniformly at random from the grid as the maximal point source, which we set to emit at the maximum emission rate. All mean intensities remain fixed throughout the trial, but are randomized between trials. Finally, we execute all three algorithms on each of the instantiated grids in a real-time simulation, using approximate near-hover quadrotor dynamics controlled by a stabilizing linear feedback controller.
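The grid-generation procedure described above can be sketched as follows. The grid dimensions, the lower end of the background interval (taken to be zero here), and the numeric rates used in any call are placeholders chosen for illustration, not the values used in the experiments; `make_grid` is a hypothetical helper name.

```python
import random
from typing import Dict, Tuple

Cell = Tuple[int, int]


def make_grid(rows: int,
              cols: int,
              background_max: float,
              source_rate: float,
              rng: random.Random) -> Tuple[Dict[Cell, float], Cell]:
    """Draw background rates uniformly, then plant one maximal source."""
    # Assumption: background rates are drawn from [0, background_max].
    grid = {(r, c): rng.uniform(0.0, background_max)
            for r in range(rows) for c in range(cols)}
    source = rng.choice(sorted(grid))  # sorted for a reproducible draw
    grid[source] = source_rate
    return grid, source
```

Passing a seeded `random.Random` instance makes each trial reproducible, so that every algorithm can be run on the same set of randomized grids.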
[Figure caption, partially recovered] Each algorithm was given the same 10 randomized grid instantiations. Box plots show quartile values.
Results. We analyze performance with respect to the following metrics: (a) total runtime (time from takeoff until the maximal source is located with confidence), (b) absolute difference between the predicted and actual emission rate of the maximal source, and (c) deviation of the estimated source emissions from the ground truth emission rates over the entire environment, measured in the Euclidean norm. Fig. 3 plots the runtimes for each algorithm. The uniform baseline NaiveSearch terminates significantly earlier than InfoMax, and AdaSearch terminates even earlier, on average. The comparison between NaiveSearch and InfoMax may seem surprising, because one would expect InfoMax to seek out higher-variance points (which are also higher mean), and thereby outperform uniform sampling.
Fig. 3(a) plots the absolute difference between the estimated emission rate and the true emission rate at the maximal source. AdaSearch and NaiveSearch perform comparably over time, though AdaSearch terminates significantly earlier. Fig. 3(b) plots the total Euclidean error between the emissions estimates and the ground truth over the entire grid. While AdaSearch and NaiveSearch have comparable performance over time, we observe that InfoMax reduces total Euclidean error quickly at first but is eventually overtaken by both AdaSearch and NaiveSearch. This is again surprising, since we expect InfoMax to excel at total-grid mapping.
Figure 4(b) compares the relative performance of AdaSearch and NaiveSearch over the trials. AdaSearch consistently outperforms NaiveSearch, and the relative speedup increases as the maximum background rate approaches the emission rate of the maximal source. This result is consistent with the theoretical analysis in Sec. 5; the dashed line in Fig. 4(b) plots a fitted curve (see discussion below).
Discussion. While all three methods eventually locate the correct source, the two algorithms with global planning heuristics, AdaSearch and NaiveSearch, terminate considerably earlier than InfoMax, which uses a greedy, receding horizon approach (Fig. 3). Moreover, the adaptive algorithm AdaSearch consistently terminates before its non-adaptive counterpart, NaiveSearch.
As the maximum background rate approaches the emission rate of the maximal source and the gaps become more variable, adaptivity confers even greater advantages over uniform sampling (Fig. 5). As shown in Appendix B.3, the sample complexities for AdaSearch and NaiveSearch in (17) simplify in this regime, and the resulting scaling of the relative runtime is corroborated by the fit of the dashed line to average relative performance in Fig. 4(b).
The AdaSearch algorithm excels when it can quickly rule out points in early rounds. From (17) we recall that the sample complexity scales with the average per-point complexity (rather than the maximum, as for NaiveSearch). Hence, AdaSearch will outperform NaiveSearch when there are varying levels of background radiation and some points have emission rates close to that of the maximal source (see Fig. 4(b)). However, the doubling procedure in AdaSearch may be wasteful, and result in over-measuring on the last round. This may occur when all background points have emission rates considerably lower than that of the maximal source (see Fig. 5).
InfoMax's strength lies in quickly reducing global uncertainty across the entire emissions landscape. However, InfoMax takes considerably longer to identify the maximal source (Fig. 3) and, surprisingly, AdaSearch and NaiveSearch ultimately outperform InfoMax in mapping the entire emissions landscape on longer time scales (Fig. 3(b)). We attribute this to the effects of greedy, receding horizon planning. Initially, InfoMax has many locally promising points to explore and reduces the Euclidean error quickly. Later on, it becomes harder to find informative trajectories that route the quadrotor near the few under-explored regions. This suggests that when a path such as the raster path used here is available, it is well worth considering.
Hardware results. The previous results are based on a simulation of two key physical processes: radiation sensing and vehicle dynamics. The Poisson model of radioactive emissions and sensing is common in the literature [30]; however, there are varying degrees of accuracy in modeling vehicle dynamics. In order to validate the results of our simulation study in the presence of the inevitable mismatch between the near-hover dynamics model and true quadrotor dynamics, we test AdaSearch on a Crazyflie 2.0 palm-sized quadrotor in a motion capture room (Fig. 1). The motion capture data (position and orientation) is acquired and processed in real time using precisely the same implementation of AdaSearch as was used in our software simulations. Our supplementary video shows a more detailed display of our system. (Footnote 2: A video summarizing this work is available at https://people.eecs.berkeley.edu/~erolf/adasearch_5mins.m4v.) Fig. 1 shows a visualization of the confidence intervals, and the absolute source point estimation error, during a representative flight over a small grid. After two rounds, AdaSearch has identified the two highest emitting points as the highlighted pixels in the inset in Fig. 1. At this point, the absolute error in estimating the emission rate of the maximal source is very small (see Fig. 6). For the remainder of the flight, AdaSearch spends most of its time sensing these two points and avoids taking too many redundant measurements elsewhere in the grid. The general pattern of performance matches the simulation results in Fig. 3(a) (solid blue).
7 Generalizations and Extensions
Unknown number of sources. If the number of sources is initially unknown, then running AdaSearch with small $k$ will result in measurements that are still informative about all true sources, since they must be distinguished from the top $k$ sources. This could result in sufficient measurement coverage, or it could inform the choice of a larger $k$.
Oriented sensor. A natural extension of our radioactive source-seeking example is to consider a sensing model with a sensitivity function that depends upon orientation. Here, the additional challenge lies in identifying informative sensing configuration sets $\mathcal{Z}_i$, and a reasonably efficient equivalent fixed global path $\mathcal{Z}$. More broadly, the sensing configurations could be taken to represent generalized configurations of the robot and sensor, e.g. they could encode the position and angular orientation of a directional sensor, or joint angles of a manipulator arm.
Pointwise sensing model. We motivated the pointwise sensing model, in which each sensing location is sensitive to exactly one environment point, as a model conducive to theoretical analysis. Though it is only a coarse (yet still predictive) approximation of the physical process of radiation sensing, this sensitivity model is a more precise descriptor of other adaptive sensing processes, for example, survey design. As a concrete example, suppose an aid group with enough funding to set up medical clinics sought to identify which towns had the highest rates of disease. It is reasonable to think that the data collected in a given town is mostly informative about only the rate of disease in that town, so that the pointwise sensing model may be quite appropriate.
Surveying. Although we demonstrate AdaSearch operating onboard a UAV in the context of RSS, the core algorithm applies more broadly, even to non-robotic embodied sensing problems. Consider the problem of planning clinic locations introduced above. Because surveys are conducted in person, the aid group is resource limited in terms of using human surveyors, both in terms of the time it takes to survey a single person or clinic within a town, and in terms of travel time between towns. A survey planner could use AdaSearch to guide the decisions of how long to spend in each town counting new cases of the disease before moving on to the next, and to trade off the travel time of returning to collect more data from a certain town against spending extra time at the town in the first place.
While AdaSearch provides a good starting point for solving such problems, the high cost of transportation would likely make it worthwhile to further optimize the surveying trajectory at each round, e.g. by (approximately) solving a traveling salesman problem.
8 Conclusion
In summary, we have demonstrated that statistical methods from pure exploration active learning offer a promising and underexplored toolkit for adaptive sensing. Specifically, we have shown that motion constraints need not impede active learning strategies, and highlighted incorporating realistic measurement models as fertile ground for future research.
Our main contribution, AdaSearch, outperforms a greedy information-maximization baseline. Its success can be understood as a consequence of two structural phenomena: planning horizon and implicit design objective. The information-maximization baseline operates on a receding horizon and seeks to reduce global uncertainty, which means that even if its planned trajectories are individually highly informative, they may lead to suboptimal performance over a long time scale. In contrast, AdaSearch uses an application-dependent global path that provides efficient coverage of the entire search space, and allocates measurements using principled, statistical confidence intervals.
excels in situations with a heterogeneous distribution of the signal of interest; it would be interesting to make a direct comparison with GPbased methods in a domain where the smooth GP priors are more appropriate. We also plan to explore active sensing in more complex environments and with dynamic signal sources and more sophisticated sensors (e.g. directional sensors). Furthermore, as is explicitly designed for general embodied sensing problems, it would be exciting to test it in a wider variety of application domains.
Acknowledgments
This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE 1752814.
References
 [1] Kai Vetter, Ross Barnowski, Andrew Haefner, Tenzing HY Joshi, Ryan Pavlovsky, and Brian J. Quiter. Gamma-Ray Imaging for Nuclear Security and Safety: Towards 3D Gamma-Ray Vision. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 878:159–168, 2018.
 [2] Frank Mascarich, Taylor Wilson, Christos Papachristos, and Kostas Alexis. Radiation Source Localization in GPS-denied Environments using Aerial Robots. In ICRA. IEEE, 2018.
 [3] Gabriel M Hoffmann and Claire J Tomlin. Mobile sensor network control using mutual information methods and particle filters. IEEE Trans. on Auto. Control, 55(1):32–47, 2010.
 [4] Chetan D Pahlajani, Jianxin Sun, Ioannis Poulakakis, and Herbert G Tanner. Error probability bounds for nuclear detection: Improving accuracy through controlled mobility. Automatica, 50(10):2470–2481, 2014.
 [5] Roman Marchant and Fabio Ramos. Bayesian Optimisation for Informative Continuous Path Planning. In ICRA, pages 6136–6143. IEEE, 2014.
 [6] Matthew Dunbabin and Lino Marques. Robots for environmental monitoring: Significant advancements and applications. IEEE Robotics & Auto. Mag., 19(1):24–39, 2012.
 [7] Frederic Bourgault, Alexei A Makarenko, Stefan B Williams, Ben Grocholsky, and Hugh F Durrant-Whyte. Information based adaptive robotic exploration. In IROS, volume 1, pages 540–545. IEEE, 2002.
 [8] Lauren M Miller, Yonatan Silverman, Malcolm A MacIver, and Todd D Murphey. Ergodic exploration of distributed information. IEEE Trans. on Robotics, 32(1):36–52, 2016.
 [9] Shi Bai, Jinkun Wang, Fanfei Chen, and Brendan Englot. Information-Theoretic Exploration with Bayesian Optimization. In IROS, pages 1816–1822. IEEE, 2016.
 [10] Yifei Ma, Roman Garnett, and Jeff G Schneider. Active Search for Sparse Signals with Region Sensing. In AAAI, pages 2315–2321, 2017.
 [11] Benjamin Charrow, Sikang Liu, Vijay Kumar, and Nathan Michael. Information-Theoretic Mapping using Cauchy-Schwarz Quadratic Mutual Information. In ICRA, pages 4791–4798. IEEE, 2015.
 [12] Roman Marchant and Fabio Ramos. Bayesian Optimisation for Intelligent Environmental Monitoring. In IROS, pages 2242–2249. IEEE, 2012.
 [13] Ruben Martinez-Cantin, Nando de Freitas, Eric Brochu, José Castellanos, and Arnaud Doucet. A Bayesian Exploration-Exploitation Approach for Optimal Online Sensing and Planning with a Visually Guided Mobile Robot. Autonomous Robots, 27(2):93–103, 2009.
 [14] Carlos Guestrin, Andreas Krause, and Ajit Paul Singh. Near-optimal sensor placements in Gaussian processes. In ICML, pages 265–272. ACM, 2005.
 [15] Nikolay Atanasov, Jerome Le Ny, Kostas Daniilidis, and George J Pappas. Information Acquisition with Sensing Robots: Algorithms and Error Bounds. In ICRA, pages 6447–6454. IEEE, 2014.
 [16] Gregory Hitz, Alkis Gotovos, Marie-Ève Garneau, Cédric Pradalier, Andreas Krause, Roland Y Siegwart, et al. Fully autonomous focused exploration for robotic environmental monitoring. In ICRA, pages 2658–2664. IEEE, 2014.
 [17] Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. JMLR, 7(Jun):1079–1105, 2006.
 [18] Jean-Yves Audibert and Sébastien Bubeck. Best Arm Identification in Multi-Armed Bandits. In COLT, pages 13–p, 2010.
 [19] Kevin Jamieson, Matthew Malloy, Robert Nowak, and Sébastien Bubeck. lil’UCB: An Optimal Exploration Algorithm for Multi-Armed Bandits. In COLT, pages 423–439, 2014.
 [20] Tomer Koren, Roi Livni, and Yishay Mansour. Multi-Armed Bandits with Metric Movement Costs. In NIPS, pages 4122–4131, 2017.
 [21] Sébastien Bubeck, Michael B. Cohen, James R. Lee, Yin Tat Lee, and Aleksander Madry. k-server via multiscale entropic regularization. CoRR, abs/1711.01085, 2017.
 [22] Niranjan Srinivas, Andreas Krause, Sham M. Kakade, and Matthias W. Seeger. Information-Theoretic Regret Bounds for Gaussian Process Optimization in the Bandit Setting. IEEE Trans. on Info. Theory, 58(5):3250–3265, 2012.
 [23] Shivaram Kalyanakrishnan, Ambuj Tewari, Peter Auer, and Peter Stone. PAC Subset Selection in Stochastic Multi-Armed Bandits. 2012.
 [24] Emrah Bıyık and Murat Arcak. Gradient Climbing in Formation via Extremum Seeking and Passivity-based Coordination Rules. Asian Journal of Control, 10(2):201–211, 2008.
 [25] Alexey S Matveev, Michael C Hoy, and Andrey V Savkin. Extremum Seeking Navigation without Derivative Estimation of a Mobile Robot in a Dynamic Environmental Field. IEEE Trans. on Control Sys. Tech., 24(3):1084–1091, 2016.
 [26] Boaz Porat and Arye Nehorai. Localizing Vapor-Emitting Sources by Moving Sensors. IEEE Trans. on Sig. Proc., 44(4):1018–1021, 1996.
 [27] Luma K. Vasiljevic and Hassan K. Khalil. Error Bounds in Differentiation of Noisy Signals by High-Gain Observers. Systems & Control Letters, 57(10):856–862, 2008.
 [28] Ruben MartinezCantin. BayesOpt: A Bayesian Optimization Library for Nonlinear Optimization, Experimental Design and Bandits. JMLR, 15:3915–3919, 2014.
 [29] Morgan Quigley, Ken Conley, Brian P. Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Y. Ng. ROS: an Open-Source Robot Operating System. In ICRA Workshop on Open Source Software, 2009.
 [30] Kenneth Lange, Richard Carson, et al. EM Reconstruction Algorithms for Emission and Transmission Tomography. J Comput Assist Tomogr, 8(2):306–316, 1984.
 [31] Max Simchowitz, Kevin Jamieson, and Benjamin Recht. Best-of-K Bandits. In COLT, pages 1440–1489, 2016.
 [32] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
 [33] Emilie Kaufmann, Olivier Cappé, and Aurélien Garivier. On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models. JMLR, 17(1):1–42, 2016.
Appendix A Proof of Lemma 1
We verify Lemma 1 given in Section 3. The proof of this lemma holds for any instantiation of Algorithm 1, regardless of the sensing model or the planning strategy.
First, we verify that for each round , . Indeed, at round , , so the bound holds immediately. Suppose by an inductive hypothesis that for some . Then, for any , we have two cases:

. Then, by the inductive hypothesis, and by (5).
Next, we verify that if the confidence intervals are correct in all rounds leading up to round , i.e.
(13) 
then , and . We again use induction. Initially, we have . Now, suppose that at round , one has that , and .
To show that , it suffices to show that if is added to , then . By the inductive hypothesis there exist elements of in . Hence, if is added to , and if (13) holds, then
Hence, is among the largest values of for . Since , we therefore have that .
Similarly, to show , it suffices to show that if , and , then . For such that , and , it follows that
hence .
Finally, we verify that if (13) holds at each round, then at the termination round , , so that , so that .
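The invariants verified above concern how points move from the candidate set into the confirmed top set, or are discarded. As an illustration only, the sketch below implements one standard confidence-interval rule for top-k identification. The update rule (5) is not reproduced in this appendix, so the counting rule used here, the function name `update_sets`, and the dictionary-based interface are assumptions rather than the paper's definition.

```python
def update_sets(candidates, top, lcb, ucb, k):
    """One round of a generic top-k elimination update (assumed rule).

    candidates, top: sets of point identifiers
    lcb, ucb: dicts mapping each candidate to its lower/upper confidence bound
    """
    slots = k - len(top)  # number of top-k positions not yet confirmed
    accepted, rejected = set(), set()
    for x in candidates:
        others = [y for y in candidates if y != x]
        # Candidates whose true mean could still exceed that of x.
        may_beat = sum(1 for y in others if ucb[y] > lcb[x])
        # Candidates whose true mean certainly exceeds that of x.
        surely_beat = sum(1 for y in others if lcb[y] > ucb[x])
        if may_beat < slots:
            accepted.add(x)   # too few rivals to push x out of the top k
        elif surely_beat >= slots:
            rejected.add(x)   # enough rivals certainly beat x
    return top | accepted, candidates - accepted - rejected


# Hypothetical confidence intervals for four points, with k = 2.
lcb = {"a": 9.0, "b": 4.0, "c": 0.0, "d": 3.5}
ucb = {"a": 11.0, "b": 6.0, "c": 2.0, "d": 5.0}
new_top, new_candidates = update_sets({"a", "b", "c", "d"}, set(), lcb, ucb, k=2)
```

If every interval contains its true mean, this rule only accepts points that are truly among the top k and only rejects points that are not, which mirrors the style of argument used in the proof.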
Appendix B Theoretical Results for Pointwise Sensing
In this appendix, we present formal statements of the measurement complexities provided in Sec. 5 in the main text, and generalize them to the full top problem presented in Algorithm 1 of the main text. We also provide specialized bounds for the randomly generated grids considered in our simulations.
Notation: Throughout, we shall use the notation to denote that there exists a universal constant , independent of problem parameters, for which . We also define .
Formal Setup. Throughout, we consider a rectangular grid of points, and let denote the mean emission rate of each point in counts/second. We let denote the th largest mean . In the case that , we denote , and let denote the highest-mean point, with emission rate . For identifiability, we assume .
Measurements. As described in the main text, we assume a pointwise sensing model in which and can measure each point directly. Recall that , at each round , takes measurements at each point , and takes measurements at each . We let denote the total number of counts collected at position at round . We further assume that are standardized according to the same time units as , so that measuring a source of mean for a time interval of length yields counts distributed according to . Finally, we shall let denote the (random) round at which a given algorithm, either or , terminates.
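To make the pointwise measurement model concrete, the following sketch simulates counts from a single source. It assumes that measuring a source with mean rate mu (counts/second) for tau seconds yields a Poisson count with mean tau * mu, which is consistent with the Poisson confidence intervals discussed in this appendix; the function names and the numeric values are hypothetical.

```python
import random


def simulate_counts(mu, tau, rng):
    """Number of arrivals of a rate-mu Poisson process within tau seconds.

    Inter-arrival times of a Poisson process are exponential with rate mu,
    so the returned count is Poisson distributed with mean tau * mu.
    Requires mu > 0.
    """
    elapsed = rng.expovariate(mu)
    count = 0
    while elapsed <= tau:
        count += 1
        elapsed += rng.expovariate(mu)
    return count


def estimate_rate(total_counts, total_time):
    """Empirical emission rate in counts/second."""
    return total_counts / total_time


rng = random.Random(0)          # fixed seed for reproducibility
mu_true, tau = 5.0, 2.0         # hypothetical rate and dwell time
samples = [simulate_counts(mu_true, tau, rng) for _ in range(2000)]
mu_hat = estimate_rate(sum(samples), tau * len(samples))
```

Pooling the counts from repeated measurements of the same point and dividing by the total dwell time gives the empirical rate around which a confidence interval would be centered.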
Confidence Intervals. At the core of our analysis are rigorous upper and lower confidence intervals for Poisson random variables, proved in Sec. E.1:
Proposition 3
Fix any and let . Define