I Introduction
Mobile sensing tasks are critical robotic applications in which a sensing agent gathers information about an environment. In localization, a type of mobile sensing task, the information gathered by the agent reduces uncertainty about unknown parameters. These parameters might represent the location of a target that must be found. Example targets include GPS jammers [1], avalanche beacons [2], or radio-tagged wildlife [3]. Localizing these targets quickly is often critical, so planning informative paths for mobile sensors is an important research topic.
Unfortunately, planning paths that maximize gathered information is a difficult controls problem. Long-term optimality can theoretically be achieved with dynamic programming, but these approaches are often computationally infeasible, and it is difficult to attain approximately optimal solutions [4]. Greedy optimizations that maximize the information gathered in the next timestep are computationally feasible and have been implemented in many tasks [2, 3, 5]. However, greedy planners can lead to poor long-term results and are vulnerable to unmodeled noise [4, 6].
One approach to the planning problem is to generate an information map. This map associates a state in the sensing agent’s state space to some measure of information. The agent can plan a path through this distribution of information that avoids getting stuck in local minima. The information map is constructed using the target estimate, the sensor model, and information-theoretic quantities like Fisher information or mutual information.
Information maps are an integral component of ergodic control for mobile sensing tasks. In this control framework, mobile sensors execute trajectories that are ergodic with respect to an information map. Trajectories are ergodic if they spend time in a state space region proportional to the information available there. There is empirical evidence that ergodic trajectories efficiently gather information while being robust to unmodeled sensor noise [6].
Unfortunately, these information maps can themselves be computationally challenging to generate. As observations are made, the belief (the distribution over target locations) changes, and the information map changes as a result. In order to incorporate these changes, new trajectories are planned and executed in a model-predictive fashion. Before these trajectories can be recomputed, the information map must be updated with recent measurements; it is therefore crucial that these maps be generated in real-time. However, the information map can be computationally expensive to update, hindering real-time application.
We propose using convolutional neural networks to generate information maps directly from beliefs. These networks are trained offline using simulated trajectories and a sensor model. When given a belief, these networks output the information maps or their Fourier coefficients. As a result, the information maps are produced quickly and can easily be applied online in real mobile sensing tasks. We demonstrate the speed improvements in simulations. Depending on the sensor model and type of information map, computation time is reduced by two orders of magnitude.
II Background
II-A Neural Networks
The idea of using machine learning to speed up online computation is not new. In an early example, support vector machines were used to determine if a robotic agent could reach other points in the state space [7]. Traditional approaches numerically solved a two-point boundary value problem to determine reachability, but this machine learning approach drastically reduced computation time.

In our application, we also have values with high computational complexity that must be computed in real-time. We also try a machine learning approach but use convolutional neural networks instead of support vector machines. Convolutional neural networks are a natural choice because our input is an information distribution over the state space. This input is like an image, and we expect there to be some spatial correlation between points in the state space.
Convolutional neural networks have had stunning success classifying and modifying images, in large part because of spatial correlation in images [8]. A convolutional layer is a set of filters that is convolved over the input image. These filters detect repeated features in the input. Typically, a few convolutional layers are stacked before feeding into a fully connected layer for the output.

II-B Ergodic Control
One application that has seen extensive use of information maps is ergodic control for mobile sensors. The mobile sensor maintains an information map φ, which shows how information is distributed over its state space. The sensor then plans a trajectory that is ergodic with respect to this distribution. As the sensor executes the trajectory and collects measurements, the information map is updated and new trajectories are computed in a model-predictive fashion.
The sensor trajectory is converted into a spatial distribution over the state space X. If x(t) is a sensor trajectory of duration T, the density of this spatial distribution at a point x is

c(x) = \frac{1}{T} \int_0^T \delta\big(x - x(t)\big) \, dt,   (1)

where \delta is the Dirac delta function.
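To make Eq. 1 concrete, the sketch below approximates the spatial distribution of a sampled trajectory by counting the fraction of samples falling in each grid cell; the grid size and trajectory values are illustrative, not from the paper.

```python
import numpy as np

def trajectory_distribution(traj, n=10, length=1.0):
    """Approximate Eq. (1): fraction of samples (time) the trajectory spends in
    each cell of an n x n grid over a square of side `length`."""
    c = np.zeros((n, n))
    for north, east in traj:
        i = min(int(north / length * n), n - 1)
        j = min(int(east / length * n), n - 1)
        c[i, j] += 1.0
    return c / len(traj)   # dividing by the sample count plays the role of 1/T

# A short trajectory sampled at discrete times (illustrative values).
traj = [(0.05, 0.05), (0.15, 0.05), (0.15, 0.15), (0.15, 0.15)]
c = trajectory_distribution(traj)
```

As the sampling rate increases, the per-cell counts converge to the integral of the Dirac delta in Eq. 1.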
In some methods, an ergodic trajectory is generated by comparing c to φ using a metric like KL divergence [9]. However, it is more common to decompose c and φ into Fourier coefficients and modify the trajectory until the coefficients are roughly equal [10, 11, 12]. The information map is decomposed according to
\phi_k = \int_X \phi(x) F_k(x) \, dx,   (2)

where F_k is a Fourier basis function and k is a multi-index with as many dimensions as the state space. For example, if x = [x_1, x_2], then k = [k_1, k_2]. The highest-order coefficient in any dimension is K; in the example, k_1 and k_2 each range from 0 to K. In Euclidean space, the basis function is cosine-based [10], but in special Euclidean groups such as SE(2), a basis function that uses Bessel functions and complex exponentials is used [13].
The coefficients c_k are derived in a similar fashion, and the ergodic objective compares the coefficients:

\mathcal{E} = \sum_k \Lambda_k (c_k - \phi_k)^2.   (3)

The weighting factor \Lambda_k assigns higher weight to low-frequency coefficients and is well explained in the literature [10]. A trajectory is deemed ergodic when \mathcal{E} is low.
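The decomposition and metric above can be sketched in a few functions. This is a minimal illustration assuming a unit square domain, the unnormalized cosine basis of [10], and a common choice of weighting exponent for 2D problems; the constants K, n, and L are illustrative, not the paper's values.

```python
import numpy as np

L, K, n = 1.0, 5, 20   # domain size, max coefficient order, grid width (illustrative)

def basis(k1, k2, x1, x2):
    """Unnormalized cosine basis from [10]; the h_k normalization is omitted."""
    return np.cos(k1 * np.pi * x1 / L) * np.cos(k2 * np.pi * x2 / L)

def map_coeffs(phi):
    """Eq. (2) as a discrete sum: phi is an n x n map of probability masses."""
    centers = (np.arange(n) + 0.5) * L / n
    X1, X2 = np.meshgrid(centers, centers, indexing="ij")
    return np.array([[np.sum(phi * basis(k1, k2, X1, X2))
                      for k2 in range(K + 1)] for k1 in range(K + 1)])

def traj_coeffs(traj):
    """Coefficients of the trajectory distribution: time-averaged basis values."""
    return np.array([[np.mean([basis(k1, k2, x1, x2) for x1, x2 in traj])
                      for k2 in range(K + 1)] for k1 in range(K + 1)])

def ergodic_metric(ck, phik):
    """Eq. (3) with the weighting Lambda_k = (1 + ||k||^2)^(-3/2) from [10]."""
    k1, k2 = np.meshgrid(np.arange(K + 1), np.arange(K + 1), indexing="ij")
    lam = (1.0 + k1**2 + k2**2) ** -1.5
    return np.sum(lam * (ck - phik) ** 2)

phi = np.full((n, n), 1.0 / n**2)   # uniform information map
traj = [(np.random.rand(), np.random.rand()) for _ in range(200)]
metric = ergodic_metric(traj_coeffs(traj), map_coeffs(phi))
```

A trajectory that covers the square uniformly drives the metric toward zero for this uniform map.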
Executing ergodic control in real-time can be challenging [12]. One of the challenges is that the information map changes with new measurements. An ergodic trajectory is generated for φ, but this map becomes obsolete as new measurements are made. Recomputing the information map each time a new measurement is made can be computationally expensive and might not be feasible on a robot with limited computing power. It can also be difficult to generate the Fourier coefficients \phi_k. For this reason, we use neural networks to generate not just φ but also \phi_k.
III Model
As a motivating example, we consider the localization of a single, stationary target with a mobile sensor.
III-A Dynamic Model
The stationary target has a location θ ∈ Θ. The set of possible target locations Θ is a 2D square. The location consists of north and east components so that θ = [θ_n, θ_e].
At time t, the mobile sensor is in state x_t ∈ X. The sensor state space X depends on the sensor model used. If the agent’s heading is unimportant, then X ⊂ R^2. If heading matters, then the agent state space is a subset of a special Euclidean group: X ⊂ SE(2). In this case, the agent state includes a heading in addition to its 2D position.
This paper’s focus is on information map generation, so we use a simple dynamic model. The mobile sensor has deterministic, single integrator dynamics.
III-B Sensor Models
Measurements are made every Δt seconds. At time t, the mobile sensor makes the measurement z_t ∈ Z, where Z is the domain of possible measurements.
The sensor model provides P(z_t | x_t, θ), the probability of receiving measurement z_t given mobile sensor state x_t and target location θ. This probability is used in filtering and estimation as well as in the generation of the information map.

We consider two sensor models in this work. The first is a bearing-only sensor that returns bearing estimates to the target. Such measurements can be obtained with beam-steering [14]. Because the beam is electronically steered, the sensing agent’s heading does not affect the sensing model, resulting in the state space X ⊂ R^2. The sensor state consists of a north and east component: x_t = [x_n, x_e]. The measurement obtained at time t is
z_t = \beta(x_t, \theta) + v_t,   (4)

where v_t is zero-mean Gaussian noise. Measured east of north, the bearing to the target is

\beta(x_t, \theta) = \arctan\left( \frac{\theta_e - x_e}{\theta_n - x_n} \right).   (5)
The probability P(z_t | x_t, θ) is derived from this model.
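A sketch of the bearing model in Eqs. 4 and 5, assuming states are (north, east) tuples and bearings in degrees. The noise level σ and the wrapped-Gaussian likelihood (to handle the angular discontinuity) are assumptions, not values from the paper.

```python
import math

def true_bearing(x, theta):
    """Eq. (5): bearing from sensor x to target theta, measured east of north,
    in degrees; atan2 resolves the quadrant ambiguity of arctan."""
    return math.degrees(math.atan2(theta[1] - x[1], theta[0] - x[0]))

def bearing_likelihood(z, x, theta, sigma=5.0):
    """P(z | x, theta): Gaussian density on the angular error, wrapped to
    [-180, 180). sigma (degrees) is an assumed noise level."""
    err = (z - true_bearing(x, theta) + 180.0) % 360.0 - 180.0
    return math.exp(-0.5 * (err / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
```

The likelihood peaks when the measured bearing matches the true bearing and decays with the wrapped angular error.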
The second sensor model is a field-of-view (FOV) sensor introduced in previous work [5]. The sensing agent makes radio strength measurements simultaneously with two directional antennas. One points forward and the other rearward. Only two measurements are possible, so Z = {0, 1}. A measurement of 1 is received when the front antenna measures a higher strength than the rear antenna; otherwise, 0 is received. Because the antennas are directional, we expect a measurement of 1 when the mobile sensor points at the target. Here, the sensor’s heading affects the measurement, so X ⊂ SE(2). Denoting the sensor state x_t = [x_n, x_e, x_h], the measurement function is

z_t = \mathbf{1}\{ |\beta(x_t, \theta) - x_h| < 90^\circ \},   (6)

where \mathbf{1}\{\cdot\} is the indicator function and the angular difference is wrapped to (-180°, 180°].
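A noise-free sketch of the FOV measurement: the sensor returns z = 1 when the target lies in the half-plane in front of it. The actual model in [5] is probabilistic, so this deterministic simplification is an assumption for illustration.

```python
import math

def fov_measurement(x, theta):
    """Simplified FOV model: z = 1 if the target lies in the half-plane in front
    of the sensor. x = (north, east, heading_deg); the real model in [5] is
    probabilistic, so this deterministic version is an assumption."""
    bearing = math.degrees(math.atan2(theta[1] - x[1], theta[0] - x[0]))
    rel = (bearing - x[2] + 180.0) % 360.0 - 180.0  # relative bearing in [-180, 180)
    return 1 if abs(rel) < 90.0 else 0
```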
III-C Belief and Filtering
Because the target location is unknown, a distribution over possible target locations is maintained. This distribution is called the belief, and b_t denotes the belief at time t.
In this work, we use a discrete filter, sometimes called a histogram filter [15]. In a discrete filter, the search area is discretized into a grid. The belief is an array representation of this grid. The weight of an entry is the probability that the target is in the corresponding grid cell.
Unlike Kalman filters, discrete filters do not require unimodal, Gaussian beliefs or linearizable sensing models. Unlike particle filters, discrete filters represent the belief as an array, making it easier to feed into machine learning techniques like neural networks. The discrete filter has also been used extensively in localization tasks [3, 5].

The term b_t[i] is the probability that the target is in cell θ_i, according to the belief at time t. Given the belief from the previous timestep, b_{t-1}, and the measurement made at the current timestep, z_t, the belief is updated according to
b_t[i] \propto P(z_t | x_t, \theta_i) \, b_{t-1}[i].   (7)
The belief is normalized so it sums to one.
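Eq. 7 can be sketched in a few lines. The likelihood `toy` below is a hypothetical stand-in for the sensor models of Section III-B, and the grid size is illustrative.

```python
import numpy as np

def update_belief(belief, z, x, likelihood):
    """Eq. (7): weight the prior belief by the measurement likelihood and
    renormalize so the posterior sums to one."""
    n = belief.shape[0]
    post = np.array([[belief[i, j] * likelihood(z, x, (i, j))
                      for j in range(n)] for i in range(n)])
    return post / post.sum()

n = 10
belief = np.full((n, n), 1.0 / n**2)              # uniform prior over target cells
toy = lambda z, x, th: 0.9 if th[0] < 5 else 0.1  # hypothetical stand-in sensor
belief = update_belief(belief, 1, (0, 0), toy)    # mass shifts toward th[0] < 5
```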
IV Information Maps
The information map φ maps the state space to a measure of information quality or quantity. To feasibly compute the map, we compute the information values at a discrete set of points X̄ ⊂ X. The information at each x ∈ X̄ is computed using quantities like mutual or Fisher information.
Computing information values typically requires integrating over the target space Θ, so we also limit this to a discrete set of points Θ̄. It is sometimes necessary to integrate over the measurement space Z. If this space is continuous, we also limit ourselves to a discrete set Z̄. In bearing-only localization, where Z = [0°, 360°), we choose a discrete set of evenly spaced bearing values.
IV-A Mutual Information
Mutual information is often used to guide mobile sensors in localization tasks [2, 5]. The mutual information at a state is equal to the expected reduction in belief entropy resulting from a measurement there. Entropy captures the uncertainty in a distribution or random variable. Because the goal in localization is to reduce uncertainty about the unknown parameter θ, minimizing belief entropy is sensible.

Given two random variables A and B, the mutual information I(A; B) is the amount of information obtained about one variable given the other is known. Equivalently, I(A; B) gives the reduction in uncertainty of A given B is known, or the reduction in uncertainty of B given A is known. Mutual information is symmetric, so I(A; B) = I(B; A).
In localization, we are often interested in I(b_t; z_{t+1}), the reduction in uncertainty of the target estimate given the next measurement is known. Strictly speaking, b_t is a distribution and not a random variable; we abuse notation and use b_t to refer to the random variable describing the target location, which has distribution b_t. The measurement at the next timestep, z_{t+1}, is a random variable because it is an unknown quantity.
The value of I(b_t; z_{t+1}) is made explicit in the relation

I(b_t; z_{t+1}) = H(b_t) - H(b_t | z_{t+1}).   (8)
The quantity H(b_t) is the current entropy of b_t. The quantity H(b_t | z_{t+1}) is the conditional entropy of b_t if the next measurement were known. We leverage the symmetry of mutual information:

I(b_t; z_{t+1}) = H(z_{t+1}) - H(z_{t+1} | b_t).   (9)
In greedy control, Eq. 9 is evaluated for each x_{t+1}, or possible agent state at the next timestep. The first term is

H(z_{t+1}) = -\sum_{z \in \bar{Z}} P(z | x_{t+1}, b_t) \log P(z | x_{t+1}, b_t),   (10)
where, using the laws of total and conditional probability,

P(z | x_{t+1}, b_t) = \sum_{\theta \in \bar{\Theta}} P(z | x_{t+1}, \theta) \, b_t[\theta].   (11)
The second term in Eq. 9 is

H(z_{t+1} | b_t) = -\sum_{\theta \in \bar{\Theta}} b_t[\theta] \sum_{z \in \bar{Z}} P(z | x_{t+1}, \theta) \log P(z | x_{t+1}, \theta),   (12)

where b_t[\theta] denotes the probability the belief assigns to target location \theta.
When generating an information map, Eq. 9 is evaluated for each x ∈ X̄ instead of only the states reachable in the next timestep. Each term in Eq. 9 is of order O(|Θ̄||Z̄|), so generating the entire map is O(|X̄||Θ̄||Z̄|). Each operation requires a call to the sensor model P(z | x, θ), which can be expensive. For example, a bearing sensing modality requires calls to relatively expensive trigonometric functions. However, these calls can be reduced with caching and memoization.
The main computational concern is that the numbers of sensor and target states are often exponential functions of some other variable. Consider a mobile sensor localizing a target in a square field. We might discretize to n states per dimension, meaning the sensor and target could each occupy any of n^2 states. Generating the information map is then of order O(n^4 |Z̄|). Clearly, increasing the discretization or the size of the search area incurs enormous increases in computation.
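A direct implementation of Eqs. 9–12 makes the O(|X̄||Θ̄||Z̄|) cost visible: nested loops over sensor states, target states, and measurements. The binary `toy` model and the 4 × 4 grids below are illustrative assumptions, not the paper's models; the implementation assumes strictly positive likelihoods.

```python
import numpy as np

def mutual_info_map(belief, states, targets, meas, model):
    """Eqs. (9)-(12): I(x) = H(z | x) - E_theta[H(z | x, theta)].
    model(z, x, theta) -> P(z | x, theta); belief[ti] = P(theta = targets[ti])."""
    info = np.zeros(len(states))
    for si, x in enumerate(states):
        pz = np.zeros(len(meas))   # P(z | x, b), Eq. (11)
        cond = 0.0                 # H(z | b, x), Eq. (12)
        for ti, th in enumerate(targets):
            p = np.array([model(z, x, th) for z in meas])
            pz += belief[ti] * p
            cond -= belief[ti] * np.sum(p * np.log(p))
        info[si] = -np.sum(pz * np.log(pz)) - cond   # Eqs. (10) and (9)
    return info

# Illustrative 4x4 grids and a hypothetical binary sensor model.
targets = [(i, j) for i in range(4) for j in range(4)]
states = targets
belief = np.full(len(targets), 1.0 / len(targets))
meas = [0, 1]
toy = lambda z, x, th: 0.8 if (z == 1) == (th[0] > x[0]) else 0.2
imap = mutual_info_map(belief, states, targets, meas, toy)
```

Mutual information is nonnegative, so every entry of the resulting map is at least zero up to floating-point error.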
IV-B Fisher Information
Fisher information offers another way to generate information maps. The Fisher information describes the amount of information that an observable random variable carries about the unknown parameter θ.
In our case, the observable variable is the measurement z_t, and it is conditioned on the sensor state x_t and target state θ. We restrict ourselves to the bearing-only sensor model, where the measurement value is scalar and has Gaussian noise that is constant across the state space. The Fisher information matrix for a specific sensor-target state is

\Phi(x, \theta) = \frac{1}{\sigma^2} \nabla_\theta \beta(x, \theta) \, \nabla_\theta \beta(x, \theta)^\top,   (13)

where \sigma is the standard deviation of the Gaussian noise and \nabla_\theta \beta is the gradient of the measurement function with respect to θ. In bearing-only sensing, the measurement function is the true bearing \beta in Eq. 5, and its gradient is

\nabla_\theta \beta(x, \theta) = \frac{1}{(\theta_n - x_n)^2 + (\theta_e - x_e)^2} \begin{bmatrix} -(\theta_e - x_e) \\ \theta_n - x_n \end{bmatrix}.   (14)
When calculating Fisher information for a point in the sensor’s state space, the sensor state is known but the target state is not. Therefore, the current belief is used to take an expectation over target states:

\Phi(x) = \sum_{\theta \in \bar{\Theta}} b_t[\theta] \, \Phi(x, \theta).   (15)
An information map requires a scalar value of information at each point, so the determinant is commonly used [16]:

\phi(x) \propto \det \Phi(x).   (16)

The information map is built using Eqs. 13–16 to evaluate the information at each point x ∈ X̄.
The computational complexity of generating the Fisher information map is O(|X̄||Θ̄|), better than mutual information by the factor |Z̄|. This factor is eliminated in part because of the simplified version of Fisher information in Eq. 13. In cases with more complex noise models, integration over the measurement space is needed to compute Φ(x, θ). However, Φ(x, θ) can be precomputed offline, so that the Fisher information map complexity is still O(|X̄||Θ̄|). Further, there are no calls to the log or measurement functions. The low complexity helps explain why Fisher information maps are common in prior work [6, 16], including a real-time implementation on a robot [12].
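The Fisher map of Eqs. 13–16 can be sketched with the loops below; note that only sensor and target states are enumerated, with no inner sum over measurements. The grid, belief, and σ are illustrative assumptions.

```python
import numpy as np

def bearing_gradient(x, theta):
    """Eq. (14): gradient of the bearing (in radians) with respect to the target."""
    dn, de = theta[0] - x[0], theta[1] - x[1]
    r2 = dn * dn + de * de
    return np.array([-de / r2, dn / r2])

def fisher_map(belief, states, targets, sigma=0.1):
    """Eqs. (13), (15), (16): determinant of the belief-averaged Fisher matrix."""
    info = np.zeros(len(states))
    for si, x in enumerate(states):
        F = np.zeros((2, 2))
        for ti, th in enumerate(targets):
            if th == x:
                continue   # the gradient is singular at the target itself
            g = bearing_gradient(x, th)
            F += belief[ti] * np.outer(g, g) / sigma**2   # Eqs. (13) and (15)
        info[si] = np.linalg.det(F)                        # Eq. (16)
    return info

# Illustrative 4x4 grid with a uniform belief; sigma is an assumed noise level.
targets = [(0.25 * i, 0.25 * j) for i in range(4) for j in range(4)]
belief = np.full(len(targets), 1.0 / len(targets))
fmap = fisher_map(belief, targets, targets)
```

Each Φ(x, θ) is a rank-one positive semidefinite matrix, so the averaged matrix is positive semidefinite and its determinant is nonnegative.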
However, Fisher information is not the best metric for all problems, and picking the exact form of the information map is an open research question. In some problems, it is not even clear how to apply Fisher information. For example, the FOV sensor model from Section III-B has just two discrete observations, and the gradient of the measurement function is not well defined. Mutual information might be more appropriate in that case.
V Network Design
We design networks for a mobile sensor according to the models from Section III. A stationary target sits in a square field. The belief is represented with an n × n discrete grid. The weight of each cell in the belief gives the probability that the target is in that cell. The belief is initialized to a uniform distribution.
The agent moves through this field while searching for the target. When using the bearing modality, we also discretize the agent state space to the n × n grid points in the search area. When using the FOV modality, the agent state space is n × n × 36, as we discretize possible agent headings into 36 points.
By using the sensor models and mutual or Fisher information, information maps over the agent state space can be generated. In the bearing modality, these maps cover n × n points; in the FOV modality, they cover n × n × 36 points.
We also generate Fourier coefficients from these information maps. For the bearing modality, the highest-order coefficient K is chosen in line with prior work [12]. In the FOV modality, we use the smallest K that captured the major features in observed information maps.
V-A Neural Network Architectures
The first architecture takes in an n × n discrete belief and outputs either an n × n or an n × n × 36 information map, depending on the sensing modality.
The input passes through two convolutional layers, a fully connected layer, a deconvolution layer, and a softmax activation. The number of filters per convolutional layer depends on the size of the output. The softmax activation ensures the output sums to one, making it easier to use KL divergence as the loss function.
The second architecture takes in an n × n belief and outputs a vector containing the Fourier coefficients of the information map. Because there are far fewer coefficients than points in the belief, the network is simpler. The network consists of two convolutional layers followed by two fully connected layers. The mean absolute error is used as the loss function.
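As a rough illustration of the first architecture's size, the sketch below counts parameters and tracks output shapes through two "same"-padded convolutional layers and a fully connected layer (stopping before the deconvolution layer for brevity). The grid width, filter counts, and kernel size are assumptions for illustration; the paper does not specify them here.

```python
def conv2d_shape(h, w, c_in, filters, k, stride=1, pad="same"):
    """Output shape and parameter count of one convolutional layer."""
    if pad == "same":
        out_h, out_w = h // stride, w // stride
    else:
        out_h, out_w = (h - k) // stride + 1, (w - k) // stride + 1
    params = filters * (c_in * k * k + 1)  # k x k weights per channel, plus a bias
    return (out_h, out_w, filters), params

# Hypothetical instantiation: belief -> conv -> conv -> fully connected map.
# All sizes below are assumptions for illustration, not the paper's values.
n = 25                                     # assumed belief grid width
shape, total = (n, n, 1), 0
for filters in (16, 32):                   # assumed filter counts
    shape, p = conv2d_shape(*shape, filters=filters, k=3)
    total += p
fc_in = shape[0] * shape[1] * shape[2]
total += (fc_in + 1) * n * n               # fully connected layer to an n x n map
```

Even for small grids, the fully connected layer dominates the parameter count, which is one reason the coefficient-output network can be simpler.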
V-B Training
To train the networks, we run 500 simulations of 20 steps each. In each simulation, the target sits at a random location. The sensing agent selects its control input with a one-step, mutual information optimization. Measurements are made at each step, after which an information map is generated and decomposed into Fourier coefficients. The beliefs are used as training inputs, and the resulting maps and coefficients are used as training outputs.
Training was done on a Tesla K40c graphics processing unit (GPU). A GPU is not necessary, but it reduced training time from a few hours to about ten minutes.
Overfitting is always a concern with machine learning. To minimize overfitting, we separate 10% of the training data into a validation set. At each epoch, the loss is evaluated on both the training and validation sets. If loss diverges, overfitting has likely occurred. This behavior was not observed.
V-C Complexity in Evaluation
Before evaluating the trained networks in simulation, we consider the computational complexity of these evaluations. The networks are trained offline, so it does not matter if training is slow. However, a trained network must generate information maps from beliefs in real-time. The computational complexity of evaluating a convolutional neural network for a new input is O(\sum_{l=1}^{d} n_{l-1} s_l^2 f_l m_l^2), where d is the number of convolutional layers, n_{l-1} is the number of input channels to layer l, s_l is the filter width of layer l, f_l is the number of filters in layer l, and m_l is the width of layer l’s output [17]. The input is 2D, so the number of input channels is n_0 = 1. We set the stride to one and zero-pad so that the output width equals the input width. The input is an n × n belief, so the output width of every layer is m_l = n.

Because convolutions take most of the computation time, this complexity does not include the cost of any pooling or fully connected layers; prior empirical work suggests these layers account for 5–10% of computation time [17].
If the network structure is held constant except for the input width n, then the asymptotic complexity is O(n^2). Recall that Fisher information was O(|X̄||Θ̄|); if we use n^2 points for X̄ and n^2 points for Θ̄, the asymptotic complexity is O(n^4). If Z̄ is discretized with m points, then the mutual information complexity is O(m n^4). In theory, neural networks can generate information maps faster than computing them with Fisher or mutual information.
Of course, this result is theoretical and describes the limit as n grows. In reality, other network elements affect computation time. Further, convolutional layers often have nonlinear activation functions at their outputs, which can be expensive to compute. Finally, it is possible the network structure must implicitly grow with the input width n. Perhaps more filters would be needed to capture fine-scale details that appear due to finer discretization of the state space.

VI Simulations
Once designed and trained, the networks are evaluated in simulations. After each observation, the belief is updated and information maps are generated along with their Fourier coefficients. These are compared to the neural network outputs. An example is shown in Fig. 1.
All quantitative results in this section are from 100 simulations of 20 steps each, with random target locations. As in the data generation, the agent moves according to a myopic entropy minimization. As a result, the beliefs seen in execution are similar to, but not necessarily equal to, those seen in training.
A tilde indicates a distribution was generated from Fourier coefficients, and the superscript n indicates the distribution was generated by a neural network. For example, φ is the true information map generated by the equations in Section IV; φ̃ is the distribution generated from the true Fourier coefficients (that is, coefficients generated from the true distribution). The distribution φ̃ⁿ is generated from the network-produced coefficients, and φⁿ is the neural network approximation of the true information map.
VI-A Quality of Approximation
Because neural networks are nonlinear function approximators, there will be some degradation in the information maps produced. KL divergence is used to evaluate this degradation quantitatively. The KL divergence D_KL(φ ∥ φⁿ) is a measure of how well φⁿ approximates φ; it is zero when φⁿ equals φ.
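A minimal sketch of the KL divergence used in this comparison, for discrete maps flattened to arrays; the ε smoothing is an implementation convenience to guard against log(0), not part of the paper's evaluation.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for discrete distributions; eps guards against log(0)."""
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

uniform = np.full(4, 0.25)
peaked = np.array([0.7, 0.1, 0.1, 0.1])
d = kl_divergence(uniform, peaked)   # positive: peaked poorly approximates uniform
```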
Table I shows the average KL divergence after each simulation step. The first quantity, D_KL(φ ∥ φⁿ), compares the network-produced information maps to the true maps. The second quantity, D_KL(φ̃ ∥ φ̃ⁿ), captures the quality of the network-produced coefficients by comparing their reconstructed information map to that reconstructed from the true coefficients. The third quantity, D_KL(φ ∥ φ̃), compares the map generated from the true coefficients to the true information map. Fourier coefficients introduce band-limiting degradation but are still used to guide mobile sensors [6, 10, 11, 12, 13], so this last value is a useful reference of acceptable quality.
Table I: Average KL divergence after each simulation step.

Modality  Metric   D_KL(φ ∥ φⁿ)   D_KL(φ̃ ∥ φ̃ⁿ)   D_KL(φ ∥ φ̃)
Bearing   Fisher   0.069          0.00045          2.78
Bearing   Mutual   0.036          0.0074           0.049
FOV       Mutual   0.038          0.010            0.10
The results suggest the networks accurately capture the information maps. The divergence values between the true FOV maps and the network maps are low: the divergence is only 0.038 when comparing the network map φⁿ to the true map φ. In comparison, the divergence is nearly triple that when the information map is reconstructed from the true coefficients, suggesting that more information is lost by approximating with the true coefficients than with the neural network. If the true coefficients can be used in control tasks, then the network output will suffice as well. Figure 2 shows the approximations are also visually similar to the true maps.
VI-B Computation Time
Table II shows the mean time to generate maps and coefficients from beliefs. For the true methods, the map is made before decomposing it into coefficients, so the true coefficient time includes the true map generation time.
Table II: Mean time to compute information maps and Fourier coefficients (seconds).

Modality  Metric   Method   Map      Coefficients
Bearing   Fisher   True     0.0061   0.0061
Bearing   Fisher   NN       0.0031   0.0016
Bearing   Mutual   True     0.33     0.33
Bearing   Mutual   NN       0.0021   0.0013
FOV       Mutual   True     0.76     1.33
FOV       Mutual   NN       0.0093   0.026
In the bearing modality, where the information map is a distribution over the 2D position grid, the time to compute Fourier coefficients from the map is trivial. Both the domain and the number of coefficients are small, leading to fast computation.
Fisher information is also computed rapidly, resulting in computation times that are slower than, but comparable to, the neural network times. Although a neural network can be much faster in the asymptotic limit, there is not much difference at the map size used in this work.
However, when using mutual information, neural networks generate maps and coefficients roughly two orders of magnitude faster. This difference holds in the FOV modality, where the information maps cover the larger n × n × 36 state space. The Fourier decomposition is slow because more coefficients are needed to faithfully represent the distribution and there is another dimension to integrate over. The neural network is much faster.
Simulations were performed on a laptop computer with an i7 processor and 8 GB of RAM. Neural network evaluations were performed on the CPU (instead of the GPU) for a fair comparison. Care was also taken to reduce the computation time of mutual information and its Fourier coefficients. Julia, a high-level language whose performance approaches C, was used. Caching and memoization were used to eliminate calls to measurement functions and the complex functions used in Fourier decomposition. Vectors were ordered to match Julia’s column-major ordering and prevent cache misses. Nonetheless, the neural network generated maps much more quickly, at rates comfortably sufficient for real-time use.
VII Conclusion
Convolutional neural networks can generate high-fidelity information maps in real-time, allowing mobile sensors to update maps as new observations are made. This technique is already being implemented on a real robot [5], which uses the models in this paper. Future work will evaluate the robustness and fidelity of network-generated maps in the presence of unmodeled noise or trajectories significantly different from those seen in training. Other approximation techniques will also be compared to the neural network approach.
References
 [1] A. Perkins, L. Dressel, S. Lo, and P. Enge, “Demonstration of UAV-based GPS jammer localization during a live interference exercise,” in Institute of Navigation (ION) GNSS+, 2016.
 [2] G. M. Hoffmann, S. L. Waslander, and C. J. Tomlin, “Distributed cooperative search using information-theoretic costs for particle filters, with quadrotor applications,” in AIAA Guidance, Navigation, and Control Conference (GNC), 2006.
 [3] O. M. Cliff, R. Fitch, S. Sukkarieh, D. L. Saunders, and R. Heinsohn, “Online localization of radio-tagged wildlife with an autonomous aerial robot system,” in Robotics: Science and Systems, 2015.
 [4] L. Dressel and M. J. Kochenderfer, “Efficient decision-theoretic target localization,” in International Conference on Automated Planning and Scheduling (ICAPS), 2017.
 [5] ——, “Efficient and low-cost localization of radio signals with a multirotor UAV,” in AIAA Guidance, Navigation, and Control Conference (GNC), 2018.
 [6] L. Miller, “Optimal ergodic control for active search and information acquisition,” Ph.D. dissertation, Northwestern University, 2015.
 [7] R. E. Allen, A. A. Clark, J. A. Starek, and M. Pavone, “A machine learning approach for realtime reachability analysis,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2014.

 [8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems (NIPS), 2012.
 [9] E. Ayvali, H. Salman, and H. Choset, “Ergodic coverage in constrained environments using stochastic trajectory optimization,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017.
 [10] G. Mathew and I. Mezić, “Metrics for ergodicity and design of ergodic dynamics for multi-agent systems,” Physica D: Nonlinear Phenomena, vol. 240, no. 4, pp. 432–442, 2011.
 [11] L. M. Miller and T. D. Murphey, “Trajectory optimization for continuous ergodic exploration,” in American Control Conference (ACC), 2013.
 [12] A. Mavrommati, E. Tzorakoleftherakis, I. Abraham, and T. D. Murphey, “Real-time area coverage and target localization using receding-horizon ergodic exploration,” IEEE Transactions on Robotics, vol. 34, no. 1, pp. 62–80, 2018.
 [13] L. M. Miller and T. D. Murphey, “Trajectory optimization for continuous ergodic exploration on the motion group SE(2),” in IEEE Conference on Decision and Control (CDC), 2013.
 [14] A. Perkins, Y.-H. Chen, W. Lee, S. Lo, and P. Enge, “Development of a three-element beam steering antenna for bearing determination onboard a UAV capable of GNSS RFI localization,” in Institute of Navigation (ION) GNSS+, 2017.
 [15] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
 [16] L. M. Miller, Y. Silverman, M. A. MacIver, and T. D. Murphey, “Ergodic exploration of distributed information,” IEEE Transactions on Robotics, vol. 32, no. 1, pp. 36–52, Feb. 2016.

 [17] K. He and J. Sun, “Convolutional neural networks at constrained time cost,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2015.