1 Introduction
Sea exploration is important for countries with large areas of ocean under their control, since in the future it may be possible to exploit some of the resources in the seafloor. However, the contents of the seafloor are largely unknown; a preliminary step in characterizing it is to gather information about its composition. This is currently done by sending a ship on an expedition during which an underwater robot, or other equipment, collects samples at selected points. Such expeditions are typically very costly; additionally, the ship must be available for other commitments at a predetermined port within a rigid and tight time limit. Nowadays, planning is usually done by experts, based on previously collected information and on intuition.
Even though this paper describes the problem in the context of sea exploration, similar problems arise in other contexts (e.g., fire detection by drones over a forest). The aim is to schedule the journey of a ship for collecting information about the resources of the seafloor (e.g., its composition in certain materials). The area being considered is represented as a given bounded surface. For the sake of simplicity, we consider that the actual resource level at any point can be conveyed by a real number. This true value is unknown, except at a limited number of points of the surface for which there is previous empirical information.
Optimal expedition planning involves three subproblems, each corresponding to a different phase of the process. The first is assessment, which consists of the following: given a finite set of points for which the contents are known, build an indicator function that associates with each point its “attractiveness” (a real number) for exploration, in terms of the information that can be gathered if that point is selected for probing. The second subproblem is planning, i.e., deciding on the position of a certain number of points to probe in the next expedition so as to maximize the overall informational reward; the duration of the trip is limited to a known bound. The third subproblem, estimation, is related to the final aim of the problem, which is to have an evaluation of the resource level available at any point on the surface, based on all the information available at the end of the trip.
2 Related work
The problem we address in this work is a routing problem with similarities to the orienteering problem (OP). The orienteering problem was initially introduced by Golden et al (1981), and its roots are in an outdoor sport with the same name, where there is a set of “control points”, each with an associated score, in a given area. Competitors use a compass and a map for assisting in a journey where they visit a subset of control points, starting and ending at given nodes, with the objective of maximizing their total score. They must reach the end point within a predefined amount of time.
The input to the standard OP consists of a vertex- and edge-weighted graph $G = (V, E)$, where $V$ is the set of vertices and $E$ is the set of edges, a source vertex and a target vertex $s, t \in V$, and a time limit. The goal is to find an $s$-$t$ walk of total length at most the time limit so as to maximize the sum of the weights of the vertices visited along the walk. It can be shown that the OP is NP-hard via a straightforward reduction from the traveling salesman problem. It is also known to be APX-hard to approximate. The literature describes an unweighted version (i.e., with a unit score at each vertex), for which a constant-factor approximation is presented in Chekuri et al (2012); for the weighted version, the approximation ratio incurs an additional loss factor.
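To make the definition above concrete, the following is a minimal sketch of an exhaustive solver for tiny OP instances. It is only an illustration, not a method from the literature cited here: the function name `solve_op_brute_force`, the dictionary-based input, and the assumption of a complete graph with Euclidean edge lengths are all hypothetical choices. Under that metric assumption, restricting the search to simple paths loses nothing with respect to walks.

```python
from itertools import permutations
from math import dist


def solve_op_brute_force(coords, scores, source, target, limit):
    """Enumerate all simple source-target paths and keep the best feasible one.

    coords: dict vertex -> (x, y); edge lengths are Euclidean (an assumption).
    scores: dict vertex -> prize collected when the vertex is visited.
    Returns (best_score, best_path), or (None, None) if no path fits the limit.
    """
    inner = [v for v in coords if v not in (source, target)]
    base = scores[source] + (scores[target] if target != source else 0)
    best_score, best_path = None, None
    for k in range(len(inner) + 1):
        for subset in permutations(inner, k):
            path = [source, *subset, target]
            length = sum(dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))
            if length <= limit:
                score = base + sum(scores[v] for v in subset)
                if best_score is None or score > best_score:
                    best_score, best_path = score, path
    return best_score, best_path


# Hypothetical toy instance: vertex "c" has a large prize but is too far away.
coords = {"s": (0, 0), "a": (1, 0), "b": (0, 1), "c": (3, 3), "t": (1, 1)}
scores = {"s": 0, "a": 5, "b": 4, "c": 10, "t": 0}
best_score, best_path = solve_op_brute_force(coords, scores, "s", "t", 3.0)
```

The enumeration grows factorially with the number of vertices, so this is useful only for checking small cases by hand; it is consistent with the hardness results mentioned above that no efficient exact method is expected.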
An essential difference between the OP and our problem is that in the OP a finite set of vertices is given, from which the solution must be selected. In our problem, only the surface where some locations may be chosen for sampling is given. To the best of our knowledge, the closest related work can be found in Yu et al (2016). The authors propose a nonlinear extension to the orienteering problem (OP), called the correlated orienteering problem (COP). They use the COP to model the planning of informative tours for persistent monitoring, through a single or multiple robots, of a spatiotemporal field with time-invariant spatial correlations. The tours are constrained to have a fixed-length time budget. Their approach is discrete, as they focus on a quadratic COP formulation that only looks at correlations between neighboring nodes in a network.
Another problem related to ours is tour recommendation for groups, introduced in Anagnostopoulos et al (2017), where the authors deal with estimating the best tour that a group could perform together in a city, in such a way that the overall utility for the whole group is maximized. They use several measures to estimate this utility, such as the sum of the utilities of the members of the group, or the utility of the least satisfied member. In our case, we estimate this utility (the attractiveness of a point) using Gaussian process regression (see, e.g., Rasmussen and Williams (2005)). This is a generalization of the method known as kriging, introduced in Krige (1951) as a geostatistical procedure for generating an estimated surface from a scattered set of points with known values; the original application was in mine valuation.
Gaussian process regression provides estimates of both the mean and the standard deviation; we use the latter as a measure of attractiveness in the assessment phase, and the former as the value of the final estimation, after the data set is extended with the points actually observed in the expedition.
3 Method
3.1 Assessment and estimation.
The first and the third subproblems, assessment and estimation, are strongly related, in the sense that the aim of the assessment phase is to measure the interest of having empirical information on new points of the surface for improving the estimation. Assessment evaluates how much a given point, if probed, is expected to improve the quality of the estimation done in the third subproblem.
The estimation phase is a regression problem: given the known resource levels at the previously observed points and at the points to be observed in the expedition, what is the best estimate for the resource level at a new point of the surface? A first step for answering this question is to make an assumption on the nature of the underlying function. Our assumption is that it can be conveniently estimated by a Gaussian process. In this approach, the model attempts to describe the conditional distribution of the resource level given the location, based on a set of empirical observations conveyed as a set of triplets (the two coordinates of a point and the resource level observed there); the number of samples is, in our case, larger after the trip than before it. This conditional distribution describes the dependency of the observable on the input, assuming that this relation can be decomposed into a systematic and a random component. The systematic dependency is given by a latent function, which is to be identified based on the data. Hence, the prior is on function values associated with the set of inputs, whose joint distribution is assumed to be multivariate normal.
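For reference, the standard predictive equations of Gaussian process regression with a zero-mean prior (as found in textbooks such as Rasmussen and Williams (2005)) can be written as below. The notation is an assumption introduced here for illustration and is not necessarily the one used by the authors: $X$ denotes the observed locations, $\mathbf{z}$ the observed resource levels, $k(\cdot,\cdot)$ the covariance function, $K$ the matrix with entries $k(x_i, x_j)$, $\mathbf{k}_*$ the vector with entries $k(x_i, x_*)$ for a query location $x_*$, and $\sigma_n^2$ the observation noise variance (zero, or a small numerical jitter, for noiseless data).

```latex
\begin{align}
  \mu(x_*)      &= \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{z}, \\
  \sigma^2(x_*) &= k(x_*, x_*) - \mathbf{k}_*^{\top} \left( K + \sigma_n^2 I \right)^{-1} \mathbf{k}_* .
\end{align}
```

Note that the predictive variance $\sigma^2(x_*)$ depends only on the locations of the observations, not on the observed values, once the covariance function and its parameters are fixed.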
We use the posteriors inferred through the Gaussian process model in two ways. In the assessment phase, we use the standard deviation of the model at each point directly as an indicator of attractiveness for probing at that point. Later, in the estimation phase, the Gaussian process (now with an enlarged data set) is used as a regression for the resource level at any point of the surface.
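The two uses of the posterior described above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the squared-exponential kernel, its fixed `length_scale`, the jitter value `noise`, and the helper names `rbf`, `solve` and `gp_predict` are assumptions made for this example, and no hyperparameter training is performed.

```python
from math import exp, sqrt


def rbf(p, q, length_scale=0.3):
    """Squared-exponential kernel between two 2-D points (assumed kernel choice)."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return exp(-0.5 * d2 / length_scale ** 2)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def gp_predict(points, values, query, noise=1e-8):
    """Return (posterior mean, posterior standard deviation) at `query`.

    The mean plays the role of the estimate; the standard deviation plays
    the role of the attractiveness indicator.
    """
    n = len(points)
    K = [[rbf(points[i], points[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [rbf(p, query) for p in points]
    alpha = solve(K, values)   # (K + noise*I)^{-1} z
    v = solve(K, k_star)       # (K + noise*I)^{-1} k_*
    mean = sum(ks * a for ks, a in zip(k_star, alpha))
    var = rbf(query, query) - sum(ks * vi for ks, vi in zip(k_star, v))
    return mean, sqrt(max(var, 0.0))


# Hypothetical data: four known points; compare a known location with the centre.
pts = [(0.2, 0.2), (0.8, 0.2), (0.2, 0.8), (0.8, 0.8)]
vals = [1.0, 2.0, 3.0, 4.0]
mean_known, std_known = gp_predict(pts, vals, (0.2, 0.2))
mean_centre, std_centre = gp_predict(pts, vals, (0.5, 0.5))
```

In this sketch the standard deviation is close to zero at an already observed location and clearly larger at the centre, far from all observations, which is the behaviour that makes it usable as an attractiveness measure.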
3.2 Planning.
The second subproblem is the selection of points of the surface for probing, so as to allow a subsequent estimation that is as accurate as possible. Points are to be probed in a trip whose maximum duration is known beforehand. We are thus in the presence of an orienteering problem. A standard orienteering problem consists of the following: given a graph with edge lengths and a prize that may be collected at each vertex, determine a path of length at most a given bound, starting and ending at given vertices, that maximizes the total prize value of the vertices visited. The problem here is rather particular for several reasons. The first is that, besides edge lengths (in our case, edge traversal durations), we have to take into account the time spent in probing at each vertex (which is a parameter of our problem). The second is that the vertex set is not given: it may consist of any discrete subset of points of the surface, as long as the duration of the tour (the time spent on probing and on traveling from each point to the next) is not larger than the upper bound. An additional difficulty is the correlation between the prizes obtained at visited vertices; indeed, as the “prize” is a measure of the improvement in information obtained by probing, after probing at a given location, probing other locations in its neighborhood is expected to provide less information than probing distant points (other factors being equal).
3.3 Tackling the problem
An instance of this problem must specify the area being studied, an upper bound for the trip duration (including traveling and probing), the duration required for each probing (here considered independent of the location), and the traveling speed, which allows computing the traveling time between two given points as the Euclidean distance between those points divided by the speed. Without loss of generality, we assume that the initial and end points of the trip to be planned are the same. The instance data must also include the previously known data, i.e., a set of points and the resource level observed at each of them.
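The duration of a candidate trip follows directly from these instance data. The sketch below is illustrative only; the function name `tour_duration`, its parameter names, and the default depot at the origin are assumptions made for this example.

```python
from math import dist


def tour_duration(tour, speed, probe_time, depot=(0.0, 0.0)):
    """Duration of the closed trip depot -> tour points -> depot.

    Travel time is Euclidean distance divided by speed; each visited
    point adds a fixed probing time, independent of its location.
    """
    stops = [depot, *tour, depot]
    travel = sum(dist(a, b) for a, b in zip(stops, stops[1:])) / speed
    return travel + probe_time * len(tour)


# Hypothetical values: one probe at (3, 4), unit speed, probing takes 2 time units.
single_probe = tour_duration([(3.0, 4.0)], speed=1.0, probe_time=2.0)
```

A trip is feasible when this value does not exceed the upper bound on the trip duration.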
In order to evaluate an algorithm for this problem, another set of points, at which the level of information predicted by the model is requested, should also be specified. For these points, the true value of the resource level must be known (for computing the error with respect to its estimated value), but it cannot be used by the algorithm.
The main method, making use of a set of auxiliary functions, is provided in Algorithm 1.
Algorithm 2 uses the assessment of the attractiveness on a grid of points of the surface to determine a trip, i.e., a list of points to visit and probe. That list is constructed in a greedy way, by determining which is currently the most attractive point and attempting to add it to the trip. This is done by checking whether a traveling salesman tour including the previous points and the current one can still be completed within the time limit (we use an implementation of the algorithm described in Lin and Kernighan (1973) for quickly finding a tour; if its length is feasible, the solution is immediately returned; otherwise, the exact model available in Kubo et al (2012) is used to find the optimal solution with a general-purpose mixed integer programming solver). After a new point is added to the tour, it is conjectured that a simulation using the latest available Gaussian process provides the “true” evaluation at that point, and based on this new speculative datum the attractiveness all over the surface is recomputed.
Finally, a method for evaluating attractiveness is provided in Algorithm 3.
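The greedy construction described above for Algorithm 2 can be sketched as follows. This is a simplified illustration under several loudly stated assumptions, not the authors' algorithm: the tour is found with a nearest-neighbour construction plus 2-opt instead of Lin-Kernighan and the exact model; infeasible candidates are simply skipped; and `nearest_gap` (distance to the closest known or planned point) is a hypothetical stand-in for the Gaussian process standard deviation, so that adding a point lowers the attractiveness of its neighbourhood. All function and parameter names are invented for this example.

```python
from math import dist


def tour_time(order, speed, probe_time, depot):
    """Travel plus probing time of the closed trip depot -> order -> depot."""
    stops = [depot, *order, depot]
    return sum(dist(a, b) for a, b in zip(stops, stops[1:])) / speed + probe_time * len(order)


def short_tour(points, depot):
    """Nearest-neighbour construction followed by 2-opt (stand-in for Lin-Kernighan)."""
    order, left, here = [], list(points), depot
    while left:
        here = min(left, key=lambda p: dist(here, p))
        left.remove(here)
        order.append(here)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if tour_time(cand, 1.0, 0.0, depot) < tour_time(order, 1.0, 0.0, depot) - 1e-12:
                    order, improved = cand, True
    return order


def nearest_gap(p, data):
    """Proxy attractiveness: distance to the closest known or planned point."""
    return min((dist(p, q) for q in data), default=float("inf"))


def greedy_plan(candidates, known, attractiveness, limit, speed, probe_time, depot=(0.0, 0.0)):
    """Repeatedly try to add the currently most attractive candidate to the trip."""
    plan, data = [], list(known)
    pool = [p for p in candidates if p not in known]
    while pool:
        best = max(pool, key=lambda p: attractiveness(p, data))
        pool.remove(best)
        order = short_tour(plan + [best], depot)
        if tour_time(order, speed, probe_time, depot) <= limit:
            plan = order
            data.append(best)  # speculative datum: attractiveness changes nearby
    return plan


# Hypothetical instance: 5 x 5 candidate grid on the unit square, one known point.
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
plan = greedy_plan(grid, [(0.5, 0.5)], nearest_gap, limit=3.0, speed=1.0, probe_time=0.1)
```

Replacing `nearest_gap` with a function returning the posterior standard deviation of a Gaussian process fitted to `data` (with a simulated value at each planned point) would bring the sketch closer to the procedure described in the text.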
4 Computational results
4.1 Benchmark instances used
In order to assess the quality of the method proposed for solving this problem, we have devised a set of artificial benchmark instances (see Appendix A for details on their characteristics).
A solution to our problem consists of two parts: a sequence of locations at which to collect samples (aiming at collecting maximum information), such that the total probing and travel time does not exceed the time limit; and an estimation of the level of information predicted by the model at each of the requested points.
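The evaluation of a method for solving this problem is firstly based on feasibility: the orienteering tour is checked by verifying that its duration does not exceed the time limit. Then, the final solution quality is measured by the error between the true and the predicted values at the requested points, as an approximation of the corresponding error over the whole surface.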
4.2 Results obtained
Tables 2 to 7 report the results of the computational experiment executed. Instances for the following cases have been solved: 16 previously known points on a regular grid (Table 2) and on random positions (Table 3); instances with 49 points on a regular grid (Table 4) and on random positions (Table 5); and instances with 100 points on a regular grid (Table 6) and on random positions (Table 7). Instances 1 to 5 have an increasing number of local maxima, but are relatively smooth; instances 6 to 10 are less so, with some of them having rather narrow local maxima. Parameters used in the Gaussian process were roughly tuned using benchmark instances f1 to f5; hence, these instances can be seen as the training set, and instances f6 to f10 as the test set.
In order to have an assessment of the quality of Algorithm 2, we compare it to a simple grid search, dividing the time available for exploration into probes on a regular grid, simply avoiding new observations on points for which there was already information available (notice that when the previously available information is very scarce, searching on a grid is a sensible strategy). Both methods use the same Gaussian process for regression; hence, their initial solution is the same.
As expected, increasing the number of initial points with information available increases the quality of the initial estimation with a Gaussian process; this can be observed in the first column of Tables 2 to 7. Having those points arranged in a grid is usually preferable, but this is not a general pattern. The estimation is much improved when new data become available (an exception is observed on instance f7 with 100 initial points randomly spread, where adding more data on a regular grid unexpectedly decreased the quality of the final estimation; this is likely due to a limitation of the optimizer used for training the Gaussian processes).
The main results are summarized in Table 1, where we can observe that the orienteering method proposed in Algorithm 2 is generally superior to grid search. This superiority increases with the number of initial points available, suggesting that, e.g., 16 points alone do not provide enough information for preparing a probing plan based on those data.
Table 1: Summary of the results obtained with grid search and with the orienteering method.

Benchmarks   Grid search   Orienteering
16grid       4             6
49grid       2             8
100grid      2             8
16random     6             4
49random     5             5
100random    1             9
[Tables 2 to 7: results for instances f1 to f10, with columns Name, Initial, Grid and Orienteering. Table 2: 16 points on a regular grid; Table 3: 16 points on random positions; Table 4: 49 points on a regular grid; Table 5: 49 points on random positions; Table 6: 100 points on a regular grid; Table 7: 100 points on random positions. The numeric entries are not available.]
In order to visualize the improvements that are obtained by probing, we have prepared Figures 1 and 2, each with the true function, the initial estimation (in the cases shown, with 16 points), and the estimation after probing with the orienteering tour. As can be seen, the application of our method results in a clear improvement in the approximation of the true function.
5 Conclusions
This paper describes a problem arising in sea exploration, where the aim is to decide the schedule of a ship expedition for collecting information about the resources of the seafloor. The setting involves the simultaneous use of tools from machine learning and combinatorial optimization. We propose a method for its solution, dividing the process into three phases: assessment (building an indicator function that associates with each point in the sea a value for the interest in probing it), planning (deciding on the position of points to probe in the next trip) and estimation (predicting a value of the resource level at any point on the surface). The results obtained indicate that using the method proposed here clearly improves the quality of the estimation, by probing at promising points and adding the newly collected information to a regression step, which is based on Gaussian processes.
An interesting direction for future research is adapting the algorithm proposed here to deal with real-time changes in the path, reacting in the most appropriate way to the information available as new observations arrive.
Acknowledgements
This work was partially supported by project "Coral - Sustainable Ocean Exploitation: Tools and Sensors/NORTE-01-0145-FEDER-000036", financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF).
References
Anagnostopoulos et al (2017) Anagnostopoulos A, Atassi R, Becchetti L, Fazzone A, Silvestri F (2017) Tour recommendation for groups. Data Mining and Knowledge Discovery 31(5):1157–1188
Chekuri et al (2012) Chekuri C, Korula N, Pál M (2012) Improved algorithms for orienteering and related problems. ACM Transactions on Algorithms (TALG) 8(3):23
Golden et al (1981) Golden B, Levy L, Dahl R (1981) Two generalizations of the traveling salesman problem. Omega 9(4):439–441
Krige (1951) Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. Journal of the Southern African Institute of Mining and Metallurgy 52(6):119–139
Kubo et al (2012) Kubo M, Pedroso JP, Muramatsu M, Rais A (2012) Mathematical Optimization: Solving Problems using Gurobi and Python. Kindaikagakusha, Tokyo, Japan
Lin and Kernighan (1973) Lin S, Kernighan BW (1973) An effective heuristic algorithm for the traveling-salesman problem. Operations Research 21(2):498–516
Rasmussen and Williams (2005) Rasmussen CE, Williams CKI (2005) Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press
Yu et al (2016) Yu J, Schwager M, Rus D (2016) Correlated orienteering problem and its application to persistent monitoring tasks. IEEE Transactions on Robotics 32(5):1106–1118
Appendix A Benchmark instances used
A complete description of the benchmark instances used is available at the author’s homepage (http://www.dcc.fc.up.pt/~jpp/code/CORAL). In summary, there are 10 different “true” functions that should be guessed, f1 to f10; an illustration of their shapes is provided in Figures 1 to 4.
The points initially available are noiseless evaluations of functions f1 to f10 over the area being studied, either on a regular grid (e.g., as in Table 8) or randomly spread over that area (as in Table 9).
Other relevant data are the probing time (in time units), the speed of traveling, and one further parameter, all specified in the instance description referenced above. The starting and ending nodes are located at coordinates (0,0).
As for the evaluation of the quality of the algorithms, the differences between the true and the predicted values are assessed on the points of a mesh.
Table 8: Example of data initially available on a regular grid.

        0.20    0.40    0.60    0.80
0.2     0.00    0.02    8.21   60.65
0.4     0.00    0.00    0.15    1.11
0.6     0.00    0.00    0.00    0.00
0.8     0.00    0.00    0.00    0.00