Problems of decision-making under uncertainty frequently contain cases where information can be obtained using some costly actions, called measurement actions. In order to act rationally in the decision-theoretic sense, measurement plans are typically optimized based on some form of value of information (VOI). Computing VOI can itself be computationally intensive. Since an exact VOI is frequently not needed in order to proceed (e.g., it is sufficient to determine that the VOI of a certain measurement is much lower than that of another measurement at a certain point in time), significant computational resources can be saved by controlling the resources used for estimating the VOI. This paper examines this tradeoff via a case study of measurement selection.
In general, computation of the value of information (VOI), even under the commonly used simplifying myopic assumption, involves multidimensional integration of a general function [Russell and Wefald, 1991]. For some problems, the integral can be computed efficiently [Russell and Wefald, 1989]; but when the utility function is computationally intensive, or when a non-myopic estimate is used, the time required to compute the value of information can be significant [Heckerman et al., 1993; Bilgic and Getoor, 2007] and must be taken into account when computing the net value of information. This paper presents and analyzes an extension of the well-known greedy algorithm that decides when to recompute the VOI of each of the measurements, based on the principles of limited rationality [Russell and Wefald, 1991].
Although it may be possible to use this idea in more general settings, this paper mainly examines on-line most informative measurement selection [Krause and Guestrin, 2007; Bilgic and Getoor, 2007], an approach which is commonly used to solve problems of optimization under uncertainty [Zheng et al., 2005; Krause et al., 2008]. Since this approach assumes that the computation time required to select the most informative measurement is negligible compared to the measurement time [Russell and Wefald, 1991], it is important in this setting to ascertain that VOI estimation does not consume excessive computational resources.
2 The Measurement Selection Problem
As our case study, we examine the following optimization problem. Given:
A set of items $S = \{s_1, \dots, s_n\}$.
A set of item features $F$. (Each feature $f \in F$ has a domain $D_f$.)
A set of measurement types $M$, with a potentially different intrinsic measurement cost $c_m$ and observation distribution $p_m(o \mid \mathbf{f})$, conditional on the true feature values, for each measurement type $m$.
A utility function $u$ on the features. In the simplest case, there is just one real-valued feature, acting as the item's utility value, and $u$ is simply the identity function.
A measurement budget $B$.
Find a policy of measurement decisions and a final selection that maximizes the expected net utility of the selection (the expected reward):
$$R = \mathbb{E}\left[\, u(s_\alpha) - \sum_{m \in Q} c_m \,\right], \quad \text{subject to} \quad \sum_{m \in Q} c_m \le B$$
where $Q$ is the performed measurement sequence and $s_\alpha$ is the selected item. The next measurement is selected on-line, after the outcomes of all preceding measurements are known.
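To make the objective concrete, here is a hypothetical two-item example (the items, probabilities, and cost are invented for illustration) computing the expected reward with and without a single fully revealing measurement, and hence the intrinsic and net value of information of that measurement:

```python
# Toy example (not from the paper): two items, one cost-bearing measurement
# that reveals an item's utility exactly.

COST = 0.1  # intrinsic cost of measuring item B (assumed value)

# Item A has a known utility; item B's utility is 0 or 1 with equal probability.
u_a = 0.5
p_b_high = 0.5

# Expected reward when selecting without measuring: pick the better prior mean.
reward_no_measure = max(u_a, p_b_high * 1.0)

# Expected reward when measuring B first (the measurement reveals B exactly):
# with probability p_b_high we learn B = 1 and pick it; otherwise we keep A.
reward_measure = (p_b_high * max(u_a, 1.0)
                  + (1 - p_b_high) * max(u_a, 0.0)
                  - COST)

# Intrinsic VOI: expected utility gain of the final selection, ignoring cost.
voi_intrinsic = (reward_measure + COST) - reward_no_measure  # 0.25
# Net VOI: intrinsic VOI minus the measurement cost.
voi_net = voi_intrinsic - COST                               # 0.15
```

With these numbers the measurement is worth performing, since its net VOI is positive.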
The above selection problem is intractable, and is therefore commonly solved approximately using a greedy heuristic algorithm. The greedy algorithm selects the measurement $m^* = \arg\max_m V_m$ with the greatest net value of information $V_m$. The net value of information is the difference between the intrinsic value of information $\Lambda_m$ and the measurement cost: $V_m = \Lambda_m - c_m$.
The intrinsic value of information $\Lambda_m$ is the expected difference in the true utility of the finally selected item after and before the measurement: $\Lambda_m = \mathbb{E}\left[\, u(s_\alpha^m) - u(s_\alpha) \,\right]$, where $s_\alpha^m$ is the item selected after the outcome of $m$ is observed.
The pseudocode for the algorithm is presented as Algorithm 1.
At each step, the algorithm recomputes the value of information estimate of every measurement. The assumptions behind the greedy algorithm are justified when the cost of selecting the next measurement is negligible compared to the measurement cost. However, optimization problems with hundreds or thousands of items are common [Tolpin and Shimony, 2010]; and even if the value of information of a single measurement can be computed efficiently [Russell and Wefald, 1989], the cost of estimating the value of information of all measurements becomes comparable to, and eventually outgrows, the cost of performing a measurement.
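The full-recomputation greedy loop can be sketched as follows. This is a sketch, not the paper's Algorithm 1; `voi`, `perform`, `update`, and `cost` are hypothetical problem-specific callbacks:

```python
def greedy_measurement_selection(measurements, beliefs, budget,
                                 voi, perform, update, cost):
    """Greedy loop: repeatedly perform the measurement with the highest
    net VOI until the budget is exhausted or no measurement is worthwhile.

    voi(m, beliefs)      -> estimated net value of information of m
    perform(m)           -> observation resulting from measurement m
    update(beliefs, m, o) -> Bayesian update of beliefs given observation o
    cost(m)              -> intrinsic cost of measurement m
    """
    spent = 0.0
    while True:
        # Recompute the net VOI of EVERY candidate at every step; this
        # per-step cost is what the selective scheme later avoids.
        affordable = [m for m in measurements if spent + cost(m) <= budget]
        if not affordable:
            break
        best = max(affordable, key=lambda m: voi(m, beliefs))
        if voi(best, beliefs) <= 0:  # no measurement is worth its cost
            break
        obs = perform(best)
        update(beliefs, best, obs)
        spent += cost(best)
    return beliefs
```

Note that each pass over `affordable` costs one VOI evaluation per measurement, which is exactly the overhead discussed above.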
Recomputation of the value of information of every measurement is often unnecessary, especially when using the "blinkered" scheme [Tolpin and Shimony, 2010], a greedy algorithm which also attempts to compute the VOI of sequences of measurements of the same type. When there are many different measurements, the value of information of most measurements is unlikely to change abruptly due to the result of just one other measurement. With an appropriate uncertainty model, it can be shown that the VOI of only a few of the measurements must be recomputed after each measurement, thus decreasing the computation time and ensuring that the greedy algorithm exhibits more rational behavior w.r.t. computational resources.
3 Rational Computation of Value of Information
For selective VOI recomputation, the belief about the intrinsic value of information $\Lambda_m$ of measurement $m$ is modeled as a Gaussian distribution:
$$\Lambda_m \sim \mathcal{N}\!\left(\bar{\Lambda}_m, \sigma_m^2\right)$$
After a measurement is performed, and the beliefs about the item features are updated (line 13 of Algorithm 1), the belief about $\Lambda_m$ becomes less certain. Under the assumption that the influence of each measurement on the value of information of the other measurements is independent of the influence of any other measurement, the increased uncertainty is expressed by adding Gaussian noise with variance $\tau^2$ to the belief:
$$\sigma_m^2 \leftarrow \sigma_m^2 + \tau^2$$
When $\Lambda_m$ of measurement $m$ is recomputed, the belief about it becomes exact ($\sigma_m = 0$). At the beginning of the algorithm, the beliefs about the intrinsic value of information of the measurements are computed from the initial beliefs about the item features.
In the algorithm that recomputes the value of information selectively, the initial beliefs about the intrinsic value of information are computed immediately after line 2 in Algorithm 1, and lines 6–11 of Algorithm 1 are replaced by Algorithm 2.
While the number of iterations in lines 7–12 of Algorithm 2 is the same as in lines 6–10 of Algorithm 1, the recomputation test is efficiently computable, and the subset of measurements for which the value of information is recomputed in line 15 of Algorithm 2 is controlled by the computation cost $c$: the value of information of measurement $m$ is recomputed whenever
$$\Phi\!\left( -\frac{\left|\Lambda^{\max} - \bar{\Lambda}_m\right|}{\sigma_m} \right) \ge c$$
where $\Lambda^{\max}$ is the highest value of information if any but the highest value of information is recomputed, and the next-to-highest value of information if the highest value of information is recomputed; $\Phi$ is the cumulative probability function of the standard Gaussian distribution.
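A minimal sketch of the selective recomputation test, assuming a Gaussian belief over each measurement's VOI. The threshold form used here (recompute when the probability that the true VOI crosses the current best estimate exceeds the computation cost) is an assumption, not necessarily the paper's exact criterion:

```python
import math

def phi(x):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def needs_recompute(mean, sigma, lambda_max, comp_cost):
    """Recompute the VOI of a measurement iff the probability that its
    true VOI lies on the other side of lambda_max justifies the
    computation cost (threshold form is an assumption)."""
    if sigma == 0.0:  # belief is exact: recomputation gains nothing
        return False
    p_cross = phi(-abs(lambda_max - mean) / sigma)
    return p_cross >= comp_cost

def select_for_recomputation(beliefs, comp_cost):
    """beliefs: dict m -> (mean, sigma). Return the measurements whose
    VOI should be recomputed for the given computation cost."""
    means = {m: mu for m, (mu, _) in beliefs.items()}
    best = max(means, key=means.get)
    second = max((v for k, v in means.items() if k != best),
                 default=float('-inf'))
    out = []
    for m, (mu, sigma) in beliefs.items():
        # Compare against the best estimate among the OTHER measurements.
        lam_max = second if m == best else means[best]
        if needs_recompute(mu, sigma, lam_max, comp_cost):
            out.append(m)
    return out
```

Raising the computation cost shrinks the recomputed subset, which is how the cost parameter trades computation time against selection quality.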
3.1 Obtaining Uncertainty Parameters
The uncertainty variance $\tau^2$ can be learned as a function of the total cost of performed measurements, either off-line from earlier runs on the same class of problems, or on-line. Learning on-line from earlier VOI recomputations proved to be robust and easy to implement: the variance estimate is initialized and then gradually updated with each recomputation of the value of information.
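One way to implement the on-line learning is a running estimate of the squared VOI change per intervening measurement; the exact update rule below is an assumption, not the paper's rule:

```python
class NoiseVarianceEstimator:
    """Running on-line estimate of the variance tau^2 of the Gaussian
    noise added to VOI beliefs after each measurement (update rule
    assumed for illustration)."""

    def __init__(self):
        self.sum_sq = 0.0
        self.count = 0

    def observe(self, old_voi, new_voi, n_measurements_between):
        """Record one recomputation: the squared VOI change, spread over
        the measurements performed since the previous recomputation."""
        if n_measurements_between > 0:
            self.sum_sq += (new_voi - old_voi) ** 2 / n_measurements_between
            self.count += 1

    @property
    def tau_sq(self):
        """Current estimate of the per-measurement noise variance."""
        return self.sum_sq / self.count if self.count else 0.0
```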
4 Empirical Evaluation
Experiments in this section compare the performance of the algorithm that recomputes the value of information selectively with that of the original algorithm, in which the value of information of every measurement is recomputed at every step. Two of the problems evaluated in [Tolpin and Shimony, 2010] are considered: noisy Ackley function maximization and SVM parameter search.
For each of the optimization problems, plots of the number of VOI recomputations, the reward, the intrinsic utility, and the total cost of measurements are presented. The results are averaged over multiple (100) runs of each experiment, such that the standard deviation of the reward is a small fraction of the mean reward. In the plots, the solid line corresponds to the rationally recomputing algorithm, the dashed line to the original algorithm, and the dotted line to an algorithm that selects measurements randomly and performs the same number of measurements as the rationally recomputing algorithm for the given computation cost $c$. Since, as can be derived from (6), the computation time of the rationally recomputing algorithm decreases with the logarithm of the computation cost $c$, the computation-cost axis is scaled logarithmically.
4.1 The Ackley Function
In this optimization problem, the utility function is the (negated) Ackley function [Ackley, 1987], the measurements are normally distributed around the true values with a fixed variance, and each measurement has a fixed intrinsic cost. There are uniform dependencies in both directions of the coordinate grid, with a fixed step along each axis.
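For reference, a sketch of the Ackley test function and a noisy measurement of the corresponding utility. The noise level and the sign convention (utility as the negated Ackley value, so that maximization seeks the function's global optimum) are assumptions, since the paper's exact parameter values are not reproduced here:

```python
import math
import random

def ackley(x):
    """Standard Ackley test function [Ackley, 1987];
    its global minimum is 0, attained at the origin."""
    n = len(x)
    s1 = sum(v * v for v in x) / n
    s2 = sum(math.cos(2.0 * math.pi * v) for v in x) / n
    return (-20.0 * math.exp(-0.2 * math.sqrt(s1))
            - math.exp(s2) + 20.0 + math.e)

def noisy_ackley_utility(x, noise_sigma, rng):
    """A 'measurement': the negated Ackley value observed with additive
    Gaussian noise (noise_sigma is an assumed parameter)."""
    return -ackley(x) + rng.gauss(0.0, noise_sigma)
```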
4.2 SVM Parameter Search
The results for the myopic scheme are presented in Figure 2.
4.3 Discussion of Results
In all experiments, a significant decrease in the computation time is achieved with only a slight degradation of the reward; the performance of the rationally recomputing algorithm decreases slowly with the computation cost, and exceeds the performance of the algorithm that makes random measurements even when the VOI of only a small fraction of the measurements is recomputed at each step. The exact dependence of the performance of the rationally recomputing algorithm on the intensity of VOI recomputation varies among problems, and depends both on the problem properties and on the VOI estimate used in the algorithm.
5 Conclusion
The paper proposes an improvement to a widely used class of VOI-based optimization algorithms. The improvement makes it possible to decrease the computation time while only slightly affecting the performance. The proposed algorithm rationally reuses computations of VOI and recomputes the VOI only of measurements for which a change in VOI is likely to affect the choice of the next measurement.
Acknowledgments
The research is partially supported by the IMG4 Consortium under the MAGNET program of the Israeli Ministry of Trade and Industry, by Israel Science Foundation grant 305/09, by the Lynne and William Frankel Center for Computer Sciences, and by the Paul Ivanier Center for Robotics Research and Production Management.
References
- [Ackley, 1987] Ackley, D. H. (1987). A connectionist machine for genetic hillclimbing. Kluwer Academic Publishers, Norwell, MA, USA.
- [Bilgic and Getoor, 2007] Bilgic, M. and Getoor, L. (2007). Voila: Efficient feature-value acquisition for classification. In AAAI, pages 1225–1230. AAAI Press.
- [Heckerman et al., 1993] Heckerman, D., Horvitz, E., and Middleton, B. (1993). An approximate nonmyopic computation for value of information. IEEE Trans. Pattern Anal. Mach. Intell., 15(3):292–298.
- [Krause and Guestrin, 2007] Krause, A. and Guestrin, C. (2007). Near-optimal observation selection using submodular functions. In AAAI, pages 1650–1654.
- [Krause et al., 2008] Krause, A., Leskovec, J., Guestrin, C., VanBriesen, J., and Faloutsos, C. (2008). Efficient sensor placement optimization for securing large water distribution networks. Journal of Water Resources Planning and Management, 134(6):516–526.
- [Russell and Wefald, 1989] Russell, S. J. and Wefald, E. (1989). On optimal game-tree search using rational meta-reasoning. In IJCAI, pages 334–340.
- [Russell and Wefald, 1991] Russell, S. J. and Wefald, E. (1991). Do the right thing: studies in limited rationality. MIT Press, Cambridge, MA, USA.
- [Tolpin and Shimony, 2010] Tolpin, D. and Shimony, S. E. (2010). Semi-myopic measurement selection for optimization under uncertainty. Technical Report 10-01, Lynne and William Frankel Center for Computer Science at Ben Gurion University of the Negev, Israel.
- [Hsu et al., 2003] Hsu, C.-W., Chang, C.-C., and Lin, C.-J. (2003). A practical guide to support vector classification. Technical report, National Taiwan University.
- [Zheng et al., 2005] Zheng, A. X., Rish, I., and Beygelzimer, A. (2005). Efficient test selection in active diagnosis via entropy approximation. In UAI, pages 675–682.