Maximum Joint Entropy and Information-Based Collaboration of Automated Learning Machines

11/15/2011 ∙ by N. K. Malakar, et al.

We are working to develop automated intelligent agents, which can act and react as learning machines with minimal human intervention. To accomplish this, an intelligent agent is viewed as a question-asking machine, which is designed by coupling the processes of inference and inquiry to form a model-based learning unit. In order to select maximally-informative queries, the intelligent agent needs to be able to compute the relevance of a question. This is accomplished by employing the inquiry calculus, which is dual to the probability calculus, and extends information theory by explicitly requiring context. Here, we consider the interaction between two question-asking intelligent agents, and note that there is a potential information redundancy with respect to the two questions that the agents may choose to pose. We show that the information redundancy is minimized by maximizing the joint entropy of the questions, which simultaneously maximizes the relevance of each question while minimizing the mutual information between them. Maximum joint entropy is therefore an important principle of information-based collaboration, which enables intelligent agents to efficiently learn together.


Introduction

Present-day scientific explorations involve gathering data at an ever-increasing rate, thereby requiring autonomy as a vital part of exploration. For example, remote science operations require automated systems that can both act and react with minimal human intervention. Our vision is to construct an autonomous intelligent instrument system (AIIS) that collects data in an automated fashion, learns from that data, and then, based on the learning goal, decides which new measurements to take. Such a system would constitute a learning machine that could act and react with minimal human intervention. This is made possible by the comprehensive successes of Bayesian inference, the decision-theoretic approach to experimental design Fedorov (1972); Lindley (1956); Bernardo (1979); Loredo and Chernoff (2003), and the development of the inquiry calculus Cox (1979); Fry (2002); Knuth (2003, 2006).

Our efforts to construct such autonomous systems Knuth et al. (2007); Knuth and Center (2010); Malakar and Knuth (2011a) have considered the process of data collection and the process of learning in two distinct phases: the inquiry phase and the inference phase. By coupling these processes of inference and inquiry one can form a model-based learning unit that cyclically collects data and learns from that data by updating its models. At this stage, the inference phase, which is based on Bayesian probability theory, is sufficiently well-understood that our current focus is on inquiry. For this reason, we tend to view the AIIS as a question-asking machine.

In this paper, we build upon our previous work Knuth et al. (2007); Knuth and Center (2010) and consider the problem of coordinating two question-asking intelligent agents. Without coordination, after each agent has independently solved the presented problem, there will be a redundancy in the information obtained by the two agents. In addition, during the question-asking process, there is a great potential for redundancy in terms of the questions that they pose. We show that collectively this redundancy is minimized at each step of the question-asking process by maximizing the joint entropy of the two questions that the agents plan to ask. This has the tendency to simultaneously maximize the relevance of each of the two questions posed while minimizing the mutual information between them. We illustrate the process via simulation and show that maximization of the joint entropy is an important principle of information-based collaboration, which enables intelligent agents to efficiently learn together.

The Inquiry Calculus

In this section, we briefly review the inquiry calculus. The development of this calculus relies on several order-theoretic notions, which are more thoroughly discussed in the papers outlining the theoretical development Knuth (2003, 2006). Central to this development is the concept of a partially ordered set, which is a set of elements in conjunction with a binary ordering relation. Related is a special case of a partially ordered set called a lattice, which is endowed with a pair of operations called the join and the meet, so that the lattice can be thought of as an algebra where the join and meet are algebraic operators. Here we will consider elements that can be described in terms of sets, so that the main concepts can be described in terms of subsets ordered by subset inclusion, with the set union playing the role of the join and the set intersection playing the role of the meet.

We consider three spaces: the state space, the hypothesis space, and the inquiry space.

The state space describes the possible states of the system itself. In the situations we will consider, the elements of the state space are mutually exclusive so that in terms of a partially ordered set, they can be represented as an antichain.

The hypothesis space describes what can be known about a system. Its elements are sets of potential states of the system. As such, it is a Boolean lattice (or a Boolean algebra) constructed by taking the power set of the set of states and ordering its elements according to set inclusion. In this space, the logical OR operation is implemented by set union (join) and the logical AND operation by set intersection (meet). Logical deduction is straightforward in this framework, since implication is implemented by subset inclusion, so that a statement implies every statement that includes it as a subset. General logical induction is implemented with a real-valued bi-valuation, the probability, which quantifies the degree to which one statement implies another.

The inquiry space describes what can be asked about a system. Its elements are sets of statements, which are called questions, such that if a set contains a given statement, then it also contains all the statements that imply it. In this sense, a question can be thought of as a set of potential statements that can be made. It is constructed by taking down-sets of statements and ordering them by set inclusion, resulting in a free distributive lattice. Just as some statements imply other statements, some questions answer other questions. Specifically, if question $Q_1$ is a subset of question $Q_2$, $Q_1 \subseteq Q_2$, then by answering question $Q_1$ we will have necessarily answered question $Q_2$.
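To make the construction concrete, the following short sketch (an illustrative toy of our own, not code from the original system) represents statements as sets of mutually exclusive system states, builds questions as down-sets of statements, and checks the answering relation by subset inclusion:

```python
from itertools import combinations

# Toy example: three mutually exclusive system states.
STATES = ("s1", "s2", "s3")

def statements(states):
    """All non-trivial statements about the system: non-empty subsets of the states."""
    return [frozenset(c) for r in range(1, len(states) + 1)
            for c in combinations(states, r)]

def down_set(statement):
    """All statements that imply the given statement, i.e. its non-empty subsets."""
    return set(statements(tuple(statement)))

def question(*top_statements):
    """A question is a down-set of statements: if it contains a statement,
    it also contains every statement that implies it."""
    q = set()
    for a in top_statements:
        q |= down_set(a)
    return q

# The central issue: every atomic statement is a potential answer.
I = question(*[frozenset({s}) for s in STATES])

# The binary partition question 'Is the system in state s1 or not?'
Q1 = question(frozenset({"s1"}), frozenset({"s2", "s3"}))

# Some questions answer others: if Q is a subset of Q', answering Q answers Q'.
print(I <= Q1)   # True  -- answering the central issue answers Q1
print(Q1 <= I)   # False -- answering Q1 does not identify the exact state
```

Here the central issue is the smallest question containing every atomic statement as a potential answer, so it is a subset of, and therefore answers, every other real question.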

Questions that include all atomic statements as potential answers are assured to be answerable by a true statement. Cox termed such questions real questions Cox (1979). If one considers the sub-space formed from the real questions, the minimal real question is defined as the central issue,

$$I \equiv A_1 \vee A_2 \vee \cdots \vee A_N, \qquad (1)$$

where $A_i \equiv \downarrow a_i$ is the question generated by taking the down-set of the statement $a_i$, and $a_i$ is the statement ‘The system is in state $s_i$’. The central issue can then be expressed as the question ‘Is the system in state $s_1$ or in state $s_2$ … or in state $s_N$?’ Since it is the minimal real question, answering the central issue will necessarily answer all other real questions.

In practice, however, we cannot always pose the central issue directly. A special class of real questions is the partition questions, which partition the set of answers. For example, given a set of $N$ atomic statements indexed by integers $1$ through $N$, we can partition this set in $B_N$ ways, where the Bell number $B_N$ can be defined in terms of a generating function

$$\sum_{n=0}^{\infty} \frac{B_n}{n!}\, x^n = e^{\,e^x - 1}, \qquad (2)$$

which blows up rapidly. For example, for three atomic statements we have the set $\{a_1, a_2, a_3\}$, which can be partitioned as $\{\{a_1\}, \{a_2\}, \{a_3\}\}$, resulting in the central issue

$$I = A_1 \vee A_2 \vee A_3. \qquad (3)$$

Another possible partitioning is $\{\{a_1\}, \{a_2, a_3\}\}$, which represents the binary question ‘Is the system in state $s_1$ or not?’ denoted

$$Q_1 = A_1 \vee A_{23}, \qquad (4)$$

where $A_1 = \downarrow a_1$ and $A_{23} = \downarrow(a_2 \vee a_3)$. In this way, by answering $a_2$, $a_3$, or $a_2 \vee a_3$, one has provided the information that the system is not in state $s_1$. Other partition questions are written similarly.
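To give a sense of how quickly the number of partition questions grows, the following short snippet (our own illustration) computes the Bell numbers $B_N$ using the standard Bell-triangle recurrence:

```python
def bell_numbers(n_max):
    """Return [B_0, B_1, ..., B_{n_max}] using the Bell triangle recurrence."""
    bells = [1]              # B_0 = 1
    row = [1]
    for _ in range(n_max):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for entry in row:
            new_row.append(new_row[-1] + entry)
        bells.append(new_row[0])
        row = new_row
    return bells

print(bell_numbers(10))  # [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
```

For $N = 3$ there are only $B_3 = 5$ partition questions, but by $N = 10$ there are already 115,975.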

Valuations are handled in a way that is analogous to probability in the lattice of statements comprising the hypothesis space. However, due to multiple competing constraints, a bi-valuation can only be consistently assigned to the partition sublattice of the real questions. This bi-valuation is called the relevance, and is denoted $d(Q \mid Q')$, which is read as ‘the degree to which $Q$ answers $Q'$’ Knuth (2006). In the special case where $Q \subseteq Q'$, so that answering $Q$ necessarily answers $Q'$, the relevance is maximal, which enables one to choose a grade so that $d(Q \mid Q') = 1$. Otherwise, the relevance takes on a value between $0$ and $1$.

The relevance of a partition question depends on the probability of its particular partition of answers. One can show that the relevance of a partition question with respect to the central issue is given by the entropy of that partition of probabilities Knuth (2006). In the case of the partition question described in (4) we have that

$$d(Q_1 \mid I) \;\propto\; H(p_1,\, p_2 + p_3) = -\,p_1 \log p_1 - (p_2 + p_3)\log(p_2 + p_3), \qquad (5)$$

where $H$ is the Shannon entropy and $p_i$ is the probability that the system is in state $s_i$. The proportionality constant is the inverse of the entropy of the central issue, chosen so that the relevance of the central issue with respect to itself is unity,

$$K = \frac{1}{H(p_1, p_2, p_3)}, \qquad (6)$$

so that

$$d(I \mid I) = K\, H(p_1, p_2, p_3) = 1 \qquad (7)$$

and

$$d(Q_1 \mid I) = \frac{H(p_1,\, p_2 + p_3)}{H(p_1, p_2, p_3)}. \qquad (8)$$
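As a numerical illustration of Eqs. (5)–(8), the following sketch (our own example with made-up state probabilities) computes the relevance of the binary question $Q_1$ as the ratio of the partition entropy to the entropy of the central issue:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Assumed (made-up) state probabilities p_i = p('the system is in state s_i').
p = [0.5, 0.3, 0.2]

H_I  = shannon_entropy(p)                    # entropy of the central issue, Eq. (6) normalization
H_Q1 = shannon_entropy([p[0], p[1] + p[2]])  # entropy of the partition {a1}, {a2 or a3}, Eq. (5)

relevance = H_Q1 / H_I                       # Eq. (8): a value between 0 and 1
print(f"H(I) = {H_I:.3f} bits, H(Q1) = {H_Q1:.3f} bits, d(Q1|I) = {relevance:.3f}")
```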

Automated Experimental Design

Previously, we demonstrated a robotic arm, built with the LEGO MINDSTORMS NXT system, capable of autonomously locating and characterizing a white circle on a dark background Knuth et al. (2007); Knuth and Center (2010). Here we extend this problem by introducing two robots that work collaboratively to solve the same problem.

Computing with Questions

The white circle is characterized by three unknown parameters: the center position $(x_o, y_o)$ and the radius $r_o$. We are interested in asking questions about the center position of the circle as well as its radius by taking light intensity measurements centered at locations determined by the inquiry system. Model-based descriptions enable one to make predictions about the outcomes of potential experiments. Given the joint posterior probability of the circle location and radius, one can determine the probability that a given intensity measurement at a position $\vec{e}$ will result in a “white” or “black” intensity reading. This is easily done with sampling, by maintaining a set of sampled circles and noting how many circles contain the proposed measurement location and would result in a white intensity reading, and how many circles do not contain the measurement location and would result in a black intensity reading. Such predictions can be made more precise by modeling the spatial sensitivity of the light sensor and computing the predicted numerical result of the sensor given the measurement location and the hypothesized characteristics of the white circle Malakar et al. (2009).

Furthermore, the entropy associated with such a measurement can be computed as the entropy of the probability distribution of predicted measurement intensities. This can be rapidly computed by generating a set of predicted measurements from the set of circles sampled from the posterior. By generating a histogram of this set of predicted intensities, one has a model of the density function of predicted measurements. The entropy of this histogram is computed and serves as an excellent estimate of the entropy associated with the question posed by recording the intensity at a particular measurement location. By computing the entropy associated with a large set of measurement locations, one can create an entropy map based on the sampled circles and the known characteristics of the light sensor. For increased speed, we have also developed an entropy-based search algorithm to intelligently search the entropy space without computing it everywhere Malakar and Knuth (2011b).
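The following is a minimal sketch of such an entropy-map computation (our own simplified stand-in: the posterior samples are faked with a Gaussian, the sensor is an idealized point sensor rather than the spatially sensitive sensor of Malakar et al. (2009), and the grid sizes and function names are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for samples drawn from the joint posterior over (center x, center y, radius).
samples = np.column_stack([rng.normal(50, 5, 45),
                           rng.normal(50, 5, 45),
                           rng.normal(15, 2, 45)])

def predicted_intensity(x, y, circle):
    """Idealized point sensor: white (1.0) inside the hypothesized circle, black (0.0) outside."""
    xc, yc, r = circle
    return 1.0 if (x - xc) ** 2 + (y - yc) ** 2 <= r ** 2 else 0.0

def entropy_of_predictions(x, y, samples, bins=10):
    """Entropy (bits) of the histogram of predicted intensities at location (x, y)."""
    preds = [predicted_intensity(x, y, c) for c in samples]
    counts, _ = np.histogram(preds, bins=bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

# Entropy map over a grid of candidate measurement locations.
xs = ys = np.arange(0, 100, 2)
entropy_map = np.array([[entropy_of_predictions(x, y, samples) for x in xs] for y in ys])

# The most informative single measurement location.
iy, ix = np.unravel_index(np.argmax(entropy_map), entropy_map.shape)
print("Best single measurement location:", (xs[ix], ys[iy]))
```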

We begin by encoding the questions one might ask in terms of sets of circle parameters. The central issue considers all possible circle parameter values, and in doing so asks the question “Precisely where is the circle?” In practice, this is a finite set, since one can only measure to finite precision, and in the simulations we force it to be finite by considering a discrete grid of possible circle center positions and radii. The central issue can be written as

$$I = C_1 \vee C_2 \vee \cdots \vee C_N, \qquad (9)$$

where $C_i = \downarrow c_i$ and each statement $c_i$, asserting ‘The circle has center $(x_i, y_i)$ and radius $r_i$’, represents a potential precise answer to the question. One way to solve this problem is to simply ask all of the binary questions ‘Is the circle in state $(x_i, y_i, r_i)$?’ However, this is not very efficient. Moreover, faced with measurement uncertainties, we do not know the exact answer to Eq. (9), as we cannot measure the exact values of the parameters of interest.

We cannot directly perform a single measurement, or even a small number of measurements, that answers the central issue. Instead, we must identify measurements that can be performed that are maximally relevant to the central issue. This involves finding measurement locations that have the maximum entropy as computed from the posterior probability of the circle states.

We note that any given measurement location $\vec{e}_1$ divides the space of circles into two regions: the set of circles that contain the measurement location, and the set of circles that do not contain the measurement location,

$$W_1 = \{\,c : \vec{e}_1 \text{ lies inside circle } c\,\}, \qquad B_1 = \{\,c : \vec{e}_1 \text{ lies outside circle } c\,\}. \qquad (10)$$

Similarly, a second robot choosing a different measurement location, $\vec{e}_2$, partitions the space of circles differently into two sets, $W_2$ and $B_2$, defining a different binary partition.

Jointly, the two distinct measurement locations partition the space of circles into four regions, say $WW$, $WB$, $BW$, and $BB$:

$$WW = W_1 \cap W_2, \qquad WB = W_1 \cap B_2, \qquad BW = B_1 \cap W_2, \qquad BB = B_1 \cap B_2, \qquad (11)$$

where, for example, $WB$ refers to the set of circles that contain $\vec{e}_1$, so that a measurement there will be predicted to result in a white intensity, but do not contain $\vec{e}_2$, so that a measurement there will be predicted to result in a black intensity. The circles where the first robot measures white belong to the set $W_1 = WW \cup WB$. We can then define the elementary questions as

$$A_{WW} = \downarrow WW, \qquad A_{WB} = \downarrow WB, \qquad A_{BW} = \downarrow BW, \qquad A_{BB} = \downarrow BB, \qquad (12)$$

and write the question that the first robot poses as

$$Q_1 = A_{W\ast} \vee A_{B\ast} \qquad (13)$$

and the question the second robot poses as

$$Q_2 = A_{\ast W} \vee A_{\ast B}, \qquad (14)$$

where the expressions on the right illustrate what the robots are measuring, with $\ast$ signifying either black or white; for example, $A_{W\ast}$ is the question generated by the statement that the first robot’s measurement results in a white intensity, whatever the second robot measures.

Jointly the robots partition the space into four sets,

$$Q_1 \wedge Q_2 = A_{WW} \vee A_{WB} \vee A_{BW} \vee A_{BB}. \qquad (15)$$

Therefore, the relevance of the joint question with respect to the central issue is given by the joint entropy of the predictions of the two measurements $\vec{e}_1$ and $\vec{e}_2$ Malakar and Knuth (2011a),

$$d(Q_1 \wedge Q_2 \mid I) \;\propto\; H(\vec{e}_1, \vec{e}_2) = -\,p_{WW}\log p_{WW} - p_{WB}\log p_{WB} - p_{BW}\log p_{BW} - p_{BB}\log p_{BB}, \qquad (16)$$

where, given that robot 1 measures at $\vec{e}_1$ and robot 2 measures at $\vec{e}_2$, $p_{WW}$ denotes the probability that both the first and the second robot’s measurement locations result in a white intensity, $p_{WB}$ denotes the probability that the first measurement results in white and the second in black, $p_{BW}$ denotes the probability that the first measurement results in black and the second in white, and $p_{BB}$ denotes the probability that both the first and the second measurements result in black. Considered jointly, the predicted measurement results associated with the pair of measurement locations constitute a two-dimensional distribution at each point in the four-dimensional space of pairs of measurement locations. The relevance dictates that we select measurement locations that maximize the joint entropy of the intensities predicted to be measured by the two robots.
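A self-contained sketch of this selection rule (again our own simplified illustration, with an idealized point sensor and invented posterior samples) chooses the first location by maximizing the single-measurement entropy and the second by maximizing the joint entropy of Eq. (16) with the first held fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior samples of circle parameters (center x, center y, radius).
samples = np.column_stack([rng.normal(50, 5, 45),
                           rng.normal(50, 5, 45),
                           rng.normal(15, 2, 45)])

def is_white(loc, circle):
    """True if the measurement location lies inside the hypothesized circle."""
    (x, y), (xc, yc, r) = loc, circle
    return (x - xc) ** 2 + (y - yc) ** 2 <= r ** 2

def entropy(loc, samples):
    """Entropy (bits) of the predicted white/black reading at a single location."""
    p = float(np.mean([is_white(loc, c) for c in samples]))
    return 0.0 if p in (0.0, 1.0) else -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def joint_entropy(loc1, loc2, samples):
    """Joint entropy (bits) of the white/black predictions at two locations,
    estimated by counting sampled circles in the four regions WW, WB, BW, BB."""
    counts = {}
    for c in samples:
        key = ("W" if is_white(loc1, c) else "B") + ("W" if is_white(loc2, c) else "B")
        counts[key] = counts.get(key, 0) + 1
    probs = np.array(list(counts.values()), dtype=float) / len(samples)
    return float(-(probs * np.log2(probs)).sum())

# Candidate grid of measurement locations.
grid = [(x, y) for x in range(0, 100, 4) for y in range(0, 100, 4)]

# Robot 1: the single most informative location.
e1 = max(grid, key=lambda loc: entropy(loc, samples))

# Robot 2: with e1 fixed, maximize the joint entropy of the pair, Eq. (16).
e2 = max(grid, key=lambda loc: joint_entropy(e1, loc, samples))
print("Selected measurement locations:", e1, e2)
```

Because the joint entropy equals the sum of the individual entropies minus their mutual information, this choice favors a second location that is informative on its own while sharing little information with the first.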

Results

In the present case of model-based exploration, given a hypothesized circle location and radius, the intensity to be measured at any point in the field can be predicted. By considering 45 posterior samples, we made predictions about the intensities, which gave a distribution of 45 predicted intensities at each location. The entropy associated with each possible measurement location was computed by estimating the entropy of the histogram of predicted intensities at that position in the field. This enables us to produce an entropy map for a single proposed measurement. Joint entropy maps would require four dimensions to display. Instead, we plot the joint entropy of the two measurements for the case where the first experiment $E_1$ is already determined. This map then represents a two-dimensional slice through the four-dimensional space of pairs of measurements. The mutual information maps (not shown) can be made similarly.

Figure 1: Two sets of examples in which we implement information-based collaboration for experimental design in the problem where two robots are to characterize a circle using light sensors. Figures (a) and (b) show a case where the sampled circles are highly correlated, whereas figures (c) and (d) show a case where the circles are less correlated. In both cases, we have drawn a set of circles from the posterior samples and used these circles to make predictions about the expected measured light intensity at each point. The top figures (a and c) show the entropy map, which illustrates the optimal measurement location in the case where only one measurement is to be taken. The bottom figures (b and d) illustrate the joint entropy map for the second measurement location $\vec{e}_2$ with the first measurement location $\vec{e}_1$ fixed. Note that the selected location of the second measurement maximizes the joint entropy, which involves finding an informative measurement location that does not provide information redundant with that provided by the first.

Figure 1 shows two examples, in which we considered different degrees of overlap of the sampled circles. The figures in the left column represent the case with more correlated circles than those on the right. Each of the figures shows a set of circles drawn from the posterior probability. Overlaid on these are the entropy maps (a and c) and the joint entropy maps (b and d).

The entropy maps (Figures 1a and 1c) show the measurement location that would be selected in the event that only one measurement was being performed. The joint entropy maps (Figures 1b and 1d) show the locations of the second measurement that maximize the relevance of the joint question given that the location of the first measurement, $\vec{e}_1$, has been selected. By comparing the locations of $\vec{e}_2$ in the joint entropy map with the corresponding values of entropy and mutual information, one can see that the selected measurement locations for the second experiment favor regions of high entropy while avoiding locations that share mutual information with the first. Maximizing the joint relevance naturally chooses informative measurement locations that promise to provide independent information. The two measurement locations $\vec{e}_1$ and $\vec{e}_2$ that maximize the relevance of the joint question are indicated by arrows.

Conclusions and Future Applications

In this paper, we have presented a method of information-based collaboration for autonomous intelligent instrument systems (AIIS). We have considered the intelligent agent as a question-asking machine and have focused on the inquiry phase, where our aim has been to select maximally informative queries with respect to a given goal. We have extended the order-theoretic approach Knuth (2003, 2006) to assign the relevance of questions for collaborative AIIS. We have shown that the joint entropy gives the relevance of the joint question posed by the agents. Maximum joint entropy is an important principle of information-based collaboration, which enables intelligent agents to efficiently learn together.

Currently, our team at UTD is working to develop a fleet of aircraft to deploy in the field using the collaboration technique developed in this paper. The fleet consists of helicopters as well as small fixed-wing aircraft. We aim to use the fleet to help characterize and predict tornadoes, assist with gas leak detection, and monitor the health of cattle. This work is in progress.

References

  • Fedorov (1972) V. V. Fedorov, Theory of optimal experiments, Probability and mathematical statistics, Academic Press, 1972.
  • Lindley (1956) D. V. Lindley, The Annals of Mathematical Statistics 27, 986–1005 (1956).
  • Bernardo (1979) J. M. Bernardo, The Annals of Statistics 7, 686–690 (1979).
  • Loredo and Chernoff (2003) T. J. Loredo, and D. F. Chernoff, “Bayesian Adaptive Exploration,” in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by G. Erickson, and Y. Zhai, AIP, 2003, pp. 330–346.
  • Cox (1979) R. T. Cox, “Of inference and inquiry, an essay in inductive logic,” in The Maximum Entropy Formalism, edited by R. D. Levine, and M. Tribus, The MIT Press, Cambridge, 1979, pp. 119–167.
  • Fry (2002) R. L. Fry, “The engineering of cybernetic systems,” in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by R. L. Fry, AIP, 2002, vol. 617, pp. 497–528.
  • Knuth (2003) K. H. Knuth, “What is a question?,” in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by C. Williams, AIP, 2003, pp. 227–242.
  • Knuth (2006) K. H. Knuth, “Valuations on lattices and their application to information theory,” in Proceedings of the 2006 IEEE World Congress on Computational Intelligence (IEEE WCCI), 2006.
  • Knuth et al. (2007) K. H. Knuth, P. M. Erner, and S. Frasso, “Designing Intelligent Instruments,” in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by K. Knuth, A. Caticha, J. Center, A. Giffin, and C. Rodriguez, AIP, 2007, vol. 954, pp. 203–211.
  • Knuth and Center (2010) K. H. Knuth, and J. L. Center, “Autonomous science platforms and question-asking machines,” in 2nd International Workshop on Cognitive Information Processing (CIP), 2010, pp. 221 –226, ISSN 2150-4938.
  • Malakar and Knuth (2011a) N. K. Malakar, and K. H. Knuth (2011a), in preparation.
  • Malakar et al. (2009) N. K. Malakar, A. J. Mesiti, and K. H. Knuth, “The spatial sensitivity function of a light sensor,” in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by P. Goggans, and C.-Y. Chan, AIP, New York, 2009, pp. 352–359.
  • Malakar and Knuth (2011b) N. K. Malakar, and K. H. Knuth, “Entropy-Based Search Algorithm for Experimental Design,” in Bayesian Inference and Maximum Entropy Methods in Science and Engineering, edited by A. M. Djafari, J. F. Bercher, and P. Bessiére, AIP, 2011b, vol. 1305, pp. 157–164.