
The Value of Interaction in Data Intelligence

In human computer interaction (HCI), it is common to evaluate the value of HCI designs, techniques, devices, and systems in terms of their benefit to users. It is less common to discuss the benefit of HCI to computers. Every HCI task allows a computer to receive some data from the user. In many situations, the data received by the computer embodies human knowledge and intelligence in handling complex problems, and/or some critical information without which the computer cannot proceed. In this paper, we present an information-theoretic framework for quantifying the knowledge received by the computer from its users via HCI. We apply information-theoretic measures to some common HCI tasks as well as HCI tasks in complex data intelligence processes. We formalize the methods for estimating such quantities analytically and measuring them empirically. Using theoretical reasoning, we can confirm the significant but often undervalued role of HCI in data intelligence workflows.



1. Introduction

Data intelligence is an encompassing term for processes that transform data into decisions or knowledge, such as statistical inference, algorithmic analysis, data visualization, machine learning, business intelligence, numerical modelling, computational simulation, prediction, and decision making. While many of these processes are propelled by the desire for automation, human-computer interaction (HCI) has played and still plays valuable roles in almost all nontrivial data intelligence workflows. However, the benefits of HCI in a data intelligence workflow are often much more difficult to measure and quantify than its costs and disadvantages. This inevitably leads to a more fervent drive for replacing humans with machines in data intelligence.

In information theory, the Data Processing Inequality (DPI) is a proven theorem. It states that fully automated post-processing of data can only lose but not increase information. As Cover and Thomas explained (Cover and Thomas, 2006), “No clever manipulation of data can improve the inferences that can be made from the data.” In most data intelligence workflows, since the original data space contains much more variation (in terms of entropy) than the decision space, the loss of information is not only inevitable but can also be very significant (Chen and Golan, 2016).

In the context of data visualization, Chen and Jänicke first pointed out that HCI alleviates the undesirable bottleneck of DPI because the mathematical conditions for proving DPI are no longer satisfied in the presence of HCI. As illustrated in Figure 1(a), the proof of DPI assumes that (i) each process in a workflow must receive data only from its preceding process, and (ii) the output of a process must depend only on the output of its preceding process. As illustrated in Figure 1(b), any human inputs based on human knowledge (e.g., the variation of context and task) violate the first condition. Meanwhile, any human inputs based on observing the previous processes in the workflow (e.g., the details being filtered out or aggregated) violate both conditions. Therefore, if we can quantitatively estimate or measure the amount of information passing from humans to the otherwise automated processes, we can better appreciate the value of interaction in data intelligence.

In this paper, we present an information-theoretic framework for measuring the knowledge received by a computational process from human users via HCI. It includes several fundamental measures that can be applied to a wide range of HCI modalities, as well as the recently proposed cost-benefit metric for analyzing data intelligence workflows (Chen and Golan, 2016). We describe the general method for estimating the amount of human knowledge delivered using HCI. We outline the general design for an empirical study to detect and measure human knowledge used in data intelligence. With these theoretical contributions, we can explore the value of HCI from the perspective of assisting computers, which differs from the commonly adopted focus on assisting human users.

2. Related Work

In the field of HCI, the term “value” has several commonly-used referents, including (a) worth in usefulness, utility, benefit, merit, or importance, (b) monetary, material, developmental, or operational cost, (c) a quantity that can be measured, estimated, calculated, or computed, and (d) a principle or standard in the context of morals or ethics. In this paper, we examine the value of HCI processes primarily in terms of (a) and (c), with some supplemental discussion of (b). Readers who are interested in (d) may consult other works in the literature (e.g., (Shilton, 2018; Giaccardi, 2011; Smith et al., 2014; Rotondo and Freier, 2010)).

Most research effort in HCI has been devoted to bringing about usefulness and benefits to humans. The goals of HCI and the criteria for good HCI are typically expressed as “support people so that they can carry out their activities productively and safely” (Preece et al., 1994); “effective to use, efficient to use, safe to use, having good quality, easy to learn, easy to remember how to use” (Preece et al., 2015); “time to learn, speed of performance, rate of errors by users, retention over time, subjective satisfaction” (Shneiderman et al., 2010); and “useful, usable, used” (Dix et al., 2003). In contrast, it is less common to discuss the usefulness and benefits of HCI to computers. While there is little doubt that the ultimate goal is for computers to assist humans, it will be helpful to understand and measure how much a computer needs to be assisted by its users before becoming useful, usable, and used. It is hence desirable to complement the existing discourses on value-centered designs (e.g., (Friedman, 1996; Cockton, 2004; Bias and Mayhew, 2005; Light et al., 2005; Rotondo and Freier, 2010)) by examining the value of HCI from the perspective of computers.

It is also feasible to develop quantitative methods for measuring and estimating how much a computer needs to know, since we can investigate the inner “mind” of a computer program more easily than that of human users. The field of HCI has benefited from a wide range of quantitative and qualitative research methods (Helander et al., 1997; Diaper and Stanton, 2003; Cairns and Cox, 2008; Lazar et al., 2010; Purchase, 2012; Jacko, 2012; MacKenzie, 2013; Oulasvirta et al., 2018; Norman and Kirakowski, 2018), including quantitative methods such as formal methods, statistical analysis, cognitive modelling, and so on. This work explores the application of information theory (Shannon, 1948; Cover and Thomas, 2006) in HCI.

Claude Shannon’s landmark article in 1948 (Shannon, 1948) signifies the birth of information theory, which has since underpinned the fields of data communication, compression, and encryption. Its applications include physics, biology, neurology, psychology, and computer science (e.g., visualization, computer graphics, computer vision, data mining, and machine learning). The cost-benefit metric used in this work was proposed in 2016 (Chen and Golan, 2016) in the context of visualization and visual analytics. It has been used to prove mathematically the correctness of a major wisdom in HCI, “Overview first, zoom and filter, then details-on-demand” (Shneiderman, 1996; Chen and Jänicke, 2010; Chen et al., 2016), and to analyze the cost-benefit of different virtual reality applications (Chen et al., 2019). Two recent papers have shown that the metric can be estimated in practical applications (Tam et al., 2017) and measured using empirical studies (Kijmongkolchai et al., 2017). While information theory has been applied successfully to visualization (i.e., interaction from computers to humans) (Chen and Jänicke, 2010; Chen et al., 2016), this work focuses on the other direction of HCI (i.e., from humans to computers).

3. Fundamental Measures

From every human action upon a user interface or an HCI device, a computer receives some data, which typically encodes information that a running process on the computer wishes to know and cannot proceed without. Through such interactions, humans transfer their knowledge to computers. In some cases, the computers learn and retain part of such knowledge (e.g., preference settings and annotation for machine learning). In many other cases, the computers ask blithely for the same or similar information again and again.

In HCI, we all appreciate that measuring the usefulness or benefits of HCI to humans is not a trivial undertaking. In comparison, the amount of knowledge received by a computer from human users via HCI can be measured relatively easily. Under an information-theoretic framework, we can first define several fundamental measures about the information that a computer receives from an input action. We can then use these measures to compare different types of interaction mechanisms, e.g., in terms of the capacity and efficiency for a computer to receive knowledge from users.

3.1. Alphabet and Letter

When a running process on a computer pauses to expect an input from the user, or a thread of the process continually samples the states of an input device, all possible input values that can be expected or sampled are valid values of a univariate or multivariate variable. In information theory, this mechanism can be considered in abstraction as a communication channel from the user to the computational process. The variable is referred to as an alphabet, A, and its possible values are its letters, a_1, a_2, …, a_n.

In a given context (e.g., all uses of an HCI facility), each letter, a_i, is associated with a probability of occurrence, p(a_i). Before the process receives an input, it is unsure about which letter will arrive from the channel. The amount of uncertainty is typically measured with the Shannon entropy (Shannon, 1948) of the alphabet:

H(A) = −∑_{i=1}^{n} p(a_i) log₂ p(a_i)   (1)


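Eq. (1) can be sketched in a few lines of Python; for example, a uniform four-letter alphabet carries exactly 2 bits of uncertainty:

```python
from math import log2

def shannon_entropy(probabilities):
    """Shannon entropy H(A), in bits, of an alphabet's probability distribution (Eq. 1)."""
    return sum(p * log2(1.0 / p) for p in probabilities if p > 0)

# A four-letter alphabet with equal probabilities carries exactly 2 bits of uncertainty.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```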
We can consider alphabets broadly from two different perspectives, the input device and the input action, which are detailed in the following two subsections.

3.2. Input Device Alphabet

An input device alphabet enumerates all possible states of a physical input device, which can be captured by a computational process or thread through sampling. Such devices include keyboard, mouse, touch screen, joystick, game controller, VR glove, camera (e.g., for gestures), microphone (e.g., for voices), and many more. Most of these devices feature multivariate states, each of which is a letter in the input device alphabet corresponding to the device.

For example, the instantaneous state of a simple 2D pointing device may record four values: its current x- and y-coordinates relative to a reference point, and the activation status of its left and right buttons. The instantaneous state of a conventional keyboard may consist of 80-120 variables, one for each of its keys, assuming that simultaneous key activations are all recorded before being sequentialized and mapped to one or more key inputs in an input action alphabet (to be discussed in the next subsection).

The design of an input device may involve many human and hardware factors. Among them, the frequency or probability, in which a variable (e.g., a key, a button, a sensor, etc.) changes its status, is a major design consideration. Hence, this particular design consideration is mathematically underpinned by the measurement of Shannon entropy. The common wisdom is to assign a lower operational cost (e.g., speed and convenience) to a variable that is more frequently changed (e.g., a more frequently-used button). This is conceptually similar to entropic coding schemes such as Huffman encoding in data communication (Huffman, 1952).

However, the sampling mechanism for an input device usually assumes an equal probability for all of its variables. From the perspective of the device, a variable may change at any moment, and all letters (i.e., states) in the alphabet have the same probability. This represents the maximal uncertainty about the next state of the input device, as well as the maximal amount of information that the device can deliver. For an input device alphabet A_dev with n letters, each having a probability of 1/n, this maximal quantity is the maximum of the Shannon entropy in Eq. (1), i.e., log₂ n. We call this quantity the instantaneous device capacity and denote it as C_dev.

For example, for a simple 2D mouse with 2 on-off buttons, operating in conjunction with a display at a 1920×1080 resolution, its instantaneous device capacity is:

C_dev = log₂(1920 × 1080 × 2 × 2) ≈ 22.98 bits

While the notion of instantaneous device capacity may be useful for characterizing input devices with which a sampling process has to be triggered by a user’s action, it is not suited for input devices with a continuing sampling process (e.g., a video camera for gesture recognition). Hence a more general and useful quantity for characterizing all input devices is the maximal device capacity over a unit of time. We use seconds as the unit of time in the following discussions. Let r be the sampling rate, that is, the maximal number of samples that a process can receive from an input device within a second. Assuming that the instantaneous device capacity of the device is invariant for each sample, the bandwidth (cf. bandwidth in data communication) of the device is defined as:

B_dev = r × C_dev

Note: while the instantaneous device capacity C_dev is measured in bits, the bandwidth, B_dev, is measured in bits per second.

For example, if the sampling rate of the aforementioned mouse is 100 Hz, then its bandwidth is B_dev = 100 × 22.98 ≈ 2,298 bits/s. Similarly, consider a data glove with 7 sensors and a sampling rate of 200 Hz. If its five sensors for finger flexure have 180 valid values each, and its pitch and roll sensors have 360 valid values each, its bandwidth is:

B_glove = 200 × (5 log₂ 180 + 2 log₂ 360) ≈ 10,889 bits/s
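The two bandwidth calculations above can be reproduced with a short Python sketch (the device parameters are those assumed in the text):

```python
from math import log2

def instantaneous_capacity(state_counts):
    """log2 of the number of distinct device states, all states equally likely."""
    total = 1
    for n in state_counts:
        total *= n
    return log2(total)

def bandwidth(state_counts, sampling_rate_hz):
    """Maximal bits per second the device can deliver: B_dev = r * C_dev."""
    return sampling_rate_hz * instantaneous_capacity(state_counts)

# 2D mouse: 1920x1080 positions and two on-off buttons, sampled at 100 Hz.
mouse = [1920 * 1080, 2, 2]
print(round(instantaneous_capacity(mouse), 2))  # 22.98 bits
print(round(bandwidth(mouse, 100), 1))          # 2298.4 bits/s

# Data glove: five flexure sensors (180 values each), pitch and roll (360 each), 200 Hz.
glove = [180] * 5 + [360, 360]
print(round(bandwidth(glove, 200), 1))          # 10888.6 bits/s
```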

3.3. Input Action Alphabet

An input action alphabet enumerates all possible actions that a user can perform for a specific HCI task in order to yield an input meaningful to the computer. Here the phrase “a specific HCI task” stipulates the condition in which the user is aware of what the computer wants to know, e.g., through a textual instruction or a visual cue on the screen or through context awareness based on previous experience or acquired knowledge. The phrase “meaningful to the computer” stipulates the condition in which an action that the computer is not programmed to handle for the specific HCI task should not be included in the input action alphabet.

Consider a simple HCI task of making a selection out of n radio buttons. (Multiple choice buttons can also be included in this consideration.) Assume that selecting nothing is not meaningful to the computer. The corresponding input action alphabet is A = {a_1, a_2, …, a_n}. When each option is associated with a binary bit in an implementation, the letters in the alphabet can be encoded as a set of n-bit binary codewords: {100…0, 010…0, …, 000…1}. If all options are equally probable, the entropy of the alphabet is log₂ n. A selection action by the user thus allows the computer to remove log₂ n bits of uncertainty, or in other words, to gain log₂ n bits of knowledge from the user for this specific HCI task. We call this quantity the action capacity of the HCI task, and denote it as C_act.

(a) a 3-letter alphabet (b) a 1-letter alphabet
Figure 2. After selecting a channel from a TV listing, a TV set typically prompts a few options for a user to decide.

In practice, the radio buttons featured in many HCI tasks do not have the same probability of being selected. For example, as shown in Figure 2(a), after selecting a channel from a list of current shows, the TV displays an input action alphabet with three options, a_1: “More Event Info”, a_2: “Select Channel”, and a_3: “View HD Alternatives”. The probability of a_1 depends on several statistical factors, e.g., how informative the title in the list is, how many users prefer to explore a less-known program via investigational viewing versus how many prefer reading detailed information, and so on. The probability of a_3 depends on how often a user selects a non-HD channel from a TV listing with the intention to view the corresponding HD channel. Different probability distributions for the alphabet will lead to different amounts of knowledge H(A). The more skewed a distribution is, the less the knowledge is worth, or the less action capacity the HCI task has.

When the probability of a letter in an alphabet becomes 1, the alphabet no longer has any uncertainty. As shown in Figure 2(b), if a TV offers only one optional answer, the action capacity of the corresponding alphabet, C_act, is 0 bits. We will revisit this example in Section 5.
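To make the effect of skew concrete, here is a small Python sketch over the three-option TV alphabet; the three distributions are hypothetical, chosen only to illustrate the trend:

```python
from math import log2

def entropy(ps):
    """Shannon entropy in bits (Eq. 1)."""
    return sum(p * log2(1.0 / p) for p in ps if p > 0)

# Hypothetical distributions over the 3-option TV alphabet (illustrative only):
print(round(entropy([1/3, 1/3, 1/3]), 3))     # 1.585 bits: uniform, maximal uncertainty
print(round(entropy([0.8, 0.1, 0.1]), 3))     # 0.922 bits: skewed
print(round(entropy([0.98, 0.01, 0.01]), 3))  # 0.161 bits: highly skewed
print(entropy([1.0, 0.0, 0.0]))               # 0.0 bits: a 1-letter alphabet, as in Figure 2(b)
```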

Similarly, we can apply entropic analysis to check boxes. Consider an input action alphabet that consists of n check boxes, with 2^n possible combinations. The alphabet can be encoded using an n-bit code, b_1 b_2 … b_n, where each bit, b_i, indicates whether the corresponding checkbox is on or off. If all combinations have equal probability, the amount of knowledge that the computer can gain from the user is n bits, which is also the maximum entropy of the alphabet.

We now examine a more complicated type of input action. Consider an HCI task for drawing a freehand path using a 2D pointing device. Assume the following implementation constraints: (i) the computer can support a maximum of m sampling points for each path; (ii) the drawing canvas is a rectangular area of w × h pixels; (iii) the points along the path are sampled at a regular time interval, though the computer does not store the time of each sample; and (iv) the sampling commences with the first button-down event and terminates with the subsequent button-up event.

Let P be the alphabet of all possible paths that a user may draw using the 2D pointing device, and P_k be a subset of P consisting of all paths with k points (1 ≤ k ≤ m). The sub-alphabet P_k thus enumerates all possible paths in the form of p_1, p_2, …, p_k, where each point p_i is within the rectangular area. If it is possible to select any pixel in the rectangular area for every point input, the total number of possible paths with k points is (wh)^k, which is also the size of the sub-alphabet P_k.

For example, given a w × h rectangular area, the grand total number of possible paths is:

|P| = ∑_{k=1}^{m} (wh)^k

If all paths have an equal probability, the maximal amount of knowledge that the computer can gain from a user’s freehand drawing action is thus log₂(wh) bits when m = 1, or slightly more than m log₂(wh) bits when m > 1. In the example considered here, the maximal amount of knowledge, H_max(P), is slightly more than 360 bits, which is similar to the amount of knowledge that a computer would gain from an HCI action involving 2^360 radio buttons or 360 checkboxes.
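The counting argument above can be checked numerically. The 64×64-pixel canvas and 30-point limit below are hypothetical values, chosen so that the total lands slightly above 360 bits:

```python
from math import log2

def path_alphabet_bits(width, height, max_points):
    """log2 of the number of distinct paths with 1..max_points points, where each
    point may land on any pixel of a width x height canvas."""
    pixels = width * height
    total = sum(pixels ** k for k in range(1, max_points + 1))  # |P| = sum of |P_k|
    return log2(total)

# Hypothetical 64x64 canvas: each point contributes log2(4096) = 12 bits, so a
# 30-point limit yields slightly more than 30 * 12 = 360 bits.
print(path_alphabet_bits(64, 64, 30))
```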

Many data glove devices come with a built-in gesture recognition facility. The gestures that can be recognized by such a device are letters of an input action alphabet. For example, an alphabet may consist of 16 elementary gestures (1 fist, 1 flat hand, and 14 different combinations of finger pointing). The maximum entropy of this alphabet, i.e., the maximal amount of knowledge that can be gained, is log₂ 16 = 4 bits. If a system using the data glove can recognize a more advanced set of gestures, each of which comprises one or two elementary gestures, the advanced alphabet consists of 16 + 16 × 15 = 256 letters. (Note: repeating the same elementary gesture, e.g., “fist” + “fist”, is considered as one elementary gesture due to the ambiguity in recognition.) The maximum entropy is thus increased to 8 bits. When we begin to study the probability distributions of the elementary gestures and the composite gestures, this is very similar to Shannon’s study of English letters and their compositions in data communication (Shannon, 1951).
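A quick sketch confirming the sizes of the two gesture alphabets described above:

```python
from math import log2

elementary = 16  # 1 fist, 1 flat hand, 14 finger-pointing combinations
print(log2(elementary))  # 4.0 bits per elementary gesture

# A composite gesture is one elementary gesture, or an ordered pair of two
# *different* elementary gestures (a repeated gesture collapses into one).
advanced = elementary + elementary * (elementary - 1)
print(advanced)        # 256
print(log2(advanced))  # 8.0 bits
```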

(a) The two alphabets involved when a computer receives an input.

(b) The three abstract measures about a process.

(c) The two transformations from knowledge to action data.

Figure 3. Schematic illustrations of the fundamental components and measures in performing an HCI task.

3.4. Input Device Utilization

As illustrated in Figure 3(a), performing an HCI task involves two interrelated transformations: one is associated with an input device alphabet and the other with an input action alphabet; one characterizes the resources used by the HCI task, and the other characterizes the amount of knowledge delivered in relation to the HCI task. The level of utilization of an input device can thus be measured by:

DU = C_act / (t × B_dev)

where t is the time (in seconds) taken to perform the HCI task. In general, instead of using an accurate time for each particular HCI task, one can use an average time estimated for a specific category of HCI tasks.

Using the examples in the above two subsections, we can estimate the DU for HCI tasks using radio buttons, check boxes, freehand paths, and gestures. Consider a set of four radio buttons with a uniform probability distribution (C_act = 2 bits), a portion of a display screen, and a simple 2D mouse with 2 on-off buttons and a 100 Hz sampling rate. Assuming that the average input time is 2 seconds, we have:

DU_radio = 2 / (2 × B_dev)

Following on from the previous discussion on different probability distributions of an input action alphabet, we can easily observe that the more skewed the distribution, the lower the action capacity, and thereby the lower the DU.

If the same mouse and the same portion of the screen are used for a set of four check boxes with a uniform probability distribution (C_act = 4 bits), and we assume that the average input time is 4 seconds, we have:

DU_check = 4 / (4 × B_dev)

Consider that the same 100 Hz mouse and the same portion of the screen are used for drawing a freehand path, with a uniform probability distribution for all possible paths. We assume that, on average, a freehand path is drawn in 1 second, yielding 100 points along the path. The DU is thus:

DU_path = 100 log₂(wh) / (1 × 100 log₂(4wh))

where wh is the number of pixels in the drawing area and log₂(4wh) is the instantaneous capacity of the mouse over that area.

For gesture input using the aforementioned 200 Hz data glove, if an elementary gesture on average takes 2 seconds to be recognized with reasonable certainty, the DU is:

DU_glove = 4 / (2 × 10,889) ≈ 0.0002
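Under the stated assumptions (a 4-bit elementary-gesture alphabet, about 2 seconds per gesture, and the roughly 10,889 bits/s glove bandwidth estimated in Section 3.2), the DU for the glove can be computed as:

```python
from math import log2

def device_utilization(action_capacity_bits, task_time_s, bandwidth_bits_per_s):
    """DU = C_act / (t * B_dev): the fraction of the device's information
    capacity actually used to convey the user's input."""
    return action_capacity_bits / (task_time_s * bandwidth_bits_per_s)

# Data glove bandwidth as estimated in Section 3.2 (five flexure sensors with
# 180 values each, pitch and roll with 360 values each, sampled at 200 Hz).
glove_bandwidth = 200 * (5 * log2(180) + 2 * log2(360))
du = device_utilization(4, 2, glove_bandwidth)  # a 4-bit gesture in ~2 seconds
print(f"{du:.6f}")  # 0.000184
```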

Some HCI tasks require additional display space or other resources for providing users with additional information (e.g., multiple-choice questions) and some do not (e.g., keyboard shortcuts). In the former cases, the resources should normally be included in the consideration of the device bandwidth B_dev. At the same time, the varying nature of the information should be included in the consideration of the input action alphabet. For instance, for a set of 10 different yes-no questions, the corresponding alphabet actually consists of 20 letters rather than just “yes” and “no”. For more complicated additional information, such as different visualization images, we can extend the definitions in this work, e.g., by combining the (input) device utilization herein with the display space utilization defined in (Chen and Jänicke, 2010).

From the discussions in this section, we can observe that quantitative measurement allows us to compare the capacity and efficiency of different input devices and HCI tasks in a way that more or less correlates with our intuition in practice. However, there may also be an uncomfortable sense that device utilization is typically poor for many input devices and HCI tasks. One cannot help but wonder whether this would support an argument for having less HCI.

4. Cost-benefit of HCI

Chen and Jänicke raised a similar question about display space utilization (Chen and Jänicke, 2010) when they discovered that better utilization of the display space did not quantitatively correlate with better visual design. While they did identify the implication of visualization and interaction for the mathematical proof of DPI, as discussed in Section 1, the effectiveness and efficiency of visualization were not addressed until 2016, when Chen and Golan proposed their information-theoretic metric for analysing the cost-benefit of data intelligence processes (Chen and Golan, 2016). As HCI plays a valuable role in almost all nontrivial data intelligence workflows, we hereby use this metric to address the question about the cost-benefit of HCI.

The cost-benefit metric by Chen and Golan considers three abstract measures that summarize a variety of factors that may influence the effectiveness and efficiency of a data intelligence workflow or of individual machine- or human-centric processes in the workflow. The general formulation is:

Benefit / Cost = (Alphabet Compression − Potential Distortion) / Cost   (4)
Given an HCI process, which may represent the completion of an HCI task from start to finish, a micro-step during the execution of an HCI task, or a macro-session comprising several HCI tasks, the metric first considers the transformation from the input alphabet, A_in, to the output alphabet, A_out. As given in Eq. (4), this abstract measure is referred to as Alphabet Compression.

Consider a function, F_2: A_in → A_out, which consists of all actions from the point when a user starts executing an HCI task to the point when the computer stores the information about the input (in terms of the input action alphabet A_in) and is ready to forward this information to the subsequent computational processes. In information theory, such a function is often referred to as a transformation from one alphabet to another. Alphabet compression measures the entropic difference between the two alphabets, H(A_in) − H(A_out).

As discussed in Section 3, every HCI task is defined by an input action alphabet that captures the essence of what a computer would like to know. The computer is uncertain before the transformation and becomes certain after the transformation. The amount of uncertainty removed equals the action capacity C_act of the input action alphabet. At the end of the HCI task, once the computer has received an answer from the user, the subsequent alphabet A_out consists of only one letter. Therefore its entropy is 0, and in terms of Eq. (4), the alphabet compression is H(A_in) − 0 = C_act.

As illustrated in Figure 3(b), alphabet compression measures a quantity about the forward mapping from A_in to A_out. The more entropy is removed, the higher the alphabet compression, and hence the higher the benefit according to Eq. (4). If we did not have another measure to counterbalance alphabet compression, a computer that randomly chose a radio button or failed to recognize a gesture correctly would have no negative impact on the benefit of HCI. Therefore it is necessary to introduce the second abstract measure, Potential Distortion, which is mathematically defined by the Kullback-Leibler divergence (Kullback and Leibler, 1951).

In a less theoretical sense, potential distortion measures a quantity for the reverse mapping from A_out to A_in. We use A′_in to denote the alphabet resulting from this reverse mapping. A′_in has the same set of letters as A_in, but usually a different probability distribution. If a computer always detects and stores a user’s intended input correctly, the potential distortion is 0. A high value of potential distortion indicates a high level of inconsistency between A′_in and A_in. In information theory, this is the most common way of measuring errors. Readers who are interested in the mathematical definitions of alphabet compression and potential distortion may consult (Cover and Thomas, 2006; Chen and Golan, 2016) for further details.
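A minimal sketch of how potential distortion could be computed as a KL divergence; the two distributions standing in for A_in and A′_in are hypothetical:

```python
from math import log2

def kl_divergence(p, q):
    """D_KL(p || q) in bits, for two distributions over the same letters."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

intended = [0.7, 0.2, 0.1]  # hypothetical distribution of A_in
recorded = [0.7, 0.2, 0.1]  # perfect detection: A'_in matches A_in
print(kl_divergence(recorded, intended))  # 0.0

noisy = [0.5, 0.3, 0.2]     # occasional misrecognition shifts the distribution
print(round(kl_divergence(noisy, intended), 3))  # a positive distortion, in bits
```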

The third abstract measure is the Cost of the process, which should ideally be a measurement of the energy consumed by a machine- or human-centric process. In practice, this is normally approximated by time, a monetary quantity, or any other more obtainable measurement. For example, in HCI, we may use the average time, cognitive load, or skill level required for a user to perform an HCI task, or the computational time or monetary cost of the computational resources for recognizing a human action. In fact, if we use the device bandwidth multiplied by time as the cost, while assuming that the computer always detects and stores the user’s input correctly, the cost-benefit metric becomes the same as the measure of input device utilization, DU.

In fact, we have so far examined only the second transformation, F_2: A_in → A_out, of performing an HCI task. As depicted in Figure 3(c), there is a less tangible and often unnoticeable first step. Before a user considers an input action alphabet A_in, the user has to take in and reason about various information that may affect an HCI action. Collectively, all possible variations of the information that may be considered for an HCI task are letters in an alphabet, denoted as A_know in Figure 3(c). Hence the first step of “taking in and reasoning about” is, in abstraction, a transformation, F_1: A_know → A_in. As F_1 takes place in a user’s mind, it is often unnoticeable. Broadly speaking, F_1 may take in the following types of information.

Explicit Prompt. This includes any information that is purposely provided by a computer or a third party for the HCI task concerned, e.g., textual and visual prompts for radio buttons or check boxes, audio questions asked prior to voice-activated commands, instructions from a trainer or a user manual to a trainee, and so forth.

Situational Information. This includes any information provided by a computer or an environment. It is not specifically, but can be used, for the HCI task. This may include the texts or drawings that a user is currently working on when the user issues a “save as” command, and the current sound or lighting quality in a video conference when the user issues a command to switch on or off the video stream.

Soft Knowledge. This includes any information that resides in the user’s mind, which can be called upon to support the HCI task. Tam et al. (Tam et al., 2017) considered two main types of soft knowledge: soft alphabets and soft models. The former encompasses factual knowledge that is not available as explicit prompts or situational information, e.g., the knowledge about keyboard shortcuts, the knowledge about the reliability for the computer to recognize a gesture or voice. The latter encompasses analytical knowledge that can be used to derive information for the HCI task dynamically. For example, a user may assess the levels of risk associated with each radio button (or in general, each letter in ). While the levels of risk are letters of a soft alphabet, the alphabet exists only after a soft model has been executed.

Figure 4 shows an example of an HCI task. A user is editing a .tex file using a word processor (Microsoft Word) because it is easy to zoom in and out. After the user issues a “Ctrl-S” command, the computer displays a pop-up window of 734×140 pixels with a textual prompt. The input action alphabet A_in has three multiple-choice buttons. Hence the maximal benefit that can be brought by the transformation F_2 in Figure 3(c) for this case is log₂ 3 ≈ 1.58 bits.

Meanwhile, the word processor may display different explicit prompts following a “Ctrl-S” command according to, e.g., the file modification status, the existence of a file with the same name, access permissions, etc. A colleague may offer advice as to which button to choose. The display screen may show different situational information, e.g., different documents being edited, and different concurrent windows that may or may not be related to the file being processed. The user may have the soft knowledge that a .tex file is a plain text file, that the so-called “features” mentioned in the prompt cannot be processed by a LaTeX compiler, that the “help” button does not provide useful guidance for this particular way of using the word processor, and so on. As we can see that A_know is not a simple alphabet and has a non-trivial amount of entropy, we can conclude that the two transformations, F_1 and F_2, together bring about much more benefit than 1.58 bits.

Figure 4. A simple HCI task may be affected by three types of variables, which are collectively a very complex alphabet.

5. Estimating Cost-Benefit Analytically

The cost-benefit metric described in Section 4 provides HCI with a means for the probabilistic characterization of complex phenomena. While it can be calculated from data gathered about every letter in an alphabet in some simple or highly controlled settings (e.g., see Section 6), it is more practical to estimate the three measures in real-world applications, e.g., for comparing different user interface designs or evaluating HCI facilities in a data intelligence workflow. (In thermodynamics, the notion of entropy similarly provides a microscopic measure, reflecting the fundamental understanding of thermodynamic phenomena, yet it is typically estimated from macroscopic quantities such as temperature, volume, and pressure that are more easily measurable.)

Let us first exemplify the estimation method by revisiting the channel selection scenario in Figure 2 in Section 3. A coarse estimation can be made with the assumption that a user’s selection is always correct. In this case, the potential distortion in Eq. (4) is 0 bits, and the amount of alphabet compression thus equals the action capacity. Meanwhile, from a usability perspective, the cost can be estimated based on the time, effort, or cognitive load required for selecting each option. For example, in Figure 2(a), the top option is the default selection and requires only one [OK] action on the remote controller. The second option requires one navigation action followed by [OK], while the third option requires three actions: two navigation presses and [OK]. If each button action is estimated to take 2 seconds, the average cost for this HCI task is the probability-weighted sum of these times:

Cost = 2·p(a1) + 4·p(a2) + 6·p(a3) seconds.

Using the three example probability distributions for the input action alphabet in Section 3, we can obtain the corresponding average costs.

Combining these with the calculation of the action capacity in Section 3, we obtain the approximate cost-benefit ratios (in bits/s) for the three probability distributions. For the heavily skewed distribution, the cost-benefit is very low.
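The estimation above can be sketched in a few lines of Python; the three distributions below are illustrative stand-ins, as the actual distributions from Section 3 are not reproduced here.

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits, skipping zero-probability letters."""
    return -sum(q * log2(q) for q in p if q > 0)

# Button costs in seconds for the three options in Figure 2(a):
# [OK]; one navigation press + [OK]; two navigation presses + [OK],
# at roughly 2 seconds per button action.
costs = [2, 4, 6]

# Illustrative probability distributions over the input action alphabet.
distributions = {
    "uniform": [1/3, 1/3, 1/3],
    "mildly skewed": [0.6, 0.3, 0.1],
    "highly skewed": [0.98, 0.01, 0.01],
}

for name, p in distributions.items():
    benefit = entropy(p)  # alphabet compression, assuming no distortion
    cost = sum(pi * ci for pi, ci in zip(p, costs))  # average time (s)
    print(f"{name}: {benefit:.2f} bits / {cost:.2f} s = {benefit / cost:.3f} bits/s")
```

As expected, the more skewed the distribution, the lower both the entropy (benefit) and, for this button layout, the resulting cost-benefit ratio can become.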

During the design or evaluation of the TV system, a UX expert may discover that users select “Select Channel” more frequently than the other two options. The UX expert can consider an alternative design that swaps the positions of the two options concerned. With the corresponding changes to the probability distributions, the UX expert can quantify the resulting improvement of the cost-benefit.

A more detailed estimation may consider the fact that users may mistakenly press [OK] for the default option. For example, if in 20% of cases users intend to select another option but select the default by mistake, there is both potential distortion and extra cost. For one of the example probability distributions, the potential distortion computed from the reconstructed distribution is 0.16 bits; for another, it is 0.24 bits. Suppose the extra time for showing detailed information about a TV show and then going back to the original three options is 4 seconds. We can then estimate the average extra time in the two cases, and the cost-benefit ratios are reduced accordingly. If the mistake rate reached 31% or more, the metric would return a negative value.
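The potential distortion caused by mistaken selections can be computed as a Kullback-Leibler divergence between the reconstructed and intended distributions; the sketch below uses a hypothetical intended distribution rather than the actual ones from Section 3.

```python
from math import log2

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes matching supports."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical intended distribution over the three options (the actual
# distributions in Section 3 differ): 20% of the mass intended for the
# second option lands on the default by mistake.
q = [0.2, 0.5, 0.3]
mistaken = 0.2 * q[1]
q_prime = [q[0] + mistaken, q[1] - mistaken, q[2]]  # reconstructed distribution

pd = kl_divergence(q_prime, q)  # potential distortion in bits
print(f"potential distortion ≈ {pd:.3f} bits")
```

In the metric, this distortion is subtracted from the alphabet compression, so a sufficiently high mistake rate can drive the benefit, and hence the ratio, negative.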

Similarly, one may estimate the user’s effort as the cost by counting the steps needed to perform an action, such as reading the screen, looking at the remote control, and pressing a button. One may also weigh these steps differently based on pre-measured cognitive load for different types of elementary steps, which may be obtained, for instance, using electroencephalography (EEG) (e.g., (Tan and Nijholt, 2010)).

For the example in Figure 2(b), it is easy to observe that the cost-benefit is always 0, since the action capacity is 0 bits, though the cost of pressing [OK] on the remote control may not be considered high. This quantitative measure is consistent with what most UX experts would conclude qualitatively.

Figure 5. (a) The alphabets in an abstract workflow representing a trial in a typical controlled empirical study. (b) The example of a simple “yes-no” trial for detecting and measuring human knowledge in HCI.

The estimation for the channel selection task does not consider any situational information or soft knowledge. When such variables are considered as part of an HCI task, as illustrated in Figure 3(c), the amount of cost-benefit usually increases noticeably. For example, consider the LaTeX example in Figure 4. If the word processor on the left has 5 different pop-up windows in response to a “save”, “save as”, or “Ctrl-S” command, each with 3 options, the input action alphabet has 15 letters. The maximum alphabet compression for the second process in Figure 3(c) is thus about log2(15) ≈ 3.9 bits.

On the other hand, when given any one of 10 file types (e.g., .doc, .tex, .txt, .htm, etc.), the user has the knowledge about whether formatting styles matter. This amounts to 10 binary variables, or 10 bits of knowledge. Consider, conservatively, that on average a user deletes or modifies 10 English letters independently before saving, and that the user knows whether it is critical to overwrite the existing file when a pop-up window asks for a confirmation. This gives 10 nominal variables, each with some 26 valid values for English letters. As the entropy of the English alphabet is about 4.7 bits (Shannon, 1951), the total amount of knowledge available is about 47 bits. Without considering other factors (e.g., digits, symbols, etc.), we can conservatively estimate the amount of alphabet compression for the process in Figure 3(c) to be about 57 bits. Let us assume that selecting one of the three options takes 1 second. The cost-benefit for such a simple HCI task is then at the scale of 57 bits/s.
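The arithmetic behind these estimates is straightforward and can be checked directly:

```python
from math import log2

# 5 pop-up variants × 3 options = 15 letters in the input action alphabet.
max_compression = log2(15)  # ≈ 3.9 bits for the second process

# Soft-knowledge estimate for the LaTeX example:
file_type_bits = 10 * 1     # 10 file types, one binary judgement each
letter_bits = 10 * 4.7      # 10 edited letters × ~4.7 bits per English letter
total_bits = file_type_bits + letter_bits  # ≈ 57 bits

# With a 1-second selection cost, the cost-benefit is at the scale of 57 bits/s.
print(round(max_compression, 1), round(total_bits))
```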

Tam et al. (Tam et al., 2017) estimated the amount of human knowledge available to two interactive machine learning workflows. Both workflows were designed to build decision-tree models, one for classifying facial expressions in videos and the other for classifying types of visualization images. They were intrigued by the fact that the interactive workflows resulted in more accurate classifiers than fully automated workflows. Using the approach exemplified by the LaTeX example above, they identified 9 types of soft knowledge in the facial-expression workflow and 8 types in the other. In both cases, there were several thousand bits of knowledge available to the computational processes in the workflows.

6. Measuring Cost-Benefit Empirically

As the cost-benefit metric described in Section 4 is relatively new, there has been only one reported empirical study attempting to measure the three quantities in the metric. Kijmongkolchai et al. (Kijmongkolchai et al., 2017) conducted a study to measure the cost-benefit of three types of soft knowledge used when visualizing time series data. These include the knowledge about (i) the context (e.g., about an electrocardiogram rather than weather temperature or stock market data), (ii) the pattern to be identified (e.g., slowly trending down), and (iii) the statistical measure to be matched.

The knowledge concerned can be considered as the first transformation in Figure 3(c), while the user’s input to answer the trial questions can be considered as the second transformation. They converted the conventional measures of accuracy and response time to those of benefit and cost in Eq. (4).

In (Kijmongkolchai et al., 2017), Kijmongkolchai et al. briefly described the translation from (accuracy, response time) to (benefit, cost), with the support of a supplementary spreadsheet. Here we generalize and formalize their study, and present a conceptual design that can be used as a template for other empirical studies for detecting and measuring humans’ soft knowledge in HCI.

Consider a common design for a controlled experiment, in which an apparatus presents a stimulus to participants in each trial, poses a question or gives the input requirement, and asks them to make a decision or perform an HCI action. The participants’ action in response to the stimulus and input requirement is a human-centric form of data intelligence. Figure 5(a) illustrates the workflow of such a trial.

A stimulus may comprise textual, visual, audio, and other forms of data as the input to the process. Normally, one would consider an alphabet containing only the stimuli designed for a trial, or for a set of trials for which participants’ responses can be aggregated. However, if the pre-designed stimuli are unknown to participants, one must consider an alphabet consisting of all possible stimuli that could be presented to the participants. For instance, the design of a study may involve only 64 pairs of colors, asking users to determine which color is brighter. Since any pairing of two colors is possible, the stimulus alphabet actually consists of some n² letters, where n is the number of colors that can be shown on the study apparatus. On a 24-bit color monitor, n = 2²⁴. Hence the entropy of the stimulus alphabet is usually very high.
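The size of such a stimulus alphabet can be checked with a quick calculation (assuming ordered color pairs):

```python
from math import log2

n_colors = 2 ** 24           # distinct colors on a 24-bit monitor
n_pairs = n_colors ** 2      # any ordered pair of colors could appear
max_entropy = log2(n_pairs)  # upper bound, assuming a uniform distribution
print(max_entropy)           # 48.0 bits
```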

On the other hand, the participants’ inputs are commonly facilitated by multiple-choice buttons, radio buttons, or slide bars, which have a smaller alphabet. In some studies, more complicated inputs, e.g., spatial locations and text fields, are used. Nevertheless, for any quantitative analysis, such inputs will be aggregated, or grouped, into a set of post-processed letters in a smaller alphabet. It is not difficult to notice that this is essentially an input action alphabet.

Once a participant has made a decision, the decision alphabet consists of only one letter and has an entropy of 0 bits. However, after one merges all repeated measures and responses from different participants in the analysis, the alphabet is expected to contain different decisions, each associated with its number or frequency of occurrence.

From the perspective of interaction, the alphabet has different probability distributions at different stages. Before and after the stimulus presentation stage, there is a ground truth for each trial, and thus one letter has the full probability 1. After the question stage, the letters are treated as having an equal probability. After the decision stage, only one letter is chosen, which thus has the full probability 1. After the aggregation stage, the alphabet has a probability distribution reflecting all repeated measures and all participants’ responses.

The humans’ soft knowledge used in the transformation from stimuli to decisions can be very complicated, and the amount of alphabet compression can be huge. Nevertheless, the essence of any controlled experiment is to investigate one or a few aspects of this soft knowledge while restricting the variations of many other aspects. Here we refer to one particular aspect under investigation as a sub-model, which may be a heuristic function for extracting a feature or factor from the stimulus, or for retrieving a piece of information that is not in the stimulus.

Let us first examine a very simple “yes-no” trial designed to investigate whether a sub-model has a role to play in the transformation from stimuli to decisions. As illustrated in Figure 5(b), at the beginning the alphabet has two letters, “yes” and “no”. Assuming that “yes” is the ground truth, the probabilities are (1−ε, ε). Here, ε is a small value used in practice to prevent the Kullback-Leibler divergence from having to handle probabilities of exactly 0 or 1. The divergence is thus capped at a finite maximum value.

When the question, “yes” or “no”, is posed to each participant, the two letters are treated as equally probable. When a decision is made by a participant, the probability distribution collapses to either “yes” or “no”.

After all related responses are collected, the alphabet has probabilities (p, 1−p). At the question stage the alphabet has the maximum entropy of 1 bit, while after aggregation its entropy is between 0 and 1 bit depending on p. If p = 0.5 (e.g., random choices), the sub-model offers no alphabet compression. If p = 1 (i.e., all “yes” answers) or p = 0 (i.e., all “no” answers), the sub-model contributes the maximal alphabet compression of about 1 bit. Without repeated measures, all participants individually achieve the same alphabet compression. We will discuss the case of repeated measures towards the end of this section.

Meanwhile, without repeated measures, the potential distortion has to be estimated using the collective results from all participants. As shown in Figure 5(b), it is measured based on the reverse mapping from the decisions back to the ground truth. If all participants have answered “yes”, the reconstructed distribution matches the ground truth and the potential distortion is negligible. If all participants have answered “no”, the reconstructed distribution contradicts the ground truth and the potential distortion is large. In other words, if all answers are correct, the benefit of the sub-model is about 1 bit. If all answers are incorrect, the potential distortion far exceeds 1 bit when ε is tiny, and the benefit is therefore negative.
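A minimal sketch of this “yes-no” analysis, assuming ε = 0.01 (the study’s actual ε value is not reproduced here):

```python
from math import log2

EPS = 0.01  # the small value ε (assumed here; the study's value may differ)

def entropy(p):
    return -sum(q * log2(q) for q in p)

def kl_divergence(p, q):
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q))

z_truth = [1 - EPS, EPS]      # ground truth: "yes" is correct
z_question = [0.5, 0.5]       # question stage: letters equally probable

compression = entropy(z_question)  # up to 1 bit once a letter is resolved

# Reverse mapping: everyone answers "yes" vs. everyone answers "no".
pd_correct = kl_divergence([1 - EPS, EPS], z_truth)  # no distortion
pd_wrong = kl_divergence([EPS, 1 - EPS], z_truth)    # large; benefit negative
print(compression, pd_correct, round(pd_wrong, 2))
```

The smaller ε is, the larger the distortion penalty for unanimously wrong answers, which is exactly why the divergence has to be capped in practice.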

Note again that the sub-model is only part of the soft knowledge for transforming a stimulus to a decision. The benefit of the sub-model is computed under the assumption that the variations of all other aspects of the soft knowledge are minimized by the means for controlling potential confounding effects.

The aforementioned simple “yes-no” alphabet can be coded using the binary codewords {0, 1}. For an empirical study designed to examine a sub-model at a slightly higher resolution, we can assign more bits to the input action alphabet. For example, a 3-bit alphabet can be labelled as {000, 001, …, 111}. It is necessary to use all letters as choices in order to maximize the entropy of the alphabet at the question stage. For examining the combined effects of several sub-models, we assign a bit string to each sub-model and then concatenate their bit strings together. For example, to study one 2-bit sub-model and two 1-bit sub-models, we can concatenate their codewords into a 4-bit alphabet.
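The codeword construction can be sketched as follows; the grouping into one 2-bit and two 1-bit sub-models mirrors the example above, and the names are purely positional:

```python
from itertools import product

# Codewords for one 2-bit sub-model and two 1-bit sub-models (names are
# positional; the actual sub-models depend on the study design).
m1 = ["00", "01", "10", "11"]
m2 = ["0", "1"]
m3 = ["0", "1"]
alphabet = [a + b + c for a, b, c in product(m1, m2, m3)]
print(len(alphabet), alphabet[0], alphabet[-1])  # 16 letters, from "0000" to "1111"
```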

Given an input action alphabet with n letters (i.e., all possible answers in a trial), one assigns a ground truth letter with probability near 1. In conjunction with a stimulus, one poses a question with the n choices, which are treated as having an equal probability of 1/n. After a participant has answered the question individually, only one letter is selected. After collecting all related responses, the probability of each letter is computed based on its frequency in participants’ responses. One can then convert the accuracy and response time to the benefit and cost measures in Eq. (4).


Using the example in (Kijmongkolchai et al., 2017), we have a 3-bit input action alphabet for three sub-models (each with 1-bit resolution). Each of the eight possible answers is encoded by three bits, each of which is 1 if the corresponding sub-model functions correctly, and 0 otherwise. With the small value ε in place, the alphabet compression for an individual is thus about 2.927 bits. Their study obtained a set of accuracy data, which shows that the percentages of the eight possible answers are:

  • 68.3% for letter 111 — all three sub-models are correct.

  • 10.7% for letter 110 — the first two sub-models are correct.

  • 10.3% for letter 011 — the last two sub-models are correct.

  • 4.6% for letter 101 — the first and third sub-models are correct.

  • 2.5% for letter 010 — only the second sub-model is correct.

  • 1.7% for letter 001 — only the third sub-model is correct.

  • 1.1% for letter 100 — only the first sub-model is correct.

  • 0.8% for letter 000 — all three sub-models are incorrect.

The potential distortion can thus be computed as about 1.592 bits. The combined benefit of the three sub-models is therefore about 1.335 bits (i.e., 2.927 − 1.592).
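The entropy of this aggregated response distribution can be verified directly (the reported distortion figure additionally depends on the small ε value, which is not reproduced here):

```python
from math import log2

def entropy(p):
    return -sum(q * log2(q) for q in p if q > 0)

# Aggregated response percentages from (Kijmongkolchai et al., 2017),
# ordered 111, 110, 011, 101, 010, 001, 100, 000:
responses = [0.683, 0.107, 0.103, 0.046, 0.025, 0.017, 0.011, 0.008]
print(round(entropy(responses), 3))  # ≈ 1.623 bits
```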

The reason that we use the individual’s decision alphabet to compute alphabet compression in Eq. (3) is based on an assumption that the empirical study simulates a relatively consistent decision process performed by either an individual or an organized team. Repeated measures of each participant are either aggregated first to yield a quasi-consistent measure or fused into the overall statistical measure involving all participants. One should use the aggregated alphabet instead when (i) the empirical study is to simulate a random decision process by a team where each time a member is arbitrarily chosen to make a decision, or (ii) it is to simulate a relatively inconsistent decision process by an individual and the probability distribution is calculated based on the repeated measures obtained from just one participant.

7. Conclusions

The information-theoretic approach presented in this paper is not a replacement for, but an addition to, the existing toolbox for supporting the design and evaluation of HCI devices, interfaces, and systems. Because this approach allows us to examine the benefit of HCI to computers, i.e., from a perspective different from the commonly adopted focus on the benefits to human users, it offers a new tool complementary to the existing qualitative and quantitative methods.

With estimated or measured quantitative values of HCI, we can better appreciate the necessity of HCI, especially in data intelligence workflows. To study the value of HCI is not in any way an attempt to forestall the advancement of technologies such as data mining, machine learning, and artificial intelligence. On the contrary, such research can help us better understand the transformation from human knowledge to computational models, and help us develop better automated processes to be used in data intelligence workflows. As shown in an ontological map by Sacha et al. (Sacha et al., 2019), many steps in machine learning workflows have benefited, or can benefit, from visualization and interaction. It is indeed not the time to reduce HCI in data intelligence, but to design and provide more cost-beneficial HCI.


  • Bias and Mayhew (2005) R. G. Bias and D. J. Mayhew. 2005. Cost Justifying Usability (2nd ed.). Elsevier, Oxford, UK.
  • Cairns and Cox (2008) P. Cairns and A. L. Cox (Eds.). 2008. Research Methods for Human-Computer Interaction. Cambridge University Press.
  • Chen et al. (2016) M. Chen, M. Feixas, I. Viola, A. Bardera, H.-W. Shen, and M. Sbert. 2016. Information Theory Tools for Visualization. A K Peters/CRC Press.
  • Chen et al. (2019) M. Chen, K. Gaither, N. W. John, and B. McCann. 2019. Cost-benefit analysis of visualization in virtual environments. IEEE Transactions on Visualization and Computer Graphics 25, 1 (2019).
  • Chen and Golan (2016) M. Chen and A. Golan. 2016. What May Visualization Processes Optimize? IEEE Transactions on Visualization and Computer Graphics 22, 12 (2016), 2619–2632.
  • Chen and Jänicke (2010) M. Chen and H. Jänicke. 2010. An Information-theoretic Framework for Visualization. IEEE Transactions on Visualization and Computer Graphics 16, 6 (2010), 1206–1215.
  • Cockton (2004) G. Cockton. 2004. Value-Centred HCI. In Proc. Nordic HCI ’04. 149–160.
  • Cover and Thomas (2006) T. M. Cover and J. A. Thomas. 2006. Elements of Information Theory (2nd ed.). John Wiley & Sons.
  • Diaper and Stanton (2003) D. Diaper and N. Stanton (Eds.). 2003. The Handbook of Task Analysis for Human-Computer Interaction. CRC Press.
  • Dix et al. (2003) A. Dix, J. Finlay, G. D. Abowd, and R. Beale. 2003. Human-Computer Interaction (3rd ed.). Prentice Hall.
  • Friedman (1996) B. Friedman. 1996. Value-Sensitive Design. Interactions 3, 6 (1996), 16–23.
  • Giaccardi (2011) E. Giaccardi. 2011. Things We Value. Interactions 18, 1 (2011), 17–21.
  • Helander et al. (1997) M. G. Helander, T. K. Landauer, and P. V. Prabhu (Eds.). 1997. Handbook of Human-Computer Interaction. Elsevier.
  • Huffman (1952) D. A. Huffman. 1952. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE 40, 9 (1952), 1098–1101.
  • Jacko (2012) J. A. Jacko (Ed.). 2012. Human Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications (3rd ed.). CRC Press.
  • Kijmongkolchai et al. (2017) N. Kijmongkolchai, A. Abdul-Rahman, and M. Chen. 2017. Empirically measuring soft knowledge in visualization. Computer Graphics Forum 36, 3 (2017), 73–85.
  • Kullback and Leibler (1951) S. Kullback and R. A. Leibler. 1951. On information and sufficiency. Annals of Mathematical Statistics 22, 1 (1951), 79–86.
  • Lazar et al. (2010) J. Lazar, J. H. Feng, and H. Hochheiser. 2010. Research Methods in Human-Computer Interaction. Wiley.
  • Light et al. (2005) A. Light, P. J. Wild, A. Dearden, and M. J. Muller. 2005. Quality, Value(s) and Choice: Exploring deeper outcomes for HCI products. In Proc. CHI ’05 (Workshops). 2124–2125.
  • MacKenzie (2013) I. S. MacKenzie. 2013. Human-Computer Interaction: An Empirical Research Perspective. Morgan Kaufmann.
  • Norman and Kirakowski (2018) K. Norman and J. Kirakowski (Eds.). 2018. The Wiley Handbook of Human Computer Interaction. Wiley-Blackwell.
  • Oulasvirta et al. (2018) A. Oulasvirta, P. O. Kristensson, X. Bi, and A. Howes (Eds.). 2018. Computational Interaction. Oxford University Press.
  • Preece et al. (1994) J. Preece, Y. Rogers, H. Sharp, D. Benyon, S. Holland, and T. Carey. 1994. Human-Computer Interaction: Concepts And Design. Addison Wesley.
  • Preece et al. (2015) J. Preece, H. Sharp, and Y. Rogers. 2015. Interaction Design: Beyond Human-Computer Interaction (4th ed.). John Wiley.
  • Purchase (2012) H. C. Purchase. 2012. Experimental Human-Computer Interaction: A Practical Guide with Visual Examples. Cambridge University Press.
  • Rotondo and Freier (2010) A. Rotondo and N. Freier. 2010. The Problem of Defining Values: A Lack of Common Ground Between Industry & Academia?. In Proc. CHI ’2010 (Work-in-Progress). 4183–4188.
  • Sacha et al. (2019) D. Sacha, M. Kraus, D. A. Keim, and M. Chen. 2019. VIS4ML: An ontology for visual analytics assisted machine learning. IEEE Transactions on Visualization and Computer Graphics 25, 1 (2019).
  • Shannon (1948) C. E. Shannon. 1948. A mathematical theory of communication. Bell System Technical Journal 27 (1948), 379–423.
  • Shannon (1951) C. E. Shannon. 1951. Prediction and Entropy of Printed English. Bell System Technical Journal 30 (1951), 50–64.
  • Shilton (2018) K. Shilton. 2018. Values and Ethics in Human-Computer Interaction. Foundations and Trends in Human Computer Interaction 12, 2 (2018), 107–171.
  • Shneiderman (1996) B. Shneiderman. 1996. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In Proc. IEEE Symposium on Visual Languages. 336–343.
  • Shneiderman et al. (2010) B. Shneiderman, C. Plaisant, M. Cohen, and S. Jacobs. 2010. Designing the user interface : strategies for effective human-computer interaction. Pearson.
  • Smith et al. (2014) W. Smith, G. Wadley, S. Webber, B. Ploderer, and R. Lederman. 2014. Unbounding the Interaction Design Problem: the Contribution of HCI in Three Interventions for Well-being. In Proc. OzCHI ’14. 392–395.
  • Tam et al. (2017) G. K. L. Tam, V. Kothari, and M. Chen. 2017. An analysis of machine- and human-analytics in classification. IEEE Transactions on Visualization and Computer Graphics 23, 1 (2017), 71–80.
  • Tan and Nijholt (2010) D. S. Tan and A. Nijholt (Eds.). 2010. Brain-Computer Interfaces: Applying our Minds to Human-Computer Interaction. Springer.

Appendix A ACM CHI 2019 Reviews

AC review: score 3/5
Expertise: Knowledgeable

Neutral: I am unable to argue for accepting or rejecting this paper; 3.0

1AC: The Meta-Review

Three reviewers assessed the submission. While they all see some value in the submission, they come to somewhat differing final conclusions. The submission is very well structured [2AC]. Despite the varying scores, all reviewers seem to appreciate the submission’s general direction. R3, for example, states that the paper addresses an important but challenging topic and that there are inspirational elements to the paper.

2AC found the content very dense and appreciated how the authors walk the reader through the different steps. R2 & R3, however, criticize that ”straightforward information theoretic definitions” [R2] are described in length. This seems like a general problem for this kind of submission to me. The authors did a good job in making the submission accessible to readers that are not experts in this domain like myself. Just as 2AC, I found the paper very easy to read but just like R2 & R3 I also wonder about the novelty provided by the work. Finding the right balance when addressing different audiences is a major challenge for this kind of work.

In their rebuttal, I would suggest that the authors address the following aspects discussed by the reviewers: 1. Highlight what the authors believe is the grand contribution of the submission (see R2) and provide an example of a more realistic application (see R3 & R2). 2. Discuss how the work differs from related attempts in the specific domains mentioned by R2 and Bayesian models of interactive systems mentioned by R3. 3. Respond to the additional aspects criticized in the reviews and describe how they could be addressed in a revised version of the paper.
+ Briefly clarify aspects unclear to 2AC
+ Clarify the ”costs” in the denominator of the key formula (see R2).
+ Address additional aspects raised by the reviewers.

When preparing the rebuttal, I would recommend to make it as tangible as possible. I.e. CHI has a very tight review process. Thus, it should be as clear as possible how a refined version would look like. The authors can, for example, provide short versions of paragraphs that could be almost directly copied into the paper.


2AC review: score 3/5
Expertise: Knowledgeable

. . . Between neutral and possibly accept; 3.5


This paper proposes an information-theoretic approach to measure the cost-benefit of HCI in data intelligence workflows. It has in-depth explanation of the theories, the metric, and applications.

I really enjoy reading this paper. It is very-well structured. The content is very dense, but the examples walk readers through the concepts, math, and ideas.

There are a couple of places that could be improved for an audience without information theory background.

When the ”cost-benefit metric” is first mentioned in Line 178, it would be good to give readers an intuitive idea of its meaning. Does it measure ”the capacity and efficiency for a computer to receive knowledge from users?” It would also be nice to present the intuition behind ”the lower the action capacity Cact and thereby the lower the DU,” although we can see it mathematically (for example, low entropy means high certainty).

I am confused by the definition of value presented in Related Work. It would be nice to provide references to each of value referents. What does value mean in ”value-centered designs?” How does the value definition differ when it is from the perspective of computer versus human? This paper examines value in terms of (a) and (c), but they are different concepts. Under what circumstance does the paper refer to (a) and when it means (c) (in the series of equation for the cost-benefit metric)?

I also find the paragraph of ”Some HCI tasks require additional display space or other resources for providing users with additional information…” a bit confusing. At first I thought it means a bigger screen space so that the mouse has more potential positions, but the yes/no example seems to indicate more radio buttons (interactions). What does the ”the varying nature of the information” mean?

Some statements may need further clarification. For example, ”On the contrary, it is less common to discuss the usefulness and benefits of HCI to computers” – but human computation and human-in-the-loop machine learning are all studying this topic. Why is it necessary to assume in the freehand path example that ”though the computer does not store the time of each sample?”

What is the theoretic upper bound of the cost-benefit metric (like entropy is 0–1), so that one could assess if a design is satisfactory in terms of this measure? In the current example, we could only compare if one action is better than the other or one design is better than before.

In the conclusion, it is said that the proposed approach ”an addition to the existing toolbox for supporting the design and evaluation of HCI devices.” I would like to see more discussion on what is the existing toolbox, and in what way the cost-benefit metric complements it. Do they measure different things? How could they be used together?

Some minor issues:
(Line 352) ‘z3: ”View HD Alternatives”.’ should be ‘a3’
Figure 2 is too small to see
It would be nice to explain the meaning of i in Eq. (2) or refer it to Figure 3.

Overall, this paper is a nice read and very informative.


reviewer 2 review: score 2/5
Expertise: Passing Knowledge

. . . Between possibly reject and neutral; 2.5


This paper aims to contribute to theoretical understanding of interaction, and especially information theoretical analyses of communication between humans and computers. Inspired by the work of Posner and Fitts in investigating the capacity limitations of the human motor control system, HCI researchers have looked at the relationship between information theoretic variables like throughput and interactive human performance in domains like pointing and forced choice. The papers of MacKenzie, Zhai, Seow come to mind.

The submitted paper aims to contribute by proposing to quantify the ”value of interaction” especially in the area of data intelligence. Data intelligence was here defined as all processes that transform data to decisions.

The starting point to the paper is the DPI theorem of Cover and Thomas, according to which post-processing of data can only lose but not increase information. This is intuitive: the decoder cannot add information. The authors refer to a source pointing out (or proving?) that HCI relaxes some assumptions of the Markov Chain in the DPI theorem that change the game. This point is already published, and I see the aimed contribution of this paper in the quantification of the added value.

The paper proceeds to relatively straightforward information theoretic definitions that are based on alphabet, entropy, and distortion, in order to characterize the capacity of input devices. This is neat, but the insight obtainable with these is never made very clear. Then, tutorial-like exposition follows with simple examples. I’m afraid that similar treatments have been proposed in studies of pointing, choice, and more recently in intelligent text entry (e.g., probabilistic decoding). I’d like to hear how the paper differs from those.

After this part, I’m afraid, the paper falls short from the promised goals in three ways.

First, the introduced ”costs” in the denominator of the key formula are vague. It is not clear what it means to divide bits with, say, task completion time or workload, as the authors suggest. Perhaps this could be clarified in the rebuttal.

Secondly, in the end, the grand contribution remains ambiguous. I am not sure what the obtained scores imply of ”value of humans to computers” . What would have been obtained by other means? I wish the authors would return to develop a broader, general point about ”data intelligence”, or what this work means for applications of information theory in HCI.

Third, the application of this framework remains unclear. The given examples in the ”empirical” section are hand-crafted and I failed to see either interesting findings or a general procedure that would be replicable and rigorous. The examples that are given are often related to simple input interactions. I’m at a loss what this all implies for ”data intelligence”.

In sum, while I commend the general ambition of this paper, I believe that more work is needed in theory, application procedure, and empirical work. I encourage the authors to keep working on this topic.


reviewer 3 review: score 2/5
Expertise: Expert

. . . Between possibly reject and neutral; 2.5


Significance of the paper’s contribution to HCI and the benefit that others can gain from the contribution: ?
The paper addresses an important but challenging topic, how to measure the value of the information provided by human users. The formal information-theoretic representation of the problem is a sensible, and valuable approach. I would like to see more of this style of analysis being accepted at CHI, and there are inspirational elements to the paper. However, once the authors try to get more detailed, the formal clarity dissipates. The discussion around the Data processing inequality appears convoluted. Of course the human is a source of external data for the system, and that needs to be taken into account. This is in no way ‘breaking the DPI’.

The claims around the benefit of HCI to the computer are also a red herring. The computer being able to reduce its uncertainty about the user's intentions is essential to the computer being able to meet the user's desires, so reducing the uncertainty in the computer enables us to maximise the utility for the user.

The authors step us through basic statistical inference at great length, for a relatively simple trial system, but they do not demonstrate any convincing examples of practical measurement of human input in a realistically complex setting.

Presentation clarity:
The figures are often unclear. E.g., in Figure 1 the arrow into P1 claims to be interaction, but there is only an arrow from the user, not one to them - so is P1 only affected in an open-loop manner? Is that possible? Labelling this as "HCI breaks the conditions of DPI" is just nonsense - the human is just another source of information and processing! If you replace/simulate the human with a separate computational model, what would that look like?

The term "data intelligence", although used as a buzzword in industrial sales pitches, appears imprecise in an academic context. This looks like basic statistical inference - why not keep the terminology clean and standard?

The paper has many spelling and grammatical errors, and would have benefited from proofreading by a native speaker. At times this becomes distracting for the reader.

Originality of the work, and relevant previous work: The paper comes across as a textbook introduction to the application of information theory, and builds on recent work by Chen and colleagues, but the review and terminology tend to be limited to an information-theoretic vocabulary. Many of the same principles are present in standard Bayesian models, where the prior distributions for input movements are coupled with the sensor model distribution. More of this literature should have been included (along with recognition of the challenges involved in specifying such distributions).

In conclusion, I think that the authors have quite a few interesting elements in the paper, but that it is not yet mature enough for publication at CHI. The argumentation and justification of the work need tightening up, along with clearer figures which represent how the information-processing aspects of the user are coupled into the computational elements. I think you then need a more convincing application which can be controlled empirically and which tests the validity of the proposed measures. (You could do this with a computational model replacing the user, where you know theoretically how many bits are being provided by that agent, and then see whether your estimates from empirical observations are in line with the capacity of the simulated agent.)
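The validation procedure the reviewer suggests in the final parenthesis can be sketched as follows. This is a minimal illustration of ours, not from the paper: a hypothetical agent selects uniformly among 8 options per interaction, so its theoretical contribution is exactly 3 bits, and an empirical entropy estimate should recover that figure.

```python
import numpy as np

# Sketch (our illustration, not the paper's method): replace the human with
# a simulated agent whose information contribution is known analytically,
# then check whether an empirical estimate recovers it.

rng = np.random.default_rng(0)

# Hypothetical agent: picks one of 8 equally likely options per interaction,
# so it contributes exactly log2(8) = 3 bits each time.
n_options = 8
theoretical_bits = np.log2(n_options)

# Observe many interactions and estimate the entropy from frequencies.
samples = rng.integers(0, n_options, size=100_000)
p = np.bincount(samples, minlength=n_options) / len(samples)
empirical_bits = -np.sum(p[p > 0] * np.log2(p[p > 0]))

print(f"theoretical: {theoretical_bits:.3f} bits")
print(f"empirical:   {empirical_bits:.3f} bits")  # should be close to 3
```

With a non-uniform or context-dependent simulated agent, the same comparison would test whether the paper's proposed measures remain accurate when the true distribution is unknown to the estimator.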

Appendix B Author Feedback (max. 5000 characters)

Thank you for the comments. We will improve the paper accordingly, regardless of whether it appears in CHI or on arXiv. R2 & R3 are knowledgeable about information theory (InfoT). We are pleased that they did not point out any serious theoretical flaw, except that R3 disagrees with the DPI statement (see E below). R2 is uneasy about the explanatory style of writing. This is understandable from the InfoT perspective. The paper was meant for an HCI audience. If possible, 1AC and 2AC may advise on the style and content from the HCI perspective.

Below we focus on the main points of 1AC.


A. The novelty of the work can be seen from what is missing in the HCI literature, e.g., (a) the need for a simple and effective theoretical framework for analysing humans' contributions to machine processes; and (b) the need for methodologies for estimating human knowledge quantitatively in practice and for measuring such quantities in empirical studies. Addressing these gaps will of course take years or decades. This work is a non-trivial step towards this goal. The mathematical simplicity is a merit rather than a demerit. See also L30-38, L120-132.

B. The work is built on theoretical advances in VIS, but focuses on HCI. Sec.3 relates to [6] and Sec.4 relates to [5]. [34] (VAST 2016 best paper) is a realistic application that supports Sec.5 for estimating human knowledge in interactive ML. A recent lab study [16] supports Sec.6. This work brings these together to provide HCI with a coherent InfoT framework. The examples in the paper are designed to be easily understandable without application-specific explanations.

R2 considers Sec.6 hand-crafted, possibly due to a mix-up between applications and lab studies. A lab study is meant to be "hand-crafted", i.e., controlled, in order to study a phenomenon with statistical significance. 1AC and 2AC may revisit this comment.

C. While it is not easy to conduct another substantial lab study, it is relatively easy to describe another real-world application by (a) replacing the LaTeX example in Sec.5 or (b) including one or two in the supplementary material.

To improve the connection with data intelligence, we will add in Sec.7:

"Although this work uses relatively simple HCI examples to illustrate the concepts of measuring information, it is not difficult to extrapolate such interactions to those in a variety of data intelligence processes. For instance, one may estimate the 'values' of selecting a set of statistical measures, changing the parameters of a clustering algorithm, choosing keys for sorting, selecting an area to zoom in on in visualization, reconfiguring layers in machine learning, accepting or discarding an automatically generated recommendation, and so on."


D. The works inspired by Fitts & Posner focus on psychological or perceptual responses to stimuli (i.e., without thinking). These models cannot be used to estimate the human knowledge used in HCI processes.

E. R3 is uneasy with the statement "HCI breaks the conditions of DPI", possibly due to overlooking the word "conditions". The DPI theorem is correct, but it has conditions. Any defined condition must be breakable. If HCI could not in general break such conditions, we should seriously ask when these conditions could ever be broken. If such conditions could not be broken at all, the DPI would have been incorrectly defined and proved.

Further, (a) the DPI is formulated on a definable input space, while human knowledge and the ad hoc sensing of new variables are rarely included as part of the input space in any DPI application. (b) If a human is replaced by an algorithmic model, and if this model can access the information discarded by earlier processes in the pipeline, the proof of the DPI cannot be obtained. (c) If humans are allowed to add arbitrary new information into any process in a workflow, the corresponding DPI has to assume that the input space consists of infinitely many possible variables. This renders the DPI meaningless in practice. These arguments have been validated by a number of InfoT experts.
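For reference, a minimal statement of the DPI and the Markov condition at issue (standard textbook material; the notation is ours, not the paper's):

```latex
% DPI: if X -> Y -> Z forms a Markov chain, later processing cannot
% add information about X.
\[
p(z \mid x, y) = p(z \mid y)
\;\;\Longrightarrow\;\;
I(X;Z) \le I(X;Y).
\]
% If the stage producing Z also receives side information K (e.g., human
% knowledge) not derived from Y, then in general
% p(z | x, y, k) != p(z | y); the Markov condition fails, and the
% inequality need not hold.
```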

F. R3 correctly noticed some structural resemblance to Bayesian models used in ML. A Bayesian network assumes that knowing the probability distribution of an input is good enough. Whenever a computer requests a user input in HCI, it assumes that it does not know the answer. There are also hypotheses that the human mind might be similar to a Bayesian network, a CNN, an RNN, etc. The proposed framework is not biased towards any such hypothesis.


G. 2AC on cost-benefit. A very good question, as we have not found an appropriately intuitive description so far. The suggested wording with "capacity" and "efficiency" is good as long as the term "capacity" is defined as the capacity of alphabet transformation rather than just communication. Action capacity is a simplified case that assumes no reconstructive distortion. It is suitable for F2 in Fig. 3(c), but not for F1. That is why action capacity and DU do not capture the amount of knowledge accurately. See also L617-L657.

H. Additional display space. We will add “e.g., textual instructions for aiding multiple choices.”

I. The ideal measure of cost is energy (unit: Joule). See also L605-616.

J. Intelligent text entry is a means of improving the cost-benefit of HCI using an underlying probability distribution. It also relates to L450-453.

K. The term data intelligence is defined at the beginning of the paper. If there is a better encompassing term, please suggest one.

L. 2AC is right that human computation and interactive ML follow the same line of thinking. The method in Sec.5 can be applied to these applications.

M. In computer science, we like to emphasize the benefit of computing. R3 refers to user inputs as users' intentions/desires. Intentions are just one category of the variables that the computer does not have. HCI can reduce the entropy of any variables that the computer does not have or that are out of date. See also C. If we in HCI cannot give humans a bit more credit, who else can? (cf. Darwin's confession.)