A Human-Computer Interface Design for Quantitative Measure of Regret Theory

09/30/2018 ∙ by Longsheng Jiang, et al. ∙ Clemson University

Regret theory describes human decision-making under risk. The key to obtaining a quantitative model of regret theory is to measure the preference in humans' minds when they choose among a set of options. Unlike physical quantities, measured psychological preference is not procedure invariant, i.e. the readings alter when the methods change. In this work, we alleviate this influence by choosing a procedure compatible with the way an individual actually makes a choice; we believe the resulting model is closer to the nature of human decision-making. The preference elicitation process is decomposed into a series of short surveys to reduce cognitive workload and increase response accuracy. To make the questions natural and familiar to the subjects, we follow the insight that humans generate, quantify, and communicate preference in natural language. Fuzzy sets theory is hence utilized to model the responses from subjects. Based on these ideas, a graphical human-computer interface (HCI) is designed to articulate the information as well as to efficiently collect human responses. The design also accounts for human heuristics and biases, e.g. the range effect and the anchoring effect, to enhance its reliability. The overall performance of the survey is satisfactory because the measured model shows prediction accuracy equivalent to the revisit performance of the subjects themselves.


1 Introduction

In many applications of human-robot systems, both humans and robots can execute tasks. For routine human tasks, for example object pick-up and handling, robots have advantages (e.g. low cost) but also disadvantages (e.g. low reliability). Humans can do such tasks easily and nearly perfectly, but at a higher cost: humans are slower, subject to fatigue, and scarce in such systems. In these applications, a common decision-making problem is: "What is more beneficial, choosing the robot (option) or the human (option)?" Mathieu et al. (2000) showed that when all team members share the same mental model, the team's overall performance improves. Hence, we need to study how humans make decisions and embody the same model in robots, since it is neither feasible nor ethical to force humans to think mechanically. In particular, human decision-making is inevitably influenced by the regret emotion (Zeelenberg and Pieters, 2007). This influence on decision-making is modeled by regret theory (Loomes and Sugden, 1982). Regret theory, however, is a qualitative model, suited more to analysis than to the prediction needed for robotic decision-making. To transform the model from qualitative to quantitative, one must reliably measure the preference in human minds. A well-designed measurement instrument is therefore particularly important.

Measuring mental values is a common practice in psychology and economics, and the questionnaire-based survey is the predominant tool. If the test is conducted on computers, designing the survey instrument amounts to designing the human-computer interface (HCI); hence, these two terms can be used interchangeably. Designing a proper HCI is not easy because our understanding of human perception, heuristics, and cognition is limited. Any unintended aspect of the design may influence human judgments in unexpected directions (Fischhoff et al., 1988). Research devoted to survey methodology is abundant (McKelvie, 1978; Lozano et al., 2008; Harpe, 2015). The main purposes of these works are to avoid ambiguity, refine responses, and reduce mental effort. The Likert-type scale with 5 points has emerged as the widely accepted optimum (McKelvie, 1978; Lozano et al., 2008). However, Li (2013) argued that the standard Likert method has drawbacks such as information loss and distortion, and proposed to incorporate fuzzy sets theory into the Likert scale by requiring subjects to report the degree to which the rated level matches their opinion. Similarly, Fourali (1997) designed a fuzzy instrument that allows subjects to choose more than one score on a rating scale; it relaxes the mental strain of subjects by acknowledging that during rating subjects often know a range rather than a single value. de Sáa et al. (2015) proposed a method that requires subjects to directly draw membership functions, introduced by fuzzy sets theory, over a range of values, giving subjects more freedom in manifesting their thoughts and feelings. These designs all enable more subtle expression; however, they unavoidably complicate the process of answering the questions.

In this work, we contribute a simple, comprehensive, and neutral design of an HCI that is compatible with its real-world application. The main novelty of this HCI design is, firstly, that we utilize the connection between fuzzy sets theory and natural language to measure more detailed data in a way that is intuitive to the human subjects; and secondly, that we account for human heuristics in judgment in an effort to make the measurement less distorted and more reliable.

The rest of the paper is organized as follows. Section 2 formulates the decision-making problem. Section 3 explains in detail the design of the HCI. Section 4 describes the experiment using the HCI. Section 5 presents the experimental results. Section 6 concludes the work.

2 Decision-Making Problem Formulation

Figure 1: A human robot team in a warehouse (top view).

For the sake of illustration, we choose an application in warehouses without loss of generality. A human worker acts as an agent collaborating with a group of robots in picking up objects from shelves in a supermarket-style warehouse, as in Figure 1. In the normal configuration, the robots are distributed on the floor for picking up objects while the worker remains in a station. The worker has direct communication with every robot so that he/she can respond to the robots' requests. Because of the diversity of goods, the robots' grippers are not dexterous enough to guarantee that every pick is a success, but it is assumed that a robot can recognize every good and has a probabilistic estimate of its pick-up success rate p. The cost of a successful robot pick-up (the robot option) is negligible. However, when a pick-up fails, the good drops, causing a damage cost c_r. On the other hand, the worker can pick up any object with confident ease: the probabilistic success rate is 1. Such a benefit comes with a price. The worker has to disengage from other work he/she is doing, take time to move to the requesting location, and help the robot. Considering that the human workforce is more expensive, a human pick-up (the human option) incurs a non-negligible cost c_h. When a robot is ready for a pick-up, it thus needs to contemplate whether to resort to the robot option, which is risky, or the human option, which imposes a certain moderate cost. This decision-making problem is summarized in Table 1.

              Robot option                    Human option
Outcome:      0 (success) or -c_r (failure)   -c_h
Probability:  p or 1 - p                      1
Table 1: Two options faced by a robot

The complete application of one-human-multi-robot collaboration involves motion planning, priority assignment, etc., and hence deserves a separate study. The decision-making part, however, is identical for each robot, so we can focus on one robot without loss of generality.

We want the robot to share the same mental model as the worker: the robot should know how the worker makes decisions when facing the same situation, and then act accordingly. Therefore, we model the decision-making of the worker facing the options in Table 1. On the basis of regret theory, we propose the following human decision-making model:

Π = w(p) Q(0 - x_h) + (1 - w(p)) Q(x_r - x_h)    (1)

The probability weighting function w(p) represents the subjective perception of the objective probability p (Gonzalez and Wu, 1999). Function Q(·) is an odd function defined on [-1, 1] (Loomes and Sugden, 1982). It evaluates the utilities of outcomes and includes the influence of the regret emotion on decision-making. (See Loomes and Sugden (1982) for details.) The normalized cost is defined as x = -c/b, where b is the upper bound of the magnitude range of all outcomes, i.e. 0 ≤ c ≤ b; thus x_r = -c_r/b and x_h = -c_h/b. Variable Π is the net advantage of the robot option over the human option. Its sign determines the choice:

Π > 0: choose the robot option;  Π = 0: the options are equally liked;  Π < 0: choose the human option.    (2)

where Π > 0, Π = 0, and Π < 0 mean that the robot option is preferred to, equally liked as, or surpassed by the human option, respectively. Moreover, a larger magnitude |Π| indicates a stronger preference.
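
To make the model concrete, the following minimal Python sketch evaluates equations (1) and (2). The parametric forms chosen here for w (a Prelec-style weighting function) and Q (an odd power function) are illustrative assumptions only; the paper deliberately leaves both functions unspecified until they are elicited from data.

    import math

    # Candidate probability weighting function (Prelec form); the exponent
    # alpha is a hypothetical parameter, not an elicited value.
    def w(p, alpha=0.7):
        if p == 0.0:
            return 0.0
        return math.exp(-((-math.log(p)) ** alpha))

    # Candidate Q-function: an odd power function on [-1, 1], also assumed
    # purely for illustration.
    def Q(xi, beta=0.8):
        return math.copysign(abs(xi) ** beta, xi)

    def net_advantage(p, x_r, x_h):
        # Equation (1): with weight w(p) the robot succeeds (outcome 0,
        # regret term 0 - x_h); with weight 1 - w(p) it fails (outcome x_r,
        # regret term x_r - x_h).
        return w(p) * Q(0.0 - x_h) + (1.0 - w(p)) * Q(x_r - x_h)

    def choose(p, x_r, x_h):
        # Equation (2): the sign of Pi determines the choice.
        Pi = net_advantage(p, x_r, x_h)
        return "robot" if Pi > 0 else "human" if Pi < 0 else "either"

    # Hypothetical problem: success rate 0.9, normalized costs x_r = -1.0
    # and x_h = -0.5.
    print(choose(0.9, x_r=-1.0, x_h=-0.5))  # -> robot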

Equation (1) gives only a framework, since the exact w-function and Q-function are not given. Quantitative w- and Q-functions, however, are required for a complete model. To obtain them, we must use the clues in equations (1) and (2). The only variable unrelated to the two functions in equation (1) is Π. According to equation (2), Π = 0 means that the two options are equally liked. Thus, we can eliminate Π from the equation by studying only the cases where Π = 0. If we further define the new variables ξ1 = x_r - x_h and ξ2 = x_h - 0, then by the oddness of Q we have Q(0 - x_h) = -Q(ξ2), and setting Π = 0 turns equation (1) into

Q(ξ1) = [w(p*) / (1 - w(p*))] Q(ξ2)    (3)

where p* is the probability that makes Π = 0 hold.

This equation provides a way to iteratively calculate a sequence of points on the Q-function once a specific w-function is chosen from the set of w-function candidates (see Liao et al. (2017) for more details of the calculation). For each candidate w-function, we thus obtain a Q-function. Among all the resulting (w, Q) pairs, we can then find the optimal one that best fits the human data.
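
Continuing the sketch above, the chain calculation of equation (3) can be written in a few lines. The indifference probabilities p_star below are fabricated placeholders standing in for the values measured with the HCI, one per row of the (x_h, ξ1, x_r, ξ2) sequences introduced later in Table 2; Q is anchored at an assumed seed value Q(-0.5) = -0.5.

    # Rows of (x_h, xi1, x_r, xi2) from Table 2 (Section 3.3.1).
    rows = [
        (-0.5, -0.4, -0.9, -0.5), (-0.4, -0.6, -1.0, -0.4),
        (-0.6, -0.3, -0.9, -0.6), (-0.3, -0.7, -1.0, -0.3),
        (-0.7, -0.2, -0.9, -0.7), (-0.2, -0.8, -1.0, -0.2),
        (-0.8, -0.1, -0.9, -0.8), (-0.1, -0.9, -1.0, -0.1),
    ]
    p_star = [0.55, 0.40, 0.65, 0.35, 0.72, 0.30, 0.80, 0.25]  # placeholders

    Q_points = {-0.5: -0.5}  # assumed seed point of the chain
    for (x_h, xi1, x_r, xi2), p in zip(rows, p_star):
        # Equation (3): Q(xi1) = [w(p*) / (1 - w(p*))] * Q(xi2).
        Q_points[xi1] = w(p) / (1.0 - w(p)) * Q_points[xi2]

    for xi in sorted(Q_points):
        print(f"Q({xi:+.1f}) = {Q_points[xi]:+.3f}")

Each pass through the loop consumes a ξ2 whose Q value is already known and produces the Q value of a new ξ1, which is exactly the chain property that the sequences in Table 2 are designed to guarantee.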

Figure 2: The graphical interface consists of two regions: the problem statement region, enclosed by the red dash-dot line, and the response region, enclosed by the blue dashed line. (The enclosing lines are not shown in the actual HCI.)

The prerequisite of the above strategy is Π = 0. Pairs of options that happen to satisfy this preference condition are rare. It is therefore important to drive a pair of options, whatever its initial preference condition, to converge to Π = 0. This depends on an instrument that can reliably measure the human preference. The goal of this work is to design such an instrument (HCI).

3 Design of the Survey-based HCI

In designing the HCI, we kept 3 principles in mind (Fischhoff et al., 1988):

  1. Compatibility: Formulate questions in the experiment compatibly with their real-world appearance.

  2. Neutrality: Avoid implicit biases in information presentation.

  3. Simplicity: Reduce cognitive workload without destroying subjects’ intuition.

These principles lead to the design of the graphical interface shown in Figure 2. The problem statement region displays the information defining a pair of options as in Table 1. The response region guides subjects in measuring their preferences. The interface shows the information of only one decision-making problem; eliciting the human decision-making model requires many such problems, so the HCI also needs a good organization of these problems.

3.1 Problem Statement Region

The problem statement comprises 3 blocks, displaying the question, the individual option information and the option comparison information, respectively.

The question in the title of Figure 2 asks subjects to directly make a choice. It is the same question asked in the warehouse application, which avoids invalid readings due to a change of procedure (Tversky et al., 1988). The labels of the options, Robot and Human, have an impact on subjects' judgments, because people hold different impressions of robots (Arras and Cerqui, 2005). This influence is necessary, because the worker's impression of robots does matter, so we keep the labels; it should not, however, overshadow the attributes of the options, i.e. the outcomes and probabilities. The word "economically" in the title is important because it focuses the subjects' attention on these attributes.

The attributes of one option are nested within one cell (see Figure 2). This type of framing prevents the information from being distorted by the event-splitting effect (Bleichrodt and Wakker, 2015). To visualize the values of the attributes, both outcomes and probabilities are graphically represented by bar charts. The bar chart is chosen for its salient proportionality, which makes it easier to see changes in the values. Many previous experiments provide visualization only for probabilities (e.g. Bleichrodt and Wakker (2015)). This unbalanced presentation has the potential to introduce bias between the different types of attributes. To comply with the neutrality principle, it is better to provide visual aids for both attributes. Furthermore, the expected value of each option is provided, for two reasons. Firstly, in pilot tests in which the expected values were not provided, the subjects tried to calculate them mentally; it is better to provide them directly and avoid the mental effort. Secondly, regret theory was originally proposed as an alternative to expected utility theory (Loomes and Sugden, 1982). Since the expected value is the simplest form of expected utility, according to the neutrality principle it is important to let subjects acknowledge it.
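
For example, with hypothetical numbers p = 0.9, c_r = $20, and c_h = $3, the interface would display an expected value of 0.9 × 0 + 0.1 × (-$20) = -$2 for the robot option and 1 × (-$3) = -$3 for the human option.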

Loomes and Sugden (1982) hypothesized in regret theory that the influence of the regret emotion, which is evoked by comparing the outcomes of different options, is substantial in decision-making. They modeled the comparison as the subtractions in equation (1). Zadeh (1996), however, suggested that it is not natural for the human mind to compute with numbers; the workload of mental calculation is large. To reduce it, we provide the comparison information, consisting of the comparison of the outcomes and of the probabilities. Although the probability comparison is not modeled in equation (1), its inclusion avoids a potential information bias.

There is a dilemma between two guiding principles here: as discussed above, the comparison information block is prescribed by the principle of simplicity; however, the mere appearance of the juxtaposition primes subjects to think in a way biased toward regret theory (Harless, 1992). The compromise is to display the comparison information on the right margin of the interface, since the right-hand side of a page generally attracts little visual attention unless its content is necessary (Buscher et al., 2009).

3.2 Response Region

Preference is the feeling of attractiveness. Preference exists objectively, although it cannot be measured as a physical signal (Tversky et al., 1988). Zadeh (1996) speculated that human minds perceive, reason, and communicate mainly in natural language rather than in calculation. Natural language may therefore be the only proper conveyance of preference.

We would like to model the preference expressed in natural language, and fuzzy sets theory is the tool for this job (Zadeh, 1996). A statement in natural language is denoted as a linguistic label and, by definition, is a fuzzy set. With respect to preference, three linguistic labels can be defined: preferring the robot, preferring the human, and equally liking. The meaning of the three labels is easily understandable, but people's preference is often more subtle and refined than these three categories. People can often estimate the degree of match between their inner preference and a certain label. The degree of match is defined as the membership μ to a fuzzy set (linguistic label) A. Hence, subjects can intuitively and easily describe their preference as a pair (A, μ). The connection between the net advantage Π and μ is the membership function μ_A(Π). Three hypothetical membership functions, one for each linguistic label, are shown in Figure 3.

Figure 3: The hypothetical definition of the fuzzy sets over the net advantage Π
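
As an illustration, the sketch below (in the same Python as the earlier sketches) encodes one possible realization of the three membership functions in Figure 3; the ramp and triangle shapes and their breakpoints are assumptions, not the functions actually elicited in the paper.

    # Hypothetical membership functions over the net advantage Pi in [-1, 1].
    def mu_prefer_robot(Pi):
        return min(1.0, max(0.0, Pi))      # grows as Pi increases

    def mu_equally_liking(Pi):
        return max(0.0, 1.0 - abs(Pi))     # triangular, peaks at Pi = 0

    def mu_prefer_human(Pi):
        return min(1.0, max(0.0, -Pi))     # grows as Pi decreases

    Pi = 0.3
    print(mu_prefer_robot(Pi), mu_equally_liking(Pi), mu_prefer_human(Pi))
    # -> 0.3 0.7 0.0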

In the HCI, we have to collect μ for each of the 3 linguistic labels. We use introspection-type questions that ask subjects to evaluate the degree of match for statements (see Figure 2) such as

“I prefer Robot to Human.”

The degree of match is measured using a 5-point rating scale, since 5 levels give the optimal reliability and validity (McKelvie, 1978) (see the first cell in the response region in Figure 2). We label the scale with both numbers and verbal labels for enhanced scale anchoring (Harpe, 2015).

The spatial organization of the response region aims to reduce the mental workload. The following spatial cues are used: left corresponds to preferring the robot; center, to equally liking; right, to preferring the human; up, to increasing value; down, to decreasing value. To realize these cues, a gradient-colored horizontal bar hints at the spatial coordinates of preference (see Figure 2). The rating scales querying preferring the robot, equally liking, and preferring the human are on the left, center, and right, respectively. Each rating scale runs from 0 at the bottom to 1 at the top.

3.3 Architecture of the HCI

Figure 2 represents the information of only one decision-making problem and collects the corresponding responses. Many problems of this type must be organized into a sequence, which creates the architecture of the HCI. This architecture determines which problems are included and how they are queued. In the design, we consider the relevant human heuristics, namely the range effect and the anchoring effect, to avoid their pitfalls.

3.3.1 Range Effect.

Range effect refers to the unwanted phenomenon in experimentation that merely changing the range of attribute values can change decision-making (Hutchinson, 1983). Outcomes, in our case, do not have a well-defined fixed range and thus are subject to this effect. For the options in Table 1, the range of outcomes immediately available to subjects is defined by c_r, since c_r > c_h. To combat the range effect, we should maintain a constant range throughout the experiment. That is to say, our objective is to keep c_r always in a close neighborhood of a constant. At the same time, the assignment of values to x_r and x_h should also facilitate the iterative calculation in equation (3). Specifically, another objective is to find sequences of x_r and x_h such that ξ1 and ξ2, both defined on x_r and x_h, construct an extended connected chain of arguments to Q, e.g. Q(-0.4) computed from Q(-0.5), then Q(-0.6) from Q(-0.4), and so on. One way to achieve both objectives is to define the sequences of x_h and x_r as in Table 2 (recall the definition x = -c/b).

k    x_h     ξ1      x_r     ξ2
0    -0.5    -0.4    -0.9    -0.5
1    -0.4    -0.6    -1.0    -0.4
2    -0.6    -0.3    -0.9    -0.6
3    -0.3    -0.7    -1.0    -0.3
4    -0.7    -0.2    -0.9    -0.7
5    -0.2    -0.8    -1.0    -0.2
6    -0.8    -0.1    -0.9    -0.8
7    -0.1    -0.9    -1.0    -0.1
Table 2: The sequences of x_h and x_r, with the resulting ξ1 = x_r - x_h and ξ2 = x_h - 0

Inspecting the columns of ξ1 and ξ2, a connected chain extends iteratively outward from Q(-0.5): the ξ1 produced in each row is the ξ2 consumed in the next, so the chain eventually covers every value from -0.1 to -0.9. Also, in the column of x_r the value alternates between -0.9 and -1.0, so c_r always stays within a close neighborhood of the constant b.
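
Both properties can be checked mechanically. The short sketch below (same illustrative Python as before) verifies that the x_r column stays in {-0.9, -1.0} and that each row's ξ1 reappears as the next row's ξ2, which is what makes the chain connected.

    x_h = [-0.5, -0.4, -0.6, -0.3, -0.7, -0.2, -0.8, -0.1]
    x_r = [-0.9, -1.0, -0.9, -1.0, -0.9, -1.0, -0.9, -1.0]
    xi1 = [r - h for r, h in zip(x_r, x_h)]  # robot failure vs. human
    xi2 = list(x_h)                          # human vs. robot success (0)

    # Range objective: |x_r| stays within 10% of 1, so c_r stays near b.
    assert all(-1.0 <= v <= -0.9 for v in x_r)

    # Chain objective: the xi1 produced in row k is the xi2 consumed in
    # row k+1 (compared with a tolerance to sidestep float round-off).
    assert all(abs(xi1[k] - x_h[k + 1]) < 1e-9 for k in range(7))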

3.3.2 Anchoring Effect.

The goal of the HCI is to converge a pair of options from any preference state (either Π > 0 or Π < 0) to Π = 0, which makes equation (3) hold. Human responses collected by the HCI, nevertheless, suffer from insufficient adjustment away from the initial state of the option pair: the anchoring effect (Tversky and Kahneman, 1974). The anchoring effect dictates that the free variable p converged from the state Π > 0 is highly likely to differ from the p converged from the state Π < 0, since both directions are subject to insufficient adjustment. Averaging the values obtained from the two directions cancels out, at least in part, this insufficiency. In the architecture, the convergence first starts in one direction, either from Π > 0 or from Π < 0. Once the state Π = 0 is reached, we record the current p. The process then immediately jumps to a new initial state and reaches Π = 0 again from the opposite direction to record another p. This iteration can be repeated as many times as desired; the final p* is the average of the recorded values.
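
The following sketch shows one way such a bidirectional procedure could work; it is a simplified staircase, not the paper's exact protocol, and the subject's answers are simulated with the choose function from the earlier sketch rather than collected from a human.

    def converge(p0, step, x_r, x_h):
        # Walk p from an extreme initial state until the preference flips,
        # i.e. until the pair of options passes through Pi = 0.
        p, last = p0, choose(p0, x_r, x_h)
        for _ in range(10):                  # at most 10 problems per module
            p = min(1.0, max(0.0, p + step))
            if choose(p, x_r, x_h) != last:
                return p
        return p                             # fall back to the last p tried

    x_r, x_h = -0.9, -0.5
    p_up = converge(0.05, +0.1, x_r, x_h)    # approach Pi = 0 from Pi < 0
    p_down = converge(0.95, -0.1, x_r, x_h)  # approach Pi = 0 from Pi > 0
    p_star = (p_up + p_down) / 2.0           # averaging offsets the anchors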

4 Experiment

We recruited 14 graduate students (3 female) from the Mechanical Engineering Department at Clemson University. Among them, 12 datasets, each from one subject, are reported; the other 2 (male) subjects were detected as outliers. The incentive for each subject was a $10 flat payment.

The experiment lasted around 1.5 hours. There were 10 modules, each with 10 decision-making problems. Among them, 8 modules were for training the model and 2 for validation. The two validating modules were duplicates of each other, each containing the same 10 problems; one was inserted at the midpoint of the experiment and the other at the end.

To prepare, the experimenter explained the structure of the experiment and basic concepts such as expected value and independent random events. The subjects then independently finished a training session, which had only one module. Before the testing part, the subjects were surveyed with a questionnaire for the value of b, defined as the amount of money whose loss would cause the subject significant regret. Each training module contained problems defined by x_h and x_r from one row of Table 2 and a free variable p. The problems were presented in the form of Figure 2. If the preference state had not reached Π = 0 after 10 problems, the last p was taken as the approximation of p*.

5 Result and Discussion

In this section, we show the performance of the quantitative decision-making model elicited with the designed HCI. Because building the model relies on human responses collected solely through the HCI, we believe that a badly designed HCI could hardly generate a satisfactory model. In other words, a well-performing model implies an acceptable HCI.

Figure 4: The actual membership functions μ of equally liking for the individual subjects

Since the response region is designed based on fuzzy sets theory, we hypothesized the membership functions in Figure 3. Using the elicited w- and Q-functions, we can calculate Π with equation (1) for any decision-making problem defined by x_r, x_h, and p. For the same problems, the subjects provided their responses in the experiment. Hence, we can plot the actual degrees of membership of each subject for the 80 model-training data points, as in Figure 4. For clarity, only the label equally liking, whose membership function in Figure 3 is triangular, is considered.

Regardless of the point type, the shapes of the point clouds of 9 subjects resemble a triangle to different degrees (the exceptions are subjects 3, 5, and 8). This justifies the use of fuzzy sets theory in the HCI design. However, the actual membership functions are much noisier: unlike the hypothetical membership functions, the μ values in Figure 4 spread over the plotting space. This may be due to the fact that human evaluation is probabilistic rather than deterministic.

According to equation (2), if a point is in the right half-plane, the choice should be the robot option; in the left half-plane, the human option; otherwise, equally liking. In Figure 4, the points are marked, as indicated in the legend, by the subjects' actual responses in the experiment. The elicited model performs well in the sense that most of the points from the model align with this ground truth. However, these plots use the model-training data; more objective evidence should come from the validation data.

During the experiment, the subjects answered the same validating module twice, and their responses in these two modules are compared. When the two choices for the same decision-making problem are the same, the response is regarded as consistent. The percentage of consistent responses is named the human revisit accuracy. The choices predicted by the elicited model are compared with the subjects' actual choices in the two modules, respectively; the average percentage of accurate predictions over the two modules is denoted the averaged prediction accuracy. We further segregate the consistent responses of each subject and denote the percentage of accurate predictions among these consistent responses as the consistent-response prediction accuracy. The accuracies are shown for each subject in Figure 5.
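
For concreteness, the three accuracies can be computed as below (a sketch in the same illustrative Python; encoding choices as strings is an assumption).

    def revisit_accuracy(visit1, visit2):
        # Fraction of problems answered identically in the two visits.
        return sum(a == b for a, b in zip(visit1, visit2)) / len(visit1)

    def averaged_prediction_accuracy(model, visit1, visit2):
        # Model-vs-human accuracy, averaged over the two visits.
        acc1 = sum(m == v for m, v in zip(model, visit1)) / len(visit1)
        acc2 = sum(m == v for m, v in zip(model, visit2)) / len(visit2)
        return (acc1 + acc2) / 2.0

    def consistent_response_prediction_accuracy(model, visit1, visit2):
        # Model accuracy restricted to the consistently answered problems.
        pairs = [(m, a) for m, a, b in zip(model, visit1, visit2) if a == b]
        return sum(m == a for m, a in pairs) / len(pairs)

    # Hypothetical data for 5 problems instead of 10:
    model  = ["robot", "human", "robot", "robot", "human"]
    visit1 = ["robot", "human", "human", "robot", "human"]
    visit2 = ["robot", "human", "robot", "human", "human"]
    print(revisit_accuracy(visit1, visit2))                     # -> 0.6
    print(averaged_prediction_accuracy(model, visit1, visit2))  # -> 0.8
    print(consistent_response_prediction_accuracy(model, visit1, visit2))  # -> 1.0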

For each subject, the averaged prediction accuracy is close to the human revisit accuracy (the maximum difference is 25%). The consistent-response prediction accuracy is high for every subject. This is true even for subject 5 (consistent-response prediction accuracy of 100%), who has a low human revisit accuracy (30%). Moreover, the averaged prediction accuracy is strongly positively correlated with the human revisit accuracy (correlation coefficient 0.79). For the whole group, the averaged prediction accuracy is close to the human revisit accuracy, and a paired-samples t-test shows no significant difference between the two. The prediction of the quantitative decision-making model is thus as accurate as the subjects re-answering the problems themselves. In terms of the consistent-response prediction accuracy, the model also performs satisfactorily. Together, the data show that the elicited decision-making model is satisfactory in predicting human decisions, which implies that the HCI performs well.

6 Conclusion

Including humans in cyber-physical systems requires modeling human decision-making behaviors; the computation units can then either share the model or assist the humans based on it. Such a model needs information from human minds. Designing a measurement instrument for this job is crucial yet difficult because of our limited understanding of the human mind. For the known psychological effects of human heuristics, however, it is important to account for them during the design. This work shows that a well-designed HCI can help us elicit a satisfactory computational human decision-making model. That the elicited model can improve human-robot collaboration performance, however, is still a hypothesis, which will be tested in our future work.

Figure 5: Comparison of the model prediction with the subjects’ actual choices in validating data

References

  • Arras and Cerqui (2005) Arras, K.O. and Cerqui, D. (2005). Do we want to share our lives and bodies with robots? a 2000 people survey. Technical Report, 605.
  • Bleichrodt and Wakker (2015) Bleichrodt, H. and Wakker, P.P. (2015). Regret theory: A bold alternative to the alternatives. The Economic Journal, 125(583), 493–532.
  • Buscher et al. (2009) Buscher, G., Cutrell, E., and Morris, M.R. (2009). What do you see when you’re surfing?: using eye tracking to predict salient regions of web pages. In Proceedings of the SIGCHI conference on human factors in computing systems, 21–30. ACM.
  • de Sáa et al. (2015) de Sáa, S.d.l.R., Gil, M.Á., González-Rodríguez, G., López, M.T., and Lubiano, M.A. (2015). Fuzzy rating scale-based questionnaires and their statistical analysis. IEEE Transactions on Fuzzy Systems, 23(1), 111–126.
  • Fischhoff et al. (1988) Fischhoff, B., Slovic, P., and Lichtenstein, S. (1988). Knowing what you want: Measuring labile values. Decision Making: Descriptive, Normative and Prescriptive Interactions, Cambridge University Press, Cambridge, 398–421.
  • Fourali (1997) Fourali, C. (1997). Using fuzzy logic in educational measurement: the case of portfolio assessment. Evaluation & Research in Education, 11(3), 129–148.
  • Gonzalez and Wu (1999) Gonzalez, R. and Wu, G. (1999). On the shape of the probability weighting function. Cognitive Psychology, 38(1), 129–166.
  • Harless (1992) Harless, D.W. (1992). Actions versus prospects: The effect of problem representation on regret. The American Economic Review, 82(3), 634–649.
  • Harpe (2015) Harpe, S.E. (2015). How to analyze likert and other rating scale data. Currents in Pharmacy Teaching and Learning, 7(6), 836–850.
  • Hutchinson (1983) Hutchinson, J. (1983). On the locus of range effects in judgment and choice. ACR North American Advances.
  • Li (2013) Li, Q. (2013). A novel likert scale based on fuzzy sets theory. Expert Systems with Applications, 40(5), 1609–1618.
  • Liao et al. (2017) Liao, Z., Jiang, L., and Wang, Y. (2017). A quantitative measure of regret in decision-making for human-robot collaborative search tasks. In American Control Conference (ACC), 2017, 1524–1529. IEEE.
  • Loomes and Sugden (1982) Loomes, G. and Sugden, R. (1982). Regret theory: An alternative theory of rational choice under uncertainty. The Economic Journal, 92(368), 805–824.
  • Lozano et al. (2008) Lozano, L.M., García-Cueto, E., and Muñiz, J. (2008). Effect of the number of response categories on the reliability and validity of rating scales. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 4(2), 73.
  • Mathieu et al. (2000) Mathieu, J.E., Heffner, T.S., Goodwin, G.F., Salas, E., and Cannon-Bowers, J.A. (2000). The influence of shared mental models on team process and performance. Journal of Applied Psychology, 85(2), 273.
  • McKelvie (1978) McKelvie, S.J. (1978). Graphic rating scales—how many categories? British Journal of Psychology, 69(2), 185–202.
  • Tversky and Kahneman (1974) Tversky, A. and Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
  • Tversky et al. (1988) Tversky, A., Sattath, S., and Slovic, P. (1988). Contingent weighting in judgment and choice. Psychological review, 95(3), 371.
  • Zadeh (1996) Zadeh, L.A. (1996). Fuzzy logic = computing with words. IEEE Transactions on Fuzzy Systems, 4(2), 103–111.
  • Zeelenberg and Pieters (2007) Zeelenberg, M. and Pieters, R. (2007). A theory of regret regulation 1.0. Journal of Consumer Psychology, 17(1), 3–18.