Deep Tractable Probabilistic Models for Moral Responsibility

10/08/2018 · by Lewis Hammond, et al.

Moral responsibility is a major concern in automated decision-making, with applications ranging from self-driving cars to kidney exchanges. From the viewpoint of automated systems, the urgent questions are: (a) How can models of moral scenarios and blameworthiness be extracted and learnt automatically from data? (b) How can judgements be computed tractably, given the split-second decision points faced by the system? By building on deep tractable probabilistic learning, we propose a learning regime for inducing models of such scenarios automatically from data and reasoning tractably from them. We report on experiments that compare our system with human judgement in three illustrative domains: lung cancer staging, teamwork management, and trolley problems.




1 Introduction

Moral responsibility is a major concern in automated decision-making. In applications ranging from self-driving cars to kidney exchanges [Conitzer et al., 2017], contextualising and enabling judgements of morality and blame is becoming a difficult challenge, owing in part to the philosophically vexing nature of these notions. In the infamous trolley problem [Thomson, 1985], for example, a putative agent encounters a runaway trolley headed towards five individuals who are unable to escape the trolley’s path. Their death is certain should the trolley collide with them. The agent, however, can divert the trolley to a side track by means of a switch, but at the cost of the death of a sixth individual, who happens to be on this latter track. While one would hope that in practice the situations encountered by, say, self-driving cars would not involve such extreme choices, providing a decision-making framework with the capability of reasoning about blame seems prudent.

Moral reasoning has been actively studied by philosophers, lawyers, and psychologists for many decades. Especially when considering quantitative frameworks,[1] a definition of responsibility that is based on causality has been argued to be particularly appealing [Chockler and Halpern, 2004]. But most of these definitions are motivated and instantiated by carefully constructed examples designed by the expert, and so are not necessarily viable in large-scale applications. Indeed, problematic situations encountered by automated systems are likely to arise in high-dimensional settings, with hundreds or thousands of latent variables capturing the low-level aspects of the application domain. Thus, the urgent questions are:

[1] The quantitative nature of the framework used in this work implicitly takes a consequentialist stance when it comes to the normative ethical theory used to assess responsibility and blame, and we also rely on our utility functions being cardinal as opposed to merely ordinal. See, for example, [Sinnott-Armstrong, 2015] and [Strotz, 1953] for definitions and discussions of these stances.

  • How can models of moral scenarios and blameworthiness be extracted and learnt automatically from data?

  • How can judgements be computed tractably, given the split-second decision points faced by the system?

In this work, we propose a learning regime for inducing models of moral scenarios and blameworthiness automatically from data, and for reasoning tractably with them. To the best of our knowledge, this is the first such proposal. The regime leverages the tractable learning paradigm [Poon and Domingos, 2011, Choi et al., 2015, Kisa et al., 2014], which can induce both high- and low-treewidth graphical models with latent variables, and thus realises a deep probabilistic architecture [Pronobis et al., 2017]. We remark that we do not motivate any new definitions of moral responsibility, but show how an existing model can be embedded in the learning framework; we suspect it should be possible to analogously embed other definitions from the literature too. We then study the computational features of this regime. Finally, we report on experiments regarding the alignment between automated morally-responsible decision-making and human judgement in three illustrative domains: lung cancer staging, teamwork management, and trolley problems.

2 Preliminaries

2.1 Blameworthiness

We use the word blameworthiness to capture an important part of what can more broadly be described as moral responsibility, and consider a set of definitions (taken directly from the original work, with slight changes in notation for the sake of clarity and conciseness) put forward by [Halpern and Kleiman-Weiner, 2018] (henceforth HK). In HK, environments are modelled in terms of variables and structural equations relating their values [Halpern and Pearl, 2005]. More formally, the variables are partitioned into exogenous variables $\mathcal{X}$, external to the model in question, and endogenous variables $\mathcal{V}$, internal to the model and whose values are determined by those of the exogenous variables. A range function $\mathcal{R}$ maps every variable to the set of possible values it may take. In any model, there exists one structural equation $F_V$ for each $V \in \mathcal{V}$.

Definition 1.

A causal model is a pair $M = (\mathcal{S}, \mathcal{F})$, where $\mathcal{S} = (\mathcal{X}, \mathcal{V}, \mathcal{R})$ is a signature and $\mathcal{F} = \{F_V : V \in \mathcal{V}\}$ is a set of modifiable structural equations. A causal setting is a pair $(M, \mathbf{X})$, where $\mathbf{X}$ is a context (a setting of the exogenous variables).

In general we denote an assignment of values to the variables in a set $\mathcal{Y}$ by $\mathbf{Y}$. Following HK, we restrict our considerations to recursive models, in which, given a context $\mathbf{X}$, the values of all variables in $\mathcal{V}$ are uniquely determined.

Definition 2.

A primitive event is an equation of the form $V = v$ for some $V \in \mathcal{V}$ and $v \in \mathcal{R}(V)$. A causal formula is denoted $[\mathcal{Y} \leftarrow \mathbf{Y}]\varphi$, where $\mathcal{Y} \subseteq \mathcal{V}$ and $\varphi$ is a Boolean formula of primitive events. This says that if the variables in $\mathcal{Y}$ were set to values $\mathbf{Y}$ (i.e. by intervention) then $\varphi$ would hold. For a causal formula $\psi$ we write $(M, \mathbf{X}) \models \psi$ if $\psi$ is satisfied in causal setting $(M, \mathbf{X})$.

An agent’s epistemic state is given by $\mathcal{E} = (\Pr, \mathcal{K}, U)$, where $\mathcal{K}$ is a set of causal settings, $\Pr$ is a probability distribution over this set, and $U$ is a utility function on the set of worlds, where a world $w$ is defined as a setting of values to all the variables in the model. $w_{(M,\mathbf{X})}$ denotes the unique world determined by the causal setting $(M, \mathbf{X})$.

Definition 3.

We define how much more likely it is that $\varphi$ will result from performing action $a$ than from performing $a'$:

$\delta_{a,a',\varphi} = \max\Big(0,\; \textstyle\sum_{(M,\mathbf{X}) \in \mathcal{K}_{a,\varphi}} \Pr\big((M,\mathbf{X})\big) - \sum_{(M,\mathbf{X}) \in \mathcal{K}_{a',\varphi}} \Pr\big((M,\mathbf{X})\big)\Big)$

where $A$ is a variable identified in order to capture an action of the agent and $\mathcal{K}_{a,\varphi}$ is the set of causal settings in which the causal formula $[A \leftarrow a]\varphi$ is satisfied.

The costs of actions are measured with respect to a set of outcome variables $\mathcal{O} \subseteq \mathcal{V}$, whose values are determined by an assignment to all other variables. In a given causal setting $(M, \mathbf{X})$, $\mathbf{O}_a$ denotes the setting of the outcome variables when action $a$ is performed and $w_a$ denotes the corresponding world.

Definition 4.

The (expected) cost of $a$ relative to $\mathcal{O}$ is:

$c(a) = -\textstyle\sum_{(M,\mathbf{X}) \in \mathcal{K}} \Pr\big((M,\mathbf{X})\big)\, U(w_a)$

Finally, HK introduce one last quantity, $N$, to measure how important the costs of actions are when attributing blame (this varies according to the scenario). Specifically, as $N \to \infty$ we have $\frac{N - \max(c(a') - c(a),\, 0)}{N} \to 1$, and thus the less we care about cost. Note that blame is assumed to be non-negative, and so it is required that $N \geq \max_{a'}\big(c(a') - c(a)\big)$.

Definition 5.

The degree of blameworthiness of $a$ for $\varphi$ relative to $a'$ (given $c$ and $N$) is:

$db_N(a, a', \varphi) = \delta_{a,a',\varphi}\, \dfrac{N - \max\big(c(a') - c(a),\, 0\big)}{N}$

The overall degree of blameworthiness of $a$ for $\varphi$ is then:

$db_N(a, \varphi) = \max_{a' \neq a}\, db_N(a, a', \varphi)$

For reasons of space we omit an example here, but include several when reporting the results of our experiments. For further examples and discussions, we refer the reader to HK.

2.2 PSDDs

Since, in general, probabilistic inference is intractable [Bacchus et al., 2009], tractable learning has emerged as a recent paradigm where one attempts to learn classes of arithmetic circuits (ACs), for which inference is tractable [Gens and Domingos, 2013, Kisa et al., 2014]. In particular, we use Probabilistic Sentential Decision Diagrams (PSDDs) [Kisa et al., 2014], which are tractable representations of a probability distribution over a propositional logic theory (a set of sentences in propositional logic) represented by a Sentential Decision Diagram (SDD). Space precludes us from discussing SDDs and PSDDs in detail, but the main idea behind SDDs is to factor the theory recursively as a binary tree: terminal nodes are either 1 or 0, and decision nodes are of the form $(p_1, s_1), \dots, (p_k, s_k)$, where the primes $p_i$ are SDDs corresponding to the left branch, the subs $s_i$ are SDDs corresponding to the right branch, and the primes form a partition (they are consistent, mutually exclusive, and their disjunction is valid). In PSDDs, each prime $p_i$ in a decision node is associated with a non-negative parameter $\theta_i$ such that $\sum_i \theta_i = 1$, and $\theta_i = 0$ if and only if the corresponding sub $s_i$ is unsatisfiable. Each terminal node also has a parameter $\theta \in [0, 1]$, and together these parameters can be used to capture probability distributions. Most significantly, probabilistic queries, such as conditionals and marginals, can be computed in time linear in the size of the model. PSDDs can be learnt from data [Liang et al., 2017], possibly with the inclusion of logical constraints standing for background knowledge. The ability to encode logical constraints into the model directly enforces sparsity, which in turn can lead to increased accuracy and decreased size. In our setting, we can draw parallels between these logical constraints and deontological ethical principles (e.g. it is forbidden to kill another human being), and between learnt distributions over decision-making scenarios (which can encode preferences) and the utility functions used in consequentialist ethical theories.
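To make the linear-time evaluation property concrete, the following is a minimal sketch of bottom-up circuit evaluation. The dictionary-based node encoding is our own illustrative stand-in, not the representation used by actual SDD/PSDD packages:

```python
# Minimal sketch of bottom-up evaluation of a PSDD-like circuit.
# Node encodings here are simplified stand-ins, not a real PSDD library's.

def evaluate(node, assignment):
    """Return the probability mass the (sub)circuit assigns to `assignment`."""
    kind = node["kind"]
    if kind == "literal":            # a variable or its negation
        return 1.0 if assignment[node["var"]] == node["positive"] else 0.0
    if kind == "terminal":           # terminal over one variable with parameter theta
        theta = node["theta"]
        return theta if assignment[node["var"]] else 1.0 - theta
    # decision node: sum over (prime, sub, theta) elements; the primes are
    # mutually exclusive, so at most one element contributes a non-zero term
    return sum(theta * evaluate(p, assignment) * evaluate(s, assignment)
               for p, s, theta in node["elements"])

# Toy circuit over {A, B}: Pr(A) = 0.7; Pr(B | A) = 0.2, Pr(B | not A) = 0.9
circuit = {"kind": "decision", "elements": [
    ({"kind": "literal", "var": "A", "positive": True},
     {"kind": "terminal", "var": "B", "theta": 0.2}, 0.7),
    ({"kind": "literal", "var": "A", "positive": False},
     {"kind": "terminal", "var": "B", "theta": 0.9}, 0.3),
]}

p = evaluate(circuit, {"A": True, "B": False})   # 0.7 * 0.8 = 0.56
```

Each node is visited a constant number of times per query, so evaluation is linear in the circuit size, which is the property the embedding below relies on.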

3 Blameworthiness via PSDDs

We aim to leverage the learning of PSDDs, their tractable query interface, and their ability to handle domain constraints for inducing models of moral scenarios.[2] This is made possible by means of an embedding that we sketch below, while also discussing our assumptions and choices. At the outset, we reiterate that we do not introduce new definitions here, but show how an existing one, that of HK, can be embedded within a learning regime.

[2] Our technical development can leverage both parameter and (possibly partial) structure learning for PSDDs. Of course, learning causal models is a challenging problem [Acharya et al., 2018], and in this regard, probabilistic structure learning is not assumed to be a recipe for causal discovery in general [Pearl, 1998]. Rather, under the assumptions discussed later, we are able to use our probabilistic model for causal reasoning.

3.1 Variables

We first distinguish between scenarios in which we do and do not model outcome variables. In both cases we have exogenous variables $\mathcal{X}$, but in the former the endogenous variables are partitioned as $\mathcal{V} = \mathcal{D} \cup \mathcal{O}$ into decision variables $\mathcal{D}$ and outcome variables $\mathcal{O}$, and in the latter we have $\mathcal{V} = \mathcal{D}$ (this does not affect the notation in our definitions, however). This is because we do not assume that outcomes can always be recorded, and in some scenarios it makes sense to think of decisions as an end in themselves.

The range function $\mathcal{R}$ is defined by the scenario we model, but in practice we one-hot encode the variables, and so the range of each is simply $\{0, 1\}$. A subset (possibly empty) of the structural equations in $\mathcal{F}$ is implicitly encoded within the structure of the SDD underlying the PSDD, consisting of the logical constraints that remain true in every causal model $M$. The remaining equations are those that vary depending on the causal model. Each possible assignment $\mathbf{D}, \mathbf{O}$ given $\mathbf{X}$ corresponds to a set of structural equations that combine with those encoded by the SDD to determine the values of the variables in $\mathcal{V}$ given $\mathbf{X}$. The PSDD then corresponds to the probability distribution over causal settings, compacting everything neatly into a single structure.

Our critical assumption here is that the signature (the variables and the values they may take) remains the same in all models, although the structural equations (the ways in which said variables are related) may vary. Given that each model represents an agent’s uncertain view of a decision-making scenario we do not think it too restrictive to keep the elements of this scenario the same across the potential eventualities, so long as the way these elements interact may differ. Indeed, learning PSDDs from decision-making data requires that the data points measure the same variables each time.
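Because the variables are one-hot encoded, the background theory must at minimum contain an "exactly one" constraint per original multi-valued variable. A minimal sketch of generating such constraints in DIMACS-style CNF (illustrative only; the actual system compiles constraints into an SDD):

```python
from itertools import combinations

def exactly_one(literals):
    """CNF clauses forcing exactly one of `literals` (DIMACS-style ints) to be true."""
    clauses = [list(literals)]                                    # at least one
    clauses += [[-a, -b] for a, b in combinations(literals, 2)]   # at most one
    return clauses

# One-hot encoding of a 3-valued variable as Boolean variables 1, 2, 3:
cnf = exactly_one([1, 2, 3])
# [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
```

Constraints like these give every assignment that violates the encoding zero probability, which is also the source of the sparsity exploited in Section 4.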

3.2 Probabilities

Thus, our distribution ranges over assignments to the variables $\mathcal{X} \cup \mathcal{D} \cup \mathcal{O}$ instead of causal settings. As a slight abuse of notation we write $\Pr(\mathbf{X}, \mathbf{D}, \mathbf{O})$. The key observation needed to translate between these two distributions (we denote the original as $\Pr_{HK}$), and which relies on our assumption above, is that each set of structural equations $\mathcal{F}$ together with a context $\mathbf{X}$ deterministically leads to a unique, complete assignment $\mathbf{V}$ of the endogenous variables, which we write (abusing notation slightly) as $\mathcal{F}(\mathbf{X}) = \mathbf{V}$, though there may be many such sets of equations that lead to the same assignment. Hence, for any context $\mathbf{X}$ and any assignment $\mathbf{Y}$ for $\mathcal{Y} \subseteq \mathcal{V}$ we have:

$\Pr(\mathbf{Y} \mid \mathbf{X}) = \textstyle\sum_{\{\mathcal{F} \,:\, \mathcal{F}(\mathbf{X}) \models \mathbf{Y}\}} \Pr_{HK}(\mathcal{F} \mid \mathbf{X})$

We view a Boolean formula of primitive events $\varphi$ (possibly resulting from decision $a$) as a function that returns 1 if the original formula is satisfied by the assignment, or 0 otherwise. We write $\mathbf{v}$ for a general vector of values over $\mathcal{X} \cup \mathcal{D} \cup \mathcal{O}$, and hence write $\varphi(\mathbf{v}) \in \{0, 1\}$. Here, the probability of $\varphi$ occurring given that action $a$ is performed (i.e. conditioning on intervention), given by HK as $\Pr(\mathcal{K}_{a,\varphi})$, can also be written as $\Pr(\varphi \mid do(a))$. In general, it is not the case that $\Pr(\varphi \mid do(a)) = \Pr(\varphi \mid a)$, but by assuming that the direct causes of action $A$ are captured by the context $\mathbf{X}$, and that the other decisions and outcomes $\mathbf{D}$ and $\mathbf{O}$ are in turn caused by $\mathbf{X}$ and $A$, we may use the back-door criterion [Pearl, 2009] with $\mathcal{X}$ as a sufficient set to write:

$\Pr(\varphi \mid do(a)) = \textstyle\sum_{\mathbf{X}} \Pr(\mathbf{X})\, \Pr(\varphi \mid \mathbf{X}, a)$

and thus may use $\Pr(\varphi \mid \mathbf{X}, a)$ for $\Pr(\varphi \mid do(a), \mathbf{X})$. In order not to re-learn a separate model for each scenario, we also allow the user of our system the option of specifying a current, alternative distribution over contexts $\Pr'(\mathbf{X})$.
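Under this back-door assumption, the interventional probability can be computed from the learnt joint distribution by ordinary conditioning within each context. A minimal sketch with a toy tabular joint (the variable names and numbers are illustrative, not from our experiments):

```python
from collections import defaultdict

# Joint distribution as a map {(x, a, phi): probability}; illustrative numbers.
joint = {
    ("x0", "a0", True): 0.10, ("x0", "a0", False): 0.30,
    ("x0", "a1", True): 0.05, ("x0", "a1", False): 0.05,
    ("x1", "a0", True): 0.20, ("x1", "a0", False): 0.05,
    ("x1", "a1", True): 0.20, ("x1", "a1", False): 0.05,
}

def p_do(joint, action):
    """Pr(phi = True | do(A = action)) via back-door adjustment over contexts X."""
    px = defaultdict(float)        # Pr(X = x)
    pxa = defaultdict(float)       # Pr(X = x, A = a)
    pxaphi = defaultdict(float)    # Pr(X = x, A = a, phi = True)
    for (x, a, phi), pr in joint.items():
        px[x] += pr
        pxa[(x, a)] += pr
        if phi:
            pxaphi[(x, a)] += pr
    # sum_x Pr(x) * Pr(phi | x, a)
    return sum(px[x] * pxaphi[(x, action)] / pxa[(x, action)]
               for x in px if pxa[(x, action)] > 0)

p = p_do(joint, "a1")   # 0.5 * 0.5 + 0.5 * 0.8 = 0.65
```

In the actual system, the conditional probabilities inside the sum would come from the PSDD's linear-time query interface rather than from an explicit table.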

3.3 Utilities

We now consider the utility function $U$, the output of which we assume is normalised to the range $[0, 1]$.[3] We avoid unnecessary extra notation by defining the utility function in terms of $\mathbf{X}$, $\mathbf{D}$, and $\mathbf{O}$ instead of worlds $w$. In our implementation we allow the user to input an existing utility function or to learn one from data. In the latter case the user further specifies whether or not the function should be context-relative, i.e. whether we have $U(\mathbf{O})$ or $U_{\mathbf{X}}(\mathbf{O})$ (our notation), as, in some cases, how good a certain outcome $\mathbf{O}$ is depends on the context $\mathbf{X}$. Similarly, the user also decides whether the function should be linear in the outcome variables, in which case the final utility is $\sum_i u_i O_i$ or $\sum_i u_i^{\mathbf{X}} O_i$ respectively (where we assume each weight is non-negative). Here the utility function is simply a vector of weights and the total utility of an outcome is the dot product of this vector with the vector of outcome variables.

[3] This has no effect on our calculations, as we only use cardinal utility functions with bounded ranges, which are invariant to positive affine transformation.

When learning utility functions, the key assumption we make (before normalisation) is that the probability of a certain decision being made given a context is linearly proportional to the expected utility of that decision in the context. Note that here a decision is a general assignment $\mathbf{D}$ and not a single action $a$. For example, in the case where there are outcome variables and the utility function is both linear and context-relative, we assume that $\Pr(\mathbf{D} \mid \mathbf{X}) \propto \sum_{\mathbf{O}} \Pr(\mathbf{O} \mid \mathbf{X}, \mathbf{D})\, U_{\mathbf{X}}(\mathbf{O})$. The linearity of this relationship is neither critical to our work nor a real restriction on it, but it simplifies our calculations somewhat and means that we do not have to make any further assumptions about the noisiness of the decision-making scenario, or about how sophisticated the agent is with respect to making utility-maximising decisions. The existence of a proportionality relationship itself is far more important. However, we believe this is, in fact, relatively uncontroversial and can be restated as the simple principle that an agent is more likely to choose a decision that leads to a higher expected utility than one that leads to a lower expected utility. If we view decisions as guided by a utility function, then it follows that the decisions should, on average, be consistent with and representative of that utility function.

3.4 Costs and Blameworthiness

We also adapt the cost function given in HK, denoted here by $c$. As actions do not deterministically lead to outcomes in our work, we cannot use $\mathbf{O}_a$ to represent the specific outcome when decision $a$ is made (in some context). For our purposes it suffices to use $c(a) = -\sum_{\mathbf{X}} \Pr(\mathbf{X})\, \mathbb{E}[U_{\mathbf{X}}(\mathbf{O}) \mid \mathbf{X}, a]$, or the analogous expression with $U(\mathbf{O})$, depending on whether $U$ is context-relative or not. This is simply the negative expected utility over all contexts, conditioning by intervention on decision $a$. Using our conversion between $\Pr$ and $\Pr_{HK}$, the back-door criterion [Pearl, 2009], and our assumption that action $A$ is not caused by the other endogenous variables (i.e. $\mathcal{X}$ is a sufficient set for $A$), it is straightforward to show that this cost function is equivalent to the one in HK (with respect to determining blameworthiness scores).[4] Again, we also give the user the option of updating the distribution over contexts to some other distribution $\Pr'(\mathbf{X})$ so that the current model can be re-used in different scenarios. Given $\delta$ and $c$, both relative and overall blameworthiness are computed as in HK, although we instead require that $N$ be large enough for blame to remain non-negative (the equivalence of this condition to the one in HK is an easy exercise). With this the embedding is complete.

[4] In particular, it suffices to use …
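As a sketch of how the final scores are assembled, assuming the HK form $db_N(a, a', \varphi) = \delta_{a,a',\varphi} \cdot (N - \max(c(a') - c(a), 0))/N$ (illustrative, not the paper's implementation):

```python
def blameworthiness(delta, cost_a, cost_alt, N):
    """db_N(a, a', phi) = delta * (N - max(cost_alt - cost_a, 0)) / N.

    delta:    how much more likely phi is under a than under a' (>= 0)
    cost_a:   expected cost of the action being judged
    cost_alt: expected cost of the alternative action
    N:        cost-importance parameter; must satisfy N >= cost_alt - cost_a
    """
    penalty = max(cost_alt - cost_a, 0.0)
    if N < penalty:
        raise ValueError("N too small: blame would be negative")
    return delta * (N - penalty) / N

def overall_blameworthiness(delta_and_costs, cost_a, N):
    """Max over alternatives a' of db_N(a, a', phi)."""
    return max(blameworthiness(d, cost_a, c, N) for d, c in delta_and_costs)

# If the alternative is no costlier, blame equals delta itself:
b = blameworthiness(delta=0.4, cost_a=0.3, cost_alt=0.3, N=1.0)   # 0.4
```

Note how a costlier alternative shrinks the score, and a larger $N$ makes the cost penalty matter less, matching the role of $N$ described in Section 2.1.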

4 Complexity Results

Given our concerns over tractability, we provide several computational complexity results for our embedding. Basic results were given in [Halpern and Kleiman-Weiner, 2018], but only in terms of the computations being polynomial in the sizes of the relevant sets of causal settings, variables, and actions. Here we provide more detailed results that are specific to our embedding and to the properties of PSDDs. The complexity of calculating blameworthiness scores depends on whether the user specifies an alternative distribution $\Pr'(\mathbf{X})$, although in practice this is unlikely to have a major effect on tractability. Finally, note that we assume here that the PSDD and utility function are given in advance, and so we do not consider the computational cost of learning. A summary of our results is given in Table 1.

Table 1: Time complexities for each of the key terms that we compute. If the user specifies an extra distribution over contexts, then the complexity is given by the expressions in the table with each occurrence of the original context distribution replaced by the alternative one, adding the time taken to evaluate that distribution.

Here, the time taken to evaluate the PSDD is linear in its size, measured as the number of parameters; we also account for the time taken to evaluate the utility function, and the time taken to evaluate the Boolean function $\varphi$, measured by the number of Boolean connectives in $\varphi$. We observe that all of the final time complexities are exponential in the size of at least some subset of the variables. This is a result of the Boolean representation; our results are, in fact, more tightly bounded versions of those in HK, which are polynomial in the size of the set of causal settings. In practice, however, we only sum over worlds with a non-zero probability of occurring. Using PSDDs allows us to exploit this fact in ways that other models cannot, as we can logically constrain the model to place zero probability on any impossible world. Thus, when calculating blameworthiness we can ignore a great many of the terms in each sum and speed up computation dramatically. To give some concrete examples, the model counts of the PSDDs in our experiments were 52, 4800, and 180, out of exponentially many possible variable assignments, respectively.
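The effect of summing only over the support can be sketched as follows (a toy illustration; in the actual system the support is given by the PSDD's logical constraints):

```python
from itertools import product

def expectation_dense(n_vars, prob, value):
    """Naive sum over all 2^n assignments, including impossible worlds."""
    return sum(prob(w) * value(w) for w in product([0, 1], repeat=n_vars))

def expectation_sparse(support, prob, value):
    """Sum only over worlds with non-zero probability (the model count)."""
    return sum(prob(w) * value(w) for w in support)

# Toy model over 3 binary variables with only two possible worlds:
probs = {(0, 0, 1): 0.3, (1, 1, 0): 0.7}
prob = lambda w: probs.get(w, 0.0)
value = lambda w: sum(w)                   # an arbitrary utility over worlds

dense = expectation_dense(3, prob, value)        # 2^3 = 8 terms
sparse = expectation_sparse(probs, prob, value)  # 2 terms, same value
```

With a model count of 52 out of, say, $2^{20}$ assignments, the sparse sum evaluates 52 terms instead of over a million for the same result.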

5 Implementation

The underlying motivation behind our system was that a user should be able to go from any stage of creating a model to generating blameworthiness scores as conveniently and as straightforwardly as possible. With this in mind our package runs from the command line and prompts the user for a series of inputs including: data; existing PSDDs, SDDs, or vtrees; logical constraints; utility function specifications; variable descriptions; and finally the decisions, outcomes, and other details needed to compute a particular blameworthiness score. These inputs and any outputs from the system are saved and thus each model and its results can be easily accessed and re-used if needed. Note that we assume each datum is a sequence of fully observed values for binary (possibly as a result of one-hot encoding) variables that correspond to the context, the decisions made, and the resulting outcome, if recorded.

Our implementation makes use of two existing resources: [The SDD Package 2.0, 2018], an open-source system for creating and managing SDDs, including compiling them from logical constraints; and LearnPSDD [Liang et al., 2017], a recently developed set of algorithms that can be used to learn the parameters and structure of PSDDs from data, learn vtrees from data, and to convert SDDs into PSDDs. The resulting functionalities of our system can then be broken down into four broad areas:

  • Building and managing models, including converting logical constraints specified by the user in simple prefix notation to restrictions upon the learnt model. For example, the constraint $(A \land B) \leftrightarrow C$ can be entered as a command line prompt using =(&(A,B),C).

  • Performing inference by evaluating the model or by calculating the most probable explanation (MPE), both possibly given partial evidence. Each of our inference algorithms is linear in the size of the model; they are based on pseudocode given in [Kisa et al., 2014] and [Peharz et al., 2017] respectively.

  • Learning utility functions from data, whose properties (such as being linear or being context-relative) are specified by the user in advance. This learning is done by forming a matrix equation representing our assumed proportionality relationship across all decisions and contexts, then solving to find utilities using non-negative linear regression with L2 regularisation (equivalent to solving a quadratic program).

  • Computing blameworthiness by efficiently calculating the key quantities from our embedding, using parameters from particular queries given by the user. Results are then displayed in natural language and automatically saved for future reference.
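The utility-learning step above can be sketched as a small regularised non-negative regression, solved here by projected gradient descent (the design matrix and targets are illustrative stand-ins for the proportionality system described earlier, and the actual system may use a different solver):

```python
def nnls_ridge(A, b, lam=0.01, lr=0.01, steps=5000):
    """Minimise ||A w - b||^2 + lam * ||w||^2 subject to w >= 0,
    by projected gradient descent (equivalent to a small quadratic program)."""
    m, n = len(A), len(A[0])
    w = [0.0] * n
    for _ in range(steps):
        # residuals r = A w - b
        r = [sum(A[i][j] * w[j] for j in range(n)) - b[i] for i in range(m)]
        for j in range(n):
            g = 2 * sum(A[i][j] * r[i] for i in range(m)) + 2 * lam * w[j]
            w[j] = max(w[j] - lr * g, 0.0)   # gradient step, project onto w >= 0
    return w

# Rows: decision/context pairs; columns: outcome indicators. Targets are
# observed decision frequencies (illustrative numbers only).
A = [[1, 0],
     [0, 1],
     [1, 1]]
b = [0.8, 0.2, 1.0]
w = nnls_ridge(A, b)   # roughly [0.8, 0.2]: the learnt outcome utilities
```

The L2 term keeps the weights stable when some outcomes are rarely observed, and the non-negativity constraint matches the assumption that outcome utilities are non-negative weights.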

A high-level overview of the complete structure of the system and full documentation are included in a package, which will be made available online.

6 Experiments and Results

Using our implementation we learnt several models using a selection of datasets from varying domains in order to test our hypotheses. In particular we answer three questions in each case:

  • (Q1) Does our system learn the correct overall probability distribution?

  • (Q2) Does our system capture the correct utility function?

  • (Q3) Does our system produce reasonable blameworthiness scores?

Full datasets are available as part of the package and summaries of each (including the domain constraints underlying our datasets) are given in the appendix.

6.1 Lung Cancer Staging

We use a synthetic dataset generated with the lung cancer staging influence diagram given in [Nease Jr and Owens, 1997]. The data was generated assuming that the overall decision strategy recommended in the original paper is followed with some high probability at each decision point. In this strategy, a thoracotomy is the usual treatment unless the patient has mediastinal metastases, in which case a thoracotomy will not result in greater life expectancy than the lower-risk option of radiation therapy, which is then the preferred treatment. The first decision made is whether a CT scan should be performed to test for mediastinal metastases; the second is whether to perform a mediastinoscopy. If the CT scan results are positive for mediastinal metastases then a mediastinoscopy is usually recommended in order to provide a second check, but if the CT scan result is negative then a mediastinoscopy is not seen as worth the extra risk involved in the operation. Possible outcomes are determined by variables that indicate whether the patient survives the diagnosis procedure and survives the treatment, and utility is measured by life expectancy.

For (Q1) we measure the overall log likelihood of the models learnt by our system on training, validation, and test datasets (see Table 2). A full comparison across a range of similar models and learning techniques is beyond the scope of our work here, although to provide some evidence of the competitiveness of PSDDs we include the log likelihood scores of a sum-product network (SPN) as a benchmark. We follow a similar pattern in our remaining experiments, each time using Tachyon [Kalra, 2017] (an open source library for SPNs) to produce an SPN using the same training, validation, and test sets of our data, with the standard learning parameters as given in the Tachyon documentation example. We also compare the sizes (measured by the number of nodes) and the log likelihoods of PSDDs learnt with and without logical constraints in order to demonstrate the effectiveness of the former approach. Our model is able to recover the artificial decision-making strategy well (see Figure 1); at most points of the staging procedure the model learns a very similar distribution over decisions, and in all cases the correct decision is made the majority of times.

Experiment  Model   Training   Validation   Test      Size
1           PSDD*   -2.047     -2.046       -2.063     134
1           PSDD    -2.550     -2.549       -2.564     436
1           SPN     -3.139     -3.143       -3.158    1430
2           PSDD*   -5.541     -5.507       -5.457     370
2           PSDD    -5.637     -5.619       -5.556     931
2           SPN     -7.734     -7.708       -7.658    3550
3           PSDD*   -4.440     -4.510       -4.785     368
3           PSDD    -6.189     -6.014       -6.529     511
3           SPN    -15.513    -16.043      -15.765    3207

Table 2: Log likelihoods and sizes of the constrained PSDDs (the models we use in our system, indicated by the * symbol), unconstrained PSDDs, and the SPNs learnt in our three experiments.

Answering (Q2) here is more difficult, as the given utilities are not necessarily such that decisions are made with probability linearly proportional to their expected utility. However, our strategy was chosen so as to maximise expected utility in the majority of cases. Thus, when comparing the given life expectancies with the learnt utility function, we still expect the same ordering of utility values, even if not the same magnitudes. In particular, our function assigns maximal utility (1.000) to the successful performing of a thoracotomy when the patient does not have mediastinal metastases (the optimal scenario), and any scenario in which the patient dies has markedly lower utility (mean value 0.134).

In attempting to answer (Q3) we divide our question into two parts: does the system attribute no blame in the correct cases? And does the system attribute more blame in the cases we would expect it to (and less in others)? Needless to say, it is very difficult (perhaps even impossible, at least without an extensive survey of human opinions) to produce an appropriate metric for how correct our attributions of blame are, but we suggest that these two criteria are the most fundamental and capture the core of what we want to evaluate. We successfully queried our model in a variety of settings corresponding to the two questions above and present representative examples below (we follow this same pattern in our second and third experiments).

Figure 1: A comparison between the five probability values specified in our data generation process and the corresponding values learnt by our system from this data.

Regarding the first part of (Q3), one case in which we expect blameworthiness scores of zero is when performing the action being judged is less likely to result in the outcome we are concerned with than the action(s) we are comparing it to. The chance of the patient dying during the diagnostic process is increased if a mediastinoscopy is performed; hence the blameworthiness for such a death due to not performing a mediastinoscopy should be zero. As expected, our model assigns a blameworthiness score of zero here. To answer the second part of (Q3), we show that the system produces higher blameworthiness scores when a negative outcome is more likely to occur (assuming the actions being compared have relatively similar costs). For example, in the case where the patient does not have mediastinal metastases, the best treatment is a thoracotomy, but a thoracotomy will not be performed if the result of the last diagnostic test performed is positive. The specificity of a mediastinoscopy is higher than that of a CT scan, hence a CT scan is more likely to produce a false positive and thus (assuming no mediastinoscopy is performed as a second check) lead to the wrong treatment.[5] In the case where only one diagnostic procedure is performed, we therefore have a higher degree of blame attributed to the decision to conduct a CT scan (0.013) as opposed to a mediastinoscopy (0.000).

[5] Note that even though a mediastinoscopy has a higher cost (as the patient is more likely to die if it is performed), this should not be enough to outweigh the test’s accuracy in this circumstance.

6.2 Teamwork Management

Our second experiment uses a recently collected dataset of human decision-making in teamwork management [Yu et al., 2017]. This data was recorded from over 1000 participants as they played a game that simulates task allocation processes in a management environment. In each level of the game the player has different tasks to allocate to a group of virtual workers that have different attributes and capabilities. The tasks vary in difficulty, value, and time requirements, and the player gains feedback from the virtual workers as tasks are completed. At the end of the level the player receives a score based on the quality and timeliness of their work. Finally, the player is asked to record their emotional response to the result of the game in terms of scores corresponding to six basic emotions. We simplify matters slightly by considering only the self-declared management strategy of the player as our decisions. Within the game this is recorded by five check-boxes at the end of the level that are not mutually exclusive, giving 32 possible overall strategies. These strategy choices concern methods of task allocation such as load-balancing (keeping each worker’s workload roughly even) and skill-based (assigning tasks by how likely the worker is to complete the task well and on time), amongst others. We also measure utility purely by the self-reported happiness of the player, rather than any other emotions.

Figure 2: The log probability assigned to each possible decision strategy across all contexts by our model, compared to the log proportion of times each strategy was used in the six levels of the game by participants. Strategies are sorted in ascending order by their proportion of use in level 1 and gaps in each plot represent strategies never used in that game level.

As part of our answer to (Q1) we investigate how often the model would employ each of the 32 possible strategies (where a strategy is represented by an assignment of values to the binary indicator decision variables) compared to the average participant (across all contexts), which can be seen in Figure 2. In general the learnt probabilities are similar to the actual proportions in the data, though noisier. The discrepancies are more noticeable (though understandably so) for decisions that were made very rarely, perhaps only once or twice in the entire dataset. These differences are also partly due to smoothing (i.e. all strategies have a non-zero probability of being played).

Figure 3: Each point is a decision strategy in a level of the game; we compare the proportion of times it is used against the average self-reported utility that results from it. Each line is a least-squares best fit to the points in that level.

For (Q2) we use the self-reported happiness scores to investigate our assumption that the number of times a decision is made is (linearly) proportional to the expected utility of that decision. In order to do this we split the data up by context (game level) and produce a scatter plot (Figure 3) of the proportion of times a set of decisions is made against the average utility (happiness score) of that decision. Overall there is no obvious positive linear correlation as our original assumption would imply, although this could be due to any one or a combination of the following reasons: players do not play enough rounds of the game to find out which strategies reliably lead to higher scores and thus (presumably) higher utilities; players do not accurately self-report their strategies; or players’ strategies have relatively little impact on their overall utility based on the result of the game. We recall here that our assumption essentially comes down to supposing that people more often make decisions that result in greater utilities. The evident plausibility of this statement, along with the relatively high likelihood of at least one of the factors in the list above, means we do not have enough evidence here to refute the statement, although certainly further empirical work is required in order to demonstrate its truth.

Investigating this discrepancy further, we learnt a utility function (linear and context-relative) from the data and inspected the average weights given to the outcome variables (see right plot in Figure 4). A correct function should place higher weights on the outcome variables corresponding to higher ratings, which is true for timeliness, but not quite true for quality as the top rating is weighted only third highest. We found that the learnt utility weights are in fact almost identical to the distribution of the outcomes in the data (see left plot in Figure 4). Because our utility weights were learnt on the assumption that players more often use strategies that will lead to better expected outcomes, the similarity between these two graphs adds further weight to our suggestion that, in fact, the self-reported strategies of players have very little to do with the final outcome.

Figure 4: A comparison of the learnt utility weights for each of the outcome variables (to the right) and the proportion of times each outcome occurs in the data (to the left).

To answer (Q3) we examine cases in which the blameworthiness score should be zero, and then compare cases that should have lower or higher scores with respect to one another. Once again, comprehensive descriptions of each of our tested queries are omitted for reasons of space, but here we present some representative examples. (In all of the blameworthiness scores below we use the cost importance measure .) Firstly, we considered level 1 of the game by choosing an alternative distribution over contexts when generating our scores. Here a player is less likely to receive a low rating for quality ( or ) if they employ a skill-based strategy where tasks are more frequently allocated to better workers (). As expected, our system returns . Secondly, we look at the timeliness outcomes. A player is less likely to obtain the top timeliness rating () if they do not use a strategy that uniformly allocates tasks () compared to their not using a random strategy of allocation (). Accordingly, we find that , and more specifically we have and (i.e. a player should avoid using a random strategy completely if they wish to obtain the top timeliness rating).
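The qualitative behaviour of these scores can be conveyed with a heavily simplified sketch. This is not the HK definition itself (whose exact form differs in detail) but an illustration, under our own naming, of its two ingredients: blame grows with how much the alternative action would have lowered the outcome's probability, and shrinks with the cost of that alternative, governed by a cost-importance parameter:

```python
def blameworthiness(p_phi_a, p_phi_alt, cost_alt, gamma):
    """Illustrative simplification of an HK-style blameworthiness score.

    p_phi_a:   probability of outcome phi given the action performed
    p_phi_alt: probability of phi given the alternative action
    cost_alt:  cost of the alternative action
    gamma:     stand-in for the cost-importance measure (our naming)
    """
    delta = max(0.0, p_phi_a - p_phi_alt)       # did the alternative help?
    discount = gamma / (gamma + max(0.0, cost_alt))  # costly alternatives excuse
    return delta * discount

# Zero blame when the alternative made the outcome *more* likely,
# as with flipping the switch versus certain death under inaction.
no_blame = blameworthiness(0.4, 1.0, cost_alt=0.0, gamma=1.0)
```

In the sketch, a costless alternative that fully prevents the outcome yields maximal blame, and raising `cost_alt` strictly lowers the score, mirroring the comparisons reported above.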

6.3 Trolley Problems

We also devised our own experimental setup with human participants, using a small-scale survey (the relevant documents and data are included in the package) to gather data about hypothetical moral decision-making scenarios. These scenarios took the form of variants on the infamous trolley problem [Thomson, 1985]. We extended this idea, as is not uncommon in the literature (see, e.g. [Moral Machine, 2016]), by introducing a series of different characters that might be on either track: one person, five people, 100 people, one’s pet, one’s best friend, and one’s family. We also added two further decision options: pushing whoever is on the side track into the way of the train in order to save whoever is on the main track, and sacrificing oneself by jumping in front of the train, saving both characters in the process. The survey then took the form of asking each participant which of the four actions they would perform (the fourth being inaction) given each possible permutation of the six characters on the main and side tracks (we assume that a character could not appear on both tracks in the same scenario). The general setup can be seen in Figure 5, with locations and denoting the locations of the characters on the main track and side track respectively.
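The context space just described is small enough to enumerate directly. As a sketch (names are ours): with six characters and the constraint that no character appears on both tracks in the same scenario, the contexts are exactly the ordered pairs of distinct characters, giving 6 × 5 = 30 scenarios, each paired with the four available actions:

```python
from itertools import permutations

CHARACTERS = ["one person", "five people", "100 people",
              "pet", "best friend", "family"]
ACTIONS = ["inaction", "flip switch", "push B", "sacrifice oneself"]

# A context places one character on the main track and a different one
# on the side track: all ordered pairs of distinct characters.
contexts = list(permutations(CHARACTERS, 2))
```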

Figure 5: A cartoon given to participants showing the layout of the experimental scenario and the four possible options. Clockwise from top (surrounding the face symbol) these are: sacrificing oneself, flipping the switch, inaction, and pushing the character at onto the main track. Locations and are instantiated by particular characters depending on the context.

Last of all, we added a probabilistic element (which was explained in advance to participants) to our scenarios whereby the switch only works with probability 0.6, and pushing the character at location onto the main track in order to stop the train succeeds with probability 0.8. This was used to account for the fact that people are generally more averse to actively pushing someone than to flipping a switch [Singer, 2005], and people are certainly more averse to sacrificing themselves than doing either of the former. However, depending on how much one values the character on the main track’s life, one might be prepared to perform a less desirable action in order to increase their chance of survival.
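A single playout of these probabilistic scenarios can be sketched as follows. The survival logic is our reading of the setup, not code from the paper's implementation; in particular, we assume a failed push leaves the pushed character dead and the main-track character struck, and that sacrifice always succeeds:

```python
import random

P_SWITCH_WORKS = 0.6   # the switch diverts the trolley
P_PUSH_WORKS = 0.8     # the pushed character stops the trolley

def simulate(action, rng):
    """One stochastic playout; returns survival of (main, side, you)."""
    if action == "inaction":
        return (False, True, True)
    if action == "flip switch":
        diverted = rng.random() < P_SWITCH_WORKS
        return (diverted, not diverted, True)
    if action == "push B":
        stopped = rng.random() < P_PUSH_WORKS
        return (stopped, False, True)
    if action == "sacrifice oneself":
        return (True, True, False)
    raise ValueError(action)
```

Averaging many such playouts recovers the stated success probabilities, which is how the trade-off between a safer character and a less desirable action arises.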

In answering (Q1) we investigate how well our model serves as a representation of the aggregated decision preferences of participants by calculating how likely the system would be to make particular decisions in each of the 30 contexts and comparing this with the average across participants in the survey. For reasons of space we focus here on a representative subset of these comparisons: namely, the five possible scenarios in which the best friend character is on the main track (see Figure 6). In general, the model’s predictions are similar to the answers given in the survey, although the effect of smoothing our distribution during learning is noticeable, especially as the model was learnt from relatively few data points. Despite this handicap, the most likely decision in each of the 30 contexts according to the model is in fact the majority decision in the survey, and the model’s ranking of the remaining decisions in each context is also highly accurate.

Figure 6: A comparison of the decisions made by participants and the predictions of our model in each of the five scenarios in which the best friend character is on the main track ().

Unlike our other two experiments, the survey data does not explicitly contain any utility information, meaning our system was forced to learn a utility function by using the probability distribution encoded by the PSDD. Within the decision-making scenarios we presented, it is plausible that the decisions made by participants were guided by weights that they assigned to the lives of each of the six characters and to their own life. Given that each of these is captured by a particular outcome variable we chose to construct a utility function that was linear in said variables. We also chose to make the utility function insensitive to context, as we would not expect how much one values the life of a particular character to depend on which track that character was on, or whether they were on a track at all.

For (Q2), with no existing utility data to compare our learnt function, we interpreted the survival rates of each character as the approximate weight assigned to their lives by the participants. While the survival rate is a non-deterministic function of the decisions made in each context, we assume that over the experiment these rates average out enough for us to make a meaningful comparison with the weights learnt by our model. A visual representation of this comparison can be seen in Figure 7. It is immediately obvious that our system has captured the correct utility function to a high degree of accuracy. With that said, our assumption about using survival rates as a proxy for real utility weights does lend itself to favourable comparison with a utility function learnt from a probability distribution over contexts, decisions, and outcomes (which thus includes survival rates). Given the setup of the experiment, however, this assumption seems justified and, furthermore, to be in line with how most of the participants answered the survey.
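The survival-rate proxy used in Figure 7 amounts to the following computation (a sketch with our own naming, assuming per-scenario 0/1 survival indicators for each character):

```python
def normalised_survival_rates(survivals):
    """Average each character's survival indicator over all recorded
    scenarios, then normalise the rates to sum to one so they are
    directly comparable with learnt linear-utility weights.

    `survivals` maps character name -> list of 0/1 survival indicators.
    """
    rates = {c: sum(v) / len(v) for c, v in survivals.items()}
    total = sum(rates.values())
    return {c: r / total for c, r in rates.items()}

# Toy data: a character who always survives receives twice the weight
# of one who survives half the time.
weights = normalised_survival_rates({"pet": [1, 1, 0, 0],
                                     "family": [1, 1, 1, 1]})
```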

Figure 7: A comparison between the average survival rates of the seven characters (including the participants in the survey), normalised to sum to one, and the corresponding utility function weights learnt by our system.

Because of the symmetric nature of the set of contexts in our experiment, the probability of a particular character surviving as a result of a particular action across all contexts is the same as the probability of that character not surviving. Hence in answering (Q3) we use our system’s ability to accept particular distributions over the contexts in which we wish to attribute blame, allowing us to focus only on particular scenarios. Clearly, in any of the possible contexts one should not be blamed at all for the death of the character on the main track for flipping the switch () as opposed to inaction (), because in the latter case they will die with certainty, but not in the former. (Note that this is not to say one would not be blameworthy when compared to all other actions; one could, for example, have sacrificed oneself instead, saving all other lives with certainty.) Choosing a scenario arbitrarily to illustrate this point, with one person on the side track and five people on the main track, we have and (with our measure of cost importance , 1.1 times the negative minimum cost of any action).

Now consider the scenario in which there is a large crowd of a hundred or so people on the main track, but one is unable to tell from a distance whether the five or so people on the side track are strangers or one’s family. The more likely it is that the family is on the side track, the more responsible one is for their deaths () if one, say, flips the switch () to divert the train. Conversely, we would expect less blame for the deaths of the 100 people (), say, if one did nothing (), the more likely it is that the family is on the side track (because the cost, for the participant at least, of somehow diverting the train is higher). We compare cases where there is a 0.3 probability that the family is on the side track against a 0.6 probability, using the cost importance measure for all calculations. Not only would we expect the blame for the death of the family to be higher when pulling the switch in the latter case, we would expect the value to be approximately twice as high as in the former. Accordingly, we compute values and respectively. Similarly, when considering blame for the deaths of the 100 people due to inaction, we find that in the former case and that in the latter case (when the cost of performing any other action is higher).
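Why the blame should roughly double when the probability doubles follows from averaging per-context blame under the supplied context distribution. A sketch of that idea (the names and the exact averaging are our assumptions, not the implementation's):

```python
def expected_blame(context_probs, blame_in_context):
    """Blame under uncertainty about the context: the per-context blame
    scores averaged under a supplied distribution over contexts."""
    return sum(p * blame_in_context[c] for c, p in context_probs.items())

# If blame accrues only in the context where the family is on the side
# track, doubling that context's probability (0.3 -> 0.6) doubles the
# expected blame.
blame = {"family on side track": 1.0, "strangers on side track": 0.0}
low = expected_blame({"family on side track": 0.3,
                      "strangers on side track": 0.7}, blame)
high = expected_blame({"family on side track": 0.6,
                       "strangers on side track": 0.4}, blame)
```

In the real computation the per-context scores are not exactly 1 and 0, which is why the reported ratio is only approximately two.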

7 Related Work

Our work here is differentiated from related efforts in two main ways: jointly addressing the automated learning of models of moral scenarios, and tractable reasoning. We discuss other efforts below.

As mentioned before, we do not motivate new definitions for moral responsibility here, but draw on HK, which, in turn, is based upon [Chockler and Halpern, 2004] and the work on causality in [Halpern and Pearl, 2005]. Their framework is also related to the intentions model in [Kleiman-Weiner et al., 2015] which considers predictions about the moral permissibility of actions via influence diagrams, though there is no emphasis on learning or tractability. In fact, the use of tractable architectures for decision-making itself is recent (see, e.g. [Bhattacharjya and Shachter, 2012, Melibari et al., 2016]). The authors in [Choi et al., 2015] learn PSDDs over preference rankings (as opposed to decision-making scenarios more generally), though their approach does not take account of different preferences in different contexts.

An important part of learning a model of moral decision-making is learning a utility function. This is often referred to as inverse reinforcement learning (IRL) [Ng and Russell, 2000] or Bayesian inverse planning [Baker et al., 2009]. Our current implementation considers a simple approach for learning utilities (similar to [Nielsen and Jensen, 2004]), but more involved paradigms such as those above could indeed have been used.

Our contributions here are related to the body of work surrounding MIT’s Moral Machine website [Moral Machine, 2016]. For example, [Kim et al., 2018] build on the theory of [Kleiman-Weiner et al., 2017] by developing a computational model of moral decision-making whose predictions they test against Moral Machine data. Their focus is on learning abstract moral principles via hierarchical Bayesian inference, and although our framework can be used to these ends, it is also flexible with respect to different contexts, and allows constraints on learnt models.

[Noothigattu et al., 2017] develop a method of aggregating the preferences of all participants (again, a secondary feature of our system) in order to make a given decision. However, due to the large numbers of such preference orderings, tractability issues arise and so sampling must be used.

Finally, a high-level overview of strategies for creating moral decision-making frameworks for automated systems is discussed in [Conitzer et al., 2017], and similar considerations regarding hybrid collective decision-making systems are made by [Greene et al., 2016]. We refer the reader to these works for more discussions.

8 Conclusion

Our system utilises the specification of decision-making scenarios in HK, and at the same time exploits many of the desirable properties of PSDDs (such as tractability, semantically meaningful parameters, and the ability to be both learnt from data and include logical constraints). The system is flexible in its usage, allowing various inputs and specifications. In general, the models in our experiments are accurate representations of the distributions over the moral scenarios that they are learnt from. Our learnt utility functions, while simple in nature, are still able to capture subtle details and in some scenarios are able to match human preferences with high accuracy using very little data. With these two elements we are able to generate blameworthiness scores that are, prima facie, in line with human intuitions. We hope that our work here goes some way towards bridging the gap between the existing philosophical work on moral responsibility and the existing technical work on decision-making in automated systems.


The full set of supplementary materials, source code, and extended discussions on much of our work presented here are included within a package, which will be made available online. Here we provide brief summaries of the three datasets used in our experiments, including the variable encoding used for each domain and the underlying constraints.

No. data points 360
No. variables 23
Context variables One Person On Track A (), … , Family On Track A (), One Person On Track B (), … , Family On Track B ()
Decision variables Inaction (), Flip Switch (), Push B (), Sacrifice Oneself ()
Outcome variables One Person Lives (), … , Family Lives (), You Live ()
Constraints , , for all , for all , for all , , , for all , for all , for all , for all , , for all , for all
Model count 180
Utilities given? No
Table 3: A summary of the trolley problem data used in our third experiment.
No. data points 7446
No. variables 21
Context variables Level 1 (), … , Level 6 ()
Decision variables Other (), Load-balancing (), Uniform (), Skill-based (), Random ()
Outcome variables Timeliness 1 (), … , Timeliness 5 (), Quality 1 (), … , Quality 5 ()
Constraints , for all , , for all , , for all
Model count 4800
Utilities given? Yes (Self-reported Happiness Score)
Table 4: A summary of the teamwork management data used in our second experiment.
No. data points 100000
No. variables 12
Context variables Mediastinal Metastases (), CT Positive (), CT Negative (), No CT (), Mediastinoscopy Positive (), Mediastinoscopy Negative (), No Mediastinoscopy ()
Decision variables Perform CT (), Perform Mediastinoscopy ()
Outcome variables Perform Thoracotomy (), Diagnosis Procedures Survived (), Treatment Survived ()
Constraints , , , , , , , , , , ,
Model count 52
Utilities given? Yes (Life Expectancy)
Table 5: A summary of the lung cancer staging data used in our first experiment.


  • [Acharya et al., 2018] Acharya, J., Bhattacharyya, A., Daskalakis, C., and Kandasamy, S. (2018). Learning and testing causal models with interventions. arXiv preprint arXiv:1805.09697.
  • [Bacchus et al., 2009] Bacchus, F., Dalmao, S., and Pitassi, T. (2009). Solving #SAT and Bayesian inference with backtracking search. Journal of Artificial Intelligence Research, 34:391–442.
  • [Baker et al., 2009] Baker, C. L., Saxe, R., and Tenenbaum, J. B. (2009). Action understanding as inverse planning. Cognition, 113(3):329–349.
  • [Bhattacharjya and Shachter, 2012] Bhattacharjya, D. and Shachter, R. D. (2012). Evaluating influence diagrams with decision circuits. arXiv preprint arXiv:1206.5257.
  • [Chockler and Halpern, 2004] Chockler, H. and Halpern, J. Y. (2004). Responsibility and blame: A structural-model approach. Journal of Artificial Intelligence Research, 22:93–115.
  • [Choi et al., 2015] Choi, A., Van den Broeck, G., and Darwiche, A. (2015). Tractable learning for structured probability spaces: A case study in learning preference distributions. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, pages 2861–2868.
  • [Conitzer et al., 2017] Conitzer, V., Sinnott-Armstrong, W., Borg, J. S., Deng, Y., and Kramer, M. (2017). Moral decision making frameworks for artificial intelligence. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 4831–4835.
  • [Gens and Pedro, 2013] Gens, R. and Pedro, D. (2013). Learning the structure of sum-product networks. In Proceedings of the 30th International Conference on Machine Learning, pages 873–880.
  • [Greene et al., 2016] Greene, J., Rossi, F., Tasioulas, J., Venable, K. B., and Williams, B. C. (2016). Embedding ethical principles in collective decision support systems. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, pages 4147–4151.
  • [Halpern and Kleiman-Weiner, 2018] Halpern, J. Y. and Kleiman-Weiner, M. (2018). Towards formal definitions of blameworthiness, intention, and moral responsibility. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pages 1853–1860.
  • [Halpern and Pearl, 2005] Halpern, J. Y. and Pearl, J. (2005). Causes and explanations: A structural-model approach. Part I: Causes. The British Journal for the Philosophy of Science, 56(4):843–887.
  • [Kalra, 2017] Kalra, A. (2017). Tachyon. University of Waterloo. Accessed 2018-08-23.
  • [Kim et al., 2018] Kim, R., Kleiman-Weiner, M., Abeliuk, A., Awad, E., Dsouza, S., Tenenbaum, J., and Rahwan, I. (2018). A computational model of commonsense moral decision making. arXiv preprint arXiv:1801.04346.
  • [Kisa et al., 2014] Kisa, D., Van den Broeck, G., Choi, A., and Darwiche, A. (2014). Probabilistic sentential decision diagrams. In Proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning, pages 558–567.
  • [Kleiman-Weiner et al., 2015] Kleiman-Weiner, M., Gerstenberg, T., Levine, S., and Tenenbaum, J. B. (2015). Inference of intention and permissibility in moral decision making. In Proceedings of the 37th Annual Conference of the Cognitive Science Society, pages 1123–1128.
  • [Kleiman-Weiner et al., 2017] Kleiman-Weiner, M., Saxe, R., and Tenenbaum, J. B. (2017). Learning a commonsense moral theory. Cognition, 167:107–123.
  • [Liang et al., 2017] Liang, Y., Bekker, J., and Van den Broeck, G. (2017). Learning the structure of probabilistic sentential decision diagrams. In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence, pages 134–145.
  • [Melibari et al., 2016] Melibari, M. A., Poupart, P., and Doshi, P. (2016). Sum-product-max networks for tractable decision making. In Proceedings of the 15th International Conference on Autonomous Agents & Multiagent Systems, pages 1419–1420.
  • [Moral Machine, 2016] Moral Machine (2016). Scalable Cooperation (MIT Media Lab). Accessed 2018-08-14.
  • [Nease Jr and Owens, 1997] Nease Jr, R. F. and Owens, D. K. (1997). Use of influence diagrams to structure medical decisions. Medical Decision Making, 17(3):263–275.
  • [Ng and Russell, 2000] Ng, A. Y. and Russell, S. J. (2000). Algorithms for inverse reinforcement learning. In Proceedings of the 17th International Conference on Machine Learning, pages 663–670.
  • [Nielsen and Jensen, 2004] Nielsen, T. D. and Jensen, F. V. (2004). Learning a decision maker’s utility function from (possibly) inconsistent behavior. Artificial Intelligence, 160(1-2):53–78.
  • [Noothigattu et al., 2017] Noothigattu, R., Gaikwad, S., Awad, E., Dsouza, S., Rahwan, I., Ravikumar, P., and Procaccia, A. D. (2017). A voting-based system for ethical decision making. arXiv preprint arXiv:1709.06692.
  • [Pearl, 1998] Pearl, J. (1998). Graphical models for probabilistic and causal reasoning. In Quantified Representation of Uncertainty and Imprecision, pages 367–389. Springer.
  • [Pearl, 2009] Pearl, J. (2009). Causal inference in statistics: An overview. Statistics Surveys, 3:96–146.
  • [Peharz et al., 2017] Peharz, R., Gens, R., Pernkopf, F., and Domingos, P. (2017). On the latent variable interpretation in sum-product networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(10):2030–2044.
  • [Poon and Domingos, 2011] Poon, H. and Domingos, P. (2011). Sum-product networks: A new deep architecture. In IEEE International Conference on Computer Vision Workshops, pages 689–690.
  • [Pronobis et al., 2017] Pronobis, A., Gens, R., Kakade, S., and Domingos, P. (2017). ICML Workshop on Principled Approaches to Deep Learning. Accessed 2018-10-03.
  • [Singer, 2005] Singer, P. (2005). Ethics and intuitions. The Journal of Ethics, 9(3-4):331–352.
  • [Sinnott-Armstrong, 2015] Sinnott-Armstrong, W. (2015). Consequentialism. In Zalta, E. N., editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab (Stanford University). Accessed 2018-08-17.
  • [Strotz, 1953] Strotz, R. H. (1953). Cardinal utility. The American Economic Review, 43(2):384–397.
  • [The SDD Package 2.0, 2018] The SDD Package 2.0 (2018). Automated Reasoning Group (University Of California, Los Angeles). Accessed 2018-08-17.
  • [Thomson, 1985] Thomson, J. J. (1985). The trolley problem. The Yale Law Journal, 94(6):1395–1415.
  • [Yu et al., 2017] Yu, H., Shen, Z., Miao, C., Leung, C., Chen, Y., Fauvel, S., Lin, J., Cui, L., Pan, Z., and Yang, Q. (2017). A dataset of human decision-making in teamwork management. Scientific Data, 4:160127.