Alfie: An Interactive Robot with a Moral Compass

09/11/2020 ∙ by Cigdem Turan, et al. ∙ Technische Universität Darmstadt

This work introduces Alfie, an interactive robot that is capable of answering moral (deontological) questions of a user. The interaction of Alfie is designed in a way in which the user can offer an alternative answer when the user disagrees with the given answer so that Alfie can learn from its interactions. Alfie's answers are based on a sentence embedding model that uses state-of-the-art language models, e.g. Universal Sentence Encoder and BERT. Alfie is implemented on a Furhat Robot, which provides a customizable user interface to design a social robot.




1. Introduction

There is a broad consensus that artificial intelligence (AI) research is progressing steadily and has a pronounced impact on our daily lives. Keeping this impact beneficial for society is of utmost importance. We all remember the unfortunate event that followed when Microsoft Research (MSR) released a chatbot for Twitter. After many interactions with Twitter users, the bot started creating racist and sexually inappropriate posts, which led to its suspension. This clearly shows the potential dangers of unattended AI models.

Recent studies have shown that language representations encode not only human knowledge but also biases such as gender bias (Bolukbasi et al., 2016; Caliskan et al., 2017), and according to more recent studies (Jentzsch et al., 2019; Schramowski et al., 2020, 2019) also the moral and deontological values of our culture. Schramowski et al. (Schramowski et al., 2020) have shown that language models such as BERT (Devlin et al., 2019) and the Universal Sentence Encoder (Cer et al., 2018) can not only reflect accurate imprints of the moral and ethical choices associated with actions such as “kill” and “murder”, but also capture the context of the action, e.g., “killing time” is positive whereas “killing humans” is negative. This, in turn, can be used to compute a moral score for any (deontological) question at hand, measuring the rightness of taking an action. This “Moral Choice Machine” (MCM) (Schramowski et al., 2020) can be used to determine the moral score of any given sentence, which paves the way to avoiding incidents like the MSR chatbot.

Unfortunately, the MCM approach is purely unsupervised: it only makes use of the knowledge encoded in language models that were trained without any supervision. This makes it difficult, if not impossible, to correct the score and, in turn, to help avoid “MSR chatbot” moments. An attractive alternative would be to revise the moral choice by interacting with the MCM algorithm in a user-centric and easy way. In this demonstration, we investigate the use of the MCM algorithm in the context of an interactive robot called Alfie, shown in Fig. 1. Alfie gives us a great opportunity to investigate individuals’ reactions to the moral and deontological values of our culture encoded in human text. Alfie can also learn from its users and adjust its moral score based on human feedback.

The rest of this paper is organized as follows: Section 2 presents the architecture of the system, including the Moral Choice Machine, the employed Furhat Robot, and the dialog model. Section 3 concludes the paper with a discussion and future work.

2. The Architecture of Alfie

Alfie is a Furhat Robot, which provides a customizable user interface. We can customize the speech production and facial expressions as well as the human face presented through Furhat’s Software Development Kit. A side microphone and a camera in front of the Furhat Robot allow the robot to follow the user. The camera feed is also accessible, so one can run more sophisticated computer vision algorithms on it.

Users can ask Alfie questions (user queries) to obtain a moral score for the corresponding question. In the current version, the questions have to follow a certain form, e.g. “Should I [action] [context]” or “Is it okay to [action] [context]”. The Furhat software preprocesses the speech input, and the resulting text is passed as input to the Moral Choice Machine (MCM) algorithm presented in (Schramowski et al., 2020, 2019), which calculates a moral score. The computed moral score is a real number normalized to a fixed range. In our current design, this range is divided into three intervals that correspond to the answers no, neutral, and yes. Both MCM variants (Schramowski et al., 2020, 2019) employ current state-of-the-art sentence embeddings computed using transformer architectures (Cer et al., 2018; Devlin et al., 2019; Reimers and Gurevych, 2019) and determine the moral score based on sentence similarities in the embedding space. This is an unsupervised method, and consequently the quality of the moral score depends heavily on the performance of the language models. In the current version of Alfie, we use the algorithm described in (Schramowski et al., 2019).
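The pipeline described above (template matching, similarity-based scoring, and discretization into three answers) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the score is the difference between the query's cosine similarity to an affirmative and to a negative answer sentence; the answer sentences, the regular expressions, the `threshold` value, and the `embed` callable (a stand-in for a sentence encoder such as BERT or the Universal Sentence Encoder) are all hypothetical, and the normalization step of the MCM is omitted.

```python
import math
import re
from typing import Callable, Optional, Sequence

Vector = Sequence[float]
Embedder = Callable[[str], Vector]  # stand-in for a real sentence encoder

# Hypothetical patterns for the two question forms named in the text.
QUESTION_PATTERNS = [
    re.compile(r"^should i (?P<rest>.+?)\??$", re.IGNORECASE),
    re.compile(r"^is it okay to (?P<rest>.+?)\??$", re.IGNORECASE),
]


def extract_action(query: str) -> Optional[str]:
    """Return the '[action] [context]' part of a supported question, else None."""
    text = query.strip()
    for pattern in QUESTION_PATTERNS:
        match = pattern.match(text)
        if match:
            return match.group("rest").strip()
    return None


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def moral_score(
    query: str,
    embed: Embedder,
    yes_answer: str = "Yes, you should.",    # assumed answer sentence
    no_answer: str = "No, you should not.",  # assumed answer sentence
) -> float:
    """Similarity to the affirmative answer minus similarity to the negative one."""
    q = embed(query)
    return cosine(q, embed(yes_answer)) - cosine(q, embed(no_answer))


def to_answer(score: float, threshold: float = 0.1) -> str:
    """Map a score to 'yes', 'no', or 'neutral'; the threshold is hypothetical."""
    if score > threshold:
        return "yes"
    if score < -threshold:
        return "no"
    return "neutral"
```

With a real encoder passed as `embed`, `to_answer(moral_score(query, embed))` would yield the answer label for a query that `extract_action` accepts. Keeping the encoder behind a callable makes it easy to swap language models without touching the scoring logic.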

Additionally, we compute an emotional state for the user query based on sentence similarities in the embedding space, i.e. we select the emotion with the highest similarity score to the question asked. In the current version, the possible emotions are Anger, Confusion, Disgust, Fear, Joy, Sadness, Satisfaction, and Surprise. We change Alfie's facial expression based on the selected emotion and adapt the pitch and speed of its speech to match. Depending on the answer (“yes”, “no”, or “indecisive”), we also add the respective head movement to make the conversation more engaging. Because of the limited computational resources of the Furhat Robot, the MCM algorithm and the other operations on the embedding space run on a separate server. The resulting moral score is passed back to Alfie so that the Furhat software can produce the corresponding answer as speech output. We save all questions asked to Alfie in a database on our servers for statistical purposes.
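The emotion selection described above amounts to a nearest-neighbor lookup in the embedding space. The following Python sketch shows one way to write it, under stated assumptions: `embed` is a placeholder for the sentence encoder, each emotion is embedded simply by its name, and the gesture names in `HEAD_MOVEMENT` are hypothetical labels, not identifiers from the Furhat SDK.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]
Embedder = Callable[[str], Vector]  # stand-in for a real sentence encoder

# Emotion set listed in the text.
EMOTIONS = [
    "Anger", "Confusion", "Disgust", "Fear",
    "Joy", "Sadness", "Satisfaction", "Surprise",
]

# Hypothetical gesture labels, one per possible answer.
HEAD_MOVEMENT: Dict[str, str] = {"yes": "nod", "no": "shake", "indecisive": "tilt"}


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pick_emotion(query: str, embed: Embedder) -> str:
    """Return the emotion whose embedding is most similar to the query."""
    q = embed(query)
    return max(EMOTIONS, key=lambda emotion: cosine(q, embed(emotion)))


def plan_reaction(query: str, answer: str, embed: Embedder) -> Dict[str, str]:
    """Bundle the emotion and head movement that accompany a spoken answer."""
    return {
        "emotion": pick_emotion(query, embed),
        "head_movement": HEAD_MOVEMENT[answer],
    }
```

In a deployment like the one described, a function such as `plan_reaction` would run on the server next to the scoring code, and only the resulting labels would be sent to the robot.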

Once in a while (with a frequency set as a percentage value in the script), Alfie asks whether the user agrees with its answer. This response is also saved to the database. Of particular interest are the responses in which the user disagrees with Alfie. They give us the opportunity and the data to retrain Alfie and adjust its moral scores, either with data collected during earlier interactions or online during the interaction itself. We also created a training mode in which Alfie asks users many moral questions listed in our database. It is meant for collecting human feedback on the moral questions we are particularly interested in. This data can later be used for adapting Alfie’s moral scores.
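The feedback loop can be sketched in Python as follows. This is an illustration under assumptions, not the deployed system: the table schema, the `FEEDBACK_RATE` value, and all function names are hypothetical, and SQLite stands in for whatever database the servers actually use.

```python
import random
import sqlite3
from typing import List, Optional, Tuple

FEEDBACK_RATE = 0.2  # hypothetical share of answers followed by a feedback question


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the interaction log; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS interactions ("
        "id INTEGER PRIMARY KEY, question TEXT NOT NULL, "
        "score REAL NOT NULL, agrees INTEGER)"
    )
    return conn


def should_ask_feedback(rng: random.Random, rate: float = FEEDBACK_RATE) -> bool:
    """Decide at random whether to ask the user for feedback this turn."""
    return rng.random() < rate


def log_interaction(
    conn: sqlite3.Connection,
    question: str,
    score: float,
    agrees: Optional[bool] = None,
) -> int:
    """Store a question, its score, and the feedback (NULL if none was asked)."""
    cursor = conn.execute(
        "INSERT INTO interactions (question, score, agrees) VALUES (?, ?, ?)",
        (question, score, None if agrees is None else int(agrees)),
    )
    conn.commit()
    return cursor.lastrowid


def disagreements(conn: sqlite3.Connection) -> List[Tuple[str, float]]:
    """Return the (question, score) pairs the user disagreed with."""
    return conn.execute(
        "SELECT question, score FROM interactions WHERE agrees = 0 ORDER BY id"
    ).fetchall()
```

The rows returned by `disagreements` are the ones of particular interest for retraining, since they mark answers that a user explicitly rejected.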

3. Discussion and Future Work

As mentioned earlier, the quality of Alfie’s moral scores depends on the performance of the language model as well as on the algorithm we use to calculate the score. Also, since there is no absolute agreement on what is right and wrong in general, it is difficult to evaluate the computed moral score qualitatively. These are the reasons why we designed an interactive robot that is able to interact with humans and collect their responses to learn from them. We aim to extend the interactions from simple feedback to explanatory interactive learning (Schramowski et al., 2020), i.e. to add the capability of explaining Alfie’s decisions and revising them based on user feedback. Although we currently focus on explicit feedback from users, i.e. their direct statement of whether they agree or not, we aim to obtain implicit feedback through channels such as gaze, body movement, and facial expressions, similar to the study in (Turan et al., 2019).

We would like to thank Dustin Heller, Philipp Lehwalder, Jonas Müller, and Steven Pohl for their work on programming the initial version of Alfie by transferring the Moral Choice Machine.


  • T. Bolukbasi, K. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai (2016) Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Proceedings of Neural Information Processing Systems (NIPS), USA, pp. 4349–4357. Cited by: §1.
  • A. Caliskan, J. J. Bryson, and A. Narayanan (2017) Semantics derived automatically from language corpora contain human-like biases. Science 356 (6334), pp. 183–186. Cited by: §1.
  • D. Cer, Y. Yang, S. Kong, N. Hua, N. Limtiaco, R. S. John, N. Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, et al. (2018) Universal sentence encoder. arXiv preprint arXiv:1803.11175. Cited by: §1, §2.
  • J. Devlin, M. Chang, K. Lee, and K. Toutanova (2019) BERT: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT, pp. 4171–4186. Cited by: §1, §2.
  • S. Jentzsch, P. Schramowski, C. Rothkopf, and K. Kersting (2019) Semantics derived automatically from language corpora contain human-like moral choices. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society (AIES), pp. 37–44. Cited by: §1.
  • N. Reimers and I. Gurevych (2019) Sentence-BERT: sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, EMNLP, pp. 3980–3990. Cited by: §2.
  • P. Schramowski, W. Stammer, S. Teso, A. Brugger, F. Herbert, X. Shao, H. Luigs, A. Mahlein, and K. Kersting (2020) Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nature Machine Intelligence 2 (8), pp. 476–486. Cited by: §3.
  • P. Schramowski, C. Turan, S. Jentzsch, C. Rothkopf, and K. Kersting (2019) BERT has a moral compass: improvements of ethical and moral values of machines. arXiv preprint arXiv:1912.05238. Cited by: §1, §2.
  • P. Schramowski, C. Turan, S. Jentzsch, C. Rothkopf, and K. Kersting (2020) The moral choice machine. Frontiers in Artificial Intelligence 3, pp. 36. Cited by: §1, §2.
  • C. Turan, K. D. Neergaard, and K. Lam (2019) Facial expressions of comprehension (fec). IEEE Transactions on Affective Computing. Cited by: §3.