Machine Learning and Social Robotics for Detecting Early Signs of Dementia

09/05/2017 ∙ by Patrik Jonell, et al. ∙ KTH Royal Institute of Technology Karolinska Institutet 0

This paper presents the EACare project, an ambitious multi-disciplinary collaboration with the aim to develop an embodied system, capable of carrying out neuropsychological tests to detect early signs of dementia, e.g., due to Alzheimer's disease. The system will use methods from Machine Learning and Social Robotics, and be trained with examples of recorded clinician-patient interactions. The interaction will be developed using a participatory design approach. We describe the scope and method of the project, and report on a first Wizard of Oz prototype.



There are no comments yet.


page 4

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

With an increasing number of elderly in the population, dementia is expected to increase. Dementia disorders and related illnesses such as depression in the aging population are devastating for the quality of life of the afflicted individuals, their families, and caregivers. Moreover, the increase in such disorders imply enormous costs for the health sectors of all developed countries. There is still no cure available. However, studies, e.g. [18], show that it is possible to reduce the risk for dementia in later life through a number of preventative measures. Early diagnoses and prevention measures are thus key to counteract dementia.

Diagnoses of dementia are now made by teams of expert clinicians with standardized neuropsychological tests (Section 4) as well as EEG and MRI examinations. A number of additional factors in the behavior are taken into regard apart from the actual test scores and measurements (Section 6). Using these methods, dementia in early stages are often missed, since the time intervals between the sessions usually are quite large, and symptoms often come and go over time. Once a diagnosis has been made, the caring medical expert typically prescribes a specific therapeutic intervention of the type of memory training. Training sessions are often scarce and require the presence of a medically trained person.

The main goal of the multi-disciplinary research described in this position paper, is to develop an embodied agent (Section 3) capable of interacting with elderly people in their home, assessing their mental abilities using both standardized test and analysis of their non-verbal behavior, in order to identify subjects at the first stages of dementia.

The contributions of such a system to the diagnostics and treatment of dementia would be twofold:

  1. Firstly, since the proposed embodied system would enable tests to be carried out much more often, at the test subject’s home, subtle and temporarily appearing symptoms might be detected that are missed in regular screenings at the clinic. Moreover, the computerized test procedure makes it possible to collect quantitative measurements of the subject’s cognitive development over time, enabling a clinician overseeing the test process to base their diagnosis on rich statistical data instead of few, more qualitative, test results. All this will make it possible to introduce preventive therapy much earlier.

  2. Secondly, after a mild cognitive impairment or early dementia diagnosis, the proposed system may also be used for patients to train regularly at home as a complement to sessions with an expert, greatly enhancing the efficiency of the therapy.

The research efforts needed to achieve these goals span over such diverse research areas of Social Robotics, Geriatrics, and Machine Learning, and include:

  1. a user- and caregiver-friendly embodied agent that carries out established neuropsychological tests. This includes the development of an adaptive and sustainable dialogue system, with elements of gamification that encourages the subject to carry out the tests as effectively as possible (Section 5).

  2. pioneering Machine Learning solutions to achieve diagnostics from the user’s performance in the neuropsychological tests, as well as their non-verbal behavior during the test (Section 7).

  3. systematic evaluation of the developed systems in both large and focused social groups, using feedback from each evaluation to guide further scientific research and design.

Research efforts in this direction have been made before, e.g. [23]. The novelty of the present project with respect to this work is that we propose a more advanced interaction and a more sophisticated embodiment that supports, among other things, mutual gaze and shared attention.

2 Survey of related work

Several efforts have been made to investigate the acceptance of social robots by the elderly [13], some of them with a particular focus on home environments [9, 27]. For example, through a series of questionnaires and interviews to understand the attitudes and preferences of older adults for robot assistance with everyday tasks, Smarr et al. [27] found that this user group is quite open for robots to perform a wide range of tasks in their homes. Surprisingly, for tasks such as chores or information management (e.g., reminders and monitoring their daily activity), older adults reported to prefer robot assistance over human assistance.

One of the main motivations of developing assistive robots for the elderly is the fact that this technology can promote longer independent living [9]. However, in-home experiments are still scarce because of privacy concerns and limited robot autonomy. One of the few exceptions is the work by de Graaf et al. [11], who investigated reasons for technology abandonment in a study where 70 autonomous robots were deployed in people’s homes for about six months. Regardless of age (their participant pool ranged from 8 to 77 years), they found that the main reasons why people stopped using the robot were lack of enjoyment and perceived usefulness. These findings indicate that involving the target users from the early stages of development can be crucial for the success and acceptance of social robots.

While most of the human-robot interaction studies with older adults have been conducted with neurotypical participants [31, 3], a few authors have included patients with dementia in their research. Sabanovic et al. [24], for instance, evaluated the effects of PARO, a seal-like robot, in group sensory therapy for older adults with different levels of dementia. The results of a seven-week study indicate that the robot’s presence contributed to participants’ higher levels of engagement not only with the robot but also with the other people in the environment. Similar findings were obtained by Iacono and Marti [14], who compared the effects of the presence of a PARO robot and a stuffed toy during a group storytelling task. Furthermore, the authors found that while in the presence of the robot, participants’ stories were much more articulated in terms of number of words, characters and narrative details.

Perhaps most similar to our work both in terms of the interaction modality (language-based) and condition of the user group (older adults diagnosed with dementia), Rudzicz et al. [23]

conducted a laboratory Wizard of Oz experiment to evaluate the challenges in speech recognition and dialogue between participants and a personal assistant robot while performing daily household tasks such as making tea. Their results suggest that autonomous language-based interactions in this setting can be challenging not only because of speech recognition errors but also because robots will often need to proactively employ conversational repair strategies during moments of confusion.

3 The Furhat dialogue system

Figure 1: Furhat and one of the authors performing a clock test (Section 4.1).

The project will make use of the robot platform Furhat [1] (Figure 1). Furhat is a unique social robotic head based on back-projected animation that has proven capable of exhibiting verbal and non-verbal cues in a manner that is in many ways superior to on-screen avatars. Furhat also offers advantages over mechatronic robot heads in terms of high facial expressivity, very low noise level and fast animation allowing e.g. for highly accurate lip-synch. Since the face is projected on a mask, the robot’s visual appearance can be easily modified e.g. to be more or less photo realistic or cartoonish, by changing the projection and/or the mask. This feature will be exploited in the participatory design study (Section 5.2).

The Furhat robot includes a multi-modal dialog system, implemented using the IrisTK framework [26]. It uses statecharts as a powerful formalism for complex, reactive, event-driven spoken interaction management. It also includes a 3D situation model that makes it possible to handle situated interaction, including multiple users talking to the system and objects being discussed. A customizable Wizard of Oz mode as well as API:s for external control makes it straightforward to set up supervised experiments and data collection studies when a fully autonomous system can not be implemented.

4 Clinical assessment of cognitive function

As described in the introduction, dementia is detected through clinical cognitive assessments. They are carried out under supervision of a clinician and evaluate attention, memory, language, visuospatial/perceptual functions, psycho- motor speed, and executive functions through a wide range of tests. Common and valid tests include learning and remembering a wordlist, e.g. Rey Auditory Verbal Learning Test (RAVLT) [32], visuo-spatial and memory abilities, e.g. Rey-Osterrieth Complex Figure Test (ROCF) [19], and test of mental and motor speed, e.g. Wechsler Adult Intelligence Scale (WAIS) subtest Coding [15].

Speech and language assessment includes, e.g., tests of picture naming [17], word fluency [29], and aspects of high-level language comprehension [2].

4.1 Montreal Cognitive Assessment test (MoCA)

The Montreal Cognitive Assessment (MoCA) [21] is a brief test that evaluates several of the cognitive domains mentioned above in a time-effective manner. Visuospatial and executive functions are here assessed using trail making, figure copying and clock drawing tasks. Language is assessed using object naming, sentence repetition, phonemic fluency and abstraction tasks. Attention is assessed using digit repetition, target detection and serial subtraction tasks. Memory is assessed using a delayed verbal recall task after initial learning trials. Temporal and spatial orientation is also assessed.

MoCA has proven robust in identifying subjects with mild cognitive impairment (MCI) and early Alzheimer’s disease (AD) and in distinguishing them from healthy controls, thereby becoming an important screening tool for clinicians all over the world [16]. Due to an increasing interest in using MoCA as a monitoring tool and the need of minimizing practice effects associated with repeated assessment, alternate forms of the test have also been developed [6].

5 Automating assessment of cognitive function

We will in this project implement clinical neuropsychological assessments with the Furhat system (Section 3). The Furhat agent will take the clinician’s role, supervising and driving the test procedure. The assessments with Furhat will take place in the user’s home and complement the regular assessments at the memory clinic.

5.1 Automating the Montreal Cognitive Assessment test (MoCA)

We employ the Furhat system to implement the Montreal Cognitive Assessment test (MoCA). The intent is replicate the typical interaction between patient and clinician during the administration of the test, with the robot acting as the clinician.

In order to rapidly get feedback from participants, a Wizard of Oz [7] prototype has been developed where participants can interact with the embodied agent. The prototype consists of a Furhat robot and a touch-enabled tabletop computer (Figure 1) connected to a remote wizarding interface. The system is set up to run through MoCA as described above.

The prototype follows a scripted interaction, but also gives the wizard a large set of additional utterances in order to handle small deviations and make the interaction more life-like. These include typical conversational devices, such as affirmations, discourse markers, and the ability to add continuity or “flow” to the interaction, such as “ok, let’s move on to the next section”. The interaction is primarily centered around speech, which is realized by converting pre-set text prompts into speech output via a built in text-to-speech (TTS) system. However, some tasks require the participant to use the touch table in order to, for example, draw an image or recall the name of certain objects displayed on the screen. Besides providing the wizard with a set of utterances, the prototype also provides the wizard with the ability to perform certain facial expressions, head nods and sharing attention towards a given point. Using all of these modalities simultaneously allows the wizard to convey a highly engaging and believable embodiment of a clinician who can understand the user’s actions and responses, and respond appropriately, just as a real clinician would.

The wizard interface consists of “buttons” on a standard computer screen (Figure 2) that can be triggered from keys on its keyboard. These include the aforementioned pre-set text prompts which are realized as a TTS voice, as well as dedicated keys for facial expressions, head nods, and head movement to look at the touch screen. The wizard can hear participants via a microphone, and see them through a glass partition or via a webcam.

Figure 2: Wizard of Oz interface to the first Furhat system prototype.

5.2 Future Work: Participatory Design of the dialogue system

A series of workshops, each lasting 1-2 hours, will be conducted with around 10 participants.

These workshops will be based on the concept of Participatory Design, wherein the various stakeholders in the project will participate in the design of both the physical robot and the content of the proposed interactions. Stakeholders include potential users from the target demographic (volunteers and patients), clinicians with relevant expertise (KI), and the robot/interaction researchers (KTH). Specific activities during the workshops will include:

  1. introductory interaction with the robot system,

  2. discussion/feedback of various aspects of system, e.g. look, sound, content,

  3. discussion of volunteers’ and clinicians’ preferences/expectations/concerns regarding system.

The process will be iterative: after each workshop session, the researchers will integrate ideas generated during the workshop into a new version of the system, which they will demonstrate at the subsequent session, which the stakeholders can then re-evaluate.

Interactions with the prototype will be recorded in order to improve the current system, but also in order to collect data used for building the automated system in the future (Section 7).

6 Additional factors used in assessment of cognitive function

During the clinical assessment, the clinician also make use of additional information in the diagnostic process.

Different neurodegenerative diseases often differ in socioemotional presentation; one example is mutual gaze [28]. Studies have also shown changes in eye movements due to Alzheimer’s disease, with e.g. alterations in gaze behavior [20].

Spoken interaction offers clues to cognitive functioning that are not usually measured or rated in clinical assessment. The temporal organization of speech, such as the incidence and duration of pauses, as well as the overall speaking rate, may signal word-finding problems and difficulties with discourse planning [10]. The prosodic organization, including pitch and loudness, may be abnormal in right-hemisphere brain lesions and in striatal loop dysfunction as in Parkinson’s disease and related cognitive disorders [22].

6.1 Future Work: Structured description of the diagnostic process

To be able to incorporate such reasoning in the diagnostic process of the automatic dialogue system, we need to develop simplified and structured descriptions of what clinicians actually do during different kinds of assessment; pseudo-code-like scripts suitable for computer implementation.

These descriptions include both the verbal and non-verbal communication of the clinician during the cognitive assessment, but also the aspects of the patient’s verbal and non-verbal communication relevant to assessment of cognitive function.

7 Automating inference from additional factors

Figure 3: The diagnostics system will build upon a generative model of the patient’s non-verbal behavior perception and production process.

During the last decades, there has been a rapid development of methods for machine analysis of human spoken language. However, the information communicated between interacting humans is only to a small part verbal; humans transfer huge amounts of information through non-verbal cues such as face and body motion, gaze, and tone of voice, and these signals can be analyzed automatically to a certain degree [4]. As described in Section 6, dementia diagnostics relies heavily on such cues, and we aim to equip the system with the capability to take non-verbal cues into account in the diagnostics.

We will model the cognitive processes of non-verbal communication in the human brain (Figure 3), on such a level that they explain the correlation between what the human perceives from the clinician’s communication, and what the human in turn communicates. The underlying condition of an observed human can then be inferred from the recorded interaction with the clinician.

Data will be collected during user studies (Section 5.2). Non-verbal signals of both clinician and evaluated human will be recorded using sensors such as Kinect human tracker, Tobii gaze detector, but also state-of-the-art techniques for extracting social signals from RGB-D video, e.g. facial action units [30], and speech sound, e.g. prosody, laughter and pauses [12].

7.1 Future Work: Development of Machine Learning methodology

The processes depicted in Figure 3

represent incredibly complex, non-smooth, and non-linear mappings and representations, which indicates that it will be suitable to use a deep neural network

[25] approach.

Our initial studies [5] show that it is beneficial to use generative, probabilistic deep models, in order to be able to incorporate prior information in a principled manner. Such information includes clinical expert knowledge and logic reasoning.

Moreover, deep probabilistic approaches, such as [8], provide both more interpretability and lowers the needed amount of training data – important aspects for a diagnostics method.

8 Conclusions

This position paper presented the EACare project, where we aim to develop a system with an embodied agent, that can carry out neuropsychological clinical tests and detect early signs of dementia from both the test results and from the user’s non-verbal behavior.

This is intended as a stand-alone system, which the user brings home, and which interacts with the user in between regular screenings at the clinic. Two different “spin-off products” could be

  1. a system without embodiment or dialogue generation, which only serves as decision support to a clinician during a neuropsychological screening at the clinic,

  2. a system which passively monitors the user’s communication behavior during daily activities, e.g. as a back-end to Skype.


The project described in this paper is funded by the Swedish Foundation for Strategic Research (SSF).


  • [1] Al Moubayed, S., Skantze, G., Beskow, J.: The Furhat back-projected humanoid head – lip reading, gaze and multiparty interaction. International Journal of Humanoid Robotics 10(1) (2013)
  • [2] Antonsson, M., Longoni, F., Einald, C., Hallberg, L., Kurt, G., Larsson, K., Nilsson, T., Hartelius, L.: High-level language ability in healthy individuals and its relationship with verbal working memory. Clin. Ling. Phonetics 30(12), 944–958 (2016)
  • [3] Beer, J.M., Takayama, L.: Mobile remote presence systems for older adults: Acceptance, benefits, and concerns. In: Int. Conf. Human-robot Interaction (2011)
  • [4] Burgoon, J.K., Magnenat-Thalmann, N., Pantic, M., Vinciarelli, A. (eds.): Social Signal Processing (2017)
  • [5]

    Bütepage, J., Black, M.J., Kragic, D., Kjellström, H.: Deep representation learning for human motion prediction and classification. In: IEEE Conference on Computer Vision and Pattern Recognition (2017)

  • [6] Costa, A.S., Reich, A., Fimm, B., Ketteler, S.T., Schulz, J.B., Reetz, K.: Evidence of the sensitivity of the MoCA alternate forms in monitoring cognitive change in early Alzheimer’s disease. Dement. Geriatr. Cogn. Disorders 37, 95–103 (2014)
  • [7] Dahlbäck, N., Jönsson, A., Ahrenberg, L.: Wizard of Oz studies – why and how. Knowledge-Based Systems 6(4), 258–266 (1993)
  • [8] Dai, Z., Damianou, A., Gonzalez, J., Lawrence, N.: Variational auto-encoded deep gaussian processes. In: Int. Conference on Learning Representations (2016)
  • [9] Forlizzi, J., DiSalvo, C., Gemperle, F.: Assistive robotics and an ecology of elders living independently in their homes. Human-Comp. Interact. 19(1), 25–59 (2004)
  • [10] Gayraud, F., Lee, H.R., Barkat-Defradas, M.: Syntactic and lexical context of pauses and hesitations in the discourse of Alzheimer patients and healthy elderly subjects. Clinical Linguistics & Phonetics 25(3), 198–209 (2011)
  • [11] de Graaf, M., Ben Allouch, S., van Dijk, J.: Why do they refuse to use my robot?: Reasons for non-use derived from a long-term home study. In: International Conference on Human-Robot Interaction (2017)
  • [12] Gupta, R., Audhkhasi, K., Lee, S., Narayanan, S.: Paralinguistic event detection from speech using probabilistic time-series smoothing and masking. In: Interspeech (2013)
  • [13] Heerink, M., Krose, B., Evers, V., Wielinga, B.: Assessing acceptance of assistive social agent technology by older adults: the almere model. International Journal of Social Robotics 2(4), 361–375 (2010)
  • [14] Iacono, I., Marti, P.: Narratives and emotions in seniors affected by dementia: A comparative study using a robot and a toy. In: IEEE International Symposium on Robot and Human Interactive Communication (2016)
  • [15] Joy, S., Kaplan, E., Fein, D.: Speed and memory in the WAIS-III Digit Symbol-Coding subtest across the adult lifespan. Arch. Clinical Neuropsychology 19 (2004)
  • [16] Julayanont, Y.P., Phillips, N.A., Chertkow, H., Nasreddine, N.S.: Montreal Cognitive Assessment (MoCA): Concept and clinical review. In: Larner, A.J. (ed.) Cognitive Screening Instruments: A Practical Approach, pp. 111–151 (2013)
  • [17] Kaplan, E., Goodglass, H., Weintraub, S.: Boston Naming Test (1983)
  • [18] Kivipelto, M., Mangialaschec, F., Ngandu, T.: Can lifestyle changes prevent cognitive impairment? The Lancet Neurology 16(5), 338–339 (2017)
  • [19] Mitrushina, M., Satz, P., Chervinsky, A.B.: Efficiency of the recall on the Rey-Osterrieth Complex Figure in normal aging. Brain Dysfunction 3, 148–150 (1990)
  • [20] Molitor, J.R., Ko, P.C., Ally, B.A.: Eye movements in Alzheimer’s disease. Journal of Alzheimer’s Disease 44(1), 1–12 (2015)
  • [21] Nasreddine, N.S., Phillips, N.A., Bédirian, V., Charbonneau, S., Whitehead, V., Collin, I., Cummings, J.L., Chertkow, H.: The Montreal Cognitive Assessment, MoCA: A brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society 53, 695––699 (2005)
  • [22] Rektorova, I., Mekyska, J., Janousova, E., Kostalova, M., Eliasova, I., Mrackova, M., Berankova, D., Necasova, T., Smekal, Z., Marecek, R.: Speech prosody impairment predicts cognitive decline in Parkinson’s disease. Parkinsonism & Related Disorders 29, 90–95 (2016)
  • [23] Rudzicz, F., Wang, R., Begum, M., Mihailidis, A.: Speech interaction with personal assistive robots supporting aging at home for individuals with Alzheimer’s disease. ACM Transactions on Accessible Computing 7(2) (2015)
  • [24] Sabanovic, S., Bennett, C.C., Chang, W.L., Huber, L.: PARO robot affects diverse interaction modalities in group sensory therapy for older adults with dementia. In: IEEE International Conference on Rehabilitation Robotics (2013)
  • [25] Schmidhuber, J.: Deep learning in neural networks: An overview. arXiv:1404.7828v4 (2014)
  • [26] Skantze, G., Al Moubayed, S.: IrisTK: a statechart-based toolkit for multi-party face-to-face interaction. In: ACM Int. Conf. Multimodal Interaction (2012)
  • [27] Smarr, C.A., Mitzner, T.L., Beer, J.M., Prakash, A., Chen, T.L., Kemp, C.C., Rogers, W.A.: Domestic robots for older adults: Attitudes, preferences, and potential. International Journal of Social Robotics 6(2), 229–247 (2014)
  • [28] Sturm, V.E., McCarthy, M.E., Yun, I., Madan, A., Yuan, J.W., Holley, S.R., Ascher, E.A., Boxer, A.L., Miller, B.L., Levenson, R.W.: Mutual gaze in Alzheimer’s disease, frontotemporal and semantic dementia couples. Social Cognitive and Affective Neuroscience 6(3), 359–367 (2011)
  • [29] Tallberg, I.M., Ivachova, E., Jones Tinghag, K., Östberg, P.: Swedish norms for word fluency tests: FAS, animals and verbs. Scandinavian Journal of Psychology 49(5), 479–485 (2008)
  • [30] Valstar, M., Pantic, M.: Fully automatic facial action unit detection and temporal analysis. In: IEEE Conference on Computer Vision and Pattern Recognition (2006)
  • [31] Wada, K., Shibata, T.: Living with seal robots – its sociopsychological and physiological influences on the elderly at a care house. IEEE Transactions on Robotics 23(5), 972–980 (2007)
  • [32] Woodard, J.L., Dunlovsky, J., Salthouse, T.A.: Task decomposition analysis of intertrial free recall performance on the Rey Auditory Verbal Learning Test in normal aging and Alzheimer´s disease. Journal of Clinical and Experimental Neuropsychology 21, 666–676 (1999)