A categorisation and implementation of digital pen features for behaviour characterisation

10/01/2018 · Alexander Prange, et al.

In this paper we provide a categorisation and implementation of digital ink features for behaviour characterisation. Based on four feature sets taken from literature, we provide a categorisation in different classes of syntactic and semantic features. We implemented a publicly available framework to calculate these features and show its deployment in the use case of analysing cognitive assessments performed using a digital pen.







1 Introduction

The research described in this paper is motivated by the development of applications for the behaviour analysis of handwriting and sketch input. Our goal is to provide other researchers with a reproducible, categorised set of features that can be used for behaviour characterisation in different scenarios. We use the term feature to describe properties of strokes and gestures which can be calculated based on the raw sensor input from capture devices, such as digital pens or tablets.

In this paper, a large number of features known from the literature are presented and categorised into different subsets. For better understanding and reproducibility we formalised all features using either mathematical notation or pseudo code and summarised them in the appendix of this paper. Furthermore, we created an open-source Python reference implementation of these features, which is publicly available on GitHub: https://github.com/DFKI-Interactive-Machine-Learning/ink-features.

The presented ink features can be used in a variety of ways. Most commonly they are used to perform character and gesture recognition based on machine learning techniques. Here we describe their use for automated behaviour characterisation in the use case of cognitive assessments. Traditionally these tests are performed using pen and paper with manual evaluation by the therapist. We show how ink features can be used in that context to provide additional feedback about the cognitive state of the patient. Finally, we explain how digital ink can be used as an input modality in multimodal, multisensor interfaces.

2 Digital Ink

Over the past few years the availability of digital pen hardware has increased drastically, and there is a wide variety of devices to choose from when dealing with handwriting analysis. Several different technologies are used to record handwriting: e.g., accelerometer-based digital pens capture the movement of the pen on the surface, whereas active pens transmit their location, pressure and other data to the built-in digitiser of the underlying device. Positional pens, most often encountered in graphic tablets, have a surface that is sensitive to the pen tip. A special, nearly invisible dot pattern can be printed on regular paper, so that camera-based pens detect where the stylus contacts the writing surface.

In this work we focus on the similarities between the most commonly used hardware devices for sketch recognition. As not all technologies deliver the same type of sensor data, we identified a subset that is covered by the majority of input devices. We refer to it as digital ink: a set of time-series data containing coordinates and pressure at each timestamp. For the remainder of this paper we use the following notation:


A series of sample points between a pen down and a pen up event is called a stroke and can be represented as a series of tuples

$$ s = \langle (x_1, y_1, p_1, t_1), \ldots, (x_n, y_n, p_n, t_n) \rangle $$

where $x_i$ represents the x coordinate of the $i$-th sample point within the series, $y_i$ the y coordinate, $p_i$ the pressure and $t_i$ the timestamp, with $1 \leq i \leq n$. The tuple itself may be referenced by $s_i$. Timestamps are measured in milliseconds; it is insignificant whether they are absolute or relative to the first point.
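A minimal Python sketch of this data model (the names `SamplePoint` and `Stroke` are illustrative, not taken from the reference implementation):

```python
from typing import List, NamedTuple

class SamplePoint(NamedTuple):
    """One digital-ink sample: position, pen pressure, timestamp."""
    x: float
    y: float
    p: float  # pressure
    t: int    # timestamp in milliseconds

# A stroke is the ordered series of samples between pen-down and pen-up.
Stroke = List[SamplePoint]

stroke: Stroke = [
    SamplePoint(0.0, 0.0, 0.5, 0),
    SamplePoint(3.0, 4.0, 0.6, 10),
    SamplePoint(6.0, 8.0, 0.4, 20),
]
```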

3 Features

We refer to individual, measurable properties or characteristics of digital ink as features. Features are calculated directly from the input sample points and represented by a numerical value. Therefore a feature can be seen as a function

$$ f : S \rightarrow \mathbb{R} $$

Depending on the feature, $S$ can be a set of strokes (gesture level), a single stroke (stroke level) or a subset of sample points. Usually a vector of features

$$ \mathbf{f} = (f_1, f_2, \ldots, f_m) $$

is extracted from the input gesture and can then be used in a classifier.

3.1 Feature Sets

Traditionally, stroke level features are most often used for statistical gesture recognition. One of the most prominent sets of features was presented by Dean Rubine in 1991 [35]. It contains a total of 13 features that were designed to reflect the visual appearance of strokes in order to be used in a gesture recogniser. More recent work by Don J.M. Willems and Ralph Niels [42] defines a total of 89 features using formal mathematical descriptions and algorithms. Adrien Delaye and Eric Anquetil introduced the HBF49 feature set [11], which contains 49 features and was specifically designed for different sets of symbols and as a reference for evaluating symbol recognition systems. In previous work we used 14 features described by Sonntag et al. [38] to distinguish between written text and other types of gestures in online handwriting recognition.

3.1.1 Common Features

Due to the nature of sketched or handwritten input there are a few features and concepts that the above-mentioned publications have in common. The most prominent example is the length of a stroke; here we use the Euclidean distance to measure the distance between sampling points.

Given two sampling points $s_i$ and $s_j$, their distance is calculated as follows:

$$ d(s_i, s_j) = \sqrt{(x_j - x_i)^2 + (y_j - y_i)^2} $$

The length of a stroke (a sequence of sampling points) is given by the sum of distances between consecutive sampling points:

$$ \mathrm{len}(s) = \sum_{i=1}^{n-1} d(s_i, s_{i+1}) $$
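The distance and length definitions translate directly into code; a sketch assuming sample points are `(x, y, …)` tuples:

```python
from math import hypot

def distance(p, q):
    """Euclidean distance between two (x, y, ...) sample points."""
    return hypot(q[0] - p[0], q[1] - p[1])

def stroke_length(points):
    """Sum of distances between consecutive sample points."""
    return sum(distance(points[i], points[i + 1])
               for i in range(len(points) - 1))
```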
A bounding box (see figure 1) around a set of strokes describes the smallest enclosing axis-aligned rectangle containing the entire set of points. Its size is determined by the minimum and maximum sample coordinates:

$$ x_{\min} = \min_i x_i, \quad x_{\max} = \max_i x_i, \quad y_{\min} = \min_i y_i, \quad y_{\max} = \max_i y_i $$

The area of the bounding box is then given by:

$$ A_{\mathrm{bb}} = (x_{\max} - x_{\min}) \cdot (y_{\max} - y_{\min}) $$

Figure 1: The rectangular bounding box (cyan) around a set of strokes (black) given by a set of sample points (red).
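A sketch of the bounding box computation, again assuming `(x, y, …)` tuples:

```python
def bounding_box(points):
    """Axis-aligned bounding box of a set of (x, y, ...) sample points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)

def bounding_box_area(points):
    """Area spanned by the bounding box."""
    x_min, y_min, x_max, y_max = bounding_box(points)
    return (x_max - x_min) * (y_max - y_min)
```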

3.2 Feature Categories

We have chosen the above described sets of features, which are formalised in a reproducible way. As the features describe different aspects of the digital ink, we decided to sort them into different categories. We distinguish each feature as either syntactic or semantic. Syntactic features reflect task-independent characteristics of the geometry of the input, whereas semantic features describe closely task-related knowledge. In this work we introduce 7 categories of syntactic features:

3.2.1 Angle Based

Angle based features are calculated from angles between sample points (e.g., curvature, perpendicularity, rectangularity).


3.2.2 Space Based

Space based features depend on the distances between samples (e.g., convex hull area, principal axes, compactness). The area of a gesture is usually derived from the area $A_h$ of the convex hull around all sample points, which can be calculated using Graham's algorithm [18].

With the area $A_h$ of the convex hull and the length $l_h$ of its perimeter we get a feature called compactness:

$$ c = \frac{l_h^2}{A_h} $$

The closer the sample points are together, the smaller the compactness will be. Handwritten texts, e.g., will have a larger compactness than geometric symbols, such as rectangles [42].

Related to the bounding box of a figure, we use its side lengths $a$ and $b$ (with $a \geq b$) to calculate the eccentricity:

$$ e = \frac{\sqrt{a^2 - b^2}}{a} $$

Note that we are using the coordinate axes instead of the principal axes (which are rotated with the pen gesture).

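A sketch of the hull-based features; it uses Andrew's monotone chain instead of Graham's scan, which produces the same hull:

```python
from math import hypot

def _cross(o, a, b):
    """z-component of the cross product (OA x OB)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain convex hull, counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def build(seq):
        hull = []
        for p in seq:
            while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) <= 0:
                hull.pop()
            hull.append(p)
        return hull

    lower, upper = build(pts), build(pts[::-1])
    return lower[:-1] + upper[:-1]

def hull_area(hull):
    """Shoelace formula for polygon area."""
    n = len(hull)
    return abs(sum(hull[i][0] * hull[(i + 1) % n][1]
                   - hull[(i + 1) % n][0] * hull[i][1]
                   for i in range(n))) / 2

def hull_perimeter(hull):
    n = len(hull)
    return sum(hypot(hull[(i + 1) % n][0] - hull[i][0],
                     hull[(i + 1) % n][1] - hull[i][1])
               for i in range(n))

def compactness(points):
    """Perimeter squared over hull area."""
    hull = convex_hull(points)
    return hull_perimeter(hull) ** 2 / hull_area(hull)
```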

3.2.3 Centroidal

Centroidal features describe relations between sample points and the overall centroid (e.g., centroid offset, deviation, average radius).

Using the dimensions of the bounding box we calculate the center point $c = (c_x, c_y)$:

$$ c_x = x_{\min} + \frac{x_{\max} - x_{\min}}{2}, \quad c_y = y_{\min} + \frac{y_{\max} - y_{\min}}{2} $$

The average distance of sample points to the center point is another feature:

$$ \bar{r} = \frac{1}{n} \sum_{i=1}^{n} d(s_i, c) $$

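Both centroidal quantities in a short sketch:

```python
from math import hypot

def center_point(points):
    """Center of the bounding box of (x, y, ...) sample points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2

def average_centroidal_radius(points):
    """Mean distance of the sample points to the center point."""
    cx, cy = center_point(points)
    return sum(hypot(p[0] - cx, p[1] - cy) for p in points) / len(points)
```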
3.2.4 Temporal

Temporal features are derived from the timestamps of sample points (e.g., duration, speed, acceleration). The velocity between sample points is defined as:

$$ v_i = \frac{d(s_i, s_{i+1})}{t_{i+1} - t_i} $$

From which the feature of average velocity is calculated:

$$ \bar{v} = \frac{1}{n-1} \sum_{i=1}^{n-1} v_i $$

The acceleration is calculated as follows:

$$ a_i = \frac{v_{i+1} - v_i}{t_{i+1} - t_i} $$

And the average acceleration is then given by:

$$ \bar{a} = \frac{1}{n-2} \sum_{i=1}^{n-2} a_i $$

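The temporal features as a sketch, assuming `(x, y, t)` tuples with timestamps in milliseconds:

```python
from math import hypot

def velocities(points):
    """Per-segment velocity from (x, y, t) samples."""
    v = []
    for a, b in zip(points, points[1:]):
        dt = b[2] - a[2]
        v.append(hypot(b[0] - a[0], b[1] - a[1]) / dt)
    return v

def average_velocity(points):
    v = velocities(points)
    return sum(v) / len(v)

def accelerations(points):
    """Change of segment velocity over the time between segment starts."""
    v = velocities(points)
    return [(v2 - v1) / (points[i + 1][2] - points[i][2])
            for i, (v1, v2) in enumerate(zip(v, v[1:]))]
```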
3.2.5 Pressure Based

Pressure based features are computed from hardware sensors capturing the applied pressure. The most intuitive and obvious features are the average pressure and the standard deviation of the pressure:

$$ \bar{p} = \frac{1}{n} \sum_{i=1}^{n} p_i, \quad \sigma_p = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (p_i - \bar{p})^2} $$

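A sketch of both pressure features (population standard deviation, matching the formula above):

```python
def pressure_features(pressures):
    """Average pressure and its (population) standard deviation."""
    n = len(pressures)
    mean = sum(pressures) / n
    sd = (sum((p - mean) ** 2 for p in pressures) / n) ** 0.5
    return mean, sd
```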

3.2.6 Trajectory Based

Trajectory based features reflect the visual appearance of strokes (e.g., closure, average stroke direction).

The path length from sample point $s_i$ to $s_j$ is denoted $L(i, j)$ and is calculated as follows:

$$ L(i, j) = \sum_{k=i}^{j-1} d(s_k, s_{k+1}) $$

$L(1, n)$ is the total length of $s$, whereas the first-to-last point vector and its length are:

$$ \vec{v}_{fl} = (x_n - x_1, \; y_n - y_1), \quad |\vec{v}_{fl}| = d(s_1, s_n) $$

Typical trajectory based features are closure, the ratio between the first-to-last point distance and the path length, and the average direction of the segment vectors:

$$ \mathrm{closure} = \frac{|\vec{v}_{fl}|}{L(1, n)}, \quad \bar{\theta} = \frac{1}{n-1} \sum_{i=1}^{n-1} \arctan\!\left(\frac{y_{i+1} - y_i}{x_{i+1} - x_i}\right) $$

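Definitions of closure vary slightly in the literature; this sketch follows the ratio given above and uses the two-argument `atan2` to keep quadrant information:

```python
from math import atan2, hypot

def closure(points):
    """Ratio of first-to-last point distance to total trajectory length."""
    total = sum(hypot(b[0] - a[0], b[1] - a[1])
                for a, b in zip(points, points[1:]))
    return hypot(points[-1][0] - points[0][0],
                 points[-1][1] - points[0][1]) / total

def average_direction(points):
    """Mean angle of the segment direction vectors, in radians."""
    angles = [atan2(b[1] - a[1], b[0] - a[0])
              for a, b in zip(points, points[1:])]
    return sum(angles) / len(angles)
```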

3.2.7 Meta

Meta features are higher level features and relations between components (e.g., number of strokes, inter-connections, crossings, straight line ratio). One intuitive example would be the number of straight lines or, to be more precise, the number of straight segments. We use a sliding window with a threshold to calculate sets of connected points which have minimal curvature between them. The size of the sliding window and the threshold can either be dynamically adjusted to the length of the stroke or be fixed values depending on the task.

The feature called connected components [42] describes the number of segments which are interconnected with other segments, e.g., crossings between strokes.

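A sketch of the straight-segment count described above; the window size and threshold values are illustrative, not the paper's settings:

```python
from math import atan2, pi

def count_straight_segments(points, window=3, threshold=0.1):
    """Count maximal runs of segments whose direction changes stay below
    `threshold` radians for at least `window` consecutive segments."""
    angles = [atan2(b[1] - a[1], b[0] - a[0])
              for a, b in zip(points, points[1:])]
    count, run = 0, 1
    for a1, a2 in zip(angles, angles[1:]):
        diff = abs((a2 - a1 + pi) % (2 * pi) - pi)  # wrapped angle difference
        if diff < threshold:
            run += 1
        else:
            if run >= window:
                count += 1
            run = 1
    if run >= window:
        count += 1
    return count
```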
Geometric Features

Angle Based
Circular Variance
Rectangularity
Curvature
Average Curvature
SD of Curvature
Angles after Resampling
Cosine of First to Last Point Vector
Sine of First to Last Point Vector
Cosine Initial Vector
Sine Initial Vector
Bounding Box Diagonal Angle
Perpendicularity
Average Perpendicularity
SD of Perpendicularity
Signed Perpendicularity
K-Perpendicularity
Maximum k-Angle
Absolute Directional Angle
Relative Angle Histogram
Principal Axis Orientation (sin)
Principal Axis Orientation (cos)
Maximum Angular Difference
Sum of Absolute Values of Angles
Sum of Angles
Sum of Squared Angles
Macro Perpendicularity
Average Macro Perpendicularity
SD of Macro Perpendicularity
Absolute Curvature
Squared Curvature

Space Based
Stroke Length
Gesture Length
Perimeter Length
Compactness
Eccentricity
Principal Axes
First Point X
First Point Y
Last Point X
Last Point Y
First to Last Point Vector
2D Histogram
Ratio of Axes
Ratio of Principal Axes
Length of First Principal Axis
SD of Stroke Length
Sample Ratio Octants
Convex Hull Area
Convex Hull Compactness
Distance of First to Last Point
Average Length of Straight Lines
Initial Horizontal Offset
Final Horizontal Offset
Initial Vertical Offset
Final Vertical Offset
Hu Moments

Centroidal
Centroid Offset
Average Centroidal Radius
SD of Centroidal Radius

Temporal Features
Maximum Speed (Squared)
Duration of Gesture
Pen Up/Pen Down Ratio
Average Velocity
SD of Velocity
Maximum Velocity
Average Acceleration
SD of Acceleration
Maximum Acceleration
Maximum Deceleration

Pressure Based Features
Average Pressure
SD of Pressure

Trajectory Features
Inflexion X
Inflexion Y
Proportion of Downstroke Trajectory
Ratio between Half-Perimeter and Trajectory
Average Stroke Direction
Cup Count
Last Cup Offset
First Cup Offset
Number of Pen Down Events
Sin Chain Code
Cos Chain Code
SD of Stroke Direction

Meta Features
Number of Strokes
Number of Straight Lines
SD of Straight Lines
Straight Line Ratio
Largest Straight Line Ratio
Number of Connected Components
Number of Crossings

Table 1: Categorisation of syntactic features into classes.

3.3 Semantic/Task Based Features

Depending on the task, additional features can be deduced from the task itself. As these features describe higher level semantic concepts about the sketched contents, we often refer to them as semantic features. Semantic features depend heavily on the given context and therefore vary noticeably between different tasks. Such features usually cannot be transferred easily to other tasks, as they are often hard-coded per task.

Figure 2 shows the visualisation of a selected semantic feature set in the context of the Clock Drawing Test, a pen and paper test that has been used for more than 50 years as a screening tool for cognitive impairment. Participants are asked to draw a clock face with the time set to 10 past 11 o'clock. The drawn clock is then examined by a trained physician and rated based on a predefined scoring scheme, which reflects the visual appearance and integrity of the clock in a numerical score. In this example we deduced the following features based on the traditional scoring system:

  • The center point of the clock (centroid): the closer it is to the center of the clock's circle, the more points are awarded.

  • The lengths of the hour and minute hands: if the clock is well drawn, the hour hand should be shorter than the minute hand.

  • The angle between the hour and minute hands: together with the orientation of the hands, it can be used to determine whether the correct time was set.

  • The displacement of clock face digits relative to their ideal location; in this example it is the vertical offset of digit 9 from its correct center position.

Figure 2: Visualisation of semantic features in the context of the Clock Drawing Test.
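As an illustration of such a semantic feature, the angle between the two hands can be computed from the clock centre and the hand tips. This is a hypothetical helper, not part of the paper's scoring implementation:

```python
from math import atan2, degrees

def hands_angle(center, hour_tip, minute_tip):
    """Angle in degrees between the hour and minute hands, given the
    clock centre and the tip coordinates of each hand."""
    a1 = atan2(hour_tip[1] - center[1], hour_tip[0] - center[0])
    a2 = atan2(minute_tip[1] - center[1], minute_tip[0] - center[0])
    diff = abs(degrees(a2 - a1)) % 360
    return min(diff, 360 - diff)  # smaller of the two angles between hands
```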

4 Related Work

One of the first reproducible ink feature sets was presented by Rubine [35] in 1991. He described how to use these features in a trainable single-stroke recogniser for gestures. Willems and Niels [42] presented a set of 89 ink features which they used for iconic and multi-stroke gesture recognition. The HBF49 feature set was presented by Delaye and Anquetil [11] to be used in online symbol recognition. Sonntag et al. [38] used ink features to distinguish between writing and sketching in online mode detection of handwriting input.

Ink features can not only be used for gesture or sketch recognition, but also for characterisation of handwriting behaviour. Drotar et al. [13] have shown that the analysis of in-air movement can be used as a marker for Parkinson’s disease. The kinematic analysis of handwriting movements can be used to distinguish between certain forms of dementia [36].

Digitalising popular existing cognitive assessments, such as the Clock Drawing Test (CDT), has been a topic of recent debate. There are clear benefits resulting from digitalisation, such as increased diagnostic accuracy [29]. Davis et al. recently presented their work on how to infer cognitive status from subtle behaviours observed in digital ink [10]. Based on such ink features, machine learning models can be trained [39], which can also be explained by existing, validated scoring schemes [40]. Examples of complex digitalised cognitive assessments include the Rey-Osterrieth Complex Figure test [7], which can be used for various purposes, such as diagnosing the periphery [8].

Behaviour characterisation can also be used in different settings, e.g., to gain feedback about the cognitive load of the writer. Luria and Rosenblum [27] conducted a study to determine the effect of mental workload on handwriting behaviour. Yu et al. [46] showed that online writing features can be used for mental workload classification, such as cognitive load evaluation [45]. Ink features can also be used in multimodal scenarios [33], where they may enhance the prediction of cognitive and emotional states [47].

5 Use Case: Cognitive Assessments

One use case where we apply our feature set is the analysis of handwriting behaviour for dementia screening tools in the Interakt project [37]. Dementia is a general term for a decline in mental ability severe enough to interfere with daily life. In 2018, the Alzheimer's Association documented that approximately 10-20% of the population over 65 years of age suffer from some form of dementia [2]. Screening tests for dementia have been the subject of recent debate because of their limitations when conducted using pen and paper: the collected material is monomodal (written form), there is no direct digitalisation for further automatic processing, and the results can be biased. We selected the assessments based on feedback from domain experts and a recent market analysis of the most widely used cognitive assessments conducted by Niemann et al. [31]. Our selected and implemented paper and pencil tests are shown in table 2, namely Age-Concentration (AKT) [17], Clock Drawing Test (CDT) [16], CERAD Neuropsychological Battery [28], Dementia Detection (DemTect) [23], Mini-Mental State Examination (MMSE) [15], Montreal Cognitive Assessment (MoCA) [30], Rey-Osterrieth Complex Figure (ROCF) [7], and Trail Making Test (TMT) [34]. The selection of tests accounts for a variety of patient populations and test contexts.

name time needed pen input symbols
AKT [17] 15 min 100% cross-out
CDT [16] 2-5 min 100% clock, digits, lines
CERAD [28] 30-45 min 20% (see figure 6)
DemTect [23] 6-8 min 20% numbers, words
MMSE [15] 5-10 min 9% pentagrams
MoCA [30] 10 min 17% clock, digits, lines
ROCF [7] 15 min 100% circles, rectangles, triangles, lines
TMT [34] 3-5 min 100% lines
Table 2: Comparison of the most widely used cognitive assessments

One of the most prominent examples is the internationally used Mini-Mental State Examination (MMSE) [15], a 30-point questionnaire which is extensively used in medicine and research to measure cognitive impairment. Depending on the experience of the physician and the cognitive state of the patient, the administration of the test takes between 5 and 10 minutes and examines functions including awareness, attention, recall, language, the ability to follow simple commands and orientation [41]. Due to its standardisation, validity, short administration period and ease of use, it is widely used as a reliable screening tool for dementia [19]. The MMSE also includes several tasks which involve handwriting input by the participant, e.g., writing a complete sentence and copying a geometric figure.

Figure 3: Clock Drawing Test (CDT).

The Clock Drawing Test (CDT) [16] is another popular cognitive assessment, where the patient is asked to draw a clock with a specified time on a piece of paper, see figure 3. A score is calculated based on the completeness and appearance of the clock face and the arrangement of the digits. The CDT and MMSE illustrate the two categories of traditional paper and pencil cognitive testing: there are assessments, like the CDT, which rely solely on handwriting and sketch input to produce a score, whereas others, such as the MMSE, also include further modalities, such as speech. Depending on the assessment, the handwriting input therefore has a different weight in the overall scoring of the test. Table 2 shows the percentage of test questions that are answered using a pen. Tasks in the MMSE containing pen input include writing a sentence and copying a figure of two overlapping pentagrams (see figure 4). Out of 22 possible points in the scoring of the MMSE, the pen input related tasks add up to 2 points, so roughly 9% of the entire test is scored through analysis of pen input. Regarding task design, the Montreal Cognitive Assessment (MoCA) [30] is comparable to the MMSE and CDT, e.g., it also includes copying a figure and drawing a clock. The CERAD Neuropsychological Battery [28] is a collection of several tests (including the MMSE and TMT), in which, amongst others, the subject has to copy the shapes depicted in figure 6. In the Trail Making Test (TMT) [34] the subject has to connect numbers and letters in ascending order. A more complex example of a test that is rated entirely based on pen input is the Rey-Osterrieth complex figure test (ROCF) [7], where subjects are required to copy the figure three times: once while looking at the template, once directly afterwards without seeing the template, and once from recall 30 minutes later.
The Age-Concentration Test (AKT) [17] asks subjects to cross out a specific shape from a set of similar, yet varying shapes in a limited amount of time. Handwritten words and digits are contained in the DemTect [23], where subjects translate numbers into words and vice versa.

Figure 4: Mini Mental Status Exam (MMSE): Copy pentagram figure task.
Figure 5: The Rey-Osterrieth complex figure (ROCF).
(a) Circle
(b) Diamond
(c) Rectangles
(d) Cube
(e) Pentagrams
Figure 6: Symbols used in the CERAD neuropsychological battery.

5.1 Symbols data set

Based on the design of sketching tasks in cognitive testing, we created a set of 11 gestures which are commonly found in different cognitive assessments. We focused on the geometric shapes of which tests are composed, e.g., the Clock Drawing Test contains a circle (clock face) and lines (hands). The CERAD battery, MMSE and MoCA contain several shapes like pentagrams, diamonds and rectangles. Single shapes in turn compose parts of other assessments, such as the ROCF depicted in figure 7, which contains several sub-shapes, such as triangles, rectangles, lines and circles. As depicted in figure 8, a total of 8 shapes were chosen from the most commonly used cognitive assessments: arrow, circle, rectangle, triangle, diamond, overlapping rectangles, cube and pentagrams. We chose 3 additional gestures based on a previously conducted user study in which we asked participants to specify gestures that they would use to indicate that they are finished with the current handwriting task. Our symbol data set consists of 11 classes (shapes) with 100 samples per class and subject; the 7 subjects provided a total of 7,700 handwritten samples.

Figure 7: The Rey-Osterrieth complex figure (ROCF) is composed of several sub-shapes.
(a) Arrow
(b) Circle
(c) Rectangle
(d) Triangle
(e) Diamond
(f) Rectangles
(g) Cube
(h) Pentagrams
Figure 8: Set of gestures chosen from cognitive assessments.
(a) Checkmark
(b) Checkmarks
(c) Send Symbol
Figure 9: Set of symbols.

5.2 Interakt Architecture

In the Interakt use case the patient performs a digitalised cognitive assessment using a digital pen, which captures handwriting data in real-time. Figure 10 shows the technical architecture, in which the digital ink is analysed using the previously described syntactic and semantic features. Completing the cognitive assessment results in raw pen data being streamed to the backend service, where a document is created and indexed based on the performed test. This document contains semantic information about the areas of the test (e.g., text fields, figures etc.) and the digital ink data. We store the documents in a file format called XForm, which is either a JSON or XML based structured description of the test and the captured ink. With this format a visual representation of the completed test can be reconstructed, and the doctor can retrace the patient's input using a playback functionality that replays the strokes in real-time as they were recorded. Based on the respective assessment, different sets of syntactic and semantic features are used by the pen data processing server to analyse the handwritten and sketched contents of the test and deliver aggregated evaluation results that can be presented to the therapist. Depending on the situation, the analysis of the assessment may also involve additional patient data or previous test results, which are obtained from the data warehouse. The processed and evaluated assessment is finally also stored in the data warehouse, from where the doctor can access the results of the assessment in the therapist interface. The entire evaluation process takes place in real-time.

Figure 10: System Architecture in the dementia screening use case.
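To make the JSON variant of such a document concrete, the sketch below builds a minimal XForm-like structure; all field names are illustrative assumptions, not the actual Interakt schema:

```python
import json

# Hypothetical XForm-like document: a test descriptor with semantic
# areas plus the captured digital ink (field names are illustrative).
xform = {
    "test": "CDT",
    "areas": [{"id": "clockface", "type": "figure"}],
    "ink": [{
        "stroke": [
            {"x": 10.0, "y": 12.5, "p": 0.6, "t": 0},
            {"x": 11.2, "y": 13.1, "p": 0.7, "t": 8},
        ],
    }],
}
document = json.dumps(xform)  # serialised form streamed to the backend
```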

6 Multimodality

In this section, we describe how additional modalities, beyond pen-based features, can help in the analysis of observed user behaviour, when interacting with a tablet computer and relying on the built-in sensors only. For instance, researchers in the medical domain investigated “observable differences in the communicative behaviour of patients with specific psychological disorders” [12], e.g., the detection of depression from facial actions and vocal prosody [9], which can be realised using the camera and microphone of a tablet device. Including additional modalities can help with the disambiguation of signal- or semantic-level information in one error-prone recognition modality by using partial information supplied by another modality [32]. We consider the digital pen signal as primary modality for behaviour characterisation in combination with additional sensors and modalities as indicated in Figure 11: eye tracking and facial expression analysis via the video signal of the front-facing camera, natural speech processing via the built-in microphone and additional sensor inputs of modern tablet devices.

Figure 11: Multimodal interaction architecture on mobile device.

6.1 Eye Tracking

Eye tracking can improve human behaviour analysis because human gaze is related to cognitive processes. For instance, gaze trajectories can be used for inferring a user's task [44], for differentiating between novices and experts [5] and for modelling human visual attention [6]. Further, the number and duration of fixations and the transitions between different contents provide information about a user's cognitive engagement [26] and their cognitive load [24]. To augment pen signals, it is interesting that users pro-actively control their gaze behaviour to gather visual information for guiding movements across different activities [22], including hand movements [25]. This relation suggests that pen and gaze signals can be analysed jointly to improve behaviour characterisation. For multisensory behaviour analysis on unmodified mobile devices, RGB-based eye tracking is most interesting because it is deployable using the built-in front-facing camera [43]. Compared to professional eye tracking equipment, the tracking quality is significantly lower [4]. However, the form factor can be essential for certain use cases, e.g., in dementia day hospitals that require non-obtrusive devices due to the patients' cognitive abilities.

6.2 Facial Expressions

OpenFace (https://github.com/TadasBaltrusaitis/OpenFace/) [3] is an open-source toolkit for facial behaviour analysis using the stream of an RGB webcam. It provides state-of-the-art performance in facial landmark and head pose tracking, as well as facial action unit recognition, which can be used to infer emotions. These observations can be used for affective user interaction. Further, it enables webcam-based eye tracking.

6.3 Speech Signal

The openSMILE toolkit (https://audeering.com/technology/opensmile/) [14] provides methods for speech-based behaviour analysis and is distributed under an open-source license. It offers an API for low-level feature extraction from audio signals and pre-trained classifiers for voice activity detection, speech-segment detection and speech-based emotion recognition in real-time. The toolkit can be used on top of speech-based interaction frameworks to attach a valence to users' utterances.


Acknowledgements

This research is part of the Intera-KT project, which is supported by the German Federal Ministry of Education and Research (BMBF) under grant number 16SV7768.


  • [1] Almazán, J., Fornés, A., and Valveny, E. A non-rigid feature extraction method for shape recognition. In 2011 International Conference on Document Analysis and Recognition (Sept 2011), pp. 987–991.
  • [2] Alzheimer’s Association. 2018 Alzheimer’s disease facts and figures. Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 14, 3 (Mar 2018), 367–429.
  • [3] Baltrusaitis, T., Zadeh, A., Lim, Y. C., and Morency, L. Openface 2.0: Facial behavior analysis toolkit. In 2018 13th IEEE International Conference on Automatic Face Gesture Recognition (FG 2018) (May 2018), pp. 59–66.
  • [4] Barz, M., Poller, P., and Sonntag, D. Evaluating Remote and Head-worn Eye Trackers in Multi-modal Speech-based HRI. In Proceedings of the Companion of the 2017 ACM/IEEE International Conference on Human-Robot Interaction (New York, NY, USA, 2017), B. Mutlu, M. Tscheligi, A. Weiss, and J. E. Young, Eds., ACM, pp. 79–80.
  • [5] Bednarik, R. Expertise-dependent visual attention strategies develop over time during debugging with multiple code representations. Int. J. Hum.-Comput. Stud. 70, 2 (Feb. 2012), 143–155.
  • [6] Borji, A., and Itti, L. State-of-the-Art in Visual Attention Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1 (Jan 2013), 185–207.
  • [7] Canham, R., Smith, S., and Tyrrell, A. Automated scoring of a neuropsychological test: The Rey Osterrieth Complex Figure. IEEE COMPUTER SOC, 2000, pp. A406–A413.
  • [8] Coates, D. R., Wagemans, J., and Sayim, B. Diagnosing the periphery: Using the rey–osterrieth complex figure drawing test to characterize peripheral visual function. In i-Perception (2017).
  • [9] Cohn, J. F., Kruez, T. S., Matthews, I., Yang, Y., Nguyen, M. H., Padilla, M. T., Zhou, F., and la Torre, F. D. Detecting depression from facial actions and vocal prosody. In 2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (Sept 2009), pp. 1–7.
  • [10] Davis, R., Libon, D. J., Au, R., Pitman, D., and Penney, D. L. Think: Inferring cognitive status from subtle behaviors. AI Magazine 36, 3 (2015), 49–60.
  • [11] Delaye, A., and Anquetil, E. HBF49 feature set: A first unified baseline for online symbol recognition. Pattern Recognition 46, 1 (Jan. 2013), 117–130.
  • [12] DeVault, D., Artstein, R., Benn, G., Dey, T., Fast, E., Gainer, A., Georgila, K., Gratch, J., Hartholt, A., Lhommet, M., Lucas, G., Marsella, S., Morbini, F., Nazarian, A., Scherer, S., Stratou, G., Suri, A., Traum, D., Wood, R., Xu, Y., Rizzo, A., and Morency, L.-P. SimSensei Kiosk: A Virtual Human Interviewer for Healthcare Decision Support. In Proceedings of the 2014 International Conference on Autonomous Agents and Multi-agent Systems (Richland, SC, 2014), AAMAS ’14, International Foundation for Autonomous Agents and Multiagent Systems, pp. 1061–1068.
  • [13] Drotár, P., Mekyska, J., Rektorová, I., Masarová, L., Smékal, Z., and Faundez-Zanuy, M. Analysis of in-air movement in handwriting: A novel marker for parkinson’s disease. Computer Methods and Programs in Biomedicine 117, 3 (2014), 405 – 411.
  • [14] Eyben, F., Weninger, F., Gross, F., and Schuller, B. Recent developments in opensmile, the munich open-source multimedia feature extractor. In Proceedings of the 21st ACM International Conference on Multimedia (New York, NY, USA, 2013), MM ’13, ACM, pp. 835–838.
  • [15] Folstein, M., Folstein, S., and McHugh, P. ”Mini-mental state”. A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research 12, 3 (Nov 1975), 189–198.
  • [16] Freedman, M., Leach, L., Kaplan, E., Winocur, G., Shulman, K., and Delis, D. Clock Drawing: A Neuropsychological Analysis. Oxford University Press, 1994.
  • [17] Gatterer, G., Fischer, P., Simanyi, M., and Danielczyk, W. The A-K-T (”Alters-Konzentrations-Test”) a new psychometric test for geriatric patients. Funct. Neurol. 4, 3 (1989), 273–276.
  • [18] Graham, R. L. An efficient algorithm for determining the convex hull of a finite planar set. Inf. Process. Lett. 1, 4 (1972), 132–133.
  • [19] Harrell, L. E., Marson, D., Chatterjee, A., and Parrish, J. A. The severe mini-mental state examination: A new neuropsychologic instrument for the bedside assessment of severely impaired patients with alzheimer disease. Alzheimer Disease & Associated Disorders 14, 3 (2000).
  • [20] Hu, M.-K. Visual pattern recognition by moment invariants. IRE Transactions on Information Theory 8, 2 (February 1962), 179–187.
  • [21] Impedovo, S., Pirlo, G., Modugno, R., and Ferrante, A. Zoning methods for hand-written character recognition: An overview. In 2010 12th International Conference on Frontiers in Handwriting Recognition (Nov 2010), pp. 329–334.
  • [22] Johansson, R. S., Westling, G., Bäckström, A., and Flanagan, J. R. Eye–Hand Coordination in Object Manipulation. Journal of Neuroscience 21, 17 (2001), 6917–6932.
  • [23] Kalbe, E., Kessler, J., Calabrese, P., Smith, R., Passmore, A. P., Brand, M., and Bullock, R. DemTect: a new, sensitive cognitive screening test to support the diagnosis of mild cognitive impairment and early dementia. International Journal of Geriatric Psychiatry 19, 2 (Feb 2004), 136–143.
  • [24] Korbach, A., Brünken, R., and Park, B. Differentiating Different Types of Cognitive Load: a Comparison of Different Measures. Educational Psychology Review (mar 2017), 1–27.
  • [25] Land, M., Mennie, N., and Rusted, J. The roles of vision and eye movements in the control of activities of daily living. Perception 28, 11 (1999), 1311–1328.
  • [26] Lemaignan, S., Garcia, F., Jacq, A., and Dillenbourg, P. From Real-time Attention Assessment to ”With-me-ness” in Human-Robot Interaction. The Eleventh ACM/IEEE International Conference on Human Robot Interaction (2016), 157–164.
  • [27] Luria, G., and Rosenblum, S. A computerized multidimensional measurement of mental workload via handwriting analysis. Behavior Research Methods 44, 2 (Jun 2012), 575–586.
  • [28] Morris, J., Mohs, R., Rogers, H., Fillenbaum, G., and Heyman, A. Consortium to establish a registry for Alzheimer’s disease (CERAD) clinical and neuropsychological assessment of Alzheimer’s disease. Psychopharmacol Bull. 24, 4 (1988), 641–52.
  • [29] Muller, S., Preische, O., Heymann, P., Elbing, U., and Laske, C. Increased Diagnostic Accuracy of Digital vs. Conventional Clock Drawing Test for Discrimination of Patients in the Early Course of Alzheimer’s Disease from Cognitively Healthy Individuals. Front Aging Neurosci 9 (2017), 101.
  • [30] Nasreddine, Z. S., Phillips, N. A., et al. The Montreal Cognitive Assessment, MoCA: a brief screening tool for mild cognitive impairment. Journal of the American Geriatrics Society 53, 4 (2005), 695–699.
  • [31] Niemann, M., Prange, A., and Sonntag, D. Towards a Multimodal Multisensory Cognitive Assessment Framework. In 31st IEEE International Symposium on Computer-Based Medical Systems, CBMS 2018, Karlstad, Sweden, June 18-21, 2018 (2018), pp. 24–29.
  • [32] Oviatt, S., and Cohen, P. R. The Paradigm Shift to Multimodality in Contemporary Computer Interfaces. Morgan & Claypool Publishers, 2015.
  • [33] Oviatt, S., Schuller, B., Cohen, P. R., Sonntag, D., Potamianos, G., and Krüger, A., Eds. The Handbook of Multimodal-Multisensor Interfaces: Foundations, User Modeling, and Common Modality Combinations - Volume 1, vol. Volume 1. Association for Computing Machinery and Morgan & Claypool, New York, NY, USA, 2017.
  • [34] Reitan, R. Trail Making Test. Reitan Neuropsychology Laboratory, 1992.
  • [35] Rubine, D. Specifying gestures by example. In Proceedings of the 18th Annual Conference on Computer Graphics and Interactive Techniques (New York, NY, USA, 1991), SIGGRAPH ’91, ACM, pp. 329–337.
  • [36] Schroter, A., Mergl, R., Burger, K., Hampel, H., Moller, H. J., and Hegerl, U. Kinematic analysis of handwriting movements in patients with Alzheimer’s disease, mild cognitive impairment, depression and healthy subjects. Dement Geriatr Cogn Disord 15, 3 (2003), 132–142.
  • [37] Sonntag, D. Interakt - A multimodal multisensory interactive cognitive assessment tool. CoRR abs/1709.01796 (2017).
  • [38] Sonntag, D., Weber, M., Cavallaro, A., and Hammon, M. Integrating digital pens in breast imaging for instant knowledge acquisition. AI Magazine 35, 1 (2014), 26–37.
  • [39] Souillard-Mandar, W., Davis, R., Rudin, C., Au, R., Libon, D. J., Swenson, R., Price, C. C., Lamar, M., and Penney, D. L. Learning classification models of cognitive conditions from subtle behaviors in the digital clock drawing test. Mach. Learn. 102, 3 (Mar. 2016), 393–441.
  • [40] Souillard-Mandar, W., Davis, R., Rudin, C., Au, R., and Penney, D. Interpretable Machine Learning Models for the Digital Clock Drawing Test. ArXiv e-prints (Jun 2016), arXiv:1606.07163.
  • [41] Tuijl, J. P., Scholte, E. M., Craen, A. J., and Mast, R. C. Screening for cognitive impairment in older general hospital patients: comparison of the six-item cognitive impairment test with the mini-mental state examination. International Journal of Geriatric Psychiatry 27, 7 (2012), 755–762.
  • [42] Willems, D., and Niels, R. Definitions for features used in online pen gesture recognition. Tech. rep., NICI, Radboud University Nijmegen, 2008.
  • [43] Wood, E., and Bulling, A. EyeTab: model-based gaze estimation on unmodified tablet computers. In Proceedings of the Symposium on Eye Tracking Research and Applications - ETRA ’14 (New York, New York, USA, 2014), ACM Press, pp. 207–210.
  • [44] Yarbus, A. L. Eye movements and vision. Neuropsychologia 6, 4 (1967), 222.
  • [45] Yu, K., Epps, J., and Chen, F. Cognitive load evaluation of handwriting using stroke-level features. In Proceedings of the 16th International Conference on Intelligent User Interfaces (New York, NY, USA, 2011), IUI ’11, ACM, pp. 423–426.
  • [46] Yu, K., Epps, J., and Chen, F. Mental workload classification via online writing features. In 2013 12th International Conference on Document Analysis and Recognition (Aug 2013), pp. 1110–1114.
  • [47] Yu, K., Epps, J., and Chen, F. Q. Cognitive load measurement with pen orientation and pressure. In MMCogEms: Infering Cognitive and Emotional States from Multimodal Measures (2011).

Appendix A Sonntag/Weber Features

The features described in this section are implementations of the 14 features described by Sonntag et al. [38]. For this section we use the following notation:

A stroke is a sequence of samples,

    s = (p_1, p_2, ..., p_n),  p_i = (x_i, y_i, t_i),

where n is the number of recorded samples. A sequence of strokes is indicated by

    S = (s_1, s_2, ..., s_m),

where m is the number of strokes.

The centroid is defined as

    c = (1/n · Σ x_i, 1/n · Σ y_i),

where n is the number of samples used for the classification, the mean radius (the mean distance of the samples to the centroid, together with its standard deviation) as

    r̄ = 1/n · Σ ‖p_i − c‖,

and the angle of the segment between two succeeding samples as

    θ_i = atan2(y_{i+1} − y_i, x_{i+1} − x_i).


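The notation above could be realised as follows; this is an illustrative sketch using our own names (`centroid`, `mean_radius`), not the API of the released framework. A sample is an (x, y, t) tuple and a stroke is a list of samples.

```python
import math

def centroid(stroke):
    """Arithmetic mean of the sample coordinates of one stroke."""
    n = len(stroke)
    return (sum(p[0] for p in stroke) / n,
            sum(p[1] for p in stroke) / n)

def mean_radius(stroke):
    """Mean distance of the samples to the centroid."""
    cx, cy = centroid(stroke)
    return sum(math.hypot(p[0] - cx, p[1] - cy) for p in stroke) / len(stroke)
```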
a.1 Number of Strokes


a.2 Length


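The length feature, i.e. the summed Euclidean distances between successive samples, could be sketched as follows (names are illustrative):

```python
import math

def stroke_length(stroke):
    """Sum of the Euclidean distances between consecutive samples."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1, *_), (x2, y2, *_) in zip(stroke, stroke[1:]))

def total_length(strokes):
    """Length of a sequence of strokes (pen-down path only)."""
    return sum(stroke_length(s) for s in strokes)
```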
a.3 Area

The area covered by a sequence of strokes is defined as the area of its convex hull, which we compute using Graham’s algorithm [18].


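A minimal sketch of the hull-area computation; instead of Graham’s scan we use Andrew’s monotone chain, which yields the same hull, followed by the shoelace formula:

```python
def convex_hull(points):
    """Convex hull via Andrew's monotone chain (same result as Graham's scan)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Shoelace formula over the convex hull vertices."""
    h = convex_hull(points)
    return abs(sum(h[i][0] * h[(i + 1) % len(h)][1]
                   - h[(i + 1) % len(h)][0] * h[i][1]
                   for i in range(len(h)))) / 2.0
```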
a.4 Perimeter Length

The length of the path around the convex hull


a.5 Compactness


a.6 Eccentricity

Let a and b denote the lengths of the major and minor axes of the convex hull, respectively


a.7 Principal Axes


a.8 Circular Variance

Let r̄ denote the mean distance of the samples to the centroid c. The circular variance is then computed as follows


a.9 Rectangularity


a.10 Closure


a.11 Curvature

Let θ_i be the angle between the incoming and the outgoing segment at sample p_i.


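A sketch of the curvature feature under the assumption that the signed turning angles are summed (whether a feature set uses signed or absolute angles differs; see also the perpendicularity variants below):

```python
import math

def curvature(stroke):
    """Sum of the signed turning angles between successive segments."""
    total = 0.0
    for (x0, y0, *_), (x1, y1, *_), (x2, y2, *_) in zip(stroke, stroke[1:], stroke[2:]):
        a1 = math.atan2(y1 - y0, x1 - x0)
        a2 = math.atan2(y2 - y1, x2 - x1)
        d = a2 - a1
        # wrap into (-pi, pi] so each turn is measured as the smaller angle
        while d > math.pi:
            d -= 2 * math.pi
        while d <= -math.pi:
            d += 2 * math.pi
        total += d
    return total
```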
a.12 Perpendicularity


a.13 Signed Perpendicularity


a.14 Angles after Equidistant Resampling

For this feature we resample the stroke equidistantly into 6 line segments. The five angles between succeeding segments are used as features, which makes them scale and rotation invariant and normalises for writing speed.

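The resampling step could be sketched as below (illustrative names; the resampling walks the stroke and emits a point every `step` units of arc length, as in common gesture recognisers):

```python
import math

def resample(stroke, n_segments=6):
    """Resample a stroke into n_segments equal-length segments (n_segments+1 points)."""
    pts = [(p[0], p[1]) for p in stroke]
    path = sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:]))
    if path == 0:
        return [pts[0]] * (n_segments + 1)
    step = path / n_segments
    out, acc, prev, i = [pts[0]], 0.0, pts[0], 1
    while i < len(pts):
        d = math.hypot(pts[i][0] - prev[0], pts[i][1] - prev[1])
        if acc + d >= step and d > 0:
            t = (step - acc) / d
            q = (prev[0] + t * (pts[i][0] - prev[0]),
                 prev[1] + t * (pts[i][1] - prev[1]))
            out.append(q)
            prev, acc = q, 0.0
        else:
            acc += d
            prev = pts[i]
            i += 1
    while len(out) <= n_segments:  # pad against floating-point shortfall
        out.append(pts[-1])
    return out

def resampled_angles(stroke, n_segments=6):
    """The five angles between succeeding segments after resampling."""
    pts = resample(stroke, n_segments)
    segs = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(pts, pts[1:])]
    return [math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)
            for (ux, uy), (vx, vy) in zip(segs, segs[1:])]
```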

Appendix B Rubine’s Features

Features from this section are implementations of the described features by Rubine [35].

b.1 Cosine of initial angle


b.2 Sine of initial angle


b.3 Length of bounding box diagonal


b.4 Angle of the bounding box diagonal


b.5 Distance between first and last point


b.6 Cosine of the angle between first and last point


b.7 Sine of the angle between first and last point


b.8 Total gesture length

Let Δx_i = x_{i+1} − x_i and Δy_i = y_{i+1} − y_i,


b.9 Total angle traversed



b.10 Sum of the absolute value of the angle at each point


b.11 Sum of the squared value of the angle at each point


b.12 Maximum speed (squared) of the gesture



b.13 Duration of the gesture

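A hedged sketch of a subset of the features above (comments reference the subsection numbers; variable names are our own). Rubine computes the initial angle from the third sample to smooth out noise at the very start of the gesture:

```python
import math

def rubine_subset(stroke):
    """Subset of Rubine's features for a stroke given as a list of (x, y, t)."""
    xs = [p[0] for p in stroke]
    ys = [p[1] for p in stroke]
    ts = [p[2] for p in stroke]
    # b.1/b.2: cosine and sine of the initial angle (first to third sample)
    dx, dy = xs[2] - xs[0], ys[2] - ys[0]
    d = math.hypot(dx, dy)
    cos_init, sin_init = dx / d, dy / d
    # b.3/b.4: length and angle of the bounding box diagonal
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    diag_len = math.hypot(w, h)
    diag_angle = math.atan2(h, w)
    # b.5: distance between first and last point
    first_last = math.hypot(xs[-1] - xs[0], ys[-1] - ys[0])
    # b.12: maximum squared speed between consecutive samples
    max_speed_sq = max(((x2 - x1) ** 2 + (y2 - y1) ** 2) / (t2 - t1) ** 2
                       for (x1, y1, t1), (x2, y2, t2) in zip(stroke, stroke[1:]))
    # b.13: duration of the gesture
    duration = ts[-1] - ts[0]
    return cos_init, sin_init, diag_len, diag_angle, first_last, max_speed_sq, duration
```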

Appendix C Features by Willems and Niels

The features described in this section are implementations based on the feature set described by Willems and Niels [42]. For this section we use the following notation:

Let c_b be the center of the bounding box around the gesture, defined by the co-ordinate axes.


While the ratio of the co-ordinate axes is not rotation invariant, the ratio of the principal axes is. To determine the principal axes, Principal Component Analysis (PCA) is used [42]. Let e_1 and e_2 be the normalised principal component vectors of the set of samples, and let c_p be the center of the box enclosing the trajectory along the principal component vectors. The lengths of the major axes along the principal component vectors are

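For 2-D ink samples the PCA step reduces to an eigendecomposition of the 2×2 covariance matrix, which can be done in closed form; a sketch operating on (x, y) pairs (names are illustrative):

```python
import math

def principal_axes(points):
    """First principal direction and the two eigenvalues of the 2x2 covariance
    matrix of the (x, y) samples (closed-form PCA for the 2-D case)."""
    n = len(points)
    mx = sum(x for x, y in points) / n
    my = sum(y for x, y in points) / n
    sxx = sum((x - mx) ** 2 for x, y in points) / n
    syy = sum((y - my) ** 2 for x, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # eigenvalues of [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    l1, l2 = tr / 2 + disc, tr / 2 - disc
    # eigenvector for the larger eigenvalue
    if abs(sxy) > 1e-12:
        v1 = (l1 - syy, sxy)
    else:
        v1 = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(*v1)
    return (v1[0] / norm, v1[1] / norm), (l1, l2)
```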

c.1 Length of the gesture


c.2 Area

The area around the gesture is calculated using Graham’s convex hull algorithm [18].


c.3 Compactness

Let l_p be the length of the perimeter of the convex hull; then compactness is defined as


c.4 Eccentricity

The lengths along the two co-ordinate axes (a along the x-axis, and b along the y-axis) are given as


Eccentricity is a measure of the ratio between the co-ordinate axes.


where a ≥ b.

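Assuming the common definition e = √(a² − b²)/a with a ≥ b (0 for a circle, approaching 1 for a line), eccentricity could be computed as:

```python
import math

def eccentricity(a, b):
    """Eccentricity from the two axis lengths; order of a and b is irrelevant."""
    a, b = max(a, b), min(a, b)
    return math.sqrt(a * a - b * b) / a
```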
c.5 Ratio between co-ordinate axes

The ratio of the co-ordinate axes, which is very much related to eccentricity, is denoted as follows


c.6 Closure


c.7 Circular variance


c.8 Curvature

Let the angle between consecutive samples be:


then the curvature is:


c.9 Average curvature


c.10 Standard deviation in curvature


c.11 Pen up/down ratio

Let S = {s_1, …, s_m} be the set of strokes composing a gesture of m strokes. The duration of stroke s_i with n_i samples is given by


where t_{i,j} is the j-th timestamp of stroke s_i. The duration of all strokes is then defined as


and the duration of the entire gesture is given by


The pen up/down ratio is the ratio between the time spent writing (pen down) and the time spent in the air (pen up)

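A sketch of this feature; we assume strokes are in temporal order and, following the sentence above, return pen-down time divided by pen-up time (the inverse convention is equally plausible):

```python
def pen_up_down_ratio(strokes):
    """Time on the surface (pen down) over time in the air between strokes
    (pen up). Each stroke is a temporally ordered list of (x, y, t) samples."""
    down = sum(s[-1][2] - s[0][2] for s in strokes)
    up = sum(b[0][2] - a[-1][2] for a, b in zip(strokes, strokes[1:]))
    return down / up if up > 0 else float('inf')
```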

c.12 Average direction


c.13 Perpendicularity


c.14 Average perpendicularity


c.15 Standard deviation in perpendicularity


c.16 Centroid offset

The principal axes are used to calculate the centroid offset:


c.17 Length of first principal axis

The length of the first principal axis is another feature


c.18 Sine orientation of principal axis

The orientation of the principal axis is given by


c.19 Cosine orientation of principal axis


c.20 Rectangularity

Based on the lengths of the major axes along the principal component vectors and the area of the convex hull, the rectangularity is defined as:


c.21 Maximum angular difference


c.22 Average pressure


c.23 Standard deviation of pressure


c.24 Duration


c.25 Average velocity


c.26 Standard deviation of velocity


c.27 Maximum velocity


c.28 Average acceleration


c.29 Standard deviation of acceleration


c.30 Maximum acceleration


c.31 Minimum acceleration

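The kinematic features c.25 through c.31 derive from point-wise speeds; a sketch under the assumptions that accelerations are simple finite differences of those speeds and that the standard deviation is the population variant:

```python
import math

def velocities(stroke):
    """Speed between consecutive samples of an (x, y, t) stroke."""
    return [math.hypot(x2 - x1, y2 - y1) / (t2 - t1)
            for (x1, y1, t1), (x2, y2, t2) in zip(stroke, stroke[1:])]

def accelerations(stroke):
    """Finite-difference acceleration from the point-wise speeds."""
    v = velocities(stroke)
    dts = [t2 - t1 for (_, _, t1), (_, _, t2) in zip(stroke, stroke[1:])]
    return [(v2 - v1) / dt for v1, v2, dt in zip(v, v[1:], dts[1:])]

def mean(xs):
    return sum(xs) / len(xs)

def std(xs):
    """Population standard deviation."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
```

Averages, standard deviations, maxima, and minima of `velocities` and `accelerations` then yield the seven features directly.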

c.32 Number of cups


c.33 Offset of the first cup


c.34 Offset of the last cup


c.35 Initial horizontal offset


c.36 Final horizontal offset


c.37 Initial vertical offset


c.38 Final vertical offset


c.39 Number of straight lines

Based on the definition of straight lines by Willems and Niels [42], we denote the set of straight lines inside a gesture as L and the number of straight lines as


c.40 Average length of straight lines

Let l_i be the length of straight line i; then the average length of straight lines is calculated as


c.41 Standard deviation of straight line length


c.42 Straight line ratio


c.43 Largest straight line ratio


c.44 Number of pen down events

Let S be the set of strokes composing a gesture of m strokes. The number of pen down events equals the number of strokes


c.45 Octants




and where



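One plausible reading of the octant feature, sketched here, is a histogram over the eight 45° sectors around the centroid into which the samples fall; the exact binning in [42] may differ:

```python
import math

def octant(point, centroid):
    """Index (0-7) of the 45-degree sector, relative to the centroid,
    that a sample falls into, counted counter-clockwise from the x-axis."""
    angle = math.atan2(point[1] - centroid[1], point[0] - centroid[0]) % (2 * math.pi)
    return int(angle // (math.pi / 4)) % 8

def octant_histogram(points, centroid):
    """Histogram of samples over the eight sectors."""
    hist = [0] * 8
    for p in points:
        hist[octant(p, centroid)] += 1
    return hist
```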
c.46 Number of connecting strokes

We define the set of connected components as . A connected component is a part of a gesture that consists of one or more strokes that touch each other, and that do not touch any other strokes [42].


c.47 Number of crossings



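Crossings can be counted by testing every pair of non-adjacent segments of the trajectory for proper intersection; a quadratic-time sketch (illustrative names):

```python
def segments_intersect(p1, p2, p3, p4):
    """Proper intersection test for segments p1p2 and p3p4 via orientation signs."""
    def d(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    d1, d2 = d(p3, p4, p1), d(p3, p4, p2)
    d3, d4 = d(p1, p2, p3), d(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0

def number_of_crossings(stroke):
    """Count pairs of non-adjacent segments of one stroke that cross."""
    segs = list(zip(stroke, stroke[1:]))
    return sum(1
               for i in range(len(segs))
               for j in range(i + 2, len(segs))  # skip adjacent segments
               if segments_intersect(*segs[i], *segs[j]))
```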

c.48 Cosine of initial angle


c.49 Sine of initial angle


c.50 Length of the bounding box diagonal

Given the lengths a and b along the two co-ordinate axes, the length of the bounding box diagonal is given as


c.51 Angle of the bounding box diagonal


c.52 Length between first and last point


c.53 Cosine of first to last point


c.54 Sine of first to last point


c.55 Absolute curvature


c.56 Squared curvature


c.57 Macro perpendicularity

Let the angle between sampled points be:


then the macro perpendicularity is:


c.58 Average macro perpendicularity


c.59 Standard deviation in macro perpendicularity


c.60 Ratio of principal axes

Based on the lengths of the major axes along the principal component vectors, the ratio of the principal axes becomes:


c.61 Average centroidal radius

The average distance of sample points from the centroid is a feature called average centroidal radius.


c.62 Standard deviation of the centroidal radius


c.63 Chain codes

Let the chain code be defined as


then the average angle of the chain code will be


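A sketch of the chain-code step, assuming the common 8-direction quantisation of each segment's direction (the number of directions is a parameter, not fixed by the text):

```python
import math

def chain_code(stroke, directions=8):
    """Quantise each segment's direction into one of `directions` codes,
    counted counter-clockwise from the positive x-axis."""
    codes = []
    for (x1, y1, *_), (x2, y2, *_) in zip(stroke, stroke[1:]):
        angle = math.atan2(y2 - y1, x2 - x1) % (2 * math.pi)
        codes.append(int(round(angle / (2 * math.pi / directions))) % directions)
    return codes
```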

c.64 Average stroke length

If s_i is a stroke with n_i sample points, then let l(s_i) be the length of that stroke: