An Online Framework for Cognitive Load Assessment in Assembly Tasks

The ongoing trend towards Industry 4.0 has revolutionised ordinary workplaces, profoundly changing the role played by humans in the production chain. Research on ergonomics in industrial settings mainly focuses on reducing the operator's physical fatigue and discomfort to improve throughput and avoid safety hazards. However, as the production complexity increases, the cognitive resources demand and mental workload could compromise the operator's performance and the efficiency of the shop floor workplace. State-of-the-art methods in cognitive science work offline and/or involve bulky equipment hardly deployable in industrial settings. This paper presents a novel method for online assessment of cognitive load in manufacturing, primarily assembly, by detecting patterns in human motion directly from the input images of a stereo camera. Head pose estimation and skeleton tracking are exploited to investigate the workers' attention and assess hyperactivity and unforeseen movements. Pilot experiments suggest that our factor assessment tool provides significant insights into workers' mental workload, even confirmed by correlations with physiological and performance measurements. According to data gathered in this study, a vision-based cognitive load assessment has the potential to be integrated into the development of mechatronic systems for improving cognitive ergonomics in manufacturing.




1 Introduction

Figure 1: The system overview. Left: Conceptual illustration of workstation layout including: RGB-D camera, assembly, instructions graphical user interface (GUI), and storage area. Right: Block diagram of the proposed online framework to assess cognitive load and provide visual feedback to the user.

Mental health problems at work affect hundreds of millions of people worldwide. About 17.6% of the global working population suffer from common mental disorders (CMD) [Steel2014], such as anxiety, bipolarity and acute stress. The annual prevalence attains 38.2% in the European Union, embracing attention-deficit hyperactivity disorder (ADHD), insomnia (7.0%), and major depression (6.9%) [Wittchen2011]. Many recent surveys [Kubicek2019] and systematic reviews [vanderMolene2020, Kayla2018] indicate the inadequate organisation and management of the work as a primary cause of such disorders and outline the relationship between excessive working pressures and demands and the incidence of depression, poor health functioning, anxiety, distress, fatigue, job dissatisfaction and burnout. Besides, work-related stress and psychological risks have direct financial implications for private companies and governments. In Europe, the cost related to mental illness symptoms is around 617 billion euros annually, including employers’ expenses (absenteeism, presenteeism, turnover and loss in productivity) and social welfare costs [Hassard2014]. On the other hand, the introduction of hybrid manufacturing systems, where workers and autonomous machines operate in close proximity, has contributed to changing the role of the human in the production chain, resulting in new occupational safety and health (OSH) challenges. The digitalisation of the actual workplace has led to work intensification, constant time pressure and adaptation to rapid and frequent changes in customer demand and requirements (i.e. goods to produce and services to offer). Many of these changes provide development opportunities, nevertheless, they may perilously increase cognitive demand, when inadequately handled, and result in adverse health and safety hazards. Consequently, the elevated mental workload may compromise the operator’s performance and the efficiency of the workplace. 
The study of human cognitive factors will supplement the well-established research on physical ergonomics [Kim2017, Lorenzini2019], helping to understand comprehensively how humans interact with the environment and to reduce the workload. In addition, various studies have shown that psychological factors at work may have a significant influence on the development of musculoskeletal disorders (MSDs) [Mehta2016]. For instance, mental workload, fatigue, and job stress can alter biomechanical control strategies for upper extremities (i.e. neck, shoulders, arms, and hands) and low back extension, as well as increase gait and sway variability [Grobe2017]. Ultimately, this may induce muscle pain in the worker and even occupational injuries. The global burden of work-related mental disorders is expected to increase year on year [HSEstress2020] and can no longer be overlooked. Although cognitive load theory has aroused much interest in the last decade [Paas2003], the study of cognitive load in manufacturing operations is a relatively new topic [Carvalho2020, Gualtieri2021]. The field of Cognitive Manufacturing [IBM2017] (i.e. the usage of data across systems, equipment and processes to optimise the manufacturing performance) has only very recently aimed to obtain information about human workload. To the best of our knowledge, available tools can be used almost exclusively by experts or merely provide offline insights about the cognitive process (e.g. subjective questionnaires [Valdehita2004]). A first attempt toward a more usable tool was made by Thorvald et al. [Thorvald2019], who developed an analytic method, denoted Cognitive Load Assessment for Manufacturing (CLAM), for assessing the cognitive burden that the worker is expected to bear within a particular assembly task and workstation layout.
As a matter of fact, manual assembly is an essential activity in the manufacturing sector, which exposes workers to situations with varying cognitive demands [Brolin2017]. When combining the latter with high time pressure, an increase in mental load frequently occurs [Yabuki2017]. The tool is intended to be used directly by workers involved in the manufacturing domain. Nevertheless, such evaluation is still made offline, asking the end-users to fill a form and rate a set of factors associated with different aspects of their daily activity. The scientific and industrial communities still need to be provided with a validated set of models and metrics for the cognitive workload. Particularly, gaps were identified in relation to the online assessment of the mental demand inflicted by manufacturing tasks. To respond to this challenge, the purpose of this paper is to develop a quantitative and online method to examine how industrial work affects people relative to their attention distribution, decision-making, mental overload, frustration, stress and errors. We propose an online framework to monitor the cognitive workload of human operators by detecting patterns in their motion directly from the input images of a stereo camera. Head pose estimation and skeleton tracking are exploited to investigate the workers’ attention and assess hyperactivity and unforeseen movements (see system overview in Figure 1). The developed tool computes a list of indicators associated with different aspects of an assembly task and workstation layout in manufacturing. Each factor impacts with a weight on two defined indexes: the mental effort and psychological stress level. According to the scores interval, we determine the level of cognitive load an individual is experiencing within the current setup. The study employs assembly experiments to validate our online framework against state-of-the-art offline methods in the field of cognitive science (i.e. 
physiological signals, secondary task-performance measures and subjective questionnaires). The paper is structured as follows. In Section 2, we characterise cognitive load and provide an overview of related work on the methods to measure it. Next, we present our framework for the online assessment of mental effort and stress level. Pilot experiments are then presented in Section 5, and the results are discussed and validated through statistical analysis. The final sections discuss the contributions and limitations of the framework.

2 Related works

The evidence that undue cognitive demand at work can prejudice the mental health of workers and their manufacturing performance has increased the interest in cognitive load theory (CLT). CLT investigates the interaction of cognitive structures, information and its implications [Sweller1998]. In particular, the term cognitive load refers to the amount of processing that performing a particular task imposes on the learner’s cognitive system [Paas2003]. Xie and Salvendy [Xie2000] present a detailed conceptual framework of human information processing and distinguish between instantaneous and overall load. Instantaneous load is defined as the dynamics of cognitive load, which constantly fluctuates over time in response to the stimuli that the present activity and environmental conditions impose on the subject. Overall load results from the whole working procedure and represents the instantaneous load experienced and accumulated in the human brain. A large and growing body of literature has investigated techniques to model human mental workload [Parasuraman2008] and quantify the cost of performing tasks [Xie2000, Haji2015]. Paas and Van Merriënboer [Paas2003] describe mental load, mental effort, performance, and level of stress as the measurable dimensions of cognitive load. Generally, cognitive load measurements belong to three main categories: physiological measures, subjective rating scales and performance-based measures. Physiological measurement of workload relies on evidence that increased mental demands lead to an increased physical response from the body [Sweller1998]. Various researchers have investigated the relationship between mental effort and heart rate variability (HRV) metrics in three frequency bands of interest: very low frequency (VLF, 0–0.04 Hz), low frequency (LF, 0.04–0.15 Hz), and high frequency (HF, 0.15–0.4 Hz) [Delliaux2019, Durantin2014].
According to recent studies, intense cognitive demand leads to a decrease in HF power and a growth in the LF, respectively related to a parasympathetic withdrawal and a predominant increase in sympathetic activity [Delliaux2019, Mizuno2011]. Besides, the galvanic skin response (GSR, also known as electrodermal activity, EDA) has been widely studied to quantify cognitive states [Setz2010]. GSR or EDA is the measure of the continuous changes in the skin’s electrical conductance caused by the variation of the sweating activity of the human body. The signal is typically described as a combination of two components, the tonic and phasic response. Researchers use high-resolution EDA for indexing variations in sympathetic arousal associated with emotion, cognition, and attention [Marucci2021, Rajavenkatanarayanan2020] and today represents one of the preferred metrics for stress [Kyriakou2019]. More recent studies also include measures of respiratory activity [Grassmann2016], eye activity [Coyne2016, DiStasi2016], cortisol level [Carrasco2003], speech measures [Yin2008], and brain activity [Rosanne2021]. Psychophysiological measurements provide objective and quantitative information, as well as the possibility to visualise a continuous trend and identify detailed patterns of load. However, these signals are highly sensitive to human movements, and the sensory acquisition system may be bothersome for the users and condition normal activities, severely limiting the adoption in real-world scenarios. Thus far, the measurement of cognitive load in laboratory settings mainly relies on subjective rating scales [Valdehita2004]. The most commonly used questionnaire is called NASA-Task Load Index (NASA-TLX) [Hart1988]. Self-ratings nevertheless have many limitations [Naismith2015]. Firstly, they are based on the assumption that people are able to introspect on the cognitive processes and report the amount of experienced cognitive effort. 
Secondly, they are often affected by many biases, such as acquiescence and social desirability. Lastly, the data are delivered after the completion of the activity and can be exploited only following extensive analysis by experts in the area of cognitive ergonomics and cognitive science. The third alternative for measuring cognitive load is through task- and performance-based techniques. Various metrics are presented in the literature (e.g. reaction time, accuracy and error rate) to assess the performance of both the primary and secondary tasks [Haji2015]. The secondary task is performed concurrently and is supposed to reflect the level of the cognitive load imposed by the primary task [Paas2003]. Despite its high sensitivity and reliability, this technique can rarely be applied, even in laboratory settings. All the studies reviewed here support the hypothesis that existing approaches for cognitive load assessment have their strengths and weaknesses and can be sensitive to distinctive aspects of workload. When measuring workload empirically, the rule of thumb is to select a variety of measurements that seem appropriate to the application and are likely to provide insights into cognitive processes [Miller2001]. Unfortunately, most of these techniques are difficult to apply in industrial scenarios. Indeed, they require rather expensive and impractical equipment that may be uncomfortable for the users. Despite the increasing enthusiasm to understand the multidimensional construct of mental workload, the cognitive manufacturing field is still looking for practical solutions [Carvalho2020]. Our work responds to the growing need to gather online data that give insights into the mental processing system and enables the identification of excessive cognitive load in assembly workers.

3 The Cognitive Load Assessment Framework

Figure 2: Overall structure of the online cognitive load assessment framework. The proposed approach detects patterns in human’s motion (blue block), investigates workers’ attention (orange block) and their interaction with assembly instructions on a monitor (yellow block). Combining all these factors, final scores of mental effort and psychological stress level are computed (green block).

The overall structure of the proposed cognitive assessment framework is represented in Figure 2. Our method investigates (i) the concentration level of a worker by considering gaze direction and head pose, (ii) the stress level, by analysing activity-related body language (i.e. self-touching occurrences and high activity periods) and (iii) the information and part identification cost, namely the cognitive effort required to utilise the assembly instructions and handle the right tools and components to complete the task. Additionally, we include a priori defined parameters reflecting features of the specific assembly task and workstation layout (e.g. the number of assembly parts and noise level). Combining all these factors, we compute the final scores of mental effort and stress level. This enables us to identify excessive cognitive load in the assembly workers. Besides, the framework includes a visual feedback interface, through which intuitive warning messages can be provided to the assemblers. To make the proposed framework easily deployable in both laboratory and industrial settings, the choice of the external sensory systems was driven by the implementation costs and users’ comfort (e.g. by avoiding wearability constraints). Hence, we selected a family of affordable active 3D imaging systems, namely RGB-D cameras, to detect human operators and quantify their workload. The depth information and RGB images of the camera are processed by the ‘human upper-body kinematics tracking’ module and the ‘attention tracking’ module to compute a set of cognitive load factors introduced in Section 4. These two modules operate in synergy with the ‘interaction with instructions’ module and converge into the ‘cognitive load assessment’ module to compute the final scores of mental effort and stress level (see Figure 2). Before describing the modules in detail, we provide the definition of the workstation layout. 
An operating environment can be defined by the involved workstations and their relative configuration. Throughout this paper, the term ‘operating environment’ (also known as ‘working area’ or ‘workplace’) refers to a place available to manufacturing personnel to carry out work, whereas a ‘workstation’ is a specific location, e.g. an assembly table, where employees perform specific tasks. We consider at least three types of workstations in industrial assembly tasks: the assembly workstation, which is the area occupied by the assembly components; the instructions workstation, which provides the assembly information and the steps to follow through e.g. a monitor; and the storage area, where the assembly components (e.g. screws, nuts and tools) are stored. Based on the number of workstations, the system associates reference frames (see Figure 1) with the positions specified during a configuration phase. The positions of those reference frames with respect to the operator’s head are used to determine the level of attention toward every workstation (see Section 3.2).

3.1 Human Upper-Body Kinematics Tracking

The central role of this module is to detect the presence of a human operator entering the working area and to provide information to the system about the variations of his/her kinematic body configuration over time. We exploit a visual skeleton tracking algorithm, developed by StereoLabs, to track the human skeleton from the input images of a stereo camera. The module is, however, scalable to any other visual tracking method, e.g. OpenPose [Cao2009], or even IMU-based motion capture systems, such as the Xsens suit. The algorithm extracts the 3D position of twenty-five human keypoints (e.g. neck, shoulders, elbows, wrists, hips, knees, ankles) in real time. Among them, we select the ones belonging to the upper body, and we analyse their displacements to compute factors describing the operator's stress level (see Section 4.2). Spatio-temporal information of human movements is also used to distinguish between possible tasks performed by the operator. To do this, the distance on the horizontal plane between the "neck" skeleton keypoint and the workstations is continuously computed: the worker is assumed to perform the task associated with the workstation he/she is closest to. For instance, we assume that the assembler is searching for a tool if he/she accesses the storage area. On the other hand, the mental effort factors (defined in Section 4.1) are computed only if the subject is within a predefined range with respect to the assembly or instructions workstation.
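The nearest-workstation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the workstation names, their coordinates, the `max_range` value and both function names are hypothetical and do not come from the paper.

```python
import math

# Hypothetical workstation positions on the horizontal plane (metres).
# Names and coordinates are illustrative assumptions, not the paper's values.
WORKSTATIONS = {
    "assembly": (0.0, 0.8),
    "instructions": (0.6, 0.9),
    "storage": (-1.2, 0.5),
}


def closest_workstation(neck_xy, workstations=WORKSTATIONS):
    """Return (name, distance) of the workstation nearest to the neck keypoint."""
    return min(
        ((name, math.dist(neck_xy, pos)) for name, pos in workstations.items()),
        key=lambda item: item[1],
    )


def effort_factors_active(neck_xy, max_range=1.0, workstations=WORKSTATIONS):
    """True if the neck is within max_range of the assembly or instructions workstation."""
    return any(
        math.dist(neck_xy, workstations[name]) <= max_range
        for name in ("assembly", "instructions")
    )
```

With these illustrative coordinates, a neck keypoint at `(-1.0, 0.5)` would be assigned to the storage area, so the operator would be assumed to be searching for a tool.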

3.2 Human Attention Tracking Module

Nowadays, several sensory systems can provide accurate measurements of human engagement and attention, such as eye-tracking screen-based devices (e.g. Gazepoint GP3 [Coyne2016]) and glasses (e.g. Tobii Glasses 2 [DiStasi2016]), or electroencephalography headsets (e.g. Neuroelectrics Enobio [Rosanne2021]). However, these systems have significant disadvantages such as discomfort (in wearable systems) and limited operational range (in screen-based eye-tracking devices). For these reasons, we developed a vision-based module, which is briefly outlined in Algorithm 1. We exploit a head tracker, which adopts OpenCV to detect the human face and a TensorFlow pre-trained deep learning model to identify facial landmarks. To estimate the head pose, a Perspective-n-Point (PnP) problem between the OpenFace 3D model of the face and the output of the detector (i.e. sixty-eight keypoints in pixel coordinates) is solved using the OpenCV function solvePnP. The PnP problem is solved with an iterative method based on Levenberg-Marquardt optimisation [Levenberg1944], and the solution is the pose that minimises the reprojection error, namely the sum of squared distances between the observed projections on the image plane and the projected 3D points of the model. A Kalman filter is used to stabilise the pose computed frame by frame.

Algorithm 1 Human Attention Tracking (pseudocode not recoverable from the source; the surviving fragments show a main loop that iterates over each workstation and performs nested checks once a human is detected). Note that the superscript of a variable denotes the reference frame in which the variable is expressed.

The output of the procedure is the location and orientation of the head with respect to the camera frame. According to the estimated odometry, a frame is associated with the head, and the corresponding transformation expresses the head pose variation over time. Subsequently, we look up the transformation between the head frame and each workstation defined in the configuration phase, and the Cartesian vector expressing their relative position is mapped to spherical coordinates (i.e. azimuth angle, elevation angle and radial distance). To estimate the level of attention toward each workstation, we model a fuzzy logic membership function on the computed angles. In particular, the azimuth and elevation values are separately transformed using a Raised-Cosine Filter [Glover2004], where a sigmoid normalises the values to a scale from zero to one. To obtain the desired behaviour in different ranges, we define the function as follows:


where the argument is one of the two measured angles (i.e. azimuth or elevation) at each time instant, allowing the continuous localisation of each workstation with respect to the subject’s gaze direction. The fuzzy function includes two control points defined a priori, which determine the independent upper and lower limits of the area where the function has a smooth behaviour. Thus, the indicator decreases exponentially as the absolute angle value grows above the minimum threshold, before levelling off at the maximum threshold. The attention level toward each workstation is therefore computed as the product of the normalised azimuth and elevation indicators:


Given the estimated attention to all workstations, we can assess whether the worker is currently distracted or concentrated on a particular workstation. This is determined by simply checking whether at least one of the attention parameters is above a predefined threshold. If so, the workstation that the worker is looking at is the one whose associated parameter is maximum.
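A minimal Python sketch of the attention computation described above, assuming a standard raised-cosine roll-off between the two control points. The exact functional form, the axis convention (x forward, y left, z up in the head frame), the default threshold and all function names are assumptions made for illustration, not the authors' definitions.

```python
import math


def to_spherical(x, y, z):
    """Map a Cartesian vector to (azimuth, elevation, radial distance).

    Assumed convention: x forward, y left, z up in the head frame.
    """
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.asin(z / r) if r > 0 else 0.0
    return azimuth, elevation, r


def raised_cosine(angle, a_min, a_max):
    """Membership in [0, 1]: 1 up to a_min, 0 from a_max, cosine roll-off between."""
    a = abs(angle)
    if a <= a_min:
        return 1.0
    if a >= a_max:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * (a - a_min) / (a_max - a_min)))


def attention(azimuth, elevation, limits_az, limits_el):
    """Attention toward a workstation: product of the two normalised indicators."""
    return raised_cosine(azimuth, *limits_az) * raised_cosine(elevation, *limits_el)


def attended_workstation(attention_by_ws, threshold=0.5):
    """Return the workstation with maximum attention, or None if distracted."""
    name = max(attention_by_ws, key=attention_by_ws.get)
    return name if attention_by_ws[name] > threshold else None
```

In this sketch the worker counts as distracted when no attention value exceeds the threshold, which mirrors the check described in the text.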

3.3 Interaction with Instructions Module

In this work, we assume that assembly instructions are shown on a monitor through a graphical user interface (GUI), which lets the operator browse them (see Figure 1). Keyboard inputs allow the operator to view the next instruction, check the same instruction again (i.e. instruction check back) or go back in the instructions. Accordingly, the ‘interaction with instructions’ module is in charge of monitoring the task advancement. Based on the registered keyboard commands, it provides the system with the number of steps of the assembly sequence that the user has already followed, the number of instruction check backs and the occurrence of an error in the assembly sequence, which obliges the user to go back by more than one instruction.
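The bookkeeping performed by this module can be illustrated with a small Python sketch. The command names (`"next"`, `"check"`, `"back"`), the class name and the rule that two consecutive backward steps count as one error are assumptions chosen to match the description; the original implementation is not shown in the paper.

```python
from dataclasses import dataclass


@dataclass
class InstructionTracker:
    """Tracks task advancement from keyboard commands (hypothetical names)."""

    current: int = 0            # index of the instruction on screen
    steps_done: int = 0         # furthest instruction reached so far
    check_backs: int = 0        # times the same instruction was re-checked
    errors: int = 0             # runs of more than one backward step
    _consecutive_back: int = 0  # length of the current run of "back" commands

    def on_key(self, command: str) -> None:
        if command == "next":
            self.current += 1
            self.steps_done = max(self.steps_done, self.current)
            self._consecutive_back = 0
        elif command == "check":
            self.check_backs += 1
            self._consecutive_back = 0
        elif command == "back":
            if self.current == 0:
                return  # already at the first instruction
            self.current -= 1
            self._consecutive_back += 1
            if self._consecutive_back == 2:
                # Going back by more than one instruction is treated as one error.
                self.errors += 1
        else:
            raise ValueError(f"unknown command: {command!r}")
```

A run of three or more backward steps is still counted as a single error here; whether the original system counts it differently is not stated.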

3.4 Cognitive Load Assessment Module

The last module exploits workload indicators in manufacturing as identified by several experienced researchers and industrial experts [Thorvald2019]. Particularly, we define a list of cognitive load factors and compute them starting from the output of the modules described above. Note that the unit of analysis is on the workplace level, including both the operator and the workstations layout. Each factor is then multiplied by its assigned weight (see Section 5.D), and a definite sum of the weighted metrics determines the final scores of mental effort and stress level. A detailed description of the proposed cognitive load factors and scores can be found in the next section.
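As a rough illustration of the weighted combination described above, the following Python sketch sums factor values multiplied by their weights. The factor names and numeric values are placeholders, not the weights used in the study.

```python
def weighted_score(factors, weights):
    """Sum of factor values multiplied by their assigned weights."""
    missing = set(factors) - set(weights)
    if missing:
        raise KeyError(f"no weight assigned to: {sorted(missing)}")
    return sum(weights[name] * value for name, value in factors.items())


# Placeholder factor values and weights, for illustration only.
mental_effort = weighted_score(
    {"concentration_loss": 0.4, "learning_delay": 0.2},
    {"concentration_loss": 0.5, "learning_delay": 1.0},
)
```

The same helper could be applied separately to the mental effort factors and to the stress level factors, yielding the two final scores.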

4 Definition of cognitive load factors and final scores

We define and develop a set of cognitive load factors that are computed for each system pipeline loop and contribute specifically to one of the aforementioned indexes (i.e. mental effort and stress level). Some of the factors include both an instantaneous and overall parameter, based on the cognitive load definitions provided at the beginning of Section 2, and their specific usage will be explained afterwards. In addition, we present ‘workstation factors’, which may affect the total cognitive load in assembly tasks. Note that the proposed indexes analyse the assembler behaviour within a predefined workstation layout. Moreover, each factor is not expected to directly reflect human cognitive processing. Our position is that a combination of those factors could provide insights into the human cognitive system.

4.1 Mental effort factors

4.1.1 Concentration Loss:

This factor analyses the attention that an individual gives to a task. It is based upon the claim in contemporary psychology that cognitive load usurps executive resources that could otherwise be used for attentional control, and thus increases distraction [Lavie2004]. Accordingly, we assess here the amount of time not dedicated to the assembly, instructions or any other defined workstation, and hence quantify how long an individual is not concentrated on his/her assembly task. The Concentration Loss factor is thus defined as


Here, the number of workstations is the one set in the configuration phase. The focus time of a workstation is the interval in which the subject is focused on that workstation, namely the interval in which its attention parameter is above the predefined threshold. For instance, the focus time of the instructions workstation represents the time spent looking at the instructions on the monitor. Finally, the ‘time elapsed’ refers to the time passed since the start of the task, expressed in seconds, and is defined in the same way for all the factors.
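One reading consistent with this description is that Concentration Loss is the fraction of the elapsed time not spent focused on any workstation. The Python sketch below encodes that reading; the formula, the function name and the clamping at zero are assumptions rather than the authors' stated definition.

```python
def concentration_loss(focus_time_by_ws, time_elapsed):
    """Fraction of elapsed time not spent focused on any defined workstation.

    focus_time_by_ws maps each workstation to its focus time in seconds.
    Assumed form: 1 - (sum of focus times) / (time elapsed).
    """
    if time_elapsed <= 0:
        raise ValueError("time_elapsed must be positive")
    focused = sum(focus_time_by_ws.values())
    return max(0.0, 1.0 - focused / time_elapsed)
```

For example, 60 s on the assembly and 30 s on the instructions out of 120 s elapsed would give a value of 0.25 under this assumption.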

4.1.2 Learning Delay:

This metric investigates the ability to rapidly learn a novel rule from instructions and assesses the operator’s automaticity in completing the assembly. We took inspiration from Rapid Instructed Task Learning theory [Liefooghe2012, Cole2012], which analyses efficient action execution immediately following instructions and without prior practice. These studies highlight that instructions can even produce automatic effects in relatively simple tasks. The assumption here is that the more time the subject spends focusing on the assembly components, the slower the learning. Hence, we can infer that the less trivial the task, the higher the cognitive load. The Learning Delay factor is thus defined as


where ‘assembly time’ is the interval in which the subject focuses on the assembly workstation.

4.1.3 Concentration Demand:

The estimated incidence of attention failures is usually associated with cognitive overload [Head2014]. This factor is defined as the number of times the subject gets distracted, losing their attention toward all workstations involved in the task. In particular, the instantaneous parameter evaluates the transitions to inattention per instruction, excluding those made to shift the focus to another workstation, and is thus defined as


where the count refers to the current instruction and considers all the defined workstations. The overall parameter keeps memory of the load that the operator experiences during the task. Whenever the event (i.e. loss of attention from any workstation) is detected, we record the instant at which it occurs. Then, the ratio between the sum of these time instants and the time elapsed is considered:


where the sum runs over all occurrences of attention loss recorded while working on the task. Note that each occurrence has the same impact on the indicator when it happens and that, as time passes, the contribution of a past event decreases.
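The overall parameter described above, the sum of the recorded event instants divided by the time elapsed, can be sketched in Python as follows. The function name is hypothetical, and ignoring instants later than the current time is an added safeguard rather than something stated in the text.

```python
def overall_parameter(event_times, time_elapsed):
    """Sum of the recorded event instants divided by the time elapsed (seconds)."""
    if time_elapsed <= 0:
        raise ValueError("time_elapsed must be positive")
    # Instants later than the current time are ignored (added safeguard).
    return sum(t for t in event_times if t <= time_elapsed) / time_elapsed
```

Under this form, an event contributes exactly 1 at the moment it occurs and its contribution shrinks afterwards: an event recorded at 10 s contributes 1.0 at 10 s and 0.25 at 40 s, which matches the remark that past events weigh less as time passes.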

4.1.4 Instructions Cost:

This metric examines the general quality of the instructions used to gather information about the work. The analysis relies on human-computer interaction guidelines and studies on the required cognitive effort to utilise them [Chandler1991]. We counted the attention switches between the assembly workstation and the monitor, excluding the required checks for a new instruction. The instantaneous parameter defines the cost of information per instruction as


where the count refers to the current instruction. On the other hand, the overall parameter considers the instants in which each event (i.e. a non-required switch) occurred:


4.1.5 Task Difficulty:

This factor estimates the cognitive effort required to perform a task. To do that, the framework automatically records the instruction check backs on the GUI. Since task demand can vary as a function of the cognitive load [Klingner2011], the instantaneous parameter is complemented here, too, with an overall parameter. The two parameters are thus defined as


where the count is the total number of instruction check backs performed during the task.

4.1.6 Frustration by Failure:

This is a simple metric describing the mechanism triggered after making a mistake. The instantaneous and overall parameters are computed as for the previous factors:


where the count is the total number of mistakes made during the task. Here, an error in the assembly sequence is detected whenever the user goes back by more than one instruction.
Recall that, thanks to skeleton tracking, we detect at which workstation the operator is. Hence, the factors described above are computed only if the human is in proximity to the assembly or instructions workstation, and they remain constant if the operator moves away.

4.1.7 Tool Identification:

This factor assesses the mental processing required to identify the tool needed for the assembly. Whenever the storage area is accessed (i.e. the human is in proximity to the storage workstation), the Tool Identification factor is computed as the time spent seeking the right tool, in tenths of a second.

4.2 Stress level factors

The analysis of body language is gaining an increasing interest in the emerging field of automatic detection of stress [Carneiro2012]. Accordingly, we defined activity-related features solely based on visual information and the derived skeleton tracking.

4.2.1 Self-touching:

It has been proven that self-touching is a behavioural indicator of stress and anxiety [Harrigan1985]. We compute the distance between each hand and the head key points of the detected skeleton. If the value is below a predefined threshold, a self-touching occurrence is registered and impacts the final score for a minute:


where the count is the number of self-touching occurrences and the time is expressed in seconds.
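The detection step can be sketched in Python as below. The distance threshold, the one-minute memory implemented as a sliding window, the rising-edge rule that avoids counting one long touch several times, and the class name are all assumptions; how the occurrences enter the final score is not reproduced here.

```python
import math


class SelfTouchDetector:
    """Registers self-touching occurrences from hand and head keypoints."""

    def __init__(self, distance_threshold=0.15, memory_s=60.0):
        self.distance_threshold = distance_threshold  # metres (assumed value)
        self.memory_s = memory_s                      # an occurrence counts for one minute
        self.onsets = []                              # instants at which a touch started
        self._touching = False

    def update(self, t, head, left_hand, right_hand):
        """Process one frame at time t (seconds); return the occurrences still active."""
        nearest = min(math.dist(head, left_hand), math.dist(head, right_hand))
        near = nearest < self.distance_threshold
        if near and not self._touching:
            self.onsets.append(t)  # register only the start of a touch
        self._touching = near
        return self.active_occurrences(t)

    def active_occurrences(self, t):
        """Number of occurrences registered within the last memory_s seconds."""
        return sum(1 for onset in self.onsets if 0.0 <= t - onset <= self.memory_s)
```

Each call to `update` takes the 3D positions of the head and both hands for one frame and returns how many occurrences are still within their one-minute window.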

4.2.2 Hyperactivity:

An analysis of human motion is performed to detect stress-related periods of high activity. Our method is based solely on visual spatiotemporal information on human kinematics extracted from video sequences of the monitored subject. During an initial calibration phase, we capture the movement of the upper-body joints between two subsequent frames and store, for each selected upper-body joint, the mean and standard deviation of the sum of the displacements in a time window. This baseline recording lets us compare the online data with a session under resting conditions. Then, during task execution, we periodically compute the deviation of every joint from its mean motion. If a joint’s deviation is greater than its stored standard deviation, a parameter associated with that joint, called ‘activity’, is evaluated as the ratio between the deviation and the standard deviation. A unique descriptor of activity level is computed as the mean of all the upper-body joints’ activity.
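The calibration and online steps described above can be sketched as follows. The data layout (one list of summed displacements per joint, one value per time window), the function names, and the choices to assign zero activity to joints that stay within one standard deviation and to consider only motion above the resting mean are all assumptions made for illustration.

```python
from statistics import mean, stdev


def calibrate(resting_sums):
    """Store (mean, standard deviation) of the summed displacement per joint.

    resting_sums maps a joint name to a list with one summed displacement
    per time window, recorded under resting conditions.
    """
    return {joint: (mean(v), stdev(v)) for joint, v in resting_sums.items()}


def activity_level(current_sums, baseline):
    """Mean 'activity' over joints for the current time window."""
    activities = []
    for joint, (mu, sigma) in baseline.items():
        deviation = current_sums[joint] - mu
        if sigma > 0 and deviation > sigma:
            # Ratio between the deviation and the stored standard deviation.
            activities.append(deviation / sigma)
        else:
            # Assumption: joints within the resting range contribute zero.
            activities.append(0.0)
    return mean(activities)
```

For instance, with a resting baseline of mean 2.0 and standard deviation 1.0 for a joint, a summed displacement of 5.0 in the current window yields an activity of 3.0 for that joint.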

4.3 Workstation factors

The worker can navigate the list of products and combine different objects in sequences to handle more complex assemblies. In the catalogue (.csv file), the number of components and required tools for each object are specified. The sequence of objects to assemble is loaded and the following parameters are evaluated for the selected task.

4.3.1 Number of assembly components:

A parameter, normalised between 0 and 1, rising linearly with the number of parts intended to be assembled into a complex product.

4.3.2 Number of tools used:

A normalised parameter describing the number of tools used to complete the assembly.

4.3.3 Physical effort:

The required physical effort to perform a task. The estimated difficulty factor takes values between 0 (simple - no previous experience is required) and 1 (difficult - significant training and experience are required).

4.3.4 Variant flora:

An estimation of the level of variation on a workstation (from no variation, i.e. one-piece production, to full variation, i.e. flexible and customised production).
In addition, several environmental factors such as lighting conditions, temperature and level of noise may influence the operators’ conditions. While the first two can be considered rather constant in an industrial workplace, the level of noise may greatly vary depending on the working scenario. There is increasing evidence that chronic noise stress impairs cognition and induces oxidative stress in the brain [Subramaniam2019]. With this in mind, the Level of noise factor has been defined.

4.3.5 Level of noise:

The sound pressure level in the manufacturing environment. A sound sensor could measure the ambient sound in the frequency spectrum audible to the human ear. Given the mean level of noise in A-weighted decibels (dBA), the parameter is defined as follows:


where the thresholds (20 and 70 dBA) are defined in compliance with recommended standards for occupational noise exposure [Subramaniam2019].
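One plausible form of this parameter is sketched below. Only the two thresholds (20 and 70 dBA) come from the text; the linear interpolation between them, the clamping to the range from 0 to 1, and the function name are assumptions, since the defining equation is not reproduced here.

```python
def noise_factor(mean_dba, low=20.0, high=70.0):
    """Map a mean noise level in dBA to a factor between 0 and 1.

    Assumption: 0 at or below `low`, 1 at or above `high`, and linear
    in between.
    """
    if mean_dba <= low:
        return 0.0
    if mean_dba >= high:
        return 1.0
    return (mean_dba - low) / (high - low)
```

Under this assumed form, a mean level of 45 dBA maps to 0.5, and any level of 70 dBA or above saturates the factor at 1.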

4.4 Cognitive load scores estimation

The cognitive load factors described in the previous sections are computed online, in the background of the workers’ normal activities, in order to identify excessive cognitive load on the fly and deliver warning messages to the assembly worker. To this end, we multiply each factor by a weight (see Section 5.4): the sums of the weighted mental effort and stress level factors yield the two homonymous ‘higher-level’ scores. The mental effort is computed at two different levels. Its dynamics are estimated online from the instantaneous parameters and provided as feedback through a dedicated screen. A detailed description of the visual feedback interface is presented in Section 6.5. For the post hoc analysis, we instead select the overall factors since, at this stage, we do not aim to evaluate the cognitive load triggered by a single stimulus but the overall mental effort induced by the whole task, and to cross-compare its trend among diverse testing conditions. The stress level score, on the other hand, is the sum of the hyperactivity factor and the self-touching contributions: each occurrence of self-touching adds a predefined value to the score, and this contribution decreases as time passes.
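The weighted combination can be illustrated with a short sketch. The function name, the dictionary-based interface and the optional division by the total weight (which would keep the score between 0 and 1 when all factors are normalised) are assumptions; the text itself only specifies a sum of weighted factors.

```python
def weighted_score(factors, weights, normalise=False):
    """Combine normalised factor values into a single score.

    factors: {factor name: value between 0 and 1}
    weights: {factor name: weight}
    normalise: if True, divide by the sum of the weights involved
               (an assumption, not stated in the text).
    """
    total = sum(weights[name] * value for name, value in factors.items())
    if normalise:
        total /= sum(weights[name] for name in factors)
    return total
```

For example, two factors with values 0.5 and 1.0 and weights 4.0 and 2.2 give a plain weighted sum of 4.2, or about 0.68 when divided by the total weight.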

5 Experimental analysis

In this section, the experimental campaign to validate our framework is described in detail. We adopted both quantitative and qualitative measures to assess the performance and potential of the proposed approach.

Figure 3: Overview of the experimental setup highlighting: zed2 stereo camera, instructions GUI (monitor and keyboard), storage area and screen providing visual feedback. The 3D assembly is placed on the table in front of the subject.

5.1 Experimental setup

For the experiments, we reproduced a possible operating environment in our laboratory (see Figure 3). The participants were asked to sit at a desk, and a 3D-printed assembly kit was placed on the table (defining the assembly workstation). The instructions to assemble the object were shown on a monitor (the instructions workstation) and consisted of short videos. The user could browse them through a GUI: keyboard inputs allowed the user to watch the next instruction, replay the same instruction video (i.e. instruction check back) or go back in the instructions. Finally, small boxes with screws, bolts, nuts and the required tools were placed in the area right behind the participant, defining the storage workstation. A stereo camera (zed2, Stereolabs, San Francisco, CA, USA) monitored the participant from the front for the entire duration of the experiment. Note that the framework does not require a video to be recorded (i.e. the computations were performed online); however, a video was acquired as a backup to measure the detection accuracy of the subjects’ motion patterns.

The experiments aimed to cross-test the performance of our cognitive load assessment framework against physiological measurements. In particular, the trend of the mental effort was analysed in relation to heart rate variability, while the stress level was compared with commonly used features of the galvanic skin response. The following section justifies the choice of these specific parameters as ground truth and describes the sensors adopted (also highlighted in Figure 4) and the post-processing of the acquired signals.

5.2 Baseline measurements

Figure 4: Measurements and sensors used as ground truth to test our metrics.

5.2.1 HRV responses:

A chest strap (H10, Polar Electro Oy, Kempele, Finland) was used to record the electrocardiogram (ECG) signal. The RR intervals, i.e. the times elapsed between two successive R-waves, were extracted from the raw ECG. Cardiovascular data analysis was subsequently performed using the Kubios software. The tool computes several classical metrics in the time, frequency and non-linear domains. In this work, the frequency-domain HRV data were considered. More precisely, the LF/HF ratio was selected since it is indicative of mental effort, as suggested by the literature [Mizuno2011, Durantin2014].

5.2.2 Galvanic skin responses:

The skin conductance was monitored with an Empatica E4 wristband, a medical-grade wearable device acquiring real-time physiological data. The recorded GSR signal was then processed using the open-source MATLAB toolbox Ledalab. A Butterworth low-pass filter with a cut-off frequency of 2 Hz was used to remove the high-frequency components. Finally, we applied continuous decomposition analysis to separate the tonic (Skin Conductance Level, SCL) and phasic (Skin Conductance Response, SCR) components. Following Marucci et al. [Marucci2021], we investigated the mean value of the SCL and the mean amplitude of the SCR peaks to assess the stress induced by the whole task on participants.

5.2.3 Secondary task performance:

Concurrently with the primary assembly task, participants were asked some questions (three per experimental condition) through headphones. In the task-based methodology, performance on a secondary task is assumed to reflect the level of cognitive load imposed by the primary task [Paas2003]. We measured the user’s reaction time to queries whose answers are well known to them (e.g. the spelling of their name, their date of birth, etc.).

5.2.4 Subjective questionnaire:

At the end of the experiment, we asked participants to fill in the NASA-TLX [Hart1988] and a custom questionnaire. The latter is a subjective scaling approach to capture mental effort- and stress-related factors in different task conditions. The evaluation includes a technique developed by NASA to assess the relative importance of factors in determining the experienced workload. Pairs of rating scale labels are presented, and the subject is asked to select which of the two was more relevant to the experience of cognitive workload in the task just performed. From the pattern of choices, we are able to associate a weight with each cognitive load factor and compute the overall score consistent with the experience of a specific subject. A copy of the custom questionnaire is provided as supplementary material for this paper.
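The pairwise-comparison weighting can be sketched as follows, in the style of the NASA-TLX procedure, where the weight of a factor is the number of times it is chosen across all pairs. The function names, the data layout and the weighted-average form of the overall score are assumptions about how the custom questionnaire is scored, not a description of the authors’ code.

```python
from collections import Counter
from itertools import combinations


def pairwise_weights(factors, choices):
    """Tally how often each factor is selected across all pairs.

    choices maps frozenset({a, b}) to the factor judged more relevant.
    Factors that are never selected keep a weight of zero.
    """
    tally = Counter({factor: 0 for factor in factors})
    for pair in combinations(factors, 2):
        tally[choices[frozenset(pair)]] += 1
    return dict(tally)


def weighted_workload(ratings, weights):
    """Weighted average of the ratings (assumed scoring rule)."""
    total_weight = sum(weights.values())
    return sum(ratings[f] * weights[f] for f in weights) / total_weight
```

With three hypothetical factors A, B and C, where A is preferred over both B and C and C is preferred over B, the weights are 2, 0 and 1; ratings of 60, 20 and 30 then give an overall score of 50.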

5.3 Experimental protocol

The whole experimental procedure was carried out at the Human-Robot Interfaces and Physical Interaction (HRII) Lab, Istituto Italiano di Tecnologia (IIT), in accordance with the Declaration of Helsinki, and the protocol was approved by the ethics committee Azienda Sanitaria Locale (ASL) Genovese N.3 (Protocol IIT_HRII_ERGOLEAN 156/2020). All the subjects recruited were volunteers, naïve about the purpose of the experiment, who declared that they did not suffer from any mental disorder or cardiovascular disease. The cognitive load experienced by a worker is highly dependent on the individual’s skills. Thus, only participants without previous expertise and experience in the presented assembly task were considered. The study employed a within-subjects experimental design in which each participant underwent all three experimental conditions. The tasks were devised with three levels of complexity (i.e. task 1 - simple, task 2 - medium, and task 3 - difficult) and industrial noise (i.e. task 1 - low, 45 dBA, task 2 - medium, 65 dBA, and task 3 - high, 75 dBA). The task order was fixed as 1-2-3 for all the subjects, with the aim of imposing a growing complexity and thus identifying an increase in cognitive effort. The participants had fifteen minutes to complete each task. Before the beginning of the experiment, an initial calibration was performed to capture the physiological parameters and track the movements of the upper-body joints under resting conditions, and then set them as a reference. Moreover, the user had the chance to become familiar with the assembly parts and the instruction interface. The rest of the section describes two different experimental sessions that represent consecutive phases in the development of our framework.

5.4 Model calibration experiments

The purpose of the first experimental session was to test the setup and identify the weights that should be associated with each cognitive load factor for the computation of the cognitive load scores. To do this, five male subjects performed the whole experiment and filled in the custom questionnaire. By analysing the factors’ trends over time, we defined thresholds that thereafter permit the normalisation of the values assumed by the cognitive load factors. Given the patterns of choices in the questionnaire, we computed the weights that each subject would associate with each factor. The mean across all subjects of each factor weight was used in the second experimental session to create the weighted combinations resulting in the mental effort and stress level scores.

Figure 5: Concentration Loss, Learning Delay, Instruction Cost and Task Difficulty factors for one subject during the three experimental conditions.

5.5 Multi-subject cognitive load assessment

Ten subjects, five males and five females, were recruited for the second session. During the test, the cognitive load factors were computed online, and the final scores were shown on a monitor visible only to the researcher (see Figure 3). At the same time, physiological measurements were recorded. A statistical analysis was subsequently performed on the acquired data. We adopted the non-parametric repeated-measures Friedman’s test to examine whether the subjects experienced the different conditions imposed by the experiment (low, medium and high cognitive load). Finally, Spearman’s rho correlation coefficient was used to assess whether any relationship exists between the scores computed in the proposed framework and our ground truth measurements (i.e. physiological signals, performance measure and questionnaires).

6 Experimental results

In this section, the results of the two experimental sessions are presented. We begin by outlining the outcomes of the model calibration experiments, highlighting the functioning of the final framework. This is followed by an in-depth analysis of the cognitive load-related data acquired in the multi-subject experiments. Finally, we report the outcome of the online visual feedback interface.

6.1 Model calibration

| Factor | Threshold (instantaneous) | Threshold (overall) | Weight |
|---|---|---|---|
| Concentration Loss | - | - | 1.6 |
| Learning Delay | - | - | 3.2 |
| Concentration Demand | 12 | 26.0 | 1.6 |
| Instruction Cost | 13 | 26.1 | 4.0 |
| Task Difficulty | 6 | 10.7 | 2.2 |
| Frustration by Failure | 2 | 4.7 | 3.0 |
| Tool Identification | - | - | 1.4 |

Table 1: Thresholds and weights associated with mental effort factors

Table 1 shows the results of the model calibration experiments. Concentration Loss and Learning Delay take on values between 0 and 1 by definition. The Tool Identification factor saturates to 1 after ten seconds as a practical choice. The other factors, by contrast, have to be normalised. To this end, we defined an upper-limit threshold for each of these factors as the maximum value registered across all subjects who took part in the first testing session. In addition, the patterns of label choices in the custom questionnaire show the relative importance of the proposed factors in determining how much mental effort the operator experiences in the task. The last column of Table 1 reports the mean of the weights given by participants to each factor. Interestingly, the cognitive demand of understanding instructions (i.e. Instruction Cost) is perceived as the most crucial contributor to workload.
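The threshold-based normalisation can be sketched as below. The function name and the clamping of values above the threshold to 1 are assumptions; clamping seems a natural choice because values observed online may exceed the maxima registered during calibration.

```python
def normalise(value, threshold):
    """Scale a raw factor value by its upper-limit threshold.

    Assumption: the result is clamped to the range from 0 to 1.
    """
    return min(max(value / threshold, 0.0), 1.0)


# Instantaneous Instruction Cost threshold from Table 1.
instruction_cost = normalise(6.5, 13)

# Tool Identification is measured in tenths of a second and saturates
# after ten seconds, i.e. at 100 tenths of a second.
tool_identification = normalise(150, 100)
```

Here `instruction_cost` evaluates to 0.5 and `tool_identification` saturates at 1.0.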

6.2 Cognitive load factor assessment

6.2.1 Mental effort:

Figure 5 displays the mental effort factors over time for one subject, as an example. We report the Concentration Loss, Learning Delay, Instruction Cost and Task Difficulty factors since they show a meaningful trend throughout the task execution. The impact of the Concentration Demand, Frustration by Failure and Tool Identification factors, on the other hand, is limited to the moments when a specific event occurs. The results of tasks 1, 2 and 3 are reported on the same chart to highlight differences in the trends. Note that the participant completed the first and second tasks before the total available time (i.e. fifteen minutes) had elapsed. The factors are normalised online, when needed, according to the thresholds defined in the first experimental session. The results obtained from the weighted combination of the factors are presented in Figure 7 (first row). In particular, the black line shows the trend of the mental effort score over time in the different experimental conditions. Coloured bars instead highlight the mean score in three-minute intervals.

6.2.2 Stress level:

The estimated stress of a participant during the experiments is illustrated in Figure 8 (first row). Specifically, each occurrence of self-touching adds 0.2 to the final score and, as time passes, this contribution decreases (reaching zero after one minute). The grey and black profiles represent the hyperactivity and self-touching factors, respectively. Summing them gives the stress level score, whose mean within three-minute blocks is reported as coloured bars.
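A minimal sketch of this score is given below. The impact of 0.2 and the one-minute duration come from the text; the linear shape of the decay and the function names are assumptions, since the exact decay law is not given here.

```python
def self_touch_contribution(now_s, touch_times_s, impact=0.2, decay_s=60.0):
    """Sum the decayed contributions of past self-touching occurrences.

    Assumption: each contribution decays linearly from `impact` to zero
    over `decay_s` seconds.
    """
    total = 0.0
    for t in touch_times_s:
        elapsed = now_s - t
        if 0.0 <= elapsed < decay_s:
            total += impact * (1.0 - elapsed / decay_s)
    return total


def stress_level(now_s, hyperactivity, touch_times_s):
    """Stress level as hyperactivity plus self-touching contributions."""
    return hyperactivity + self_touch_contribution(now_s, touch_times_s)
```

Under the linear assumption, an occurrence registered 30 s earlier still contributes 0.1, while one registered 60 s earlier no longer contributes.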

Figure 6: Results of the subjective evaluations (NASA TLX and custom questionnaire): bars represent mean and standard error between scores given by participants in different experimental conditions.

6.3 Significance of experimental conditions

6.3.1 Cognitive load scores:

The means of the mental effort and stress level scores in three-minute intervals were compared with Friedman’s test to assess differences among tasks across repeated measures. The imposed conditions (i.e. increasing complexity and noise) affect the mental effort score significantly, χ²=6.58, p=0.0373, and the stress level score marginally, χ²=5.96, p=0.0507.
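For reference, the Friedman statistic can be computed as in the sketch below. It is a plain-Python illustration without tie correction, not the analysis pipeline used in the study; the arrangement of the data (one row per block, one column per experimental condition) and the function names are assumptions.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    return [sum(u < v for u in values) + (sum(u == v for u in values) + 1) / 2
            for v in values]


def friedman_chi2(blocks):
    """Friedman statistic for n blocks (rows) and k conditions (columns)."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))


def p_value_df2(chi2):
    """Chi-square tail probability for 2 degrees of freedom (k = 3)."""
    return math.exp(-chi2 / 2.0)
```

With three conditions the statistic has two degrees of freedom, for which the chi-square tail probability has the closed form exp(-x/2). Evaluating `p_value_df2(6.58)` gives about 0.0373, which is consistent with the p-value reported above for the mental effort score.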

6.3.2 HRV responses:

Cardiovascular data analysis was performed on three-minute blocks of RR intervals from the ECG signal. For all the subjects, we extracted HRV features in the frequency domain, and differences in their trends were assessed using Friedman’s test with repeated measures (three-minute blocks). The different experimental conditions significantly impact the LF (Hz) parameter (χ²=6.68, p=0.0354), which exhibits a predominant decrease over the tasks. HF (Hz) instead showed a marginal difference among the tasks (χ²=5.88, p=0.0529). Median LF/HF ratio levels for the low, medium and high imposed cognitive load experiments were 3.10, 3.64 and 3.77, respectively. The statistical test revealed a significant difference in LF/HF ratio depending on the imposed complexity and noise, χ²=9.24, p=0.0098.

6.3.3 Galvanic skin responses:

Following Marucci et al. [Marucci2021], we investigated the mean value of the SCL and the mean amplitude of the SCR peaks during the experimental conditions. The analysis was performed in three-minute intervals with Friedman’s test. Both tonic and phasic components revealed a significant main effect of the load condition (p<0.01).

6.3.4 Secondary task performance:

Friedman’s test revealed a statistically significant difference in the reaction time of the secondary task (χ²=10.23, p<0.01). In general, the participants tended to delay the answer as the task complexity increased.

6.3.5 Subjective questionnaires:


Figure 6 presents the results of the subjective questionnaires in the three experimental conditions. The bars represent the mean of the scores assigned by participants and the error bars display the 95% confidence interval of the scale means. The Kruskal-Wallis test was conducted to compare the three conditions in a systematic manner. The results from the NASA-TLX show statistically significant differences in mental demand (χ²=10.61, p=0.005) and effort (χ²=14.22, p=0.0008) among the tasks. For the other scores, the p-values were above the 0.05 significance level. From the custom questionnaire, we identify a significant effect of the imposed experimental conditions on perceived concentration demand (χ²=7.11, p=0.0286), learning delay (i.e. automaticity in completing the assembly, χ²=6.48, p=0.0392) and task difficulty (χ²=14.33, p=0.0008). Finally, the experimental conditions significantly affect the overall cognitive workload score (χ²=7.24, p=0.0267) computed by the ratings combination defined in the first testing session.

Figure 7: Comparison between the mental effort score computed by our online framework and the LF/HF ratio extracted from three-minute blocks of the electrocardiography signal for one subject.

Figure 8: Comparison between the stress level score and the skin conductance level (SCL) and response (SCR) extracted from three-minute blocks of the galvanic skin response for one subject.

6.4 Correlation between cognitive load scores and physiological variables

| Subject | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Mental effort/HRV | 0.80** | 0.74** | 0.62* | 0.67** | 0.60* | 0.87** | 0.49* | 0.20 | 0.06 | 0.21 |
| Mental effort/Secondary task | 0.42 | -0.46 | 0.53 | 0.82 | 0.47 | 0.47 | 0.03 | 0.60 | 0.26 | 0.54 |
| Stress level/SCL | 0.43 | 0.58* | 0.63* | 0.72** | 0.07 | 0.01 | 0.47 | 0.31 | -0.13 | 0.01 |
| Stress level/SCR | 0.64* | 0.21 | 0.63* | 0.68* | -0.17 | 0.57* | 0.35 | 0.40 | 0.05 | -0.25 |

Significance levels are indicated as *p<0.05, **p<0.005.

Table 2: Spearman’s correlation coefficients between estimated indicators of Cognitive Load (mental effort and stress level) and state-of-the-art measurements (physiological signals and performance) considering all participants.

A Spearman’s rank-order correlation was run to determine whether a relationship exists between the scores computed in our framework and standard measures presented in the literature for the assessment of cognitive load. Figure 7 compares the trend of the mental effort score with the ratio of low-frequency power to high-frequency power (LF/HF ratio) in three-minute intervals extracted from ECG signals (second row). For seven out of ten participants, there was a strong, positive correlation between the mean within three-minute blocks of the computed score and the HRV feature, which was statistically significant (see Table 2, first row). For each subject, we also computed the correlation between the mental effort score and the reaction time in the secondary task (three questions per experimental condition every three minutes). The test revealed a positive correlation, but no significance was found (see Table 2, second row). The trend of the stress level score is instead analysed in comparison with GSR-related measures. Figure 8 compares the trend of the stress level score with the SCL and SCR features extracted from three-minute intervals of GSR signals (second row). The bottom half of Table 2 provides the summary statistics. The skin conductance variables were partially correlated with the estimated score. In particular, positive correlations were detected, but they were statistically significant for only a few subjects.
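Spearman’s rho is the Pearson correlation between the ranks of two series, which can be sketched in plain Python as follows. The function names are illustrative, and the inputs would be, for example, the per-block means of a computed score and of the corresponding physiological feature for one subject; the sketch does not compute significance levels.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    return [sum(u < v for u in values) + (sum(u == v for u in values) + 1) / 2
            for v in values]


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one series is constant")
    return cov / (sx * sy)
```

Because only ranks are used, the coefficient captures any monotonic relationship between the two series, which suits scores and physiological features measured on different scales.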

6.5 Online visual feedback

Figure 9: Online visual feedback reporting: current head direction, percentage of attention toward involved workstations, and estimated mental effort and stress level scores as colour-coded bars.

Cognitive load scores are computed online since our final goal is to identify excessive cognitive load and deliver warning messages to the human operator. An interface was designed and implemented to provide visual feedback on the workload that the worker is currently experiencing during the assembly task (see Figure 9). The interface shows the video of the monitored subject as it is acquired in real time. A pyramidal shape is drawn on the image to highlight the facing direction, and the percentage of attention toward the assembly and instructions workstations is specified. Within this context, the workstation factors and instantaneous parameters are considered, and their weighted combinations result in the instantaneous scores of mental effort and stress level. These scores are represented as colour-coded bars. Colour is used here to warn when excessive cognitive load is identified (green - low cognitive load, yellow - medium cognitive load, orange - high cognitive load, and red - very high cognitive load). The reader can better understand the functioning of the implemented visual feedback by watching the video provided as supplementary material for this paper.

7 Discussion

From the performance comparison between our vision-based framework and state-of-the-art methods, we observed differences in the trends among the various experimental conditions (i.e. increasing task complexity and noise). Therefore, statistical analysis was performed on the acquired data. Separate non-parametric tests revealed statistically significant differences for both HRV and GSR features, as well as for the outcomes of the secondary task and questionnaires, across the testing conditions. Hence, we can infer that the subjects actually experienced the imposed cognitive load conditions. A promising finding was that statistically significant differences were also identified in the cognitive load scores computed by our method (i.e. mental effort and stress level). This encouraged us to compare our online scores with state-of-the-art offline methods that are hardly deployable in industrial settings. As observed in Table 2, the mean mental effort in three-minute intervals appeared to be positively correlated with the LF/HF ratio extracted from the ECG signal within the same time intervals and with secondary task performance. Moreover, positive correlations were detected between the stress level and GSR-related features. The results provide initial evidence of the method’s capability to deliver meaningful and reliable insights into the human cognitive load at work. Practical strong points of the setup include the reduced cost of the system components and the users’ comfort while performing their tasks. Our cognitive load assessment framework does not require the worker to wear any sensor and can be easily configured and used by non-experts in the areas of cognitive ergonomics and human-computer interaction. Besides, it is worth mentioning that the system is capable of identifying periods of excessive workload online in assembly tasks and providing visual feedback using colour-coded bars.

8 Conclusions

This paper presented an online and quantitative method to monitor the cognitive load of human operators by analysing the attention distribution and detecting motion patterns in assembly activities directly from the input images of a stereo camera. The main focus was on identifying risks in tasks and workstation design, where excessive workload might lead to errors or work difficulty. We exploited cognitive load factors in manufacturing as identified by experts [Thorvald2019] and evaluated them online through cutting-edge artificial intelligence algorithms (i.e. head pose estimation and skeleton tracking). We estimated the mental effort and stress level currently experienced by the worker, investigating attention- and activity-related behavioural features, and we delivered intuitive warning messages on a screen. The proposed method shows promising features for application in the manufacturing sector. Indeed, the framework works online, does not require expensive equipment and does not ask the human worker to wear any sensor, thus permitting the natural flow of work activities. The main limitation is the assumption of a well-structured working environment, where assembly instructions are shown on a monitor. A natural progression of this research is to generalise the framework, including more workstations and examining complicated industrial operations with multiple human workers. Future work could also investigate the benefits of a subject-specific model of cognitive load processes to address the individual demands and characteristics of the workers. Besides, both the weights and the thresholds defined for the developed factors could be tuned depending on each user’s feedback or on a previous user-specific calibration phase. The final results suggest that the presented method has the potential to be integrated into the development of human-robot interaction systems for improving human cognitive ergonomics in industrial settings.


This work was supported in part by the ERC-StG Ergo-Lean (Grant Agreement No.850932), in part by the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 871237 (SOPHIA).