Understanding Developers Well-Being and Productivity: A Longitudinal Analysis of the COVID-19 Pandemic

by Daniel Russo, et al.

COVID-19 has likely been the most disruptive event at a global scale the world has experienced since WWII. Our discipline has never experienced such a phenomenon, whereby software engineers were forced to abruptly work from home. Nearly every developer started new working habits and organizational routines, while trying to stay mentally healthy and productive during the lockdowns. We are now starting to realize that some of these new habits and routines may stick with us in the future. Therefore, it is important to understand how we have worked from home so far. We investigated whether 15 psychological, social, and situational variables, such as quality of social contacts or loneliness, predict software engineers' well-being and productivity across a four-wave longitudinal study of over 14 months. Additionally, we tested whether any of these variables changed over time. We found that developers' well-being and quality of social contacts improved between April 2020 and July 2021, while their emotional loneliness went down. Other variables, such as productivity and boredom, did not change. We further found that developers' stress measured in May 2020 negatively predicted their well-being 14 months later, even after controlling for many other variables. Finally, comparisons of women and men, as well as of developers residing in the UK and the USA, revealed no statistically significant differences but substantial similarities.




I Introduction

The COVID-19 pandemic and the subsequent lockdowns have likely been among the most disruptive events that most software engineers have faced during their lifetime. Suddenly, professionals started to work from home, potentially alongside family members. This situation is unprecedented in computer science history; thus, we have virtually no information about the impact of lockdowns on the well-being and productivity of software professionals.

The only related evidence comes from studies of people quarantined during previous epidemic outbreaks, which suggest that isolation and lockdown measures are a huge burden on individuals’ well-being [brooks2020] and productivity [lipsitch2020defining]. Indeed, well-being and productivity are two crucial aspects of our lives, particularly during extraordinary events: well-being is a fundamental human right according to the Universal Declaration of Human Rights, whereas productivity provides us with the earnings to maintain or, ideally, improve our lifestyle. Health professionals have already identified some relevant predictors of well-being during harmful events [brooks2020, farmer1986boredom]. However, this research is often cross-sectional (i.e., not longitudinal), includes only a limited number of predictors, and focuses on well-being while ignoring productivity. The software engineering community also reacted quickly to this event by performing a large study which found that home office ergonomics, disaster preparedness, and fear are correlated with well-being and productivity [Ralph2020pandemic]. Nevertheless, this study was also cross-sectional and included only a few predictors. Pre-pandemic research on remote work [donnelly2015disrupted] might provide some indications. However, it is unlikely that such research is still relevant during a global pandemic, with professionals locked down in their houses without the childcare or usual welfare support provided during non-pandemic times.

For these reasons, we believe it is essential to investigate the well-being and productivity of software professionals continuously and longitudinally across the entire COVID-19 pandemic (as of Summer 2021). By doing so, we aimed to achieve several goals. First, to identify relevant predictors of both well-being and productivity of software engineers working from home in a stressful context such as a lockdown. Second, to test for causal relations between the identified variables and whether well-being predicts productivity or vice versa (scholars found that they are interrelated but could not establish a causal association [krekel2019employee, carolan2017improving, russo2020predictors]). Third, to test whether well-being, productivity, and other relevant variables such as loneliness, social contacts, and need fulfillment changed over the course of 14 months since the beginning of the first lockdown in spring 2020. Fourth, to provide data-driven recommendations for possible future lockdowns. Fifth, to understand how to improve developers’ work-life balance while working from home in a post-pandemic setting and contribute to the nascent literature about the future of work. Hence, we formulate our research questions as follows:
Research Question 1: How have well-being, productivity, and other relevant social and psychological variables changed throughout the COVID-19 pandemic?
Research Question 2: Which variables predict Well-being and Productivity over time?

To answer our research questions, we surveyed 192 globally distributed software engineers four times over a period of 14 months. We assessed their well-being and productivity, alongside 15 other variables. To guide our research design, we grounded our investigation in organizational [herzberg2017motivation] and psychological [ryan2000self] theories, which are relevant for people’s well-being and productivity. For example, self-determination theory [ryan2000self] assumes that human motivation can be divided into three basic needs which are also linked with work motivation [gagne2005self]: the needs for autonomy, competence, and relatedness. Additionally, we also included evidence from the remote work literature [Lascau2019workers, anderson2015impact, bloom2015does], and recommendations by health and work authorities [nhs_2020, sst_2020, CIPD].

We analyzed our data using a range of different statistical approaches tailored to the specific questions. Specifically, to test whether well-being, productivity, and 15 variables including loneliness, needs, and social contacts have changed, we used 17 within-subject ANOVAs. To test whether well-being and productivity would be predicted over time by any of the 15 carefully selected variables (see section III-B), we used six cross-lagged panel models. To assess whether there are any mean differences between women and men and participants living in the UK and USA, we used a series of between-subject t-tests. Results suggest that developers’ well-being and their quality of social contacts increased throughout the pandemic (i.e., between April 2020 and July 2021), while their emotional loneliness decreased. Productivity remained unchanged. Further, only stress at time 2 predicted developers’ well-being at time 4. Finally, we found no mean differences between women and men or people living in the UK and USA for any of the 17 variables we measured across all four waves.

This article has the following structure. Section II discusses related work on well-being and productivity in the remote work literature, as well as recent advancements in the software engineering community. The research design and analysis are then described in Section III. In Section IV, we present the results of our analyses, followed by the implications and recommendations for professionals and software houses in Section V. Finally, we conclude our work by outlining future research directions in Section VI.

II Related Work

Following the abrupt onset of the COVID-19 pandemic and subsequent lockdowns, COVID-19 related research has expanded rapidly. Health scientists started investigating countermeasures to reduce the spread and impact of the virus and studied the psychological and physiological effects on people living in lockdown conditions. Also, in the software engineering community, the effect of the pandemic on software developers has gained increased attention. After describing the state of the art of the research on Well-Being and Productivity in Remote Work, we focus on the software engineering contributions.

II-A Well-Being and Productivity in Remote Work

There is a consensus that lockdown measures have a negative impact on well-being [brooks2020, lunn2020using]. In particular, research shows that living in a lockdown can result in increased experiences of anger, depression, emotional exhaustion, fear of infecting others or getting infected, insomnia, irritability, loneliness, low mood, post-traumatic stress disorders, and stress [sprang2013posttraumatic, hawryluck2004sars, lee2005experience, marjanovic2007relevance, reynolds2008understanding, bai2004survey]. Additionally, fears of e.g., infection [kim2015public, prati2011social], lack of supplies or not being treated [wilken2017knowledge], and misleading or contradictory information [caleo2018factors] can result in significantly increased stress levels. Moreover, the psychological effects of being locked down may appear years after [brooks2020].

On the other hand, pre-COVID research shows that remote working is associated with an improved work-life balance, creativity, productivity, reduced stress, and low carbon emissions due to the absence of commuting [owl_labs_2019, anderson2015impact, bloom2015does, vega2015within, baruch2000teleworking, cascio2000managing]. Nevertheless, there are also some apparent drawbacks related to remote work, such as deteriorating collaboration and communication, loneliness, feeling of being constantly ‘online,’ decreasing motivation, and distractions at home [buffer2020]. Besides such aspects, forecasts suggest that remote work will increase on a large scale in the next years [owl_labs_2019, gallup2020].

For this reason, research opportunities are extensive and will remain so in the years to come. There are plenty of open questions, such as which variables influence well-being and productivity, and to what extent, when considered in combination. Studying, e.g., stress in remote work without considering all the variables involved provides little overall guidance for software engineering teams because it is unclear whether stress is more strongly related to well-being than, for example, loneliness or anxiety. Therefore, the present paper studies these variables together rather than separately to identify the variable(s) most strongly associated with well-being and productivity.

II-B Software Engineering and COVID-19

Overall, the software engineering community has been quite active in researching pandemic-related aspects. We identified relevant work through Scopus and arXiv (considering that this research topic is highly contemporary, some papers are still under review).

The first works in this research area are from the late 90s with broader use of the internet. Pounder (1998) [pounder1998homeworking] was the first relevant contribution we identified, with an essay about security problems linked to telework. In the early 2000s, Guo (2001) [guo2001special] performed two qualitative surveys on software process improvement related to the distinctive nature of teleworking. Similarly, Higa et al. (2000) [higa2000understanding] studied how e-mail usage influences telework.

Afterward, there has been a twenty-year gap, with only two exceptions. James & Griffiths (2014) [james2014secure] developed a mobile execution environment to support a secure and portable working from home setting. Ford et al. (2019) [ford2019remote] interviewed three transgender software engineers to explore the interplay of gender identity and remote work.

Following the start of the pandemic and the first lockdown, two research groups performed survey studies. Ralph et al. (2020) [Ralph2020pandemic] performed a cross-sectional study of over two thousand globally distributed developers working from home during the pandemic where an a priori research model derived by literature was validated through Structural Equation Modeling. Russo et al. (2020) [russo2020predictors] went in the opposite direction. Rather than having a top-down model to validate, they employed an exploratory approach looking at the most relevant variables related to either well-being or productivity and analyzed the data through a longitudinal design.

Microsoft has also been active in understanding the effects of the pandemic on its employees. Ford et al. (2020) [ford2020tale] surveyed Microsoft’s developers twice. They found that the quality of family life and time improved, although remote work introduced a lack of focus, poor work-life boundaries, and communication and sync issues. Similarly, Miller et al. (2021) [miller2021your] performed two surveys in which they collected information about working from home and team-related issues. They found that communication and interaction with colleagues are relevant predictors of developers’ satisfaction and team productivity. Butler & Jaffe (2021) [butler2021challenges] conducted a 10-week diary study. Identified challenges from remote work were meetings, overwork, and physical and mental health. However, Microsoft developers appreciated more family time and work flexibility.

More recent studies focus on particular aspects of remote work. For example, Cucolaș & Russo (2021) [Cucolas2021Scrum], with a mixed-methods research design, investigated how Scrum software development adapted to working from home. According to their results, the home-working environment is the most crucial variable for a software project’s success. Also, self-determination theory [ryan2000self] (i.e., the needs for autonomy, competence, and relatedness) is a valuable theoretical lens to improve working from home conditions, as these needs are linked with well-being [cantarero2021affirming], for example. Finally, Machado et al. (2021) [machado2021gendered] surveyed 233 Brazilian software professionals and investigated gender differences. They concluded that the pandemic affected women more negatively than men. In contrast, Russo et al. did not find any meaningful gender differences [russo2020predictors].

From a content perspective, half of the papers are concerned with specific topics related to remote work, i.e., security [pounder1998homeworking, james2014secure], process [guo2001special], work productivity [higa2000understanding], and inclusion [ford2019remote], whereas the other half focused on well-being and productivity aspects of remote work [ford2020tale, Ralph2020pandemic, russo2020predictors, butler2021challenges, machado2021gendered, lamarche2020socially] and productivity related to project characteristics [bao2020does, Cucolas2021Scrum].

III Research Design

To design our research, we followed the ACM SIGSOFT Empirical Standards for Longitudinal Studies [ralph2020empirical]. Consequently, we asked carefully recruited software professionals to complete the same survey four times, over a period of 14 months. Wave 1 was collected between 26-30 April 2020, wave 2 between 10-13 May 2020, wave 3 between 24 February and 3 March 2021, and wave 4 between 29 June and 5 July 2021. Waves 1 and 2 were only two weeks apart since we were initially only interested in the stability of predictors of well-being and productivity. Wave 3 was collected in late winter 2021 when the number of COVID-19 cases in most Western countries decreased again, and wave 4 was collected when a substantial share of people in Western countries had received an offer to get vaccinated. Unique randomized IDs were assigned to participants to preserve their anonymity and track their participation across all four waves.

III-A Participants

The sample size was initially determined to be able to detect a small-to-medium effect size of f = .15 for a repeated-measurement (within-subject) ANOVA, using a power of .80 and a corrected α level of .004 (see section III-C1 for a justification of the lower α level). A power analysis using G*Power [faul2009statistical] revealed that we would need a sample size of at least 102 participants who participated in all four data collection waves. We selected participants from a pool of over 500 software engineers as previously identified [russo2020gender]. These informants were selected through a multi-stage screening process, in which we assessed representativeness through pre-screening (both in terms of computer programming experience and profession, but also task quality on the data collection platform), competence screening (competency-based questions on software design and programming), and quality screening (attention checks). Through additional screening questions, we subsequently narrowed this pool down to 192 professionals. In particular, we looked for informants who were working from home during the pandemic for at least 50% of their time and did not live in countries with inconsistent COVID regulations (e.g., Germany). In total, 192 software engineers completed the first survey (M = 36.65 years, SD = 10.77, range = 19–63; 154 men, 38 women), 184 participated in wave 2, 144 in wave 3, 124 in wave 4, and 107 participated in all four waves and completed all measures. Similarly, to ensure data consistency, we only included participants living in countries with comparable lockdown policies (e.g., excluding countries like Germany, which enforced different policies among its Länder, or Sweden, with a rather liberal approach to the pandemic). Demographic information is provided in Table I. We ensured high data quality by recruiting participants from the data collection platform Prolific Academic [palan2018prolific] and compensated participants above the USA’s minimum wage.
Additionally, none of our participants failed any attention checks or completed the survey in an implausibly short time, which further ensures the quality of our data. The survey was run using the platform Qualtrics.

To collect the data, we adhered to the ethical guidelines of the Declaration of Helsinki [general2014Helsinki]. All participants were at least 18 years old and expressed their consent to participate in the study each time. Also, they were free to withdraw at any point. The lead author also completed formal training in research ethics for engineering and behavioral sciences.

                               N     % of sample
Educational attainment
Less than high school degree   0     0%
High school graduate           4     3.2%
Some college but no degree     16    12.9%
Bachelor’s degree              63    50.8%
Master’s degree                35    28.2%
Doctoral degree                6     4.8%
Location
United Kingdom                 39    31.5%
United States                  30    24.2%
Portugal                       14    11.3%
Italy                          6     4.8%
Ireland                        6     4.8%
Other                          29    23.4%
TABLE I: Overview of the sample’s educational attainment and location at Wave 4.

III-B Measurements

Well-being and productivity are two complementary variables of a healthy working environment. Not surprisingly, they are correlated [russo2020predictors]. Especially in exceptional times, such as a pandemic, organizations should prioritize employees’ mental and physical well-being if they want to be productive. On the other hand, as suggested by Russo et al. [russo2020predictors], contributing to the organization’s value is important for the sense of belonging or achieving of every developer. Therefore, productivity does also contribute to professionals’ well-being [russo2020predictors].

Consequently, productivity and well-being are our two outcome variables (i.e., dependent variables). To identify relevant predictors (our independent variables) of our dependent variables, we started from the insights of Russo et al. [russo2020predictors]. Namely, we included in this analysis only the 15 (out of 50) predictors which correlated with at least one of the outcome variables (see [russo2020predictors]). This was done to keep the number of predictor variables manageable.

All variables were measured using self-reported measures, which is very common in the literature [Ralph2020pandemic, russo2020gender]. The internal consistency of the scales was quantified with Cronbach’s α and ranged from satisfactory to very good. Values above .60 and .70 are desirable for exploratory and confirmatory research, respectively [hair2013multivariate].
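As an illustration of the reliability metric used here, the following minimal Python sketch computes Cronbach’s α from item-level responses using the standard formula α = k/(k−1) · (1 − Σ item variances / variance of the total score). The function name `cronbach_alpha` and the toy responses are assumptions made for this example; the original analyses were run in R and may differ in detail.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of item columns; each column holds one response per
    participant, in the same participant order (hypothetical layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total scale score per participant (sum across items).
    totals = [sum(responses) for responses in zip(*items)]
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy data: two items answered by four participants (illustrative only).
toy_items = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(toy_items)
```

Under the thresholds cited above, a value of `alpha` above .60 would be considered acceptable for exploratory research and above .70 for confirmatory research.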

To measure the identified variables, we used only validated scales or adapted items from scales used in previous publications with high reliabilities. The only exceptions were ‘productivity’, ‘quality and quantity of communication with colleagues and line managers’, and ‘daily routines’, for which we created our own items because we could not find existing scales suitable for our purposes. Responses were mostly given on 5-, 6-, or 7-point response scales, with higher values indicating a higher score on each variable. Each scale is briefly described below with its name, reference, and reliability metrics (i.e., Cronbach’s alpha) across all four data collection waves. In this paper, we use the terms ‘wave’ and ‘time’ interchangeably. For a detailed description of the items, see Russo et al. [russo2020predictors].

Well-being. We measured well-being with the 5-item Satisfaction with Life Scale [diener1985satisfaction]. Participants were asked to report their well-being using items such as “I was satisfied with my life in the past week” on a 7-point Likert scale (1: Strongly disagree, 7: Strongly agree). The Cronbach’s α values measuring internal consistency for the four data collection waves were: , , , .

Productivity. There is no agreement among researchers on how productivity should be measured. For example, measuring productivity in an allegedly objective way by using function points [wagner2018systematic] has been criticized as detrimental in the long run [ko2019we]. Further, the objective approach is barely feasible if participants work in different areas, since comparisons across types of work are very challenging. Therefore, other researchers have advocated using self-reports [meyer2014software], which have apparent shortcomings such as subjectivity. In the present research, we used a self-report approach and reduced social desirability bias by making the survey anonymous. Specifically, we operationalized productivity as a function of time spent working and efficiency per hour, compared to a typical, pre-pandemic week. The reason for this choice is that we wanted to investigate productivity while working remotely as compared to being in the office. Since our measure does not allow us to compute internal consistency, we instead computed test-retest reliability by correlating the productivity scores at time 1 with those at time 2 ().
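To make the operationalization more concrete, the sketch below shows one way a relative productivity score could be computed from time spent working and efficiency per hour. It assumes a simple multiplicative combination of the ratio of hours worked to typical pre-pandemic hours and a relative efficiency factor; the function name, parameter names, and this exact formula are assumptions for illustration, and the item wording and scoring actually used are documented in Russo et al. [russo2020predictors].

```python
def relative_productivity(hours_worked, typical_hours, efficiency_ratio):
    """Hypothetical relative productivity score.

    hours_worked:     hours worked in the past week
    typical_hours:    hours worked in a typical pre-pandemic week
    efficiency_ratio: self-rated efficiency per hour relative to a
                      typical pre-pandemic week (1.0 = unchanged)

    A score of 1.0 means productivity is unchanged compared to the
    pre-pandemic baseline; higher values mean higher productivity.
    """
    if typical_hours <= 0:
        raise ValueError("typical_hours must be positive")
    return (hours_worked / typical_hours) * efficiency_ratio


# Illustrative values only: slightly longer hours, slightly lower efficiency.
score = relative_productivity(hours_worked=42, typical_hours=40, efficiency_ratio=0.9)
```

With such a score, a participant who works the same hours at the same efficiency as before the pandemic obtains exactly 1.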

Boredom was measured with the Boredom Proneness Scale [farmer1986boredom, struk2017short]; , , , .

Self-blame and behavioral disengagement, two coping strategies, were measured with the respective subdimensions of the Brief COPE scale [Carver1997BriefCOPE]. Cronbach’s α values for self-blame were , , , , and for behavioral disengagement , , , .

Distractions at home were measured with a 2-item scale we developed (, , , ).

Generalized anxiety was measured with an adapted version of the 7-item Generalized Anxiety Disorder scale [Spitzer2006GAD7]; , , , .

Emotional and social loneliness were measured with the De Jong Gierveld Loneliness Scale [gierveld2006]. Cronbach’s α values for emotional loneliness were: , , , , and for social loneliness: , , , .

Autonomy, competence, and relatedness were measured with the psychological needs scale [sheldon2012balanced]. Cronbach’s α values for need for autonomy were: , , , ; for competence: , , , ; and for relatedness: , , , .

Quality of social contacts was measured with three items, two of which were adapted from the social relationship quality scale [birditt2007relationship] and one of which was developed by us; , , , .

Quality and quantity of communication with colleagues and line managers were measured with a self-developed 3-item scale (, , , ).

Stress was measured with the Perceived Stress Scale [Cohen1988perceived]; , , , .

Daily routines were measured with a self-developed 5-item scale (, , , ).

Extraversion was measured with a subscale of the Brief HEXACO Inventory [DeVries2013HEXACO]; , , , .

III-C Analysis

In total, we used three different types of analyses, which seemed most appropriate to us, to answer our research questions and to perform additional exploratory analyses. Below, we briefly describe and justify each of them.

Raw data, R-code to reproduce our analyses, and the zero-order correlations for all 17 variables, separately per wave and across all data collection waves, are included in the supplemental materials.

III-C1 Changes along the COVID-19 Pandemic

To test whether any change between the four data collection waves occurred, we ran a series of 17 repeated-measures ANOVAs, one per variable. This allowed us to test if, for example, software engineers’ well-being increased, decreased, or remained the same. In addition to the common descriptive (means and standard deviations) and inferential statistics (F-value and p-value), we report as an effect size how many participants reported a higher, lower, or equal level of any variable at time 4 compared to time 1. The F-value is a test statistic that increases with larger mean differences, lower within-group variability, or larger sample size. For a fixed sample size, it is inversely related to the p-value, which is used to determine whether our findings are statistically significant. Given the 17 tests, whose variables are, however, mostly correlated with each other, we set our α-level to .004. That is, we only consider findings to be significant if p < .004. This threshold is, in our view, neither conservative nor liberal. However, we acknowledge that other researchers might prefer a more conservative or liberal threshold. We, therefore, report the exact p-values, which allows researchers to select different thresholds.
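The two quantities described above can be sketched in a few lines of Python. The first function computes the F-value of a one-way repeated-measures ANOVA from complete cases, and the second counts how many participants report a higher, lower, or equal score at the last wave compared to the first. This is a minimal sketch with hypothetical function names and toy data, without sphericity corrections or p-values; the original analyses were run in R and may differ in detail.

```python
from statistics import mean


def repeated_measures_f(scores):
    """One-way repeated-measures ANOVA F-value.

    scores: one list per participant with one value per wave
            (complete cases only, same number of waves for everyone).
    Returns (F, df_wave, df_error).
    """
    n = len(scores)          # participants
    k = len(scores[0])       # waves
    grand_mean = mean(v for row in scores for v in row)
    wave_means = [mean(row[j] for row in scores) for j in range(k)]
    subject_means = [mean(row) for row in scores]

    # Partition the total sum of squares into wave, subject, and error parts.
    ss_wave = n * sum((m - grand_mean) ** 2 for m in wave_means)
    ss_subject = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_total = sum((v - grand_mean) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_wave - ss_subject

    df_wave = k - 1
    df_error = (n - 1) * (k - 1)
    f_value = (ss_wave / df_wave) / (ss_error / df_error)
    return f_value, df_wave, df_error


def change_counts(scores):
    """Count participants scoring higher, lower, or equal at the last
    wave compared to the first wave."""
    greater = sum(row[-1] > row[0] for row in scores)
    smaller = sum(row[-1] < row[0] for row in scores)
    equal = len(scores) - greater - smaller
    return greater, smaller, equal


# Toy data: three participants, three waves (illustrative only).
toy_scores = [[1, 2, 3], [2, 3, 5], [3, 4, 4]]
f_value, df_wave, df_error = repeated_measures_f(toy_scores)
counts = change_counts(toy_scores)
```

Reporting the counts alongside the F-value shows how many individuals actually changed, which a mean difference alone does not reveal.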

III-C2 Exploring causality

To test whether any of the 15 predictor variables predict our two outcome variables at time 4, well-being and productivity, we ran six cross-lagged panel models in which we regressed well-being or productivity at time 4 onto all predictors and, crucially, both outcomes. It is essential to also include, for example, well-being at time 1 as a predictor of well-being at time 4, because otherwise we might erroneously conclude that, for instance, anxiety predicts later well-being when it only reflects those aspects of anxiety that are correlated with earlier well-being. We realize that there are different views about inferring causality between two variables, A and B. While some have a stringent view on causality, which requires being able to rule out that any third variable is responsible for the association between A and B (for an overview see [rohrer2018thinking]), others argue that it is sufficient to show that A is correlated with B and A is measured before B [granger1980testing]. A middle ground is to argue that A measured at time 1 needs to predict B at time 2 while controlling for B measured at time 1, to be able to state that A causally predicts B [rogosa1980critique]. We adopt this view and go a step further by also controlling for a range of other variables. This approach has two advantages over a series of models with only one predictor (e.g., stress) and one outcome (e.g., well-being), which are also common in the literature. First, using only one variable as predictor as opposed to 15 would have resulted in many more models, and we would therefore have needed to control for many comparisons. Second, by controlling for many related variables, our approach is conservative as it focuses on the unique impact of each predictor variable. For example, by simultaneously including anxiety, stress, and loneliness alongside other variables as predictors of well-being in the same model, we focus on the unique impact of each predictor on well-being.

Further, we only focused on well-being and productivity at time 4 because it is the most recent wave, and it is crucial to allow the outcomes to vary (most measured variables are stable over time [russo2020predictors]). If, for example, well-being at time 1 were very highly correlated with well-being at a subsequent time, there would be minimal variance for the other predictors to explain because well-being at time 1 would already explain most of the variance. Thus, we ran two cross-lagged panel models (one with well-being and one with productivity as the outcome) with variables measured at time 1, two with variables measured at time 2, and two with variables measured at time 3 as independent variables. Given the total of six comparisons, we set our α-threshold to .008. Note that we adjust the α-threshold based on the number of comparisons per type of analysis. For example, we ran six cross-lagged panel models but 17 repeated-measures ANOVAs. Thus, the α-thresholds had to differ.
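The core idea of controlling for the earlier outcome can be illustrated with a deliberately reduced regression: the later outcome is regressed on the earlier outcome (the autoregressive path) and one earlier predictor (the cross-lagged path). The sketch below uses plain ordinary least squares in pure Python; the function names and toy data are assumptions for illustration, and the models reported here include all 15 predictors and both outcomes rather than a single predictor.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictors, outcome):
    """Ordinary least squares via the normal equations.

    predictors: list of predictor columns; outcome: list of values.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(outcome))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def cross_lagged_effect(outcome_early, predictor_early, outcome_late):
    """Regress the later outcome on the earlier outcome and one earlier
    predictor; the predictor's coefficient is the cross-lagged path."""
    intercept, autoregressive, cross_lagged = ols(
        [outcome_early, predictor_early], outcome_late
    )
    return {
        "intercept": intercept,
        "autoregressive": autoregressive,
        "cross_lagged": cross_lagged,
    }


# Toy data (illustrative only): later well-being depends on earlier
# well-being (+0.5) and earlier stress (-0.3).
wellbeing_early = [4, 5, 3, 6, 2, 5]
stress_early = [3, 1, 4, 2, 5, 2]
wellbeing_late = [1 + 0.5 * w - 0.3 * s for w, s in zip(wellbeing_early, stress_early)]
paths = cross_lagged_effect(wellbeing_early, stress_early, wellbeing_late)
```

In this reduced form, a non-zero `cross_lagged` coefficient means that the predictor explains variance in the later outcome beyond what the earlier outcome already explains.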

III-C3 Between-group comparisons

Additionally, we compared women and men, and people living in the United Kingdom and the USA (the two countries from which most of our participants came), across all 17 variables and all 4 time points, resulting in 17 × 4 × 2 = 136 between-subject t-tests. We, therefore, adjusted our α-threshold to .0005. To address recent calls to report effect sizes that display similarities to avoid a one-sided focus on potentially small differences [hanel2019new], we also report the effect size Percentage of Common Responses (PCR) alongside the more common effect size Cohen’s d. PCR is a measure of overlap between two groups (e.g., women and men) and ranges from 0 (no overlap/similarities) to 100 (both groups overlap perfectly).
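Both effect sizes can be sketched briefly in Python. The example below computes Cohen’s d with a pooled standard deviation and converts it into an overlap percentage, assuming that PCR corresponds to the overlapping area of two normal distributions with equal variances, i.e., 2·Φ(−|d|/2)·100. This conversion, the function names, and the toy data are assumptions for illustration; see [hanel2019new] for the definition of PCR used in the analyses.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


def percentage_common_responses(d):
    """Overlap (0-100) of two equal-variance normal distributions whose
    means differ by d standard deviations (assumed definition of PCR)."""
    return 200 * NormalDist().cdf(-abs(d) / 2)


# Toy data (illustrative only): two small groups on a 7-point scale.
group_one = [4, 5, 6, 5, 4]
group_two = [5, 5, 6, 6, 4]
d = cohens_d(group_one, group_two)
pcr = percentage_common_responses(d)
```

Identical groups yield d = 0 and a PCR of 100, whereas larger absolute values of d yield progressively less overlap.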

IV Results

In the first step, we tested for construct validity by correlating all 17 variables with each other separately for each data collection wave. The zero-order Pearson correlations across all waves were as expected. For example, well-being correlated negatively with stress, loneliness, and boredom, and positively with need for autonomy, competence, and relatedness, which is in line with the literature [diener_beyond_2009, miller2011loneliness, russo2020predictors]. Details of those tests are in the Supplementary Materials.

IV-A Changes along COVID-19 Pandemic

The results of the 17 repeated-measures ANOVAs are displayed in Table II. Four of the ANOVAs were significant. Well-being (Fig 1), quality of social contacts (Fig 5), and self-blame (Fig 3) increased and emotional loneliness decreased (Fig 4). Behavioral disengagement was higher at time 2 than at time 1 but went down to the starting point at times 3 and 4. For well-being, for example, 77 developers reported higher levels at time 4 than at time 1, 38 lower levels, and 9 an equal amount of well-being (cf. Tab. II). In contrast, productivity remained stable over time (Fig 2).

Note that the repeated-measures ANOVAs necessarily only included the 107 participants who took part in all four waves, whereas the descriptive statistics reported in Table II and Figures 1 to 5 include all participants in each wave (e.g., 192 in wave 1) to improve the precision of the statistics we report.

| Variable | M1 | SD1 | M2 | SD2 | M3 | SD3 | M4 | SD4 | F-value | p-value | Greater | Smaller | Equal |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Well-being | 4.14 | 1.37 | 4.34 | 1.29 | 4.4 | 1.45 | 4.7 | 1.45 | 10.25 | | 77 | 38 | 9 |
| Productivity | 0.99 | 0.42 | 1.03 | 0.44 | 1.07 | 0.44 | 1.13 | 0.51 | 3.36 | 0.1329 | 74 | 43 | 4 |
| Boredom | 2.94 | 1.14 | 2.93 | 1.16 | 2.83 | 1.27 | 2.77 | 1.18 | 0.98 | 0.4036 | 52 | 64 | 8 |
| Behavioral disengagement | 1.8 | 0.94 | 2.06 | 1.03 | 1.88 | 1.11 | 1.84 | 1.07 | 5.49 | 0.001 | 36 | 28 | 60 |
| Distraction at home | 2.47 | 0.93 | 2.44 | 0.9 | 2.41 | 0.96 | 2.38 | 0.92 | 0.33 | 0.8013 | 39 | 47 | 38 |
| Self-blame | 1.81 | 0.99 | 1.88 | 1.01 | 2.28 | 1.29 | 2.25 | 1.26 | 18.68 | | 56 | 21 | 47 |
| Generalized anxiety | 2.25 | 1 | 2.17 | 1.01 | 2.2 | 1.07 | 2.38 | 0.92 | 1.1 | 0.3492 | 62 | 51 | 11 |
| Emotional loneliness | 2.11 | 0.9 | 2.01 | 0.87 | 2.1 | 0.91 | 1.88 | 0.9 | 6.49 | | 36 | 66 | 22 |
| Social loneliness | 2.64 | 1 | 2.56 | 1.02 | 2.79 | 1.08 | 2.73 | 1.04 | 4.41 | 0.0045 | 52 | 51 | 21 |
| Need for relatedness | 3.5 | 0.83 | 3.56 | 0.8 | 3.48 | 0.84 | 3.59 | 0.82 | 1.91 | 0.1265 | 65 | 46 | 13 |
| Need for competence | 3.57 | 0.74 | 3.58 | 0.73 | 3.62 | 0.76 | 3.67 | 0.74 | 0.59 | 0.6187 | 55 | 51 | 18 |
| Need for autonomy | 3.48 | 0.69 | 3.51 | 0.73 | 3.42 | 0.77 | 3.51 | 0.77 | 2.49 | 0.0599 | 57 | 49 | 18 |
| Quality of social contacts | 4.11 | 1.09 | 4.31 | 1.08 | 4.07 | 1.12 | 4.26 | 1.13 | 4.5 | 0.004 | 66 | 43 | 15 |
| Communication | 4.53 | 1 | 4.29 | 1.19 | 4.44 | 1.21 | 4.38 | 1.2 | 2.68 | 0.0465 | 43 | 49 | 27 |
| Stress | 2.5 | 0.81 | 2.52 | 0.8 | 2.52 | 0.88 | 2.44 | 0.85 | 1.43 | 0.232 | 44 | 56 | 24 |
| Daily routines | 4.68 | 1.56 | 4.72 | 1.53 | 4.83 | 1.58 | 4.82 | 1.58 | 0.26 | 0.8552 | 48 | 52 | 24 |
| Extraversion | 3.45 | 0.79 | 3.46 | 0.78 | 3.47 | 0.8 | 3.46 | 0.71 | 0.37 | 0.772 | 49 | 44 | 31 |
TABLE II: Within-subject ANOVAs for all 17 variables; variables significant at the p = 0.004 threshold are highlighted. M represents the mean value of each wave and SD its standard deviation.
Fig. 1: Well-being across time. The red line displays the trend over time, whereas the box at each time point shows the range in which the middle 50% of the data falls. Responses were given on a 7-point scale ranging from 1 to 7.
Fig. 2: Productivity across time. The red line displays the trend over time, whereas the box at each time point shows the range in which the middle 50% of the data falls. A productivity score of one indicates that productivity has not changed compared to pre-pandemic levels, scores above 1 that productivity increased, and scores below 1 that productivity decreased.
Fig. 3: Self-blame across time. The red line displays the trend over time, whereas the box at each time point shows the range in which the middle 50% of the data falls. Responses were given on a 5-point scale.
Fig. 4: Emotional loneliness across time. The red line displays the trend over time, whereas the box at each time point shows the range in which the middle 50% of the data falls. Responses were given on a 5-point scale.
Fig. 5: Quality of social contacts across time. The red line displays the trend over time, whereas the box at each time point shows the range in which the middle 50% of the data falls. Responses were given on a 6-point scale.

IV-B Exploring causality

We ran six cross-lagged panel models to test which variable causally explains well-being and which productivity at time 4.

In the first model, we used well-being as outcome and all 17 variables listed in Table II measured at time 1 as predictors. The overall model was significant. Unsurprisingly, well-being at time 1 predicted well-being at time 4, indicating high stability of developers’ well-being across time. However, none of the other variables was significant.

In the second model, we used productivity at time 4 as outcome and all 17 variables measured at time 1 as predictors. The overall model was not significant.

In the third model, we used well-being as outcome and all 17 variables measured at time 2 as predictors. The overall model was significant. Unsurprisingly, well-being at time 2 significantly predicted well-being at time 4. Interestingly, stress at time 2 negatively predicted well-being at time 4. None of the remaining 15 variables were significant.

In the fourth model, we used productivity at time 4 as outcome and all 17 variables listed in Table II measured at time 2 as predictors. The overall model was not significant.

In the fifth model, we used well-being as outcome and all 17 variables measured at time 3 as predictors. The overall model was significant. Unsurprisingly, well-being at time 3 predicted well-being at time 4. However, none of the other variables was significant.

In the sixth and final model, we used productivity at time 4 as outcome and all 17 variables measured at time 3 as predictors. The overall model was significant. Productivity at time 3 predicted productivity at time 4.
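One simple way to read these models is as lagged regressions: the outcome at time 4 is regressed on its own earlier value plus the other earlier predictors, so a predictor only counts if it explains variance beyond the outcome's stability. The sketch below illustrates that idea with ordinary least squares on made-up, noise-free data and two predictors. It is an assumption about the general approach, not the authors' code; a full cross-lagged panel model is usually fitted with dedicated structural equation modeling software, and all names here are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictors, outcome):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up earlier-wave scores: (well-being, stress) per participant.
earlier = [(4.0, 2.0), (5.0, 3.0), (3.0, 1.0), (6.0, 2.5), (4.5, 4.0), (2.0, 3.5)]
# Later well-being constructed without noise so the coefficients are recoverable.
later = [1.0 + 0.8 * wb - 0.3 * st for wb, st in earlier]

intercept, b_wellbeing, b_stress = ols(earlier, later)
print(intercept, b_wellbeing, b_stress)
```

In this toy setup the coefficient on earlier well-being plays the role of the stability effect, while the coefficient on earlier stress is the lagged effect of interest, estimated while holding earlier well-being constant.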

IV-C Between-group comparisons

Finally, we compared women and men (Table III) and people living in the United Kingdom and the USA (Table IV; these were the two countries from which most of our participants came) across all 17 variables and all 4 time points. At time 1, our sample consisted of 37 women and 154 men, and 63 people were living in the UK and 52 in the USA. At time 4, 27 women and 96 men remained, as well as 39 people living in the UK and 30 in the USA. None of the 68 between-gender comparisons reached statistical significance at our adjusted threshold. Instead, similarities between groups were large (PCR = 91.65).

Also, none of the 68 between-country comparisons reached statistical significance. Instead, similarities between groups were large (PCR = 94.82).

Wave 1 Wave 2
Men M Men SD Women M Women SD t-value p-value Cohen’s d PCR Men M Men SD Women M Women SD t-value p-value Cohen’s d PCR
Well-being 4.109 1.336 4.263 1.495 -0.581 0.5639 -0.113 95.494 4.388 1.258 4.151 1.407 0.933 0.3553 0.183 92.71
Productivity 1.008 0.416 0.917 0.43 1.18 0.243 0.218 91.32 1.029 0.433 1.043 0.452 -0.175 0.862 -0.033 98.684
Boredom 2.942 1.13 2.908 1.179 0.163 0.8713 0.03 98.803 2.923 1.122 2.943 1.309 -0.082 0.9353 -0.016 99.362
Behavioral disengagement 1.799 0.935 1.829 0.953 -0.176 0.8611 -0.032 98.723 1.997 0.957 2.324 1.259 -1.479 0.1458 -0.32 87.288
Self blame 1.753 0.957 2.053 1.095 -1.546 0.1283 -0.304 87.919 1.83 0.955 2.081 1.211 -1.173 0.2465 -0.248 90.132
Relatedness 3.483 0.801 3.557 0.948 -0.446 0.6577 -0.089 96.451 2.459 0.869 2.378 1.003 0.449 0.655 0.09 96.411
Competence 3.566 0.704 3.596 0.862 -0.202 0.8407 -0.041 98.364 2.098 0.963 2.475 1.144 -1.844 0.0712 -0.376 85.088
Autonomy 3.476 0.668 3.509 0.771 -0.239 0.8119 -0.047 98.125 1.975 0.837 2.135 0.998 -0.899 0.373 -0.184 92.67
Communication 4.511 1.004 4.623 0.972 -0.625 0.5343 -0.112 95.534 2.56 0.985 2.577 1.151 -0.08 0.9365 -0.016 99.362
Stress 2.468 0.744 2.638 1.025 -0.966 0.3391 -0.212 91.558 3.56 0.748 3.554 1.004 0.034 0.9728 0.008 99.681
Daily routines 4.758 1.469 4.368 1.877 1.191 0.2394 0.25 90.052 3.605 0.692 3.491 0.874 0.74 0.4628 0.156 93.783
Extraversion 3.401 0.786 3.638 0.766 -1.7 0.0944 -0.303 87.958 3.526 0.698 3.45 0.863 0.494 0.6236 0.103 95.893
Distractions 2.481 0.884 2.408 1.126 0.37 0.7127 0.078 96.889 4.281 1.055 4.432 1.165 -0.719 0.4754 -0.14 94.419
Generalized anxiety 2.123 0.916 2.738 1.175 -3.007 0.0042 -0.632 75.2 4.293 1.118 4.288 1.434 0.02 0.9839 0.004 99.84
Emotional loneliness 2.022 0.848 2.474 1.033 -2.498 0.0158 -0.51 79.872 2.485 0.748 2.662 0.965 -1.042 0.3025 -0.223 91.122
Social loneliness 2.667 0.969 2.535 1.146 0.653 0.5169 0.131 94.778 4.902 1.439 3.982 1.689 3.049 0.0037 0.617 75.77
Social contacts 4.032 1.068 4.421 1.149 -1.893 0.0637 -0.358 85.794 3.405 0.775 3.662 0.769 -1.817 0.0745 -0.333 86.776
Wave 3 Wave 4
Men M Men SD Women M Women SD t-value p-value Cohen’s d PCR Men M Men SD Women M Women SD t-value p-value Cohen’s d PCR
Well-being 4.456 1.418 4.207 1.576 0.787 0.4356 0.172 93.147 4.718 1.479 4.63 1.341 0.294 0.7698 0.061 97.567
Productivity 1.057 0.413 1.116 0.555 -0.542 0.5912 -0.132 94.738 1.057 0.413 1.116 0.555 -0.542 0.5912 -0.132 94.738
Boredom 2.786 1.253 3 1.361 -0.778 0.4408 -0.168 93.306 2.774 1.163 2.741 1.243 0.126 0.9 0.029 98.843
Behavioral disengagement 1.798 1.057 2.183 1.263 -1.535 0.1327 -0.349 86.147 1.799 1.082 1.981 1.014 -0.815 0.4195 -0.171 93.186
Self blame 2.167 1.208 2.7 1.495 -1.805 0.0787 -0.419 83.406 2.119 1.224 2.741 1.296 -2.232 0.0313 -0.502 80.181
Distractions 2.408 0.921 2.417 1.099 -0.04 0.9682 -0.009 99.641 2.381 0.946 2.352 0.83 0.159 0.8745 0.032 98.723
Generalized anxiety 2.099 1.005 2.605 1.245 -2.055 0.0466 -0.478 81.111 2.012 1.027 2.397 1.133 -1.592 0.1194 -0.366 85.48
Emotional loneliness 2.05 0.883 2.311 1.017 -1.286 0.2055 -0.287 88.59 1.845 0.904 2.012 0.908 -0.846 0.4025 -0.185 92.63
Social loneliness 2.789 1.071 2.789 1.153 0.003 0.998 0.001 99.96 2.708 1 2.827 1.178 -0.48 0.634 -0.115 95.415
Relatedness 3.554 0.793 3.2 0.952 1.873 0.0684 0.428 83.055 3.624 0.828 3.481 0.814 0.8 0.4283 0.172 93.147
Competence 3.664 0.716 3.444 0.892 1.245 0.2204 0.29 88.471 3.732 0.684 3.426 0.887 1.66 0.1057 0.418 83.445
Autonomy 3.488 0.745 3.144 0.807 2.11 0.0407 0.454 82.042 3.522 0.789 3.488 0.72 0.217 0.8295 0.045 98.205
Social contacts 4.099 1.085 3.944 1.26 0.616 0.5411 0.138 94.499 4.323 1.09 4.012 1.259 1.166 0.2509 0.275 89.064
Communication 4.444 1.171 4.402 1.373 0.152 0.8803 0.035 98.604 4.372 1.199 4.407 1.221 -0.132 0.8956 -0.029 98.843
Stress 2.471 0.858 2.717 0.96 -1.272 0.2104 -0.279 88.906 2.392 0.885 2.63 0.708 -1.457 0.1513 -0.28 88.866
Daily routines 5.029 1.434 4.078 1.875 2.588 0.0136 0.62 75.656 5.003 1.446 4.148 1.884 2.186 0.0356 0.552 78.255
Extraversion 3.454 0.766 3.517 0.94 -0.337 0.7377 -0.078 96.889 3.443 0.727 3.528 0.663 -0.573 0.5692 -0.118 95.295
TABLE III: Comparisons between women and men
Wave 1 Wave 2
UK M UK SD US M US SD t-value p-value Cohen’s d PCR UK M UK SD US M US SD t-value p-value Cohen’s d PCR
Well-being 4.248 1.302 4.288 1.448 -0.158 0.8752 -0.03 98.803 4.294 1.22 4.392 1.461 -0.381 0.7039 -0.074 97.049
Productivity 1.018 0.453 0.936 0.385 1.047 0.2975 0.193 92.312 0.977 0.414 1.076 0.472 -1.162 0.248 -0.225 91.043
Boredom 2.857 1.072 2.889 1.194 -0.151 0.8802 -0.029 98.843 2.96 1.159 2.74 1.166 0.994 0.3226 0.189 92.471
Behavioral disengagement 1.865 0.885 1.683 0.852 1.123 0.2641 0.21 91.638 2.089 0.952 1.91 1.024 0.948 0.3456 0.182 92.749
Self blame 1.786 0.932 1.74 1.059 0.241 0.81 0.046 98.165 1.944 1.025 1.68 0.896 1.451 0.1498 0.272 89.182
Relatedness 3.521 0.772 3.545 0.827 -0.158 0.875 -0.03 98.803 2.54 0.816 2.5 0.985 0.232 0.8168 0.045 98.205
Competence 3.569 0.734 3.593 0.873 -0.159 0.8742 -0.03 98.803 2.219 0.965 1.989 0.962 1.257 0.2114 0.239 90.488
Autonomy 3.503 0.7 3.516 0.754 -0.098 0.9223 -0.018 99.282 2.059 0.938 1.887 0.796 1.052 0.2949 0.197 92.154
Communication 4.472 1.031 4.593 1 -0.624 0.5341 -0.119 95.255 2.527 0.942 2.493 1.135 0.168 0.8673 0.032 98.723
Stress 2.528 0.713 2.312 0.871 1.43 0.156 0.273 89.143 3.538 0.711 3.593 0.847 -0.372 0.7111 -0.072 97.128
Daily routines 4.889 1.409 4.474 1.738 1.385 0.1693 0.265 89.459 3.605 0.67 3.65 0.815 -0.315 0.7534 -0.061 97.567
Extraversion 3.552 0.728 3.486 0.799 0.459 0.6473 0.087 96.53 3.565 0.751 3.443 0.818 0.808 0.421 0.155 93.823
Distractions 2.532 0.92 2.385 1.018 0.806 0.4222 0.152 93.942 4.274 1.12 4.327 1.189 -0.238 0.8121 -0.046 98.165
Generalized anxiety 2.265 0.942 2.134 1.075 0.689 0.4926 0.131 94.778 4.383 1.237 4.245 1.263 0.573 0.5678 0.11 95.614
Emotional loneliness 2.048 0.956 2.038 0.802 0.056 0.9555 0.01 99.601 2.573 0.69 2.34 0.89 1.516 0.133 0.296 88.234
Social loneliness 2.619 0.912 2.583 1.074 0.19 0.8498 0.036 98.564 4.817 1.531 4.647 1.646 0.562 0.5752 0.108 95.694
Social contacts 4.053 1.091 4.218 1.064 -0.818 0.415 -0.153 93.902 3.516 0.742 3.53 0.798 -0.094 0.925 -0.018 99.282
Wave 3 Wave 4
UK M UK SD US M US SD t-value p-value Cohen’s d PCR UK M UK SD US M US SD t-value p-value Cohen’s d PCR
Well-being 4.508 1.18 4.611 1.616 -0.323 0.7481 -0.074 97.049 4.751 1.325 4.733 1.554 0.051 0.9596 0.013 99.481
Productivity 1.103 0.511 1.081 0.439 0.211 0.8331 0.046 98.165 1.103 0.511 1.081 0.439 0.211 0.8331 0.046 98.165
Boredom 2.865 1.135 2.413 1.148 1.792 0.0772 0.396 84.305 2.79 1.161 2.558 1.218 0.806 0.4234 0.195 92.233
Behavioral disengagement 1.875 1.137 1.722 1.168 0.6 0.5502 0.133 94.698 1.817 1.053 1.833 1.155 -0.061 0.9517 -0.015 99.402
Self blame 2.271 1.12 1.889 1.321 1.398 0.1665 0.316 87.446 2.305 1.298 2.117 1.112 0.656 0.5141 0.154 93.862
Distractions 2.354 0.844 2.25 0.914 0.534 0.595 0.119 95.255 2.329 0.826 2.367 0.964 -0.171 0.8646 -0.042 98.325
Generalized anxiety 2.193 1.006 2.095 1.137 0.411 0.682 0.092 96.331 2.024 0.963 2.048 1.175 -0.089 0.9297 -0.022 99.122
Emotional loneliness 2.083 0.86 2.028 0.977 0.271 0.787 0.061 97.567 1.959 0.895 1.744 0.87 1.016 0.3135 0.243 90.33
Social loneliness 2.674 0.903 2.491 1.197 0.768 0.4456 0.176 92.988 2.553 0.89 2.722 1.135 -0.679 0.5002 -0.169 93.266
Relatedness 3.479 0.764 3.718 0.799 -1.379 0.1719 -0.306 87.84 3.65 0.714 3.689 0.882 -0.197 0.8449 -0.049 98.045
Competence 3.632 0.643 3.764 0.79 -0.819 0.4158 -0.186 92.59 3.65 0.694 3.672 0.839 -0.116 0.9079 -0.029 98.843
Autonomy 3.403 0.716 3.523 0.832 -0.696 0.4888 -0.157 93.743 3.614 0.706 3.567 0.711 0.277 0.7827 0.067 97.328
Social contacts 4.014 1.034 4.38 1.042 -1.597 0.1144 -0.353 85.99 4.374 1.123 4.322 1.049 0.199 0.8426 0.047 98.125
Communication 4.299 1.283 4.81 1.011 -2.028 0.0459 -0.434 82.821 4.275 1.322 4.444 0.996 -0.612 0.5428 -0.142 94.34
Stress 2.547 0.738 2.403 0.955 0.753 0.4545 0.172 93.147 2.409 0.784 2.35 0.829 0.301 0.7648 0.073 97.088
Daily routines 4.944 1.6 5.009 1.552 -0.187 0.8522 -0.041 98.364 5.122 1.533 4.756 1.744 0.92 0.3615 0.226 91.003
Extraversion 3.568 0.766 3.576 0.843 -0.049 0.9614 -0.011 99.561 3.591 0.72 3.517 0.707 0.437 0.6636 0.105 95.813
TABLE IV: Comparisons between developers based in the United Kingdom and United States of America

V Discussion

Building on the collected evidence and the previous literature, we discuss the implications of our investigation for software professionals and organizations. Furthermore, we explain the intrinsic limitations of this study and how we tried to cope with those.

Readers should be aware that our findings are based on group-level inferences, which do not always generalize to the individual level. For example, the results of the within-subject ANOVAs inform us whether the average of a variable changed over time, not whether all individuals changed in the same direction. As can be seen in Table II, while the well-being of 77 developers increased between time 1 and time 4, the well-being of 38 developers dropped. Thus, it is only more likely (i.e., approximately twice as likely) to find a developer whose well-being increased than one whose well-being dropped. Interestingly, the change over time was not always linear. For example, emotional loneliness first went down between time 1 and time 2, then slightly up again at time 3, and finally down again. This might be because many countries started to open up again (or announced plans to do so) around the time when we collected the second wave. In contrast, the third wave was collected in February 2021: in the UK and USA, for example, many more deaths were associated with COVID-19 in the winter of 2020/21 than in spring 2020. Similarly, many situational factors or variables we have not measured, such as the perceived severity of local lockdowns or the loss of a loved one (e.g., because of COVID-19), would likely have explained additional variance in developers’ well-being and productivity. Nevertheless, we aimed to provide generalizable evidence with this longitudinal study, while qualitative investigations (e.g., [miller2021your, ford2020tale, butler2021challenges]) add to a nuanced understanding of individual phenomena. When drawing up company guidelines, these and other studies should also be considered since our recommendations will not be exhaustive.
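The gap between group-level and individual-level change can be sketched with the Greater, Smaller, and Equal counts of Table II. The helper below is a hypothetical illustration (not the authors' code) of how such counts could be derived from paired scores; the ratio at the end uses the well-being counts reported in Table II, while the paired scores are made up.

```python
def direction_counts(first_wave, last_wave):
    """Count participants whose score went up, went down, or stayed the same."""
    pairs = list(zip(first_wave, last_wave))
    greater = sum(1 for before, after in pairs if after > before)
    smaller = sum(1 for before, after in pairs if after < before)
    equal = sum(1 for before, after in pairs if after == before)
    return greater, smaller, equal


# Made-up paired scores for four participants (time 1 vs. time 4).
toy_time1 = [4.0, 5.0, 3.0, 6.0]
toy_time4 = [5.0, 4.0, 3.0, 6.5]
toy_counts = direction_counts(toy_time1, toy_time4)

# Well-being counts from Table II: 77 increased, 38 decreased.
increase_to_decrease = 77 / 38
print(toy_counts, round(increase_to_decrease, 2))
```

A ratio close to 2 is where the "approximately twice as likely" statement comes from: the mean increased, yet a sizeable minority of developers moved in the opposite direction.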

V-A Implications

Based on our results, we provide recommendations for the software engineering community (cf. also Table V).

We found that developers’ well-being increased over time. We have no pre-pandemic data, so we cannot assess how the lockdown initially impacted software professionals. It could be that their well-being went down in spring 2020 and is now bouncing back to pre-pandemic levels. This reasoning would be in line with previous research showing that people’s well-being usually bounces back after a significant negative event [oswald2008does]. While research from the start of the pandemic (i.e., spring 2020) indicates that developers’ well-being decreased initially [Ralph2020pandemic], our findings provide a more positive outlook that developers’ well-being bounced back. Our findings also suggest that working from home does not negatively impact developers’ well-being, as otherwise well-being would not have increased as much. This supports new company policies implementing hybrid or full remote work settings.

Productivity remained constant during the pandemic. Although we report a slight increase in productivity over the four data collection waves (as plotted in Figure 2), and more people reported an increase in productivity than a decrease (cf. Table II), the mean differences are non-significant, indicating that the observed increase could very well be random and might not replicate. Since measuring productivity is non-trivial, we followed the example of a previous study [russo2020predictors] by measuring productivity as a self-reported function compared to the pre-pandemic period. We therefore conclude that the productivity level of software professionals changed neither throughout the lockdown nor compared to the pre-pandemic time. This finding contradicts previous research suggesting that the lockdown is detrimental for productivity [ralph2020empirical], possibly because of differences in the research design (cross-sectional vs. longitudinal design) and operationalization of productivity (Ralph et al. [ralph2020empirical] used a different measure of productivity). Alternatively (or additionally), we collected our sample approximately one month after Ralph et al. [ralph2020empirical] and predominantly from countries which were relatively underrepresented in the sample of Ralph et al., who recruited most of their participants from Germany, Russia, and Brazil. Our results substantiate our previous conclusion that a hybrid or full remote working environment would not per se harm the productivity levels of developers.

Even though all typical welfare support (e.g., childcare, schools, sports facilities) was closed, software engineers showed a high level of adaptation by keeping the same productivity levels and steadily increasing their well-being levels. Consequently, in a post-pandemic context, with all support facilities running normally, working from home is very unlikely to impact developers’ well-being negatively. Qualitative findings support this argument, suggesting that working from home significantly improved work-life balance [ford2020tale]. Similarly, a large-scale cross-sectional study observed that % of the surveyed professionals would like to continue to work remotely (especially in a hybrid fashion) in the future [Walton2020NZadaptation]. However, previous research regarding the impact of working from home on productivity is mixed. Some studies found that working from home is positively related or unrelated to productivity [bao2020does, barrero2021working, deole2021home, russo2020predictors], whereas other research found that working from home has some negative effects [gibbs2021work, kitagawa2021working, morikawa2020productivity].

Software professionals felt less lonely and improved their social contacts. During the first lockdown in spring 2020, many people had to abruptly reduce their social interactions [chan2021can]. This increased the sense of loneliness and isolation. Nevertheless, in this case too, developers showed a high level of resilience. Indeed, we report a significant decrease in emotional loneliness and an increase in the quality of social contacts. This suggests that software engineers increasingly reached out to their social contacts when they felt lonely, thereby coping well with the challenging conditions of the pandemic. Similarly, the quality of their relationships increased. This is important because having a reliable social support network is an essential coping mechanism, especially in hard times and in moments of high stress [weinstein2011self, carver1989assessing].

These findings are relevant for organizations planning to implement a hybrid or remote work policy. Software engineers showed a high level of resilience when coping with unexpected events. At the same time, their social network was a crucial support while working from home. This insight is also supported by previous research, where communication was found to be a relevant predictor of developers’ satisfaction during the lockdown [miller2021your]. Consequently, a proactive company policy of employee inclusion would sustain their well-being levels. This would require a particular effort from middle management (because they are the direct company interface for each employee) to ensure that all team members can express themselves and maintain stimulating and nurturing relationships with their peers, since even interacting with weak social ties (i.e., acquaintances) can improve people’s well-being [sandstrom2014social].

Moreover, we also found that self-blame increased. This finding was unexpected and might relate to the phenomenon known as survivor’s guilt [hutson2015survivor], which has been observed, for example, among caretakers of cancer patients and is positively associated with remorse [kreitler2012survivor]. We speculate that self-blame is positively associated with survivor’s guilt (e.g., of not having been affected by COVID-19) and remorse and might be stronger among those developers who experienced loss (e.g., a relative who died because of COVID-19). Mindful organizations might offer employees psychological support to address the guilt and remorse specifically.

Stress is our only significant factor detrimental to well-being. Although this result is not surprising per se, it is critical. It provides evidence that stress and stress factors, in general, were the most significant harm when working from home during the pandemic. Therefore, to effectively sustain employees’ well-being, reducing stress levels should be a company priority. There are different approaches that organizations can implement to tackle this crucial issue. According to Halpern [halpern2005time], flexibility has been very effective in reducing work-related stress for both men and women. Moreover, a high level of flexibility leads to high employee commitment, which reduces organizational costs and missed deadlines. Similarly, Coetzer and Rothmann found that a high level of organizational control over employees’ tasks was negatively related to organizational commitment and stress. Pay structure and job insecurity were also found to be highly stressful for knowledge workers [coetzer2006occupational]. On this aspect, career management along with professional expectation management are considered to be critical pillars of human resource management to decrease stress levels [london1987career]. Additionally, mindfulness practices, whether in the workplace or at home, can reduce stress and enhance sleep quality [klatt2017mindfulness]. Overall, organizations have several tools to reduce developers’ stress: providing a high degree of autonomy in both tasks and work schedules, and managing fair and transparent expectations (along with job stability and fair pay).

We have not observed differences between women and men. No significant mean differences were found between men and women in any of the surveyed variables. This suggests that the lockdown did not affect one gender more than the other in our sample. This is surprising since abundant other research found that women are more impacted by the pandemic, especially career-wise [power2020COVID, pinho2020women]. For instance, a recent Brazilian cross-sectional survey study concluded that women suffered more from the lockdown due to a higher involvement in housekeeping duties compared to men [machado2021gendered]. In contrast, we found a very high level of similarity between genders. We believe that this discrepancy arises because the women in our sample are not representative. For example, one of our inclusion criteria was that participants work at least 20h/week. So we have a very specific group of women in our sample (probably with fewer kids or with someone who takes care of their kids so that they can work). This result is also encouraging because similarities between women and men, even if not representative, can increase women’s sense of belonging and their likelihood of pursuing a career in a men-dominated field [good2012women]. Also, for this reason, software companies should not use gender-biased communication when offering home support, for instance by implying that most work at home falls on women.

We found no mean differences between the UK and USA. We found high levels of similarity in how software professionals were impacted in the USA and the UK. This might be the result of the reliance of national health authorities on the World Health Organization, making lockdown measures fairly uniform between both countries. Consequently, global software companies can plan policies homogeneously across countries in case of a future disastrous event. However, they should take individual differences into account, as in both countries there are developers reporting higher well-being and productivity and developers reporting lower well-being and productivity (i.e., the within-country variability outweighed the between-country variability). Also, we have no evidence suggesting that there should be any difference between the USA and the UK in working from home policies.

| Findings | Recommendations |
|---|---|
| Developers’ well-being increased. Well-being consistently increased across all four time points, indicating that they bounced back from the negative impact the pandemic likely had on their well-being initially. | Developers showed a high level of resilience when working from home and improved their well-being. Software companies can extensively implement (hybrid) working from home practices. |
| Productivity remained unchanged. Developers’ productivity has not changed across all four time points. | Working from home is not per se detrimental for productivity. If organizations keep reasonable work expectations, professionals will be as productive at home as in the office. |
| Developers felt less lonely and improved their social contacts. This suggests that developers managed to reduce their loneliness, presumably by improving the quality and quantity of their social interactions. | Active inclusion policies should be set in place for employees working from home. Mainly middle management should focus on individual employees’ performance and their level of integration and communication with the team. |
| Stress decreases well-being levels. Stress at time 2 negatively predicted well-being at time 4. This suggests that stress can have a long-lasting impact on developers’ well-being. | Reducing professionals’ stress levels should be the key priority of every organization. Practices such as flexibility, clear expectations, career management, a transparent and fair pay structure, as well as mindfulness exercises can be effective. |
| Self-blame increased. Levels of self-blame increased over time. | Software organizations might offer employees psychological support to investigate the reasons for self-blame. |
| Men and women are similar across all measured variables. This is in line with the gender similarity hypothesis [hyde2005gender] that women and men are across most variables (e.g., well-being related, ability, personality) more similar than different. | When planning for home-support policies, organizations should not use biased communication implying that women are most affected. This can increase the feeling of fitting in, which in turn can increase girls’ and women’s intention to pursue a career in a men-dominated field [good2012women]. |
| No country difference (USA vs UK) when dealing with the pandemic. Our findings indicate that people living in the UK and the USA were impacted and ‘recovered’ from the initial shock of the pandemic to a similar extent. | Especially during another disastrous event, organizations can plan the same remote work strategies across both countries. |
TABLE V: Summary of key findings & recommendations

V-B Limitations

In the following section, we discuss the most relevant limitations of this work.

Reliability. For this investigation, we employed a four-wave longitudinal design. Informants were identified through a multi-stage selection screening to ensure they were representative of the software engineering population. Also, we computed an a priori power analysis to identify the minimum number of participants required to provide reliable conclusions. The internal consistencies (i.e., Cronbach’s α) ranged from satisfactory to very good.

Construct validity. For this study we used variables previously identified in the literature that are related to well-being and productivity. For any variable, we used a dedicated measurement instrument. Construct validity was assessed by correlating all variables with each other, separately in each wave. The correlations were in the expected directions and in line with the literature [diener_beyond_2009, russo2020predictors, miller2011loneliness].

Conclusion validity. We draw our conclusions based on a number of statistical analyses: within-subject ANOVA, cross-lagged panel model, and between-subject t-tests. To increase the trustworthiness of our results, we adjusted our alpha-thresholds to reduce the risk of false positives (i.e., Type I errors). In terms of data collection, some variations might have been out of our control since lockdown measures were not uniform in different countries. To address this issue, we only selected participants living in countries that during the first wave had similar regulations (we excluded, e.g., Sweden, Denmark). Nevertheless, minor variations in terms of rules, over which we had no control, occurred during the pandemic in the different countries. However, we report very similar results when looking at between-country mean differences. Our conclusions are reproducible since we made the anonymized raw data and R analysis code openly available on Zenodo.

Internal validity. We only found one instance in which one predictor (stress) causally predicted an outcome (well-being) over time. This might be because of our conservative approach (e.g., correcting for multiple comparisons and controlling for many other related variables). Importantly, we recognize that there is an ongoing debate on what constitutes causality. Therefore, we are aware that some readers might dislike the term ‘causality’ and prefer instead a ‘softer’ term such as ‘predicted over time’. Our study relies on self-reported measures, limiting the validity due to potential response biases. Although our informants were initially identified in other work [russo2020gender], we applied several quality checks after each time point as well. Additionally, we searched for inaccurate or unlikely responses (of which we found none, which supports data quality). The attrition rate across the four waves is comparable to other longitudinal studies across a similar timespan [bardi2014value, feinberg2019understanding]. Due to the evolving nature of the pandemic, data collection was scheduled based on the information available at each point in time. As a consequence, the time spans between waves are not homogeneous but represent moments of the pandemic where data collection seemed to be representative of the pandemic trend. This might have affected the variability of our data.

External validity. The primary aim of our longitudinal analysis was to maximize internal validity by finding significant effects. Thus, we did not seek to work with a representative sample of the software engineering population (as, e.g., Russo & Stol did to generalize their findings [russo2020gender]).

VI Conclusion

In this investigation, we performed a four-wave longitudinal study over 14 months from the start of the COVID-19 pandemic in April 2020 to July 2021, involving 192 software developers. We analyzed how well-being and productivity of software engineers and 15 related social and psychological variables changed over time. Similarly, we explored causal relations among our variables and performed gender and country-based between-group comparisons.

We found that well-being, quality of social contacts, and self-blame increased over time while emotional loneliness decreased. We further found that only stress measured at time 2 causally predicted well-being at time 4. Finally, we found that women and men and people living in the UK and USA did not differ for any of the variables we measured across all four data collection waves.

The significance of our conclusions lies in the extensiveness of our investigation (i.e., over one year) during most of the COVID-19 pandemic (as of 2021). We carefully selected our informants after an a priori power analysis to ensure the trustworthiness of our results and adjusted our alpha level to avoid false-positive results and misleading recommendations. So far, this is the most complete longitudinal analysis involving software engineers to understand the effects of the COVID-19 pandemic on their well-being and productivity. Moreover, our results are relevant in case of another disastrous event, but they also help the software engineering community to provide better-informed recommendations for future Working from Home policies after the pandemic.

Future work will therefore focus on a prolonged assessment of the working conditions of our pool even after the pandemic. Also, a more nuanced understanding of phenomena we could not explain (i.e., increased behavioral disengagement at time 2) is necessary; this could be pursued, for example, by including more relevant variables to understand the underlying mechanisms or by using qualitative research designs.

Supplementary Materials

The complete replication package is openly available under a CC BY 4.0 license on Zenodo, DOI: https://doi.org/10.5281/zenodo.5713923.


This work was supported by the Carlsberg Foundation under grant agreement number CF20-0322 (PanTra — Pandemic Transformation).