Behavioral software engineering - guidelines for qualitative studies

12/22/2017 ∙ by Per Lenberg, et al. ∙ University of Stuttgart RealNames Chalmers University of Technology Göteborgs universitet 0

Researchers are increasingly recognizing the importance of human aspects in software development and since qualitative methods are used to, in-depth, explore human behavior, we believe that studies using such techniques will become more common. Existing qualitative software engineering guidelines do not cover the full breadth of qualitative methods and knowledge on using them found in the social sciences. The aim of this study was thus to extend the software engineering research community's current body of knowledge regarding available qualitative methods and provide recommendations and guidelines for their use. With the support of a literature review, we suggest that future research would benefit from (1) utilizing a broader set of research methods, (2) more strongly emphasizing reflexivity, and (3) employing qualitative guidelines and quality criteria. We present an overview of three qualitative methods commonly used in social sciences but rarely seen in software engineering research, namely interpretative phenomenological analysis, narrative analysis, and discourse analysis. Furthermore, we discuss the meaning of reflexivity in relation to the software engineering context and suggest means of fostering it. Our paper will help software engineering researchers better select and then guide the application of a broader set of qualitative research methods.


1 Introduction

Behavioral software engineering (BSE) is an interdisciplinary research area aiming to explore cognitive, behavioral, and social aspects of software engineering performed by individuals, groups, or organizations lenberg2015behavioral.

The importance of considering the people involved in the software development process has repeatedly been recognized by both researchers and practitioners weinberg1971psychology; perry1994people; highsmith2002agile; lenberg2015human. Still, studies concerned with behavioral aspects of software engineering are not in the software engineering mainstream, and in terms of the number of existing publications and level of knowledge, the research area can still be considered young lenberg2015behavioral.

In young research areas, such as BSE, studies tend to adopt an exploratory approach, making qualitative methods a valid, or even preferable, option when analyzing software engineers’ behavior stebbins2001exploratory; fossey2002understanding; brown2006doing. Qualitative methods are commonly used when one attempts to understand, in depth, the ways in which people act, think, or feel seaman1999qualitative; langley1999strategies.

Even if software engineering research traditionally has had a tendency towards quantitative methods, there exist qualitative guidelines, e.g. Seaman seaman1999qualitative, Dittrich et al. dittrich2007special, Runeson and Höst runeson2009guidelines, Stol et al. stol2016grounded, and Sharp et al. sharp2016role. However, even taken together these guidelines do not cover the full breadth of qualitative methods and knowledge on using them found in the social sciences.

The purpose of this study was thus to extend the community’s current body of knowledge regarding available qualitative methods and provide recommendations and guidelines for their use. To meet the purpose, we reviewed the contemporary research against proven and accepted social science standards. The social sciences have a long history of studying human behavior and we argue that software engineering researchers would most certainly benefit from utilizing their gained methodology skills and knowledge on how to conduct effective qualitative studies.

The aim of the review was to identify, on an overall level, areas with knowledge gaps and to determine which qualitative methods software engineering researchers are currently using. Based on the result of the review, we compiled customized recommendations and guidelines for future studies.

We acknowledge that, taken in small doses, guidelines and practices can help to guard against the more obvious errors and also help to frame qualitative work as systematic and structured legreco2009discourse; tracy2010qualitative. However, the quality of qualitative research does not rely solely on detailed guidelines and practices, and too intense a methodological focus risks creating anxieties that hinder creativity and practice seale1999quality.

Even if we recognize the usefulness and advantage of mixed model design johnson2004mixed; creswell2013research, i.e. where the researcher combines a quantitative and a qualitative approach, we will in this study, for the sake of clarity, focus on qualitative methods only.

This paper is organized as follows. First, we give a brief introduction to qualitative research where we present its unique characteristics and illustrate its usefulness (section 2.2). We then summarize quality standards of qualitative research, both for qualitative research in general (section 2.3), but also standards specific for the software engineering context (section 2.4). In the section that follows, we extract a representative sample of qualitative software engineering research and assess the quality (section 3). Then, we discuss the result of the assessment and present our customized recommendations and guidelines (section 4). We finally summarize our overall conclusions (section 7).

2 Background

In this section, we provide background information regarding subjects related to, and relevant for, the purpose of this theoretical work. We present an overview of behavioral software engineering, qualitative research, outline quality standards for qualitative research in general, and, finally, summarize quality standards for qualitative research in software engineering.

2.1 Behavioral software engineering

Lenberg, Feldt, and Wallgren have defined the research area of Behavioral Software Engineering (BSE) as the study of cognitive, behavioral and social aspects of software engineering performed by individuals, groups or organizations lenberg2014towards. A BSE literature review lenberg2015behavioral indicated that the human aspect of software engineering is a growing area of research that has been recognized as important. However, the review also showed that there are knowledge gaps and that earlier research has been focused on a few concepts, which have been applied to a limited number of software engineering areas, and, also, that the BSE research, so far, rarely has been conducted in collaboration by researchers from both software engineering and social science.

2.2 An overview of qualitative research

According to Corbin and Strauss corbin2015basics, qualitative research includes any study that produces findings that are not derived by means of quantification, e.g. statistical procedures. Quantitative research seeks causal determination, prediction, and generalization of findings hoepfl1997choosing, while qualitative studies, in turn, seek understanding and illumination of a certain phenomenon in a context-specific setting with the hope of extrapolation to similar situations. Qualitative research addresses questions concerned with developing knowledge from the experience dimensions of humans’ lives and social worlds. It aims to understand and represent behaviors of people as they encounter, engage, and live through specific situations elliott1999evolving; patton1990qualitative.

To our knowledge, no commonly accepted definition of qualitative research exists. Still, according to Lee et al. lee1999qualitative, such studies generally appear to have four defining characteristics. First, the data are derived from the participants’ perspective, meaning that the researchers have not imposed a particular interpretation. It is the research participants’ subjective meanings, actions and social contexts, as understood by them, that are illuminated fossey2002understanding. The researchers attempt to bracket existing theory and their own values, which allows them to understand and represent the participants’ experiences and actions more adequately than would otherwise be possible elliott1999evolving. Second, qualitative studies are often conducted in natural settings, whereas laboratory studies are rare. Third, in contrast to traditional, more rule-driven and survey-oriented approaches, qualitative studies are flexible and should be ready to change to match the fluid and dynamic demands of the immediate research situation mack2005qualitative. Fourth and finally, no common standards for data collection and analysis exist, which may stand in contrast to prevailing beliefs about control, reliability, and validity lee1999qualitative.

Qualitative research methods are useful in a variety of situations. Human behaviors, as individuals or in groups, are complex phenomena that often cannot be sufficiently described and explained through statistics and other quantitative methods, and thus call for an alternative approach seaman1999qualitative; langley1999strategies. Qualitative methods are beneficial when addressing questions related to complex and versatile concepts such as behaviors, emotions, beliefs, and values. Additionally, such approaches can be useful when identifying latent and hidden factors whose role in the phenomenon under investigation may not be apparent, e.g. social norms, gender roles, or religion mack2005qualitative.

Qualitative approaches are also favorable when developing knowledge in poorly understood research areas fossey2002understanding and are therefore often used in exploratory studies stebbins2001exploratory; brown2006doing. A key element is the open-ended interview question, which gives the participants the opportunity to respond in their own words. Unlike closed questions, answers to such questions are not bound by the researcher's knowledge but can stimulate responses that are meaningful and important to the participant mack2005qualitative. When using traditional quantitative data collection techniques, e.g. questionnaires, the researchers have no access to the reasoning behind the respondents’ answers. Qualitative techniques, on the other hand, allow the researchers to better explore the underlying intrinsic processes.

Finally, qualitative methods can also be used to improve the validity of questionnaires and other survey instruments by extracting contextual data fossey2001conceptual. They can help to identify patterns and orders among variables and thus help to move inquiries toward more meaningful explanations sofaer1999qualitative.

2.3 Quality in qualitative research

Criteria in qualitative research are, to say the least, challenging. Whether quality criteria should be applied to qualitative research, which criteria are appropriate, and how they should be assessed have been up for debate for at least a quarter of a century malterud2001qualitative; mays2007quality.

In general, scientists seem to hold three different opinions regarding quality standards of qualitative work. Some argue that it makes little sense to attempt to establish a set of generic criteria since there is no unified qualitative research paradigm rolfe2006validity. Some claim that qualitative research can be assessed with reference to the same broad criteria as quantitative research malterud2001qualitative; mays2007quality. Others suggest that, since qualitative research is based on different epistemological and ontological assumptions, the established criteria for scientific rigor in quantitative research cannot be applied to qualitative studies lincoln1985naturalistic; chapple1998explicit.

We recognize the differences in opinions, but acknowledge that they are part of a wider epistemological dispute mays2007quality regarding the nature of the knowledge produced by qualitative research, a debate that this paper cannot capture or do justice to. Still, we argue that what constitutes sound research is of immense importance. Readers of scientific publications, researchers and practitioners alike, need to know that the studies are trustworthy and provide solid findings, knowledge, and understanding of true events fossey2002understanding; popay1998rationale. The value of research is therefore, to a great extent, dependent on the researchers’ ability to demonstrate the credibility of their findings lecompte1982problems.

According to Whittemore et al. whittemore2001validity, three of the most influential criteria had, at the turn of the century, been outlined by Lincoln and Guba lincoln1985naturalistic; guba1994competing, Maxwell maxwell1992understanding and Sandelowski sandelowski1986problem; sandelowski1993rigor. Lincoln and Guba lincoln1985naturalistic; guba1994competing propose five criteria for naturalistic inquirers - credibility, transferability, dependability, confirmability, and authenticity. Maxwell maxwell1992understanding further articulated the need for integrity and criticality, whereas Sandelowski sandelowski1993rigor advocated for creativity and artfulness.

Based on these criteria, Whittemore et al. whittemore2001validity suggest that one can divide qualitative research criteria into two categories: primary and secondary criteria. The primary criteria (credibility, authenticity, criticality, and integrity) are necessary to all qualitative inquiry, whereas the secondary criteria (explicitness, vividness, creativity, thoroughness, congruence, and sensitivity) provide further benchmarks of quality and are considered to be more flexible as applied to particular studies whittemore2001validity.

Other researchers besides Whittemore et al. have also aggregated and refined previous research in an attempt to identify the most important quality criteria. Drawing on previous research by Giacomini et al. giacomini2000users and Hammersley hammersley1990reading, Malterud malterud2001qualitative identifies relevance, validity, and reflexivity as three essential pillars of quality in qualitative studies. In addition, Elliott et al. elliott1999evolving claim that they built their criteria based on a list of more than forty different quality standards. The resulting list consisted of eleven principles: method appropriateness, openness, theoretical sensitivity (relating findings to existing knowledge), bracketing of expectations, replicability (describing methods), saturation generalizability (sampling adequacy for purpose), credibility checks, grounding (in examples), coherence, uncovering self-evidence to reader and intelligibility (communicability).

Moreover, in two more recent studies, Tong et al. tong2007consolidated and Tracy tracy2010qualitative each present a quality checklist for qualitative research. The latter presents and explores eight key markers of quality research - worthy topic, rich rigor, sincerity, credibility, resonance, significant contribution, ethics, and meaningful coherence. The former suggests a 32-item checklist grouped into three domains: research team and reflexivity, study design, and data analysis and reporting.

It is worth noting that the quality standards mentioned above have primarily been designed for medical or clinical applications of qualitative work. Standards developed within a software engineering, or even a work and organizational psychology, context are considerably scarcer. There are, however, a few exceptions. In a study from 1999, Lee, Mitchell, and Sablynski lee1999qualitative reviewed qualitative studies of work and organizational psychology (WOP) from the preceding 20 years. Throughout the paper, the authors recommend specific best practices related to WOP. In addition, in a publication from 1997, Myers myers1997qualitative presents an overview of qualitative methods addressed to the information systems (IS) research community.

2.4 Qualitative research in software engineering

During the past twenty years, there have been a few method and process related papers addressing qualitative research in the software engineering domain. In an influential paper from 1999, Carolyn B. Seaman seaman1999qualitative introduces qualitative methods in software engineering. She argues that in order to further develop software engineering, new research methods are needed to explore non-technical aspects and that qualitative methods can be adapted and incorporated into the designs of empirical studies in software engineering. The paper presents an overview of methods for qualitative data collection and analysis. However, instead of presenting a broad spectrum of qualitative methods, Seaman focuses on detailing a select few. For the data collection she presents participant observation and interviewing, and for the analysis, she describes grounded theory. The author also briefly discusses threats to validity in qualitative studies and stresses the importance of considering triangulation, anomalies in data, negative case analysis, and replication.

Moreover, to our knowledge, there have been only two special journal issues dedicated to qualitative research in software engineering; one in the Information and Software Technology journal in 2007, and one in the Empirical Software Engineering journal in 2011. In an editorial to the former, Dittrich et al. dittrich2007special aimed to define qualitative research. Unlike Seaman, they clearly emphasize the diversity of qualitative research methods and also that qualitative studies are used under different epistemological orientations and with different theoretical underpinnings.

Triggered by the inconsistency of reviews for the special issue, Dittrich and her co-authors took the first steps in developing a common way to evaluate the quality of qualitative research. Based on their experiences they propose eight criteria for qualitative studies, emphasizing clarity of contribution of work.

Of the qualitative methods, grounded theory glaser2009discovery and thematic analysis braun2006using seem to be among the most popular with software engineering researchers dittrich2007special; adolph2012reconciling; hoda2012developing; defranco2017content.

The quality of software engineering studies using grounded theory is reviewed by Stol et al. stol2016grounded. Examining close to one hundred studies, the authors conclude that many papers do not generate a theory, do not clearly indicate which variant of grounded theory is used and do not provide sufficient methodological detail for rigorous evaluation. In addition, the authors present guidelines for how to conduct and report grounded theory studies. The guidelines are synthesized from existing methodological guidance and complemented with the authors' own experiences. The guidelines, which are presented in the form of a checklist, are primarily directed to researchers with novice knowledge of grounded theory.

Finally, in a paper from 2016 Sharp et al. sharp2016role present the role of ethnographic work in software engineering. The authors argue that, despite its potential, ethnography has not been widely adopted in software engineering research. Their main aim was, therefore, to explain how software engineering researchers would benefit from adopting ethnography. They claim that the strength of ethnographic work is its ability to uncover the rationalities of the observed practices and that it, therefore, provides an important complement to other research methods that rely on a prior formulation of hypotheses. In addition, the paper introduces a guiding framework for ethnographic studies, supporting the design according to the research question being investigated, the context of the fieldwork and the characteristics of the main focus of the study.

We believe that the method-related publications summarized in this section have contributed to raising the knowledge of specific qualitative methods in software engineering. They have done so by defining qualitative work, explaining how and in what way qualitative studies will contribute to the body of knowledge in software engineering, comparing qualitative and quantitative methods, and presenting initial guidelines. Still, we note that these publications have focused on a few qualitative methods with a positivistic epistemological underpinning, whereas descriptions and guidelines for constructionist-oriented methods do not exist.

3 Literature review

As stated in the introduction, we aimed to assess the overall quality of qualitative software engineering studies and to identify common weaknesses. In this section, we extract a representative sample of current studies and assess their quality against criteria defined in social science.

3.1 Method

To identify a representative sample, we performed a limited systematic literature review based on the guidelines by Kitchenham kitchenham2004procedures. The process included the following stages: selecting data sources, selecting search string, defining research selection criteria, defining research selection process, and defining data extraction and synthesis. These stages, together with threats to validity, are presented in the following subsections.

3.1.1 Selecting data sources

For quality reasons, we limited the review to include peer-reviewed journal publications only. Since qualitative research in software engineering can be considered an interdisciplinary research subject, we selected databases likely to cover both technical as well as social research, i.e. PsycINFO and Scopus.

3.1.2 Selecting search string

The purpose of the search string was to capture qualitative publications. Therefore, we combined "qualitative" with synonyms to "system engineers" (defined by Cruz et al. cruz2011personality). The final search string looked like this: ("qualitative" OR "grounded theory" OR "thematic analysis" OR "discourse analysis" OR "narrative" OR "ethnography" OR "phenomenology") AND ("software engineering" OR "software development" OR "software engineer")
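As an illustration only, the search string above can be assembled from its two term groups. The sketch below is not the tooling used in the review; the helper names `or_group` and `build_query` are hypothetical, and database-specific field codes are deliberately left out.

```python
# Term groups copied from the search string quoted in the text.
METHOD_TERMS = [
    "qualitative", "grounded theory", "thematic analysis",
    "discourse analysis", "narrative", "ethnography", "phenomenology",
]
POPULATION_TERMS = [
    "software engineering", "software development", "software engineer",
]


def or_group(terms):
    """Quote each term and join the group with OR inside parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(*groups):
    """Combine any number of OR-groups with AND (hypothetical helper)."""
    return " AND ".join(or_group(group) for group in groups)


QUERY = build_query(METHOD_TERMS, POPULATION_TERMS)
print(QUERY)
```

Keeping the term groups as lists makes it easy to see which synonyms were searched and to rerun the same query against another database.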

3.1.3 Research selection criteria

To reduce the likelihood of bias, study selection criteria were derived. The criteria were intended to identify those primary publications that provided insights relevant to the aim of the review.

  1. Inclusion Criteria

    1. Publication Year: We limited the search to exclude papers published before 2016.

    2. Publication Type: For quality reasons, we chose to include only peer-reviewed publications published in journals.

    3. Content: The publication shall use qualitative method(s) to study software engineering related activities or software engineers.

  2. Exclusion Criteria

    1. Language: We limited this study to only include papers written in English. Hence, we excluded all non-English publications. This, however, only applied to one publication.

    2. Publication Type: We excluded papers that we could not locate as full papers.

    3. Content: We acknowledge the usefulness and advantage of mixed model design johnson2004mixed; creswell2013research; however, for the sake of clarity, we excluded mixed model studies.
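The inclusion and exclusion criteria listed above can be read as a simple filter over candidate publications. The following is a minimal sketch under stated assumptions: the `Publication` record, its field names, and the sample entries are hypothetical and invented for illustration; in the review itself the criteria were applied manually.

```python
from dataclasses import dataclass


@dataclass
class Publication:
    # Hypothetical record; field names are assumptions, not from the paper.
    title: str
    year: int
    peer_reviewed_journal: bool
    qualitative_se_study: bool
    language: str
    full_text_available: bool
    mixed_methods: bool


def is_included(p: Publication) -> bool:
    """Apply the three inclusion and three exclusion criteria."""
    meets_inclusion = (
        p.year >= 2016                 # not published before 2016
        and p.peer_reviewed_journal    # peer-reviewed journal paper
        and p.qualitative_se_study     # qualitative study of SE or engineers
    )
    hits_exclusion = (
        p.language != "English"        # non-English publication
        or not p.full_text_available   # full paper could not be located
        or p.mixed_methods             # mixed model design
    )
    return meets_inclusion and not hits_exclusion


# Made-up candidates, used only to exercise the filter.
CANDIDATES = [
    Publication("Study A", 2016, True, True, "English", True, False),
    Publication("Study B", 2015, True, True, "English", True, False),
    Publication("Study C", 2017, True, True, "English", True, True),
    Publication("Study D", 2016, True, True, "German", True, False),
]

selected = [p.title for p in CANDIDATES if is_included(p)]
print(selected)
```

Writing the criteria as explicit boolean conditions makes the selection rule easy to audit, although judging the content criterion still requires reading each paper.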

3.1.4 Research selection process

In total, the search identified 69 publications. First, we applied the selection criteria to the titles and abstracts and thus excluded papers that did not relate to software engineering related activities or software engineers. This reduced the number of potential publications to 23.

3.1.5 Data extraction and analysis

The aim of the literature review was to provide an overview of the current qualitative research in software engineering. To meet the aim, we extracted four properties from the included primary studies: (a) research method, (b) data collection method, (c) quality criteria indicators, and (d) quality guidelines. Information about these properties is presented in Table 1.

Regarding the quality criteria indicators, i.e. property (c), the choice of what quality criteria to use was not uncontested. We recognize that using the same set of criteria for all qualitative methods was not optimal and that we, to compile a detailed and nuanced assessment, would have to apply different sets for different methods based on their underpinning epistemological orientation. However, our aim was not to compile a detailed assessment of each individual study. Instead, we strove to create general insights into the quality of the collective studies and we, therefore, argue that a common set of quality criteria for all types of research methods is sufficient.

Moreover, as stated in the introduction, we wanted to use quality criteria previously proven in social science. Based on a general assumption that such criteria improve as the body of knowledge of qualitative research evolves and grows, we excluded criteria collections older than ten years. We also argue that a quality indicator, although weak, for such collections is their usage, which we estimated using citations.

Two criteria collections that met our requirements were the COREQ checklist tong2007consolidated and the “Big-Tent” criteria tracy2010qualitative. Our choice fell on the former for two reasons. First of all, we found the criteria in the COREQ to be easier to objectively assess compared to those in the “Big-Tent”. As an example, among the eight “Big-Tent” criteria, one is linked to how interesting the topic is, which clearly introduces a high degree of subjectivity. Second, we argue that the process used to compile the COREQ checklist is more structured and well-documented compared to the “Big-Tent”, which, according to us, serves as an indicator of quality.

The COREQ checklist, presented in Table 4, consists of 32 criteria. Each criterion holds descriptive information in the form of guiding question(s). The criteria are grouped into the following eight themes: personal characteristics, relationship with participants, theoretical framework, participant selection, setting, data collection, data analysis, and reporting. It should be noted that some of the criteria in COREQ only are applicable to studies using interviews or focus groups as data collection.

As an overarching goal, we strove to make the quality evaluation process as simple and straightforward as possible, thereby making it less affected by the researcher’s prior knowledge, beliefs, and personal opinions. The result of the quality analysis of each criterion in the criteria collection was therefore binary, i.e. either the publication met the criterion or it did not. In general, a criterion was considered fulfilled if the publication provided an answer to the guiding question(s) associated with the criterion. We only assessed if an answer was provided, not the quality of the answer.

As a consequence of this simplicity goal, we decided not to assess the criteria in the theme reporting, i.e. criteria 29 to 32. We deemed that the guiding questions for these criteria required an in-depth analysis of consistency between aspects of the publication’s content.
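The binary assessment described above lends itself to a simple aggregation per theme. The sketch below assumes that a theme score is the share of fulfilled (study, criterion) pairs within that theme; the paper does not spell out the exact formula, so this is an assumption. The function name `theme_scores`, the criterion-to-theme grouping, and the assessment data are all hypothetical.

```python
def theme_scores(assessments, themes):
    """Return the percentage of fulfilled criteria per theme.

    assessments: {study_id: {criterion_number: bool}}
    themes:      {theme_name: [criterion_number, ...]}
    A criterion missing from a study's record counts as not fulfilled.
    """
    scores = {}
    for name, criteria in themes.items():
        total = len(assessments) * len(criteria)
        met = sum(
            bool(study.get(criterion, False))
            for study in assessments.values()
            for criterion in criteria
        )
        scores[name] = 100.0 * met / total
    return scores


# Illustrative grouping and made-up binary assessments for four studies.
THEMES = {"B": [6, 7, 8], "C": [9]}
ASSESSMENTS = {
    "S1": {6: True, 7: False, 8: False, 9: True},
    "S2": {6: False, 7: False, 8: False, 9: True},
    "S3": {6: False, 7: False, 8: False, 9: True},
    "S4": {6: False, 7: False, 8: False, 9: False},
}

scores = theme_scores(ASSESSMENTS, THEMES)
for theme, score in scores.items():
    print(f"{theme}: {score:.0f}%")
```

Because each cell is only true or false, a single criterion says little on its own; the aggregate per theme is what carries information.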

Property | Description and examples
(a) Research method | The research method comprises all processes that are used by the researchers during the study. Examples of research methods are grounded theory, thematic analysis, ethnography, phenomenology, discourse analysis and narrative analysis.
(b) Data collection method | Data collection is the process of systematically gathering or measuring information that enables the researchers to answer stated research questions. Qualitative data are varied in nature and can include any non-numerical information. Some of the major collection methods include interviews, focus groups, observation, and written texts patton2005qualitative.
(c) Quality indicators | Almost thirty quality criteria based on the COREQ checklist defined by Tong et al. tong2007consolidated were used. See Table 4 for more details.
(d) Quality guidelines | We extracted whether the authors had used any qualitative guidelines or checklists, e.g. the COREQ checklist tong2007consolidated or the “Big-Tent” criteria tracy2010qualitative, to ensure the quality of their study.
Table 1: Extracted properties

The analysis of the properties was straightforward and consisted only of a quantification and summarization of the extracted data. We determined which methods were used and measured their frequency. The analysis of the third property, i.e. the quality indicators, was slightly more complicated. Since each of the assessed criteria was dichotomous and non-nuanced, we could not draw any decisive conclusions from the result of a single criterion. Instead, our findings needed to be drawn based on a cluster of criteria all supporting the same result.

3.1.6 Threats to validity

The sample size of the literature review was small and we have clearly not been able to capture all qualitative software engineering studies for the selected period. Still, we argue that the sample size is large enough to provide a representative part of all studies and that we, since we used a structured review method, have not introduced any systematic errors that could affect our findings.

Moreover, the selection and the data extraction process were mainly conducted by a single researcher. This approach is not as robust as having several researchers conducting the complete extraction in parallel.

We cannot guarantee that no publications have mistakenly been excluded or missed. Therefore, in our analysis, we have been careful and only drawn conclusions when the data has been strong and unequivocal. No conclusions have thus been drawn based on a few publications only.

3.2 Result

An overview of the result is presented in Table 2.

Property | Result
(a) Research method | Grounded theory (13 studies), Content analysis (2), Thematic analysis (1), Ethnography (1) and Interaction analysis (1)
(b) Data collection method | Interview (20 studies), Written text (5), Observation (4) and Focus group (1). Six of the twenty-three included studies used more than one collection method.
(c) Quality indicators | A: Personal characteristics 28%; B: Relationship with participants 17%; C: Theoretical framework 78%; D: Participant selection 40%; E: Setting 33%; F: Data collection 35%; G: Data analysis 28%
(d) Quality guidelines | No reference to any guideline or checklist was found.
Table 2: Empirical overview of the result for the extracted properties.

(a) Research method

As shown by the first row, software engineering researchers seem to prefer grounded theory as research method, which was used in over half of the studies (13 of 23, about 57%).

(b) Data collection method

The most favored data collection method was interviews, employed in 20 of the 23 studies (about 87%). It is also worth noting that about a quarter of the studies (6 of 23) collected data using more than one technique.
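The shares quoted for research and data collection methods are rounded; they can be recomputed directly from the counts in Table 2. The sketch below does only that arithmetic. The counts are taken from Table 2, while the helper name `share` is hypothetical.

```python
TOTAL_STUDIES = 23  # number of included publications

# Counts as reported in Table 2.
method_counts = {
    "Grounded theory": 13,
    "Content analysis": 2,
    "Thematic analysis": 1,
    "Ethnography": 1,
    "Interaction analysis": 1,
}
collection_counts = {
    "Interview": 20,
    "Written text": 5,
    "Observation": 4,
    "Focus group": 1,
}
multi_method_studies = 6  # studies using more than one collection method


def share(count, total=TOTAL_STUDIES):
    """Percentage of the included studies, rounded to one decimal."""
    return round(100 * count / total, 1)


print("Grounded theory:", share(method_counts["Grounded theory"]))  # 56.5
print("Interview:", share(collection_counts["Interview"]))          # 87.0
print("More than one method:", share(multi_method_studies))         # 26.1
```

Note that the collection method counts sum to more than 23 because some studies used several methods.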

(c) Quality indicators

The collected numerical data show that in a majority of the publications (78%) the authors stated what research method they had used (theme theoretical framework (C)). That means, however, that a fifth of the included papers did not state or describe the research method.

As the table shows, the theme relationship with participants (B) had the lowest quality score. Worth noticing, and somewhat alarming, is that in less than ten percent of the studies the researchers discussed their assumptions (criterion 8 in Table 4). This indicates that software engineering researchers seldom reflect on their bias (utilize reflexivity), or, at least, that these contemplations are not presented in the publications.

The quality score for personal characteristics (A) was moderate, 28%. In addition, in the publications that described the personal characteristics, this information was consistently reported in a separate section at the very end of the paper. This information was added primarily because it was required by the journals as a part of their standard template, not as an active choice made by the researchers to raise the credibility of the findings. These, often brief, presentations of the authors seldom provided any information about their experience of conducting qualitative research.

As for the design of the studies, the statistical data show that software engineering researchers frequently detail the duration of the focus group sessions or interviews, describe what recording equipment was used, and present an interview guide. Nonetheless, our results indicate that data collection seems to be a one-off event rather than a continuous dialog. For example, interviews were rarely repeated (criterion 18), transcripts were seldom sent back to the participants for review (criterion 23), and the participants were rarely provided feedback on the findings (criterion 28).

In addition, a key feature of the most commonly used research method is data saturation, which means that researchers reach a point in their analysis at which sampling more data will not lead to more information related to their research questions. Interestingly enough, saturation was discussed in less than a third of the publications (criterion 22).
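To make the notion of saturation described above more concrete, the following is a minimal bookkeeping sketch, not a procedure taken from the reviewed studies. It assumes that each interview has already been coded into a set of code labels, and that "no new codes in a number of consecutive interviews" is an acceptable working signal; the function names, the `window` parameter, and the example codes are all hypothetical. Whether saturation has actually been reached remains a judgment for the researcher.

```python
from typing import List, Optional, Set


def new_codes_per_interview(coded: List[Set[str]]) -> List[int]:
    """Count, for each interview in order, the codes not seen in earlier interviews."""
    seen: Set[str] = set()
    counts: List[int] = []
    for codes in coded:
        counts.append(len(codes - seen))  # codes that are new at this point
        seen |= codes
    return counts


def saturation_point(coded: List[Set[str]], window: int = 2) -> Optional[int]:
    """Return the 1-based index of the first interview in the first run of
    `window` consecutive interviews that add no new codes, or None.

    Assumption: `window` is an arbitrary threshold chosen by the researcher.
    """
    run = 0
    for i, n_new in enumerate(new_codes_per_interview(coded), start=1):
        run = run + 1 if n_new == 0 else 0
        if run == window:
            return i - window + 1
    return None


# Hypothetical coded interviews (code labels are invented for illustration).
interviews = [
    {"trust", "deadlines"},
    {"trust", "autonomy"},
    {"deadlines"},
    {"autonomy", "trust"},
]

print(new_codes_per_interview(interviews))  # [2, 1, 0, 0]
print(saturation_point(interviews, window=2))  # 3
```

Reporting such a per-interview count alongside the reasoning behind the stopping decision is one possible way of documenting how saturation was assessed.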

Finally, most papers reported the number of participants (criterion 12) and presented some characteristics of the sample participants (criterion 16). Almost half of them also provided at least some clues as to how the participants were selected (criterion 10). Still, few provided information on how they approached the participants (criterion 11), and even fewer (only one) presented how many refused to participate (criterion 13).

(d) Quality guidelines

No reference to any qualitative guideline or checklist was found in any of the 23 included publications.

4 Discussion

The purpose of this study was to extend the community’s current body of knowledge regarding available qualitative methods and provide recommendations and guidelines for their use. Grounded in the results of a literature review, we have identified three areas of improvement. We argue that software engineering qualitative research would benefit from (1) utilizing a broader set of research methods, (2) more strongly emphasizing reflexivity, and (3) employing qualitative guidelines and quality criteria. These areas are detailed in the following sections.

4.1 A broader set of research methods

According to social science researchers willig2013introducing; smith2007qualitative; camic2003qualitative, the most commonly used qualitative methodological approaches are grounded theory, thematic analysis, ethnography, phenomenology, narrative analysis, and discourse analysis. Our review implies that, among these methods, grounded theory, thematic analysis, and ethnography are established in the software engineering research community. This indication is further strengthened by the fact that the use and applicability of grounded theory and ethnography in software engineering context have, favorably, been reviewed and scrutinized in papers by Stol et al. stol2016grounded and Sharp et al. sharp2016role.

Moreover, our review did not capture any publications that used phenomenology, narrative analysis, or discourse analysis. In addition, to the best of our knowledge, no software engineering related publications exist that describe or employ these methods. Social science researchers have, nonetheless, repeatedly recognized that these methods add value and that they are viable options when investigating organizational life chia2000discourse; boje2004language; czarniawska1997narrative; feldman2004making; weick1995sensemaking; gill2014possibilities. For example, Weick weick1995sensemaking suggests that stories or narratives are used to make sense of the complexity of organizational life, and that they can hold information and influence organizational decision making martin1983uniqueness. Moreover, Chia chia2000discourse states that understanding organizational discourses is paramount for a deeper appreciation of the underlying motivational forces.

We argue that phenomenology, narrative analysis, and discourse analysis could also contribute to the understanding of software engineering organizations and that much of the potential scope and value of these methods remains unrealized. We do not, however, claim that the researchers of the studies included in our review have chosen improper methods, and we do not contend that the three methods should supplant existing well-established qualitative approaches. Rather, with a more varied toolbox to choose from, we believe that it would be possible to formulate complementary research questions, highlight different aspects of software engineering phenomena, and thereby yield more comprehensive insights.

Table 3 below provides a brief overview of these methods and guidance on when they could be applicable. With the ambition of raising interest in and curiosity about these qualitative methods, we present a somewhat more detailed description of them in section 6, where we also discuss their limitations and challenges.

When to use?
Phenomenology: When interested in how software engineers make sense (experience) of a specific phenomenon in a given situation.
Narrative Analysis: When interested in how software engineers create meaning in their lives as stories (narratives). Compared to IPA, use narrative inquiry when you are interested in how a chain of experiences is woven into a narrative, not the experience in and of itself.
Discourse Analysis: When interested in exploring socially well-established meanings or ideas around a topic that shape how software engineers can talk about it. To uncover how language is used to accomplish personal, social, and political projects.

Research questions (examples)
Phenomenology: R1. How does a software engineer experience organizational loyalty? R2. How do people make the decision to become a software engineer?
Narrative Analysis: R1. How do individual software engineers come to know their experience of the changes in ways-of-working that followed the introduction of agile methods? R2. What is the senior software engineer’s story of the experience of transferring to a team-based organization?
Discourse Analysis: R1. What discourses exist in software engineering organizations and how do they empower some groups or roles while dis-empowering others? R2. How do software engineers construct team-identities within agile teams?

Research outcome (examples)
Phenomenology: R1. An in-depth description of the essential structure of the organizational loyalty experience. R2. A portrayal of the factors and experiences that shape and influence the decision.
Narrative Analysis: R1. Narrative that accounts for software engineers’ experience of changes in their ways-of-working. R2. Narrative of a senior software engineer’s experience of transferring to a team-based working environment.
Discourse Analysis: R1. Description of how different discourses shape relationships and how social goods are negotiated and produced in a software engineering organization. R2. A summary of discourses that strengthen or weaken team-identities in agile teams.

Reference studies
Phenomenology: The transition to motherhood in an organizational context: An interpretative phenomenological analysis by Millward millward2006transition; Elite identity and status anxiety: An interpretative phenomenological analysis of management consultants by Gill gill2015elite.
Narrative Analysis: ‘I am not a tragedy. I am full of hope’: communication impairment narratives in newspapers by Malley malley2014not; Complexities of identity formation: A narrative inquiry of an EFL teacher by Tsui tsui2007complexities.
Discourse Analysis: Cognitive organization and identity maintenance in multicultural teams: A discourse analysis of decision-making meetings by Aritz and Walker aritz2010cognitive; Articulating circumstance, identity and practice: toward a discursive framework of organizational changing by Jian jian2011articulating.

Table 3: The table presents an overview of qualitative methods and provides examples of applications of these methods in software engineering.

4.2 Emphasize reflexivity

Furthermore, our findings indicate that software engineering researchers seldom reflect on their assumptions and biases. Previous social science research has identified reflexivity as a crucial strategy in the process of generating knowledge by means of qualitative research in general gough2016reflexivity; malterud2001qualitative. To raise the quality, credibility, and trustworthiness of such qualitative research, it is important that the researchers, throughout the study, reflect on their opinions and how these affect their decisions and the findings, but also that they report this in their publications.

We argue that the quality of qualitative software engineering research would improve if researchers emphasized reflexivity. In addition to being a crucial strategy in qualitative research in general, we assert that reflexivity is of special importance in software engineering qualitative research. Such research is often conducted by researchers with a background in software engineering lenberg2015behavioral, which indicates that the researchers might have preconceived opinions of the phenomena under investigation and that the risk of research bias is consequently relatively higher.

In section 5 below we provide information regarding reflexivity and means of fostering it.

4.3 Utilize qualitative guidelines or quality criteria

We acknowledge that using guidelines or checklists as a mechanism to ensure quality might mislead qualitative researchers reynolds2011quality. What constitutes quality does not rely solely on detailed guidelines, and too much focus on checklists and processes risks creating anxieties that hinder creativity and practice seale1999quality. Still, since the software engineering research community, in general, is unfamiliar with qualitative studies, we believe that the benefits outweigh the risks. Taken in small doses, guidelines can help to guard software engineering researchers against more obvious errors and also help to frame qualitative work as systematic and structured legreco2009discourse; tracy2010qualitative.

Therefore, we recommend that software engineering researchers use the COREQ checklist as general guidance for ensuring quality. We deem that the level of detail of the criteria defined in COREQ is aligned with the knowledge level of the software engineering research community and that these criteria are relatively method independent.

For method-specific guidelines compiled for software engineering research, we recommend Stol et al. stol2016grounded for grounded theory, Sharp et al. sharp2016role for ethnography, Runeson and Höst runeson2009guidelines for case studies, and Defranco et al. defranco2017content for content analysis.

5 Reflexivity

Reflexivity is regarded as a defining feature of qualitative research gough2016reflexivity; malterud2001qualitative. Still, even if its importance is recognized, to the best of our knowledge, no commonly accepted definition exists. It is, however, commonly accepted that reflexivity is based on a recognition that the researchers are part of the social world that they study, and that a key aspect is to make the relationship between the researcher and the participants as explicit and transparent as possible palaganas2017reflexivity.

In qualitative studies, the researcher is considered the primary instrument of data collection and analysis gough2012subjectivity; malterud2001qualitative. The researcher’s body of knowledge can thus be utilized to gain new and deeper insights into the phenomena under study.

Reflexivity involves thinking about how our thinking came to be and how pre-existing understanding is constantly revised in the light of new insights watt2007becoming; morrow2005quality; malterud2001qualitative. It entails awareness of the fact that the researchers’ involvement affects the research process watt2007becoming and could be viewed as a state of being that permeates all research phases, including the formation of research questions, data collection, and data analysis guillemin2004ethics; bradbury2007enhancing; berger2015now.

In a study with effective reflexivity, the researchers are able to also treat themselves as objects of inquiry smith2006encouraging. A reflexive researcher is sensitive to the ways in which she or he and the research process have shaped the collected data. Personal and intellectual biases need to be made plain at the outset of any research report to enhance the credibility of the findings mays2000assessing.

According to Russell and Kelly russell2002research, the absence of reflexivity may lead to acceptance of what is apparent and thereby obscure unexpected possibilities. In addition, if reflexivity is thoroughly maintained, personal issues can be valuable sources for relevant and specific research.

Previous research has highlighted three main advantages of reflexivity. First of all, it is used to raise the trustworthiness of the study by making it more open and transparent, which is achieved by identifying and reporting the researchers’ values, beliefs, knowledge, and biases buckner2005taking; macbeth2001reflexivity; berger2015now.

Second, reflexivity enhances the quality of the research by letting researchers reflect on who they are and their relation to the phenomena, which may assist the process of constructing new insight berger2015now. However, the investigator should take care not to confuse knowledge intuitively present in advance with knowledge emerging from the material in the study. Such situations can possibly be avoided by declaring beliefs before the start of the study malterud2001qualitative.

Third and finally, reflexivity helps to keep the research process ethical by helping to address concerns regarding negative effects of power in the researcher-to-participant relationship berger2015now; pillow2003confession. It helps maintain the ethics of the relationship between researcher and researched by equalizing their status, while acknowledging that the interpretation of the findings is always done through the eyes and cultural standards of the researcher frisina2006back; josselson2007ethical.

Even though the benefits of being reflexive are significant, it also comes at a cost for the researchers. It forces them to be transparent and expose their flaws, inner thoughts, and reasoning, which, potentially, could cause embarrassment or even shame. To protect themselves from being unmasked in the public research arena, scientists are reluctant to include human emotions and in-depth self-reflection in their publications smith2006encouraging.

Drawing upon this, we argue that software engineering researchers’ usage of reflexivity is, at least partially, affected by the culture and norms of their various peer-groups, e.g. their research team, their university, and the software engineering community at large. An environment where the researchers feel that it is safe to open up without being exploited creates conditions that foster reflexivity. In a culture where genuineness and authenticity are the norm, scientists will likely be more willing to reflect and communicate their honest thoughts, reasoning, and true feelings. The opposite is, however, also true. An environment where thoughts and feelings are considered signs of weakness that could be held against you does not facilitate reflexive behavior among the researchers.

True reflexivity is a state of being and cannot be encouraged only by means of guidelines and quality checklists. Reflexivity should permeate all aspects of qualitative research and therefore needs to be a natural part of the software engineering researchers’ professional identity. We thus claim that raising the level of reflexivity in qualitative software engineering research is a joint community effort that calls for changes in several areas, from which mandatory courses are included in Ph.D. education to how the reviews of qualitative research are conducted.

Still, there are several concrete activities that individual researchers can perform in their studies that foster reflexivity. One of the most valuable is for the researcher to keep a self-reflective journal from the inception to the completion of the investigation morrow2005quality. In it, the researchers aim to keep track of their feelings, reactions, and assumptions or biases that have surfaced. Writing them down makes it easier to examine and understand them, so that they can be set aside to a certain extent or consciously incorporated into the analysis, depending on the frame of the researcher morrow2005quality.

Furthermore, it could also be useful to consult a research team or peer for dialog and discussions. They can serve as a mirror, reflecting the researchers’ responses to the research process, or they may act as devil’s advocates and propose alternative interpretations to those of the researcher. Other strategies for maintaining reflexivity include repeated interviews with the same participants, triangulation, and peer review berger2015now; morrow2005quality; frisina2006back; russell2002research; bradbury2007enhancing; padgett2016qualitative.

6 Method overview

In an attempt to bolster the interest in and broaden the palette of qualitative research methods used by the software engineering research community, thereby enabling researchers to make well-founded decisions regarding choice of methodology, we describe three qualitative research methods: interpretative phenomenological analysis, narrative analysis, and discourse analysis. We also discuss their limitations and challenges.

6.1 Interpretative Phenomenological Analysis

The purpose of interpretative phenomenological analysis (IPA) is to, in depth, explore the processes through which people make sense of their experiences in the social world. A basic assumption is that individuals are actively engaged in interpreting events, objects, and people in their lives. The phenomena explored by IPA are, usually, of some personal significance to the participants, e.g. life events, relationships or phenomena encountered in life. IPA researchers attempt to understand what it is like to stand in the shoes of the subject smith2011evaluating; smith2015qualitative; pietkiewicz2014practical; brocki2006critical.

The foundation of IPA was first described and conceptualized in the mid-90s smith1997interpretative; jonathan2009interpretative. Early in the development, IPA was mainly used in health psychology. However, IPA has rapidly grown and become one of the best known and most commonly used qualitative methodologies in psychology smith2011evaluating; willig2007reflections. In recent years, it has branched out into other applied psychologies, e.g. clinical, counseling, educational, and occupational jonathan2009interpretative.

IPA draws upon and has its theoretical roots in the fundamental principles of phenomenology, hermeneutics, and idiography pietkiewicz2012praktyczny.

Rather than focusing on describing phenomena according to scientific criteria, phenomenological studies focus on how people perceive and talk about objects and events. Phenomenology is primarily concerned with attending to the way things appear to individuals as experiences smith2011evaluating; pietkiewicz2012praktyczny.

A key concept in phenomenology is the lifeworld makkreel1982husserl. The concept, which holds dual components as it is both personal and intersubjective, is multifaceted and complex, and we thus cannot claim to do it full justice in this paper. The lifeworld comprises the world of objects around us as we perceive them and our experience of our self, our body, and our relationships. It is the world which people can experience together, i.e. the common ground that we can share moran2012husserl. An individual’s lifeworld consists of the beliefs that form her or his everyday attitude towards herself or himself, the objective world, and other people.

According to phenomenology, to get a clear view of the lifeworld and gain a true understanding of a phenomenon it is necessary to disregard (bracket) preconceptions and judgments. This process is known as epoché or phenomenological reduction. Through this, phenomenology researchers can better uncover the essential and unique components that form a given phenomenon pietkiewicz2012praktyczny.

While phenomenology uncovers meanings, hermeneutics interprets those meanings backstrom2007meaning. IPA research requires a two-stage interpretation process in which a subject is trying to make sense of their world, and the researcher, in turn, is trying to make sense of the subject trying to make sense of their world. This process has been called a double hermeneutic smith2011evaluating. It requires engagement and interpretation on the part of the researcher, which connects IPA to a hermeneutic perspective.

The idiographic component of IPA refers to an in-depth analysis of single cases and individual perspectives. IPA focuses on the particular, rather than the general or universal pietkiewicz2012praktyczny. It involves a detailed examination of each case followed by a search for patterns across the cases smith2011evaluating. The aim of an IPA study is to, in detail, present the perceptions and understandings of a particular phenomenon rather than prematurely make more general claims.

IPA studies usually utilize small, purposely selected samples, partly because analyzing the transcripts requires considerable effort. IPA researchers strive for a fairly homogeneous sample extracted from a closely defined group for whom the research question is significant and who, usually, have an understanding of the topic larkin2012interpretative; pietkiewicz2012praktyczny.

The research questions in IPA studies are usually framed broadly and openly since the intent is exploratory rather than explanatory larkin2012interpretative. IPA is a suitable method when one is trying to find out how individuals perceive a particular situation they are facing, and how they are making sense of their personal and social world smith2015qualitative. It is especially useful when one is concerned with complexity, process, or novelty brocki2006critical.

In relation to software engineering, we suggest that IPA would be the preferred choice when exploring software engineers’ individual accounts of a number of tasks. Today, as a result of the agile transformation, software engineering organizations often emphasize and focus on groups. In fact, the group has replaced the individual as the most important entity. IPA could thus be used to counterbalance this focus and provide a detailed and nuanced description of an individual experience. Examples of possible questions that could be addressed using IPA are “How does a software engineer experience the transition from university to working life?” and “What does organizational loyalty mean to a software developer?”.

6.2 Narrative Analysis

In an experiment conducted in the 1940s, psychologists Fritz Heider and Marianne Simmel demonstrated the importance of stories to humans heider1944experimental. In the experiment, the participants were shown a sequence of pictures that included abstract shapes such as squares, triangles, and lines. When the participants were later asked to describe the pictures, they replied by telling short stories.

According to Murray murray2003narrative, humans live their lives through stories and describe their experiences and their selves in terms of stories. Humans need stories to make sense of their lived experience, as stories can help to connect past, present, and future. Stories allow them to maintain a coherent self-identity kugelmann2001introducing; ricoeur2010time. It is through the use of stories that we define who we are, who we were, and who we will be in the future.

Narrative research riessman2005narrative; kohler2000analysis is the study of stories. It seeks to uncover how humans make sense of an ever-changing world, based on a belief that it is through a narrative that we can bring a sense of order to the seeming disorder in our world murray2003narrative. It is often focused on life experiences of a single event or a series of events for a small number of individuals creswell2013research.

A pioneer of narrative research is the American psychologist Theodore R. Sarbin. The term narrative psychology was introduced in 1986 in his book Narrative Psychology: The storied nature of human conduct sarbin1986narrative. Sarbin claimed that human behaviors are best explained using stories and that narrative should be a root metaphor in psychology. He also argued that narratives should be identified through qualitative research crossley2000introducing.

Narrative research examines how people construct their self-accounts and it is often used for scrutinizing how people manage their different senses of self burck2005comparing. It can, however, also lend itself to a global view of human experiences.

Moreover, narrative research data can be anything that provides information and details to a contextualized story. Examples of data are observations, diaries, letters, interviews, artifacts, and photographs petty2012ready. Still, the primary data source is the interview murray2003narrative.

One type of narrative interview is the so-called life-story interview, which aims to capture an extended account of the participant’s life. This type of interview is complicated, and several interview occasions are often needed in order for the participant to feel secure enough to reflect on her or his life experiences murray2003narrative. Another interview type is the episodic interview flick2014introduction, where the participants are encouraged to tell stories about particular experiences or disruptive episodes in their lives.

According to Murray murray2003narrative, the analysis of narrative accounts can be divided into two broad phases, i.e. the descriptive phase and the interpretive phase. In the first phase, the researchers briefly summarize the narratives, identify their beginning, middle, and end, and capture their overall meaning and any particular issues raised by them. In the second phase, the researchers go beyond the descriptive and connect the narrative with the broader theoretical literature that is being used to interpret the story. This phase thus requires a simultaneous and deep intimacy with the narrative accounts and with the relevant literature.

In narrative studies, the collected data and the analyzed result may answer more than the research question alone. Even if a narrative inquiry is focused on a particular experience, it often reveals additional aspects of life that are not identified as the primary focus of the study overcash2003narrative. Therefore, the research questions should be detailed enough to provide guidance in the research, but they should, at the same time, allow a high degree of flexibility. Often, the research questions of a narrative inquiry are refined during the research process as more insights into the phenomena are gained shkedi2005multiple; connelly2000narrative.

In relation to software engineering, we argue that narrative research could be used to explore a software engineer’s experiences of a number of tasks. As an example, software development has, over the last twenty years, gone from being an individual occupation to being an occupation that requires teamwork and emphasizes collaboration and cooperation. We think that it would be interesting to use a narrative inquiry to gain insights into a software developer’s experiences of this major transformation process. In addition, we believe that the narrative method would be helpful in understanding software engineers’ professional identity koller2012analyse.

6.3 Discourse Analysis

Discourse analysis is a broad term for the many traditions and methods by which discourse may be identified, defined, and analyzed morgan2010discourse. Discourse analysis is related to grammar analysis, but there are differences. Grammar analysis includes observation of sentence structure, word usage, and stylistic choices on the sentence level. Even if such analysis might include entities like culture, its focus is not human spoken discourse. Discourse analysis, in turn, observes the conversational and cultural use of language by its native population.

In discourse analysis, the words themselves are virtually meaningless to us. It is through the shared, mutually agreed use of language that meaning is created. The language shapes our understanding of reality and defines the creation and maintenance of social norms, the construction of personal and group identities, and the negotiation of social and political interaction crowe1998power; gee2014introduction; lyons1968introduction; chandler2007semiotics; starks2007choose.

In a broad sense, discourses are defined as systems of meaning that are related to the interactional and wider sociocultural context and operate regardless of the speakers’ intentions georgaca2012discourse. It should, however, be noted that there are many different traditions within discourse analysis and that these use their own, slightly different, definitions. There are also different views on the degree to which the individual is an actor in forming discourse or is influenced by existing discourses. As an example, within Foucauldian research, discourse is defined as a group of statements, objects or events that represent knowledge about, or construct, a particular topic. Here, language is viewed as a social performance or a social action. It both creates social phenomena and is representative of social phenomena.

Given its emphasis on construction and function, discourse analysis does not make claims about the reality of people’s lives or experiences. Instead, it examines the ways in which reality and experience are constructed through social and interpersonal processes by the use of language georgaca2012discourse; starks2007choose. Discourse studies often do not provide a precise answer to a specific problem. Instead, they provide an understanding and a clarification of the essence of the problem and the underlying assumptions that enable its existence. They thus present a deeper and more exhaustive view of the problem and how we are affected by it. In addition, discourse analysis can be used to reveal implicit and unacknowledged aspects of human behavior and, for example, to make salient either latent or dominant discourses in society.

Discourse analysis can be applied to any type of text, i.e. to anything that has meaning. However, most studies tend to analyze written or spoken language georgaca2012discourse; parker2015critical. Regarding data sampling and size, discourse analyses often rely on relatively small numbers of participants or texts, partly because the analysis is very labour-intensive and large amounts of data would be prohibitive georgaca2012discourse; however, the sample size ultimately depends on the study objective. It is possible to analyze a single participant in depth and compare the result with written documents. On the other hand, if the objective is to understand variations in the language used across persons or settings, a larger sample is required starks2007choose. The discourse is independent of context and exists whether it is defined in a conversation in the sauna or in the boardroom.

We recognize that discourse analysis could add value and insights to the research area of software engineering. For example, we believe that it would be a viable option for generating knowledge regarding the transformation of the meaning of the agile concept. Even though the agile approach has a definition and has been around for well over a decade, we argue that the values and beliefs that govern the concept have changed over time and vary between different organizational roles. A deeper understanding of agile concepts would, for example, provide valuable information that could be used to improve large organizations’ transition to an agile methodology.

In addition, we think that discourse analysis could be used to understand power relations in agile software organizations. It could possibly shed light on the relationship between discourse and power, e.g. which discourses exist in software engineering organizations and how they empower some groups while dis-empowering others.

6.4 Limitations and challenges

Many of the limitations and challenges are shared among the three described methods and also with qualitative methods in general. First of all, there are ethical challenges that the researchers must consider. The central principles of research ethics are informed consent, confidentiality, and avoidance of harm, all of which create dilemmas for qualitative studies houghton2010ethical.

Qualitative inquiries are discovery-oriented, and it is hard to anticipate up front which discoveries about human behavior, thoughts, or feelings will emerge. In such situations, informed consent becomes complicated since researchers cannot predict the scope of the study. Formulating a description that offers participants a comprehensive account of what they will experience in the study clearly becomes challenging mcleod1996qualitative; houghton2010ethical. This is certainly true for narrative analysis and also for IPA, where the boundaries are loose and the researchers are encouraged to have broad and inclusive research questions.

In addition, confidentiality can be a major challenge, particularly for narrative analysis. In a narrative inquiry, the stories that unfold during interviews are likely to be unique, holding several clues and markers that could possibly identify the interviewee, which makes confidentiality hard to guarantee. The risk could to some extent be mitigated by allowing informants to read drafts prior to publication. There are, however, limits to this procedure, since the informants may not fully appreciate what may happen when the publication enters the public domain mcleod1996qualitative.

The third ethical consideration, i.e. avoidance of harm, involves predicting the risk-benefit ratio of the research, which is often difficult in qualitative research cutcliffe2002leveling. Yet again, since narrative inquiry and IPA are both flexible methods, it is hard to control the direction of the interview, which makes it hard for the researcher to foresee the result of interview questions and avoid potentially stressful situations for the interviewee.

Second, the nature of data collection in the three methods, observation and interviews alike, raises ethical issues related to how the relationships are formed and managed, to the nature of the power involved, and to how the relationship affects the participants ginger2004toward; houghton2010ethical. There is also a high risk that the narrative and IPA styles of interview become transformative or even therapeutic. This leaves a heavy burden of responsibility on the shoulders of the researcher, especially when the topic is sensitive to the interviewee liamputtong2013qualitative; stuhlmiller2001narrative; hunter2010analysing.

Third, a common critique of IPA and narrative analysis is their dependence on language. The informants may not always be able to accurately convey the subtleties and nuances of their experience willig2013introducing. According to Smith et al. smith1997interpretative, this could be managed by a professional researcher who can interpret the participant's emotional state and ask follow-up questions.

This dependence on language is also a challenge for the researcher. It can be difficult to put into words the rich knowledge extracted from qualitative inquiry, which may hold information that is subtle, hidden, and contextually bound kapoulas2012understanding. In addition, there are inherent ambiguities in human language that the researcher needs to recognize in the analysis atieno2009analysis. As an example, the word blue could signify the color, a political orientation, or a state of mind.

Fourth, another challenge shared among the three methods is their process flexibility and their researcher dependence. Of the elements and processes of qualitative research, the analysis is often the most sensitive kapoulas2012understanding. IPA and narrative analysis do not provide any formula for deciding which parts of the data to highlight. What to emphasize thus relies heavily on the researcher and her or his experience and knowledge wiles2005narrative. This implies that different conclusions can be derived from the same information depending on the personal characteristics of the researcher maxwell2012qualitative.

Moreover, even though IPA includes a comprehensively described process consisting of several well-defined steps, the researchers are encouraged to engage with it flexibly and adapt the process to the phenomenon under investigation smith1997interpretative. This flexibility and the engaged role of the researcher bring potential challenges to IPA studies cronin2015brief. IPA has been criticized for not providing guidelines on how to incorporate reflexivity (see section 5) into the process and for not specifying how researcher conceptions influence the analysis. As a researcher, it might be difficult to keep the balance between being fully engaged while, at the same time, remaining unbiased foster2010adolescents; clancy2013reflexivity.

Method-related flexibility is also challenging in discourse analysis. Generally, proponents of discourse analysis believe that meaning is never fixed and that everything is therefore always open to interpretation and negotiation. Moreover, the vast number of options available through the various traditions might cause methodological problems, since each tradition has its own epistemological position, concepts, procedures, communication, and particular understanding of discourse morgan2010discourse.

Moreover, the aim of a narrative inquiry is not to find one generalizable truth, but rather many truths or narratives. Since these narratives are created between the participant and the researcher in a particular social and cultural context, this raises the question of whether research findings can be seen as valid. However, it has also been argued that if a phenomenon exists in one setting, it is plausible to believe that it also exists in others polkinghorne2007validity; hunter2010analysing.

7 Conclusion

Supported by the results of a literature review, we conclude that future qualitative studies would benefit from utilizing a broader set of research methods, more strongly emphasizing reflexivity, and employing qualitative guidelines and quality criteria.

Three qualitative methods that are frequently used by social science researchers to explore organizational life, and that could potentially also add value to software engineering research, are interpretative phenomenological analysis, narrative analysis, and discourse analysis.

Moreover, we argue that reflexivity is highly important in software engineering studies, since the researchers often have preconceived opinions of the phenomena under investigation and the risk of bias is therefore high.

Finally, we recommend that qualitative software engineering researchers utilize quality criteria, such as the COREQ checklist or the Big-Tent criteria, as guidance when conducting their studies.

8 Acknowledgments

We acknowledge the support of the Swedish Armed Forces, the Swedish Defense Materiel Administration, and the Swedish Governmental Agency for Innovation Systems (VINNOVA) under project number 2013-01199. Daniel Graziotin has been supported by the Alexander von Humboldt (AvH) Foundation.

References

9 Appendix A

Criterion Guiding question(s)
Domain 1: Research team and reflexivity
A: Personal Characteristics
1. Interviewer/facilitator Which author/s conducted the interview or focus group?
2. Credentials What were the researcher’s credentials?
3. Occupation What was their occupation at the time of the study?
4. Gender Was the researcher male or female?
5. Experience and training What experience or training did the researcher have?
B: Relationship with participants
6. Relationship established Was a relationship established prior to study commencement?
7. Participant knowledge of the interviewer What did the participants know about the researcher?
8. Interviewer characteristics What characteristics were reported about the interviewer/facilitator? Bias?
Domain 2: Study design
C: Theoretical framework
9. Methodological orientation and theory What methodological orientation was stated to underpin the study?
D: Participant selection
10. Sampling How were participants selected?
11. Method of approach How were participants approached?
12. Sample size How many participants were in the study?
13. Non-participation How many people refused to participate or dropped out?
E: Setting
14. Setting of data collection Where was the data collected?
15. Presence of non-participants Was anyone else present besides the participants and researchers?
16. Description of sample What are the important characteristics of the sample?
F: Data collection
17. Interview guide Were questions, prompts, guides provided by the authors? Was it pilot tested?
18. Repeat interviews Were repeat interviews carried out?
19. Audio/visual recording Did the research use audio or visual recording to collect the data?
20. Field notes Were field notes made during and/or after the interview or focus group?
21. Duration What was the duration of the interviews or focus group?
22. Data saturation Was data saturation discussed?
23. Transcripts returned Were transcripts returned to participants for comment and/or correction?
Domain 3: Analysis and findings
G: Data analysis
24. Number of data coders How many data coders coded the data?
25. Description of the coding tree Did authors provide a description of the coding tree?
26. Derivation of themes Were themes identified in advance or derived from the data?
27. Software What software, if applicable, was used to manage the data?
28. Participant checking Did participants provide feedback on the findings?
H: Reporting
29. Quotations presented Were participant quotations presented to illustrate the themes/findings?
30. Data and findings consistent Was there consistency between the data presented and the findings?
31. Clarity of major themes Were major themes clearly presented in the findings?
32. Clarity of minor themes Is there a description of diverse cases or discussion of minor themes?
Table 4: COREQ Quality criteria tong2007consolidated

10 Appendix B

(d) Quality indicators
Domain 1: Research team and reflexivity 24%
A: Personal Characteristics 28%
1. Interviewer/facilitator 9%
2. Credentials 43%
3. Occupation 39%
4. Gender 39%
5. Experience and training 9%
B: Relationship with participants 17%
6. Relationship established 17%
7. Participant knowledge of the interviewer 26%
8. Interviewer characteristics 9%
Domain 2: Study design 39%
C: Theoretical framework 78%
9. Methodological orientation and theory 78%
D: Participant selection 40%
10. Sampling 45%
11. Method of approach 18%
12. Sample size 91%
13. Non-participation 5%
E: Setting 33%
14. Setting of data collection 32%
15. Presence of non-participants 0%
16. Description of sample 68%
F: Data collection 35%
17. Interview guide 55%
18. Repeat interviews 5%
19. Audio/visual recording 80%
20. Field notes 10%
21. Duration 50%
22. Data saturation 30%
23. Transcripts returned 15%
Domain 3: Analysis and findings 28%
G: Data analysis 28%
24. Number of data coders 27%
25. Description of the coding tree 27%
26. Derivation of themes 36%
27. Software 45%
28. Participant checking 5%
Table 5: Empirical overview of the result for the quality indicator properties defined by COREQ tong2007consolidated.
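The section-level and domain-level percentages in Table 5 appear to be consistent with an unweighted mean of the item-level percentages, rounded to the nearest whole number. The following Python sketch illustrates that reading; the aggregation rule is an assumption inferred from the reported numbers, not a procedure stated in the text, and the names ITEMS, DOMAINS, section_pct, and domain_pct are hypothetical.

```python
from statistics import mean

# Item-level percentages copied from Table 5, grouped by COREQ section.
# Section H (Reporting) has no entries in Table 5 and is therefore omitted.
ITEMS = {
    "A": [9, 43, 39, 39, 9],            # items 1-5
    "B": [17, 26, 9],                   # items 6-8
    "C": [78],                          # item 9
    "D": [45, 18, 91, 5],               # items 10-13
    "E": [32, 0, 68],                   # items 14-16
    "F": [55, 5, 80, 10, 50, 30, 15],   # items 17-23
    "G": [27, 27, 36, 45, 5],           # items 24-28
}

# Sections belonging to each COREQ domain, as listed in Table 5.
DOMAINS = {1: ["A", "B"], 2: ["C", "D", "E", "F"], 3: ["G"]}


def section_pct(section: str) -> int:
    """Assumed rule: unweighted mean of the section's item percentages."""
    return round(mean(ITEMS[section]))


def domain_pct(domain: int) -> int:
    """Assumed rule: unweighted mean over all items in the domain."""
    values = [v for s in DOMAINS[domain] for v in ITEMS[s]]
    return round(mean(values))


if __name__ == "__main__":
    for name in ITEMS:
        print(f"Section {name}: {section_pct(name)}%")
    for number in DOMAINS:
        print(f"Domain {number}: {domain_pct(number)}%")
```

Under this assumption, the computed values match the section and domain percentages reported in Table 5, which suggests that each item is weighted equally regardless of the section it belongs to.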