Where Responsible AI meets Reality: Practitioner Perspectives on Enablers for Shifting Organizational Practices

06/22/2020 ∙ by Bogdana Rakova, et al. ∙ Accenture ∙ Spotify ∙ Partnership on AI

Large and ever-evolving industry organizations continue to invest time and resources to incorporate responsible AI into production-ready systems and increase algorithmic accountability. This paper examines how organizational culture and structure impact the effectiveness of responsible AI initiatives in practice, and offers a framework for analyzing that relationship. We present the results of ethnographic interviews with practitioners working in industry, investigating common challenges, ethical tensions, and effective enablers. Focusing on major companies developing or utilizing AI, we map what organizational structures currently support or hinder responsible AI initiatives, what aspirational future practices would best enable effective initiatives, and what key elements comprise the transition from current to aspirational future work practices.


1. Introduction

While the academic discussion of algorithmic bias has an over 20-year long history (friedman1996bias), we have now reached a transitional phase in which this debate has taken a practical turn. The growing awareness of algorithmic bias and the need to responsibly build and deploy AI have led increasing numbers of practitioners to focus on translating these calls within their domains (madaio20; holstein2019improving). New machine learning (ML) responsibility or fairness roles and teams are being announced, product and API interventions are being presented, and the first public successes - and lessons learned - are being disseminated (haydn2020). However, practitioners still face considerable challenges in attempting to turn theoretical understanding of potential inequities into concrete action (holstein2019improving; Krafft20).

Gaps exist between what academic research prioritizes and what practitioners need. The latter includes developing organizational tactics and stakeholder management (holstein2019improving; tutorialfat) rather than technical methods alone. Beyond the need for domain-specific translation, methods, and technical tools, this also requires operationalization within - or around - existing corporate structures, and organizational change. Industry professionals, who are increasingly tasked with developing accountable and responsible AI processes, need to grapple with inherent dualities in their role (metcalfowners): they are agents for change, but also workers with careers in an organization with potentially misaligned incentives that may not reward or welcome change (checklist20). Most commonly, practitioners have to navigate the interplay of their organizational structures and algorithmic responsibility efforts with relatively little guidance. As Orlikowski points out (orlikowski1992duality), whether designing, appropriating, modifying, or even resisting technology, human agents are influenced by the properties of their organizational context. This also means that some organizations can be differentially successful at implementing organizational changes. This tension is visible in research communities such as FAccT, AIES, and CSCW, where people answering calls to action with practical methods are sometimes met with explicit discomfort or disapproval from practitioners working within large corporate contexts. Within the discourse on unintended consequences of ML-driven systems, we have seen both successes and very public failures - even within the same corporation (haydn2020).

This paper builds on the prior literature in both organizational change and algorithmic responsibility in practice to better understand how these still relatively early efforts are taking shape within organizations. We know that attention to the potential negative impacts of machine learning is growing within organizations, but how to leverage this growing attention to drive effective change remains an open question. To this end, we present a study involving 26 semi-structured interviews with professionals whose roles involve concrete projects related to algorithmic responsibility concerns, or 'fair-ML' (fairness-aware machine learning (selbst2019fairness)) in practice. We use this term to refer not only to fairness-related projects but also, more broadly, to projects related to responsible AI and the accountability of ML products and services, given the high degree of overlap in goals, research, and people working on these topics.

Using the data from the ethnographic interviews, we identify three clusters that we broadly classify as the prevalent, emergent, and aspirational future states of the intersection between organizational properties and work practices. We investigate practitioners' perceptions of their own role, the role of the organizational structures in their context, and how change is enabled within the context of adopting responsible AI practices. We focus on perceived transitions within their current contexts, whether positive or negative, and identify enablers for such changes. In addition, we present the outcome of a workshop in which participants reflected upon early insights of this study through a structured design activity.

The main contribution of our work is the qualitative analysis of ethnographic data about the responsible AI work practices of practitioners in industry. We found that, most commonly, practitioners have to grapple with a lack of accountability, ill-informed performance trade-offs, and misalignment of incentives within decision-making structures that are only reactive to external pressure. Emerging practices that are not yet widespread include the use of organization-level frameworks and metrics, structural support, and proactive evaluation and mitigation of issues as they arise. For the future, interviewees aspired to have organizations invest in anticipating and avoiding harms from their products, redefine results to include societal impact, integrate responsible AI practices throughout all parts of the organization, and align decision-making at all levels with an organization's mission and values. Preliminary findings were shared at an interactive tutorial during a large machine learning conference, which yielded organizational-level recommendations to (1) create veto ability across levels, (2) coordinate internal and external pressures, (3) build communication channels, and (4) acknowledge the interdependent nature of the responsible AI work practices discussed here.

2. Literature review

2.1. Algorithmic responsibility in practice

An almost overwhelming collection of principles and guidelines has been published to address the ethics and potential negative impact of machine learning. Mittelstadt et al. (mittelstadt2019ai) discuss over sixty sets of ethical guidelines, Zeng et al. (zeng2018linking) provide a taxonomy of 74 sets of principles, and Jobin et al. find 84 different sets of principles (jobin2019global). Even if there is relative, high-level agreement between most of these abstract guidelines (jobin2019global; zeng2018linking), how they are translated into practice in each context remains very unclear (mittelstadt2019ai). Insight is available from how companies changed their practices in domains such as privacy and compliance in response to legislative directives (privacybook). The active debate on how requirements in the EU's GDPR are to be interpreted (kaminski2020multi; malgieri2020concept), however, illustrates the challenges of turning still-nascent external guidance into concrete requirements. Krafft et al. (Krafft20) point out that, even between experts, there is a disconnect between policymakers' and researchers' definitions of such foundational terms as 'AI'. This makes the application of abstract guidelines even more challenging and raises the concern that focus may be put on future, similarly abstract technologies rather than current, already pressing problems.

The diverse breadth of application domains for machine learning suggests that requirements for applying guidelines in practice should be steered by the specific elements of the technologies used, specific usage contexts, and relevant local norms (mittelstadt2019ai). Practitioners encounter a host of challenges when trying to perform such work in practice (holstein2019improving). Organizing and getting stakeholders on board are necessary to be able to drive change (tutorialfat). This includes dealing with imperfection, and realizing that tensions and dilemmas may occur when 'doing the right thing' does not have an obvious and widely agreed upon answer (Fazelpour20; cramer2018assessing). It can be hard to foresee all potential consequences of systems while building them, and it can be equally difficult to identify how to overcome unwanted side-effects, or even why they occur technically (googleAudit2020). A fundamental challenge is that such assessment should not simply be about technical, statistical disparities, but rather about active engagement to overcome the lack of guidance decision-makers have on what constitutes 'just' outcomes in non-ideal practice (Fazelpour20). Additional challenges include organizational pressures for growth, common software development approaches such as agile working that focus on rapid releases of minimal viable products, and incentives that motivate a focus on revenue within corporate environments (holstein2019improving; madaio20; haydn2020; checklist20). Taking inspiration from other industries where auditing processes are standard practice still means that auditing procedures have to be adjusted to product and organizational contexts, and requires defining the goal of the audit in context (googleAudit2020). This means that wider organizational change is necessary to translate calls to action into actual processes and decision-making.

2.2. Organizational change, and internal/external dynamics

Current challenges faced by responsible AI efforts can be compared to a wide selection of related findings in domains such as legal compliance (trevino1999managing), where questions arise regarding whether compliance processes actually lead to more ethical behavior (krawiec2003cosmetic), diversity and inclusion in corporate environments (barak2016managing; kalev2006best), and corporate privacy practices (privacybook). All of these domains appear to have gone through a process that is mirrored in current algorithmic responsibility discussions: publication of high-level principles and values by a variety of actors, the creation of dedicated roles within organizations, and urgent questions about how to overcome challenges, achieve 'actual' results in practice, and avoid investing in processes that are costly but do not deliver beyond cosmetic impact.

As Weaver et al. pointed out in 1999 (weaver1999corporate), in an analysis of Fortune 1000 ethics practices, success relies not only on centralized principles but also on their diffusion into managerial practices in the wider organization. Interestingly, while external efforts can effectively put reputational and legislative pressure on companies, internal processes and audits are just as important, and they all interact. As discussed by Bamberger and Mulligan (privacybook) for corporate privacy efforts in particular, both external and internal forces are necessary for such work on corporate responsibility to be effective. Internally, they suggest focusing on getting onto 'board level' agendas to ensure attention and resourcing, having a specific boundary-spanning privacy professional lead the adoption of work practices, and ensuring 'managerialization' of privacy practices by increasing expertise within business units and integration within existing practices. They found that ambiguity in external privacy discussions could foster reliance on internal professionals' judgements, and thus created autonomy and power for those professionals identified as leading in privacy protection. Externally, they suggest that creating positive ambiguity by keeping legislation broad can push more accountability onto firms for their specific domains, which can create communities and promote sharing around privacy failures. Thus, they illustrate how ambiguity - rather than a fully defined list of requirements - can actually help promote more reflection and ensure that efforts go beyond compliance.

A similar internal/external dynamic is visible within the algorithmic responsibility community. For example, in the Gender Shades project, Buolamwini and Gebru (buolamwini2018gender) presented not only an external audit of facial recognition APIs, but also reactions from the companies whose services were audited, to illustrate more and less effective responses. Such external audits can build momentum inside companies to respond to external critique and, in selected cases, to make concrete changes to their products. Internal efforts, in turn, have access to more data, ensure that auditing can be completed before public releases, develop processes for companies, and allow companies to take responsibility for their impact (googleAudit2020). Successes are beginning to emerge, ranging from positive changes in policy and process resulting from corporate activism, to tooling built for clients or internal purposes, to direct product 'fixes' in response to external critique (haydn2020). For example, Raji et al. (googleAudit2020) present an extensive algorithmic auditing framework developed by a small team within the larger corporate context of Google. They offer general methods such as data and model documentation (modelcards) and also tools such as metrics to enable auditing in specific contexts like image search (Mitchell20metrics). Implementing these methods and tools then requires corporate processes to provide the resources for such auditing and to ensure that the results of audits impact decisions within the larger organizational structure.
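To make the documentation methods concrete, the following is a minimal, hypothetical sketch of what model-card-style fields might look like in code. The field names and example values are our own illustration and do not reproduce the schema from the model cards work cited above.

    from dataclasses import dataclass, field
    from typing import List

    # Minimal, hypothetical model-card-style record; fields and values are
    # illustrative, not the published model cards schema.
    @dataclass
    class ModelCard:
        model_name: str
        intended_use: str
        out_of_scope_uses: List[str] = field(default_factory=list)
        evaluation_data: str = ""
        evaluated_subgroups: List[str] = field(default_factory=list)
        known_limitations: List[str] = field(default_factory=list)
        ethical_considerations: List[str] = field(default_factory=list)

    card = ModelCard(
        model_name="resume-screening-v2",  # hypothetical product
        intended_use="Rank applications for recruiter review, not automated rejection",
        out_of_scope_uses=["fully automated hiring decisions"],
        evaluation_data="Held-out 2019 applications, stratified by self-reported gender",
        evaluated_subgroups=["gender", "age band"],
        known_limitations=["No evaluation data for applicants outside the US"],
        ethical_considerations=["Subgroup error-rate gaps must be reviewed before launch"],
    )
    print(card.model_name, card.evaluated_subgroups)

The value of such documentation in practice depends less on the exact fields than on the surrounding process that requires the record to be filled in, reviewed, and acted upon before release.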

2.2.1. Organizational research and structures

To situate our work in this broader context, we will briefly examine different perspectives on organizational structures. First, it is worthwhile to revisit what organizational theorist Wanda Orlikowski (orlikowski1992duality) called the duality of technology in organizations. Orlikowski discusses how people in organizations create and recreate meaning, power, and norms. Orlikowski's 'structurational' model of technology comprises these human agents, the technology that mediates their task execution, and the institutional properties of organizations. The latter range from internal business strategies, control mechanisms, ideology, culture, division of labor and procedures, and communication patterns, to outside pressures such as governmental regulation, competition, professional norms, and wider socio-economic conditions. People's actions are then enabled and constrained by these structures, which are themselves the product of previous human actions. Orlikowski (Orlikowski2000) later augmented this perspective to include a practice orientation; repeated interactions with technologies within specific circumstances also enact and form structures.

Similarly, Dawson provides an extensive review of perspectives in studies of organizational change (dawson2019reshaping) and discusses the 'process turn', in which organizations are seen as ever-changing rather than in discrete states; what may appear as stable routines may in actuality be fluid. Dawson emphasizes the socially constructed process and the subjective lived experience: actors' collaborative efforts in organizations unfold over time, and dialogue between them shapes interpretations of changes. Such dynamics are also present in what organizational theorist Richard Scott (scott2015organizations) summarized as the rational, natural, and open perspectives on organizations. 'Rational' organizations were seen as 'the machine', best suited to industries such as assembly-line manufacturing where tasks are specified by pre-designed workflow processes. The 'natural' organization signified a shift in organizational ideology. No longer were people seen as mere appendages to the machines, but rather as crucial learners in relationship with machines. The metaphor is that of the organization as an 'organism' with a strong interior-exterior boundary and a need to 'survive'. Similar to an organism, the organization grows, learns, and develops. As a consequence of the survival ideology, the exterior environment can be seen as a threat to which the organism must adapt in order to survive. Scott, however, describes how the notion of 'environment as threat' was replaced by the realization that environmental features are the conditions for survival. The central insight emerging from 'open' systems thinking is that all organizations are incomplete and depend on exchanges with other systems. The metaphor became that of an 'ecology'. Open systems are characterized by (1) interdependent flows of information and (2) interdependent activities, performed by (3) a shifting coalition of participants by way of (4) linking actors, resources, and institutions, in order to (5) solve problems in (6) complex environments. For responsible AI efforts to succeed, then, organizations must successfully navigate the changes necessary within 'open' systems.

2.2.2. Multi-stakeholder communities as meta organizational structures

The described 'ecologies', particularly in 'open' systems, contain formal and informal meta-organizational structures, which have been studied in other contexts and are of growing importance to the field of responsible AI. Organizations often interact with each other through standards bodies, communities, processes, and partnerships. These meta-processes can have as goals (1) producing responses to proposed regulations, standards, and best practices, (2) fostering idea exchange between silos, and (3) self-regulation. Organizations participate in multi-stakeholder initiatives to achieve a number of their own goals, including advocating for business interests, keeping up to date on industry trends, and having a voice in shaping the standards or regulations that they will then be subjected to.

Berkowitz  (Berkowitz2018) discusses the shift towards governance in sustainability contexts, and the key role that meta-organizations can have in facilitating meta-governance of corporate responsibility beyond simply complying with legislation. She identifies six capabilities needed for sustainable innovations: (1) anticipation of changes and negative impacts of innovation, (2) resilience to changes, (3) reflexivity, (4) responsiveness to external pressures and changing circumstances, (5) inclusion of stakeholders beyond immediate decision makers, and (6) comprehensive accountability mechanisms. Meta-organizations can promote inter-organizational learning and building of these six capabilities.

Similarly, within the field of AI, multi-stakeholder organizations, standards, and self-organized projects have been created in recent years to acknowledge the need for interdisciplinary expertise to grapple with the wide-reaching impacts of AI on people. Many AI researchers have been vocal proponents of expanding the number of perspectives consulted and represented, including stakeholders such as policymakers, civil society, academics from other departments, impacted users, and impacted nonusers. Reconciling perspectives from diverse stakeholders presents its own set of challenges that change depending on the structure of the organization. Participatory action offers relevant frameworks for characterizing options for decision making in multi-stakeholder contexts. Decision making can be centralized within a formal organization, with stakeholders being informed, consulted, involved, or collaborated with, or stakeholders can self-organize informally to achieve the same levels of participation. The structures present at a meta-organizational level will differ and enable the application of different group-level decision-making processes. For example, ad hoc groups of researchers have self-organized to create unconference events and write multi-stakeholder reports, including reports with large groups of authors (e.g. (brundage2020trustworthy)) based originally on discussions within workshops held under the Chatham House Rule, while others have created new formal organizations, conferences such as AIES or FAccT, or research institutes.

In a similar manner to Berkowitz  (Berkowitz2018), we focus here on the ’how’ of achieving more adoption of responsible AI work practices in industry. We further investigate how practitioners experience these changes within the context of different organizational structures, and what they see as the shifts that drive or hinder their work within their organizations.

3. Study and Methods

Our motivation for this work was to identify enablers that could shift organizations towards adopting responsible AI practices. Responsible AI research has influenced organizational practices in recent years, with individuals and groups within companies informally or formally tasked with putting research into action. Our research applies theories and frameworks of organizational structure and change management to characterize the growing practice of applied responsible AI. To better understand the implications of organizational structure for day-to-day responsible AI work and outcomes, we interviewed practitioners who are actively involved in these initiatives, either on their own or within a larger team.

We conducted 26 semi-structured interviews with people from 19 organizations based on 4 continents. Except for two 30-minute interviews, all interviews lasted between 60 and 90 minutes. Participants were given a choice of whether to allow researchers to record the interview for note-taking purposes. A total of 11 interviews were recorded. In cases where the interview was not recorded, we relied on writing down the respondents' answers during the course of the interview. In several cases, participants requested to additionally validate any written notes and make necessary clarifications before their use in the study. Due to the sensitive nature of the questions we asked, we expect not to have complete information. Interviews were conducted between July and December 2019.

Role (Respondents): Fair-ML workstream framing
AI Strategy (R16, R25): thought leadership; strategic planning; building external relationships; working with operating models; proactively figuring out the pain points and creating solutions
Engineering (R1, R14, R19, R21): fairness evaluations; internal checklists; implementing new capabilities; data science
Human Resources (R12, R13): assessment innovation; talent innovation research
Legal (R8, R20, R26): policy; legal counseling; investigating legal issues and questions; responsible AI; privacy; ethical and governance guidelines; comprehensive pillars; digital ethics
Marketing and Sales (R10, R24): algorithmic accountability; understanding and explaining what an algorithm does; fairness auditing and explainability in terms of bias
ML Research (R17, R22, R23): algorithmic audits; explainability; social impact of AI; sociotechnical systems; educational efforts; fairness
Policy (R4, R5, R6, R18): distribution of benefits from AI; norms; communication ability and navigating external expectations; fairness; mitigation of risk
Product Management (R2, R3, R7, R9, R11, R15): reconsidering the ML lifecycle; interpretability; influenced by broader industry trends; practical needs; responsible AI; ethics of technology; ethics review; auditing; bias assessment
Table 1. Distribution of roles and the fair-ML terms interviewees used when asked to describe their role (job description).

3.0.1. Sampling technique

Participants were recruited through a convenience sampling technique. We reached out to a number of practitioners according to the selection criteria described below. The goal in our first interactions with potential participants was to validate whether they fit the desired interviewee profile, as well as to share more with them about the purpose and goals of the project. Due to the limitations of our sampling approach, we expect not to have complete information.

Two recruiting criteria were used to find interviewees: (1) their roles involved working closely with product, policy, and/or legal teams, and (2) the outputs of their work had a direct impact on ML products and services (see Table 1). We filtered out individuals whose roles were solely research, although interviewees may also be active contributors to responsible AI research.

3.0.2. Questionnaire

The questions in the study were reviewed by an industrial-organizational psychologist and responsible AI practitioners within three different organizations. Questions were grouped into different sections, exploring the current state of the fair-ML work within the organization, evolution of the work through time, how the work is situated within the organization, how responsibility and accountability for the work are distributed, performance review processes and incentives, and what desired aspirational future structures and processes would enable more effective work. The full questionnaire can be found in Appendix A.

3.0.3. Analysis

To analyze the interview data, we utilized a standard methodology from contextual design: interpretation sessions and affinity diagramming (holtzblatt1997contextual). Through a bottom-up affinity diagramming approach, we iteratively assigned codes to the various concepts and themes shared by the interviewees, grouped these codes into successively higher-level themes, and studied the relationships between them.
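For readers unfamiliar with this method, the toy sketch below illustrates the bottom-up grouping step: codes assigned to interview excerpts are rolled up into higher-level themes. The codes, excerpt labels, and code-to-theme mapping are invented for illustration and are not our actual coding scheme.

    from collections import Counter

    # Invented codes, excerpt labels, and code-to-theme mapping, shown only to
    # illustrate the bottom-up grouping step of affinity diagramming.
    CODE_TO_THEME = {
        "reacts only after press coverage": "When and how do we act?",
        "no owner for fairness issues": "What are the internal structures we rely on?",
        "revenue metrics dominate reviews": "How do we measure success?",
    }

    coded_excerpts = [
        ("R3", "reacts only after press coverage"),
        ("R7", "revenue metrics dominate reviews"),
        ("R11", "no owner for fairness issues"),
        ("R15", "reacts only after press coverage"),
    ]

    theme_counts = Counter(CODE_TO_THEME[code] for _, code in coded_excerpts)
    for theme, count in theme_counts.items():
        print(f"{theme}: {count} excerpt(s)")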

3.0.4. Workshop

In addition to the interviews, we organized a workshop at a conference attended by a highly self-selected group of responsible AI practitioners from industry, academia, government, and civil society. The first half of the workshop was a presentation of the high-level insights from the literature review and results sections of this paper. We then conducted an interactive design exercise in which participants were organized into 13 groups of 4 to 6 individuals. Each group was given a scenario description of an AI organization that exemplified the prevalent work practices discussed in the results section below. The facilitators guided groups through a whiteboard discussion of the following questions:

  • What are examples of responsible AI work practices in the context of the scenario?

  • What are examples of practices in the prevalent organizational structure which are outside of the scope of responsible AI work but which act to protect and enable fair-ML practices?

  • What kinds of connections exist between these practices?

  • What practices or organizational structures could enable positive self-reinforcing outcomes through making the connections stronger?

The workshop activity was designed to allow participants to (1) gain a deeper understanding of responsible AI challenges by connecting study findings to their own experiences, (2) collaboratively explore what organizational structures could enable the hypothetical organization developing AI products and services to resolve those challenges, and (3) map interdependencies and feedback loops that exist between practices to identify potentially effective recommendations for solving responsible AI challenges.

4. Results

We start with a high level overview of our findings followed by a discussion of the key themes that emerged from the conducted interviews.

4.1. Overview

About a quarter of the participants had initiated their fair-ML work in their current organization within the past year (7 out of 26), compared to 73% (19 out of 26) whose efforts had started more than a year ago. More than half of the interviewees were doing this work as individuals and not as part of a team (14 out of 26). Lastly, 11 out of 26 respondents reported that they were currently volunteering their time to do fair-ML work, while the remaining 15 participants had official roles related to responsible AI. Among the 15 interview participants with official roles related to responsible AI, 8 individuals were externally hired into their current role, while 7 transitioned into it from other roles within their organization. Interviewees who changed the focus of their existing roles or transitioned into a responsible AI-related role were most commonly previously in project management roles (4 out of 7); in one case, an individual was in a product counsel role, and in two cases interviewees were in research roles. The majority of participants who had official responsible AI-related roles reported benefiting from an organizational structure that allowed them to craft their own role in a very dynamic and context-specific way.

Practitioners repeatedly expressed several common perspectives. We saw the need for a multi-faceted thematic analysis encompassing three intuitive clusters of data: (1) currently dominant or prevalent practices, (2) emerging practices, and (3) an aspirational future context for fair-ML work practices in industry.

  • The prevalent practices comprise what we saw most commonly in the data.

  • The set of emerging practices includes practices which are shared among practitioners but less common than prevalent practices.

  • The aspirational future consists of the ideas and perspectives practitioners shared when explicitly asked about what they envision for the ideal future state of their work within their organizational context.

Within the thematic analysis, we saw commonalities between practitioners mapping to four related but distinct transitions across the prevalent, emerging, and aspirational future work practices. We summarize them in Table 2 and discuss each theme and its corresponding subthemes below.

When and how do we act?
  Prevalent: Reactive. Organizations act only when pushed by external forces (e.g. media, regulatory pressure).
  Emerging: Proactive. Organizations act proactively to address potential fair-ML issues.
  Aspirational future: Anticipatory. Organizations have deployed frameworks that allow for anticipating risks.

How do we measure success?
  Prevalent: Performance trade-offs. Org-level conversations about fair-ML are dominated by ill-informed performance trade-offs.
  Emerging: Provenance. Org-level metrics frameworks and processes are implemented to evaluate fair-ML projects.
  Aspirational future: Concrete results. Concepts of results are redefined to include societal impact through data-informed efforts.

What are the internal structures we rely on?
  Prevalent: Lack of accountability. Fair-ML work falls through the cracks due to role uncertainty.
  Emerging: Structural support. Scaffolding to support fair-ML work begins to be erected on top of existing internal structures.
  Aspirational future: Integrated. Fair-ML responsibilities are integrated throughout all business processes related to product teams.

How do we resolve tensions?
  Prevalent: Fragmented. Misalignment between individual and team incentives and org-level mission statements.
  Emerging: Rigid. Overly rigid organizational incentives demotivate addressing ethical tensions in fair-ML work.
  Aspirational future: Aligned. Ethical tensions in work are resolved in accordance with org-level mission and values.

Table 2. Trends in the common perspectives shared by diverse fair-ML practitioners.

4.2. When and how do we act?

4.2.1. Prevalent work practices

Most commonly, interviewees described their fair-ML work practices as reactive. The most prevalent incentives for action were catastrophic media attention and decreasing media tolerance for the status quo. Fair-ML work is often perceived as a taboo topic. The question of "whose job is this" is a common response to people who bring fair-ML work into discussions, which can leave important and often unvoiced concerns unaddressed or lead to unproductive discussions. Fair-ML work is often not compensated or is perceived as too complicated. In several cases, the formation of a team to conduct fair-ML work was catalyzed by negative results from volunteer-led investigations into potential bias issues within models that were en route to deployment. Some practitioners reported that they have been able to use reputational risk as a leverage point: if they cannot make a legal argument about their fair-ML concerns, practitioners could make an argument by posing a hypothetical scenario such as "What if ProPublica found out about …?".

4.2.2. Emerging work practices

Among the emerging practices that practitioners shared, we found that some organizations have implemented proactive fair-ML evaluation and review processes, often distributed across several teams. For example, some respondents reported support and oversight from legal teams. A smaller number of interviewees reported that their work on fair-ML is acknowledged and compensated. In a few cases, grassroots actions and internal advocacy with leadership from proactive champions have made fair-ML a company-wide priority. Participants reported leveraging existing internal communication channels to organize internal discussions. One participant captured screenshots of problematic algorithmic outcomes and circulated them among key internal stakeholders to build support for fair-ML work. Respondents reported a growing number of both internal and external educational initiatives: onboarding and upskilling employees through internal fair-ML curricula that cover responsible AI-related risks, as well as externally facing materials to educate customers.

4.2.3. Mapping the aspirational future

When asked about their vision for what structures would best support their fair-ML initiatives in an ideal future, many interviewees envisioned organizational frameworks that would encourage anticipating risks. In this aspirational future, their organization would utilize clear and transparent communication strategies both internally within the entire organization and externally with customers and other stakeholders. There would also be technical tools to orchestrate fair-ML evaluations both internally and externally: algorithmic models developed by product teams would be assessed and scrutinized within the organization, while externally, customers using the algorithmic models in different contexts would have oversight through explicit assessments. One practitioner questioned whether their team should even engage with customers who do not agree to deploy an assessment framework ex ante. Respondents reported that in the ideal future, product managers would have an easier way to understand the responsible AI concerns relevant to their products without investing time in reading large amounts of research papers. Several participants expressed that the traditional engineering mindset would need to become better aligned with the dynamic nature of fair-ML issues, which cannot be captured by predefined quantitative metrics. Anticipatory responsible AI frameworks and mindsets could allow organizations to respond to fair-ML challenges in ways that uphold the organization's code of ethics and society's values at large.

4.3. How do we measure success?

4.3.1. Prevalent work practices

The majority of respondents reported that one of the biggest challenges is a lack of metrics that adequately capture the true impact of fair-ML work. The challenge of measuring the impact of responsible AI is a deeply researched topic in the field of fairness, accountability, and transparency of ML. Through the design of the ethnographic questionnaire, we tried to further disentangle perspectives on this challenge in industry. For example, some industry practitioners reported that the use of inappropriate and misleading metrics is a bigger threat than the lack of metrics. Respondents shared that academic metrics are very different from industry metrics, which include benchmarks and other key performance indicators tracked by product teams. Project managers reported trying to implement academic metrics in order to both leverage academic research and facilitate collaboration between research and product teams within their organization. Practitioners embedded in product teams explained that they often need to distill what they do into standard metrics such as number of clicks, user acquisition, or churn rate, which may not apply to their work. Most commonly, interviewees reported being measured on delivering work that generates revenue. They spoke at length about the difficulties of measuring fair-ML work in terms of impact on the bottom line. In some cases, practitioners framed their impact in terms of profitability by arguing that mitigating fair-ML risks prior to launch is much cheaper than fixing problems that are revealed only once a product or service is launched.

4.3.2. Challenges

The majority of respondents expressed at least some degree of difficulty in communicating the impact of their work. The metrics-related challenges they described included: (1) product teams often have short-term development timelines and thus do not consider metrics that aim to encompass long-term outcomes; (2) time pressure within fast-paced development cycles leads individuals to focus on short-term and easier-to-measure goals; (3) qualitative work is not prioritized because it requires skills that are often not present within engineering teams; (4) leadership teams may have an expectation of 'magic', such as finding easy-to-implement solutions, which in reality may not exist or work; (5) organizations do not measure leadership qualities and (6) do not reward the visionary leaders who proactively address the responsible AI issues that arise; (7) performance evaluation processes do not account for fair-ML work, making it difficult or impossible for practitioners to be rewarded or recognized for their fair-ML contributions.

4.3.3. Emerging work practices

A few interviewees reported that their organizations have implemented metrics frameworks and processes in order to evaluate fair-ML risks in products and services. These organizations have moved beyond ethics washing (bietti2020ethics) to accommodate diverse and long-term goals aligned with a fair-ML practice. Interviewees identified the following enablers for this shift in organizational culture: (1) rewarding a broad range of efforts focused on internal education; (2) rewarding risk-taking for the public good by following up on potential issues with internal investigations; (3) creating organizational mechanisms that enable cross-functional collaboration.
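As an illustration of the kind of check such a metrics framework might include, the sketch below computes per-group selection rates and their largest gap and reports them next to a familiar business KPI. The data, groups, review threshold, and choice of metric are hypothetical and would need to be defined for each product context.

    # Hypothetical check inside an org-level fair-ML metrics framework: per-group
    # selection rates and their largest gap, reported next to a business KPI.
    # Data, groups, threshold, and metric choice are illustrative only.

    def selection_rates(decisions):
        """decisions: iterable of (group, selected) pairs; returns rate per group."""
        totals, selected = {}, {}
        for group, was_selected in decisions:
            totals[group] = totals.get(group, 0) + 1
            selected[group] = selected.get(group, 0) + int(was_selected)
        return {g: selected[g] / totals[g] for g in totals}

    decisions = [("A", True), ("A", True), ("A", False),
                 ("B", True), ("B", False), ("B", False)]

    rates = selection_rates(decisions)
    gap = max(rates.values()) - min(rates.values())  # demographic parity difference
    click_through_rate = 0.042                       # hypothetical business KPI

    print(f"selection rates by group: {rates}")
    print(f"parity gap: {gap:.2f} (review threshold: 0.10)")
    print(f"click-through rate: {click_through_rate:.3f}")

Reporting the fairness measure alongside existing KPIs, rather than in a separate document, is one way to make such evaluations legible within the product review processes interviewees described.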

4.3.4. Mapping the aspirational future

In an aspirational future where fair-ML work is effective and supported, interviewees reported that their organizations would measure success very differently: (1) their organizations would have a tangible strategy to incorporate fair-ML practices and issues into the key performance indicators of product teams; (2) teams would employ a data-driven approach to manage ethical challenges and ethical decisions in product development; (3) employee performance evaluation processes would be redefined to encompass qualitative work; (4) organizational processes would enable practitioners to collaborate more closely with marginalized communities, while taking into account legal and other socio-technical considerations; (5) what is researched in academic institutions would be more aligned with what is needed in practice; and (6) collaboration mechanisms would be broadly utilized. Specifically, participants discussed two kinds of mechanisms to enable collaboration: (1) working with external groups and experts in the field to define benchmarks prior to deployment, and (2) working with external groups to continuously monitor performance from multiple perspectives after deployment.

4.4. What are the internal structures we rely on?

4.4.1. Prevalent work practices

Most commonly, participants reported ambiguity and uncertainty about role definitions and responsibilities within responsible AI work at their organization, sometimes due to how rapidly the work is evolving. Multiple practitioners expressed that they needed to be a senior person in their organization in order to make their fair-ML related concerns heard. Several interviewees talked about the lack of accountability across different parts of their organization, naming reputational risk as the biggest incentive their leadership sees for the work on fair-ML.

4.4.2. Emerging work practices

Interviewees shared the following emerging organizational traits as enablers for fair-ML work: (1) flexibility to craft their roles dynamically in response to internal and external factors; (2) distributed accountability across organizational structures and among teams working across the entire product life cycle; (3) accountability integrated into workflows; (4) processes to hold teams accountable for what they committed to; (5) escalation of fair-ML issues to management; (6) fair-ML research groups that contribute to spreading internal awareness of issues and potential solutions; (7) internal review boards that oversee fair-ML topics; (8) publication and release norms; (9) cross-functional responsible AI roles that work across product groups, are embedded in product groups, and/or collaborate closely with legal or policy teams. Participants also reported being increasingly cognizant of external drivers for change, such as cities and governments participating in creating centers of excellence.

4.4.3. Mapping the aspirational future

In an aspirational future where their organization effectively enables fair-ML work, interviewees envisioned internal organizational structures that would enable fair-ML responsibilities to be integrated throughout all business processes related to the work of product teams. One practitioner suggested that while a product is being developed, there could be a parallel development of product-specific artifacts that assess and mitigate potential responsible AI issues. The majority of interviewees imagined that fair-ML reviews and reports would be required prior to the release of new features. New ML operations roles would be created as part of fair-ML audit teams. Currently, this work falls within ML engineering, but respondents identified the need for new organizational structures that would ensure that fair-ML concerns are being addressed while allowing ML engineers to be creative and experiment. For example, one practitioner suggested that a fair-ML operations role could act as a safeguard and ensure that continuous fair-ML assessments are executed once a system is deployed. Some interviewees described the need for organizational structures that enable external critical scrutiny. Scale could be achieved through partnership-based and multi-stakeholder frameworks. In the future, public shaming of high-stakes AI failures would provide motivation towards building shared industry benchmarks, and structures would exist to allow organizations to share benchmark data with each other. External or internal stakeholders would need to call out high-impact failure cases to enable industry-wide learning from individual mistakes. Industry-wide standards could be employed to facilitate distributed accountability and sharing of data, guidelines, and best practices.
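The following is a speculative sketch of the kind of recurring check such a fair-ML operations role might run after deployment; the records, the error-rate metric, the threshold, and the escalation path are all assumptions made for illustration rather than a practice any interviewee described in this form.

    # Speculative sketch of a recurring post-deployment check a fair-ML operations
    # role might own: compare error rates across groups on recent production
    # traffic and escalate when the gap exceeds an agreed threshold.
    # Records, threshold, and escalation path are assumptions for illustration.

    RECENT_PREDICTIONS = [
        # (group, predicted_label, true_label) placeholder records
        ("A", 1, 1), ("A", 0, 1), ("A", 1, 1), ("A", 0, 0),
        ("B", 0, 1), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0),
    ]

    def error_rate_by_group(records):
        errors, totals = {}, {}
        for group, predicted, actual in records:
            totals[group] = totals.get(group, 0) + 1
            errors[group] = errors.get(group, 0) + int(predicted != actual)
        return {g: errors[g] / totals[g] for g in totals}

    def run_scheduled_check(records, max_gap=0.15):
        rates = error_rate_by_group(records)
        gap = max(rates.values()) - min(rates.values())
        if gap > max_gap:
            # In practice this might file a ticket or page an on-call reviewer.
            print(f"ESCALATE: error-rate gap {gap:.2f} exceeds {max_gap:.2f}: {rates}")
        else:
            print(f"OK: error-rate gap {gap:.2f} within threshold: {rates}")

    run_scheduled_check(RECENT_PREDICTIONS)

The substance of such a role lies less in the check itself than in the organizational mandate that its escalations are acted upon, which is precisely the structural support interviewees said is currently missing.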

4.5. How do we resolve tensions?

4.5.1. Prevalent work practices

The majority of respondents reported that they see misalignment between individual, team, and organizational level incentives and mission statements within their organization. Often, individuals are doing ad hoc work based on their own values and personal assessment of relative importance. Similarly, the spread of information relies on individual relationships. Practitioners reported relying on their personal relationships and ability to navigate multiple levels of obscured organizational structures to drive fair-ML work.

4.5.2. Emerging work practices

One of the biggest emerging tensions practitioners reported was that, as fair-ML ethical tensions are identified, overly rigid organizational incentives may demotivate addressing them, compounded by organizational inertia that sustains those rigid incentives. In this situation, research and product teams struggle to justify research agendas related to fair-ML due to competing priorities. Moreover, industry-specific, product-related problems may not have sufficient research merit, or researchers may not be able to publish work on these problems, whether due to the nature of the problem or for privacy reasons. Since the data used in such research experiments may not allow researchers to be recognized for their work, this may ultimately discourage them from investigating real-world fair-ML issues.

Other impeding factors to fair-ML work identified by interviewees included: (1) incentives that always reward complexity whether or not it is needed - individuals are rewarded for complex technical solutions; (2) lack of clarity around expectations and internal or external consequences; (3) the impact of fair-ML work being perceived as diffuse and hard to identify; (4) lack of adequate support and communication structures - whether interviewees were able to address fair-ML tensions often depended on their network of high-trust relationships within the organization; and (5) lack of data for sensitive attributes, which can make it impossible to evaluate certain fair-ML concerns.

4.5.3. Mapping the aspirational future

When asked about their vision for the future of their responsible AI initiatives, participants identified several more ideal ways to approach ethical tensions. Fair-ML tensions would be addressed in better alignment with organization-level values and mission statements (see the concrete question asked in the Appendix, Section A.1.5). Organizational leadership would understand, support, and engage deeply with fair-ML concerns, which would be contextualized within their organizational context. Fair-ML would be prioritized as part of the high-level organizational mission and then translated into actionable goals at the individual level through established processes. Information would spread through well-established channels, so that people know where to look and how to share it. With those processes in place, finding a solution or best practice in one team or department would lead to rapid scaling via existing organizational protocols and internal infrastructure for communications, training, and compliance. Organizational culture would be transformed to enable (1) letting go of the fear of being scrutinized, which currently acts as a roadblock to external critical review, and (2) distributing accountability for fair-ML concerns across different organizational functions. Every single person in the organization would understand risk, teams would have a collective understanding of risk, and organizational leadership would talk about risk publicly, admit when failures happen, and take responsibility for broader socioeconomic and socio-cultural implications.

5. Discussion

The transition between the prevalent or emerging work practices and the aspirational future will need to be adapted to each organization and team's unique socio-technical context. However, there are likely similar steps or tactics that could lead to positive outcomes across organizations and teams. The following themes emerged from the workshop activity, which allowed groups to create landscapes of practices based on their own experiences and then illuminate connections and feedback loops between different practices. Participants were given a simple scenario describing the prevalent work practices and organizational structure of an AI product company in industry, as described in the Study and Methods section. They then engaged in identifying enablers and tensions, elucidating current barriers, and pointing the way towards possible solutions.

5.0.1. The importance of being able to veto an ML system

Multiple groups mentioned that before considering how the fairness or societal implications of an ML system can be addressed, it is crucial to ask whether an ML system is appropriate in the first place. It may not be appropriate due to risks of harm, or the problem may not need an ML solution. Crucially, if the answer is negative, then work must stop. Groups recommended designing a veto power that is available to people and committees across many different levels, from individual employees via whistleblower protections, to internal multidisciplinary oversight committees, to external investors and board members. The most important design feature is that the decision to cease further development is respected and cannot be overruled by other considerations.
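As a thought experiment, the sketch below encodes this recommendation as a simple decision rule in which a veto from any review level halts development and cannot be overridden. The specific levels and the decision model are illustrative assumptions rather than a design the workshop groups prescribed.

    # Illustrative decision rule: a veto at any review level halts development
    # and cannot be overridden by approvals elsewhere. The levels and the
    # decision model are assumptions, not a design prescribed by the workshop.

    REVIEW_LEVELS = [
        "individual_employee",   # e.g. via whistleblower protections
        "oversight_committee",   # internal multidisciplinary review
        "external_board",        # investors / board members
    ]

    def release_decision(votes):
        """votes: dict mapping review level -> 'approve' or 'veto'."""
        vetoes = [level for level, vote in votes.items() if vote == "veto"]
        if vetoes:
            return f"HALTED: vetoed by {', '.join(vetoes)} (cannot be overruled)"
        if len(votes) < len(REVIEW_LEVELS):
            return "PENDING: not all review levels have responded"
        return "APPROVED: all review levels approved"

    print(release_decision({"individual_employee": "approve",
                            "oversight_committee": "veto",
                            "external_board": "approve"}))

The point of the rule is its asymmetry: approvals must be unanimous across levels, while a single veto is final, which is what distinguishes a genuine veto from an advisory review.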

5.0.2. The role and balance of internal and external pressure to motivate corporate change

The different and synergistic roles of internal and external pressure were another theme across multiple groups' discussions. Internal evaluation processes have more access to information and may provide higher levels of transparency, while external processes can leverage more stakeholders and increase momentum by building coalitions. External groups may be able to apply pressure more freely than internal employees, who may worry about repercussions for speaking up.

5.0.3. Building channels for communication between people (employees and leadership, leadership and board, users and companies, impacted users and companies)

Fundamentally, organizations are groups of people, and creating opportunities for different sets of people to exchange perspectives was another key enabler identified by multiple groups. One group recommended a regular town hall where employees could provide input into organization-wide values in a semi-public forum.

5.0.4. Sequencing these actions will not be easy because they are highly interdependent

Many of the groups identified latent implementation challenges because the discussed organizational enablers work best in tandem. For example, whistleblower protections for employees and a culture that supports their creation would be crucial to ensure that people feel safe speaking candidly at town halls.

These themes are shared as a starting point to spark experimentation. Further pooling of results from trying these recommendations would accelerate learning and progress for all towards achieving positive societal outcomes through scaling responsible AI practices.

6. Conclusion

As ML systems become more pervasive in society, there is growing interest in protecting people from harms while also equitably distributing the benefits of these systems. This has led researchers to focus on algorithmic accountability and transparency as intermediary goals on the path to better outcomes. The impact of ML systems on people cannot be changed without considering the people who build them and the organizational structure and culture of the human systems within which they operate. An ethnographic methodological approach has allowed us to build rich context around the people and organizations building and deploying ML technology in industry. We have utilized this qualitative approach to investigate the organizational tensions that practitioners need to navigate in practice. We describe existing enablers of and barriers to the uptake of responsible AI practices and map a transition towards the aspirational future that practitioners describe for their work. In line with earlier organizational research, we emphasize that such transitions are not to be seen as linear movements from one fixed state to another; rather, they reflect the ever-changing nature of organizational contexts themselves.

References

Appendix A Questionnaire

A.1. Describing current work practices related to fairness, accountability, and transparency of ML products and services

A.1.1. Describe your role

  1. What is your formal role by title?

  2. How would you describe your role?

  3. Is your formal role matched to your actual role?

    • If not, how is it not?

  4. Is your organization flexible in the way it sees roles?

    • If not, what is it like?

  5. How did you assume your role?

    • If hired in, were you hired externally or transitioned?

    • If you transitioned, where did you transition from?

    • Does your company generally move people around fluidly?

    • Does your company reward broad knowledge/skills across different industries or specializations?

  6. How did your role change over time?

    • From a responsibility perspective?

    • From a people perspective?

  7. Is role scope change typical at your company?

  8. If yes, what does it typically look like - is it…

    • Scope creep?

    • Is it explicitly within your job description?

    • Planned role expansion?

    • Stretch assignments?

  9. Do you have autonomy to make impactful decisions?

    • If yes, how?

    • If no, what is the case instead?

A.1.2. How did your fair-ML effort start?

  1. Was it officially sponsored?

    • If yes, by whom - what level of leader?

    • If no, who launched the effort - was it a team? A motivated individual? What level of leadership?

  2. Why did the effort start?

  3. Was the effort communicated to employees?

    • Who was it communicated to?

    • How was it communicated?

  4. Is the effort part of a program or stand-alone?

  5. Is it tied to a specific product’s development or launch?

    • What is the product?

    • What is its primary use case?

    • Who is the primary end user?

    • When is it slated to launch?

  6. Are you part of a team or doing this kind of work by yourself?

  7. Is it a volunteering effort?

    • If so, are you getting rewarded or recognized for your time? How?

  8. What types of activities have been done or are planned?

  9. Are you actively collaborating with external groups? What groups and why?

A.1.3. Responsibility and accountability

  1. Who is accountable for aspects around risk or unintended consequence…

    • Identifying risk?

    • Solutioning against risk?

    • Fixing mistakes?

    • Avoiding negative impact, including press?

  2. Is your sponsor connected to risk management?

    • How so?

    • What is their level of accountability relative to risk?

    • What are they responsible for doing?

  3. Who are your main stakeholders?

  4. What are the other departments you work with?

    • How has that changed since the effort launched?

    • What was the business case for the fair-ML work/team?

    • What teams are adjacent to this effort (i.e., not directly involved but ”friends and family”) - is there an active compliance function, e.g.?

    • (if no answer, probe for e.g. product teams, compliance, trust and safety type teams, or value-based design efforts, ethics grassroots etc)

    • What other efforts in your organization are similar to accountability work (for instance, Diversity & Inclusion), and what do they look like? Is there general support for this type of effort?

  5. Do you feel there is support for this effort?

    • Why or why not?

    • Who supports it (what company career level, function, role and/or geography)?

    • Who doesn’t support it?

  6. Would you say this effort aligns to company culture? How or how not?

  7. Is scaling possible?

    • If so, do you intend to scale?

    • If not, why not?

A.1.4. Performance, rewards, and incentives

  1. How is performance for your Algorithmic Accountability effort defined at your company?

  2. What are you evaluated on in your role?

  3. What works about the way performance is measured? What are some flaws?

  4. What does your performance management system/compensation structure seek to incentivize people to do (what is the logic behind the approach)?

  5. What does your performance management system/compensation structure actually incentivize people to do?

  6. What kind of person gets consistently rewarded and incentivized?

A.1.5. Risk culture

  1. How do you work with external communication teams - PR, Policy?

    • Who owns that relationship - is it a centralized team?

    • What is that comms team's primary accountability (e.g., press releases, think pieces, etc.)?

    • Has the team managed risk before?

    • Is the team mobilized to manage risk?

  2. How do you work with Legal?

    • Is it a visible function in the organization?

    • Does it have authority to make decisions and company policy, from your PoV?

    • How do you engage with communities?

    • What types of communities?

    • What does this look like?

    • What types of communication have you set up?

  3. What are the ethical tensions that you/your team faces?

  4. On a scale of 1-5, how would you rate your perception of your company's risk tolerance?

A.2. Future dream state: a structured mapping of the desired future state

  1. What is your company’s current state for fair-ML practice? (people, process, technology)

  2. What is your vision for the future state of the fair-ML practices?

  3. What do you need to change to get to the future state?

  4. What do you need to retire to get to the future state?

  5. What can be salvaged/repurposed?

A.3. Ending notes

  1. What is the best about your current set up?

  2. How would you summarize the largest challenges? In other words, what do you like least?

  3. Is there anything that I should have asked about?