1 Introduction

Artificial intelligence (AI) is one of the major driving forces transforming society and industry and has been successfully adopted in applications across a wide range of data-rich domains. The global market value of AI was assessed at USD $62.35 billion in 2020 and is expected to grow at an annual rate of 40.2% from 2021 to 2028. Although AI has significant potential to stimulate economic growth and improve productivity across a growing range of domains, there are serious concerns about the ability of AI systems to behave and make decisions responsibly.
Many ethical regulations, principles, and guidelines have recently been issued by governments, research institutions, and companies, and responsible AI technologies and systems are expected to adhere to them. However, these principles are high-level and can hardly be used in practice by technologists, AI experts, and software engineers. Meanwhile, responsible AI research has focused on algorithmic solutions limited to a subset of issues, such as fairness. The major challenge in achieving responsible AI lies not only in algorithms and data: issues can enter at any point of the software engineering lifecycle and are often at the system level, crosscutting many components of an AI system. There is a lack of system-level, responsible-AI-by-design guidance on how to architect responsible AI systems.
Therefore, this paper presents a summary of design patterns, based on the results of a systematic literature review (SLR), that address the system-level design challenges of responsible AI systems and build responsible-AI-by-design into AI systems. Rather than staying at the ethical-principle level or the AI-algorithm level, this paper focuses on system-level design patterns for operationalizing responsible AI. We performed an SLR on software engineering for responsible AI to summarize the patterns that can be embedded into the design of AI systems as product features and thereby contribute to responsible-AI-by-design.
The remainder of the paper is organized as follows. Section 2 introduces the methodology. Section 3 describes the state diagram of a provisioned AI system. Section 4 presents a summary of the patterns for responsible-AI-by-design. Section 5 concludes the paper.
2 Methodology

To identify patterns for responsible AI, we performed an SLR. The two research questions defined for the SLR are:
-  What responsible AI principles are addressed by the study?
-  What solutions for responsible AI can be identified?
The main data sources include the ACM Digital Library, IEEE Xplore, Science Direct, Springer Link, and Google Scholar. The study only includes papers that present concrete design or process solutions for responsible AI, and excludes papers that only discuss high-level frameworks. A set of 159 primary studies was identified. The complete SLR protocol is available as online material (https://drive.google.com/file/d/1Ty4Cpj_GzePzxwov5jGKJZS5AvKzAy3Q/view?usp=sharing). We use the ethical principles listed in Harvard University's mapping study: Privacy, Accountability (professional responsibility is merged into accountability due to the overlapping definitions), Safety & Security, Transparency & Explainability, Fairness and Non-discrimination, Human Control of Technology, and Promotion of Human Values.
3 State Diagram of a Provisioned AI System
Fig. 1 illustrates the state diagram of a provisioned AI system and highlights the patterns associated with the relevant states or transitions, showing when each design pattern can take effect. We limit the scope of this paper to patterns that can be embedded into provisioned AI systems as product features; engineering best practices for the development process, including patterns related to model training, are out of scope. Once the AI system starts serving, it can be requested to execute a certain task. Decision-making may be needed before executing the task. Both the behaviors and the decision-making outcomes of the AI system are monitored and validated. If the system fails to meet the requirements (including ethical requirements), or a near miss is detected, the system needs to be updated. The AI system may need to be audited regularly or when major failures or near misses occur. The stakeholders can decide to abandon the AI system if it no longer fulfils the requirements.
4 Design Patterns
To operationalize responsible AI, as shown in Fig. 2, we define a pattern template based on an extended pattern form, which includes the pattern name; the context, defining the impacted stakeholders; the problem, explaining the type of objective; the forces, showing the degree of relevance to each principle, which may conflict with one another; the solution, listing the fine-grained mechanisms; and the consequences, including benefits and drawbacks (e.g., complexity or cost). Due to space and reference limitations, we exclude known uses from the template, but include one reference for each pattern in the discussion.
Fig. 3 lists a collection of patterns for responsible-AI-by-design.
Bill of materials: From a software supply chain angle, AI product vendors often create AI systems by assembling commercial or open-source AI and/or non-AI components from third parties. A bill of materials keeps a formal, machine-readable record of the supply-chain details of the components used in building an AI system, such as component name, version, supplier, dependency relationship, author, and timestamp. In addition to supply-chain details of the components, context documents (such as model cards) can also be recorded in the bill of materials. The purpose of the bill of materials is to provide traceability and transparency into the components that make up AI systems so that ethical issues can be tracked and addressed. Immutable data infrastructure is needed to store the bill-of-materials data. For example, the manufacturers of autonomous vehicles could maintain a material registry contract on a blockchain to track their components' supply-chain information, e.g., the version and supplier of a third-party navigation component.
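As a minimal sketch of such a machine-readable record (all component names, fields, and values below are illustrative, not taken from any real product):

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ComponentRecord:
    """One supply-chain entry in an AI system's bill of materials."""
    name: str
    version: str
    supplier: str
    author: str
    timestamp: str
    dependencies: list = field(default_factory=list)  # names of components this one depends on

def build_bom(records):
    """Serialize the bill of materials into a machine-readable JSON document."""
    return json.dumps({"components": [asdict(r) for r in records]}, indent=2)

# Hypothetical components of an autonomous-vehicle software stack.
nav = ComponentRecord("navigation", "2.1.0", "ThirdPartyMaps Inc.",
                      "maps-team", "2021-06-01T00:00:00Z")
perception = ComponentRecord("visual-perception", "1.4.2", "VisionVendor",
                             "cv-team", "2021-05-20T00:00:00Z",
                             dependencies=["navigation"])
bom = build_bom([perception, nav])
```

In practice the resulting document would be written to an immutable store (e.g., a blockchain-backed registry) rather than kept in memory.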
Verifiable ethical credential: To improve human trust in AI systems, verifiable ethical credentials can be used as evidence of ethical compliance for AI systems, components, models, developers, operators, users, organizations, and development processes. Before using an AI system, users may verify the system's ethical credential to check whether the system complies with AI ethics principles or regulations. Conversely, users may be required to provide their own ethical credentials to use and operate the AI system. Publicly accessible data infrastructure needs to be built to support the generation and verification of ethical credentials. For example, before driving a vehicle, the driver may be requested to scan her/his ethical credential to show she/he has the capability to drive safely, while verifying the ethical credential of the vehicle's automated driving system shown on the center console.
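The issue-and-verify flow can be sketched as follows. The sketch uses an HMAC as a stand-in for a real digital signature scheme, and the issuer key, subject, and claims are all hypothetical:

```python
import hashlib
import hmac
import json

ISSUER_KEY = b"issuer-secret"  # hypothetical signing key held by the credential issuer

def issue_credential(subject, claims):
    """Issuer signs a canonical encoding of the subject's ethical claims."""
    payload = json.dumps({"subject": subject, "claims": claims}, sort_keys=True)
    sig = hmac.new(ISSUER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": sig}

def verify_credential(cred):
    """Verifier recomputes the signature; any tampering invalidates it."""
    expected = hmac.new(ISSUER_KEY, cred["payload"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cred["signature"])

cred = issue_credential("automated-driving-system-v3",
                        {"safety_audit": "passed",
                         "principles": ["safety", "privacy"]})
```

A production design would use asymmetric signatures so that verification does not require the issuer's secret key.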
Ethical digital twin: Before running an AI system in the real world, it is important to perform system-level simulation through an ethical digital twin running on a simulation infrastructure, to understand the behaviors of the AI system and assess ethical risk in a cost-effective way. An ethical digital twin can also be used during operation of the AI system to assess the system's runtime behaviors and decisions against the abstract simulation model using real-time data. The assessment results can be sent back to alert the system or user before an unethical behavior or decision takes effect. For example, vehicle manufacturers can use an ethical digital twin to explore the limits of autonomous vehicles based on the collected real-time data.
Ethical sandbox: Given that AI is a high-stakes technology, an ethical sandbox can be applied to isolate AI components from non-AI components by running the AI component separately, e.g., sandboxing an unverified visual perception component. The maximal tolerable probability of violating the ethical requirements should be defined as the ethical margin for the sandbox. A watchdog can be used to limit the execution time of the AI component to reduce the ethical risk, e.g., only activating the visual perception component for 5 minutes on bridges built especially for autonomous vehicles.
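The watchdog mechanism can be sketched as a wrapper that runs the (unverified) AI component with a hard time budget and falls back to a safe default when the budget is exceeded; the components, budget, and fallback below are hypothetical:

```python
import threading
import time

def run_sandboxed(component, args=(), time_limit_s=1.0, fallback=None):
    """Run an AI component in its own thread; a watchdog enforces a
    hard time budget and returns a safe fallback on overrun."""
    result = {}

    def target():
        result["value"] = component(*args)

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(time_limit_s)            # watchdog: wait at most time_limit_s
    if worker.is_alive() or "value" not in result:
        return fallback                  # budget exceeded: safe fallback
    return result["value"]

# A fast component completes normally; a stalled one triggers the fallback.
fast_result = run_sandboxed(lambda x: x * 2, (3,))
slow_result = run_sandboxed(lambda: time.sleep(5),
                            time_limit_s=0.2, fallback="safe_stop")
```

A stronger isolation boundary (separate process or container) would also contain crashes and memory faults, not just overruns.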
AI mode switcher: Whether or not to adopt AI can be considered a major architectural design decision when designing a software system. The AI mode switcher offers users efficient invocation and dismissal mechanisms for activating or deactivating the AI component when needed. A kill switch is a special type of dismissal mechanism that immediately shuts down the AI component and its negative effects, e.g., turning off the automated driving system and disconnecting it from the internet. The decisions made by the AI component can be executed automatically or reviewed by a human expert in critical situations. The human expert serves to approve or override the decisions (e.g., skipping the path generated by the navigation system). Human intervention can also happen after the AI decision has been acted upon, through a fallback mechanism that reverses the system back to the state before the AI decision was executed. A built-in guard ensures that the AI component is only activated within predefined conditions (such as the domain of use and boundaries of competence). Users can ask questions or report complaints, failures, or near misses through a recourse channel.
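The interplay of guard, kill switch, and non-AI fallback can be sketched as follows (the component, fallback, and context fields are all illustrative):

```python
class AIModeSwitcher:
    """Gates an AI component behind an activation guard and a kill switch."""

    def __init__(self, ai_component, fallback, guard):
        self.ai_component = ai_component  # e.g., an automated-driving function
        self.fallback = fallback          # non-AI / manual behaviour
        self.guard = guard                # predicate over the operating context
        self.ai_enabled = False
        self.killed = False

    def activate(self, context):
        # Built-in guard: AI may only be activated inside its domain of use.
        if not self.killed and self.guard(context):
            self.ai_enabled = True
        return self.ai_enabled

    def kill_switch(self):
        # Immediately and irreversibly shuts the AI component down.
        self.ai_enabled = False
        self.killed = True

    def run(self, task, context):
        # The guard is re-checked on every task, not only at activation time.
        if self.ai_enabled and self.guard(context):
            return self.ai_component(task)
        return self.fallback(task)

switcher = AIModeSwitcher(ai_component=lambda t: "ai:" + t,
                          fallback=lambda t: "manual:" + t,
                          guard=lambda c: c.get("on_highway", False))
```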
Multi-model decision-maker: A multi-model decision-maker employs different models to perform the same task or to enable a single decision, e.g., deploying different algorithms for visual perception. This pattern can improve reliability by deploying different models in different contexts (e.g., different regions) and enables fault tolerance by cross-validating ethical requirements for a single decision (e.g., only accepting a decision when the employed models return the same result).
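The cross-validation variant can be sketched in a few lines; the models, inputs, and the "defer to human" fallback below are illustrative stand-ins:

```python
def multi_model_decide(models, sample, fallback="defer_to_human"):
    """Cross-validate one decision across independently built models;
    accept it only when all models agree, otherwise defer."""
    decisions = [model(sample) for model in models]
    if all(d == decisions[0] for d in decisions):
        return decisions[0]
    return fallback

# Two hypothetical perception models with slightly different thresholds.
model_a = lambda x: "stop" if x < 0.5 else "go"
model_b = lambda x: "stop" if x < 0.6 else "go"
```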
Homogeneous redundancy: Ethical failures in AI systems may cause serious damage to humans or the environment. Deploying redundant components (e.g., two brake control components) is a solution for dealing with highly uncertain AI components that may make unethical decisions, or with adversarial hardware components that produce malicious data or behave unethically. A cross-check can be done on the outputs provided by multiple components of a single type.
Incentive registry: Incentives are effective in motivating AI systems to execute tasks in a responsible manner. An incentive registry records the rewards that correspond to the AI system's ethical behaviors and decision outcomes, e.g., rewards for path planning. There are various ways to formulate the incentive mechanism, e.g., reinforcement learning or a publicly accessible data infrastructure using blockchain. However, it is challenging to formulate the form of rewards, as the ethical impact of AI systems' decisions and behaviors can hardly be measured for some of the ethical principles (such as human values). Furthermore, the incentive mechanism needs to be agreed upon by all the stakeholders, who may have different views on the ethical impact. In addition, there may be trade-offs between different principles, which makes the design harder.
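The registry itself can be sketched as an append-only record of rewards per agent; the agent identifiers, reward values, and rationales below are invented for illustration:

```python
from collections import defaultdict

class IncentiveRegistry:
    """Append-only record of rewards/penalties tied to agent decisions."""

    def __init__(self):
        self.entries = []                    # full, auditable history
        self.balance = defaultdict(float)    # running total per agent

    def record(self, agent_id, decision, reward, rationale):
        self.entries.append({"agent": agent_id, "decision": decision,
                             "reward": reward, "rationale": rationale})
        self.balance[agent_id] += reward

registry = IncentiveRegistry()
registry.record("nav-1", "reroute", +1.0, "avoided school zone at pickup time")
registry.record("nav-1", "fast-route", -2.0, "route exceeded speed limits")
```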
Continuous ethical validator: AI systems often require continual learning based on new data and carry a higher degree of risk caused by the autonomy of the AI component. Rather than assessing ethical risk at a particular development step, a continuous ethical validator continuously monitors and validates the outcomes of AI systems (e.g., the path recommended by the navigation system) against the ethical requirements. The outcomes of AI systems are the consequences of the systems' decisions and behaviors, i.e., whether the AI system provides the intended benefits and behaves appropriately given the situation. The time and frequency of validation should be configurable within the continuous validator. Version-based feedback and rebuild alerts should be sent when predefined conditions are met. An incentive registry can be adopted to reward or punish the ethical or unethical behaviors and decisions of AI systems.
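The validate-then-alert loop can be sketched as follows; the ethical requirements are expressed as predicates over an outcome, and all requirement names, outcome fields, and thresholds are hypothetical:

```python
class ContinuousEthicalValidator:
    """Validates AI outcomes against ethical requirements and raises a
    rebuild alert once violations cross a configured threshold."""

    def __init__(self, requirements, alert_threshold=3):
        self.requirements = requirements      # name -> predicate over an outcome
        self.alert_threshold = alert_threshold
        self.violations = []

    def validate(self, outcome):
        failed = [name for name, ok in self.requirements.items()
                  if not ok(outcome)]
        if failed:
            self.violations.append({"outcome": outcome, "failed": failed})
        return failed

    def rebuild_alert(self):
        return len(self.violations) >= self.alert_threshold

requirements = {
    "max_speed": lambda o: o["speed"] <= 50,
    "avoids_school_zone": lambda o: not o["school_zone"],
}
validator = ContinuousEthicalValidator(requirements, alert_threshold=2)
```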
Ethical knowledge base: AI systems involve broad ethical knowledge, such as AI ethics principles, regulations, and guidelines. Unfortunately, such ethical knowledge is scattered across different documents (e.g., self-driving regulations) and is usually implicit or even unknown to developers, who primarily focus on the technical aspects of AI systems. This results in negligence or ad-hoc use of relevant ethical knowledge in AI system development. An ethical knowledge base is built upon a knowledge graph that makes meaningful entities, concepts, and their rich semantic relationships explicit and traceable across heterogeneous documents, so that the ethical knowledge can be systematically accessed, analysed, and used to support the use of AI systems. For example, there may be ethical quality issues with APIs (e.g., data privacy breaches or fairness issues). Thus, ethical compliance checking for APIs is needed to detect whether any ethics violation exists. Ethical knowledge graphs can be built based on the ethical principles and guidelines (e.g., a privacy knowledge graph based on a privacy act) to automatically examine whether APIs comply with regulations for AI ethics. A call graph might also be needed for code analysis, as there might be interactions between different APIs.
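The API compliance check can be sketched as a query over a tiny triple store; the entities and predicates below (`uses_data`, `forbids_data`, the clause and API names) are invented for illustration and not drawn from any real regulation:

```python
class EthicalKnowledgeBase:
    """Minimal triple store over ethical knowledge, supporting simple
    compliance queries such as 'which forbidden data does this API use?'."""

    def __init__(self):
        self.triples = set()   # (subject, predicate, object)

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def query(self, s=None, p=None, o=None):
        return [(ts, tp, to) for ts, tp, to in self.triples
                if (s is None or ts == s)
                and (p is None or tp == p)
                and (o is None or to == o)]

def check_api_compliance(kb, api):
    """An API is non-compliant if it uses a data category that some
    regulation clause in the knowledge base forbids."""
    used = {o for _, _, o in kb.query(s=api, p="uses_data")}
    forbidden = {o for _, _, o in kb.query(p="forbids_data")}
    return sorted(used & forbidden)

kb = EthicalKnowledgeBase()
kb.add("privacy_act_s6", "forbids_data", "raw_location")
kb.add("geo_api", "uses_data", "raw_location")
kb.add("geo_api", "uses_data", "anonymized_counts")
```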
Co-versioning registry: Compared with traditional software, AI systems involve different levels of dependencies and require more frequent evolution due to their data-dependent behaviors. Co-versioning the components of AI systems or AI assets provides end-to-end provenance guarantees across the entire lifecycle of AI systems (see, e.g., https://dvc.org/). A co-versioning registry can track the co-evolution of components or AI assets. There are different levels of co-versioning: co-versioning of AI components and non-AI components, and co-versioning of the assets within the AI components (i.e., co-versioning of data, model, code, and configurations, and co-versioning of local models and global models in federated learning). A publicly accessible data infrastructure (e.g., using blockchain) can be used to maintain the co-versioning registry and provide a trustworthy trace of the dependencies. For example, a co-versioning registry contract can be built on a blockchain to manage different versions of visual perception models and the corresponding training datasets.
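At its core, the registry pins together the versions of assets that were used with one another; the version labels below are illustrative:

```python
class CoVersioningRegistry:
    """Tracks which versions of data, model, and code were used together,
    so any deployed model can be traced back to its exact inputs."""

    def __init__(self):
        self.snapshots = []

    def register(self, model_version, data_version, code_version):
        snapshot = {"model": model_version, "data": data_version,
                    "code": code_version}
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1    # snapshot id

    def lookup(self, snapshot_id):
        return self.snapshots[snapshot_id]

registry = CoVersioningRegistry()
sid = registry.register(model_version="perception-v2",
                        data_version="dataset-2021-05",
                        code_version="commit-abc123")
```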
Federated learner: Despite widely deployed mobile and IoT devices generating massive amounts of data, data hunger remains a challenge given the increasing concerns about data privacy. A federated learner preserves data privacy by training models locally on client devices and formulating a global model on a central server based on the local model updates, e.g., training the visual perception model locally in each vehicle. Decentralized learning is an alternative to federated learning that uses blockchain to remove the single point of failure and coordinate the learning process in a fully decentralized way. In the event of negative outcomes, the responsible humans can be traced and identified through an ethical black box for accountability.
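The server-side aggregation step can be sketched as one round of simple federated averaging over flat weight vectors (real schemes weight clients by dataset size and add secure aggregation; this sketch omits both):

```python
def federated_average(local_updates):
    """One aggregation round: the server averages the clients' locally
    trained weights without ever seeing the clients' raw data."""
    n_clients = len(local_updates)
    n_params = len(local_updates[0])
    return [sum(update[i] for update in local_updates) / n_clients
            for i in range(n_params)]
```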
Ethical black box: The black box was introduced for aircraft several decades ago to record critical flight data. The purpose of embedding an ethical black box in an AI system is to investigate why and how the AI system caused an accident or a near miss. The ethical black box continuously records sensor data, internal status data, decisions, behaviors (of both the system and the operator), and effects. For example, an ethical black box could be built into the automated driving system to record the behaviors of the system and the driver and their effects. All of these data need to be kept as evidence, together with timestamp and location data. Designing the ethical black box is challenging, as the ethical metrics need to be identified for data collection. Also, design decisions need to be made on what data should be recorded and where the data should be stored (e.g., using a blockchain-based immutable log or cloud-based data storage).
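A minimal sketch of such a log, using a hash chain so that later tampering with any record is detectable (the record kinds and payloads are illustrative):

```python
import hashlib
import json
import time

class EthicalBlackBox:
    """Tamper-evident, append-only log of sensor data, decisions, behaviors,
    and effects. Each record stores the hash of its predecessor."""

    GENESIS = "0" * 64

    def __init__(self):
        self.records = []
        self._prev_hash = self.GENESIS

    def log(self, kind, payload, location=None):
        record = {"kind": kind, "payload": payload, "location": location,
                  "timestamp": time.time(), "prev_hash": self._prev_hash}
        self._prev_hash = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        self.records.append(record)

    def verify_chain(self):
        """Recompute the hash chain; any edited record breaks it."""
        prev = self.GENESIS
        for record in self.records:
            if record["prev_hash"] != prev:
                return False
            prev = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()).hexdigest()
        return True

box = EthicalBlackBox()
box.log("decision", {"action": "brake"}, location="47.61,-122.33")
box.log("effect", {"speed_after_kmh": 12})
```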
Global-view auditor: When an accident happens, more than one AI system may be involved (e.g., multiple autonomous vehicles in an accident), and the data collected from the involved AI systems may conflict. A global-view auditor provides global-view accountability by finding discrepancies among the data collected from a set of AI systems and identifying liability when negative events occur. This pattern can also be adapted to improve the decision-making of an AI system by taking in knowledge from other systems. For example, an autonomous vehicle may increase its visibility using the perceptions of other vehicles to make better decisions at runtime.
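The discrepancy-finding step can be sketched as a field-by-field comparison of the event reports from each system; the report fields and system names below are invented:

```python
def find_discrepancies(reports):
    """Compare event logs from multiple AI systems and flag the fields on
    which they disagree, e.g., conflicting traffic-light observations."""
    all_keys = set().union(*(report.keys() for report in reports.values()))
    discrepancies = {}
    for key in all_keys:
        values = {system: report[key] for system, report in reports.items()
                  if key in report}
        if len(set(values.values())) > 1:   # systems disagree on this field
            discrepancies[key] = values
    return discrepancies

# Hypothetical post-accident reports from two vehicles' black boxes.
reports = {
    "vehicle_a": {"light": "green", "impact_speed_kmh": 42},
    "vehicle_b": {"light": "red", "impact_speed_kmh": 42},
}
```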
5 Conclusion

AI ethics principles are usually high-level and do not provide concrete guidance or engineering methods telling developers how to develop AI systems responsibly. To operationalize responsible AI, we have collected a set of patterns that can be embedded into an AI system as product features to enable responsible-AI-by-design. The patterns are associated with the states or state transitions of a provisioned AI system, serving as effective guidance for architects designing responsible AI systems.
-  J. Fjeld et al., "Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI," Berkman Klein Center Research Publication, no. 2020-1, 2020.
-  G. Meszaros et al., “A Pattern Language for Pattern Writing,” Pattern languages of program design, vol. 3, pp. 529–574, 1998.
-  The United States Department of Commerce, “The minimum elements for a software bill of materials (sbom),” 2021. [Online]. Available: https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf
-  W. Chu, “A decentralized approach towards responsible ai in social ecosystems,” 2021.
-  A. Dosovitskiy et al., “CARLA: An open urban driving simulator,” in Proceedings of the 1st Annual Conference on Robot Learning, ser. PMLR, S. Levine, V. Vanhoucke, and K. Goldberg, Eds., vol. 78. PMLR, 13–15 Nov 2017, pp. 1–16.
-  A. Lavaei, B. Zhong, M. Caccamo, and M. Zamani, “Towards trustworthy ai: safe-visor architecture for uncertified controllers in stochastic cyber-physical systems,” in Proceedings of the Workshop on Computation-Aware Algorithmic Design for Cyber-Physical Systems, 2021, pp. 7–8.
-  Tesla, “Tesla autopilot,” 2015. [Online]. Available: https://www.tesla.com/autopilot
-  NeuroAILab, "Tfutils multi-model training for tensorflow," 2018. [Online]. Available: http://neuroailab.stanford.edu/tfutils/fundamentals/multimodel.html
-  M. Nafreen, S. Bhattacharya, and L. Fiondella, “Architecture-based software reliability incorporating fault tolerant machine learning,” in RAMS’20, 2020, pp. 1–6.
-  J. Weng, J. Weng, J. Zhang, M. Li, Y. Zhang, and W. Luo, "Deepchain: Auditable and privacy-preserving deep learning with blockchain-based incentive," IEEE Transactions on Dependable and Secure Computing, vol. 18, no. 5, pp. 2438–2455, 2021.
-  AWS, “Amazon sagemaker model monitor,” 2019. [Online]. Available: https://aws.amazon.com/sagemaker/model-monitor/
-  I. Naja, M. Markovic, P. Edwards, and C. Cottrill, “A semantic framework to support ai system accountability and audit,” in The Semantic Web, 2021, pp. 160–176.
-  S. Caldas et al., “Leaf: A benchmark for federated settings,” 2019. [Online]. Available: https://leaf.cmu.edu/
-  G. Falco and J. E. Siegel, "A distributed 'black box' audit trail design specification for connected and automated vehicle data and software assurance," arXiv preprint arXiv:2002.02780, 2020.
-  B. S. Miguel, A. Naseer, and H. Inakoshi, “Putting accountability of ai systems into practice,” in IJCAI’21, 2021, pp. 5276–5278.