MAIR: Framework for mining relationships between research articles, strategies, and regulations in the field of explainable artificial intelligence

by Stanisław Gizinski, et al.

The growing number of AI applications, also for high-stake decisions, increases the interest in Explainable and Interpretable Machine Learning (XI-ML). This trend can be seen both in the increasing number of regulations and strategies for developing trustworthy AI and the growing number of scientific papers dedicated to this topic. To ensure the sustainable development of AI, it is essential to understand the dynamics of the impact of regulation on research papers as well as the impact of scientific discourse on AI-related policies. This paper introduces a novel framework for joint analysis of AI-related policy documents and eXplainable Artificial Intelligence (XAI) research papers. The collected documents are enriched with metadata and interconnections, using various NLP methods combined with a methodology inspired by Institutional Grammar. Based on the information extracted from collected documents, we showcase a series of analyses that help understand interactions, similarities, and differences between documents at different stages of institutionalization. To the best of our knowledge, this is the first work to use automatic language analysis tools to understand the dynamics between XI-ML methods and regulations. We believe that such a system contributes to better cooperation between XAI researchers and AI policymakers.



1 Introduction

Artificial intelligence methods are playing an increasingly important role in global economics. The growing importance and, at the same time, the risks associated with AI are driving a vibrant discussion about the responsible development of artificial intelligence. Examples of negative consequences resulting from black-box models show that interpretability, transparency, safety, and fairness are essential yet sometimes overlooked components of AI systems. Efforts to secure the responsible development of AI systems are ongoing at many levels and in many communities, both policymakers and academics (gill_responsible_2020; barredo_arrieta_explainable_2020; baniecki_dalex_2020).

Naturally, national strategies for the development of responsible AI, sector regulations related to the safe use of AI, as well as academic research related to new methods that ensure the transparency and verifiability of models are all interrelated. Strategies are based on discussions in the scientific community and are often sources of inspiration for subsequent research work. The need for regulation stems from risks, often identified by the research community, but when regulations are created, they become a powerful tool for developing methods to meet expectations. Scientific work in AI is particularly strongly connected to the economy, which means that a large part of it responds to the threads identified in regulations and strategies.

Although this impact is strong, we know little about its dynamics and structure. Analyses of AI-related policies are carried out by the OECD AI Policy Observatory and, at the European level, by the European Commission’s AI Watch. However, academics working in responsible AI are most often locked in an information bubble of articles on XAI that are discussed at prestigious conferences, in journals, and on preprint servers such as arXiv. This interaction is complicated by the different aims of the stakeholders developing AI solutions versus the stakeholders that use AI solutions, who typically do not understand their limitations.

We know that there is a gap between the expectations (enshrined in strategies and regulations) and the reality (presented in research papers) related to AI (krafft_defining). Moreover, due to the increasing number of documents, it is close to impossible to analyze this gap by manually reviewing all source documents.

1.1 Our Contribution

To address this problem, we need a standardized knowledge base that can be processed in an automated way. In this paper, we present the concept and implementation of such a framework. To achieve that, we build a set of tools for scraping, filtering, and preprocessing relevant documents. Our system extracts information from documents using Natural Language Processing (NLP). The proposed framework processes not only AI regulations, which have been developed relatively recently, but also guidelines, whitepapers, and academic articles. To study the dynamics of influence between academia and policymakers, we must detect interconnections between papers and policy documents, both explicit (citations, references) and implicit (similarities in approach to concepts, shared author affiliations). The tools developed in this study allow following the process of institutionalizing ideas on how technologies associated with artificial intelligence should be regulated.

The described system is, to our knowledge, the first such solution that combines research papers, strategies, and regulations with rich annotations. In the second part of the paper, we showcase a set of analyses that such a system can perform. However, this is by no means an exhaustive list of use cases. In this work, we focused on XAI papers, but the proposed method could be used more broadly to analyze any subfield of AI. We believe that this work creates the foundation for future analyses of cross-dependencies between strategies, articles, and regulations.

1.2 Relation to social sciences

Studying the development of AI regulations with automated AI systems is interesting not only for practical reasons. The framework allows discovering the directions in which regulations develop, and it can also be used to study a topic that has always been important for social science: the relation between humans and technology. At least since the research conducted by Ogburn, scholars have been interested in how, and how quickly, culture embraces new technology: how much time we need to develop norms and rules telling us how we should understand the new technology, how we should use it, and how we should be punished for not following the rules. Therefore, no matter how extraordinary the technologies associated with AI seem to us, the type of problems they pose at the general level is not new. What is new is the possibility of using new technology to study its own cultural adoption.

Social science offers a variety of ways of studying relations between technology and culture. The system presented in this paper is built around a take on this issue coming from political science, more specifically, the studies on public policies. On the one hand, our system can extract the significant characteristics of AI policies. On the other hand, it can grasp the dynamics behind the process of shaping these policies. It allows studying policy design (siddiki_2020) and policy process (Weible) at the same time. The system enables us to analyze which sets of AI experts influence policies towards AI and, at the same time, study the characteristics of these policies using Institutional Grammar (IG). There have been attempts at automated IG tagging using NLP (Rice2021), but the code is not available.

1.3 Related work

Text processing techniques have been used to tackle political science issues for a while, but modern NLP methods have been adopted only recently (glavas_computational_2019; hollibaugh_use_2019). A recent example of automated policy text analysis is linder_text_2020, where information extraction methods were used to mine similarities between public policy texts.

There are also recent examples of combining network analysis and NLP in political science. Namely, zaytsev_entity_2019 used named entity recognition and the Chinese Whispers algorithm in a quantitative approach to identifying actors’ coalitions in the influence of policymaking.

Studying the dynamics of machine learning research has been attracting interest lately (martinez-plumed_research_2021); however, there has been no record of quantitatively analyzing such dynamics between academic papers and public policies. The topic of the influence of research over policy has been studied using traditional methodologies (newman_policy_2016). There were attempts at analyzing the relationship between policymakers and academia in the context of XAI (krafft_defining; LANGER2021103473); however, the methodology of such studies never included analyzing documents produced both by academia and policymakers.

1.4 IG as a novel approach to information extraction

The Institutional Grammar (IG) was created to resolve the discussion regarding one of the crucial issues in social science, the nature of institutions (ig_ostrom), and more specifically, how institutions regulate human behavior. However, it is now used mainly as an analytical tool in policy design studies. This type of research is focused on "the purposeful, functional, and normative qualities of public policies" (siddiki_2020, p. 1), and it is especially concerned with "the content of policies and how this content is organized" (siddiki_2020; SchneiderIngram). Because the content of policies is expressed mainly through legal regulations, IG is mainly used to analyze legal regulations. The tool not only supports research in political science but has also found applications in computer science (Frantz2016). IG’s attractive feature seems to be its ability to transform legal text into a computer-readable format.

IG has been developing since its creation by ig_ostrom. Its most current version, IG 2.0, is presented in a codebook written in cooperation between political and computer scientists (ig_codebook).

The basic unit of IG analysis is a statement. There are two types of statements: constitutive and regulative. Constitutive statements define crucial elements of a particular policy (e.g., "For the purpose of this Regulation, ‘provider’ means a legal person that develops an AI system"), whereas regulative statements provide information on which activities are allowed, forbidden, or obligatory in a particular policy setting (e.g., "The European Data Protection Supervisor may impose administrative fines on Union institutions"). Each type of statement can be parsed using the proper IG components (see Table 1). Our examples should be parsed as follows: "For the purpose of this Regulation (AC), ‘provider’ (E) means (F) a legal person that develops an AI system (P)" and "The European Data Protection Supervisor (A) may (D) impose (I) administrative fines (B)".

| Regulative statements | Description | Constitutive statements | Description |
|---|---|---|---|
| Attribute (A) | The addressee of the statement. | Constituted Entity (E) | The entity being defined. |
| Aim (I) | The action of addressee regulated by the statement. | Constitutive Function (F) | A verb used to define Constituted Entity. |
| Deontic (D) | An operator determining the level of discretion or constraint associated with Aim. | Modal (M) | An operator determining the level of necessity and possibility of defining Constituted Entity. |
| Object (B) | The receiver of the action described by Aim. | Constituting Properties (P) | The entity against which Constituted Entity is defined. |
| Activation Condition (AC) | The setting to which the statements apply. | Activation Condition (AC) | The setting to which the statements apply. |
| Execution Constraint (EC) | Quality of action described by Aim. | Execution Constraint (EC) | Quality of Constitutive Function. |

Table 1: IG main components depending on statement type (regulative or constitutive) based on (ig_codebook, pp. 10-11).

The scope of an IG implementation in research depends on the goals of the study. Our study on AI regulations follows those analyses where only some IG components were identified in legal texts and examined (Heikkila2018).

2 The architecture of the MAIR framework

To automatically analyze AI regulations’ dynamics, we must first gather policy documents and academic papers, enrich them with relevant meta-information, and find interconnections between them.

The ideas described in sections 1.3 and 1.4 inspired the development of the MAIR (Monitoring of AI Regulations, strategies, and research papers) framework. The architecture of this framework is shown in Fig. 1. The framework is fed with documents retrieved from four sources: the OECD AI Policy Observatory (last download date: 19 Mar 2021) and the NESTA AI Governance Database (last download date: 19 Mar 2021) for policy documents, and arXiv (clement2019arxiv) as well as the Semantic Scholar Research Corpus (S2ORC) (lo-wang-2020-s2orc) for research papers. These documents are usually available as PDF files and are scraped with Beautiful Soup (a library available on PyPI).

The MAIR system automatically detects some sections of texts, such as headers and bibliography, to later extract citations and affiliations only from those parts of the text. We extract and collect metadata, such as authors, source websites, and others, for later processing along with the content of documents. Then we run a series of information extraction processes: we determine the policy document function, extract deontic sentences along with Institutional Grammar attributes, determine authors and affiliations, and find cross-citations between documents and other relevant data. All of those processes are described in detail in Sect. 3. All data gathering, processing, and extraction steps are managed by the DVC pipeline (ruslan-kuprieiev), which allows for an easy update of all results when new data become available.

The source code of the framework is available under the open GPL-3 license in the GitHub MAIR repository.

In the framework, we use two corpora of articles from arXiv:

  • arXiv.AI that consists of all AI-related papers. These papers are identified based on the categories identified by authors (see Appendix C). Due to its volume, this corpus contains only metadata. Today there are 164,105 documents in this corpus. This resource is used to identify papers referenced in policy documents in Sect. 3.4.

  • arXiv.XAI that consists of a subset of the above related specifically to the domain of Explainable Artificial Intelligence and Interpretable Machine Learning, filtered by combinations of domain keywords (see Appendix D). It contains 742 papers with full texts along with metadata. Additionally, we extract a citation network by calling Semantic Scholar API.

Figure 1: The process of acquiring and enriching documents for the MAIR database. The first level indicated by cloud icons identifies data sources, different for regulations and strategies and different for research papers. Subsequent components with a white background refer to the technical processing of the retrieved documents. The grey background indicates system elements that enrich documents with additional annotations, create additional meta-data or links between documents. The enriched documents are then stored in three databases describing different types of extracted data.

3 Tools for knowledge extraction

This section describes NLP methods used to extract various characteristics from policy documents and academic papers. We extract document-wide qualities (document function, affiliations), links between documents, and a list of deontic sentences tagged with Institutional Grammar (IG) for every document. Scraped metadata, such as the year a document was issued, are stored together with those extracted pieces of information and can later be used for the analysis of various aspects of dynamics.

3.1 Classification of document function

Policy documents from NESTA and OECD fall into several categories based on their function. However, the classification provided by the authors is ambiguous and inconsistent between the two sources. We develop a more systematized classification system that we use as metadata in further analysis. Specifically, we define categories, perform manual annotation, and train an NLP model for the automatic categorization. To each document, we assign one of the following categories:

  1. Diagnosis – reports, and other documents describing the current state of AI;

  2. Principles – sets of ethical rules regarding AI;

  3. Strategies – documents describing actions that should be taken towards AI;

  4. Pre-regulations – proposals of legal regulations addressing AI;

  5. Regulations – legal regulations addressing AI;

  6. Body – documents establishing AI-related organizations.

In the manual labeling process, we achieve 77% agreement between two trained annotators and resolve conflicts with a third annotator (the main author of this paper). To classify documents, we use a few-shot learning model based on Task-Aware Representation of Sentences (halder_task-aware_2020), implemented in flairNLP (akbik2019flair). We achieve 80.8% accuracy on the holdout set. A detailed breakdown of accuracy per class is provided in Appendix B.

3.2 Extraction of Institutional Grammar (IG) tags

In our solution, we focused on extracting 4 IG attributes from texts: Attributes, Aims, Deontics, and Objects. Knowing that automating sentence tagging according to IG is not an easy task (Rice2021), we simplified our approach to this analytical tool (Heikkila2018). First, we tagged only those sentences that contain modal verbs, because rights, obligations, and restrictions are usually expressed through this type of sentence. Second, we treated all selected sentences as regulative statements. By doing this, we lost the information on whether the activities associated with a deontic describe essential functions of the entities to which statements are addressed or only potential actions they are capable of performing. Third, we did not analyze very complex sentence structures but focused only on main clauses. Our implementation of IG was therefore tailored to the specific needs of our research.

The algorithm is based on dependency trees. The first step, after initial preprocessing, is splitting texts into sentences and parsing them with the spaCy dependency parser (spacy). Then, we locate sentences with deontics from a closed list. The algorithm uses dependency relationships to find the verb (Aim), then subjects (Attribute) or passive subjects (Object). If there is no direct subject, we search the tree for a clausal subject subsentence and extract the subject of that subsentence. Then, any additional Objects are identified. We recursively repeat this procedure for every verb that is in conjunction with the parent verb of the deontic. In the end, we add every subject conjoined with any of the previously found subjects (same for objects). If any of the found subjects is a pronoun, we perform an additional step of coreference resolution to find the entity to which the pronoun refers. For this, we used the NeuralCoref spaCy extension (code available on GitHub), which implements the method presented in (clark_deep_2016). Every deontic is then mapped onto one of 3 categories: shall, must, or can, and every Object, Attribute, and Aim is lemmatized to simplify further analysis. The details of the tagging algorithm are presented in Appendix A.
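The core of the procedure above can be sketched on a hand-written dependency parse. This is a simplified illustration, not the authors' implementation (which is described in Appendix A): the `Token` structure, the helper names, and the contents of `DEONTIC_MAP` are assumptions made for this example, the dependency labels follow spaCy-style naming, and the clausal-subject search and coreference resolution steps are left out.

```python
from collections import namedtuple

# Assumed closed list of deontics mapped onto the three categories used above.
DEONTIC_MAP = {"shall": "shall", "should": "shall", "must": "must",
               "can": "can", "may": "can"}

# lemma: lemmatized word; dep: spaCy-style dependency label;
# head: index of the head token (the root points to itself).
Token = namedtuple("Token", "lemma dep head")

def children(tokens, head, labels):
    """Indices of tokens attached to `head` by one of the given labels."""
    return [i for i, t in enumerate(tokens)
            if t.head == head and i != head and t.dep in labels]

def with_conjuncts(tokens, index):
    """The token itself plus everything chained to it by 'conj' relations."""
    found = [index]
    for child in children(tokens, index, {"conj"}):
        found.extend(with_conjuncts(tokens, child))
    return found

def tag_sentence(tokens):
    """Return one dict of IG tags per deontic found in the sentence."""
    statements = []
    for tok in tokens:
        if tok.dep != "aux" or tok.lemma not in DEONTIC_MAP:
            continue
        aims = with_conjuncts(tokens, tok.head)  # parent verb and its conjuncts
        attributes, objects = [], []
        for verb in aims:
            for subj in children(tokens, verb, {"nsubj"}):
                attributes += with_conjuncts(tokens, subj)
            for obj in children(tokens, verb, {"nsubjpass", "dobj"}):
                objects += with_conjuncts(tokens, obj)
        statements.append({
            "deontic": DEONTIC_MAP[tok.lemma],
            "aims": [tokens[i].lemma for i in aims],
            "attributes": [tokens[i].lemma for i in attributes],
            "objects": [tokens[i].lemma for i in objects],
        })
    return statements

# Hand-written parse of "Designers, builders and manufacturers should submit
# details and documentation" (punctuation omitted for brevity).
example = [
    Token("designer", "nsubj", 5), Token("builder", "conj", 0),
    Token("and", "cc", 1), Token("manufacturer", "conj", 1),
    Token("should", "aux", 5), Token("submit", "ROOT", 5),
    Token("detail", "dobj", 5), Token("and", "cc", 6),
    Token("documentation", "conj", 6),
]
result = tag_sentence(example)
print(result)
```

On this toy parse the sketch yields the deontic "shall", one Aim, three Attributes, and two Objects, matching the first example in Fig. 2.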

Figure 2: Examples of dependency-parsed deontic sentences. Arrows represent dependencies, arrow labels are dependency relation types, and bottom words are part-of-speech tags; all three are produced by the spaCy parser and are used in the IG tagging algorithm. Colors indicate IG tags assigned to words. In the first sentence, the recognized Deontic (starting point for the algorithm) is "should" (which is translated in our nomenclature to "shall"). Our algorithm recognizes 3 Attributes ("designers", "builders", "manufacturers"), 1 Aim ("submit"), and 2 Objects ("details", "documentation"). The second example is a passive sentence. The starting point for the algorithm is "must"; there are two recognized Aims ("logged" and "retained") and 1 Object ("any decisions").

3.3 Extraction of authors’ affiliations

One of the angles of our analysis is to discover players that actively impact the discourse and shape the AI regulations. To do this, we decided to extract the affiliations of the authors of papers. Here, we use an arXiv XAI dataset with sources of 742 papers (described in Sect. 2).

Since arXiv collects affiliation metadata only through an optional field filled in by the submitter, the information is too sparse for any further application. For this reason, we extract it from the paper itself. Specifically, we choose to work on the LaTeX sources in a structured format. It is, however, not straightforward to extract affiliations from this format, as multiple tags may enclose this information. Additionally, there is no standard format for placing the affiliations in the paper, so they are often mixed with the authors’ names or exact addresses. As a simplification, we do not intend to link specific authors with their institutions but instead find a set of affiliations for the article.

Overall our pipeline consists of four steps:

  1. Locate the rough position of the affiliation in the text. We do it based on a list of identified LaTeX tags. Therefore we avoid extractions of institutions referred to in the text which are not authors’ affiliations.

  2. Extract the names of the institutions.

  3. Match different names of one organization into one, for example, a university name with and without the department.

  4. Classify the organization as either academia or business.

For the various steps, we explored several options, including spaCy (spacy) Named Entity Recognition (NER) for extraction, utilizing the email domain as a proxy identifier of an institution for matching, and rule-based classification. We then used Named Entity Linking (NEL) (6823700) for matching affiliation names with the external database. Specifically, we use the tool called Babelfy (moro-etal-2014-entity), which extracts entities and matches them against a DBpedia knowledge graph. Finally, we classify the affiliations based on their DBpedia entry tags. A demonstration of the affiliation extractor is shown in Fig. 3.
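As an illustration of the Locate step and of the email-domain proxy mentioned above, the following minimal sketch pulls the arguments of affiliation-like LaTeX commands and collects email domains. The list of tags in `AFFILIATION_TAGS` is an assumption made for this example (the actual list used in MAIR is not given here), and the Extract, Match, and Classify steps based on Babelfy and DBpedia are not reproduced.

```python
import re

# Assumed examples of LaTeX commands that usually carry affiliations.
AFFILIATION_TAGS = ("affiliation", "institute", "affil", "address")

def tag_arguments(latex, tags=AFFILIATION_TAGS):
    """Return the brace-delimited argument of every listed LaTeX command."""
    pattern = re.compile(
        r"\\(" + "|".join(tags) + r")\s*(?:\[[^\]]*\])?\s*\{")
    found = []
    for match in pattern.finditer(latex):
        depth, pos = 1, match.end()
        # Walk forward until the opening brace is balanced, so nested
        # commands such as \textbf{...} stay inside the argument.
        while pos < len(latex) and depth:
            if latex[pos] == "{":
                depth += 1
            elif latex[pos] == "}":
                depth -= 1
            pos += 1
        if depth == 0:
            found.append(latex[match.end():pos - 1].strip())
    return found

def email_domains(latex):
    """Email domains, usable as a rough proxy identifier of an institution."""
    return re.findall(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)", latex)

# Toy LaTeX source; names and addresses are made up for illustration.
source = r"""
\author{Ada Example}
\affiliation{Department of Computer Science, \textbf{Example University}}
\email{ada@cs.example.edu}
We thank Some Company for providing the data.
"""
located = tag_arguments(source)
domains = email_domains(source)
print(located, domains)
```

Restricting the search to these commands is what keeps institutions that are merely mentioned in the body text (here, "Some Company") out of the result.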


Figure 3: The process of affiliation extraction. In the Locate step, we use LaTeX tags to identify the potential location of the affiliations (left). Extract and Match stages performed with Named Entity Linking (NEL) result in matching with a corresponding DBpedia entry (middle). Classify step outputs type of affiliation based on DBpedia metadata (right).

3.4 Construction of citation network

We constructed a citation network coupling research papers (arXiv) and policy documents (OECD, NESTA) focusing on references in policy documents pointing to academic papers. Our data consists of 196 policy documents and the arXiv.AI dataset of 164,105 papers (described in detail in Sect. 2).

Policy documents do not follow any structured referencing format and are provided as PDF files, which results in a lack of citation metadata and prevents us from using any of the existing tools that assume a consistent reference format. We apply techniques from the field of Information Extraction to tackle issues of document linking (sil-etal-2012-linking) using metadata such as title and author names (shoaib2020author) in free text (essay73817). In this case, we pair each policy document with each paper and determine a match by either a paper’s arXiv id or a pair of (title, author). A demonstration of the link extraction method is shown in Fig. 4.

Figure 4: In the policy documents (left), we mine references pointing to research papers (right). A link is identified if there is a match in the metadata. In this example, the top of two papers is matched by a combination of title and author last name.

As a result, we obtain a bipartite graph of 202 links – 37 policy documents citing 146 papers.
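The matching rule described in this section (an arXiv id, or a title together with an author's last name) can be sketched as follows. This is a simplified illustration under stated assumptions: the regular expression covers only new-style arXiv identifiers, the text normalization is a guess at a reasonable preprocessing step, and the policy text and papers are toy data rather than entries from the MAIR corpus.

```python
import re

# New-style arXiv identifiers such as 2101.00001 or 2101.00001v2 (assumption:
# old-style identifiers like cs/0112017 are ignored in this sketch).
ARXIV_ID = re.compile(r"\b\d{4}\.\d{4,5}(?:v\d+)?\b")

def normalize(text):
    """Lower-case the text and collapse everything non-alphanumeric to spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def find_links(policy_text, papers):
    """Return (arxiv_id, reason) for every paper referenced in the text."""
    ids_in_text = {re.sub(r"v\d+$", "", m.group(0))
                   for m in ARXIV_ID.finditer(policy_text)}
    padded = f" {normalize(policy_text)} "
    links = []
    for paper in papers:
        if paper["arxiv_id"] in ids_in_text:
            links.append((paper["arxiv_id"], "arxiv_id"))
            continue
        title_hit = f" {normalize(paper['title'])} " in padded
        author_hit = any(f" {normalize(name.split()[-1])} " in padded
                         for name in paper["authors"])
        if title_hit and author_hit:  # both parts of the pair must match
            links.append((paper["arxiv_id"], "title+author"))
    return links

# Toy data, made up for illustration.
policy_text = ("Transparency is discussed in arXiv:2101.00001v2. See also "
               "Smith, J. (2020). Explaining black-box models with examples. "
               "A report by Jones is unrelated.")
papers = [
    {"arxiv_id": "2101.00001", "title": "A Toy Paper Cited by Identifier",
     "authors": ["Ada Example"]},
    {"arxiv_id": "2102.00002", "title": "Explaining Black-Box Models with Examples",
     "authors": ["Jane Smith"]},
    {"arxiv_id": "2103.00003", "title": "An Uncited Paper",
     "authors": ["Bob Jones"]},
]
links = find_links(policy_text, papers)
print(links)
```

The third toy paper is not linked although its author's last name appears in the text, which shows why the title and the author are required together.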

4 Analysis of the MAIR corpus of documents

Section 2 introduces the MAIR framework for collecting strategies, regulations, and research papers about XAI. Section 3 describes a set of techniques to enrich this corpus with additional meta-information extracted with advanced NLP tools. Such a corpus may serve many dedicated analyses related to the interdependencies between different stakeholders such as countries, IT companies, or academics. In this section, we present some example analyses of this corpus. It is not our aim to solve a specific research question but rather to show the versatility and usefulness of the developed resource.

The first example focuses on the temporal analysis of citations between scientific articles and their cross-connections with policy documents; the second brings to the front inter-dependencies among different players – in particular academia and industry. The third shows the use of deontic information to track differences in attitudes towards the human-AI relationship. Each of these examples describes an independent research problem.

4.1 XAI papers citation network

In the first case, we filter the arXiv.XAI network of 742 arXiv papers (described in Sect. 2) so that we take into account only nodes that cite (out-connections) or are cited by (in-connections) at least one of the 742 papers. As a result, we restrict ourselves to a directed citation graph of nodes and edges and no correlations among nodes’ in- and out-degrees (Pearson’s ), consisting of one giant component with 505 nodes and 7 small clusters. The graph is shown in Fig. 5A with color-coded nodes reflecting affiliation type. However, due to its rather high density, the picture does not bring any specific insights. On the other hand, the analysis of the proportion of incoming and outgoing links reveals that the majority of connections go to papers with affiliations from both academia and industry (see Table 2). There is also a significant difference in the profile of incoming and outgoing links: although industry-affiliated papers have a very similar number of in- and out-connections (85 vs 87), an industry paper citing an academia one is almost three times less likely than the opposite situation. Nonetheless, this picture might be blurred by the significant number of uncategorized papers.

| out \ in | academia | academia & industry | industry | none | total |
|---|---|---|---|---|---|
| academia | 69 | 156 | 17 | 166 | 408 |
| academia & industry | 94 | 221 | 20 | 227 | 562 |
| industry | 6 | 32 | 3 | 46 | 87 |
| none | 118 | 314 | 45 | 385 | 862 |
| total | 287 | 723 | 85 | 824 | 1919 |

Table 2: Breakdown of links in the citation graph: rows give the number of outgoing connections while columns represent incoming ones. "Academia & industry" means papers with affiliations from both academia and industry.

To find the most influential nodes in the citation graph, we used the igraph R package implementation of the Page Rank algorithm (igraph). Fig. 5B presents the 20 top-ranked nodes marked on an in-degree vs publication time plot. As expected, Page Rank (which, for a given node, is higher the more nodes with a high Page Rank point towards it) in general promotes nodes representing earlier papers, as they tend to accumulate citations reflected by the number of incoming links. In the following step, we identified 16 nodes in the citation graph that are cited in total 23 times by 7 different OECD and NESTA policy documents. They are marked with orange circles in Fig. 5B, with their size representing the number of citations obtained from different policy documents.
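To make the ranking step concrete, here is a minimal power-iteration sketch of Page Rank in Python. It is not the igraph R implementation used in the study; the damping factor of 0.85, the fixed number of iterations, the uniform treatment of papers without outgoing citations, and the toy edge list are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Page Rank by power iteration on a list of (citing, cited) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for citing, cited in edges:
        out_links[citing].append(cited)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank

# Toy citation graph: an edge (x, y) means "paper x cites paper y".
toy_edges = [("c", "a"), ("c", "b"), ("b", "a"), ("d", "a")]
toy_rank = pagerank(toy_edges)
print(sorted(toy_rank, key=toy_rank.get, reverse=True))
```

In the toy graph, paper "a" collects citations from every other paper and therefore receives the highest score, which mirrors the tendency of earlier, frequently cited papers to rank highly.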

Figure 5: Analysis of the XAI citation network. A) XAI citation network with colour-coded nodes reflecting affiliation type (see legend); B) node in-degree vs publication time (as a log-linear plot is being used we plot on the Y-axis); each black dot represents a single paper, the 20 most influential papers according to Page Rank score are marked in green with numbers giving their rank, and the 16 papers cited by OECD and NESTA policy documents are marked with orange circles, their size being proportional to the number of such references; C) a box plot of 10000 samples, each obtained by randomly choosing nodes in the citation network (preserving time constraints imposed by citing policy documents) and summing their Page Rank scores; the blue circle reflects the actual sum of Page Rank scores obtained for the cited papers.

As nearly half of the cited nodes are among the most influential ones, we can hypothesize that policy documents tend to point to important scientific papers rather than selecting articles not fully recognized in the field. To test this hypothesis, we compute the sum of Page Rank scores of 23 randomly selected nodes of the citation graph, keeping the time constraints imposed by the publication dates of policy documents (i.e., if a given policy document was published in 2020, we take into account only arXiv papers prior to that year). The results of 10000 repetitions are shown in Fig. 5C in the form of a box plot and compared to the actual sum of Page Rank scores obtained for the cited papers (blue circle in Fig. 5C), indicating that the cited articles are, in fact, much more influential than a random set.
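The randomization test described above can be sketched in a few lines. This is an illustrative version under explicit assumptions: each citation is replaced by an independent random draw (with replacement) from the papers published before the citing document's year, and the scores, years, and citations are toy values rather than MAIR data.

```python
import random

def null_sums(scores, years, citing_years, repetitions=10000, seed=0):
    """Sums of scores of randomly drawn papers, one draw per citation.

    For every citation only papers published before the year of the
    citing policy document are eligible.
    """
    rng = random.Random(seed)
    sums = []
    for _ in range(repetitions):
        total = 0.0
        for citing_year in citing_years:
            eligible = [p for p in scores if years[p] < citing_year]
            total += scores[rng.choice(eligible)]
        sums.append(total)
    return sums

# Toy Page Rank scores and publication years (made up for illustration).
scores = {"p1": 0.4, "p2": 0.3, "p3": 0.2, "p4": 0.1}
years = {"p1": 2016, "p2": 2017, "p3": 2018, "p4": 2019}
# (cited paper, publication year of the citing policy document)
citations = [("p1", 2019), ("p2", 2020)]

actual = sum(scores[paper] for paper, _ in citations)
sums = null_sums(scores, years, [year for _, year in citations])
# Share of random samples that are at least as influential as the cited set.
share = sum(s >= actual for s in sums) / len(sums)
print(actual, share)
```

A small `share` means that randomly chosen papers rarely reach the total score of the papers actually cited, which is the comparison visualized by the box plot and the blue circle in Fig. 5C.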

Figure 6: Aspects of the XAI bibliographic coupling network. A) Explanation of the bibliographic coupling: two papers (here denoted as 1 and 5) cite, respectively, articles 2, 3, 4, 8, 9, 10 and 7, 6, 8, 9, 10; only three papers out of 8 overlap, thus the coupling weight is 3/8; B) Histogram of the link weights in the coupling network; C) Size of the giant component normalized by the total number of nodes for a given threshold value; D)–H) Bibliographic coupling networks for different threshold values, respectively 0.2, 0.25, 0.3, 0.35, 0.5. Node colors represent affiliation type as in Fig. 5A. The size of the node scales logarithmically with the Page Rank score obtained for the smaller set analysis, and the node shape informs if the paper has been cited by a policy document (rectangle) or not (circle); I) Percentage of homogeneous links academia–academia (blue) and academia & industry–academia & industry (violet) versus the threshold.

4.2 XAI bibliographic coupling network

The graph described in the previous section represents actual links among arXiv papers. However, such a network is usually sparse and comes with a simple binary answer: either a paper is cited or not. To modify this approach, we chose the so-called bibliographic coupling introduced by Kessler1963, which is simply the Jaccard index of the out-neighborhoods $N_i^{\mathrm{out}}$ and $N_j^{\mathrm{out}}$ of two papers $i$ and $j$ (Steinert2016), i.e.,

$$w_{ij} = \frac{|N_i^{\mathrm{out}} \cap N_j^{\mathrm{out}}|}{|N_i^{\mathrm{out}} \cup N_j^{\mathrm{out}}|}$$

The idea of bibliographic coupling is depicted in Fig. 6A: in this way, we can take into account all the information available in the references of each arXiv paper, unlike in the previous case. Additionally, we deal with a network where each link connecting two nodes is characterized by a weight reflecting the similarity between the two papers. The resulting undirected bibliographic coupling graph consists of nodes and edges (in fact there should be edges but we omit those carrying ) with a weight distribution roughly following an exponential function (see Fig. 6B). It follows that we are now able to use the concept of the weight threshold (e.g., Chmiel2007) by keeping only those edges whose weights pass a given threshold parameter. Such a procedure simply transforms the weighted graph into a set of unweighted networks, each constructed for a given value of the threshold, that are then subject to further analysis (Sienkiewicz2018).

In particular, for some specific (critical) value of the threshold, the network, initially percolated (i.e., constructed in such a way that it is possible to arrive from any node at any other node), breaks down into several small components. To track this phenomenon quantitatively, for each threshold value we calculate the size of the giant component (i.e., the largest cluster in the network) and divide it by the relevant graph size. Figure 6C allows localizing the breakdown at roughly , which can be visualized by a set of graphs in Fig. 6D–H that not only reflect this process but also present other properties of the network such as affiliation (node color), importance (node size) or relation to policy documents (node shape). By increasing the threshold we bring to the front the strongest connections in the network (e.g., Fig. 6H), which tend to be mostly homogeneous, as seen in Fig. 6I where the share of academia and academia & industry in-links is plotted against the threshold. In contrast, homogeneous industry links are seldom observed and are unlikely to survive the introduction of high thresholds.
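The thresholding procedure described above can be illustrated with a minimal sketch. It assumes that each paper is represented by the set of papers it cites, that zero-weight pairs are dropped, and that an edge survives when its weight is greater than or equal to the threshold (the exact inequality is an assumption). The function names and the toy `references` data are hypothetical and are not taken from the MAIR code base.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard index of two reference sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def coupling_graph(references):
    """Weighted undirected edges {(i, j): weight}, omitting zero-weight pairs."""
    edges = {}
    for i, j in combinations(sorted(references), 2):
        weight = jaccard(references[i], references[j])
        if weight > 0:
            edges[(i, j)] = weight
    return edges

def giant_component_fraction(nodes, edges, threshold):
    """Share of nodes in the largest component after thresholding.

    Assumption: an edge is kept when weight >= threshold.
    """
    adjacency = {node: set() for node in nodes}
    for (i, j), weight in edges.items():
        if weight >= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    seen, largest = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search over one connected component.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        largest = max(largest, size)
    return largest / len(nodes)

# Hypothetical toy data: paper id -> set of cited paper ids.
references = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b", "c", "d"},
    "p3": {"c", "d", "e", "f"},
    "p4": {"f", "g", "h", "i"},
}
edges = coupling_graph(references)
for threshold in (0.0, 0.2, 0.5, 0.8):
    print(threshold, giant_component_fraction(list(references), edges, threshold))
```

In this toy example the graph is connected at the lowest threshold and falls apart step by step as the threshold grows, which mirrors the breakdown tracked in Fig. 6C on a much smaller scale.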

The citation network analysis presented in Sect. 4.1 suggests that key players in the extracted arXiv papers network are recognized as relevant from the policy documents perspective. On the other hand, when we turn to bibliographic coupling graphs, we can spot the persistence of homogeneous links among academia-like nodes that overtake the graph when the weakest connections are filtered out.

4.3 Deontic analysis

In this section, we present an example of a deontic analysis of documents from the MAIR database. We processed both legal documents and scientific papers so as to extract Attributes, Aims, Deontics, and Objects from individual sentences.
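To make the four components concrete, the following is a deliberately crude sketch that splits a sentence purely by token position around the first deontic. It is not the tagger used in MAIR; the function `tag_sentence`, the restriction to can / shall / must, and the example sentence are all assumptions made for illustration only.

```python
import re

DEONTICS = {"can", "shall", "must"}

def tag_sentence(sentence):
    """Split a sentence around its first deontic; return None if there is none.

    Toy heuristic: words before the deontic form the Attribute, the next
    word is the Aim, and the remaining words form the Object.
    """
    tokens = re.findall(r"[A-Za-z'-]+", sentence)
    for idx, token in enumerate(tokens):
        if token.lower() in DEONTICS and 0 < idx < len(tokens) - 1:
            return {
                "attribute": " ".join(tokens[:idx]),
                "deontic": token.lower(),
                "aim": tokens[idx + 1],
                "object": " ".join(tokens[idx + 2:]),
            }
    return None

# Hypothetical example sentence, not a quotation from the corpus.
print(tag_sentence("The provider must document the AI system."))
```

A heuristic of this kind fails on passive constructions, negations, and nested clauses, which is why real sentences call for a tagger based on syntactic analysis.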

Panel A in Fig. 7 shows how often the analysis of legal documents and academic papers identified a particular word as an object according to institutional grammar. Although the overall sizes of the two text corpora were comparable, some objects were identified much more frequently in scientific publications (model, method, explanation, agent, feature), while others are encountered much more frequently in legal documents (sector, government, agency). A particularly interesting case concerns the words driver and vehicle, which appear very often in regulations and other legal documents but much less frequently in scientific publications. This may suggest that the topic of autonomous cars has a much stronger impact on the imagination of policymakers, since many legal documents are devoted to it, whereas for the XAI research community it is not a foreground topic.

Based on the frequency of occurrence, we identified eight objects that underwent further deontic analysis (agent, machine, human, ai, people, algorithm, user, system). For each object, we determined whether it is accompanied by a term from the can / shall / must group (sentences containing negations were few and were excluded from this analysis). The normalized frequency of each deontic in relation to the object is presented in panel B of Fig. 7. Normalization was carried out separately for scientific articles and for legal documents, in order to remove the effect of the different frequencies of deontics in each group of texts. The ternary plots show the relative frequency of each deontic with a given object for both scientific articles and legal texts. Interestingly, in legal documents, the word "AI" occurs more frequently near the deontic "can", the words "agent" and "human" near the deontic "must", and the word "user" near the deontic "shall". Such a shallow analysis gives an orientation in global attitudes towards specific objects. At the same time, we see that the same objects occur in different contexts in scientific papers. The object "user" occurs markedly more frequently in the context of the deontic "can", as do "machine" and "agent". We can see that scientific articles emphasize capabilities more often than strategies do. At the same time, we observe the opposite trend for the word "AI", which in scientific papers more often occurs with the deontic "shall".
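One possible reading of this normalization is sketched below. It assumes two steps applied within a single corpus: each object-deontic count is first divided by the total count of that deontic, and the three resulting values are then rescaled to sum to one so that they can serve as ternary coordinates. Both steps, the function name `ternary_coordinates`, and the toy counts are assumptions rather than a description of the exact procedure behind Fig. 7.

```python
from collections import Counter

DEONTICS = ("can", "shall", "must")

def ternary_coordinates(pairs):
    """Map each object to (can, shall, must) shares that sum to 1.

    `pairs` holds (object, deontic) tuples extracted from ONE corpus, so the
    function is called once for papers and once for policy documents.
    """
    pairs = [(obj, d) for obj, d in pairs if d in DEONTICS]
    pair_counts = Counter(pairs)
    deontic_totals = Counter(d for _, d in pairs)
    coordinates = {}
    for obj in sorted({obj for obj, _ in pairs}):
        # Assumed step 1: divide by how common each deontic is in this corpus.
        rates = [
            pair_counts[(obj, d)] / deontic_totals[d] if deontic_totals[d] else 0.0
            for d in DEONTICS
        ]
        # Assumed step 2: rescale so the three values sum to 1 (ternary plot).
        total = sum(rates)
        coordinates[obj] = tuple(rate / total for rate in rates)
    return coordinates

# Hypothetical counts, not taken from the MAIR database.
paper_pairs = (
    [("user", "can")] * 6 + [("user", "must")] * 1
    + [("ai", "can")] * 2 + [("ai", "shall")] * 2 + [("ai", "must")] * 1
)
for obj, point in ternary_coordinates(paper_pairs).items():
    print(obj, [round(value, 3) for value in point])
```

Running the function separately on the pairs from research papers and from policy documents gives two points per object, and the displacement between them corresponds to the shifts discussed in the text.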

Shallow global analysis of objects and deontics suggests what kinds of objects are interesting to analyze. Having selected interesting phrases, we can use word trees to show the context in which they occur. Figure 8 shows word trees for selected phrases in research papers and legal documents. This type of interactive data mining supports the analysis of well-defined questions; to identify interesting questions in the first place, it is useful to use institutional grammar.
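The data structure behind a word tree can be sketched as a nested dictionary that counts which words follow a chosen phrase. The sketch below is only an illustration under simple assumptions (lowercasing, a regular-expression tokenizer, right-hand context only); the function names and the example sentences are hypothetical.

```python
import re

def word_tree(sentences, phrase, depth=3):
    """Count the words following `phrase`, nested up to `depth` words deep."""
    target = re.findall(r"[a-z']+", phrase.lower())
    root = {}
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for start in range(len(tokens) - len(target) + 1):
            if tokens[start:start + len(target)] != target:
                continue
            node = root
            first = start + len(target)
            for word in tokens[first:first + depth]:
                entry = node.setdefault(word, {"count": 0, "next": {}})
                entry["count"] += 1
                node = entry["next"]
    return root

def show(tree, indent=0):
    """Print the tree, most frequent continuation first."""
    for word, entry in sorted(tree.items(), key=lambda item: -item[1]["count"]):
        print(" " * indent + f"{word} ({entry['count']})")
        show(entry["next"], indent + 2)

# Hypothetical sentences, not quotations from the corpus.
sentences = [
    "The user must be informed about the decision.",
    "The user must be able to contest the outcome.",
    "A user must consent before data is processed.",
]
tree = word_tree(sentences, "user must", depth=2)
show(tree)
```

Branches with high counts point to recurring formulations, which is the kind of pattern the word trees in Fig. 8 are meant to expose.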

Figure 7: Panel A shows the frequency of occurrence of each Object in research articles vs. strategies; only objects with more than 40 occurrences are shown. Panel B presents, for the eight selected objects, the normalized context in which they are found in scientific articles (red dots) and strategies (blue dots). For the objects 'human', 'machine', and 'agent', we can see a shift from 'must' in strategies to 'can' in scientific articles. For the object 'user', we have a shift along the 'shall' dimension.
Figure 8: Contexts of selected words in the analysed texts. Panels B and D show the context from scientific articles and panels A and C from strategies.

5 Conclusions and Discussion

The number of policy documents related to AI, such as strategies and regulations, is growing rapidly. The number of research papers dedicated to interpretable and explainable AI is increasing at an even higher rate. Literature related to the XAI field is divided into several polarised communities, ranging from advocates of solutions that explain any black-box model to researchers arguing that XAI should not be used for high-stake decisions. Researchers from companies offering ML products present a different perspective from researchers who use these solutions and bear responsibility for errors in their work. The number and variety of these documents make it almost impossible to keep track of them continuously. Yet understanding the relationship between available methods and regulators' expectations is critical to implementing responsible AI.

This paper introduces a novel framework for the automated analysis of documents related to trustworthy AI. The system integrates a set of state-of-the-art solutions from the fields of natural language processing (NLP), institutional grammar (IG), and network analysis (NA). Each of these solutions is used to enrich raw text documents with relevant meta-information. In this work, we have also shown a collection of focused analyses that can be performed on the enriched documents, making use of deontic information, authors' affiliations, and the graph of references between documents.

As interest in regulating XAI increases, it is essential to monitor how well policymakers' reception and understanding of XAI align with the visions of the creators of XAI methods. In the future, such a system can contribute to better cooperation between XAI researchers and AI policymakers. For example, it would allow us to quickly assess whether public policies are highly influenced by methods developed in papers written by specific opinion leaders.

5.1 Limitations and future research

Our system currently uses a very simplified version of the Institutional Grammar tagger and a shallow analysis of documents. In future research, we would like to extend it to support deep analysis of sentences with all their internal complexities. We would also like to be able to distinguish regulative statements from constitutive ones.

Another limitation of our system concerns the relevance of policy documents: we gathered them from databases that are updated manually. This limits our ability to draw conclusions from our analysis. To overcome this limitation, we should gather documents directly from relevant websites. What is more, to comprehensively analyze and understand the process of setting up regulations on XAI and AI, the system should also gather and process ethical guidelines on AI issued by private companies and even newspaper articles regarding XAI and AI. These documents are often cited in policy documents and influence the formulation of rules on new technologies.

Let us also mention that the analysis of both types of networks is directly affected by the methods used to extract affiliations and references. As a result, as can be seen in Table 2, the affiliations of several nodes are labeled as "none", which introduces high uncertainty into the analysis of link homogeneity seen in Fig. 6. Similarly, XAI papers' references are limited to arXiv papers only, which can influence both the weight distribution and the relations among nodes in the network. Future plans for the use of complex network analysis include identifying relation types among the papers based on the way they appear in the text (Catalini2015) and examining different types of nodes' influence (Lu2016) and citation measures (Steinert2016).


Work on this project is financially supported by the NCN Sonata Bis-9 grant 2019/34/E/ST6/00052.
We are grateful to Anna Wróblewska for helpful discussions, and to Hubert Baniecki, Tomasz Stanisławek, and Krzysztof Kowalczyk for providing feedback on an early version of this paper.


Appendix A Institutional Grammar tagging

Below we present the pseudocode of the IG tagging algorithm. The implementation is available in our GitHub repository.

while  do
     if  then Searching for clausal subject
     end if
     if  then
     end if
end while
for each  do
     if  then
     end if
     if  then
     end if
end for
for each  do
end for
return attributes, objects, aims
Algorithm 1 Extraction of IG tags from deontic sentence

Appendix B Document function classification evaluation details

The document classification model was trained on 596 document titles and evaluated on 146. Figure 9 presents the number of misclassified items in the test set, divided by category.

Figure 9: Confusion matrix for our model's automatic classification of document functions on the test set.

Appendix C arXiv AI categories

List of categories of arXiv papers used in the arXiv.AI dataset. Note that some articles belong to multiple categories, so the per-category counts sum to more than the total number of articles. Access date: 08/05/2021.

Category name Category id Number of articles
Computer Vision and Pattern Recognition cs.CV 56,751
Machine Learning stat.ML 48,892
Artificial Intelligence cs.AI 33,214
Computation and Language cs.CL 26,610
Computers and Society cs.CY 10,218
Neural and Evolutionary Computing cs.NE 9,486
Computer Science and Game Theory cs.GT 7,458
Total 164,105
Table 3: List of article categories along with the corresponding number of articles in the arXiv.AI dataset.

Appendix D XAI keywords

List of keywords of arXiv papers used in the arXiv.XAI dataset. Matching is performed using the arXiv API on titles, abstracts, and journal references. Note that some articles are matched with multiple keywords, so the per-keyword counts sum to more than the total number of articles. Access date: 25/03/2021.

Keyword Number of articles
Interpretable Machine Learning 458
Explainable Artificial Intelligence 446
Fairness 162
Transparency 91
Total 742
Table 4: List of keywords along with the corresponding number of articles in the arXiv.XAI dataset.
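The keyword matching described in this appendix could be expressed with the public arXiv API roughly as follows. This is a sketch under stated assumptions: it supposes that each keyword is searched as an exact phrase in the title (`ti`), abstract (`abs`), and journal reference (`jr`) fields combined with OR, which may differ from the queries actually issued. The helper names and the toy identifiers are hypothetical, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def keyword_query_url(keyword, start=0, max_results=100):
    """Build a query URL matching `keyword` as a phrase in title, abstract, or journal reference."""
    phrase = f'"{keyword}"'
    search_query = " OR ".join(f"{field}:{phrase}" for field in ("ti", "abs", "jr"))
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"

def count_matches(matches):
    """Per-keyword counts and the number of distinct articles across keywords."""
    per_keyword = {keyword: len(ids) for keyword, ids in matches.items()}
    distinct = len(set().union(*matches.values()))
    return per_keyword, distinct

print(keyword_query_url("Fairness"))

# Hypothetical article ids: "id2" matches both keywords.
toy_matches = {"Fairness": {"id1", "id2"}, "Transparency": {"id2", "id3"}}
per_keyword, distinct = count_matches(toy_matches)
print(per_keyword, distinct)
```

The second helper shows why the rows of Table 4 add up to more than the total: an article matched by several keywords is counted once per keyword but only once in the set of distinct articles.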