A Distributed Trust Framework for Privacy-Preserving Machine Learning

06/03/2020, by Will Abramson et al., Edinburgh Napier University

When training a machine learning model, it is standard procedure for the researcher to have full knowledge of both the data and the model. However, this engenders a lack of trust between data owners and data scientists. Data owners are justifiably reluctant to relinquish control of private information to third parties. Privacy-preserving techniques distribute computation so that data remains under the control of its owner while learning takes place. However, architectures distributed amongst multiple agents introduce an entirely new set of security and trust complications, including data poisoning and model theft. This paper outlines a distributed infrastructure that facilitates peer-to-peer trust between distributed agents collaboratively performing a privacy-preserving workflow. Our prototype sets industry gatekeepers and governance bodies as credential issuers: before participating in the distributed learning workflow, agents must first obtain valid credentials, raising the bar for malicious actors. We detail a proof of concept using Hyperledger Aries, Decentralised Identifiers (DIDs) and Verifiable Credentials (VCs) to establish a distributed trust architecture during a privacy-preserving machine learning experiment. Specifically, we utilise secure and authenticated DID communication channels to facilitate a federated learning workflow related to mental health care data.


1 Introduction

Machine learning (ML) is a powerful tool for extrapolating knowledge from complex data-sets. However, it can also introduce several security risks concerning the data involved and how the model will be deployed [36]. An organisation providing ML capabilities needs data to train, test and validate its algorithm. However, data owners tend to be wary of sharing data with third-party processors, because once data is supplied it is almost impossible to ensure that it will be used solely for the purposes originally intended. This lack of trust between data owners and processors is currently an impediment to the advances which can be achieved through the utilisation of big data techniques. This is particularly evident with private medical data, where competent clinical decision support systems can augment clinician-to-patient time efficiencies [20, 1]. In order to overcome this obstacle, new distributed and privacy-preserving ML infrastructures have been developed in which the data no longer needs to be shared with, or even known to, the Researcher in order to be learned upon [41].

In a distributed environment of agents, establishing trust between these agents is crucial. Privacy-preserving methodologies are only successful if all parties participate in earnest. If we introduce a malicious Researcher, they may send a Trojan model which, instead of training, stores a carbon copy of the private data. Conversely, if we introduce a malicious actor in place of a data owner, they may be able to steal a copy of the model or poison it with bad data. In cases of model poisoning, malicious data is used to train a model in order to introduce a bias which supports some malicious motive. Once poisoned, maliciously trained models can be challenging to detect: the bias introduced by the malicious data has already been diffused into the model parameters, and once this has occurred it is a non-trivial task to disentangle that information from the model.

If one cannot ensure trust between agents participating in a Federated Learning (FL) workflow, the workflow is left open to malicious agents who may subvert its integrity through the exploitation of resources such as the data or the ML model used. In this work, we show how recent advances in digital identity technology can be utilised to define a trust framework for specific application domains, applied here to FL in a healthcare scenario. This reduces the risk of malicious agents subverting the FL workflow. Specifically, the paper leverages Decentralized Identifiers (DIDs) [39], Verifiable Credentials (VCs) [44] and DID Communication [24]. Together, these allow entities to establish a secure, asynchronous digital connection between themselves. Trust is established across these connections through the mutual authentication of digitally signed attestations from trusted entities. The authentication mechanisms in this paper can be applied to any data collection, data processing or regulatory workflow.

This paper contributes the following:

  • We improve upon the current state-of-the-art with respect to the integration of authentication techniques in privacy-preserving workflows.

  • We enable stakeholders in the learning process to define and enforce a trust model for their domain through the utilisation of DID mechanisms.

  • We apply a novel use of DIDs and VCs in order to perform mutual authentication for FL.

  • We provide a threat model which quantifies the threats faced by our chosen workflow, Vanilla FL, when run on our infrastructure.

  • We specify a peer-to-peer architecture which can be used as an alternative to centralised trust architectures such as certificate authorities, and apply it within a health care trust infrastructure.

Section 2 provides the background knowledge and describes the related literature. Section 3 outlines our implementation, followed by Section 4, where the threat model of our infrastructure and its scope are set out. In Section 5, we provide an evaluation of our system, and we conclude with Section 6, which draws conclusions and outlines directions for future work.

2 Background & Related Work

ML is revolutionising how we deal with data. This has been catalysed by hallmark innovations such as AlphaGo [27]. Attention has turned to attractive domains such as healthcare [49], self-driving cars and smart city planning [25]. Ernst and Young estimate that NHS data is worth £9.6 billion a year [43]. While this burgeoning application of data science has scope to benefit society, there are also emerging trust issues. The data-sets required to train these models are often highly sensitive, either containing personal data, such as data protected under the GDPR in the EU [46], or business-critical information.

Additionally, developing and understanding ML models is often a highly specialised skill, which generally means that two or more separate parties must collaborate to train an ML model. One side might have the expertise to develop a useful model, while the other holds the data on which they want the model trained in order to solve a business problem.

2.1 Trust and the Data Industry

Trust is a complicated concept that is both domain and context-specific. Trust is directional and asymmetric, reflecting that between two parties, the trust is independent for each party [51]. Generally, trust can be defined as the willingness for one party to give control over something to another party, based on the belief that they will act in the interest of the former party [26]. In economic terms, it is often thought of as a calculation of risk, with the understanding that risk can never be fully eliminated, just mitigated through mutual trust between parties [32]. The issue of trust is ever-present in the healthcare industry. Healthcare institutions collect vast amounts of personal medical information from patients in the process of their duties. This information can in turn be used to train an ML model. This could benefit society by enhancing the ability of clinicians to diagnose certain diseases.

DeepMind brought the debate around providing private companies with access to highly sensitive, publicly held data-sets into the public sphere when they collaborated with the Royal Free London NHS Trust in 2015. This work produced 'Streams', an application for the early detection of kidney failure [38]. However, the project raised concerns surrounding privacy and trust. DeepMind received patient records from the Trust under a legal contract dictating how this data could be used. The contract was later criticised as being vague, and the data sharing was found to be unlawful by the Information Commissioner's Office [14]. Furthermore, DeepMind did not apply for regulatory approval through the Health Research Authority's research authorisation process, a necessary step had they intended to do any ML on the data. The team working on Streams has now joined Google, raising further concerns about the linkage of personal health data with Google's other records [28].

While there was significant push back against the DeepMind/Royal Free collaboration, this has not prevented other research collaborations. This includes the automated analysis of retinal images [12] and the segmentation of neck and head tumour volumes [9]. In both these scenarios, the appropriate authorisation from the Health Research Authority was obtained, and the usage of the data transferred was clearly defined and tightly constrained.

2.2 Decentralised Identifiers (DIDs)

DIDs are tools which can be used to manage trust in a distributed or privacy-preserving environment. They represent a new type of digital identifier currently being standardised in a World Wide Web Consortium (W3C) working group [39]. A DID persistently identifies a single entity that can self-authenticate as being in control of the identifier. This differs from other identifiers, which rely on a trusted third party to attest to who controls them. DIDs are typically stored on a decentralised storage system such as a distributed ledger, so, unlike identifiers such as an email address, DIDs are under the sole control of the identity owner.

Any specific DID scheme that implements the DID specification must be resolvable to its respective document using the DID method defined by the scheme. Many different implementations of the DID specification exist which utilise different storage solutions. These include Ethereum, Sovrin, Bitcoin and IPFS; each with their own DID method for resolving DIDs specific to their system [47].

The goal of the DID specification is thus to ensure interoperability across these different DID schemes, such that it is possible to resolve and interpret a DID no matter where the specific implementation originates from. However, not all DIDs need to be stored on a ledger; in fact, there are situations where doing so could compromise the privacy of an individual and breach data protection laws such as the GDPR. Peer DIDs are one such implementation of the DID specification that does not require a storage system: DIDs and DID documents are generated by the entities themselves and shared when establishing peer-to-peer connections. Each peer stores and maintains a record of the other peer's DID and DID Document [23].
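To make the data being exchanged concrete, the following is an illustrative sketch of the shape of a peer DID Document, expressed as a Python dictionary. The field names follow the W3C DID specification at a high level, but exact keys, key types and values vary between DID methods, so everything shown here is a placeholder rather than the structure used by any particular implementation.

```python
# Illustrative shape of a peer DID Document as a Python dict.
# All identifiers and key material below are placeholders.
did_document = {
    "id": "did:peer:1zQmPLACEHOLDER",
    # Keys the controller can use to authenticate and sign messages.
    "verificationMethod": [{
        "id": "did:peer:1zQmPLACEHOLDER#keys-1",
        "type": "Ed25519VerificationKey2018",
        "controller": "did:peer:1zQmPLACEHOLDER",
        "publicKeyBase58": "PLACEHOLDER_PUBLIC_KEY",
    }],
    # Endpoint other agents use to deliver DIDComm messages to this peer.
    "service": [{
        "id": "did:peer:1zQmPLACEHOLDER#didcomm",
        "type": "did-communication",
        "serviceEndpoint": "https://agent.example.org/didcomm",
    }],
}
```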

2.3 DID Communication (DIDComm)

DIDComm [24] is an asynchronous encrypted communication protocol that has been developed as part of the Hyperledger Aries project [29]. The protocol uses information within the DID Document, particularly the parties’ public key and their endpoint for receiving messages, to send information with verifiable authenticity and integrity. The DIDComm protocol has now moved into a standards-track run by the Decentralized Identity Foundation [45].

As defined in Algorithm 1, Alice first encrypts and signs a plaintext message for Bob. She then sends the signature and encrypted message to Bob's endpoint. Once the transmission has been received, Bob can verify the integrity of the message, decrypt it and read the plaintext. All the information required for this interaction is contained within Bob and Alice's DID Documents. Examples of suitable public-key encryption schemes include ElGamal [16], RSA [40] and elliptic-curve-based schemes [50]. Using this protocol, Bob and Alice are able to communicate securely and privately over independent channels and to verify the authenticity and integrity of the messages they receive.

1:  Alice has a private key (sk_A) and a DID Document for Bob containing an endpoint (E_B) and a public key (pk_B).
2:  Bob has a private key (sk_B) and a DID Document for Alice containing her public key (pk_A).
3:  Alice encrypts a plaintext message (m) using pk_B and creates an encrypted message (c).
4:  Alice signs c using her private key (sk_A) and creates a signature (σ).
5:  Alice sends (c, σ) to E_B.
6:  Bob receives the message from Alice at E_B.
7:  Bob verifies σ using Alice's public key (pk_A).
8:  if Verify(σ, c, pk_A) = true then
9:     Bob decrypts c using sk_B.
10:     Bob reads the plaintext message (m) sent by Alice.
11:  end if
Algorithm 1 DID Communication Between Alice and Bob
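To illustrate the encrypt-then-sign pattern of Algorithm 1 in code, the following minimal Python sketch uses the PyNaCl library. It illustrates the pattern only: real DIDComm wraps messages in its own envelope formats as defined by the Aries RFCs, and the key variables and example message here are placeholders.

```python
# Minimal sketch of the encrypt-then-sign exchange in Algorithm 1, using
# PyNaCl (libsodium bindings). Not the actual DIDComm envelope format.
from nacl.public import PrivateKey, SealedBox
from nacl.signing import SigningKey

# Key material that would normally be referenced from the DID Documents.
bob_sk = PrivateKey.generate()                 # Bob's decryption key (sk_B)
bob_pk = bob_sk.public_key                     # published in Bob's DID Document (pk_B)
alice_signing_sk = SigningKey.generate()       # Alice's signing key (sk_A)
alice_verify_pk = alice_signing_sk.verify_key  # published in Alice's DID Document (pk_A)

# Alice: encrypt for Bob, then sign the ciphertext.
plaintext = b"model hyperparameters for the next training round"
ciphertext = SealedBox(bob_pk).encrypt(plaintext)
signed_envelope = alice_signing_sk.sign(ciphertext)

# Bob: verify Alice's signature, then decrypt with his private key.
verified_ciphertext = alice_verify_pk.verify(signed_envelope)
recovered = SealedBox(bob_sk).decrypt(verified_ciphertext)
assert recovered == plaintext
```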

2.4 Verifiable Credentials (VCs)

The Verifiable Credential Data Model specification became a W3C recommended standard in November 2019 [44]. It defines a data model for a verifiable set of tamper-proof claims that is used by three roles: Issuer, Holder and Verifier, as shown in Figure 1. A verifiable data registry, typically a distributed ledger, is used to store the credential schemas and the DIDs and DID documents of Issuers.

Figure 1: Verifiable Credential Roles [44]

When issuing a credential, an Issuer creates a signature over a set of attributes for a given schema using a private key associated with their public DID through a DID document. The specification defines three signature schemes which are valid to use when issuing a credential: JSON Web Signatures [31], Linked Data Signatures [35] and Camenisch-Lysyanskaya (CL) Signatures [7]. This paper focuses on the Hyperledger stack, which uses CL Signatures. In these signatures, a blinded link secret (a large private number contributed by the entity receiving the credential) is included in the attributes of the credential. This enables the credential to be tied to a particular entity without the Issuer needing to know the secret value.

When verifying the proof of a credential from a Holder, the Verifier needs to check a number of aspects:

  1. The DID of the Issuer can be resolved on the public ledger to a DID document. This document should contain a public key which can be used to verify the integrity of the credential.

  2. The entity presenting the credential knows the link secret that was blindly signed by the Issuer. The Holder creates a zero-knowledge proof attesting to this.

  3. That the issuing DID had the authority to issue this kind of credential. The signature alone only proves integrity, but if the Verifier accepts credentials from any Issuers, it would be easy to obtain fraudulent credentials — anyone with a public DID could issue one. In a production system at-scale, this might be done through a registry, supported by a governance framework — a legal document outlining the operating parameters of the ecosystem [11].

  4. The Issuer has not revoked the presented credential. This is done by checking that the hash of the credential is not present within a revocation registry (a cryptographic accumulator [2]) stored on the public ledger.

  5. Finally, the Verifier needs to check that the attributes in the valid credential meet the criteria for authorisation in the system. An often-used example is checking that the date-of-birth attribute in a valid passport credential shows the Holder to be over a certain age.

All the communications between either the Issuer and the Holder, or the Holder and the Verifier are done peer-to-peer using DIDComm. It is important to note that the Issuer and the Verifier never need to communicate.
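The following is a hedged Python sketch of how the five checks above compose into a single verification decision. All of the helper callables (resolve_did, verify_signature, verify_link_secret_proof, is_revoked, check_policy) and the presentation field names are hypothetical stand-ins for what an Aries/AnonCreds library would provide; the sketch only shows the control flow, not real cryptography.

```python
# Hedged sketch of the Verifier's checks (steps 1-5 above).
# Helper callables and field names are illustrative placeholders.
def verify_presentation(presentation, resolve_did, verify_signature,
                        verify_link_secret_proof, trusted_issuer_dids,
                        is_revoked, check_policy) -> bool:
    issuer_did = presentation["issuer_did"]

    # 1. Resolve the Issuer DID to a DID Document and verify the signature.
    did_document = resolve_did(issuer_did)
    if did_document is None or not verify_signature(presentation, did_document):
        return False

    # 2. Verify the zero-knowledge proof that the Holder knows the link secret.
    if not verify_link_secret_proof(presentation):
        return False

    # 3. Check the Issuer is authorised to issue this type of credential.
    if issuer_did not in trusted_issuer_dids:
        return False

    # 4. Check the credential has not been revoked.
    if is_revoked(presentation["credential_id"]):
        return False

    # 5. Check the revealed attributes satisfy the Verifier's policy.
    return check_policy(presentation["revealed_attributes"])
```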

2.5 Federated Machine-Learning (FL)

In a centralised machine learning scenario, data is sent to the Researcher; in an FL setup, the model is instead sent to each data participant. The FL method has many variations, such as Vanilla, Trusted Model Aggregator and Secure Multi-party Aggregation [6]. At a high level, the Researcher copies one atomic model and distributes it to multiple hosts who have the data. The hosts train their respective copies on their local data and send the trained models back to the Researcher, who aggregates the model updates into the final model. This technique facilitates training on a large corpus of federated data [5, 13]. In the case of Vanilla FL, this is the extent of the protocol. However, we can extend this with a secure aggregator, an intermediary between the Researcher and the hosts which averages participant models before they reach the Researcher. To further improve security, this can be extended using Secure Multiparty Aggregation to average models while they are encrypted into multiple shares [41]. The Researcher thus never sees the data directly and only aggregates model gradients at the end [10]. However, this requires high network bandwidth and is vulnerable to invalidated-input attacks [3], where an attacker may seek to create a bias toward a particular classification for a particular set of input data.
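As a brief illustration of the aggregation step described above, the following Python sketch averages parameter vectors returned by three hosts using plain numpy. The parameter values are invented for the example, and a real secure aggregator would compute the same mean over encrypted shares rather than plaintext arrays.

```python
# Minimal illustration of the aggregation step: a (non-secure) aggregator
# averaging model parameters returned by three hosts.
import numpy as np

# Hypothetical parameter vectors returned by three training hosts.
host_updates = [
    np.array([0.10, 0.20, 0.30]),
    np.array([0.12, 0.18, 0.33]),
    np.array([0.09, 0.22, 0.28]),
]

# Federated averaging: element-wise mean of the participants' parameters.
aggregated_params = np.mean(host_updates, axis=0)
print(aggregated_params)
```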

3 Implementation Overview

Our work performs a basic FL example between Hyperledger Aries agents to validate whether distributed ML can take place over the DIDComm transport protocol. A number of Docker containers representing entities in a healthcare trust model were developed, creating a simple ecosystem of learning participants and trust providers (Figure 2). Each Docker container runs a Hyperledger Aries agent, for which we used the open-source Python Aries Cloud Agent developed by the Government of British Columbia [30]. Hospital containers are initialised with the private data that is used to train the model.

Figure 2: ML Healthcare Trust Model

3.1 Establishing Trust

We define a domain-specific trust architecture using verifiable credentials issued by trusted parties for a healthcare use case. This includes the following agent types: a) NHS Trust (Hospital Credential Issuer); b) Regulator (Researcher Credential Issuer); c) Hospital (Data Provider); and d) Researcher (ML Coordinator).

This is used to facilitate the authorisation of training participants (verified Hospitals) and a Researcher-Coordinator. A data scientist who would like to train a model is given credentials by an industry watchdog, which in a real-world scenario could audit the model and the research purpose. In the United Kingdom, for example, the Health Research Authority is well placed to fulfil this role. Meanwhile, Hospitals in possession of private health data are issued with credentials by an NHS authority, enabling them to prove they are a real Hospital. The credential schemas and the DIDs of credential Issuers are written to a public ledger; we used the development ledger provided by the Government of British Columbia [19] for this work.
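As a concrete illustration of how an Issuer such as the NHS Trust might publish a credential schema and definition to the ledger, the sketch below drives the ACA-Py admin HTTP API from Python. The endpoint paths and payload fields are written from our recollection of the admin API and may differ between ACA-Py versions; the admin URL and the attribute names are placeholders rather than the exact schema used in our prototype.

```python
# Hedged sketch: publishing a "Verified Hospital" schema and credential
# definition via the ACA-Py admin API. Paths, fields and values are assumed.
import requests

NHS_TRUST_ADMIN = "http://nhs-trust-agent:8021"  # assumed admin endpoint

# Write the credential schema to the ledger via the Issuer's agent.
schema_resp = requests.post(f"{NHS_TRUST_ADMIN}/schemas", json={
    "schema_name": "verified_hospital",
    "schema_version": "1.0",
    "attributes": ["hospital_name", "nhs_trust", "issued_date"],
})
schema_resp.raise_for_status()
schema_id = schema_resp.json()["schema_id"]

# Publish a credential definition bound to that schema, signed with the
# NHS Trust's public DID on the ledger.
cred_def_resp = requests.post(f"{NHS_TRUST_ADMIN}/credential-definitions", json={
    "schema_id": schema_id,
    "tag": "default",
})
cred_def_resp.raise_for_status()
print(cred_def_resp.json())
```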

The system is established following the steps described in Algorithm 2. Once both the Researcher-Coordinator and the Hospital agents have been authenticated, the communication of the model parameters for FL can take place across this secure trusted channel.

1:  Researcher-Coordinator agent exchanges DIDs with the Regulator agent to establish a DIDComm channel.
2:  Regulator offers an Audited Researcher-Coordinator credential over this channel.
3:  Researcher-Coordinator accepts and stores the credential in their wallet.
4:  for Each Hospital agent do
5:     Initiate DID Exchange with NHS Trust agent to establish DIDComm channel.
6:     NHS Trust offers Verified Hospital credentials over DIDComm.
7:     Hospital accepts and stores the credential.
8:  end for
9:  for Each Hospital agent do
10:     Hospital initiates DID Exchange with Researcher-Coordinator to establish DIDComm channel.
11:     Researcher-Coordinator requests proof of Verified Hospital credential issued and signed by the NHS Trust.
12:     Hospital generates a valid proof from their Verified Hospital credential and responds to the Researcher-Coordinator.
13:     Researcher-Coordinator verifies the proof by first checking the DID against the known DID they have stored for the NHS Trust, then resolving the DID to locate the keys and verify the signature.
14:     if Hospital can prove they have a valid Verified Hospital credential then
15:        Researcher-Coordinator adds the connection identifier to their list of Trusted Connections.
16:     end if
17:     Hospital requests proof of Audited Researcher-Coordinator credential from the Researcher-Coordinator.
18:     Researcher-Coordinator uses Audited Researcher-Coordinator credential to generate a valid proof and responds.
19:     Hospital verifies the proof, by checking the signature and DID of the Issuer.
20:     if Researcher-Coordinator produces a valid proof of Audited Researcher-Coordinator then
21:        Hospital saves connection identifier as a trusted connection.
22:     end if
23:  end for
Algorithm 2 Establishing Trusted Connections
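To ground Algorithm 2 in the agent software we used, the following hedged sketch shows how the Researcher-Coordinator might request proof of a Verified Hospital credential over the ACA-Py admin API (steps 10 to 15). The endpoint path and payload shape follow the Indy proof-request format as we recall it for ACA-Py releases of this period; exact field names may differ between versions, and the admin URL, attribute name and Issuer DID are placeholders.

```python
# Hedged sketch: Researcher-Coordinator requests proof of a Verified Hospital
# credential, restricted to credentials issued by the NHS Trust DID.
import requests

ADMIN_URL = "http://researcher-agent:8021"        # assumed ACA-Py admin endpoint
NHS_TRUST_DID = "did:sov:PLACEHOLDER_NHS_TRUST"   # known DID of the credential Issuer

def request_hospital_proof(connection_id: str) -> dict:
    proof_request = {
        "name": "Proof of Verified Hospital",
        "version": "1.0",
        "requested_attributes": {
            "hospital_name": {
                "name": "hospital_name",
                # Only accept credentials signed by the NHS Trust Issuer DID.
                "restrictions": [{"issuer_did": NHS_TRUST_DID}],
            }
        },
        "requested_predicates": {},
    }
    resp = requests.post(
        f"{ADMIN_URL}/present-proof/send-request",
        json={"connection_id": connection_id, "proof_request": proof_request},
    )
    resp.raise_for_status()
    return resp.json()
```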

3.2 Vanilla Federated Learning

This paper implements Federated Learning in its most basic form, where plain-text models are moved sequentially between agents. The Researcher-Coordinator entity begins with a model and a data-set used to validate the initial performance. We train the model using sample public mental health data which is pre-processed into usable training data. Our intention is to demonstrate that privacy-preserving ML workflows can be facilitated using this trust framework; the content of the learning itself is not the focus of our work. We also provide performance results relating to the accuracy and resource requirements of our system. We refer to our chosen workflow as Vanilla FL, shown in Algorithm 3. In order to implement Vanilla FL, the original data-set was split into four partitions: three training sets and one validation set.

1:  Researcher-Coordinator has validation data and a model, Hospitals have training data.
2:  while Hospitals have unseen training data do
3:     Researcher-Coordinator benchmarks model performance against validation data and sends model to the next Hospital.
4:     This Hospital trains the model with their data and then sends the resulting model back to the Researcher-Coordinator.
5:  end while
6:  Researcher-Coordinator benchmarks the final model against validation data.
Algorithm 3 Vanilla Federated Learning
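The following is a minimal local simulation of the Vanilla FL loop in Algorithm 3, using scikit-learn's partial_fit as a stand-in for the training each Hospital performs before returning the model. In the actual prototype the model travels between Aries agents over DIDComm; here the data is synthetic and the "send" is simply a function call, so the sketch only illustrates the control flow.

```python
# Local simulation of the Vanilla FL loop (Algorithm 3) on synthetic data.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X, y = rng.normal(size=(400, 10)), rng.integers(0, 2, size=400)

# One validation partition for the Researcher-Coordinator, three training
# partitions standing in for the Hospitals (mirroring our four-way split).
X_val, y_val = X[:100], y[:100]
hospital_batches = [(X[100:200], y[100:200]),
                    (X[200:300], y[200:300]),
                    (X[300:], y[300:])]

model = SGDClassifier(random_state=0)
classes = np.unique(y)

for X_h, y_h in hospital_batches:
    # "Hospital" trains the received model on its private batch.
    model.partial_fit(X_h, y_h, classes=classes)
    # Researcher-Coordinator benchmarks the returned model on validation data.
    print("validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
```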

This amalgamation of Aries and FL allowed us to mitigate some of the constraints that a lack of trust between training participants currently places on ML. Specifically, these were: 1) malicious data being provided by a false Hospital to spoil model accuracy on future cases; and 2) malicious models being sent to Hospitals with the intention of later inverting them to leak information about the training data values.

4 Threat Model

Since no data changes hands, FL is more private than traditional, centralised ML. However, some issues still exist with this approach. Vanilla FL is vulnerable to model stealing by data contributors, who can store a copy of the Researcher's model after training it. In cases where the model represents private intellectual property (IP), this setup is not ideal. On the other hand, with knowledge of the model before and after training on each private data-set, the Researcher could infer the data used to train the model at each iteration [42]. Model inversion attacks [18, 17] are also possible: given carefully crafted input features and an unlimited number of queries to the model, the Researcher could reverse-engineer training values.

Vanilla FL is also potentially vulnerable to model poisoning and Trojan-backdoor attacks [3, 4, 34]. A malicious data provider can replace the original model with a malicious one and send that to the Researcher. This malicious model could contain backdoors, such that the model behaves normally except on particular trigger inputs, to which it reacts maliciously. Unlike data poisoning attacks, model poisoning attacks remain hidden, are more successful and are easier to execute. Even if only one participant is malicious, the model's output will behave maliciously according to the injected poison. The attacker does not need access to the training process to succeed; it is enough to retrain the original model with the poisoned data.

To mitigate the attacks mentioned above, our system implements a domain-specific trust framework using verifiable credentials. In this way, only verified participants are issued with credentials, which they use to prove to the other entity, across a secure channel, that they are a trusted member of the learning process. This framework does not prevent the types of attack discussed from occurring, but, by modelling trust, it does reduce the risk that they will happen. Malicious entities can be screened at registration, or removed from the trust infrastructure on bad behaviour.

Another threat to consider is the possibility of the agent servers or APIs being compromised. Either the trusted Issuers could be compromised and issue credentials to malicious entities, or entities with valid credentials within the system could become corrupted. Both scenarios lead to a malicious participant holding a valid verifiable credential for the system. This type of attack is a threat; however, it is outside the scope of this work. Standard cybersecurity procedures should be in place within these systems to make successful security breaches unlikely. OWASP provides guidelines and secure practices to mitigate these traditional cybersecurity threats [37]. Defensive mechanisms are not limited to these and can be expanded with Intrusion Detection and Prevention Systems (IDPS) [21].

5 Evaluation

To evaluate the prototype, malicious agents were created to attempt to take part in the ML process by connecting to one of the trusted Hyperledger Aries agents. Any agent without the appropriate credentials, either a Verified Hospital or Audited Researcher-Coordinator credential, was unable to form authenticated channels with the trusted parties (Figure 2). These connections and requests to initiate learning or contribute to training the model were rejected. Unauthorised entities were able to create self-signed credentials, but these credentials were rejected. This is because they had not been signed by an authorised and trusted authority whose DID was known by the entity requesting the proof.

The mechanism of using credentials to form mutually verifiable connections proves useful for ensuring only trusted entities can participate in a distributed ML environment. We note that this method is generic and can be adapted to the needs of any domain and context. Verifiable credentials enable ecosystems to specify meaning in a way that digital agents participating within that ecosystem can understand. We expect to see them used increasingly to define flexible, domain-specific trust. The scenario we created was used to highlight the potential of this combination. For these trust architectures to fit their intended ecosystems equitably, it is imperative to involve all key stakeholders in their design.

Our work is focused on the application of a DID-based infrastructure in a Federated Learning scenario. It is assumed that there is a pre-defined, governance-oriented trust model in place such that key stakeholders have a DID written to an integrity-assured ledger. The discovery of appropriate DIDs, and of willing participants (either valid Researcher-Coordinators or Hospitals) within a specific ecosystem, is out of scope for this paper. We focus on exploring how peer DID connections, once formed, facilitate participation in the established ecosystem. A further system could be developed for the secure distribution of DIDs between agents that are willing to participate.

Furthermore, performance metrics for each host were recorded during the running of our workflow. In Figure 3 a), we see the CPU usage of each agent involved in the learning workflow. The CPU usage of the Researcher-Coordinator rises each time it sends the model to a Hospital, and the CPU usage of the Hospitals rises when they train the model with their private data. This result is consistent with what is expected when Algorithm 3 runs successfully. Memory and network bandwidth follow a similar pattern, as can be seen in Figure 3 b), Figure 3 c) and Figure 3 d). The main difference is that, since the Researcher-Coordinator benchmarks each returned model against its validation data every round, its memory and network bandwidth rise over time. From these results we conclude that running federated learning in this way is compute-heavy on the side of the Hospitals, but more bandwidth-intensive and slightly more memory-intensive on the side of the Researcher-Coordinator.

Figure 3: CPU, memory and network utilisation of the Docker container agents during the workflow. Panels: (a) CPU usage (%); (b) memory usage (%); (c) network input (kB); (d) network output (kB).

The aim of this research is to demonstrate that a decentralised trust framework can be used to perform a privacy-preserving workflow. The authors train a dummy model on some basic example data; the intention is merely to demonstrate that this is possible using our trust framework. We give the confusion matrix of the model tested on the Researcher-Coordinator's validation data after each federated training batch. This demonstrates that our model was successfully adjusted at each stage of training on our federated mental health dataset. The model develops a bias toward false positives and tends to produce fewer true negatives as the batches continue; however, this may be due to the distribution of each data batch. Otherwise, the learning over each batch tends to maximise true positives. This can be observed in Table 1.

Batch            0    1    2    3
True Positives   0    109  120  134
False Positives  0    30   37   41
True Negatives   114  84   77   73
False Negatives  144  35   24   10
Table 1: Confusion-matrix counts of the classifier on the validation data after each training batch

6 Conclusion & Future Work

This paper combines two fields of research: privacy-preserving ML and decentralised identity. Both have similar visions for a more trusted, citizen-focused and privacy-respecting society. In this research, we show how developing a trust framework based on Decentralised Identifiers and Verifiable Credentials for ML scenarios that involve sensitive data can increase trust between parties, while also reducing the liability of organisations holding that data.

It is possible to use these secure channels to obtain a digitally signed contract for training, or to manage pointer communications on remote data. While Vanilla FL is vulnerable to some attacks, as described in Section 4, the purpose of this work was to develop a proof of concept showing that domain-specific trust can be achieved over the same communication channels used for distributed ML. Future work includes integrating the Aries communication protocols, which enable the trust model demonstrated here, into an existing framework for facilitating distributed learning, such as PyGrid, the networking component of OpenMined [41]. This will allow us and others to apply the trust framework to a far wider range of privacy-preserving workflows.

This will also allow us to enforce trust while mitigating model inversion attacks using differentially private training mechanisms [15]. Multiple techniques can be used to train a differentially private model, such as PyVacy [48] and LATENT [8]. To minimise the threat of model stealing and training-data inference, Secure Multiparty Computation (SMC) [33] can be leveraged to split data and model parameters into shares. SMC allows both gradients and parameters to be computed and updated in a decentralised fashion while encrypted; in this case, custody of each data item is split into shares held by the relevant participating entities.
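As a toy illustration of the share-splitting idea behind SMC, the following Python snippet implements additive secret sharing: a value is split into random shares that individually reveal nothing but sum, modulo a large prime, to the original secret. The modulus and party count are arbitrary choices for the example.

```python
# Toy additive secret sharing, the building block behind SMC-based sharing
# of data and model parameters. Modulus and party count are illustrative.
import secrets

Q = 2**61 - 1  # a large prime modulus

def share(secret: int, n_parties: int = 3) -> list[int]:
    """Split `secret` into n random additive shares modulo Q."""
    shares = [secrets.randbelow(Q) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Recombine shares; only the full set reveals the secret."""
    return sum(shares) % Q

assert reconstruct(share(42)) == 42
```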

In our experiments, we utilised the Hyperledger Aries messaging functionality to serialise the ML model as text so that it could be sent to the participating entities. In future work, we will focus on extending this messaging functionality with a separate structure for ML communication. We also hope to evaluate the type of trust that can be placed in these messages, exploring the Message Trust Context object suggested in a Hyperledger Aries RFC [22].

In this work, we address the issue of trust within the data industry. This radically decentralised trust infrastructure allows individuals to organise themselves and collaboratively learn from one another without any central authority figure. This breaks new ground by combining privacy-preserving ML techniques with a decentralised trust architecture.

References

  • [1] O. F. Ahmad, D. Stoyanov, and L. B. Lovat (2019) Barriers and pitfalls for artificial intelligence in gastroenterology: ethical and regulatory issues. Techniques in Gastrointestinal Endoscopy, pp. 150636.
  • [2] M. H. Au, P. P. Tsang, W. Susilo, and Y. Mu (2009) Dynamic universal accumulators for DDH groups and their application to attribute-based anonymous credential systems. In Topics in Cryptology – CT-RSA 2009, Vol. 5473, pp. 295–308.
  • [3] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov (2018) How to backdoor federated learning. arXiv preprint arXiv:1807.00459.
  • [4] A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo (2018) Analyzing federated learning through an adversarial lens. arXiv preprint arXiv:1811.12470.
  • [5] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečný, S. Mazzocchi, H. B. McMahan, T. Van Overveldt, D. Petrou, D. Ramage, and J. Roselander (2019) Towards federated learning at scale: system design. arXiv preprint arXiv:1902.01046.
  • [6] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth (2016) Practical secure aggregation for federated learning on user-held data. CoRR abs/1611.04482.
  • [7] J. Camenisch and A. Lysyanskaya (2003) A signature scheme with efficient protocols. In Security in Communication Networks, Vol. 2576, pp. 268–289.
  • [8] M. Chamikara, P. Bertok, I. Khalil, D. Liu, and S. Camtepe (2019) Local differential privacy for deep learning. arXiv preprint arXiv:1908.02997.
  • [9] C. Chu, J. De Fauw, N. Tomasev, B. R. Paredes, C. Hughes, J. Ledsam, T. Back, H. Montgomery, G. Rees, R. Raine, et al. (2016) Applying machine learning to automated segmentation of head and neck tumour volumes and organs at risk on radiotherapy planning CT and MRI scans. F1000Research 5.
  • [10] D. Das, S. Avancha, D. Mudigere, K. Vaidynathan, S. Sridharan, D. Kalamkar, B. Kaul, and P. Dubey (2016) Distributed deep learning using synchronous stochastic gradient descent. arXiv preprint arXiv:1602.06709.
  • [11] M. Davie, D. Gisolfi, D. Hardman, J. Jordan, D. O'Donnell, and D. Reed (2019) The trust over IP stack. Hyperledger Aries RFC 289.
  • [12] J. De Fauw, P. Keane, N. Tomasev, D. Visentin, G. van den Driessche, M. Johnson, C. O. Hughes, C. Chu, J. Ledsam, T. Back, et al. (2016) Automated analysis of retinal imaging using machine learning techniques for computer vision. F1000Research 5.
  • [13] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al. (2012) Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pp. 1223–1231.
  • [14] E. Denham (2017) Royal Free – Google DeepMind trial failed to comply with data protection law. Technical report, Information Commissioner's Office.
  • [15] C. Dwork (2011) Differential privacy. In Encyclopedia of Cryptography and Security, pp. 338–340.
  • [16] T. ElGamal (1985) A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory 31 (4), pp. 469–472.
  • [17] M. Fredrikson, S. Jha, and T. Ristenpart (2015) Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1322–1333.
  • [18] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart (2014) Privacy in pharmacogenetics: an end-to-end case study of personalized warfarin dosing. In 23rd USENIX Security Symposium (USENIX Security 14), pp. 17–32.
  • [19] Government of British Columbia (2018) British Columbia's Verifiable Organizations.
  • [20] A. J. Hall, A. Hussain, and M. G. Shaikh (2016) Predicting insulin resistance in children using a machine-learning-based clinical decision support system. In International Conference on Brain Inspired Cognitive Systems, pp. 274–283.
  • [21] P. Hall (2019) Proposals for model vulnerability and security. O'Reilly Artificial Intelligence Conference, London, October 14–17, 2019.
  • [22] D. Hardman (2019) Message trust contexts. Hyperledger Aries RFC 29.
  • [23] D. Hardman (2019) Peer DID method specification. Technical report.
  • [24] D. Hardman (2019) DID communication. Hyperledger Aries RFC.
  • [25] I. A. T. Hashem, V. Chang, N. B. Anuar, K. Adewole, I. Yaqoob, A. Gani, E. Ahmed, and H. Chiroma (2016) The role of big data in smart city. International Journal of Information Management 36 (5), pp. 748–758.
  • [26] A. M. Hoffman (2002) A conceptualization of trust in international relations. European Journal of International Relations 8 (3), pp. 375–401.
  • [27] S. D. Holcomb, W. K. Porter, S. V. Ault, G. Mao, and J. Wang (2018) Overview on DeepMind and its AlphaGo Zero AI. In Proceedings of the 2018 International Conference on Big Data and Education, pp. 67–71.
  • [28] O. Hughes (2018) Royal Free: 'no changes to data-sharing' as Google absorbs Streams.
  • [29] Hyperledger. Hyperledger Aries.
  • [30] Hyperledger (2019) Hyperledger Aries Cloud Agent – Python.
  • [31] M. Jones, J. Bradley, and N. Sakimura (2015) JSON web signatures. RFC, Internet Engineering Task Force.
  • [32] E. Keymolen (2016) Trust on the Line: A Philosophical Exploration of Trust in the Networked Era.
  • [33] Y. Lindell (2005) Secure multiparty computation for privacy preserving data mining. In Encyclopedia of Data Warehousing and Mining, pp. 1005–1009.
  • [34] Y. Liu, S. Ma, Y. Aafer, W. Lee, J. Zhai, W. Wang, and X. Zhang (2017) Trojaning attack on neural networks. Purdue University Libraries e-Pubs.
  • [35] D. Longley, M. Sporny, and C. Allen (2019) Linked data signatures 1.0. Technical report, W3C.
  • [36] L. Muñoz-González, B. Biggio, A. Demontis, A. Paudice, V. Wongrassamee, E. C. Lupu, and F. Roli (2017) Towards poisoning of deep learning algorithms with back-gradient optimization. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security, pp. 27–38.
  • [37] OWASP (2018) Top 10 2017: the ten most critical web application security risks. Release Candidate 2.
  • [38] J. Powles and H. Hodson (2017) Google DeepMind and healthcare in an age of algorithms. Health and Technology 7 (4), pp. 351–367.
  • [39] D. Reed, M. Sporny, D. Longley, C. Allen, M. Sabadello, and R. Grant (2020) Decentralized identifiers (DIDs) v1.0. W3C.
  • [40] R. L. Rivest, A. Shamir, and L. Adleman (1978) A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21 (2), pp. 120–126.
  • [41] T. Ryffel, A. Trask, M. Dahl, B. Wagner, J. Mancuso, D. Rueckert, and J. Passerat-Palmbach (2018) A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017, pp. 1–5.
  • [42] R. Shokri, M. Stronati, C. Song, and V. Shmatikov (2017) Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pp. 3–18.
  • [43] P. Spence (2019) How we can place a value on health care data. Ernst & Young.
  • [44] M. Sporny, D. Longley, and D. Chadwick (2019) Verifiable credentials data model 1.0. Technical report, W3C.
  • [45] O. Terbu (2020) DIF starts DIDComm working group. Decentralized Identity Foundation.
  • [46] P. Voigt and A. Von dem Bussche (2017) The EU General Data Protection Regulation (GDPR): A Practical Guide, 1st ed. Cham: Springer International Publishing.
  • [47] W3C Credential Community Group (2019) DID method registry. Technical report.
  • [48] C. Waites (2019) PyVacy: privacy algorithms for PyTorch.
  • [49] J. Wiens and E. S. Shenoy (2017) Machine learning for healthcare: on the verge of a major shift in healthcare epidemiology. Clinical Infectious Diseases 66 (1), pp. 149–153.
  • [50] J. Wohlwend (2016) Elliptic curve cryptography: pre and post quantum. Technical report, MIT.
  • [51] K. Young and S. Greenberg (2014) A field guide to internet trust.