Private Machine Learning in TensorFlow using Secure Computation

by Morten Dahl, et al.

We present a framework for experimenting with secure multi-party computation directly in TensorFlow. By doing so we benefit from several properties valuable to both researchers and practitioners, including tight integration with ordinary machine learning processes, existing optimizations for distributed computation in TensorFlow, high-level abstractions for expressing complex algorithms and protocols, and an expanded set of familiar tooling. We give an open source implementation of a state-of-the-art protocol and report on concrete benchmarks using typical models from private machine learning.






1 Introduction

Several fields come together in private machine learning: cryptography, machine learning, distributed systems, and high-performance computing. As a result, researchers and practitioners developing scalable solutions may find themselves faced with a task requiring many diverse skill sets and expertise.

Adapting machine learning models in a way that allows for privacy-preserving prediction and training is complex and non-trivial on its own, and often requires deeper insight into cryptography or machine learning than experts from either field have about the other. For instance, one may want to mix secure computation protocols with complementary properties and experiment with various cryptographic optimizations such as vectorization and other specializations. Simultaneously, one may want to find variants of machine learning components that give sufficient accuracy yet utilize the fact that some secure operations are significantly faster than others. Thus, modularity and extensibility are crucial for effective experimentation.

One may additionally be faced with a large implementation challenge, potentially having to stitch together several otherwise independent systems or re-implement complex methods from both fields. One must not only optimize local computations by taking advantage of diverse multi-core architectures, but also the distributed processes inherent in secure computation, including how best to orchestrate the joint execution on multiple machines and minimize the overhead of sending large amounts of data across the network. Building all of this at the right level of abstraction can be overwhelming, and often comes at the cost of extensibility and code readability. This makes experimentation harder, potentially harming accessibility and correctness.

Finally, for even small-scale solutions it is highly valuable to have access to tools for visual inspection, debugging, and profiling such as TensorBoard in order to identify issues and bottlenecks in both the protocol and the machine learning model. Lack of such tools represents an additional barrier to entry.

In this paper we illustrate the benefits of embedding protocols directly into TensorFlow in order to show that it can serve as a platform for easily experimenting with secure computation for private machine learning.

import tensorflow as tf
import tf_encrypted as tfe

def send_model_weights() -> tf.Tensor:
    # Runs on the model owner: return the model weights.
    ...
def send_input() -> tf.Tensor:
    # Runs on the client: return the prediction input.
    ...
def receive_output(y: tf.Tensor):
    # Runs on the client: print the prediction.
    return tf.Print([], [y])

w = send_model_weights()
x = send_input()
y = tfe.sigmoid(tfe.matmul(x, w))
op = receive_output(y)

with tfe.Session() as sess:
    sess.run(op)

Figure 1: Example using tf-encrypted for private prediction, with the prediction input known in plaintext only by the client and the model weights only by the owner.

1.1 Contributions

We present tf-encrypted, an open source library built on top of TensorFlow with the aim of making private machine learning more accessible to researchers and practitioners coming from either cryptography or machine learning, and without the need to be an expert in distributed systems or high-performance computing. A small example is given in Figure 1, where a logistic regression model is used for making private predictions. Note that the functions processing input and output specify on which party they should run, and that the usual TensorFlow operations may be used within them to perform local plaintext operations.

To this end we adapt and implement a state-of-the-art secure computation protocol for tensor-oriented applications (Section 2). We report on benchmarks using common models from the literature (Section 3) and highlight additional properties of this approach that we find of value: usability: by leveraging TensorFlow we obtain a familiar and comprehensive platform for building scalable solutions; integration: by reducing all secure computations to TensorFlow graphs, it becomes straightforward to mix these with ordinary computations for hybrid approaches; extensibility: using TensorFlow’s high-level abstractions makes it easier to experiment with and develop new secure protocols on top of optimized primitives while maintaining code readability; performance: we reach high runtime efficiency without sacrificing other properties via TensorFlow’s distributed execution engine, which is heavily optimized for networking, parallel execution, and scalability; benchmarking: combining all of the above we obtain a common framework for comparable private machine learning.

1.2 Related Work

Several freely available implementations of secure computation protocols exist, including those of Smart et al.; Nordholt et al.; Demmler et al.; and Zahur and Evans, yet all of these are standalone frameworks that do not provide integration with existing machine learning platforms, and arguably aim more at general-purpose secure computation than our focus on private machine learning. As a result, users are faced with a lower-level interface and may have to implement basic machine learning components. We are furthermore not aware of any development tools for these frameworks outside of general-purpose debuggers. To the best of our knowledge no machine learning support tools exist for these platforms.

The works of Mohassel and Zhang (2017); Juvekar et al. (2018); Wagh et al. (2018); Mohassel and Rindal (2018) focus on adapting secure computation protocols to private machine learning, in some cases using optimizations similar to those of the protocol presented in Section 2. However, to the best of our knowledge none of these have openly available implementations and hence require a significant investment from anyone wanting to apply them. To a large extent our aim is to provide a common platform based on TensorFlow for implementing and experimenting with protocols such as these. The concurrent work of Trask and OpenMined takes a somewhat similar approach yet currently focuses on PyTorch as opposed to TensorFlow.

Another line of work has focused on using differential privacy for privacy-preserving machine learning (Papernot et al., 2017, 2018), with reference implementations available in TensorFlow. These works remain orthogonal to our approach and do not employ any form of secure computation.

2 Secure Computations in TensorFlow

TensorFlow as described by Abadi et al. (2016b) is among the leading frameworks for constructing and deploying machine learning models, offering an optimized engine for executing local and distributed computations as well as a high-level interface for expressing these. Importantly, the latter abstracts away lower-level operations such as networking while remaining powerful enough to succinctly express computations with operations and tensors pinned to specified machines (see for example Figure 2). Together these make for a powerful platform for efficiently implementing distributed computations, including complex secure computations where it is crucial for privacy that some data remain known only on select machines.

Moreover, the link between TensorFlow’s engine and high-level interface takes the form of stateful dataflow graphs, with nodes for performing tensor operations in the distributed setting. This means that the engine can not only take advantage of optimizations such as lazy evaluation and multi-core processing tailored for the specific runtime environment, but also optimize the graph itself and choose a node execution order based on both static and runtime information (see e.g. Abadi et al. (2016a)). For instance, we observe that large network transfers from Beaver triple generation are often automatically batched and moved to the beginning of the execution.

As an example of a secure computation protocol implemented in TensorFlow, we here outline our two-server variant of the well-known SPDZ protocol by Damgård et al. (2012), which is used for benchmarking in Section 3. The protocol is vectorized to improve performance of applications relying heavily on tensor operations, and we use generalized triples produced by an independent third server (footnote 1) to avoid sending redundant data when possible. Any number of input providers and output receivers are supported, holding e.g. training data or prediction inputs. We currently ensure passive (honest-but-curious) security under a single corruption and rely on two cryptographic primitives, namely additive secret sharing and secure channels between all players.

Footnote 1: This server is essentially the crypto producer as used in e.g. Mohassel and Zhang (2017) that can run entirely offline as long as the function to compute is known, which can be fully determined at compile time due to TensorFlow’s use of static dataflow graphs.

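# MaskedTensor, PrivateTensor, share, and the device names come from the surrounding library.
def mul(x: MaskedTensor, y: MaskedTensor):
    # a, b: masks; a0, a1 and b0, b1: their shares;
    # alpha0, alpha1 and beta0, beta1: the two servers' copies of the masked values.
    a, a0, a1, alpha0, alpha1 = x.unwrapped
    b, b0, b1, beta0, beta1 = y.unwrapped
    with tf.name_scope('mul'):
        # The crypto producer multiplies the masks and shares the product.
        with tf.device(crypto_producer):
            ab = a * b
            ab0, ab1 = share(ab)
        # Only server 0 adds the product of the two masked values.
        with tf.device(server_0):
            z0 = ab0 + (a0 * beta0) \
                 + (alpha0 * b0) \
                 + (alpha0 * beta0)
        with tf.device(server_1):
            z1 = ab1 + (a1 * beta1) \
                 + (alpha1 * b1)
    return PrivateTensor(z0, z1)

Figure 2: Secure multiplication implemented in TensorFlow: tf.device pins data and operations to specific machines to multiply two masked tensors, and TensorFlow takes care of implicitly adding nodes for transmitting ab0 and ab1.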

Following typical practice we use a fixed point encoding for the floating point numbers commonly used in machine learning, i.e. we scale by a fixed factor and treat the result as an integer. To represent these we support both a fixed int64 and a CRT-based int100 tensor type, the former offering higher performance and the latter higher precision. To maintain precision after multiplications we implement both the conservative truncation protocol of Catrina and Saxena (2010) requiring one round of communication, as well as the optimistic non-interactive protocol of Mohassel and Zhang (2017) that may fail with a small probability.
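The fixed point encoding described above can be sketched in plain Python. This is only an illustration: the 16 fractional bits, the 64-bit ring, and the helper names encode, decode, and truncate are assumptions rather than the library's parameters or API, and truncation is shown on plaintext values instead of via either of the secure protocols cited.

```python
# Plaintext sketch of fixed point encoding; every parameter here is an assumption.
FRACTIONAL_BITS = 16            # assumed precision, i.e. a scaling factor of 2**16
SCALE = 1 << FRACTIONAL_BITS
MODULUS = 1 << 64               # assumed ring size, mimicking an int64 tensor


def encode(value: float) -> int:
    """Scale a float by SCALE and map it into the integers modulo MODULUS."""
    return round(value * SCALE) % MODULUS


def to_signed(element: int) -> int:
    """Interpret a ring element as signed: the upper half encodes negatives."""
    return element - MODULUS if element >= MODULUS // 2 else element


def decode(element: int) -> float:
    """Map a ring element back to a float."""
    return to_signed(element) / SCALE


def truncate(element: int) -> int:
    """Remove the extra scaling factor that a multiplication introduces."""
    return (to_signed(element) >> FRACTIONAL_BITS) % MODULUS


x, y = encode(1.5), encode(-2.25)
product = (x * y) % MODULUS           # scaled by SCALE**2 after multiplying
result = decode(truncate(product))    # back to a single factor of SCALE
```

Without the truncation step the scaling factor would double in bit length with every multiplication and quickly exhaust the ring, which is why a truncation protocol is needed after each product.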

As in other SPDZ variants, private tensors $x$ are secret shared into two tensors $x_0$ and $x_1$, held by server $S_0$ and $S_1$ respectively, such that $x = x_0 + x_1$ yet either share on its own reveals nothing about $x$. However, unlike other variants, we also rely on a masked tensor which in addition to $x_0$ and $x_1$ also includes a random tensor $a$ held by the third server, shares $a_0$ and $a_1$ of it held by $S_0$ and $S_1$, and an $\alpha$ held by both $S_0$ and $S_1$ such that $\alpha = x - a$. While this is simply an explicit representation of the intermediate state of a SPDZ multiplication, having it in this form allows us to easily extend and generalize triples in order to reduce computation and networking. Converting a tensor from private to masked takes one round of offline communication where $a$ is sampled by the third server and shares $a_0$ and $a_1$ of it are sent to $S_0$ and $S_1$, and one round of online communication where $x_i - a_i$ is sent by $S_i$ to $S_{1-i}$ for $i \in \{0, 1\}$ and $\alpha$ is computed by both.
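The private and masked representations can be simulated in a single process as follows. This is a sketch under stated assumptions: it works on single integers in an assumed ring of size 2^64 rather than on tensors, the function names share, reconstruct, and mask are hypothetical, and the masked value is taken to be the secret minus the mask, as is standard for SPDZ-style multiplication.

```python
import secrets
from typing import Tuple

MODULUS = 1 << 64  # assumed ring size


def share(secret: int) -> Tuple[int, int]:
    """Split a ring element into two additive shares."""
    share0 = secrets.randbelow(MODULUS)
    share1 = (secret - share0) % MODULUS
    return share0, share1


def reconstruct(share0: int, share1: int) -> int:
    """Recombine two additive shares."""
    return (share0 + share1) % MODULUS


def mask(x0: int, x1: int) -> Tuple[int, int, int, int]:
    """Convert a private value held as (x0, x1) into its masked form."""
    # Offline round: the third server samples a mask and shares it.
    a = secrets.randbelow(MODULUS)
    a0, a1 = share(a)
    # Online round: each server sends its share of x - a to the other,
    # after which both can compute alpha.
    alpha = ((x0 - a0) + (x1 - a1)) % MODULUS
    return a, a0, a1, alpha


x = 42
x0, x1 = share(x)
a, a0, a1, alpha = mask(x0, x1)
```

Since the mask is uniformly random and used as a one-time pad, the value alpha seen by both servers reveals nothing about x on its own.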

Secure computation on tensors proceeds as in other SPDZ variants, albeit with some operations first converting private tensors into masked tensors. For instance, multiplication is only implemented for masked tensors $x$ and $y$, with masks $a$ and $b$ and masked values $\alpha$ and $\beta$ respectively, as $z = x \cdot y$ with $z_0 = ab_0 + a_0 \beta + \alpha b_0 + \alpha \beta$ and $z_1 = ab_1 + a_1 \beta + \alpha b_1$, where $ab = a \cdot b$ is computed and shared by the third server (see Figure 2). Note that the result of a multiplication is a private tensor, meaning $z$ will have to be masked before it can be used as input to another multiplication; however, both $x$ and $y$ can readily be used again as is, meaning every tensor only needs to be masked once. While special triples for squaring are no longer needed with explicit masking, we can still optimize e.g. matrix multiplications and convolutions via specialized triples and avoid reducing everything to (scalar) multiplications, in turn reducing networking further. Finally, operations such as transposing and stacking are done locally without interaction by letting all three servers operate on the values associated with private and masked tensors.
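To make the multiplication concrete, the following single-process sketch mirrors the structure of Figure 2 on plain integers. It is an illustration under assumptions: the ring size, the Masked tuple, and the helper names are hypothetical, and the masked value is computed directly from the secret instead of through the online exchange between the two servers.

```python
import secrets
from typing import NamedTuple, Tuple

MODULUS = 1 << 64  # assumed ring size


def share(value: int) -> Tuple[int, int]:
    """Split a ring element into two additive shares."""
    share0 = secrets.randbelow(MODULUS)
    return share0, (value - share0) % MODULUS


class Masked(NamedTuple):
    a: int      # mask, known only to the third server
    a0: int     # share of the mask held by server 0
    a1: int     # share of the mask held by server 1
    alpha: int  # secret minus mask, known to both server 0 and server 1


def mask(x: int) -> Masked:
    # Simulation shortcut: alpha is derived from x directly, whereas the
    # servers would obtain it by exchanging their shares of x - a.
    a = secrets.randbelow(MODULUS)
    a0, a1 = share(a)
    return Masked(a, a0, a1, (x - a) % MODULUS)


def mul(x: Masked, y: Masked) -> Tuple[int, int]:
    # Third server: multiply the two masks and share the product.
    ab0, ab1 = share((x.a * y.a) % MODULUS)
    # Server 0 additionally adds alpha * beta; server 1 does not.
    z0 = (ab0 + x.a0 * y.alpha + x.alpha * y.a0 + x.alpha * y.alpha) % MODULUS
    z1 = (ab1 + x.a1 * y.alpha + x.alpha * y.a1) % MODULUS
    return z0, z1


x, y, w = mask(6), mask(7), mask(9)
xy0, xy1 = mul(x, y)
xw0, xw1 = mul(x, w)  # the masked form of x is reused for a second product
```

Summing the two output shares gives (alpha + a)(beta + b), i.e. the product of the two secrets, and the second call shows that a tensor masked once can feed several multiplications.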

3 Experiments

We benchmark the protocol of Section 2 for private inference on the typical MNIST handwritten digit classification task. Using TensorFlow we train each of the neural networks in Figure 3 (footnote 2) on the plaintext training set and then run private inferences on the remaining test set, keeping both prediction input and model weights private. We perform all experiments on the Google Cloud Platform using instances in the same region (us-east1) and with 36 vCPUs/60 GB memory each. We note that TensorBoard was invaluable in this process for picking the right approximation intervals for activation functions and inspecting overall correctness, and that performing all operations in TensorFlow simplified the process of model handling significantly.

Footnote 2: These are variants of models studied in Mohassel and Zhang (2017); Juvekar et al. (2018); Wagh et al. (2018). We use polynomials interpolated to fit ReLU on a fixed interval for activation functions and perform the final argmax and softmax on logits in plaintext to avoid computing these securely.
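The polynomial approximation of ReLU mentioned above can be sketched with NumPy. The interval [-5, 5] and the degree 3 are assumptions chosen for illustration, and a least-squares fit via numpy.polyfit stands in for whatever interpolation procedure the authors used.

```python
import numpy as np

INTERVAL = (-5.0, 5.0)  # assumed approximation interval
DEGREE = 3              # assumed polynomial degree

# Sample ReLU on the interval.
xs = np.linspace(INTERVAL[0], INTERVAL[1], 201)
relu = np.maximum(xs, 0.0)

# Least-squares polynomial fit; coefficients are ordered from highest degree down.
coefficients = np.polyfit(xs, relu, DEGREE)
approx = np.polyval(coefficients, xs)

# Worst-case deviation from ReLU on the sampled interval.
max_error = float(np.max(np.abs(approx - relu)))
```

A polynomial only needs additions and multiplications, which is what makes it cheap to evaluate securely. The fit is only meaningful inside the chosen interval, so the interval has to cover the range of pre-activation values the model actually produces.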

Network A     | Network B          | Network C
FC (784, 128) | Conv (5, 16, 1, 1) | Conv (5, 20, 1, 1)
ReLU (approx) | ReLU (approx)      | ReLU (approx)
FC (128, 128) | AvgPool (2)        | AvgPool (2)
ReLU (approx) | Conv (5, 16, 1, 1) | Conv (5, 50, 1, 1)
FC (128, 10)  | ReLU (approx)      | ReLU (approx)
              | AvgPool (2)        | AvgPool (2)
              | FC (256, 100)      | FC (800, 500)
              | ReLU (approx)      | ReLU (approx)
              | FC (100, 10)       | FC (500, 10)
Figure 3: Neural network architectures. Convolutional layers are denoted by their parameters in parentheses and average pooling layers by window size.

The left part of Table 1 summarizes the combined offline and online runtime averaged over 100 inferences. As seen we get reasonable performance that may already be adequate for concrete applications. The fact that int100 also achieves good performance suggests that we are not limited by the lower precision of int64 when looking at larger models. Additional experiments furthermore indicate sub-linear scaling with respect to typical batch sizes, leading to an interesting trade-off between latency and throughput; for instance, Network C with batch sizes 1, 10, and 100 takes respectively 124, 182, and 541ms.
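The trade-off between latency and throughput can be made explicit with a little arithmetic on the three Network C measurements quoted above. The variable names are illustrative; only the batch sizes and runtimes come from the text.

```python
# Batch size -> total runtime in milliseconds for Network C (figures from the text).
runtimes_ms = {1: 124, 10: 182, 100: 541}

# For each batch size: (amortized milliseconds per inference, inferences per second).
summary = {
    batch_size: (total_ms / batch_size, 1000.0 * batch_size / total_ms)
    for batch_size, total_ms in runtimes_ms.items()
}

for batch_size, (per_inference_ms, throughput) in summary.items():
    print(f"batch={batch_size:>3}: {per_inference_ms:6.2f} ms/inference, "
          f"{throughput:6.1f} inferences/s")
```

Going from batch size 1 to 100 raises the latency of a single request from 124ms to 541ms, while throughput grows from roughly 8 to roughly 185 inferences per second.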

We also compare our accuracy over the entire testing set against plaintext TensorFlow. As seen in the right part of Table 1 we obtain almost identical accuracy using both int64 and int100, with indications that the latter may give a slightly better output distribution according to the mean KL divergence, in turn potentially justifying its higher runtime cost. We note that these models achieve good performance despite using approximations, further underlining the importance of being able to adapt models to the encrypted setting.
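A mean KL divergence between plaintext and private output distributions could be computed along the following lines. The direction of the divergence (plaintext relative to private), the small epsilon guarding against zero probabilities, and the example distributions are all assumptions for illustration and are not taken from the experiments.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_kl(reference: Sequence[Sequence[float]],
            candidate: Sequence[Sequence[float]]) -> float:
    """Average KL divergence over paired output distributions."""
    pairs = list(zip(reference, candidate))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


# Hypothetical softmax outputs for two inputs and three classes.
plaintext_outputs = [[0.90, 0.05, 0.05], [0.10, 0.80, 0.10]]
private_outputs = [[0.88, 0.07, 0.05], [0.12, 0.79, 0.09]]

score = mean_kl(plaintext_outputs, private_outputs)
```

Unlike accuracy, which only compares the predicted class, this measure is sensitive to how far the whole output distribution drifts, which is why two encodings with almost identical accuracy can still differ noticeably in KL divergence.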

Based on the above we conjecture that our approach scales to larger models, and is at least within an order of magnitude of related work in terms of runtime performance (at what we believe to be a lower implementation cost). We defer proper exploration of both topics to the full version of this paper.

Network | Runtime average (int64) | Runtime average (int100) | Runtime deviation (int64) | Runtime deviation (int100) | Accuracy (TF) | Accuracy (int64) | Accuracy (int100) | KL divergence (int64) | KL divergence (int100)
A       | 14ms                    | 138ms                    | 3.8ms                     | 61ms                       | 97.35%        | 97.18%           | 97.26%            | 0.0065                | 0.0064
B       | 126ms                   | 189ms                    | 115ms                     | 94ms                       | 99.26%        | 99.00%           | 98.93%            | 0.2086                | 0.0311
C       | 124ms                   | 211ms                    | 93ms                      | 60ms                       | 99.44%        | 99.41%           | 98.54%            | 0.2311                | 0.1045
Table 1: Runtime benchmarks with int64 and int100.

4 Conclusion

We have proposed an open source framework for experimenting with secure computation in TensorFlow, and illustrated how such protocols can be implemented easily using high-level abstractions (Figure 2). This additionally allows private machine learning to be expressed in an interface similar to ordinary TensorFlow (Figure 1) while maintaining good performance.

In the full version of this paper we elaborate on these results, and present a modular extension of the concrete protocol presented here that adds features from Wagh et al. (2018) in order to compute exact ReLU and MaxPooling.