SoK: On the Impossible Security of Very Large Foundation Models

09/30/2022
by El Mahdi El Mhamdi, et al.

Large machine learning models, or so-called foundation models, aim to serve as base models for application-oriented machine learning. Although these models showcase impressive performance, they have been empirically found to pose serious security and privacy issues. We may, however, wonder whether this is a limitation of the current models, or whether these issues stem from a fundamental, intrinsic impossibility of the foundation model learning problem itself. This paper aims to systematize our knowledge supporting the latter. More precisely, we identify several key features of today's foundation model learning problem which, given the current understanding in adversarial machine learning, suggest that high accuracy is incompatible with both security and privacy. We begin by observing that high accuracy seems to require (1) very high-dimensional models and (2) huge amounts of data that can only be procured through user-generated datasets. Moreover, such data is fundamentally heterogeneous, as users generally have very specific (easily identifiable) data-generating habits. More importantly, users' data is filled with highly sensitive information and may be heavily polluted by fake users. We then survey lower bounds on accuracy in privacy-preserving and Byzantine-resilient heterogeneous learning that, we argue, constitute a compelling case against the possibility of designing a secure and privacy-preserving high-accuracy foundation model. We further stress that our analysis also applies to other high-stakes machine learning applications, including content recommendation. We conclude by calling for measures to prioritize security and privacy, and to slow down the race for ever larger models.
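A toy one-dimensional example can illustrate the tension between resisting fake users and handling heterogeneous data. The sketch below does not come from the paper and makes several simplifying assumptions. Each user contributes a single number. The plain median stands in for a Byzantine-resilient aggregator. A single fake user reports an extreme value. The helper `worst_case_errors` and all sample values are hypothetical.

```python
from statistics import mean, median

def worst_case_errors(honest, attack_values):
    """For one added fake user, return the worst-case distance (over the
    attacker's candidate values) between the honest users' mean and
    (a) the plain mean, (b) the median of all reported values."""
    target = mean(honest)  # what aggregating honest users alone would give
    mean_err = max(abs(mean(honest + [a]) - target) for a in attack_values)
    median_err = max(abs(median(honest + [a]) - target) for a in attack_values)
    return mean_err, median_err

# Hypothetical extreme values a single fake user might report.
attacks = [-1000.0, 1000.0]
# Hypothetical honest users who all behave alike vs. users with very different habits.
homogeneous = [4.0, 4.0, 4.0, 4.0, 4.0]
heterogeneous = [0.0, 0.0, 0.0, 10.0, 10.0]

for name, honest in [("homogeneous", homogeneous), ("heterogeneous", heterogeneous)]:
    m, md = worst_case_errors(honest, attacks)
    print(f"{name}: mean error {m:.1f}, median error {md:.1f}")
# → homogeneous: mean error 167.3, median error 0.0
# → heterogeneous: mean error 167.3, median error 4.0
```

In both cases the plain mean can be dragged arbitrarily far by one fake user. The median ignores the fake user entirely when honest users agree. When honest users' values are spread out, however, the median can no longer reach the honest mean, and scaling the honest values up scales that error with them. This is only a simplified picture of the heterogeneity-dependent accuracy loss that the surveyed lower bounds formalize.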

research
07/23/2018

2P-DNN : Privacy-Preserving Deep Neural Networks Based on Homomorphic Cryptosystem

Machine Learning as a Service (MLaaS), such as Microsoft Azure, Amazon A...
research
09/15/2023

Learning in the Dark: Privacy-Preserving Machine Learning using Function Approximation

Over the past few years, a tremendous growth of machine learning was bro...
research
07/10/2023

Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey

In graph machine learning, data collection, sharing, and analysis often ...
research
07/05/2018

Privacy-preserving Machine Learning through Data Obfuscation

As machine learning becomes a practice and commodity, numerous cloud-bas...
research
10/07/2022

mPSAuth: Privacy-Preserving and Scalable Authentication for Mobile Web Applications

As nowadays most web application requests originate from mobile devices,...
research
06/23/2020

Security and Privacy Preserving Deep Learning

Commercial companies that collect user data on a large scale have been t...
research
06/04/2021

Strategyproof Learning: Building Trustworthy User-Generated Datasets

Today's large-scale machine learning algorithms harness massive amounts ...
