Finding Meaningful Distributions of ML Black-boxes under Forensic Investigation

05/10/2023
by Jiyi Zhang, et al.

Given a poorly documented neural network model, we take the perspective of a forensic investigator who wants to find out the model's data domain (e.g., whether it was trained on face images or on traffic signs). Existing methods such as membership inference and model inversion can uncover some information about an unknown model, but they still require knowledge of the data domain to start with. In this paper, we propose solving this problem by leveraging a comprehensive corpus such as ImageNet to select a meaningful distribution that is close to the original training distribution and leads to high performance in follow-up investigations. The corpus comprises two components: a large dataset of samples, and meta information such as the hierarchical structure of, and textual information on, the samples. Our goal is to select a set of samples from the corpus for the given model. The core of our method is an objective function that considers two criteria on the selected samples: the model's functional properties (derived from the dataset) and semantics (derived from the metadata). We also give an algorithm to efficiently search the large space of all possible subsets with respect to the objective function. Experimental results show that the proposed method is effective. For example, cloning a given model (originally trained with CIFAR-10) by using Caltech 101 achieves 45.5% accuracy; with the proposed method, the accuracy is improved to 72.0%.
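The abstract does not spell out the objective function or the search algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a corpus organized as a class hierarchy (in the spirit of ImageNet's WordNet tree), a hypothetical per-class "functional" score (for instance, how confidently the black-box model responds to samples of that class), semantic coherence measured as inverse tree distance between selected classes, and a greedy search that grows the subset one class at a time. The names `functional`, `parent`, `tree_distance`, `objective`, `greedy_select` and the weight `alpha` are all hypothetical, and the scores below are made-up toy values.

```python
from itertools import combinations

# Toy class hierarchy (child -> parent); hypothetical stand-in for corpus metadata.
parent = {
    "cat": "mammal", "dog": "mammal", "horse": "mammal",
    "sparrow": "bird", "mammal": "animal", "bird": "animal",
    "truck": "vehicle", "car": "vehicle", "vehicle": "artifact",
    "animal": "entity", "artifact": "entity",
}

# Hypothetical per-class functional scores, e.g. mean model confidence (made-up values).
functional = {"cat": 0.9, "dog": 0.85, "horse": 0.6,
              "sparrow": 0.7, "truck": 0.8, "car": 0.3}


def path_to_root(node, parent):
    """Return [node, parent(node), ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges between a and b via their lowest common ancestor."""
    pa, pb = path_to_root(a, parent), path_to_root(b, parent)
    depth_in_b = {n: i for i, n in enumerate(pb)}
    for i, n in enumerate(pa):
        if n in depth_in_b:
            return i + depth_in_b[n]
    raise ValueError("nodes share no common ancestor")


def objective(subset, functional, parent, alpha=0.5):
    """Weighted sum of mean functional score and mean pairwise semantic similarity."""
    func = sum(functional[c] for c in subset) / len(subset)
    pairs = list(combinations(subset, 2))
    if pairs:
        sem = sum(1.0 / (1 + tree_distance(a, b, parent)) for a, b in pairs) / len(pairs)
    else:
        sem = 1.0  # a single class is trivially coherent
    return alpha * func + (1 - alpha) * sem


def greedy_select(candidates, k, functional, parent, alpha=0.5):
    """Greedily add the class that most increases the objective, up to k classes."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining,
                   key=lambda c: objective(selected + [c], functional, parent, alpha))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    print(greedy_select(list(functional), 3, functional, parent, alpha=0.5))
    print(greedy_select(list(functional), 3, functional, parent, alpha=1.0))
```

On this toy data, weighting both criteria (`alpha=0.5`) selects the semantically coherent set `["cat", "dog", "horse"]`, whereas using the functional score alone (`alpha=1.0`) picks `["cat", "dog", "truck"]`, mixing unrelated domains. Greedy search is just one simple way to avoid enumerating every subset; it carries no optimality guarantee.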

research
11/05/2021

Reconstructing Training Data from Diverse ML Models by Ensemble Inversion

Model Inversion (MI), in which an adversary abuses access to a trained M...
research
09/21/2020

Open-set Short Utterance Forensic Speaker Verification using Teacher-Student Network with Explicit Inductive Bias

In forensic applications, it is very common that only small naturalistic...
research
12/07/2021

Generation of Non-Deterministic Synthetic Face Datasets Guided by Identity Priors

Enabling highly secure applications (such as border crossing) with face ...
research
06/30/2017

Neural Sequence Model Training via α-divergence Minimization

We propose a new neural sequence model training method in which the obje...
research
09/09/2020

Analysis of Seismic Inversion with Optimal Transportation and Softplus Encoding

This paper is devoted to theoretical and numerical investigation of the ...
research
12/01/2018

Improving robustness of classifiers by training against live traffic

Deep learning models are known to be overconfident in their predictions ...
research
09/20/2022

Metadata Archaeology: Unearthing Data Subsets by Leveraging Training Dynamics

Modern machine learning research relies on relatively few carefully cura...